<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1d3 20150301//EN" "http://jats.nlm.nih.gov/publishing/1.1d3/JATS-journalpublishing1.dtd">
<article article-type="research-article" dtd-version="1.1d3" xml:lang="en" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">PLOS Complex Syst</journal-id>
<journal-id journal-id-type="publisher-id">plos</journal-id>
<journal-id journal-id-type="pmc">ploscomlsys</journal-id>
<journal-title-group>
<journal-title>PLOS Complex Systems</journal-title>
</journal-title-group>
<issn pub-type="epub">2837-8830</issn>
<publisher>
<publisher-name>Public Library of Science</publisher-name>
<publisher-loc>San Francisco, CA USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.1371/journal.pcsy.0000005</article-id>
<article-id pub-id-type="publisher-id">PCSY-D-24-00007</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
<subj-group subj-group-type="Discipline-v3">
<subject>Computer and information sciences</subject><subj-group><subject>Network analysis</subject><subj-group><subject>Centrality</subject></subj-group></subj-group></subj-group><subj-group subj-group-type="Discipline-v3">
<subject>Social sciences</subject><subj-group><subject>Linguistics</subject><subj-group><subject>Semantics</subject></subj-group></subj-group></subj-group><subj-group subj-group-type="Discipline-v3">
<subject>Social sciences</subject><subj-group><subject>Sociology</subject><subj-group><subject>Communications</subject><subj-group><subject>Social communication</subject><subj-group><subject>Social media</subject><subj-group><subject>Twitter</subject></subj-group></subj-group></subj-group></subj-group></subj-group></subj-group><subj-group subj-group-type="Discipline-v3">
<subject>Computer and information sciences</subject><subj-group><subject>Network analysis</subject><subj-group><subject>Social networks</subject><subj-group><subject>Social media</subject><subj-group><subject>Twitter</subject></subj-group></subj-group></subj-group></subj-group></subj-group><subj-group subj-group-type="Discipline-v3">
<subject>Social sciences</subject><subj-group><subject>Sociology</subject><subj-group><subject>Social networks</subject><subj-group><subject>Social media</subject><subj-group><subject>Twitter</subject></subj-group></subj-group></subj-group></subj-group></subj-group><subj-group subj-group-type="Discipline-v3">
<subject>Social sciences</subject><subj-group><subject>Linguistics</subject><subj-group><subject>Sociolinguistics</subject></subj-group></subj-group></subj-group><subj-group subj-group-type="Discipline-v3">
<subject>Biology and life sciences</subject><subj-group><subject>Neuroscience</subject><subj-group><subject>Cognitive science</subject><subj-group><subject>Cognitive psychology</subject><subj-group><subject>Language</subject></subj-group></subj-group></subj-group></subj-group></subj-group><subj-group subj-group-type="Discipline-v3">
<subject>Biology and life sciences</subject><subj-group><subject>Psychology</subject><subj-group><subject>Cognitive psychology</subject><subj-group><subject>Language</subject></subj-group></subj-group></subj-group></subj-group><subj-group subj-group-type="Discipline-v3">
<subject>Social sciences</subject><subj-group><subject>Psychology</subject><subj-group><subject>Cognitive psychology</subject><subj-group><subject>Language</subject></subj-group></subj-group></subj-group></subj-group><subj-group subj-group-type="Discipline-v3">
<subject>Physical sciences</subject><subj-group><subject>Mathematics</subject><subj-group><subject>Applied mathematics</subject><subj-group><subject>Algorithms</subject></subj-group></subj-group></subj-group></subj-group><subj-group subj-group-type="Discipline-v3">
<subject>Research and analysis methods</subject><subj-group><subject>Simulation and modeling</subject><subj-group><subject>Algorithms</subject></subj-group></subj-group></subj-group><subj-group subj-group-type="Discipline-v3">
<subject>Computer and information sciences</subject><subj-group><subject>Information theory</subject><subj-group><subject>Graph theory</subject><subj-group><subject>Clustering coefficients</subject></subj-group></subj-group></subj-group></subj-group><subj-group subj-group-type="Discipline-v3">
<subject>Physical sciences</subject><subj-group><subject>Mathematics</subject><subj-group><subject>Graph theory</subject><subj-group><subject>Clustering coefficients</subject></subj-group></subj-group></subj-group></subj-group><subj-group subj-group-type="Discipline-v3">
<subject>Computer and information sciences</subject><subj-group><subject>Network analysis</subject><subj-group><subject>Social networks</subject></subj-group></subj-group></subj-group><subj-group subj-group-type="Discipline-v3">
<subject>Social sciences</subject><subj-group><subject>Sociology</subject><subj-group><subject>Social networks</subject></subj-group></subj-group></subj-group></article-categories>
<title-group>
<article-title>How position in the network determines the fate of lexical innovations on Twitter</article-title>
<alt-title alt-title-type="running-head">The social journey of lexical innovations</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes" xlink:type="simple">
<name name-style="western">
<surname>Tarrade</surname>
<given-names>Louise</given-names>
</name>
<role content-type="http://credit.niso.org/contributor-roles/conceptualization/">Conceptualization</role>
<role content-type="http://credit.niso.org/contributor-roles/data-curation/">Data curation</role>
<role content-type="http://credit.niso.org/contributor-roles/methodology/">Methodology</role>
<role content-type="http://credit.niso.org/contributor-roles/visualization/">Visualization</role>
<role content-type="http://credit.niso.org/contributor-roles/writing-original-draft/">Writing – original draft</role>
<xref ref-type="aff" rid="aff001"><sup>1</sup></xref>
<xref ref-type="corresp" rid="cor001">*</xref>
</contrib>
<contrib contrib-type="author" corresp="yes" xlink:type="simple">
<contrib-id authenticated="true" contrib-id-type="orcid">https://orcid.org/0000-0003-0883-9321</contrib-id>
<name name-style="western">
<surname>Chevrot</surname>
<given-names>Jean-Pierre</given-names>
</name>
<role content-type="http://credit.niso.org/contributor-roles/conceptualization/">Conceptualization</role>
<role content-type="http://credit.niso.org/contributor-roles/funding-acquisition/">Funding acquisition</role>
<role content-type="http://credit.niso.org/contributor-roles/supervision/">Supervision</role>
<role content-type="http://credit.niso.org/contributor-roles/validation/">Validation</role>
<role content-type="http://credit.niso.org/contributor-roles/writing-review-editing/">Writing – review &amp; editing</role>
<xref ref-type="aff" rid="aff001"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff002"><sup>2</sup></xref>
<xref ref-type="corresp" rid="cor001">*</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple">
<contrib-id authenticated="true" contrib-id-type="orcid">https://orcid.org/0000-0003-2954-1751</contrib-id>
<name name-style="western">
<surname>Magué</surname>
<given-names>Jean-Philippe</given-names>
</name>
<role content-type="http://credit.niso.org/contributor-roles/conceptualization/">Conceptualization</role>
<role content-type="http://credit.niso.org/contributor-roles/data-curation/">Data curation</role>
<role content-type="http://credit.niso.org/contributor-roles/funding-acquisition/">Funding acquisition</role>
<role content-type="http://credit.niso.org/contributor-roles/methodology/">Methodology</role>
<role content-type="http://credit.niso.org/contributor-roles/project-administration/">Project administration</role>
<role content-type="http://credit.niso.org/contributor-roles/supervision/">Supervision</role>
<role content-type="http://credit.niso.org/contributor-roles/validation/">Validation</role>
<role content-type="http://credit.niso.org/contributor-roles/writing-review-editing/">Writing – review &amp; editing</role>
<xref ref-type="aff" rid="aff001"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff003"><sup>3</sup></xref>
</contrib>
</contrib-group>
<aff id="aff001"><label>1</label> <addr-line>ICAR laboratory (UMR 5191), École Normale Supérieure de Lyon, France</addr-line></aff>
<aff id="aff002"><label>2</label> <addr-line>LIDILEM laboratory (EA 609), Université Grenoble Alpes, France</addr-line></aff>
<aff id="aff003"><label>3</label> <addr-line>IXXI, Complex Systems Institute, Lyon, France</addr-line></aff>
<contrib-group>
<contrib contrib-type="editor" xlink:type="simple">
<name name-style="western">
<surname>Badham</surname>
<given-names>Jennifer</given-names>
</name>
<role>Editor</role>
<xref ref-type="aff" rid="edit1"/>
</contrib>
</contrib-group>
<aff id="edit1"><addr-line>Durham University, UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND</addr-line></aff>
<author-notes>
<fn fn-type="conflict" id="coi001">
<p>The authors have declared that no competing interests exist.</p>
</fn>
<corresp id="cor001">* E-mail: <email xlink:type="simple">louise.tarrade@ens-lyon.fr</email> (LT); <email xlink:type="simple">jean-pierre.chevrot@univ-grenoble-alpes.fr</email> (J-PC)</corresp>
</author-notes>
<pub-date pub-type="epub">
<day>3</day>
<month>9</month>
<year>2024</year>
</pub-date>
<pub-date pub-type="collection">
<month>9</month>
<year>2024</year>
</pub-date>
<volume>1</volume>
<issue>1</issue>
<elocation-id>e0000005</elocation-id>
<history>
<date date-type="received">
<day>21</day>
<month>1</month>
<year>2024</year>
</date>
<date date-type="accepted">
<day>9</day>
<month>7</month>
<year>2024</year>
</date>
</history>
<permissions>
<copyright-year>2024</copyright-year>
<copyright-holder>Tarrade et al</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/" xlink:type="simple">
<license-p>This is an open access article distributed under the terms of the <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/" xlink:type="simple">Creative Commons Attribution License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="info:doi/10.1371/journal.pcsy.0000005"/>
<abstract>
<p>This study analyzes the diffusion of lexical innovations on Twitter to understand how the social network position of adopters impacts their success. Looking at both successful and failed neologisms, we categorize them into "changes" which become established and "buzzes" which decline over time. Using a corpus of 650 million French tweets, we reconstruct user networks and characterize adopters of innovations during different diffusion phases based on prestige, centrality, clustering, and external ties. In the early innovation phase, change and buzz adopters have similar peripheral profiles. During propagation, changes spread to prestigious, central individuals while buzzes do not, which predicts their eventual success or failure. By the establishment phase, changes reach highly central users with closer external ties. The results align with sociolinguistic theories about weak ties for innovation and strong ties for establishment. Additionally, logistic regression models based on early adopter profiles can predict the fate of innovations. This work sheds light on the diffusion dynamics of online lexical innovations and the crucial role of user network factors.</p>
</abstract>
<abstract abstract-type="summary">
<title>Author summary</title>
<p>In everyday language, words are constantly being created, and these words either persist or disappear. Although this phenomenon has been the subject of much linguistic research, the factors which influence the fate of a new word remain largely unknown, partly because of the difficulty of recording spontaneous language use over time. Examining the varieties of language used on social media allows us to overcome these limitations. We collected over 650 million tweets written in French, covering several years of ordinary interactions between 2.5 million users. We also collected the network of social links between these users. We identified nearly 400 words that appeared in the corpus between 2012 and 2014, and tracked their diffusion over 5 years within the network of users. Some of these words lead to changes, while others generate only ephemeral buzz. By looking at the position in the network of users who adopt these innovations, we show that words adopted by users who are more central in their community and easily in contact with other communities become established in the language, and vice versa. Thus, the position in the network of speakers who adopt these words is enough to predict their fate.</p>
</abstract>
<funding-group>
<award-group id="award001">
<funding-source>
<institution-wrap>
<institution-id institution-id-type="funder-id">http://dx.doi.org/10.13039/501100001665</institution-id>
<institution>Agence Nationale de la Recherche</institution>
</institution-wrap>
</funding-source>
<award-id>ANR-10-LABX-0081</award-id>
<principal-award-recipient>
<contrib-id authenticated="true" contrib-id-type="orcid">https://orcid.org/0000-0003-2954-1751</contrib-id>
<name name-style="western">
<surname>Magué</surname>
<given-names>Jean-Philippe</given-names>
</name>
</principal-award-recipient>
</award-group>
<award-group id="award002">
<funding-source>
<institution-wrap>
<institution-id institution-id-type="funder-id">http://dx.doi.org/10.13039/501100001665</institution-id>
<institution>Agence Nationale de la Recherche</institution>
</institution-wrap>
</funding-source>
<award-id>ANR-15-CE38-0011-03</award-id>
<principal-award-recipient>
<contrib-id authenticated="true" contrib-id-type="orcid">https://orcid.org/0000-0003-2954-1751</contrib-id>
<name name-style="western">
<surname>Magué</surname>
<given-names>Jean-Philippe</given-names>
</name>
</principal-award-recipient>
</award-group>
<award-group id="award003">
<funding-source>
<institution-wrap>
<institution-id institution-id-type="funder-id">http://dx.doi.org/10.13039/501100018692</institution-id>
<institution>École Normale Supérieure de Lyon</institution>
</institution-wrap>
</funding-source>
<principal-award-recipient>
<name name-style="western">
<surname>Tarrade</surname>
<given-names>Louise</given-names>
</name>
</principal-award-recipient>
</award-group>
<award-group id="award004">
<funding-source>
<institution-wrap>
<institution-id institution-id-type="funder-id">http://dx.doi.org/10.13039/100012952</institution-id>
<institution>Université Grenoble Alpes</institution>
</institution-wrap>
</funding-source>
<principal-award-recipient>
<contrib-id authenticated="true" contrib-id-type="orcid">https://orcid.org/0000-0003-0883-9321</contrib-id>
<name name-style="western">
<surname>Chevrot</surname>
<given-names>Jean-Pierre</given-names>
</name>
</principal-award-recipient>
</award-group>
<funding-statement>J.-P. M., J.-P. C. and L.T. are grateful to the ASLAN project (ANR-10-LABX-0081, <ext-link ext-link-type="uri" xlink:href="https://aslan.universite-lyon.fr/" xlink:type="simple">https://aslan.universite-lyon.fr/</ext-link>) of the Université de Lyon for its financial support within the French program "Investments for the Future" operated by the National Research Agency (ANR). The data collection has been supported by the SoSweet ANR project (ANR-15-CE38-0011-03, <ext-link ext-link-type="uri" xlink:href="https://anr.fr/" xlink:type="simple">https://anr.fr/</ext-link>) attributed to J.-P. M. and J.-P. C. The authors are also grateful to University of Grenoble Alpes and Ecole Normale Supérieure de Lyon for the support for publication. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</funding-statement>
</funding-group>
<counts>
<fig-count count="8"/>
<table-count count="0"/>
<page-count count="20"/>
</counts>
<custom-meta-group>
<custom-meta id="data-availability">
<meta-name>Data Availability</meta-name>
<meta-value>Data are available from the Ortolang repository, a French government-supported infrastructure for language data: <ext-link ext-link-type="uri" xlink:href="http://www.ortolang.fr/market/corpora/sosweet" xlink:type="simple">www.ortolang.fr/market/corpora/sosweet</ext-link>.</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
</front>
<body>
<sec id="sec001">
<title>Previous work</title>
<p>Since language evolves within a social context, its usage diversifies according to the heterogeneity and changes in society, and sociolinguistic variation is omnipresent. Different variants of the same form are constantly in competition at all levels of the linguistic structure. Every human being is able to vary his or her way of speaking or to opt for a particular variant depending on whom he or she is addressing, for what purpose and in what context, with a varying degree of consciousness. Variation is the phenomenon observed in synchrony, change is its outcome from a diachronic point of view: "all change (with the exception of certain lexical innovations) results from a situation of variation—but not all variation leads to change [our translation]" (p. 23) [<xref ref-type="bibr" rid="pcsy.0000005.ref001">1</xref>].</p>
<p>As theorised by Weinreich et al. [<xref ref-type="bibr" rid="pcsy.0000005.ref002">2</xref>], variationist sociolinguistics is mainly concerned with explaining the mechanisms of linguistic change and establishing the influence of linguistic, cognitive, cultural and social factors on change. While external pressure and the influence of one's social groups (e.g. class, race, gender) have been shown to be explanatory factors for variation, social ties between individuals are also an important parameter to take into account when looking at the dynamics of the circulation of change. In his Philadelphia survey, Labov [<xref ref-type="bibr" rid="pcsy.0000005.ref003">3</xref>,<xref ref-type="bibr" rid="pcsy.0000005.ref004">4</xref>] establishes a significant correlation, particularly for women, between the use of advanced forms of the sound changes in progress and the structure of the individual’s network. The people leading the change are those with a certain local prestige, who have both a high density of interaction in their local block and a large proportion of their friends living outside it. For their part, Milroy &amp; Milroy [<xref ref-type="bibr" rid="pcsy.0000005.ref005">5</xref>,<xref ref-type="bibr" rid="pcsy.0000005.ref006">6</xref>] were particularly interested in the influence of network structures on the circulation of sociolinguistic variants. Significant results concerning the relation between linguistic change and network emerge from their study of Belfast. First, they confirm and extend Granovetter’s contribution [<xref ref-type="bibr" rid="pcsy.0000005.ref007">7</xref>] on the importance of weak ties in the transmission of innovations by defining innovators as people with weak ties, peripheral to communities. The denser a network, and therefore the stronger its ties, the more conservative it is regarding local vernacular norms and the more resistant it will be to change. 
In contrast, speakers with weaker and more peripheral ties will be less close to these norms and more exposed to external variants. The different variants thus pass from one linguistic community to another through peripheral individuals acting as bridges between the groups. However, according to Milroy &amp; Milroy [<xref ref-type="bibr" rid="pcsy.0000005.ref005">5</xref>], the adoption of a variant by individuals who are both central and well-established in the community is essential for its establishment within the community. In addition, before central members adopt it, the variant must be transmitted through a large number of ties as it is less socially risky to accept an innovation that is already widely spread at the margins of the community.</p>
<p>While these studies have shed considerable light on how change circulates, they also have limitations, such as the small number of speakers considered or the lack of continuous, homogeneous longitudinal data, which imposes a synchronic approach to linguistic change, a process which, by nature, extends over time. Furthermore, sociolinguists have historically favoured field surveys inspired by sociological approaches, often focusing on phonetic variables.</p>
<p>The diachronic study of linguistic change has thus long been left to the domain of historical linguistics which, by definition, is concerned with long-term changes, often spanning several centuries, and generally of a morphosyntactic nature. Moreover, the corpora on which it relies are written corpora often reflecting a language much more standardised than oral language. Emerging with the digital age, computational sociolinguistics [<xref ref-type="bibr" rid="pcsy.0000005.ref008">8</xref>], applied to social media, allows us to study less standardised varieties of language, which are highly propitious to variation and innovation, both synchronically and diachronically. The focus on media has increased the amount of attention paid to the lexicon, and work on lexical variation and diffusion has flourished [<xref ref-type="bibr" rid="pcsy.0000005.ref009">9</xref>–<xref ref-type="bibr" rid="pcsy.0000005.ref018">18</xref>]. Observations of lexical changes are indeed more tractable on a shorter time scale, "the lexicon [being] the component where change is the quickest (new words are constantly being created), and grammar the most stable, change taking place over a long period of time [our translation]" [<xref ref-type="bibr" rid="pcsy.0000005.ref019">19</xref>]. Furthermore, one can assume that the acceleration and multiplicity of exchanges on social media induce a phenomenon pointed out by Lorenz-Spreen et al. [<xref ref-type="bibr" rid="pcsy.0000005.ref020">20</xref>], namely that the ever-faster dissemination and consumption of information leads to a decrease in the collective attention span given to it. Consequently, the ever-increasing mass of content can lead to an acceleration of the diffusion process of linguistic innovations, whose fate would also be sealed more quickly.</p>
<p>Computational sociolinguistics has leveraged social interaction data to address the relationship between the diffusion of linguistic innovations and the network structure connecting individuals. Particular attention has been paid to the importance of weak ties in the introduction of an innovation and of strong ties in its establishment within the language community. For instance, the innovative nature of information transmitted via weak ties and the greater influence of strong ties have been confirmed by a large-scale study on the transmission of information on Facebook, involving 250 million users [<xref ref-type="bibr" rid="pcsy.0000005.ref021">21</xref>]. At the linguistic level, studies on a short time scale on Twitter and Reddit have shown that the innovators, the people who introduce new linguistic forms, are individuals who have many weak ties and who are more central to the network [<xref ref-type="bibr" rid="pcsy.0000005.ref014">14</xref>]. This is in line with both Milroy’s definition of innovators [<xref ref-type="bibr" rid="pcsy.0000005.ref005">5</xref>] and Labov’s definition of linguistic change leaders [<xref ref-type="bibr" rid="pcsy.0000005.ref003">3</xref>,<xref ref-type="bibr" rid="pcsy.0000005.ref004">4</xref>] in terms of their centrality. On the other hand, it has been shown that people with strong ties have more influence than others [<xref ref-type="bibr" rid="pcsy.0000005.ref012">12</xref>,<xref ref-type="bibr" rid="pcsy.0000005.ref014">14</xref>].</p>
<p>When individuals belong to a high-density area of their local network, vernacular forms are generally maintained [<xref ref-type="bibr" rid="pcsy.0000005.ref022">22</xref>] and, in the same way, the more isolated a community is from others, the more its members converge linguistically [<xref ref-type="bibr" rid="pcsy.0000005.ref023">23</xref>]. On the other hand, it is likely, as Milroy &amp; Milroy [<xref ref-type="bibr" rid="pcsy.0000005.ref005">5</xref>] suggest, that the adoption of an innovation by individuals strongly embedded in local groups facilitates the spread of the innovation through these more cohesive subgroups and the establishment of this innovation in the linguistic community more generally. Multi-agent simulations have indeed shown that while the absence of solitary and very peripheral members in a network leads to a lack of innovation, the absence of people defined as leaders (highly connected agents) prevents variants from stabilizing as norms [<xref ref-type="bibr" rid="pcsy.0000005.ref024">24</xref>].</p>
<p>Other studies have examined the relationship between some structural properties of the network and the circulation of innovations. At the egocentric network level, for instance, individuals with smaller networks are more linguistically malleable [<xref ref-type="bibr" rid="pcsy.0000005.ref025">25</xref>] and are therefore more likely to adopt a linguistic innovation. At the level of the network as a whole, the study of the diffusion of neologisms has shown that a larger network as well as dense connections within and between communities increase the number of new words as well as their chances of survival, in contrast to communities fragmented into many local clusters [<xref ref-type="bibr" rid="pcsy.0000005.ref026">26</xref>]. The diffusion of a neologism is also more likely to succeed if it is not limited to a few subgroups of speakers but rather spreads across different speaker communities [<xref ref-type="bibr" rid="pcsy.0000005.ref017">17</xref>].</p>
<p>As we have seen, variationist sociolinguistics has highlighted the fact that the position occupied by speakers in their community can play an important role in the diffusion of linguistic change. In brief, two main theories have emerged about individuals driving change in their local networks: one defining them as people with weak ties, peripheral to their community [<xref ref-type="bibr" rid="pcsy.0000005.ref005">5</xref>] and the other as people central to their community, but with many ties outside it [<xref ref-type="bibr" rid="pcsy.0000005.ref004">4</xref>]. As the starting point of linguistic change is difficult to identify, it is likely that these two descriptions simply refer to two different phases in the diffusion of linguistic change. Computational studies on this issue have relied mainly on social media corpora to examine the link between networks and the diffusion of change on a larger scale. In addition to the impact of certain structural properties of the network on the diffusion of linguistic innovations, they have mainly confirmed the role of the margin and weak ties in the introduction of innovations, as well as the influence of strong ties on their stabilization. They have also shown that more closely knit groups hold conservative attitudes towards vernacular norms. <xref ref-type="fig" rid="pcsy.0000005.g001">Fig 1</xref> schematises a hypothetical toy social network formed by 18 individuals, each belonging to one of the three communities represented by the colours green, blue, and yellow. Speakers with a very closed network, such as those belonging to the triads 0-7-14 and 1-12-17 or the tetrad 2-9-10-11, should therefore tend to be less innovative than others and intervene at a later stage of propagation. 
Conversely, individuals whose networks are smaller or who are located on the periphery of communities, such as nodes 6, 16 or 13, are more linguistically malleable, less conservative, and therefore more likely to take up innovations and, by extension, to facilitate their circulation. The role played by the centrality of innovators remains somewhat unclear at this stage. The research carried out to date, which has focused almost exclusively on English, highlights the importance of links between individuals in the diffusion of linguistic innovations and sheds light on particular aspects of the process, without offering a complete overview of the phenomenon. Moreover, with a few exceptions, these studies have generally concentrated on successful innovations, leaving aside unsuccessful ones.</p>
<fig id="pcsy.0000005.g001" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcsy.0000005.g001</object-id>
<label>Fig 1</label>
<caption>
<title>Social network formed by 18 individuals belonging to three different communities.</title>
</caption>
<graphic mimetype="image" position="float" xlink:href="info:doi/10.1371/journal.pcsy.0000005.g001" xlink:type="simple"/>
</fig>
<p>Based on a corpus of tweets in French and a short diachronic observation of the diffusion of successful and unsuccessful lexical innovations from their appearance to their stabilization or decline, we will examine <bold>a)</bold> how the structural properties of their adopters within the social network evolve over time, and <bold>b)</bold> whether the position of the speakers who adopt them at the successive phases of their diffusion can predict the fate of the lexical innovations. Our contribution is to provide a global overview of the circulation of lexical innovations within a social network. Moreover, we work with data in French, a language rarely examined in this type of research, where English is over-represented.</p>
</sec>
<sec id="sec002" sec-type="materials|methods">
<title>Materials and methods</title>
<sec id="sec003">
<title>Corpus</title>
<p>For this work, we rely on a corpus of around 650 million tweets in French from about 2.5 million users, spanning the period from 2007 to early 2019, most of which falls between March 2012 and January 2019. An initial set of 170 million tweets produced between 2014 and 2017 was gathered using the data providers Gnip and Datasift and constitutes the user base of this corpus [<xref ref-type="bibr" rid="pcsy.0000005.ref027">27</xref>]. The selection criteria for the tweets were that they should be written in French and come from the GMT and GMT+1 time zones. In a second phase, the corpus was completed directly via the Twitter API (using the Tweepy library) by iteratively retrieving the latest tweets of the users who had produced this initial corpus, excluding retweets. The corpus was filtered by language and by client in order to keep only tweets in French and to eliminate as many bot-generated tweets as possible. For the language, we simply relied on the language of the tweet as automatically identified by Twitter. For the bots, we relied on the Twitter clients: since bots produce very stereotyped tweets, we kept only the clients exhibiting sufficient variability in tweet length. The list of retained clients and the selection criteria are available at [<xref ref-type="bibr" rid="pcsy.0000005.ref028">28</xref>]. The corpus of tweets is available on the Ortolang platform [<xref ref-type="bibr" rid="pcsy.0000005.ref029">29</xref>].</p>
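<p>The client-based bot filter can be illustrated with a minimal sketch. The actual selection criteria are those documented in the cited resource; here we merely assume, for illustration, that a client is kept when the standard deviation of its tweet lengths reaches a threshold. The function name, the threshold value and the client names are all hypothetical.</p>
<code language="python" xml:space="preserve"><![CDATA[
```python
from collections import defaultdict
from statistics import pstdev


def clients_with_variable_lengths(tweets, min_std=10.0):
    """Return the clients whose tweet lengths vary enough to look human.

    `tweets` is an iterable of (client, text) pairs. `min_std` is a
    hypothetical threshold, not the criterion used for the corpus.
    """
    lengths = defaultdict(list)
    for client, text in tweets:
        lengths[client].append(len(text))
    # Population standard deviation of tweet length, per client.
    return {c for c, ls in lengths.items() if pstdev(ls) >= min_std}


# Toy data: one templated client, one client with varied messages.
tweets = [
    ("HypotheticalBot", "Weather update: 12C"),
    ("HypotheticalBot", "Weather update: 15C"),
    ("HypotheticalApp", "ok"),
    ("HypotheticalApp", "je suis en retard, on se retrouve devant le cinema ?"),
]
kept = clients_with_variable_lengths(tweets)
```
]]></code>
<p>In this toy example only the client with varied message lengths is retained; the templated client, whose tweets all have the same length, is discarded.</p>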
</sec>
<sec id="sec004">
<title>Lexical innovations</title>
<p>As explained in [<xref ref-type="bibr" rid="pcsy.0000005.ref030">30</xref>], we first selected all the words (i.e. any sequence of alphanumeric characters that can contain an apostrophe or a hyphen) that appeared in the corpus for the first time between March 2012 and February 2014. For each of these words, we then reconstructed its usage trajectory over 5 years from its first appearance by computing its monthly usage rate, i.e. the number of people who used the form divided by the number of people who tweeted during that month.</p>
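<p>The usage rate defined above can be sketched as follows. The record layout (user identifier, month label, list of tokens) and the function name are assumptions made for this illustration and do not describe the actual corpus format.</p>
<code language="python" xml:space="preserve"><![CDATA[
```python
from collections import defaultdict


def monthly_usage_rate(records, word):
    """Return {month: users of `word` / users who tweeted that month}.

    `records` is an iterable of (user_id, month, tokens) tuples, a
    hypothetical layout chosen for this sketch.
    """
    active = defaultdict(set)    # month -> users who tweeted
    adopters = defaultdict(set)  # month -> users who used `word`
    for user_id, month, tokens in records:
        active[month].add(user_id)
        if word in tokens:
            adopters[month].add(user_id)
    # Users are counted once per month, however often they tweet.
    return {m: len(adopters[m]) / len(active[m]) for m in sorted(active)}


records = [
    ("u1", "2012-03", ["un", "motnouveau"]),
    ("u2", "2012-03", ["bonjour"]),
    ("u1", "2012-04", ["motnouveau"]),
    ("u2", "2012-04", ["motnouveau", "encore"]),
    ("u3", "2012-04", ["salut"]),
    ("u3", "2012-04", ["rien"]),
]
rates = monthly_usage_rate(records, "motnouveau")
```
]]></code>
<p>Counting users rather than tweets means that a single prolific user cannot inflate the rate of a word.</p>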
<p>For each of the trajectories obtained, we used the LMFIT library for Python to fit two functions as closely as possible: the logistic function and the lognormal function. These functions correspond respectively to the ideal theoretical S-shaped trajectory of successful innovations [<xref ref-type="bibr" rid="pcsy.0000005.ref031">31</xref>–<xref ref-type="bibr" rid="pcsy.0000005.ref034">34</xref>] and the skewed bell-shaped trajectory of innovations whose use, after a growth phase, declines rather than stabilizes. We then used the output parameters of the fit to retain the words whose trajectory of use over 5 years most closely followed one or the other of these functions.</p>
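<p>To make the two reference functions concrete, the sketch below writes them out in plain Python and compares them on a synthetic trajectory through a residual sum of squares. The parameterisation, the parameter values and the use of the residual sum of squares as a selection criterion are assumptions made for illustration; the optimisation step itself, performed with LMFIT in this study, is deliberately left out.</p>

```python
import math


def logistic(t, amplitude, rate, midpoint):
    """S-shaped curve: plateau `amplitude`, steepness `rate`."""
    return amplitude / (1.0 + math.exp(-rate * (t - midpoint)))


def lognormal(t, amplitude, mu, sigma):
    """Skewed bell-shaped curve, defined for t > 0 (0 elsewhere)."""
    if not t > 0:
        return 0.0
    z = (math.log(t) - mu) / sigma
    return amplitude * math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2 * math.pi))


def rss(model, params, months, observed):
    """Residual sum of squares between a model and the observed rates."""
    return sum((model(t, *params) - y) ** 2 for t, y in zip(months, observed))


months = list(range(1, 61))  # 5 years of monthly values
# Synthetic trajectory that grows and then stabilises (hypothetical values).
observed = [logistic(t, 0.004, 0.3, 24) for t in months]

# The parameter tuples stand in for the output of a fitting routine.
fit_logistic = rss(logistic, (0.004, 0.3, 24), months, observed)
fit_lognormal = rss(lognormal, (0.1, 3.2, 0.5), months, observed)
label = "change" if fit_lognormal > fit_logistic else "buzz"
```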
<p>A manual filtering stage was then necessary to remove the named entities from the almost 500 words retained. In the end, we have two types of lexical innovation:</p>
<list list-type="order">
<list-item><p>The changes correspond to lexical innovations whose monthly trajectory of use follows a (logistic) S-shaped curve. It is possible to identify three distinct phases in the diffusion of this type of innovation: an initial phase—the innovation phase—during which the usage rate of the word remains at a very low level for a few months, followed by a more or less long propagation phase during which its usage rate takes off exponentially, to finally stabilise in the fixation phase. We identified 141 changes.</p></list-item>
<list-item><p>The buzzes correspond to lexical innovations whose monthly trajectory of use follows a skewed bell-shaped (lognormal) curve. The first two phases of diffusion of innovations categorised as buzzes are identical to those observed for changes. However, the last phase shows a significant decline in the rate of use of the word, until it returns to a very low rate; this is what we call the decline phase. We identified 251 buzzes.</p></list-item>
</list>
<p>To automatically delimit the three diffusion phases described above (innovation, propagation, then fixation for changes or decline for buzzes), we used the third derivative of the fitted function. More precisely, we looked for its maxima to identify the moments in the trajectory where the acceleration varies the most, which delimit the beginning and end of the propagation phase.</p>
<p><xref ref-type="fig" rid="pcsy.0000005.g002">Fig 2</xref> shows two changes ("rainté" and "malaisante") and two buzzes ("sweg" and "masculiste") identified with this method, their trajectory of use over 5 years, the fit to the reference function, and the three phases of diffusion.</p>
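<p>For a logistic trajectory, the phase boundaries can be sketched as follows. The example evaluates the analytical third derivative of a logistic curve on a fine grid and keeps its two local maxima. The fitted values and the grid step are hypothetical, and the released scripts may locate the maxima differently.</p>

```python
import math


def logistic_third_derivative(t, amplitude, rate, midpoint):
    """Third derivative of amplitude / (1 + exp(-rate * (t - midpoint)))."""
    f = 1.0 / (1.0 + math.exp(-rate * (t - midpoint)))
    return amplitude * rate ** 3 * f * (1 - f) * (1 - 6 * f + 6 * f * f)


def local_maxima(xs, ys):
    """Grid points whose value exceeds that of both neighbours."""
    return [xs[i] for i in range(1, len(ys) - 1)
            if ys[i] > ys[i - 1] and ys[i] > ys[i + 1]]


amplitude, rate, midpoint = 0.004, 0.3, 24.0   # hypothetical fitted values
grid = [i / 100 for i in range(0, 6001)]        # months 0 to 60, step 0.01
third = [logistic_third_derivative(t, amplitude, rate, midpoint) for t in grid]

# The two maxima bracket the propagation phase.
start, end = local_maxima(grid, third)
```

<p>For this functional form the two maxima lie symmetrically around the midpoint, at a distance of ln(5 + 2√6) / rate, so a steeper curve yields a shorter propagation phase.</p>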
<fig id="pcsy.0000005.g002" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcsy.0000005.g002</object-id>
<label>Fig 2</label>
<caption>
<title>Two trajectories of lexical innovations.</title>
<p>The usage rate per month of two changes (left) and two buzzes (right) represented by a rolling average with a three-month window (blue), as well as the result of the curve fitting (green). The three diffusion phases are represented by the grey shading in the background [<xref ref-type="bibr" rid="pcsy.0000005.ref030">30</xref>].</p>
</caption>
<graphic mimetype="image" position="float" xlink:href="info:doi/10.1371/journal.pcsy.0000005.g002" xlink:type="simple"/>
</fig>
<p>The scripts used to detect and categorize the lexical innovations and the resulting data are available at [<xref ref-type="bibr" rid="pcsy.0000005.ref035">35</xref>].</p>
</sec>
<sec id="sec005">
<title>Control words</title>
<p>In order to characterize the dynamics of lexical innovations in the network of users, we designed a third group, made up of control words whose use is stable. The period and duration considered for the control words were matched with those of the lexical innovations: 5 years, from February 2013 to January 2018.</p>
<p>We retrieved all the words of this period with at least 100 occurrences, as well as their number of users per month. Stable words are defined as those whose usage rate over the five years has a standard deviation below a certain threshold. In order to make this threshold comparable from one word to another, the monthly uses were normalized over the 5-year period. After manual inspection of a large sample of words, this threshold was set at 0.007. In parallel, we checked that each form has at least as many non-zero monthly values as the lexical innovation that has the fewest, in order to avoid words with overly long periods of zero use.</p>
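<p>The stability criterion can be sketched as below. The threshold of 0.007 comes from the text; the normalisation (dividing monthly counts by their total), the use of the population standard deviation and the example series are assumptions, since the exact normalisation is not spelled out here.</p>

```python
from statistics import pstdev

STD_THRESHOLD = 0.007  # value reported in the text


def is_stable(monthly_users, min_nonzero_months):
    """Flag a word as stable from its monthly user counts.

    Counts are divided by their total so that they sum to 1; this
    normalisation is an assumption made for the illustration.
    """
    total = sum(monthly_users)
    if total == 0:
        return False
    nonzero = sum(1 for n in monthly_users if n > 0)
    if min_nonzero_months > nonzero:
        return False  # too many months without any use
    shares = [n / total for n in monthly_users]
    return STD_THRESHOLD > pstdev(shares)


flat = [50 + (i % 3) for i in range(60)]   # steady use over 60 months
spiky = [0] * 30 + [400] * 5 + [5] * 25    # burst followed by a decline
stable_flags = (is_stable(flat, 20), is_stable(spiky, 20))
```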
<p>We obtain almost 40,000 words from which we randomly select 200 words whose number of users is matched to that of lexical innovations. <xref ref-type="fig" rid="pcsy.0000005.g003">Fig 3</xref> shows a random sample of 20 forms belonging to each of the categories, change, buzz and control word.</p>
<fig id="pcsy.0000005.g003" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcsy.0000005.g003</object-id>
<label>Fig 3</label>
<caption>
<title>Examples of changes, buzzes and control words.</title>
</caption>
<graphic mimetype="image" position="float" xlink:href="info:doi/10.1371/journal.pcsy.0000005.g003" xlink:type="simple"/>
</fig>
</sec>
<sec id="sec006">
<title>User network</title>
<p>For each user of the corpus, we retrieved the list of their followees, i.e. the people they follow. From this information, we reconstructed the static network restricted to the other users of our corpus. We did not rely on mentions to reconstruct the network of users in the corpus because this would have led to the exclusion of the vast majority of users who do not use mentions. The resulting network counts 2.5 million users and 300 million ties.</p>
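<p>Restricting the follow lists to corpus users amounts to a simple filter, sketched below on a toy example. The dictionary layout and the function name <monospace>restrict_to_corpus</monospace> are hypothetical.</p>

```python
def restrict_to_corpus(followees, corpus_users):
    """Keep only ties whose two endpoints are corpus users.

    `followees` maps a user id to the ids that user follows.
    """
    corpus = set(corpus_users)
    return {
        user: sorted(f for f in followed if f in corpus)
        for user, followed in followees.items()
        if user in corpus
    }


followees = {
    "a": ["b", "c", "celebrity"],  # "celebrity" is outside the corpus
    "b": ["a"],
    "c": [],
}
network = restrict_to_corpus(followees, ["a", "b", "c"])
n_ties = sum(len(v) for v in network.values())
```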
<p>From this network, we can then characterize each user according to the following network variables: local clustering coefficient, PageRank score, betweenness centrality and proximity to the outside of the community. The computations of the different network variables—except for the proximity to the outside of the community—were performed using the Python library NetworKit [<xref ref-type="bibr" rid="pcsy.0000005.ref036">36</xref>].</p>
<sec id="sec007">
<title>Clustering coefficient</title>
<p>The local clustering coefficient is the proportion of existing edges between the neighbours of a node among all possible edges. It is a measure whose values are between 0 and 1, and which therefore reflects the degree of openness of a user’s network. A clustering coefficient of 0 means that the neighbours of user <italic>u</italic> have no ties with each other, while a clustering coefficient of 1 would mean that all its neighbours also have ties to each other. Thus, the higher a user’s clustering coefficient, the closer his or her egocentric network is to a clique, i.e. a cohesive subgroup.</p>
<p>People belonging to dense sub-groups of the network with strong ties uniting their members will generally show more linguistic conservatism and be more resistant to change, and their adoption of an innovative variant is crucial to their maintenance within the community [<xref ref-type="bibr" rid="pcsy.0000005.ref006">6</xref>]. To demonstrate the relationship between maintaining vernacular norms and belonging to such a group, [<xref ref-type="bibr" rid="pcsy.0000005.ref022">22</xref>] have measured the strength of integration of nodes into their local group. Other studies have instead drawn on the notion of strength of ties (measured either by remaining as close as possible to its initial definition [<xref ref-type="bibr" rid="pcsy.0000005.ref007">7</xref>,<xref ref-type="bibr" rid="pcsy.0000005.ref021">21</xref>], or by inferring it from the interconnection of nodes [<xref ref-type="bibr" rid="pcsy.0000005.ref012">12</xref>,<xref ref-type="bibr" rid="pcsy.0000005.ref014">14</xref>]), generally to highlight the stronger influence of strong ties. Nevertheless, the strength of ties, network density and overlap of egocentric networks are very closely interconnected concepts. In a network, dense sub-groups with strong links between their members generally go hand in hand with overlapping egocentric networks [<xref ref-type="bibr" rid="pcsy.0000005.ref005">5</xref>]. The local clustering coefficient therefore seemed to us to be an easier measure to implement on a large network such as ours, and one that indicates, to a certain extent, whether an individual belongs to a closely linked sub-group.</p>
<p>In this way, a user with a very closed network is comparable to an individual with strong ties, evolving within a more closed sub-group, and therefore less exposed to innovations coming from outside.</p>
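<p>As an illustration of the definition, the following sketch computes the local clustering coefficient on a small undirected toy graph in plain Python. Treating ties as undirected is an assumption of the example; the values used in this study were computed with NetworKit.</p>

```python
from itertools import combinations


def local_clustering(adjacency, node):
    """Share of pairs of neighbours of `node` that are themselves tied.

    `adjacency` maps each node to the set of its neighbours in an
    undirected graph.
    """
    neighbours = adjacency[node]
    k = len(neighbours)
    if 2 > k:
        return 0.0  # no pair of neighbours to compare
    linked = sum(1 for a, b in combinations(neighbours, 2) if b in adjacency[a])
    return linked / (k * (k - 1) / 2)


adjacency = {
    "u": {"a", "b", "c"},
    "a": {"u", "b"},
    "b": {"u", "a"},
    "c": {"u"},
}
coefficients = {n: local_clustering(adjacency, n) for n in adjacency}
```

<p>Here user "u" has three neighbours, of which only one pair ("a" and "b") is tied, giving a coefficient of 1/3, whereas the neighbourhood of "a" forms a clique and yields 1.</p>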
</sec>
<sec id="sec008">
<title>PageRank score</title>
<p>The PageRank score of a user <italic>u</italic> is a measure of the prestige of an individual. This measure depends both on the number of incoming ties of <italic>u</italic> and on whether the users at the origin of these ties themselves have a high PageRank score. That is to say, a user followed by many people, who are themselves followed by a large number of people, will a priori have a higher PageRank score than a user followed by a larger number of people who are themselves followed by very few people.</p>
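<p>The effect described above can be reproduced on a toy network with a plain power-iteration PageRank. The damping factor of 0.85, the number of iterations and the handling of users who follow nobody are conventional choices assumed for the example, not settings reported in this study.</p>

```python
def pagerank(followees, damping=0.85, iterations=100):
    """Power-iteration PageRank; `followees[u]` lists who u follows."""
    nodes = sorted(set(followees) | {v for vs in followees.values() for v in vs})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Users who follow nobody spread their score uniformly.
        dangling = sum(rank[u] for u in nodes if not followees.get(u))
        new = {u: (1 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            targets = followees.get(u, [])
            for v in targets:
                new[v] += damping * rank[u] / len(targets)
        rank = new
    return rank


# "x" has two followers who are themselves followed by three users each;
# "y" has three followers whom nobody follows.
followees = {"p1": ["x"], "p2": ["x"], "q1": ["y"], "q2": ["y"], "q3": ["y"]}
for i in range(3):
    followees[f"f{i}"] = ["p1"]
    followees[f"g{i}"] = ["p2"]
scores = pagerank(followees)
```

<p>Although "y" has more followers than "x", the score of "x" is higher because its followers are themselves well followed.</p>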
<p>Applied to our network of Twitter users, we consider this measure to reflect a user’s overall popularity level. This measure of popularity can to some extent be transposed, on a much larger scale, to the notion of prestige as used by Labov in his description of the leaders of linguistic change in Philadelphia [<xref ref-type="bibr" rid="pcsy.0000005.ref003">3</xref>,<xref ref-type="bibr" rid="pcsy.0000005.ref004">4</xref>]. In addition, the higher a user’s PageRank score, the more likely it is that the content they produce will be exposed to a greater number of people.</p>
</sec>
<sec id="sec009">
<title>Centrality measure</title>
<p>The measure of centrality for a user here corresponds to their centrality within the community to which they belong. The more central an individual is to their community, the more they act as a "bridge" between its members. To calculate this score, it was therefore first necessary to detect the communities within our user network. To do this, we used the parallel implementation of the Louvain method [<xref ref-type="bibr" rid="pcsy.0000005.ref037">37</xref>] proposed by NetworKit, which allows us to identify the most densely connected groups in the network. As this method is non-overlapping, a user can only belong to one community. The resulting partition shows that the great majority of the network’s users belong to large communities, most of which have hundreds of thousands of individuals.</p>
<p>Betweenness centrality defines the centrality of a node as the number of times it is on the shortest path between two other nodes in the network. As the complexity of its computation increases strongly with the size of the network, we use approximate centrality measures for communities with more than 10,000 nodes, and exact centrality for the remaining, smaller communities. For the approximation, we use the parallel implementation of the KADABRA algorithm [<xref ref-type="bibr" rid="pcsy.0000005.ref038">38</xref>,<xref ref-type="bibr" rid="pcsy.0000005.ref039">39</xref>] provided by NetworKit. For each community, we calculate the centrality measures of its users by considering the network as an undirected graph. Since the centrality scores obtained in this way depend on the size of the community, they are not comparable from one community to another. For this reason, for each community, the set of centrality values obtained for its users has been standardised so that the median of this set is equal to 0 and the interquartile range (IQR = Q3 − Q1) to 1. The scaled centrality measures can then be compared between users from different communities. It should be noted that we observe a moderate correlation between the centrality measures thus obtained and the PageRank scores (Spearman correlation: 0.59).</p>
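<p>The per-community standardisation (median 0, IQR 1) can be sketched with the Python standard library. The raw scores below are invented, and the quantile interpolation method is an assumption; other conventions give slightly different quartiles on small samples.</p>

```python
from statistics import median, quantiles


def robust_scale(values):
    """Centre on the median and divide by the interquartile range."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; scaling is undefined")
    centre = median(values)
    return [(v - centre) / iqr for v in values]


# Two hypothetical communities whose raw scores differ by a factor of 100.
community_a = [0.0, 0.1, 0.2, 0.4, 3.0]
community_b = [0.000, 0.001, 0.002, 0.004, 0.030]
scaled_a = robust_scale(community_a)
scaled_b = robust_scale(community_b)
```

<p>After scaling, the two communities yield the same values up to rounding, which is what makes users from communities of different sizes comparable.</p>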
<p>While [<xref ref-type="bibr" rid="pcsy.0000005.ref014">14</xref>] have explored several measures of centrality to define the importance of a node in their social network, we will focus exclusively on betweenness centrality. In addition to the fact that the size of our network—more than 300 million ties—does not reasonably allow us to calculate all possible network measures, we believe that this measure is the one that comes closest to centrality in Labov’s sense. For Labov, the notion of centrality refers to important people in their local community, who are often mentioned by the other inhabitants of the block, and who are strongly involved in local life [<xref ref-type="bibr" rid="pcsy.0000005.ref004">4</xref>]. These people therefore act as a bridge within their local community, which is what betweenness centrality allows us to measure at the scale of the communities in our network.</p>
</sec>
<sec id="sec010">
<title>Proximity to the community outside</title>
<p>We designed the last network variable, which indicates how quickly a user is able to get in touch with a community other than their own. More precisely, from each node in the network, 10,000 random walks are performed, and for each of them we keep the number of steps that it was necessary to take before arriving in another community. The average of these 10,000 values thus obtained constitutes the final score attributed to the user for this variable.</p>
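<p>A minimal sketch of this random-walk procedure is given below. The step cap, the treatment of dead ends, the fixed seed and the toy graph are assumptions added so that the example terminates and is reproducible; they are not described in the text.</p>

```python
import random


def mean_steps_to_exit(adjacency, community, start, n_walks=10_000,
                       max_steps=1_000, seed=0):
    """Average number of steps before a walk leaves `start`'s community.

    Walks that hit a dead end or reach `max_steps` are counted as
    `max_steps`; this cap is an assumption added to guarantee termination.
    """
    rng = random.Random(seed)
    home = community[start]
    total = 0
    for _ in range(n_walks):
        node, steps = start, 0
        while max_steps > steps:
            neighbours = adjacency[node]
            if not neighbours:
                steps = max_steps
                break
            node = rng.choice(neighbours)
            steps += 1
            if community[node] != home:
                break
        total += steps
    return total / n_walks


adjacency = {"a1": ["b1"], "a2": ["a1"], "a3": ["a1", "a2"], "b1": ["a1"]}
community = {"a1": "A", "a2": "A", "a3": "A", "b1": "B"}
steps_a1 = mean_steps_to_exit(adjacency, community, "a1", n_walks=1000)
steps_a2 = mean_steps_to_exit(adjacency, community, "a2", n_walks=1000)
steps_a3 = mean_steps_to_exit(adjacency, community, "a3", n_walks=1000)
```

<p>In this toy graph, "a1" is tied directly to the other community and always exits in one step, "a2" always needs two, and "a3" needs two or three depending on the neighbour drawn first.</p>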
<p>The smaller the average number of steps of a user, the more directly he is in contact with another community. However, if he is located close to another community, this does not mean that he is more isolated in his own. The same user can have a central position within his community, but still have quick connections with people outside the community. We also observe a Spearman correlation of only -0.14 between these two variables.</p>
<p>This measure of proximity to the community outside is intended to reflect in part the profile of innovators described by Milroy [<xref ref-type="bibr" rid="pcsy.0000005.ref005">5</xref>], who are likely to bring innovations to their community through more direct ties with other communities.</p>
<p>Each of the users in the corpus is therefore characterized according to this set of four network variables giving information about the degree of openness of their egocentric network, their relative prestige, their centrality within their community, and their proximity to the outside of the community.</p>
</sec>
</sec>
<sec id="sec011">
<title>Comparison of the distributions of the different network variables at the three diffusion phases and prediction</title>
<sec id="sec012">
<title>Characterisation of words</title>
<p>Unlike the approach initially adopted in [<xref ref-type="bibr" rid="pcsy.0000005.ref030">30</xref>], we do not aggregate all the users who have used a word of a given category (e.g. buzz) at a given phase of diffusion; instead, each word is characterized independently. We take the view that although the set of words making up a category of lexical innovation (buzz or change) follows a global dynamic, each word nevertheless has its own dynamics. Users of innovations such as morphological derivations may not be exactly the same as users of phonetic spellings or lengthenings. Analyzing the distributions for each variable at the word level rather than aggregating users by innovation type allows us to avoid overlooking the different dynamics that may exist within the same category of innovations.</p>
<p>For each diffusion phase (innovation, propagation, fixation or decline) and for each network variable, we characterize each buzz and change in the following way: for each word <italic>w</italic> and each network variable <italic>v</italic>, we retrieve the months corresponding to the diffusion phase <italic>p</italic> considered. We then retrieve all users <italic>u</italic> who adopted <italic>w</italic> for the first time during the period covered by <italic>p</italic>. Then, for each of these adopters, we retrieve the corresponding value of <italic>v</italic>. At this stage, we have a set of values of <italic>v</italic>, corresponding to those of all the users who adopted <italic>w</italic> in phase <italic>p</italic>. The value of <italic>v</italic> attributed to <italic>w</italic> is then the median of this set, as the network variables are not normally distributed. More formally, the value of a network variable <italic>v</italic> associated with a word <italic>w</italic> at phase <italic>p</italic> can be noted:
<disp-formula id="pcsy.0000005.e001">
<alternatives>
<graphic id="pcsy.0000005.e001g" mimetype="image" position="anchor" xlink:href="info:doi/10.1371/journal.pcsy.0000005.e001" xlink:type="simple"/>
<mml:math display="block" id="M1">
<mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>w</mml:mi><mml:mo>,</mml:mo><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>M</mml:mi><mml:mi>e</mml:mi><mml:mi>d</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>…</mml:mo><mml:msub><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>)</mml:mo>
</mml:math>
</alternatives>
</disp-formula></p>
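<p>The characterisation of a word at a given phase can be sketched as follows. The input layout (a mapping from users to their month of first adoption, a set of months for the phase, and a mapping from users to their value of the network variable) and all numbers are hypothetical.</p>

```python
from statistics import median


def characterise_word(first_adoption_month, phase_months, user_values):
    """Median of a network variable over users who first adopted in a phase.

    first_adoption_month: {user: month index of first use of the word}
    phase_months: set of month indices covered by the diffusion phase
    user_values: {user: value of the network variable v}
    """
    values = [user_values[u] for u, m in first_adoption_month.items()
              if m in phase_months and u in user_values]
    return median(values) if values else None


first_adoption = {"u1": 3, "u2": 4, "u3": 4, "u4": 20}
pagerank_score = {"u1": 0.1, "u2": 0.5, "u3": 0.3, "u4": 0.9}
innovation = set(range(1, 7))     # hypothetical months of the innovation phase
propagation = set(range(7, 30))   # hypothetical months of the propagation phase

v_innovation = characterise_word(first_adoption, innovation, pagerank_score)
v_propagation = characterise_word(first_adoption, propagation, pagerank_score)
```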
<p>Finally, each of the words in each phase is represented by a four-dimensional vector corresponding to the clustering coefficient, the PageRank score, the centrality, and the average number of steps to exit the community.</p>
<p>For control words, the same procedure is used but without distinguishing the different phases of diffusion.</p>
</sec>
<sec id="sec013">
<title>Univariate tests</title>
<p>One of our goals is to characterize the actors of change. We address this by comparing, phase by phase, the distributions of the network variables of the three groups: changes, buzzes and control words. To check the significance of our observations, we use non-parametric tests, given the non-normality of the distributions. More precisely, we use the Kruskal-Wallis test, which tests the null hypothesis that the population medians of all groups are equal, followed by Dunn’s test as a post-hoc test to compare each pair of distributions. We applied the Bonferroni adjustment to Dunn’s test to correct the significance level. In both cases, we set the significance threshold to p&lt;0.05.</p>
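<p>To illustrate the omnibus step, the sketch below computes the Kruskal-Wallis statistic for three groups and applies a Bonferroni adjustment to a list of pairwise p-values. It is restricted to three groups without tied values, in which case the p-value has the closed form exp(-H / 2); Dunn’s test itself is not implemented, and the data are invented.</p>

```python
import math


def kruskal_wallis_3(groups):
    """H statistic and p-value for exactly three groups without ties.

    With three groups, H is compared to a chi-square law with 2 degrees
    of freedom, whose survival function is exp(-H / 2).
    """
    if len(groups) != 3:
        raise ValueError("this sketch handles exactly three groups")
    pooled = sorted(v for g in groups for v in g)
    if len(set(pooled)) != len(pooled):
        raise ValueError("tied values need a correction not implemented here")
    rank = {v: i + 1 for i, v in enumerate(pooled)}
    n = len(pooled)
    h = 12 / (n * (n + 1)) * sum(
        sum(rank[v] for v in g) ** 2 / len(g) for g in groups
    ) - 3 * (n + 1)
    return h, math.exp(-h / 2)


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    return [min(1.0, p * len(p_values)) for p in p_values]


changes, buzzes, controls = [1, 2, 3], [4, 5, 6], [7, 8, 9]
h_stat, p_value = kruskal_wallis_3([changes, buzzes, controls])
adjusted = bonferroni([0.01, 0.02, 0.5])  # hypothetical pairwise p-values
```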
</sec>
<sec id="sec014">
<title>Predicting the fate of lexical innovations</title>
<p>We then tried to predict the fate of lexical innovations before their trajectory stabilizes or declines, i.e. as early as the innovation or propagation phase. To do this, we train a logistic regression model using the scikit-learn library on all the lexical innovations in our dataset—i.e. the 141 changes and 251 buzzes. This involves training a model for binary classification: the variable to be predicted is the type of lexical innovation: buzz <italic>vs</italic> change. The explanatory variables are the median values of the set of adopters of each word for each network variable. A first prediction is made with the data characterizing each word in the innovation phase, and a second with the data from the propagation phase.</p>
<p>To ensure that the model results are not biased by the greater number of buzzes than changes, the dataset is reduced to balanced classes by randomly selecting as many buzzes as there are changes. The data is also standardized before training the model, so that all medians are 0 and the IQR is 1. It is then split into training and test data representing 75% and 25% of the data respectively, i.e. a training set of 211 items and a test set of 71 items. Given the small number of inputs and the fact that each draw uses only about 56% of the buzzes (141 of 251), we train 10,000 models in this way, varying the sampled buzzes each time. Thus, the changes considered are always the same, but the buzzes vary from one model to another.</p>
<p>We then evaluate the quality of the prediction on the data in the innovation phase, and then in the propagation phase, by retrieving for each of the 10,000 models the following evaluation metrics: the area under the ROC curve (hereafter AUC), the precision, and the confusion matrices. An AUC score lies between 0 and 1; a score of 0.5 means that the model predicts no better than chance. The precision, also between 0 and 1, corresponds to the average rate of correct predictions. Finally, the confusion matrices give the distribution of true and false positives and true and false negatives. More precisely, we carry out Fisher’s exact test on each of the matrices obtained to ensure that this distribution is not due to chance.</p>
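<p>One draw of the balanced dataset and its split can be sketched with the standard library. The word identifiers are placeholders, and whether the split used in this study was stratified is not stated, so a plain shuffled split is assumed here.</p>

```python
import math
import random


def balanced_split(changes, buzzes, test_share=0.25, seed=0):
    """Draw as many buzzes as changes, shuffle, then split train/test."""
    rng = random.Random(seed)
    sampled_buzzes = rng.sample(buzzes, len(changes))
    data = ([(w, "change") for w in changes]
            + [(w, "buzz") for w in sampled_buzzes])
    rng.shuffle(data)
    n_test = math.ceil(test_share * len(data))
    return data[n_test:], data[:n_test]


changes = [f"change_{i}" for i in range(141)]   # placeholder identifiers
buzzes = [f"buzz_{i}" for i in range(251)]
train, held_out = balanced_split(changes, buzzes)
```

<p>Repeating the draw with a different seed changes which buzzes are sampled while the 141 changes stay the same, which is the source of variation across the 10,000 models.</p>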
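<p>The evaluation metrics can be illustrated on a handful of invented predictions. The sketch computes the AUC as the probability that a positive item is ranked above a negative one, builds the confusion matrix, and applies a two-sided Fisher exact test; coding buzzes as the positive class (label 1) and the decision threshold of 0.5 are assumptions of the example.</p>

```python
import math


def auc(scores, labels):
    """Probability that a random positive outranks a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion(predicted, labels):
    """Return (tp, fp, fn, tn) with label 1 as the positive class."""
    pairs = list(zip(predicted, labels))
    return (pairs.count((1, 1)), pairs.count((1, 0)),
            pairs.count((0, 1)), pairs.count((0, 0)))


def fisher_exact_two_sided(tp, fp, fn, tn):
    """Two-sided Fisher exact p-value for a 2x2 table."""
    row1, col1, total = tp + fp, tp + fn, tp + fp + fn + tn

    def prob(a):
        return (math.comb(col1, a) * math.comb(total - col1, row1 - a)
                / math.comb(total, row1))

    observed = prob(tp)
    low, high = max(0, row1 + col1 - total), min(row1, col1)
    # Sum the probabilities of all tables at most as likely as the observed one.
    return sum(prob(a) for a in range(low, high + 1)
               if observed * (1 + 1e-9) >= prob(a))


labels = [1, 1, 1, 0, 0, 0]                 # 1 = buzz, 0 = change
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]     # invented model outputs
predicted = [1 if s > 0.5 else 0 for s in scores]
auc_score = auc(scores, labels)
tp, fp, fn, tn = confusion(predicted, labels)
accuracy = (tp + tn) / len(labels)
p_fisher = fisher_exact_two_sided(tp, fp, fn, tn)
```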
<p>The scripts used to calculate the network variables, characterize the words, create the group of control words, and perform the univariate tests and predictions are available at [<xref ref-type="bibr" rid="pcsy.0000005.ref028">28</xref>].</p>
</sec>
</sec>
</sec>
<sec id="sec015" sec-type="results">
<title>Results</title>
<p>We will first ask whether and how the network characteristics of the individuals who adopt lexical innovations differ from those of the users of the control words composing our control group at the different phases of diffusion. At the same time, we will extend this questioning to the level of lexical innovations and ask which network characteristics are the most discriminating between changes and buzzes, always considering the timing of their diffusion. Secondly, we will try to find out whether it is possible to predict the fate of lexical innovations simply based on the four network characteristics of their adopters, described in the previous section.</p>
<sec id="sec016">
<title>Comparison of distributions</title>
<p>The figure below (<xref ref-type="fig" rid="pcsy.0000005.g004">Fig 4</xref>) shows the different distributions of median values used to characterize each word by type, by network variables and by phase of diffusion; each point thus represents a word, and each distribution a word category. Lexical innovations are shown in blue and green, representing changes and buzzes respectively, and control words in yellow. The distribution of the latter does not vary from one phase to another, since we cannot distinguish between different phases.</p>
<fig id="pcsy.0000005.g004" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcsy.0000005.g004</object-id>
<label>Fig 4</label>
<caption>
<title>Distributions of median values.</title>
<p>Distributions of median values characterizing each word by type (in blue the changes, in green the buzzes, and in yellow the control words), by network variable (rows) and by diffusion phase (columns).</p>
</caption>
<graphic mimetype="image" position="float" xlink:href="info:doi/10.1371/journal.pcsy.0000005.g004" xlink:type="simple"/>
</fig>
<p>The results of the univariate tests performed on each set and each pair of distributions are presented in <xref ref-type="fig" rid="pcsy.0000005.g005">Fig 5</xref>, which should therefore be systematically compared with the distributions shown in <xref ref-type="fig" rid="pcsy.0000005.g004">Fig 4</xref>. Non-significant results are indicated by a hatched background. A yellow background indicates that the values in distribution A (top) are globally higher than the values of distribution B (bottom); a green background indicates the opposite. For example, the p-value obtained with Dunn’s test for the centrality of adopters in the fixation phase is 0.0275 and is therefore significant since it is lower than the significance threshold set at 0.05. The green background means that lower centrality values are more often observed for users of control words than for users who adopted a change in the fixation phase.</p>
<fig id="pcsy.0000005.g005" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcsy.0000005.g005</object-id>
<label>Fig 5</label>
<caption>
<title>P-values obtained from the different univariate tests.</title>
</caption>
<graphic mimetype="image" position="float" xlink:href="info:doi/10.1371/journal.pcsy.0000005.g005" xlink:type="simple"/>
</fig>
<p>A first element that can be noted is that the correlation observed between the PageRank scores and user centrality measures emerges particularly well when we look at the graph of distributions, as their dynamics are similar for each category over the diffusion phases. While these variables may seem redundant from this point of view, a Spearman correlation of 0.62 for all phases considered (<xref ref-type="fig" rid="pcsy.0000005.g006">Fig 6</xref>) indicates a positive but moderate correlation. Indeed, it is quite possible to have high prestige but low centrality, as is the case for node 16 in <xref ref-type="fig" rid="pcsy.0000005.g001">Fig 1</xref>, given that the centrality of a user is calculated in relation to the community to which they belong. In the same way, a user can be not very central to their community while being very isolated from other communities, like nodes 1, 3 or 17 in <xref ref-type="fig" rid="pcsy.0000005.g001">Fig 1</xref>, or conversely be not very central but almost immediately in contact with other communities, like nodes 5 or 13 for example. Finally, a user can also be central to their community while, on average, being in contact with other communities relatively quickly, or not (node 10), and have a relatively open (node 8) or closed (node 9) egocentric network.</p>
<fig id="pcsy.0000005.g006" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcsy.0000005.g006</object-id>
<label>Fig 6</label>
<caption>
<title>Spearman correlations.</title>
<p>Spearman correlations between the different variables—all phases and all types of words considered.</p>
</caption>
<graphic mimetype="image" position="float" xlink:href="info:doi/10.1371/journal.pcsy.0000005.g006" xlink:type="simple"/>
</fig>
<p>In the innovation phase, we do not observe significant differences between the distributions of the clustering coefficient. In the propagation and fixation phases, however, a distinction is observed, lexical innovations having lower clustering coefficients than control words. No evolution is observed between these two phases. Thus, the first adopters of lexical innovations do not differ in the degree of openness of their own network.</p>
<p>While the PageRank scores of lexical innovations are significantly lower than those of control words during the first two phases of diffusion, this difference decreases during the fixation phase for changes. Users who adopt lexical innovations during the first two phases of diffusion are less prestigious than average, particularly for buzzes, whose values remain in the same range from one phase to the next, whereas those of the changes gradually approach those of the control words until they reach their level in the fixation phase. It should be noted that although buzz adopters have significantly lower PageRank scores than the other two categories in the fixation phase, these scores are nevertheless higher than those observed in the two previous phases.</p>
<p>While lexical innovations have significantly lower centrality measures than control words in the innovation phase, in the propagation phase the changes stand out from the buzzes by reaching users as central as those of the control words (no significant difference being observed between these two distributions), while the distribution of buzz adopters remains significantly lower. While the latter, like the PageRank scores, rises in the fixation phase, it remains slightly lower than the other two. The distribution of centrality measures for change adopters is even higher than that of control words. Thus, from the propagation phase onwards, changes, unlike buzzes, are adopted by more central users, which would a priori facilitate their diffusion within the community.</p>
<p>In the innovation phase, the distributions of the average number of steps of the lexical innovations are lower than those of the control words, while not being distinguished from each other. The lexical innovations are therefore initially adopted by users who can generally reach outside their community more quickly, which facilitates their subsequent dissemination. Indeed, when we look at the distribution of these values in the propagation phase, the distribution of changes has not really changed, whereas the distribution of buzzes increases significantly, until it is positioned at a higher level than that of the control words. While the position of the distributions remains almost identical in the fixation phase, that of the changes is concentrated around lower values.</p>
<p>What emerges from these observations is that the first adopters of both successful changes and unsuccessful buzzes have similar network profiles. These innovators tend to be less prestigious and more peripheral compared to average users. This effect is even more pronounced for buzzes. Innovators can also reach outside their communities more easily. This likely helps facilitate the future diffusion of these new terms. This similarity fades in the propagation phase, where changes succeed in reaching much more prestigious and central users than buzzes, while maintaining a rapid proximity to the outside of the community, which should facilitate their circulation within the community but also outside it. Buzzes, on the other hand, continue to spread but fail to reach more central or prestigious individuals. Moreover, as they are adopted at this phase by users who are less directly connected to different communities, their circulation between communities will probably be hindered later. The fixation phase confirms the dynamics of the changes, which are therefore adopted by people who are as prestigious as the users of control words, slightly more central, but also with an even more direct proximity to the outside of the community than in the propagation phase. While the distributions of prestige and centrality values of buzz adopters tend to move closer to those of changes and control words during the decline phase, buzzes continue to be adopted by less central and less prestigious people, who have more difficult access to the outside of their community. Finally, although the clustering coefficient discriminates between lexical innovations and control words, being adopted by users with a more open network is characteristic of innovations only in the last two phases of diffusion.</p>
<sec id="sec017">
<title>Prediction of the fate of lexical innovations</title>
<p>We can now ask whether the differences we observe between the distributions of median values of adopters of lexical innovations are sufficiently discriminating to allow us to predict, in the innovation or propagation phases, whether a lexical innovation will persist in the linguistic community and become a change or, on the contrary, whether its use will eventually decline, thus becoming a buzz.</p>
<p><xref ref-type="fig" rid="pcsy.0000005.g007">Fig 7</xref> shows the results obtained for the precision of the 10,000 prediction models trained by logistic regression, first on the values attributed to changes and buzzes in the innovation phase, in green, and then on those in the propagation phase, in blue. <xref ref-type="fig" rid="pcsy.0000005.g008">Fig 8</xref> allows us to visualize the results of the AUC scores in the same way. Predictions made from the innovation phase are imprecise, with an average precision of 0.56 and an average AUC score of 0.61. Although the models generally do slightly better than chance, these scores show that it is not possible to predict the fate of lexical innovations in the innovation phase.</p>
<fig id="pcsy.0000005.g007" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcsy.0000005.g007</object-id>
<label>Fig 7</label>
<caption>
<title>Precision.</title>
<p>Precision obtained by the logistic regression models trained on the 10,000 datasets in the innovation phase (green) and in the propagation phase (blue).</p>
</caption>
<graphic mimetype="image" position="float" xlink:href="info:doi/10.1371/journal.pcsy.0000005.g007" xlink:type="simple"/>
</fig>
<fig id="pcsy.0000005.g008" position="float">
<object-id pub-id-type="doi">10.1371/journal.pcsy.0000005.g008</object-id>
<label>Fig 8</label>
<caption>
<title>AUC scores.</title>
<p>AUC scores obtained by the logistic regression models trained on the 10,000 datasets in the innovation phase (green) and in the propagation phase (blue).</p>
</caption>
<graphic mimetype="image" position="float" xlink:href="info:doi/10.1371/journal.pcsy.0000005.g008" xlink:type="simple"/>
</fig>
<p>A closer look at the confusion matrices of the models trained on the innovation phase data shows that, in general, buzzes are slightly easier to predict than changes at this stage, with an average of 61% of buzzes correctly predicted (true positives) versus 52% of changes (true negatives). However, Fisher’s exact tests on these matrices yield p-values greater than 0.05 in almost 80% of cases, with an average p-value of around 0.34, which means that the imbalances observed in the distribution of true/false positives and negatives are mostly non-significant. For the confusion matrices of the models trained on the propagation phase data, by contrast, the average p-value of the Fisher’s exact tests is 8.3e-05, and only 0.03% of the observed ratios between percentages of true/false positives and negatives are non-significant. At this stage, buzzes still seem to be slightly easier to predict than changes, with an average of 83% of buzzes correctly predicted compared to about 79% of changes.</p>
<p>This significant improvement in prediction quality with the propagation phase data is reflected in an increase in average precision from 0.56 to 0.81 and in average AUC score from 0.61 to 0.86, which confirms that the classification of lexical innovations as buzz or change at this stage leaves little to chance.</p>
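<p>The Fisher’s exact test applied to each 2x2 confusion matrix can be sketched as follows. This is a minimal pure-Python version of the usual two-sided test, which sums the hypergeometric probabilities of all tables that are no more likely than the observed one given fixed margins. The function name <monospace>fisher_exact_two_sided</monospace> and the counts in the toy matrix are hypothetical and are not taken from this study; the implementation actually used for the reported p-values may differ.</p>
<code language="python"><![CDATA[
```python
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test p-value for a 2x2 table of counts."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)   # smallest feasible top-left count
    highest = min(row1, col1)      # largest feasible top-left count
    p_obs = prob(a)
    # Sum over all tables at most as probable as the observed one;
    # the small relative tolerance guards against floating-point ties.
    total = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_obs * (1 + 1e-9)
    )
    return min(1.0, total)


# Hypothetical confusion matrix: rows = actual (buzz, change),
# columns = predicted (buzz, change).
toy_matrix = [[8, 2], [1, 5]]
p_value = fisher_exact_two_sided(toy_matrix)
print("p-value:", p_value)
print("significant at 0.05:", p_value < 0.05)
```
]]></code>
<p>A matrix whose predictions are unrelated to the true classes, such as equal counts in all four cells, yields a p-value of 1 under this test, whereas a matrix concentrated on the diagonal yields a small p-value.</p>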
<p>In summary, despite the significant differences in the positioning of the buzz and change distributions observed in the innovation phase for PageRank score and centrality measures, it does not appear possible at this stage to predict what a lexical innovation will become based only on the network characteristics of its first adopters. However, when we rely on the network characteristics of adopters in the propagation phase, prediction becomes quite feasible. It is the position in the community of an innovation’s adopters at this stage, and their more or less direct link with the outside, that seem to seal its fate and favor (or not) its future stabilization.</p>
</sec>
</sec>
</sec>
<sec id="sec018" sec-type="conclusions">
<title>Discussion</title>
<p>In this study, we examined the network characteristics of the users of our corpus and, in particular, whether it was possible to identify a ‘typical profile’ of adopters of lexical innovations at each diffusion phase. We also asked whether this profile differs according to the type of lexical innovation, i.e. whether adopters of changes differ from adopters of buzzes. In this way, we sought to shed light on the process of diffusion of lexical innovations and on the network-structure factors that contribute to their success or failure in a linguistic community, and to determine whether these large-scale results are consistent with or different from those obtained by the field surveys conducted in traditional variationist sociolinguistics, notably by Lesley and James Milroy and by William Labov.</p>
<p>First, we established that the initial adopters of lexical innovations are users with relatively similar network characteristics, regardless of whether these innovations later succeed or fail. Contrary to what we might expect, these individuals do not have a more open or more closed personal network than average. They can come into contact with other communities more quickly, without being central in their own community or prestigious within the global network. As such, these observations are largely comparable to those of Milroy &amp; Milroy [<xref ref-type="bibr" rid="pcsy.0000005.ref005">5</xref>], who define innovators as being more peripheral and having ties in several communities. Although Milroy &amp; Milroy [<xref ref-type="bibr" rid="pcsy.0000005.ref005">5</xref>] referred to local communities (different parts of Belfast), their observations can be transposed to users during the innovation period. First, the position of these users in the community is less central and therefore a priori more peripheral. Second, they maintain ties with at least two communities, albeit on the much larger scale of social media, where the network comprises several million users and communities are defined not spatially but as areas with a higher density of ties than the rest of the network. Moreover, these ties are inherently different from those maintained by the inhabitants of a city, for example.</p>
<p>While we did not characterize users in terms of strong or weak ties, the local clustering coefficient, as a measure of the degree of openness of a user’s network, captures a similar reality to some extent. In the propagation phase, clustering coefficients of change adopters are lower than average, which is consistent with the findings of previous work. However, we do not observe evidence that the changes were subsequently adopted by people who could be described as strongly connected, or at least belonging to a more closed subgroup, which would increase the likelihood that an innovation spreads in the community [<xref ref-type="bibr" rid="pcsy.0000005.ref014">14</xref>]. On the contrary, the degree of openness of adopters of changes seems to increase as the diffusion of these changes progresses. That said, nothing suggests that they were not taken up by a few individuals belonging to more closed subgroups, only that such individuals were not numerous enough to be reflected in our results. Further studies on the strength of the ties between the users of our network and their degree of embeddedness would be desirable to examine in more detail the impact of this variable on the establishment of changes in the linguistic community.</p>
<p>While a profile of early adopters emerges in the first diffusion phase for lexical innovations, it is not yet possible at this stage to know whether they will become buzzes or changes, as our low prediction scores in the innovation phase indicate. However, in the propagation phase, i.e. when the rate of adoption of buzzes and changes increases exponentially, we can identify a characteristic profile for users who adopt changes or buzzes. The success or failure of an innovation seems to depend on a combination of several factors. On the one hand, changes are adopted by individuals with a PageRank score that is still lower than normal but much higher than that of buzz adopters. We suggest that, because buzzes are adopted by individuals with very low visibility, they have a much lower frequency of exposure than changes at this stage. Repeated exposure to a term can in some cases have a significant effect on its adoption [<xref ref-type="bibr" rid="pcsy.0000005.ref040">40</xref>]. In addition, the first phase of diffusion lasts 18.5 months on average for changes compared with 6.5 months for buzzes, i.e. almost three times longer. Changes therefore generally remain in circulation longer before entering their growth phase, which also increases the chances of being exposed to them. Thus, the higher exposure of future changes, which circulate longer and are adopted by people whose tweets are more likely to be visible to a larger number of users, plausibly increases the likelihood that they will be adopted in the future.</p>
<p>Next, changes are characterized by adopters who are relatively central to their community, or at least as central as those in our control group, and located closer to other communities, whereas the opposite pattern emerges for future buzzes. Indeed, the latter are characterized by adopters who are still very peripheral in their community and have much more distant ties to other communities. The fact that changes are adopted at this phase by users who are central to their community, acting as a bridge within it and thus facilitating diffusion, but who also have more direct proximity to individuals from other communities is directly in line with the observations made by Labov [<xref ref-type="bibr" rid="pcsy.0000005.ref004">4</xref>] in his Philadelphia survey when he describes the leaders of change. The adoption of innovations during the propagation phase by prestigious and central individuals with direct ties outside the community predicts their success, whereas innovations that do not spread to prestigious, central users tend to fail. Our prediction results support this: the profiles of adopters in the propagation phase are informative about the ultimate fate of new terms.</p>
<p>In the fixation phase, where the fate of lexical innovations is already sealed, the prestige of their adopters reaches that of our control group, while their centrality even exceeds it. In addition, the average number of steps required to reach the outside of the community is even lower than in the previous phases. These high centrality measures within the community and this immediate proximity to the outside of the community are reminiscent of the conditions for the adoption of an innovation described by Milroy &amp; Milroy [<xref ref-type="bibr" rid="pcsy.0000005.ref005">5</xref>]: for a variant to become established within a community, it must have been adopted by people central to it, who themselves will only risk adopting the variant if it is already widely used at the margins of the community. That said, Milroy &amp; Milroy [<xref ref-type="bibr" rid="pcsy.0000005.ref005">5</xref>] were describing adoption within a local community, whereas in our case we do not know whether the adoption of the innovation takes place within a single community or within the overall linguistic community of our corpus. It would also be interesting, when looking at the conditions for the success or failure of an innovation, to determine whether reaching several communities is a determining factor in the success of its diffusion, as Würschinger [<xref ref-type="bibr" rid="pcsy.0000005.ref017">17</xref>] finds, for example. It is partly for this reason that future work should extend the analysis of communities begun here, both by characterizing them in finer detail and by observing the circulation of innovations within and between them.</p>
<p>One point that deserves attention, and which has not been studied in this work, is the role played by the category of lexical innovation. The lexical innovations we detected cover several categories that do not seem to be homogeneously distributed between buzzes and changes. While we find borrowings, morphological derivations, lengthenings, truncations, phonetic spellings, etc. in both types of innovation, it is immediately apparent that more lengthenings are observed among buzzes, for example, while more neologisms designating new realities or practices are present among changes. Words that have greater communicative utility, that fill a semantic gap or that can also be used in spoken language are more likely to be maintained over time [<xref ref-type="bibr" rid="pcsy.0000005.ref041">41</xref>], as are words used in a wider range of linguistic contexts [<xref ref-type="bibr" rid="pcsy.0000005.ref042">42</xref>]. The nature of the word itself therefore has some impact on its chances of survival and would be an interesting factor to consider in future research. Finally, it has been shown that other factors, notably demographic and geographical [<xref ref-type="bibr" rid="pcsy.0000005.ref011">11</xref>], play an important role in the diffusion of innovations. Future research should consider all of these factors, both intra- and extra-linguistic, to refine and complete the results presented here on the impact of the position of speakers in the network on the diffusion of innovations.</p>
<p>To conclude, our study found general diffusion patterns for lexical innovations similar to those reported in previous sociolinguistic studies [<xref ref-type="bibr" rid="pcsy.0000005.ref004">4</xref>,<xref ref-type="bibr" rid="pcsy.0000005.ref005">5</xref>]. Those studies focused on phonetic innovations, localized communities, and surveys of hundreds of speakers. In contrast, our research examined lexical innovations at scale across millions of social media users. It should be noted, however, that it is easier for a speaker to act at the lexical level than at the phonological or morphosyntactic level, for example. This is because speakers generally do not change their pronunciation once it is acquired, just as they are less likely to change their grammar. Lexical variables, by contrast, are easier to manipulate, and are also more conscious and therefore more likely to be linked to identity issues. Although they are less malleable, the other types of variables are not impervious to change, even if this generally involves a longer time span. Thus, the question remains open as to whether the underlying mechanisms are the same and whether the influence of the network factors highlighted here can be generalized to non-lexical variables.</p>
</sec>
</body>
<back>
<ack>
<p>We gratefully acknowledge the support of the Centre Blaise Pascal’s IT test platform at ENS de Lyon (Lyon, France) for the computing facilities. The platform operates the SIDUS solution [<xref ref-type="bibr" rid="pcsy.0000005.ref043">43</xref>] developed by Emmanuel Quemener.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="pcsy.0000005.ref001"><label>1</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Marchello-Nizia</surname> <given-names>C</given-names></name>, <name name-style="western"><surname>Combettes</surname> <given-names>B</given-names></name>, <name name-style="western"><surname>Prévost</surname> <given-names>S</given-names></name>, <name name-style="western"><surname>Scheer</surname> <given-names>T</given-names></name>, editors. <article-title>Grande Grammaire Historique du Français (GGHF).</article-title> <source>De Gruyter</source>; <year>2020</year>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1515/9783110348194" xlink:type="simple">10.1515/9783110348194</ext-link></comment></mixed-citation></ref>
<ref id="pcsy.0000005.ref002"><label>2</label><mixed-citation publication-type="book" xlink:type="simple"><name name-style="western"><surname>Weinreich</surname> <given-names>U</given-names></name>, <name name-style="western"><surname>Labov</surname> <given-names>W</given-names></name>, <name name-style="western"><surname>Herzog</surname> <given-names>MI</given-names></name>. <source>Empirical foundations for a theory of language change.</source> <publisher-name>WP Lehmann-Y Malkiel (Hrsgg), Directions for Historical Linguistics</publisher-name>, <publisher-loc>Austin/London</publisher-loc>. <year>1968</year>.</mixed-citation></ref>
<ref id="pcsy.0000005.ref003"><label>3</label><mixed-citation publication-type="book" xlink:type="simple"><name name-style="western"><surname>Labov</surname> <given-names>W.</given-names></name> <chapter-title>The social origins of sound change</chapter-title>. <source>Locating Language in Time and Space</source>. <publisher-name>Academic Press</publisher-name> <publisher-loc>New York</publisher-loc>; <year>1980</year>. pp. <fpage>251</fpage>–<lpage>265</lpage>.</mixed-citation></ref>
<ref id="pcsy.0000005.ref004"><label>4</label><mixed-citation publication-type="book" xlink:type="simple"><name name-style="western"><surname>Labov</surname> <given-names>W.</given-names></name> <source>Principles of linguistic change. Vol. 2: Social factors.</source> Digital print. <publisher-loc>Malden, Mass</publisher-loc>.: <publisher-name>Blackwell</publisher-name>; <year>2006</year>.</mixed-citation></ref>
<ref id="pcsy.0000005.ref005"><label>5</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Milroy</surname> <given-names>J</given-names></name>, <name name-style="western"><surname>Milroy</surname> <given-names>L</given-names></name>. <article-title>Linguistic change, social network and speaker innovation</article-title>. <source>Journal of linguistics</source>. <year>1985</year>;<volume>21</volume>: <fpage>339</fpage>–<lpage>384</lpage>.</mixed-citation></ref>
<ref id="pcsy.0000005.ref006"><label>6</label><mixed-citation publication-type="book" xlink:type="simple"><name name-style="western"><surname>Milroy</surname> <given-names>L.</given-names></name> <source>Language and social networks.</source> <edition designator="2">2nd ed</edition>. <publisher-loc>Oxford, UK; New York, NY, USA</publisher-loc>: <publisher-name>B. Blackwell</publisher-name>; <year>1987</year>.</mixed-citation></ref>
<ref id="pcsy.0000005.ref007"><label>7</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Granovetter</surname> <given-names>MS</given-names></name>. <article-title>The Strength of Weak Ties</article-title>. <source>American Journal of Sociology</source>. <year>1973</year>;<volume>78</volume>: <fpage>1360</fpage>–<lpage>1380</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1086/225469" xlink:type="simple">10.1086/225469</ext-link></comment></mixed-citation></ref>
<ref id="pcsy.0000005.ref008"><label>8</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Nguyen</surname> <given-names>D</given-names></name>, <name name-style="western"><surname>Doğruöz</surname> <given-names>AS</given-names></name>, <name name-style="western"><surname>Rosé</surname> <given-names>CP</given-names></name>, <name name-style="western"><surname>De Jong</surname> <given-names>F</given-names></name>. <article-title>Computational Sociolinguistics: A Survey.</article-title> <source>Computational Linguistics</source>. <year>2016</year>;<volume>42</volume>: <fpage>537</fpage>–<lpage>593</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1162/COLI%5Fa%5F00258" xlink:type="simple">10.1162/COLI_a_00258</ext-link></comment></mixed-citation></ref>
<ref id="pcsy.0000005.ref009"><label>9</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Schwartz</surname> <given-names>HA</given-names></name>, <name name-style="western"><surname>Eichstaedt</surname> <given-names>JC</given-names></name>, <name name-style="western"><surname>Kern</surname> <given-names>ML</given-names></name>, <name name-style="western"><surname>Dziurzynski</surname> <given-names>L</given-names></name>, <name name-style="western"><surname>Ramones</surname> <given-names>SM</given-names></name>, <name name-style="western"><surname>Agrawal</surname> <given-names>M</given-names></name>, <etal>et al</etal>. <article-title>Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach.</article-title> <name name-style="western"><surname>Preis</surname> <given-names>T</given-names></name>, editor. <source>PLoS ONE</source>. <year>2013</year>;<volume>8</volume>: <fpage>e73791</fpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1371/journal.pone.0073791" xlink:type="simple">10.1371/journal.pone.0073791</ext-link></comment> <object-id pub-id-type="pmid">24086296</object-id></mixed-citation></ref>
<ref id="pcsy.0000005.ref010"><label>10</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Bamman</surname> <given-names>D</given-names></name>, <name name-style="western"><surname>Eisenstein</surname> <given-names>J</given-names></name>, <name name-style="western"><surname>Schnoebelen</surname> <given-names>T</given-names></name>. <article-title>Gender identity and lexical variation in social media.</article-title> <source>J Sociolinguistics</source>. <year>2014</year>;<volume>18</volume>: <fpage>135</fpage>–<lpage>160</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1111/josl.12080" xlink:type="simple">10.1111/josl.12080</ext-link></comment></mixed-citation></ref>
<ref id="pcsy.0000005.ref011"><label>11</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Eisenstein</surname> <given-names>J</given-names></name>, <name name-style="western"><surname>O’Connor</surname> <given-names>B</given-names></name>, <name name-style="western"><surname>Smith</surname> <given-names>NA</given-names></name>, <name name-style="western"><surname>Xing</surname> <given-names>EP</given-names></name>. <article-title>Diffusion of Lexical Change in Social Media.</article-title> <name name-style="western"><surname>Berwick</surname> <given-names>RC</given-names></name>, editor. <source>PLoS ONE</source>. <year>2014</year>;<volume>9</volume>: <fpage>e113114</fpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1371/journal.pone.0113114" xlink:type="simple">10.1371/journal.pone.0113114</ext-link></comment> <object-id pub-id-type="pmid">25409166</object-id></mixed-citation></ref>
<ref id="pcsy.0000005.ref012"><label>12</label><mixed-citation publication-type="other" xlink:type="simple">Goel R, Soni S, Goyal N, Paparrizos J, Wallach H, Diaz F, et al. The social dynamics of language change in online networks. International conference on social informatics. Springer; 2016. pp. 41–57.</mixed-citation></ref>
<ref id="pcsy.0000005.ref013"><label>13</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Grieve</surname> <given-names>J</given-names></name>, <name name-style="western"><surname>Nini</surname> <given-names>A</given-names></name>, <name name-style="western"><surname>Guo</surname> <given-names>D</given-names></name>. <article-title>Analyzing lexical emergence in Modern American English online.</article-title> <source>English Language and Linguistics</source>. <year>2017</year>;<volume>21</volume>: <fpage>99</fpage>–<lpage>127</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1017/S1360674316000113" xlink:type="simple">10.1017/S1360674316000113</ext-link></comment></mixed-citation></ref>
<ref id="pcsy.0000005.ref014"><label>14</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Del Tredici</surname> <given-names>M</given-names></name>, <name name-style="western"><surname>Fernández</surname> <given-names>R</given-names></name>. <article-title>The Road to Success: Assessing the Fate of Linguistic Innovations in Online Communities.</article-title> <source>arXiv:180605838 [cs].</source> <year>2018</year> [cited 19 Nov 2020]. Available: <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/1806.05838" xlink:type="simple">http://arxiv.org/abs/1806.05838</ext-link></mixed-citation></ref>
<ref id="pcsy.0000005.ref015"><label>15</label><mixed-citation publication-type="book" xlink:type="simple"><name name-style="western"><surname>Hovy</surname> <given-names>D</given-names></name>, <name name-style="western"><surname>Rahimi</surname> <given-names>A</given-names></name>, <name name-style="western"><surname>Baldwin</surname> <given-names>T</given-names></name>, <name name-style="western"><surname>Brooke</surname> <given-names>J</given-names></name>. <chapter-title>Visualizing Regional Language Variation Across Europe on Twitter.</chapter-title> In: <name name-style="western"><surname>Brunn</surname> <given-names>SD</given-names></name>, <name name-style="western"><surname>Kehrein</surname> <given-names>R</given-names></name>, editors. <source>Handbook of the Changing World Language Map.</source> <publisher-loc>Cham</publisher-loc>: <publisher-name>Springer International Publishing;</publisher-name> <year>2020</year>. pp. <fpage>3719</fpage>–<lpage>3742</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1007/978-3-030-02438-3%5F175" xlink:type="simple">10.1007/978-3-030-02438-3_175</ext-link></comment></mixed-citation></ref>
<ref id="pcsy.0000005.ref016"><label>16</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Shoemark</surname> <given-names>PJ</given-names></name>. <source>Discovering and analysing lexical variation in social media text</source>. <year>2020</year>.</mixed-citation></ref>
<ref id="pcsy.0000005.ref017"><label>17</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Würschinger</surname> <given-names>Q.</given-names></name> <article-title>Social Networks of Lexical Innovation. Investigating the Social Dynamics of Diffusion of Neologisms on Twitter.</article-title> <source>Front Artif Intell.</source> <year>2021</year>;<volume>4</volume>: <fpage>648583</fpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/frai.2021.648583" xlink:type="simple">10.3389/frai.2021.648583</ext-link></comment> <object-id pub-id-type="pmid">34790894</object-id></mixed-citation></ref>
<ref id="pcsy.0000005.ref018"><label>18</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Keidar</surname> <given-names>D</given-names></name>, <name name-style="western"><surname>Opedal</surname> <given-names>A</given-names></name>, <name name-style="western"><surname>Jin</surname> <given-names>Z</given-names></name>, <name name-style="western"><surname>Sachan</surname> <given-names>M</given-names></name>. <article-title>Slangvolution: A Causal Analysis of Semantic Change and Frequency Dynamics in Slang.</article-title> <year>2022</year> [cited 24 Mar 2022]. <comment>doi: <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.48550/ARXIV.2203.04651" xlink:type="simple">10.48550/ARXIV.2203.04651</ext-link></comment></mixed-citation></ref>
<ref id="pcsy.0000005.ref019"><label>19</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Gadet</surname> <given-names>F.</given-names></name> <article-title>Changement linguistique: Langage et société.</article-title> <year>2021</year>;<source>Hors série</source>: <fpage>41</fpage>–<lpage>46</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3917/ls.hs01.0042" xlink:type="simple">10.3917/ls.hs01.0042</ext-link></comment></mixed-citation></ref>
<ref id="pcsy.0000005.ref020"><label>20</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Lorenz-Spreen</surname> <given-names>P</given-names></name>, <name name-style="western"><surname>Mønsted</surname> <given-names>BM</given-names></name>, <name name-style="western"><surname>Hövel</surname> <given-names>P</given-names></name>, <name name-style="western"><surname>Lehmann</surname> <given-names>S</given-names></name>. <article-title>Accelerating dynamics of collective attention.</article-title> <source>Nat Commun</source>. <year>2019</year>;<volume>10</volume>: <fpage>1759</fpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1038/s41467-019-09311-w" xlink:type="simple">10.1038/s41467-019-09311-w</ext-link></comment> <object-id pub-id-type="pmid">30988286</object-id></mixed-citation></ref>
<ref id="pcsy.0000005.ref021"><label>21</label><mixed-citation publication-type="other" xlink:type="simple">Bakshy E, Rosenn I, Marlow C, Adamic L. The role of social networks in information diffusion. Proceedings of the 21st international conference on World Wide Web. 2012. pp. 519–528.</mixed-citation></ref>
<ref id="pcsy.0000005.ref022"><label>22</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Dodsworth</surname> <given-names>R</given-names></name>, <name name-style="western"><surname>Benton</surname> <given-names>RA</given-names></name>. <article-title>Social network cohesion and the retreat from Southern vowels in Raleigh.</article-title> <source>Language in Society</source>. <year>2017</year>;<volume>46</volume>: <fpage>371</fpage>.</mixed-citation></ref>
<ref id="pcsy.0000005.ref023"><label>23</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Tamburrini</surname> <given-names>N</given-names></name>, <name name-style="western"><surname>Cinnirella</surname> <given-names>M</given-names></name>, <name name-style="western"><surname>Jansen</surname> <given-names>VAA</given-names></name>, <name name-style="western"><surname>Bryden</surname> <given-names>J</given-names></name>. <article-title>Twitter users change word usage according to conversation-partner social identity.</article-title> <source>Social Networks</source>. <year>2015</year>;<volume>40</volume>: <fpage>84</fpage>–<lpage>89</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/j.socnet.2014.07.004" xlink:type="simple">10.1016/j.socnet.2014.07.004</ext-link></comment></mixed-citation></ref>
<ref id="pcsy.0000005.ref024"><label>24</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Fagyal</surname> <given-names>Z</given-names></name>, <name name-style="western"><surname>Swarup</surname> <given-names>S</given-names></name>, <name name-style="western"><surname>Escobar</surname> <given-names>AM</given-names></name>, <name name-style="western"><surname>Gasser</surname> <given-names>L</given-names></name>, <name name-style="western"><surname>Lakkaraju</surname> <given-names>K</given-names></name>. <article-title>Centers and peripheries: Network roles in language change.</article-title> <source>Lingua.</source> <year>2010</year>;<volume>120</volume>: <fpage>2061</fpage>–<lpage>2079</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/j.lingua.2010.02.001" xlink:type="simple">10.1016/j.lingua.2010.02.001</ext-link></comment></mixed-citation></ref>
<ref id="pcsy.0000005.ref025"><label>25</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Lev-Ari</surname> <given-names>S.</given-names></name> <article-title>Social network size can influence linguistic malleability and the propagation of linguistic change.</article-title> <source>Cognition.</source> <year>2018</year>;<volume>176</volume>: <fpage>31</fpage>–<lpage>39</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/j.cognition.2018.03.003" xlink:type="simple">10.1016/j.cognition.2018.03.003</ext-link></comment> <object-id pub-id-type="pmid">29544113</object-id></mixed-citation></ref>
<ref id="pcsy.0000005.ref026"><label>26</label><mixed-citation publication-type="other" xlink:type="simple">Zhu J, Jurgens D. The structure of online social networks modulates the rate of lexical change. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Online: Association for Computational Linguistics; 2021. pp. 2201–2218. <comment>doi: <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.18653/v1/2021.naacl-main.178" xlink:type="simple">10.18653/v1/2021.naacl-main.178</ext-link></comment></mixed-citation></ref>
<ref id="pcsy.0000005.ref027"><label>27</label><mixed-citation publication-type="other" xlink:type="simple">Abitbol JL, Karsai M, Magué J-P, Chevrot J-P, Fleury E. Socioeconomic Dependencies of Linguistic Patterns in Twitter: a Multivariate Analysis. Proceedings of the 2018 World Wide Web Conference on World Wide Web—WWW ’18. Lyon, France: ACM Press; 2018. pp. 1125–1134. <comment>doi: <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1145/3178876.3186011" xlink:type="simple">10.1145/3178876.3186011</ext-link></comment></mixed-citation></ref>
<ref id="pcsy.0000005.ref028"><label>28</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Tarrade</surname> <given-names>L.</given-names></name> <article-title>Network factors and diffusion of linguistic innovations</article-title>; <year>2024</year> [cited 2024 Jul 16]. <source>figshare [internet].</source> Available from: <ext-link ext-link-type="uri" xlink:href="https://figshare.com/articles/software/Network_factors_and_diffusion_of_linguistic_innovations/26310976/1" xlink:type="simple">https://figshare.com/articles/software/Network_factors_and_diffusion_of_linguistic_innovations/26310976/1</ext-link></mixed-citation></ref>
<ref id="pcsy.0000005.ref029"><label>29</label><mixed-citation publication-type="journal" xlink:type="simple"><collab>ICAR, DANTE Inria, LIDILEM, ALMANACH</collab>. <source>SoSweet</source>. <year>2024</year> [cited 2024 April 10]. ORTOLANG [internet]. Available from: <ext-link ext-link-type="uri" xlink:href="https://hdl.handle.net/11403/sosweet/v1" xlink:type="simple">https://hdl.handle.net/11403/sosweet/v1</ext-link></mixed-citation></ref>
<ref id="pcsy.0000005.ref030"><label>30</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Tarrade</surname> <given-names>L</given-names></name>, <name name-style="western"><surname>Magué</surname> <given-names>J-P</given-names></name>, <name name-style="western"><surname>Chevrot</surname> <given-names>J-P</given-names></name>. <article-title>Detecting and categorising lexical innovations in a corpus of tweets.</article-title> <source>Psychology of Language and Communication</source>. <year>2022</year>;<volume>26</volume>: <fpage>313</fpage>–<lpage>329</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.2478/plc-2022-15" xlink:type="simple">10.2478/plc-2022-15</ext-link></comment></mixed-citation></ref>
<ref id="pcsy.0000005.ref031"><label>31</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Blythe</surname> <given-names>RA</given-names></name>, <name name-style="western"><surname>Croft</surname> <given-names>W</given-names></name>. <article-title>S-curves and the mechanisms of propagation in language change.</article-title> <source>Language.</source> <year>2012</year>;<volume>88</volume>: <fpage>269</fpage>–<lpage>304</lpage>.</mixed-citation></ref>
<ref id="pcsy.0000005.ref032"><label>32</label><mixed-citation publication-type="book" xlink:type="simple"><name name-style="western"><surname>Rogers</surname> <given-names>EM</given-names></name>. <source>Diffusion of innovations</source>. <edition designator="5">5th ed</edition>. <publisher-loc>New York</publisher-loc>: <publisher-name>Free Press</publisher-name>; <year>2003</year>.</mixed-citation></ref>
<ref id="pcsy.0000005.ref033"><label>33</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Feltgen</surname> <given-names>Q</given-names></name>, <name name-style="western"><surname>Fagard</surname> <given-names>B</given-names></name>, <name name-style="western"><surname>Nadal</surname> <given-names>J-P</given-names></name>. <article-title>Frequency patterns of semantic change: corpus-based evidence of a near-critical dynamics in language change.</article-title> <source>R Soc open sci</source>. <year>2017</year>;<volume>4</volume>: <fpage>170830</fpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1098/rsos.170830" xlink:type="simple">10.1098/rsos.170830</ext-link></comment> <object-id pub-id-type="pmid">29291074</object-id></mixed-citation></ref>
<ref id="pcsy.0000005.ref034"><label>34</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Chambers</surname> <given-names>JK</given-names></name>. <article-title>Patterns of variation including change.</article-title> <source>The handbook of language variation and change</source>. <year>2013</year>; <fpage>297</fpage>–<lpage>324</lpage>.</mixed-citation></ref>
<ref id="pcsy.0000005.ref035"><label>35</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Tarrade</surname> <given-names>L.</given-names></name> <article-title>Detection of lexical innovations</article-title>; <year>2024</year> [cited 2024 Jul 16]. figshare [internet]. Available from: <ext-link ext-link-type="uri" xlink:href="https://figshare.com/articles/software/lexical_innovation_detection/26310973/2" xlink:type="simple">https://figshare.com/articles/software/lexical_innovation_detection/26310973/2</ext-link></mixed-citation></ref>
<ref id="pcsy.0000005.ref036"><label>36</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Staudt</surname> <given-names>CL</given-names></name>, <name name-style="western"><surname>Sazonovs</surname> <given-names>A</given-names></name>, <name name-style="western"><surname>Meyerhenke</surname> <given-names>H</given-names></name>. <article-title>NetworKit: A tool suite for large-scale complex network analysis.</article-title> <source>Net Sci</source>. <year>2016</year>;<volume>4</volume>: <fpage>508</fpage>–<lpage>530</lpage>. <comment>doi: <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1017/nws.2016.20" xlink:type="simple">10.1017/nws.2016.20</ext-link></comment></mixed-citation></ref>
<ref id="pcsy.0000005.ref037"><label>37</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Blondel</surname> <given-names>VD</given-names></name>, <name name-style="western"><surname>Guillaume</surname> <given-names>J-L</given-names></name>, <name name-style="western"><surname>Lambiotte</surname> <given-names>R</given-names></name>, <name name-style="western"><surname>Lefebvre</surname> <given-names>E</given-names></name>. <article-title>Fast unfolding of communities in large networks</article-title>. <source>Journal of statistical mechanics: theory and experiment</source>. <year>2008</year>;<volume>2008</volume>: <fpage>P10008</fpage>.</mixed-citation></ref>
<ref id="pcsy.0000005.ref038"><label>38</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Borassi</surname> <given-names>M</given-names></name>, <name name-style="western"><surname>Natale</surname> <given-names>E</given-names></name>. <source>KADABRA is an ADaptive Algorithm for Betweenness via Random Approximation</source>. <year>2016</year>; 18 pages. <comment>doi: <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.4230/LIPICS.ESA.2016.20" xlink:type="simple">10.4230/LIPICS.ESA.2016.20</ext-link></comment></mixed-citation></ref>
<ref id="pcsy.0000005.ref039"><label>39</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>van der Grinten</surname> <given-names>A</given-names></name>, <name name-style="western"><surname>Angriman</surname> <given-names>E</given-names></name>, <name name-style="western"><surname>Meyerhenke</surname> <given-names>H</given-names></name>. <article-title>Parallel Adaptive Sampling with almost no Synchronization.</article-title> <source>arXiv</source>; <year>2019</year>. Available: <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/1903.09422" xlink:type="simple">http://arxiv.org/abs/1903.09422</ext-link></mixed-citation></ref>
<ref id="pcsy.0000005.ref040"><label>40</label><mixed-citation publication-type="other" xlink:type="simple">Romero DM, Meeder B, Kleinberg J. Differences in the mechanics of information diffusion across topics: idioms, political hashtags, and complex contagion on twitter. Proceedings of the 20th international conference on World wide web. 2011. pp. 695–704.</mixed-citation></ref>
<ref id="pcsy.0000005.ref041"><label>41</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Grieve</surname> <given-names>J.</given-names></name> <source>Natural selection in the modern English lexicon</source>. <year>2018</year>. pp. <fpage>153</fpage>–<lpage>157</lpage>.</mixed-citation></ref>
<ref id="pcsy.0000005.ref042"><label>42</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Stewart</surname> <given-names>I</given-names></name>, <name name-style="western"><surname>Eisenstein</surname> <given-names>J</given-names></name>. <article-title>Making “fetch” happen: The influence of social and linguistic context on nonstandard word growth and decline.</article-title> <source>arXiv:170900345 [physics].</source> <year>2018</year> [cited 8 Dec 2021]. Available: <ext-link ext-link-type="uri" xlink:href="http://arxiv.org/abs/1709.00345" xlink:type="simple">http://arxiv.org/abs/1709.00345</ext-link></mixed-citation></ref>
<ref id="pcsy.0000005.ref043"><label>43</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Quemener</surname> <given-names>E</given-names></name>, <name name-style="western"><surname>Corvellec</surname> <given-names>M</given-names></name>. <article-title>SIDUS—the solution for extreme deduplication of an operating system.</article-title> <source>Linux J</source>. <year>2013</year>;2013: <volume>3</volume>:<fpage>3</fpage>.</mixed-citation></ref>
</ref-list>
</back>
<sub-article article-type="aggregated-review-documents" id="pcsy.0000005.r001" specific-use="decision-letter">
<front-stub>
<article-id pub-id-type="doi">10.1371/journal.pcsy.0000005.r001</article-id>
<title-group>
<article-title>Decision Letter 0</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name name-style="western">
<surname>Pappalardo</surname>
<given-names>Luca</given-names>
</name>
<role>Section Editor</role>
</contrib>
<contrib contrib-type="author">
<name name-style="western">
<surname>Badham</surname>
<given-names>Jennifer</given-names>
</name>
<role>Academic Editor</role>
</contrib>
</contrib-group>
<permissions>
<copyright-year>2024</copyright-year>
<copyright-holder>Pappalardo, Badham</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<license-p>This is an open access article distributed under the terms of the <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/" xlink:type="simple">Creative Commons Attribution License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.</license-p>
</license>
</permissions>
<related-object document-id="10.1371/journal.pcsy.0000005" document-id-type="doi" document-type="article" id="rel-obj001" link-type="peer-reviewed-article"/>
<custom-meta-group>
<custom-meta>
<meta-name>Submission Version</meta-name>
<meta-value>0</meta-value>
</custom-meta>
</custom-meta-group>
</front-stub>
<body>
<p>
<named-content content-type="letter-date">12 Mar 2024</named-content>
</p>
<p>PCSY-D-24-00007</p>
<p>How position in the network determines the fate of lexical innovations on Twitter</p>
<p>PLOS Complex Systems</p>
<p>Dear Dr. Chevrot,</p>
<p>Thank you for submitting your manuscript to PLOS Complex Systems. After careful consideration, we feel that it has merit but does not fully meet PLOS Complex Systems's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.</p>
<p>Please submit your revised manuscript within 60 days, by May 11 2024 11:59PM. If you need more time than this to complete your revisions, please reply to this message or contact the journal office at <email xlink:type="simple">complexsystems@plos.org</email>. When you're ready to submit your revision, log on to <ext-link ext-link-type="uri" xlink:href="https://www.editorialmanager.com/pcsy/" xlink:type="simple">https://www.editorialmanager.com/pcsy/</ext-link> and select the 'Submissions Needing Revision' folder to locate your manuscript file.</p>
<p>Please include the following items when submitting your revised manuscript:</p>
<p>* A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.</p>
<p>* A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.</p>
<p>* An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.</p>
<p>If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.</p>
<p>We look forward to receiving your revised manuscript.</p>
<p>Kind regards,</p>
<p>Jennifer Badham</p>
<p>Academic Editor</p>
<p>PLOS Complex Systems</p>
<p>Journal Requirements:</p>
<p>1. Please amend your detailed online Financial Disclosure statement. This is published with the article. It must therefore be completed in full sentences and contain the exact wording you wish to be published.</p>
<p>a) State the initials, alongside each funding source, of each author to receive each grant. For example: "This work was supported by the National Institutes of Health (####### to AM; ###### to CJ) and the National Science Foundation (###### to AM)."</p>
<p>b) State what role the funders took in the study. If the funders had no role in your study, please state: “The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.”</p>
<p>2. Please ensure that the funders and grant numbers match between the Financial Disclosure field and the Funding Information tab in your submission form. Note that the funders must be provided in the same order in both places as well.</p>
<p>3. Please update your online Competing Interests statement. If you have no competing interests to declare, please state: “The authors have declared that no competing interests exist.”</p>
<p>4. Please provide separate figure files in .tif or .eps format only and remove any figures embedded in your manuscript file. Please also ensure that all files are under our size limit of 10MB. You may leave the figure captions or legends in the manuscript.</p>
<p>For more information about how to convert your figure files please see our guidelines: <ext-link ext-link-type="uri" xlink:href="https://journals.plos.org/complexsystems/s/figures" xlink:type="simple">https://journals.plos.org/complexsystems/s/figures</ext-link></p>
<p>5. Please ensure that you refer to Figure 1 in your text as, if accepted, production will need this reference to link the reader to the figure.</p>
<p>Additional Editor Comments (if provided):</p>
<p>This is an interesting application of network diffusion and I would welcome a revised resubmission. As it stands, however, the paper is unbalanced. As well as considering the included reviewer comments, please consider how the paper fits together:</p>
<p>(1) There is too much background about the linguistic theory that is not relevant to the readers of PLOS Complex Systems - it would help readers to focus on the specific debate that the research addresses.</p>
<p>(2) There is too little information about methodology. For example, allocation of words to 'change' or 'buzz' and assignment to diffusion phase are handled by reference to a separate paper except to state that they more closely follow a sigmoidal or Gaussian distribution over time. However, these decisions are critical to the submitted paper and the included analysis. A short description should be available to readers so they can follow your argument without reading the other paper, with the referenced paper available for details if required. For example, it is unclear how you determine truncation issues - what happens to words that appear toward the end of your time period and do not have a fixation/decline period. I think this is handled by the fact that the initial appearance is in the early part of the time period of the corpus, but this is not obvious from the information available within the paper.</p>
<p>Reviewers' comments:</p>
<p>Reviewer's Responses to Questions</p>
<p><bold>Comments to the Author</bold></p>
<p>1. Does this manuscript meet PLOS Complex Systems’s <ext-link ext-link-type="uri" xlink:href="https://journals.plos.org/complexsystems/s/journal-information#loc-criteria-for-publication" xlink:type="simple">publication criteria</ext-link>? Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe methodologically and ethically rigorous research with conclusions that are appropriately drawn based on the data presented.</p>
<p>Reviewer #1: Partly</p>
<p>Reviewer #2: Yes</p>
<p>Reviewer #3: Yes</p>
<p>--------------------</p>
<p>2. Has the statistical analysis been performed appropriately and rigorously?</p>
<p>Reviewer #1: No</p>
<p>Reviewer #2: Yes</p>
<p>Reviewer #3: No</p>
<p>--------------------</p>
<p>3. Have the authors made all data underlying the findings in their manuscript fully available (please refer to the Data Availability Statement at the start of the manuscript PDF file)?</p>
<p>The <ext-link ext-link-type="uri" xlink:href="https://journals.plos.org/complexsystems/s/data-availability" xlink:type="simple">PLOS Data policy</ext-link> requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception. The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.</p>
<p>Reviewer #1: Yes</p>
<p>Reviewer #2: Yes</p>
<p>Reviewer #3: Yes</p>
<p>--------------------</p>
<p>4. Is the manuscript presented in an intelligible fashion and written in standard English?</p>
<p>PLOS Complex Systems does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.</p>
<p>Reviewer #1: Yes</p>
<p>Reviewer #2: Yes</p>
<p>Reviewer #3: Yes</p>
<p>--------------------</p>
<p>5. Review Comments to the Author</p>
<p>Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)</p>
<p>Reviewer #1: Overall, the paper titled "How Position in the Network Determines the Fate of Lexical Innovations on Twitter" holds promise in its exploration of the dynamics of lexical innovations on the platform. However, there are several areas that require attention to enhance clarity and rigor.</p>
<p>While the paper initially piques interest, a significant portion is dedicated to well-established and known theories. The authors should streamline these sections and allocate more space to explaining essential aspects in greater detail. Moreover, the paper lacks crucial details, and more experimental findings and graphics are needed to bolster its quality and credibility.</p>
<p>Specifically, from line 239 to 244, a graphical representation would significantly enhance understanding. Adding a figure depicting the phenomenon discussed in this section would aid readers in visualizing the concepts being conveyed.</p>
<p>The method of collecting the 650 million tweets is not adequately explained. It is essential to provide a clear and detailed description of the data collection process to ensure transparency and reproducibility.</p>
<p>The paper mentions the removal of tweets from bots (line 285) without explaining the methodology used for identifying and excluding them. A detailed explanation of the bot detection method is crucial for the readers to evaluate the reliability of the data.</p>
<p>In the "Lexical Innovations" section, the explanation could benefit from graphical representations, particularly illustrating the three phases of diffusion. Visual aids would facilitate comprehension and engagement.</p>
<p>A bar chart highlighting the top 20 control words could significantly enhance the presentation and understanding of the data in the "Control Words" section.</p>
<p>Lines 464 and 476 indicate references not found. The authors should address this issue by either providing the missing references or correcting the citations.</p>
<p>The paper lacks explicitly stated research questions, and the main contribution is not clear. Authors should explicitly outline their research questions and emphasize the unique contributions of their work.</p>
<p>In conclusion, the paper has potential but requires substantial revisions to address the mentioned concerns. Clarity, graphical representation, methodological transparency, and a more explicit presentation of research questions and contributions are essential for improving the overall quality of the manuscript.</p>
<p>Reviewer #2: This is a valuable and exciting paper, providing a nuanced empirical test of some longstanding, mostly untested sociolinguistic hypotheses concerning the diffusion of linguistic variables through populations and the network characteristics of innovators. The paper builds upon and advances the analysis in the same authors’ 2022 paper, e.g. by using community detection. </p>
<p>My suggestions are few. I will begin with the only one that I consider especially important:</p>
<p>The paper does a nice job connecting the main results back to the hypotheses advanced by the Milroys and by Labov. I would suggest, however, saying more about the difference between phonetic / phonological variables (the focus of the Milroys’ work and most of Labov’s work) and lexical variables. We can reasonably make different predictions about the diffusion of these distinct kinds of variables through networks, even at the level of the individual speaker, especially given the fact that this paper is dealing at least in part with adults. A few reasons:</p>
<p>• Plenty of evidence indicates that individual adult speakers don’t shift their vowels mechanistically as a function of network position; indeed, individual speakers who move to new dialect regions as adults tend to retain the regional vocalic forms they acquired as children. Lexical variables are more malleable, in part because they are more likely to evoke explicit commentary from others. (As one example: I’m from the ever-shrinking part of the U.S. that uses the word ‘pop’ to refer to carbonated drinks such as Coke. But now that I live in a different region, I’m shifting to ‘soda’ in order to avoid negative evaluation.) </p>
<p>• Phonetic and phonological variables, together with syntactic and morphological variables, are generally constrained by internal (linguistic) factors as well as social factors. In this way, they are more ‘complex’ and therefore more difficult to acquire than lexical variables. This is one reason that we don’t normally see ‘buzz’-type non-lexical changes (or at least we don’t have a lot of empirical evidence of their occurrence on a scale of months or even several years).</p>
<p>• Individual speakers often don’t realize to what extent they participate in sound changes or even stable, strongly regional phonological variables such as mergers. For example, in the U.S., many speakers have no idea whether they have the pin/pen merger, and (famously) many speakers in the Inland North dialect region believe incorrectly that they don’t participate in the Northern Cities Chain Shift. In contrast, it is easier for speakers to correctly assess their lexical practices.</p>
<p>My concern here is that readers will assume that the results shown in this paper for lexical variables can be unproblematically extended to all cases of linguistic change. The investigation of the network factors influencing non-lexical change remains important, and is further motivated by the present analysis.</p>
<p>Regarding the 4 network variables (line 333 and following): These are good choices of network variables, but I suggest saying briefly why you chose these particular network characteristics, and if possible cite a few studies that have used the same or similar network variables in service of comparable goals. In addition, I note that the choice of these variables seems quite well motivated by the preceding discussion of social network research in sociolinguistics; perhaps this connection could be made just a bit more explicit.</p>
<p>There are several word-level factors that are not mentioned in the paper, probably due to space constraints and statistical modeling decisions, that could interact with the network effects. Here are two:</p>
<p>• Length of phase: I imagine that there’s quite a bit of variability across lexical innovations (changes in particular, in contrast with buzzes) in the time it took to reach propagation and fixation/decline. I understand that phase length has been standardized in the present analysis, but could the authors say a bit more about this variability and its implications for the effects of the network variables? It’s possible that I’ve misunderstood something in this respect.</p>
<p>• Type of innovation: The authors’ 2022 paper lists distinct categories of neologisms on page 321, e.g. words related to new realities (‘fullstack’) vs. reinvigorated archaic forms (‘malaisante’). The current paper appears not to distinguish among these forms in the statistical analysis. I wonder if a sentence or two could be added about the potential differences among these categories.</p>
<p>Reviewer #3: Summary:</p>
<p>The study compares the diffusion of successful and failed lexical innovations and assesses the network determinants that can differentiate between successful and unsuccessful innovations. These network determinants include a combination of network centrality and position measures. Based on empirical evidence from social media, the authors claim similarity in the diffusion of both types of innovations when the innovations are new, in that the users peripheral to the network are the drivers of such changes. However, as the phases of diffusion progress, successful innovations differ from failed ones in that the successful innovations gain adoptions from the more central users in the network, whereas the failed innovations do not benefit from such adoptions, leading to their failure.</p>
<p>Overall, the paper presents empirical evidence for an interesting sociolinguistic problem but has some weaknesses in the methodology used to validate the claims. My perception is that the paper is not fully developed in terms of rigor; with some modifications and clarification, it would be ready for publication.</p>
<p>Strengths:</p>
<p>- The authors ground the empirical findings of their study in linguistic theories of the importance of network position and tie strength, though part of their claim about tie strength is not verified empirically.</p>
<p>- Additionally, the paper presents a comprehensive review of the existing sociolinguistic scholarship around the topic.</p>
<p>Weaknesses:</p>
<p>- The selection of the linguistic variables is not thoroughly described.</p>
<p>- The motivation for using the specific network measures deployed in the study is insufficient.</p>
<p>- Some more inline explanation</p>
</body>
</sub-article>
<sub-article article-type="author-comment" id="pcsy.0000005.r002">
<front-stub>
<article-id pub-id-type="doi">10.1371/journal.pcsy.0000005.r002</article-id>
<title-group>
<article-title>Author response to Decision Letter 0</article-title>
</title-group>
<related-object document-id="10.1371/journal.pcsy.0000005" document-id-type="doi" document-type="peer-reviewed-article" id="rel-obj002" link-type="rebutted-decision-letter" object-id="10.1371/journal.pcsy.0000005.r001" object-id-type="doi" object-type="decision-letter"/>
<custom-meta-group>
<custom-meta>
<meta-name>Submission Version</meta-name>
<meta-value>1</meta-value>
</custom-meta>
</custom-meta-group>
</front-stub>
<body>
<p>
<named-content content-type="author-response-date">17 May 2024</named-content>
</p>
<supplementary-material id="pcsy.0000005.s001" mimetype="application/vnd.openxmlformats-officedocument.wordprocessingml.document" position="float" xlink:href="info:doi/10.1371/journal.pcsy.0000005.s001" xlink:type="simple">
<label>Attachment</label>
<caption>
<p>Submitted filename: <named-content content-type="submitted-filename">Response to Reviewers.docx</named-content></p>
</caption>
</supplementary-material>
</body>
</sub-article>
<sub-article article-type="aggregated-review-documents" id="pcsy.0000005.r003" specific-use="decision-letter">
<front-stub>
<article-id pub-id-type="doi">10.1371/journal.pcsy.0000005.r003</article-id>
<title-group>
<article-title>Decision Letter 1</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name name-style="western">
<surname>Pappalardo</surname>
<given-names>Luca</given-names>
</name>
<role>Section Editor</role>
</contrib>
<contrib contrib-type="author">
<name name-style="western">
<surname>Badham</surname>
<given-names>Jennifer</given-names>
</name>
<role>Academic Editor</role>
</contrib>
</contrib-group>
<permissions>
<copyright-year>2024</copyright-year>
<copyright-holder>Pappalardo, Badham</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<license-p>This is an open access article distributed under the terms of the <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/" xlink:type="simple">Creative Commons Attribution License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.</license-p>
</license>
</permissions>
<related-object document-id="10.1371/journal.pcsy.0000005" document-id-type="doi" document-type="article" id="rel-obj003" link-type="peer-reviewed-article"/>
<custom-meta-group>
<custom-meta>
<meta-name>Submission Version</meta-name>
<meta-value>1</meta-value>
</custom-meta>
</custom-meta-group>
</front-stub>
<body>
<p>
<named-content content-type="letter-date">9 Jul 2024</named-content>
</p>
<p>How position in the network determines the fate of lexical innovations on Twitter</p>
<p>PCSY-D-24-00007R1</p>
<p>Dear Professor Chevrot,</p>
<p>We are pleased to inform you that your manuscript 'How position in the network determines the fate of lexical innovations on Twitter' has been provisionally accepted for publication in PLOS Complex Systems.</p>
<p>Before your manuscript can be formally accepted, you will need to complete some formatting changes, which you will receive in a follow-up email from a member of our team.</p>
<p>IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.</p>
<p>If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they'll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact <email xlink:type="simple">complexsystems@plos.org</email>.</p>
<p>Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Complex Systems.</p>
<p>Best regards,</p>
<p>Jennifer Badham</p>
<p>Academic Editor</p>
<p>PLOS Complex Systems</p>
<p>***********************************************************</p>
<p>Reviewer Comments (if any, and for reference):</p>
<p>Reviewer's Responses to Questions</p>
<p><bold>Comments to the Author</bold></p>
<p>1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.</p>
<p>Reviewer #1: All comments have been addressed</p>
<p>Reviewer #2: All comments have been addressed</p>
<p>**********</p>
<p>2. Does this manuscript meet PLOS Complex Systems's <ext-link ext-link-type="uri" xlink:href="https://journals.plos.org/complexsystems/s/journal-information#loc-criteria-for-publication" xlink:type="simple">publication criteria</ext-link>? Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe methodologically and ethically rigorous research with conclusions that are appropriately drawn based on the data presented.</p>
<p>Reviewer #1: Yes</p>
<p>Reviewer #2: Yes</p>
<p>**********</p>
<p>3. Has the statistical analysis been performed appropriately and rigorously?</p>
<p>Reviewer #1: Yes</p>
<p>Reviewer #2: Yes</p>
<p>**********</p>
<p>4. Have the authors made all data underlying the findings in their manuscript fully available (please refer to the Data Availability Statement at the start of the manuscript PDF file)?</p>
<p>The <ext-link ext-link-type="uri" xlink:href="https://journals.plos.org/complexsystems/s/data-availability" xlink:type="simple">PLOS Data policy</ext-link> requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception. The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.</p>
<p>Reviewer #1: Yes</p>
<p>Reviewer #2: Yes</p>
<p>**********</p>
<p>5. Is the manuscript presented in an intelligible fashion and written in standard English?</p>
<p>PLOS Complex Systems does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.</p>
<p>Reviewer #1: Yes</p>
<p>Reviewer #2: Yes</p>
<p>**********</p>
<p>6. Review Comments to the Author</p>
<p>Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)</p>
<p>Reviewer #1: I am pleased to confirm that all the revisions made to the manuscript titled "How position in the network determines the fate of lexical innovations on Twitter" have been thoroughly addressed to my satisfaction. The authors have successfully incorporated the suggested improvements, particularly in clarifying their methodology, enhancing the robustness of their data analysis, and providing a more comprehensive discussion of the results in the context of existing literature. The revised manuscript now presents a well-structured and compelling narrative that significantly contributes to our understanding of the dynamics of lexical innovation dissemination on social media platforms. Consequently, I am happy to endorse the manuscript for publication in its current form.</p>
<p>Reviewer #2: My concerns have mostly been addressed, and I believe the paper is ready for publication. One remaining question is the importance of the category of innovation, which is briefly discussed beginning at line 618. There are plenty of further questions to be asked here, but they can be seen as lying outside the paper's major goals. I would also add that the literature review gives the impression in some cases that the influence of social network factors on language is better known than it really is; for example, one simulation-based study is given as evidence that removing peripheral members from a network prevents innovation. In this domain, there are plenty more questions to be addressed in further work.</p>
<p>**********</p>
<p>7. PLOS authors have the option to publish the peer review history of their article (<ext-link ext-link-type="uri" xlink:href="https://journals.plos.org/complexsystems/s/editorial-and-peer-review-process#loc-peer-review-history" xlink:type="simple">what does this mean?</ext-link>). If published, this will include your full peer review and any attached files.</p>
<p><bold>Do you want your identity to be public for this peer review?</bold> If you choose “no”, your identity will remain anonymous but your review may still be made public.</p>
<p>For information about this choice, including consent withdrawal, please see our <ext-link ext-link-type="uri" xlink:href="https://www.plos.org/privacy-policy" xlink:type="simple">Privacy Policy</ext-link>.</p>
<p>Reviewer #1: No</p>
<p>Reviewer #2: No</p>
<p>**********</p>
</body>
</sub-article>
</article>