Core team:

Peter Sheridan Dodds
University of Vermont
Vermont Complex Systems Center, Director
Peter's research focuses on system-level, big data problems in many areas including language and stories, sociotechnical systems, Earth sciences, biology, and ecology. Peter has created (and constantly evolves) a series of complex systems courses starting with Principles of Complex Systems. He co-runs the Computational Story Lab with Chris Danforth.
Most recent papers:
Hahahahaha, Duuuuude, Yeeessss!: A two-parameter characterization of stretchable words and the dynamics of mistypings and misspellings.
Chris Danforth, Peter Sheridan Dodds, Tyler Gray. Preprint, 2019.[pdf] [arXiv] [online appendices]
Abstract:
Stretched words like ‘heellllp’ or ‘heyyyyy’ are a regular feature of spoken language, often used to emphasize or exaggerate the underlying meaning of the root word. While stretched words are rarely found in formal written language and dictionaries, they are prevalent within social media. In this paper, we examine the frequency distributions of ‘stretchable words’ found in roughly 100 billion tweets authored over an 8 year period. We introduce two central parameters, ‘balance’ and ‘stretch’, that capture their main characteristics, and explore their dynamics by creating visual tools we call ‘balance plots’ and ‘spelling trees’. We discuss how the tools and methods we develop here could be used to study the statistical patterns of mistypings and misspellings, along with the potential applications in augmenting dictionaries, improving language processing, and in any area where sequence construction matters, such as genetics.
The shocklet transform: A decomposition method for the identification of local, mechanism-driven dynamics in sociotechnical time series.
Thayer Alshaabi, Chris Danforth, Peter Sheridan Dodds, David Dewhurst, Michael Arnold, Joshua Minot, Dilan Kiley. Preprint, 2019.[pdf] [arXiv]
Abstract:
We introduce an unsupervised pattern recognition algorithm termed the Discrete Shocklet Transform (DST) by which local dynamics of time series can be extracted. Time series that are hypothesized to be generated by underlying deterministic mechanisms have significantly different DSTs than do purely random null models. We apply the DST to a sociotechnical data source, usage frequencies for a subset of words on Twitter over a decade, and demonstrate the ability of the DST to filter high-dimensional data and automate the extraction of anomalous behavior.
Chimera States and Seizures in a Mouse Neuronal Model.
Chris Danforth, Peter Sheridan Dodds, Henry Mitchell, Matthew Mahoney. Preprint, 2019.[pdf] [arXiv]
Abstract:
Chimera states, the coexistence of synchrony and asynchrony in a nonlocally-coupled network of identical oscillators, are often used as a model framework for epileptic seizures. Here, we explore the dynamics of chimera states in a network of modified Hindmarsh-Rose neurons, configured to reflect the graph of the mesoscale mouse connectome. Our model produces superficially epileptiform activity converging on persistent chimera states in a large region of a two-parameter space governing connections (a) between subcortices within a cortex and (b) between cortices. Our findings contribute to a growing body of literature suggesting mathematical models can qualitatively reproduce epileptic seizure dynamics.
Visitors to urban greenspace have higher sentiment and lower negativity on Twitter.
Aaron Schwartz, Peter Sheridan Dodds, Jarlath O'Neil-Dunne, Chris Danforth, Taylor Ricketts. People and Nature, 2019.[pdf] [journal page] [arXiv]
Abstract:
Urbanization and the decline of access to nature have coincided with a rise of mental health problems. A growing body of research has demonstrated an association between nature contact and improved mental affect (i.e., mood). However, previous approaches have been unable to quantify the benefits of urban greenspace exposure and compare how different types of outdoor public spaces impact mood. Here, we use Twitter to investigate how mental affect varies before, during, and after visits to a large urban park system. We analyze the sentiment of tweets to estimate the magnitude and duration of the affect benefit of visiting parks. We find that affect is substantially higher during park visits and remains elevated for several hours following the visit. Visits to Regional Parks, which are greener and have greater vegetative cover, result in a greater increase in affect compared to Civic Plazas and Squares. Finally, we analyze the words in tweets around park visits to explore several theorized mechanisms linking nature exposure with mental and cognitive benefits. Negation words such as "no", "not", and "don't" decrease in frequency during visits to urban parks. These results point to the most beneficial types of nature contact for mental health benefits and can be used by urban planners and public health officials to improve the well-being of growing urban populations.
Social media usage patterns during natural hazards.
Meredith Niles, Benjamin Emery, Andy Reagan, Peter Sheridan Dodds, Chris Danforth. PLoS ONE, 2019.[pdf] [journal page] [arXiv]
Abstract:
Natural hazards are becoming increasingly expensive as climate change and development are exposing communities to greater risks. Preparation and recovery are critical for climate change resilience, and social media are being used more and more to communicate before, during, and after disasters. While there is a growing body of research aimed at understanding how people use social media surrounding disaster events, most existing work has focused on a single disaster case study. In the present study, we analyze five of the costliest disasters in the last decade in the United States (Hurricanes Irene and Sandy, two sets of tornado outbreaks, and flooding in Louisiana) through the lens of Twitter. In particular, we explore the frequency of both generic and specific food-security related terms, and quantify the relationship between network size and Twitter activity during disasters. We find differences in tweet volume for keywords depending on disaster type, with people using Twitter more frequently in preparation for Hurricanes, and for real-time or recovery information for tornado and flooding events. Further, we find that people share a host of general disaster and specific preparation and recovery terms during these events. Finally, we find that among all account types, individuals with “average” sized networks are most likely to share information during these disasters, and in most cases, do so more frequently than normal. This suggests that around disasters, an ideal form of social contagion is being engaged in which average people rather than outsized influentials are key to communication. These results provide important context for the type of disaster information and target audiences that may be most useful for disaster communication during varying extreme events.
English Verb Regularization in Books and Tweets.
Chris Danforth, Peter Sheridan Dodds, Andy Reagan, Tyler Gray. PLoS ONE, 2018.[pdf] [journal page] [online appendices]
Abstract:
The English language has evolved dramatically throughout its lifespan, to the extent that a modern speaker of Old English would be incomprehensible without translation. One concrete indicator of this process is the movement from irregular to regular (-ed) forms for the past tense of verbs. In this study we quantify the extent of verb regularization using two vastly disparate datasets: (1) Six years of published books scanned by Google (2003–2008), and (2) A decade of social media messages posted to Twitter (2008–2017). We find that the extent of verb regularization is greater on Twitter, taken as a whole, than in English Fiction books. Regularization is also greater for tweets geotagged in the United States relative to American English books, but the opposite is true for tweets geotagged in the United Kingdom relative to British English books. We also find interesting regional variations in regularization across counties in the United States. However, once differences in population are accounted for, we do not identify strong correlations with socio-demographic variables such as education or income.
Continuum rich-get-richer processes: Mean field analysis with an application to firm size.
David Rushing Dewhurst, Chris Danforth, Peter Sheridan Dodds. Physical Review E, 97, 2018.[pdf] [journal page] [arXiv]
Abstract:
Classical rich-get-richer models have found much success in being able to broadly reproduce the statistics and dynamics of diverse real complex systems. These rich-get-richer models are based on classical urn models and unfold step-by-step in discrete time. Here, we consider a natural variation acting on a temporal continuum in the form of a partial differential equation (PDE). We first show that the continuum version of Herbert Simon's canonical preferential attachment model exhibits an identical size distribution. In relaxing Simon's assumption of a linear growth mechanism, we consider the case of an arbitrary growth kernel and find the general solution to the resultant PDE. We then extend the PDE to multiple spatial dimensions, again determining the general solution. Finally, we apply the model to size and wealth distributions of firms. We obtain power law scaling for both, concordant with simulations as well as observational data.
A simple person's approach to understanding the contagion condition for spreading processes on generalized random networks.
Peter Sheridan Dodds. Complex Spreading Phenomena in Social Systems: Influence and Contagion in Real-World Social Networks, 27, 2018.[pdf] [journal page] [arXiv]
Abstract:
We present derivations of the contagion condition for a range of spreading mechanisms on families of generalized random networks and bipartite random networks. We show how the contagion condition can be broken into three elements, two structural in nature, and the third a meshing of the contagion process and the network. The contagion conditions we obtain reflect the spreading dynamics in a clear, interpretable way. For threshold contagion, we discuss results for all-to-all and random network versions of the model, and draw connections between them.
A Sentiment Analysis of Breast Cancer Treatment Experiences and Healthcare Perceptions Across Twitter.
Ted James, Chris Jones, Amulya Alapati, Promise Ukandu, Chris Danforth, Peter Sheridan Dodds. Preprint, 2018.[pdf]
Abstract:
Background: Social media has the capacity to provide the healthcare industry with valuable feedback from patients who reveal and express their medical decision-making process, as well as self-reported quality of life indicators both during and post treatment. In prior work, Crannell et al. [1], we have studied an active cancer patient population on Twitter and compiled a set of tweets describing their experience with this disease. We refer to these online public testimonies as “Invisible Patient Reported Outcomes” (iPROs), because they carry relevant indicators, yet are difficult to capture by conventional means of self-report. Methods: Our present study aims to identify tweets related to the patient experience as an additional informative tool for monitoring public health. Using Twitter’s public streaming API, we compiled over 5.3 million “breast cancer” related tweets spanning September 2016 until mid December 2017. We combined supervised machine learning methods with natural language processing to sift tweets relevant to breast cancer patient experiences. We analyzed a sample of 845 breast cancer patient and survivor accounts, responsible for over 48,000 posts. We investigated tweet content with a hedonometric sentiment analysis to quantitatively extract emotionally charged topics. Results: We found that positive experiences were shared regarding patient treatment, raising support, and spreading awareness. Further discussions related to healthcare were prevalent and largely negative focusing on fear of political legislation that could result in loss of coverage. Conclusions: Social media can provide a positive outlet for patients to discuss their needs and concerns regarding their healthcare coverage and treatment needs. Capturing iPROs from online communication can help inform healthcare professionals and lead to more connected and personalized treatment regimens.
Divergent Discourse Between Protests and Counter-Protests: #BlackLivesMatter and #AllLivesMatter.
Ryan Gallagher, Andy Reagan, Chris Danforth, Peter Sheridan Dodds. PLoS ONE, e0195644, 13, 2016.[pdf] [journal page] [arXiv]
Abstract:
Since the shooting of Black teenager Michael Brown by White police officer Darren Wilson in Ferguson, Missouri, the protest hashtag #BlackLivesMatter has amplified critiques of extrajudicial killings of Black Americans. In response to #BlackLivesMatter, other Twitter users have adopted #AllLivesMatter, a counter-protest hashtag whose content argues that equal attention should be given to all lives regardless of race. Through a multi-level analysis, we study how these protests and counter-protests diverge by quantifying aspects of their discourse. In particular, we introduce methodology that not only quantifies these divergences, but also reveals whether they are from widespread discussion or a few popular retweets within these groups. We find that #BlackLivesMatter exhibits many informationally rich conversations, while those within #AllLivesMatter are more muted and susceptible to hijacking. We also show that the discussion within #BlackLivesMatter is more likely to center around the deaths of Black Americans, while that of #AllLivesMatter is more likely to sympathize with the lives of police officers and express politically conservative views.
Sentiment analysis methods for understanding large-scale texts: A case for using continuum-scored words and word shift graphs.
Andy Reagan, Chris Danforth, Brian Tivnan, Jake Williams, Peter Sheridan Dodds. EPJ Data Science, 28, 6, 2017.[pdf] [arXiv]
Abstract:
The emergence and global adoption of social media has rendered possible the real-time estimation of population-scale sentiment, bearing profound implications for our understanding of human behavior. Given the growing assortment of sentiment measuring instruments, comparisons between them are evidently required. Here, we perform detailed tests of 6 dictionary-based methods applied to 4 different corpora, and briefly examine a further 8 methods. We show that a dictionary-based method will only perform both reliably and meaningfully if (1) the dictionary covers a sufficiently large portion of a given text's lexicon when weighted by word usage frequency; and (2) words are scored on a continuous scale.
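As a rough illustration of the two conditions named in this abstract, the sketch below scores a text with a continuous-valued word dictionary and reports how much of the text the dictionary covers when weighted by usage frequency. The function name `score_text`, the toy dictionary, and its scores are hypothetical and are not taken from the paper or any of its instruments.

```python
from collections import Counter

def score_text(words, scores):
    """Return (frequency-weighted average score, frequency-weighted coverage).

    words  : list of tokens from the text
    scores : dict mapping word -> continuous score (hypothetical values)
    """
    counts = Counter(words)
    total = sum(counts.values())
    # Keep only the words that the dictionary can score.
    covered = {w: c for w, c in counts.items() if w in scores}
    covered_total = sum(covered.values())
    coverage = covered_total / total if total else 0.0
    if covered_total == 0:
        return None, coverage
    # Average score over scored words, weighted by how often each is used.
    average = sum(scores[w] * c for w, c in covered.items()) / covered_total
    return average, coverage

# Toy dictionary with made-up scores, for illustration only.
toy_scores = {"happy": 8.0, "sad": 2.0, "the": 5.0}
toy_words = "the happy dog saw the sad cat".split()
average, coverage = score_text(toy_words, toy_scores)
print(average, coverage)
```

A low coverage value would be a signal, under this sketch's assumptions, that the average score rests on too small a share of the text to be read with confidence.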
Forecasting the onset and course of mental illness with Twitter data.
Andrew G. Reece, Chris Danforth, Andy Reagan, Katharina L. M. Lix, Peter Sheridan Dodds, Ellen Langer. Scientific Reports, 2017.[pdf] [journal page] [arXiv]
Abstract:
We developed computational models to predict the emergence of depression and Post-Traumatic Stress Disorder in Twitter users. Twitter data and details of depression history were collected from 204 individuals (105 depressed, 99 healthy). We extracted predictive features measuring affect, linguistic style, and context from participant tweets (N=279,951) and built models using these features with supervised learning algorithms. Resulting models successfully discriminated between depressed and healthy content, and compared favorably to general practitioners' average success rates in diagnosing depression. Results held even when the analysis was restricted to content posted before first depression diagnosis. State-space temporal analysis suggests that onset of depression may be detectable from Twitter data several months prior to diagnosis. Predictive results were replicated with a separate sample of individuals diagnosed with PTSD (174 users, 243,775 tweets). A state-space time series model revealed indicators of PTSD almost immediately post-trauma, often many months prior to clinical diagnosis. These methods suggest a data-driven, predictive approach for early screening and detection of mental illness.
Slightly generalized Generalized Contagion: Unifying simple models of biological and social spreading.
Peter Sheridan Dodds. Preprint, 2017.[pdf] [arXiv]
Abstract:
We motivate and explore the basic features of generalized contagion, a model mechanism that unifies fundamental models of biological and social contagion. Generalized contagion builds on the elementary observation that spreading and contagion of all kinds involve some form of system memory. We discuss the three main classes of systems that generalized contagion affords, resembling: simple biological contagion; critical mass contagion of social phenomena; and an intermediate, and explosive, vanishing critical mass contagion. We also present a simple explanation of the global spreading condition in the context of a small seed of infected individuals.
Simon's fundamental rich-gets-richer model entails a dominant first-mover advantage.
Peter Sheridan Dodds, David Dewhurst, Lewis Mitchell, Andy Reagan, Jake Williams, Chris Danforth. Physical Review E, 95, 2017.[pdf] [journal page] [arXiv]
Abstract:
Herbert Simon's classic rich-gets-richer model is one of the simplest empirically supported mechanisms capable of generating heavy-tail size distributions for complex systems. Simon argued analytically that a population of flavored elements growing by either adding a novel element or randomly replicating an existing one would afford a distribution of group sizes with a power-law tail. Here, we show that, in fact, Simon's model does not produce a simple power law size distribution as the initial element has a dominant first-mover advantage, and will be overrepresented by a factor proportional to the inverse of the innovation probability. The first group's size discrepancy cannot be explained away as a transient of the model, and may therefore be many orders of magnitude greater than expected. We demonstrate how Simon's analysis was correct but incomplete, and expand our alternate analysis to quantify the variability of long term rankings for all groups. We find that the expected time for a first replication is infinite, and show how an incipient group must break the mechanism to improve their odds of success. Our findings call for a reexamination of preceding work invoking Simon's model and provide a revised understanding going forward.
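The growth rule described in this abstract (add a novel element, or replicate a randomly chosen existing one) can be sketched as a short simulation. This is a minimal illustration, not the authors' code: the function name `simulate_simon`, the parameter name `rho` for the innovation probability, and the choice to start from a single element are assumptions made here.

```python
import random

def simulate_simon(n_steps, rho, seed=0):
    """Simulate a Simon-style rich-gets-richer process.

    At each step, with probability rho a new group of size 1 is created;
    otherwise a uniformly chosen existing element is replicated, so a
    group grows with probability proportional to its current size.
    Returns the list of group sizes, indexed by order of arrival.
    """
    rng = random.Random(seed)
    elements = [0]   # group label of each element; assumed initial state
    n_groups = 1
    for _ in range(n_steps):
        if rng.random() < rho:
            elements.append(n_groups)          # innovation: new group
            n_groups += 1
        else:
            elements.append(rng.choice(elements))  # replication
    sizes = [0] * n_groups
    for label in elements:
        sizes[label] += 1
    return sizes

sizes = simulate_simon(n_steps=10_000, rho=0.01)
print("first group size:", sizes[0])
print("largest other group:", max(sizes[1:], default=0))
```

Comparing the first group's size with the rest across several seeds and values of `rho` is one way to explore the first-mover effect the abstract describes.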
Connecting every bit of knowledge: The structure of Wikipedia's First Link Network.
Mark Ibrahim, Peter Sheridan Dodds, Chris Danforth. Journal of Computational Science, 21-30, 19, 2017.[pdf] [journal page] [arXiv] [online appendices]
Abstract:
Apples, porcupines, and the most obscure Bob Dylan song—is every topic a few clicks from Philosophy? Within Wikipedia, the surprising answer is yes: nearly all paths lead to Philosophy. Wikipedia is the largest, most meticulously indexed collection of human knowledge ever amassed. More than information about a topic, Wikipedia is a web of naturally emerging relationships. By following the first link in each article, we algorithmically construct a directed network of all 4.7 million articles: Wikipedia’s First Link Network. Here, we study the English edition of Wikipedia’s First Link Network for insight into how the many articles on inventions, places, people, objects, and events are related and organized. By traversing every path, we measure the accumulation of first links, path lengths, groups of path-connected articles, cycles, and the influence each article exerts in shaping the network. We find scale-free distributions describe path length, accumulation, and influence. Far from dispersed, first links disproportionately accumulate at a few articles—flowing from specific to general and culminating around fundamental notions such as Community, State, and Science. Philosophy directs more paths than any other article by two orders of magnitude. We also observe a gravitation towards topical articles such as Health Care and Fossil Fuel. These findings enrich our view of the connections and structure of Wikipedia’s ever growing store of knowledge.
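The path-following idea in this abstract can be sketched on a toy graph: starting from an article, repeatedly follow its first link until the path revisits an article (a cycle) or reaches an article with no link. The function name `follow_first_links` and the toy link table below are hypothetical; the table does not reflect Wikipedia's actual first links.

```python
def follow_first_links(first_link, start):
    """Follow first links from `start`.

    first_link : dict mapping article -> the article its first link points to
    Returns (path, end), where `end` is the already-visited article at which
    the path closes into a cycle, or None if the path hits a dead end.
    """
    path = []
    seen = set()
    node = start
    while node is not None and node not in seen:
        seen.add(node)
        path.append(node)
        node = first_link.get(node)  # None when the article has no entry
    return path, node

# Made-up first links, for illustration only.
toy_links = {
    "Apple": "Fruit",
    "Fruit": "Botany",
    "Botany": "Science",
    "Science": "Knowledge",
    "Knowledge": "Philosophy",
    "Philosophy": "Knowledge",
}
path, end = follow_first_links(toy_links, "Apple")
print(" -> ".join(path), "| cycle re-entered at:", end)
```

Running this from every article and tallying how often each article appears on a path would give a simple version of the accumulation measure mentioned in the abstract.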
Is language evolution grinding to a halt? The scaling of lexical turbulence in English fiction suggests it is not.
Eitan Pechenick, Chris Danforth, Peter Sheridan Dodds. Journal of Computational Science, 2017.[pdf] [journal page]
Abstract:
Of basic interest is the quantification of the long term growth of a language's lexicon as it develops to more completely cover both a culture's communication requirements and knowledge space. Here, we explore the usage dynamics of words in the English language as reflected by the Google Books 2012 English Fiction corpus. We critique an earlier method that found decreasing birth and increasing death rates of words over the second half of the 20th Century, showing death rates to be strongly affected by the imposed time cutoff of the arbitrary present and not increasing dramatically. We provide a robust, principled approach to examining lexical evolution by tracking the volume of word flux across various relative frequency thresholds. We show that while the overall statistical structure of the English language remains stable over time in terms of its raw Zipf distribution, we find evidence of an enduring ‘lexical turbulence’: The flux of words across frequency thresholds from decade to decade scales superlinearly with word rank and exhibits a scaling break we connect to that of Zipf's law. To better understand the changing lexicon, we examine the contributions to the Jensen-Shannon divergence of individual words crossing frequency thresholds. We also find indications that scholarly works about fiction are strongly represented in the 2012 English Fiction corpus, and suggest that a future revision of the corpus should attempt to separate critical works from fiction itself.
Transitions in climate and energy discourse between Hurricanes Katrina and Sandy.
Emily Cody, Jennie Stephens, Jim Bagrow, Peter Sheridan Dodds, Chris Danforth. Journal of Environmental Studies and Sciences, 87-101, 7, 2017.[pdf] [journal page] [arXiv]
Abstract:
Although climate change and energy are intricately linked, their explicit connection is not always prominent in public discourse and the media. Disruptive extreme weather events, including hurricanes, focus public attention in new and different ways offering a unique window of opportunity to analyze how a focusing event influences public discourse. Media coverage of extreme weather events simultaneously shapes and reflects public discourse on climate issues. Here, we analyze climate and energy newspaper coverage of Hurricanes Katrina (2005) and Sandy (2012) using topic models, mathematical techniques used to discover abstract topics within a set of documents. Our results demonstrate that post-Katrina media coverage does not contain a climate change topic, and the energy topic is limited to discussion of energy prices, markets, and the economy with almost no explicit linkages made between energy and climate change. In contrast, post-Sandy media coverage does contain a prominent climate change topic, a distinct energy topic, as well as integrated representation of climate change and energy, indicating a shift in climate and energy reporting between Hurricane Katrina and Hurricane Sandy.
The Lexicocalorimeter: Gauging public health through caloric input and output on social media.
Sharon Alajajian, Jake Williams, Andy Reagan, Stephen C. Alajajian, Morgan Frank, Lewis Mitchell, Jacob Lahne, Chris Danforth, Peter Sheridan Dodds. PLoS ONE, e0168893, 12, 2017.[pdf] [journal page] [arXiv]
Abstract:
We propose and develop a Lexicocalorimeter: an online, interactive instrument for measuring the “caloric content” of social media and other large-scale texts. We do so by constructing extensive yet improvable tables of food and activity related phrases, and respectively assigning them with sourced estimates of caloric intake and expenditure. We show that for Twitter, our naive measures of “caloric input”, “caloric output”, and the ratio of these measures are all strong correlates with health and well-being measures for the contiguous United States. Our caloric balance measure in many cases outperforms both its constituent quantities; is tunable to specific health and well-being measures such as diabetes rates; has the capability of providing a real-time signal reflecting a population’s health; and has the potential to be used alongside traditional survey data in the development of public policy and collective self-awareness. Because our Lexicocalorimeter is a linear superposition of principled phrase scores, we also show we can move beyond correlations to explore what people talk about in collective detail, and assist in the understanding and explanation of how population-scale conditions vary, a capacity unavailable to black-box type methods.
The emotional arcs of stories are dominated by six basic shapes.
Andy Reagan, Lewis Mitchell, Dilan Kiley, Chris Danforth, Peter Sheridan Dodds. EPJ Data Science, 31, 5, 2016.[pdf] [journal page] [arXiv] [online appendices]
Abstract:
Advances in computing power, natural language processing, and digitization of text now make it possible to study a culture's evolution through its texts using a "big data" lens. Our ability to communicate relies in part upon a shared emotional experience, with stories often following distinct emotional trajectories, forming patterns that are meaningful to us. Here, by classifying the emotional arcs for a filtered subset of 1,737 stories from Project Gutenberg's fiction collection, we find a set of six core trajectories which form the building blocks of complex narratives. We strengthen our findings by separately applying optimization, linear decomposition, supervised learning, and unsupervised learning. For each of these six core emotional arcs, we examine the closest characteristic stories in publication today and find that particular emotional arcs enjoy greater success, as measured by downloads.
Sifting robotic from organic text: A natural language approach for detecting automation on Twitter.
Jake Williams, Chris Jones, Richard Galbraith, Chris Danforth, Peter Sheridan Dodds. Journal of Computational Science, 1-7, 16, 2016.[pdf] [journal page]
Abstract:
Twitter, a popular social media outlet, has evolved into a vast source of linguistic data, rich with opinion, sentiment, and discussion. Due to the increasing popularity of Twitter, its perceived potential for exerting social influence has led to the rise of a diverse community of automatons, commonly referred to as bots. These inorganic and semi-organic Twitter entities can range from the benevolent (e.g., weather-update bots, help-wanted-alert bots) to the malevolent (e.g., spamming messages, advertisements, or radical opinions). Existing detection algorithms typically leverage metadata (time between tweets, number of followers, etc.) to identify robotic accounts. Here, we present a powerful classification scheme that exclusively uses the natural language text from organic users to provide a criterion for identifying accounts posting automated messages. Since the classifier operates on text alone, it is flexible and may be applied to any textual data beyond the Twittersphere.
What we write about when we write about causality: Features of causal statements across large-scale social discourse.
Thomas McAndrew, Joshua Bongard, Chris Danforth, Peter Sheridan Dodds, Paul Hines, Jim Bagrow. Advances in Social Networks Analysis and Mining (ASONAM), 2016 IEEE/ACM International Conference on, 519-524, 2016.[pdf] [journal page] [arXiv]
Abstract:
Identifying and communicating relationships between causes and effects is important for understanding our world, but is affected by language structure, cognitive and emotional biases, and the properties of the communication medium. Despite the increasing importance of social media, much remains unknown about causal statements made online. To study real-world causal attribution, we extract a large-scale corpus of causal statements made on the Twitter social network platform as well as a comparable random control corpus. We compare causal and control statements using statistical language and sentiment analysis tools. We find that causal statements have a number of significant lexical and grammatical differences compared with controls and tend to be more negative in sentiment than controls. Causal statements made online tend to focus on news and current events, medicine and health, or interpersonal relationships, as shown by topic models. By quantifying the features and potential biases of causality communication, this study improves our understanding of the accuracy of information and opinions found online.
Public opinion polling with Twitter.
Emily Cody, Andy Reagan, Peter Sheridan Dodds, Chris Danforth. Preprint, 2016.[pdf] [arXiv] [online appendices]
Abstract:
Solicited public opinion surveys reach a limited subpopulation of willing participants and are expensive to conduct, leading to poor time resolution and a restricted pool of expert-chosen survey topics. In this study, we demonstrate that unsolicited public opinion polling through sentiment analysis applied to Twitter correlates well with a range of traditional measures, and has predictive power for issues of global importance. We also examine Twitter’s potential to canvas topics seldom surveyed, including ideas, personal feelings, and perceptions of commercial enterprises. Two of our major observations are that appropriately filtered Twitter sentiment (1) predicts President Obama’s job approval three months in advance, and (2) correlates well with surveyed consumer sentiment. To make possible a full examination of our work and to enable others’ research, we make public over 10,000 data sets, each a seven-year series of daily word counts for tweets containing a frequently used search term.
Vaporous marketing: Uncovering pervasive electronic cigarette advertisements on Twitter.
Chris Jones, Jake Williams, Allison Kurti, Mitchell Norotsky, Chris Danforth, Peter Sheridan Dodds. PLoS ONE, 11, 2016.[pdf] [journal page]
Abstract:
Twitter has become the “wild-west” of marketing and promotional strategies for advertisement agencies. Electronic cigarettes have been heavily marketed across Twitter feeds, offering discounts, “kid-friendly” flavors, algorithmically generated false testimonials, and free samples.
All electronic cigarette keyword related tweets from a 10% sample of Twitter spanning January 2012 through December 2014 (approximately 850,000 total tweets) were identified and categorized as Automated or Organic by combining a keyword classification and a machine trained Human Detection algorithm. A sentiment analysis using Hedonometrics was performed on Organic tweets to quantify the change in consumer sentiments over time. Commercialized tweets were topically categorized with key phrasal pattern matching.
The overwhelming majority (80%) of tweets were classified as automated or promotional in nature. The majority of these tweets were coded as commercialized (83.65% in 2013), up to 33% of which offered discounts or free samples and appeared on over a billion Twitter feeds as impressions. The positivity of Organic (human) classified tweets has decreased over time (5.84 in 2013 to 5.77 in 2014) due to a relative increase in negative words such as ‘ban’, ‘tobacco’, ‘doesn’t’, ‘drug’, ‘against’, ‘poison’, ‘tax’ and a relative decrease in positive words like ‘haha’, ‘good’, ‘cool’. Automated tweets are more positive than organic (6.17 versus 5.84) due to a relative increase in marketing words like ‘best’, ‘win’, ‘buy’, ‘sale’, ‘health’, ‘discount’ and a relative decrease in negative words like ‘bad’, ‘hate’, ‘stupid’, ‘don’t’.
Due to the youth presence on Twitter and the clinical uncertainty of the long term health complications of electronic cigarette consumption, the protection of public health warrants scrutiny and potential regulation of social media marketing.
Tracking climate change through the spatiotemporal dynamics of the Teletherms: The statistically hottest and coldest days of the year.
Peter Sheridan Dodds, Lewis Mitchell, Andy Reagan, Chris Danforth. PLoS ONE, e0154184, 11 (5), 2016.[pdf] [journal page] [arXiv] [online appendices]
Abstract:
Instabilities and long term shifts in seasons, whether induced by natural drivers or human activities, pose great disruptive threats to ecological, agricultural, and social systems. Here, we propose, quantify, and explore two fundamental markers of seasonal variations: the Summer and Winter Teletherms—the on-average annual dates of the hottest and coldest days of the year. We analyse daily temperature extremes recorded at 1218 stations across the contiguous United States from 1853–2012 to obtain estimates of the Teletherms, and to characterize their spatial and temporal dynamics. We observe substantial regional variation with the Summer Teletherm falling up to 90 days after the Summer Solstice, and 50 days for the Winter Teletherm after the Winter Solstice, and that in many locations, the Teletherm is better described as one or more sequences of dates—the Teletherm Period. We show Teletherm temporal dynamics are substantive with clear and in some cases dramatic changes moving across broad regions, suggesting links to climate change. We also compare recorded daily temperature extremes with output from two weather models finding considerable though relatively unbiased error.
Game story space of professional sports: Australian Rules Football.
Dilan Kiley, Andy Reagan, Lewis Mitchell, Chris Danforth, Peter Sheridan Dodds. Physical Review E, 2016.[pdf] [arXiv]
Abstract:
Sports are spontaneous generators of stories. Through skill and chance, the script of each game is dynamically written in real time by players acting out possible trajectories allowed by a sport's rules. By properly characterizing a given sport's ecology of `game stories', we are able to capture the sport's capacity for unfolding interesting narratives, in part by contrasting them with random walks. Here, we explore the game story space afforded by a data set of 1,310 Australian Football League (AFL) score lines. We find that AFL games exhibit a continuous spectrum of stories and show how coarse-graining reveals identifiable motifs ranging from last minute comeback wins to one-sided blowouts. Through an extensive comparison with a random walk null model, we show that AFL games are superdiffusive and deliver a much broader array of motifs, and we provide consequent insights into the narrative appeal of real games.
Selection models of language production support informed text partitioning: an intuitive and practical bag-of-phrases framework for text analysis.
Jake Williams, Jim Bagrow, Andy Reagan, Sharon Alajajian, Chris Danforth, Peter Sheridan Dodds. Preprint, 2016.[pdf] [arXiv]
Abstract:
The task of text segmentation, or 'chunking,' may occur at many levels in text analysis, depending on whether it is most beneficial to break it down by paragraphs of a book, sentences of a paragraph, etc. Here, we focus on a fine-grained segmentation task, which we refer to as text partitioning, where we apply methodologies to segment sentences or clauses into phrases, or lexical constructions of one or more words. In the past, we have explored (uniform) stochastic text partitioning---a process on the gaps between words whereby each space assumes one from a binary state of fixed (word binding) or broken (word separating) by some probability. In that work, we narrowly explored perhaps the most naive version of this process: random, or, uniform stochastic partitioning, where all word-word gaps are prescribed a uniformly-set breakage probability, q. Under this framework, the breakage probability is a tunable parameter, and was set to be pure-uniform: q = 1/2. In this work, we explore phrase frequency distributions under variation of the parameter q, and define non-uniform, or informed stochastic partitions, where q is a function of surrounding information. Using a crude but effective function for q, we go on to apply informed partitions to over 20,000 English texts from the Project Gutenberg eBooks database. In these analyses, we connect selection models to generate a notion of fit goodness for the 'bag-of-terms' (words or phrases) representations of texts, and find informed (phrase) partitions to be an improvement over the q = 1 (word) and q = 1/2 (phrase) partitions in most cases. This, together with the scalability of the methods proposed, suggests that the bag-of-phrases model should more often than not be implemented in place of the bag-of-words model, setting the stage for a paradigm shift in feature selection, which lies at the foundation of text analysis methodology.
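Illustrative sketch (not the authors' code): the uniform stochastic partitioning described in the abstract above can be read as breaking each gap between adjacent words independently with probability q. The function name and the sample sentence below are hypothetical; the informed partitions of the paper, where q depends on surrounding information, are not reproduced here.

```python
import random

def stochastic_partition(words, q, rng=None):
    """Split a list of words into phrases.

    Each word-word gap is 'broken' with probability q and 'fixed'
    (bound) otherwise, independently of all other gaps.
    """
    if rng is None:
        rng = random.Random()
    if not words:
        return []
    phrases = []
    current = [words[0]]
    for word in words[1:]:
        if rng.random() < q:      # gap is broken: close the current phrase
            phrases.append(" ".join(current))
            current = [word]
        else:                     # gap is fixed: bind the word to the phrase
            current.append(word)
    phrases.append(" ".join(current))
    return phrases

if __name__ == "__main__":
    sentence = "the bag of phrases model".split()
    print(stochastic_partition(sentence, 0.5, random.Random(42)))
```

Under this reading, q = 1 breaks every gap and recovers the ordinary bag-of-words split, q = 0 keeps the whole clause as a single phrase, and q = 1/2 is the pure-uniform case mentioned in the abstract.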
Quantitative patterns in drone wars.
Javier Garcia-Bernardo, Peter Sheridan Dodds, Neil Johnson. Physica A: Statistical Mechanics and its Applications, 380-384, 443, 2016.[pdf] [journal page]
Abstract:
Attacks by drones (i.e., unmanned combat air vehicles) continue to generate heated political and ethical debates. Here we examine the quantitative nature of drone attacks, focusing on how their intensity and frequency compare with that of other forms of human conflict. Instead of the power-law distribution found recently for insurgent and terrorist attacks, the severity of attacks is more akin to lognormal and exponential distributions, suggesting that the dynamics underlying drone attacks lie beyond these other forms of human conflict. We find that the pattern in the timing of attacks is consistent with one side having almost complete control, an important if expected result. We show that these novel features can be reproduced and understood using a generative mathematical model in which resource allocation to the dominant side is regulated through a feedback loop.
Zipf's law is a consequence of coherent language production.
Jake Williams, Jim Bagrow, Andy Reagan, Sharon Alajajian, Chris Danforth, Peter Sheridan Dodds. Preprint, 2016.[pdf] [arXiv]
Abstract:
The task of text segmentation may be undertaken at many levels in text analysis---paragraphs, sentences, words, or even letters. Here, we focus on a relatively fine scale of segmentation, hypothesizing it to be in accord with a stochastic model of language generation, as the smallest scale where independent units of meaning are produced. Our goals in this letter include the development of methods for the segmentation of these minimal independent units, which produce feature-representations of texts that align with the independence assumption of the bag-of-terms model, commonly used for prediction and classification in computational text analysis. We also propose the measurement of texts' association (with respect to realized segmentations) to the model of language generation. We find (1) that our segmentations of phrases exhibit much better associations to the generation model than words and (2) that texts which are well fit are generally topically homogeneous. Because our generative model produces Zipf's law, our study further suggests that Zipf's law may be a consequence of homogeneity in language production.
Benchmarking sentiment analysis methods for large-scale texts: A case for using continuum-scored words and word shift graphs.
Andy Reagan, Brian Tivnan, Jake Williams, Chris Danforth, Peter Sheridan Dodds. Preprint, 2015.[pdf] [arXiv]
Abstract:
The emergence and global adoption of social media has rendered possible the real-time estimation of population-scale sentiment, bearing profound implications for our understanding of human behavior. Given the growing assortment of sentiment measuring instruments, comparisons between them are evidently required. Here, we perform detailed tests of 6 dictionary-based methods applied to 4 different corpora, and briefly examine a further 20 methods. We show that a dictionary-based method will only perform both reliably and meaningfully if (1) the dictionary covers a sufficiently large portion of a given text's lexicon when weighted by word usage frequency; and (2) words are scored on a continuous scale.
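Illustrative sketch (not the authors' code): the two conditions named in the abstract above can be made concrete with a small calculation. Assuming a dictionary that maps words to continuous scores, the function below reports the frequency-weighted share of a text that the dictionary covers, together with the frequency-weighted mean score of the covered words. The function name, the word scores, and the sample tokens are made up for illustration and are not taken from any published word list.

```python
from collections import Counter

def coverage_and_score(tokens, scores):
    """Return (coverage, mean_score) for a tokenized text.

    coverage   -- fraction of all tokens whose word has a score
    mean_score -- frequency-weighted average score of covered words,
                  or None if no token is covered
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    covered = {w: n for w, n in counts.items() if w in scores}
    covered_total = sum(covered.values())
    coverage = covered_total / total if total else 0.0
    if covered_total == 0:
        return coverage, None
    mean_score = sum(scores[w] * n for w, n in covered.items()) / covered_total
    return coverage, mean_score

if __name__ == "__main__":
    # Hypothetical continuous scores, for illustration only.
    toy_scores = {"good": 8.0, "bad": 2.0}
    toy_tokens = ["good", "good", "bad", "the"]
    print(coverage_and_score(toy_tokens, toy_scores))
```

A low coverage value signals that the mean score rests on a small part of the text, which is the situation the first condition warns against.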
Identifying missing dictionary entries with frequency-conserving context models.
Jake Williams, Jim Bagrow, Chris Danforth, Peter Sheridan Dodds. Physical Review E, 042808, 92, 2015.[pdf] [journal page] [arXiv]
Abstract:
In an effort to better understand meaning from natural language texts, we explore methods aimed at organizing lexical objects into contexts. A number of these methods for organization fall into a family defined by word ordering. Unlike demographic or spatial partitions of data, these collocation models are of special importance for their universal applicability in the presence of ordered symbolic data (e.g., text, speech, genes, etc.). Our approach focuses on the phrase (whether word or larger) as the primary meaning-bearing lexical unit and object of study. To do so, we employ our previously developed framework for generating word-conserving phrase-frequency data. Upon training our model with the Wiktionary—an extensive, online, collaborative, and open-source dictionary that contains over 100,000 phrasal-definitions—we develop highly effective filters for the identification of meaningful, missing phrase-entries. With our predictions we then engage the editorial community of the Wiktionary and propose short lists of potential missing entries for definition, developing a breakthrough lexical extraction technique and expanding our knowledge of the defined English lexicon of phrases.
Predicting Flow Reversals in a Computational Fluid Dynamics Simulated Thermosyphon using Data Assimilation.
Andy Reagan, Yves Dubief, Peter Sheridan Dodds, Chris Danforth. PLoS ONE, 2015.[pdf] [arXiv]
Abstract:
A thermal convection loop is an annular chamber filled with water, heated on the bottom half and cooled on the top half. With sufficiently large forcing of heat, the direction of fluid flow in the loop oscillates chaotically, dynamics analogous to the Earth’s weather. As is the case for state-of-the-art weather models, we only observe the statistics over a small region of state space, making prediction difficult. To overcome this challenge, data assimilation (DA) methods, and specifically ensemble methods, use the computational model itself to estimate the uncertainty of the model to optimally combine these observations into an initial condition for predicting the future state. Here, we build and verify four distinct DA methods, and then, we perform a twin model experiment with the computational fluid dynamics simulation of the loop using the Ensemble Transform Kalman Filter (ETKF) to assimilate observations and predict flow reversals. We show that using adaptively shaped localized covariance outperforms static localized covariance with the ETKF, and allows for the use of fewer observations in predicting flow reversals. We also show that a Dynamic Mode Decomposition (DMD) of the temperature and velocity fields recovers the low dimensional system underlying reversals, finding specific modes which together are predictive of reversal direction.
Characterizing the Google Books corpus: Strong limits to inferences of socio-cultural and linguistic evolution.
Eitan Pechenick, Chris Danforth, Peter Sheridan Dodds. PLoS ONE, e0137041, 10, 2015.[pdf] [journal page] [arXiv]
Abstract:
It is tempting to treat frequency trends from Google Books data sets as indicators for the true popularity of various words and phrases. Doing so allows us to draw novel conclusions about the evolution of public perception of a given topic, such as time and gender. However, sampling published works by availability and ease of digitization leads to several important effects. One of these is the surprising ability of a single prolific author to noticeably insert new phrases into a language. A greater effect arises from scientific texts, which have become increasingly prolific in the last several decades and are heavily sampled in the corpus. The result is a surge of phrases typical to academic articles but less common in general, such as references to time in the form of citations. Here, we highlight these dynamics by examining and comparing major contributions to the statistical divergence of English data sets between decades in the period 1800–2000. We find that only the English Fiction data set from the second version of the corpus is not heavily affected by professional texts, in clear contrast to the first version of the fiction data set and both unfiltered English data sets. Our findings emphasize the need to fully characterize the dynamics of the Google Books corpus before using these data sets to draw broad conclusions about cultural and linguistic evolution.
Nonlinear functional mapping of the human brain.
Nick Allgaier, Tobias Banaschewski, Arun Bokde, Gareth Barker, Joshua Bongard, Chris Danforth, Peter Sheridan Dodds, Robert A. Whelan, Hugh Garavan. Preprint, 2015.[pdf] [arXiv]
Abstract:
The field of neuroimaging has truly become data rich, and novel analytical methods capable of gleaning meaningful information from large stores of imaging data are in high demand. Those methods that might also be applicable on the level of individual subjects, and thus potentially useful clinically, are of special interest. In the present study, we introduce just such a method, called nonlinear functional mapping (NFM), and demonstrate its application in the analysis of resting state fMRI (functional Magnetic Resonance Imaging) from a 242-subject subset of the IMAGEN project, a European study of adolescents that includes longitudinal phenotypic, behavioral, genetic, and neuroimaging data. NFM employs a computational technique inspired by biological evolution to discover and mathematically characterize interactions among ROI (regions of interest), without making linear or univariate assumptions. We show that statistics of the resulting interaction relationships comport with recent independent work, constituting a preliminary cross-validation. Furthermore, nonlinear terms are ubiquitous in the models generated by NFM, suggesting that some of the interactions characterized here are not discoverable by standard linear methods of analysis. We discuss one such nonlinear interaction in the context of a direct comparison with a procedure involving pairwise correlation, designed to be an analogous linear version of functional mapping. We find another such interaction that suggests a novel distinction in brain function between drinking and non-drinking adolescents: a tighter coupling of ROI associated with emotion, reward, and interoceptive processes such as thirst, among drinkers. Finally, we outline many improvements and extensions of the methodology to reduce computational expense, complement other analytical tools like graph-theoretic analysis, and allow for voxel level NFM to eliminate the necessity of ROI selection.
Climate change sentiment on Twitter: An unsolicited public opinion poll.
Emily Cody, Andy Reagan, Lewis Mitchell, Peter Sheridan Dodds, Chris Danforth. PLoS ONE, e0136092, 10, 2015.[pdf] [journal page] [arXiv]
Abstract:
The consequences of anthropogenic climate change are extensively debated through scientific papers, newspaper articles, and blogs. Newspaper articles may lack accuracy, while the severity of findings in scientific papers may be too opaque for the public to understand. Social media, however, is a forum where individuals of diverse backgrounds can share their thoughts and opinions. As consumption shifts from old media to new, Twitter has become a valuable resource for analyzing current events and headline news. In this research, we analyze tweets containing the word “climate” collected between September 2008 and July 2014. We determine how collective sentiment varies in response to climate change news, events, and natural disasters. Words uncovered by our analysis suggest that responses to climate change news are predominately from climate change activists rather than climate change deniers, indicating that Twitter is a valuable resource for the spread of climate change awareness.
Zipf's Law holds for phrases, not words.
Jake Williams, Peter Sheridan Dodds, Chris Danforth, Jim Bagrow, Suma Desu, Paul Lessard. Nature Scientific Reports, 12209, 5, 2015.[pdf] [journal page] [arXiv] [online appendices]
Abstract:
With Zipf's law being originally and most famously observed for word frequency, it is surprisingly limited in its applicability to human language, holding over no more than three to four orders of magnitude before hitting a clear break in scaling. Here, building on the simple observation that phrases of one or more words comprise the most coherent units of meaning in language, we show empirically that Zipf's law for phrases extends over as many as nine orders of rank magnitude. In doing so, we develop a principled and scalable statistical mechanical method of random text partitioning, which opens up a rich frontier of rigorous text analysis via a rank ordering of mixed length phrases.
Reply to Garcia et al.: Common mistakes in measuring frequency dependent word characteristics.
Peter Sheridan Dodds, Suma Desu, Morgan Frank, Andy Reagan, Jake Williams, Lewis Mitchell, Isabel Kloumann, Jim Bagrow, Karine Megerdoomian, Matthew T. McMahon, Brian Tivnan, Chris Danforth, Kameron D. Harris. Proceedings of the National Academy of Sciences, E2984–E2985, 112, 2015.[pdf] [journal page] [arXiv] [online appendices]
Abstract:
The concerns expressed by Garcia et al. are misplaced due to a range of misconceptions about word usage frequency, word rank, and expert-constructed word lists such as LIWC (Linguistic Inquiry and Word Count). We provide a complete response in our paper's online appendices.
Social media appears to affect the timing, location, and severity of school shootings.
Javier Garcia-Bernardo, Hong Qi, James Shultz, Alyssa Cohen, Neil Johnson, Peter Sheridan Dodds. Preprint, 2015.[pdf] [arXiv]
Abstract:
Over the past two decades, school shootings within the United States have repeatedly devastated communities and shaken public opinion. Many of these attacks appear to be `lone wolf' ones driven by specific individual motivations, and the identification of precursor signals and hence actionable policy measures would thus seem highly unlikely. Here, we take a system wide view and investigate the timing of school attacks and the dynamical feedback with social media. We identify a trend divergence in which college attacks have continued to accelerate over the last 25 years while those carried out on K-12 schools have slowed down. We establish the copycat effect in school shootings and uncover a statistical association between social media chatter and the probability of an attack in the following days. While hinting at causality, this relationship may also help mitigate the frequency and intensity of future attacks.
Text mixing shapes the anatomy of rank-frequency distributions.
Jake Williams, Peter Sheridan Dodds, Chris Danforth, Jim Bagrow. Physical Review E, 052811, 91, 2015.[pdf] [journal page] [arXiv]
Abstract:
Natural languages are full of rules and exceptions. One of the most famous quantitative rules is Zipf's law which states that the frequency of occurrence of a word is approximately inversely proportional to its rank. Though this `law' of ranks has been found to hold across disparate texts and forms of data, analyses of increasingly large corpora over the last 15 years have revealed the existence of two scaling regimes. These regimes have thus far been explained by a hypothesis suggesting a separability of languages into core and non-core lexica. Here, we present and defend an alternative hypothesis, that the two scaling regimes result from the act of aggregating texts. We observe that text mixing leads to an effective decay of word introduction, which we show provides accurate predictions of the location and severity of breaks in scaling. Upon examining large corpora from 10 languages, we find emphatic empirical support for the universality of our claim.
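Illustrative sketch (not the authors' code): the statement of Zipf's law in the abstract above, that frequency is approximately inversely proportional to rank, corresponds to a slope near -1 when log frequency is plotted against log rank. The helper names below are hypothetical, and the ordinary least-squares fit on log-log axes is only a crude estimator chosen for brevity; it is not the method used in the paper and does not detect the two scaling regimes discussed there.

```python
import math
from collections import Counter

def rank_frequency(tokens):
    """Return word counts sorted from most to least frequent."""
    return sorted(Counter(tokens).values(), reverse=True)

def zipf_exponent(freqs):
    """Least-squares slope of log(frequency) against log(rank).

    Ranks start at 1. Requires at least two ranked frequencies.
    """
    if len(freqs) < 2:
        raise ValueError("need at least two distinct words")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

if __name__ == "__main__":
    # Synthetic counts following f(r) = 120 / r exactly.
    tokens = []
    for rank, count in enumerate([120, 60, 40, 30, 24], start=1):
        tokens += [f"w{rank}"] * count
    print(zipf_exponent(rank_frequency(tokens)))
```

On real corpora, a single slope hides any break in scaling, so fits over separate rank ranges would be needed to see the two regimes the abstract describes.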
Human language reveals a universal positivity bias.
Peter Sheridan Dodds, Suma Desu, Morgan Frank, Andy Reagan, Jake Williams, Lewis Mitchell, Kameron D. Harris, Isabel Kloumann, Jim Bagrow, Karine Megerdoomian, Matthew T. McMahon, Brian Tivnan, Chris Danforth. Proceedings of the National Academy of Sciences, 2389-2394, 112, 2015.[pdf] [journal page] [arXiv] [online appendices]
Abstract:
Using human evaluation of 100,000 words spread across 24 corpora in 10 languages diverse in origin and culture, we present evidence of a deep imprint of human sociality in language, observing that (1) the words of natural human language possess a universal positivity bias; (2) the estimated emotional content of words is consistent between languages under translation; and (3) this positivity bias is strongly independent of frequency of word usage. Alongside these general regularities, we describe inter-language variations in the emotional spectrum of languages which allow us to rank corpora. We also show how our word evaluations can be used to construct physical-like instruments for both real-time and offline measurement of the emotional content of large-scale texts.
Social Media Meets Population Health: A Sentiment And Demographic Analysis of Tobacco and E-Cigarette Use Across The “Twittersphere”.
Chris Jones, Diann E. Gaalema, T.J. White, Ryan Redner, Peter Sheridan Dodds, Chris Danforth. Value in Health, 17, 2014.[pdf] [journal page]
Abstract:
Twitter, a popular social media outlet, has become a useful tool for the study of social behavior through user interactions called tweets. The location, time, and message content of tweets provide invaluable social and demographic information for an applied comparison of social behaviors across the world. Our goal is to determine the density and sentiment surrounding tobacco and e-cigarette tweets and link prevalence of word choices to tobacco and e-cigarette use at various localities.
Estimation of Global Network Statistics from Incomplete Data.
Cathy Bliss, Chris Danforth, Peter Sheridan Dodds. PLoS ONE, 2014.[pdf]
Abstract:
Complex networks underlie an enormous variety of social, biological, physical, and virtual systems. A profound complication for the science of complex networks is that in most cases, observing all nodes and all network interactions is impossible. Previous work addressing the impacts of partial network data is surprisingly limited, focuses primarily on missing nodes, and suggests that network statistics derived from subsampled data are not suitable estimators for the same network statistics describing the overall network topology. We generate scaling methods to predict true network statistics, including the degree distribution, from only partial knowledge of nodes, links, or weights. Our methods are transparent and do not assume a known generating process for the network, thus enabling prediction of network statistics for a wide variety of applications. We validate analytical results on four simulated network classes and empirical data sets of various sizes. We perform subsampling experiments by varying proportions of sampled data and demonstrate that our scaling methods can provide very good estimates of true network statistics while acknowledging limits. Lastly, we apply our techniques to a set of rich and evolving large-scale social networks, Twitter reply networks. Based on 100 million tweets, we use our scaling techniques to propose a statistical characterization of the Twitter Interactome from September 2008 to November 2008. Our treatment allows us to find support for Dunbar's hypothesis in detecting an upper threshold for the number of active social contacts that individuals maintain over the course of one week.
Constructing a taxonomy of fine-grained human movement and activity motifs through social media.
Morgan Frank, Jake Williams, Lewis Mitchell, Jim Bagrow, Peter Sheridan Dodds, Chris Danforth. Preprint, 2014.[pdf] [journal page] [arXiv]
Abstract:
Profiting from the emergence of web-scale social data sets, numerous recent studies have systematically explored human mobility patterns over large populations and large time scales. Relatively little attention, however, has been paid to mobility and activity over smaller time-scales, such as a day. Here, we use Twitter to identify people's frequently visited locations along with their likely activities as a function of time of day and day of week, capitalizing on both the content and geolocation of messages. We subsequently characterize people's transition pattern motifs and demonstrate that spatial information is encoded in word choice.
Collective Philanthropy: Describing and Modeling the Ecology of Giving.
Andy Reagan, Peter Sheridan Dodds, William L. Gottesman. PLoS ONE, e98876, 9, 2014.[pdf] [journal page] [arXiv] [online appendices]
Abstract:
Reflective of income and wealth distributions, philanthropic gifting appears to follow an approximate power-law size distribution as measured by the size of gifts received by individual institutions. We explore the ecology of gifting by analysing data sets of individual gifts for a diverse group of institutions dedicated to education, medicine, art, public support, and religion. We find that the detailed forms of gift-size distributions differ across but are relatively constant within charity categories. We construct a model for how a donor's income affects their giving preferences in different charity categories, offering a mechanistic explanation for variations in institutional gift-size distributions. We discuss how knowledge of gift-sized distributions may be used to assess an institution's gift-giving profile, to help set fundraising goals, and to design an institution-specific giving pyramid.
Multi-lingual Valence Analysis Across 20th Century Literature and the Twittersphere.
Chris Danforth, Peter Sheridan Dodds. 2014.[pdf] [journal page]
Abstract:
Understanding and statistically processing underlying trends in natural human language has been an ongoing goal in Computational Social Science. This work explores trends in several languages, using expressions found on the internet, in 20th century literature, and social media. We use a Hedonometer to measure happiness in several corpora, using human ratings of emotionally charged words. Previous work has established and tested the instrument on English corpora, discovering a bias towards positive word usage in billions of tweets, millions of books, music lyrics, and media articles. Until now, it has remained an open question as to whether this trend is prevalent with respect to other languages. This work extends these previous analyses through a multilingual extension of the hedonometer to uncover interesting stories and underlying trends from literature and across social media.
Direct, physically motivated derivation of triggering probabilities for spreading processes on generalized random networks.
Joshua L. Payne, Kameron D. Harris, Peter Sheridan Dodds. 2014.[pdf] [arXiv]
Abstract:
We derive a general expression for the probability of global spreading starting from a single infected seed for contagion processes acting on generalized, correlated random networks. We employ a simple probabilistic argument that encodes the spreading mechanism in an intuitive, physical fashion. We use our approach to directly and systematically obtain triggering probabilities for contagion processes acting on a collection of random network families including bipartite random networks. We find the contagion condition, the location of the phase transition into an endemic state, from an expansion about the disease-free state.
An evolutionary algorithm approach to link prediction in dynamic social networks.
Cathy Bliss, Chris Danforth, Morgan Frank, Peter Sheridan Dodds. Journal of Computational Science, 750-764, 5, 2014.[pdf] [journal page] [arXiv]
Abstract:
Many real world, complex phenomena have underlying structures of evolving networks where nodes and links are added and removed over time. A central scientific challenge is the description and explanation of network dynamics, with a key test being the prediction of short and long term changes. For the problem of short-term link prediction, existing methods attempt to determine neighborhood metrics that correlate with the appearance of a link in the next observation period. Recent work has suggested that the incorporation of user-specific metadata and usage patterns can improve link prediction; however, methodologies for doing so in a systematic way are largely unexplored in the literature. Here, we provide an approach to predicting future links by applying an evolutionary algorithm to weights which are used in a linear combination of sixteen neighborhood and node similarity indices. We examine Twitter reciprocal reply networks constructed at the time scale of weeks, both as a test of our general method and as a problem of scientific interest in itself. Our evolved predictors exhibit a thousand-fold improvement over random link prediction with high levels of precision for the top twenty predicted links, to our knowledge strongly outperforming all extant methods. Based on our findings, we suggest possible factors which may be driving the evolution of Twitter reciprocal reply networks.
Shadow networks: Discovering hidden nodes with models of information flow.
Jim Bagrow, Suma Desu, Morgan Frank, Narine Manukyan, Lewis Mitchell, Andy Reagan, Eric Bloedorn, Lashon B. Booker, Luther K. Branting, Michael J. Smith, Brian Tivnan, Chris Danforth, Peter Sheridan Dodds, Joshua Bongard. Preprint, 2013.[pdf] [arXiv]
Abstract:
Complex, dynamic networks underlie many systems, and understanding these networks is the concern of a great span of important scientific and engineering problems. Quantitative description is crucial for this understanding yet, due to a range of measurement problems, many real network datasets are incomplete. Here we explore how accidentally missing or deliberately hidden nodes may be detected in networks by the effect of their absence on predictions of the speed with which information flows through the network. We use Symbolic Regression (SR) to learn models relating information flow to network topology. These models show localized, systematic, and non-random discrepancies when applied to test networks with intentionally masked nodes, demonstrating the ability to detect the presence of missing nodes and where in the network those nodes are likely to reside.
Happiness and the Patterns of Life: A Study of Geolocated Tweets.
Chris Danforth, Lewis Mitchell, Morgan Frank, Peter Sheridan Dodds. Nature Scientific Reports, 2625, 3, 2013.[pdf] [journal page] [arXiv]
Abstract:
The patterns of life exhibited by large populations have been described and modeled both as a basic science exercise and for a range of applied goals such as reducing automotive congestion, improving disaster response, and even predicting the location of individuals. However, these studies previously had limited access to conversation content, rendering changes in expression as a function of movement invisible. In addition, they typically use the communication between a mobile phone and its nearest antenna tower to infer position, limiting the spatial resolution of the data to the geographical region serviced by each cellphone tower. We use a collection of 37 million geolocated tweets to characterize the movement patterns of 180,000 individuals, taking advantage of several orders of magnitude of increased spatial accuracy relative to previous work. Employing the recently developed sentiment analysis instrument known as the hedonometer, we characterize changes in word usage as a function of movement, and find that expressed happiness increases logarithmically with distance from an individual's average location.
The Geography of Happiness: Connecting Twitter sentiment and expression, demographics, and objective characteristics of place.
Chris Danforth, Kameron D. Harris, Lewis Mitchell, Morgan Frank, Peter Sheridan Dodds. PLoS ONE, e64417, 8, 2013.[pdf] [journal page] [arXiv] [online appendices]
Abstract:
We conduct a detailed investigation of correlations between real-time expressions of individuals made across the United States and a wide range of emotional, geographic, demographic, and health characteristics. We do so by combining (1) a massive, geo-tagged data set comprising over 80 million words generated over the course of several recent years on the social network service Twitter and (2) annually-surveyed characteristics of all 50 states and close to 400 urban populations. Among many results, we generate taxonomies of states and cities based on their similarities in word use; estimate the happiness levels of states and cities; correlate highly-resolved demographic characteristics with happiness levels; and connect word choice and message length with urban characteristics such as education levels and obesity rates. Our results show how social media may potentially be used to estimate real-time levels and changes in population-level measures such as obesity rates.
Dynamics of influence processes on networks: Complete mean-field theory; the roles of response functions, connectivity, and synchrony; and applications to social contagion.
Chris Danforth, Kameron D. Harris, Peter Sheridan Dodds. Physical Review E, , , 2013.[pdf] [journal page] [arXiv]
Abstract:
We study binary state dynamics on a network where each node acts in response to the average state of its neighborhood. Allowing varying amounts of stochasticity in both the network and node responses, we find different outcomes in random and deterministic versions of the model. In the limit of a large, dense network, however, we show that these dynamics coincide. We construct a general mean field theory for random networks and show this predicts that the dynamics on the network are a smoothed version of the average response function dynamics. Thus, the behavior of the system can range from steady state to chaotic depending on the response functions, network connectivity, and update synchronicity. As a specific example, we model the competing tendencies of imitation and non-conformity by incorporating an off-threshold into standard threshold models of social contagion. In this way we attempt to capture important aspects of fashions and societal trends. We compare our theory to extensive simulations of this 'limited imitation contagion' model on Poisson random graphs, finding agreement between the mean-field theory and stochastic simulations.
Dynamical influence processes on networks: General theory and applications to social contagion.
Kameron D. Harris, Chris Danforth, Peter Sheridan Dodds. Physical Review E, , 88, 2013.[pdf] [journal page]
Abstract:
We study binary state dynamics on a network where each node acts in response to the average state of its neighborhood. By allowing varying amounts of stochasticity in both the network and node responses, we find different outcomes in random and deterministic versions of the model. In the limit of a large, dense network, however, we show that these dynamics coincide. We construct a general mean-field theory for random networks and show this predicts that the dynamics on the network is a smoothed version of the average response function dynamics. Thus, the behavior of the system can range from steady state to chaotic depending on the response functions, network connectivity, and update synchronicity. As a specific example, we model the competing tendencies of imitation and nonconformity by incorporating an off-threshold into standard threshold models of social contagion. In this way, we attempt to capture important aspects of fashions and societal trends. We compare our theory to extensive simulations of this “limited imitation contagion” model on Poisson random graphs, finding agreement between the mean-field theory and stochastic simulations.
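The on/off threshold idea in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names, the half-open interval `[on_threshold, off_threshold)`, the synchronous update, and the four-node ring are all assumptions chosen for the example.

```python
def limited_imitation_response(active_fraction, on_threshold, off_threshold):
    """Return 1 (active) when the active fraction of neighbors lies in
    [on_threshold, off_threshold), else 0. The interval convention is an
    assumption made for this sketch."""
    return 1 if on_threshold <= active_fraction < off_threshold else 0


def synchronous_update(neighbors, state, on_threshold, off_threshold):
    """Apply the response function to every node at once."""
    new_state = {}
    for node, nbrs in neighbors.items():
        if not nbrs:
            # Isolated nodes keep their current state.
            new_state[node] = state[node]
            continue
        fraction = sum(state[n] for n in nbrs) / len(nbrs)
        new_state[node] = limited_imitation_response(
            fraction, on_threshold, off_threshold
        )
    return new_state


# Hypothetical 4-node ring with a single initially active node.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
state = {0: 1, 1: 0, 2: 0, 3: 0}
next_state = synchronous_update(ring, state, on_threshold=0.3, off_threshold=0.8)
```

A node copies its neighbors once enough of them are active (imitation) but switches off again when too many are (non-conformity), which is what allows the dynamics to oscillate rather than settle.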
Limited Imitation Contagion on Random Networks: Chaos, Universality, and Unpredictability.
Chris Danforth, Kameron D. Harris, Peter Sheridan Dodds. Physical Review Letters, 158701, 110, 2013.[pdf] [journal page] [arXiv]
Abstract:
We study a family of binary state, socially-inspired contagion models which incorporate imitation limited by an aversion to complete conformity. We uncover rich behavior in our models whether operating with probabilistic or deterministic individual response functions, both on dynamic or fixed random networks. In particular, we find significant variation in the limiting behavior of a population's infected fraction, ranging from steady-state to chaotic. We show that period doubling arises as we increase the average node degree, and that the universality class of this well known route to chaos depends on the interaction structure of random networks rather than the microscopic behavior of individual nodes. We find that increasing the fixedness of the system tends to stabilize the infected fraction, yet disjoint, multiple equilibria are possible depending solely on the choice of the initially infected node.
Testing the metabolic theory of ecology.
Andrew Clarke, Andrew J. Kerkhoff, Charles A. Price, David A. Coomes, Han Olff, James Stegen, Joshua Weitz, Katherine McCulloh, Karl J. Niklas, Nathan G. Swenson, Peter Sheridan Dodds, Rampal S. Etienne, Van M. Savage. Ecology Letters, 1465-1474, 15, 2012.[pdf] [journal page]
Abstract:
The Metabolic Theory of Ecology (MTE) predicts the effects of body size and temperature on metabolism through considerations of vascular distribution networks and biochemical kinetics. MTE has also been extended to characterize processes from cellular to global levels. MTE has generated both enthusiasm and controversy across a broad range of research areas. However, most efforts that claim to validate or invalidate MTE have focused on testing predictions. We argue that critical evaluation of MTE also requires strong tests of both its theoretical foundations and simplifying assumptions. To this end we synthesize available information and find that MTE's original derivations are incomplete and require additional assumptions to obtain the full scope of attendant predictions. Moreover, although some of MTE's simplifying assumptions are well supported by data, others are inconsistent with empirical tests and even more remain untested. Further, though many predictions are empirically supported on average, work remains to explain the often large variability in data. We suggest that greater effort be focused on evaluating MTE's underlying theory and simplifying assumptions in order to help delineate the scope of MTE, generate new theory, and shed light on fundamental aspects of biological form and function.
Twitter reciprocal reply networks exhibit assortativity with respect to happiness.
Cathy Bliss, Chris Danforth, Isabel Kloumann, Kameron D. Harris, Peter Sheridan Dodds. Journal of Computational Science, 388-397, 3, 2012.[pdf] [journal page] [arXiv]
Abstract:
The advent of social media has provided an extraordinary, if imperfect, 'big data' window into the form and evolution of social networks. Based on nearly 40 million message pairs posted to Twitter between September 2008 and February 2009, we construct and examine the revealed social network structure and dynamics over the time scales of days, weeks, and months. At the level of user behavior, we employ our recently developed hedonometric analysis methods to investigate patterns of sentiment expression. We find users' average happiness scores to be positively and significantly correlated with those of users one, two, and three links away. We strengthen our analysis by proposing and using a null model to test the effect of network topology on the assortativity of happiness. We also find evidence that more well connected users write happier status updates, with a transition occurring around Dunbar's number. More generally, our work provides evidence of a social sub-network structure within Twitter and raises several methodological points of interest with regard to social network reconstructions.
Positivity of the English language.
Cathy Bliss, Chris Danforth, Isabel Kloumann, Kameron D. Harris, Peter Sheridan Dodds. PLoS ONE, e29484, 7, 2012.[pdf] [journal page] [arXiv]
Abstract:
Within the last million years, human language has emerged and evolved as a fundamental instrument of social communication and semiotic representation. People use language in part to convey emotional information, leading to the central and contingent questions: (1) What is the emotional spectrum of natural language? and (2) Are natural languages neutrally, positively, or negatively biased? Previous findings are mixed: suggestive evidence of a positive bias has been found in small samples of English words [1-3], framed as the Pollyanna Hypothesis [3] and Linguistic Positivity Bias [1], while the experimental elicitation of emotional words has instead found a strong negative bias [4]. Here, we report that the human-perceived positivity of over 10,000 of the most frequently used English words exhibits a clear positive bias. More deeply, we characterize and quantify distributions of word positivity for four large and distinct corpora, demonstrating that their form is surprisingly invariant with respect to frequency of word use.
Temporal patterns of happiness and information in a global social network: Hedonometrics and Twitter.
Cathy Bliss, Chris Danforth, Isabel Kloumann, Kameron D. Harris, Peter Sheridan Dodds. PLoS ONE, e26752, 6, 2011.[pdf] [journal page] [arXiv]
Abstract:
Individual happiness is a fundamental societal metric. Normally measured through self-report, happiness has often been indirectly characterized and overshadowed by more readily quantifiable economic indicators, such as gross domestic product. Here, we use a real-time, remote-sensing, non-invasive, text-based approach (a kind of hedonometer) to uncover collective dynamical patterns of happiness levels expressed by over 50 million users in the online, global social network Twitter. With a data set comprising nearly 2.8 billion expressions involving more than 28 billion words, we explore temporal variations in happiness, as well as information levels, over time scales of hours, days, and months. Among many observations, we find a steady global happiness level, evidence of universal weekly and daily patterns of happiness and information, and that happiness and information levels are generally uncorrelated. We also extract and analyse a collection of happiness and information trends based on keywords, showing them to be both sensible and informative, and in effect generating opinion polls without asking questions. Finally, we develop and employ a graphical method that reveals how individual words contribute to changes in average happiness between any two texts.
Exact solutions for social and biological contagion models on mixed directed and undirected, degree-correlated random networks.
Joshua L. Payne, Kameron D. Harris, Peter Sheridan Dodds. Physical Review E, 016110, 84, 2011.[pdf] [journal page] [arXiv]
Abstract:
We derive analytic expressions for the probability and expected size of global spreading events starting from a single infected seed for a broad collection of contagion processes acting on random networks with both directed and undirected edges and arbitrary degree-degree correlations. Our work extends previous theoretical developments for the undirected case, and we provide numerical support for our findings by investigating an example class of networks for which we are able to obtain closed-form expressions.
Direct, physically motivated derivation of the contagion condition for spreading processes on generalized random networks.
Peter Sheridan Dodds, Kameron D. Harris, Joshua L. Payne. Physical Review E, 056122, 83, 2011.[pdf] [journal page] [arXiv]
Abstract:
For a broad range of single-seed contagion processes acting on generalized random networks, we derive a unifying analytic expression for the possibility of global spreading events in a straightforward, physically intuitive fashion. Our reasoning lays bare a direct mechanical understanding of an archetypal spreading phenomenon that is not evident in circuitous extant mathematical approaches.
Optimal form of branching supply and collection networks.
Peter Sheridan Dodds. Physical Review Letters, 048702, 104, 2010.[pdf] [journal page] [arXiv]
Abstract:
For the problem of efficiently supplying material to a spatial region from a single source, we present a simple scaling argument based on branching network volume minimization that identifies limits to the scaling of sink density. We discuss implications for two fundamental and unresolved problems in organismal biology and geomorphology: how basal metabolism scales with body size for homeotherms and the scaling of drainage basin shape on eroding landscapes.
Measuring the Happiness of Large-Scale Written Expression: Songs, Blogs, and Presidents.
Chris Danforth, Peter Sheridan Dodds. Journal of Happiness Studies, 444-456, 11, 2010.[pdf] [journal page]
Abstract:
The importance of quantifying the nature and intensity of emotional states at the level of populations is evident: we would like to know how, when, and why individuals feel as they do if we wish, for example, to better construct public policy, build more successful organizations, and, from a scientific perspective, more fully understand economic and social phenomena. Here, by incorporating direct human assessment of words, we quantify happiness levels on a continuous scale for a diverse set of large-scale texts: song titles and lyrics, weblogs, and State of the Union addresses. Our method is transparent, improvable, capable of rapidly processing Web-scale texts, and moves beyond approaches based on coarse categorization. Among a number of observations, we find that the happiness of song lyrics trends downward from the 1960's to the mid 1990's while remaining stable within genres, and that the happiness of blogs has steadily increased from 2005 to 2009, exhibiting a striking rise and fall with blogger age and distance from the equator.
Information cascades on degree-correlated random networks.
Joshua L. Payne, Margaret (Maggie) Eppstein, Peter Sheridan Dodds. Physical Review E, 026125, 80, 2009.[pdf] [journal page]
Abstract:
We investigate by numerical simulation a threshold model of social contagion on degree-correlated random networks. We show that the class of networks for which global information cascades occur generally expands as degree-degree correlations become increasingly positive. However, under certain conditions, large-scale information cascades can paradoxically occur when degree-degree correlations are sufficiently positive or negative, but not when correlations are relatively small. We also show that the relationship between the degree of the initially infected vertex and its ability to trigger large cascades is strongly affected by degree-degree correlations.
Analysis of a threshold model of social contagion on degree-correlated networks.
Joshua L. Payne, Peter Sheridan Dodds. Physical Review E, 066115, 79, 2009.[pdf] [journal page] [arXiv]
Abstract:
We analytically determine when a range of abstract social contagion models permit global spreading from a single seed on degree-correlated, undirected random networks. We deduce the expected size of the largest vulnerable component, a network's tinderbox-like critical mass, as well as the probability that infecting a randomly chosen individual seed will trigger global spreading. In the appropriate limits, our results naturally reduce to standard ones for models of disease spreading and to the condition for the existence of a giant component. Recent advances in the distributed, infinite seed case allow us to further determine the final size of global spreading events, when they occur. To provide support for our results, we derive exact expressions for key spreading quantities for a simple yet rich family of random networks with bimodal degree distributions.
Modeling social interactions: Identification, empirical methods and policy implications.
Catherine Tucker, David Godes, Harikesh Nair, Kartik Hosanagar, Matthew Bothner, Peter Sheridan Dodds, Puneet Manchanda, Wesley R. Hartmann. Marketing Letters, 287-304, 19, 2008.[pdf] [journal page]
Abstract:
Social interactions occur when agents in a network affect other agents' choices directly, as opposed to via the intermediation of markets. The study of such interactions and the resultant outcomes has long been an area of interest across a wide variety of social sciences. With the advent of electronic media that facilitate and record such interactions, this interest has grown sharply in the business world as well. In this paper, we provide a brief summary of what is known so far, discuss the main challenges for researchers interested in this area and provide a common vocabulary that will hopefully engender future (cross-disciplinary) research. The paper considers the challenges of distinguishing actual causal social interactions from other phenomena that may lead to a false inference of causality. Further, we distinguish between two broadly defined types of social interactions that relate to how strongly interactions spread through a network. We also provide a very selective review of how insights from other disciplines can improve and inform modeling choices. Finally, we discuss how models of social interaction can be used to provide guidelines for marketing policy and conclude with thoughts on future research directions.
Influentials, networks, and public opinion formation.
Duncan J. Watts, Peter Sheridan Dodds. Journal of Consumer Research, 441-458, 34, 2007.[pdf] [journal page]
Abstract:
A central idea in marketing and diffusion research is that influentials, a minority of individuals who influence an exceptional number of their peers, are important to the formation of public opinion. Here we examine this idea, which we call the "influentials hypothesis," using a series of computer simulations of interpersonal influence processes. Under most conditions that we consider, we find that large cascades of influence are driven not by influentials but by a critical mass of easily influenced individuals. Although our results do not exclude the possibility that influentials can be important, they suggest that the influentials hypothesis requires more careful specification and testing than it has received.
Cooperation in evolving social networks.
Alexander Peterhansl, Duncan J. Watts, Nobuyuki Hanaki, Peter Sheridan Dodds. Management Science, 1036-1050, 53, 2007.[pdf] [journal page]
Abstract:
We study the problem of cooperative behavior emerging in an environment where individual behaviors and interaction structures coevolve. Players not only learn which strategy to adopt by imitating the strategy of the best-performing player they observe, but also choose with whom they should interact by selectively creating and/or severing ties with other players based on a myopic cost-benefit comparison. We find that scalable cooperation—that is, high levels of cooperation in large populations—can be achieved in sparse networks, assuming that individuals are able to sever ties unilaterally and that new ties can only be created with the mutual consent of both parties. Detailed examination shows that there is an important trade-off between local reinforcement and global expansion in achieving cooperation in dynamic networks. As a result, networks in which ties are costly and local structure is largely absent tend to generate higher levels of cooperation than those in which ties are made easily and friends of friends interact with high probability, where the latter result contrasts strongly with the usual intuition.
Experimental study of inequality and unpredictability in an artificial cultural market.
Duncan J. Watts, Matthew J. Salganik, Peter Sheridan Dodds. Science Magazine, 854-856, 311, 2006.[pdf] [journal page]
Abstract:
Hit songs, books, and movies are many times more successful than average, suggesting that "the best" alternatives are qualitatively different from "the rest"; yet experts routinely fail to predict which products will succeed. We investigated this paradox experimentally, by creating an artificial "music market" in which 14,341 participants downloaded previously unknown songs either with or without knowledge of previous participants' choices. Increasing the strength of social influence increased both inequality and unpredictability of success. Success was also only partly determined by quality: The best songs rarely did poorly, and the worst rarely did well, but any other result was possible.
Multiscale, resurgent epidemics in a hierarchical metapopulation model.
Duncan J. Watts, Roby Muhamad, Daniel C. Medina, Peter Sheridan Dodds. Proceedings of the National Academy of Sciences, 11157-11162, 102, 2005.[pdf] [journal page]
Abstract:
Although population structure has long been recognized as relevant to the spread of infectious disease, traditional mathematical models have understated the role of nonhomogenous mixing in populations with geographical and social structure. Recently, a wide variety of spatial and network models have been proposed that incorporate various aspects of interaction structure among individuals. However, these more complex models necessarily suffer from limited tractability, rendering general conclusions difficult to draw. In seeking a compromise between parsimony and realism, we introduce a class of metapopulation models in which we assume homogeneous mixing holds within local contexts, and that these contexts are embedded in a nested hierarchy of successively larger domains. We model the movement of individuals between contexts via simple transport parameters and allow diseases to spread stochastically. Our model exhibits some important stylized features of real epidemics, including extreme size variation and temporal heterogeneity, that are difficult to characterize with traditional measures. In particular, our results suggest that when epidemics do occur the basic reproduction number R0 may bear little relation to their final size. Informed by our model's behavior, we suggest measures for characterizing epidemic thresholds and discuss implications for the control of epidemics.
A generalized model of social and biological contagion.
Duncan J. Watts, Peter Sheridan Dodds. Journal of Theoretical Biology, 587-604, 232, 2005.[pdf] [journal page]
Abstract:
We present a model of contagion that unifies and generalizes threshold models of social contagion and epidemiological models of disease spreading. Our model incorporates individual memory of exposure to a contagious entity (e.g., a rumor or disease), variable magnitudes of exposure (dose sizes), and heterogeneity in the susceptibility of individuals. Through analysis and simulation, we examine in detail the case where individuals may recover from an infection and then immediately become susceptible again (analogous to the so-called SIS model). We identify three basic classes of contagion models which we call epidemic threshold, vanishing critical mass, and critical mass classes respectively, where each class of models corresponds to different strategies for prevention or facilitation. We find that the conditions for a particular contagion model to belong to one of these three classes depend only on memory length and the probabilities of being infected by one and two exposures respectively. These parameters are in principle measurable for real contagious influences or entities, thus yielding empirical implications for our model. We also study the case where individuals attain permanent immunity once recovered, finding that epidemics inevitably die out but may be surprisingly persistent when individuals possess memory.
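The three ingredients named in this abstract (memory of exposure, variable dose sizes, and individual thresholds) can be illustrated with a small sketch. It assumes that an individual counts as infected when the doses remembered over the last `memory_length` steps sum to at least a personal `threshold`; that rule, and every name below, is an assumption made for illustration rather than the paper's definition.

```python
import random


def step_dose_memory(memory, contact_infected, p, dose_sampler, memory_length, rng):
    """Record this step's dose (0.0 if no successful exposure) and keep only
    the most recent `memory_length` entries."""
    dose = 0.0
    if contact_infected and rng.random() < p:
        dose = dose_sampler(rng)
    return (memory + [dose])[-memory_length:]


def exceeds_threshold(memory, threshold):
    """Assumed infection rule: remembered doses sum to at least the threshold."""
    return sum(memory) >= threshold


# Hypothetical run: unit doses, certain exposure (p = 1.0), memory of 3 steps.
rng = random.Random(0)
unit_dose = lambda _rng: 1.0
memory = []
history = []
for contact_infected in [True, False, True, False, False]:
    memory = step_dose_memory(memory, contact_infected, 1.0, unit_dose, 3, rng)
    history.append(exceeds_threshold(memory, threshold=2.0))
```

In this run the individual crosses the threshold only once two doses fall within the same three-step window, and drops back below it as the older dose is forgotten.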
Universal behavior in a generalized model of contagion.
Duncan J. Watts, Peter Sheridan Dodds. Physical Review Letters, 218701, 92, 2004.[pdf] [journal page]
Abstract:
Models of contagion arise broadly both in the biological and social sciences, with applications ranging from the transmission of infectious diseases to the diffusion of innovations and the spread of cultural fads. In this Letter, we introduce a general model of contagion which, by explicitly incorporating memory of past exposures to, for example, an infectious agent, rumor, or new product, includes the main features of existing contagion models and interpolates between them. We obtain exact solutions for a simple version of the model, finding that under general conditions only three classes of collective dynamics exist, two of which correspond to familiar epidemic threshold and critical mass dynamics, while the third is a distinct intermediate case. We find that for a given length of memory, the class into which a particular system falls is determined by two parameters, each of which ought to be measurable empirically. Our model suggests novel measures for assessing the susceptibility of a population to large contagion events, and also a possible strategy for inhibiting or facilitating them.
Information exchange and the robustness of organizational networks.
Charles F. Sabel, Duncan J. Watts, Peter Sheridan Dodds. Proc. Natl. Acad. Sci., 12516-12521, 100, 2003.[pdf] [journal page]
Abstract:
The dynamics of information exchange is an important but understudied aspect of collective communication, coordination, and problem solving in a wide range of distributed systems, both physical (e.g., the Internet) and social (e.g., business firms). In this paper, we introduce a model of organizational networks according to which links are added incrementally to a hierarchical backbone and test the resulting networks under variable conditions of information exchange. Our main result is the identification of a class of multiscale networks that reduce, over a wide range of environments, the likelihood that individual nodes will suffer congestion-related failure and that the network as a whole will disintegrate when failures do occur. We call this dual robustness property of multiscale networks ultrarobustness. Furthermore, we find that multiscale networks attain most of their robustness with surprisingly few link additions, suggesting that ultrarobust organizational networks can be generated in an efficient and scalable manner. Our results are directly relevant to the relief of congestion in communication networks and also more broadly to activities, like distributed problem solving, that require individuals to exchange information in an unpredictable manner.
An experimental study of search in global social networks.
Duncan J. Watts, Peter Sheridan Dodds, Roby Muhamad. Science Magazine, 827-829, 301, 2003.[pdf] [journal page]
Abstract:
We report on a global social-search experiment in which more than 60,000 e-mail users attempted to reach one of 18 target persons in 13 countries by forwarding messages to acquaintances. We find that successful social search is conducted primarily through intermediate to weak strength ties, does not require highly connected hubs to succeed, and, in contrast to unsuccessful social search, disproportionately relies on professional relationships. By accounting for the attrition of message chains, we estimate that social searches can reach their targets in a median of five to seven steps, depending on the separation of source and target, although small variations in chain lengths and participation rates generate large differences in target reachability. We conclude that although global social networks are, in principle, searchable, actual success depends sensitively on individual incentives.
Packing-limited growth of irregular objects.
Joshua Weitz, Peter Sheridan Dodds. Physical Review E, 016117, 67, 2003.[pdf] [journal page]
Abstract:
We study growth limited by packing for irregular objects in two dimensions. We generate packings by seeding objects randomly in time and space and allowing each object to grow until it collides with another object. The objects we consider allow us to investigate the separate effects of anisotropy and non-unit aspect ratio. By means of a connection to the decay of pore-space volume, we measure power law exponents for the object size distribution. We carry out a mean field analysis, showing that it provides an upper bound for the size distribution exponent. We find that while the details of the growth mechanism are irrelevant, the exponent is strongly shape dependent. Potential applications lie in ecological and biological environments where sessile organisms compete for limited space as they grow.
Identity and search in social networks.
Duncan J. Watts, M. E. J. Newman, Peter Sheridan Dodds. Science Magazine, 1302-1305, 296, 2002.[pdf] [journal page]
Abstract:
Social networks have the surprising property of being searchable: Ordinary people are capable of directing messages through their network of acquaintances to reach a specific but distant target person in only a few steps. We present a model that offers an explanation of social network searchability in terms of recognizable personal identities: sets of characteristics measured along a number of social dimensions. Our model defines a class of searchable networks and a method for searching them that may be applicable to many network search problems, including the location of data files in peer-to-peer networks, pages on the World Wide Web, and information in distributed databases.
Packing-limited growth.
Joshua Weitz, Peter Sheridan Dodds. Physical Review E, 056108, 65, 2002.[pdf] [journal page]
Abstract:
We consider growing spheres seeded by random injection in time and space. Growth stops when two spheres meet, leading eventually to a jammed state. We study the statistics of growth limited by packing theoretically in d dimensions and via simulation in d=2, 3, and 4. We show how a broad class of such models exhibits distributions of sphere radii with a universal exponent. We construct a scaling theory which relates the fractal structure of these models to the decay of their pore space, a theory which we confirm via numerical simulations. The scaling theory also predicts an upper bound for the universal exponent and is in exact agreement with numerical results for d=4.
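A simplified two-dimensional version of this seeding-and-growth process can be sketched as follows. It is not the paper's simulation: it assumes discs are injected one at a time into a unit square, that each grows instantly until it touches an existing disc or a wall, and that seeds landing inside an existing disc are rejected. The function name and parameters are hypothetical.

```python
import math
import random


def pack_discs(n_seeds, rng, max_tries=10000):
    """Sequentially seed discs in the unit square. Each accepted disc is given
    the largest radius that keeps it clear of earlier discs and the walls."""
    discs = []  # list of (x, y, radius)
    for _ in range(n_seeds):
        for _ in range(max_tries):
            x, y = rng.random(), rng.random()
            # Distance from the seed to the nearest wall.
            gap = min(x, 1.0 - x, y, 1.0 - y)
            # Distance from the seed to the surface of each existing disc.
            for cx, cy, cr in discs:
                gap = min(gap, math.hypot(x - cx, y - cy) - cr)
            if gap > 0:  # seed fell in open pore space
                discs.append((x, y, gap))
                break
    return discs


discs = pack_discs(50, random.Random(0))
radii = sorted((r for _, _, r in discs), reverse=True)
```

Because later discs must fit into the shrinking pore space, `radii` contains a few large discs and many small ones, which is the kind of broad size distribution the abstract characterizes with a power law exponent.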
Geometry of river networks. III. Characterization of component connectivity.
Daniel Rothman, Peter Sheridan Dodds. Physical Review E, 016117, 63, 2001.[pdf] [journal page]
Abstract:
Essential to understanding the overall structure of river networks is a knowledge of their detailed architecture. Here, we explore the presence of randomness in river network structure and the details of its consequences. We first show that an averaged view of network architecture is provided by a proposed self-similarity statement about the scaling of drainage density, a local measure of stream concentration. This scaling of drainage density is shown to imply Tokunaga's law, a description of the scaling of side branch abundance along a given stream, as well as a scaling law for stream lengths. We then consider fluctuations in drainage density and consequently the numbers of side branches. Data is analyzed for the Mississippi River basin and a model of random directed networks. Numbers of side streams are found to follow exponential distributions as are inter-tributary distances along streams. Finally, we derive the joint variation of side stream abundance with stream length, affording a full description of fluctuations in network structure. Fluctuations in side stream numbers are shown to be a direct result of fluctuations in stream lengths. This is the last paper in a series of three on the geometry of river networks.
Geometry of river networks. II. Distributions of component size and number.
Daniel Rothman, Peter Sheridan Dodds. Physical Review E, 016116, 63, 2001.[pdf] [journal page]
Abstract:
The structure of a river network may be seen as a discrete set of nested sub-networks built out of individual stream segments. These network components are assigned an integral stream order via a hierarchical and discrete ordering method. Exponential relationships, known as Horton's laws, between stream order and ensemble-averaged quantities pertaining to network components are observed. We extend these observations to incorporate fluctuations and all higher moments by developing functional relationships between distributions. The relationships determined are drawn from a combination of theoretical analysis, analysis of real river networks including the Mississippi, Amazon and Nile, and numerical simulations on a model of directed, random networks. Underlying distributions of stream segment lengths are identified as exponential. Combinations of these distributions form single-humped distributions with exponential tails, the sums of which are in turn shown to give power law distributions of stream lengths. Distributions of basin area and stream segment frequency are also addressed. The calculations identify a single length-scale as a measure of size fluctuations in network components. This article is the second in a series of three addressing the geometry of river networks.
Geometry of river networks. I. Scaling, fluctuations, and deviations.
Daniel Rothman, Peter Sheridan Dodds. Physical Review E, 016115, 63, 2001.[pdf] [journal page]
Abstract:
This article is the first in a series of three papers investigating the detailed geometry of river networks. Branching networks are a universal structure employed in the distribution and collection of material. Large-scale river networks mark an important class of two-dimensional branching networks, being not only of intrinsic interest but also a pervasive natural phenomenon. In the description of river network structure, scaling laws are uniformly observed. Reported values of scaling exponents vary, suggesting that no unique set of scaling exponents exists. To improve this current understanding of scaling in river networks and to provide a fuller description of branching network structure, here we report a theoretical and empirical study of fluctuations about and deviations from scaling. We examine data for continent-scale river networks such as the Mississippi and the Amazon and draw inspiration from a simple model of directed, random networks. We center our investigations on the scaling of the length of a sub-basin's dominant stream with its area, a characterization of basin shape known as Hack's law. We generalize this relationship to a joint probability density and provide observations and explanations of deviations from scaling. We show that fluctuations about scaling are substantial and grow with system size. We find strong deviations from scaling at small scales which can be explained by the existence of linear network structure. At intermediate scales, we find slow drifts in exponent values indicating that scaling is only approximately obeyed and that universality remains indeterminate. At large scales, we observe a breakdown in scaling due to decreasing sample space and correlations with overall basin shape. The extent of approximate scaling is significantly restricted by these deviations and will not be improved by increases in network resolution.
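To make the "simple model of directed, random networks" concrete, the sketch below generates (area, length) pairs of the kind Hack's law relates. It assumes a Scheidegger-style lattice in which every site drains to one of two downstream neighbors with equal probability. Whether this matches the exact model used in the paper is an assumption, and the function name, periodic boundary, and units (areas and lengths counted in lattice sites) are illustrative.

```python
import random


def scheidegger_basins(width, height, seed=0):
    """Return (area, main_stream_length) for every site of a random
    directed network on a width x height lattice.

    Assumption: each site drains to the site directly below it or to the
    one below and to the right (periodic in x), chosen with probability 1/2.
    """
    rng = random.Random(seed)
    area = [1] * width    # sites drained by each site, itself included
    length = [1] * width  # longest upstream path ending at each site
    basins = list(zip(area, length))
    for _ in range(height - 1):
        next_area = [1] * width
        next_length = [1] * width
        for x in range(width):
            target = (x + rng.randrange(2)) % width
            next_area[target] += area[x]
            next_length[target] = max(next_length[target], length[x] + 1)
        area, length = next_area, next_length
        basins.extend(zip(area, length))
    return basins


basins = scheidegger_basins(width=200, height=400, seed=2)
largest_area, its_length = max(basins)
print(f"largest basin: area={largest_area}, main stream length={its_length}")
```

Plotting length against area for `basins` on logarithmic axes gives a scatter cloud, not a single line, which is the kind of fluctuation about scaling that the abstract generalizes into a joint probability density.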
Re-examination of the “3/4-law” of metabolism.
Peter Sheridan Dodds, Daniel Rothman, Joshua Weitz. Journal of Theoretical Biology, 9-27, 209, 2001.[pdf] [arXiv]
Abstract:
We examine the scaling law B ∝ M^α which connects organismal metabolic rate B with organismal mass M, where α is commonly held to be 3/4. Since simple dimensional analysis suggests α=2/3, we consider this to be a null hypothesis testable by empirical studies. We re-analyze data sets for mammals and birds compiled by Heusner, Bennett and Harvey, Bartels, Hemmingsen, Brody, and Kleiber, and find little evidence for rejecting α=2/3 in favor of α=3/4. For mammals, we find a possible breakdown in scaling for larger masses reflected in a systematic increase in α. We also review theoretical justifications of α=3/4 based on dimensional analysis, nutrient-supply networks, and four-dimensional biology. We find that present theories for α=3/4 require assumptions that render them unconvincing for rejecting the null hypothesis that α=2/3.
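As an illustration of how an exponent α can be estimated from mass and metabolic-rate data, the sketch below fits a straight line to log B versus log M by ordinary least squares. This is a generic technique and not necessarily the statistical procedure used in the paper. The data are synthetic: they are generated with α=2/3 and multiplicative noise purely for demonstration, and the function name, prefactor, and noise level are assumptions.

```python
import math
import random


def fit_power_law_exponent(masses, rates):
    """Least-squares fit of log10(B) = log10(c) + alpha * log10(M).

    Returns (alpha, log10_c).
    """
    xs = [math.log10(m) for m in masses]
    ys = [math.log10(b) for b in rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    return alpha, mean_y - alpha * mean_x


# Synthetic data only: masses spanning five decades, B = c * M^(2/3) with
# log-normal scatter. None of these numbers come from the paper's data sets.
rng = random.Random(0)
masses = [10 ** rng.uniform(1, 6) for _ in range(500)]
rates = [0.02 * m ** (2 / 3) * math.exp(rng.gauss(0, 0.2)) for m in masses]

alpha, log10_c = fit_power_law_exponent(masses, rates)
print(f"estimated alpha = {alpha:.3f}")
```

Refitting over restricted mass ranges is one simple way to look for the kind of drift in α at larger masses that the abstract describes for mammals.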
Scaling, universality, and geomorphology.
Daniel Rothman, Peter Sheridan Dodds. Annual Review of Earth and Planetary Sciences, 571-610, 28, 2000.[pdf] [journal page]
Abstract:
Theories of scaling apply wherever there is similarity across many scales. This similarity may be found in geometry and in dynamical processes. Universality arises when the qualitative character of a system is sufficient to quantitatively specify its essential features, such as the exponents that characterize scaling laws. Within geomorphology, two areas where the concepts of scaling and universality have found application are the geometry of river networks and the statistical structure of topography. We first provide a pedagogical review of scaling and universality. We then describe recent progress made in applying these ideas to networks and topography. This overview then leads to a synthesis of some widely scattered ideas that attempts a classification of surface and network properties based on generic mechanisms and geometric constraints. We also briefly review how these ideas may be applied to problems in sedimentology ranging from the structure of submarine canyons to the size distribution of turbidite deposits and the origin of stromatolites.
Unified view of scaling laws for river networks.
Daniel Rothman, Peter Sheridan Dodds. Physical Review E, 4865-4877, 59, 1999.[pdf] [journal page] [arXiv]
Abstract:
Scaling laws that describe the structure of river networks are shown to follow from three simple assumptions. These assumptions are: (1) river networks are structurally self-similar, (2) single channels are self-affine, and (3) overland flow into channels occurs over a characteristic distance (drainage density is uniform). We obtain a complete set of scaling relations connecting the exponents of these scaling laws and find that only two of these exponents are independent. We further demonstrate that the two predominant descriptions of network structure (Tokunaga's law and Horton's laws) are equivalent in the case of landscapes with uniform drainage density. The results are tested with data from both real landscapes and a special class of random networks.
Most recent press:


