Selected Publications



Characterizing the Google Books corpus: Strong limits to inferences of socio-cultural and linguistic evolution

PLoS ONE, 10, e0137041, 2015

Abstract: It is tempting to treat frequency trends from Google Books data sets as indicators for the true popularity of various words and phrases. Doing so allows us to draw novel conclusions about the evolution of public perception of a given topic, such as time and gender. However, sampling published works by availability and ease of digitization leads to several important effects. One of these is the surprising ability of a single prolific author to noticeably insert new phrases into a language. A greater effect arises from scientific texts, which have become increasingly prolific in the last several decades and are heavily sampled in the corpus. The result is a surge of phrases typical to academic articles but less common in general, such as references to time in the form of citations. Here, we highlight these dynamics by examining and comparing major contributions to the statistical divergence of English data sets between decades in the period 1800–2000. We find that only the English Fiction data set from the second version of the corpus is not heavily affected by professional texts, in clear contrast to the first version of the fiction data set and both unfiltered English data sets. Our findings emphasize the need to fully characterize the dynamics of the Google Books corpus before using these data sets to draw broad conclusions about cultural and linguistic evolution.


Status: Published

Citations: 5




Joshua Bongard - Department of Computer Science, Associate Professor

Bongard's work focuses on understanding the general nature of cognition, regardless of whether it is found in humans, animals or robots. This unique approach focuses on the role that morphology and evolution play in cognition. Addressing these questions has taken him into the fields of biology, psychology, engineering and computer science.


  • Josh Bongard, Victor Zykov, Hod Lipson. Resilient Machines Through Continuous Self-Modeling. Science 314, 1118 (2006). [Journal Page]
  • Joey Anetsberger and Josh Bongard. Robots can ground crowd-proposed symbols by forming theories of group mind. Proceedings of the Artificial Life Conference 2016. [Link to Proceedings]
  • Sam Kriegman, Nick Cheney, and Josh Bongard. How morphological development can guide evolution. arXiv 2017. [arXiv]


Chris Danforth - Department of Mathematics and Statistics, Flint Professor of Mathematical, Natural, and Technical Sciences

Danforth is an applied mathematician interested in modeling a variety of physical, biological, and social phenomena. He has applied principles of chaos theory to improve weather forecasts as a member of the Mathematics and Climate Research Network, and developed a real-time remote sensor of global happiness using messages from Twitter: the Hedonometer. Danforth co-runs the Computational Story Lab with Peter Dodds, and helps run UVM's reading group on complexity.

  • Peter Sheridan Dodds, Kameron Decker Harris, Isabel M. Kloumann, Catherine A. Bliss, Christopher M. Danforth. Temporal Patterns of Happiness and Information in a Global Social Network: Hedonometrics and Twitter. PLoS ONE 2011. [Journal Page].
  • Lewis Mitchell, Morgan R. Frank, Kameron Decker Harris, Peter Sheridan Dodds, Christopher M. Danforth. The Geography of Happiness: Connecting Twitter Sentiment and Expression, Demographics, and Objective Characteristics of Place. PLoS ONE 2013. [Journal Page].
  • Andrew G Reece and Christopher M Danforth. Instagram photos reveal predictive markers of depression. EPJ Data Science 2017. [Journal Page].


Laurent Hébert-Dufresne - Assistant Professor, Computer Science

Laurent studies the interaction of structure and dynamics. His research involves network theory, statistical physics and nonlinear dynamics along with their applications in epidemiology, ecology, biology, and sociology. Recent projects include comparing complex networks of different nature, the coevolution of human behavior and infectious diseases, understanding the role of forest shape in determining stability of tropical forests, as well as the impact of echo chambers in political discussions.

  • Laurent Hébert-Dufresne, Adam F. A. Pellegrini, Uttam Bhat, Sidney Redner, Stephen W. Pacala, Andrew M. Berdahl. Edge fires drive the shape and stability of tropical forests. Ecology Letters 2018. [Journal Page]
  • Samuel V. Scarpino, Antoine Allard, Laurent Hébert-Dufresne. The effect of a prudent adaptive behaviour on disease transmission. Nature Physics 2016. [Journal Page]
  • Laurent Hébert-Dufresne, Joshua A. Grochow, Antoine Allard. Multi-scale structure and topological anomaly detection via a new network statistic: The onion decomposition. Scientific Reports 2016. [Journal Page]


Paul Hines - School of Engineering, Associate Professor

Hines' work broadly focuses on finding ways to make electric energy more reliable and more affordable, with less environmental impact. Particular topics of interest include understanding the mechanisms by which small problems in the power grid become large blackouts, identifying and mitigating the stresses caused by large amounts of electric vehicle charging, and quantifying the impact of high penetrations of wind/solar on electricity systems.

  • Paul D. H. Hines, Ian Dobson, Pooya Rezaei. Cascading Power Outages Propagate Locally in an Influence Graph That is Not the Actual Grid Topology. IEEE Transactions on Power Systems (Volume 32, Issue 2, March 2017). [Journal Page]
  • Mert Korkali, Jason G. Veneman, Brian F. Tivnan, James P. Bagrow & Paul D. H. Hines. Reducing Cascading Failure Risk by Increasing Infrastructure Network Interdependence. Scientific Reports volume 7, Article number: 44499 (2017). [Journal Page]
  • Pooya Rezaei, Paul D. H. Hines, Margaret J. Eppstein. Estimating Cascading Failure Risk With Random Chemistry. IEEE Transactions on Power Systems (Volume 30, Issue 5, Sept. 2015). [Journal Page]


James Bagrow - Assistant Professor, Department of Mathematics and Statistics

Bagrow's interests include: Complex Networks (community detection, social modeling and human dynamics, statistical phenomena, graph similarity and isomorphism), Statistical Physics (non-equilibrium methods, phase transitions, percolation, interacting particle systems, spin glasses), and Optimization (glassy techniques such as simulated/quantum annealing, (non-gradient) minimization of noisy objective functions).

  • Y.-Y. Ahn, J. P. Bagrow and S. Lehmann. Link communities reveal multiscale complexity in networks. Nature, 466: 761-764 (2010). [Journal Page].
  • M. R. Frank, J. R. Williams, L. Mitchell, J. P. Bagrow, P. S. Dodds, C. M. Danforth. Constructing a taxonomy of fine-grained human movement and activity motifs through social media. In preparation. (2015). [Journal Page].
  • J. P. Bagrow and L. Mitchell. The quoter model: a paradigmatic model of the social flow of written information. To appear, Chaos (2018). [Journal Page].