In our December 27th post “On On the Origin of Species: An ode to science writers”, Clara Boothby explored how clear, compelling science writing can increase circulation of scientists’ ideas among the general public. While our previous post saw the Origin of Species as a model for scientific writing, here we explore how researchers at IU are seeking to understand the formation of groundbreaking ideas, such as those seen in Darwin’s Origin, through the use of a new analytical method called ‘topic modeling.’ Topic modeling uses statistical models to identify topics shared across a collection of documents, based on patterns in which words tend to appear together.
New ideas in science inevitably stem from past ideas, and Darwin’s were no exception. He had contemporaries working on evolution, such as Alfred Russel Wallace. He also drew on past knowledge from fields like animal husbandry, where selection for certain heritable features was already applied to such crafts as pigeon breeding. That is, it was understood that breeding birds with specific physical characteristics increased the likelihood that their offspring would carry such traits. What was unknown at the time was the extent to which such selection occurred without human intervention. Darwin was also influenced by prominent theories in human ecology, such as political economist Thomas Malthus’ writings on the relationship between population growth and famine.
For many young researchers, the overarching question is: “How did Darwin transition from consuming massive quantities of literature to generating novel ideas?” Thankfully, he left clues. Darwin kept logs of his readings all the way up until he published his first edition of On the Origin of Species. These logs are so detailed that they mark the very day he began writing his now-famous evolutionary biology manuscript.
By tracing his readings from the moment he began his log to the time he wrote On the Origin of Species, researchers in the Cognitive Science Program at Indiana University looked for changes in Darwin’s reading behavior that indicate movement from an information-gathering (exploitation) period to a period of synthesis (exploration). During an exploitation period, for example, one might read twelve books on the American Civil War. During an exploration period, however, readings might range widely, covering the industrial revolution, biodiversity in the American South, and the history of human surgery. Using text analysis algorithms, Jaimie Murdock, Colin Allen, and Simon DeDeo have modeled the topics on which Darwin read prior to publishing On the Origin of Species. Their model suggests that Darwin’s reading habits switched from an exploitation pattern to an exploration pattern right as he was beginning to write. In other words, Darwin’s amassed knowledge became so great that it tipped the scale over into expertise. As an expert, Darwin was able to synthesize observations about the world with his knowledge to offer a new theory of evolution.
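For readers curious how a shift like this might be measured, here is a minimal Python sketch. It is an illustration only, not the authors' code: it assumes each book has already been summarized as a mixture over a handful of topics (the numbers below are made up), and it uses Kullback-Leibler divergence, one common way to compare two probability distributions, to score how far each book departs from the one read just before it. The function names `kl_divergence` and `reading_surprise` are hypothetical.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q), in bits, for two discrete distributions.

    Assumes q is nonzero wherever p is nonzero.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def reading_surprise(topic_mixtures):
    """Score each book by how far its topic mixture is from the previous book's."""
    return [
        kl_divergence(current, previous)
        for previous, current in zip(topic_mixtures, topic_mixtures[1:])
    ]


# Hypothetical topic mixtures for four books, in reading order.
# Each row gives the share of three made-up topics and sums to 1.
readings = [
    [0.80, 0.10, 0.10],  # mostly topic A
    [0.75, 0.15, 0.10],  # still mostly topic A (staying put)
    [0.10, 0.80, 0.10],  # jump to topic B
    [0.10, 0.10, 0.80],  # jump to topic C
]

surprise = reading_surprise(readings)
for step, score in enumerate(surprise, start=2):
    print(f"book {step}: {score:.3f} bits away from the previous book")
```

In this toy example, the second book scores low because it stays close to the first (an exploitation-like step), while the third and fourth score much higher because they move to new topics (exploration-like steps). A real analysis would first need a topic model fitted to the full texts of the books.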
What does this mean for modern students? Darwin’s case emphasizes that there are benefits to reading all kinds of literature, as certain themes (e.g. magnetism, death, reproduction) have unifying features across genres. It may be that honing the ability to extract similar knowledge across genres is what facilitates the “A-HA!” moment of genius. If that’s the case, choose your books wisely, and keep reading!
From a letter written by Charles Darwin to E.B. Aveling, October 13th, 1880.
Murdock, J., Allen, C., & DeDeo, S. (2017). Exploration and Exploitation of Victorian Science in Darwin’s Reading Notebooks. Cognition, 159, 117-126.
Edited by Mark Juers and Elizabeth Rosdeitcher.