Data Across the Curriculum: Qualitative Literary Analysis in the Humanities

This semester, students in Professors James Lee and Erik Simpson’s special-topic seminar, “Milton, Blake, and Frankenstein,” will use NVivo, a software package for qualitative analysis, to create word trees visualizing the use of the word “sublime” in Milton’s work. This is an outgrowth of teaching that Lee began in 2013 when, as DASIL’s first Faculty Fellow, he designed a seminar on Shakespeare and Renaissance literature that used NVivo to investigate first the corpus of Shakespeare’s work and then over 20,000 documents from the Early English Books Online database (you can see previous DASIL blog posts written by Prof. Lee about this research here and here).

Lee’s classes use NVivo to visualize data and to generate descriptive statistics that can be exported to other programs, such as Tableau (another data-visualization package). These programs are particularly useful to students because they provide a graphical user interface, allowing students to manipulate data easily without having to learn a programming language.

Lee’s personal research incorporates some of the same methodologies that he teaches to his students in class. His current project, the Global Renaissance Project, is partially funded by a “Digital Bridges for Humanistic Inquiry” grant from the Mellon Foundation. It uses network analysis and topic modeling to examine discourses surrounding race in Renaissance texts. The figure below is a still of the prototype for the topic modeling aspect of the project, which identifies clusters of words with a disproportionately high probability of occurring together in text.

[Screenshot: prototype of the Global Renaissance Project’s topic modeling interface]
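To make the underlying idea concrete, here is a minimal sketch of how one can measure whether two words co-occur with disproportionately high probability, using pointwise mutual information (PMI). The five-document corpus and all word choices are invented for illustration; this is not the Global Renaissance Project’s actual model, which uses full topic modeling rather than pairwise PMI.

```python
from collections import Counter
from itertools import combinations
import math

# Toy corpus standing in for a collection of Renaissance texts (invented).
docs = [
    "trade merchant ship spice india",
    "merchant trade gold commerce port",
    "ship port commerce spice trade",
    "faith scripture prayer divine grace",
    "prayer divine scripture faith heaven",
]

word_counts = Counter()   # in how many documents each word appears
pair_counts = Counter()   # in how many documents each word pair co-occurs
for doc in docs:
    words = set(doc.split())
    word_counts.update(words)
    pair_counts.update(frozenset(p) for p in combinations(sorted(words), 2))

n_docs = len(docs)

def pmi(w1, w2):
    """How much more often w1 and w2 appear in the same document
    than independence would predict (log scale; > 0 means 'clustered')."""
    p1 = word_counts[w1] / n_docs
    p2 = word_counts[w2] / n_docs
    p12 = pair_counts[frozenset((w1, w2))] / n_docs
    return math.log(p12 / (p1 * p2)) if p12 > 0 else float("-inf")

# Words from the same thematic cluster score high; cross-cluster pairs do not.
print(pmi("trade", "merchant"))  # positive: the "commerce" cluster
print(pmi("trade", "prayer"))    # -inf: never co-occur in this toy corpus
```

A topic model generalizes this intuition from word pairs to whole probability distributions over the vocabulary, but the core signal is the same: words that turn up together far more often than chance.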

So far, the project has revealed that Renaissance representations of race were centered on cultural, geographic, and commercial factors; race as a biological or physical concept emerged as a justification for English imperialism after the Renaissance. Lee currently collaborates with professors at the University of Iowa on a “linked reading” project that combines two databases to discover how networks of printers and publishing houses contributed to the Renaissance discourse on race.

For Lee, the biggest challenge presented by the integration of digital analysis into classes is changing his students’ mindsets. He observes that humanities students are often used to classes in which students and professors develop ideas through discussion. In contrast to discussion-based classes, working with the digital humanities can mean that students exert effort in a particular line of inquiry that doesn’t yield any concrete results.

The iterative process of data analysis can be frustrating, especially since students often don’t anticipate undertaking it in their humanities courses. Lee hopes that continuing to integrate the digital humanities into classes like the seminar he is co-teaching this semester will help to convert students’ frustration into a “tinkering mentality,” so that students come prepared to continually adjust their hypotheses based on their analysis and visualization of the data.

To explore the Global Renaissance Project prototype, click here.

The Mass Shooting Epidemic in the United States

An examination of Stanford University’s Mass Shootings in America (MSA) dataset shows why shootings have been making headlines in the U.S. and why gun violence has become a major issue in the campaigns of presidential hopefuls. The Stanford MSA defines a mass shooting as “3 or more shooting victims (not necessarily fatalities), not including the shooter. The shooting must not be identifiably gang or drug related” (Stanford Mass Shootings in America, courtesy of the Stanford Geospatial Center and Stanford Libraries).


The dramatic change in the number of mass shootings in the past two years is readily apparent. There were 121 mass shooting events from 1966 to 2009, but 116 in the past 5 years alone; 2015 by itself saw 65 separate mass shootings. In terms of total fatalities, the bands for the past 7 years are noticeably thicker than those for earlier years. Even some years with few mass shootings produced many deaths: 1991, for example, had only 5 incidents but 47 fatalities.
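Summary counts like these are straightforward to compute from an incident-level dataset. The sketch below uses a tiny invented sample in the rough shape of the Stanford MSA data; the field names and numbers are assumptions for illustration, not the real records.

```python
from collections import Counter

# Invented sample rows shaped like the Stanford MSA data
# (the real dataset has hundreds of incidents; these are not real figures).
incidents = [
    {"year": 1991, "fatalities": 10},
    {"year": 1991, "fatalities": 24},
    {"year": 2015, "fatalities": 3},
    {"year": 2015, "fatalities": 9},
    {"year": 2015, "fatalities": 14},
]

# Incidents per year: just count the rows for each year.
shootings_per_year = Counter(i["year"] for i in incidents)

# Fatalities per year: sum the fatality field within each year.
deaths_per_year = Counter()
for i in incidents:
    deaths_per_year[i["year"]] += i["fatalities"]

print(shootings_per_year[2015])  # 3 incidents in this sample
print(deaths_per_year[1991])     # 34 fatalities in this sample
```

The same two aggregations, run over the full dataset, produce the per-year incident and fatality totals discussed above.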



The Southern states had the largest numbers of mass shootings in 2015. Florida led with 6. Even though Texas had fewer mass shootings (4), the state sustained the most fatalities, 20. North Dakota and New Hampshire are the only 2 states that have not experienced any mass shootings in the 49-year period covered by the data (not shown). In 2015, 39 mass shootings occurred in residential homes and neighborhoods, while 21 happened in public places. Back in the late 90s, schools were the primary target of mass shooters, with 3 incidents each in 1997 and 1999.


Most of the mass shootings in the past 8 years have stemmed from some form of altercation, be it domestic, legal, financial, or school-related. It can always be argued that all mass shooters have mental health issues, but contrary to popular belief, under these classifications mental health issues as a direct motive for shootings have not become more common in recent years, with only one incident in 2015 attributed to them. Perhaps most troubling is the high number of cases in which no motive could be identified (23 in 2015), suggesting the need for further, more comprehensive study of the underlying causes of these mass shootings.

Many pundits largely attribute this US-specific phenomenon to lax gun policy. However, any progress toward changing gun laws, or even toward funding research into the causes of gun violence, has been (and continues to be) stymied by the gun lobby, led by the National Rifle Association (NRA). Re-examining the nation’s access to guns is imperative, and those in Congress who are funded by the gun lobby need to be open to that re-examination. While the available data are informative, unfettered research is integral to truly understanding the nature of gun violence and to finding effective policy solutions.

Data Across the Curriculum: The Explanatory Power of Data in Global Development & Geography

The field of geography is split into two camps: critical scholars, who are skeptical of data because they believe it silences certain voices within society and fails to explain process and context, and empirical scholars, who use data to build empirical models that explain geographic concepts and trends. Leif Brottem, Assistant Professor of Political Science with a PhD in Geography, is a firm believer in the importance of both approaches: data analysis can compensate for and expand upon the limits of text and qualitative evidence. His focus on data analysis as a tool that illustrates narrative is evident in each of his three classes: Introduction to Global Development Studies (GDS), Introduction to Geographical Analysis and Cartography, and Climate Change, Development, and Environment.

In Introduction to Global Development Studies, for instance, Brottem uses infographics and charts to explain basic concepts, and tools such as GapMinder to illustrate change over time and regional differences across a variety of development indicators. His students also complete two data analysis exercises as part of the class: one asks students to study the relationship between economic and social development indicators, and the second has them explore different aspects of population dynamics, such as carrying capacity, limits to growth, and the determinants of population growth.

In Brottem’s Introduction to Geographical Analysis and Cartography course, students learn both basic critical perspectives on how to evaluate maps and understand their overt and covert messages, and practical techniques for making maps with Geographical Information Systems (GIS) software. Students complete in-class exercises and take-home labs that require creating data and using data to solve problems.


Finally, in Climate Change, Development, and Environment, Brottem employs data analysis in the form of topic modeling: students investigate textual trends in sources ranging from tweets to scholarly articles using the MALLET topic modeling package. His students also work with NVivo to conduct further qualitative analysis and with GIS to visualize spatial trends.

Working with data builds data literacy, a marketable and necessary real-world skill that Brottem says isn’t typically developed in a liberal arts setting. Building data literacy is especially important in his introductory classes, which reach students who wouldn’t otherwise be exposed to data; he aims to get them comfortable with using data and to reduce their fears of numbers and data analysis. Brottem strongly believes that data is a powerful explanatory tool that helps students think of different ways to look at the world and their studies, beyond theory.

Journalists and Maps: The Need for Teaching Quantitative Literacy to Everyone


In recent years, programs like ArcGIS and Tableau have made it very easy to produce maps. Journalists have responded by richly illustrating their articles with quantitative data displayed as maps. Maps are both attractive and easier to explore visually than the same data provided in tabular form, so in many ways they are ideal illustrations. For the average reader, information transmitted as quantitative data appears authoritative, and these maps are no exception; on the surface they seem real and informative. Unfortunately, just as with any data-driven information, maps can inadvertently mislead.

In a recent example, NBC News illustrated an article about the Supreme Court’s consideration of a Texas law (one that, if similar laws were enacted or enforced more broadly, would force the closure of a high percentage of existing abortion clinics across the country) with a map of the U.S. showing the number of abortions per state in 2012, using data from the CDC (Centers for Disease Control and Prevention).

A quick perusal of this map seems to show why Texas is so concerned about abortion: after all, it is one of the states with the most abortions. After a moment’s examination, however, the viewer might (or might not) note that the states with the highest populations also seem to have the largest numbers of abortions.


Thus, this map really tells us little about which states have the highest incidence of abortion. Some kind of standardization by population is needed. One option would be to use raw population size. But since the number of women who might potentially become pregnant and secure an abortion (usually defined as women between 15 and 44) does not necessarily vary by state in direct proportion to total population, that count is a better denominator than simple population size. In terms of the number of abortions per 1,000 women ages 15-44, Florida and New York have high rates, but Texas no longer looks unusual.
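The standardization itself is simple arithmetic. The sketch below uses deliberately round, invented figures (the CDC publishes the real counts) to show why raw totals mislead: a populous state can report fifty times as many abortions as a small one and still have an identical rate per 1,000 women ages 15-44.

```python
# Invented, deliberately round figures for two hypothetical states;
# the CDC publishes the real counts.
abortions   = {"BigState": 60_000, "SmallState": 1_200}
women_15_44 = {"BigState": 5_000_000, "SmallState": 100_000}

def rate_per_1000(state):
    """Abortions per 1,000 women ages 15-44: the standardized rate."""
    return 1000 * abortions[state] / women_15_44[state]

# Fifty times as many abortions in absolute terms...
print(abortions["BigState"] // abortions["SmallState"])  # 50

# ...but once standardized, the two states look identical.
print(rate_per_1000("BigState"))    # 12.0
print(rate_per_1000("SmallState"))  # 12.0
```

A raw-count map would shade BigState far darker than SmallState even though, by the standardized measure, the two are indistinguishable.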


But this still might not be the most revealing measure, since birth rates vary from state to state. In 2012 the average U.S. birth rate was 1.88, but Texas’s was 2.08; the highest among the 50 states was Utah’s 2.37 and the lowest Rhode Island’s 1.59. An option that takes these differences into account is the ratio between the number of abortions and the number of births. Using this measure, New York remains very high, but Texas, due to its relatively high birth rate, drops even lower.

[Map: Abortion Ratio by state]
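The same arithmetic applies to this measure, which the CDC reports as the number of abortions per 1,000 live births. With invented round figures, the sketch below shows how two states with identical abortion counts can rank very differently once births are taken into account.

```python
# Invented, round figures for two hypothetical states;
# the CDC reports the real counts.
births    = {"StateA": 100_000, "StateB": 200_000}
abortions = {"StateA": 30_000, "StateB": 30_000}

def abortion_ratio(state):
    """Abortions per 1,000 live births, the CDC's abortion ratio."""
    return 1000 * abortions[state] / births[state]

# Equal abortion counts, but the state with more births ranks far lower.
print(abortion_ratio("StateA"))  # 300.0
print(abortion_ratio("StateB"))  # 150.0
```

This is the effect described above: Texas reports a sizeable number of abortions, but its relatively high birth rate pulls its ratio down.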

The CDC provides all of these statistics, yet the journalist chose the least revealing of the possible measures to display. Journalists are generally well-educated and, we assume, well-meaning. Why not pick a better measure to map when it would have been equally easy to do so? I suspect that the answer lies firmly in the laps of educators like myself. While we prioritize skills like writing and speaking well, we do not mandate that all students graduate statistically or even quantitatively literate, but we should.