The Good, the Bad, and the Ugly: Data Collection

This series of blog posts is based on the work of Katie Orsund, in association with previous student workers at DASIL. It has been edited and formatted by Katie Orsund, Georgia Rawhouser-Mylet, and Charun Upara.

In our second post featuring the Iowa Township Project, we focus on how our data were collected and managed. For context about Grinnell and the project in general, check out our first post, “Land, Census, and the Digital Humanities: The Iowa Township Project.” If you are wondering what historical claims we can support with this wealth of data, stay tuned for our third and final post featuring Katie Orsund’s research on women’s work in rural Iowa.

So what took us so long?

  1. Transcribing census data is a difficult process that requires materials only found in libraries.
  2. The desired data weren’t always available, so other data had to be found as substitutes.
  3. Multiple people working on the same project creates different sets of expectations.
  4. This project was worked on part-time by undergraduate students at a demanding college.
  5. All of the above.

The answer is truly all of the above. What may initially seem simple can take much longer than expected. Research questions don’t always match up to available data. In the case of the Iowa Township Project, someone decided long ago to pursue a somewhat inaccessible data set, sending us down the path we find ourselves on ten years later. In this blog post, we will retrace this ten-year path and introduce our methodology and the reality of data collection.

This Dilbert comic strip features a boss and his employee, Dilbert, in an office. The boss asks "What's taking you so long on the project?" Dilbert replies, "The application is unstable because the data model is driven by an overly complex relational database and there was no integration testing." His boss asks, "Does any of that mean the same thing as lazy?"
Figure 1: This comic strip demonstrates one of the challenges in data collection: staying motivated

The majority of our data come from three sources: the Federal Census, the Iowa State Census, and local plat maps.

The United States Census has a long history; it is the oldest continuous census in the world. You can find it written in the Constitution as a power of Congress. Since 1790, the census has been taken every ten years. The census, like the Constitution, is a living document, and it has undergone radical changes that reflect new technology and changes in economic, social, and political thought. The census is publicly available and can be found in libraries, often on microfiche, and through online genealogical sources. In order to preserve the privacy of living Americans, the Census Bureau waits 72 years before publishing all the information from a new census, although shortly after each census it does release certain data that do not reveal personal details about individuals. At the time this post was written, full census data from 1790 to 1940 were available to the public. For us, this posed no problem, as the Iowa Township Project uses census data from 1860 to 1930.

This political cartoon-style illustration depicts Uncle Sam measuring population using the 14th national census.

Beginning in 1930, the Census asked whether or not the household had a radio.

We looked no further than home for the first piece of the puzzle: in the basement of Burling Library is the mythical Microfilm Lab. Here are cabinets containing wonders such as newspapers, censuses, and other government documents. However, the excitement quickly wears off as reality slowly creeps in. You are in a dark room staring at a dim monitor trying to make out ancient cursive. The environment alone is enough to drive anyone crazy, which is why, after a few years of work in those conditions, those involved with the project looked for alternative sources for the census. They found one among the many genealogical sources available: Family Search. Family Search is a database maintained by the Church of Jesus Christ of Latter-day Saints that provides free scanned versions of various documents pertaining to genealogy, including the census. Beginning in 2016, we exclusively used these scans as the basis for our transcriptions, which proved quicker than searching microfiche for relevant information and caused far less eye strain.

A microfiche reader with a long beard, glasses, and a cane looks down at a piece of microfiche film. His speech bubble reads, "teach a man to microfiche,".
Figure 3: A cartoon featuring a microfiche-based pun

Unfortunately for us and other researchers, one census is missing almost completely. The 1890 Federal Census was destroyed almost entirely in a fire in the 1920s. To avoid a twenty-year gap in our data, we used the Iowa State Census, which was also taken every ten years, halfway between federal censuses. We used this census for 1885 and 1895 to fill in the gap caused by the fire. We also used other documents we came across, such as farmers’ directories and city directories, to try to fill in other gaps.

A woman sits in front of a computer. She checks her watch and says, "Oops, time for my daily five-minute existential crisis. What am I doing with my life?? What's the point?? What am I going to do when I graduate????? sigh. Back to work!"
Figure 4: A comic strip highlighting the frequent small existential crises many researchers experience

Over the years, we often questioned why we were cataloging all this data. What was it all for? Well, primarily for this project we are trying to create a snapshot of who was living in Grinnell and the other townships and what their lives might have been like. Secondarily, with the data available to the public, we hope to give others the opportunity to become more familiar with the census, their town, and potentially their ancestors.

Compared to the census, the plat maps are a more unusual source of data. They show areas divided into plots of land owned by individual families. No single source has a complete collection of plat maps, but compiling maps from county courthouses, the Library of Congress, and the University of Iowa gives us an image of what land ownership looked like in Richland, Rock Creek, and Grinnell over our chosen timeframe.

Black pen lines show the streets and blocks of Rock Creek. Land is labelled with the names of land owners. Numbers representing how much land each person owns are written over collections of squares.

Figure 5: A plat map of Rock Creek

Keeping all of this in mind, it took us a long time to come up with a “complete” data set.  The transcription of the census alone took over seven years to complete, in addition to tracking down the plat maps. What was supposed to be a summer project turned into a monster of a project that spanned years.

The most challenging part of working with the data was entering them manually. It required good eyesight and good cursive reading skills. It was also hard to estimate how long it would take. Did the census recorder write in legible dark ink, or did they repeatedly try to write with a half-dead pen? What could sometimes feel tedious was simultaneously very important, since accuracy was crucial for good data collection. However, we also must acknowledge the limitations of data collected by humans with limited resources. The census collector may have made mistakes that will be forever encoded in our data.

To all those who work with and clean data, and especially those who worked on this project, we salute you.

Image 7: An example of the census as it appeared to the students transcribing it

While transcribing the census, we tried our best to preserve the original format while also creating our own systems to make sure we could analyze the data. For example, we kept households together as units, whether they included one family or many individuals boarding together. Among the most important categories we used were the Male and Female Head of Household, their professions (recorded individually), where they were born, and where their parents were born. We also made most of the information available to the public so that future researchers can use our data set.
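For readers who think in code, the household-as-unit idea can be pictured with a minimal Python sketch. This is only an illustration of the description above, not the project's actual spreadsheet layout: every class name, field name, and value below is hypothetical, and the example people are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Person:
    # Hypothetical fields; the project's real column names may differ.
    name: str
    occupation: Optional[str] = None         # None when the census left it blank
    birthplace: Optional[str] = None
    father_birthplace: Optional[str] = None
    mother_birthplace: Optional[str] = None


@dataclass
class Household:
    # One record per household, whether a single family or boarders living together.
    census_year: int
    township: str
    male_head: Optional[Person] = None
    female_head: Optional[Person] = None
    other_members: List[Person] = field(default_factory=list)

    def size(self) -> int:
        """Count everyone recorded in the household, heads included."""
        heads = [p for p in (self.male_head, self.female_head) if p is not None]
        return len(heads) + len(self.other_members)


# Made-up example household, for illustration only.
example = Household(
    census_year=1880,
    township="Grinnell",
    male_head=Person("John Doe", occupation="Farmer", birthplace="Ohio"),
    female_head=Person("Jane Doe", birthplace="Iowa"),
    other_members=[Person("Sam Roe", occupation="Laborer")],
)
print(example.size())
```

Keeping both heads as separate, optional entries means a blank occupation is stored as missing rather than silently dropped, which matters when asking what the census did and did not record.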

Ostensibly to save space and time, occupation was only recorded for the head of household, who was almost always male. This meant women’s work was consistently underrepresented. According to the census, the vast majority of women did not work at all, which is simply false. Many worked alongside their husbands on farms or took in work such as laundry or sewing. The census would have us incorrectly believe that women’s roles were limited to those of wives and mothers. In our next and final post, we will explore this subject, applying digital humanities and data visualization techniques to our data in order to demonstrate the realities of women’s work in rural Iowa.
