Understanding Population Estimates Based Upon Stratified Random Samples

When a researcher is interested in examining distinct subgroups within a population, it is common to use a stratified random sample to better represent the entire population. This method involves dividing the population of interest into several smaller subgroups (called strata) based on specific variables of interest and then taking a simple random sample from each of these groups. Because a stratum's share of the sample may differ from its share of the population, weights are used to better estimate population parameters.
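To make the role of weights concrete, here is a minimal sketch of one common weighting scheme, in which each respondent in a stratum gets the weight (stratum population size) / (stratum sample size). The strata, population sizes, and responses are made up for illustration and do not come from any survey discussed here.

```python
# Hypothetical strata (not real survey data): each has a known population
# size and a small sample of yes/no responses coded as 1/0.
strata = {
    "A": {"population": 6000, "sample": [1, 1, 0, 1]},
    "B": {"population": 4000, "sample": [0, 0, 1, 0, 0, 0]},
}


def design_weight(population, sample_size):
    """Number of people in the population that one respondent stands for."""
    return population / sample_size


def weighted_proportion(strata):
    """Estimate the population proportion of 1s using stratum weights."""
    weighted_yes = 0.0
    weighted_total = 0.0
    for stratum in strata.values():
        sample = stratum["sample"]
        weight = design_weight(stratum["population"], len(sample))
        weighted_yes += weight * sum(sample)
        weighted_total += weight * len(sample)
    return weighted_yes / weighted_total


def unweighted_proportion(strata):
    """Pool every response as if the data were a simple random sample."""
    responses = [y for stratum in strata.values() for y in stratum["sample"]]
    return sum(responses) / len(responses)


print(round(weighted_proportion(strata), 3))
print(round(unweighted_proportion(strata), 3))
```

In this toy example stratum B makes up 60% of the sample but only 40% of the population, so pooling the responses without weights pulls the estimate toward stratum B.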

Many people fail to recognize that data from a stratified random sample should not be treated as a simple random sample (SRS), as Kathy Kamp, Professor of Anthropology, mentions in an earlier blog post. The following example explains why it is important to treat stratified random samples and SRS differently.

In 2010, CBS and the New York Times conducted a national phone survey (a stratified random sample) of 1,087 subjects as part of “a continuing series of monthly surveys that solicit[ed] public opinion on a range of political and social issues” (ICPSR 33183, 2012 March 15). In addition to political preference, they gathered information on race, sex, age, and region of residence.

The figure below demonstrates how population estimates vary depending on the use of weights. The unweighted graph overestimates the number of females in the Democratic Party (52% Democrat and 40% Republican). This leads to an overestimate of the number of Democrats in the nation. However, when weights are properly incorporated into the analysis, we see that the proportions are actually much closer (46% Democrat and 45% Republican).




As demonstrated above, there is a difference between the weighted and unweighted graphs and the resulting proportions. Specifically, the number and percent of Republican supporters increase when we take the weights into account. The weighted graph and proportions give a more accurate estimate of Political Preference by Sex in the population than the unweighted graph.
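For readers who want to see the mechanics behind such a comparison, the sketch below computes party shares within one sex, with and without weights. The respondents and weights are invented for illustration; they are not the CBS/New York Times data, and the function name `party_shares` is hypothetical.

```python
from collections import defaultdict

# Hypothetical respondents: (sex, party, survey weight).
respondents = [
    ("F", "Democrat", 0.5),
    ("F", "Democrat", 0.5),
    ("F", "Republican", 1.0),
    ("M", "Democrat", 1.0),
    ("M", "Republican", 2.0),
]


def party_shares(rows, sex, use_weights=True):
    """Return each party's share among respondents of the given sex."""
    totals = defaultdict(float)
    for row_sex, party, weight in rows:
        if row_sex == sex:
            # Without weights, every respondent counts exactly once.
            totals[party] += weight if use_weights else 1.0
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError(f"no respondents with sex {sex!r}")
    return {party: total / grand_total for party, total in totals.items()}


print(party_shares(respondents, "F", use_weights=False))
print(party_shares(respondents, "F", use_weights=True))
```

Here the two Democratic women each carry half the weight of the Republican woman, so the unweighted gap between the parties closes once the weights are applied.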

Try it on your own!

Through a summer MAP with Pam Fellers and Shonda Kuiper, we have created a Political Data app using this dataset. Follow this link to view the influence of weights on the population estimates for all the subgroups within this dataset. For example, select the X Axis Variable to be “Region” and the Y Axis Variable to be “Political Preference”. What do you notice about the weighted graph in comparison to the unweighted graph? You can also find datasets and several student lab activities giving details for proper estimation and testing for survey (weighted) data at this website.

Exploring Racial Disparities in New York City’s Stop-and-Frisk Policies


A comparison of the two maps above yields a surprising conclusion: African-Americans are much less likely to be arrested in areas with higher African-American populations!

One of the best examples of the use of statistics in policy research is in the controversy about New York City’s Stop-Question-and-Frisk policies, which give police officers the right to stop, search, or arrest any suspicious person with reasonable grounds for action. These policies were an effort to reduce crime rates, under the philosophy that stopping suspicious persons will prevent smaller crimes from escalating into more violent ones. In recent years, the NYPD has been under fire for alleged racial discrimination in its stops. Research on approximately 175,000 stops from January 1998 through March 1999, for example, showed that Blacks and Hispanics represented 51% and 33% of the stops, while representing only 26% and 24% of the New York City population, respectively. The NYPD defended its practices, saying that since crimes mostly happen in black neighborhoods, it is natural that more black people would be suspected of crimes.
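One simple way to summarize figures like these is a representation ratio: a group's share of stops divided by its share of the population, where a value above 1 means the group is stopped more often than its population share alone would suggest. The sketch below applies this to the percentages quoted above; the function name is hypothetical, and the ratio is an illustrative summary rather than the method used in the cited research.

```python
def representation_ratio(share_of_stops, share_of_population):
    """Share of stops divided by share of population (1.0 means parity)."""
    return share_of_stops / share_of_population


# Percentages quoted in the text for January 1998 through March 1999,
# stored as (share of stops, share of New York City population).
figures = {
    "Black": (0.51, 0.26),
    "Hispanic": (0.33, 0.24),
}

ratios = {
    group: representation_ratio(stops, population)
    for group, (stops, population) in figures.items()
}

for group, ratio in ratios.items():
    print(f"{group}: {ratio:.2f}")
```

A ratio like this describes the disparity but does not explain it, which is why the precinct-level comparison with population data matters.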

Using the stop-and-frisk dataset provided by the NYPD and 2010 census data, numbers were compiled into an interactive heat map of arrests directly related to the stop-and-frisk policy in New York City, as an aid to visualizing this disparity in race.

For each precinct, the visualization allows you to compare the racial make-up of the population with the proportion of arrests by race. For instance, in Precinct 104, less than 2% of the population is African-American, yet over 15% of the arrests were of African-Americans.



Evidence of racial disparity is clear. African-Americans are consistently overrepresented in arrests compared to the population in each precinct.

The exception to this trend was alluded to at the beginning of the post: in areas with high African-American populations, the disparity disappears and even reverses in a few precincts! Thus, African-Americans are much less likely to be arrested in neighborhoods with high African-American populations.

Use this visualization to explore the trends in arrests due to the stop-and-frisk policies in New York:

Another visualization on stops and arrests in New York City can be accessed here. You can also go here for more information on these data visualizations.  These visualizations were created as part of a Grinnell College Mentored Advanced Project with Ying Long and Zachary Segall under the direction of Shonda Kuiper.

Data Across the Curriculum: Helping the Local and the International with Consulting Research

Students in Monty Roper’s Anthropology and Global Development Studies classes gain practical experience in fieldwork, data analysis, and ways to deal effectively with clients when they act as consultants for both local organizations in Grinnell and internationally in an agricultural village in Costa Rica.  The clients they work with get free research which is presented to them both in the form of an oral consultation and in a written report.


From left: Roni Finkelstein ’15, Ellen Pinnette ’15, Liberty Britton ’14, Rosalie Curtain ’15, Emily Nucaro ’14, Ben Mothershead ’15, Zhaoyi Chen ’14, and M’tep Blount ’15, listen to Juan Carlos Bejarono explain the palm growing process.


For a Global Development Studies/Anthropology seminar, students prepare research plans during the first half of the semester and then travel to a rural agricultural community in Costa Rica to spend the two weeks of spring break collecting data which is then analyzed and written up during the remaining weeks of the semester.  The first year of the project, the class conducted an in-depth community development diagnostic.  Since then, they have investigated a variety of rural development issues, mainly focusing on tourism, women’s empowerment, and organizational issues and agricultural projects of the town’s two cooperatives.


From left: Chloe Griffin ’14 and Samanea Karrfalt ’14 present their research on “Professional Black Hair Care in Grinnell, IA”

From left: Irene Bruce ’15 and Matt Miller ’15 present their research and answer questions.


In Grinnell, Monty works with Susan Sanning, Director of Service and Social Innovation, to identify and explore possible collaborations with community partners who have research needs.  In the past, for example, Mid-Iowa Community Action (MICA) was interested in knowing why families dropped out of their Family Development and Self-Sufficiency Program (FaDSS) before their benefits were fully used, Drake Library was interested in what kinds of programming would best serve the town’s “tween” population, and a hair salon wanted to find out whether it was economically viable to invest in special hair care products and services for black customers.

Ideally, positive change occurs because of the class’s research. Grinnell students Dillon Fischer ’13 and Sarah Burnell ’13 interviewed graduates of Grinnell High School who had gone on to attend college about their preparedness for college academics. According to the GHS Principal, these findings led the school to revise its minimum writing standards, making them more challenging. The local after-school youth program, Galaxy, requested a study on donor perceptions and desires and subsequently used the results to write a successful grant proposal for support. This year’s class is planning to do more follow-ups on previous projects to ascertain longer-term results.

Data Across the Curriculum: Using Real Data in Classes


What does eating a grub have to do with interviewing children in Costa Rica?  What does studying Don Quixote or Shakespeare have to do with examining business transactions, consulting for an NGO or designing a visualization on terrorist incidents?

ANSWER:  They all describe ways that Grinnell students are engaging with real-world data.

Well-educated individuals should be able to create, evaluate, and analyze data, so that they can ultimately engage in the most well-informed decision-making.  They will then need to be able to effectively communicate patterns in the data to others, using statistics and/or visualization tools in addition to, of course, well-crafted words.

None of these analytic or communicative skills is easy to learn, and acquiring expertise demands both theory and the opportunity for practice. In an age of ubiquitous data (much of it of dubious quality) and numerous computer-assisted visualization and analytic tools (all of which can be both used and misused), the pitfalls are many, although the rewards are great. Employers love the data-savvy, but data analysis is an important part of decision-making in daily life as well!

Grinnell College classes in disciplines ranging from Anthropology to Spanish, History to Biology, Psychology to English, and Political Science to Statistics are engaging with real data in a variety of ways.  Some classes include data collection as well as analysis and display; others are more focused on evaluating data, interpreting it and communicating the results.

DASIL’s mission is to help faculty and students explore the world using data. We are embarking on a series of profiles designed to highlight some of the innovative ways the Grinnell faculty incorporate data in their classes.

See future posts for more details about how grubs, high school students, Don Quixote, and even baboons all fit into the picture.

Analyzing the American Political Sphere with “We the People”: Part 2

DASIL’s “We the People” data explorer allows users to search petitions based on subject: civil rights, economics, and defense, to name a few. My previous post about “We the People” briefly examined government responses to petitions but did not consider their subject. This analysis expands on that post by grouping together petitions with similar subjects into three broad categories: Government, Science, and Sociology. The Government category includes subjects like “Budget and Taxes” and “Defense”; Science includes “Technology and Telecommunication” and “Environment”; and Sociology includes “Disabilities,” “Education,” and “Poverty.” I only included petitions with over 5,000 signatures in my analysis to limit the number of results.
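The grouping and filtering described above can be sketched in a few lines. The subject-to-category mapping below uses only the subjects named in this post, so it is incomplete. The petition records, the "Other" bucket for unlisted subjects, and the function name are assumptions made for illustration, not part of the data explorer.

```python
from collections import Counter

# Subjects named in the post, mapped to the three broad categories.
CATEGORY_BY_SUBJECT = {
    "Budget and Taxes": "Government",
    "Defense": "Government",
    "Technology and Telecommunication": "Science",
    "Environment": "Science",
    "Disabilities": "Sociology",
    "Education": "Sociology",
    "Poverty": "Sociology",
}

MIN_SIGNATURES = 5000  # keep only petitions with more than this many

# Hypothetical petition records, not real "We the People" data.
petitions = [
    {"subject": "Defense", "signatures": 12000},
    {"subject": "Education", "signatures": 4000},
    {"subject": "Environment", "signatures": 7500},
    {"subject": "Poverty", "signatures": 5001},
    {"subject": "Budget and Taxes", "signatures": 100000},
]


def category_counts(petitions, min_signatures=MIN_SIGNATURES):
    """Count petitions per broad category above the signature threshold."""
    counts = Counter()
    for petition in petitions:
        if petition["signatures"] > min_signatures:
            # Subjects missing from the mapping fall into "Other" (assumption).
            category = CATEGORY_BY_SUBJECT.get(petition["subject"], "Other")
            counts[category] += 1
    return counts


print(category_counts(petitions))
```

Raising `min_signatures` to 99,999 would restrict the same count to petitions with 100,000 or more signatures, which makes it easy to compare the two thresholds.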




A frequency analysis for each category reveals an interesting trend when compared with the analysis of the petitions with 100,000+ signatures.