Testing Weighted Data

In previous posts we discussed the challenges of accounting for weights in stratified random samples. While the calculation of population estimates is relatively standard, there is no universally accepted norm for statistical inference for weighted data. However, some methods are more appropriate than others. We will focus on examining three different methods for analyzing weighted data and discuss which is most appropriate to use, given the information available.

Three common methods for testing stratified random samples (weighted data) are:

  • The Simple Random Sample (SRS) Method assumes that the sample is an unweighted sample that is representative of the population, and does not include adjustments based on the weights that are assigned to each entry in the data set. This is the basic chi-square test taught in most introductory statistics classes.
  • The Raw Weight (RW) Method multiplies each entry by its respective weight and runs the analysis on this adjusted weighted sample.
  • The Rao-Scott Method takes into account both sampling variability and variability among the assigned weights to adjust the chi-square from the RW method.
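
To make the difference between the first two methods concrete, here is a minimal Python sketch that computes the Pearson chi-square statistic twice on the same records: once counting each record as 1 (SRS) and once counting each record as its weight (RW). The records, region names, and weights are invented for illustration and are not taken from the CAM survey; the helper names `cross_tab` and `pearson_chi_square` are likewise hypothetical.

```python
from collections import defaultdict


def cross_tab(records, use_weights):
    """Build a region-by-usage table, adding 1 (SRS) or the weight (RW) per record."""
    table = defaultdict(lambda: defaultdict(float))
    for region, used_pt, weight in records:
        table[region][used_pt] += weight if use_weights else 1.0
    return table


def pearson_chi_square(table):
    """Pearson chi-square statistic for a dict-of-dicts contingency table."""
    rows = list(table)
    cols = sorted({c for r in rows for c in table[r]})
    row_tot = {r: sum(table[r].get(c, 0.0) for c in cols) for r in rows}
    col_tot = {c: sum(table[r].get(c, 0.0) for r in rows) for c in cols}
    total = sum(row_tot.values())
    stat = 0.0
    for r in rows:
        for c in cols:
            expected = row_tot[r] * col_tot[c] / total
            stat += (table[r].get(c, 0.0) - expected) ** 2 / expected
    return stat


# Hypothetical records: (region, used physical therapy, survey weight).
records = (
    [("North", True, 1500.0)] * 10 + [("North", False, 1500.0)] * 30
    + [("South", True, 4000.0)] * 20 + [("South", False, 4000.0)] * 20
)

srs_stat = pearson_chi_square(cross_tab(records, use_weights=False))
rw_stat = pearson_chi_square(cross_tab(records, use_weights=True))

print(f"SRS chi-square: {srs_stat:.2f}")
print(f"RW chi-square:  {rw_stat:.2f}")
```

With these made-up weights the RW statistic comes out thousands of times larger than the SRS statistic, even though both are computed from the same 80 records.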

One example of a data set which incorporates a weight variable is the Complementary and Alternative Medicine (CAM) Survey, which was conducted by the National Center for Health Statistics (NCHS) in 2012. For the CAM survey, NCHS researchers gathered information on numerous variables such as race, sex, region, employment, marital status, and whether each individual surveyed used various types of CAM. In this dataset, weights were assigned based on race, sex, and age.

Among African Americans who used CAM for wellness, we conducted a chi-square test to determine whether there was a significant difference in the proportion of physical therapy users in each region. Below is a table comparing the test statistics and p-values for each of the three statistical tests:


The SRS method assumes that we are analyzing data collected from a simple random sample instead of a stratified random sample. Since the proportions in our sample do not represent the population, this method is inappropriate. The RW method multiplies each entry by its weight, giving a somewhat more representative sample. While this method is useful for producing population estimates, multiplying by the weights inflates the apparent sample size, so it tends to give p-values that are much too small. Thus, both the SRS and RW methods are inappropriate for testing this data set. The Rao-Scott method adjusts for the non-SRS sample design as well as accounting for the weights, resulting in a test that better reflects the population.
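
The sketch below illustrates why raw-weighted p-values shrink and what a design-based correction is trying to undo. It is not the Rao-Scott procedure itself: survey software estimates design effects from the full sampling design, whereas this example swaps in Kish's approximate design effect for unequal weights as a crude stand-in. The weighted cell totals and the per-record weights are hypothetical, and the p-value helper only handles 1 degree of freedom (a 2x2 table).

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_df1(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical sample: 40 records with weight 1500 and 40 with weight 4000.
weights = [1500.0] * 40 + [4000.0] * 40
n = len(weights)

# Hypothetical weighted cell totals (rows: region, columns: used PT yes/no).
cells = [10 * 1500.0, 30 * 1500.0, 20 * 4000.0, 20 * 4000.0]

# 1. Raw weights: the statistic scales linearly with the size of the weights.
raw_stat = chi_square_2x2(*cells)
doubled_stat = chi_square_2x2(*[2.0 * x for x in cells])

# 2. Normalize the weights so the table sums to the actual sample size n.
scale = n / sum(weights)
normalized_stat = chi_square_2x2(*[scale * x for x in cells])

# 3. Kish's approximation (an assumption here, not Rao-Scott's estimator):
#    effective sample size = (sum w)^2 / sum w^2, design effect = n / n_eff.
n_eff = sum(weights) ** 2 / sum(w * w for w in weights)
deff = n / n_eff
adjusted_stat = normalized_stat / deff

for label, stat in [("raw weights", raw_stat),
                    ("normalized weights", normalized_stat),
                    ("design-effect adjusted", adjusted_stat)]:
    print(f"{label:>24}: chi-square = {stat:10.3f}, p = {p_value_df1(stat):.4g}")
```

Doubling every weight doubles the raw statistic without adding any information, which is why the RW p-value cannot be trusted. Dividing by a design effect pulls the statistic back toward what the actual number of respondents can support.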

Try it on your own!

Through a summer MAP with Pam Fellers and Shonda Kuiper, we created a CAM Data shiny app. Go to this app and compare how population estimates and test statistics can change depending on the statistical method that is used. For example, select the X Axis Variable to be Sex and the Color By variable to be Surgery. Examine the chi-square values from each of the three types of tests. Which test gives the most extreme p-value? The least extreme? You can also find multiple datasets and student lab activities giving details on how to properly analyze weighted data here.

Data Across the Curriculum: Using Qualitative Data Analysis in Teaching Spanish

When Spanish Professor Pérez incorporates NVivo, a qualitative research tool, into her teaching of Spanish, she sees it as a way to prepare her students for their future careers. Based on the trajectory of the field, she believes that “the digital humanities are here to stay.” While she realizes that not every student who studies Spanish plans on a career in academia or as a Spanish teacher, she hopes that working with digital technology will prepare her students to adapt to a variety of digital research tools in a wide range of fields.

After learning about NVivo, Professor Pérez decided to try using the program in her own research on festival books. Her initial project included only a small number of texts; however, with NVivo’s capacity for large-scale comparison between digital texts, her project has expanded to include around 700 texts.

Once she was familiar with NVivo, Professor Pérez decided to include a short assignment using the program in her Spanish seminar focused on Miguel de Cervantes’ classic novel, Don Quijote.


Sara Sanders ’14, the 2014-15 DASIL Post-Baccalaureate Fellow, gave an introductory workshop in the class, and Professor Pérez assigned three chapters of the Quijote to each small group of students to analyze digitally. Students then produced reports that included their analytical findings and reflections on NVivo’s usefulness.

So far, Professor Pérez has noted differences in how students respond to NVivo: the majority of her science-major students critiqued the program, wishing that it included detailed quantitative analysis, while humanities majors were usually complimentary. Eventually, she hopes to share further observations about the connection between digital technology and pedagogy at conferences and in a published article. As one of the first professors in Grinnell’s Spanish department to utilize digital analysis in her classes, she also hopes that her experiences with the developing field of digital humanities will facilitate other professors’ explorations of new technologies.


This past summer, Professor Pérez received a Steven Elkes Grant to develop the use of technology in a new course. With the help of her research assistant, Alex Claycomb ’18, she is in the process of designing a course entitled “Designing Empire: Plazas, Power and Urban Planning in Habsburg Spain and its Colonies,” which integrates two new NVivo assignments as well as work with GIS and mapping.

Data Across the Curriculum: Integrating Data Analysis with Narrative in Political Science

From a pedagogical standpoint, Danielle Lussier, Assistant Professor of Political Science, stresses data as a tool for helping students approach problems from multiple perspectives. Working interactively with data allows them to better compare narratives and better understand the research process in both lower-level and upper-level material.

Political science is both a quantitative and a qualitative field, so students at all levels of Lussier’s political science classes work extensively with both types of data and build data-analytic skills as they progress through the major. Every class taught by Lussier involves data labs that draw both on cross-national data, with countries as the unit of measure, and on data with individuals as the unit of measure. The labs directly relate to readings, concepts, and/or countries that students study.

At the 100-level, students are introduced both to fundamental data concepts, such as the construction and measurement of variables, and to analytical computer programs like STATA, a statistical package, and ArcGIS, which analyzes spatial data. The image below is of a GIS map her introductory political science students make in a data lab.


At the 200-level, Lussier’s students delve into applied data analysis and write in-depth data reports that compare data analyses from the course readings to data analyses that students reconstruct and update from the readings.

At the 300-level, students get the opportunity to pose questions about class readings and use lab time to test their inquiries with actual data from the readings. In addition, Lussier assigns students research modules that allow them to create their own qualitative variables from cross-national data that they then transform into quantitative data, giving students the opportunity to apply the data skills they’ve accumulated in each course level.

The positive impact of incorporating data into classroom work is not lost on students. Students at all levels of her courses are widely receptive to data in coursework and have viewed working with data in her classes as an integral stepping stone to both academic and professional pursuits. Adam Lauretig ’13, the first Post-Baccalaureate Fellow for DASIL, was inspired by Lussier’s data-driven coursework to pursue more advanced courses in spatial statistics, and subsequently created visualizations like the interactive timeline map of historical coups d’état. Additionally, many of her students have cited the research and data skills developed in her coursework as marketable to employers and graduate programs.