Student Spotlight: Racial Bias in the NYPD Stop-and-frisk Policy

Donald Trump recently came out in favor of an old New York Police Department (NYPD) “stop-and-frisk” policy that allowed police officers to stop, question, and frisk individuals for weapons or illegal items. The policy came under harsh criticism for racial profiling and was ruled unconstitutional by a federal judge in 2013.

An earlier post by Krit Petrachainan showed potential racial discrimination against African-Americans across different precincts. Expanding on this topic, we looked at data from 2014, one year after the policy was ordered to be reformed but before major official policy changes had taken place.

More specifically, this study examined whether race (Black or White) influenced the chance of being frisked after being stopped in NYC in 2014, after accounting for physical appearance, the racial composition of precinct populations, and the type of suspected crime.

2014 Data From NYPD and Study Results

For this study, we used the 2014 Stop, Question and Frisk dataset retrieved from the New York City Police Department. After data cleaning, the final dataset has 22,054 observations. To address our research question, we built a logistic regression model and ran a drop-in-deviance test to determine the importance of the Race variable in our model.

Our results suggest that, once a suspect is stopped, race does not significantly influence the chance of being frisked in NYC in 2014. A drop-in-deviance test comparing logistic regression models for the likelihood of being frisked, with and without the Race terms, gave a G-statistic of 8.99 and a corresponding p-value of 0.061. Because this p-value is only marginally significant, we do not have enough evidence to conclude that adding terms associated with Race improves the predictive power of the model.
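For readers who want to check the arithmetic, the sketch below shows how a drop-in-deviance G-statistic becomes a p-value. G is the deviance of the reduced model (without the Race terms) minus the deviance of the full model, and under the null hypothesis it is approximately chi-square distributed with degrees of freedom equal to the number of dropped terms. The post does not state the degrees of freedom, so df = 4 is an assumption here, chosen because it is consistent with the reported p-value. The chi-square tail uses an exact closed form that only holds for even df, so nothing beyond the standard library is needed.

```python
import math

def drop_in_deviance(deviance_reduced: float, deviance_full: float) -> float:
    """G statistic: how much the deviance drops when the extra terms are added."""
    return deviance_reduced - deviance_full

def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability P(X >= x), exact for positive even df.

    For df = 2k: P(X >= x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)**j / j!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2.0
    total = sum(half ** j / math.factorial(j) for j in range(df // 2))
    return math.exp(-half) * total

# Reported G-statistic from the post; df = 4 (number of Race-related terms)
# is an ASSUMPTION, not stated in the original study.
G = 8.99
p_value = chi2_sf_even_df(G, df=4)
print(f"G = {G}, p = {p_value:.3f}")  # G = 8.99, p = 0.061
```

With a statistics package, the deviances of the two fitted models would replace the hand-entered G.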

Figure 1. Logistic regression plot predicting probability of being frisked from precinct population Black, compared across race

To better visualize the interactions between race and other variables, we created logistic regression plots predicting the probability of being frisked from either the precinct’s Black population proportion (Black Pop) or age, as well as bar charts comparing the proportion of suspects frisked across sex and race.

Interestingly, once suspects are stopped, both Black and White suspects are more likely to be frisked as the proportion of Black residents in the precinct increases. This trend is more pronounced for Black suspects than for White suspects (Figure 1).
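A trend that is steeper for Black suspects is what an interaction between Race and Black Pop produces in a logistic model: on the log-odds scale, the slope for White suspects is B_POP and the slope for Black suspects is B_POP + B_INTERACT, so a positive interaction coefficient makes the Black curve rise faster. The sketch below uses made-up coefficients purely to show the shape; they are not estimates from the 2014 data, and the model form is an assumption about how such a plot could be produced.

```python
import math

# Made-up coefficients for illustration only; NOT estimates from the 2014 NYPD data.
B0 = -1.0         # intercept (White suspect, Black Pop = 0)
B_BLACK = 0.2     # shift in log-odds for Black suspects
B_POP = 1.0       # slope on Black Pop for White suspects
B_INTERACT = 1.5  # extra slope on Black Pop for Black suspects

def logistic(z: float) -> float:
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def p_frisk(black_pop: float, is_black: int) -> float:
    """P(frisked | stopped) under
    logit(p) = B0 + B_BLACK*Black + B_POP*BlackPop + B_INTERACT*Black*BlackPop."""
    z = B0 + B_BLACK * is_black + B_POP * black_pop + B_INTERACT * is_black * black_pop
    return logistic(z)

# Both curves rise with Black Pop, but the Black curve rises faster.
for pop in (0.0, 0.25, 0.5, 0.75):
    print(f"Black Pop = {pop:.2f}: White {p_frisk(pop, 0):.3f}, Black {p_frisk(pop, 1):.3f}")
```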

Additionally, young Black suspects are much more likely than their White counterparts to be frisked, given that they are stopped. This difference diminishes as suspect age increases (Figure 2).

Figure 2. Logistic regression plot predicting probability of being frisked from age, compared across race

Finally, male suspects are much more likely to be frisked than females, given they are stopped (Figure 3). However, the bar charts indicate that the effect of race on the probability of being frisked does not depend on gender.

Figure 3. Proportion frisked by race, compared across sex
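The comparison in Figure 3 comes down to grouped proportions: within each sex, the share of stopped Black suspects who were frisked versus the share of stopped White suspects. If the race gap is about the same for males and females, race and sex do not appear to interact (on the probability scale). The sketch below computes these proportions from a handful of invented records; the records and their (sex, race, frisked) layout are hypothetical and are not drawn from the NYPD dataset.

```python
from collections import defaultdict

def proportion_frisked(records):
    """Map (sex, race) -> share of stopped suspects who were frisked."""
    counts = defaultdict(lambda: [0, 0])  # (sex, race) -> [frisked, stopped]
    for sex, race, frisked in records:
        cell = counts[(sex, race)]
        cell[0] += int(frisked)
        cell[1] += 1
    return {key: f / n for key, (f, n) in counts.items()}

def race_gap(props, sex):
    """Black minus White proportion frisked within one sex."""
    return props[(sex, "Black")] - props[(sex, "White")]

# Invented toy records (sex, race, frisked?), NOT real stop data.
stops = [
    ("M", "Black", True), ("M", "Black", True), ("M", "Black", True), ("M", "Black", False),
    ("M", "White", True), ("M", "White", True), ("M", "White", False), ("M", "White", False),
    ("F", "Black", True), ("F", "Black", False), ("F", "Black", False), ("F", "Black", False),
    ("F", "White", False), ("F", "White", False), ("F", "White", False), ("F", "White", False),
]

props = proportion_frisked(stops)
for sex in ("M", "F"):
    shares = {race: props[(sex, race)] for race in ("Black", "White")}
    print(sex, shares, "gap:", race_gap(props, sex))
```

In this toy data males are frisked more often overall, yet the Black-minus-White gap is identical for both sexes, which is the "no dependence on gender" pattern.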

Is stop-and-frisk prone to racial bias?

Our results suggest that, once a suspect is stopped and other external factors are taken into account, race does not significantly influence the chance of being frisked in NYC in 2014. However, the relationships between race and the precinct’s Black population proportion, age, and sex suggest the possibility that NYPD “stop-and-frisk” practices are prone to racial bias, posing a threat to minority citizens in NYC. It is crucial that the NYPD continue to evaluate its “stop-and-frisk” policy and make appropriate changes to the policy and/or police officer training to prevent racial profiling at any level of investigation.

*** This study by Linh Pham, Takahiro Omura, and Han Trinh won 2nd place in the 2016 Undergraduate Statistics Class Project Competition (USCLAP).

Software Review: NVivo as a Teaching Tool

For the past few weeks, DASIL has been publishing a series of blog posts comparing this year’s two presidential candidates, Hillary Clinton and Donald Trump, using NVivo, a text analysis software package. Given the increasing demand for qualitative data analysis in academic research and teaching, this blog post discusses the strengths and weaknesses of NVivo as a tool for teaching qualitative analysis.

Efficiency and reliability

Using software like NVivo in content analysis can add rigor to qualitative research. Word searches or coding done in NVivo tend to produce more reliable results than manual work, since the software reduces human error. NVivo is also especially useful with large data sets: it would be extremely time-consuming to code hundreds of documents by hand with a highlighter pen.

Ease of use

NVivo is relatively simple to use. Users can import documents directly from word processing packages in various formats, including Word documents and PDFs, and code them easily on screen with a point-and-click interface. Teachers and students can quickly become proficient in using this software.

NVivo and social media

NVivo allows users to import tweets, Facebook posts, and YouTube comments and incorporate them into their data. Given the rise of social media and growing interest in studying its impact on society, this capability of NVivo may become more heavily used.

Segmenting and identifying patterns 

NVivo allows users to create clusters of nodes and organize their data into categories and themes, making it easy for researchers to identify patterns. At the same time, the use of word clouds and cluster analysis also provides insight into prevailing themes and topics across data sets.
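Word clouds of this kind are typically driven by a simple word-frequency count: the text is split into words, very common function words are dropped, and the remaining words are ranked by how often they appear. The sketch below illustrates that idea on an invented sentence; it is a generic approximation, not NVivo’s own word-frequency algorithm, and the tiny stop-word list is an assumption made for the example.

```python
import re
from collections import Counter

# A deliberately tiny stop-word list, for illustration only.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "we"}

def word_frequencies(text: str, top_n: int = 5):
    """Return the top_n most frequent non-stop-words as (word, count) pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS).most_common(top_n)

sample = "Data help students see patterns; students code the data, and data reveal themes."
print(word_frequencies(sample, top_n=2))  # [('data', 3), ('students', 2)]
```

A word cloud would then scale each word’s font size by its count.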

Limitations

While NVivo is a useful tool for getting a reliable, general picture of the data, it is important to be aware of its limitations. It may be tempting to limit the analysis to automatic word searches that yield a list of nodes and themes, but meaningful data analysis requires in-depth reading and critical thinking skills.

Although it is possible to search for particular words and their derivations, the many ways in which an idea can be expressed make it difficult to find every instance of a particular usage or idea. Manual searches, along with careful evaluation of automatic search results, help ensure that the data are, in fact, thoroughly examined.
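The gap between a word search and an idea search is easy to see with a simple prefix match. The sketch below approximates a “word and its derivations” search with a regular expression on a word stem; this is an assumption for illustration, and NVivo’s own stemming rules may differ. On the invented passage it finds “economy” and “Economic” but misses “prosperity”, a word expressing a closely related idea.

```python
import re

def find_derivations(text: str, stem: str) -> list[str]:
    """Return every word that begins with `stem`, ignoring case (a crude stemmed search)."""
    pattern = re.compile(rf"\b{re.escape(stem)}\w*", re.IGNORECASE)
    return pattern.findall(text)

passage = ("Our economy is growing. Economic growth helps families, "
           "and prosperity is reaching every town.")
print(find_derivations(passage, "econom"))  # ['economy', 'Economic']
```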

Once individual themes in a data set are found, NVivo doesn’t provide tools to map out how these themes relate to one another, making it difficult to visualize the interrelationships of nodes and topics across data sets. Users need to think critically about how these themes emerge and relate to each other to gain a deeper understanding of the data.

Meet Yujing Cao, DASIL’s new data scientist!

This year, DASIL welcomes a new member of our staff, Yujing Cao, who will be serving as the new data scientist. In her position at DASIL, Yujing will bring her expertise in data analysis and visualization to further expand DASIL’s capability to help students and faculty members integrate data analysis into research and classroom work.  In today’s big data era, enormous quantities of data are available, and Yujing will help Grinnell students and faculty explore them.

Yujing Cao is excited about joining DASIL and bringing a new level of data analysis to faculty research and teaching!

Originally from China, Yujing earned her bachelor’s degree in Statistics from Anhui University. Her passion for data science led her to the PhD program in Statistics at the University of Texas at Dallas, where she received her degree in 2016. Her research focused on graphical modeling of biological pathways in genomic studies. She is also interested in network analysis, machine learning, and experimenting with different data visualization tools. In her spare time, she enjoys reading, hiking, and exercising.

Yujing was excited about the position at Grinnell because of her strong interests in teaching and in data visualization. As she puts it:

“I was looking for a position that provides opportunities to create interesting data visualizations along with other data analysis work. I love using graphs to tell the stories behind different data sets.

“The working environment is another factor that led to my decision to come to Grinnell. I strongly resonate with the core values of a liberal arts education. At Grinnell College, I can work in an academic environment helping faculty and students while promoting the use of data in research and learning.”

Yujing also discussed a number of skills crucial to success in data science. As she explains, “Data science is an interdisciplinary field requiring knowledge from mathematics, statistics, data mining, and machine learning. Statistical knowledge and knowledge from other fields can help form good questions and set a direction, while programming skills (e.g., joining data sets and visualizing data) are needed to implement our ideas. To be a good data scientist, you should possess strong programming and analytical skills.”

According to Yujing, “One of the most important qualities for any data scientist is curiosity. Curiosity encourages us to dig in and make interesting discoveries about data. Also, good communication skills can make a great data scientist. You should be able to clearly articulate your results and the implications of your findings to others, including other data scientists and people who don’t share a similar background.”

Her advice for students interested in a career in data science is to keep an open mind, learn from different disciplines, and sharpen their programming skills. In addition, students interested in becoming data scientists should take advantage of any opportunity to work on hands-on projects that use real data.

Faculty or students interested in meeting with Yujing should drop by DASIL (ARH 130) or her office (Goodnow 103), or contact her via email for an appointment.

Data Across the Curriculum: The Explanatory Power of Data in Global Development & Geography

The field of geography is split into two camps: critical scholars, who are skeptical of data because they believe it silences certain voices within society and fails to explain process and context, and empirical scholars, who use data to build empirical models that explain geographic concepts and trends. Leif Brottem, Assistant Professor of Political Science with a PhD in Geography, is a firm believer in the importance of both approaches. In his view, data analysis can compensate for, and expand upon, the limits of text and qualitative evidence. His focus on data analysis as a tool that illustrates narrative is evident in each of his three classes: Introduction to Global Development Studies (GDS), Introduction to Geographical Analysis and Cartography, and Climate Change, Development, and Environment.

In Introduction to Global Development Studies, for instance, Brottem uses infographics and charts to explain basic concepts, and data tools such as Gapminder to illustrate change over time and regional differences in a variety of development indicators. His students also complete two data analysis exercises: the first asks them to study the relationship between economic and social development indicators, and the second has them explore aspects of population dynamics such as carrying capacity, limits to growth, and the determinants of population growth.

In Brottem’s Introduction to Geographical Analysis and Cartography course, students learn both critical perspectives for evaluating maps and understanding their overt and covert messages, and practical techniques for making maps using Geographic Information Systems (GIS) software. Students complete in-class exercises and take-home labs that require creating data and using data to solve problems.


Finally, in Climate Change, Development, and Environment, Brottem brings in data analysis through topic modeling: students investigate textual trends in sources ranging from tweets to scholarly articles using the MALLET topic modeling package. His students also use NVivo for further qualitative analysis and GIS to visualize spatial trends.

Working with data builds data literacy, a marketable and necessary real-world skill that, Brottem says, isn’t typically developed in a liberal arts setting. Building data literacy is especially important in his introductory classes, which include students who wouldn’t otherwise be exposed to data; he aims to make them comfortable using data and to reduce their fear of numbers and data analysis. Brottem strongly believes that data is a powerful explanatory tool that helps students think of different ways to look at the world and their studies, beyond theory.

Data Across the Curriculum: Teaching Data Skills in Sociology

Casey Oberlin, Assistant Professor of Sociology, understands the importance of using data in the classroom, especially in a discipline such as Sociology, which people outside the field often view as having little real-life application of hard skills (e.g., data analysis). This perception is far from the truth, and Oberlin’s approach to data in the classroom gives her students a holistic and interactive view of data analysis in the field, showing how data is part and parcel of the discipline.
Oberlin uses both her introductory Sociology courses and her Research Methods courses as opportunities for students to immerse themselves in the data-rich, multi-tiered research process of the field. Data in Sociology is very diverse, involving both quantitative and qualitative measures, so Oberlin’s approach focuses on exposing students to the vast array of data types, as well as the techniques, technologies, and methods used to interpret each type.


At the introductory level, Oberlin focuses on data consumption as a first step toward data concepts. Students study infographics (see Figure 1) and other data visualizations to learn how to present data and how to interpret the data being presented.

Oberlin’s Research Methods courses are reserved for her experiential approach to data. Over the semester, she teaches students two data software programs, one quantitative (SPSS) and the other qualitative (NVivo), shows them the wide range of data used in Sociology, and has them grapple with the entire research process for themselves. In Research Methods, students create research questions and hypotheses/expectations, clean or assess the dataset, analyze their results, and present their work in a professional manner. Her heavy guidance through the research process helps mitigate understandable anxiety about trying new techniques and presenting ongoing work, setting her students up to develop their own sustained research project throughout the semester. Oberlin says this immersive method benefits students and is enthusiastically received by them, as the research practice opens doors to internships, jobs, and graduate school.

All in all, Casey Oberlin’s use of data in the classroom gives students exposure to the intensive research process that is integral to Sociology and teaches important data skills and concepts that apply both in the real world and in the classroom.