Does Marriage Affect Earning Potential?

Using DASIL’s United States Income Data by Marital Status, Race, and Sex visualization, one can see how the effect of marriage on a person’s earnings is multifaceted in nature: it depends on who we focus on and other factors at play. However, there are general trends that do prevail.


Married people overall have higher earnings, although the gap between married and divorced people is smaller than the gap between married and single people. Married people with a spouse present earned over $33K annually, while single people earned on average well over $10,000 less. While it may appear that being single correlates with lower earnings, interrelated variables may explain some of the earnings discrepancies observed.


One important variable to consider is the effect of age. As we discuss in another blogpost, workers ages 15-24 earn less than those in other age brackets. Studies suggest that people in the 15-24 age bracket are less likely to be married, so some of the earnings trends shown may not be strictly due to marriage. In addition, as illustrated in the aforementioned blogpost, 25-34 year-olds and those 65 and older earn about the same and form the next-lowest-earning age group (about $25,000 more in 2013 dollars), while 35-64 year-olds make about $20,000 more on average. The 35-64 year-olds are more likely to be established in their careers, and their highest-earning years fall within this age bracket. So, some earnings trends may be attributed to the pace of a career’s trajectory.

Breaking down by gender, the general trend persists: among men of all races, married men make far more than divorced and single men, at $44K, $33K, and $20K respectively. Married women have been making more than single men in recent years, averaging about $2K more in 2006, a gap that persisted into 2010. While single women made more than married women in the 1980s, the trend has reversed in recent years.



Breaking down by race, single Asian men and women make more than singles of any other demographic, both averaging about $21K in 2010. Single Hispanic women make the least of all demographics of men and women, at $15.1K, although single Black men are a close second. Earnings of single Black men peaked in 1998, when they trailed those of white men by only about $200. Studies attribute this peak to the economic boom of the 1990s and the transition of Black men into higher-skilled service-industry jobs.



Married Hispanic women still make less than all other married women, at $19.1K, but substantially more than single Hispanic women. Married Black women earn the most among married women, at $26.6K, with their earnings moving more or less in step with those of married Asian women.

Investigating Police Brutality in Los Angeles

Excessive use of force by law enforcement is by no means a novel phenomenon in the United States. However, with high-profile cases like those of Michael Brown, Eric Garner, and most recently Greg Gunn fueling national movements such as #BlackLivesMatter, race-related incidents of police brutality are receiving worldwide media attention.

I investigated geographic trends in reported police brutality, using Los Angeles County at the census tract level and data from The Guardian’s project “The Counted,” a comprehensive dataset that records all people killed by police and other law enforcement agencies in the US, for the year 2015.

To measure the effect of location on incidents of police brutality, I conducted a hot spot analysis, which identifies statistically significant spatial clusters of high police brutality (hot spots) and low police brutality (cold spots). Essentially, the hot spots and cold spots indicate whether the observed spatial clustering of police brutality events is more pronounced than it would be if the values were randomly distributed. I specified the spatial relationship for the analysis as Contiguity Edges, meaning that census tracts that share a boundary or overlap with a census tract that contains a police brutality event are weighted more heavily in the analysis than those that don’t.
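As a rough illustration of how such an analysis scores each tract, the sketch below computes a Getis-Ord Gi* z-score, a statistic commonly used for hot spot analysis. The post does not name the statistic or software behind the map, so treating it as Gi* is an assumption, as are the toy row of ten tracts, the event counts, and the function name. Neighbors are tracts that share an edge, each with a weight of 1, and every tract counts as its own neighbor.

```python
import math

def gi_star(counts, neighbors):
    """Return a Getis-Ord Gi* z-score for every tract.

    counts    -- list of event counts, one per tract (toy data, assumed)
    neighbors -- list of lists; neighbors[i] holds the indices of the
                 tracts sharing an edge with tract i (binary weights)
    """
    n = len(counts)
    mean = sum(counts) / n
    # Population standard deviation of the counts across all tracts.
    std = math.sqrt(sum(x * x for x in counts) / n - mean ** 2)

    scores = []
    for i in range(n):
        # Gi* includes the tract itself alongside its contiguous neighbors.
        group = set(neighbors[i]) | {i}
        w_sum = len(group)       # sum of the binary weights
        w_sq_sum = len(group)    # sum of squared weights (each is 1 ** 2)
        local_total = sum(counts[j] for j in group)
        numerator = local_total - mean * w_sum
        denominator = std * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores

# Hypothetical strip of ten tracts in a row; events cluster at one end.
toy_counts = [5, 6, 4, 0, 0, 0, 0, 0, 0, 0]
toy_neighbors = [[j for j in (i - 1, i + 1) if 0 <= j < len(toy_counts)]
                 for i in range(len(toy_counts))]
z_scores = gi_star(toy_counts, toy_neighbors)

for tract, z in enumerate(z_scores):
    label = "hot" if z > 1.96 else "cold" if z < -1.96 else "not significant"
    print(f"tract {tract}: z = {z:.2f} ({label})")
```

A z-score above 1.96 corresponds roughly to a hot spot at the 95% confidence level. The sketch skips edge cases a real tool would handle, such as identical counts everywhere (zero standard deviation) or a tract whose neighborhood covers the whole study area.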

Below is a map depicting the results of the hot spot analysis.


The hot spots depicted in the map reveal the relationship between location and the occurrence of police brutality. The neighborhoods enveloped in hot spots are those with an abnormally high number of police brutality events, indicating that these areas may be disproportionately affected by excessive use of force by law enforcement.

Looking at the demographics of both the incidents themselves and these hot spot neighborhoods can shed some light on why these areas have abnormally high police brutality. Right off the bat, the blue and green dots (Hispanic/Latino and Black victims, respectively) dominate the map. Breaking down by race, there were 30 victims of Hispanic/Latino descent, 11 Black, 4 Asian/Pacific Islander, 7 white, and 1 Arab-American. In addition, most of the incidents with Black victims happened in LA neighborhoods that have a large Black population, such as Willowbrook and Westmont. The same trend appears when focusing on Hispanic/Latino victims: most died in neighborhoods with large Hispanic/Latino populations, such as Los Angeles proper and Eastern LA County (Baldwin Park, Irwindale, West Covina).

A Tool for Visualizing Regression Models

Will sales of a good increase when its price goes down? Does the life expectancy of a country have anything to do with its GDP? To help answer questions like these about how different measures relate, researchers and analysts often use regression techniques.

Linear regression is a widely-used tool for quantifying the relationship between two or more quantitative variables. The underlying premise is simple: no more complicated than drawing a straight line through a scatterplot! This simple tool is nevertheless used for everything from market forecasting to economic models. Due to its pervasiveness in analytical fields, it is important to develop an intuition behind regression models and what they actually do. For this, I have developed a visualization tool that allows you to explore the way regressions work.

You can import your own dataset or choose from a selection of others, but the default one contains information on a selection of movies. Suppose you want to know the strategy for making the most money from a film. In regression terminology, you are asking which variables (factors) might be good predictors of a film’s box office gross.

The response variable is the measure you want to predict, which in this situation will be the box office gross (BoxOfficeGross). The attribute that you think might be a good predictor is the explanatory variable. The budget of the film might be a good explanatory variable to predict the revenue a film might earn, for example. Let’s change the explanatory variable of interest to Budget to explore this relationship. Do you see a clear pattern emerge from the scatterplot? Can you find a better predictor of BoxOfficeGross?

If you want to control for the effects of other pesky variables without having to worry about them directly, you can include them in your model as control variables.

Below the scatterplot are two important measures that are used in evaluating regression models: the p-value and the R2 value. The p-value tells us the probability of getting a result at least as strong as ours just by chance, if there were really no relationship in the data. In the context of a regression model, it suggests whether the specific combination of explanatory and control variables really does seem to affect the response variable in some way: a lower p-value means that there seems to be something actually going on with the data, as opposed to the points being just scattered randomly. The R2 value, on the other hand, tells us what proportion of the variability in the response (predicted) variable is explained by the explanatory (predictor) variable; in other words, how good the model is. If a model has a low R2 value and is incredibly bad at predicting our response, it might not be such a good model after all.

Figure: scatterplot of RottenTomatoesScore versus RunTime

If you want to predict a movie’s RottenTomatoesScore from its RunTime, for example, the incredibly small p-value might tempt you to conclude that, yes, longer movies do get better reviews! However, if you look at the scatterplot, you might get the feeling that something’s not right. The R2 value tells us the other side of the story: though RunTime does appear to be correlated with RottenTomatoesScore, the relationship is just too weak for us to do anything with!
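To see how a tiny p-value and a low R2 can show up together, here is a minimal sketch that fits a straight line to made-up data. The numbers are synthetic and are not the tool’s movie dataset; the function name, sample size, and noise level are all assumptions, and the p-value uses a normal approximation that is only reasonable for large samples.

```python
import math
import random

def simple_regression(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r_squared, p_value). The p-value is a
    large-sample normal approximation for the test that the slope is zero.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Share of the variability in y that the fitted line accounts for.
    r_squared = (sxy * sxy) / (sxx * syy)

    # Size of the t statistic for the null hypothesis "slope is zero".
    t_stat = math.sqrt(r_squared * (n - 2) / (1 - r_squared))
    # Two-sided tail probability under the normal approximation.
    p_value = math.erfc(t_stat / math.sqrt(2))
    return slope, intercept, r_squared, p_value

# Synthetic "movies": a weak true effect of runtime buried in a lot of noise.
random.seed(0)
runtime = [random.gauss(110, 20) for _ in range(5000)]
score = [40 + 0.1 * minutes + random.gauss(0, 10) for minutes in runtime]

slope, intercept, r_squared, p_value = simple_regression(runtime, score)
print(f"slope = {slope:.3f}, R2 = {r_squared:.3f}, p-value = {p_value:.2e}")
```

With thousands of points, even a faint trend is very unlikely to be a fluke, so the p-value is tiny, yet runtime still explains only a small slice of the variation in score.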

Play around with the default dataset provided, or use your own dataset by going to the Change Dataset tab on top of the page. This visualization tool can be used to develop an intuition for regression analysis, to get a feel of a new dataset, or even in classrooms for a visual introduction to linear regression techniques.

Modeling Population Growth in Excel

The Malthus and Condorcet Equations, simple formulas that model relatively complex trends in population growth, are now accessible with an Excel calculator that allows the user full control over every component of the equations. Students can use the Excel file to model human population growth under the assumption that a human carrying capacity exists.

The Malthus Equation expresses the growth rate of a population as a function of the current population size and current carrying capacity. Specifically, the growth rate of a population is equal to a Malthusian parameter multiplied by the current population size multiplied by the difference between the current carrying capacity and the current population size. This relationship produces a high growth rate once a population is large enough to reproduce at its full potential, but a low growth rate when the population is very small or when it is nearing its carrying capacity and feeling the effect of constrained resources. The Malthusian parameter is almost invariably between zero and one because a negative Malthusian parameter would lead to a population’s gradual extinction, while a Malthusian parameter greater than one would lead to explosive population growth that would greatly exceed the carrying capacity. In the latter situation, unrealistically rapid and extreme periods of growth and contraction would ensue.

The Condorcet Equation expresses the growth rate of the carrying capacity of a population as equal to the growth rate of the population multiplied by a constant termed the Condorcet parameter. The logic behind this mathematical relationship is that the carrying capacity of a population increases or decreases proportionally with the growth rate of a population because an additional person in a population can have a positive or negative effect on the carrying capacity. This implies that a Condorcet parameter greater than one results from a society where an additional individual somehow increases the number of people that can be supported even when taking into account the resources that additional individual consumes; this could result from a situation where there are increasing returns to labor. If doctors cure diseases better when more of them work together, this is reflected by a Condorcet parameter greater than one. A Condorcet parameter between zero and one is most realistic for human populations because the contribution of another person will probably grow the carrying capacity but not by more than one. A negative parameter implies that an additional person would actually lower the carrying capacity; perhaps every additional person would consume natural resources at a rate greater than the previous individual’s rate.

As Cohen (1995 Science 269: 341-346) points out, the equations are not necessarily realistic models of human population growth. There is no consensus about whether or not a human carrying capacity exists. In theory, we as a species might be able to continually develop technology at such a rate that we are unable to approach a carrying capacity. A slowdown in overall human population growth is more likely due to a global increase in income per capita that leads to altered reproductive strategies.

Figure 1: with r=0.1 and c=0.1 as parameters, the population experiences a positive but steadily decreasing growth rate because the carrying capacity increases at 1/10th the rate of population growth, and since population growth slows as the population size approaches the carrying capacity, we observe almost asymptotic behavior. This is a realistic pattern for human population growth if a carrying capacity exists.

The calculator defines the Malthus Equation as dP(t)/dt = rP(t)[K(t)-P(t)] and the Condorcet Equation as dK(t)/dt = c dP(t)/dt (see Cohen 1995: 343). The user may enter values for r (the “Malthusian parameter”), c (the “Condorcet parameter”), the initial values of P(t) (population size) and K(t) (carrying capacity), t_0 (the starting time for the model), and dt (the length of one interval in time). Together these determine all of the future changes in population size. The rates of change of population and carrying capacity at time t, dP(t)/dt and dK(t)/dt respectively, are determined by the equations. The Malthusian and Condorcet parameters are constant in a growth model provided that there are no exogenous shocks that affect the nature of population or carrying capacity growth. Because of this, they do not vary as a function of t.
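For readers who would rather experiment outside Excel, the two equations can be stepped forward in a few lines of Python. This is only a sketch: it assumes a simple Euler update (the post does not say how the spreadsheet advances from one interval to the next), and the starting values P=1 and K=2 and the function name are made up for illustration.

```python
def simulate(r, c, p0, k0, t0=0.0, dt=1.0, steps=100):
    """Step the Malthus and Condorcet equations forward with Euler's method.

    dP/dt = r * P * (K - P)      (Malthus)
    dK/dt = c * dP/dt            (Condorcet)

    Returns a list of (t, P, K) tuples, starting with the initial state.
    """
    t, p, k = t0, p0, k0
    history = [(t, p, k)]
    for _ in range(steps):
        dp_dt = r * p * (k - p)   # population growth rate at time t
        dk_dt = c * dp_dt         # carrying capacity growth rate at time t
        p += dp_dt * dt
        k += dk_dt * dt
        t += dt
        history.append((t, p, k))
    return history

# r and c match the figure; the starting values P=1 and K=2 are assumed.
history = simulate(r=0.1, c=0.1, p0=1.0, k0=2.0)
for t, p, k in history[::20]:
    print(f"t={t:5.0f}  P={p:.4f}  K={k:.4f}")
```

Because dK/dt is always c times dP/dt, the quantity K - cP never changes, so the population stops growing where P = K, that is at P = (K_0 - cP_0)/(1 - c). With the assumed starting values that is about 2.11, and the simulated population levels off there.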

To explore the Malthus-Condorcet calculator, please follow this link to an automatic download of the Excel spreadsheet containing the calculator.

Data Across the Curriculum: Teaching Data Skills in Sociology

Casey Oberlin, Assistant Professor of Sociology, understands the importance of using data in the classroom, especially in a discipline such as Sociology, which those outside the field commonly view as offering less real-life application of hard skills (e.g. data analysis). This conception is far from the truth, and Oberlin’s approach to data in the classroom gives her students a holistic and interactive view of data analysis that shows how data is part and parcel of the discipline.

Oberlin uses both her introductory Sociology courses and Research Methods courses as opportunities for students to become deeply immersed in the data-rich, multi-tiered research process of the field. Data in Sociology is very diverse, as it involves both quantitative and qualitative measures, so Oberlin’s approach focuses on exposing students to the vast array of data types, as well as the techniques, technologies, and methods used to interpret each type.


At the introductory level, Oberlin focuses on data consumption as a first step toward data concepts. Students study infographics (see Figure 1) and other data visualizations to learn how to present data and how to interpret the data being presented. Oberlin’s Research Methods courses are reserved for her experiential approach, which teaches students two data software programs over the semester, one quantitative (SPSS) and one qualitative (NVivo), shows them the wide range of data used in Sociology, and has them grapple with the entire research process for themselves. In Research Methods, students create research questions, form hypotheses or expectations, clean or assess the dataset, analyze their results, and present their work in a professional manner. Her heavy guidance through the research process helps to mitigate understandable anxiety about trying new techniques and presenting ongoing work, setting her students up to then develop their own sustained research project throughout the semester. Oberlin states that this immersive method is beneficial to and enthusiastically received by students, as the practice in research opens doors to internships, jobs, and grad schools.

All in all, Casey Oberlin’s use of data in the classroom gives students exposure to the intensive research process that is integral to Sociology and teaches important data skills and concepts that are applicable both in the real world and in a classroom setting.