The Common Mistakes Made in Creating a Data Visualization

Often the best way to learn how to do something right is to learn what not to do, and this is especially true of data visualization. WTF Visualizations is a website that compiles poorly crafted data visualizations from across the web and media. Below is a sample of the featured visualizations that illustrate some of the most common data visualization mistakes:

  • Absence of Proper Scaling

Proper scaling is essential to representing your data accurately. In the example below, the differences between values are misrepresented because the chart lacks a clear, consistent scale. The 52% bar does not appear as large as it should relative to the other bars, and the 13% bar appears to exceed the two 10% bars by far more than 3 percentage points.
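One way to see what proper scaling means is to compute bar lengths directly. The sketch below is a minimal illustration, not a reconstruction of the chart described above: it assumes a bar chart drawn from a zero baseline on a linear scale, it uses only the percentages quoted in the text (the original chart may contain other bars), and the function name `bar_lengths` and the 520-pixel maximum are hypothetical.

```python
def bar_lengths(values, max_length=100.0):
    """Scale values linearly from a zero baseline so the largest value gets max_length."""
    if not values:
        return []
    largest = max(values)
    if largest <= 0:
        raise ValueError("need at least one positive value to set the scale")
    # Every bar uses the same factor, so each bar's length is proportional to its value.
    return [value / largest * max_length for value in values]


shares = [52, 13, 10, 10, 3]  # percentages mentioned in the text
lengths = bar_lengths(shares, max_length=520)
for share, length in zip(shares, lengths):
    print(f"{share:>2}% -> {length:.0f} px")
```

On a correctly scaled chart the 52% bar is four times as long as the 13% bar, and the 13% bar is only 1.3 times as long as each 10% bar; a drawn chart that departs noticeably from these ratios is distorting the data.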


  • Too Much Information

While it is tempting to pack as much information as possible into a visualization, too much information often detracts from the clarity and concision that are essential to good data visualization. The example below illustrates how a myriad of different categories can muddle a visualization, as well as the importance of clear axis labels and descriptive titles.

Make sure the data you do decide to visualize is comprehensible to your audience: recode categories when there are too many; don’t include multiple measures that capture the same phenomenon; don’t include 10 different variables when 3 will do. If necessary, use more than one visualization to highlight different subcategories or variables.
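Recoding a long tail of categories into a single “Other” group is one common way to follow the first piece of advice above. The sketch below assumes categorical data stored as a plain Python list; the function name `collapse_categories`, the default of keeping three categories, and the commuting-mode sample data are all hypothetical.

```python
from collections import Counter


def collapse_categories(labels, keep=3, other="Other"):
    """Keep the `keep` most frequent categories and recode every other label as `other`."""
    counts = Counter(labels)
    # most_common() breaks ties by first appearance, so check ties at the cutoff by hand.
    top = {name for name, _ in counts.most_common(keep)}
    return [label if label in top else other for label in labels]


# Hypothetical survey responses with a long tail of rare answers.
modes = ["Car", "Car", "Car", "Car", "Bus", "Bus", "Bus",
         "Bike", "Bike", "Walk", "Train", "Scooter"]
recoded = collapse_categories(modes, keep=3)
print(Counter(recoded))
```

Six distinct answers become four groups: Car, Bus, and Bike keep their labels, and the three one-off answers are pooled as Other. If a dataset already contains a category named “Other,” pass a different `other` label so the two are not merged by accident.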


  • Bad Math

Always double-check your math before sharing your visualization with the public. Otherwise, you risk misrepresenting your data, as well as appearing unable to do simple arithmetic. The example below illustrates this point: the creator uses a pie chart, but its sections add up not to 100% but to 128%. The sections also do not accurately reflect the values they supposedly represent: the “51% Today” section, for instance, should take up a little more than half of the pie.
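A quick programmatic check catches both problems before a pie chart is published: the shares must sum to 100%, and each slice’s angle should be its share of 360 degrees. This is a minimal sketch; the function name `slice_angles`, the half-point tolerance for rounded percentages, and every slice label except “Today” are hypothetical. Only the 51% figure and the 128% total come from the chart described above.

```python
def slice_angles(shares, tolerance=0.5):
    """Return each slice's angle in degrees, rejecting shares that don't sum to about 100%."""
    total = sum(shares.values())
    if abs(total - 100) > tolerance:  # the tolerance allows for rounded percentages
        raise ValueError(f"slices sum to {total}%, not 100%")
    return {label: share / 100 * 360 for label, share in shares.items()}


# Hypothetical slices that, like the chart described above, add up to 128%.
bad_chart = {"Today": 51, "Slice B": 45, "Slice C": 32}
try:
    slice_angles(bad_chart)
except ValueError as err:
    print(err)  # slices sum to 128%, not 100%

good_chart = {"Today": 51, "Everything else": 49}
angles = slice_angles(good_chart)
print(f"Today: {angles['Today']:.1f} degrees")  # Today: 183.6 degrees
```

A 51% slice spans 183.6 degrees, just over the 180 degrees of a half-circle, which is the proportion the “51% Today” section should have had.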


What Makes a Good Data Visualization?

Being able to represent data in a clear, concise, and engaging way is an essential skill. While an effective poster is key, data visualizations are a tool that enhances the communication of the narratives underlying the data. David McCandless, a world-renowned data visualization designer and creator of Information is Beautiful, constructed a Venn diagram depicting the essential elements of a successful data visualization.


The irony of this data visualization, which aims to serve as an aesthetically pleasing vehicle for what makes a good visualization, is that several of its elements keep it from being a successful one. First, the information McCandless wants to communicate is not immediately obvious. Figuring out how each circle’s category and each intersection relate to the associated examples (e.g., information x goal = plot) takes time and is distracting. In addition, some of his examples aren’t very descriptive. What does he mean by “pure data viz” at the visual form x information intersection? What about “proof of concept” at the intersection of goal and story? There isn’t enough context to make sense of these examples and categories. While the visualization is accessible to colorblind audiences (a very important element of a good data visualization), the point McCandless wants to communicate is lost because of its sparse descriptions and overcomplicated use of the Venn diagram model.

Does Marriage Affect Earning Potential?

Using DASIL’s United States Income Data by Marital Status, Race, and Sex visualization, one can see that the effect of marriage on a person’s earnings is multifaceted: it depends on which group we focus on and on other factors at play. Nevertheless, some general trends do prevail.


Overall, married people have higher earnings, although the gap between married and divorced people is smaller than the gap between married and single people. Married people with a spouse present earned over $33,000 annually, while single people earned on average well over $10,000 less. While being single appears to go hand in hand with lower earnings, interrelated variables may explain some of the observed earnings gaps.


One important variable to consider is age. As we discuss in another blog post, workers ages 15-24 earn less than those in other age brackets. Studies suggest that people in the 15-24 age bracket are less likely to be married, so some of the earnings trends shown may not be due strictly to marriage. In addition, as illustrated in that blog post, 25-34-year-olds and those 65 and older earn about the same and are the next-lowest-earning age groups (about $25,000 more, in 2013 dollars), and 35-64-year-olds make about $20,000 more on average. Those ages 35-64 are more likely to be established in their careers, and their highest-paying years typically fall within this bracket. So some earnings trends may reflect where people are in their career trajectories.

Broken down by gender, the general trend persists: across all races, married men earn substantially more than divorced and single men ($44K, $33K, and $20K, respectively). In recent years, married women have earned more than single men, averaging about $2K more in 2006, a gap that persisted into 2010. While single women earned more than married women in the 1980s, that trend has reversed in recent years.



Broken down by race, single Asian men and women both earn more than single people of any other race, each averaging about $21K in 2010. Single Hispanic women earn the least of any group of men or women, at $15.1K, with single Black men a close second-lowest. Earnings of single Black men peaked in 1998, when they trailed those of white men by only about $200. Studies attribute this peak to the economic boom of the 1990s and the transition of Black men into higher-skilled service-industry jobs.



Married Hispanic women still earn less than married women of other races, at $19.1K, but substantially more than single Hispanic women. Married Black women earn the most of married women of any race, at $26.6K, and their earnings trend has moved more or less in step with that of married Asian women.

Investigating Police Brutality in Los Angeles

Excessive use of force by law enforcement is by no means a novel phenomenon in the United States. However, with high-profile cases like those of Michael Brown, Eric Garner, and, most recently, Greg Gunn fueling national movements such as #BlackLivesMatter, race-related incidents of police brutality are receiving worldwide media attention.

I investigated geographic trends in reported police brutality in Los Angeles County at the census-tract level for the year 2015, using data from The Guardian’s project “The Counted,” a comprehensive dataset that records all people killed by police and other law enforcement agencies in the US.

To measure the effect of location on incidents of police brutality, I conducted a hot spot analysis, which identifies statistically significant spatial clusters of high (hot spots) and low (cold spots) police brutality. Essentially, hot and cold spots indicate whether the observed spatial clustering of police brutality events is more pronounced than would be expected if the values were randomly distributed. I specified the spatial relationship for the analysis as Contiguity Edges, meaning that census tracts that share a boundary with, or overlap, a census tract containing a police brutality event are weighted more heavily in the analysis than those that do not.
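The term Contiguity Edges matches an option in ArcGIS’s Hot Spot Analysis tool, which computes the Getis-Ord Gi* statistic. Assuming that is the statistic behind this analysis, the sketch below shows how Gi* turns counts and a neighbor list into z-scores. It is a toy illustration, not the analysis used here: a 4x4 grid of cells stands in for census tracts, edge-sharing (rook) contiguity with each cell counted as its own neighbor stands in for Contiguity Edges, and the event counts and the function names `rook_neighbors` and `gi_star` are hypothetical. A real analysis would derive neighbors from tract boundary polygons and would typically rely on a tested implementation such as PySAL’s esda package.

```python
import math


def rook_neighbors(rows, cols):
    """Edge-contiguity neighbors on a grid; each cell is also its own neighbor, as Gi* requires."""
    neighbors = {}
    for r in range(rows):
        for c in range(cols):
            group = {r * cols + c}
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    group.add(rr * cols + cc)
            neighbors[r * cols + c] = group
    return neighbors


def gi_star(values, neighbors):
    """Getis-Ord Gi* z-score for each feature, using binary (0/1) spatial weights.

    Gi* = (sum_j w_ij x_j - mean * sum_j w_ij)
          / (S * sqrt((n * sum_j w_ij**2 - (sum_j w_ij)**2) / (n - 1)))
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    if s == 0:
        raise ValueError("all values are identical; Gi* is undefined")
    scores = []
    for i in range(n):
        w_sum = len(neighbors[i])  # with 0/1 weights, sum of w_ij equals sum of w_ij**2
        local = sum(values[j] for j in neighbors[i])
        numerator = local - mean * w_sum
        denominator = s * math.sqrt((n * w_sum - w_sum ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores


# Hypothetical event counts per cell, clustered in the top-left corner.
counts = [
    3, 2, 0, 0,
    2, 4, 0, 0,
    0, 1, 0, 0,
    0, 0, 0, 0,
]
z = gi_star(counts, rook_neighbors(4, 4))
# z > 1.96 is roughly 95% confidence, with no multiple-testing correction.
hot = [i for i, score in enumerate(z) if score > 1.96]
print("hot spot cells:", hot)  # hot spot cells: [0, 1, 4, 5]
```

In this toy grid, the four cells of the top-left cluster score above 1.96 and would be flagged as hot spots, while cells far from the cluster get negative scores. Because each cell’s own count and its neighbors’ counts both enter the sum, adjacent high-count cells reinforce one another, which is what lets the statistic detect clusters rather than single high values.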

Below is a map depicting the results of the hot spot analysis.


The hot spots depicted in the map reveal the relationship between location and the occurrence of police brutality. The neighborhoods within hot spots have an unusually high number of police brutality events, indicating that these areas may be disproportionately affected by excessive use of force by law enforcement.

Looking at the demographics of both the incidents themselves and these hot spot neighborhoods can shed some light on why these areas have unusually high levels of police brutality. Right off the bat, blue and green dots (Hispanic/Latino and Black victims, respectively) dominate the map. Broken down by race, there were 30 victims of Hispanic/Latino descent, 11 Black, 4 Asian/Pacific Islander, 7 white, and 1 Arab-American. In addition, most of the incidents with Black victims occurred in LA neighborhoods with large Black populations, such as Willowbrook and Westmont. The same trend appears for Hispanic/Latino victims: most died in neighborhoods with large Hispanic/Latino populations, such as Los Angeles proper and eastern LA County (Baldwin Park, Irwindale, West Covina).