COMM 273D | Fall 2014

Thursday, November 20

Project prep and discussion

Discussion of final projects before the Thanksgiving break.

Thoughts about visualization and storytelling

Visually attractive graphics also gather their power from content and interpretations beyond the immediate display of some numbers. The best graphics are about the useful and important, about life and death, about the universe. Beautiful graphics do not traffic with the trivial.

Antipatterns in visualization

via Mushon Zer-Aviv, "Disinformation Visualization: How to lie with datavis"

Gallup: Pro-choice vs Pro-life

This section riffs on Mushon Zer-Aviv's critique of the ways the Gallup abortion poll can be remixed visually to favor a particular political viewpoint.

Gallup's graph 1995 - 2010


The AP skew

(the AP article was originally published in 2009)

via Zer-Aviv:

Back in 2009 when the AP wanted to use the data to tell a good story, all they needed was to skew the graph’s proportions a bit, change the starting point of the vertical axis and switch to thicker, less nuanced lines that based on the very same data emphasizes a clear trend in the graph. And a clear shift in opinion in such a huge ideological debate makes for a much more interesting story.


The skew

Now here's LiveAction's take on it


via Zer-Aviv:

Earlier in 2009 when [a] pro-life site wanted to graph the same data to preach to the choir and say “we’re winning!”…This time, no need to admit the public opinion was ever at the 1990’s low pro-life mark, so starting from 2003-2004 is fine. Focusing on a shorter time frame to tell a story of consistent increase helps too, as does the choice of 18-29 years old age brackets. There’s nothing about the data that isn’t true, but it is very selective.

Zooming out: Gallup 1995 - 2014

The picture is much more muddled when you look at how opinion has changed from 2009 to 2014:


Adding nuance of which circumstances

Moving beyond pro-life versus pro-choice, let's look at the responses for "Do you think abortions should be legal under any circumstances, legal only under certain circumstances, or illegal in all circumstances?"


Variations in charting

Let's see what another level of granularity in data can show.

For respondents who said that abortions should be legal in certain circumstances, Gallup then asked them if they meant "in most circumstances" or "only in a few circumstances".

Part of the data is reproduced (by hand) below:

Values are percentages of respondents: Always = legal under any circumstances; Mostly = legal in most circumstances; Few = legal in only a few circumstances; Never = illegal in all circumstances.

Date        Always  Mostly  Few  Never
5/8/2014    28      11      37   21
5/2/2013    26      13      38   20
5/3/2012    25      13      39   20
5/5/2011    27      10      39   22
5/3/2010    24      15      37   19
5/7/2009    22      15      37   23
5/8/2008    28      13      40   17
5/10/2007   26      15      40   18
5/8/2006    30      13      39   15
5/2/2005    23      12      40   22
5/2/2004    24      13      42   19
5/5/2003    23      15      42   19
5/6/2002    25      12      39   22
5/10/2001   26      15      41   15
1/13/2000   26      17      39   15
4/30/1999   27      12      42   16
1/16/1998   23      16      42   17
8/12/1997   22      12      48   15
7/25/1996   25      13      43   15
2/24/1995   32      9       41   15
9/6/1994    33      13      38   13
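To chart or re-slice the Gallup table outside a spreadsheet, it helps to load it into a simple structure first. This is a minimal sketch, assuming the rows are pasted in as plain text exactly as listed; `RAW`, `CATEGORIES`, and `parse_rows` are illustrative names, not part of any course material.

```python
from datetime import datetime

# Rows copied from the Gallup table: date, then Always, Mostly, Few, Never (percent)
RAW = """\
5/8/2014 28 11 37 21
5/2/2013 26 13 38 20
5/3/2012 25 13 39 20
5/5/2011 27 10 39 22
5/3/2010 24 15 37 19
5/7/2009 22 15 37 23
5/8/2008 28 13 40 17
5/10/2007 26 15 40 18
5/8/2006 30 13 39 15
5/2/2005 23 12 40 22
5/2/2004 24 13 42 19
5/5/2003 23 15 42 19
5/6/2002 25 12 39 22
5/10/2001 26 15 41 15
1/13/2000 26 17 39 15
4/30/1999 27 12 42 16
1/16/1998 23 16 42 17
8/12/1997 22 12 48 15
7/25/1996 25 13 43 15
2/24/1995 32 9 41 15
9/6/1994 33 13 38 13"""

CATEGORIES = ("Always", "Mostly", "Few", "Never")


def parse_rows(text):
    """Parse 'M/D/YYYY always mostly few never' lines into dicts, oldest poll first."""
    rows = []
    for line in text.splitlines():
        date_str, *values = line.split()
        row = {"date": datetime.strptime(date_str, "%m/%d/%Y").date()}
        row.update(zip(CATEGORIES, (int(v) for v in values)))
        rows.append(row)
    return sorted(rows, key=lambda r: r["date"])


ROWS = parse_rows(RAW)
# Sum of the four published categories for each poll
totals = [sum(r[c] for c in CATEGORIES) for r in ROWS]

print(ROWS[0]["date"], "to", ROWS[-1]["date"], "-", len(ROWS), "polls")
print("row totals range from", min(totals), "to", max(totals))
```

Note that each row totals a little under 100 (between 95 and 99), since the remainder is not shown in this excerpt. An area chart stacked to 100 percent would quietly stretch the four bands to fill that gap.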

Stock area chart

When charting in Google Spreadsheets, the default chart choice is Area:

Area re-colored

The default colors are assigned arbitrarily and are confusing here: blue is for Always, but red is for Mostly.

So let's switch Always to blue, make Mostly a lighter blue, and Few and Never shades of red:

Area color, politicized

Here is where judgment about color can skew the perception of the graph. Someone who sympathizes with the pro-choice side might argue that Few should not be counted as pro-life, on the logic that you're either pro-life or you aren't. Such an advocate might then make the Few area more neutral in appearance:

By visually separating Few from Never, the chart makes the pro-choice contingent look much more prevalent.
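The color decision amounts to choosing which categories get added to which side. A sketch using the May 2014 row from the table; `side_totals` is a hypothetical helper, and the two groupings are the ones described above.

```python
# May 2014 row from the Gallup table (percent of respondents)
row = {"Always": 28, "Mostly": 11, "Few": 37, "Never": 21}


def side_totals(row, pro_choice, pro_life):
    """Sum the categories assigned to each side; anything unassigned counts as neutral."""
    choice = sum(row[c] for c in pro_choice)
    life = sum(row[c] for c in pro_life)
    neutral = sum(v for c, v in row.items() if c not in pro_choice and c not in pro_life)
    return choice, life, neutral


# Few shaded red, alongside Never
print(side_totals(row, ["Always", "Mostly"], ["Few", "Never"]))  # (39, 58, 0)
# Few shaded as neutral
print(side_totals(row, ["Always", "Mostly"], ["Never"]))         # (39, 21, 37)
```

With Few shaded red, the pro-life side outweighs the pro-choice side 58 to 39; shade Few as neutral and the same poll reads 39 to 21 in the other direction, without a single number changing.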

Line graphs

The area chart is useful for showing parts of a whole. But let's go with Gallup's choice and use a line graph.

Again, the default choices by Google are not helpful:

Line, re-colored

The problem with area charts (as with pie charts) is that the eye has a hard time judging differences in area. The line chart, however, reveals a new insight: the respondents who believe abortion should be highly restricted, i.e. legal in only a Few circumstances, are consistently the largest single group, though never an outright majority. Pro-lifers may claim that as a victory in the battle for opinion.
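The line chart's observation can be checked directly against the table: for each poll, find the category with the highest share. A minimal sketch; `largest_category` is a hypothetical helper and the rows are copied from the table.

```python
# (date, Always, Mostly, Few, Never) rows from the Gallup table, in percent
ROWS = [
    ("5/8/2014", 28, 11, 37, 21),
    ("5/2/2013", 26, 13, 38, 20),
    ("5/3/2012", 25, 13, 39, 20),
    ("5/5/2011", 27, 10, 39, 22),
    ("5/3/2010", 24, 15, 37, 19),
    ("5/7/2009", 22, 15, 37, 23),
    ("5/8/2008", 28, 13, 40, 17),
    ("5/10/2007", 26, 15, 40, 18),
    ("5/8/2006", 30, 13, 39, 15),
    ("5/2/2005", 23, 12, 40, 22),
    ("5/2/2004", 24, 13, 42, 19),
    ("5/5/2003", 23, 15, 42, 19),
    ("5/6/2002", 25, 12, 39, 22),
    ("5/10/2001", 26, 15, 41, 15),
    ("1/13/2000", 26, 17, 39, 15),
    ("4/30/1999", 27, 12, 42, 16),
    ("1/16/1998", 23, 16, 42, 17),
    ("8/12/1997", 22, 12, 48, 15),
    ("7/25/1996", 25, 13, 43, 15),
    ("2/24/1995", 32, 9, 41, 15),
    ("9/6/1994", 33, 13, 38, 13),
]
CATEGORIES = ("Always", "Mostly", "Few", "Never")


def largest_category(values):
    """Return the name of the category with the highest percentage in one poll."""
    return max(zip(CATEGORIES, values), key=lambda pair: pair[1])[0]


leaders = {date: largest_category(values) for date, *values in ROWS}
print(set(leaders.values()))        # {'Few'}

# Few leads every poll, but its peak is below 50 percent: a plurality, not a majority
print(max(row[3] for row in ROWS))  # 48
```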

Line, compressed vertical axis

With the axis minimum set to 5 and the maximum to 50, the changes in opinion look more dramatic, though the main effect is to make the polling results seem more jittery.
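Changing the axis bounds leaves the data alone but changes how much of the chart's height each move fills. A sketch of that arithmetic, using the Never column's move from 17 in May 2008 to 23 in May 2009; the 0-100 baseline is an assumption chosen for comparison (the text doesn't state the default range), and `share_of_height` is a hypothetical helper.

```python
def share_of_height(change, axis_min, axis_max):
    """Fraction of the chart's vertical span that a change of `change` points occupies."""
    return change / (axis_max - axis_min)


# Never: 17 percent in May 2008, 23 percent in May 2009 (from the Gallup table)
change = 23 - 17
full = share_of_height(change, 0, 100)   # assumed full-scale axis, for comparison
narrow = share_of_height(change, 5, 50)  # the narrowed axis described above

print(f"{full:.1%} of the height on a 0-100 axis")   # 6.0%
print(f"{narrow:.1%} of the height on a 5-50 axis")  # 13.3%
print(f"{narrow / full:.2f}x taller")                # 2.22x
```

The magnification is just the ratio of the two axis spans (100 / 45), so every change on the narrowed chart, including poll-to-poll noise, is drawn about 2.2 times taller.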

Area, with less nuance

One more iteration: an area chart with the two in-between opinions combined into a single Sometimes category. This matches the breakdown in Gallup's original graph, drawn as an area chart instead.
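Collapsing the two middle answers is a per-row sum of Mostly and Few. A minimal sketch on three rows from the table; `collapse` is a hypothetical helper, and Sometimes is the label used in the text.

```python
# A few (date, Always, Mostly, Few, Never) rows from the Gallup table, in percent
ROWS = [
    ("5/8/2014", 28, 11, 37, 21),
    ("5/7/2009", 22, 15, 37, 23),
    ("9/6/1994", 33, 13, 38, 13),
]


def collapse(rows):
    """Merge the Mostly and Few columns into a single Sometimes column."""
    return [
        {"date": date, "Always": always, "Sometimes": mostly + few, "Never": never}
        for date, always, mostly, few, never in rows
    ]


for row in collapse(ROWS):
    print(row)
```

Merged this way, the middle band is roughly half of respondents in these rows (48 to 52 percent), which hides the Mostly/Few split that the four-category charts expose.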

Course schedule

  • Tuesday, September 23

    The singular of data is anecdote

    An introduction to public affairs reporting and the core skills of using data to find and tell important stories.
    • Count something interesting
    • Make friends with math
    • The joy of text
    • How to do a data project
  • Thursday, September 25

    Bad big data

    Just because it's data doesn't make it right. But even when all the available data is flawed, we can get closer to the truth with mathematical reasoning and the ability to make comparisons, small and wide.
    • Fighting bad data with bad data
    • Baltimore's declining rape statistics
    • FBI crime reporting
    • The Uber effect on drunk driving
    • Pivot tables
  • Tuesday, September 30

    DIY Databases

    Learn how to take data in your own hands. There are two kinds of databases: the kind someone else has made, and the kind you have to make yourself.
    • The importance of spreadsheets
    • Counting murders
    • Making calls
    • A crowdsourced spreadsheet
  • Thursday, October 2

    Data in the newsroom

    Phillip Reese of the Sacramento Bee will discuss how he uses data in his investigative reporting projects.
    • Phillip Reese speaks
  • Tuesday, October 7

    The points of maps

    Mapping can be a dramatic way to connect data to where readers are and to what they recognize.
    • Why maps work
    • Why maps don't work
    • Introduction to Fusion Tables and TileMill
  • Thursday, October 9

    The shapes of maps

    A continuation of learning mapping tools, with a focus on borders and shapes.
    • Working with KML files
    • Intensity maps
    • Visual joins and intersections
  • The first of several sessions on learning SQL for the exploration of large datasets.
    • MySQL / SQLite
    • Select, group, and aggregate
    • Where conditionals
    • SFPD reports of larceny, narcotics, and prostitution
    • Babies, and what we name them
  • Thursday, October 16

    A needle in multiple haystacks

    The ability to join different datasets is one of the most direct ways to find stories that have been overlooked.
    • Inner joins
    • One-to-one relationships
    • Our politicians and what they tweet
  • Tuesday, October 21

    Haystacks without needles

    Sometimes, what's missing is more important than what's there. We will cover more complex join logic to find what's missing from related datasets.
    • Left joins
    • NULL values
    • Which Congressmembers like Ellen DeGeneres?
  • A casual midterm covering the range of data analysis and programming skills acquired so far.
    • A midterm on SQL and data
    • Data on military surplus distributed to U.S. counties
    • U.S. Census QuickFacts
  • Tuesday, October 28

    Campaign Cash Check

    The American democratic process generates loads of interesting data and insights for us to examine, including who is financing political campaigns.
    • Polling and pollsters
    • Following the campaign finance money
    • Competitive U.S. Senate races
  • Thursday, October 30

    Predicting the elections

    With Election Day coming up, we examine the practices of polling as a way to understand various scenarios of statistical bias and error.
    • Statistical significance
    • Poll reliability
    • Forecasting
  • Tuesday, November 4

    Election day (No class)

    Do your on-the-ground reporting
    • No class because of Election Day coverage
  • While there are many tools and techniques for building data graphics, there is no magic visualization tool that will make a non-story worth telling.
    • Review of the midterm
    • The importance of good data in visualizations
    • How visualization can augment the Serial podcast
  • Tuesday, November 11

    Dirty data, cleaned dirt cheap

    One of the most tedious but important parts of data analysis is just cleaning and organizing the data. Being a good "data janitor" lets you spend more time on the more fun parts of journalism.
    • Dirty data
    • OpenRefine
    • Clustering
  • Thursday, November 13

    Guest speaker: Simon Rogers

    Simon Rogers, data editor at Twitter, talks about his work, how Twitter reflects how communities talk to each other, and the general role of data journalism.
    • Ellen, World Cup, and other masses of Twitter data
  • Tuesday, November 18

    What we say and what we do

    When the data doesn't directly reveal something obvious, we must consider what its structure and its metadata imply.
    • Proxy variables
    • Thanks Google for figuring out my commute
    • How racist are we, really?
    • How web sites measure us
  • Thursday, November 20

    Project prep and discussion

    Discussion of final projects before the Thanksgiving break.
  • Tuesday, November 25

    Thanksgiving break

    Holiday - no class
  • Thursday, November 27

    Thanksgiving break

    Holiday - no class
  • Tuesday, December 2

    Project wrapup

    Last-minute help on final projects.
  • Thursday, December 4

    Project Show-N-Tell

    In-class presentations of our final data projects.