11 Notes for reading graphs


Chapter 11
Notes for reading graphs

Hopefully on 1 September.

Notes are also available as a PDF. The images are available as PDF slides as well.

11.1 Reading graphs

Everyone knows the basics for most plots. Find the data you have and follow lines to find the result. But what about points in between?

Reading graphs is a form of inductive reasoning.

Take care with your assumptions when reading graphs.
Make your assumptions plain when creating graphs.
Not all data need be graphed.
Modern, measurable plots began roughly with William Playfair (1759-1823).

___________________________________________________

[Figure]

The text’s Figure 10. So 850 square feet needs a 16 000 BTU air conditioner.

What if you have a 900 square foot area? Might look between nearby points.
Or a 100 square foot area? Only one nearby point.

Is there a relationship you can use?

(Text figure’s source: Carey, Morris, and James. Home Improvement for Dummies, IDG Books.)

__________________________________________________________________

[Figure]

A first thought is to use a line.
People often build items to fit lines; it’s how we tend to think.
But the relationship here doesn’t look like a line, does it?

________________________________________

[Figure]

As a first step, get rid of the bars.
- The areas have no meaning.
- But they aren’t bad here, more on that later.
We haven’t changed the data, so this still does not appear to be a line.

_______________________________________

[Figure]

Trying a statistical line fit: Definitely no line here.
But look at the x axis.
Are the points spaced appropriately?

__________________________________________________________________

[Figure]

Spread the points out, and suddenly we do have a line.
- The BTU measurements likely are rounded, so not a perfect line.
Now we can predict the BTUs needed for any size without having to poke at nearby points and estimate differences.
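
The effect of the axis spacing can be checked numerically. The sketch below is only an illustration: the room areas and the linear rule are made up (they are not the values in the text's figure), and the helper names are hypothetical. It fits a least-squares line once against the bar position and once against the actual area.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def max_residual(xs, ys):
    """Largest absolute distance between the data and the fitted line."""
    slope, intercept = fit_line(xs, ys)
    return max(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys))


# Hypothetical, unevenly spaced room areas in square feet (NOT the figure's data).
areas = [100, 150, 250, 400, 700, 1000, 1600]
# Assumed exact linear rule, invented for illustration only.
btus = [4000 + 10 * a for a in areas]

# A bar chart puts each bar one step after the last, ignoring the real gaps.
positions = list(range(len(areas)))

residual_by_position = max_residual(positions, btus)
residual_by_area = max_residual(areas, btus)
print(f"worst miss, evenly spaced bars: {residual_by_position:.0f} BTU")
print(f"worst miss, true areas:         {residual_by_area:.0f} BTU")

# With the fit against true areas, any size can be predicted directly.
slope, intercept = fit_line(areas, btus)
prediction_900 = slope * 900 + intercept
print(f"predicted for 900 sq ft: {prediction_900:.0f} BTU")
```

The same data look curved when the bars are evenly spaced and straight when each point sits at its real area, which is the assumption to keep track of when reading such a chart.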

______________________

[Figure]

The points:

Bar charts like these often are tables and not graphs.
Inductive reasoning: Keep track of your assumptions when extrapolating visual relationships.

__________________________________________________________________

Remember in the bar chart: Areas did not matter.
People are very, very bad at judging areas.
Given that one balloon represents 12%, how large is the one next to it?

(From the Onion, http://www.theonion.com)

__________________________________________________________________

Given that one balloon represents 12%, how large is the one next to it?
22%
This is the Onion, but the graph is to scale.
Not by area, but by length.
But you see area first…

(From the Onion, http://www.theonion.com)
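
The gap between length and area can be put in numbers. A minimal sketch, assuming each balloon is scaled uniformly in height and width so that its area grows with the square of the length ratio (the function name is hypothetical):

```python
def perceived_by_area(reference_value, actual_value):
    """Value a viewer infers by comparing areas when symbols are scaled by length.

    Scaling height and width by the same factor k multiplies the area by k**2.
    """
    length_ratio = actual_value / reference_value
    area_ratio = length_ratio ** 2
    return reference_value * area_ratio


length_ratio = 22 / 12
area_ratio = length_ratio ** 2
perceived = perceived_by_area(12, 22)

print(f"length ratio: {length_ratio:.2f}")
print(f"area ratio:   {area_ratio:.2f}")
print(f"a reader judging by area would guess about {perceived:.0f}%")
```

Under that assumption the second balloon is less than twice as long but more than three times as large in area, so a reader who sees area first overestimates it badly.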

__________________________________________________________________

Also beware graphs with too much of a “slant”.
- (multiple meanings here)
In a “pie chart”, areas are the data.
But people are very, very bad at judging areas.
Which is larger, 19.5% or 21.2%?
Who is the 19.5% in this image?
Avoid 3D effects!

(at Macworld 2008, photo from Ryan Block of Engadget, http://www.engadget.com)

____________________________________________________

Vendor	US market share (%)
RIM	39.0
Apple	19.5
Palm	9.8
Motorola	7.4
Nokia	3.1
other	21.2

Never be afraid of using small tables.
Is there a problem with other being the second largest?
other: LG, Samsung, Ericsson, …
Knowing your premises:
- US-only.
  - Nokia is #1 world-wide (40%)
  - Samsung is #2 (15%)
  - Motorola is #3 (10%)
- No history in this table, no idea about future.
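
If a picture is still wanted, lengths are the safer encoding. The sketch below takes the figures from the table above, checks that they add up to 100%, and prints a plain-text bar chart sorted by share. The function name and the bar width are arbitrary choices for illustration.

```python
# US market share (%) as given in the table.
shares = {
    "RIM": 39.0,
    "Apple": 19.5,
    "Palm": 9.8,
    "Motorola": 7.4,
    "Nokia": 3.1,
    "other": 21.2,
}


def text_bars(data, width=40):
    """Return one line per entry, largest first, with bar length proportional to value."""
    largest = max(data.values())
    lines = []
    for name, value in sorted(data.items(), key=lambda item: item[1], reverse=True):
        bar = "#" * round(width * value / largest)
        lines.append(f"{name:<9}{value:5.1f} {bar}")
    return lines


total = sum(shares.values())
lines = text_bars(shares)
for line in lines:
    print(line)
print(f"total: {total:.1f}%")
```

Sorted by length, it is immediately visible that "other" is larger than Apple, which is hard to judge from a tilted 3D pie.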

__________________________________________________________________

This does not imply area is useless.

Graphic by Charles Joseph Minard in 1869.
Title: Carte figurative des pertes successives en hommes de l’Armée Française dans la campagne de Russie 1812-1813
Depicts Napoleon’s 1812 march on Moscow and subsequent disaster.
Displays many variables in one image:
- location on the map,
- direction by color,
- size by width, and
- temperature during the retreat by the graph on the bottom.
At the time, an anti-war graphic!
Considered one of the best graphical displays of data across all of history.

11.2 Creating a graphical depiction of data

Begin with questions:
- What should the reader take away?
- What does the data really imply?
- One graph should not have too many messages.
Help the reader form correct comparisons. Items to compare should
- be represented similarly,
- and lie close together.
People judge lengths much better than areas.
Show causality, and avoid inferring causality where none exists.
- Plot unrelated quantities on different graphs, not on opposite axes of the same graph.
Use numbers and words.
Do not use visual effects unless they directly portray data.
- Edward Tufte coined the term “chartjunk” for extraneous symbols.
- Many effects can distort data, particularly 3D effects.

I will walk through some of my thoughts while creating graphics for a highly technical paper.

____________________________________________________________

[Figure]

Purpose: Display all our experimental data without much interpretation.
Too much data for a simple table?
On the left: algorithms and data cases (plain v. exceptional)
Blocks: Specific processors
Below: CPU/processor cycles per array entry
Dotted vert. line: CPU cycles for a critical operation
Colors and symbols: “direction” of algorithm and data cases (repeated!)
Graph allows simple comparisons of our raw data.

(from Marques, Riedy, and Vömel. Benefits of IEEE-754 features in modern symmetric tridiagonal eigensolvers)

__________________________________________________________________

[Figure]

Purpose: Determine if CPUs impose penalties on certain arithmetic features.
Each algorithm (green bars) uses a different feature.
Ratio of “careful” over “plain” shows a slow-down.
Find outliers by looking down and across:
- One direction (“progressive”) encounters more problems than the other (“stationary”)?
- Missing data here: “stationary” ran far more slowly, slow-down hidden by total cost.

(from Marques, Riedy, and Vömel. Benefits of IEEE-754 features in modern symmetric tridiagonal eigensolvers)

__________________________________________________________________

[Figure]

Purpose: Begin interpreting the data to determine which single algorithm should be used.
Chose one algorithm, “B”, to normalize others (2.2, 2.4).
- Reviewers (and authors) missed a typo. “B” should be 2.3.
Could this have been a table?
- More than a page of data in tabular form.
- Can be summarized by statistics (median and percentiles).
- Summaries were in the text.
Does this serve its purpose?
- With hindsight, not really.
- Summary in the text was better.
- This plot was unnecessary.
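
A summary by median and percentiles takes only a few lines. The sketch below uses invented run-time ratios (they are not the paper's measurements) and Python's standard `statistics` module; the quartiles stand in for whichever percentiles a text summary would report.

```python
import statistics

# Hypothetical run-time ratios (algorithm time / baseline time); invented values.
ratios = [0.8, 0.9, 0.95, 1.0, 1.05, 1.1, 1.2, 1.4, 2.5]

median = statistics.median(ratios)
# n=4 returns the three cut points between quartiles (25th, 50th, 75th percentiles).
q1, q2, q3 = statistics.quantiles(ratios, n=4)

print(f"median ratio: {median:.2f}")
print(f"middle half of the runs: {q1:.2f} to {q3:.2f}")
print(f"slowest run: {max(ratios):.2f}")
```

Three numbers like these fit in a sentence, which is why the summary in the text could replace both the table and the plot.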

(from Marques, Riedy, and Vömel. Benefits of IEEE-754 features in modern symmetric tridiagonal eigensolvers)

__________________________________________________________________

11.3 Graph galleries and resources

Gallery of Data Visualization; The Best and Worst of Statistical Graphics: http://www.math.yorku.ca/SCS/Gallery/
Edward Tufte’s site: http://www.edwardtufte.com
Example graphs, some good, some not so good: http://addictedtor.free.fr/graphiques/
Other examples or essays:
- http://www.dmreview.com/issues/20050101/1016296-1.html
- http://www.bella-consults.com/square-pies
