What Does Correlation Look Like? Optical Illusion / Optical Truth by W. Ben Hunt, Ph.D. – Salient Partners

Correlation

Portion of original dot map by Dr. John Snow, the founding father of epidemiology, showing the clusters of cholera cases in the London epidemic of 1854. The visual representation of Snow’s data analysis convinced local authorities to shut down the contaminated public well at ground zero of the cholera outbreak, although it would be another 20 years before Snow’s arguments in favor of germ theory and a direct connection between cholera and fecal contamination of water supply would be widely accepted.

John Snow, “On the Mode of Communication of Cholera” (1855)

Anscombe’s Quartet: four datasets that appear identical using summary statistical methods (mean, variance, correlation, linear regression), but are completely different in meaning and composition – a difference that is clearly revealed through visual inspection.

Frank Anscombe, “Graphs in Statistical Analysis,” American Statistician v.27 no.1 (1973), drawing by Schutz

Charles Joseph Minard, “Carte Figurative” of Napoleon’s 1812 Russian Campaign (1869)

The Minard Map: a map of Napoleon’s disastrous invasion of Russia in 1812, showing six distinct data dimensions (troop strength, temperature, distance marched, geographic latitude and longitude, direction of travel, location at event dates) in 2-dimensional form.

Mephistopheles:

Here too it’s masquerade, I find:
As everywhere, the dance of mind.
I grasped a lovely masked procession,
And caught things from a horror show…
I’d gladly settle for a false impression,
If it would last a little longer, though.

Edouard de Reszke as Mephistopheles
in Gounod’s opera “Faust” (c. 1880)

So, so you think you can tell
Heaven from Hell,
Blue skies from pain.
Can you tell a green field
From a cold steel rail?
A smile from a veil?
Do you think you can tell?

Roger Waters, “Wish You Were Here” (1975)

A great deal of intelligence can be invested in ignorance when the need for illusion is deep.

Saul Bellow, “To Jerusalem and Back” (1976)

It is difficult to get a man to understand something, when his salary depends on his not understanding it.

Upton Sinclair, “I, Candidate for Governor: And How I Got Licked” (1935)

Knowledge kills action; action requires the veils of illusion.

Friedrich Nietzsche, “The Birth of Tragedy” (1872)

To find out if she really loved me, I hooked her up to a lie detector. And just as I suspected, my machine was broken.

Jarod Kintz, “Love Quotes for the Ages. Specifically Ages 19-91” (2013)

Edward Tufte is a personal and professional hero of mine. Professionally, he’s best known for his magisterial work in data visualization and data communication through such classics as The Visual Display of Quantitative Information (1983) and its follow-on volumes, but less well-known is his outstanding academic work in econometrics and statistical analysis. His 1974 book Data Analysis for Politics and Policy remains the single best book I’ve ever read in terms of teaching the power and pitfalls of statistical analysis. If you’re fluent in the language of econometrics (this is not a book for the uninitiated) and now you want to say something meaningful and true using that language, you should read this book (available for $2 in Kindle form on Tufte’s website). Personally, Tufte is a hero to me for escaping the ivory tower, pioneering what we know today as self-publishing, making a lot of money in the process, and becoming an interesting sculptor and artist. That’s my dream. That one day when the Great Central Bank Wars of the 21st century are over, I will be allowed to return, Cincinnatus-like, to my Connecticut farm where I will write short stories and weld monumental sculptures in peace. That and beekeeping.

But until that happy day, I am inspired in my war-fighting efforts by Tufte’s skepticism and truth-seeking. The former is summed up well in an anecdote Tufte found in a medical journal and cites in Data Analysis:

One day when I was a junior medical student, a very important Boston surgeon visited the school and delivered a great treatise on a large number of patients who had undergone successful operations for vascular reconstruction. At the end of the lecture, a young student at the back of the room timidly asked, “Do you have any controls?” Well, the great surgeon drew himself up to his full height, hit the desk, and said, “Do you mean did I not operate on half of the patients?” The hall grew very quiet then. The voice at the back of the room very hesitantly replied, “Yes, that’s what I had in mind.” Then the visitor’s fist really came down as he thundered, “Of course not. That would have doomed half of them to their death.” God, it was quiet then, and one could scarcely hear the small voice ask, “Which half?”

‘Nuff said.

The latter quality — truth-seeking — takes on many forms in Tufte’s work, but most noticeably in his constant admonitions to LOOK at the data for hints and clues on asking the right questions of the data. This is the flip-side of the coin for which Tufte is best known, that good/bad visual representations of data communicate useful/useless answers to questions that we have about the world. Or to put it another way, an information-rich data visualization is not only the most powerful way to communicate our answers as to how the world really works, but it is also the most powerful way to design our questions as to how the world really works. Here’s a quick example of what I mean, using a famous data set known as “Anscombe’s Quartet”.
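For readers who want to check the "identical summary statistics" claim numerically before looking at the data, here is a minimal Python sketch. It assumes the standard published values of Anscombe's four data sets (eleven points each), typed in by hand. The names `QUARTET`, `summarize` and `SUMMARIES` are illustrative choices, not anything from Anscombe or Tufte. It computes the mean, sample variance, Pearson correlation and least-squares line for each set.

```python
# Anscombe's Quartet, standard published values (assumed here, typed in by hand).
# Sets I, II and III share the same x values; set IV has its own.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return the summary statistics named in the text for one (x, y) data set."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # ordinary least-squares slope of y on x
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "var_x": sxx / (n - 1),  # sample variance
        "var_y": syy / (n - 1),
        "corr": sxy / (sxx * syy) ** 0.5,  # Pearson correlation
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

for name, stats in SUMMARIES.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

Run as written, the four rows of output should agree almost exactly, which is the whole point: nothing in these numbers hints at how different the four data sets look once they are plotted.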

Anscombe’s Quartet
I II III IV
x y x y x y x y
10.0 8.04 10.0 9.14 10.0 7.46 8.0 6.58
8.0 6.95 8.0 8.14 8.0 6.77 8.0 5.76
13.0 7.58 13.0 8.74 13.0 12.74 8.0 7.71
9.0 8.81 9.0 8.77 9.0 7.11 8.0 8.84
11.0 8.33 11.0 9.26 11.0 7.81 8.0 8.47
14.0 9.96 14.0 8.10 14.0 8.84 8.0 7.04
6.0