Eric Leung Code and Data Learnings

Reflecting on exploratory versus explanatory data visualization

I still haven’t created examples for the #TidyTuesday project.

But while looking at other submissions, I compared them with the visualizations I was preparing and asked myself why I liked certain examples more than others, including my own. That reflection gave me real insight into the difference between exploratory and explanatory data visualizations.

Exploratory data analysis, as the name implies, is about exploring the data. The figures made for it can be quite complex and show a lot of data.

I noticed this faceted plot example. It is a nice plot, but it cannot be understood at a glance. It took me some time to read the legend and scan back and forth across all the years to understand its meaning.

This is what makes this a good exploratory plot. It invites the viewer to explore and think about the work and data.

Although it is complex, I think it is an exemplar of an exploratory plot, much like an infographic.

Here’s another good exploratory plot showing a network of the 300 most common transatlantic slave routes.

I really enjoyed this plot because of the various annotations scattered throughout the visual. They add meaning to the plot and make it easier to understand.

On the other hand, there are explanatory plots.

These are made with more thought and a clearer purpose about what they want to show.

For example, the linked plot below compares the number of paintings acquired from a prolific artist, Joseph Mallord William Turner, with the number acquired from everyone else.

These plots are typically not complex. The plot above is a standard histogram, the kind you learn about in middle or high school. However, it is very effective at telling you a "story" or message.

To me, it shows

  • how prolific an artist Joseph Mallord William Turner was, and
  • how many paintings the Tate Art Museum has acquired.

These points are immediately clear.

I wrote this post because, as I was creating my own visualizations for the #TidyTuesday project, I noticed that I wasn't as drawn to my own examples as I was to these others I found.

I then reflected on what kind of plot I was making and what insights or information I could learn from it. I realized I didn't have a clear purpose in creating the plot other than to try out a particular ggplot2 extension package, ggalt.

Although I may be overthinking it, this single exploration into the #TidyTuesday project has reminded me of what makes a good visualization. I hope to finally participate, share, and continue to learn from making more visualizations.

Side note: a great resource on exploratory data analysis is NIST's Engineering Statistics Handbook.