Effective Data Visualisation, Visual Perception and Design Principles
Visual Perception
This section is about making sure plots are as easily interpretable as possible for the reader.
Visual channels
Channels are a concept from section 1.5 of Kieran Healy's book. See the figures in the reference for a list of visual channels ordered by effectiveness.
The listing is ordered from top to bottom, from most effective to least effective, where effectiveness is based on surveys asking people to decode data values from each visual channel.
Direct Area Visualisations
“…, we perceive the area in a pie chart differently from the same area in a bar plot. The fundamental reason is that human perception primarily judges distances and not areas. Thus, if a data value is encoded entirely as a distance, as is the case with the length of a bar, we perceive it more accurately than when the data value is encoded through a combination of two or more distances that jointly create an area.”
Functions to change the scale of point sizes (when using aes(size = xx) in fx. geom_point):
- ggplot2::scale_radius can be used to scale based on radius rather than area.
- ggplot2::scale_size_area can be used to make sure the ratio of sizes is equal to the ratio of data values.
Note! The above functions are generally not recommended, as they can break with the principle of proportional ink. However, there might still be cases where it makes sense to use them.
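As a minimal sketch of the two scales mentioned above, assuming ggplot2 is installed and using the built-in mtcars data (the variables and size ranges are chosen only for illustration):

```r
library(ggplot2)

# Base scatter plot with point size mapped to a numeric variable.
p <- ggplot(mtcars, aes(x = wt, y = mpg, size = hp)) +
  geom_point(alpha = 0.6)

# Point area proportional to the data value; a value of 0 maps to size 0.
p_area <- p + scale_size_area(max_size = 8)

# Point radius (rather than area) scaled linearly with the data value.
p_radius <- p + scale_radius(range = c(1, 8))
```

Printing p_area and p_radius side by side shows how much the choice of scale changes the perceived differences between points.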
Gestalt rules
“The strong inferences we make about relationships between visual elements from relatively sparse visual information are called “gestalt rules”. They are not pure perceptual effects… Rather, they describe our tendency to infer relationships between the objects we are looking at in a way that goes beyond what is strictly visible…”.
See slide 10 for a “ranking” of what rules are dominant over each other and should be used for creating relations between data values.
Data-to-ink ratio
Data-to-ink ratio (concept by Tufte): Tufte proposes to always maximise the data-to-ink ratio. Only add ANYTHING if it is really, really needed.
- Healy notes that “…, it may be the case that graphics that really do maximize the data-to-ink ratio are harder to interpret than those that are a little more relaxed about it.”
The principle of proportional ink
Bergstrom and West termed this concept the principle of proportional ink (Bergstrom and West 2016):
The principle of proportional ink: The sizes of shaded areas in a visualization need to be proportional to the data values they represent.
- Plots that use shaded areas (fx. bars) should in most cases include 0 on the y-axis.
- Be careful with heights/widths of bars.
- If values in a plot are far from 0, making it hard to see differences, consider changing the response to a difference of some sort.
- On logarithmic axes, bar lengths represent ratios. Make sure to represent them correctly.
- Consider using points instead of bars.
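The points above can be sketched as follows. This is only an illustration, assuming ggplot2 is installed; the scores data frame and its values are hypothetical.

```r
library(ggplot2)

# Hypothetical data: three values that are all far from 0.
scores <- data.frame(
  group = c("A", "B", "C"),
  value = c(96, 98, 101)
)

# Bars: geom_col() starts every bar at 0, so bar length is
# proportional to the value it represents.
p_bars <- ggplot(scores, aes(x = group, y = value)) +
  geom_col()

# Points: position (not area) encodes the value, so the y-axis
# does not need to include 0.
p_points <- ggplot(scores, aes(x = group, y = value)) +
  geom_point(size = 3)

# Differences from a reference value (here the mean) make small gaps visible
# while bars still start at 0.
scores$diff_from_mean <- scores$value - mean(scores$value)
p_diff <- ggplot(scores, aes(x = group, y = diff_from_mean)) +
  geom_col()
```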
Three kinds of problems that make figures bad, from sections 1.1 and 1.2 of “Data Visualization: A practical introduction” by Kieran Healy:
- Aesthetic: Ugly or inconsistent design choices
- Substantive: It’s not about making stuff look pretty but making sure to present data in the most informative way.
- In many cases, this is not an issue directly with the plot but with the data. Always relate it back to our question of interest.
- Perceptual: In these cases, even with good aesthetic qualities and good data, the graph will be confusing or misleading because of how people perceive and process what they are looking at.
- Fx. a continuous color scale for many categories can make it hard to distinguish colors from each other, and a color can appear lighter than it actually is if surrounded by darker colors.
Practical tips to making plots nicer
CRAP design principles
“The Non-Designer’s Design Book” by Robin Williams proposes four major design features that make an image more appealing and more effective: Contrast, Repetition, Alignment, and Proximity (CRAP).
See lecture notes about CRAP principles for more information about the meaning of the 4 design features. Below are short quotes about each principle from the link.
- Contrast: Elements of an image should be either very similar or very different. If elements of an image are supposed to be different, make them obviously different.
- See these slides by Andrew Heiss.
- Repetition: Specific colours or styles should be used to tie elements of an image together. This helps to create a more consistent, more cohesive, less chaotic, image.
- Fx. within an image by making all text have same color/size/… or across images in a report
- Alignment: All elements of an image should be aligned with some other anchor point in the image. This helps to create organisation and structure within an image, which can make it easier to navigate within the image.
- Fx. aligning the legend vertically with the top of the plotting region, aligning the title with the left of the plotting region, etc.
- See example in slides by Andrew Heiss of a compound figure.
- Proximity: Elements of an image should be placed close to each other to form sub-elements of the image.
- Fx. moving axis breaks closer to axis ticks or moving keys in a legend closer together to highlight that they “belong together”
See much more information in the readings in the link to lecture notes and some notes in the answer to lab 5. See especially this black themed plot in the answer to question 3 in the same lab.
Color pitfalls
- Encoding too much or irrelevant information
- Too much:
- Quote from reference in title: “As a rule of thumb, qualitative color scales work best when there are three to five different categories that need to be colored. Once we reach eight to ten different categories or more, the task of matching colors to categories becomes too burdensome to be useful, even if the colors remain sufficiently different to be distinguishable in principle.”
- Possible solutions: Label points (potentially color according to a categorisation variable with fewer levels while labelling according to the variable of interest)
- Irrelevant: Do not color without a clear purpose that helps answer a question.
Color vision deficiency
Tip: To make sure your figures work for people with color vision deficiency (CVD), don’t just rely on specific color scales. Instead, test your figures in a CVD simulator.
Color contrasts
We judge color relative to other colors, so colors might appear darker or lighter than they are depending on surrounding colors. See fx. section 1.3.1 from Kieran Healy.
Color “scales”
See slide 11.
Works for categorical variables:
- hue: Different colors
Works for numerical variables:
Make sure to keep hue constant while changing these.
- chroma: How “saturated” the color is. White, grey, black have no chroma.
- luminance: Darkness/brightness of the color. The scale goes from white to black.
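A small sketch of the three components using base R's hcl() function (from grDevices). The specific hue, chroma and luminance values are arbitrary choices for illustration:

```r
# Qualitative palette for categories: vary hue, keep chroma and luminance fixed.
qualitative <- hcl(h = c(0, 90, 180, 270), c = 60, l = 65)

# Sequential palette for numbers: keep hue fixed, vary luminance from dark to light.
sequential <- hcl(h = 260, c = 40, l = seq(30, 90, length.out = 5))

# Greys have zero chroma, so only luminance changes.
greys <- hcl(h = 0, c = 0, l = seq(0, 100, length.out = 5))
```

Each result is a character vector of hex colors that can be passed to fx. ggplot2::scale_colour_manual(values = ...).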
It’s not only about color
Redundant coding in essence means mapping a single variable to more than one aesthetic. Wilke fx. recommends:
“use color to enhance the visual appearance of the figure without relying entirely on color to convey key information.”
- Scatter plots: Fx. change the shape of points instead of only color.
- Line plots:
- Changing line type often does not work very well.
- “If there is a clear visual ordering in your data, make sure to match it in the legend.”
- By default, most plots order the legend alphabetically. Make sure to create a factor with levels ordered by magnitude.
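Both ideas can be sketched with the built-in iris data, assuming ggplot2 is installed. Ordering the levels by mean petal length is just one illustrative choice of "magnitude":

```r
library(ggplot2)

# Redundant coding: map the same variable to both colour and shape.
p_scatter <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width,
                              colour = Species, shape = Species)) +
  geom_point()

# Legend order: reorder factor levels by magnitude instead of alphabetically.
# The minus sign gives decreasing mean petal length.
iris_ordered <- iris
iris_ordered$Species <- reorder(iris_ordered$Species,
                                -iris_ordered$Petal.Length,
                                FUN = mean)

p_ordered <- ggplot(iris_ordered, aes(x = Sepal.Length, y = Petal.Length,
                                      colour = Species)) +
  geom_point()
```

The legend of p_ordered then lists the species from top to bottom in the same order as the point clouds appear in the plot.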
Designing figures without legends
“Whenever possible, design your figures so they don’t need a legend.”
See packages:
- ggtext for ways to fx. color text labels, titles, etc. See an example of coloring the title to replace a legend, with code available in the answer to lab 5.
- ggforce, which has geoms that take label as an aesthetic mapping
- directlabels
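As a rough sketch of direct labelling, the example below uses only plain ggplot2 (geom_text) rather than any of the packages listed above, so it is a substitute technique and not how those packages work. The series data frame is hypothetical.

```r
library(ggplot2)

# Hypothetical data: two series over five time points.
series <- data.frame(
  time  = rep(1:5, times = 2),
  value = c(1, 2, 4, 7, 11, 2, 3, 3, 4, 5),
  group = rep(c("fast", "slow"), each = 5)
)

# One label per series, placed at the last observation.
end_labels <- series[series$time == max(series$time), ]

p_direct <- ggplot(series, aes(x = time, y = value, colour = group)) +
  geom_line() +
  geom_text(data = end_labels, aes(label = group), hjust = 0, nudge_x = 0.1) +
  # extra room on the right so the labels are not clipped
  scale_x_continuous(expand = expansion(mult = c(0.05, 0.15))) +
  # the direct labels replace the legend
  theme(legend.position = "none")
```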
Titles, captions and tables
- Title: “Either the title is integrated into the actual figure display or it is provided as the first element of the caption underneath the figure.”
- Use title, subtitle and caption in labs.
- Axis and legend titles:
- Axis and legend titles should be informative.
- Display units in parentheses when possible.
- A legend title is not needed when it is self-explanatory. Fx. when the values displayed are “female” and “male”, so the variable is obviously sex/gender.
- Tables:
- See the 6-item breakdown of table guidelines in this section from the reference in the title of this section.
- Place table captions above the table (figure captions go underneath the figure).
Tip: Use expressions (R plotmath) to make LaTeX-like rendered text. Fx. write labs(x = expression(paste("Engine displacement (", "in"^3, ")"))) to render the exponent.
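A minimal sketch combining title, subtitle, caption and a unit with an exponent, assuming ggplot2 is installed. The title and caption wording is made up for illustration; the data are the built-in mtcars, where disp is recorded in cubic inches.

```r
library(ggplot2)

p_labelled <- ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point() +
  labs(
    title    = "Larger engines use more fuel",          # illustrative title
    subtitle = "32 cars from the built-in mtcars data",
    caption  = "Source: 1974 Motor Trend US magazine",
    # plotmath expression: "in"^3 renders as "in" with a superscript 3
    x = expression(paste("Engine displacement (", "in"^3, ")")),
    y = "Fuel economy (miles per gallon)"
  )
```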
Multi-panel figures
- Small multiples: “Small multiples are plots consisting of multiple panels arranged in a regular grid. Each panel shows a different subset of the data but all panels use the same type of visualization.”
- Fx. created by ggplot2::facet_wrap and ggplot2::facet_grid
- Be very careful with having different axis ranges across panels. The guideline is to almost never do this. If you HAVE to, make a note about it to the reader.
- Compound figures: “Compound figures consist of separate figure panels assembled in an arbitrary arrangement (which may or may not be grid based) and showing entirely different visualizations, or possibly even different datasets.”
- Unlike small multiples, where panels are labelled by variables, labels often need to be added manually to compound figures. Make sure they uniquely define each figure and are formatted (font, size, placement, etc.) nicely.
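For the small multiples case, a minimal sketch with ggplot2::facet_wrap could look like this, assuming ggplot2 is installed. The mtcars variables are chosen only for illustration, and compound figures are not covered here.

```r
library(ggplot2)

# Small multiples: one panel per number of cylinders, same plot type in each.
# The default scales = "fixed" keeps axis ranges identical across panels.
p_facets <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_wrap(vars(cyl))

# Free y-axes are possible but make panels harder to compare,
# so the caption tells the reader about it.
p_free <- p_facets +
  facet_wrap(vars(cyl), scales = "free_y") +
  labs(caption = "Note: y-axis ranges differ between panels.")
```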
Overplotting
Issue: points that overlap.
Possible fixes:
- Transparent points
- Jitter
- High density of points: Use 2D histogram
- See ggplot2::geom_bin2d and ggplot2::geom_hex (hexagons are more “correct”)
- See also smoothScatter in the base graphics package, which makes a 2D kernel density estimate but also shows points “on the edge”.
- Contour lines
- See ggplot2::geom_density2d or ggplot2::geom_density2d_filled
- Be wary of ggplot2::geom_density2d not showing the entire range of data
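The fixes above can be sketched as follows, assuming ggplot2 is installed. The pts data are simulated purely for illustration. geom_hex is left out because it needs the separate hexbin package, and the contour layer relies on the MASS package when the plot is drawn.

```r
library(ggplot2)

# Simulated data with heavy overlap (values are made up for illustration).
set.seed(1)
pts <- data.frame(x = rnorm(5000), y = rnorm(5000))
base_plot <- ggplot(pts, aes(x = x, y = y))

# Transparent points: dense regions appear darker.
p_alpha <- base_plot + geom_point(alpha = 0.1)

# Jitter: spread out points that share the same discrete x value.
p_jitter <- ggplot(mpg, aes(x = cyl, y = hwy)) +
  geom_jitter(width = 0.2, height = 0)

# 2D histogram: count points in rectangular bins.
p_bins <- base_plot + geom_bin2d(bins = 40)

# Contour lines of a 2D kernel density estimate.
p_contour <- base_plot + geom_density_2d()
```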
References
Lecture notes and slides from STATS 787: Data Visualisation at UoA
References from inside lecture notes
- Top tier book!! “Fundamentals of Data Visualization” by Claus O. Wilke
- “Data Visualization: A practical introduction” by Kieran Healy.