Visualization with ggplot2

Lecture 10

Dr. Colin Rundel

The Grammar of Graphics

  • Visualization concept created by Leland Wilkinson (The Grammar of Graphics, 1999)

  • attempt to taxonomize the basic elements of statistical graphics

  • Adapted for R by Hadley Wickham (2009)

    • consistent and compact syntax to describe statistical graphics

    • highly modular as it breaks up graphs into semantic components

    • ggplot2 is not meant as a guide to which graph to use and how to best convey your data (more on that later), but it does have some strong opinions.

Terminology

A statistical graphic is a…

  • mapping of data

  • which may be statistically transformed (summarized, log-transformed, etc.)

  • to aesthetic attributes (color, size, xy-position, etc.)

  • using geometric objects (points, lines, bars, etc.)

  • and mapped onto a specific facet and coordinate system
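
As a rough illustration of how these terms line up with code, the sketch below builds one plot where each piece is labeled in a comment. It assumes the ggplot2 and palmerpenguins packages are installed; the choice of variables and of a mean summary is arbitrary and only meant to show where each term appears.

```r
library(ggplot2)
library(palmerpenguins)

p <- ggplot(
  data = penguins,                             # the data
  mapping = aes(x = species, y = body_mass_g)  # mapped to aesthetic attributes
) +
  geom_point(                                  # drawn with a geometric object
    stat = "summary", fun = mean,              # after a statistical transformation
    na.rm = TRUE
  ) +
  facet_wrap(~ island) +                       # placed onto facets
  coord_cartesian()                            # in a coordinate system

p
```

Each panel shows the mean body mass per species for one island; swapping any single component (for example the geom or the facet) leaves the rest of the description unchanged.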

Anatomy of a ggplot call

ggplot(
  data = [dataframe], 
  mapping = aes(
    x = [var x], y = [var y], 
    color = [var color], 
    shape = [var shape],
    ...
  )
) +
  geom_[some geom](
    mapping = aes(
      color = [var geom color],
      ...
    )
  ) +
  ... + # other geometries
  scale_[some axis]_[some scale]() +
  facet_[some facet]([formula]) +
  ... # other options

Data - Palmer Penguins

Measurements for penguins in the Palmer Archipelago: species, island, size (flipper length, body mass, bill dimensions), and sex.

library(ggplot2)
library(palmerpenguins)
penguins
# A tibble: 344 × 8
   species island    bill_length_mm bill_depth_mm
   <fct>   <fct>              <dbl>         <dbl>
 1 Adelie  Torgersen           39.1          18.7
 2 Adelie  Torgersen           39.5          17.4
 3 Adelie  Torgersen           40.3          18  
 4 Adelie  Torgersen           NA            NA  
 5 Adelie  Torgersen           36.7          19.3
 6 Adelie  Torgersen           39.3          20.6
 7 Adelie  Torgersen           38.9          17.8
 8 Adelie  Torgersen           39.2          19.6
 9 Adelie  Torgersen           34.1          18.1
10 Adelie  Torgersen           42            20.2
# ℹ 334 more rows
# ℹ 4 more variables: flipper_length_mm <int>,
#   body_mass_g <int>, sex <fct>, year <int>

Text <-> Plot

Start with the penguins data frame

ggplot(data = penguins)

Start with the penguins data frame, map bill depth to the x-axis

ggplot(
  data = penguins,
  mapping = aes(
    x = bill_depth_mm
  )
) 

Start with the penguins data frame, map bill depth to the x-axis and map bill length to the y-axis.

ggplot(
  data = penguins,
  mapping = aes(
    x = bill_depth_mm,
    y = bill_length_mm
  )
)

Start with the penguins data frame, map bill depth to the x-axis and map bill length to the y-axis. Represent each observation with a point

ggplot(
  data = penguins,
  mapping = aes(
    x = bill_depth_mm,
    y = bill_length_mm
  )
) + 
  geom_point()

Start with the penguins data frame, map bill depth to the x-axis and map bill length to the y-axis. Represent each observation with a point and map species to the color of each point.

ggplot(
  data = penguins,
  mapping = aes(
    x = bill_depth_mm,
    y = bill_length_mm
  )
) + 
  geom_point(
    mapping = aes(color = species)
  )

Start with the penguins data frame, map bill depth to the x-axis and map bill length to the y-axis. Represent each observation with a point and map species to the color of each point. Title the plot “Bill depth and length”

ggplot(
  data = penguins,
  mapping = aes(
    x = bill_depth_mm,
    y = bill_length_mm
  )
) +
  geom_point(
    mapping = aes(color = species)
  ) +
  labs(title = "Bill depth and length")

Start with the penguins data frame, map bill depth to the x-axis and map bill length to the y-axis. Represent each observation with a point and map species to the color of each point. Title the plot “Bill depth and length”, add the subtitle “Dimensions for Adelie, Chinstrap, and Gentoo Penguins”

ggplot(
  data = penguins,
  mapping = aes(
    x = bill_depth_mm,
    y = bill_length_mm
  )
) +
  geom_point(
    mapping = aes(color = species)
  ) +
  labs(
    title = "Bill depth and length",
    subtitle = paste(
      "Dimensions for Adelie,",
      "Chinstrap, and Gentoo",
      "Penguins")
  ) 

Start with the penguins data frame, map bill depth to the x-axis and map bill length to the y-axis. Represent each observation with a point and map species to the color of each point. Title the plot “Bill depth and length”, add the subtitle “Dimensions for Adelie, Chinstrap, and Gentoo Penguins”, label the x and y axes as “Bill depth (mm)” and “Bill length (mm)”, respectively

ggplot(
  data = penguins,
  mapping = aes(
    x = bill_depth_mm,
    y = bill_length_mm
  )
) +
  geom_point(
    mapping = aes(color = species)
  ) +
  labs(
    title = "Bill depth and length",
    subtitle = paste(
      "Dimensions for Adelie,",
      "Chinstrap, and Gentoo",
      "Penguins"),
    x = "Bill depth (mm)",
    y = "Bill length (mm)"
  )

Start with the penguins data frame, map bill depth to the x-axis and map bill length to the y-axis. Represent each observation with a point and map species to the color of each point. Title the plot “Bill depth and length”, add the subtitle “Dimensions for Adelie, Chinstrap, and Gentoo Penguins”, label the x and y axes as “Bill depth (mm)” and “Bill length (mm)”, respectively, label the legend “Species”

ggplot(
  data = penguins,
  mapping = aes(
    x = bill_depth_mm,
    y = bill_length_mm
  )
) +
  geom_point(
    mapping = aes(color = species)
  ) +
  labs(
    title = "Bill depth and length",
    subtitle = paste(
      "Dimensions for Adelie,",
      "Chinstrap, and Gentoo",
      "Penguins"),
    x = "Bill depth (mm)",
    y = "Bill length (mm)",
    color = "Species"
  ) 

Start with the penguins data frame, map bill depth to the x-axis and map bill length to the y-axis. Represent each observation with a point and map species to the color of each point. Title the plot “Bill depth and length”, add the subtitle “Dimensions for Adelie, Chinstrap, and Gentoo Penguins”, label the x and y axes as “Bill depth (mm)” and “Bill length (mm)”, respectively, label the legend “Species”, and add a caption for the data source.

ggplot(
  data = penguins,
  mapping = aes(
    x = bill_depth_mm,
    y = bill_length_mm
  )
) +
  geom_point(
    mapping = aes(color = species)
  ) +
  labs(
    title = "Bill depth and length",
    subtitle = paste(
      "Dimensions for Adelie,",
      "Chinstrap, and Gentoo",
      "Penguins"),
    x = "Bill depth (mm)",
    y = "Bill length (mm)",
    color = "Species",
    caption = "Source: palmerpenguins package"
  )

Start with the penguins data frame, map bill depth to the x-axis and map bill length to the y-axis. Represent each observation with a point and map species to the color of each point. Title the plot “Bill depth and length”, add the subtitle “Dimensions for Adelie, Chinstrap, and Gentoo Penguins”, label the x and y axes as “Bill depth (mm)” and “Bill length (mm)”, respectively, label the legend “Species”, and add a caption for the data source. Finally, use the viridis color palette for all points.

ggplot(
  data = penguins,
  mapping = aes(
    x = bill_depth_mm,
    y = bill_length_mm
  )
) +
  geom_point(
    mapping = aes(color = species)
  ) +
  labs(
    title = "Bill depth and length",
    subtitle = paste(
      "Dimensions for Adelie,",
      "Chinstrap, and Gentoo",
      "Penguins"),
    x = "Bill depth (mm)",
    y = "Bill length (mm)",
    color = "Species",
    caption = "Source: palmerpenguins package"
  ) +
  scale_color_viridis_d()

Aesthetics

Aesthetics options

Commonly used characteristics of plotting geometries that can be mapped to a specific variable in the data. Examples include:

  • position (x, y)
  • color
  • shape
  • size
  • alpha (transparency)

Different geometries have different aesthetics available - see the ggplot2 geoms help files for listings.

  • Aesthetics given in ggplot() apply to all geoms.

  • Aesthetics for a specific geom_*() can be overridden via mapping or as an argument.
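
The two bullets above can be sketched as three variations on one base plot. This is only an illustration under the assumption that ggplot2 and palmerpenguins are installed; the object names (base, p_inherit, p_mapping, p_argument) are made up for this example.

```r
library(ggplot2)
library(palmerpenguins)

# Aesthetics given in ggplot() are shared by every geom added later
base <- ggplot(
  penguins,
  aes(x = bill_depth_mm, y = bill_length_mm, color = species)
)

# 1. The geom inherits x, y, and color unchanged
p_inherit <- base + geom_point(na.rm = TRUE)

# 2. The geom overrides color via its own mapping (colored by island instead)
p_mapping <- base + geom_point(aes(color = island), na.rm = TRUE)

# 3. The geom overrides color via an argument (every point is grey)
p_argument <- base + geom_point(color = "grey", na.rm = TRUE)
```

Printing the three objects side by side shows the same points colored by species, by island, and in a single fixed color.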

Color

ggplot(
  penguins, aes(x = bill_depth_mm, y = bill_length_mm)
) + 
  geom_point(
    aes(color = species)
  )
Warning: Removed 2 rows containing missing values or values outside the
scale range (`geom_point()`).

Stop the warning

ggplot(
  penguins, aes(x = bill_depth_mm, y = bill_length_mm)
) + 
  geom_point(
    aes(color = species), na.rm = TRUE
  
  )
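
Note that na.rm = TRUE only silences the warning; the incomplete rows are dropped either way. As a small sketch of an alternative, the code below counts the rows with missing bill measurements and removes them before plotting. The names bill_cols, n_missing, and penguins_complete are hypothetical and used only for this example.

```r
library(ggplot2)
library(palmerpenguins)

bill_cols <- c("bill_depth_mm", "bill_length_mm")

# Rows where at least one of the plotted variables is NA
n_missing <- sum(!complete.cases(penguins[, bill_cols]))
n_missing

# Keep only the rows that can actually be drawn
penguins_complete <- penguins[complete.cases(penguins[, bill_cols]), ]

p <- ggplot(
  penguins_complete, aes(x = bill_depth_mm, y = bill_length_mm)
) +
  geom_point(aes(color = species))
```

The count matches the number of rows reported in the warning, which is a quick way to confirm that the warning comes from missing values rather than from points outside the scale range.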

Shape

Mapped to a different variable than color

ggplot(
  penguins, aes(x = bill_depth_mm, y = bill_length_mm)
) +
  geom_point(
    aes(color = species, shape = island), na.rm = TRUE
  )

Shape

Mapped to same variable as color

ggplot(
  penguins, aes(x = bill_depth_mm, y = bill_length_mm)
) +
  geom_point(
    aes(color = species, shape = species), na.rm = TRUE
  )

Size

Using a fixed value - note that this value is outside of the aes call

ggplot(
  penguins, aes(x = bill_depth_mm, y = bill_length_mm)
) +
  geom_point(
    aes(color = species, shape = species), na.rm = TRUE,
    size = 3
  )

Size

Mapped to a variable

ggplot(
  penguins, aes(x = bill_depth_mm, y = bill_length_mm)
) +
  geom_point(
    aes(color = species, shape = species, size = body_mass_g), na.rm = TRUE
  )

Alpha

ggplot(
  penguins,
  aes(x = bill_depth_mm, y = bill_length_mm)
) +
  geom_point(
    aes(color = species, shape = species, alpha = body_mass_g), na.rm = TRUE,
    size = 3
  )

Mapping vs settings

  • Mapping - Determine an aesthetic (the size, alpha, etc.) of a geom based on the values of a variable in the data
    • wrapped by aes() and passed as the mapping argument to ggplot() or geom_*().


  • Setting - Determine an aesthetic (the size, alpha, etc.) of a geom using a constant value not directly from the data.
    • passed directly into geom_*() as an argument.


In the previous slide, color, shape, and alpha were all mappings, while size was a setting.
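
A common way to mix the two up is to put a constant inside aes(). The sketch below contrasts the two forms; it assumes ggplot2 and palmerpenguins are installed, and the object names are made up for illustration.

```r
library(ggplot2)
library(palmerpenguins)

base <- ggplot(penguins, aes(x = bill_depth_mm, y = bill_length_mm))

# Setting: the constant is passed outside aes(), so the points are drawn in blue
p_setting <- base + geom_point(color = "blue", na.rm = TRUE)

# Mapping a constant: inside aes() the string "blue" is treated as data,
# i.e. a variable with a single value, so the points get the default
# palette color and a legend entry labeled "blue"
p_mapped_constant <- base + geom_point(aes(color = "blue"), na.rm = TRUE)
```

If a plot shows an unexpected legend with a single entry, a constant placed inside aes() is a likely cause.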

Labels

labs()

In our previous example we saw the use of labs() to provide human readable labels to various plot elements.

ggplot(
  data = penguins,
  mapping = aes(
    x = bill_depth_mm,
    y = bill_length_mm
  )
) +
  geom_point(
    mapping = aes(color = species), na.rm = TRUE
  ) +
  labs(
    title = "Bill depth and length",
    subtitle = paste(
      "Dimensions for Adelie,",
      "Chinstrap, and Gentoo",
      "Penguins"),
    x = "Bill depth (mm)",
    y = "Bill length (mm)",
    color = "Species",
    caption = "Source: palmerpenguins package"
  )

Labels

Instead of overriding with labs(), we can annotate the data so that the label is generated automatically, by attaching a "label" attribute to the appropriate column in our data frame.

p_labeled = penguins
attr(p_labeled$species, "label") = "Species"
attr(p_labeled$bill_depth_mm, "label") = "Bill depth (mm)"
attr(p_labeled$bill_length_mm, "label") = "Bill length (mm)"

ggplot(
  data = p_labeled,
  mapping = aes(
    x = bill_depth_mm,
    y = bill_length_mm
  )
) +
  geom_point(
    mapping = aes(color = species), na.rm = TRUE
  ) +
  labs(
    title = "Bill depth and length",
    subtitle = paste(
      "Dimensions for Adelie,",
      "Chinstrap, and Gentoo",
      "Penguins"),
    caption = "Source: palmerpenguins package"
  )
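
When many columns need labels, the attr() calls can be generated from a named vector. This is a minimal sketch of that idea in base R; the vector name col_labels is hypothetical, and only the three columns used in the plot are labeled.

```r
library(palmerpenguins)

# Column name -> human-readable label
col_labels <- c(
  species = "Species",
  bill_depth_mm = "Bill depth (mm)",
  bill_length_mm = "Bill length (mm)"
)

p_labeled <- penguins
for (col in names(col_labels)) {
  # Same effect as the individual attr() assignments, one column at a time
  attr(p_labeled[[col]], "label") <- col_labels[[col]]
}

attr(p_labeled$bill_depth_mm, "label")
```

The resulting p_labeled data frame can be passed to ggplot() exactly as before.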

Dictionary

Alternatively, we can pass a dictionary (a named lookup vector) to the dictionary argument of labs(); automatically derived labels are then looked up by variable name.

lookup = c(
  species = "Species",
  bill_depth_mm = "Bill depth {mm}",
  bill_length_mm = "Bill length {mm}"
)

ggplot(
  data = penguins,
  mapping = aes(
    x = bill_depth_mm,
    y = bill_length_mm
  )
) +
  geom_point(
    mapping = aes(color = species), na.rm = TRUE
  ) +
  labs(
    title = "Bill depth and length",
    subtitle = paste(
      "Dimensions for Adelie,",
      "Chinstrap, and Gentoo",
      "Penguins"),
    caption = "Source: palmerpenguins package",
    dictionary = lookup
  )

Faceting

Faceting

  • Smaller plots that display different subsets of the data

  • Useful for exploring conditional relationships and large data

  • Sometimes referred to as “small multiples”
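
Because each panel shows one subset of the data, it can help to check how many rows fall into each subset before faceting. The sketch below does this with a base R cross-tabulation of species by island; the name counts is made up for this example.

```r
library(palmerpenguins)

# Number of penguins for each species (rows) and island (columns)
counts <- table(penguins$species, penguins$island)
counts

# Number of species/island combinations that contain any data at all
sum(counts > 0)
```

Combinations with a count of zero correspond to panels that will be empty in a species-by-island grid.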

facet_grid

ggplot(
  penguins, aes(x = bill_depth_mm, y = bill_length_mm)
) + 
  geom_point(na.rm = TRUE) +
  facet_grid(
    species ~ island
  )  

Compare with …

ggplot(
  penguins, aes(x = bill_depth_mm, y = bill_length_mm)
) + 
  geom_point(
    aes(color = species, shape = island), na.rm = TRUE, size = 3
  )

Faceting and color

ggplot(
  penguins, aes(x = bill_depth_mm, y = bill_length_mm, color = species)
) +
  geom_point(na.rm = TRUE) +
  facet_grid(species ~ island)

Hiding legend elements

ggplot(
  penguins, aes(x = bill_depth_mm, y = bill_length_mm, color = species)
) +
  geom_point(na.rm = TRUE) +
  facet_grid(species ~ island) +
  guides(color = "none")

Facet layout - context

ggplot(
  penguins, aes(x = bill_depth_mm, y = bill_length_mm, color = species)
) +
  geom_point(color = "grey", alpha = 0.5, na.rm = TRUE, layout = "fixed") +
  geom_point(na.rm = TRUE) +
  facet_grid(species ~ island) +
  guides(color = "none")

Facet layout - annotation

ggplot(
  penguins, aes(x = bill_depth_mm, y = bill_length_mm, color = species)
) +
  geom_point(color = "grey", alpha = 0.5, na.rm = TRUE, layout = "fixed") +
  geom_point(na.rm = TRUE) +
  facet_grid(species ~ island) +
  guides(color = "none") +
  geom_text(
    x = 17.5, y = 35, label = "Only sampled on Dream", size = 6, color = "black", 
    layout = 5
  ) +
  geom_text(
    x = 17.5, y = 35, label = "Only sampled on Biscoe", size = 6, color = "black", 
    layout = 7
  )
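
The integer values given to layout refer to panel numbers. The sketch below reconstructs that numbering in base R under the assumption that panels in this grid are numbered row by row, left to right, with species as rows and island as columns. It does not query ggplot2 itself, and the name panels is made up for this example.

```r
library(palmerpenguins)

# expand.grid() varies its first argument fastest, so island (the columns)
# changes within each species (the rows), mimicking row-by-row numbering
panels <- expand.grid(
  island = levels(penguins$island),
  species = levels(penguins$species),
  stringsAsFactors = FALSE
)
panels$panel <- seq_len(nrow(panels))

# The two panels targeted by the annotations
panels[panels$panel %in% c(5, 7), ]
```

Under that assumption, panel 5 is the Chinstrap row in the Dream column and panel 7 is the Gentoo row in the Biscoe column, which is consistent with the text of the two annotations.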

Facet axes

ggplot(
  penguins, aes(x = bill_depth_mm, y = bill_length_mm, color = species)
) +
  geom_point(color = "grey", alpha = 0.5, na.rm = TRUE, layout = "fixed") +
  geom_point(na.rm = TRUE) +
  facet_grid(species ~ island, axes = "all") +
  guides(color = "none")

Facet axes - labels

ggplot(
  penguins, aes(x = bill_depth_mm, y = bill_length_mm, color = species)
) +
  geom_point(color = "grey", alpha = 0.5, na.rm = TRUE, layout = "fixed") +
  geom_point(na.rm = TRUE) +
  facet_grid(species ~ island, axes = "all", axis.labels = "margins") +
  guides(color = "none")

facet_grid (columns)

ggplot(
  penguins, aes(x = bill_depth_mm, y = bill_length_mm)
) + 
  geom_point(na.rm = TRUE) +
  facet_grid(~ species)  

facet_grid (rows)

ggplot(
  penguins, aes(x = bill_depth_mm, y = bill_length_mm)
) + 
  geom_point(na.rm = TRUE) +
  facet_grid(species ~ .)  

facet_wrap

ggplot(
  penguins, aes(x = bill_depth_mm, y = bill_length_mm)
) + 
  geom_point(na.rm = TRUE) +
  facet_wrap(~ species)

facet_wrap

ggplot(
  penguins, aes(x = bill_depth_mm, y = bill_length_mm)
) + 
  geom_point(na.rm = TRUE) +
  facet_wrap(~ species, ncol = 2)

facet_wrap - direction

ggplot(
  penguins, aes(x = bill_depth_mm, y = bill_length_mm)
) + 
  geom_point(na.rm = TRUE) +
  facet_wrap(~ species, ncol = 2, dir = "br")

facet_wrap - direction

ggplot(
  penguins, aes(x = bill_depth_mm, y = bill_length_mm)
) + 
  geom_point(na.rm = TRUE) +
  facet_wrap(~ species, ncol = 2, dir = "rb")

Learning more

geom tour

Exercises

Exercise 1

Recreate, as faithfully as possible, the following plot using ggplot2 and the penguins data.

Exercise 2

Recreate, as faithfully as possible, the following plot from the palmerpenguins package README in ggplot2.