Use of this document

This is a study note on using the ggplot2 package for data visualisation. For more details, see the study material at https://r4ds.had.co.nz/data-visualisation.html.

Prerequisites

# essential: ggplot2 and the rest of the tidyverse
library(tidyverse)
# combine several plots into one figure with plot_grid()
library(cowplot)

1. A graphing template

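This is a reusable template for making graphs with ggplot2. To make a graph, replace the angle-bracketed placeholders in the code below with a dataset, a geom function, and a collection of mappings; the full template also takes a stat, a position adjustment, and coordinate, facet, and theme functions.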

# minimal template
ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))

# full template
ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>),
                  stat = <STAT>,
                  position = <POSITION>) +
  <COORDINATE_FUNCTION> +
  <FACET_FUNCTION> +
  <THEME_FUNCTION>
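As a sketch of how the full template fills in, the plot below uses the built-in mpg data. The particular choices (geom_bar() with stat = "count", position = "dodge", coord_flip(), facet_wrap(~ year) and theme_minimal()) are illustrative assumptions, not part of the original note.

```r
library(ggplot2)

# Every placeholder of the full template, filled in with the built-in mpg data
p <- ggplot(data = mpg) +                         # <DATA>
  geom_bar(mapping = aes(x = class, fill = drv),  # <GEOM_FUNCTION>, <MAPPINGS>
           stat = "count",                        # <STAT>: count rows per x value
           position = "dodge") +                  # <POSITION>: bars side by side
  coord_flip() +                                  # <COORDINATE_FUNCTION>
  facet_wrap(~ year) +                            # <FACET_FUNCTION>
  theme_minimal()                                 # <THEME_FUNCTION>

# display with print(p), or type p at the console
```

Any placeholder can be dropped: ggplot2 falls back to the geom's default stat and position, Cartesian coordinates, no facets, and the default theme.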

2. Global and local mappings

You can avoid repetition by passing a set of mappings to ggplot(). ggplot2 will treat these mappings as global mappings that apply to each geom in the graph.

If you place mappings in a geom function, ggplot2 will treat them as local mappings for that layer. It uses them to extend or overwrite the global mappings for that layer only, which makes it possible to display different aesthetics in different layers.

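# global mapping: x and y are shared by both layers
plot.1 <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth()
# local mappings: the same plot, but each layer repeats x and y
plot.2 <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  geom_smooth(mapping = aes(x = displ, y = hwy))
# global x and y, extended by a local color mapping in the point layer only
plot.3 <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = class)) +
  geom_smooth()
# align = "v" (lowercase) aligns the stacked plots vertically
plot_grid(plot.1, plot.2, plot.3, nrow = 3,
          labels = c("Global mapping", "Local mapping", "Global + local"), align = "v")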
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
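A minimal sketch for checking which mappings each layer actually uses, assuming ggplot_build() is used only as an inspection tool: the built data of the point layer carries one colour per class, while the smooth layer, which only inherits the global x and y, has a single colour. The second plot shows a local mapping overwriting a global one. Passing method and formula explicitly to geom_smooth() silences the message shown above.

```r
library(ggplot2)

# Local color mapping extends the global x/y mapping in the point layer only
extended <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = class)) +
  geom_smooth(method = "loess", formula = y ~ x)
built <- ggplot_build(extended)
n_point_colours  <- length(unique(built$data[[1]]$colour))  # one per class
n_smooth_colours <- length(unique(built$data[[2]]$colour))  # default colour only

# Local y mapping overwrites the global y for this layer
overridden <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(y = cty))
y_used <- ggplot_build(overridden)$data[[1]]$y  # values of cty, not hwy
```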

3. Layer: GEOM_FUNCTION

A geom is the geometrical object that a plot uses to represent data. In ggplot2, applying different geom functions to the same variables produces different plots, for example a scatterplot with geom_point() versus a smoothed line with geom_smooth().

| <GEOM_FUNCTION>() | Description |
|---|---|
| geom_point() | Points |
| geom_abline(), geom_hline(), geom_vline() | Reference lines: diagonal, horizontal, and vertical |
| geom_segment(), geom_curve() | Line segments and curves |
| geom_smooth(), stat_smooth() | Smoothed conditional means |
| geom_polygon() | Polygons |
| geom_ribbon(), geom_area() | Ribbons and area plots |
| geom_bar(), geom_col(), stat_count() | Bar charts |
| geom_bin2d(), stat_bin_2d() | Heatmap of 2d bin counts |
| geom_qq_line(), stat_qq_line(), geom_qq(), stat_qq() | A quantile-quantile plot |
| geom_density(), stat_density() | Smoothed density estimates |
| geom_density_2d(), stat_density_2d() | Contours of a 2d density estimate |
| geom_freqpoly(), geom_histogram(), stat_bin() | Histograms and frequency polygons |
| geom_boxplot(), stat_boxplot() | A box and whiskers plot (in the style of Tukey) |
| geom_violin(), stat_ydensity() | Violin plot |
| geom_crossbar(), geom_errorbar(), geom_linerange(), geom_pointrange() | Vertical intervals: lines, crossbars and error bars |

For more geom functions and worked examples, see the ggplot2 reference at https://ggplot2.tidyverse.org/reference/.
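A minimal sketch of the point that the geom alone changes the plot, assuming the built-in mpg data: all three plots map the same variables (drv, hwy) and differ only in the geom. layer_data() is used here just to show that the computed data behind each plot differs too, because each geom has its own default stat.

```r
library(ggplot2)

# Same data and the same x/y mapping; only the geom changes
base <- ggplot(data = mpg, mapping = aes(x = drv, y = hwy))
p_point   <- base + geom_point()    # one point per car
p_boxplot <- base + geom_boxplot()  # Tukey box and whiskers per drv level
p_violin  <- base + geom_violin()   # mirrored density estimate per drv level

# The built data differs because each geom uses a different default stat
point_rows   <- nrow(layer_data(p_point))    # one row per car in mpg
boxplot_rows <- nrow(layer_data(p_boxplot))  # one row per drv level
```

The three plots can be placed side by side with plot_grid() from cowplot, as in section 2.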

3.1 aes

You can add a third variable, like class, to a two-dimensional scatterplot by mapping it to an aesthetic. An aesthetic is a visual property of the objects in your plot. Aesthetics include things like the size, the shape, the color, or the alpha (transparency) of your points.

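# size: ggplot2 warns that using size for a discrete variable is not advised
plot.1 <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, size = class))
# shape: only six shapes are used at a time, so the seventh class (suv) is dropped with a warning
plot.2 <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, shape = class))
# color: one hue per class
plot.3 <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))
# alpha: also warns that it is not advised for a discrete variable
plot.4 <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, alpha = class))
plot_grid(plot.1, plot.2, plot.3, plot.4, labels = c("Size", "Shape", "Color", "Alpha"))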
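A related distinction from the same study material is mapping an aesthetic versus setting it. A minimal sketch, assuming the built-in mpg data: inside aes() the colour is computed from class (one colour per class, with a legend); outside aes() it is a fixed value applied to every point. The colours are read back with ggplot_build() only for inspection.

```r
library(ggplot2)

# Mapping: colour depends on the value of class, and a legend is drawn
mapped <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))

# Setting: colour is a constant argument of the geom, outside aes()
set_manually <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

mapped_cols <- unique(ggplot_build(mapped)$data[[1]]$colour)
set_cols    <- unique(ggplot_build(set_manually)$data[[1]]$colour)
```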

3.2 stats

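A handful of layers are more easily specified with a stat_ function, drawing attention to the statistical transformation rather than the visual appearance. Every geom has a default stat (for example, geom_bar() uses stat_count()), and every stat has a default geom.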

| <STAT_FUNCTION>() | Description |
|---|---|
| stat_ecdf() | Compute the empirical cumulative distribution function |
| stat_ellipse() | Compute normal confidence ellipses |
| stat_function() | Compute a function for each x value |
| stat_identity() | Leave data as is |
| stat_summary_2d(), stat_summary_hex() | Bin and summarise in 2d (rectangles and hexagons) |
| stat_summary_bin(), stat_summary() | Summarise y values at binned/unique x |
| stat_unique() | Remove duplicates |
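A minimal sketch of stat_summary(), assuming the built-in mpg data and ggplot2 3.3.0 or later (where the summary function is passed as fun): the stat computes the mean hwy for each class, and the point geom only draws the result. layer_data() shows the summarised values.

```r
library(ggplot2)

# stat_summary() computes the mean hwy for each class, then draws it as a point
p <- ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
  stat_summary(fun = mean, geom = "point")

d <- layer_data(p)  # one row per class; the y column holds the computed means
```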

3.3 position

There is one more piece of magic associated with bar charts. You can colour a bar chart using either the colour aesthetic or, more usefully, fill. If you map fill to another variable, such as clarity, the bars are stacked automatically: each coloured rectangle represents a combination of cut and clarity. The stacking is performed by the position adjustment specified by the position argument. If you don't want a stacked bar chart, you can use one of three other options: "identity", "dodge" or "fill".

  • position = "identity" places each object exactly where it falls in the context of the graph. For bars this makes them overlap, so it helps to make them slightly transparent (here alpha = 1/5).
  • position = "dodge" places overlapping objects directly beside one another. This makes it easier to compare individual values.
  • position = "fill" works like stacking, but makes each set of stacked bars the same height. This makes it easier to compare proportions across groups.

# alpha = 1/5 keeps overlapping bars visible under position = "identity"
plot.1 <- ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity)) +
  geom_bar(alpha = 1/5, position = "identity")
plot.2 <- ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity)) +
  geom_bar(alpha = 1/5, position = "dodge")
plot.3 <- ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity)) +
  geom_bar(alpha = 1/5, position = "fill")
plot_grid(plot.1, plot.2, plot.3, nrow = 3, labels = c("Identity", "Dodge", "Fill"), align = "v")
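A minimal sketch of what the position adjustment does to the computed bars, assuming the built-in diamonds data: with the default "stack", the top of each bar equals the number of diamonds with that cut; with "fill", every bar is rescaled so its top is 1. layer_data() is used only to read back the computed ymax values.

```r
library(ggplot2)

base <- ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity))
stacked <- layer_data(base + geom_bar())                    # default position = "stack"
filled  <- layer_data(base + geom_bar(position = "fill"))  # proportions within each cut

# Height of each bar = the largest ymax among its stacked segments
stack_tops <- tapply(stacked$ymax, as.numeric(stacked$x), max)
fill_tops  <- tapply(filled$ymax, as.numeric(filled$x), max)
```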

4. Layer: COORDINATE_FUNCTION

Coordinate systems are probably the most complicated part of ggplot2. The default coordinate system is the Cartesian coordinate system, where the x and y positions act independently to determine the location of each point. A number of other coordinate systems are occasionally helpful, such as coord_flip() (swaps the x and y axes), coord_quickmap() (sets the aspect ratio correctly for maps) and coord_polar() (uses polar coordinates).
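A minimal sketch of two coordinate functions, assuming the built-in mpg and diamonds data: coord_flip() turns vertical boxplots into horizontal ones, and coord_polar() redraws the same bar chart in polar coordinates. The test only checks which coordinate system each plot object carries.

```r
library(ggplot2)

# Horizontal boxplots: coord_flip() swaps the x and y axes
box_flipped <- ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
  geom_boxplot() +
  coord_flip()

# A bar chart in the default Cartesian coordinates...
bar <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = cut), show.legend = FALSE, width = 1) +
  theme(aspect.ratio = 1) +
  labs(x = NULL, y = NULL)

# ...and the same bars in polar coordinates
coxcomb <- bar + coord_polar()
```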

5. Layer: FACET_FUNCTION

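One way to add additional variables is with aesthetics.
Another way, particularly useful for categorical variables, is to split your plot into facets: subplots that each display one subset of the data. Use facet_wrap() to facet by a single variable and facet_grid() to facet by the combination of two variables.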

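# facet by one variable, wrapping the panels into 2 rows
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ class, nrow = 2)

# facet by two variables: drv in rows, cyl in columns
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(drv ~ cyl)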
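A minimal sketch for counting the panels each facet specification creates, assuming the built-in mpg data. The helper n_panels() is hypothetical, written only for this note; it reads the panel layout that ggplot_build() returns. facet_grid() draws every combination of its row and column variables, and a "." in the formula means "do not facet in this dimension".

```r
library(ggplot2)

base <- ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy))

wrapped <- base + facet_wrap(~ class, nrow = 2)  # one panel per class
gridded <- base + facet_grid(drv ~ cyl)          # rows = drv, columns = cyl
by_cols <- base + facet_grid(. ~ cyl)            # columns only

# Hypothetical helper: the built layout has one row per panel
n_panels <- function(p) nrow(ggplot_build(p)$layout$layout)
```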

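6. Layer: THEME_FUNCTION

Themes control the non-data parts of a plot. The theme() call below sets the axis lines, text, ticks and titles, the panel background, border and grid lines, and the plot background to element_blank(), and hides the legend with legend.position = "none", so only the geoms remain. Add it to a plot with +, like any other layer.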

theme(axis.line=element_blank(),
      axis.text.x=element_blank(),
      axis.text.y=element_blank(),
      axis.ticks=element_blank(),
      axis.title.x=element_blank(),
      axis.title.y=element_blank(),
      legend.position="none",
      panel.background=element_blank(),
      panel.border=element_blank(),
      panel.grid.major=element_blank(),
      panel.grid.minor=element_blank(),
      plot.background=element_blank())
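A minimal sketch of applying themes, assuming the built-in mpg data: a theme() object can be stored and added to any plot with +, and complete themes such as theme_minimal() and theme_bw() replace every theme setting at once. The names no_legend, p_bare, p_minimal and p_bw are illustrative only; the test merely checks that each themed plot renders to a gtable.

```r
library(ggplot2)

p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))

# A partial theme: stored once, reusable on any plot
no_legend <- theme(legend.position = "none")
p_bare <- p + no_legend

# Complete themes override all theme settings
p_minimal <- p + theme_minimal()
p_bw      <- p + theme_bw()
```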

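7. Example

This example plots the daily number of ISU Police Department reports as points, marks football game days with vertical lines coloured by the home variable, and draws one panel per year. It needs the author's isu_pd.RData file, so it will not run without that data.

# isu_pd.RData is expected to provide isu_pd_topday1718 and isu_football_s
load("isu_pd.RData")
ggplot(data = isu_pd_topday1718) +
  # one point per day: the number of reports
  geom_point(mapping = aes(x = `day of the year`, y = n)) +
  # this layer uses its own data: one vertical line per football game day
  geom_vline(data = isu_football_s, mapping = aes(xintercept = `day of the year`, color = home)) +
  # one panel per year
  facet_wrap(~ lubridate::year(day), nrow = 2) +
  labs(x = "Day of the year", y = "Number of police reports",
       title = "ISU Police Department Report map with football days")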
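The key technique in this example is that each layer can take its own data argument. Since the ISU data is not available here, this is a runnable sketch of the same pattern on the built-in mpg data instead: points from mpg, plus one horizontal reference line per panel taken from a separate summary data frame (means, computed with base R aggregate()). Because means also contains the facet variable drv, each line lands in its own panel.

```r
library(ggplot2)

# Separate data frame for the reference layer: mean hwy per drv
means <- aggregate(hwy ~ drv, data = mpg, FUN = mean)

p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  # this layer uses its own data instead of mpg
  geom_hline(data = means, mapping = aes(yintercept = hwy), color = "red") +
  facet_wrap(~ drv)

lines_data <- layer_data(p, 2)  # one row per reference line
```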

8. ggmap

See this reference on making maps with R: http://eriqande.github.io/rep-res-web/lectures/making-maps-with-R.html