This is a study note on using the ggplot2 package for data visualisation. For more details on the study material, see https://r4ds.had.co.nz/data-visualisation.html.
# essential
library(tidyverse)
# combine several plots into one figure
library(cowplot)
This is a reusable template for making graphs with ggplot2. To make a graph, replace the bracketed sections in the code below with a dataset, a geom function, or a collection of mappings.
ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))

ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>),
                  stat = <STAT>,
                  position = <POSITION>) +
  <COORDINATE_FUNCTION> +
  <FACET_FUNCTION> +
  <THEME_FUNCTION>
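As a rough illustration, the extended template might be filled in as follows. The specific choices here (the mpg dataset that ships with ggplot2, geom_bar(), coord_flip(), facet_wrap() and theme_minimal()) are assumptions picked for this sketch, not part of the template itself.

```r
library(ggplot2)

# One possible way to fill every slot of the extended template.
filled <- ggplot(data = mpg) +                      # <DATA>
  geom_bar(mapping = aes(x = class, fill = drv),    # <GEOM_FUNCTION>, <MAPPINGS>
           stat = "count",                          # <STAT> (the default for geom_bar)
           position = "stack") +                    # <POSITION> (also the default)
  coord_flip() +                                    # <COORDINATE_FUNCTION>
  facet_wrap(~ year) +                              # <FACET_FUNCTION>
  theme_minimal()                                   # <THEME_FUNCTION>

print(filled)
```

Since stat = "count" and position = "stack" are already the defaults for geom_bar(), they are spelled out here only to show where each slot goes.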
- ggplot(data = ): creates a coordinate system that you can add layers to.
- data: the dataset to use for the plot.
- <GEOM_FUNCTION>(): adds a layer of geometry to the plot.
- mapping: defines how variables in your dataset are mapped to visual properties.
- aes(x = , y = ): always paired with mapping; x and y specify which variables to map to the x and y axes.

You can avoid repetition by passing a set of mappings to ggplot(). ggplot2 will treat these as global mappings that apply to each geom in the graph.

If you place mappings in a geom function, ggplot2 will treat them as local mappings for that layer. It will use these mappings to extend or overwrite the global mappings for that layer only. This makes it possible to display different aesthetics in different layers.
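# global mapping: both layers inherit x and y from ggplot()
plot.1 <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth()
# local mapping: each layer declares its own x and y
plot.2 <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  geom_smooth(mapping = aes(x = displ, y = hwy))
# local mapping extends the global one: only the points are coloured by class
plot.3 <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = class)) +
  geom_smooth()
# cowplot documents the lowercase values "none", "h", "v" and "hv" for align
plot_grid(plot.1, plot.2, plot.3, nrow = 3, labels = c("Global mapping", "Local mapping", "Overriding"), align = "v")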
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
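To see where global and local mappings end up, one option is to inspect the plot object without drawing it. The sketch below assumes the mapping fields of the plot and of its layers can be read directly; these are internal details of the ggplot2 object and may change between versions. Mapping cty in the smooth layer is a choice made only to show an overwrite.

```r
library(ggplot2)

# Global mappings live on the plot; local mappings live on each layer.
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(colour = class)) +  # extends: adds colour for this layer only
  geom_smooth(mapping = aes(y = cty))          # overwrites: this layer uses cty for y

global_aes <- names(p$mapping)               # aesthetics shared by all layers
point_aes  <- names(p$layers[[1]]$mapping)   # aesthetics local to geom_point()
smooth_aes <- names(p$layers[[2]]$mapping)   # aesthetics local to geom_smooth()

print(global_aes)
print(point_aes)
print(smooth_aes)
```

The point layer carries only the extra colour mapping and the smooth layer carries only its replacement y mapping; everything else is inherited from ggplot().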
GEOM_FUNCTION
A geom is the geometrical object that a plot uses to represent data. In ggplot2, applying different <GEOM_FUNCTION>() calls to the same variables produces different plots.
<GEOM_FUNCTION>() | Description |
---|---|
geom_point() | Points |
geom_abline() geom_hline() geom_vline() | Reference lines: horizontal, vertical, and diagonal |
geom_segment() geom_curve() | Line segments and curves |
geom_smooth() stat_smooth() | Smoothed conditional means |
geom_polygon() | Polygons |
geom_ribbon() geom_area() | Ribbons and area plots |
geom_bar() geom_col() stat_count() | Bar charts |
geom_bin2d() stat_bin_2d() | Heatmap of 2d bin counts |
geom_qq_line() stat_qq_line() geom_qq() stat_qq() | A quantile-quantile plot |
geom_density() stat_density() | Smoothed density estimates |
geom_density_2d() stat_density_2d() | Contours of a 2d density estimate |
geom_freqpoly() geom_histogram() stat_bin() | Histograms and frequency polygons |
geom_boxplot() stat_boxplot() | A box and whiskers plot (in the style of Tukey) |
geom_violin() stat_ydensity() | Violin plot |
geom_crossbar() geom_errorbar() geom_linerange() geom_pointrange() | Vertical intervals: lines, crossbars & errorbars |
For more geom functions and examples, see https://ggplot2.tidyverse.org/reference/.
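A small sketch of the point that different geoms give different plots of the same variables. The choice of drv and hwy from the mpg dataset, and of geom_boxplot() versus geom_violin(), is an assumption made for illustration.

```r
library(ggplot2)

# The same data and mappings, reused with two different geoms.
base <- ggplot(data = mpg, mapping = aes(x = drv, y = hwy))

box_plot    <- base + geom_boxplot()  # five-number summary per drive type
violin_plot <- base + geom_violin()   # density shape per drive type

print(box_plot)
print(violin_plot)
```

Keeping the shared part in base makes it clear that only the geom changes between the two plots.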
aes
You can add a third variable, like class, to a two dimensional scatterplot by mapping it to an aesthetic. An aesthetic is a visual property of the objects in your plot. Aesthetics include things like the size, the shape, the color, or alpha (transparency) of your points.
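# size: ggplot2 warns that size is not advised for a discrete variable
plot.1 <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, size = class))
# shape: only six shapes are used by default, so the seventh class goes unplotted
plot.2 <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, shape = class))
plot.3 <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))
# alpha: ggplot2 warns that alpha is not advised for a discrete variable
plot.4 <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, alpha = class))
plot_grid(plot.1, plot.2, plot.3, plot.4, labels = c("Size", "Shape", "Color", "Alpha"))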
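A related distinction, sketched here as a side note: an aesthetic placed inside aes() is mapped to a variable, while the same argument placed outside aes() is set to a fixed value for the whole layer. The colour "blue" and the variable drv are arbitrary choices for this sketch, and reading aes_params from the layer relies on ggplot2 internals.

```r
library(ggplot2)

# Mapped: colour varies with the drv variable and a legend is drawn.
mapped <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, colour = drv))

# Set: every point is blue; colour conveys no information about the data.
fixed <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), colour = "blue")

print(mapped)
print(fixed)
```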
stats
A handful of layers are more easily specified with a stat_ function, drawing attention to the statistical transformation rather than the visual appearance.
<STAT_FUNCTION>() | Description |
---|---|
stat_ecdf() | Compute empirical cumulative distribution |
stat_ellipse() | Compute normal confidence ellipses |
stat_function() | Compute function for each x value |
stat_identity() | Leave data as is |
stat_summary_2d() stat_summary_hex() | Bin and summarise in 2d (rectangle & hexagons) |
stat_summary_bin() stat_summary() | Summarise y values at unique/binned x |
stat_unique() | Remove duplicates |
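As one example of a layer written stat-first, the sketch below uses stat_summary() to draw the median depth of each diamond cut together with its minimum and maximum. It assumes ggplot2 3.3.0 or later, where the arguments are named fun, fun.min and fun.max; the choice of cut and depth is only for illustration.

```r
library(ggplot2)

# Summarise y (depth) at each unique x (cut): median with a min-max range.
summary_plot <- ggplot(data = diamonds) +
  stat_summary(
    mapping = aes(x = cut, y = depth),
    fun.min = min,
    fun.max = max,
    fun = median
  )

# layer_data() returns the values the stat computed for the layer.
summary_data <- layer_data(summary_plot)
print(summary_data[, c("x", "ymin", "y", "ymax")])

print(summary_plot)
```

Looking at the computed data alongside the plot makes the statistical transformation explicit: one row per cut, not one row per diamond.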
position
There is one more piece of magic associated with bar charts. You can colour a bar chart using either the colour aesthetic or, more usefully, fill. If you map fill to another variable, such as clarity, the bars are automatically stacked. The stacking is performed by the position adjustment specified by the position argument. If you don't want a stacked bar chart, you can use one of three other options: identity, dodge or fill.

- identity: places each object exactly where it falls in the context of the graph.
- dodge: places overlapping objects directly beside one another. This makes it easier to compare individual values.
- fill: works like stacking, but makes each set of stacked bars the same height.

plot.1 <- ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity)) +
  geom_bar(alpha = 1/5, position = "identity")
plot.2 <- ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity)) +
  geom_bar(alpha = 1/5, position = "dodge")
plot.3 <- ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity)) +
  geom_bar(alpha = 1/5, position = "fill")
# cowplot documents the lowercase values "none", "h", "v" and "hv" for align
plot_grid(plot.1, plot.2, plot.3, nrow = 3, labels = c("Identity", "Dodge", "Fill"), align = "v")
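One way to check what position = "fill" does is to look at the computed bar coordinates instead of the picture. This sketch uses layer_data() to pull them out; the variable names bars and top_of_stack are made up for the example.

```r
library(ggplot2)

fill_plot <- ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity)) +
  geom_bar(position = "fill")

# One row per (cut, clarity) rectangle, with its ymin and ymax after stacking.
bars <- layer_data(fill_plot)

# The top of the highest rectangle in each cut is the height of that stack.
top_of_stack <- tapply(bars$ymax, as.integer(bars$x), max)
print(top_of_stack)
```

Every stack should reach the same height, which is why this position is useful for comparing proportions across groups and not raw counts.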
COORDINATE_FUNCTION
Coordinate systems are probably the most complicated part of ggplot2. The default coordinate system is the Cartesian coordinate system where the x and y positions act independently to determine the location of each point. There are a number of other coordinate systems that are occasionally helpful.
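As a brief sketch of how a coordinate function changes a plot, the same bar chart is drawn below with the axes swapped by coord_flip() and in polar coordinates by coord_polar(). The diamonds data and the styling choices (width = 1, a square aspect ratio, no axis labels) are assumptions made for the example.

```r
library(ggplot2)

# A plain bar chart stored without being drawn.
bar <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = cut), show.legend = FALSE, width = 1) +
  theme(aspect.ratio = 1) +
  labs(x = NULL, y = NULL)

flipped <- bar + coord_flip()   # horizontal bars
polar   <- bar + coord_polar()  # the same bars wrapped around a circle

print(flipped)
print(polar)
```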
- coord_flip(): switches the x and y axes.
- coord_quickmap(): sets the aspect ratio correctly for maps.
- coord_polar(): uses polar coordinates.

FACET_FUNCTION
One way to add additional variables is with aesthetics. Another way, particularly useful for categorical variables, is to split your plot into facets, subplots that each display one subset of the data.

- facet_wrap(~ variable): facets a plot by a single variable. The variable that you pass to facet_wrap() should be discrete.
- facet_grid(variable.1 ~ variable.2): facets a plot on the combination of two variables.

ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ class, nrow = 2)
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(drv ~ cyl)
THEME_FUNCTION
# strip axes, legend, grid lines and backgrounds so that only the plotted data remains
theme(axis.line = element_blank(),
      axis.text.x = element_blank(),
      axis.text.y = element_blank(),
      axis.ticks = element_blank(),
      axis.title.x = element_blank(),
      axis.title.y = element_blank(),
      legend.position = "none",
      panel.background = element_blank(),
      panel.border = element_blank(),
      panel.grid.major = element_blank(),
      panel.grid.minor = element_blank(),
      plot.background = element_blank())
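A theme() call like this can be stored in a variable and added to any plot. The sketch below uses a shortened version of the settings and the hypothetical name blank_theme; the scatterplot of mpg is only there to have something to apply it to.

```r
library(ggplot2)

# A reusable theme object; axis.text, axis.title and panel.grid each cover
# their more specific children, for example axis.text.x and axis.text.y.
blank_theme <- theme(
  axis.line = element_blank(),
  axis.text = element_blank(),
  axis.ticks = element_blank(),
  axis.title = element_blank(),
  legend.position = "none",
  panel.background = element_blank(),
  panel.grid = element_blank()
)

bare_plot <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  blank_theme

print(bare_plot)
```

ggplot2 also ships theme_void(), a built-in theme that removes most non-data elements and may be enough when no fine control is needed.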
# isu_pd.RData is a local file, assumed to provide isu_pd_topday1718 and isu_football_s
load("isu_pd.RData")
ggplot(data = isu_pd_topday1718) +
  geom_point(mapping = aes(x = `day of the year`, y = n)) +
  geom_vline(data = isu_football_s, aes(xintercept = `day of the year`, color = home)) +
  facet_wrap(~lubridate::year(day), nrow = 2) +
  ylab("Number of police reports") +
  xlab("Day of the year") +
  labs(title = "ISU Police Department Report map with football days")
see reference: http://eriqande.github.io/rep-res-web/lectures/making-maps-with-R.html