Plotting is critically important when conducting data analysis. In R, you craft a plot by calling successive functions that build it up piece by piece. Just as with a house, you start with the foundation and progress one step at a time until the plot is complete. A best practice when working with charts in R is to think in two phases: (1) creating a plot and (2) annotating it (adding lines, points, text, etc.). R's plotting system is robust and offers a high degree of flexibility and control over charts, which you will come to enjoy.
Plotting System
If you are trying to get to the core of R's graphics engine, remember the following two packages:
- graphics: contains the base plotting functions, such as plot, hist, and boxplot
- grDevices: contains the graphics devices, such as PDF, PostScript, and PNG
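To make the division of labor between the two packages concrete, here is a minimal sketch that sends a base plot to a PDF file instead of the screen. The output path, the plot labels, and the choice of the built-in warpbreaks data set are illustrative assumptions, not part of the original post.

```r
library(datasets)

# Hypothetical output path; a temporary directory keeps the example self-contained.
out_file <- file.path(tempdir(), "breaks-hist.pdf")

pdf(out_file, width = 6, height = 4)   # open a PDF graphics device (grDevices)
hist(warpbreaks$breaks,                # draw with a function from graphics
     main = "Warp breaks per loom",
     xlab = "Number of breaks")
dev.off()                              # close the device so the file is written

file.exists(out_file)
```

The same pattern (open a device, draw, call dev.off()) applies to the other file devices in grDevices.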
Another very important system is the lattice plotting system, which is implemented by the following packages:
- lattice: contains the code for creating Trellis graphics with functions like xyplot, bwplot, and levelplot
- grid: lattice builds on top of grid, so you will seldom call functions from this package directly
The Process of Making a Plot
During this phase it is important to consider what you would like to accomplish with the plot.
A few questions that you may want to think about before proceeding are:
- Where should I make the plot? (on the screen?, in a file?, etc)
- How is the plot going to be used?
  - Is it just for me to conduct exploratory data analysis (temporary)?
  - Will this be going to a browser online?
  - Will this end up in a publication of sorts?
  - Is this going to be in a presentation?
- Is it going to be just a few points of data or a large amount of data?
- Will I need to have a dynamic graphic?
- What graphic package should I aim to use (base, lattice, or ggplot2)?
It is important to note that base graphics are generally constructed in a modular fashion. This means that each part of the plot is built up one step at a time using a series of function calls. Many data scientists like this approach because it mirrors the way we think.
Alternatively, the lattice package requires that you define all parameters up front in a single function call, which allows lattice to calculate the appropriate spacing and font sizes for you.
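As a rough illustration of the "everything up front" style, the sketch below describes an entire lattice figure in one call. It assumes the lattice package is installed (it ships with standard R distributions); the variables, layout, and labels are my own choices for the example.

```r
library(datasets)
library(lattice)

# One call describes the whole figure: the formula, the conditioning
# variable (Diet), the panel layout, and the labels are all supplied at once.
p <- xyplot(weight ~ Time | Diet, data = ChickWeight,
            layout = c(4, 1),
            xlab = "Age (days)", ylab = "Weight (g)",
            main = "Chick weight by diet")

print(p)  # lattice functions return a "trellis" object; printing draws it
```

Because the full specification is known before anything is drawn, lattice can size the four panels and their shared axes consistently.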
ggplot2 is a fine package that combines elements from both base and lattice; however, it uses an independent implementation, so we will not cover it in this post.
Base Graphics
If you are interested in creating 2-D graphics, then you should use the base graphics system.
This is a two-step process:
- Initialize a new plot
- Add to an existing plot
You initialize a plot by calling plot(x, y) or hist(x). This launches a graphics device (if one is not already open) and draws a new plot on it. If the arguments to plot are not of some special class, then the default method for plot is called. Keep in mind that this method has many arguments that let you change things like the title, x-axis label, y-axis label, etc. If you want to investigate further what can actually be changed, key in ?par. This will bring up the help page for you.
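The two steps can be sketched as follows. This is only an illustration under my own assumptions: it uses the built-in airquality data set, and the title, labels, and reference line are choices made for the example.

```r
library(datasets)

# Step 1: initialize a new plot, setting the title and axis labels in the call.
with(airquality, plot(Wind, Ozone,
                      main = "Ozone and Wind in New York City",
                      xlab = "Wind (mph)",
                      ylab = "Ozone (ppb)"))

# Step 2: add to the existing plot, here a dashed line at the mean ozone level.
mean_ozone <- mean(airquality$Ozone, na.rm = TRUE)
abline(h = mean_ozone, lty = 2)
```

Note that na.rm = TRUE is needed because the Ozone column contains missing values.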
Simple Base Graphics: Histogram
    library(datasets)
    hist(warpbreaks$breaks)  ## Draw a new plot
Simple Base Graphics: Scatterplot
    library(datasets)
    ## Column names are case-sensitive: the column is Chick, not chick
    with(ChickWeight, plot(weight, Chick))
Simple Base Graphics: Boxplot
    library(datasets)
    airquality <- transform(airquality, Month = factor(Month))
    boxplot(Ozone ~ Month, airquality, xlab = "Month", ylab = "Ozone (ppb)")
Some Important Base Graphics Parameters
Parameter Name | Definition |
---|---|
col: | the plotting color (see the colors() function for the available names) |
lty: | the line type (solid line by default) |
lwd: | the line width |
pch: | the plotting symbol (open circle by default) |
xlab: | character string for the x-axis label |
ylab: | character string for the y-axis label |
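Here is a short sketch of how these parameters might be passed in practice. The color names, symbol code, and reference line are illustrative choices, not recommendations from the original post.

```r
library(datasets)

# Each argument below overrides one default for this call only.
with(ChickWeight, plot(Time, weight,
                       col = "steelblue",   # plotting color
                       pch = 19,            # filled circle instead of the open circle
                       xlab = "Age (days)",
                       ylab = "Weight (g)"))

# lty and lwd apply to lines, such as this dashed, thicker reference line
# drawn at the median weight.
abline(h = median(ChickWeight$weight), lty = "dashed", lwd = 2, col = "firebrick")
```

Both color names used here appear in the vector returned by colors().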
It is worthwhile to investigate the par() function. It sets the global graphics parameters, which affect all plots in an R session. Some of these defaults can be overridden for a single plot by passing the parameter as an argument to the plotting function. Important global parameters include the following:
Parameter Name | Definition |
---|---|
las: | the orientation of the axis labels on the plot |
bg: | the background color |
mar: | the margin size |
oma: | the outer margin size |
mfrow: | the number of rows and columns in the plot grid, with plots filled row-wise |
mfcol: | the number of rows and columns in the plot grid, with plots filled column-wise |
Default values for global graphic parameters:
    par("lty")
    ## [1] "solid"
    par("col")
    ## [1] "black"
    par("pch")
    ## [1] 1
    par("bg")
    ## [1] "white"
    par("mar")
    ## [1] 5.1 4.1 4.1 2.1
    par("mfrow")
    ## [1] 1 1
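Since changes made with par() persist for the rest of the session, one common pattern is to save the old values and restore them afterwards. The sketch below shows that pattern; the specific values for las, bg, and mar are arbitrary choices for the example.

```r
library(datasets)

# par() returns the previous values of whatever it changes, so they can be restored.
old <- par(las = 1,               # always draw axis labels horizontally
           bg = "grey95",         # light grey background
           mar = c(4, 4, 2, 1))   # bottom, left, top, right margins (in lines of text)

hist(warpbreaks$breaks, main = "Warp breaks", xlab = "Breaks")

par(old)      # put the earlier settings back
par("mar")    # the margins are back to their previous values
```

Restoring the saved values keeps later plots from silently inheriting settings meant for one figure.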
Base Plotting Functions
Function Name | Definition |
---|---|
plot: | makes a scatterplot (or another type of plot, depending on the class of the object) |
lines: | adds lines to a plot |
points: | adds points to a plot |
text: | adds text labels to a plot |
title: | adds a title, subtitle, and axis labels |
mtext: | adds text to the margins of the plot |
axis: | adds axis ticks and labels |
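The functions in this table can be combined on a single plot, roughly as sketched here. The summary being plotted (mean weight per day for diet 1) and all labels are assumptions made for the illustration.

```r
library(datasets)

# Mean weight per measurement day for diet 1, computed with base R only.
d1  <- subset(ChickWeight, Diet == 1)
avg <- aggregate(weight ~ Time, data = d1, FUN = mean)

plot(avg$Time, avg$weight, type = "n", axes = FALSE,
     xlab = "Age (days)", ylab = "Mean weight (g)")       # empty canvas, no axes yet
lines(avg$Time, avg$weight, lwd = 2)                        # connect the means
points(avg$Time, avg$weight, pch = 19)                      # mark each mean
text(max(avg$Time), max(avg$weight), "Diet 1", pos = 2)     # label at the top right
axis(1, at = avg$Time)                                      # x ticks at the measured days
axis(2, las = 1)                                            # y axis with horizontal labels
box()                                                       # frame around the plot region
title(main = "Growth on diet 1")
mtext("Source: datasets::ChickWeight", side = 1, line = 4, cex = 0.8)
```

Starting from type = "n" with axes = FALSE gives an empty canvas, so every visible element comes from an explicit annotation call.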
Base Plot with Annotation
    library(datasets)
    with(ChickWeight, plot(weight, Chick))
    title(main = "Chicks and Weight in Nashville")  ## Add a title

    ## Set the title in the plot call, then highlight diet 4 in blue
    with(ChickWeight, plot(weight, Chick, main = "Chicks and Weight in Nashville"))
    with(subset(ChickWeight, Diet == 4), points(weight, Chick, col = "blue"))

    ## Start from an empty plot (type = "n"), add each group, then add a legend
    with(ChickWeight, plot(weight, Chick, main = "Chicks and Weight in Nashville", type = "n"))
    with(subset(ChickWeight, Diet == 4), points(weight, Chick, col = "blue"))
    with(subset(ChickWeight, Diet != 4), points(weight, Chick, col = "red"))
    legend("topright", pch = 1, col = c("blue", "red"), legend = c("Not Normal", "Normal"))
Base Plot with Regression Line
    with(ChickWeight, plot(weight, Chick, main = "Chicks and Weight in Nashville", pch = 20))
    ## Chick is an ordered factor, so convert it to its numeric codes before fitting
    model <- lm(as.numeric(Chick) ~ weight, ChickWeight)
    abline(model, lwd = 2)
Multiple Base Plots
    par(mfrow = c(1, 2))  ## One row, two columns
    with(ChickWeight, {
        plot(weight, Chick, main = "Chicks and Weight")
        plot(Diet, weight, main = "Weight and Diet")
    })
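When several panels share one device, the mar and oma parameters become useful for tightening the panels and leaving room for a shared title. The following is a sketch under my own assumptions about margins, variables, and titles.

```r
library(datasets)

# A 1 x 2 grid with tighter inner margins and two lines of outer margin on top.
old <- par(mfrow = c(1, 2), mar = c(4, 4, 2, 1), oma = c(0, 0, 2, 0))

with(ChickWeight, {
  plot(Time, weight, main = "Weight and Age")        # left panel
  boxplot(weight ~ Diet, main = "Weight and Diet")   # right panel
})
mtext("ChickWeight overview", outer = TRUE)          # shared title in the outer margin

par(old)  # restore the previous layout and margins
```

Swapping mfrow for mfcol would fill the grid column by column instead, which only makes a visible difference once the grid has more than one row and more than one column.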