Chapter 1 R Setup

1.1 Preparing your environment for R

The Institute and Faculty of Actuaries have provided their own guide to getting up and running with R.

The steps to get R working depend on your operating system. The following resources should make your local installation of R relatively painless:

  1. Download and install R from CRAN[1].
  2. Download and install an integrated development environment; RStudio Desktop is a strong recommendation.

1.2 Basic interactions with R

R is case-sensitive! We add comments to our R code using the # symbol on any line. A key concept when working with R is that vectorised operations are preferred over constructs such as for loops. As an example, we start with 1:10, which uses the colon operator (:) to generate a sequence from 1 to 10 in steps of 1. The result is an integer vector. Let’s see this in R:

# This is the syntax for comments in R
(1:10) + 2 # Notice how we add element-wise in R
##  [1]  3  4  5  6  7  8  9 10 11 12
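
To make the preference for vectorised operations concrete, the following minimal sketch computes the same result twice, once with a for loop and once with a single vectorised expression. The variable names loop_result and vectorised_result are chosen here purely for illustration.

```r
# Add 2 to each element of 1:10 in two ways
x <- 1:10

# Loop version: pre-allocate the result, then fill it element by element
loop_result <- numeric(length(x))
for (i in seq_along(x)) {
  loop_result[i] <- x[i] + 2
}

# Vectorised version: one expression operates on the whole vector
vectorised_result <- x + 2

# Both approaches give the same values
all(loop_result == vectorised_result)
## [1] TRUE
```

The vectorised form is shorter and is usually the more idiomatic choice in R.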

At the most basic level, R vectors are atomic and can be of one of the following types:

  • integer,
  • numeric (equivalently, double),
  • logical, which takes the Boolean values TRUE or FALSE; these can be coerced into the integers 1 and 0 respectively,
  • character, which R displays wrapped in double quotes "",
  • complex, and
  • raw
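
As a quick check of the types listed above, the sketch below uses the base function typeof() on one value of each type. The example values are arbitrary.

```r
typeof(1L)           # the L suffix creates an integer
## [1] "integer"
typeof(1)            # numbers are stored as double by default
## [1] "double"
typeof(TRUE)
## [1] "logical"
typeof("actuary")
## [1] "character"
typeof(1 + 2i)
## [1] "complex"
typeof(as.raw(255))
## [1] "raw"
# Logical values are coerced to 1 and 0 in arithmetic
sum(c(TRUE, FALSE, TRUE))
## [1] 2
```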

This book focuses on using R to solve actuarial statistical problems and will not explore the depths of the R language[2]. R has the usual arithmetic operators you’d expect with any programming language:

  • +, -, *, / for addition, subtraction, multiplication and division,
  • ^ for exponentiation,
  • %% for modulo arithmetic (remainder after division)
  • %/% for integer division
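
A short sketch of these operators in action, using arbitrary small numbers, may help distinguish the three division-related operators:

```r
7 / 2     # division
## [1] 3.5
7 %/% 2   # integer division
## [1] 3
7 %% 2    # remainder after division
## [1] 1
2 ^ 10    # exponentiation
## [1] 1024
# The operators are vectorised too
(1:6) %% 2
## [1] 1 0 1 0 1 0
```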

We assign values to variables using the <- (“assignment”) operator[3].

x <- 1:10
y <- x + 2
x <- x + x # Notice that we can re-assign values to variables
z <- x + 2
y
##  [1]  3  4  5  6  7  8  9 10 11 12
z
##  [1]  4  6  8 10 12 14 16 18 20 22

Even though \(z\) is assigned using the same expression as \(y\), note that \(y \neq z\): \(x\) was re-assigned in between, so execution order matters in R. All of \(x\), \(y\) and \(z\) are vectors in R.

1.3 Functions in R

We call functions in R using the format function_name(arguments = values, ...):

# c() is the "combine" function, used often to create vectors
# Note we can also nest functions within functions
x <- c(1:3, 6:20, 21:42, c(43, 44))
# Another function with arguments (sample() is random, so your output will differ):
y <- sample(x, size = 3)
y
## [1] 11 41 42
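
Arguments can be matched by name or by position. The sketch below, which assumes an arbitrary seed of 42 and an illustrative vector called values, fixes the random seed so that both calls draw the same sample:

```r
values <- 1:44                       # illustrative vector, not from the text above

set.seed(42)                         # seed value chosen arbitrarily
by_name <- sample(values, size = 3)  # size matched by name

set.seed(42)                         # reset the seed to repeat the draw
by_position <- sample(values, 3)     # size matched by position

identical(by_name, by_position)
## [1] TRUE
```

Naming arguments tends to make calls easier to read, particularly for functions with many parameters.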

There are a lot of in-built functions in R that we may need:

  • factorial(x)
  • choose(n, k) - for binomial coefficients
  • exp(x)
  • log(x) - by default in base \(e\)
  • gamma(x)
  • abs(x) - absolute value
  • sqrt(x)
  • sum(x)
  • mean(x)
  • median(x)
  • var(x)
  • sd(x)
  • quantile(x, 0.75)
  • set.seed(seed) - for reproducibility of random number generation
  • sample(x, size)
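
A few of the functions listed above applied to small inputs, as a rough illustration. The vector name v is arbitrary.

```r
factorial(5)
## [1] 120
choose(5, 2)       # number of ways to choose 2 items from 5
## [1] 10
gamma(5)           # for a positive integer n, gamma(n) equals factorial(n - 1)
## [1] 24
log(exp(1))        # natural logarithm by default
## [1] 1

v <- 1:10
mean(v)
## [1] 5.5
median(v)
## [1] 5.5
var(v)             # sample variance (divides by n - 1)
## [1] 9.166667
quantile(v, 0.75)
##  75% 
## 7.75
```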

R has an in-built help function ? which can be used to read the documentation on any function as well as topic areas. For example have a look at ?Special for more details about in-built R functions for the beta and gamma functions.

1.4 Data structures in R

We have already seen vectors as a data structure that is very common in R. We can identify the structure of an R “object” using the str(object) function.

Matrices

Next we introduce the matrix structure. When interacting with matrices in R it is important to note that matrix multiplication requires the %*% operator, since * multiplies element by element:

first_matrix <- matrix(1:9, byrow = TRUE, nrow = 3)
first_matrix %*% first_matrix
##      [,1] [,2] [,3]
## [1,]   30   36   42
## [2,]   66   81   96
## [3,]  102  126  150
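
For contrast with %*%, this sketch shows element-wise multiplication, transposition and element selection on the same matrix. The matrix is re-created so that the example runs on its own.

```r
first_matrix <- matrix(1:9, byrow = TRUE, nrow = 3)
dim(first_matrix)            # number of rows and columns
## [1] 3 3
# * multiplies element by element, unlike %*%
first_matrix * first_matrix
##      [,1] [,2] [,3]
## [1,]    1    4    9
## [2,]   16   25   36
## [3,]   49   64   81
# t() returns the transpose
t(first_matrix)
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
# Select a single element with [row, column]
first_matrix[2, 3]
## [1] 6
```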

Dataframes

A data.frame is a very popular data structure used in R. Each column must have the same length, but columns can be of different types (strings, integers, booleans, etc.).

# Input vectors for the data.frame
name <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune")
surface_gravity <- c(0.38, 0.904, 1, 0.3794, 2.528, 1.065, 0.886, 1.14)
# Create a data.frame from the vectors
solar_system <- data.frame(name, surface_gravity)
str(solar_system)
## 'data.frame':	8 obs. of  2 variables:
##  $ name           : chr  "Mercury" "Venus" "Earth" "Mars" ...
##  $ surface_gravity: num  0.38 0.904 1 0.379 2.528 ...
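
Once a data.frame exists, a few base functions cover most day-to-day inspection. The sketch below rebuilds solar_system so that it is self-contained; the column name above_one is made up for this illustration.

```r
name <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune")
surface_gravity <- c(0.38, 0.904, 1, 0.3794, 2.528, 1.065, 0.886, 1.14)
solar_system <- data.frame(name, surface_gravity)

nrow(solar_system)   # number of rows (observations)
## [1] 8
ncol(solar_system)   # number of columns (variables)
## [1] 2
# Extract a column as a vector with $, then use it like any other vector
solar_system$name[which.max(solar_system$surface_gravity)]
## [1] "Jupiter"
# Add a new column by assigning a vector of the same length
solar_system$above_one <- solar_system$surface_gravity > 1
sum(solar_system$above_one)
## [1] 3
```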

Lists

A list is a versatile data structure in R as its elements can be of any type, including lists themselves. In fact a data.frame is a specific implementation of a list, which allows columns in a data.frame to have different types, unlike a matrix.
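
As a minimal sketch of these ideas, the hypothetical list below (the name policy and all of its fields are invented for illustration) mixes several types and nests another list:

```r
# A made-up record mixing an integer, a string, a numeric vector and a list
policy <- list(
  id = 101L,
  holder = "A. Smith",
  premiums = c(250, 260, 275),
  cover = list(type = "term", years = 10)
)

policy$holder              # select an element by name
## [1] "A. Smith"
policy[["premiums"]][2]    # select an element, then subset within it
## [1] 260
policy$cover$years         # reach into the nested list
## [1] 10
class(policy["id"])        # single brackets return a (smaller) list
## [1] "list"
class(policy[["id"]])      # double brackets return the element itself
## [1] "integer"
is.list(data.frame(a = 1:3))
## [1] TRUE
```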

We will come across a number of functions that return a list type whilst working with actuarial statistics in R. For example when we look at linear models we will make use of the lm(formula, data, ...) function which returns a list.

# Use Orange dataset
df <- Orange
# Fit a linear model to predict circumference from age
fitted_lm <- lm(circumference ~ age, df)
# Size of the list
length(fitted_lm)
## [1] 12
# Element names
names(fitted_lm)
##  [1] "coefficients"  "residuals"     "effects"       "rank"         
##  [5] "fitted.values" "assign"        "qr"            "df.residual"  
##  [9] "xlevels"       "call"          "terms"         "model"

We can access elements in the list using subsetting, noting the use of the [[ operator. Here we subset on “age” within the “coefficients” element in the list we called “fitted_lm”:

# Select [[1]] 1st element in the list, sub-select [2] 2nd element from that
fitted_lm[[1]][2] 
##       age 
## 0.1067703
# fitted_lm$coefficients is a shorthand for fitted_lm[["coefficients"]]
fitted_lm$coefficients[2] 
##       age 
## 0.1067703
# Select element using matching character vector "age"
fitted_lm$coefficients["age"]
##       age 
## 0.1067703
# Select elements using matching character vectors
fitted_lm[["coefficients"]]["age"]
##       age 
## 0.1067703

1.5 Logical expressions in R

R has built-in logical operators:

Operator   Description
< (<=)     less than (or equal to)
> (>=)     greater than (or equal to)
==         exactly equal to
!          NOT
&          AND (element-wise)
|          OR (element-wise)
!=         not equal to

We can use logical expressions to effectively filter data via subsetting the data using the [...] syntax:

x <- 1:10
x[x != 5 & x < 7]
## [1] 1 2 3 4 6
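
It can help to look at the intermediate logical vector that drives the subsetting. In this sketch the name keep is arbitrary:

```r
x <- 1:10
keep <- x != 5 & x < 7   # one TRUE or FALSE per element of x
keep
##  [1]  TRUE  TRUE  TRUE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE
which(keep)              # positions where the condition holds
## [1] 1 2 3 4 6
sum(keep)                # TRUE counts as 1, so this counts the matches
## [1] 5
```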

We can select a column from a data.frame using the $ operator, and rows and columns using [rows, columns] (see ?Extract for more help):

# data.frame[rows to select, columns to select]
solar_system[solar_system$name == "Jupiter", c(1:2)]
##      name surface_gravity
## 5 Jupiter           2.528
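
The same pattern extends to numeric conditions. The sketch below rebuilds solar_system so that it runs on its own, and also uses the base function subset() as an alternative to the bracket syntax; the thresholds are arbitrary.

```r
name <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune")
surface_gravity <- c(0.38, 0.904, 1, 0.3794, 2.528, 1.065, 0.886, 1.14)
solar_system <- data.frame(name, surface_gravity)

# Rows where surface_gravity exceeds 1, keeping only the name column
solar_system[solar_system$surface_gravity > 1, "name"]
## [1] "Jupiter" "Saturn"  "Neptune"
# subset() lets us refer to columns without the solar_system$ prefix
subset(solar_system, surface_gravity < 0.5)
##      name surface_gravity
## 1 Mercury          0.3800
## 4    Mars          0.3794
```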

1.6 Extending R with packages

We can extend R’s functionality by loading packages:

# Load the ggplot2 package
library(ggplot2)

Did you get an error from R trying this? Packages must be installed before they can be loaded, using install.packages("package_name").
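
One way to avoid that error is to check whether a package is installed before loading it. The helper is_installed() below is a hypothetical convenience function written for this sketch, built on the base function requireNamespace(); the installation lines are commented out because they need internet access.

```r
# Hypothetical helper: TRUE if a package is installed, without attaching it
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

is_installed("stats")   # the stats package ships with R
## [1] TRUE

# Install ggplot2 only when it is missing, then load it
# if (!is_installed("ggplot2")) install.packages("ggplot2")
# library(ggplot2)
```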

1.7 Importing data

R can import a wide variety of file formats, including:

  • .csv
  • .RData
  • .txt

We can import these using read.csv(), load() and read.table() respectively.
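
As a self-contained sketch of all three import functions, the example below first writes a small made-up data.frame (called planets here) to temporary files and then reads each file back. In practice you would replace the temporary paths with the paths to your own files.

```r
# A small illustrative data.frame and three temporary file paths
planets <- data.frame(name = c("Earth", "Mars"), surface_gravity = c(1, 0.3794))
csv_path <- tempfile(fileext = ".csv")
txt_path <- tempfile(fileext = ".txt")
rdata_path <- tempfile(fileext = ".RData")

# Write the data out in each format
write.csv(planets, csv_path, row.names = FALSE)
write.table(planets, txt_path, row.names = FALSE)
save(planets, file = rdata_path)

# .csv: read.csv() returns a data.frame
from_csv <- read.csv(csv_path)
from_csv$surface_gravity
## [1] 1.0000 0.3794

# .txt: read.table() needs header = TRUE to use the first line as column names
from_txt <- read.table(txt_path, header = TRUE)
nrow(from_txt)
## [1] 2

# .RData: load() restores objects under their original names
rm(planets)
load(rdata_path)
nrow(planets)
## [1] 2
```

Note that load() does not return the data itself; it re-creates the saved objects in the current environment.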


  1. CRAN is the Comprehensive R Archive Network - read more on the CRAN website.

  2. I fear this is already too in-depth for “basic interactions with R” but for those that want to jump down the rabbit hole, see Hadley Wickham’s book Advanced R.

  3. We can also assign values using the more familiar = symbol. In general this is discouraged, listen to Hadley Wickham.