Chapter 1 R Setup

1.1 Preparing your environment for R

The Institute and Faculty of Actuaries have provided their own guide to getting up and running with R.

The steps to get R working depend on your operating system. The following resources should make installing R locally relatively painless:

  1. Download and install R from CRAN[1].
  2. Download and install an integrated development environment (IDE); RStudio Desktop is strongly recommended.

1.2 Basic interactions with R

R is case-sensitive, so x and X refer to different objects. Comments start with the # symbol: anything after # on a line is ignored by R. A key concept when working with R is that vectorised operations, which act on a whole vector at once, are preferred over explicit constructs such as for loops. As an example, 1:10 uses the colon operator (:) to generate the sequence starting at 1 and ending at 10 in steps of 1; the result is an integer vector. Let’s see this in R:

# This is the syntax for comments in R
(1:10) + 2 # Notice how we add element-wise in R
##  [1]  3  4  5  6  7  8  9 10 11 12
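To make the preference for vectorised code concrete, here is a small sketch that computes the same result twice: once in a single vectorised expression and once with an explicit for loop (the names vectorised and looped are purely illustrative). The loop has to pre-allocate its output with numeric() and fill it one element at a time.

```r
x <- 1:10

# Vectorised: the 2 is recycled and added to every element in one step
vectorised <- x + 2

# Explicit for loop: pre-allocate a double vector, then fill it element by element
looped <- numeric(length(x))
for (i in seq_along(x)) {
  looped[i] <- x[i] + 2
}

identical(vectorised, looped)
## [1] TRUE
```

Both give the same doubles, but the vectorised version is shorter, easier to read and, on large vectors, typically much faster.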

At the most basic level, R has six types of atomic vector:

  • integer,
  • numeric (equivalently, double),
  • logical, taking the Boolean values TRUE or FALSE, which are coerced to the integers 1 and 0 respectively,
  • character, whose values R prints wrapped in double quotes ("..."),
  • complex, and
  • raw
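The typeof() function reports which of these types a vector has, so a quick way to see all six is to ask for each one. Note that mode() would report "numeric" for both integers and doubles, which is why the list above treats them together. The values used here are arbitrary.

```r
typeof(1L)           # "integer" (the L suffix creates an integer)
typeof(2.5)          # "double"
typeof(TRUE)         # "logical"
typeof("actuary")    # "character"
typeof(1 + 2i)       # "complex"
typeof(as.raw(255))  # "raw"

mode(1L)             # "numeric", same as mode(2.5)

# Logicals are coerced to 1 and 0 in arithmetic
sum(c(TRUE, FALSE, TRUE))   # 2
as.integer(c(TRUE, FALSE))  # 1 0
```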

This book focuses on using R to solve actuarial statistical problems and will not explore the depths of the R language[2]. R has the usual arithmetic operators you’d expect from any programming language:

  • +, -, *, / for addition, subtraction, multiplication and division,
  • ^ for exponentiation,
  • %% for modulo arithmetic (remainder after division), and
  • %/% for integer division.
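A quick check of each operator on two arbitrary numbers. The last lines show how %/% and %% fit together, and that in R the result of %% takes the sign of the divisor, which can surprise readers coming from other languages.

```r
a <- 7
b <- 2
a + b    # 9
a - b    # 5
a * b    # 14
a / b    # 3.5
a ^ b    # 49
a %% b   # 1  (remainder)
a %/% b  # 3  (integer division)

# Integer division and remainder recombine to give back a
(a %/% b) * b + a %% b == a   # TRUE

# With a negative dividend, %% follows the sign of the divisor
(-7) %% 2    # 1
(-7) %/% 2   # -4
```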

We assign values to variables using the <- (“assignment”) operator[3].

x <- 1:10
y <- x + 2
x <- x + x # Notice that we can re-assign values to variables
z <- x + 2
y
##  [1]  3  4  5  6  7  8  9 10 11 12
z
##  [1]  4  6  8 10 12 14 16 18 20 22

Even though z is assigned the same way as y, note that y ≠ z: x was re-assigned between the two assignments, so execution order matters in R. All of x, y and z are vectors.

1.3 Functions in R

We call functions in R using the format function_name(argument = value, ...):

# c() is the "combine" function, used often to create vectors
# Note we can also nest functions within functions
x <- c(1:3, 6:20, 21:42, c(43, 44))
# Another function with arguments (sample() is random, so your output may differ):
y <- sample(x, size = 3)
y
## [1] 11 41 42

R has many built-in functions that we will use regularly:

  • factorial(x)
  • choose(n, k) - for binomial coefficients
  • exp(x)
  • log(x) - by default in base e
  • gamma(x)
  • abs(x) - absolute value
  • sqrt(x)
  • sum(x)
  • mean(x)
  • median(x)
  • var(x)
  • sd(x)
  • quantile(x, 0.75)
  • set.seed(seed) - for reproducibility of random number generation
  • sample(x, size)

R has a built-in help operator, ?, which opens the documentation for any function as well as for broader topic areas. For example, have a look at ?Special for details of R’s built-in special mathematical functions, including beta() and gamma().

1.4 Data structures in R

We have already seen vectors as a data structure that is very common in R. We can identify the structure of an R “object” using the str(object) function.
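For a plain vector, str() prints the type abbreviation, the index range and the first few values on one line. A few illustrative calls, reusing planet names and values that appear later in this chapter:

```r
str(1:10)
##  int [1:10] 1 2 3 4 5 6 7 8 9 10
str(c(0.38, 0.904))
##  num [1:2] 0.38 0.904
str(c("Mercury", "Venus"))
##  chr [1:2] "Mercury" "Venus"
str(c(TRUE, FALSE))
##  logi [1:2] TRUE FALSE
```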

Matrices

Next we introduce the matrix structure. When working with matrices in R, note that matrix multiplication uses the %*% operator; the ordinary * operator multiplies matrices element by element:

first_matrix <- matrix(1:9, byrow = TRUE, nrow = 3)
first_matrix %*% first_matrix
##      [,1] [,2] [,3]
## [1,]   30   36   42
## [2,]   66   81   96
## [3,]  102  126  150
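For contrast, the same matrix multiplied with * squares each entry instead of computing the matrix product. The sketch also shows a few everyday matrix helpers (dim(), t() and [row, column] indexing), which are standard base R.

```r
first_matrix <- matrix(1:9, byrow = TRUE, nrow = 3)

# * multiplies element by element, so every entry is squared here
first_matrix * first_matrix
##      [,1] [,2] [,3]
## [1,]    1    4    9
## [2,]   16   25   36
## [3,]   49   64   81

dim(first_matrix)   # 3 3
t(first_matrix)     # transpose: rows become columns
first_matrix[2, 3]  # element in row 2, column 3: 6
```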

Dataframes

A data.frame is a very popular data structure in R. Each column must have the same length, but columns can be of different types (character, integer, logical, etc.).

# Input vectors for the data.frame
name <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune")
surface_gravity <- c(0.38, 0.904, 1, 0.3794, 2.528, 1.065, 0.886, 1.14)
# Create a data.frame from the vectors
solar_system <- data.frame(name, surface_gravity)
str(solar_system)
## 'data.frame':	8 obs. of  2 variables:
##  $ name           : chr  "Mercury" "Venus" "Earth" "Mars" ...
##  $ surface_gravity: num  0.38 0.904 1 0.379 2.528 ...
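A few common operations on the solar_system data.frame, rebuilt here so the snippet runs on its own. The new column name stronger_than_earth is just an illustrative choice; it is filled with a vectorised comparison, and order() sorts the rows.

```r
name <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune")
surface_gravity <- c(0.38, 0.904, 1, 0.3794, 2.528, 1.065, 0.886, 1.14)
solar_system <- data.frame(name, surface_gravity)

nrow(solar_system)   # 8
ncol(solar_system)   # 2

# Add a logical column computed from an existing column
solar_system$stronger_than_earth <- solar_system$surface_gravity > 1

# Reorder the rows from strongest to weakest surface gravity
by_gravity <- solar_system[order(solar_system$surface_gravity, decreasing = TRUE), ]
head(by_gravity, 3)
```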

Lists

A list is a versatile data structure in R because its elements can be of any type, including other lists. In fact, a data.frame is a special kind of list whose elements (the columns) are vectors of equal length; this is why, unlike a matrix, a data.frame can hold columns of different types.
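A minimal sketch of a list mixing types and lengths. The policy record below is hypothetical, chosen only to look vaguely actuarial; the final lines confirm that a data.frame is a list underneath.

```r
# Hypothetical policy record: elements of different types and lengths
policy <- list(
  id = "P001",
  premium = 250.5,
  claims = c(120, 80, 45),
  active = TRUE,
  cover = list(type = "motor", excess = 500)  # a list nested inside a list
)
length(policy)        # 5
policy$claims         # 120 80 45
policy$cover$excess   # 500

# A data.frame is a list of equal-length column vectors
df <- data.frame(x = 1:2, y = c("a", "b"))
typeof(df)    # "list"
is.list(df)   # TRUE
```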

We will come across a number of functions that return lists whilst working with actuarial statistics in R. For example, when we look at linear models we will use the lm(formula, data, ...) function, which returns an object of class “lm” that is a list underneath.

# Use Orange dataset
df <- Orange
# Fit a linear model to predict circumference from age
fitted_lm <- lm(circumference ~ age, df)
# Size of the list
length(fitted_lm)
## [1] 12
# Element names
names(fitted_lm)
##  [1] "coefficients"  "residuals"     "effects"       "rank"         
##  [5] "fitted.values" "assign"        "qr"            "df.residual"  
##  [9] "xlevels"       "call"          "terms"         "model"

We can access list elements by subsetting, noting the use of the [[ operator. Here we extract the “age” coefficient from the “coefficients” element of the list we called fitted_lm:

# Select [[1]] 1st element in the list, sub-select [2] 2nd element from that
fitted_lm[[1]][2] 
##       age 
## 0.1067703
# fitted_lm$coefficients is shorthand for fitted_lm[["coefficients"]]
fitted_lm$coefficients[2] 
##       age 
## 0.1067703
# Select element using matching character vector "age"
fitted_lm$coefficients["age"]
##       age 
## 0.1067703
# Equivalent: [[ extracts the element by name, then [ selects "age"
fitted_lm[["coefficients"]]["age"]
##       age 
## 0.1067703
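The difference between [ and [[ is worth seeing directly: [ returns a smaller list, whereas [[ extracts the element itself. Base R (the stats package) also provides coef(), an extractor for model coefficients. The sketch refits the same model so it runs on its own.

```r
fitted_lm <- lm(circumference ~ age, data = Orange)

# [ keeps the list wrapper: a list of length 1
is.list(fitted_lm["coefficients"])      # TRUE
length(fitted_lm["coefficients"])       # 1

# [[ pulls out the element: a named numeric vector of length 2
is.numeric(fitted_lm[["coefficients"]]) # TRUE
length(fitted_lm[["coefficients"]])     # 2

# coef() returns the same named vector
coef(fitted_lm)["age"]
```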

1.5 Logical expressions in R

R has built-in logical operators:

Operator    Description
< (<=)      less than (or equal to)
> (>=)      greater than (or equal to)
==          exactly equal to
!=          not equal to
!           NOT
&           AND (element-wise)
|           OR (element-wise)

We can use logical expressions to filter data by subsetting with the [...] syntax:

x <- 1:10
x[x != 5 & x < 7]
## [1] 1 2 3 4 6
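The other operators in the table work the same way: each comparison produces a logical vector, which can be used to subset, negated with !, or counted with sum() because TRUE is coerced to 1. The which() function returns the positions where a condition holds.

```r
x <- 1:10
x > 7                 # a logical vector of length 10
x[x < 3 | x > 8]      # OR: 1 2 9 10
x[!(x %% 2 == 0)]     # NOT even, i.e. odd: 1 3 5 7 9
sum(x >= 5)           # number of elements that are at least 5: 6
which(x == 4)         # position where the condition holds: 4
```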

For a data.frame we subset with [rows, columns], and the $ symbol selects a single column by name (see ?Extract for more help):

#data.frame[rows to select, columns to select]
solar_system[solar_system$name == "Jupiter", c(1:2)]
##      name surface_gravity
## 5 Jupiter           2.528
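The row condition can be any logical expression, and the column part can be a name rather than a position. Base R’s subset() function expresses the same filter with less repetition of the data.frame name; the data is rebuilt here so the snippet runs on its own.

```r
name <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune")
surface_gravity <- c(0.38, 0.904, 1, 0.3794, 2.528, 1.065, 0.886, 1.14)
solar_system <- data.frame(name, surface_gravity)

# Planets with stronger surface gravity than Earth, name column only
solar_system[solar_system$surface_gravity > 1, "name"]
## [1] "Jupiter" "Saturn"  "Neptune"

# The same filter with subset(); this returns a one-column data.frame
stronger <- subset(solar_system, surface_gravity > 1, select = name)
```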

1.6 Extending R with packages

We can extend R’s functionality by loading packages:

# Load the ggplot2 package
library(ggplot2)

Did you get an error from R trying this? A package must be installed before it can be loaded: run install.packages("package name") once, then load the package with library() in each new R session.

1.7 Importing data

R can import a wide variety of file formats, including:

  • .csv
  • .RData
  • .txt

We can import these using read.csv(), load() and read.table() respectively.
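A self-contained sketch that writes a tiny made-up data.frame to temporary files with the matching write functions (write.csv(), save() and write.table()) and reads each one back. It assumes R 4.0 or later, where read.csv() and read.table() keep text columns as character rather than converting them to factors.

```r
# A small made-up data.frame to round-trip through each format
planets <- data.frame(
  name = c("Mercury", "Venus"),
  surface_gravity = c(0.38, 0.904)
)

# .csv: write.csv() / read.csv()
csv_path <- tempfile(fileext = ".csv")
write.csv(planets, csv_path, row.names = FALSE)
from_csv <- read.csv(csv_path)

# .RData: save() / load(); load() restores objects under their saved names
rdata_path <- tempfile(fileext = ".RData")
save(planets, file = rdata_path)
rm(planets)
load(rdata_path)  # planets exists again

# .txt: write.table() / read.table(), here tab-separated with a header row
txt_path <- tempfile(fileext = ".txt")
write.table(planets, txt_path, sep = "\t", row.names = FALSE)
from_txt <- read.table(txt_path, header = TRUE, sep = "\t")
```

Note that load() does not return the data; it recreates the saved objects in the current environment.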


  1. CRAN is the Comprehensive R Archive Network; read more on the CRAN website.

  2. I fear this is already too in-depth for “basic interactions with R”, but for those who want to jump down the rabbit hole, see Hadley Wickham’s book Advanced R.

  3. We can also assign values using the more familiar = symbol, but this is generally discouraged; listen to Hadley Wickham.