Chapter 1 R Setup
1.1 Preparing your environment for R
The Institute and Faculty of Actuaries have provided their own guide to getting up and running with R.
The steps to get R working depend on your operating system. The following resources should make your local installation of R relatively painless:
- Download and install R from CRAN¹.
- Download and install an integrated development environment; a strong recommendation is RStudio Desktop.
1.2 Basic interactions with R
R is case-sensitive! We add comments to our R code using the # symbol on any line. A key concept when working with R is that the preference is to work with vectorised operations (over concepts like for loops). As an example we start with 1:10, which uses the colon operator (:) to generate a sequence starting with 1 and ending with 10 in steps of 1. The output is a numeric vector of integers. Let’s see this in R:
# This is the syntax for comments in R
(1:10) + 2 # Notice how we add element-wise in R
## [1] 3 4 5 6 7 8 9 10 11 12
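To make the preference for vectorised operations concrete, here is a minimal sketch comparing an explicit for loop with the vectorised form. The variable names (looped, vectorised) are illustrative only.

```r
x <- 1:10

# Loop version: pre-allocate the result, then fill it one element at a time
looped <- numeric(length(x))
for (i in seq_along(x)) {
  looped[i] <- x[i] + 2
}

# Vectorised version: a single expression applied to the whole vector
vectorised <- x + 2

# Both approaches give the same values
all(looped == vectorised)
```

Both versions produce the same result; the vectorised form is shorter and is usually the more idiomatic choice in R.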
At the most basic level, R vectors can be of atomic modes:
- integer,
- numeric (equivalently, double),
- logical, which takes on the Boolean values TRUE or FALSE and can be coerced into integers as 1 and 0 respectively,
- character, which will be apparent in R with the wrapper "",
- complex, and
- raw.
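As a small illustration of these modes, the sketch below uses typeof() to inspect a few vectors and shows the coercion of logical values to 1 and 0. The vectors themselves are made up for the example.

```r
whole   <- 1:3               # integer vector
decimal <- c(1.5, 2, 3)      # double (numeric) vector
flags   <- c(TRUE, FALSE, TRUE)
words   <- c("a", "b")

typeof(whole)    # "integer"
typeof(decimal)  # "double"
typeof(flags)    # "logical"
typeof(words)    # "character"

# Logical values are coerced to 1 and 0 when used in arithmetic
n_true <- sum(flags)              # 2
flags_as_int <- as.integer(flags) # 1 0 1
```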
This book focuses on using R to solve actuarial statistical problems and will not explore the depths of the R language².
R has the usual arithmetic operators you’d expect with any programming language:
- +, -, *, / for addition, subtraction, multiplication and division,
- ^ for exponentiation,
- %% for modulo arithmetic (remainder after division),
- %/% for integer division.
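A quick sketch of the less familiar operators, using arbitrary example values a and b:

```r
a <- 17
b <- 5

a / b     # 3.4  (ordinary division)
a %/% b   # 3    (integer division)
a %% b    # 2    (remainder after division)
2^10      # 1024 (exponentiation)

# The operators are vectorised, like + in the earlier example
remainders <- (1:6) %% 3   # 1 2 0 1 2 0
```

Integer division and modulo are linked: (a %/% b) * b + (a %% b) recovers a.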
We assign values to variables using the <- (“assignment”) operator³.
x <- 1:10
y <- x + 2
x <- x + x # Notice that we can re-assign values to variables
z <- x + 2
y
## [1]  3  4  5  6  7  8  9 10 11 12
z
## [1]  4  6  8 10 12 14 16 18 20 22
Even though z is assigned the same way as we assigned y, note that y≠z because x was re-assigned in between, so execution order matters in R. All of x, y and z are vectors in R.
1.3 Functions in R
We call functions in R using the format function_name(arguments = values, ...):
# c() is the "combine" function, used often to create vectors
# Note we can also nest functions within functions
x <- c(1:3, 6:20, 21:42, c(43, 44))
# Another function with arguments:
y <- sample(x, size = 3)
y
## [1] 11 41 42
There are a lot of in-built functions in R that we may need:
- factorial(x)
- choose(n, k) - for binomial coefficients
- exp(x)
- log(x) - by default in base e
- gamma(x)
- abs(x) - absolute value
- sqrt(x)
- sum(x)
- mean(x)
- median(x)
- var(x)
- sd(x)
- quantile(x, 0.75)
- set.seed(seed) - for reproducibility of random number generation
- sample(x, size)
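The sketch below exercises a few of these functions. The claims vector is hypothetical data invented for the example, and the seed value 42 is arbitrary.

```r
factorial(5)   # 120
choose(5, 2)   # 10, the binomial coefficient "5 choose 2"
log(exp(1))    # 1, since log() is the natural logarithm by default

# Hypothetical claim amounts, for illustration only
claims <- c(120, 80, 100, 140, 60)
mean(claims)     # 100
median(claims)   # 100
var(claims)      # 1000 (sample variance, divisor n - 1)

# Setting the same seed before sampling reproduces the same draw
set.seed(42)
first_draw <- sample(claims, size = 3)
set.seed(42)
second_draw <- sample(claims, size = 3)
identical(first_draw, second_draw)   # TRUE
```

Without the calls to set.seed(), the two draws would generally differ from run to run.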
R has an in-built help function ? which can be used to read the documentation on any function as well as topic areas. For example, have a look at ?Special for more details about the in-built R functions for the beta and gamma functions.
1.4 Data structures in R
We have already seen vectors as a data structure that is very common in R. We can identify the structure of an R “object” using the str(object) function.
Matrices
Next we introduce the matrix structure. When interacting with matrices in R it is important to note that matrix multiplication requires the %*% syntax:
first_matrix <- matrix(1:9, byrow = TRUE, nrow = 3)
first_matrix %*% first_matrix
##      [,1] [,2] [,3]
## [1,]   30   36   42
## [2,]   66   81   96
## [3,]  102  126  150
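The reason %*% matters is that the ordinary * operator works element-wise on matrices, like the other arithmetic operators. A minimal sketch with a small 2 x 2 matrix (m is an illustrative name) shows the difference:

```r
m <- matrix(1:4, nrow = 2, byrow = TRUE)   # rows: (1, 2) and (3, 4)

# Element-wise product: each entry is multiplied by itself
elementwise <- m * m          # rows: (1, 4) and (9, 16)

# Matrix product: rows of m combined with columns of m
matrix_product <- m %*% m     # rows: (7, 10) and (15, 22)
```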
Dataframes
A data.frame is a very popular data structure used in R. Each input variable has to have the same length but can be of a different type (strings, integers, booleans, etc.).
# Input vectors for the data.frame
name <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune")
surface_gravity <- c(0.38, 0.904, 1, 0.3794, 2.528, 1.065, 0.886, 1.14)
# Create a data.frame from the vectors
solar_system <- data.frame(name, surface_gravity)
str(solar_system)
## 'data.frame': 8 obs. of 2 variables:
## $ name : chr "Mercury" "Venus" "Earth" "Mars" ...
## $ surface_gravity: num 0.38 0.904 1 0.379 2.528 ...
Lists
A list is a versatile data structure in R as its elements can be of any type, including lists themselves. In fact a data.frame is a specific implementation of a list, which allows columns in a data.frame to have different types, unlike a matrix.
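A short sketch can confirm the relationship between a data.frame and a list. The policies and nested objects are hypothetical examples, not data used elsewhere in this book.

```r
# A small, made-up data.frame with columns of different types
policies <- data.frame(id = 1:3, holder = c("A", "B", "C"))

is.list(policies)   # TRUE: a data.frame is built on a list
typeof(policies)    # "list"
length(policies)    # 2: one list element per column

# Lists can hold elements of different types, including other lists
nested <- list(numbers = 1:3, label = "x", inner = list(flag = TRUE))
nested$inner$flag   # TRUE
```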
We will come across a number of functions that return a list type whilst working with actuarial statistics in R. For example, when we look at linear models we will make use of the lm(formula, data, ...) function, which returns a list.
# Use Orange dataset
df <- Orange
# Fit a linear model to predict circumference from age
fitted_lm <- lm(circumference ~ age, df)
# Size of the list
length(fitted_lm)
## [1] 12
# Element names
names(fitted_lm)
## [1] "coefficients" "residuals" "effects" "rank"
## [5] "fitted.values" "assign" "qr" "df.residual"
## [9] "xlevels" "call" "terms" "model"
We can access elements in the list using subsetting, noting the use of the [[ operator. Here we subset on “age” within the “coefficients” element in the list we called “fitted_lm”:
# Select [[1]] 1st element in the list, sub-select [2] 2nd element from that
fitted_lm[[1]][2]
##       age
## 0.1067703
# fitted_lm$coefficients is a shorthand for fitted_lm[["coefficients"]]
fitted_lm$coefficients[2]
##       age
## 0.1067703
# Select element using matching character vector "age"
fitted_lm$coefficients["age"]
##       age
## 0.1067703
# Select elements using matching character vectors
fitted_lm[["coefficients"]]["age"]
##       age
## 0.1067703
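The distinction between [ and [[ on a list is easy to miss, so here is a minimal sketch using a hypothetical list called portfolio. Single brackets return a smaller list, double brackets return the element itself, and $ allows partial name matching whereas [[ requires the exact name by default.

```r
# Hypothetical list, for illustration only
portfolio <- list(premiums = c(100, 250), region = "UK")

as_list   <- portfolio["premiums"]     # still a list, of length 1
as_vector <- portfolio[["premiums"]]   # the numeric vector itself

is.list(as_list)        # TRUE
is.numeric(as_vector)   # TRUE

# $ partially matches element names; [[ does not
partial <- portfolio$prem        # 100 250
exact   <- portfolio[["prem"]]   # NULL, no element has exactly this name
```

Because of partial matching, spelling element names out in full is the safer habit.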
1.5 Logical expressions in R
R has built-in logical operators:
Operator | Description |
---|---|
< (<=) | less than (or equal to) |
> (>=) | greater than (or equal to) |
== | exactly equal to |
!= | not equal to |
! | NOT |
& | AND (element-wise) |
\| | OR (element-wise) |
We can use logical expressions to filter data effectively by subsetting with the [...] syntax:
x <- 1:10
x[x != 5 & x < 7]
## [1] 1 2 3 4 6
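It can help to see that this kind of filter works in two steps: the logical expression first produces a TRUE/FALSE vector of the same length as x, and the brackets then keep only the positions that are TRUE. A sketch, with keep as an illustrative name:

```r
x <- 1:10

# Step 1: the expression is evaluated element-wise to a logical vector
keep <- x != 5 & x < 7
keep          # TRUE TRUE TRUE TRUE FALSE TRUE FALSE FALSE FALSE FALSE

# Step 2: subsetting keeps the elements where keep is TRUE
x[keep]       # 1 2 3 4 6

# Because TRUE is coerced to 1, sum() counts the matches
sum(keep)     # 5
which(keep)   # positions of the matches: 1 2 3 4 6
```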
We can select a column of a data.frame using the $ symbol and combine it with a logical expression to filter rows (see ?Extract for more help):
# data.frame[rows to select, columns to select]
solar_system[solar_system$name == "Jupiter", c(1:2)]
## name surface_gravity
## 5 Jupiter 2.528
1.6 Extending R with packages
We can extend R’s functionality by loading packages:
# Load the ggplot2 package
library(ggplot2)
Did you get an error from R trying this? A package must be installed before it can be loaded; install it once using install.packages("package name"), for example install.packages("ggplot2").
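One possible way to avoid the error is to check whether a package is available before loading it. The sketch below uses requireNamespace() for the check and only prints a reminder rather than installing anything automatically; the variable names pkg and is_installed are illustrative.

```r
pkg <- "ggplot2"

# requireNamespace() returns TRUE if the package is installed, FALSE otherwise
is_installed <- requireNamespace(pkg, quietly = TRUE)

if (is_installed) {
  # character.only = TRUE lets library() accept the name held in a variable
  library(pkg, character.only = TRUE)
} else {
  message("Package '", pkg, "' is not installed; run install.packages(\"", pkg, "\") first.")
}
```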
1.7 Importing data
R can import a wide variety of file formats, including:
- .csv
- .RData
- .txt
We can import these using read.csv(), load() and read.table() respectively.
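As a self-contained sketch of importing data, the example below first writes a small, made-up data.frame to temporary .csv and .RData files and then reads them back. The object names (planets, csv_path, rdata_path) are assumptions for the illustration; in practice you would pass the path to your own file.

```r
# Made-up data, written to temporary files so the example is self-contained
planets <- data.frame(name = c("Earth", "Mars"),
                      surface_gravity = c(1, 0.3794))

csv_path <- tempfile(fileext = ".csv")
write.csv(planets, csv_path, row.names = FALSE)

# read.csv() returns a data.frame
imported <- read.csv(csv_path, stringsAsFactors = FALSE)
str(imported)

# save() and load() round-trip R objects through an .RData file
rdata_path <- tempfile(fileext = ".RData")
save(planets, file = rdata_path)
rm(planets)
load(rdata_path)   # restores the object under its original name, planets
```

Note that load() restores objects under the names they were saved with, rather than returning a value to assign.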
1. CRAN is the Comprehensive R Archive Network - read more on the CRAN website.
2. I fear this is already too in-depth for “basic interactions with R” but for those that want to jump down the rabbit hole, see Hadley Wickham’s book Advanced R.
3. We can also assign values using the more familiar = symbol. In general this is discouraged, listen to Hadley Wickham.