Chapter 1 R Setup
1.1 Preparing your environment for R
The Institute and Faculty of Actuaries have provided their own guide to getting up and running with R.
The steps to get R working depend on your operating system. The following resources should make installing R locally relatively painless:
- Download and install R from CRAN[1].
- Download and install an integrated development environment (IDE). RStudio Desktop is strongly recommended.
1.2 Basic interactions with R
R is case-sensitive! We add comments to our R code using the # symbol: anything after # on a line is ignored. A key concept when working with R is that vectorised operations are preferred over constructs like for loops. As an example, we start with 1:10, which uses the colon operator (:) to generate a sequence that starts at 1 and ends at 10 in steps of 1. The output is a numeric vector of integers. Adding 2 to this vector adds 2 to every element in a single operation. Let’s see this in R:
1:10 + 2
## [1] 3 4 5 6 7 8 9 10 11 12
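To make the preference for vectorised operations concrete, here is a minimal sketch that compares 1:10 + 2 with an equivalent for loop. The variable names are illustrative only.

```r
# Vectorised: one expression adds 2 to every element of 1:10
x <- 1:10
vectorised <- x + 2

# Equivalent for loop, shown only for comparison
looped <- numeric(length(x))  # pre-allocate a double vector of zeros
for (i in seq_along(x)) {
  looped[i] <- x[i] + 2
}

# Both approaches produce the same values
identical(vectorised, looped)  # TRUE
```

The vectorised form is shorter. For large vectors it is also typically much faster, because the looping happens inside R's compiled internals rather than in interpreted R code.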
At the most basic level, R vectors can be of the following atomic types:
- integer,
- numeric (equivalently, double),
- logical, which takes the Boolean values TRUE or FALSE and can be coerced into integers as 1 and 0 respectively,
- character, which is apparent in R from the wrapping quotes "",
- complex, and
- raw.
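A quick way to check the type of a vector is typeof(). The sketch below creates one value of each atomic type and also shows the coercion of logicals to 1 and 0. Note that mode() reports "numeric" for both integers and doubles.

```r
# typeof() reports how a vector is stored
typeof(1L)           # "integer" (the L suffix creates an integer)
typeof(1.5)          # "double"
typeof(TRUE)         # "logical"
typeof("actuary")    # "character"
typeof(2 + 3i)       # "complex"
typeof(as.raw(255))  # "raw"
mode(1L)             # "numeric"

# Logical values are coerced to 1 and 0 in arithmetic
as.integer(c(TRUE, FALSE))  # 1 0
sum(c(TRUE, FALSE, TRUE))   # 2
```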
This book focuses on using R to solve actuarial statistical problems and will not explore the depths of the R language[2].
R has the usual arithmetic operators you’d expect from any programming language:
- +, -, * and / for addition, subtraction, multiplication and division,
- ^ for exponentiation,
- %% for modulo arithmetic (remainder after division), and
- %/% for integer division.
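Each operator applied to small numbers, so the results can be checked by hand. Like most arithmetic in R, these operators are vectorised. The modulo result takes the sign of the divisor.

```r
7 + 2        # 9
7 - 2        # 5
7 * 2        # 14
7 / 2        # 3.5
7 ^ 2        # 49
7 %% 2       # 1 (remainder after division)
7 %/% 2      # 3 (integer division)
(-7) %% 3    # 2, because -7 = -3 * 3 + 2
(1:5) %% 2   # 1 0 1 0 1, applied to every element
```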
We assign values to variables using the <- (“assignment”) operator[3].
x <- 1:10
y <- x + 2
x <- x + x # Notice that we can re-assign values to variables
z <- x + 2
y
## [1] 3 4 5 6 7 8 9 10 11 12
z
## [1] 4 6 8 10 12 14 16 18 20 22
Even though \(z\) is defined by the same expression as \(y\) (add 2 to \(x\)), note that \(y \neq z\). This is because \(x\) was reassigned between the two assignments, so execution order matters in R. All of \(x\), \(y\) and \(z\) are vectors in R.
1.3 Functions in R
We call functions in R using the format function_name(argument = value, ...):
# c() is the "combine" function, used often to create vectors
# Note we can also nest functions within functions
x <- c(1:3, 6:20, 21:42, c(43, 44))
# Another function with arguments:
y <- sample(x, size = 3)
y
## [1] 11 41 42
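Besides calling built-in functions, we can write our own with function(). The sketch below defines a hypothetical helper, discount_factor(). Its name and the default 5% rate are illustrative assumptions, not part of base R. It returns the present value of 1 payable in n years at an effective annual interest rate i, and shows both default and named arguments.

```r
# Hypothetical helper: present value of 1 due in n years at interest rate i
discount_factor <- function(n, i = 0.05) {
  (1 + i)^(-n)  # vectorised over n (and i)
}

discount_factor(1)                  # uses the default i = 0.05
discount_factor(n = 1:3, i = 0.04)  # named arguments; returns three values
```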
There are a lot of in-built functions in R that we may need:
- factorial(x)
- choose(n, k) for binomial coefficients
- exp(x)
- log(x), by default in base \(e\)
- gamma(x)
- abs(x) for absolute values
- sqrt(x)
- sum(x)
- mean(x)
- median(x)
- var(x)
- sd(x)
- quantile(x, 0.75)
- set.seed(seed) for reproducibility of random number generation
- sample(x, size)
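Here is a minimal tour of the functions listed above, applied to a small illustrative vector x. The last lines show that calling set.seed() with the same seed before sample() reproduces the same random draw.

```r
x <- c(2, 4, 4, 5, 7, 9)

factorial(5)       # 120
choose(5, 2)       # 10
exp(1)             # 2.718282
log(exp(2))        # 2, natural log by default
log(8, base = 2)   # other bases via the base argument
gamma(5)           # 24, since gamma(n) = (n - 1)!
abs(-3)            # 3
sqrt(16)           # 4
sum(x)             # 31
mean(x)
median(x)          # 4.5
var(x)             # sample variance (n - 1 denominator)
sd(x)              # square root of var(x)
quantile(x, 0.75)

# The same seed gives the same random sample
set.seed(1)
first_draw <- sample(x, size = 3)
set.seed(1)
second_draw <- sample(x, size = 3)
identical(first_draw, second_draw)  # TRUE
```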
R has an in-built help operator, ?, which can be used to read the documentation on any function as well as on broader topic areas. For example, have a look at ?Special for more details about R’s in-built beta and gamma functions.
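?Special documents beta() alongside gamma(). As a quick check of the identity \(B(a, b) = \Gamma(a)\Gamma(b) / \Gamma(a + b)\), this sketch evaluates both sides for a = 2 and b = 3.

```r
# Beta function directly
beta(2, 3)                           # 1/12, printed as 0.08333333
# Same value via the gamma function identity
gamma(2) * gamma(3) / gamma(2 + 3)
all.equal(beta(2, 3), gamma(2) * gamma(3) / gamma(5))  # TRUE
# The help page can also be opened with help("Special") instead of ?Special
```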
1.4 Data structures in R
We have already seen vectors, a very common data structure in R. We can identify the structure of an R “object” using the str(object) function.
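A short sketch of str() on a few of the structures met so far. Each call prints a compact one-line summary showing the type abbreviation (int, num, chr, logi) and the length.

```r
str(1:10)                    # an int vector of length 10
str(c(0.38, 0.904))          # a num vector
str(c("Mercury", "Venus"))   # a chr vector
str(c(TRUE, FALSE))          # a logi vector
str(list(a = 1, b = "x"))    # a list with 2 elements
```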
Matrices
Next we introduce the matrix structure. When working with matrices in R, note that matrix multiplication requires the %*% operator. The ordinary * operator multiplies element by element:
## [,1] [,2] [,3]
## [1,] 30 36 42
## [2,] 66 81 96
## [3,] 102 126 150
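The code that produced the output above is not shown. One input that reproduces it, assumed here, is the integers 1 to 9 filled row by row into a 3 x 3 matrix and multiplied by itself. The sketch also contrasts %*% with element-wise *.

```r
# Assumed input: 1 to 9 filled row by row into a 3 x 3 matrix
A <- matrix(1:9, nrow = 3, byrow = TRUE)

A %*% A  # matrix multiplication: rows 30 36 42 / 66 81 96 / 102 126 150
A * A    # element-wise multiplication: every entry squared
t(A)     # transpose
dim(A)   # 3 3
```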
Dataframes
A data.frame is a very popular data structure in R. Each input variable (column) must have the same length, but the columns can be of different types (strings, integers, booleans, etc.).
# Input vectors for the data.frame
name <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune")
surface_gravity <- c(0.38, 0.904, 1, 0.3794, 2.528, 1.065, 0.886, 1.14)
# Create a data.frame from the vectors
solar_system <- data.frame(name, surface_gravity)
str(solar_system)
## 'data.frame': 8 obs. of 2 variables:
## $ name : chr "Mercury" "Venus" "Earth" "Mars" ...
## $ surface_gravity: num 0.38 0.904 1 0.379 2.528 ...
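Here is a minimal sketch of everyday operations on the solar_system data frame: its dimensions, column access and row/column indexing. It assumes R 4.0 or later, where strings stay as character rather than becoming factors. The final lines show that columns of incompatible lengths (3 and 2 here) are rejected with an error.

```r
name <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune")
surface_gravity <- c(0.38, 0.904, 1, 0.3794, 2.528, 1.065, 0.886, 1.14)
solar_system <- data.frame(name, surface_gravity)

nrow(solar_system)               # 8
ncol(solar_system)               # 2
names(solar_system)              # "name" "surface_gravity"
solar_system$surface_gravity[3]  # 1 (Earth)
solar_system[2, "name"]          # "Venus"
head(solar_system, 3)            # first three rows

# Columns of incompatible lengths are rejected
msg <- tryCatch(
  data.frame(a = 1:3, b = 1:2),
  error = function(e) conditionMessage(e)
)
msg  # an error message about differing numbers of rows
```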
Lists
A list is a versatile data structure in R because its elements can be of any type, including lists themselves. In fact, a data.frame is a specific implementation of a list, which is what allows the columns of a data.frame to have different types, unlike a matrix.
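As an illustration, this sketch builds a hypothetical policy record (all field names and values are made up) that mixes types and nests a list inside a list. It then confirms that a data.frame is stored as a list.

```r
# A hypothetical policy record mixing types, including a nested list
policy <- list(
  id = "P001",
  sum_assured = 100000,
  lives = c("Alice", "Bob"),
  riders = list(critical_illness = TRUE, waiver = FALSE)
)

str(policy)
policy$lives[2]       # "Bob"
policy$riders$waiver  # FALSE

# A data.frame is a list of equal-length columns
planets <- data.frame(name = c("Earth", "Mars"), surface_gravity = c(1, 0.3794))
is.list(planets)      # TRUE
typeof(planets)       # "list"
```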
We will come across a number of functions that return a list whilst working with actuarial statistics in R. For example, when we look at linear models we will make use of the lm(formula, data, ...) function, which returns a list.
# Use Orange dataset
df <- Orange
# Fit a linear model to predict circumference from age
fitted_lm <- lm(circumference ~ age, df)
# Size of the list
length(fitted_lm)
## [1] 12
# Element names
names(fitted_lm)
## [1] "coefficients" "residuals" "effects" "rank"
## [5] "fitted.values" "assign" "qr" "df.residual"
## [9] "xlevels" "call" "terms" "model"
We can access elements in the list using subsetting, noting the use of the [[ operator. Here we subset on “age” within the “coefficients” element of the list we called “fitted_lm”:
# Select [[1]] 1st element in the list, sub-select [2] 2nd element from that
fitted_lm[[1]][2]
## age
## 0.1067703
# fitted_lm$coefficients is shorthand for fitted_lm[["coefficients"]]
fitted_lm$coefficients[2]
## age
## 0.1067703
# Select element using matching character vector "age"
fitted_lm$coefficients["age"]
## age
## 0.1067703
# Select elements using matching character vectors
fitted_lm[["coefficients"]]["age"]
## age
## 0.1067703
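The difference between [ and [[ is easy to miss, so here is a short sketch. [ returns a sub-list that is still wrapped in a list, while [[ extracts the element itself. The generic coef() accessor returns the same coefficient vector.

```r
fitted_lm <- lm(circumference ~ age, data = Orange)

# [ returns a sub-list (still wrapped in a list)
is.list(fitted_lm["coefficients"])      # TRUE
# [[ extracts the element itself, here a named numeric vector
is.numeric(fitted_lm[["coefficients"]]) # TRUE

# coef() is the standard accessor for model coefficients
coef(fitted_lm)["age"]
```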
1.5 Logical expressions in R
R has built-in logical operators:
Operator | Description |
---|---|
< (<=) | less than (or equal to) |
> (>=) | greater than (or equal to) |
== | exactly equal to |
!= | not equal to |
! | NOT |
& | AND (element-wise) |
\| | OR (element-wise) |
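The operators in the table can be applied to a short vector to show that each comparison returns one logical value per element. Because TRUE counts as 1, sum() counts how many elements meet a condition.

```r
x <- 1:6
x > 3          # FALSE FALSE FALSE TRUE TRUE TRUE
x != 4         # TRUE TRUE TRUE FALSE TRUE TRUE
!(x > 3)       # negation flips each value
x > 1 & x < 5  # element-wise AND
x < 2 | x > 5  # element-wise OR
sum(x > 3)     # 3, because TRUE counts as 1
```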
We can use logical expressions to filter data by subsetting it with the [...] syntax:
## [1] 1 2 3 4 6
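The code behind the output above is not shown. As an assumed illustration, not necessarily the original code, this sketch filters a small vector with several logical conditions inside [...]. which() returns the positions rather than the values.

```r
x <- c(1, 2, 3, 4, 5, 6)
x[x != 5]         # 1 2 3 4 6
x[x %% 2 == 0]    # 2 4 6
x[x > 2 & x < 6]  # 3 4 5
which(x > 4)      # 5 6 (positions, not values)
```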
We can select a named element, such as a column of a data.frame, using the $ symbol (see ?Extract for more help), and use it in a logical condition to filter rows:
## name surface_gravity
## 5 Jupiter 2.528
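Again the code is not shown. One way, assumed here, to obtain a result like the Jupiter row above is to filter the solar_system data frame on surface gravity. subset() expresses the same filter without repeating the data frame name.

```r
name <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune")
surface_gravity <- c(0.38, 0.904, 1, 0.3794, 2.528, 1.065, 0.886, 1.14)
solar_system <- data.frame(name, surface_gravity)

# $ extracts a column; the logical vector then picks the matching rows
solar_system$surface_gravity > 2
heavy <- solar_system[solar_system$surface_gravity > 2, ]
heavy

# The same filter with subset()
subset(solar_system, surface_gravity > 2)
```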
1.6 Extending R with packages
We can extend R’s functionality by loading packages:
Did you get an error from R when trying this? A package must first be installed using install.packages("package name") before it can be loaded.
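A common pattern is to install a package only if it is missing and then load it. The sketch below uses "splines" purely as a stand-in, because it ships with R and so the install step is skipped. Replace it with whichever CRAN package you need.

```r
# "splines" is a stand-in that ships with R; swap in any CRAN package name
pkg <- "splines"

# Install only if the package is missing, then load it
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}
library(pkg, character.only = TRUE)

# Attached packages appear on the search path
"package:splines" %in% search()  # TRUE
```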
1.7 Importing data
R can import a wide variety of file formats, including:
- .csv
- .RData
- .txt
We can import these using read.csv(), load() and read.table() respectively.
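A self-contained round trip makes the three import functions concrete. The sketch writes the solar_system data to temporary files with write.csv(), write.table() and save(), then reads them back. The temporary paths are an assumption for the sake of a runnable example; with real data you would point the read functions at your own files.

```r
name <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune")
surface_gravity <- c(0.38, 0.904, 1, 0.3794, 2.528, 1.065, 0.886, 1.14)
solar_system <- data.frame(name, surface_gravity)

# Temporary files stand in for real data files
csv_path <- tempfile(fileext = ".csv")
txt_path <- tempfile(fileext = ".txt")
rdata_path <- tempfile(fileext = ".RData")

write.csv(solar_system, csv_path, row.names = FALSE)
write.table(solar_system, txt_path, row.names = FALSE)  # space-separated with a header
save(solar_system, file = rdata_path)

# Read each format back
from_csv <- read.csv(csv_path)
from_txt <- read.table(txt_path, header = TRUE)
rm(solar_system)
loaded_names <- load(rdata_path)  # restores solar_system and returns its name
loaded_names
```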
[1] CRAN is the Comprehensive R Archive Network; read more on the CRAN website.
[2] I fear this is already too in-depth for “basic interactions with R”, but for those who want to jump down the rabbit hole, see Hadley Wickham’s book Advanced R.
[3] We can also assign values using the more familiar = symbol. In general this is discouraged; listen to Hadley Wickham.