👨‍💻 Eugene Hickey @ Atlantic Technological University 👨‍💻
Welcome to the course on statistical computing.
We will discuss how to make the most of your data.
Complementary to the qualitative analysis you do with Emma.
The goal here is to provide skills in data analysis.
We’ll learn to use a software system called R.
Ruth Moran, Graduate Research & Education Training Officer, is co-pilot for this workshop.
slides produced using quarto
assessments produced using Rexams
Graduate students looking for better ways to present their data.
People currently using tools like MS Excel for analysis.
Working with a mouse isn’t reproducible
Good to separate sources of data from the analysis
R uses a series of commands that input, manipulate, and display data
Lots of contributors around the globe, diverse fields
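As a rough illustration of the input, manipulate, display idea above, here is a minimal sketch in base R. The temporary CSV file stands in for a real data source and is an assumption made only so the example runs anywhere; the variable names are illustrative.

```r
# Input: write a built-in dataset to a temporary CSV, then read it back in.
# The file stands in for a real data source kept separate from the analysis.
data_file <- tempfile(fileext = ".csv")
write.csv(mtcars, data_file, row.names = FALSE)
cars <- read.csv(data_file)

# Manipulate: mean miles per gallon for each number of cylinders.
mpg_by_cyl <- aggregate(mpg ~ cyl, data = cars, FUN = mean)

# Display: print the summary table.
print(mpg_by_cyl)
```

Because every step is a typed command, rerunning the script on an updated file repeats the whole analysis with no mouse clicks to remember.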
Introduction to Statistical Computing & R - Saturday April 5th
Statistical Inference - Saturday April 12th
Manipulating Data - Saturday April 26th
Getting and Cleaning Data - Saturday May ??
Data Visualisation - Saturday June ??
creating functions, iterating
R is cracker at machine learning
also great at making documents and presentations using quarto
and websites and blogs
swirl()
Karl Broman (https://www.biostat.wisc.edu/~kbroman/), and particularly this presentation
course by Boehmke on GitHub: github.com/uc-r/Intro-R
the good people at RStudio have lots of help at https://posit.co/
The R Graph Gallery is pretty good and worth checking out


Coursera: Data Science from Johns Hopkins. The course notes are on github
edx.org course from Irizarry

RWeekly.org, round up of events in the world of R
#Rstats on Twitter and Mastodon
#TidyTuesday on twitter
if you get stuck, google is your friend. Often sends you to stackoverflow.com or stackexchange.com
first R from CRAN
then RStudio from Posit
alternative is to make an account at Posit Cloud
R is case sensitive
This is a nice tutorial suite to explain installing R and RStudio:

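to install a package from CRAN: install.packages("palmerpenguins")
for Bioconductor packages: install.packages("BiocManager"), then BiocManager::install("some_genomics_package")
for packages on GitHub: install.packages("devtools"), then devtools::install_github("developer_name/package_name")
install a package once with install.packages("scales"), then load it with library(scales) to use it
install.packages("tidyverse")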
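Installing is needed only once per machine, so it can help to check before downloading again. The sketch below assumes a small helper, install_if_missing(), which is a hypothetical name and not part of R or any package. It uses requireNamespace() to test whether a package is already available.

```r
# Hypothetical helper: install a CRAN package only if it is not already there.
install_if_missing <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(FALSE))  # already installed, nothing to do
  }
  install.packages(pkg)
  invisible(TRUE)
}

# "stats" ships with R, so this call never touches the network.
install_if_missing("stats")

# Typical use (needs an internet connection the first time):
# install_if_missing("palmerpenguins")
# library(palmerpenguins)
```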
type 45 + 17 and get the answer back
functions take arguments: sqrt(x=49) (or just sqrt(49))
get help with ?sqrt
type base:: and then press tab
save a result: my_square_root_result <- sqrt(49)
<- reads as gets (can also use the equals sign, =, but that’s sloppy)
x <- 72L to get integer 72
character strings go in quotes: x <- "I am Groot" or x <- 'I am Groot'
sqrt(-1) gives NaN with a warning rather than i
convert between types with as.character(), as.numeric()
class() function
View() function (note, capital “V”)
names() function
mtcars[2, 5] is the 2nd row of the fifth column
mtcars[2, ] is the whole 2nd row
mtcars[,5] is the entire 5th column
mtcars$cyl gives the 2nd column, called cyl
mtcars$cyl[2] gives the second element of this
mtcars[1:4, 2:4] takes a chunk of the mtcars data frame
data frames have constraints
every value within a column must be the same type
all columns must be the same length
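The commands above can be tried together in one short script. This is a minimal sketch in base R using the built-in mtcars data. Names such as x_int, chunk, mixed and mismatch are illustrative only.

```r
# Arithmetic and assignment
45 + 17                              # 62
my_square_root_result <- sqrt(49)    # 7, stored for later use

# Types and conversion
x_int <- 72L                         # the L suffix makes an integer
x_chr <- "I am Groot"                # character string
class(x_int)
class(x_chr)
as.numeric("3.14")                   # character to number
as.character(x_int)                  # number to character

# sqrt(-1) does not return i: it returns NaN and raises a warning
root_of_minus_one <- suppressWarnings(sqrt(-1))
is.nan(root_of_minus_one)

# Indexing a data frame: [row, column]
names(mtcars)                        # column names
mtcars[2, 5]                         # 2nd row, 5th column
mtcars[2, ]                          # whole 2nd row
mtcars$cyl[2]                        # 2nd element of the cyl column
chunk <- mtcars[1:4, 2:4]            # rows 1 to 4, columns 2 to 4
dim(chunk)

# Data frame constraints: columns may differ in type from one another,
# but must all have the same length.
mixed <- data.frame(name = c("a", "b"), value = c(1.5, 2.5))
mismatch <- tryCatch(
  data.frame(a = 1:3, b = 1:2),      # lengths 3 and 2 do not fit together
  error = function(e) conditionMessage(e)
)
mismatch                             # the error message, captured as text
```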
lists are more general
example, list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8), nrow = 2), list("green",12.3))
list_data[[1]] gives the three months, etc
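To make the list example concrete, the sketch below builds the same list_data and pulls pieces out of it. The names months and wrapped are illustrative. Single brackets are shown only for contrast with double brackets.

```r
list_data <- list(
  c("Jan", "Feb", "Mar"),
  matrix(c(3, 9, 5, 1, -2, 8), nrow = 2),
  list("green", 12.3)
)

# Double brackets return the element itself
months <- list_data[[1]]
class(months)            # a character vector

# Single brackets return a shorter list that still wraps the element
wrapped <- list_data[1]
class(wrapped)           # a list of length 1

# Elements can be indexed further once extracted
list_data[[2]][1, 3]     # row 1, column 3 of the matrix
list_data[[3]][[2]]      # second item of the nested list
```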
list elements are extracted with double brackets, [[ ]]
c() lets you create a vector of quantities. Coerced to same type
is.na() to check if a missing value, sum(is.na()) gives total of these
dim() gives number of rows / columns in data frame, also length()
class() for nature of variable
summary(), str(), glimpse() show data frame parameters. Also skim() from the skimr package
use %in% quite a bit, checks if value is among a bunch of values
"August" %in% month.name
"£" %in% letters
Sys.Date(), Sys.time()
ls() gives list of variables
list.files() gives list of files
sessionInfo() tells you what packages are loaded
citation() tells you how to cite R or a package
mean() calculates mean of a vector, sd() the standard deviation
complete.cases() returns TRUE if there are no missing values in a row
$ to extract a column of a data frame
() enclose arguments to functions
[] are gateways to data frame elements
%>% is called the pipe and is really cool. Ctrl + Shift + M shortcut
! is NOT, | (or ||) is OR, & (or &&) is AND
multiply the numbers \(25\times \pi\) and save the result to my_first_result
make a vector of five numbers e.g. 34.1, 54.4, 71.5, 93.8, 22.6 and save them to my_second_result
multiply my_second_result by 7 and print the results
print the airquality dataset
examine the airquality dataset using summary(), str(), and glimpse(). For the latter you’ll need the tidyverse library.
store the values of Temp from the airquality dataset in a new variable you can call temperature
calculate the mean() of temperature
calculate the standard deviation, sd(), of temperature
find the names of the columns in the Puromycin dataset
look up the help for the USJudgeRatings dataset and find out what is meant by the “DECI” column name
install and load up the package dslabs. Use is.na() and sum() to find the number of missing values in the us_contagious_diseases dataset
what kind of variable is stored in olive$area?
what are the levels of the factor in olive$region?
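The helpers these exercises lean on (is.na(), sum(), mean(), sd(), %in%, complete.cases()) can be tried first on a small made-up data frame. This is a minimal sketch in base R. The toy data frame and its values are invented for illustration, and na.rm = TRUE is an extra argument used here to skip the missing value.

```r
# Made-up data with one missing value and one name that is not a month
toy <- data.frame(
  month = c("March", "August", "Smarch", "May"),
  rainfall = c(80.2, NA, 55.1, 61.7)
)

dim(toy)                         # number of rows, number of columns
sum(is.na(toy))                  # total count of missing values
complete.cases(toy)              # TRUE for rows with nothing missing

# %in% checks each value against a set; month.name is built into R
is_month <- toy$month %in% month.name

# ! is NOT and & is AND, applied element by element
valid <- is_month & !is.na(toy$rainfall)

# $ extracts a column; na.rm = TRUE leaves out the missing value
mean_rain <- mean(toy$rainfall, na.rm = TRUE)
sd_rain <- sd(toy$rainfall, na.rm = TRUE)
```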
Complete week one moodle quiz
Complete swirl() exercises
install.packages("swirl")
library(swirl)
install_course("R Programming E")
swirl()
choose the course R Programming: The basics of programming in R
do the four exercises, 1 (Basic Building Blocks) to 4 (Vectors)
email the results to eugene.hickey@associate.atu.ie