
Learn R

This guide focuses on transformation and cleaning functions in R that are especially useful for working with tabular datasets.

Basic commands and Data types in R

Basic console commands in R

The R console can be used to perform calculations and assign values to variables. As a new R user, you might want to practice by typing simple arithmetic and assignment commands into the console window.
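A few commands of the kind a new user might try are sketched below. The variable names x and y are illustrative only and do not come from the original exercises.

```r
# Arithmetic typed directly at the console
2 + 3      # 5
10 / 4     # 2.5
2 ^ 3      # 8

# Assign values to variables with <- and reuse them
x <- 5
y <- x * 2
y          # 10

# List the variables currently defined in the workspace
ls()
```

Typing the name of a variable on its own line prints its value.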

 

Data Types in R

R has several data types and structures: numeric, integer, logical (boolean), character (string), vector, matrix, array, list and data frame. Knowing the type of an object tells you which functions can be applied to it.

To determine the type of an object, you can use the class(), mode() or typeof() functions. For example, you can create different variables and check each one with class().
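As a minimal sketch, the commands below create one variable of each kind and check it with class(). The variable names are made up for illustration.

```r
num  <- 3.14                      # numeric (double)
int  <- 42L                       # integer (note the L suffix)
flag <- TRUE                      # logical
txt  <- "hello"                   # character
vec  <- c(1, 2, 3)                # numeric vector
mat  <- matrix(1:6, nrow = 2)     # matrix with 2 rows and 3 columns
lst  <- list(a = 1, b = "two")    # list holding mixed types

class(num)    # "numeric"
class(int)    # "integer"
class(flag)   # "logical"
class(txt)    # "character"
class(vec)    # "numeric"
class(mat)    # "matrix" "array" in R 4.0.0 and later, "matrix" in older versions
class(lst)    # "list"
```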

It is possible to convert from one data type to another with functions such as as.integer(), as.vector(), as.matrix(), etc.
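The conversion functions can be tried on small values like these. The inputs are illustrative assumptions, not data from this guide.

```r
x <- c("1", "2", "3")                          # character vector
as.integer(x)                                  # 1 2 3
as.numeric("3.7")                              # 3.7
as.integer(3.7)                                # 3 (truncates toward zero)
as.character(10)                               # "10"

m <- as.matrix(data.frame(a = 1:2, b = 3:4))   # data frame to matrix
as.vector(m)                                   # 1 2 3 4 (read column by column)

# Text that is not a number becomes NA; without suppressWarnings()
# R also prints a coercion warning
suppressWarnings(as.integer("abc"))            # NA
```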

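Type class(df). You will see that the dataset we are using is of class "data.frame". Note that typeof(df) returns "list" instead, because a data frame is stored internally as a list of columns.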
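The three functions give different answers for a data frame. The sketch below uses a small made-up df as a stand-in for the dataset used in this guide.

```r
# Hypothetical stand-in for the guide's dataset
df <- data.frame(id = 1:3, name = c("a", "b", "c"))

class(df)           # "data.frame"
typeof(df)          # "list"
mode(df)            # "list"
is.data.frame(df)   # TRUE
```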

 

Data Structures

Data Frame

  • Each column is a variable, each row is an observation
  • Internally, each column is a vector
Create Data Frame
df1 <- data.frame(col1 = v1, col2 = v2, col3 = v3)
Dimension
nrow(df1); dim(df1); ncol(df1);
Get/Set Column Names
names(df1)
names(df1) <- c(....)
Get/Set Row Names
rownames(df1); rownames(df1) <- c(....)
Preview
head(df1); tail(df1)
Get Data Type
class(df1)
Index by Columns
df1['col1']; df1[1]
df1[c('col1','col3')]; df1[c(1,3)]
Index by Rows & Columns
df1[c(1,3), 2:3] #returns data from row 1&3, columns 2 to 3
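The data frame commands above can be run end to end on a small example. The vectors v1, v2 and v3 and the row names below are made-up values for illustration.

```r
# Hypothetical input vectors of equal length
v1 <- c(1, 2, 3, 4)
v2 <- c("a", "b", "c", "d")
v3 <- c(TRUE, FALSE, TRUE, FALSE)

df1 <- data.frame(col1 = v1, col2 = v2, col3 = v3)

dim(df1)                 # 4 3 (rows, columns)
names(df1)               # "col1" "col2" "col3"
rownames(df1) <- c("r1", "r2", "r3", "r4")
head(df1, 2)             # first two rows

class(df1['col1'])       # "data.frame": single brackets keep the data frame
class(df1[['col1']])     # "numeric": double brackets extract the column vector

sub <- df1[c(1, 3), 2:3] # rows 1 and 3, columns 2 to 3
sub
```

Single brackets with one index select columns and return a data frame. Use double brackets or $ when the underlying vector is needed.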

 

Data Table

What is a data table?

  • Extends & enhances the functionality of data frames

Differences - (data.frame vs data.table)

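  • Before R 4.0.0, data.frame() converted character data to factors by default (stringsAsFactors = TRUE); data.table never does
  • When printing to the console, data.table shows only the first and last 5 rows of a large table (by default, one with more than 100 rows)
  • KEY DIFFERENCE - data.tables can be faster because a key can be set on one or more columns; the table is then sorted by those columns and searched by binary search, much like a database index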
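The differences listed above can be checked with a short sketch. It assumes the data.table package is installed, and the objects df, dt and big are made up for illustration.

```r
library(data.table)   # assumes the data.table package is installed

# Character columns: factor conversion happens only if requested (R >= 4.0.0)
df <- data.frame(name = c("a", "b"), stringsAsFactors = TRUE)
dt <- data.table(name = c("a", "b"))
class(df$name)        # "factor", because stringsAsFactors = TRUE was set
class(dt$name)        # "character"

# A data.table is still a data.frame, so data.frame functions keep working
class(dt)             # "data.table" "data.frame"

# Printing a large table shows the first and last 5 rows, not all 1000
big <- data.table(x = 1:1000)
print(big)
```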
Create data.table from data.frame
library(data.table); dt1 <- data.table(df1)
Index by column(s)
dt1[,'col1', with = FALSE]
Show info for each data.table in memory (i.e. size,...)
tables()
Show keys in data.table
key(dt1)
Create index for col1 & reorder data according to col1
setkey(dt1,col1)
Use key to Select data
dt1[c('col1Value1','col1Value2'),]
Multiple key select
dt1[J('1', c('2','3')),]
Aggregation
dt1[, list(col1Mean = mean(col1), col2Sum = sum(col2)), by = list(col3, col4)]
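The key and aggregation commands above can be tried on a small table. This is a sketch that assumes the data.table package is installed. The column values are made up, and only one grouping column is used to keep the example short.

```r
library(data.table)   # assumes the data.table package is installed

# Hypothetical data: two numeric columns and one grouping column
dt1 <- data.table(
  col1 = c(1, 2, 3, 4, 5, 6),
  col2 = c(10, 20, 30, 40, 50, 60),
  col3 = c("a", "a", "b", "b", "c", "c")
)

# Aggregate: mean of col1 and sum of col2 for each value of col3
agg <- dt1[, list(col1Mean = mean(col1), col2Sum = sum(col2)), by = col3]
agg

# Set a single key, inspect it, and select rows by key value
setkey(dt1, col3)
key(dt1)                        # "col3"
oneKey <- dt1[c("a", "c")]      # rows whose col3 is "a" or "c"

# Set two keys and select with J(): col3 == "a" and col1 equal to 1 or 2
setkey(dt1, col3, col1)
twoKeys <- dt1[J("a", c(1, 2))]
twoKeys
```

setkey() sorts the table in place, so no assignment is needed. Values passed to J() are matched against the key columns in order.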

 

Liaison Librarian

Martin Morris
Contact:
Schulich Library of Physical Sciences, Life Sciences and Engineering
Macdonald-Stewart Library Building
809 rue Sherbrooke Ouest
Montréal, Québec H3A 0C1
(514) 398 8140
Skype Contact: martinatmcgill
