Welcome to R!

Statistical analysis is at the core of nearly all research projects, and researchers can choose from a wide variety of statistical tools, such as SPSS and SAS. Unfortunately, many of these tools are expensive or difficult to master, so this lab manual introduces R, a powerful, open-source statistical analysis program that is available free of charge. But before diving into a statistics package, there is one important piece of background that must be covered: data.

Types of Data

There are two main types of data and it is important to understand the difference between them since that determines appropriate analytical tests.

  • Continuous data are integer or decimal numbers and are typically used for counts or measures – like a person’s weight, a tree’s height, or a car’s speed. Continuous data are measured with scales that have equal divisions so the difference between any two values can be calculated. Because continuous data include characteristics like means and standard deviations, they are analyzed using parametric tests.

  • Categorical data group observations into a limited number of categories; for example, type of pet (cat, dog, bird, etc.) or place of residence (Arizona, California, etc.). One common type of categorical data is generated with an “agree-disagree” type of scale, like “I enjoy reading: Strongly Agree : Agree : Neutral : Disagree : Strongly Disagree.” Because categorical data do not have characteristics like means or standard deviations, they are analyzed using nonparametric tests.
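Looking ahead, the two types of data can be sketched in R as follows. This is only an illustration: the variable names Weight and Pet and all of the values are made up for this example, and the factor() function is one common way R stores categories rather than something this manual has introduced yet.

```r
# Continuous data: measured values (weights in kilograms, made up for illustration)
Weight <- c(61.5, 72.0, 80.3, 68.9)
mean(Weight)   # a mean is meaningful for continuous data
sd(Weight)     # so is a standard deviation

# Categorical data: observations grouped into a few categories (made-up pets)
Pet <- factor(c("cat", "dog", "dog", "bird"))
levels(Pet)    # the distinct categories
table(Pet)     # how many observations fall in each category
```

For the categorical variable, counting the observations in each category is meaningful, while a mean or standard deviation is not.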

The R Command Line

All R commands are entered from a “Command Line” environment. Many students find this a bit challenging at first, but once they learn some foundational concepts the command line becomes easy and fast to use. The notes that follow explain, line by line, the R script in the demonstration box.

Demonstration: The R Command Line

  • Line 1: This is a comment that is used to record notes in a script. In R, all comments start with a hash mark (#), and everything after that symbol on the same line is ignored. Comments are used frequently in the scripts presented in this manual in order to explain what each script is doing. Good programmers comment liberally so team members can easily figure out what they did.

  • Line 2: Calculate the value of 3+5.

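  • Line 3: Calculate the value of “5 + 8 * 2”. R follows the standard order of operations, so the multiplication is done before the addition and the result is 21.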

  • Lines 6-7: These lines create two variables, MaxScore and MinScore, and then assign values to the variables. You should note two important things about these lines. First, the “assignment” operator is a less-than sign followed by a hyphen, making a left-pointing arrow like <-. That tells R to store the number on the right side of the arrow in the variable named on the left side. Second, keep in mind that capitalization matters in R. Thus, the variable named MaxScore is different from a variable named maxscore. These lines only store values in variables, so nothing gets printed to the screen.

A variable is nothing more than a place in memory to store temporary data. Think of it as a “box” that is used to store something until it is needed later.

  • Line 8: The variable Range is filled with the result of subtracting MinScore from MaxScore.

  • Line 9: Entering a variable name, like Range, on a line by itself causes the value stored in that variable to be displayed.

  • Line 12: In R, a list of numbers can be stored in a single variable by using the “combine” function, which is a c followed by a list of the numbers inside parentheses. This line creates a variable called TestScores and then stores a list of six numbers in that variable.

  • Line 13: The contents of the variable TestScores are printed to the screen.
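For readers without access to the interactive box, the following is a sketch of a script that matches the line-by-line notes above. The line positions follow the notes, but the comment wording and the specific values (95, 62, and the six test scores) are assumptions chosen for illustration, not the manual's originals.

```r
# Simple arithmetic on the command line
3 + 5
5 + 8 * 2

# Variables (the values 95 and 62 are illustrative assumptions)
MaxScore <- 95
MinScore <- 62
Range <- MaxScore - MinScore
Range

# A list of numbers (these six scores are illustrative assumptions)
TestScores <- c(88, 92, 75, 62, 95, 81)
TestScores
```

With these assumed values, Range holds 33, and entering TestScores prints all six scores on one line.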

The DataCamp blocks found throughout this lab manual are designed to “try out” R commands. Click the yellow “Run” button in the lower left corner to execute the entire script and see the results. There are two panels available in the box. Clicking script.R in the top of the box displays the R script and clicking R Console displays the results of executing the script. The commands in the script can be modified and rerun in order to “play around” with R. Also, R commands can be entered directly in the R Console and executed by tapping the ENTER key.

Skill Check: The R Command Line

Now is the time to try some of the command line skills demonstrated above. In the following R codebox, calculate these values.

  • Set the variable M equal to 15 + ( 37 * 2 )

  • Set the variable N equal to 24 + ( 2 ^ 6 )

  • Set the variable P equal to 15 - ( sqrt(9) )

Two notes on these calculations:

  1. In the second line the ^ symbol means “raise to the power of” so that line reads “24 plus the value of 2 raised to the 6th power.”

  2. In the third line the sqrt() function calculates the square root of the number in the parentheses, or the square root of 9 in this example.
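One way to work through this skill check is sketched below. It is a possible answer rather than the only correct one; the comment wording is an addition, and only the variable names and expressions come from the exercise.

```r
# Store each result with the assignment operator
M <- 15 + (37 * 2)
N <- 24 + (2 ^ 6)
P <- 15 - (sqrt(9))

# Enter each variable name on a line by itself to display its value
M
N
P
```

The three values displayed are 89, 88, and 12.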


Data Frames

A data frame is a table of data, such as the data generated during a research project, with one row for each observation and one column for each variable. An example data frame that is easy to understand would be a spreadsheet that contains the times recorded for a race. R comes configured with 103 built-in data frames used for training, and the R script below is an introduction to one of the data frames used in several of the labs in this manual: mtcars.

Demonstration: Data Frames

  • Line 2: Entering the name of the data frame, mtcars, on a line by itself causes R to print the contents of the entire data frame to the screen. Since mtcars is rather small it is fine to print it to the screen, but some data frames have hundreds of rows, and printing them may cause the screen to “scroll” for some time before the end of the data frame is reached.

  • Line 3: This prints the structure of the mtcars data frame. The result shows that mtcars is a data.frame with 32 observations (the number of cars in the dataset) of 11 variables (things like mpg). The str function also displays the type of data in each variable; here, all 11 variables are of the “num” (numeric) type. The str function is frequently used to better understand a data frame.

  • Line 4: This line prints the maximum mpg value for the mtcars data frame. Note that the specific variable desired is indicated by both the data frame name and the variable, separated by a dollar sign, like mtcars$mpg on this line.

  • Line 5: This prints the minimum mpg value.
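For readers without access to the interactive box, a script consistent with the notes above might look like the following sketch. The comment on the first line is an assumption; the remaining lines use only the commands the notes describe.

```r
# Explore the built-in mtcars data frame
mtcars
str(mtcars)
max(mtcars$mpg)
min(mtcars$mpg)
```

In the mtcars data frame that ships with R, the last two lines display 33.9 and 10.4.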

If some of the lines in the result are too long to fit on one row in the R Console they will wrap around.


Skill Check: Data Frames

In the following R codebox, explore the airquality data frame.

  • Determine the structure of the airquality data frame

  • Set MaxWind to the maximum value of Wind

  • Set MaxTemp to the maximum value of Temp

  • Set MinOzone to the minimum value of Ozone
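A possible approach to this skill check is sketched below. One detail goes beyond what the manual has covered so far: the Ozone variable contains missing values (shown as NA), so min() returns NA unless the na.rm = TRUE argument tells it to skip them. Treat that argument as an addition to the exercise rather than part of the original instructions.

```r
# Structure of the airquality data frame
str(airquality)

# Maximum wind speed and temperature
MaxWind <- max(airquality$Wind)
MaxTemp <- max(airquality$Temp)

# Ozone has missing values, so skip them when finding the minimum
MinOzone <- min(airquality$Ozone, na.rm = TRUE)

# Display the three results
MaxWind
MaxTemp
MinOzone
```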