This lesson is being piloted (Beta version)

Introduction to R for Geospatial Data

Key Points

Introduction to R and RStudio
  • Use RStudio to write and run R programs.

  • R has the usual arithmetic operators.

  • Use <- to assign values to variables.
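
As a minimal sketch of the arithmetic and assignment points above, the following can be run line by line in the RStudio console. The variable names (total, ratio, mass_kg, and so on) are hypothetical and chosen only for illustration.

```r
# Arithmetic follows the usual precedence rules
total <- 1 + 2 * 3        # multiplication happens before addition
ratio <- 10 / 4           # division returns a double
power <- 2 ^ 10           # exponentiation
remainder <- 17 %% 5      # modulo (remainder after division)

# <- stores a value under a name; assigning again replaces the old value
mass_kg <- 47.5
mass_kg <- mass_kg * 2

print(c(total, ratio, power, remainder, mass_kg))
```

Typing a variable's name on its own line in the console prints its current value, which is a quick way to check the result of an assignment.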

Project Management With RStudio
  • Use RStudio to create and manage projects with consistent layout.

  • Treat raw data as read-only.

  • Treat generated output as disposable.

Data Structures
  • Use read.csv() to read tabular data in R.

  • The basic data types in R are double, integer, complex, logical, and character.

  • Use factors to represent categories in R.
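
The points above can be illustrated with a small sketch. The table contents and the names csv_text and sites are made up for this example; the text argument of read.csv() is used here only so that no file on disk is needed, whereas a real analysis would normally pass a file path.

```r
# Hypothetical CSV content, supplied inline instead of from a file
csv_text <- "site,cover,elevation
A,forest,120.5
B,water,3.0
C,forest,98.2"

# Read tabular data; keep strings as character so the result is the same
# across R versions
sites <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# The five basic data types
typeof(sites$elevation)   # "double"
typeof(2L)                # "integer"
typeof(1 + 4i)            # "complex"
typeof(TRUE)              # "logical"
typeof(sites$cover)       # "character"

# Represent the categorical column as a factor
sites$cover <- factor(sites$cover)
levels(sites$cover)       # "forest" "water"
```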

Exploring Data Frames
  • Use cbind() to add a new column to a data frame.

  • Use rbind() to add a new row to a data frame.

  • Use a negative row index to remove rows from a data frame.

  • Use na.omit() to remove rows from a data frame with NA values.

  • Use levels() and as.character() to explore and manipulate factors.

  • Use str(), nrow(), ncol(), dim(), colnames(), rownames(), head(), and typeof() to understand the structure of a data frame.

  • Read in a CSV file using read.csv().

  • Understand that length() of a data frame returns the number of columns, not the number of rows.
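
The following sketch walks through several of the operations listed above on a small, made-up data frame. The data frame sites and its columns are assumptions for illustration, not data from the lesson.

```r
# A small hypothetical data frame with one missing elevation
sites <- data.frame(
  site = c("A", "B", "C"),
  elevation = c(120.5, 3.0, NA),
  stringsAsFactors = FALSE
)

# cbind() adds a new column
sites <- cbind(sites, surveyed = c(TRUE, FALSE, TRUE))

# rbind() adds a new row; the new row needs the same columns
new_row <- data.frame(site = "D", elevation = 45.1, surveyed = TRUE,
                      stringsAsFactors = FALSE)
sites <- rbind(sites, new_row)

# A negative row index removes rows (here, the second row)
without_second <- sites[-2, ]

# na.omit() drops every row that contains an NA
complete <- na.omit(sites)

# Inspect the structure
str(sites)
dim(sites)       # rows, then columns
length(sites)    # number of columns
nrow(complete)   # rows left after removing NA rows
```

Note that none of these functions modify the data frame in place; the result has to be assigned back to a name to be kept.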

Subsetting Data
  • Indexing in R starts at 1, not 0.

  • Access individual values by location using [].

  • Access slices of data using [low:high].

  • Access arbitrary sets of data using [c(...)].

  • Use logical operations and logical vectors to access subsets of data.
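
A short sketch of these subsetting forms, using a hypothetical named vector elev:

```r
# Hypothetical elevations, named by site
elev <- c(a = 5.4, b = 6.2, c = 7.1, d = 4.8, e = 7.5)

first <- elev[1]             # indexing starts at 1
middle <- elev[2:4]          # a slice: elements 2 through 4
ends <- elev[c(1, 5)]        # an arbitrary set of positions
high <- elev[elev > 6]       # a logical vector keeps the TRUE positions
band <- elev[elev > 6 & elev < 7.5]   # combine conditions with &

print(high)
```

The expression elev > 6 is itself a logical vector of the same length as elev, so it can be stored in a variable and reused for several subsets.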

Data Frame Manipulation with dplyr
  • Use the dplyr package to manipulate data frames.

  • Use select() to choose variables from a data frame.

  • Use filter() to choose data based on values.

  • Use group_by() and summarize() to work with subsets of data.

  • Use mutate() to create new variables.
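
The verbs above are usually chained with the pipe operator. The sketch below assumes the dplyr package is installed and uses a made-up data frame (sites, with hypothetical columns region, site, elevation_m, and area_km2).

```r
library(dplyr)

# Hypothetical site data
sites <- data.frame(
  region = c("north", "north", "south", "south"),
  site = c("A", "B", "C", "D"),
  elevation_m = c(120, 80, 15, 45),
  area_km2 = c(2, 4, 5, 1)
)

# select() chooses columns, filter() chooses rows by value
high_sites <- sites %>%
  select(region, site, elevation_m) %>%
  filter(elevation_m > 40)

# mutate() adds a column; group_by() + summarize() reduce each group to one row
region_summary <- sites %>%
  mutate(elevation_km = elevation_m / 1000) %>%
  group_by(region) %>%
  summarize(mean_elevation_km = mean(elevation_km))

print(high_sites)
print(region_summary)
```

Each step returns a new data frame and leaves sites unchanged, which makes it easy to build up a pipeline one verb at a time.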

Introduction to Visualization
  • Use ggplot2 to create plots.

  • Think about graphics in layers: aesthetics, geometry, etc.
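
As an illustration of the layered approach, the sketch below builds a scatter plot one layer at a time. It assumes the ggplot2 package is installed; the data frame and column names are hypothetical.

```r
library(ggplot2)

# Hypothetical measurements
sites <- data.frame(
  elevation_m = c(120, 80, 15, 45),
  temperature_c = c(11.2, 12.5, 15.8, 14.1),
  region = c("north", "north", "south", "south")
)

# Layer 1: the data and the aesthetic mappings (which column goes where)
p <- ggplot(data = sites,
            aes(x = elevation_m, y = temperature_c, color = region))

# Layer 2: a geometry that draws the mapped data as points
p <- p + geom_point()

# Layer 3: labels
p <- p + labs(x = "Elevation (m)", y = "Temperature (°C)")

print(p)
```

Without a geometry layer the plot shows only empty axes, which is a useful reminder that aesthetics describe the mapping while geometries do the drawing.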

Writing Data
  • Save plots using ggsave() or pdf() combined with dev.off().

  • Use write.csv() to save tabular data.
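
The sketch below shows the pdf() and dev.off() pattern together with write.csv(), using base R only. The output paths point to a temporary directory and the data frame is made up; in a real project the files would go into the project's output folder. For a ggplot2 plot, ggsave() plays the same role as the pdf() and dev.off() pair.

```r
# Write into a temporary directory so the example leaves no files behind
out_dir <- tempdir()
csv_path <- file.path(out_dir, "sites.csv")
pdf_path <- file.path(out_dir, "elevation.pdf")

# Hypothetical data to save
sites <- data.frame(site = c("A", "B", "C"), elevation_m = c(120, 80, 15))

# Save tabular data; row.names = FALSE avoids an extra column of row numbers
write.csv(sites, csv_path, row.names = FALSE)

# Open a PDF device, draw the plot, then close the device to finish the file
pdf(pdf_path, width = 6, height = 4)
plot(sites$elevation_m, type = "b",
     xlab = "Site index", ylab = "Elevation (m)")
dev.off()

file.exists(c(csv_path, pdf_path))
```

Forgetting dev.off() is a common mistake: until the device is closed, the PDF file is incomplete and later plots keep going to it.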

Reference

Introduction to R and RStudio

Project management with RStudio

Data Structures

Exploring Data Frames

Useful functions for querying data structures:

Subsetting data

Data frame manipulation with dplyr

Control flow

Writing data

Glossary

argument
A value given to a function or program when it runs. The term is often used interchangeably (and inconsistently) with parameter.
assign
To give a value a name by associating a variable with it.
body
(of a function): the statements that are executed when a function runs.
comment
A remark in a program that is intended to help human readers understand what is going on, but is ignored by the computer. Comments in Python, R, and the Unix shell start with a # character and run to the end of the line; comments in SQL start with --, and other languages have other conventions.
comma-separated values
(CSV) A common textual representation for tables in which the values in each row are separated by commas.
delimiter
A character or characters used to separate individual values, such as the commas between columns in a CSV file.
documentation
Human-language text written to explain what software does, how it works, or how to use it.
floating-point number
A number containing a fractional part and an exponent. See also: integer.
for loop
A loop that is executed once for each value in some kind of set, list, or range. See also: while loop.
index
A subscript that specifies the location of a single value in a collection, such as a single pixel in an image.
integer
A whole number, such as -12343. See also: floating-point number.
library
In R, the directory or directories where packages are stored.
package
A collection of R functions, data and compiled code in a well-defined format. Packages are stored in a library and loaded using the library() function.
parameter
A variable named in the function’s declaration that is used to hold a value passed into the call. The term is often used interchangeably (and inconsistently) with argument.
return statement
A statement that causes a function to stop executing and return a value to its caller immediately.
sequence
A collection of information that is presented in a specific order.
shape
An array’s dimensions, represented as a vector. For example, a 5×3 array’s shape is (5,3).
string
Short for “character string”, a sequence of zero or more characters.
syntax error
A programming error that occurs when statements are in an order or contain characters not expected by the programming language.
type
The classification of something in a program (for example, the contents of a variable) as a kind of number (e.g. floating-point, integer), string, or something else. In R, the function typeof() is used to query a variable's type.
while loop
A loop that keeps executing as long as some condition is true. See also: for loop.
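
To make the for loop and while loop entries concrete, here is a minimal sketch in R. The variable names squares and n are hypothetical.

```r
# for loop: runs once for each value in 1:4
squares <- numeric(0)
for (i in 1:4) {
  squares <- c(squares, i^2)
}
print(squares)

# while loop: keeps running as long as the condition is TRUE
n <- 1
while (n < 100) {
  n <- n * 2
}
print(n)
```

The for loop runs a known number of times, while the while loop stops only when its condition becomes false, so the condition must eventually change inside the loop body.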