Day 5: Manipulating Raster Data

Overview

Teaching: min
Exercises: min

Questions

How can I summarize raster data based on vector locations

Objectives

Import .csv files containing x,y coordinate locations and convert to a spatial object

Crop a raster to the extent of a vector layer

Extract values from a raster that correspond to a vector file overlay

Outline

Addressing Minute questions (10-15 minutes)
Capstone lesson (30-40 minutes)
1. Revisit your spatial question
2. Load libraries and import data
3. Crop raster with a vector
4. Summarize raster data by vector locations
Keypoints and final questions (10-15 minutes)

1) Revisit your spatial question

On day 1 we brainstormed questions you could ask with geospatial data. You thought about questions you can answer with raster data, and separately, questions you could answer with vector data. Through this workshop you have practiced working with spatial data and have the tools to answer your question.

In this final day we will implement tools you learned for combining raster and vector data and the types of questions you can answer when doing so.

A spatial question may look like the following:

Where does something occur?
How does something change across space or time?
What are the differences between locations?

Examples of spatial questions are:

What county has the highest recorded precipitation?
Has mean annual temperature increased in the state?
How does the precipitation vary between Milwaukee and Madison?

Capstone data question: Does it get hotter in Milwaukee or Madison?

In this capstone lesson we will use the following capstone data:

World Climate raster of maximum temperature (will download)
Natural Earth populated places vector point data for cities in Wisconsin (provided ne_10m_populated_places_Wisconsin.csv file)

We will use the vector data to crop the raster data to the extent of Wisconsin and summarize the raster data using the point locations of Madison and Milwaukee.

If you are using your own data, please have raster and vector data that cover a similar geographic area. Since we are cropping the raster data by the vector data, it will work best if your vector data is of a smaller geographic area than the raster data.

2) Load libraries and import data

For raster data, we use functions in the raster package. For vector data, you will use functions in the sf package. In addition, because we will visualize our data, we will use the plotting package ggplot2.

Solution

library(raster)
library(sf)
library(ggplot2)

Load raster data

The capstone data includes the World Climate Data or “WorldClim”. We will use your coding skills to access the worldclim dataset from the raster package, crop it to our extent of interest, and summarize the data. By the end of this section you should have your raster data as a ‘Formal class RasterLayer’ in your Global Environment

World Climate Data

You can learn about and download the WorldClim from the website: https://www.worldclim.org/data/worldclim21.html For an example of how to use the raster package to access the WorldClim (raster) data and GADM (vector) data, visit this tutorial: https://www.gis-blog.com/r-raster-data-acquisition/

For the WorldClim data, we need to specify the data variable and the spatial resolution. We will use the [“bio” variable] (https://www.worldclim.org/data/bioclim.html), which are 19 historial bioclimatic variables. The bio variables include things like:

annual mean temperature
max temperataure of the warmest month
annual precipitation
precipiation of wettest month

The bio variables are available at spatial resolutions of 10 minutes, 5 minutes, 2.5 minutes, and 30 seconds.

Wait, why is spatial resolution in units of time? This is because the spatial coordinates on a sphere are often expressed as degrees-minutes-seconds. 10 minutes is the biggest and 30 seconds is the smallest. An online conversion tool is helpful to understand these unit conversions(https://www.opendem.info/arc2meters.html)

For the capstone data, we want to use the largest spatial resolution of 10 minutes. This means each pixel represents about 13 km on the ground at the approximate latitude of Wisconsin. Within the “bio” variables, we will use the “bio5” variable that represents the maxiumum temperature of the warmest month.

Solution

# get worldclim data, variable "bio" with a 10 minute resolution
# this line is downloading data
# if you selected a resolution higher than 10 (with a res of 0.5,2.5, or 5) it may take a while to download
worldClim_bio_global <- getData("worldclim", var="bio", res=10)

# use only the bio5 variable (BIO5 = Max Temperature of Warmest Month
maxTemp_global <- worldClim_bio_global$bio5

# plot data
plot(maxTemp_global)

plot of chunk load-raster-data

Load vector data

The capstone data includes the Natural Earth Populated Places dataset. We have provided the dataset for you as a “.csv” file in the Capstone data folder. By the end of this section you should have a simple feature (‘sf’) data frame type of object in your Global Environment.

Natural Earth Data

You can learn about and download the Natural Earth dataset from the website: https://www.naturalearthdata.com/ You can also download Natural Earth data directly into R using the [rnaturalearth package] (https://cran.r-project.org/web/packages/rnaturalearth/README.html)

Solution

# import point data from a csv file
WIcities <- read.csv("../data/ne_10m_populated_places_Wisconsin.csv")

Warning in file(file, "rt"): cannot open file '../data/
ne_10m_populated_places_Wisconsin.csv': No such file or directory

Error in file(file, "rt"): cannot open the connection

# check out the data
colnames(WIcities)

Error in is.data.frame(x): object 'WIcities' not found

Recall from your lesson how to import point data and convert it to a spatial data layer in R.

When looking at the column names we see that there is a lot of data! To convert to a spatial points data layer, we need to know information about the coordinate system and the coordinates for each point.

The column names LATITUDE and LONGITUDE have what we need!

Solution

# convert data frame to sf type of object by specifying coordinates (as xy) and the projection as a crs
WIcities_points <- st_as_sf(WIcities, coords=c('LONGITUDE','LATITUDE'), crs=crs(maxTemp_global))

Error in st_as_sf(WIcities, coords = c("LONGITUDE", "LATITUDE"), crs = crs(maxTemp_global)): object 'WIcities' not found

# check class of object - should be "sf" data frame
class(WIcities_points)

Error in eval(expr, envir, enclos): object 'WIcities_points' not found

A critical part of working with raster and vector data together is that the datasets are in the same coordinate system.

Does it make sense? Are they the same? You can use the functions in the raster package to check out your data.

crs(maxTemp_global)

CRS arguments: +proj=longlat +datum=WGS84 +no_defs

crs(WIcities_points)

Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'crs': object 'WIcities_points' not found

extent(maxTemp_global)

class      : Extent 
xmin       : -180 
xmax       : 180 
ymin       : -60 
ymax       : 90 

extent(WIcities_points)

Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'extent': object 'WIcities_points' not found

res(maxTemp_global)

[1] 0.1666667 0.1666667

Finally, let’s subset our spatial data to only include the city points we are interested in. For the sample dataset, we are going to subset the data to only include the cities of Madison and Milwaukee.

If you are using your own data, you may or may not need to do this subset or you might want to choose different locations.

Solution

WIcities_points$NAME

Error in eval(expr, envir, enclos): object 'WIcities_points' not found

MM_points <- WIcities_points[WIcities_points$NAME=='Madison'|WIcities_points$NAME=='Milwaukee',]

Error in eval(expr, envir, enclos): object 'WIcities_points' not found

3) Crop raster with vector

It may be useful to limit the raster data to only areas of interest. This is helpful when you load a larger raster file, such as data for the entire globe, but are only interested in a small part. For our capstone data, we loaded the global temperature data, but we are only interested in the data for Wisconsin, and specifically, the data for Milwaukee and Madison. Let’s use the crop() function in the raster package to cut out the maximum temperature data to only the area for the Wisconsin cities.

Solution

maxTemp_WI <- crop(x = maxTemp_global, y = extent(WIcities_points),snap="out")

Error in .local(x, y, ...): Cannot get an Extent object from argument y

maxTemp_WI_df <- as.data.frame(maxTemp_WI, xy=T)

Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': object 'maxTemp_WI' not found

ggplot() +
  geom_raster(data = maxTemp_WI_df,aes(x = x, y = y, fill = bio5)) + 
  geom_sf(data = MM_points,color="yellow",size=5) + 
  coord_sf()

Error in fortify(data): object 'maxTemp_WI_df' not found

4) Summarize raster data by vector locations

Now that we have our raster layer that contains the data we want to summarize, and the vector data that contains the locations indicating where we want to summarize the data, we can now put them together.

If you are working with your own data, ask yourself:

How do I want to summarize the data? For example, do I want a single mean value? A maximum? A distribution of all values?
What is the area I want to summarize? If it’s a point, do you want a certain distance around the point? Or, if it’s an area, you will include all of the raster values within that area.

Extract pixel values

For the capstone data, we want to extract all the pixel values within a distance buffer of the two city points. Since our data is “unprojected” and therefore the units are in latitude and longitude, the buffer distance will be in meters (hint: this information is in the Help file for the extract() function!)

Geographic and projected unit reminder

Remember, your raster is “unprojected” meaning that the pixels represent distances on a sphere of degrees, minutes, and seconds. You’ll need to consider is the conversation from degrees to meters. The conversions depends on where you are on the globe. For example, at the equator, 1 degree is 111 km. This distance decreases as you move closer to the poles. Try setting the buffer to 50 km or 50,000 meters. This is something you can play around with as you explore the data.

Solution

# extract pixel values from the raster based on the Milwaukee and Madison points
# using a buffer size of 50,000 (50 km around the each point) because the units are in meters
maxTemp_MM <- extract(x=maxTemp_global,df=TRUE,
                           y=MM_points,
                           buffer=50000)

Error in h(simpleError(msg, call)): error in evaluating the argument 'y' in selecting a method for function 'extract': object 'MM_points' not found

Convert data and plot

The last few steps will be small conversions of the datatypes to make it easy for summarizing and plotting.

Solution

# convert to data frame for each calculations and plotting
df_maxTemp_MM <- as.data.frame(maxTemp_MM)

Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': object 'maxTemp_MM' not found

# clean up data frame because we want to give IDs names and convert the temperature data
df_maxTemp_MM$ID <- as.factor(df_maxTemp_MM$ID)

Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'as.factor': object 'df_maxTemp_MM' not found

levels(df_maxTemp_MM$ID) = levels(as.factor(MM_points$NAME))

Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'levels': error in evaluating the argument 'x' in selecting a method for function 'as.factor': object 'MM_points' not found

df_maxTemp_MM$bio5 <- df_maxTemp_MM$bio5/10

Error in eval(expr, envir, enclos): object 'df_maxTemp_MM' not found

# plot data
ggplot(data = df_maxTemp_MM, aes(x = bio5,fill=ID)) + 
  geom_histogram() +
  ggtitle("Histogram of maximum temperatures within 50 km of the city") +
  xlab("Maximum temperature (C)") + 
  ylab("Frequency of Pixels")

Error in ggplot(data = df_maxTemp_MM, aes(x = bio5, fill = ID)): object 'df_maxTemp_MM' not found

The graph above shows us the distribution of maximum temperatures within 50 km of Madison and Milwaukee. We have a distribution of values because the extracted data represents each individual pixel within the 50 km buffer. This graph shows us that the maximum temperatures recorded in the Madison are tend to be higher than the maximum temperatures recorded in the Milwaukee region. Does this pattern make sense with your knowledge of the area and the data? Remember, visualizing the data can be important in understanding and explaining patterns.

Key Points

Convert a data frame to an sf object using the st_as_sf() function.

Use the crop() and extract() functions to get raster data of spatial subsets.

Remember to check if your data align spatially by looking at the projection, extent, and units

previous episode

Geospatial Data Carpentry Capstone

next episode

Day 5: Manipulating Raster Data

Overview

Outline

1) Revisit your spatial question

Capstone data question: Does it get hotter in Milwaukee or Madison?

2) Load libraries and import data

Solution

Load raster data

World Climate Data

Solution

Load vector data

Natural Earth Data

Solution

Solution

Solution

3) Crop raster with vector

Solution

4) Summarize raster data by vector locations

Extract pixel values

Geographic and projected unit reminder

Solution

Convert data and plot

Solution

Key Points

previous episode

next episode