This lesson is still being designed and assembled (Pre-Alpha version)

# Introduction

## Overview

Teaching: 0 min
Exercises: 0 min
Questions
• Key question (FIXME)

Objectives
• First learning objective. (FIXME)

FIXME

## Key Points

• First key point. Brief Answer to questions. (FIXME)

# Day 1: Introduction to Geospatial Concepts

## Overview

Teaching: 0 min
Exercises: 0 min
Questions
• What format should I use to represent my data?

• What are the main data types used for representing geospatial data?

• What are the main attributes of raster and vector data?

Objectives
• Describe the difference between raster and vector data.

• Describe the stregnths and weaknesses of storing data in raster vs. vector formats.

• Distinguish between continuous and categorical raster data, and identify types of datasets that would be stored in each format.

• Describe the three types of vectors and identify types of data that would be stored in each.

This episode reviews the two primary types of geospatial data: rasters and vectors. It provides an overview of these two types of geospatial data, describing major features of both and providing examples of where you might find each type.

# Data Structures: Raster and Vector

Raster data is any pixelated (or gridded) data where each pixel is associated with a specific geographic location. The value of a pixel can be continuous (e.g. elevation) or categorical (e.g. land use). This data structure is very common: it’s how we represent any digital image. A geospatial raster is only different from a digital photo in that it is accompanied by spatial information that connects the data to a particular location. This includes the raster’s extent and cell size, the number of rows and columns, and its coordinate reference system (CRS).

Source: National Ecological Observatory Network (NEON)

Some examples of continuous rasters include:

1. Precipitation maps
2. Maps of tree height derived from LiDAR data
3. Elevation values for a region

Some rasters contain categorical data where each pixel represents a discrete class such as landcover type (e.g., “forest” or “grassland”) rather than a continuous value such as elevation or temperature. Some examples of classified maps include:

1. Land-cover/land-use maps
2. Tree height maps classified as short, medium, and tall trees
3. Elevation maps classified as low, medium, and high elevation

Source: United States Geological Survey (USGS)

With a partner, brainstorm potential advantages and disadvantages of storing data in raster format.

## Solution

Raster data has some important advantages:

• representation of continuous surfaces
• potentially very high levels of detail
• data is ‘unweighted’ across its extent - the geometry doesn’t implicitly highlight features
• cell-by-cell calculations can be very fast and efficient

The downsides of raster data are:

• file sizes become very large as the cell size gets smaller (more pixels)
• currently popular formats don’t embed metadata well
• can be difficult to represent complex information

## Important Attributes of Raster Data

### Extent

The spatial extent is the geographic area that the raster data covers. The spatial extent of an R spatial object represents the geographic edge or location that is the furthest north, south, east, and west. In other words, extent represents the overall geographic coverage of the spatial object.

### Resolution

The spatial resolution of a raster represents the area on the ground that each pixel of the raster covers.

Source: National Ecological Observatory Network (NEON)

## Multi-band Raster Data

A raster can contain one or more bands. One type of multi-band raster dataset that is familiar to many of us is a color image. A basic color image consists of three bands: red, green, and blue. Each band represents light reflected from the red, green, or blue portions of the electromagnetic spectrum. The pixel brightness for each band, when composited, creates the colors that we see in an image. In a multi-band dataset, the rasters will always have the same extent, resolution, and CRS.

Source: National Ecological Observatory Network (NEON)

## Raster Data Format for this Capstone

Raster data can come in many different formats. For this capstone, we’ve offered data in the GeoTIFF format, which has the extension `.tif`. A `.tif` file stores metadata or attributes about the file as embedded `.tif tags`. For instance, your camera might store a tag that describes the make and model of the camera or the date that the photo was taken when it saves a `.tif`. A GeoTIFF is a standard `.tif` image format with additional spatial (georeferencing) information embedded in the file as tags. These tags should include the following raster metadata:

1. Extent
2. Resolution
3. Coordinate Refernce System (CRS) - you’ll explore this concept in a later episode.
4. Values that represent missing data (`NoDataValue`).

## Spatial Questions with Raster Data

What kinds of geographic questions can you ask using raster data? Brainstorm a few with your neighbor or in a breakout room.

## Solution

How are precipitation patterns different across space? How does average temperature differ across space?

Vector data structures represent specific features on the Earth’s surface, and assign attributes to those features. Vectors are composed of discrete geometric locations (x, y values) known as vertices that define the shape of the spatial object. The organization of the vertices determines the type of vector that we are working with: point, line, or polygon.

Source: National Ecological Observatory Network (NEON)

• Points: Each point is defined by a single x, y coordinate. There can be many points in a vector point file. Examples of point data include: sampling locations, the location of individual trees, or the location of survey plots.

• Lines: Lines are composed of many (at least 2) points that are connected. For instance, a road or a stream may be represented by a line. This line is composed of a series of segments, each “bend” in the road or stream represents a vertex that has a defined x, y location.

• Polygons: A polygon consists of 3 or more vertices that are connected and closed. The outlines of survey plot boundaries, lakes, oceans, and states or countries are often represented by polygons.

## Data Tip

Sometimes, boundary layers such as states and countries, are stored as lines rather than polygons. However, these boundaries, when represented as a line, will not create a closed object with a defined area that can be filled.

## Solution

Vector data has some important advantages:

• the geometry itself contains information about what the dataset creator thought was important
• the geometry structures hold information in themselves — why choose point over polygon, for instance?
• each geometry feature can carry multiple attributes instead of just one, e.g. a database of cities can have attributes for name, country, population, etc.
• data storage can be very efficient compared to rasters

The downsides of vector data are:

• potential loss of detail compared to raster
• potential bias in datasets — what didn’t get recorded?
• calculations involving multiple vector layers need to do math on the geometry as well as on the attributes, so can be slow compared to raster math.

Vector datasets are in use in many industries besides geospatial fields. For instance, computer graphics are largely vector-based, although the data structures in use tend to join points using arcs and complex curves rather than straight lines. Computer-aided design (CAD) is also vector-based. The difference is that geospatial datasets are accompanied by information tying their features to real-world locations.

## Vector Data Format for this Capstone

Like raster data, vector data can also come in many different formats. For this capstone, we’ll use the Shapefile format, which has the extension `.shp`. A `.shp` file stores the geographic coordinates of each vertice in the vector, as well as metadata including:

• Extent: the spatial extent of the shapefile (i.e., geographic area that the shapefile covers). The spatial extent for a shapefile represents the combined extent for all spatial objects in the shapefile.
• Object type: whether the shapefile includes points, lines, or polygons.
• Coordinate Reference System (CRS): defines how the two-dimensional, projected map relates to real places on the earth. The decision of which map projection and CRS to use depends on the regional extent of the area you want to work in, on the analysis you want to do, and often on the availability of data.
• Other Attributes: for example, a line shapefile that contains the locations of streams, might contain the name of each stream.

Because the structure of points, lines, and polygons are different, each individual shapefile can only contain one vector type (all points, all lines, or all polygons). You will not find a mixture of point, line, and polygon objects in a single shapefile. A shapefile is actually a directory of files that contain different attributes about the data — you’ll learn more about this when you begin to work with vector data.

## More Resources

More about shapefiles can be found on Wikipedia

## Why Not Both?

Very few formats can contain both raster and vector data - in fact, most are even more restrictive than that. Vector datasets are usually locked to one geometry type, e.g. points only. Raster datasets can usually only encode one data type; for example, you can’t have a multi-band GeoTIFF where one layer is integer data and another is floating point. There are sound reasons for this: format standards are easier to define and maintain, and so is metadata. The effects of particular data manipulations are more predictable if you are confident that all of your output data has the same characteristics.

# Research Questions for Vector Data

Brainstorm potential research questions you could use vector data to answer, with a partner or in a breakout room.

## Solution

What counties had the highest and lowest precipitation in 1990?

## Key Points

• Raster data is pixelated data where each pixel is associated with a specific location.

• Raster data always has an extent and a resolution.

• The extent is the geographical area covered by a raster.

• The resolution is the area covered by each pixel of a raster.

• Vector data structures represent specific features on the Earth’s surface along with attributes of those features.

• Vector objects are either points, lines, or polygons.

# Day 2: Introduction to Raster Data

## Overview

Teaching: min
Exercises: 5 min
Questions
• What are the coordinate reference systems for your rasters and are they the same?

Objectives
• Visually explore the patterns in your rasters. Reproject the your raster into UTM coordinates.

# Outline

• Intro + Minute questions (10-15 min.)
• Raster Capstone (30-40 min.):
• Load and summarize the raster data
• Visualize the raster data
• Develop insights
• Outro + Final Questions (10-15 min.)

In this capstone lesson, we use the following capstone data:

• A raster of maximum temperature in Wisconsin
• A raster of minimum temperature in Wisconsin

If you are using your own data, please identify two (2) rasters with continuous values and that are unprojected.

Practice the first steps of using data by loading in the maximum and minimum temperature rasters for Wisconsin using the `raster()` function and then exploring using the `summary()` function.

## Solution

``````library(raster)
``````
``````Loading required package: sp
``````
``````library(ggplot2)
library(rgdal)
``````
``````rgdal: version: 1.5-23, (SVN revision 1121)
Geospatial Data Abstraction Library extensions to R successfully loaded
Loaded GDAL runtime: GDAL 3.0.4, released 2020/01/28
Path to GDAL shared files: /usr/share/gdal
GDAL binary built with GEOS: TRUE
Loaded PROJ runtime: Rel. 6.3.1, February 10th, 2020, [PJ_VERSION: 631]
Path to PROJ shared files: /usr/share/proj
To mute warnings of possible GDAL/OSR exportToProj4() degradation,
``````
``````mintemp <- raster("../data/mintemp_monthcold_wi.tif")
summary(mintemp)
``````
``````        mintemp_monthcold_wi
Min.                  -20.70
1st Qu.               -18.20
Median                -16.70
3rd Qu.               -14.45
Max.                  -11.10
NA's                  377.00
``````
``````maxtemp <- raster("../data/maxtemp_monthwarm_wi.tif")
summary(maxtemp)
``````
``````        maxtemp_monthwarm_wi
Min.                    24.9
1st Qu.                 26.5
Median                  27.6
3rd Qu.                 28.2
Max.                    29.0
NA's                   377.0
``````

## 2) Check the CRS

Use the `crs()` function to check if the coordinate reference systems the same in these two rasters?

## Solution

``````mintemp
``````
``````class      : RasterLayer
dimensions : 27, 36, 972  (nrow, ncol, ncell)
resolution : 0.1666667, 0.1666667  (x, y)
extent     : -93, -87, 42.5, 47  (xmin, xmax, ymin, ymax)
crs        : +proj=longlat +datum=WGS84 +no_defs
source     : mintemp_monthcold_wi.tif
names      : mintemp_monthcold_wi
values     : -20.7, -11.1  (min, max)
``````
``````maxtemp
``````
``````class      : RasterLayer
dimensions : 27, 36, 972  (nrow, ncol, ncell)
resolution : 0.1666667, 0.1666667  (x, y)
extent     : -93, -87, 42.5, 47  (xmin, xmax, ymin, ymax)
crs        : +proj=longlat +datum=WGS84 +no_defs
source     : maxtemp_monthwarm_wi.tif
names      : maxtemp_monthwarm_wi
values     : 24.9, 29  (min, max)
``````

OR

``````crs(mintemp)
``````
``````CRS arguments: +proj=longlat +datum=WGS84 +no_defs
``````
``````crs(maxtemp)
``````
``````CRS arguments: +proj=longlat +datum=WGS84 +no_defs
``````

Yes, the CRS for both rasters is the same. Both are in longlat projection, datum WGS84.

## 3) Visualize

Create visualizations of both minimum and maximum temperature rasters using the `ggplot2` package.

## Solution

``````mintemp_df <- as.data.frame(mintemp, xy=TRUE)
maxtemp_df <- as.data.frame(maxtemp, xy=TRUE)
``````
``````ggplot() +
geom_raster(data = mintemp_df , aes(x = x, y = y, fill = mintemp_monthcold_wi)) +
scale_fill_viridis_c() +
coord_quickmap()
``````

``````ggplot() +
geom_raster(data = maxtemp_df , aes(x = x, y = y, fill = maxtemp_monthwarm_wi)) +
scale_fill_viridis_c() +
coord_quickmap()
``````

### Bonus

You can add the following aesthetics to a `ggplot` to customize the backgrounds, axis labels, title, legend label, and move the legend position to the bottom of the plot:

``````ggplot() +
geom_*() +
labs(title = "Your Title", x="X Label", y="Y Label", fill="Legend Label") +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5), legend.position = "bottom")
``````

Explore what these aesthetics do with your plot of maximum monthly temperature.

## Solution

``````ggplot() +
geom_raster(data = maxtemp_df , aes(x = x, y = y, fill = maxtemp_monthwarm_wi)) +
scale_fill_viridis_c() +
coord_quickmap() +
labs(title = "Maximum Monthly Temperature: WI", x="Longitude", y="Latitude", fill="Max. Temp (C)") +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5), legend.position = "bottom")
``````

## 4) Develop insights

Describe what patterns you observe across the state for coldest and warmest monthly temperatures in Wisconsin. What do you hypothesize is the effect of Lake Michigan?

## Solution

Warmer temperatures are in Wisconsin tend toward the lower half of the state and nearer to the Minnesota border. Colder temperatures tend toward the North and near Lake Michigan, possibly due to the lake effect

## 5) Reproject

Reproject the maxtemp raster into UTM coordinates using the following crs string: ` +proj=utm +zone=15 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0 `. This string was obtained from Spatial Reference Orig under PROJ4.

## Solution

``````maxtemp_utm <- projectRaster(maxtemp, crs=" +proj=utm +zone=15 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0 ")
crs(maxtemp_utm)
``````
``````CRS arguments:
+proj=utm +zone=15 +datum=WGS84 +units=m +no_defs
``````
``````crs(maxtemp)
``````
``````CRS arguments: +proj=longlat +datum=WGS84 +no_defs
``````

## Key Points

• Always explore your raster first using the summary() and crs() functions.

• To plot your raster with ggplot(), first convert your raster to a data frame using the as.data.frame() function and specify the argument xy=TRUE.

• You may need to reproject your raster, for which you can use the projectRaster() function.

• There are many plotting and aesthetics options in the ggplot2 package for you to explore.

# Day 3: Introduction to Raster Data

## Overview

Teaching: min
Exercises: 3 min
Questions
• What is the average of the minimum and maximum values of your raster and what patterns do you observe?

Objectives
• Import and explore multiple rasters and develop initial geospatial insights. Use raster math to to calculate different summary statistics.

• Load in and explore the rasters, describe the patterns you observe. Perform raster math.

# Outline

• Intro + Minute questions (10-15 min.)
• Raster Capstone (30-40 min.):
• Use the `overlay()` function to perform raster math
• Outro + Final Questions (10-15 min.)

In this capstone lesson, we use the following capstone data:

• A raster of precipitation in Wisconsin
• A raster of maximum temperature in Wisconsin
• A raster of minimum temperature in Wisconsin

If you are using your own data, please identify two (3) rasters with continuous values that are unprojected. You may use the same 2 rasters from the previous section.

## 1) Load, summarize, and visualize

Load in the precipitation raster for Wisconsin and explore the raster. Plot the raster and describe the patterns you observe.

## Solution

``````library(raster)
``````
``````Loading required package: sp
``````
``````library(ggplot2)
library(rgdal)
``````
``````rgdal: version: 1.5-23, (SVN revision 1121)
Geospatial Data Abstraction Library extensions to R successfully loaded
Loaded GDAL runtime: GDAL 3.0.4, released 2020/01/28
Path to GDAL shared files: /usr/share/gdal
GDAL binary built with GEOS: TRUE
Loaded PROJ runtime: Rel. 6.3.1, February 10th, 2020, [PJ_VERSION: 631]
Path to PROJ shared files: /usr/share/proj
To mute warnings of possible GDAL/OSR exportToProj4() degradation,
``````
``````precip <- raster("../data/precip_annual_wi.tif")
summary(precip)
``````
``````        precip_annual_wi
Min.                 676
1st Qu.              795
Median               814
3rd Qu.              827
Max.                 898
NA's                 377
``````
``````precip_df <- as.data.frame(precip, xy=TRUE)
ggplot() +
geom_raster(data=precip_df,
aes(x=x, y=y,fill=precip_annual_wi))+
scale_fill_viridis_c() +
coord_quickmap()
``````

Annual precipitation is highest near the Illinois border and near Iron county. It is lowest near Douglas and Washburn county.

## 2) Raster math using `overlay()`: calculating the difference

Load in the maximum and minimum temperature rasters for Wisconsin. Subtract the minimum temperature from the maximum temperature using the `overlay()` function and plot your results. Note what you observe.

## Solution

``````mintemp <- raster("../data/mintemp_monthcold_wi.tif")
maxtemp <- raster("../data/maxtemp_monthwarm_wi.tif")

diff <- overlay(maxtemp, mintemp,
fun = function(r1, r2) { return( r1 - r2) })

diff_df <- as.data.frame(diff, xy=TRUE)

ggplot() +
geom_raster(data = diff_df,
aes(x = x, y = y, fill = layer)) +
scale_fill_gradientn(name = "Temperature Difference", colors = terrain.colors(10)) +
coord_quickmap()
``````

Near Lake Michigan, the difference between maximum and minimum temperature was lowest, again possibly due to the lake effect. The largest differences in maximum and minimum temperatures was near the Minnesota border.

## 3) Raster math using `overlay()`: calculating the average

Use the `overlay()` function again to calculate the average temperature, using the maximum and minimum temperature rasters. Plot your results and note what you observe.

## Solution

``````avg <- overlay(maxtemp, mintemp,
fun = function(r1, r2) { return( (r1 + r2)/2) })
#alternative:
#avg <- overlay(maxtemp, mintemp, fun = mean)

avg_df <- as.data.frame(avg, xy=TRUE)

ggplot() +
geom_raster(data = avg_df,
aes(x = x, y = y, fill = layer)) +
scale_fill_gradientn(name = "Temperature Average", colors = terrain.colors(10)) +
coord_quickmap()
``````

The average temperatures are higher toward the Illinois border and lower up north.

## Key Points

• Always explore your data and note initial geospatial insights.

• The overlay() function can be used to calculate different summary statistics for your raster.

• With the overlay() function, you can begin to develop your skills in writing functions.

# Day 4: Exploring Vector Data

## Overview

Teaching: min
Exercises: 7 min
Questions
• Are two datasets from two different sources suitable to be plotted together?

• What spatial patterns can we see by visual analysis?

• What are some plotting options to impact the way data are visualized?

Objectives
• Importing vector data and visually explore the vector data through attributes and by visualization.

In this capstone lesson, we will work with two Wisconsin vector data files: precipitation, and county boundaries. If you are using your own data, please identify 2 different vector files that you can use to summarize and visualize in this lesson. The learning objectives of this capstone are to demonstrate an example workflow of loading, exploring, and plotting vectors.

Load these libraries if you don’t already have them (`dplyr`, `ggplot2`, `sf`, `rgdal`)

## Solution

``````library(dplyr)
``````
``````Error in library(dplyr): there is no package called 'dplyr'
``````
``````library(ggplot2)
library(sf)
``````
``````Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
``````
``````library(rgdal)
``````
``````Loading required package: sp
``````
``````rgdal: version: 1.5-23, (SVN revision 1121)
Geospatial Data Abstraction Library extensions to R successfully loaded
Loaded GDAL runtime: GDAL 3.0.4, released 2020/01/28
Path to GDAL shared files: /usr/share/gdal
GDAL binary built with GEOS: TRUE
Loaded PROJ runtime: Rel. 6.3.1, February 10th, 2020, [PJ_VERSION: 631]
Path to PROJ shared files: /usr/share/proj
To mute warnings of possible GDAL/OSR exportToProj4() degradation,
``````

In this exercise, we have provided two capstone vector data files: Wisconsin Annual Precipitation Average 1981-2010, and county outlines. You can work with these, or with your own vector datasets.

Filenames: precip1981_2010_a_wi.shp and WI_Counties2010.shp

## Note about the Shapefile format

Although we will be importing Shapefiles using the name of a single file with the extension “.shp” be aware that a Shapefile actually consists of a folder of multiple files, each having the same name but with different extensions. The multiple files in the folder each contain information about different aspects of the geospatial data. For more information see “What is a Shapefile?” https://www.gislounge.com/what-is-a-shapefile/

## Solution

``````wi_precip <- st_read("../data/precipitation/precip1981_2010_a_wi.shp")
``````
``````Reading layer `precip1981_2010_a_wi' from data source
`/home/runner/work/geospatial-capstone/geospatial-capstone/data/precipitation/precip1981_2010_a_wi.shp'
using driver `ESRI Shapefile'
Simple feature collection with 11 features and 3 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -92.88943 ymin: 42.49172 xmax: -86.75393 ymax: 47.11257
Geodetic CRS:  WGS 84
``````
``````wi_county <- st_read("../data/WI_Counties2010/WI_Counties2010.shp")
``````
``````Reading layer `WI_Counties2010' from data source
`/home/runner/work/geospatial-capstone/geospatial-capstone/data/WI_Counties2010/WI_Counties2010.shp'
using driver `ESRI Shapefile'
Simple feature collection with 72 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -92.88811 ymin: 42.49192 xmax: -86.80542 ymax: 47.08001
Geodetic CRS:  WGS 84
``````
``````## County data is from here  https://geo.btaa.org/catalog/4E677AF3-3FF2-43FA-8A58-3C345EC7F465
``````

We have worked with functions to explore the geometry type, and projection of vector data files. Examples include:`st_geometry_type()`, `st_crs()`, `st_bbox()`, as well as reviewing the output produced in R when you imported the shapefiles.

Explore the metadata of the shapefiles. We will work towards combining these files in the same plot. Are these files suited to being plotted together - for example, are they in the same CRS and cover the same area?

## Solution

``````st_geometry_type(wi_precip)
``````
`````` [1] MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON
[6] MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON
[11] MULTIPOLYGON
18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT ... TRIANGLE
``````
``````st_crs(wi_precip)\$proj4string
``````
``````[1] "+proj=longlat +datum=WGS84 +no_defs"
``````
``````st_bbox(wi_precip)
``````
``````     xmin      ymin      xmax      ymax
-92.88943  42.49172 -86.75393  47.11257
``````
``````st_geometry_type(wi_county)
``````
`````` [1] MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON
[6] MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON
[11] MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON
[16] MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON
[21] MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON
[26] MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON
[31] MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON
[36] MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON
[41] MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON
[46] MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON
[51] MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON
[56] MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON
[61] MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON
[66] MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON
[71] MULTIPOLYGON MULTIPOLYGON
18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT ... TRIANGLE
``````
``````st_crs(wi_county)\$proj4string
``````
``````[1] "+proj=longlat +datum=WGS84 +no_defs"
``````
``````st_bbox(wi_county)
``````
``````     xmin      ymin      xmax      ymax
-92.88811  42.49192 -86.80541  47.08001
``````

The CRS for both vectors is the same. Both are in longlat projection, datum WGS84. You may notice that the bounding box limits are slightly different. Since the limits are similar, we will continue working with them with the intent to plot them together, and make a note to inspect the plots and see why the limits are different.

## 4) Explore attributes. Identify attribute(s) of interest

If you are working with the capstone data, identify the attribute that appears to contain the precipitation data, and the attribute of county name. If you are working with your data, choose an attribute of interest to work with for plotting. There are multiple ways to explore the attributes of a spatial data file. We have worked with the `names()` function and previewing the first 6 rows with the `head()` functions as well as viewing the file in RStudio’s Environment tab, or printing the shapefile object to the screen.

For a visual method to explore attributes, you can try the `plot()` function built into sf. This plots a small map for each of the attributes. For example `plot(wi_precip)` Notice that this `plot()` command displays attribute names but does not display data values.

``````plot(wi_precip)
``````

``````plot(wi_county)
``````

## 5) Using ggplot2 to plot data with selected attribute of interest

If working with the capstone data, plot the precipitation shapefile using the attribute you identified as the precipitation data to set the fill. One way to do this is to use a fill aesthetic mapped to the column name of the attribute you identified `aes(fill = )`

## Solution

``````# Using aes in  ggplot2
ggplot() +
geom_sf(data = wi_precip, aes(fill = PrecipInch), size = 0.1) +
ggtitle("Wisconsin Annual Precipitation Average 1981 - 2010, In Inches") +
coord_sf()
``````

## 6) Spatial Questions Discussion

What do you see in the plotted map? For example, where are the highest/lowest values, and are there patterns across the state? How many different categories are there? Notice the default colors and how they are associated with the data values. Do you think this color scale is a good choice for the data? Can you easily read the different color values on the plot, and connect them to the data value? Is the color range intuitive to understand? Could this be symbolized better?

It is a common convention in map symbology to use darker colors for higher data values. We can do this in ggplot2 by reversing the color range. Add the line `scale_fill_continuous(trans = 'reverse') +`

## Solution

``````#Plot a shapefile, reverse the colors
ggplot() +
geom_sf(data = wi_precip, aes(fill = PrecipInch), size = 0.1) +
scale_fill_continuous(trans = 'reverse') +   ## reverses color scale
ggtitle("Wisconsin Annual Precipitation Average 1981 - 2010, In Inches") +
coord_sf()
``````

## 7) Plot multiple vector shapefiles

You can use ggplot2 to plot mutiple vector shapefiles layered in a single plot. For example, adding a layer of the county outlines over the plot of precipitation with a second use of the `geom_sf()` command. One design option to try is to make choices about the color and thickness of the county lines.

## Solution

``````ggplot() +
geom_sf(data = wi_precip, aes(fill = PrecipInch), size = 0.1) +
geom_sf(data = wi_county, fill = NA, color = gray(.5)) +
scale_fill_continuous(trans = 'reverse') +        ## reverses color scale
ggtitle("Wisconsin Annual Precipitation Average 1981 - 2010, In Inches") +
coord_sf()
``````

## Bonus: Try different color palette

R offers a number of color packages that allow you to choose from a variety of color ranges. One example of a color palette is `colorspace`

The `hcl_palettes` function will print the colorspace palette to the screen

`````` # install.packages("colorspace")
library(colorspace)
hcl_palettes(plot = TRUE)
``````

You can assign colors from a palette to the unique values in a range, as in our precipitation data. First, determine the number of unique values.

`````` wi_precip_uniq <- unique(wi_precip\$Inches)
length(wi_precip_uniq)
``````
`````` [1] 11
``````

In the case of our precipitation data, there are 11 unique values. This code example selects the palette “Purples 3” and reverses the order of the colors.

``````colors <- sequential_hcl(11, palette = "Purples 3")  #11 categories
colors <- rev(colors)   #reverses order
``````

The `cut_interval()` function can be used to assign the range of data values into a given number of groups. In this example we are using `cut_interval()` as an aspect of assigning a plot color to a group of data values within the “Inches” attribute . In this case, using 11 groups for `cut_interval()` assigns a color to each unique data value.

`````` ggplot() +
geom_sf(data = wi_precip, aes(fill = cut_interval(Inches, (11)))) +
scale_fill_manual(values = colors) +
ggtitle("Wisconsin Avg. Annual Precipitation 1981-2000") +
coord_sf()
``````

Changing the value in the `cut_interval()` function to a number that is less than the number of unique values will combine multiple data values into a range plotted as the same color. This can be helpful to simplify the appearance of a plot with several unique data values.

This example assigns the values of the “Inches” attribute to five categories.

``````ggplot() +
geom_sf(data = wi_precip, aes(fill = cut_interval(Inches, (5)))) +
scale_fill_manual(values = colors) +
ggtitle("Wisconsin Avg. Annual Precipitation 1981-2000") +
coord_sf()
``````

Notice that the polygons plot with outlines of each unique data value category, even though multiple data value categories are plotted with the same color. This can be a confusing appearance since the outlines don’t align with the color fills. This can be helped by setting the fill lineweight `lwd` to 0, suppressing the outlines.

``````ggplot() +
geom_sf(data = wi_precip, aes(fill = cut_interval(Inches, (5))), lwd = 0) +
scale_fill_manual(values = colors) +
ggtitle("Wisconsin Avg. Annual Precipitation 1981-2000") +
coord_sf()
``````

## Key Points

• Options for exploring vector files include displaying metadata, and visually examining plots
• R offers design options for the visual display of vector data, including choosing a color palette, and reversing the order of colors assigned to categories

# Day 5: Manipulating Raster Data

## Overview

Teaching: min
Exercises: min
Questions
• How can I summarize raster data based on vector locations

Objectives
• Import .csv files containing x,y coordinate locations and convert to a spatial object

• Crop a raster to the extent of a vector layer

• Extract values from a raster that correspond to a vector file overlay

# Outline

• Addressing Minute questions (10-15 minutes)
• Capstone lesson (30-40 minutes)
2. Load libraries and import data
3. Crop raster with a vector
4. Summarize raster data by vector locations
• Keypoints and final questions (10-15 minutes)

# 1) Revisit your spatial question

On day 1 we brainstormed questions you could ask with geospatial data. You thought about questions you can answer with raster data, and separately, questions you could answer with vector data. Through this workshop you have practiced working with spatial data and have the tools to answer your question.

In this final day we will implement tools you learned for combining raster and vector data and the types of questions you can answer when doing so.

A spatial question may look like the following:

• Where does something occur?
• How does something change across space or time?
• What are the differences between locations?

Examples of spatial questions are:

• What county has the highest recorded precipitation?
• Has mean annual temperature increased in the state?
• How does the precipitation vary between Milwaukee and Madison?

### Capstone data question: Does it get hotter in Milwaukee or Madison?

In this capstone lesson we will use the following capstone data:

• Natural Earth populated places vector point data for cities in Wisconsin (provided `ne_10m_populated_places_Wisconsin.csv` file)

We will use the vector data to crop the raster data to the extent of Wisconsin and summarize the raster data using the point locations of Madison and Milwaukee.

If you are using your own data, please have raster and vector data that cover a similar geographic area. Since we are cropping the raster data by the vector data, it will work best if your vector data is of a smaller geographic area than the raster data.

# 2) Load libraries and import data

For raster data, we use functions in the `raster` package. For vector data, you will use functions in the `sf` package. In addition, because we will visualize our data, we will use the plotting package `ggplot2`.

## Solution

``````library(raster)
library(sf)
library(ggplot2)
``````

The capstone data includes the World Climate Data or “WorldClim”. We will use your coding skills to access the worldclim dataset from the `raster` package, crop it to our extent of interest, and summarize the data. By the end of this section you should have your raster data as a ‘Formal class RasterLayer’ in your Global Environment

## World Climate Data

You can learn about and download the WorldClim from the website: https://www.worldclim.org/data/worldclim21.html For an example of how to use the `raster` package to access the WorldClim (raster) data and GADM (vector) data, visit this tutorial: https://www.gis-blog.com/r-raster-data-acquisition/

For the WorldClim data, we need to specify the data variable and the spatial resolution. We will use the [“bio” variable] (https://www.worldclim.org/data/bioclim.html), which are 19 historial bioclimatic variables. The bio variables include things like:

• annual mean temperature
• max temperataure of the warmest month
• annual precipitation
• precipiation of wettest month

The bio variables are available at spatial resolutions of 10 minutes, 5 minutes, 2.5 minutes, and 30 seconds.

Wait, why is spatial resolution in units of time? This is because the spatial coordinates on a sphere are often expressed as degrees-minutes-seconds. 10 minutes is the biggest and 30 seconds is the smallest. An online conversion tool is helpful to understand these unit conversions(https://www.opendem.info/arc2meters.html)

For the capstone data, we want to use the largest spatial resolution of 10 minutes. This means each pixel represents about 13 km on the ground at the approximate latitude of Wisconsin. Within the “bio” variables, we will use the “bio5” variable that represents the maxiumum temperature of the warmest month.

## Solution

``````# get worldclim data, variable "bio" with a 10 minute resolution
# if you selected a resolution higher than 10 (with a res of 0.5,2.5, or 5) it may take a while to download
worldClim_bio_global <- getData("worldclim", var="bio", res=10)

# use only the bio5 variable (BIO5 = Max Temperature of Warmest Month
maxTemp_global <- worldClim_bio_global\$bio5

# plot data
plot(maxTemp_global)
``````

The capstone data includes the Natural Earth Populated Places dataset. We have provided the dataset for you as a “.csv” file in the Capstone data folder. By the end of this section you should have a simple feature (‘sf’) data frame type of object in your Global Environment.

## Natural Earth Data

You can learn about and download the Natural Earth dataset from the website: https://www.naturalearthdata.com/ You can also download Natural Earth data directly into R using the [`rnaturalearth` package] (https://cran.r-project.org/web/packages/rnaturalearth/README.html)

## Solution

``````# import point data from a csv file
``````
``````Warning in file(file, "rt"): cannot open file '../data/
ne_10m_populated_places_Wisconsin.csv': No such file or directory
``````
``````Error in file(file, "rt"): cannot open the connection
``````
``````# check out the data
colnames(WIcities)
``````
``````Error in is.data.frame(x): object 'WIcities' not found
``````

Recall from your lesson how to import point data and convert it to a spatial data layer in R.

When looking at the column names we see that there is a lot of data! To convert to a spatial points data layer, we need to know information about the coordinate system and the coordinates for each point.

The column names `LATITUDE` and `LONGITUDE` have what we need!

## Solution

``````# convert data frame to sf type of object by specifying coordinates (as xy) and the projection as a crs
WIcities_points <- st_as_sf(WIcities, coords=c('LONGITUDE','LATITUDE'), crs=crs(maxTemp_global))
``````
``````Error in st_as_sf(WIcities, coords = c("LONGITUDE", "LATITUDE"), crs = crs(maxTemp_global)): object 'WIcities' not found
``````
``````# check class of object - should be "sf" data frame
class(WIcities_points)
``````
``````Error in eval(expr, envir, enclos): object 'WIcities_points' not found
``````

A critical part of working with raster and vector data together is that the datasets are in the same coordinate system.

Does it make sense? Are they the same? You can use the functions in the `raster` package to check out your data.

``````crs(maxTemp_global)
``````
``````CRS arguments: +proj=longlat +datum=WGS84 +no_defs
``````
``````crs(WIcities_points)
``````
``````Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'crs': object 'WIcities_points' not found
``````
``````extent(maxTemp_global)
``````
``````class      : Extent
xmin       : -180
xmax       : 180
ymin       : -60
ymax       : 90
``````
``````extent(WIcities_points)
``````
``````Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'extent': object 'WIcities_points' not found
``````
``````res(maxTemp_global)
``````
``````[1] 0.1666667 0.1666667
``````

Finally, let’s subset our spatial data to only include the city points we are interested in. For the sample dataset, we are going to subset the data to only include the cities of Madison and Milwaukee.

If you are using your own data, you may or may not need to do this subset or you might want to choose different locations.

## Solution

``````WIcities_points\$NAME
``````
``````Error in eval(expr, envir, enclos): object 'WIcities_points' not found
``````
``````MM_points <- WIcities_points[WIcities_points\$NAME=='Madison'|WIcities_points\$NAME=='Milwaukee',]
``````
``````Error in eval(expr, envir, enclos): object 'WIcities_points' not found
``````

# 3) Crop raster with vector

It may be useful to limit the raster data to only areas of interest. This is helpful when you load a larger raster file, such as data for the entire globe, but are only interested in a small part. For our capstone data, we loaded the global temperature data, but we are only interested in the data for Wisconsin, and specifically, the data for Milwaukee and Madison. Let’s use the `crop()` function in the `raster` package to cut out the maximum temperature data to only the area for the Wisconsin cities.

## Solution

``````maxTemp_WI <- crop(x = maxTemp_global, y = extent(WIcities_points),snap="out")
``````
``````Error in .local(x, y, ...): Cannot get an Extent object from argument y
``````
``````maxTemp_WI_df <- as.data.frame(maxTemp_WI, xy=T)
``````
``````Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': object 'maxTemp_WI' not found
``````
``````ggplot() +
geom_raster(data = maxTemp_WI_df,aes(x = x, y = y, fill = bio5)) +
geom_sf(data = MM_points,color="yellow",size=5) +
coord_sf()
``````
``````Error in fortify(data): object 'maxTemp_WI_df' not found
``````

# 4) Summarize raster data by vector locations

Now that we have our raster layer that contains the data we want to summarize, and the vector data that contains the locations indicating where we want to summarize the data, we can now put them together.

• How do I want to summarize the data? For example, do I want a single mean value? A maximum? A distribution of all values?
• What is the area I want to summarize? If it’s a point, do you want a certain distance around the point? Or, if it’s an area, you will include all of the raster values within that area.

## Extract pixel values

For the capstone data, we want to extract all the pixel values within a distance buffer of the two city points. Since our data is “unprojected” and therefore the units are in latitude and longitude, the buffer distance will be in meters (hint: this information is in the Help file for the `extract()` function!)

## Geographic and projected unit reminder

Remember, your raster is “unprojected” meaning that the pixels represent distances on a sphere of degrees, minutes, and seconds. You’ll need to consider is the conversation from degrees to meters. The conversions depends on where you are on the globe. For example, at the equator, 1 degree is 111 km. This distance decreases as you move closer to the poles. Try setting the buffer to 50 km or 50,000 meters. This is something you can play around with as you explore the data.

## Solution

``````# extract pixel values from the raster based on the Milwaukee and Madison points
# using a buffer size of 50,000 (50 km around the each point) because the units are in meters
maxTemp_MM <- extract(x=maxTemp_global,df=TRUE,
y=MM_points,
buffer=50000)
``````
``````Error in h(simpleError(msg, call)): error in evaluating the argument 'y' in selecting a method for function 'extract': object 'MM_points' not found
``````

## Convert data and plot

The last few steps will be small conversions of the datatypes to make it easy for summarizing and plotting.

## Solution

``````# convert to data frame for each calculations and plotting
df_maxTemp_MM <- as.data.frame(maxTemp_MM)
``````
``````Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': object 'maxTemp_MM' not found
``````
``````# clean up data frame because we want to give IDs names and convert the temperature data
df_maxTemp_MM\$ID <- as.factor(df_maxTemp_MM\$ID)
``````
``````Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'as.factor': object 'df_maxTemp_MM' not found
``````
``````levels(df_maxTemp_MM\$ID) = levels(as.factor(MM_points\$NAME))
``````
``````Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'levels': error in evaluating the argument 'x' in selecting a method for function 'as.factor': object 'MM_points' not found
``````
``````df_maxTemp_MM\$bio5 <- df_maxTemp_MM\$bio5/10
``````
``````Error in eval(expr, envir, enclos): object 'df_maxTemp_MM' not found
``````
``````# plot data
ggplot(data = df_maxTemp_MM, aes(x = bio5,fill=ID)) +
geom_histogram() +
ggtitle("Histogram of maximum temperatures within 50 km of the city") +
xlab("Maximum temperature (C)") +
ylab("Frequency of Pixels")
``````
``````Error in ggplot(data = df_maxTemp_MM, aes(x = bio5, fill = ID)): object 'df_maxTemp_MM' not found
``````

The graph above shows us the distribution of maximum temperatures within 50 km of Madison and Milwaukee. We have a distribution of values because the extracted data represents each individual pixel within the 50 km buffer. This graph shows us that the maximum temperatures recorded in the Madison are tend to be higher than the maximum temperatures recorded in the Milwaukee region. Does this pattern make sense with your knowledge of the area and the data? Remember, visualizing the data can be important in understanding and explaining patterns.

## Key Points

• Convert a data frame to an `sf` object using the `st_as_sf()` function.

• Use the `crop()` and `extract()` functions to get raster data of spatial subsets.

• Remember to check if your data align spatially by looking at the projection, extent, and units

# Wrap-up

## Overview

Teaching: 0 min
Exercises: 0 min
Questions
• Key question (FIXME)

Objectives
• First learning objective. (FIXME)

FIXME

## Key Points

• First key point. Brief Answer to questions. (FIXME)