################################################################################
### R BASICS WORKSHOP                                                        ###
### EXERCISE 6.2: Object manipulation (and graphics), in a spatial analysis  ### 
###                                                                          ###
### Center for Conservation and Sustainable Development                      ###
### Missouri Botanical Garden                                                ###
### Website: rbasicsworkshop.weebly.com                                      ### 
################################################################################

### INTRODUCTION ###############################################################
## The goal of this exercise is to manipulate objects often used in simple
## representantions of spatial data, including vectors, matrices and data frames.
## In addition, some tasks in this exercise introduce object classes that have
## been specifically designed to perform spatial analysis.

# The following code creates a vector, named "s.A", representing points
# regularly spaced along a transect through a hypothetical region "A":

s.A <- seq(0, 18.850, 0.1)

# The following code creates a vector, called "y.A", that represents the
# spatial variation of an environmental variable "y" along the transect that crosses
# the region "A":

y.A <- sin(s.A)

## TASK 1: make a graph of the geographic variation of the variable "y"
## along the spatial transect that crosses region "A", using
## the function *plot*. Note the sinusoidal variation (in waves) of the variable
## "y".

## TASK 2: create a vector, named "s.B", representing the coordinates
## of a transect through a hypothetical region "B". The values of this vector
## must be equal to those in the vector "s.A".

## TASK 3: create a vector called "y.B" that represents spatial variation
## in the variable "y" along the transect that crosses the hypothetical region
## "B". Do this using function *sort* to sort the values of  "y.A"
## from smallest to largest, assigning the result of this function to the
## vector "y.B". Read the help page for function *sort* as needed.

## TASK 4: make a graph of the geographic variation of the variable "y"
## along the spatial transect that crosses region "B", using the
## function *plot*.

## TASK 5: use function "points" to add in the graph above the
## representation of the geographic variation of the variable "y" along
## the spatial transect that crosses region "A". Be sure to use symbols
## of a different color to differentiate regions "A" and "B".

## TASK 6: create a legend for the graph you just constructed, using
## the function *legend*. Be sure to read the help page of the function
## *legend*. The legend should indicate which symbols are used to represent
## each region ("A" and "B").

# The contrast between regions "A" and "B" in the spatial pattern of the
# environmental variable "y" illustrates a central concept in spatial data
# analysis: the difference between two types of heterogeneity (Wiens, 2000,
# Ecological heterogeneity: an ontogeny of concepts and approaches;
# Jim nez & Ricklefs, 2014, Diversity anomalies and spatial climate
# heterogeneity, Global Ecology and Biogeography 23: 988-999):

# i) heterogeneity in constitution (or spatially implicit heterogeneity):
# variation between different sites, without considering the distance that
# separates the sites.

# ii) heterogeneity in spatial arrangement (or spatially
# explicit heterogeneity): spatial distribution of sites with different
# characteristics.

## TASK 7: confirm that heterogeneity in constitution (spatially
## implicit) of the variable "y" in region "A" is identical to that in
## region "B", using the *var*, *range* and *IQR* functions to calculate
## the variance, range, and interquartile range of the "y" variable in both
## regions.

# In the following tasks you will compare the distributional heterogeneity
# (spatially explicit) of the variable "y" in region "A" with that of
# the region "B", using the *dist* function to graphically examine the
# relationship between geographic distance and the change in the variable "y".
# Read the help for the function *dist*. With the following code you get an
# object of class "dist", containing the spatial distance between all points
# in the transect along region "A":

dist(s.A)

# similarly, the following code returns a class object "dist", containing the
# difference in the variable "y" between all the points of transect along
# region "A":

## TASK 8: plot the relationship between geographic distance, dist(s.A), and
## the change in the variable "y", dist(y.A), in region "A", using the
## function *plot*.

## TASK 9: plot the relationship between geographic distance and change in
## the variable "y" in region "B", using the *plot* function.

# to easily compare regions "A" and "B" in terms of the relationship
# between the geographic distance and the change in the variable "y",
# you can use the *dev.new* function to open an additional graphical window.
# Start by plotting the relationship between geographic distance and change
# in the variable "y" for region "A". Then run this code to open a new window
# graphical window:

dev.new()

# now graph the relationship between geographic distance and change in
# variable "y" for region "B". You can easily compare both graphs.
# Visit the help page for function *dev.new* and become familiar with this
# and other functions that control graphical windows. For example, the function
# *dev.set* determines which graphical window is active (i.e., ready to graph).
# The following code makes the graphical window number 2 active:

dev.set(2)

## TASK 10: The standard installation of R includes several databases, including
## a topographical description of the Maunga Whau volcano, on Mt Eden, located
## in Auckland, New Zealand. This database, called "volcano", is a
## example of how spatial information can be represented in a simple way
## in an object of class matrix. Load the database "volcano" with this
## code:

data(volcano)

## print the database in the console with this code:

volcano

## Use function *dim* to determine the number of rows and columns in
## the "volcano" matrix.

# The function *image* can be used to create a topographic map of the volcano
# Maunga Whau:

image(volcano)

# and the function *contour* can be used to add elevation contours
# to the previous plot:

contour(volcano, levels=c(140), add=T)

## TASK 11: the topographic map of the Maunga Whau volcano constructed with the
## function *image* is a grid in which each square (or cell) represents a
## geographic position on the volcano. Visit the help page of the "volcano" database
## to find out the spatial scale of that grid. Then use the *image* function to
## create a topographic map of the Maunga Whau volcano at the correct spatial scale.
## Notice that the code above (image(volcano)) does not have the correct spatial scale,
## because it uses the default values of the "x" and "y" arguments of the *image*
## function. Visit the help page for the "image" function to understand the use of these
## arguments. Also, use the "xlab" argument to give the horizontal axis the name
## "East - West (m)", and the argument "ylab" to give the axis vertical the name
## "South - North (m)".

## TASK 12: add to the previous map (the one you created in TASK 11) elevation contours
## showing the following altitudes: 100, 120, 140, 160 and 180 meters above
## sea level. Be sure to use the correct spatial scale.

## TASK 13: install the "sp" package and load it into your R session. This is one
## of the most important packages for spatial analysis in R.

## TASK 14: The "sp" package includes a database called "meuse.riv",
## which is an object of class matrix with the geographic coordinates of the boundary
## of the Meuse river in the surroundings of Stein (Holland). Load this database
## in your R session, check the class of the object using the function *class* and
## examine its dimensions with the function *dim*

## TASK 15: use the *plot* function to draw the outline of the river Meuse,
## using the arguments "type" and "col" to represent the outline of the river
## as blue lines (not dots).

## TASK 16: use the function *polygon* to draw the outline of the river Meuse,
## using the "border" and "col" arguments to represent the river as a
## blue polygon. If necessary, visit the help page for the function
## *polygon* and study its arguments.

## TASK 17: the package "sp" includes a database called "meuse",
## which is an object of class "data frame" with spatial information about
## the concentration of heavy metals, soil characteristics and land use
## land in the valley of the river Meuse near Stein (Holland). Load this
## database in your R session, check the class of the object using
## the function *class* and examine its dimensions with function *dim*.

# The following code produces a map of the sampled sites that are included
# in the "meuse" database, and then adds the river outline to the map
# Meuse:

plot(meuse$x, meuse$y, col="gray90", pch=21,
	xlab = "Easting (m) in Rijksdriehoek",
	ylab = "Northing (m) in Rijksdriehoek")
polygon(meuse.riv, col="blue", border="blue")

## TASK 18: use function *points* to add red round symbols
## (arguments pch=19 and col="red") that indicate the position of sites
## with corn agriculture. To do this task, visit the help page of the
## "meuse" database, and read the description of the landuse variable.
## The workshop section on logical indexing is helpful to complete this
## task.

## TASK 19: use function *points* to add round green symbols
## (arguments pch=19 and col="green") that indicate the position of grassy
## sites. Visit the help page of the database "meuse", and read the 
## description of the landuse variable. The workshop section on logical
## indexing is useful to complete this task.

# The following code adds a legend to the map you just created,
# visit the help page for the *legend* function to familiarize yourself
# with the arguments of this function:

legend(x=178530, y=333710, c("Corn", "Grass", "Others"), pch=c(19, 19, 21), 
	col=c("red", "green", "gray70"))

# The following code adds a title to the map you just created, visit
# the help page for the *mtext* function to familiarize yourself with
# this function:

mtext(side=3, "Distribution of corn crops and grass", line=2.5, cex=1.3)
mtext(side=3, "in the Meuse River valley", line=1, cex=1.3)

## TASK 20: what is the average concentration of lead (in units of
## mg/kg, or ppm) in the soil from sampled sites included in
## the "meuse" database? To do this task, visit the help page
## of the database "meuse", and read the description of the 
## lead concentration variable. Use function *mean* function to
## calculate the average.

## TASK 21: what is the average concentration of cadmium (in units of
## mg/kg, or ppm) in the soil from sampled sites included in
## the "meuse" database? To do this task, visit the help page
## of the database "meuse", and read the description of the
## cadmium concentration variable. Use the *mean* function to
## compute the average.

## TASK 22: Create, again, a map of the sampled sites that are included
## in the "meuse" database, and then add the river outline to the map
## Meuse. Use code similar to that in TASK 18, with the functions *plot*
## and *polygon*, but using "filled" gray points (pch=19 argument to
## function *plot*) to graph the points sampled.

## TASK 23: use function *points* to add (to the previous map)
## round orange symbols (arguments pch=19 and col="orange")
## indicating the position of sites with a concentration of lead
## above average for all sampled sites. The section of the
## workshop on logical indexing useful to carry out this task.

## TASK 24: use function *points* to add (to the previous map)
## open, round, red symbols (arguments pch=21 and col="red")
## indicating the position of sites with a concentration of
## cadmium above average for all sampled sites. Use
## the "cex" argument to make sure the symbols are larger
## than the ones you used in TASK 23. For example, use a value of
## 1.5 for the "cex" argument (cex=1.5). The workshop section on logical
## indexing is useful to complete this task.

## TASK 25: what is the seventy-fifth percentile of lead concentration
## (in units of mg/kg, or ppm) in the soil at the sampled sites
## included in the "meuse" database? Use the *quantile* function
## to calculate the seventy-fifth percentile.

## TASK 27: create, again, a map of the sampled sites that are included
## in the "meuse" database, and then add the river outline to the map
## Meuse. Use the same code that you used in TASK 22.

## TASK 28: use function *points* to add (to the previous map)
## round orange symbols (arguments pch=19 and col="orange")
## indicating the position of sites with a concentration of lead
## greater than the seventy-fifth percentile of the sampled sites.
## The workshop section on logical indexing is helpful for this task.

## TASK 29: use the *points* function to add (to the previous map)
## open, round, red symbols (arguments pch=21 and
## col="red") indicating the position of sites with a concentration of
## cadmium higher than the 75th percentile of the sampled sites.
## Use the "cex" argument to make sure the symbols are larger
## than those used in TASK 28. The workshop section on logical indexing
## is useful here too.

# Maps of continuous variables, such as the concentration of lead or cadmium
# on the Meuse river valley, are common in spatial analyses.
# The maps created in TASKS 22-24 and 27-29 are examples of such maps.
# As an additional example, the following code creates an array of colors
# to develop a map of lead concentration in the Meuse river valley.
# In particular, the vector assigns green color to values less than or
# equal to the median of sampled sites, and red color to the remining sites:

color.mid.lead <- rep("green", times=length(meuse$lead))
color.mid.lead[meuse$lead > quantile(meuse$lead, probs=c(0.5))] <- "red"

# the following code uses the vector you just created,
# "color.mid.lead" to map lead concentration:

plot(meuse$x, meuse$y, col="gray90", pch=21,
	xlab = "Easting (m) in Rijksdriehoek",
	ylab = "Northing (m) in Rijksdriehoek")
polygon(meuse.riv, col="blue", border="blue")
points(meuse$x, meuse$y, col=color.mid.lead, pch=19)

# the following code adds the map legend

legend(178530, 333710,
	c(
	paste("(", quantile(meuse$lead, probs=c(0.5)), ",", max(meuse$lead), "]", sep=""),
	paste("[", min(meuse$lead), ",",   quantile(meuse$lead, probs=c(0.5)), "]", sep="")
	),
	pch=19, col=c("red", "green"))

# and finally, the following code adds the map title:

mtext(side=3, "Lead concentration (ppm)", line=2.5, cex=1.3)
mtext(side=3, "in the Meuse River valley", line=1, cex=1.3)

## TASK 30: create a vector of colors to make a map showing
## quartiles of lead concentration in the Meuse River valley.
## The categories should be represented, from smallest to largest,
## by colors: green, yellow, orange and red. Start by creating a
## vector of colors in which all the values are green:

color.quartiles.lead <- rep("green", times=length(meuse$lead))

## now assign yellow to values greater than the first quartile:

color.quartiles.lead[meuse$lead > quantile(meuse$lead, probs=c(0.25))] <- "yellow"

## then color orange the values greater than the median:

color.quartiles.lead[meuse$lead > quantile(meuse$lead, probs=c(0.5))] <- "orange"

## Finally, assign red color to the values greater than the third quartile:

## TASK 31: use the color vector you just created, "color.quartiles.lead",
## to map lead concentration. The map should show the outline of the Meuse River
## in blue. Be sure to include the names and units of the horizontal and vertical axes.

## TASK 32: add the legend (using function "legend") and the title
## (using function *mtext*) to the map you just created.

## TASK 33: create a color vector to make a map showing quartiles of cadmium
## concentration in the Meuse River valley. The categories should be represented,
## from smallest to largest, by colors: green, yellow, orange and red. Use function
## *quantile* to create this vector. The vector should be named "color.quartiles.cadmium"

## TASK 34: use the color vector you just created, "color.quartiles.cadmium",
## to map the cadmium concentration. The map should show the outline of the Meuse River
## in blue. Be sure to include the names and units of the horizontal and vertical axes.

## TASK 35: add the legend (using function "legend") and the title
## (using function *mtext*) to the map you just created.

## TASK 36: plot the relationship between the concentration of lead and cadmium
## in the soil of the sampled sites in the Meuse River valley.

## TASK 37: use function *cor.test* to estimate the correlation between
## the concentration of lead and cadmium in the Meuse River valley. Measure the
## the correlation based on the Pearson coefficient.

## TASK 38: use function *pairs* to graph the relationship between the
## concentration of all heavy metal pairs included in the "meuse" database:
## cadmium, copper, lead, and zinc. Visit the help page for this function
## and to become familiar with its arguments.

## TASK 39: use function *cor* to estimate the correlation matrix
## between the concentration of all heavy metal pairs included in
## the "meuse" database: cadmium, copper, lead, and zinc. Avoid writing
## function *cor* multiple times (i.e., once for each pair of heavy metals)
## The code in your response must use the *cor* function once.

## TASK 40: plot the relationship between the distance to the Meuse River (horizontal axis)
## and the concentration of lead in the soil of the Meuse River valley (vertical axis).
## Visit the help page of the "meuse" database to see which column(s)
## have information on the distance to the river. Use the arguments
## "xlab" and "ylab" to indicate the variables and units
## represented by the horizontal and vertical axes, respectively.

## TASK 41: plot the relationship between the distance to the Meuse River (horizontal axis)
## and the concentration of cadmium in the soil of the Meuse River valley (vertical axis).
## Use the arguments "xlab" and "ylab" to specify the names and units of
## the variables represented by the horizontal and vertical axes, respectively.

## TASK 42: plot the relationship between the logarithm of the distance to the Meuse River
## (horizontal axis) and the lead concentration in the Meuse River valley (vertical axis).
## Use the arguments "xlab" and "ylab" to indicate the names and units of the variables
## represented by the horizontal and vertical axes, respectively.

## TASK 43: use function *lm* to fit a linear regression model in
## which the response variable (dependent variable) is the concentration
## of lead in the soil of the Meuse River valley and the independent variable is
## the logarithm of the distance to the river. Assign the result (using
## the operator "<-") to an object named "model.lead", and use the function
## *summary* to examine the magnitude, precision (the standard error), and
## statistical significance of the regression coefficients.

## TASK 44: use function *abline* to add the regression line
## of "model.lead" to the figure you created in TASK 42.

## TASK 45: plot the relationship between the logarithm of the distance to
## the Meuse River (horizontal axis) and the cadmium concentration in the
## Meuse River valley (vertical axis). Use the arguments "xlab" and "ylab"
## to indicate the names and units of the variables represented by the
## horizontal and vertical axes, respectively.

## TASK 46: use function *lm* to fit a linear regression model in
## which the response variable (dependent variable) is the concentration of
## cadmium in the Meuse River valley and the independent variable is the
## logarithm of the distance to the river. Assign the result (using the
## operator "<-") to an object named "model.cadmium", and use the function
## *summary* to examine the magnitude, precision (the standard error), and
## statistical significance of the regression coefficients.

## TASK 47: use function *abline* to add the regression line of the
## "model.cadmium" to the figure I created in TASK 45.

# Spatial correlation is a measure of heterogeneity in spatial arrangement
# (or spatially explicit heterogeneity, see text under TASK 6). In
# particular, spatial correlation measures the relationship between observed values
# in nearby or neighboring sites (Legendre and Legendre 2012, Numerical Ecology.
# Elsevier). One measure of spatial correlation is Moran's I. Read the
# equation 7.3 on page 232 of Bocard et al. (2011, Numerical Ecology.
# Springuer), available at:
# http://rbasicsworkshop.weebly.com/books.html
# Be sure to read the text at the bottom of page 232 and reflect on the
# meaning of equation 7.3. The code below calculates the spatial correlation
# in the concentration of lead in the soil of the Meuse River valley. To run
# this code you need to install and load the "spdep" package in your R session:

install.packages("spdep")
library(spdep)

# To get an idea of the frequency of geographic distances between
# the sampling points at which we want to calculate Moran's I, you
# may use the *dist* and *hist* functions:

hist(dist(meuse[,1:2]), breaks=100, main="",
	xlab="Distance between sampling points (m)",
	ylab="Frequency (pairs of sampling points)")

# this histogram suggests that you can calculate Moran's I by defining neighboring
# sampling sites as those separated by distances less than or equal to 500 meters.
# Visit the help page for the *dnearneigh* function of the package "spdep", which you
# will use to define neighboring sampling sites. Notice that the
# first argument "x" must be an array (or an object of class "SpatialPoints",
# specialized for spatial analysis) with the coordinates of sampling sites.
# Note that indexing the coordinates of the sampling sites,
# produces a data frame and not an array:

class(meuse[,1:2])

# therefore it is necessary to use the function *as.matrix* to define the
# value of the argument "x" of the function *dnearneigh*:

neighbor.sites <-  dnearneigh(x=as.matrix(meuse[,1:2]), d1=0, d2=500)

# The resulting object, neighbor.sites, is of class "nb" which is a class of
# objects specialized in representing relations of spatial proximity (neighborhood):

neighbor.sites[[1]]

# to reinforce this idea, study the following code, which displays a
# map of all sampled sites in the Meuse River valley, highlighting
# in red a focal site and in gray the neighboring sites:

plot(meuse$x, meuse$y, col="gray90", pch=21,
	xlab = "Easting (m) in Rijksdriehoek",
	ylab = "Northing (m) in Rijksdriehoek")
polygon(meuse.riv, col="blue", border="blue")
points(meuse$x[1], meuse$y[1], col="red", pch=19)
points(meuse$x[neighbor.sites[[1]]], meuse$y[neighbor.sites[[1]]], col="gray70", pch=19)

## TASK 48: modify the code above to show neighboring sites to the
## site that is in row 50 of the "meuse" data frame. To start,
## the following code identifies the neighbors of this site:

neighbor.sites[[50]]

## now modify the code immediately before this task to
## make a map that highlights the focal site in red (which is in the
## row 50 of the "meuse" data frame) and its neighbors grayed out.

# Next use the "neighbor.sites" object to estimate the spatial correlation,
# measured with Moran's I. To do this, the following code uses the
# function *sp.correlogram*. The "order" argument determines the distance to
# which the spatial correlation is computed. If "order=1", the correlation is
# computed only for immediate neighbors. If "order=2", the spatial correlation
# is calculated for immediate neighbors and, also, for sites that are more distant,
# twice the distance from immediate neighbors. And so on. Visit the help page for
# function *sp.correlogram* to learn about the arguments for this function. Then
# study this code:

correlogram.lead <- sp.correlogram(neighbours=neighbor.sites, var=meuse$lead, order = 5, method = "I")

# The resulting object, "correlogram.lead", is of class "spcor". This is a
# specialized class of objects for spatial analysis. Make sure the object
# "correlogram.lead" belongs to class "spcor" and examine its structure using
# functions *str* and *attributes*:

class(correlogram.lead)
str(correlogram.lead)
attributes(correlogram.lead)

# the result of function *str* shows that the class "spcor" is a
# particular type of list. The structure of this list is described in the
# "Value" section of the help page for function "sp.correlogram". The
# following code prints to the console the calculated values off Moran's I
# for five categories of neighbors ("estimate"), the expected value under the
# null hypothesis of no spatial correlation ("expectation") and its variance
# ("variance"), the degree to which the observed value deviates from the null
# hypothesis ("standard deviation") and the statistical significance ("Pr(I) two
# sided"):

print(correlogram.lead)

# the following code plots these results for the five categories of
# neighbors:

plot(correlogram.lead)

## TASK 49: spatial correlation among residuals of regression analysis,
## such as those contained in "model.lead" and "model.cadmium", is an important
## aspect of spatial data analysis (see page 9 in Legendre & Legendre 1998,
## available at: http://rbasicsworkshop.weebly.com/books.html). Calculate and graph
## the spatial correlation among residuals of "model.lead", using Moran's I.
## To begin with, use function *residuals* to obtain the residals of "model.lead":

residuals(model.lead)

## TASK 50: make a map showing the residuals of "model.lead", using four colors
## to show four partitions of the residuals according to quartiles. Be sure to
## include the names and units of the horizontal and vertical axes, the title of
## the graph (you could use the function *mtext*) and a legend. The code for TASKS
## 30-35 may be helpful here.

## TASK 51: calculate and plot the spatial correlation between the residuals of the
## "model.cadmium", using Moran's I.

## TASK 52: create a map showing the residuals of "model.cadmium",
## using four colors to show four partitions of the residuals
## according to quartiles. The map should show the outline of the Meuse River.
## Be sure to include the names and units of the horizontal and vertical axes,
## the title of the graph (you could use function *mtext*) and a legend.