################################################################################ ### R BASICS WORKSHOP ### ### EXERCISE 6.2: Object manipulation (and graphics), in a spatial analysis ### ### ### ### Center for Conservation and Sustainable Development ### ### Missouri Botanical Garden ### ### Website: rbasicsworkshop.weebly.com ### ################################################################################ ### INTRODUCTION ############################################################### ## The goal of this exercise is to manipulate objects often used in simple ## representantions of spatial data, including vectors, matrices and data frames. ## In addition, some tasks in this exercise introduce object classes that have ## been specifically designed to perform spatial analysis. # The following code creates a vector, named "s.A", representing points # regularly spaced along a transect through a hypothetical region "A": s.A <- seq(0, 18.850, 0.1) # The following code creates a vector, called "y.A", that represents the # spatial variation of an environmental variable "y" along the transect that crosses # the region "A": y.A <- sin(s.A) ## TASK 1: make a graph of the geographic variation of the variable "y" ## along the spatial transect that crosses region "A", using ## the function *plot*. Note the sinusoidal variation (in waves) of the variable ## "y". ## TASK 2: create a vector, named "s.B", representing the coordinates ## of a transect through a hypothetical region "B". The values of this vector ## must be equal to those in the vector "s.A". ## TASK 3: create a vector called "y.B" that represents spatial variation ## in the variable "y" along the transect that crosses the hypothetical region ## "B". Do this using function *sort* to sort the values of "y.A" ## from smallest to largest, assigning the result of this function to the ## vector "y.B". Read the help page for function *sort* as needed. ## TASK 4: make a graph of the geographic variation of the variable "y" ## along the spatial transect that crosses region "B", using the ## function *plot*. ## TASK 5: use function "points" to add in the graph above the ## representation of the geographic variation of the variable "y" along ## the spatial transect that crosses region "A". Be sure to use symbols ## of a different color to differentiate regions "A" and "B". ## TASK 6: create a legend for the graph you just constructed, using ## the function *legend*. Be sure to read the help page of the function ## *legend*. The legend should indicate which symbols are used to represent ## each region ("A" and "B"). # The contrast between regions "A" and "B" in the spatial pattern of the # environmental variable "y" illustrates a central concept in spatial data # analysis: the difference between two types of heterogeneity (Wiens, 2000, # Ecological heterogeneity: an ontogeny of concepts and approaches; # Jim nez & Ricklefs, 2014, Diversity anomalies and spatial climate # heterogeneity, Global Ecology and Biogeography 23: 988-999): # i) heterogeneity in constitution (or spatially implicit heterogeneity): # variation between different sites, without considering the distance that # separates the sites. # ii) heterogeneity in spatial arrangement (or spatially # explicit heterogeneity): spatial distribution of sites with different # characteristics. ## TASK 7: confirm that heterogeneity in constitution (spatially ## implicit) of the variable "y" in region "A" is identical to that in ## region "B", using the *var*, *range* and *IQR* functions to calculate ## the variance, range, and interquartile range of the "y" variable in both ## regions. # In the following tasks you will compare the distributional heterogeneity # (spatially explicit) of the variable "y" in region "A" with that of # the region "B", using the *dist* function to graphically examine the # relationship between geographic distance and the change in the variable "y". # Read the help for the function *dist*. With the following code you get an # object of class "dist", containing the spatial distance between all points # in the transect along region "A": dist(s.A) # similarly, the following code returns a class object "dist", containing the # difference in the variable "y" between all the points of transect along # region "A": ## TASK 8: plot the relationship between geographic distance, dist(s.A), and ## the change in the variable "y", dist(y.A), in region "A", using the ## function *plot*. ## TASK 9: plot the relationship between geographic distance and change in ## the variable "y" in region "B", using the *plot* function. # to easily compare regions "A" and "B" in terms of the relationship # between the geographic distance and the change in the variable "y", # you can use the *dev.new* function to open an additional graphical window. # Start by plotting the relationship between geographic distance and change # in the variable "y" for region "A". Then run this code to open a new window # graphical window: dev.new() # now graph the relationship between geographic distance and change in # variable "y" for region "B". You can easily compare both graphs. # Visit the help page for function *dev.new* and become familiar with this # and other functions that control graphical windows. For example, the function # *dev.set* determines which graphical window is active (i.e., ready to graph). # The following code makes the graphical window number 2 active: dev.set(2) ## TASK 10: The standard installation of R includes several databases, including ## a topographical description of the Maunga Whau volcano, on Mt Eden, located ## in Auckland, New Zealand. This database, called "volcano", is a ## example of how spatial information can be represented in a simple way ## in an object of class matrix. Load the database "volcano" with this ## code: data(volcano) ## print the database in the console with this code: volcano ## Use function *dim* to determine the number of rows and columns in ## the "volcano" matrix. # The function *image* can be used to create a topographic map of the volcano # Maunga Whau: image(volcano) # and the function *contour* can be used to add elevation contours # to the previous plot: contour(volcano, levels=c(140), add=T) ## TASK 11: the topographic map of the Maunga Whau volcano constructed with the ## function *image* is a grid in which each square (or cell) represents a ## geographic position on the volcano. Visit the help page of the "volcano" database ## to find out the spatial scale of that grid. Then use the *image* function to ## create a topographic map of the Maunga Whau volcano at the correct spatial scale. ## Notice that the code above (image(volcano)) does not have the correct spatial scale, ## because it uses the default values of the "x" and "y" arguments of the *image* ## function. Visit the help page for the "image" function to understand the use of these ## arguments. Also, use the "xlab" argument to give the horizontal axis the name ## "East - West (m)", and the argument "ylab" to give the axis vertical the name ## "South - North (m)". ## TASK 12: add to the previous map (the one you created in TASK 11) elevation contours ## showing the following altitudes: 100, 120, 140, 160 and 180 meters above ## sea level. Be sure to use the correct spatial scale. ## TASK 13: install the "sp" package and load it into your R session. This is one ## of the most important packages for spatial analysis in R. ## TASK 14: The "sp" package includes a database called "meuse.riv", ## which is an object of class matrix with the geographic coordinates of the boundary ## of the Meuse river in the surroundings of Stein (Holland). Load this database ## in your R session, check the class of the object using the function *class* and ## examine its dimensions with the function *dim* ## TASK 15: use the *plot* function to draw the outline of the river Meuse, ## using the arguments "type" and "col" to represent the outline of the river ## as blue lines (not dots). ## TASK 16: use the function *polygon* to draw the outline of the river Meuse, ## using the "border" and "col" arguments to represent the river as a ## blue polygon. If necessary, visit the help page for the function ## *polygon* and study its arguments. ## TASK 17: the package "sp" includes a database called "meuse", ## which is an object of class "data frame" with spatial information about ## the concentration of heavy metals, soil characteristics and land use ## land in the valley of the river Meuse near Stein (Holland). Load this ## database in your R session, check the class of the object using ## the function *class* and examine its dimensions with function *dim*. # The following code produces a map of the sampled sites that are included # in the "meuse" database, and then adds the river outline to the map # Meuse: plot(meuse$x, meuse$y, col="gray90", pch=21, xlab = "Easting (m) in Rijksdriehoek", ylab = "Northing (m) in Rijksdriehoek") polygon(meuse.riv, col="blue", border="blue") ## TASK 18: use function *points* to add red round symbols ## (arguments pch=19 and col="red") that indicate the position of sites ## with corn agriculture. To do this task, visit the help page of the ## "meuse" database, and read the description of the landuse variable. ## The workshop section on logical indexing is helpful to complete this ## task. ## TASK 19: use function *points* to add round green symbols ## (arguments pch=19 and col="green") that indicate the position of grassy ## sites. Visit the help page of the database "meuse", and read the ## description of the landuse variable. The workshop section on logical ## indexing is useful to complete this task. # The following code adds a legend to the map you just created, # visit the help page for the *legend* function to familiarize yourself # with the arguments of this function: legend(x=178530, y=333710, c("Corn", "Grass", "Others"), pch=c(19, 19, 21), col=c("red", "green", "gray70")) # The following code adds a title to the map you just created, visit # the help page for the *mtext* function to familiarize yourself with # this function: mtext(side=3, "Distribution of corn crops and grass", line=2.5, cex=1.3) mtext(side=3, "in the Meuse River valley", line=1, cex=1.3) ## TASK 20: what is the average concentration of lead (in units of ## mg/kg, or ppm) in the soil from sampled sites included in ## the "meuse" database? To do this task, visit the help page ## of the database "meuse", and read the description of the ## lead concentration variable. Use function *mean* function to ## calculate the average. ## TASK 21: what is the average concentration of cadmium (in units of ## mg/kg, or ppm) in the soil from sampled sites included in ## the "meuse" database? To do this task, visit the help page ## of the database "meuse", and read the description of the ## cadmium concentration variable. Use the *mean* function to ## compute the average. ## TASK 22: Create, again, a map of the sampled sites that are included ## in the "meuse" database, and then add the river outline to the map ## Meuse. Use code similar to that in TASK 18, with the functions *plot* ## and *polygon*, but using "filled" gray points (pch=19 argument to ## function *plot*) to graph the points sampled. ## TASK 23: use function *points* to add (to the previous map) ## round orange symbols (arguments pch=19 and col="orange") ## indicating the position of sites with a concentration of lead ## above average for all sampled sites. The section of the ## workshop on logical indexing useful to carry out this task. ## TASK 24: use function *points* to add (to the previous map) ## open, round, red symbols (arguments pch=21 and col="red") ## indicating the position of sites with a concentration of ## cadmium above average for all sampled sites. Use ## the "cex" argument to make sure the symbols are larger ## than the ones you used in TASK 23. For example, use a value of ## 1.5 for the "cex" argument (cex=1.5). The workshop section on logical ## indexing is useful to complete this task. ## TASK 25: what is the seventy-fifth percentile of lead concentration ## (in units of mg/kg, or ppm) in the soil at the sampled sites ## included in the "meuse" database? Use the *quantile* function ## to calculate the seventy-fifth percentile. ## TASK 27: create, again, a map of the sampled sites that are included ## in the "meuse" database, and then add the river outline to the map ## Meuse. Use the same code that you used in TASK 22. ## TASK 28: use function *points* to add (to the previous map) ## round orange symbols (arguments pch=19 and col="orange") ## indicating the position of sites with a concentration of lead ## greater than the seventy-fifth percentile of the sampled sites. ## The workshop section on logical indexing is helpful for this task. ## TASK 29: use the *points* function to add (to the previous map) ## open, round, red symbols (arguments pch=21 and ## col="red") indicating the position of sites with a concentration of ## cadmium higher than the 75th percentile of the sampled sites. ## Use the "cex" argument to make sure the symbols are larger ## than those used in TASK 28. The workshop section on logical indexing ## is useful here too. # Maps of continuous variables, such as the concentration of lead or cadmium # on the Meuse river valley, are common in spatial analyses. # The maps created in TASKS 22-24 and 27-29 are examples of such maps. # As an additional example, the following code creates an array of colors # to develop a map of lead concentration in the Meuse river valley. # In particular, the vector assigns green color to values less than or # equal to the median of sampled sites, and red color to the remining sites: color.mid.lead <- rep("green", times=length(meuse$lead)) color.mid.lead[meuse$lead > quantile(meuse$lead, probs=c(0.5))] <- "red" # the following code uses the vector you just created, # "color.mid.lead" to map lead concentration: plot(meuse$x, meuse$y, col="gray90", pch=21, xlab = "Easting (m) in Rijksdriehoek", ylab = "Northing (m) in Rijksdriehoek") polygon(meuse.riv, col="blue", border="blue") points(meuse$x, meuse$y, col=color.mid.lead, pch=19) # the following code adds the map legend legend(178530, 333710, c( paste("(", quantile(meuse$lead, probs=c(0.5)), ",", max(meuse$lead), "]", sep=""), paste("[", min(meuse$lead), ",", quantile(meuse$lead, probs=c(0.5)), "]", sep="") ), pch=19, col=c("red", "green")) # and finally, the following code adds the map title: mtext(side=3, "Lead concentration (ppm)", line=2.5, cex=1.3) mtext(side=3, "in the Meuse River valley", line=1, cex=1.3) ## TASK 30: create a vector of colors to make a map showing ## quartiles of lead concentration in the Meuse River valley. ## The categories should be represented, from smallest to largest, ## by colors: green, yellow, orange and red. Start by creating a ## vector of colors in which all the values are green: color.quartiles.lead <- rep("green", times=length(meuse$lead)) ## now assign yellow to values greater than the first quartile: color.quartiles.lead[meuse$lead > quantile(meuse$lead, probs=c(0.25))] <- "yellow" ## then color orange the values greater than the median: color.quartiles.lead[meuse$lead > quantile(meuse$lead, probs=c(0.5))] <- "orange" ## Finally, assign red color to the values greater than the third quartile: ## TASK 31: use the color vector you just created, "color.quartiles.lead", ## to map lead concentration. The map should show the outline of the Meuse River ## in blue. Be sure to include the names and units of the horizontal and vertical axes. ## TASK 32: add the legend (using function "legend") and the title ## (using function *mtext*) to the map you just created. ## TASK 33: create a color vector to make a map showing quartiles of cadmium ## concentration in the Meuse River valley. The categories should be represented, ## from smallest to largest, by colors: green, yellow, orange and red. Use function ## *quantile* to create this vector. The vector should be named "color.quartiles.cadmium" ## TASK 34: use the color vector you just created, "color.quartiles.cadmium", ## to map the cadmium concentration. The map should show the outline of the Meuse River ## in blue. Be sure to include the names and units of the horizontal and vertical axes. ## TASK 35: add the legend (using function "legend") and the title ## (using function *mtext*) to the map you just created. ## TASK 36: plot the relationship between the concentration of lead and cadmium ## in the soil of the sampled sites in the Meuse River valley. ## TASK 37: use function *cor.test* to estimate the correlation between ## the concentration of lead and cadmium in the Meuse River valley. Measure the ## the correlation based on the Pearson coefficient. ## TASK 38: use function *pairs* to graph the relationship between the ## concentration of all heavy metal pairs included in the "meuse" database: ## cadmium, copper, lead, and zinc. Visit the help page for this function ## and to become familiar with its arguments. ## TASK 39: use function *cor* to estimate the correlation matrix ## between the concentration of all heavy metal pairs included in ## the "meuse" database: cadmium, copper, lead, and zinc. Avoid writing ## function *cor* multiple times (i.e., once for each pair of heavy metals) ## The code in your response must use the *cor* function once. ## TASK 40: plot the relationship between the distance to the Meuse River (horizontal axis) ## and the concentration of lead in the soil of the Meuse River valley (vertical axis). ## Visit the help page of the "meuse" database to see which column(s) ## have information on the distance to the river. Use the arguments ## "xlab" and "ylab" to indicate the variables and units ## represented by the horizontal and vertical axes, respectively. ## TASK 41: plot the relationship between the distance to the Meuse River (horizontal axis) ## and the concentration of cadmium in the soil of the Meuse River valley (vertical axis). ## Use the arguments "xlab" and "ylab" to specify the names and units of ## the variables represented by the horizontal and vertical axes, respectively. ## TASK 42: plot the relationship between the logarithm of the distance to the Meuse River ## (horizontal axis) and the lead concentration in the Meuse River valley (vertical axis). ## Use the arguments "xlab" and "ylab" to indicate the names and units of the variables ## represented by the horizontal and vertical axes, respectively. ## TASK 43: use function *lm* to fit a linear regression model in ## which the response variable (dependent variable) is the concentration ## of lead in the soil of the Meuse River valley and the independent variable is ## the logarithm of the distance to the river. Assign the result (using ## the operator "<-") to an object named "model.lead", and use the function ## *summary* to examine the magnitude, precision (the standard error), and ## statistical significance of the regression coefficients. ## TASK 44: use function *abline* to add the regression line ## of "model.lead" to the figure you created in TASK 42. ## TASK 45: plot the relationship between the logarithm of the distance to ## the Meuse River (horizontal axis) and the cadmium concentration in the ## Meuse River valley (vertical axis). Use the arguments "xlab" and "ylab" ## to indicate the names and units of the variables represented by the ## horizontal and vertical axes, respectively. ## TASK 46: use function *lm* to fit a linear regression model in ## which the response variable (dependent variable) is the concentration of ## cadmium in the Meuse River valley and the independent variable is the ## logarithm of the distance to the river. Assign the result (using the ## operator "<-") to an object named "model.cadmium", and use the function ## *summary* to examine the magnitude, precision (the standard error), and ## statistical significance of the regression coefficients. ## TASK 47: use function *abline* to add the regression line of the ## "model.cadmium" to the figure I created in TASK 45. # Spatial correlation is a measure of heterogeneity in spatial arrangement # (or spatially explicit heterogeneity, see text under TASK 6). In # particular, spatial correlation measures the relationship between observed values # in nearby or neighboring sites (Legendre and Legendre 2012, Numerical Ecology. # Elsevier). One measure of spatial correlation is Moran's I. Read the # equation 7.3 on page 232 of Bocard et al. (2011, Numerical Ecology. # Springuer), available at: # http://rbasicsworkshop.weebly.com/books.html # Be sure to read the text at the bottom of page 232 and reflect on the # meaning of equation 7.3. The code below calculates the spatial correlation # in the concentration of lead in the soil of the Meuse River valley. To run # this code you need to install and load the "spdep" package in your R session: install.packages("spdep") library(spdep) # To get an idea of the frequency of geographic distances between # the sampling points at which we want to calculate Moran's I, you # may use the *dist* and *hist* functions: hist(dist(meuse[,1:2]), breaks=100, main="", xlab="Distance between sampling points (m)", ylab="Frequency (pairs of sampling points)") # this histogram suggests that you can calculate Moran's I by defining neighboring # sampling sites as those separated by distances less than or equal to 500 meters. # Visit the help page for the *dnearneigh* function of the package "spdep", which you # will use to define neighboring sampling sites. Notice that the # first argument "x" must be an array (or an object of class "SpatialPoints", # specialized for spatial analysis) with the coordinates of sampling sites. # Note that indexing the coordinates of the sampling sites, # produces a data frame and not an array: class(meuse[,1:2]) # therefore it is necessary to use the function *as.matrix* to define the # value of the argument "x" of the function *dnearneigh*: neighbor.sites <- dnearneigh(x=as.matrix(meuse[,1:2]), d1=0, d2=500) # The resulting object, neighbor.sites, is of class "nb" which is a class of # objects specialized in representing relations of spatial proximity (neighborhood): neighbor.sites[[1]] # to reinforce this idea, study the following code, which displays a # map of all sampled sites in the Meuse River valley, highlighting # in red a focal site and in gray the neighboring sites: plot(meuse$x, meuse$y, col="gray90", pch=21, xlab = "Easting (m) in Rijksdriehoek", ylab = "Northing (m) in Rijksdriehoek") polygon(meuse.riv, col="blue", border="blue") points(meuse$x[1], meuse$y[1], col="red", pch=19) points(meuse$x[neighbor.sites[[1]]], meuse$y[neighbor.sites[[1]]], col="gray70", pch=19) ## TASK 48: modify the code above to show neighboring sites to the ## site that is in row 50 of the "meuse" data frame. To start, ## the following code identifies the neighbors of this site: neighbor.sites[[50]] ## now modify the code immediately before this task to ## make a map that highlights the focal site in red (which is in the ## row 50 of the "meuse" data frame) and its neighbors grayed out. # Next use the "neighbor.sites" object to estimate the spatial correlation, # measured with Moran's I. To do this, the following code uses the # function *sp.correlogram*. The "order" argument determines the distance to # which the spatial correlation is computed. If "order=1", the correlation is # computed only for immediate neighbors. If "order=2", the spatial correlation # is calculated for immediate neighbors and, also, for sites that are more distant, # twice the distance from immediate neighbors. And so on. Visit the help page for # function *sp.correlogram* to learn about the arguments for this function. Then # study this code: correlogram.lead <- sp.correlogram(neighbours=neighbor.sites, var=meuse$lead, order = 5, method = "I") # The resulting object, "correlogram.lead", is of class "spcor". This is a # specialized class of objects for spatial analysis. Make sure the object # "correlogram.lead" belongs to class "spcor" and examine its structure using # functions *str* and *attributes*: class(correlogram.lead) str(correlogram.lead) attributes(correlogram.lead) # the result of function *str* shows that the class "spcor" is a # particular type of list. The structure of this list is described in the # "Value" section of the help page for function "sp.correlogram". The # following code prints to the console the calculated values off Moran's I # for five categories of neighbors ("estimate"), the expected value under the # null hypothesis of no spatial correlation ("expectation") and its variance # ("variance"), the degree to which the observed value deviates from the null # hypothesis ("standard deviation") and the statistical significance ("Pr(I) two # sided"): print(correlogram.lead) # the following code plots these results for the five categories of # neighbors: plot(correlogram.lead) ## TASK 49: spatial correlation among residuals of regression analysis, ## such as those contained in "model.lead" and "model.cadmium", is an important ## aspect of spatial data analysis (see page 9 in Legendre & Legendre 1998, ## available at: http://rbasicsworkshop.weebly.com/books.html). Calculate and graph ## the spatial correlation among residuals of "model.lead", using Moran's I. ## To begin with, use function *residuals* to obtain the residals of "model.lead": residuals(model.lead) ## TASK 50: make a map showing the residuals of "model.lead", using four colors ## to show four partitions of the residuals according to quartiles. Be sure to ## include the names and units of the horizontal and vertical axes, the title of ## the graph (you could use the function *mtext*) and a legend. The code for TASKS ## 30-35 may be helpful here. ## TASK 51: calculate and plot the spatial correlation between the residuals of the ## "model.cadmium", using Moran's I. ## TASK 52: create a map showing the residuals of "model.cadmium", ## using four colors to show four partitions of the residuals ## according to quartiles. The map should show the outline of the Meuse River. ## Be sure to include the names and units of the horizontal and vertical axes, ## the title of the graph (you could use function *mtext*) and a legend.