A Case Study in Reproducible Model Building

2016-08-04

center

The U.S. Geological Survey (USGS) recently published a report describing a groundwater-flow model of the Wood River Valley (WRV) aquifer system. What makes this report unique (at least in my opinion) was the authors’ desire to make their work as reproducible as possible under budgetary constraints. The collection of raw data, source code, and processing instructions used to build and analyze the model was placed in an non-general-use R package named wrv. The package repository can be found on GitHub. Commands for installing the package are as follows:

repos <- c("http://owi.usgs.gov/R", getOption("repos"))
install.packages("wrv", repos = repos, dependencies = TRUE, type = "both")  # about 100 MB, so be patient

While many of the functions are intended for non-general use, there are a few functions that the larger R community might find of interest. For example, the PlotMap, PlotGraph, and PlotCrossSection functions have been designed for general use. Report documentation was included in the wrv package as vignettes; these files are also available from the USGS Publications Warehouse. For a general overview of the project, I’ll recommend my useR! 2016 talk:

Any comments or suggestions regarding our approach to reproducible model building can be left below. Please realize that your opinions go a long way in determining whether this type of approach will be used in future projects.

Blog with Knitr and Jekyll

2012-07-03

The knitr package provides an easy way to embed R code in a Jekyll-Bootstrap blog post. The only required input is an R Markdown source file. The name of the source file used to generate this post is 2012-07-03-knitr-jekyll.Rmd, available here. Steps taken to build this post are as follows:

Step 1

Create a Jekyll-Boostrap blog if you don’t already have one. A brief tutorial on building this blog is available here.

Step 2

Open the R Console and process the source file:

KnitPost <- function(input, base.url = "/") {
  require(knitr)
  opts_knit$set(base.url = base.url)
  fig.path <- paste0("figs/", sub(".Rmd$", "", basename(input)), "/")
  opts_chunk$set(fig.path = fig.path)
  opts_chunk$set(fig.cap = "center")
  render_jekyll()
  knit(input, envir = parent.frame())
}
KnitPost("2012-07-03-knitr-jekyll.Rmd")

Step 3

Move the resulting image folder 2012-07-03-knitr-jekyll and Markdown file 2012-07-03-knitr-jekyll.md to the local jfisher-usgs.github.com git repository. The KnitPost function assumes that the image folder will be placed in a figs folder located at the root of the repository.

Step 4

Add the following CSS code to the /assets/themes/twitter-2.0/css/bootstrap.min.css file to center images:

[alt=center] {
  display: block;
  margin: auto;
}

That’s it.


Here are a few examples of embedding R code:

summary(cars)
##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00
par(mar = c(4, 4, 0.1, 0.1), omi = c(0, 0, 0, 0))
plot(cars)

center

Figure 1: Caption
par(mar = c(2.5, 2.5, 0.5, 0.1), omi = c(0, 0, 0, 0))
filled.contour(volcano)

center

Figure 2: Caption

And don’t forget your session information for proper reproducible research.

sessionInfo()
## R version 3.3.1 (2016-06-21)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 10586)
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] knitr_1.13.1
## 
## loaded via a namespace (and not attached):
## [1] magrittr_1.5  formatR_1.4   tools_3.3.1   stringi_1.1.1 stringr_1.0.0
## [6] evaluate_0.9

Grid2Polygons

2012-06-25

I’d like to introduce you to the Grid2Polygons function; an R function for converting sp spatial objects from class SpatialGridDataFrame to SpatialPolygonsDataFrame. The significance of this conversion is that spatial polygons can be transformed to a different projection or datum with the spTransform function in package rgdal. Postscript files created with spatial polygons are reduced in size and result in a much “cleaner” version of your image. Disadvantages of the conversion include long computational times and irreversible leveling, partitioning the range of z values. A general explanation of the algorithm is provided here; inspiration provided here.

To access the function install the Grid2Polygons package (source):

install.packages("Grid2Polygons")
library(Grid2Polygons)

See help documentation for argument descriptions:

?Grid2Polygons

The following examples highlight the functions usefulness:

Example 1

Construct a simple spatial grid data frame.

m <- 5
n <- 6
z <- c(1.1,  1.5,  4.2,  4.1,  4.3,  4.7,
       1.2,  1.4,  4.8,  4.8,   NA,  4.1,
       1.7,  4.2,  1.4,  4.8,  4.0,  4.4,
       1.1,  1.3,  1.2,  4.8,  1.6,   NA,
       3.3,  2.9,   NA,  4.1,  1.0,  4.0)
x <- rep(0:6, m + 1)
y <- rep(0:5, each = n + 1)
xc <- c(rep(seq(0.5, 5.5, by = 1), m))
yc <- rep(rev(seq(0.5, 4.5, by = 1)), each = n)
grd <- data.frame(z = z, xc = xc, yc = yc)
coordinates(grd) <- ~ xc + yc
gridded(grd) <- TRUE
grd <- as(grd, "SpatialGridDataFrame")

Plot the grid using a gray scale to indicate values of z (fig. 1).

image(grd, col = gray.colors(30), axes = TRUE)
grid(col = "black", lty = 1)
points(x = x, y = y, pch = 16)
text(cbind(xc, yc), labels = z)
text(cbind(x = x + 0.1, y = rev(y + 0.1)), labels = 1:42, cex = 0.6)

center

Figure 1: Simple spatial grid data frame.

Convert the grid to spatial polygons and overlay in plot (fig. 2). Leveling is specified with cut locations at 1, 2, 3, 4, and 5, and z-values set equal to the midpoint between breakpoints. A “winding rule” is used to determine if a polygon ring is filled (island) or is a hole in another polygon.

plys <- Grid2Polygons(grd, level = TRUE, at = 1:5)
cols <- rainbow(4, alpha = 0.3)
plot(plys, col = cols, add = TRUE)
x <- rep(0:6, m + 1)
y <- rep(0:5, each = n + 1)
legend("top", legend = plys[[1]], fill = cols, bty = "n", xpd = TRUE, inset = c(0, -0.1), ncol = 4)

center

Figure 2: Simple gridded data represented with spatial polygons.

Example 2

Apply the conversion function to the meuse data set, included in the sp package. The effect of leveling is shown in figure 3.

data(meuse.grid)
coordinates(meuse.grid) <- ~ x + y
gridded(meuse.grid) <- TRUE
meuse.grid <- as(meuse.grid, "SpatialGridDataFrame")
meuse.plys <- Grid2Polygons(meuse.grid, "dist", level = FALSE)
op <- par(mfrow = c(1, 2), oma = c(0, 0, 0, 0), mar = c(0, 0, 0, 0))
z <- meuse.plys[[1]]
col.idxs <- findInterval(z, sort(unique(na.omit(z))))
cols <- heat.colors(max(col.idxs))[col.idxs]
plot(meuse.plys, col = cols)
title("level = FALSE", line = -7)
meuse.plys.lev <- Grid2Polygons(meuse.grid, "dist", level = TRUE)
z <- meuse.plys.lev[[1]]
col.idxs <- findInterval(z, sort(unique(na.omit(z))))
cols <- heat.colors(max(col.idxs))[col.idxs]
plot(meuse.plys.lev, col = cols)
title("level = TRUE", line = -7)
par(op)

center

Figure 3: Distance from river represented with spatial polygons.