Learning Objectives

When I am embarking upon a particular analysis, I first outline the different functions I will need, their input and outputs. I then write the functions in the order in which they are needed, testing/debugging the functions along the way.

Basic debugging strategies

(modified by a tutorial from Chris Paciorek)

Some common causes of bugs

Some of these are R-specific, while others are common to a variety of languages.

More insidious bugs - Comparing real numbers exactly using == is dangerous because numbers on a computer are only represented to limited numerical precision. For example,

    1/3 == 4*(4/12-3/12)
## [1] FALSE

General tips to avoid bugs

R options for debigging

Interactive debugging via the browser

The core strategy for interactive debugging is to use the browser function, which pauses the current execution, and provides an interpreter, allowing you to view the current state of R. You can invoke browser in four ways

Once in the browser, you can execute any R commands you want. In particular, using ls to look at the objects residing in the current function environment, looking at the values of objects, and examining the classes of objects is often helpful.

Using debug to step through code

To step through a function, use debug(nameOfFunction). Then run your code. When the function is executed, R will pause execution just before the first line of the function. You are now using the browser and can examine the state of R and execute R statements.

Once in the browser context, you can use ‘n’ or return to step to the next line, ‘c’ to execute the entire current function or current loop, and `’ to stop debugging. We’ll see this in the screencast demo.

To unflag the function so that calling it doesn’t invoke debug, use undebug(nameOfFunction). In addition to working with functions you write you can use debug with standard R functions and functions from packages. For example you could do debug(glm).

Tracing errors in the call stack

traceback and recover allow you to see the call stack at the time of the error - i.e., they will show you all the functions that have been called, in the order called. This helps pinpoint where in a series of function calls the error may be occurring.

If you’ve run the code and gotten an error, you can invoke traceback after things have gone awry. R will show you the call stack, which can help pinpoint where an error is occurring.

More helpful is to be able to browse within the call stack. To do this invoke options(error = recover) (potentially in your .Rprofile if you do a lot of programming). Then when an error occurs, recover gets called, usually from the function in which the error occurred. The call to recover allows you to navigate the stack of active function calls at the time of the error and browse within the desired call. You just enter the number of the call you’d like to enter (or 0 to exit). You can then look around in the frame of a given function, entering an empty line when you want to return to the list of calls again.

You can also combine this with options(warn = 2), which turns warnings into errors to get to the point where a warning was issued.

Using trace to temporarily insert code

trace lets you temporarily insert code into a function (including standard R functions and functions in packages!) that can then be easily removed. You can use trace in a variety of ways.

The most flexible way to use trace is to use the argument edit = TRUE and then insert whatever code you want wherever you want in the function given as the first argument to trace. If I want to ensure I use a particular editor, such as emacs, I can use the argument edit = “emacs”. A standard approach would be to add a line with browser() at some point in the function to be able to step through the code from that point.

You can also use trace without directly editing the function. Here are a couple examples:

You call untrace, e.g., untrace(lm), to remove the temporarily inserted code; otherwise it’s removed when the session ends.

To figure out why warnings are being issued, you can do trace(warning, recover) which will insert a call to recover whenever warning is called.

Of course you can manually change the code in a function without using trace, but it’s very easy to forget to change things back (and a pain to remember exactly what you changed) and hard to do this with functions in packages, so trace is a nice way to do things.

Challenge - Debugging

Use browser() to find the bug in this function to plot the number of bee operations in New Zealand

 hb <- read.csv('data/beeOperations.csv')
 hb <- hb[order(hb$year),]

 plotVar <- function(hb, var, ylabs,
                    types=c("hobby", "semicom", "com"),
                    div=10^3){
    ## function to plot honey bee data by region
    ## takes hb, a data.frame with the varaibles to be plotted
    ## var, the name of the column of interest
    ## types, the different types of honey bee keepers
    ## div, a scalar to divide the values on the yaxis by to avoid
    ## ugly numbers
    layout(matrix(1:(length(unique(hb$region))+1), ncol=2))
    par(oma=c(3,3,2,1), mar=c(2,2,3,1), mgp=c(2,1,0),
        cex.axis=1.5)
    ntypes <- length(types)
    for(i in unique(hb$region)){
       this.region <- hb[hb$region == i,]
        plot(x=this.region$year[this.region$type == types[1]],
             y=this.region[,var][this.region$type == types[1]],
             type="l",
             ylim=c(0, max(hb[,var], na.rm=TRUE)),
             xlab="Year",
             ylab=ylabs,
             yaxt="n",
             main=i)
        ltys <- c("dashed", "dotted")
        for(j in 2:length(types)){
        points(x=this.region$year[this.region$type == types[j]],
               y=this.region[,var][this.region$type == types[j]],
               type="l", lty=ltys[j])
        }
        if(i %in% unique(hb$region)[1:4]){
            axis(2, pretty(c(0, max(hb[,var], na.rm=TRUE))),
                 labels=pretty(c(0, max(hb[,var], na.rm=TRUE)))/div)
        }
    }
}


plotVar(hb, "hives", "Hives")
plotVar(hb, "apiaries", "Apiaries")
plotVar(hb, "beekeepers", "Beekeepers")
plotVar(hb, "hives_beekeeper", "Hives per beekeeper")
plotVar(hb, "apiary_beekeeper", "Apiaries per beekeeper", div=1)

Practice defensive programming

When writing functions, and software more generally, you’ll want to warn the user or stop execution when there is an error and exit gracefully, giving the user some idea of what happened. Here are some things to consider:

The warning and stop functions allow you to do stop execution and issue warnings or errors in the same way that base R code does; in general they would be called based on an if statement. More succinctly, to stop code if a condition is not satisfied, you can use stopifnot. This allow you to catch errors that can be anticipated.

Here’s an example of building a robust square root function using stop and warning. Note you could use stopifnot(is.numeric(x)) in place of one of the checks here.

mysqrt <- function(x) {
    if (is.list(x)) {
        warning("x is a list; converting to a vector")
        x <- unlist(x)
    }
    if (!is.numeric(x)) {
        stop("What is the square root of 'bob'?")
        ## alternatively: stopifnot(is.numeric(x))
    } else {
        if (any(x < 0)) {
            warning("mysqrt: found negative values; proceeding anyway")
            x[x >= 0] <- (x[x >= 0])^(1/2)
            x[x < 0] <- NaN
            return(x)
        } else return(x^(1/2))
    }
}
mysqrt(c(1, 2, 3))
mysqrt(c(5, -7))
mysqrt(c("asdf", "sdf"))
mysqrt(list(5, 3, "ab"))

Catch run-time errors with try statements

Also, sometimes a function you call will fail, but you want to continue execution. For example, suppose you are doing a stratified analysis in which you take subsets of your data based on some categorical variable and fit a statistical model for each value of the categorical variable. If some of the subsets have no or very few observations, the statistical model fitting might fail. To do this, you might be using a for loop or lapply. You want your code to continue and fit the model for the rest of the cases even if one (or more) of the cases cannot be fit. You can wrap the function call that may fail within the try function (or tryCatch) and then your code won’t stop, even when an error occurs. Here’s a toy example.

set.seed(0)
nCats <- 30
n <- 100
y <- rnorm(n)
x <- rnorm(n)
cats <- sample(1:nCats, n, replace = TRUE)
data <- data.frame(y, x, cats)

params <- matrix(NA, nr = nCats, nc = 2)
for(i in 1:nCats){
    sub <- data[data$cats == i, ]
    fit <- try(
        lm(y ~ x, data = sub) )
    if(!is(fit, "try-error"))
        params[i, ] = fit$coef
}
params
##              [,1]       [,2]
##  [1,]  0.32220511  0.4166179
##  [2,]  0.47524008 -0.6892083
##  [3,] -0.96340970  0.8029426
##  [4,]  0.91113661  0.4604388
##  [5,]  0.50491729  0.2285024
##  [6,]  9.48183414 13.2340397
##  [7,]          NA         NA
##  [8,] -0.14232948  0.1600386
##  [9,] -0.08800939  1.1267142
## [10,]  0.43568330         NA
## [11,]  0.16216315  0.0224648
## [12,] -0.08576737 -0.1363258
## [13,] -0.43036777  1.0122300
## [14,]  0.39878728  0.1609506
## [15,] -0.57225776 -0.1405326
## [16,]  0.76815674  1.0656086
## [17,]  0.72675075         NA
## [18,] -0.38587004  0.3660344
## [19,] -1.32035875 -0.1513769
## [20,] -6.29053429 -8.1642739
## [21,] -1.01985678  0.5352108
## [22,] -0.27658878  0.2977353
## [23,]  0.27678418  0.2170440
## [24,]  0.11880266  0.2487930
## [25,] -0.35410886 -1.0930109
## [26,]  0.21821623 -0.5951547
## [27,]  0.33884808 -0.1287089
## [28,]  0.61548557  1.5411494
## [29,] -0.21555810  0.1195182
## [30,] -2.24250180 -1.4846365

The seventh stratum had no observations, so that call to lm failed, but the loop continued because we ‘caught’ the error with try. In this example, we could have checked the sample size for the subset before doing the regression, but in other contexts, we may not have an easy way to check in advance whether the function call will fail.