Outlier Reporting and Benefits from Unit Testing in R

A recent AlignAlytics analysis project, reliant on Big Data processing and storage, required complex outlier reporting using the R statistical-programming language. This open-source software, combined with in-house statistical skills, allowed the team to quickly produce reports that are now the foundation of an ongoing strategic analysis programme.

Unit testing is one part of this story and we hope Peter Rosenmai can continue to share more with us.

Getting started with unit testing in R

Unit testing is an essential means of creating robust code. The basic idea is simple: you write tests that state what your functions are required to do; whenever you later change your code, you rerun the tests to confirm that all of your functions still work.
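
As a minimal sketch of that idea, here is a test written with nothing but base R's stopifnot(), before we bring in a testing package. The function countWords() and its test are hypothetical and exist only for illustration:

```r
# Hypothetical function under test: count the words in a string
countWords <- function(str){
   words <- strsplit(str, " +")[[1]]
   length(words[words != ""])
}

# Hypothetical test: states what countWords() is required to do
test.countWords <- function(){
   stopifnot(countWords("one two three") == 3)
   stopifnot(countWords("  padded  ") == 1)
   stopifnot(countWords("") == 0)
}

# Rerun this after every change to countWords()
test.countWords()
```

If a later change to countWords() breaks any of those requirements, the call to test.countWords() stops with an error instead of letting the problem slip through.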

Such future-proofing is obviously useful, but unit testing brings other benefits. It forces you to break your code down into discrete, testable units. And the tests provide excellent examples of how your functions should be called. That can be really useful, especially when code commenting is shoddy or out of date.

Here’s an example of unit testing in the R statistical-programming language using the RUnit package. We have a file main.r in our current working directory. That file contains main(), our top-level function:

# main.r

# Load in the unit testing package
library(RUnit)

# Load in source files
source("string-utils.r")

# Function to run all unit tests (functions named test.*) in all
# R files in the current working directory
runUnitTests <- function(){
   cat("Running all unit tests (functions whose names begin with 'test.') ",
       "in all R files in the current working directory.\n", sep="")

   tests <- defineTestSuite("Tests", dirs=getwd(),
                            testFileRegexp = "^.*\\.[rR]$",
                            testFuncRegexp = "^test\\..+")

   test.results <- runTestSuite(tests)

   cat(paste(test.results$Tests$nTestFunc,    " test(s) run.\n",
             test.results$Tests$nDeactivated, " test(s) deactivated.\n",
             test.results$Tests$nFail,        " test(s) failed.\n",
             test.results$Tests$nErr,         " errors reported.\n",
             sep=""))

   if((test.results$Tests$nFail > 0) || (test.results$Tests$nErr > 0)){
      stop("Execution halted following unit testing. ",
           "Fix the above problem(s)!")
   }
}

main <- function(run.unit.tests=TRUE){
   if (run.unit.tests) runUnitTests()

   # Your code here...
}

The above code loads in from our current working directory the file string-utils.r:

# string-utils.r

# Load the unit testing package
library(RUnit)

# Function to trim leading and trailing whitespace from a string
trim <- function(str){
   if (!is.character(str)){
      stop(paste("trim passed a non string:", str))
   }

   return(gsub("^\\s+|\\s+$", "", str))
}

# Unit test for trim()
test.trim <- function(){
   checkTrue(trim("  abc ") == "abc")
   checkTrue(trim("a b ")   == "a b")
   checkTrue(trim(" a b")   == "a b")
   checkTrue(trim("")       == "")
   checkException(trim(3), silent=TRUE)
}

We run our top-level function using:

rm(list=ls(all=TRUE)); source("main.r",echo=FALSE); main()

That line removes all variables from the workspace, sources main.r (which defines the functions in the above blocks of code) and calls main(). The first thing main() does is call runUnitTests(), which runs every function whose name starts with "test." in every R file in the current working directory. Those are our unit tests.
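
To see how that selection works, the sketch below applies patterns of the kind passed to defineTestSuite(), written with the dots escaped, to some file and function names using base R's grepl(). The names are hypothetical, and this only illustrates the pattern matching, not how RUnit discovers tests internally:

```r
# Patterns of the kind passed to defineTestSuite(), with literal dots escaped
file.pattern <- "^.*\\.[rR]$"
func.pattern <- "^test\\..+"

# Hypothetical file names, for illustration only
files <- c("main.r", "string-utils.R", "notes.txt", "data.csv")
matched.files <- files[grepl(file.pattern, files)]

# Hypothetical function names, for illustration only
funcs <- c("test.trim", "trim", "testtrim", "test.")
matched.funcs <- funcs[grepl(func.pattern, funcs)]

print(matched.files)
print(matched.funcs)
```

Only main.r and string-utils.R are selected as files, and only test.trim is selected as a test function. The name testtrim is rejected because it lacks the literal dot, and "test." is rejected because nothing follows the dot.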

For example, one of those unit test functions is test.trim(), the function shown above that checks that trim() is working as it should. Note how test.trim() not only checks expected return values but makes sure that trim() throws exceptions when it should. And what does trim() do? The examples in the test code should make it clear—which is why I like to keep the unit tests together with the functions that they test.
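
The idea behind an exception check can be sketched in base R with tryCatch(). The helper throwsError() below is hypothetical and is not RUnit's implementation of checkException(); a version of trim() is repeated so that the sketch runs on its own:

```r
# Hypothetical helper: TRUE if evaluating expr signals an error, else FALSE
throwsError <- function(expr){
   tryCatch({
      expr   # forces evaluation of the lazily evaluated argument
      FALSE
   }, error = function(e) TRUE)
}

# A version of trim(), repeated here so that the sketch is self-contained
trim <- function(str){
   if (!is.character(str)){
      stop(paste("trim passed a non string:", str))
   }
   return(gsub("^\\s+|\\s+$", "", str))
}

# trim() should refuse a number but accept a string
stopifnot(throwsError(trim(3)))
stopifnot(!throwsError(trim("  abc ")))
```

Testing the failure path matters as much as testing return values: without the first check, a later change that silently accepted numbers would go unnoticed.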

The above is the briefest of introductions to a huge topic. I could say a lot more about, for instance, test-driven development, refactoring and code coverage. But my aim here is not that ambitious. If you’re an analyst or a statistician, chances are you haven’t previously heard of unit testing. If that’s the case, I merely wish to suggest that you give the above a try the next time you find yourself coding in R. Unit testing really is worth the effort.

Author: Peter Rosenmai

Posted on January 19, 2014 by Danielle Mosimann