Jon is an experienced, results-driven engineer who writes great code. He is confident and experienced at managing the unexpected.
Updated Sep 15, 2022
R is not a new language. It is an implementation of an older language called S that was initially developed between 1975 and 1976 by John Chambers. R was first conceived in 1992 and initially released in 1995. R has been taught to data analysis students in different fields for at least a decade. Because they learn about it in school, many data analysts who are not developers believe it is their only resource.
R has a number of advantages for this kind of work. It’s focused on making statistical analysis and visualization easy for humans (resulting in performance tradeoffs, which we’ll discuss). It’s “fast enough” for a large swath of typical analytics tasks. The code is intuitive from the perspective of the data analyst. And, it’s highly flexible–you can change a lot of aspects of a program as you go, exploring data and iterating on a problem.
Challenges of working in R
Although data scientists love it, as software engineers, we wouldn’t necessarily choose to work in R. There are mature statistical computing technologies, such as Python or Julia, that we can leverage to accomplish many of the same goals without R’s performance and DevOps disadvantages. However, R is very common, and if you’re working on analytics projects (especially in finance, government, or academia), you’re likely to run into it eventually.
R was not designed as a general-purpose programming language. Unfortunately, it often gets used in contexts where it’s not optimal simply because it’s what people know. By the time we enter the picture, we’re stuck with it and have to figure out how to make it work as well as possible.
We recently engaged in a project where the client had an existing codebase written in R. Jumping the “R ship” didn’t make sense. We needed to work within the language and find the best possible solutions. Here, we share what we learned so you can avoid some challenges we had to overcome the hard way.
Lack of documentation
The initial issue was documentation. While R has a large community and many packages that provide additional functionality, the documentation can be a bit spotty. It is challenging to find what a function returns or what arguments a specific function accepts. Since the language has been around for a long time, there are first-page Google results for outdated functions that do not indicate they are deprecated or what new function to use.
While there are many helpful blog posts on R, most of them are written by non-developers who are excited by the idea of a “function”. It’s great that these resources exist, but it can take a bit of digging to find deeply useful information.
Slow and manual package management
Package management in R leaves a lot to be desired. Packrat provides general package management functionality. Unfortunately, it tends to be slow as molasses. Each package is downloaded as source and then built. The source is stored locally as tar files. Whenever any one package is updated, all the packages are rebuilt–a very time-consuming process.
The Renv package is superior in many ways to Packrat and has gained significant popularity. Renv adds a lock file (renv.lock) that records and pins the version of each dependency. It also includes a script that bootstraps Renv itself, so that it is available at the beginning of the R session. It is still a far cry from the seamless dependency management found in tools such as npm and yarn (for JavaScript).
Performance speed bumps
R is single-threaded. That’s right: with the vanilla distribution of R, you can run your app on a 64-core megaprocessor and it will poke along using only one of those cores. Additionally, R is an interpreted language, making processing of structures such as for-loops very slow. There are workarounds, but it’s still not going to match the performance of a compiled language.
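As a rough sketch of the workarounds mentioned above, the example below computes the same sum of squares three ways: with an interpreted loop, with vectorized functions, and with the parallel package that ships with R. The input size and the number of chunks are arbitrary choices for illustration, and because mclapply() relies on forking, the sketch assumes a fallback to a single core on Windows.

```r
# Sum of squares computed three ways; the input size is an arbitrary choice.
x <- as.numeric(1:100000)

# 1. Interpreted loop: every iteration is evaluated by the R interpreter.
loop_total <- 0
for (value in x) {
  loop_total <- loop_total + value^2
}

# 2. Vectorized: the looping happens in compiled code inside `^` and sum().
vector_total <- sum(x^2)

# 3. Parallel: split the work into chunks and process them in forked workers.
#    Forking is not available on Windows, so fall back to one core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
chunks <- split(x, rep(1:4, length.out = length(x)))
chunk_totals <- parallel::mclapply(
  chunks,
  function(chunk) sum(chunk^2),
  mc.cores = cores
)
parallel_total <- sum(unlist(chunk_totals))
```

Vectorizing is usually the first thing to try, since it needs no extra processes. Parallel workers add startup and data-copying overhead, so they tend to pay off only for larger jobs.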
In some ways, the performance limitations are a natural result of R’s key benefit, which is that it was written to help people do statistical analysis, not to optimize computing performance. A compiled C# program can do things very fast, but very few humans could look at the compiled output and understand what’s going on. R, by contrast, is highly legible and flexible. You can change functions, methods, fields, and objects whenever you want without breaking the application. That makes it easy to iterate and solve problems on the fly, as many humans like to do.
Unusual Syntax
The syntax in R might feel a bit strange. For example, in many common programming languages, the properties of objects are referenced using a dot (.), as in
ThisObject.property01
In R, the dot (.) is (mostly) just another character that can appear in a name. It is typical, and often preferred, to create variable names using a dot (.) as a word separator instead of camel-case or snake-case, as in
this.great.variable
To reference a property, R instead uses a dollar sign ($), as in
ThisObject$property02
Another syntax quirk that may be unfamiliar to modern developers is that R has several assignment operators (<-, ->, =, <<-, ->>). The preferred assignment operator in R is <-.
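The short sketch below puts these quirks side by side. All names in it are made up for illustration.

```r
# Dots are ordinary characters in names, so this is a single variable.
this.great.variable <- 42

# `$` accesses a named element of a list (or a column of a data frame).
this.object <- list(property01 = "a", property02 = "b")
second.property <- this.object$property02

# `->` assigns from left to right.
10 -> right.assigned

# `<<-` assigns in an enclosing scope, here the counter defined
# outside the function.
call.count <- 0
record.call <- function() {
  call.count <<- call.count + 1
  invisible(call.count)
}
record.call()
record.call()
```

After the two calls, call.count holds 2, because each call updated the outer variable instead of creating a local one. That reach into an outer scope is one reason <<- is usually used sparingly.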
Unfriendly Namespacing
Namespacing is challenging, global collisions are common, and the way function arguments are passed is extremely “squishy”. That “squishy”-ness can lead to code that is challenging to read (see http://adv-r.had.co.nz/Functions.html#function-arguments for more information).
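To make both problems concrete, here is a small sketch using only base R. The describe function is hypothetical, and the second half deliberately masks sd to show how a collision happens.

```r
# Hypothetical function used only to illustrate argument matching.
describe <- function(value, verbose = FALSE) {
  if (verbose) paste("value is", value) else as.character(value)
}

# These three calls are equivalent: exact names, positional matching,
# and partial matching ("verb" is silently expanded to "verbose").
exact <- describe(value = 1, verbose = TRUE)
positional <- describe(1, TRUE)
partial <- describe(1, verb = TRUE)

# A definition in the current environment silently masks a package
# function of the same name...
sd <- function(x) "not a standard deviation"
masked <- sd(c(1, 2, 3))

# ...while the `::` operator always reaches the package's own version.
qualified <- stats::sd(c(1, 2, 3))

# Remove the masking definition again.
rm(sd)
```

Spelling argument names out in full and qualifying calls with the package name makes the code longer. It also removes most of the guesswork for the next reader.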
Solutions that worked for us
Third-Party Package Management
There are many third-party packages available to include in an R application. These can provide additional functionality, such as database connection or data frame tools. Ensuring that these dependencies are consistent through environments and on each user’s machine will help the application to perform predictably.
We leveraged Renv for package management. We built a process for adding new dependencies and for installing the dependencies pinned for a specific project on each machine that runs the app.
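A minimal sketch of what such a process could look like is shown below. The helper name add_dependency is hypothetical and is not the process we built. The calls it wraps (renv::install(), renv::snapshot(), and renv::restore()) are part of the package, though their prompts and defaults can differ between versions.

```r
# Hypothetical helper: the name is illustrative, not part of renv.
add_dependency <- function(package) {
  if (!requireNamespace("renv", quietly = TRUE)) {
    stop("renv is not installed; run install.packages(\"renv\") first")
  }
  renv::install(package)  # install into the project's private library
  renv::snapshot()        # record the installed versions in renv.lock
}

# On a fresh checkout, a teammate restores the versions pinned in renv.lock:
# renv::restore()
```

Committing renv.lock alongside the code is what lets every machine restore the same versions.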
While Renv worked great for local development, it’s not supported on ShinyApps.io when deploying via RSconnect. The key was to maintain the renv.lock file for local development while also listing the project’s dependencies in the Imports field of the DESCRIPTION file. We put a request in to RSconnect to support Renv, but have yet to receive a response.
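Keeping two dependency lists in sync by hand is easy to get wrong, so a small check can help. The sketch below is one possible approach, not something from our project. The DESCRIPTION contents and the packages_used vector are hypothetical and inlined so the example is self-contained.

```r
# Hypothetical DESCRIPTION contents, inlined so the sketch is self-contained.
description_text <- c(
  "Package: exampleapp",
  "Version: 0.1.0",
  "Imports:",
  "    dplyr (>= 1.0.0),",
  "    shiny"
)

# read.dcf() parses the "Field: value" format used by DESCRIPTION files.
connection <- textConnection(description_text)
fields <- read.dcf(connection, fields = "Imports")
close(connection)

# Split the comma-separated field and strip any version constraints.
entries <- trimws(strsplit(fields[1, "Imports"], ",")[[1]])
imports <- sub("[[:space:]]*\\(.*\\)$", "", entries)

# Hypothetical list of packages the app loads directly.
packages_used <- c("dplyr", "shiny", "DBI")
missing_from_description <- setdiff(packages_used, imports)
```

Here missing_from_description contains "DBI", the one package the app uses that the DESCRIPTION file does not declare.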
A note of caution: we have seen folks recommending that you copy code from packages and paste it into your project to avoid the need to install the package as a dependency. However, this cuts the copied code off from maintenance processes like bug fixes and updates. That’s a lot of copying and pasting, pretty much forever. We recommend that you always call the function from the package, even if it’s a little more work up front.
Options for Improving Performance
To enhance performance in processing data frames, we dug a bit deeper than R. The solution often involves finding the right tool for the job, and knowing what NOT to do.
For example, instead of applying a custom function to a data frame one row at a time, as in
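library(dplyr)  # provides the %>% pipe used below

# Replace a single missing value with "green"
na_to_green <- function(value) {
  if (!is.na(value)) {
    return(value)
  }
  return("green")
}

# rowwise() makes mutate() call na_to_green() once per row
result <- This.data.frame %>%
  dplyr::rowwise() %>%
  dplyr::mutate(column.name = na_to_green(column.name))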
it is more efficient to write something like
result <- This.data.frame %>%
  dplyr::mutate(column.name = ifelse(!is.na(column.name), column.name, "green"))
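The same contrast can be tried without dplyr. The sketch below uses a made-up character vector in place of a data frame column and checks that the element-by-element and vectorized versions agree. The vector contents and size are arbitrary.

```r
# Hypothetical column with missing values; the size is arbitrary.
column.name <- rep(c("red", NA, "blue"), times = 10000)

# Element by element: one R function call per value.
na_to_green <- function(value) if (!is.na(value)) value else "green"
one.at.a.time <- vapply(column.name, na_to_green, character(1),
                        USE.NAMES = FALSE)

# Vectorized: a single call over the whole column.
all.at.once <- ifelse(!is.na(column.name), column.name, "green")

# Also vectorized, and arguably the most direct way to fill the gaps.
filled <- replace(column.name, is.na(column.name), "green")

# All three approaches produce the same result.
same.result <- identical(one.at.a.time, all.at.once) &&
  identical(all.at.once, filled)
```

Wrapping each version in system.time() is a quick way to see the difference on your own data. The gap generally widens as the column grows.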
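Memory is another thing to watch. When a large global object is no longer needed, remove it explicitly
rm(the.global.object, pos = ".GlobalEnv")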
And then call garbage collection
gc()
Calling garbage collection after functions that create and / or process large pieces of data will help reduce memory usage.
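As a small illustration of that pattern, the sketch below creates a large throwaway vector, keeps only a summary of it, and then releases it. The object and its size are hypothetical, and it uses plain rm() so that it behaves the same when run outside the global environment.

```r
# Hypothetical large object: ten million doubles, roughly 80 MB.
the.global.object <- numeric(1e7)
size.in.bytes <- as.numeric(utils::object.size(the.global.object))

# Keep only the small result that is needed, then drop the large object.
the.summary <- length(the.global.object)
rm(the.global.object)

# gc() triggers a collection and returns a matrix describing memory usage.
memory.report <- gc()
```

R would eventually reclaim the memory on its own. Calling gc() explicitly just makes the release happen at a predictable point, which can matter in a long-running app.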
R is pretty easy to pick up when you are using it for its intended purpose–exploring data and solving problems in an iterative, intuitive fashion. However, if you have to use it in the context of a modern application, it’s going to throw some barriers in your way. It’s just not built to meet those expectations. Don’t get too frustrated! Put on your 1996 developer hat and proceed from there.
We were able to overcome the challenges of working in R and are definitely stronger for it. R is very manageable and you can employ good engineering practices to make the most of it. If your product uses R and you’re stuck on what to do next, give us a shout. We’re happy to figure out how we can help.