Package ecosystem and graphics are strengths; security and memory management are weaknesses
The R programming language is an important tool for development in the numerical analysis and machine learning spaces. With machines becoming more important as data generators, the popularity of the language can only be expected to grow. But R has both pros and cons that developers should know.
R first appeared in the 1990s as an implementation of the S statistical programming language, and interest in it is growing, as shown on language popularity indexes such as Tiobe, PyPL, and RedMonk. Notes Roger Peng, an 18-year R programming veteran who teaches R both at the university and on the Coursera online platform, “R is the most popular language used in the field of statistics.”
“I like [R] because it’s very easy to program in from a more computer science-y level,” says Peng. R has also gotten faster over time, and it serves as a glue language for piecing together different data sets, tools, or software packages, Peng says.
“R is the best way to create reproducible, high-quality analysis. It has all the flexibility and power I’m looking for when dealing with data,” says Matt Adams, a data scientist at Code School, which offers online programming education. “Most of the programs I write in R are actually just collections of scripts that are organized into projects.”
R’s strong package ecosystem and charting benefits
R’s advantages include its package ecosystem. “The vastness of package ecosystem is definitely one of R’s strongest qualities — if a statistical technique exists, odds are there’s already an R package out there for it,” says Adams.
“There’s a lot of functionality that’s built in that’s built for statisticians,” says Peng. R is extensible and offers rich functionality for developers to build their own tools and methods for analyzing data, he says. “As time has gone on, a lot more people have been attracted to it from other fields,” including biosciences and even humanities.
“People can extend it without having to ask permission.” Indeed, Peng recalls R’s usage terms as being a big help many years ago. “At the time when it first came out, the biggest advantage was that it was free software. The source code and everything about it was available to look at.”
R’s graphics and charting capabilities, Adams says, are “unmatched.” The dplyr and ggplot2 packages for data manipulation and plotting, respectively, “have literally improved my quality of life,” he says.
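As a rough illustration of the kind of workflow Adams describes, the sketch below summarizes a data set with dplyr and charts the result with ggplot2. The built-in mtcars data set and the columns chosen are assumptions made for this example, not something Adams supplied.

```r
library(dplyr)
library(ggplot2)

# mtcars ships with R; it stands in for real data here (an assumption).
# Group the cars by cylinder count and compute one summary row per group.
mpg_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), n_cars = n())

# Bar chart of mean fuel economy for each cylinder count.
p <- ggplot(mpg_by_cyl, aes(x = factor(cyl), y = mean_mpg)) +
  geom_col() +
  labs(x = "Cylinders", y = "Mean miles per gallon")

print(p)
```

The manipulation step and the plotting step stay separate: dplyr produces a small summary table, and ggplot2 maps its columns onto the chart.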
For machine learning, R’s advantages are linked mostly to R’s strong ties to academia, says Adams. “Any new research in the field probably has an accompanying R package to go with it from the get-go. So in this respect, R stays at the cutting edge,” he says. “The caret package also offers a pretty nifty way of doing machine learning in R through a relatively unified API.” Peng also notes that a lot of popular machine learning algorithms are implemented in R.
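The “relatively unified API” Adams mentions might look something like the following minimal sketch, in which the same caret train() call fits two different models simply by changing the method string. The built-in iris data set, the two model types, and the five-fold cross-validation setting are all assumptions chosen for illustration.

```r
library(caret)

set.seed(42)  # make the cross-validation folds repeatable

# 5-fold cross-validation settings, shared by both models below.
ctrl <- trainControl(method = "cv", number = 5)

# Same train() call, different `method` string for each model type.
tree_fit <- train(Species ~ ., data = iris, method = "rpart", trControl = ctrl)
knn_fit  <- train(Species ~ ., data = iris, method = "knn",   trControl = ctrl)

# predict() is called the same way on either fitted model.
head(predict(tree_fit, newdata = iris))
head(predict(knn_fit, newdata = iris))
```

Because both fits come back as the same kind of object, downstream code for prediction and comparison does not need to know which algorithm produced them.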
R’s shortcomings in security and memory management
For all its benefits, R has its share of shortcomings. “Memory management, speed, and efficiency are probably the biggest challenges R faces,” says Adams. “Strides have been — and are still being — made to make progress on those fronts. Also, people coming to R from other languages might also consider R quirky.”
The basic principle of R emanates from programming languages built in the 1960s, Peng says. “In that sense, it’s kind of an old technology in the way it was originally designed.” The design of the language can sometimes pose problems in working with very large data sets, he says, because the data has to be stored in physical memory. But as computers have gotten more memory, this has become less of an issue, Peng notes.
Capabilities such as security were not built into the R language, Peng says. Also, R cannot be embedded in a Web browser, says Peng. “You can’t use it for Web-like or Internet-like apps.” It was basically impossible to use R as a back-end server to do calculations because of its lack of security over the Web, he says. The security issue, however, has been lessened by developments such as the use of virtual containers on the Amazon Web Services cloud platform, Peng says.
For a long time, there was not a lot of interactivity in the language, he says. Languages such as JavaScript still have to come in and fill this gap, says Peng. Although an analysis may be done in R, the presentation of results might be done in a different language such as JavaScript, he says.
R isn’t just for advanced programmers
Still, Adams and Peng both see R as an accessible language. “I don’t come from a computer science background and never had aspirations of becoming a programmer. Knowledge of programming fundamentals certainly helps when adding R to your toolbox, but I wouldn’t say it’s required to get started,” Adams says.
“I wouldn’t even say R is for programmers. It’s best suited for people that have data-oriented problems they’re trying to solve, regardless of their programming aptitude,” he says.