Learn the new pipe operator built into R 4.1 and how it differs from the magrittr pipe. Don't want to install R 4.1 yet? See how to run R 4.1 in a Docker container.

The R language has a new, built-in pipe operator as of R version 4.1: |>

%>% is the pipe that most R users know. Originally from the magrittr package, it's now used in many other packages as well. (If you're wondering where the magrittr name came from, it's a reference to Belgian artist René Magritte and one of his paintings, The Treachery of Images, which says in French: "This is not a pipe.")

Here's a somewhat trivial example using the %>% pipe with the mtcars data set and a couple of dplyr functions. This code filters the data for rows with more than 25 mpg and arranges the results by descending miles per gallon:

library(dplyr)
mtcars %>%
  filter(mpg > 25) %>%
  arrange(desc(mpg))

Not everyone likes the pipe syntax. But especially when using tidyverse functions, there are advantages in code readability, in not having to repeat the data frame name, and in not creating new copies of a data set. Here are some non-pipe ways of writing the same dplyr code (note the results are saved to a new variable so the original mtcars data stays intact for later examples):

mtcars2 <- filter(mtcars, mpg > 25)
mtcars2 <- arrange(mtcars2, desc(mpg))

# OR

arrange(filter(mtcars, mpg > 25), desc(mpg))

Run R 4.1 in Docker

If you're not yet ready to install R 4.1 on your system, one easy way to try out the new pipe is by running R 4.1 inside a Docker container. I provide full general instructions in "How to run R 4.0 in Docker"; the only new part is using a Docker image with R 4.1. Basically, you need to download and install Docker if you don't already have it, launch Docker, and then run the code below in a terminal window (not the R console).
docker run -e PASSWORD=your_password_here --rm -p 8787:8787 -v /path/to/local/directory:/home/rstudio/morewithr rocker/tidyverse:4.1.0

The -v /path/to/local/directory:/home/rstudio/morewithr part of the command creates a volume connecting a directory inside the Docker container to a local directory on your system. That's optional but can be quite handy.

The new pipe in R 4.1

Why does R need a new, built-in pipe when magrittr already supplies one? It cuts down on external dependencies, so developers don't have to rely on an external package for such a key operation. Also, the built-in pipe may be faster.

The new base R and magrittr pipes work mostly the same, but there's an important difference when handling functions that don't have pipe-friendly syntax. By pipe-friendly, I mean that a function's first argument is likely to be a value that will be passed through from piped code. For example, the str_detect() function in the stringr package uses the string to be searched as its first argument and the pattern to search for as the second argument. That works well with pipes:

library(stringr)

# add a column with the car model name
mtcars$model <- rownames(mtcars)

# filter for all cars whose model starts with "F"
mtcars %>%
  filter(str_detect(model, "^F"))

By contrast, grepl() in base R has the opposite syntax: Its first argument is the pattern and the second argument is the string to search. That causes problems for a pipe. The magrittr pipe has a solution for non-pipe-friendly syntax, which is to use the dot (.) character to represent the value being piped in:

mtcars %>%
  filter(grepl("^F", .[["model"]]))

Now let's see how the base R pipe works. It runs the stringr code just fine:

mtcars |>
  dplyr::filter(stringr::str_detect(model, "^F"))

However, it doesn't use a dot to represent what's being piped, so this code will not work:

mtcars |>
  filter(grepl("^F", .[["model"]]))

At least for now, there is no special character to represent the value being piped.
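One workaround, assuming R 4.1 or later, is to pipe into an anonymous function that names its argument explicitly; the trailing parentheses call the function immediately. A sketch:

```r
# Pipe into an anonymous function; the extra () calls it on the piped value
mtcars$model <- rownames(mtcars)

f_cars <- mtcars |>
  (function(df) df[grepl("^F", df$model), ])()

f_cars$model
# "Fiat 128" "Fiat X1-9" "Ford Pantera L" "Ferrari Dino"
```

This works because the right-hand side of |> must be a function call, and `(function(df) ...)()` is one.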
In this example it hardly matters, since you don't need a pipe to do something this simple. But what about more complex calculations where there isn't an existing function with pipe-friendly syntax? Can you still use the new pipe? It's often not the most efficient option, but you could create your own function that wraps the original function and switches the arguments around, or otherwise re-does the code so the first argument becomes pipe friendly. For example, my new mygrepl function has a data frame as its first argument, which is often the way pipes start out:

mygrepl <- function(mydf, mycolumn, mypattern) {
  mydf[grepl(mypattern, mydf[[mycolumn]]), ]
}

mtcars |>
  mygrepl("model", "^F")

R 4.1 function shorthand

And speaking of functions, R 4.1 has another interesting new feature: You can now use the backslash character as a shorthand for "function." I think this was done mostly for so-called anonymous functions, that is, functions you create within code that don't have their own names. But it works for all functions. Instead of creating a new function with function(), you can now use \(). For example:

mygrepl2 <- \(mydf, mycolumn, mypattern) {
  mydf[grepl(mypattern, mydf[[mycolumn]]), ]
}

mtcars |>
  mygrepl2("model", "^F")

R pipes and functions without arguments

Finally, one last point about the new built-in pipe: If you're piping into a function with no arguments, parentheses are optional with the magrittr pipe but required with the base R pipe. These both work for %>%:

# Works:
mtcars %>%
  tail()

# Works:
mtcars %>%
  tail

But only the first version works with |>:

# Works:
mtcars |>
  tail()

# Doesn't work:
mtcars |>
  tail

You can see the new pipe in action, plus running R 4.1 in a Docker container, in the video at the top of this article. For more R tips and tutorials, head to my Do More With R page.
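The shorthand shines wherever you'd write a quick throwaway function, such as inside an apply-style call. A small sketch, assuming R 4.1 or later, computing the range of two mtcars columns both ways:

```r
# A quick anonymous function the classic way...
ranges1 <- sapply(mtcars[, c("mpg", "hp")], function(x) max(x) - min(x))

# ...and the same thing with the R 4.1 backslash shorthand
ranges2 <- sapply(mtcars[, c("mpg", "hp")], \(x) max(x) - min(x))

ranges2
#  mpg    hp
# 23.5 283.0
```

The two versions are identical once parsed; \() is purely a syntax convenience.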