Sharon Machlis
Executive Editor, Data & Analytics

How to write your own ggplot2 functions in R

Jul 24, 2019 | 5 mins
Data Visualization | R Language | Software Development

You no longer have to worry about quoted and unquoted column names when using ggplot2, thanks to the latest version of the rlang package


Tidyverse packages like ggplot2 and dplyr have a function syntax that is usually pretty handy: You don’t have to put column names in quotation marks. For example: 

dplyr::filter(mtcars, mpg > 30)

Note the column name, mpg, is unquoted.
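To make the contrast concrete, here’s a small sketch of the same filter written both ways, using the built-in mtcars data. The variable names tidy_result and base_result are just mine for illustration.

```r
library(dplyr)

# Tidyverse style: mpg is written without quotation marks
tidy_result <- filter(mtcars, mpg > 30)

# Base R style: the column is referred to with a quoted string
base_result <- mtcars[mtcars[["mpg"]] > 30, ]

# Both approaches keep the same rows
nrow(tidy_result)
nrow(base_result)
```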

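That feature hasn’t been so handy, though, if you want to write your own R functions using the tidyverse. Base R functions usually need quoted column names while tidyverse functions generally don’t, and a column name handed to your own function as an argument isn’t automatically passed along as a column to the tidyverse function inside it.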

But that problem has a simple solution now, thanks to the latest version of the rlang package. And that means it’s very easy to create your own ggplot2 functions for your favorite customized graphs.

Let me go through an example, using data from Zillow with estimated median home values. In the code below, I load a couple of packages, set my data file name, and use base R’s download.file function to download a CSV from Zillow. Final data prep steps: Import that CSV into R and filter for rows where City is Boston. (I’m using the rio package for data import because I love rio, but you can use something else like readr::read_csv() or data.table::fread().) If you’re following along, feel free to filter for another city.

library(dplyr)
library(ggplot2)
# File name I want to download data to:
myfilename <- "Zillow_neighborhood_home_values.csv"
# If go.infoworld.com/ZillowData doesn't work, full URL is
# http://files.zillowstatic.com/research/public/Neighborhood/Neighborhood_Zhvi_Summary_AllHomes.csv
download.file("https://go.infoworld.com/ZillowData", myfilename)
# Import the file that was just downloaded, reusing the file name variable
bos_values <- rio::import(myfilename) %>%
  filter(City == "Boston")

Next, I’ll create a horizontal bar chart with some customizations I often like to use. I’m ordering the bars from highest to lowest values, outlining them in black, coloring them in blue, and changing the ggplot2 default gray background.

ggplot(data = bos_values, aes(x=reorder(RegionName, Zhvi), y=Zhvi)) +
  geom_col(color = "black", fill="#0072B2") +
  xlab("") +
  ylab("") +
  ggtitle("Zillow Home Value Index by Boston Neighborhood") +
  theme_classic()   +
  theme(plot.title=element_text(size=24))  +
  coord_flip()

What if I’d like to make my own function to quickly generate a graph like this with any data frame? More specifically, a function with input arguments of the data frame name, the x column, the y column, and the graph title? 

Below is one attempt to create a function called mybarplot with the customizations I want, without using the rlang package. However, it won’t work.

mybarplot <- function(mydf, myxcol, myycol, mytitle) {
 ggplot(data = mydf, aes(x=reorder(myxcol, myycol), y=myycol)) +
    geom_col(color = "black", fill="#0072B2") +
    xlab("") +
    ylab("") +
    coord_flip() +
    ggtitle(mytitle) +
    theme_classic()   +
    theme(plot.title=element_text(size=24))  
}

I’ll show you what happens if I try to call that function using unquoted column names. For instance: 

mybarplot(bos_values, RegionName, Zhvi, 
          "Zillow Home Value Index by Boston Neighborhood")

The result is an error: R goes looking for an object named RegionName and can’t find one. If I call the function with quoted column names, I get a graph, but not the graph I want.

[Figure: A graph with all bars the same height. This is not the graph I want when trying to create a custom ggplot2 function. Credit: Sharon Machlis, IDG]

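This is due to the issue of base R needing quoted column names while ggplot2 doesn’t. Inside aes(), a quoted name is treated as a literal string rather than as a column, so every bar ends up with the same value.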
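Here’s one way to see the difference between a quoted and an unquoted name. This is only a sketch of the general tidy evaluation mechanism, not ggplot2’s exact internals. It uses rlang’s quo() and eval_tidy() with the built-in mtcars data standing in for the Zillow file, and the variable names are my own.

```r
library(rlang)

# Unquoted name: looked up as a column in the data
as_column <- eval_tidy(quo(mpg), data = mtcars)

# Quoted name: just a string, so it evaluates to itself
as_string <- eval_tidy(quo("mpg"), data = mtcars)

length(as_column)  # one value per row of mtcars
as_string          # a single string, not the column's values
```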

Older versions of the rlang package had a multi-step solution for this, as I covered in “Tidy Eval in R,” an earlier episode of “Do More With R.” The current version of rlang solves the problem with a new tidy evaluation operator: double curly braces, {{ }}. You just put the curly braces around the unquoted column names inside your function, and you’re done!
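For comparison, the older multi-step approach commonly looked something like the sketch below: capture each argument with enquo(), then unquote it with !! where it’s used. I’m assuming that pattern here for illustration. The function name mybarplot_old and the toy data frame are hypothetical, and the function is trimmed to the essentials.

```r
library(ggplot2)
library(rlang)

# Hypothetical toy data standing in for the Zillow file
toy <- data.frame(area = c("A", "B", "C"), value = c(3, 1, 2))

# Older pattern: capture each column argument, then unquote it with !!
mybarplot_old <- function(mydf, myxcol, myycol, mytitle) {
  myxcol <- enquo(myxcol)
  myycol <- enquo(myycol)
  ggplot(data = mydf, aes(x = reorder(!!myxcol, !!myycol), y = !!myycol)) +
    geom_col(color = "black", fill = "#0072B2") +
    coord_flip() +
    ggtitle(mytitle)
}

old_plot <- mybarplot_old(toy, area, value, "Toy example")
old_plot
```

The double curly braces fold the capture step and the unquote step into one.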

Note that you need at least version 0.4.0 of the rlang package for this to work. At the time I wrote this article, version 0.4.0 was on CRAN but you needed to compile it from source when given that option during installation, at least on a Mac.
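If you’re not sure which version you have, a quick check along these lines should tell you. It’s a minimal sketch using base R’s packageVersion(), and the variable name has_curly_curly is my own.

```r
# Compare the installed rlang version with the minimum mentioned above
has_curly_curly <- packageVersion("rlang") >= "0.4.0"

if (!has_curly_curly) {
  message("rlang is older than 0.4.0; consider running install.packages(\"rlang\")")
}

has_curly_curly
```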

In the code below, I load rlang and tweak my bar plot function so every time I refer to a column name within ggplot, I surround it with double curly braces — “curly curly” is how the package creators refer to it. 

library(rlang)
mybarplot <- function(mydf, myxcol, myycol, mytitle) {
  ggplot(data = mydf, aes(x = reorder({{ myxcol }}, {{ myycol }}),
                          y = {{ myycol }})) +
    geom_col(color = "black", fill="#0072B2") +
    xlab("") +
    ylab("") +
    coord_flip() +
    ggtitle(mytitle) +
    theme_classic()   +
    theme(plot.title=element_text(size=24))
}

Now I can call my function with

mybarplot(bos_values, RegionName, Zhvi, 
          "Zillow Home Value Index by Boston Neighborhood")

Just as with tidyverse functions, I didn’t need to put the column names in quotation marks. It creates a graph like the one below:

[Figure: Graph of median home values by Boston neighborhood, created with a custom ggplot2 function. Data from Zillow. Credit: Sharon Machlis, IDG]
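Since the point of the function is to work with any data frame, here’s a quick sketch that tries it on a small made-up one. The toy_sales data frame and its column names are hypothetical, and the function is repeated so the block runs on its own.

```r
library(ggplot2)

# Same function as above, repeated so this block is self-contained
mybarplot <- function(mydf, myxcol, myycol, mytitle) {
  ggplot(data = mydf, aes(x = reorder({{ myxcol }}, {{ myycol }}),
                          y = {{ myycol }})) +
    geom_col(color = "black", fill = "#0072B2") +
    xlab("") +
    ylab("") +
    coord_flip() +
    ggtitle(mytitle) +
    theme_classic() +
    theme(plot.title = element_text(size = 24))
}

# Made-up data, purely for illustration
toy_sales <- data.frame(region = c("North", "South", "East"),
                        sales = c(120, 340, 210))

toy_plot <- mybarplot(toy_sales, region, sales, "Sales by region (made-up data)")
toy_plot
```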

I can still tweak the graph created by my function, using other ggplot2 commands. In the next block of code, I save the graph created by my custom function to a variable and then make some more changes. The geom_text() code displays the median value on each bar, and theme() sets the graph headline size.

mygraph <- mybarplot(bos_values, RegionName, Zhvi, 
                     "Zillow Home Value Index by Boston Neighborhood")
mygraph +
  geom_text(aes(label=scales::comma(Zhvi, prefix = "$")), 
            hjust=1.0, colour="white", position=position_dodge(.9), size=4) +
  theme(plot.title=element_text(size=24))

The new graph would look like this:

[Figure: Graph of median Boston home values by neighborhood, displaying values on the bars. Created by a custom ggplot function and then tweaked with ggplot code outside the function. Data from Zillow. Credit: Sharon Machlis, IDG]

For more R tips, head to the “Do More With R” page at InfoWorld or the “Do More With R” playlist on YouTube.


Sharon Machlis is Director of Editorial Data & Analytics at Foundry (the IDG, Inc. company that publishes websites including Computerworld and InfoWorld), where she analyzes data, codes in-house tools, and writes about data analysis tools and tips. She holds an Extra class amateur radio license and is somewhat obsessed with R. Her book Practical R for Mass Communication and Journalism was published by CRC Press.
