
by Sharon Machlis

Executive Editor, Data & Analytics

How to use .SD in the R data.table package

how-to

Jul 18, 2019 (4 mins)

Analytics, R Language, Software Development

See how to use data.table's special .SD symbol to perform calculations and other tasks by group


For some data.table users, “dot-SD” is a bit of a mystery. But data.table creator Matt Dowle told me that it’s actually quite simple: Just think of it as a symbol representing “each group.” Let’s go through a couple of examples.

I have a data set of daily cycling trips from the Boston area’s bicycle-share system. If you’d like to follow along, you can download the CSV file from the link at the bottom of this article.

I’ll load data.table and import my CSV file using data.table’s fread() function. In the code below, I’m saving the data into a data table called mydt.

library(data.table)
mydt <- fread("daily_cycling_trips_by_usertype.csv")

Next, I suggest printing the first six rows with head(mydt) to see what the data looks like. You’ll see columns for the date, the user type (subscriber or single-trip customer), the number of trips, the year, and the month’s starting date, which helps with totals by month.
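If you don’t have the CSV handy, a small made-up table with the same layout is enough to experiment with. This is only a sketch: the values are invented, the date column name Date is a guess, and Year is assumed to be stored as an integer. The other column names come from the code in this article.

```r
library(data.table)

# Hypothetical stand-in for the article's CSV; only the column layout matters.
# "Date" is an assumed column name, and all values are invented.
toy <- data.table(
  Date = as.Date(c("2019-01-01", "2019-01-01", "2019-01-02", "2019-01-02")),
  usertype = c("Customer", "Subscriber", "Customer", "Subscriber"),
  Trips = c(30L, 410L, 12L, 650L),
  Year = 2019L,                             # length-1 values are recycled
  MonthStarting = as.Date("2019-01-01")
)

head(toy)  # first rows (up to six)
str(toy)   # column names and types
```

Checking the column types with str() is worth doing on the real file too, since fread() guesses them from the data.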

The first example Matt suggested: Print the first few rows of the data table grouped by user type. (We’re filtering for the first 12 rows just to make the output easier to see.)

mydt[1:12, print(.SD), by = usertype]

print() ran once per group, so the output appears two separate times, one for each user type. The problem, though, is that I can’t tell which printout is the customer group and which is the subscriber group. The “by” column didn’t print, because .SD holds every column except the ones you group by. Fortunately, Matt showed me a little trick for that.
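One way to check which columns .SD carries is to collect names(.SD) for each group. The sketch below uses a tiny invented table (toy and its values are hypothetical), not the article’s data.

```r
library(data.table)

# Hypothetical toy data; values are invented for illustration
toy <- data.table(
  usertype = c("Customer", "Subscriber", "Customer", "Subscriber"),
  Trips = c(30L, 410L, 12L, 650L),
  Year = 2019L
)

# For each group, record the names of the columns that .SD contains
sd_cols <- toy[, .(columns = paste(names(.SD), collapse = ", ")), by = usertype]
print(sd_cols)
```

The columns value lists Trips and Year for both groups. usertype is absent from .SD because it is the grouping column.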

Recall data.table’s mydt[i, j, by] syntax: There are three parts to the bracket notation after the data table name. i filters rows, j is what you want to do, and by is how you want to group your data.
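As a rough illustration of how the three parts work together, here is a sketch on a tiny invented table. The toy name, its values, and the total_trips column are hypothetical; only the column names usertype and Trips come from this article.

```r
library(data.table)

# Hypothetical toy data; values are invented for illustration
toy <- data.table(
  usertype = c("Customer", "Subscriber", "Customer", "Subscriber", "Customer"),
  Trips = c(30L, 410L, 12L, 650L, 55L)
)

# i:  keep only rows with more than 20 trips
# j:  add up the trips
# by: produce one result row per user type
totals <- toy[Trips > 20, .(total_trips = sum(Trips)), by = usertype]
print(totals)
```

The row with 12 trips is dropped by i before j runs, so it does not count toward the Customer total.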

For example:

mydt[1:12, { print(.SD) }, by = usertype]

In the line of code above, I’ve just put curly braces around the j part. That lets me add multiple R expressions inside the j argument. So far, the output is the same as before: no user type names.

But in this next line of code, look at the R statement I added (well, Matt told me to add): print(.BY).

mydt[1:12, { print(.BY); print(.SD) }, by = usertype]

.BY is a special data.table symbol that holds the by value for the group currently being processed: a list with one element for each column I’m grouping by.

If you run this code, each group’s printout is preceded by the value of its grouping variable.

Results of printing by group with data.table and .SD.
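Because .BY is a list, you can also pull a single grouping value out of it by name and use it in a calculation instead of printing it. A minimal sketch on invented data (toy, label, and rows are hypothetical names):

```r
library(data.table)

# Hypothetical toy data; values are invented for illustration
toy <- data.table(
  usertype = c("Customer", "Subscriber", "Customer", "Subscriber"),
  Trips = c(30L, 410L, 12L, 650L)
)

# .BY$usertype is the current group's value of the grouping column;
# .N is the number of rows in the current group
labelled <- toy[, .(label = paste("group:", .BY$usertype), rows = .N),
                by = usertype]
print(labelled)
```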

So that’s a very basic example. I’m guessing you might want to do something a little more interesting with .SD than print, though. Next, let’s look at summarizing the data by group: finding which day had the most trips in each month of 2019.

This line of code has it all:

mydt[Year == "2019", .SD[which.max(Trips)], by = MonthStarting]

The first argument in the brackets, i, filters for rows where the year is 2019. The j argument is the interesting part for .SD: Within each group, which.max(Trips) finds the position of the row with the most trips, and .SD[...] returns that row. Think of .SD as referring to each group of your data. Or as Matt said, “You do j by by. Like a for loop.”
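To make the for-loop comparison concrete, here is a sketch that computes the same per-month maximum two ways on a tiny invented table: once with .SD and by, and once with an explicit loop over the groups. The loop is only a mental model, not a description of how data.table works internally, and toy, month_starts, and pieces are hypothetical names.

```r
library(data.table)

# Hypothetical toy data; values are invented for illustration
toy <- data.table(
  Date = as.Date(c("2019-01-05", "2019-01-20", "2019-02-03", "2019-02-14")),
  Trips = c(100L, 250L, 400L, 320L),
  MonthStarting = as.Date(c("2019-01-01", "2019-01-01",
                            "2019-02-01", "2019-02-01"))
)

# Grouped version: run j once for each MonthStarting value
grouped <- toy[, .SD[which.max(Trips)], by = MonthStarting]

# Hand-written version: loop over the groups and stack the results
month_starts <- unique(toy$MonthStarting)
pieces <- vector("list", length(month_starts))
for (k in seq_along(month_starts)) {
  grp <- toy[MonthStarting == month_starts[k]]  # rows belonging to one group
  pieces[[k]] <- grp[which.max(Trips)]          # that group's busiest day
}
looped <- rbindlist(pieces)

print(grouped)
print(looped)
```

Both versions pick the same days. The only difference is column order: the grouped result puts the by column first.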

What if you want to see maximums for each month and user type? Just add another column to the by (third) argument:

mydt[Year == "2019", .SD[which.max(Trips)], 
    by = .(MonthStarting, usertype)]

There are several ways to express grouping by more than one column in data.table. One way is with the dot before the unquoted column names, as above. Another is to use list instead of the dot, for example:

mydt[Year == "2019", .SD[which.max(Trips)], 
     by = list(MonthStarting, usertype)]

You can also use a conventional base R character vector, with quotation marks around each column name.

mydt[Year == "2019", .SD[which.max(Trips)], 
     by = c("MonthStarting", "usertype")]
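If you want to convince yourself that the three forms are interchangeable, a quick check on a small invented table might look like the sketch below (toy and the result names are hypothetical, and the Year filter is left out to keep it short).

```r
library(data.table)

# Hypothetical toy data; values are invented for illustration
toy <- data.table(
  usertype = c("Customer", "Subscriber", "Customer", "Subscriber"),
  Trips = c(30L, 410L, 12L, 650L),
  MonthStarting = as.Date("2019-01-01")
)

# The same grouping expressed three ways
with_dot  <- toy[, .SD[which.max(Trips)], by = .(MonthStarting, usertype)]
with_list <- toy[, .SD[which.max(Trips)], by = list(MonthStarting, usertype)]
with_chr  <- toy[, .SD[which.max(Trips)], by = c("MonthStarting", "usertype")]

# all.equal() returns TRUE when two data tables match
same <- isTRUE(all.equal(with_dot, with_list)) &&
  isTRUE(all.equal(with_dot, with_chr))
print(same)
```

The character vector form is handy when the grouping column names are stored in a variable rather than typed out.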

For more R tips, head to the “Do More With R” video page on InfoWorld or check out the “Do More With R” YouTube playlist.

Download: Sample Bicycle Trip Data

CSV file to accompany my “How to use .SD in the R data.table package” article and video. Sharon Machlis

Hope to see you next episode!


Sharon Machlis is Director of Editorial Data & Analytics at Foundry (the IDG, Inc. company that publishes websites including Computerworld and InfoWorld), where she analyzes data, codes in-house tools, and writes about data analysis tools and tips. She holds an Extra class amateur radio license and is somewhat obsessed with R. Her book Practical R for Mass Communication and Journalism was published by CRC Press.
