Peter Wayner
Contributing writer

9 reasons SQL has got to go

feature
Dec 11, 2023 | 8 mins
Data Management | Databases | Software Development

Why is the language developers and DBAs use to organize our data such a mess? Here are nine reasons we wish we could quit SQL, even though we probably won't.


For all its popularity and success, SQL is a study in paradox. It can be clunky and verbose, yet developers often find it is the simplest, most direct way to extract the data they want. It can be lightning quick when a query is written correctly, and slow as molasses when the query misses the mark. It’s decades old, but new features are constantly being bolted on.

These paradoxes don’t matter because the market has spoken: SQL is the first choice for many, even given newer and arguably more powerful options. Developers everywhere—from the smallest websites to the biggest mega corporations—know SQL. They rely on it to keep all their data organized.

SQL’s tabular model is so dominant that many non-SQL projects end up adding an SQLish interface because users demand it. This is even true of the NoSQL movement, which was invented to break free from the old paradigm. In the end, it seems, SQL won.

SQL’s limitations may not be enough to drive it into the dustbin. Developers may never rise up and migrate all their data away from SQL. But SQL’s problems are real enough to generate stress for developers, add delays, and even require re-engineering for some projects.

Here are nine reasons we wish we could quit SQL, even though we know we probably won’t.

9 ways SQL makes things worse

  1. Tables don’t scale
  2. SQL isn’t JSON- or XML-native
  3. Marshaling is a big time-sink
  4. SQL doesn’t do real-time
  5. JOINs are a headache
  6. Columns are a waste of space
  7. Optimizer only helps sometimes
  8. Denormalization treats tables like trash
  9. Bolted-on ideas can wreck your database

Tables don’t scale

The relational model loves tables, and so we just keep building them. This is fine for small or even normal-sized databases. But the model starts to break down at truly large scales.

Some try to solve the problem by bringing together old and new, like integrating sharding into an older open source database. Adding layers might seem to make data simpler to manage and offer infinite scale. But those added layers can hide landmines. A SELECT or a JOIN can take vastly different amounts of time to process depending on how much data is stored in the shards.

Sharding also forces the DBA to consider the possibility that data may be stored on a different machine, or maybe even in a different geographic location. An inexperienced administrator searching across a table may get confused if they don't realize the data lives in several places, because the model often abstracts the location away from view.
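To make the landmine concrete, here is a minimal sketch of hash-based sharding, using Python's built-in sqlite3 module as a stand-in for three separate database servers (the users table and its columns are hypothetical). A lookup by the shard key touches exactly one shard; a query that doesn't include the key has to fan out to every shard and wait for the slowest one.

```python
import sqlite3

# A toy cluster: three independent SQLite databases standing in for shards.
SHARDS = [sqlite3.connect(":memory:") for _ in range(3)]
for shard in SHARDS:
    shard.execute("CREATE TABLE users (id INTEGER, name TEXT)")

def shard_for(user_id: int) -> sqlite3.Connection:
    # Route by hashing the key: fast, predictable, one machine touched.
    return SHARDS[user_id % len(SHARDS)]

# A single-key lookup hits exactly one shard.
shard_for(42).execute("INSERT INTO users VALUES (42, 'Ada')")
row = shard_for(42).execute("SELECT name FROM users WHERE id = 42").fetchone()

# A query without the shard key must scatter to every shard and gather
# the results: the SELECT that looked cheap on one box now costs a round
# trip per shard, and its latency tracks the slowest machine.
matches = [r for s in SHARDS
           for r in s.execute("SELECT * FROM users WHERE name = 'Ada'")]
```

The two queries look nearly identical in SQL, which is exactly why the surprise lands on whoever runs them.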

Some AWS machines come with 24 terabytes of RAM. Why? Because some database users need that much: they have that much data in an SQL database, and it runs much better on a single machine with one big block of RAM.

SQL isn’t JSON- or XML-native

SQL may be evergreen as a language, but it doesn't play particularly well with newer data exchange formats like JSON, YAML, and XML. All of these support a more hierarchical and flexible structure than SQL does. The guts of most SQL databases are still stuck in the relational model, with tables everywhere.

The market finds ways to paper over this common complaint. It’s relatively easy to add a different data format like JSON with the right glue code, but you’ll pay for it with lost time.

Some SQL databases are now able to encode and decode more modern data formats like JSON, XML, or YAML as native features, and a few even speak GraphQL through add-on layers. But on the inside, the data is usually stored and indexed using the same old tabular model.
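Here's what that looks like in practice, in a minimal sketch using Python's built-in sqlite3 module (the events table is hypothetical, and it assumes a SQLite build that includes the JSON1 functions, which most modern builds do). The JSON lives in an ordinary text column, and the hierarchy is re-parsed at query time rather than being the storage model itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The document is stored as plain text in a relational column; JSON
# support is a layer of functions, not a change to the storage model.
conn.execute("CREATE TABLE events (id INTEGER, body TEXT)")
conn.execute("""INSERT INTO events VALUES
    (1, '{"user": "ada", "action": "login", "tags": ["web"]}')""")

# json_extract (one of SQLite's JSON1 functions) parses the text on
# every call: the hierarchy is re-derived at query time, not indexed
# natively the way an ordinary column would be.
user = conn.execute(
    "SELECT json_extract(body, '$.user') FROM events WHERE id = 1"
).fetchone()[0]
print(user)  # ada
```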

How much time is spent converting data in and out of these formats? Wouldn’t it be easier to store our data in a more modern way? Some clever database developers continue to experiment, but the odd thing is, they often end up bolting on some kind of SQL parser. That’s what the developers say they want.

Marshaling is a big time-sink

Databases may store data in tables, but programmers write code that deals with objects. It seems like much of the work of designing data-driven applications is figuring out the best way to extract data from a database and marshal it into objects the business logic can work with. Then, when the object changes, its fields must be marshaled back into an SQL upsert. Isn't there a way to leave the data in a format that's just ready to go?
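A minimal sketch of the round trip, using Python's built-in sqlite3 module and a hypothetical User class (the upsert syntax needs SQLite 3.24 or newer). Every read marshals a flat row into an object, and every write flattens the object back out again.

```python
import sqlite3
from dataclasses import dataclass, astuple

@dataclass
class User:
    id: int
    name: str
    email: str

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")

def load_user(user_id: int) -> User:
    # Marshal a flat tuple of columns into the object the logic wants.
    row = conn.execute(
        "SELECT id, name, email FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return User(*row)

def save_user(user: User) -> None:
    # ...and unspool it back into an upsert when the object changes.
    conn.execute(
        "INSERT INTO users (id, name, email) VALUES (?, ?, ?) "
        "ON CONFLICT(id) DO UPDATE SET name = excluded.name, "
        "email = excluded.email",
        astuple(user),
    )

save_user(User(1, "Ada", "ada@example.com"))
print(load_user(1))
```

Object-relational mappers automate this boilerplate, but they don't eliminate the conversion; they just hide where the time goes.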

SQL doesn’t do real-time

The original SQL databases were designed for batch analytics and interactive queries. The model of streaming data through long processing pipelines is a relatively new idea, and it doesn't exactly match how those databases work.

The major SQL databases were designed decades ago when the model imagined the database sitting off on its own and answering queries like some kind of oracle. Sometimes they respond quickly, sometimes they don’t. That’s just how batch processing works.

Some of the newest applications demand better real-time performance—not only for convenience but because the application requires it. Sitting around like a guru on a mountain doesn’t work so well in the modern, streaming world.

The newest databases designed for these markets put speed and responsiveness at a premium. They don’t offer the kind of elaborate SQL queries that can slow everything to a halt.

JOINs are a headache

The power of relational databases comes from splitting up data into smaller, more concise tables. The headache comes afterward.

Reassembling data on the fly with JOINs is often the most computationally expensive part of a job because the database has to juggle all the data. The headaches begin when the data starts to outgrow the RAM.

JOINs can be incredibly confusing for anyone learning SQL. Figuring out the difference between inner and outer JOINs is only the beginning. Finding the best way to link together several JOINs makes it worse. The internal optimizers might lend a hand, but they can't help when the database admin asks for a particularly complex combination.
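Here's the classic stumbling block in a minimal sketch, using Python's built-in sqlite3 module with hypothetical customers and orders tables. The same two tables give different answers depending on which JOIN you pick, and that's before any question of chaining several of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 99.0);
""")

# INNER JOIN: only customers who actually have orders. Grace disappears.
inner = conn.execute("""
    SELECT c.name, o.total FROM customers c
    JOIN orders o ON o.customer_id = c.id
""").fetchall()   # [('Ada', 99.0)]

# LEFT OUTER JOIN: every customer, with NULL where no order matches.
outer = conn.execute("""
    SELECT c.name, o.total FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
""").fetchall()   # [('Ada', 99.0), ('Grace', None)]
```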

Columns are a waste of space

One of the great ideas of NoSQL was giving users freedom from columns. If someone wanted to add a new value to an entry, they could choose whatever tag or name they wanted. There was no need to update the schema to add a new column.

SQL defenders see only chaos in that model. They like the order that comes with tables and don't want developers adding new fields on the fly. They have a point, but adding new columns can be pretty expensive and time-consuming, especially in big tables. Putting the new data in separate tables and matching them up with JOINs adds even more time and complexity.
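A minimal sketch of both sides, using Python's built-in sqlite3 module with a hypothetical users table. ALTER TABLE happens to be cheap in SQLite, which only updates schema metadata, but on engines and tables where the change rewrites rows or replicas, the same one-liner can mean a long, locking migration. The blob-column escape hatch avoids the migration at the cost of the schema's guarantees.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# The SQL way: every new field is a schema change. Instant here; on a
# big table in another engine, potentially a long, locking migration.
conn.execute("ALTER TABLE users ADD COLUMN nickname TEXT")

# The NoSQL-flavored escape hatch: stash ad hoc fields in one blob
# column and let each row carry whatever keys it likes. No migration,
# but the database no longer knows or enforces what's in there.
conn.execute("ALTER TABLE users ADD COLUMN extras TEXT")
conn.execute("""INSERT INTO users (id, name, extras)
                VALUES (1, 'Ada', '{"pronouns": "she/her"}')""")
```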

Optimizer only helps sometimes

Database companies and researchers have spent a great deal of time developing good optimizers that take apart a query and find the best way to order its operations.

The gains can be significant but there are limits to what the optimizer can do. If the query demands a particularly large or ornate response, well, the optimizer can’t just say, “Are you really sure?” It’s got to assemble the answer and do as it’s told.

Some DBAs only learn this as the application begins to scale. The early optimizations are enough to handle the test data sets during development. But at crunch time, there’s no more juice for the optimizer to squeeze out of the query.
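You can watch the optimizer run out of options in a minimal sketch using Python's built-in sqlite3 module (the orders table and index are hypothetical, and the exact plan text varies by SQLite version). Given an indexed column, the plan is an index search; given a predicate nothing indexes, the optimizer has no choice but to do as it's told and scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    CREATE INDEX idx_orders_customer ON orders (customer_id);
""")

# The optimizer can help here: the plan reports a search on the index.
for row in conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 7"):
    print(row)  # ... SEARCH ... USING INDEX idx_orders_customer ...

# No index covers this predicate, so the answer is a full table scan,
# however large the table grows. There's no more juice to squeeze.
for row in conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE total > 100"):
    print(row)  # ... SCAN orders ...
```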

Denormalization treats tables like trash

Developers often find themselves caught between users who want faster performance and the bean counters who don’t want to pay for bigger, more expensive hardware. A common solution is to denormalize tables so there’s no need for complex JOINs or cross-tabular anything. All the data is already there in one long rectangle.

This isn’t a bad technical solution, and it often wins because disk space has become cheaper than processing power. But denormalization also tosses aside the cleverest parts of SQL and relational database theory. All that fancy database power is pretty much obliterated when your database becomes one long CSV file.

Bolted-on ideas can wreck your database

Developers have been adding new features to SQL for years, and some are pretty clever. It's hard to be upset about cool features you don't have to use. On the other hand, these bells and whistles are often bolted on, which can lead to performance issues. Some developers warn that you should be extra careful with subqueries because they'll slow everything down. Others say that subset-selecting features like Common Table Expressions, views, or window functions over-complicate your code. The code's creator can read it, but everyone else gets a headache trying to keep all the layers and generations of SQL straight. It's like watching a film by Christopher Nolan, but in code.

Some of these great ideas get in the way of what already works. Window functions were designed to make basic data analytics faster by speeding up the computation of results like averages. But many SQL users will reach for some other bolted-on feature instead. In most cases, they'll try the new feature and only notice something is wrong when their machine slows to a crawl. Then they'll need some old and gray DBA to explain what happened and how to fix it.
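Here's the contrast in a minimal sketch, using Python's built-in sqlite3 module with a hypothetical sales table (window functions need SQLite 3.25 or newer, which modern Python builds bundle). The window function computes a running average in one pass; the correlated subquery it replaces re-scans the table once per row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (day INTEGER, amount REAL);
    INSERT INTO sales VALUES (1, 10), (2, 30), (3, 20), (4, 40);
""")

# Used as intended, a window function computes the running average in a
# single pass over the data: no self-join, no per-row subquery.
fast = conn.execute("""
    SELECT day, AVG(amount) OVER (ORDER BY day) AS running_avg
    FROM sales
""").fetchall()

# The same answer the old way: a correlated subquery that re-scans the
# table once per row. Fine at four rows, a crawl at millions.
slow = conn.execute("""
    SELECT s.day,
           (SELECT AVG(amount) FROM sales s2 WHERE s2.day <= s.day)
    FROM sales s
""").fetchall()
```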