by Jordan Tigani

5 tips for choosing the right database

feature
Mar 15, 20226 mins
AnalyticsCloud ComputingDatabases

Choosing the wrong database for data-intensive applications opens a door to scaling challenges and unnecessary complexity. Making the right choice is simpler.

The database that you choose today will impact your application and development efforts far into the future. Yet a developer’s choice of a database tends to be an emotional decision, and developers often choose a database based solely on what their applications need at the outset.

If a database seems exciting, a developer may just go with their gut. And they may forego doing analysis of whether or not the database will work for their application today and in the future.

A developer may also be overwhelmed by having to choose between the many available databases. This can create paralysis followed by the selection of a database that addresses what an application starts out as. But you don’t always know what all the use cases of an application will be. Applications tend to start simple and become more complicated over time.

So, what typically happens is that a developer starts with PostgreSQL. The developer then adds MongoDB because they want to work with semi-structured data and need a flexible schema. The developer then adds Elasticsearch to do a log search or faceted search. And then they find that it isn’t fast enough, so they bring in Redis as a cache on top of that. And when they want to do analytics, they spin up a data warehouse such as Snowflake.

It all gets confusing quickly. The resulting database sprawl leaves developers to worry about moving data between databases and using costly extract, transform, and load (ETL) processes.

But it doesn’t have to be that way. Developers can get everything they need by taking a different approach. Here is how to select a database that will meet all of your needs.

Look beyond the immediate future

Database migrations and re-platforming are no picnic. As you select your database, keep in mind that you are making a choice that’s hard to undo and can lock you into an architecture.

Think about whether your database of choice both meets your current requirements and can get you where you need to go with your application in the future. Ask yourself these questions:

  • What will you need from your database if your application goes viral?
  • Does your database of choice provide the ability to add cool new features later on?
  • Are you picking a database solely based on what will allow you to show your demo now?

You may be leaning toward just spinning up MongoDB and leaving it at that. But keep in mind that making a snap decision today may cause you heartache later on.

Avoid cheating as you scale out

Traditional databases were based on scale-up architectures. That meant if you wanted to go faster, you had to buy bigger hardware. This added considerable expense to your efforts. But modern databases are typically based on scale-out architectures. You still need additional hardware, but the cost is proportional to the value you must add. So, it’s much more palatable.

Understanding the appeal of scale-out architectures, old school databases like MySQL and PostgreSQL have patched in scale-out capabilities. One common approach used to do this involves sharding, or breaking the database into separate pieces (or shards). As a result, instead of having one giant MySQL instance, you might have 10 smaller MySQL instances.

Be aware that if you choose a database provider that takes this approach to scale out, you may get into trouble if you need to run queries that need to share data across shards. Analytics-style queries, where you need to find the top customers in a region or the most active users, can be particularly problematic in this kind of architecture.

Run both hot and cold

Column stores are ideal for analytics because they allow you to scan large amounts of data quickly. Row stores are a better match for transactional workloads, which require low latency lookups and updates.

In the past, you had to choose between column-oriented and row-oriented storage in making a database selection. But you no longer have to make that difficult choice. Modern databases that combine row and column storage into the same table are now available.

In this scenario, data is written into an in-memory row store to enable incredibly fast transactions and lookups. As data quiesces, colder data is written back to the column store. This lets you do efficient analytics on it as well.

Move faster, but don’t get physical

Traditional databases were designed around magnetic disks, which are the primary source of latency. While database vendors have designed algorithms to minimize that latency, there are physical limits to the amount of time it takes for spinning disks and disk heads to get into position.

You can reduce IO-related delay by adopting a modern database with no moving parts. Solid-state drives (SSDs) are up to 200 times faster than spinning disks. In order to persist a database write, you just have to do an append. You get persistence by writing to a transaction log — appending to the transaction log. If you’re just doing read or lookup queries, you shouldn’t need to hit the disk at all.

Understand that less is more

The databases of the past lack the speed, flexibility, and capability to support all of your applications. That’s why so many businesses have amassed so many different databases over time.

Your data-intensive applications call for a new kind of database.

Seek out a single database that allows you to scale efficiently and to dramatically simplify your architecture. Find a database designed to handle both transactional and analytical workloads. Find a database that does fast analytical queries across large, dynamic data sets with high concurrency. And employ a multi-model database that addresses all data types.

Jordan Tigani is chief product officer at SingleStore. He previously was one of the founding engineers on Google BigQuery. In his decade building out that product, he served as director of engineering and then director of product management, and co-authored two books about the topic. Jordan also spent several years at Microsoft as an engineer on the Windows kernel and working on runtime binary software analysis in Microsoft Research. Jordan has a Bachelor of Arts in Electrical Engineering from Harvard University in 1998 and a Master of Science in Computer Science from University of Washington in 2008.

New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to newtechforum@infoworld.com.