How to avoid costly mistakes with DynamoDB partition keys, read/write capacity modes, and global secondary indexes

Amazon DynamoDB is a managed NoSQL database in the AWS cloud that delivers a key piece of infrastructure for use cases ranging from mobile application back-ends to ad tech. DynamoDB is optimized for transactional applications that need to read and write individual keys but do not need joins or other RDBMS features. For this subset of requirements, DynamoDB offers a virtually infinitely scalable datastore that requires minimal operational maintenance.

While DynamoDB is quite popular, one common complaint we often hear from developers is that DynamoDB is expensive. In particular, costs can scale sharply, and almost surprisingly, as usage grows. In this post, we will examine three reasons why DynamoDB is perceived as being expensive at scale, and outline steps you can take to keep DynamoDB costs reasonable.

DynamoDB partition keys

Given the simplicity of using DynamoDB, a developer can get pretty far in a short time. But there are some latent pitfalls that come from not thinking through the data distribution before starting to use it. To manage your data in DynamoDB effectively, it helps to understand some DynamoDB internals, namely how data is stored under the hood.

As mentioned before, DynamoDB is a NoSQL datastore, which means the operations it supports efficiently are GET (by primary key or index) and PUT. Every record you store in DynamoDB is called an item, and items are stored within partitions. These partitions are managed automatically and are not exposed to the user. Every item has a partition key that is used as input to an internal hash function, which determines the partition the item will live in. The partitions themselves are stored on SSDs and replicated across multiple Availability Zones in a region.

Each individual partition has some constraints:

- A single partition can store at most 10 GB of data.
- A single partition can support a maximum of 3,000 read capacity units (RCUs) or 1,000 write capacity units (WCUs).

Given those limits, we know that our data may be spread across more partitions based on two criteria. If a single partition grows beyond 10 GB, a new partition must be created to store more data. Similarly, if the requested read or write capacity grows beyond what a single partition supports, new partitions are created under the hood.

In addition to partitions, it is worth understanding how reads and writes are priced in DynamoDB. Reads and writes consume abstract units, the read capacity units (RCUs) and write capacity units (WCUs) mentioned above. Each read or write in DynamoDB consumes these units, so as your read and write workload grows, you consume more RCUs and WCUs, respectively.

The partition key we choose dictates how evenly the data is distributed among the partitions. Choosing a partition key that is not very random is an anti-pattern that can cause an uneven distribution of data across partitions. Until recently, the RCU and WCU allocations among partitions were inelastic and set statically. When uneven data distribution produced "hot keys," some partitions required more RCUs and WCUs than others, which led to over-provisioning RCUs and WCUs across the table just to ensure the overloaded partitions had enough capacity.
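To make the hot key problem concrete, here is a minimal sketch in Python with boto3 (not taken from the article) that contrasts a low-cardinality partition key with a high-cardinality one. The table names and attributes are hypothetical, and the snippet assumes AWS credentials are configured and the tables already exist.

```python
# Contrast a hot partition key with a better-distributed one.
import uuid
from datetime import datetime, timezone

import boto3

dynamodb = boto3.resource("dynamodb")

def write_event_hot(payload: dict) -> None:
    """Anti-pattern: every write on a given day lands on the same partition key,
    so a single partition absorbs all of that day's WCUs."""
    table = dynamodb.Table("events_by_day")  # hypothetical table
    table.put_item(Item={
        "pk": datetime.now(timezone.utc).strftime("%Y-%m-%d"),  # low-cardinality key
        "sk": str(uuid.uuid4()),
        **payload,
    })

def write_event_distributed(device_id: str, payload: dict) -> None:
    """Better: a high-cardinality key (a device ID here) spreads writes evenly
    across partitions, so no single partition needs outsized RCU/WCU allocations."""
    table = dynamodb.Table("events_by_device")  # hypothetical table
    table.put_item(Item={
        "pk": device_id,                         # high-cardinality key
        "sk": datetime.now(timezone.utc).isoformat(),
        **payload,
    })
```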
In 2018, Amazon introduced Amazon DynamoDB adaptive capacity, which alleviates the hot key issue by allowing the allocation of RCUs and WCUs to shift more dynamically between partitions. Today, DynamoDB even performs this redistribution "instantly." As a result, even with a hot key, there may be no immediate need to over-provision far beyond the required RCUs and WCUs. However, recall the RCU, WCU, and size limits on a single partition: if you need to allocate resources beyond those limits, as will be the case for some high-traffic applications, you may run into high costs. Nike's engineering blog on DynamoDB cost mentions this as one of the cost drivers for their setup. Interestingly, rather than redesign their partition keys, they chose to move some tables to a relational datastore.

In short, partitioning the data in a sub-optimal manner is one cause of increasing costs with DynamoDB. Although adaptive capacity alleviates this somewhat, it is still best to design DynamoDB tables with sufficiently random partition keys to avoid hot partitions and hot keys.

DynamoDB read/write capacity modes

DynamoDB has a few different modes to pick from when provisioning RCUs and WCUs for your tables. Choosing the right mode can have large implications for your application performance as well as the costs you incur.

At the top level, there are two modes: provisioned capacity and on-demand capacity. Within provisioned capacity, you can get reserved pricing, similar to how reserved instances work elsewhere in AWS, wherein you get discounted pricing by committing to a certain amount of spend over a period of time. Then there is DynamoDB auto scaling, which can be used in conjunction with provisioned capacity mode. The mode you should use depends on the type of application you are looking to build on top of DynamoDB.

In provisioned capacity mode, you pay for a certain number of RCUs and WCUs and they are available to your table at all times. This is the recommended mode of operation in the following cases:

- You have a stable workload with similar RCU and WCU requirements and very little variability.
- You have a workload with predictable variability (by time of day, for example) and can pair provisioned capacity with DynamoDB auto scaling.
- The cost of read/write throttling for your service is very high.

If you have sudden spikes or bursty workloads, provisioned capacity can prove expensive, since the capacity you provision needs to exceed your spikes to avoid throttling. Auto scaling can help when capacity consumption grows or declines gradually, but it is often ineffective against spikes and bursts. If you use auto scaling, some requests may get throttled while capacity is adjusted, which may be unacceptable for a customer-facing application such as an e-commerce website, where throttling has a direct impact on revenue. If you instead provision enough fixed capacity to cover any burst or spike, your users will get the best experience, but much of that capacity may sit unused most of the time.

When you are starting out with a new workload and have not done capacity estimation for it, or when usage may be unpredictable, switching to on-demand mode can be a good cost-saving measure. In on-demand mode, DynamoDB manages all capacity and scales up and down completely on its own.
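As a brief illustration of how the two modes are selected, here is a minimal boto3 sketch, assuming AWS credentials and a hypothetical table named "orders"; the throughput numbers are placeholders, not recommendations.

```python
import boto3

client = boto3.client("dynamodb")

# Switch an existing table to on-demand mode: DynamoDB then manages capacity itself.
client.update_table(
    TableName="orders",
    BillingMode="PAY_PER_REQUEST",
)

# Switch back to provisioned mode; here you must state the RCUs/WCUs you are paying for.
client.update_table(
    TableName="orders",
    BillingMode="PROVISIONED",
    ProvisionedThroughput={
        "ReadCapacityUnits": 500,   # example values, sized to the expected steady load
        "WriteCapacityUnits": 200,
    },
)
```

Note that AWS limits how often a table can switch between billing modes (once per 24 hours), so this is a planning decision rather than something to toggle reactively.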
Some users have reported large cost savings by moving from provisioned to on-demand mode. Per RCU/WCU, on-demand mode can be 6x to 7x more expensive than provisioned capacity, but it does better at handling large variations between maximum and minimum load. On-demand mode is also useful for dev instances of tables where usage often drops to zero and spikes unpredictably.

Will on-demand mode be cost-effective for your specific tables? That depends on your access patterns, scale of data, and business goals. Therefore it is important to choose the correct mode, and to set up the right auto scaling, for your particular table. The best mode can vary based on use case, workload pattern, and error tolerance.

DynamoDB scans and GSIs

DynamoDB supports two different types of read operations: query and scan. A query is a lookup based on either the primary key or an index key. A scan is, as the name indicates, a read call that scans the entire table in order to find a particular result. DynamoDB is tuned for query operations that touch a single item or a few items in a table. DynamoDB also supports secondary indexes, which allow lookups based on keys other than the primary key. Secondary indexes also consume RCUs and WCUs during reads and writes.

Sometimes it is important to run more complex queries on DynamoDB data. This might be finding the top 10 most-purchased items in some time period for an e-commerce retailer, or ad conversion rates for an ad platform. Scans are typically very slow for these types of queries, so the first step is typically to create a GSI (global secondary index). As Nike discovered, overusing global secondary indexes can be expensive. The solution Nike adopted was to move those workloads into a relational database. However, this is not always an option, because there are transactional queries that work better on DynamoDB at scale than in a relational database, which may need more tuning.

For complex queries, especially analytical queries, you can gain significant cost savings by syncing the DynamoDB table with a different tool or service that is better suited to running complex queries efficiently. Rockset is one such engine for operational analytics that is cloud-native and does not require managing servers or infrastructure. Once provided with read access to a DynamoDB table, Rockset collections can replicate changes as they occur in DynamoDB by making use of the changelogs in DynamoDB Streams. This gives you an up-to-date (to within a few seconds) indexed version of your DynamoDB table within Rockset. You can run complex OLAP queries with the full power of SQL on this indexed collection, and serve those queries by building either live dashboards or custom applications using the Rockset API and SDKs.

This approach is significantly less expensive than running those queries directly on DynamoDB, because Rockset is a search and analytics engine that is specifically tuned to index and run complex queries over semi-structured data. Making use of converged indexing, Rockset turns SQL queries into fast key lookups on RocksDB-Cloud under the hood. Each query takes advantage of distributed execution and the underlying indexes opportunistically to ensure that results return in milliseconds. Rockset can be especially useful for developers looking to build operational analytical dashboards on top of their transactional datastore to monitor the current state of the system.
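To show the mechanism such a sync relies on, here is a generic sketch of reading the change log from DynamoDB Streams with boto3. This illustrates the stream API itself, not Rockset's implementation; the stream ARN is a placeholder, and the table must have streams enabled.

```python
import boto3

streams = boto3.client("dynamodbstreams")
STREAM_ARN = "arn:aws:dynamodb:us-east-1:123456789012:table/orders/stream/2019-01-01T00:00:00.000"

# Walk each shard of the stream and print inserts, updates, and deletes as they occur.
shards = streams.describe_stream(StreamArn=STREAM_ARN)["StreamDescription"]["Shards"]
for shard in shards:
    iterator = streams.get_shard_iterator(
        StreamArn=STREAM_ARN,
        ShardId=shard["ShardId"],
        ShardIteratorType="TRIM_HORIZON",  # start from the oldest available change
    )["ShardIterator"]

    while iterator:
        response = streams.get_records(ShardIterator=iterator, Limit=100)
        for record in response["Records"]:
            # Each record describes one INSERT, MODIFY, or REMOVE on the table;
            # a downstream index or analytics engine applies these to stay in sync.
            print(record["eventName"], record["dynamodb"].get("Keys"))
        iterator = response.get("NextShardIterator")
        if not response["Records"]:
            break  # stop polling this shard once it is drained (a simplification)
```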
Rockset users build live dashboards as well as power search applications by making use of this live sync and queries on Rockset.

To sum up, poorly chosen partition keys, the wrong capacity mode, and overuse of scans and global secondary indexes are all causes of skyrocketing DynamoDB costs as applications scale. Much of the cost associated with DynamoDB tends to stem from either a lack of understanding of its internals, or from trying to retrofit it for a use case it was never designed to serve efficiently. Choosing your partition key wisely, choosing a mode of operation that is appropriate for your workload, and using a special-purpose operational analytics engine alongside DynamoDB can improve the scalability and performance of your tables while keeping your DynamoDB bill in check.

Anirudh Ramanathan is a software engineer with a focus on distributed systems and machine learning. He is passionate about start-ups and entrepreneurship. Currently, he works on the future of databases at Rockset and serves as an advisor to Doc.ai.

New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to newtechforum@infoworld.com.