Document databases offer a wonderfully flexible data model that often leads to scaling and performance issues. Here’s how Aerospike overcomes these challenges. Credit: Raspirator / Getty Images Digital transformation continues to be a top initiative for enterprises. As they embark on this journey, it is essential they leverage data strategically to succeed. Data has become a critical asset for any business—helping to increase revenue, improve customer experiences, retain customers, enable innovation, launch new products and services, and expand markets. To capitalize on the data, enterprises need a platform that can support a new generation of real-time applications and insights. In fact, by 2025, it is estimated that 30% of all data will be real-time. For businesses to flourish in this digital environment, they must deliver exceptional customer experiences in the moments that matter. The document database has emerged as a popular alternative to the relational database to help enterprises manage the fast-growing and increasingly complex unstructured data sets in real time. It provides storage, processing, and access to document-oriented data, supports horizontal scale-out architecture using a schema-less and flexible data model, and is optimized for high performance. Document databases support all types of database applications, from systems of engagement to systems of automation to systems of record. All of these systems help create the 360-degree customer profiles that companies need to provide exceptional service. Supporting documents more efficiently Document databases offer a data model that supports documents more efficiently. They store each row as a document, with the flexibility to model lists, maps, and sets, which in turn can contain any number of nested columns and fields, which relational models can’t do. Since documents are variable in every business operation, this flexibility helps address new business requirements. These attributes enable document databases to deliver high performance on reads and writes, which is important when there are thousands of reads per second. As enterprises go from thousands to billions of documents, they need more CPUs, storage, and network bandwidth to store and access tens and hundreds of terabytes of documents in real time. Document databases can elastically scale to support dynamic workloads while maintaining performance. While some document databases can scale, some have limitations. Scale is not just about data volumes. It’s also about latency. Enterprises today push the boundaries with scaling: They need to support ever-growing volumes of data, and they need low-latency access to data and sub-millisecond response time. Developers can’t afford to wait to get a document into a real-time application. It has to happen quickly. As more enterprises have to do more with fewer resources, a document database should be self-service and automated to simplify administration and optimization—reducing overhead and enabling higher productivity. Developers shouldn’t have to spend much time optimizing queries and tuning systems. A document database also needs API support to help quickly build modern microservices applications. Microservices deal with many APIs. The performance will slow if an application makes 10 different API calls to 10 repositories. A document database enables these microservices applications to make a single API call. Aerospike’s real-time document database at scale A real-time document database should have an underlying data platform that provides quick ingest, efficient storage, and powerful queries while delivering fast response times. The Aerospike Document Database offers these capabilities at previously unattainable scales. Document storage JSON, a format for storing and transporting data, has passed XML to become the de facto data model for the web and is commonly used in document databases. The Aerospike Document Database lets developers ingest, store, and process JSON document data as Collection Data Types (CDTs)—flexible, schema-free containers that provide the ability to model, organize, and query a large JSON document store. The CDT API models JSON documents by facilitating list and map operations within objects. The resulting aggregate CDT structures are stored and transferred using the binary MessagePack format. This highly efficient approach reduces client-side computation and network costs and adds minimal overhead to read and write calls. Aerospike Figure 1: An example of Aerospike’s Collection Data Types. Document scaling The Aerospike Document Database uses set indexes and secondary indexes for nested elements of JSON documents, enabling it to achieve high performance and petabyte scaling. Indexes avoid the unnecessary scanning of an entire database for queries. Aerospike Figure 2: Aerospike secondary indexes. The Aerospike Document Database also supports Aerospike Expressions, a domain-specific language for querying and manipulating record metadata and data. Queries using Aerospike Expressions perform fast and efficient value-based searches on documents and other datasets in Aerospike. Document query The CDT API discussed above includes the necessary elements to build the Aerospike Document API. Using the JSONPath standard, the Aerospike Document API gives developers a programmatic way to implement CRUD (create, read, update, and delete) operations via JSON syntax. JSONPath queries allow developers to query documents stored in Aerospike bins using JSONPath operators, functions, and filters. In Figure 3 below, developers send a JSONPath query to Aerospike stating the appropriate key and the bin name that stores the document, and Aerospike returns the matching data. CDT operations use the syntax Aerospike supports (syntax not supported by Aerospike is split), and the JSONPath library processes the result. Developers can also put, delete, and append items at a path matching a JSONPath query. Additionally, developers can query and extract documents stored in the database using SQL with Presto/Trino. Aerospike Figure 3: JSONPath queries. Transforming the document database Today’s document databases often suffer from performance and scalability challenges as document data volumes explode. The greater richness and nested structures of document data expose scaling and performance issues. Developers typically need to re-architect and tweak applications to deliver reasonable response times when working with a terabyte of data or more. Aerospike’s document data services overcome these challenges by providing an efficient and performant way to store and query document data for large-scale, real-time, web-facing applications. Srini Srinivasan is the founder and chief product officer at Aerospike, a real-time data platform leader. He has two decades of experience designing, developing, and operating high-scale infrastructures. He has more than 30 patents in database, web, mobile, and distributed systems technologies. He co-founded Aerospike to solve the scaling problems he experienced with internet and mobile systems while he was senior director of engineering at Yahoo. — New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to newtechforum@infoworld.com. Related content news SingleStore acquires BryteFlow to boost data ingestion capabilities SingleStore will integrate BryteFlow’s capabilties inside its database offering via a no-code interface named SingleConnect. By Anirban Ghoshal Oct 03, 2024 4 mins ETL Databases Data Integration feature 3 great new features in Postgres 17 Highly optimized incremental backups, expanded SQL/JSON support, and a configurable SLRU cache are three of the most impactful new features in the latest PostgreSQL release. By Tom Kincaid Sep 26, 2024 6 mins PostgreSQL Relational Databases Databases feature Why vector databases aren’t just databases Vector databases don’t just store your data. They find the most meaningful connections within it, driving insights and decisions at scale. By David Myriel Sep 23, 2024 5 mins Generative AI Databases Artificial Intelligence feature Overcoming AI hallucinations with RAG and knowledge graphs Combining knowledge graphs with retrieval-augmented generation can improve the accuracy of your generative AI application, and generally can be done using your existing database. By Dom Couldwell Sep 17, 2024 6 mins Graph Databases Generative AI Databases Resources Videos