Microsoft’s recent acquisition of Citus Data is Azure’s path to further enhancing its PostgreSQL database implementation Credit: Thinkstock Azures collection of open source data tools includes the popular PostgreSQL database. Postgres began its life in the mid-1980s as a follow-up to the University of California at Berkeley’s Ingres system. Now developed by an international consortium, it’s grown a lot over the decades, moving from its own query language to one based on SQL. That change led to its renaming as PostgreSQL in 1996. It has maintained a large academic following, with a lot of support in the machine learning community, as well as powering many popular consumer-facing services. Its enduring popularity has meant that it’s become one of the standard on-premises databases for large applications, with integration with most common development platforms and with web content management tools like WordPress and Drupal. A wide set of use cases and a massive user base makes PostgreSQL an essential component of any public cloud service, enabling quick lift-and-shift of existing applications and content. Managed PostgreSQL on Azure Given that history, it’s not surprising that there’s a managed PostgreSQL among Azure’s many services. Like Azure’s MariaDB, it’s a way to use a familiar database without needing to worry about provisioning and managing your database servers; all you need to do is set up an instance, load up your data, and start writing code. If you’re using WordPress or a similar content management tool, you can install it from the Azure Marketplace, upload your data, and go. There’s a lot of focus on lift-and-shift with tools like this, as organizations migrate from on-premises data centers to the cloud. While that’s still an important market for Microsoft, it’s one that’s going to decline as more and more applications either migrate to the cloud or are left running on-premises for regulatory and compliance reasons. By adding high-availability tools and scaling, Microsoft is looking to support new and existing born-in-the-cloud applications. PostgreSQL’s history as a tool for software-as-a-service (SaaS) providers lets Azure support them as well, either as additional capacity where necessary or as a complete replacement for existing data centers. Creating PostgreSQL databases Azure’s PostgreSQL is based on the Community Edition, so you’ll find it familiar. You can create a new database using the Azure Portal or the Azure CLI, because it’s a standard database option. Like all Azure services, you need to add your database to a resource group and define login usernames and passwords before Azure sets it up. You can choose a version if you need a specific set of features; otherwise it defaults to the latest major release. Like many Azure services built on its infrastructure-as-a-service (IaaS) environment Azure’s PostgreSQL needs a server size. You can choose from a basic tier, a general-purpose tier, and a tier intended for in-memory operations. Each option lets you pick either fourth- or fifth-generation servers where possible, because some regions now only support fifth-generation systems. Basic workloads are best used for test and development, or for small applications. Most applications instead will run off general purpose instances, with between two and 32 cores. As it’s a managed service, you’re able to add Azure-specific services including scaling with red replicas and using Azure Advanced Threat Protection to reduce security risks. Azure PostgreSQL is self-patching, so once you’re up and running you can leave Azure to manage the underlying virtual server and the PostgreSQL instance. Creating a database takes a few minutes, ready for use with familiar tools and SDKs. Developers running local copies of PostgreSQL on Linux or WSL systems will be able to connect to Azure with psql; using it to create and populate tables, or to try out queries. Apps can be built using any language with PostgreSQL support, either through applications like Django or via SQL. The next step for Azure: adding Citus Data Microsoft recently acquired Citus Data, which develops a PostgreSQL extension that simplifies scaling out databases. It’s a way of building distributed databases across multiple nodes and sites, working through a coordinator that handles access to the different worker nodes. Where things get interesting is how Citus Data works: as the coordinator node stores metadata about the data that’s then sharded across the various instances. You make queries against the coordinator, which then directs its own queries to workers before delivering responses to the querying application. Because the coordinator holds the worker data, it can produce optimized queries. So, a query for product data that’s been entered between 2005 and 2007 is only directed to the nodes that hold that data. By transforming queries automatically, Citus Data can manage load across shards, improving performance while still supporting standard SQL queries. Microsoft plans to add Citus Data to its own managed PostgreSQL instances, using it to scale across Azure instances and using multiple databases in a single region. The current open source Citus release includes tools for migrating from existing single node databases to its multinode system, including an option that keeps your existing data in sync with a new Citus Data-based system while you migrate your application and data. Once you’ve finished migrating data, you can test against your new Citus-enabled PostgreSQL service before switching to using it live. Scaling PostgreSQL in the public cloud Because the coordinator node handles query translation, your code should work with a scale-out PostgreSQL system without needing any changes, Citus Data supports most popular programming languages, and it includes Ruby on Rails integration for building and managing multitenant systems, so you can use it to run your own SaaS applications. There’s no need to switch from relational to NoSQL, because Citus’s features hide the complexity of running and managing a distributed database behind a set of familiar APIs and queries. Combining Azure’s managed PostgreSQL with Citus Data makes a lot of sense, especially if it can be automated as part of a managed service. Modifying Citus Data’s existing migration tools to work with live data should be possible, especially with Microsoft’s resources. Instead of having to manually add nodes and reconfigure tables, a future Azure PostgreSQL should be able to add new shards as required. All you’d need to do is decide if you want to optimize your database for performance or for scale. Having PostgreSQL support in a public cloud is, to be honest, table stakes. Azure has to compete with Google Cloud Platform and Amazon Web Services, as well as services like Heroku. By adding Citus Data to its own expertise, Azure now has the opportunity to significantly differentiate its offering from its competitors. Related content news SingleStore acquires BryteFlow to boost data ingestion capabilities SingleStore will integrate BryteFlow’s capabilties inside its database offering via a no-code interface named SingleConnect. By Anirban Ghoshal Oct 03, 2024 4 mins ETL Databases Data Integration feature 3 great new features in Postgres 17 Highly optimized incremental backups, expanded SQL/JSON support, and a configurable SLRU cache are three of the most impactful new features in the latest PostgreSQL release. By Tom Kincaid Sep 26, 2024 6 mins PostgreSQL Relational Databases Databases feature Why vector databases aren’t just databases Vector databases don’t just store your data. They find the most meaningful connections within it, driving insights and decisions at scale. By David Myriel Sep 23, 2024 5 mins Generative AI Databases Artificial Intelligence feature Overcoming AI hallucinations with RAG and knowledge graphs Combining knowledge graphs with retrieval-augmented generation can improve the accuracy of your generative AI application, and generally can be done using your existing database. By Dom Couldwell Sep 17, 2024 6 mins Graph Databases Generative AI Databases Resources Videos