Simon Bisson
Contributor

Jumping into Azure Arc Data Services

analysis
Jan 05, 2021 • 7 mins
Cloud Computing | Databases | Microsoft Azure

Azure Arc brings Azure databases to your on-premises servers and applications.

Microsoft may have promised multiple future releases of on-premises Windows Server, but that doesn’t mean you have to keep managing and using those servers the way you always have. Key to Windows Server’s future is Microsoft’s hybrid cloud strategy, which gives equal weight to on-premises hardware and its Azure hyperscale cloud. Technologies such as Windows Admin Center and Azure Arc bring web-based monitoring and administration to your servers, providing a bridge between the Azure Portal and Windows.

Using Azure Arc with databases

Azure Arc is an important piece of that puzzle, allowing you to separate physical infrastructure from virtual, adding a new tier of management tools that support your applications directly. With Arc, you use cloud-focused management tools to work with on-premises resources, deploying virtual machine images and virtual appliances, managing virtual networks and storage pools, as well as deploying managed Kubernetes services on your own hardware. It’s no wonder that Azure Arc is the management layer for the latest versions of Microsoft’s Azure Stack HCI hyperconverged server implementation.

At its 2019 launch, Microsoft promised three roles for Azure Arc: working with virtual infrastructures, managing Kubernetes, and supporting local instances of cloud databases. The first is now generally available, with Kubernetes support in preview. Now the data services facet of Arc gets its turn in the sun, with a public preview of support for Azure’s PostgreSQL Hyperscale and SQL Managed Instance, as well as for the familiar SQL Server.

Microsoft treats Arc’s database tools as two different options, separating the Arc-managed and locally hosted Azure data services from SQL Server. It’s a sensible choice if you’re looking to use Arc to manage workloads and data that may migrate from your servers to Azure. The SQL Server option lets you migrate existing on-premises data to Azure Stack HCI or another Arc-enabled server cluster as part of a migration process, getting data onto new hardware before moving it to newer, cloud-ready database technologies.

Using containers to deploy cloud applications on-premises

Working with Azure Arc’s data services has other advantages. You’re now working with Azure-managed applications on your own hardware, handing over responsibility for updates to Microsoft. The databases run in containers, with your data and any stored procedures stored on your Arc cluster’s virtualized storage array, separate from the database engine. Using Arc’s management tools, you can apply new database containers from the Azure Container Registry as new patched versions are released.

Using containers as a package like this changes the way you need to think about your databases. Instead of patching a binary in place, with scheduled downtime and testing, you treat the executables as immutable. Once downloaded and running, a container’s contents will not change; configurations and data are stored outside it. An update now means simply stopping operations, swapping in a preconfigured, pretested, updated container, and restarting. If you’re running in a cluster, this can be done with nearly zero downtime: stop one host, load the new database container, restart it, then fail over to the new container and update the remaining nodes.
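In plain Kubernetes terms, the swap-and-restart pattern looks something like this sketch (the namespace, StatefulSet name, and image tag here are all hypothetical; in practice Arc’s own tooling drives the upgrade):

```shell
# Inspect the database pods in the Arc data services namespace
kubectl -n arc get pods

# Point the (hypothetical) StatefulSet at a newer, pretested container image;
# Kubernetes then replaces pods one at a time, a rolling, near-zero-downtime swap
kubectl -n arc set image statefulset/postgres01 \
  postgres=mcr.microsoft.com/arcdata/arc-postgres:newtag

# Watch the rollout replace each pod in turn
kubectl -n arc rollout status statefulset/postgres01
```

The point of the sketch is the shape of the operation: no in-place patching, just a declarative image change and a rolling restart.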

Arc’s container-based approach to data services, along with support for Windows Server’s clustering, makes it possible to approach the same level of elasticity as the public cloud, instantiating new instances as demand increases. You are limited by your available managed hardware, as this model only works with servers that are managed by Azure Arc. However, if you’re working with bursty workloads where demand is uneven, this approach can make life a lot easier for you and your users. Support for Azure data tools with on-premises data should help prepare for cloud migrations, while providing a centralized location for logs and analytic tools that should help with application development and support.

Setting up Azure Arc Data Services

Getting started with Azure Arc Data Services can take some time, as it entails setting up the entire suite of Azure data management tools before installing and running database containers. The required tools include the Azure Data Studio development and query tool, with the Azure Arc extensions installed, alongside the Azure Data CLI for command-line operations. Next, you need the appropriate management extensions for the database you intend to use, for example adding a PostgreSQL extension for working with PostgreSQL Hyperscale. To complete the set of prerequisites, install the Azure CLI and the Kubernetes kubectl tool, as the data services run on Azure Arc’s Kubernetes installation.
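With the prerequisites installed, a quick sanity check confirms the command-line tooling is on your path (exact version output will vary with your installation):

```shell
# Azure CLI, used for general Azure resource management
az --version

# Azure Data CLI (azdata), used to manage Arc data services
azdata --version

# Kubernetes client, used to work with the underlying Arc-managed cluster
kubectl version --client
```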

Once you have the prerequisites in place, you need to create a data controller. This is deployed from Azure Data Studio and targeted at an Azure Arc-managed Kubernetes install, using your choice of Kubernetes namespace for management. While the metadata about your controller is registered in Azure, the controller and databases are installed in your local Kubernetes cluster. This approach allows the Azure Portal to manage your data services. Once the Kubernetes configuration file is ready, Arc deploys the data controller’s pods into your data service’s namespace. There is also the option of creating the controller from the Azure Portal or the Azure CLI.
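Scripted with the Azure Data CLI, controller creation looks roughly like this preview-era sketch (the controller name, namespace, deployment profile, and Azure identifiers are placeholders for your own environment):

```shell
# Create an Arc data controller in the "arc" namespace of the current
# Kubernetes context; indirect connectivity keeps management local,
# with metadata uploaded to Azure on your schedule
azdata arc dc create \
  --name arc-dc \
  --namespace arc \
  --connectivity-mode indirect \
  --profile-name azure-arc-kubeadm \
  --subscription <subscription-id> \
  --resource-group <resource-group> \
  --location eastus

# Confirm the controller pods are running in the chosen namespace
kubectl get pods -n arc
```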

With a data controller in place, you can begin configuring your data service. You can use the Azure CLI or Kubernetes-native tools to set up your service containers in Kubernetes pods, though again I’d recommend working from Azure Data Studio. Using the same tools as your database administrators to manage the data service makes a lot of sense, and it lets you train them to handle future deployments themselves.

Working with PostgreSQL Hyperscale in Azure Arc

In Azure Data Studio, choose a new deployment and pick your data service. These are marked as being for Azure Arc, so, for example, choose PostgreSQL Hyperscale. Name the server group, set administrator passwords, and choose the storage options you’ll use. Finally, pick the number of worker nodes and deploy the container, ready for use.
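The same deployment can be scripted with the Azure Data CLI; a minimal preview-era sketch, with an illustrative server group name and worker count:

```shell
# Create a PostgreSQL Hyperscale server group with two worker nodes
azdata arc postgres server create -n postgres01 --workers 2

# List the endpoints to find the connection details for the new server group
azdata arc postgres endpoint list -n postgres01
```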

Once the deployment is complete, you have a PostgreSQL Hyperscale cluster up and running, which you can use much as you would in Azure. It’s important to remember that you’re now running a distributed database using the tools Microsoft gained from its Citus acquisition in 2019. This requires you to think differently about how you work with data and how it’s partitioned across the database nodes. The resulting system and configuration will be portable between your local Azure Arc-managed cluster and Azure itself, so your applications and data can be run locally or in the Azure public cloud with minimal changes.
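Thinking about partitioning starts with telling the Citus engine which tables to shard across the worker nodes. A minimal sketch, assuming a hypothetical `events` table and connection details taken from your server group’s endpoints:

```shell
# Connect with psql (host, port, and password are placeholders) and shard a
# hypothetical events table across the workers by tenant_id, using the
# standard Citus create_distributed_table() function
psql "host=<server-ip> port=<port> user=postgres password=<password>" <<'SQL'
CREATE TABLE events (tenant_id int, id bigserial, payload jsonb);
SELECT create_distributed_table('events', 'tenant_id');
SQL
```

Choosing the distribution column well matters: queries that filter on it can be routed to a single worker, while everything else fans out across the cluster.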

Bringing Azure’s at-scale data services to your on-premises systems is a good way for Microsoft to support both fully hybrid and disconnected scenarios. In a fully hybrid environment, you can take advantage of Azure resources to add compute and storage to your applications, treating Azure Arc-hosted data services as a mirror of your Azure instances. Using them in a purely local instance adds a firebreak between your local data and data stored in Azure, helping deal with any possible regulatory issues that require data to remain on-premises while still being able to take advantage of code developed to run in Azure.

With the public preview of Azure Arc Data Services, Microsoft has delivered on all its initial Azure Arc promises. It’ll be interesting to watch how the platform evolves, especially in conjunction with Azure Stack HCI, as more Azure services get containerized for use outside the public cloud.

Simon Bisson
Contributor

Author of InfoWorld's Enterprise Microsoft blog, Simon Bisson prefers to think of “career” as a verb rather than a noun, having worked in academic and telecoms research, as well as having been the CTO of a startup, running the technical side of UK Online (the first national ISP with content as well as connections), before moving into consultancy and technology strategy. He’s built plenty of large-scale web applications, designed architectures for multi-terabyte online image stores, implemented B2B information hubs, and come up with next generation mobile network architectures and knowledge management solutions. In between doing all that, he’s been a freelance journalist since the early days of the web and writes about everything from enterprise architecture down to gadgets.
