Simon Bisson
Contributor

A new wave of Azure services lifts Kubernetes and Cosmos DB

analysis
May 07, 2019 | 6 mins
Cloud Computing | Cloud Management | Databases

As always, Microsoft Build sees a myriad of announcements. What’s important and relevant for anyone building apps on Azure?


Microsoft has started referring to itself as a three-cloud company. There’s the Xbox gaming cloud, Microsoft 365 productivity services, and, first and foremost, Azure. Number two behind Amazon Web Services, Azure is a hyperscale behemoth, rolling out service after service at a rate that’s hard to keep up with. That rapid cadence is even more visible during Microsoft’s three main developer events, and you sometimes have to delve into the flurry of announcements to understand the key elements.

It’s clear that Microsoft’s focus is on Azure as a platform for building serverless applications, offering infrastructure services that don’t require specifying virtual machines, and that charge by the second of CPU used. A wide variety of platform services sits under your applications, providing machine learning, analytics, storage, and compute.

Things get really interesting where the two meet: where serverless compute and those platform services combine to improve the scalability of our code, within a single Azure region or across the entire Azure public cloud. Building distributed applications by hand isn’t easy, and Microsoft is providing more ways to automate scalability, using a mix of familiar tools and new features.

Event-driven scaling in Kubernetes

Azure, like the other big public clouds, is hugely dependent on Kubernetes. Managing and orchestrating distributed applications is essential when you’re working with hyperscale compute, and delivering applications in isolated containers simplifies shipping a complete infrastructure straight from your build system.

But Kubernetes can be complex, especially if you’re using it for simple event-driven applications. One option is Brigade, with its JavaScript-orchestrated Kubernetes pipelines, but that doesn’t offer the scaling we need if we’re using tools such as Azure’s Event Grid to trigger events and launch new serverless functions on demand.

Microsoft and Red Hat have been collaborating on a scalable alternative: KEDA (Kubernetes-based event-driven autoscaling). KEDA is designed to quickly scale workloads, using its own triggers as an alternative to HTTP events. Applications scale up and down as necessary (even down to zero), using notifications from other applications and services to drive scaling, rather than the more usual scaling based on workload performance.

By using KEDA to scale, you roll out containers in response to events, ready to process the data associated with each event. It’s a model akin to Azure Functions, and it’s not surprising that Microsoft is offering KEDA as a way to take your existing Functions code and host it in a Kubernetes application. With a Function in a container and triggered by events, we can quickly scale our Kubernetes cluster to meet demand, shutting down containers when they’re no longer needed.
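To make that concrete, here’s a minimal sketch of the kind of queue-triggered Function KEDA can scale inside a container, written against the 2019-era Python programming model for Azure Functions. The queue name, connection setting, and message fields are placeholders for illustration, not anything Microsoft shipped.

```python
# __init__.py -- a queue-triggered Azure Function in the v1 Python model.
# When this Function runs in a container on Kubernetes, KEDA can watch the
# queue and scale the deployment up from zero as messages arrive.
# The binding itself (queue name, connection string setting) lives in the
# accompanying function.json; the names used here are placeholders.
import json
import logging

import azure.functions as func


def main(msg: func.QueueMessage) -> None:
    # Each invocation handles one message pulled from the storage queue.
    payload = json.loads(msg.get_body().decode("utf-8"))
    logging.info("Processing event %s", payload.get("eventId"))
    # ...do the actual work for this event here...
```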

Putting Functions in a Kubernetes container helps avoid cloud lock-in; using KEDA means a serverless application can run on Azure, on-premises, or on any other cloud. All you need is a host that supports KEDA, whether that’s a managed Kubernetes service or a cluster running on your own virtual servers.

Analyzing global data in Cosmos DB

I remain fascinated by Cosmos DB. Using it might not be cheap, but it provides a scalable distributed database that’s truly global, using novel consistency models to fit how your applications work rather than making your applications fit the database’s replication model. Until now it’s been a useful store, with limited analytical capabilities; if you needed analytics you’d use queries to populate another system. But some of the larger Cosmos DB implementations store petabytes of data, and that makes it hard to justify a secondary analytics system at the same scale.

The newest release of Cosmos DB adds a set of Apache Spark APIs, supporting queries against any Cosmos DB partition in any region. Queries run against the nearest replica of your Cosmos DB data, and the service handles converting its own data formats into Spark’s native format.
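As a sketch of what that looks like from the Spark side, the snippet below reads a Cosmos DB collection into a DataFrame through the azure-cosmosdb-spark connector. The account endpoint, key, database, collection, regions, and field names are all placeholders, and the exact option keys depend on the connector version available on your cluster.

```python
# A PySpark sketch of querying Cosmos DB through its Spark connector.
# Assumes the azure-cosmosdb-spark connector JAR is on the cluster's classpath;
# endpoint, key, database, collection, and field names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cosmos-analytics").getOrCreate()

cosmos_config = {
    "Endpoint": "https://<your-account>.documents.azure.com:443/",
    "Masterkey": "<your-key>",
    "Database": "telemetry",
    "Collection": "events",
    # Read from the replica closest to the Spark cluster.
    "preferredRegions": "West Europe;North Europe",
}

# The connector turns Cosmos DB's JSON documents into a Spark DataFrame.
events = (
    spark.read
    .format("com.microsoft.azure.cosmosdb.spark")
    .options(**cosmos_config)
    .load()
)

# From here it is ordinary Spark: filter, aggregate, join with other sources.
events.filter(events["deviceType"] == "sensor").groupBy("region").count().show()
```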

To help develop Cosmos DB analytics, it now supports Jupyter notebooks across all its data models, so you can build and run queries in notebooks and share them with colleagues. As Jupyter and its associated tools are common ways to experiment with ML (machine learning) models, you can bring your ML development directly to your data, shortening the development loop by using Cosmos DB for ML training and as a source of inferencing data.
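In a notebook, pulling training data out of Cosmos DB can be as simple as the sketch below, which uses the azure-cosmos Python SDK (shown here with its v4-style API) and pandas; the account URL, key, container names, and document fields are placeholders.

```python
# Notebook-style sketch: pull labelled documents from Cosmos DB into pandas
# for model training, using the azure-cosmos Python SDK (v4-style API).
# Account URL, key, database/container names, and fields are placeholders.
import pandas as pd
from azure.cosmos import CosmosClient

client = CosmosClient(
    "https://<your-account>.documents.azure.com:443/", credential="<your-key>"
)
container = client.get_database_client("telemetry").get_container_client("readings")

rows = container.query_items(
    query="SELECT c.temperature, c.humidity, c.failed FROM c WHERE c.failed != null",
    enable_cross_partition_query=True,
)
df = pd.DataFrame(list(rows))

# The DataFrame can now feed scikit-learn, PyTorch, or whatever the notebook
# uses, keeping training close to the operational data instead of exporting it.
print(df.describe())
```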

Scaling Kubernetes with Cosmos DB

Distributed databases aren’t only a tool for working with large amounts of data; they can support smaller-scale applications where the distributed nature of the store matters. Here you make Cosmos DB part of your application infrastructure, keeping application management data where it’s closest to your code. You build the code once, build the infrastructure once, store the configuration information once, and use Cosmos DB to share that data globally.

Etcd is a key component of Kubernetes, holding its configuration and state data. Microsoft runs Kubernetes at scale in Azure, and it needs a distributed store for Kubernetes configuration data. Cosmos DB, as the foundation for much of Azure’s data infrastructure, clearly fits the bill and apparently has handled this role for some time. Now etcd support is being turned into a public-facing service. With a well-tested internal API, a shift to supporting all developers is a logical move, as more and more open source infrastructure services are using etcd.

Using Cosmos DB for your etcd data will make it easier to build Kubernetes applications that are globally replicated. Instead of having to put a copy of etcd in every region and stand up the servers you need to host it, you can have one Cosmos DB instance that hosts the configuration data for all your Kubernetes clusters, wherever they are. Configuration changes only need to be made once, reducing the risk of failed deployments and updates.
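Because the new service speaks the standard etcd wire protocol, existing etcd clients should work against it largely unchanged. The sketch below uses the python-etcd3 client; the endpoint, port, key names, and credentials are placeholders, and the authentication details will depend on how the Cosmos DB etcd API is actually exposed.

```python
# Sketch: talking to an etcd-compatible endpoint with the python-etcd3 client.
# The same client code works against a self-hosted etcd cluster or a managed,
# wire-compatible endpoint; host, port, and credentials below are placeholders.
import etcd3

client = etcd3.client(
    host="<your-etcd-endpoint>",
    port=2379,
    user="<username>",      # assumes the endpoint uses etcd's user/password auth
    password="<password>",
)

# Write a configuration value once; replication to other regions is handled
# by the backing store rather than by per-region etcd clusters.
client.put("/config/myapp/feature-flags/new-checkout", "enabled")

value, metadata = client.get("/config/myapp/feature-flags/new-checkout")
print(value.decode("utf-8"))
```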

Microsoft’s new buzzword is “scaffolding,” providing frameworks to support the tools and services that build on top of its platforms. Much of what it’s doing with Azure is about providing that scaffolding, at scale. With Cosmos DB at its foundation and Kubernetes as the scaffolding, there’s now plenty of space to start building your own distributed applications—starting with Functions and building up from there.

Simon Bisson
Contributor

Author of InfoWorld's Enterprise Microsoft blog, Simon Bisson prefers to think of “career” as a verb rather than a noun. He has worked in academic and telecoms research, been the CTO of a startup, and run the technical side of UK Online (the first national ISP with content as well as connections) before moving into consultancy and technology strategy. He’s built plenty of large-scale web applications, designed architectures for multi-terabyte online image stores, implemented B2B information hubs, and come up with next-generation mobile network architectures and knowledge management solutions. In between doing all that, he’s been a freelance journalist since the early days of the web, writing about everything from enterprise architecture down to gadgets.
