The clock is ticking on Microsoft’s Extended Support for SQL Server 2008/R2, unless you migrate to the Azure cloud. Here’s how Credit: Thinkstock With Extended Support for SQL Server 2008 and SQL Server 2008 R2 ending in July 2019, to be followed six months later by the end of Extended Support for Windows Server 2008 and 2008 R2 in January 2020, the clock is ticking on making changes to certain legacy applications. Upgrading to the latest versions is always an option, but Microsoft is providing an alternative when upgrades are not viable: Migrate the applications to the Azure cloud to get three more years of free Extended Security Update support. Microsoft’s offer: “Customers who migrate workloads to Azure virtual machines will have access to Extended Security Updates for both SQL Server and Windows Server 2008 and 2008 R2 for three years after the End of Support deadlines, included at no additional charge over the standard VM pricing.” To assist with the migration, Microsoft permits existing on-premises licenses to be transferred to the Azure cloud, and also offers a 180-day transition period during which a single license applies concurrently on-premises and in the cloud. For those legacy SQL Server 2008/R2 applications that remain critical, some form of high availability or disaster recovery protection will be required to preserve business continuity. It is possible to fully protect these legacy applications in the state-of-the-art Azure cloud, of course, but that requires understanding and properly configuring the necessary provisions. And with decade-old software, getting it right can be somewhat challenging. This article provides an overview of the high availability (HA) and disaster recovery (DR) provisions available for SQL Server 2008/R2 in the Azure cloud and highlights two common HA/DR configurations. It is important to note that the provisions discussed here also apply to later versions of SQL Server and Windows Server, making them suitable for other applications, including any legacy ones being upgraded. Any differences among the different versions will be noted. High availability options in Azure The Azure cloud offers redundancy within data centers, within regions, and across multiple regions. Redundancy within a data center is provided by an availability set, which distributes servers across different fault domains. Each fault domain resides in a different rack to protect against failures at the server and rack levels, enabling Microsoft to offer a 99.95 percent uptime guarantee. The service level agreement (SLA) ensures that if two or more servers are deployed in an availability set, at least one will have external connectivity. However, the SLA does not guarantee availability at the application level, and it does nothing to protect against site-wide failures, like the one that occurred in Azure’s South-Central US Region in September 2018. For protection from single site-wide failures, Azure is rolling out availability zones (AZs). Each region that supports AZs has at least three data centers that are interconnected via high-bandwidth, low-latency networks capable of supporting synchronous data replication. When multiple servers are deployed in different AZs, Microsoft offers a 99.99 percent SLA, guaranteeing that at least one of those servers will have external connectivity. For even greater resiliency, Azure offers region pairs, where a region gets paired with another within the same geography (such as the US, Europe, or Asia) separated by at least 300 miles. The pairing is strategically chosen to protect against widespread power or network outages, and major natural disasters. Microsoft also takes advantage of the arrangement to minimize downtime during infrastructure updates, and to accelerate recovery times in the event of a widespread outage. Database and system administrators should be fully aware that, even with the higher uptime assurances offered by AZs and region pairs, what counts as downtime excludes many common causes of failure at the application level. Here is just a partial list of common causes of failure that are explicitly excluded from the Azure Service Level Agreement: factors outside Microsoft’s reasonable control; the use of services, hardware, or software not provided by Microsoft; the customer’s unauthorized action or lack of action when required; and faulty input, instructions, or arguments. In effect, Microsoft’s SLA guarantees “dial tone” for the servers and nothing more. It is up to the customer to ensure uptime at the application level. With so many common causes of downtime excluded from service level assurances in the Azure cloud, achieving satisfactory HA protection for mission-critical applications will require a third-party failover clustering solution like SIOS DataKeeper. The need for third-party solutions is rooted in the fact that traditional HA failover clustering is not possible in the Azure cloud due to the lack of a storage area network or other shared storage. Microsoft addressed this limitation with the introduction of Storage Spaces Direct (S2D), a virtual shared storage solution. But S2D began with Windows Server 2016 and supports only SQL Server 2016 and later, so is not an option for SQL Server 2008/R2. Similarly, the more robust Always On Availability Groups feature that was introduced with SQL Server 2012 is also not an option. Disaster recovery options in Azure Adequate business continuity or DR protection for some applications is possible with either Azure Backup or Azure Site Recovery. With both Azure Backup and Azure Site Recovery, Microsoft’s DR-as-a-service (DRaaS) offering, the application’s critical data is routinely backed up or replicated, respectively, to a remote site and available to be restored in the event of a localized disaster. Both services are designed to meet different needs. Azure Backup is designed with file-level granularity and longer data retention periods. But the backup agent only supports SQL Server 2012 and later, making it unavailable as an option for SQL Server 2008/R2 applications. Azure Site Recovery, by contrast, is designed to protect entire workloads and does support SQL Server 2008/R2. Azure Site Recovery automatically replicates the configuration and all data on the active instance to another instance in another data center. As a DRaaS solution, Azure Site Recovery coordinates the replication and failover/failback processes across the active and standby instances. Azure Site Recovery is able to accommodate recovery point objectives (RPOs) ranging from a few minutes to a few seconds, and recovery time objectives (RTOs) of under one hour for a failed instance and as little as a few minutes with special arrangements. With the many exclusions to uptime in Azure, along with the inability to take advantage of the many enhancements and new services introduced since 2008, administrators are finding that the best way to be confident about protecting SQL Server 2008/R2 applications is with a third-party failover clustering solution that is purpose-built for high availability and disaster recovery. Third-party failover clustering solutions for SQL Server 2008/R2 The biggest challenge involved in implementing HA or DR provisions for SQL Server 2008/R2 in Azure derives from the need for failover cluster instances (FCIs) to use some form of shared storage, normally in the form of a SAN. Because the only shared storage available in the Azure cloud is of the virtual variety (with S2D that only supports SQL Server 2016 and later), a third-party, software-based solution that supports SAN-less failover clustering is needed. Microsoft recognizes the need for third-party failover clustering technology to provide HA protection for many applications. The company includes these instructions for configuring SIOS DataKeeper in its documentation: “High availability for a file share using WSFC, Azure ILB, and third-party software SIOS DataKeeper.” Third-party failover clustering solutions include, at a minimum, real-time data replication, continuous monitoring capable of detecting any failure at the application level, and configurable policies for failover and failback. Most solutions for Windows are designed to integrate seamlessly with Windows Server Failover Clustering (WSFC) to help simplify the implementation. Many solutions also offer additional advanced capabilities that frequently include a choice of synchronous or asynchronous replication, WAN optimization to maximize performance, and manual switchover of active and standby assignments for performing planned maintenance and routine backups without disrupting the application. SAN-less failover clustering software works by building SQL Server FCIs using locally-attached storage in a way that makes it appear to be shared among the member instances. These virtual volumes create, in effect, a layer of abstraction that takes the place of physical storage resources. Rather than control disk locking or SCSI reservations, as would be the case with shared storage, the cluster instead controls the mirror direction between or among the virtual volumes. This approach ensures that writes occurring on the active instance are replicated to all standby instances in the failover cluster. The block-level data replication occurs synchronously between or among instances in the same Azure region and asynchronously across regions. The resulting shared-nothing SAN-less failover cluster configuration eliminates all potential single points of failure, thereby assuring dependable HA or DR protection. Most third-party failover clustering solutions are implemented entirely in software to afford virtually unlimited scalability across private, public, and hybrid cloud infrastructures. Most are also designed to be agnostic with respect to applications and platforms. Application-agnosticism enables support for virtually all application software including all versions of SQL Server, thereby providing a universal, general-purpose HA/DR solution. This same design also makes it possible to support FCIs in the Standard Edition of all versions of SQL Server, as well as the more advanced Always On Availability Groups feature in the Enterprise Edition that was introduced in SQL Server 2012. Platform-agnosticism makes it possible to leverage, while not being dependent upon, various capabilities and services, including different hypervisors, available within and for the Azure cloud. Although Linux was not supported until SQL Server 2017, there are third-party failover clustering solutions available for all Linux applications. Purpose-built for high availability and disaster recovery, SAN-less failover clusters are capable of immediately detecting failures at the application level regardless of the cause and without the exceptions cited in the Azure SLA. As a result, SAN-less failover clusters can accommodate far more stringent RPOs and RTOs compared to standard Azure HA/DR provisions, making them suitable for even the most mission-critical of applications. HA/DR configurations for SQL Server 2008/R2 in Azure With HA provisions for legacy SQL Server 2008/R2 applications being problematic in the Azure cloud, the only viable option is a third-party failover clustering solution. For DR, administrators have a choice of using Azure Site Recovery or the failover cluster for both HA and DR. This section highlights both configurations, and outlines the basic steps required to implement each. Important note: For running SQL Server applications on failover clusters in Azure on Windows Server 2008 R2 or Windows Server 2012, Microsoft offers this hotfix to enable the listener to be used by both FCIs and availability groups. Depending on the configuration, additional hotfixes may be needed. If possible, using Windows Server 2012 R2 or later eliminates the need for any hotfixes. The combination of failover clustering for HA and Azure Site Recovery for DR provides a cost-effective solution for many SQL Server applications. The shared storage required by FCIs is provided by virtual volumes in the SAN-less HA failover cluster. Azure Site Recovery replicates the pair of VM images in the cluster (both the active and standby) to another region in a region pair to protect against widespread disasters. “Lifting and shifting” an existing SQL Server 2008/R2 failover cluster from on-premises to the Azure cloud is quite straightforward: It involves replacing physical disk resources with virtual volumes and replacing the disk witness with a file share witness. During the migration, Azure Site Recovery can be used to replicate the existing active VM on-premises to the active and standby VMs in the Azure cloud. The last step involves configuring the Azure Internal Load Balancer (ILB) for client redirection and running a PowerShell script on the local nodes to update the SQL Server cluster IP resource to listen for the ILB probe. Because the IP addresses and subnet of the cluster instances will likely change as part of this migration, this step may also require other changes to related IP addresses and job endpoints. If no on-premises HA failover cluster exists to be lifted and shifted, it will be necessary to create one in the Azure cloud. Most vendors of failover clustering solutions provide documentation with detailed, step-by-step instructions for configuring clusters in popular public clouds, and these configurations normally leverage cloud-specific capabilities, such as availability sets and availability zones in Azure. Like all DRaaS offerings, Azure Site Recovery has some limitations. For example, WAN bandwidth consumption cannot exceed 10 megabytes per second, and that may be too low for high-use applications. More serious limitations involve the inability to automatically detect or fail over from an outage. Of course, this is why the service is both relatively inexpensive and characterized as being for DR and not for HA. More robust DR protection is possible with any failover clustering solution, normally in a three-node configuration as shown in the diagram below. Two of the nodes provide HA protection with rapid, automatic failover based on real-time, synchronous data replication across an availability set or zone. The third node, located in a different Azure region (preferably as part of a region pair), offers DR protection for widespread disasters affecting an entire region. SIOS Technology A two-node SAN-less failover cluster consisting of SQL1 and SQL2 is deployed across availability zones in the West Europe region for HA protection. SQL3, a standalone instance of SQL Server, is in the North Europe region for DR protection. Once again, detailed, step-by-step instructions for how to configure three-node HA/DR clusters will be available in your vendor’s documentation. There are often many steps involved, but the entire process usually takes less than an hour. Some vendors also offer “Quick Start” templates that automate the steps for all or part of the configuration. The main disadvantage of using the failover cluster for both HA and DR is the slight increase in the cost of licensing for the third node. The main advantage is the ability to accommodate even the most demanding RPOs and RTOs with no bandwidth limitations. Get started… sooner rather than later While it only takes a few minutes to spin up a SQL Server instance in Azure, it is advisable to not wait until the last minute to migrate applications. Administrators will encounter the usual learning curves for Azure, in general, and its HA/DR provisions, in particular. The third-party failover clustering solution you choose will also have a learning curve. A good best practice is to practice with one or more of the affected applications in a non-production environment. With all HA and DR configurations, comprehensive testing is needed to prevent failover provisions from failing when needed. So don’t wait, start now. Practice, practice, practice, and test, test, test. And complete the migration well in advance of the deadline—just in case (see Murphy’s Law). If you follow these recommendations, when Tuesday July 9, 2019, finally arrives, you should be rewarded with just another routine day at the office. David Bermingham is technical evangelist at SIOS Technology. He is recognized within the technology community as a high-availability expert and has been honored to be elected a Microsoft MVP for the past eight years including six years as a Cluster MVP and two years as a Cloud and Datacenter Management MVP. David holds numerous technical certifications and has more than 30 years of IT experience including in finance, healthcare, and education. — New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to newtechforum@infoworld.com. Related content analysis Azure AI Foundry tools for changes in AI applications Microsoft’s launch of Azure AI Foundry at Ignite 2024 signals a welcome shift from chatbots to agents and to using AI for business process automation. By Simon Bisson Nov 20, 2024 7 mins Microsoft Azure Generative AI Development Tools analysis Succeeding with observability in the cloud Cloud observability practices are complex—just like the cloud deployments they seek to understand. The insights observability offers make it a challenge worth tackling. By David Linthicum Nov 19, 2024 5 mins Cloud Management Cloud Computing news Akka distributed computing platform adds Java SDK Akka enables development of applications that are primarily event-driven, deployable on Akka’s serverless platform or on AWS, Azure, or GCP cloud instances. By Paul Krill Nov 18, 2024 2 mins Java Scala Serverless Computing analysis Strategies to navigate the pitfalls of cloud costs Cloud providers waste a lot of their customers’ cloud dollars, but enterprises can take action. By David Linthicum Nov 15, 2024 6 mins Cloud Architecture Cloud Management Cloud Computing Resources Videos