Many enterprises are now migrating the more difficult applications and data sets to the cloud. Once they attempt to operationalize these workloads, it’s clear that some things should have been fixed before.

You’re a new CIO. It’s 9:00 a.m. on a Wednesday and you’re in an emergency Zoom meeting with IT operations leaders. The faces on the screen are somber, and it’s clear why when they explain the purpose of the meeting. It seems that all of IT ops, which was initially budgeted at $10 million for this fiscal year, is now looking at a $4 million overrun due to the unanticipated cost of the operations personnel and tools needed to operate the new batch of applications and databases that just moved to a public cloud.

What happened? It’s likely they hit a “cloudops wall,” meaning that the cost of operating systems in the cloud was underestimated by 20% to 30%. They assumed that, at most, the cost of operating the same systems in the cloud would be about 10% more than on premises. Indeed, the industry told them that operations cost would likely be reduced.

The reality is that a few things are occurring right now. First, the pandemic pushed many enterprises to migrate their next tranche of systems to the cloud: systems they avoided at first because they were more complex and not as well designed. Moreover, these systems are interacting in new ways, such as a cloud-based database now consuming data from a traditional data center rather than both living in the same data center.

Second, since there is a “need for speed” in moving to the cloud, many of the pragmatic steps have been compressed or skipped. Refactoring applications to leverage cloud-native services or containerizing some of the migrating systems has been pushed off in favor of cheaper and faster lift-and-shift processes that are underoptimized.

Finally, and most important, nobody in the company has done cloudops for these types of systems yet.
For example, moving mainframe-based systems to a public cloud is much different from migrating LAMP (Linux, Apache, MySQL, and PHP) stacks, which are more modern. This lack of skills turns much of the planning into guesswork. This time they guessed wrong by 20% to 30%.

There are a few ways to fix the cloudops wall that enterprises are hitting now. First, there needs to be more focus on refactoring or fixing systems as they move to the cloud. I often say, “Crap on premises moved to the cloud is just crap in the cloud.” Systems that get even more complicated and costly to operate in the cloud need to be fixed or improved as you move them. It’s simple math for me. If you skip improving the systems, then you need to budget more for cloudops. Or you can improve the systems as they migrate, for example by refactoring to cloud-native services, and gain cloudops improvements and thus lower costs. It’s a clear trade-off.

Second, leverage the right cloudops tools to ensure that all operations that can be automated are automated. Most enterprises that hit a cloudops wall have underoptimized operations automation. They carry forward their ops practices from on premises to the cloud and end up adapting already inefficient processes and tools to systems that have become more complex.

The problem with the cloudops wall is that enterprises don’t understand why they’re hitting it. This is not a matter of systems in the cloud being more costly to operate than originally thought. This is about a lack of planning and a lack of willingness to improve systems before moving to the cloud. It’s also about knowing how to leverage the correct cloudops tools in the right ways.

Perhaps this is another example of pay now versus pay a whole lot more later. I’ve found that the former is always a better choice in the world of cloud computing.
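
To put rough numbers on the pay-now-versus-pay-later trade-off, here is a minimal sketch in Python. It uses the $10 million budget and the 20% to 30% underestimate from the scenario above, and it assumes the underestimate is measured as a share of the budgeted figure. The refactoring cost and the idea that refactoring removes the whole overrun are hypothetical simplifications for illustration, not figures from any real migration.

```python
def projected_overrun(budget: int, underestimate_pct: int) -> int:
    """Extra spend if actual ops cost exceeds the budget by underestimate_pct percent.

    Assumption: the underestimate is expressed as a share of the budgeted amount.
    """
    return budget * underestimate_pct // 100


def payback_years(refactor_cost: int, annual_ops_savings: int) -> float:
    """Years until a one-time refactoring spend is repaid by lower cloudops cost."""
    if annual_ops_savings <= 0:
        raise ValueError("annual_ops_savings must be positive")
    return refactor_cost / annual_ops_savings


if __name__ == "__main__":
    budget = 10_000_000  # IT ops budget from the scenario

    low = projected_overrun(budget, 20)
    high = projected_overrun(budget, 30)
    print(f"Projected overrun: ${low:,} to ${high:,} per year")

    # Hypothetical: a one-time refactoring cost, and the assumption that
    # refactoring avoids the entire high-end overrun every year.
    refactor_cost = 4_500_000
    print(f"Payback period: {payback_years(refactor_cost, high):.1f} years")
```

The point of the sketch is only that the overrun recurs every year while the refactoring spend is paid once, so even a sizable up-front cost can be recovered within a few budget cycles under these assumptions.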