The storied history of web-scale problems offers lessons for operators of increasingly complex IT environments.

Some problems are good to have, but they’re still problems. A company that has web-scale problems is probably growing and innovating, but at a pace so rapid that the current infrastructure can’t keep up. Adding to the challenge, companies don’t always know that they even have a web-scale problem.

In this article I will discuss the origin and evolution of web-scale problems, how to determine whether you have a web-scale problem, and how container orchestration is the most elegant solution we’ve found to help organizations solve these problems.

Early warnings

We saw one of the first harbingers of web-scale worries in, of all places, the greeting card industry. For almost 100 years, greeting card companies in the United States hummed along, manufacturing and merchandising cards that would get taped to gifts, sent through the mail, and stuck on refrigerators. Then, in the mid-1990s, everything changed. It was the rise of the World Wide Web, and everyone wanted to be part of it. In 1996, Blue Mountain, American Greetings, and Hallmark all launched dot-com sites to serve e-cards, and a digital battle ensued.

I worked in the greeting card industry, and it was all about the holidays. Valentine’s Day, Mother’s Day, and Christmas are some of the happiest (and, not coincidentally, most lucrative) times of the year for greeting card companies. As business moved online, these major holidays became battlegrounds in the e-card space, with companies blending the teachings of The Art of War (Sun Tzu) with The Mythical Man-Month (Fred Brooks) to craft state-of-the-art web infrastructure and win new digital business. (Today, we call this digital transformation.)

At first, e-cards were free. The goal was to attract users, not make money. For dot-coms, millions of users were worth millions of dollars in company valuations. Things were great for a while.
Everyone was attracting new users. Soon enough, however, the dot-coms needed to make real money. This created both strife and opportunity. When AmericanGreetings.com decided to start charging for e-cards, people didn’t want to pay, so they flooded Hallmark.com. Hallmark.com couldn’t handle the extra traffic and crashed. People still wanted to send e-cards, so they went back to AmericanGreetings.com and paid to send them. This drove tremendous business for American Greetings, but, more importantly, it highlighted the competitive advantage of being able to handle not just web-scale traffic, but unpredictable web-scale traffic. The business lesson we quickly learned was that web infrastructure could be an advantage in driving revenue.

The dawn of web-scale worries

Consumers at this time were warming to the idea of e-commerce, and servers powering small intranet and internet sites were being asked to perform web-based transactional processing at a scale no one had ever imagined. The servers, network equipment, storage devices, and internet pipes already in place couldn’t handle the traffic, creating the first web-scale problems for companies doing business on the web.

At the time, there were no out-of-the-box solutions to these problems, so dot-coms had to build their own through, in my experience, lots of trial and error and a great deal of pain. Best practices for solving web-scale problems were collected and disseminated throughout the industry, as talented systems administrators and developers taught each other through social connections. Not every company had web-scale problems (it was mostly start-ups and dot-coms), but those that did started targeting this talent pool.

Web-scale problems go mainstream

Of course, purely transactional e-commerce is now table stakes. Companies have systems on premises, in the cloud, and at the edge, spread across multiple providers’ platforms.
And then there’s the demand from customers for more powerful and more personalized applications, not to mention information in real time. The scope and context of web-scale problems have changed, which, in many ways, makes them even more challenging to identify.

Here is a list of questions to ask to determine if you have a web-scale problem in your business (and how big that problem really is):

Do you have a double-sided marketplace with hundreds or thousands of users who purchase or consume resources, as well as tens or hundreds of IT professionals curating the services offered?
Do you have scenarios where load on the system can change dramatically in a short period of time?
Do you have hundreds or thousands of servers that are underutilized most of the time, but spike at other times?
Do you collect data generated from thousands or millions of small devices or users?
Do you have a workload that dramatically out-scales the capacity of a single box?
Are you developing hundreds or thousands of services or microservices?

Did you say yes to any of these questions? Do you think you will say yes to any of these questions within the next three to five years?

Solving web-scale problems elegantly

Back at American Greetings (and for years afterwards at other places), I solved web-scale problems with the software equivalent of shoestring and bubblegum. At the time, our team used a mix of open source and homegrown solutions to manage one of the largest websites on the internet. Using tools like Linux, Apache, and a homegrown CFEngine replica (yes, a replica), we were able to manage more than 1,000 servers and 70 applications with approximately three people (what most would call site reliability engineers nowadays). These tools were great, and cutting-edge for the time, but the higher-level primitives we used to define clusters, network endpoints, and applications were all things we simply made up.
We had to, because there was no standard way to imagine, define, and build web-scale applications in those days. Each company was left to invent primitives, and each team member had to learn them if they wanted to understand the system and build new applications or troubleshoot broken ones. Early web scaling was akin to the earliest days of computers: Instead of knowing how to use Windows or Linux, you knew how to use a specific computer like Colossus or ENIAC. In those early days of web-scale computing, there wasn’t much portability in the knowledge you had, although basic concepts (networking, load balancers, storage, web servers, and so on) applied.

After American Greetings, I worked at an ISP and web development company and solved similar problems for more than 70 different customers. That work helped me realize that there could and should be a standard way to solve web-scale problems. That’s why I was excited beyond belief when I first saw Kubernetes come along. It changed everything: I knew there was finally a standard way to solve web-scale problems.

A need for Kubernetes

At build time, Kubernetes and containers enable a standardized way to construct applications. Everyone can learn this way: Use Dockerfiles/Containerfiles, and commit them in Git. This standardized language for build management reduces cognitive load and makes the knowledge that SREs have portable to other systems within your organization and from other organizations (making it easier to hire new people). It also makes it a lot easier to test applications before pushing them into production.

At run time, Kubernetes makes applications portable among different servers in the cluster, manages failover, handles the load balancers in the cluster, scales when traffic is heavy, and deploys pretty much anywhere, whether in the cloud or on premises.
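To make "scales when traffic is heavy" a little more concrete, here is a minimal Python sketch of the ratio-based replica calculation that the Kubernetes documentation describes for the Horizontal Pod Autoscaler. The function name, parameter names, and replica limits are hypothetical and chosen only for illustration. The real autoscaler also applies tolerances and stabilization windows, which this sketch leaves out.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Hypothetical helper illustrating ratio-based scaling.

    Follows the formula documented for the Horizontal Pod Autoscaler:
        desired = ceil(current_replicas * current_metric / target_metric)
    The min/max defaults are illustrative assumptions, and the
    tolerance band and stabilization behavior are deliberately omitted.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    # Clamp the result to the configured replica bounds.
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Holiday spike: 4 replicas averaging 90% CPU against a 50% target.
    print(desired_replicas(4, 90.0, 50.0))  # prints 8
    # Quiet period: 8 replicas averaging 10% CPU against a 50% target.
    print(desired_replicas(8, 10.0, 50.0))  # prints 2
```

The point of the sketch is that the scaling decision is a simple, declarative rule over an observed metric, which is the kind of primitive each dot-com once had to invent for itself.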
In fact, when people say they don’t need Kubernetes, it’s jarring for an e-commerce veteran like me to hear. My theory is that people who say they don’t need Kubernetes don’t realize they have web-scale problems. (And it’s highly likely that they do.)

The Kubernetes project, in combination with the many open source tools designed to complement it, enables organizations to effectively meet web-scale needs. Notice I didn’t say “easily meet.” I’m not going to pretend Kubernetes is an easy lift, because it’s not. But, remember, web-scale problems aren’t easy, and almost everyone has one (or more) nowadays. Kubernetes has capabilities I never could have imagined when I was going crazy trying to prevent Valentine’s Day from breaking my company’s technological heart.

At Red Hat, Scott McCarty is senior principal product manager for RHEL Server, arguably the largest open source software business in the world. Focus areas include cloud, containers, workload expansion, and automation. Working closely with customers, partners, engineering teams, sales, marketing, other product teams, and even in the community, Scott combines personal experience with customer and partner feedback to enhance and tailor strategic capabilities in Red Hat Enterprise Linux. Scott is a social media startup veteran, an e-commerce old timer, and a weathered government research technologist, with experience across a variety of companies and organizations, from seven-person startups to 12,000-employee technology companies. This has culminated in a unique perspective on open source software development, delivery, and maintenance.

New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers.
InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to newtechforum@infoworld.com.