The storied history of web-scale problems offers lessons for operators of increasingly complex IT environments.

Some problems are good to have… but they’re still problems. A company that has web-scale problems is probably growing and innovating, but at a pace so rapid that the current infrastructure can’t keep up. Adding to the challenge is that companies don’t always know that they even have a web-scale problem.

In this article I will discuss the origin and evolution of web-scale problems, how to determine whether you have a web-scale problem, and how container orchestration is the most elegant solution we’ve found to help organizations solve these problems.

Early warnings

We saw one of the first harbingers of web-scale worries in, of all places, the greeting card industry. For almost 100 years, greeting card companies in the United States hummed along, manufacturing and merchandising cards that would get taped to gifts, sent through the mail, and stuck on refrigerators.

Then, in the mid-1990s, everything changed. It was the rise of the World Wide Web, and everyone wanted to be part of it. In 1996, Blue Mountain, American Greetings, and Hallmark all launched dot-com sites to serve e-cards, and a digital battle ensued.

I worked in the greeting card industry, and it was all about the holidays. Valentine’s Day, Mother’s Day, and Christmas are some of the happiest (and, not coincidentally, most lucrative) times of the year for greeting card companies. As business moved online, these major holidays became battlegrounds in the e-card space, blending the teachings of The Art of War (Sun Tzu) with The Mythical Man-Month (Fred Brooks) to craft state-of-the-art web infrastructure and win new digital business. (Today, we call this digital transformation.)

At first, e-cards were free. The goal was to attract users, not make money. For dot-coms, millions of users were worth millions of dollars in company valuations. Things were great for a while.
Everyone was attracting new users. Soon enough, however, the dot-coms needed to make real money. This created both strife and opportunity.

When AmericanGreetings.com decided to start charging for e-cards, people didn’t want to pay, so they flooded Hallmark.com. Hallmark couldn’t handle the extra traffic, and it crashed. People still wanted to send e-cards, so they went back to AmericanGreetings.com and paid to send them. This drove tremendous business for American Greetings, but, more importantly, it highlighted the competitive advantage of being able to handle not just web-scale traffic, but unpredictable web-scale traffic.

The business lesson we quickly learned was that web infrastructure could be an advantage in driving revenue.

The dawn of web-scale worries

Consumers at this time were warming to the idea of e-commerce, and servers powering small intranet and internet sites were being asked to perform web-based transactional processing at a scale no one had ever imagined. The servers, network equipment, storage devices, and internet pipes already in place couldn’t handle the traffic, creating the first web-scale problems for companies doing business on the web.

At the time, there were no out-of-the-box solutions to these problems, so dot-coms had to build their own, which in my experience meant lots of trial and error and a great deal of pain. Best practices for solving web-scale problems were collected and disseminated throughout the industry, as talented systems administrators and developers taught each other through social connections. Not every company had web-scale problems (it was mostly start-ups and dot-coms), but those that did started targeting this talent pool.

Web-scale problems go mainstream

Of course, purely transactional e-commerce is now table stakes. Companies have systems on premises, in the cloud, and at the edge, spread across multiple providers’ platforms.
And then there’s the demand from customers for more powerful and more personalized applications, not to mention information in real time. The scope and context of web-scale problems have changed, which, in many ways, makes them even more challenging to identify.

Here is a list of questions to ask to determine if you have a web-scale problem in your business (and how big that problem really is):

Do you have a double-sided marketplace with hundreds or thousands of users who purchase or consume resources, as well as tens or hundreds of IT professionals curating the services offered?
Do you have scenarios where load on the system can change dramatically in a short period of time?
Do you have hundreds or thousands of servers that are underutilized most of the time, but spike at other times?
Do you collect data generated from thousands or millions of small devices or users?
Do you have a workload that dramatically out-scales the capacity of a single box?
Are you developing hundreds or thousands of services or microservices?

Did you say yes to any of these questions? Do you think you will say yes to any of these questions within the next three to five years?

Solving web-scale problems elegantly

Back at American Greetings (and for years afterwards at other places), I solved web-scale problems with the software equivalent of shoestring and bubblegum. At the time, our team used a mix of open source and homegrown solutions to manage one of the largest websites on the internet. Using tools like Linux, Apache, and a homegrown CFEngine replica (yes, a replica), we were able to manage more than 1,000 servers and 70 applications with approximately three people (what most would call site reliability engineers nowadays). These tools were great, and cutting-edge for the time, but the higher-level primitives we used to define clusters, network endpoints, and applications were all things we simply made up.
We had to, because there was no standard way to imagine, define, and build web-scale applications in those days. Each company was left to invent primitives, and each team member had to learn them if they wanted to understand the system and build new applications or troubleshoot broken ones.

Early web scaling was akin to the earliest days of computers: You didn’t know how to use Windows or Linux; you knew how to use a specific computer like Colossus or ENIAC. In those early days of web-scale computing, there wasn’t much portability in the knowledge you had, although basic concepts (networking, load balancers, storage, web servers, and so on) applied.

After American Greetings, I worked at an ISP and web development company and solved similar problems for more than 70 different customers. That work helped me realize that there could and should be a standard way to solve web-scale problems.

That’s why I was excited beyond belief when I saw Kubernetes come along. It changed everything. I knew there was finally a standard way to solve web-scale problems.

A need for Kubernetes

At build time, Kubernetes and containers enable a standardized way to construct applications. Everyone can learn this way: Use Dockerfiles/Containerfiles, and commit them in Git. This standardized language for build management reduces the cognitive load and makes SREs’ knowledge portable, both to other systems within your organization and from other organizations (which makes it easier to hire new people). It also makes it a lot easier to test applications before pushing them into production.

At run time, Kubernetes makes applications portable among different servers in the cluster, manages failover, handles the load balancers in the cluster, scales when traffic is heavy, and deploys pretty much anywhere, in the cloud or on premises.
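
To make "scales when traffic is heavy" a little more concrete, here is a minimal Python sketch of the replica calculation documented for the Kubernetes Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric ÷ target metric). The function name, the parameter names, and the replica bounds are my own illustrative assumptions, and the real autoscaler does considerably more (stabilization windows, handling of pods that aren't ready, multiple metrics), so treat this as a rough illustration rather than the actual implementation.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Simplified sketch of the HPA scaling rule (hypothetical helper).

    current_metric and target_metric are per-pod averages of the same
    metric, for example CPU utilization in percent.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")

    ratio = current_metric / target_metric

    # Within the tolerance band (0.1 is the documented default),
    # the replica count is left unchanged to avoid flapping.
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)

    # Clamp to the configured bounds (illustrative defaults here).
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # A holiday-style spike: 4 pods averaging 90% CPU against a 50% target.
    print(desired_replicas(4, 90, 50, max_replicas=20))
```

The point is less the arithmetic than where it lives: the scaling rule is a shared, documented primitive rather than something each team invents for itself.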
In fact, when people say they don’t need Kubernetes, it’s jarring for an e-commerce veteran like me to hear. My theory is that people who say they don’t need Kubernetes don’t realize they have web-scale problems. (And it’s highly likely that they do.)

The Kubernetes project, in combination with the many open source tools designed to complement it, enables organizations to effectively meet web-scale needs. Notice I didn’t say “easily meet.” I’m not going to pretend Kubernetes is an easy lift, because it’s not. But, remember, web-scale problems aren’t easy, and almost everyone has one (or more) nowadays.

Kubernetes has capabilities I never could have imagined when I was going crazy trying to prevent Valentine’s Day from breaking my company’s technological heart.

At Red Hat, Scott McCarty is senior principal product manager for RHEL Server, arguably the largest open source software business in the world. Focus areas include cloud, containers, workload expansion, and automation. Working closely with customers, partners, engineering teams, sales, marketing, other product teams, and even in the community, Scott combines personal experience with customer and partner feedback to enhance and tailor strategic capabilities in Red Hat Enterprise Linux.

Scott is a social media startup veteran, an e-commerce old timer, and a weathered government research technologist, with experience across a variety of companies and organizations, from seven-person startups to 12,000-employee technology companies. This has culminated in a unique perspective on open source software development, delivery, and maintenance.

New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers.
InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to newtechforum@infoworld.com.