Many systems architects already see too much focus on processors for generative AI systems and not enough attention on other vital components. Talk to anybody about generative AI in the cloud, and the conversation quickly turns to GPUs (graphics processing units). But that could be a misplaced focus. GPUs do not matter as much as people think they do, and in a few years the conversation will likely shift to what is much more critical to the development and deployment of generative AI systems in the cloud.

The current assumption is that GPUs are indispensable for performing the complex computations that generative AI models require. While GPUs have been pivotal in advancing AI, overemphasizing them may distract us from exploring equally effective and potentially more sustainable alternatives. Indeed, GPUs could quickly become commodities, like the other resources AI systems need, such as storage and processing space. The focus should be on designing and deploying these systems, not just on the hardware they run on. Call me crazy.

GPU gold rush

The importance of GPUs has worked out well for Nvidia, a company most people did not pay much attention to until now. In its most recent quarter, Nvidia posted record-high data center revenue of $14.5 billion, up 41% from the prior quarter and 279% from the year-ago quarter. Its GPUs are now the standard for AI processing, even more so than for gaming.

Beyond the explosion of Nvidia's stock, you can't open social media without seeing somebody taking a selfie with Jensen Huang, Nvidia's CEO. Moreover, everyone who's anyone has partnered with Nvidia, committing multimillion-dollar budgets to get close to this high-growth company and its technology.

Initially designed in the 1990s to accelerate 3D graphics for gaming, GPUs have evolved well beyond their origins. Early GPU architecture was highly specialized for graphical calculations, used primarily for rendering images and handling the intensive parallel processing that 3D rendering demands. That same aptitude for massively parallel computation is what makes GPUs a good fit for AI, where training and inference consist largely of simultaneous calculations.

Are GPUs really a big deal?

GPUs require a host chip to orchestrate operations. That description glosses over the complexity and capability of modern GPU architectures, but it also points to an inefficiency: GPUs operate in conjunction with CPUs (the host chips), which offload specific tasks to the GPUs while managing the overall operation of the software.

Adding to this question of efficiency is the necessity for interprocess communication; the challenges of disassembling models, processing them in parts, and then reassembling the outputs for analysis or inference; and the complexities inherent in using GPUs for deep learning and AI. This segmentation and reintegration is how computing tasks are distributed to optimize performance, but it raises efficiency questions of its own.

Software libraries and frameworks are required to abstract and manage these operations. Technologies such as Nvidia's CUDA (Compute Unified Device Architecture) provide the programming model and toolkit needed to write software that harnesses GPU acceleration. A core reason for the intense interest in Nvidia is that it provides a software ecosystem that lets its GPUs work efficiently with applications, including gaming, deep learning, and generative AI. Without ecosystems such as CUDA, the chips themselves wouldn't have the same potential.
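To make that division of labor concrete, here's a bare-bones sketch in CUDA C++ (a toy of my own, not pulled from any production system). The CPU allocates memory, copies data to the GPU, launches a kernel, waits, and copies the results back; the GPU contributes only the parallel arithmetic in the middle.

```
// Toy example of the CPU-as-host, GPU-as-worker pattern that CUDA exposes.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// The kernel is the only part that runs on the GPU: each thread scales one element.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // The CPU (host) owns the program: it allocates, initializes, and orchestrates.
    float *h_data = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) h_data[i] = 1.0f;

    float *d_data = nullptr;
    cudaMalloc((void **)&d_data, bytes);                        // allocate GPU memory
    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);  // copy input across the bus

    scale<<<(n + 255) / 256, 256>>>(d_data, 2.0f, n);           // host launches the kernel
    cudaDeviceSynchronize();                                    // host waits for the GPU

    cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);  // copy results back
    printf("first element after scaling: %f\n", h_data[0]);     // prints 2.0

    cudaFree(d_data);
    free(h_data);
    return 0;
}
```

Every one of those allocations, copies, and synchronizations crosses the host/device boundary, and that boundary is where many of the efficiency questions described above live.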
Thus, the spotlight is on Nvidia, which has both the processor and the ecosystem, for now.

Alternatives on the horizon

I'm not saying that Nvidia GPUs are bad technology. Clearly they are effective. The argument is that making the processing layer the major focus of building and deploying generative AI systems in the cloud is a bit of a distraction. I suspect that in two years GPUs will still be in the picture, but the excitement about them will have long passed. Instead, we'll be focused on inference efficiency, continuous model improvement, and new ways to manage algorithms and data.

The meteoric rise of Nvidia has investors reaching for their checkbooks to back any potential alternative that could play in that market. The apparent competitors right now are AMD and Intel. Intel, for example, is pursuing a GPU alternative with its Gaudi 3 processor. More interestingly, several startups purport to have created better ways to process large language models. A short list of these companies includes SambaNova, Cerebras, Graphcore, Groq, and xAI.

Not only are these companies looking to build chips and the software ecosystems for those chips, many are also working to provide microclouds (small cloud providers) that will offer their GPU alternatives as a service, much as AWS, Microsoft, and Google do today with GPUs. The list of GPU cloud providers is growing by the day, judging from the number of PR agencies banging on my door for attention. While many of these microclouds are just reselling Nvidia GPU processing today, you can count on them to adopt new GPU analogs as they hit the market, assuming those alternatives are cheaper, more efficient, and require less power. If that occurs, they will quickly displace whatever processor is less capable.

What's more, if the performance and reliability are there, we really won't care what brand the processor is, or even what architecture it employs. In that world, I doubt we'll be seeking selfies with the CEOs of those companies. The processor will be just one component of a system that works.

Sometimes GPUs are not needed

Of course, as I have covered here, GPUs are not always needed for generative AI or other AI processing. Smaller models can run efficiently on traditional CPUs or other specialized hardware and be more cost- and energy-efficient. Many of my generative AI architectures have used traditional CPUs without a significant impact on performance. Of course, it depends on what you're attempting to do.

Most enterprise generative AI deployments will require less processing power than people assume, and I suspect that many of the current generative AI projects that insist on GPUs are overkill. Eventually we'll get better at understanding when GPUs (or their analogs) are warranted and when they are not. However, much as we're seeing with the cloud-flation out there, enterprises may overprovision processing power for their AI systems and won't care until they see the bill. We have not yet reached the point where we worry much about cost optimization for generative AI systems, but we will have to be accountable at some point.

Okay, Linthicum is being a buzzkill again. I guess I am, but for good reason. We're about to enter a time of great change and transformation in the use of AI technology, and it will shape IT moving forward. What keeps me up at night is that the IT industry is being distracted by another shiny object. That typically doesn't end well.