How CPUs will address the energy challenges of generative AI

Balancing performance, energy efficiency, and cost-effectiveness, CPUs adeptly handle the less-intensive inference tasks that make up the lion’s share of AI workloads.

The vast majority of company leaders (98%) recognize the strategic importance of AI, with nearly 65% planning increased investments. Global AI spending is expected to reach $300 billion by 2026. Also by 2026, AI’s electricity usage could increase tenfold, according to the International Energy Agency. Clearly, AI presents businesses with a dual challenge: maximizing AI’s capabilities while minimizing its environmental impact.

In the United States alone, power consumption by data centers is expected to double by 2030, reaching 35GW (gigawatts), primarily due to the growing demand for AI technologies. This increase is largely driven by the deployment of AI-ready racks, each of which draws 40kW to 60kW (kilowatts) because of its GPU-intensive workloads.

There are three main strategies available to address these looming energy challenges effectively:

Selecting the right computing resources for AI workloads, with a focus on distinguishing between training and inference needs.
Optimizing performance and energy efficiency within existing data center footprints.
Fostering sustainable AI development through collaborative efforts across the ecosystem.

CPUs vs. GPUs for AI inference workloads

Contrary to common belief, high-powered GPUs are not the only option for sustainable AI: CPUs are suitable for most AI tasks. For example, 85% of AI compute is used for inference, which does not require a GPU.

For AI inference tasks, CPUs offer a balanced blend of performance, energy efficiency, and cost-effectiveness. They adeptly handle diverse, less-intensive inference tasks, making them particularly energy-efficient. Additionally, their ability to process parallel tasks and adapt to fluctuating demands keeps utilization high, so less energy is wasted on idle capacity. This stands in stark contrast to the more power-hungry GPUs, which excel in AI training due to their high-performance capabilities but often remain underutilized between intensive tasks.

Moreover, the lower energy and financial spend associated with CPUs make them a preferable option for organizations striving for sustainable and cost-effective operations. Further enhancing this advantage, software optimization libraries tailored for CPU architectures significantly reduce energy demands. These libraries optimize AI inference tasks to run more efficiently, aligning computational processes with the CPU’s operational characteristics to minimize unnecessary power usage.

Similarly, enterprise developers can utilize cutting-edge software tools that enhance AI performance on CPUs. These tools integrate seamlessly with common AI frameworks such as TensorFlow and ONNX, automatically tuning AI models for optimal CPU performance. This not only streamlines the deployment process but also eliminates the need for manual adjustments across different hardware platforms, simplifying the development workflow and further reducing energy consumption.

Lastly, model optimization complements these software tools by refining AI models to eliminate unnecessary parameters, creating more compact and efficient models. Done carefully, this pruning process maintains accuracy while reducing computational complexity, lowering the energy required for processing.
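To make the idea of pruning concrete, here is a minimal sketch of magnitude pruning, one common approach in which the weights with the smallest absolute values are set to zero. The article does not say which pruning method is meant, so treat this as an illustration under that assumption. The function name `magnitude_prune` and the sample weights are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights.

    weights:  list of floats (a flattened layer, for illustration only)
    sparsity: fraction in [0, 1] of weights to set to zero
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


# Made-up weights, not taken from any real model.
layer = [0.82, -0.03, 0.41, 0.007, -0.56, 0.12, -0.009, 0.95]
pruned_layer = magnitude_prune(layer, sparsity=0.5)
nonzero = sum(1 for w in pruned_layer if w != 0.0)
print(pruned_layer)
print(f"{nonzero} of {len(layer)} weights remain")
```

Zeroed weights save compute only when the runtime can skip them (for example, through sparse kernels or by removing whole channels), and accuracy should be re-measured after pruning rather than assumed.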

Choosing the right compute for AI workloads

For enterprises to fully leverage the benefits of AI while maintaining energy efficiency, it is critical to strategically match CPU capabilities with specific AI priorities. This involves several steps:

Identify AI priorities: Start by pinpointing the AI models that are most critical to the enterprise, considering factors like usage volume and strategic importance.
Define performance requirements: Establish clear performance criteria, focusing on essential aspects like latency and response time, to meet user expectations effectively.
Evaluate specialized solutions: Seek out CPU solutions that not only excel in the specific type of AI required but also meet the set performance benchmarks, ensuring they can handle the necessary workload efficiently.
Scale with efficiency: Once the performance needs are addressed, consider the solution’s scalability and its ability to process a growing number of requests. Opt for CPUs that offer the best balance of throughput (inferences per second) and energy consumption.
Right-size the solution: Avoid the pitfall of selecting the most powerful and expensive solution without assessing actual needs. It’s crucial to right-size the infrastructure to avoid wasteful expenditure and ensure it can be scaled efficiently as demand grows.
Consider future flexibility: Be wary of overly specialized solutions that may not adapt well to future changes in AI demand or technology. Prefer versatile solutions that can support a range of AI tasks to avoid future obsolescence.
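The steps above can be sketched as a filter-then-rank selection: first discard options that miss the latency or throughput requirement, then choose the remaining option with the best throughput per watt. This is only an illustration. The `Candidate` class, the `pick_candidate` function, the option names, and every number below are hypothetical and would be replaced by measurements on your own models.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str               # hypothetical label, not a real product
    p99_latency_ms: float   # measured on the target model
    throughput_ips: float   # inferences per second
    power_watts: float      # average power draw under load

    @property
    def inferences_per_joule(self) -> float:
        # (inferences / second) / (joules / second) = inferences / joule
        return self.throughput_ips / self.power_watts


def pick_candidate(candidates, max_latency_ms, required_ips):
    """Return the most energy-efficient candidate meeting both requirements, or None."""
    eligible = [
        c for c in candidates
        if c.p99_latency_ms <= max_latency_ms and c.throughput_ips >= required_ips
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda c: c.inferences_per_joule)


# Made-up numbers for illustration only.
candidates = [
    Candidate("option-a", p99_latency_ms=45.0, throughput_ips=900.0, power_watts=300.0),
    Candidate("option-b", p99_latency_ms=30.0, throughput_ips=2400.0, power_watts=1200.0),
    Candidate("option-c", p99_latency_ms=80.0, throughput_ips=1000.0, power_watts=250.0),
]

best = pick_candidate(candidates, max_latency_ms=50.0, required_ips=800.0)
print(best.name if best else "no candidate meets the requirements")
```

In this made-up data the fastest option is not selected: it meets the requirements but delivers fewer inferences per joule than a smaller option that also meets them, which is the right-sizing point made above.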

Data centers currently account for about 4% of global energy consumption, a figure that the growth of AI threatens to increase significantly. Many data centers have already deployed large numbers of GPUs, which consume tremendous power and suffer from thermal constraints.

For example, GPUs like Nvidia’s H100, with 80 billion transistors, push power consumption to extremes, with some rack configurations exceeding 40kW. As a result, data centers may need to employ immersion cooling, a process that submerges the hardware in thermally conductive liquid. While this method removes heat effectively and allows for higher power densities, it consumes additional power, compelling data centers to allocate 10% to 20% of their energy solely to cooling.
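As a rough worked example of what that cooling share means, the sketch below computes the total power needed for a given IT load. It makes a strong simplifying assumption that is not in the article: total power consists only of the IT load plus cooling. The function name `facility_power_kw` is hypothetical.

```python
def facility_power_kw(it_power_kw, cooling_share):
    """Total power when cooling takes `cooling_share` of the total.

    Simplifying assumption: total = IT load + cooling, nothing else.
    total = it + cooling_share * total  =>  total = it / (1 - cooling_share)
    """
    if not 0.0 <= cooling_share < 1.0:
        raise ValueError("cooling_share must be in [0, 1)")
    return it_power_kw / (1.0 - cooling_share)


rack_kw = 40.0  # low end of the 40kW to 60kW rack range cited earlier
for share in (0.10, 0.20):
    total = facility_power_kw(rack_kw, share)
    print(f"cooling share {share:.0%}: total {total:.1f} kW, "
          f"cooling {total - rack_kw:.1f} kW")
```

Under this assumption, a 40kW rack needs roughly 4kW to 10kW of additional power for cooling alone. Real facilities have other overheads, such as power conversion and lighting, so these figures are a lower bound on the simplified model rather than a prediction.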

Conversely, energy-efficient CPUs offer a promising solution to future-proof against the surging electricity needs driven by the rapid expansion of complex AI applications. Companies like Scaleway and Oracle are leading this trend by implementing CPU-based AI inferencing methods that dramatically reduce reliance on traditional GPUs. This shift not only promotes more sustainable practices but also showcases the ability of CPUs to efficiently handle demanding AI tasks.

To illustrate, Oracle has successfully run generative AI models with up to seven billion parameters, such as the Llama 2 model, directly on CPUs. This approach has demonstrated significant energy efficiency and computational power benefits, setting a benchmark for effectively managing modern AI workloads without excessive energy consumption.

Matching CPUs with performance and energy needs

Given the energy efficiency of CPUs in handling AI inference tasks, the next question is how best to integrate them into existing data centers. Several key factors determine whether both performance and energy efficiency are optimized:

High utilization: Select a CPU that avoids resource contention and eliminates traffic bottlenecks. Key attributes include a high core count, which helps maintain performance under heavy loads. This also drives highly efficient processing of AI tasks, offering better performance per watt and contributing to overall energy savings. The CPU should also provide significant amounts of private cache and an architecture that supports single-threaded cores.
AI-specific features: Opt for CPUs that have built-in features tailored for AI processing, such as support for common AI numerical formats like INT8, FP16, and BFloat16. These features enable more efficient processing of AI workloads, enhancing both performance and energy efficiency.
Economic considerations: Upgrading to CPU-based solutions can be more economical than maintaining or expanding GPU-based systems, especially given the lower power consumption and cooling requirements of CPUs.
Simplicity of integration: CPUs offer a straightforward path for upgrading data center capabilities. Unlike the complex requirements for integrating high-powered GPUs, CPUs can often be integrated into existing data center infrastructure—including networking and power systems—with ease, simplifying the transition and reducing the need for extensive infrastructure changes.
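To show why reduced-precision formats such as INT8 matter, here is a minimal sketch of symmetric per-tensor quantization, one common scheme for mapping floating-point weights onto 8-bit integers. It is an assumption that this particular scheme is representative. Production frameworks typically add calibration data, per-channel scales, and other refinements. The function names and sample weights are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to the range [-127, 127].

    Expects a non-empty list of floats. Returns (quantized ints, scale).
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest step, then clamp to the INT8 range used here.
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map quantized integers back to approximate float values."""
    return [q * scale for q in quantized]


# Made-up weights, not taken from any real model.
weights = [0.5, -1.27, 0.0, 1.003, -0.256]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q)
print(f"scale={scale:.4f}, max round-trip error={max_error:.4f}")
```

Each value now fits in one byte instead of four (for FP32), and the round-trip error is bounded by half the scale. That smaller footprint is what lets CPUs with native INT8 support move and process more values per instruction, at the cost of a small, measurable loss of precision.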

By focusing on these key considerations, we can effectively balance performance and energy efficiency in our data centers, ensuring a cost-effective and future-proofed infrastructure prepared to meet the computational demands of future AI applications.

Advancing CPU technology for AI

Industry AI alliances, such as the AI Platform Alliance, play a crucial role in advancing CPU technology for artificial intelligence applications, focusing on enhancing energy efficiency and performance through collaborative efforts. These alliances bring together a diverse range of partners from various sectors of the technology stack—including CPUs, accelerators, servers, and software—to develop interoperable solutions that address specific AI challenges. This work spans from edge computing to large data centers, ensuring that AI deployments are both sustainable and efficient.

These collaborations are particularly effective in creating solutions optimized for different AI tasks, such as computer vision, video processing, and generative AI. By pooling expertise and technologies from multiple companies, these alliances aim to forge best-in-breed solutions that deliver optimal performance and remarkable energy efficiency.

Cooperative efforts such as the AI Platform Alliance fuel the development of new CPU technologies and system designs that are specifically engineered to handle the demands of AI workloads efficiently. These innovations lead to significant energy savings and boost the overall performance of AI applications, highlighting the substantial benefits of industry-wide collaboration in driving technological advancements.

Jeff Wittich is chief product officer at Ampere Computing.

—

Generative AI Insights provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss the challenges and opportunities of generative artificial intelligence. The selection is wide-ranging, from technology deep dives to case studies to expert opinion, but also subjective, based on our judgment of which topics and treatments will best serve InfoWorld’s technically sophisticated audience. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Contact doug_dineley@foundryco.com.
