Cloud-based generative AI systems that use too many resources turn out to be too complex and expensive. Here's how to avoid this.

The cloud is the easiest way to build generative AI systems; that's why cloud revenues are skyrocketing. However, many of these systems are overengineered, which drives complexity and unnecessary costs.

Overengineering is a familiar issue. We've been overthinking and overbuilding systems, devices, machines, and vehicles for many years. Why would the cloud be any different?

Overengineering is designing an unnecessarily complex product or solution by incorporating features or functionalities that add no substantial value. The practice wastes time, money, and materials, and it can lead to decreased productivity, higher costs, and reduced system resilience.

Overengineering any system, whether AI or cloud, is driven by easy access to resources and few limits on their use. Because cloud services are so easy to find and allocate, an AI designer or engineer is tempted to add things that are "nice to have" rather than "need to have." A string of these decisions leads to many more databases, middleware layers, security systems, and governance systems than needed.

The ease with which enterprises can access and provision cloud services has become both a boon and a bane. Advanced cloud-based tools simplify the deployment of sophisticated AI systems, yet they also open the door to overengineering. If engineers had to go through a procurement process, including purchasing specialized hardware for specific compute or storage services, chances are they would be more restrained than when provisioning takes a simple click of a mouse.

The dangers of easy provisioning

Public cloud platforms boast an impressive array of services designed to meet every possible generative AI need.
From data storage and processing to machine learning models and analytics, these platforms offer an attractive mix of capabilities. Indeed, look at the recommended list of a few dozen services that cloud providers view as "necessary" to design, build, and deploy a generative AI system. Of course, keep in mind that the company creating the list is also selling the services.

GPUs are the best example. I often see GPU-configured compute services added to a generative AI architecture. However, GPUs are not needed for "back of the napkin" calculations, and CPU-powered systems work just fine at a fraction of the cost. For some reason, the explosive growth of companies that build and sell GPUs has many people believing that GPUs are a requirement; they are not. GPUs are needed only when a specific problem calls for specialized processors. This type of overengineering costs enterprises more than other overengineering mistakes. Unfortunately, recommending that your company refrain from using higher-end, more expensive processors will often get you uninvited from subsequent architecture meetings.

Keeping to a budget

Escalating costs are directly tied to layered complexity and additional cloud services, which are often included out of an impulse for thoroughness or future-proofing. When I recommend that a company use fewer or less expensive resources, I'm often met with, "We need to account for future growth." That growth can usually be handled by adjusting the architecture as it evolves; it should never mean throwing money at the problem from the start.

This tendency to include too many services also amplifies technical debt. Maintaining and upgrading complex systems becomes increasingly difficult and costly. If data is fragmented and siloed across various cloud services, it can further exacerbate these issues, making data integration and optimization a daunting task.
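The cost gap between a lean stack and a padded one is easy to make concrete with a back-of-the-napkin calculation. The sketch below is a minimal illustration; every service name and hourly rate in it is a hypothetical placeholder, not real provider pricing, so substitute your own cloud bill's numbers.

```python
# Back-of-the-napkin comparison of a lean vs. an overengineered stack.
# All service names and hourly rates are hypothetical placeholders --
# substitute your provider's actual pricing.

HOURS_PER_MONTH = 730

lean_stack = {
    "cpu_inference_nodes": (4, 0.38),   # (count, $/hour) -- assumed rate
    "vector_database": (1, 1.20),
    "object_storage": (1, 0.10),
}

overengineered_stack = {
    "gpu_inference_nodes": (4, 3.06),   # GPUs the workload may not need
    "vector_database": (3, 1.20),       # replicas added "just in case"
    "object_storage": (1, 0.10),
    "second_middleware_layer": (2, 0.75),
    "extra_analytics_service": (1, 2.00),
}

def monthly_cost(stack):
    """Sum count * hourly rate * hours across every service in the stack."""
    return sum(count * rate * HOURS_PER_MONTH for count, rate in stack.values())

lean = monthly_cost(lean_stack)
heavy = monthly_cost(overengineered_stack)
print(f"lean:           ${lean:,.2f}/month")
print(f"overengineered: ${heavy:,.2f}/month")
print(f"difference:     ${heavy - lean:,.2f}/month")
```

Even with made-up rates, the pattern holds: each "just in case" service compounds monthly, and the GPU line item alone can dwarf the entire lean stack.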
Enterprises often find themselves trapped in a cycle where their generative AI solutions are not just overengineered but also under-optimized, leading to diminished returns on investment.

Strategies to mitigate overengineering

Avoiding these pitfalls takes a disciplined approach. Here are some strategies I use:

Prioritize core needs. Focus on the essential functionalities required to achieve your primary objectives, and resist the temptation to inflate them.

Plan and assess thoroughly. Invest time in the planning phase to determine which services are essential.

Start small and scale gradually. Begin with a minimum viable product (MVP) that focuses on core functionalities.

Assemble an excellent generative AI architecture team. Pick AI engineers, data scientists, AI security specialists, and others who share the approach of leveraging what's needed without overkill. You can submit the same problem to two different generative AI architecture teams and get plans that differ in cost by $10 million. Which one got it wrong? Usually, the team looking to spend the most.

The power and flexibility of public cloud platforms are why we leverage the cloud in the first place, but caution is warranted to avoid the trap of overengineering generative AI systems. Thoughtful planning, judicious service selection, and continuous optimization are key to building cost-effective AI solutions. By adhering to these principles, enterprises can harness the full potential of generative AI without falling prey to the complexities and costs of an overengineered system.

More by David Linthicum: Serverless cloud technology fades away | The next 10 years for cloud computing | All the brilliance of AI on minimalist platforms
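The "prioritize core needs" and "start small" strategies can be turned into a simple gate: list the core requirements, keep the first service that covers each one, and park everything else for later review. This is a minimal sketch under assumed inputs; the service names and the core-requirements list are hypothetical examples, not a prescribed catalog.

```python
# Sketch of a "need to have" gate for proposed cloud services.
# Service names and the core-requirements set are hypothetical examples.

core_requirements = {"model serving", "prompt storage", "access control"}

proposed_services = [
    {"name": "cpu-inference-cluster", "satisfies": "model serving"},
    {"name": "gpu-inference-cluster", "satisfies": "model serving"},  # duplicate capability
    {"name": "document-db", "satisfies": "prompt storage"},
    {"name": "iam-policies", "satisfies": "access control"},
    {"name": "graph-analytics", "satisfies": None},  # nice to have, no core need
]

def triage(services, core):
    """Keep the first service covering each core requirement; park the rest."""
    keep, park, covered = [], [], set()
    for svc in services:
        need = svc["satisfies"]
        if need in core and need not in covered:
            keep.append(svc["name"])
            covered.add(need)
        else:
            park.append(svc["name"])
    return keep, park

mvp, deferred = triage(proposed_services, core_requirements)
print("MVP:", mvp)            # services that map to a core requirement
print("Deferred:", deferred)  # duplicates and nice-to-haves, revisit later
```

The parked list is not deleted; it becomes the backlog you revisit as the architecture evolves, which is exactly how "account for future growth" should be handled.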