David Linthicum
Contributor

Forgotten cloud scaling tricks

analysis
Mar 03, 2023
4 mins
Cloud Architecture, Cloud Computing, IT Skills

Architecting for scalability will soon become a lost art. Most architects overlook autoscaling with predictive analytics, resource sharding, and cache invalidation.


I’m noticing a pattern in my work with cloud architects, young and old: well-known cloud scaling techniques used years ago are rarely used today. Yes, I understand why, given that it’s 2023 and not 1993, but cloud architect silverbacks still know a few clever tricks that are relevant today.

Until recently, we just provisioned more cloud services to solve scaling problems. That approach usually produces sky-high cloud bills. The better tactic is to put more quality time into upfront design and deployment rather than allocating post-deployment resources willy-nilly and driving up costs.

Let’s look at the process of designing cloud systems that scale and learn a few of the lesser-known architecture tricks that help cloud computing systems scale efficiently.

Autoscaling with predictive analytics

Predictive analytics can forecast user demand and scale resources to optimize utilization and minimize costs. Today’s new tools can also deploy advanced analytics and artificial intelligence. I don’t see these tactics applied as much as they should be.

Autoscaling with predictive analytics allows cloud-based applications and infrastructure to scale up or down automatically based on predicted demand patterns. It combines the benefits of autoscaling, which adjusts resources automatically by monitoring current demand, with predictive analytics, which uses historical data and machine learning models to forecast demand patterns.

This blend of old and new is making a big comeback because powerful tools are now available to automate the process. The approach is especially beneficial for applications with highly variable traffic patterns, such as e-commerce websites or sales order-entry systems, where sudden spikes in traffic can cause performance issues if the infrastructure cannot scale fast enough to meet demand. Autoscaling with predictive analytics results in a better user experience and lower costs because resources are used only when needed.
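To make the idea concrete, here is a minimal sketch of the forecasting step, assuming demand is sampled as requests per second at fixed intervals. The function names, the naive linear extrapolation, and the 20 percent headroom are all illustrative assumptions, not any cloud provider's API; a real deployment would more likely use a provider's predictive scaling feature or a trained model.

```python
import math
from statistics import mean


def forecast_next(history, window=3):
    """Naive forecast: last sample plus the average step over the last `window` samples.

    `history` is a list of past demand samples (hypothetical: requests per second).
    """
    recent = history[-window:]
    steps = [later - earlier for earlier, later in zip(recent, recent[1:])]
    trend = mean(steps) if steps else 0.0
    return max(0.0, recent[-1] + trend)


def desired_instances(history, capacity_per_instance,
                      headroom=0.2, min_instances=1, max_instances=20):
    """Size the fleet for the predicted demand instead of the current demand."""
    predicted = forecast_next(history)
    needed = math.ceil(predicted * (1 + headroom) / capacity_per_instance)
    # Clamp so a bad forecast can neither scale to zero nor run up the bill.
    return max(min_instances, min(max_instances, needed))


if __name__ == "__main__":
    rising_traffic = [100, 120, 150, 190]
    print(desired_instances(rising_traffic, capacity_per_instance=50))
```

The point of the sketch is the ordering: capacity is requested before the spike arrives rather than after monitoring detects it, and the minimum and maximum bounds cap the damage when the forecast is wrong.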

Resource sharding

Sharding is a long-established technique that divides large data sets into smaller, more manageable subsets called shards. Sharding data or other resources enhances the system’s ability to scale.

In this approach, a large pool of resources, such as a database, storage, or processing power, is partitioned across multiple nodes on the public cloud, allowing multiple clients to access them concurrently. Each shard is assigned to a specific node, and the nodes work together to serve client requests.

As you may have guessed, resource sharding can improve performance and availability by distributing the load across multiple cloud servers. This reduces the amount of data each server needs to manage, allowing for faster response times and better utilization of resources.
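The routing step can be sketched in a few lines. This is a minimal illustration that assumes string keys and a fixed list of nodes; the `ShardRouter` class and the node names are hypothetical, and hash-modulo routing is only one of several ways to assign shards.

```python
import hashlib


class ShardRouter:
    """Maps each key to one node so that load spreads across all nodes."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("at least one node is required")
        self.nodes = list(nodes)

    def shard_for(self, key):
        # Use a stable hash; Python's built-in hash() for strings varies between runs.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def node_for(self, key):
        return self.nodes[self.shard_for(key)]


if __name__ == "__main__":
    router = ShardRouter(["node-a", "node-b", "node-c", "node-d"])
    for customer_id in ("customer-1", "customer-2", "customer-3"):
        print(customer_id, "->", router.node_for(customer_id))
```

One design caveat: with plain modulo routing, changing the number of nodes remaps most keys. Schemes such as consistent hashing reduce that reshuffling, at the cost of a more involved router.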

Cache invalidation

I’ve taught cache invalidation on whiteboards since cloud computing first became a thing, and yet it’s still not well understood. Cache invalidation involves removing “stale data” from the cache to free up resources, which reduces the amount of data that needs to be processed. Systems can then scale and perform much better, because a cache that holds only current data reduces the time and resources required to fetch that data from its source.

As with all these tricks, you must be careful about some unwanted side effects. For instance, if the original data changes, the cached data becomes stale and may lead to incorrect results or outdated information being presented to users. Cache invalidation, if done correctly, should solve this problem by updating or removing the cached data when changes to the original data occur.

Ways to invalidate a cache include time-based expiration, event-based invalidation, and manual invalidation. Time-based expiration sets a fixed time limit for how long data can remain in the cache. Event-based invalidation is triggered by specific events, such as changes to the original data or other external factors. Finally, manual invalidation updates or removes cached data in response to explicit user or system actions.
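The three strategies can sit side by side in one small cache. The following is a minimal in-memory sketch, assuming a single process and a `loader` function that reads from the source of record; the class name, method names, and the 60-second default are illustrative assumptions rather than any particular product's API.

```python
import time


class InvalidatingCache:
    """In-memory cache showing time-based, event-based, and manual invalidation."""

    def __init__(self, loader, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader      # hypothetical function that reads from the source
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested without sleeping
        self._entries = {}         # key -> (value, expires_at)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        # Time-based expiration: an entry is served only until its deadline.
        if entry is not None and now < entry[1]:
            return entry[0]
        value = self._loader(key)
        self._entries[key] = (value, now + self._ttl)
        return value

    def on_source_changed(self, key):
        """Event-based invalidation: call this when the original data changes."""
        self._entries.pop(key, None)

    def invalidate(self, key=None):
        """Manual invalidation: drop one key, or everything if no key is given."""
        if key is None:
            self._entries.clear()
        else:
            self._entries.pop(key, None)


if __name__ == "__main__":
    prices = {"sku-1": 10}
    cache = InvalidatingCache(prices.__getitem__, ttl_seconds=30)
    print(cache.get("sku-1"))          # loaded from the source
    prices["sku-1"] = 12
    print(cache.get("sku-1"))          # still the cached value until invalidated
    cache.on_source_changed("sku-1")
    print(cache.get("sku-1"))          # reloaded from the source
```

In a distributed system the same three hooks still apply, but the change event has to reach every cache node, which is where most of the real difficulty lies.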

None of this is secret, but these tips are often not taught anymore in advanced cloud architecture courses, including certification courses. These approaches provide better overall optimization and efficiency to your cloud-based solutions, but there is no penalty for not using them. Indeed, these problems can all be solved by tossing money at them, which normally works. However, it may cost you 10 times more than an optimized solution that takes advantage of these or other architectural techniques.

I would prefer to do this right (optimized) rather than do this fast (underoptimized). Who’s with me?

David S. Linthicum is an internationally recognized industry expert and thought leader. Dave has authored 13 books on computing, the latest of which is An Insider’s Guide to Cloud Computing. Dave’s industry experience includes tenures as CTO and CEO of several successful software companies, and upper-level management positions in Fortune 100 companies. He keynotes leading technology conferences on cloud computing, SOA, enterprise application integration, and enterprise architecture. Dave writes the Cloud Computing blog for InfoWorld. His views are his own.
