David Linthicum
Contributor

When it comes to data and cloud computing, think proactively

analysis
Aug 03, 2009 | 4 mins
Cloud Computing

Where you place data within your cloud computing service is more important than you may know

Data is moving to the cloud and has been for some time. However, when considering moving large data sets around the Internet, cloud to cloud or cloud to company, you have to consider the architectural trade-offs. And there are several.

The core issue is that data residing in the cloud is perfectly fine from a performance and integrity standpoint, as long as it's within the same domain as the core applications and processes that use the data. Thus, if your data resides on Amazon's EC2, the best approach is to place your applications and processes there as well.


Why? It's all about the transmission of data requests and result sets. If the data is located in a different domain than the applications and processes that request it, say with another cloud computing provider, then the result sets, which are typically huge, have to find their way back over the Internet. Thus, the system suffers the latency that comes with moving a lot of data over the Internet. That's not the case if it's all within a single domain, whether cloud-delivered or on-premise.
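To make that trade-off concrete, here's a quick back-of-envelope calculation. It's a minimal sketch with illustrative assumptions: the result-set size and the two link speeds are invented numbers, not measurements from any provider.

```python
# Back-of-envelope transfer times for a large result set.
# All figures are illustrative assumptions, not measurements.

RESULT_SET_GB = 10          # assumed size of a recurring large result set
INTERNET_MBPS = 100         # assumed cross-domain (Internet) bandwidth
INTERNAL_MBPS = 10_000      # assumed same-domain (internal network) bandwidth

def transfer_seconds(size_gb: float, bandwidth_mbps: float) -> float:
    """Seconds to move size_gb gigabytes over a bandwidth_mbps link."""
    bits = size_gb * 8 * 1000**3           # gigabytes -> bits (decimal units)
    return bits / (bandwidth_mbps * 1000**2)

print(f"Cross-domain: {transfer_seconds(RESULT_SET_GB, INTERNET_MBPS):,.0f} s")
print(f"Same-domain:  {transfer_seconds(RESULT_SET_GB, INTERNAL_MBPS):,.1f} s")
```

Even ignoring round-trip latency and protocol overhead, the same result set takes two orders of magnitude longer to cross the Internet (800 seconds versus 8 under these assumed speeds), and that cost is paid on every query.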

I'm seeing cloud computing performance issues come up time and time again because we now have cloud computing providers that offer a specific service component, such as a database, a development platform, or integration. Thus, you can get your database from one provider, your application development platform from another, and your process integration engine from a third.

While this mixing and matching of fine-grained cloud-delivered IT resources is just fine in many instances, if you are consistently moving large amounts of data from cloud computing provider to cloud computing provider, or between on-premise systems and the clouds, then performance problems will surely arise. Moreover, you may find other problems as well, such as database integrity issues, including corruption and data loss.

So what are the architectural guidelines when it comes to data and cloud computing? There are two main ones.

1. Consider the size of the result sets. Result sets that are consistently large should never be returned from a remote source, whether that source is in a cloud or on-premise. Place the data as close as possible to the applications and processes that use it (see the sketch below).
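One practical way to keep result sets small when some distance is unavoidable is to push the work to the data rather than pulling raw rows back. The following is a minimal sketch of that idea; it uses an in-memory SQLite database, and the orders table and its columns are hypothetical, invented purely for illustration.

```python
import sqlite3

# The 'orders' table and its columns are hypothetical, for illustration only;
# the point generalizes to any database reached across domains.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("east", 10.0), ("west", 25.0), ("east", 5.0)])

# Anti-pattern: pull every row back to the application, then aggregate.
# Over a cross-domain link, this result set grows with the table.
rows = conn.execute("SELECT region, amount FROM orders").fetchall()
total = sum(amount for region, amount in rows if region == "east")

# Better: let the database aggregate, so only one small row travels back.
(total_pushed,) = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE region = 'east'").fetchone()

assert total == total_pushed  # same answer, far smaller result set
```

The aggregated query returns a single row no matter how large the table grows, so the result set no longer scales with the data.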

In some cases, the architecture has some processes and applications that are close to the data, but others that exist on other cloud computing platforms or perhaps on-premise. In these instances, you need to consider relocating those applications and processes to be closer to the data, or you’ll suffer latency issues. There’s no easy way around that.

If the data can only exist remotely from the core applications and processes that use the data, then consider cloud computing providers that support sophisticated caching mechanisms, thus reducing the amount of data that travels over the Internet. Also, consider those cloud computing providers with many points of presence so that you’re as close to the data as you can get, no matter where your applications and processes are located.
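From the application side, such caching can be as simple as a read-through layer that crosses the Internet only on a miss. Here's a minimal sketch of that pattern; fetch_from_remote is a hypothetical stand-in for whatever call your provider actually exposes, with a sleep simulating the round-trip.

```python
import time

cache: dict = {}   # local, in-domain cache keyed by request

def fetch_from_remote(key: str) -> bytes:
    """Hypothetical stand-in for a slow cross-domain data fetch."""
    time.sleep(0.5)                 # simulate Internet round-trip latency
    return f"payload-for-{key}".encode()

def read_through(key: str) -> bytes:
    """Serve from the local cache; go over the wire only on a miss."""
    if key not in cache:
        cache[key] = fetch_from_remote(key)
    return cache[key]

read_through("customers")   # slow: crosses the Internet once
read_through("customers")   # fast: served locally from the cache
```

Only the first request for a given key pays the cross-domain cost; repeated reads are served locally.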

2. Consider security. In many instances, data needs to be encrypted as it moves over the Internet for legal and privacy reasons, and that encryption adds latency. Avoiding it is another reason to colocate the data with the applications and processes that use it.
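To get a rough feel for that cost, here's a minimal timing sketch. It assumes the third-party cryptography package for Fernet symmetric encryption, and the 50MB payload size is arbitrary, chosen only for illustration.

```python
import os
import time
from cryptography.fernet import Fernet  # assumes: pip install cryptography

cipher = Fernet(Fernet.generate_key())
payload = os.urandom(50 * 1024 * 1024)   # arbitrary 50 MB payload

start = time.perf_counter()
token = cipher.encrypt(payload)          # CPU cost paid before any bytes move
elapsed = time.perf_counter() - start

print(f"Encrypted {len(payload) / 1e6:.0f} MB in {elapsed:.2f} s; "
      f"ciphertext is {len(token) / len(payload):.2f}x the plaintext size")
```

Both the CPU time and the roughly one-third size inflation from Fernet's Base64 encoding are paid on every transfer, a cost that colocation avoids entirely.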

If this smells like the typical trade-offs with distributed architectures, you’re right. However, because cloud computing platforms are Internet-delivered, there are many other considerations, including new vulnerabilities. Performance is going to be a critical success factor for systems that are cloud-based, and your best approach is to be proactive with the architecture, understanding the issues before they become problems.


David S. Linthicum is an internationally recognized industry expert and thought leader. Dave has authored 13 books on computing, the latest of which is An Insider’s Guide to Cloud Computing. Dave’s industry experience includes tenures as CTO and CEO of several successful software companies, and upper-level management positions in Fortune 100 companies. He keynotes leading technology conferences on cloud computing, SOA, enterprise application integration, and enterprise architecture. Dave writes the Cloud Computing blog for InfoWorld. His views are his own.