Many enterprises simply move their data problems to the cloud. Instead, invest the time and money to clean up your data so that it becomes more valuable to the business.

To put it kindly, most enterprise data is less than optimal. Want to test this statement at your company? Just ask where the customer data of record resides. Ask people in four different departments and you'll get four very different answers.

This issue is the natural byproduct of 20 to 30 years spent creating new databases using whatever technology was popular at the time: mainframe databases, big relational databases, open source SQL databases, object databases, and now special-purpose databases. Heterogeneity and complexity are an undeniable reality for those looking to move terabytes of data to the cloud. You must find a database analog in the cloud that is either an exact brand match or one that requires a minimal amount of restructuring and conversion. Unfortunately, this approach perpetuates the database silo problem. It's a classic and seemingly endless example of kicking the can down the road for the next generation of IT. The trouble is that the "kick the can" path is relatively cheap. The "fix everything" path? Not so much.

Those with a short-term view often find that migrating data to a public cloud provides no real gains in cost savings, agility, or productivity. Indeed, the problem that resided in their data center is now a problem that resides in the cloud.

The pandemic drove many organizations to create a larger role for the public cloud within the enterprise. Most enterprises just want their move to the cloud to be fast and cheap, so they take a lift-and-shift approach to data migration. At first, this method may make budgetary sense. Taking the long view, however, lift and shift means you'll have to migrate your data twice: once the wrong way, and then again the right way. Here's the bad news: The most effective data migration efforts take years, not months.

Today, some look at migrating data to the cloud as an opportunity to finally fix their enterprise data: to make data a first-class citizen and do wonderful things with all the data the enterprise has collected over the years. The best migration efforts focus on normalizing and improving all the data as it moves to the public cloud. Here are three fundamentals of a more effective data migration:

Single source of truth. One database should manage the data of record about customers, inventory, sales, and so on. The business should not have to gather data from 20 different places and deal with the resulting data quality issues. This may mean major surgery on your data, and perhaps the first real normalization of your databases in 30 years. However, it is a basic step that makes enterprise data more usable and more valuable to the company.

Heterogeneous metadata management. Put an abstraction layer over all cloud and on-premises databases so you can alter the structure and meaning of the data from a single interface.

Data virtualization. A common architecture trick is to leverage data virtualization, which lets you view any number of physical databases and virtually combine or split them to meet your existing needs. The power of data virtualization is that it does not require back-end physical database changes to restructure data. It's a quick way to move databases to the cloud and still deal with data in much more efficient and agile ways. If this sounds like new technology, it's not.
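To make that last idea concrete, here is a minimal sketch of the data virtualization pattern in Python. The two SQLite databases and the VirtualCustomerView class are illustrative stand-ins, not any particular product; a real virtualization layer would federate queries across production back ends at far greater scale. The point of the design is that neither physical database is restructured; the combining happens entirely in the view layer.

```python
# Minimal illustration of data virtualization: one logical "customers" view
# presented over two separate physical databases, without changing either
# back end. The SQLite sources below are stand-ins for real systems.
import sqlite3


def make_source(rows):
    """Create a stand-in physical database with its own customer table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, region TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


class VirtualCustomerView:
    """Logical view that unions customer records from every registered source."""

    def __init__(self, sources):
        self.sources = sources  # the physical databases stay untouched

    def query(self, region=None):
        results = []
        for source_name, conn in self.sources.items():
            sql = "SELECT id, name, region FROM customers"
            params = ()
            if region:
                sql += " WHERE region = ?"
                params = (region,)
            for row in conn.execute(sql, params):
                # Tag each record with the system it came from.
                results.append({"source": source_name, "id": row[0],
                                "name": row[1], "region": row[2]})
        return results


# Two "physical" databases, e.g. a legacy CRM and an e-commerce system.
crm = make_source([(1, "Acme Corp", "EMEA")])
shop = make_source([(2, "Globex", "AMER")])

view = VirtualCustomerView({"crm": crm, "shop": shop})
print(view.query())        # one logical result set drawn from both back ends
print(view.query("EMEA"))  # filters are pushed down to each source
```

In practice the same idea is delivered by data virtualization tools and cloud federated-query services rather than hand-rolled code, but the design choice is the same: restructure the logical view, not the physical databases.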
Data virtualization has been around since the 1990s and is now available in the public clouds. Some view data virtualization as cheating. It's actually a sensible compromise if there is only a small budget to augment and improve data moving to the cloud.

If you want to lock in failure, relocating your databases as-is to a public cloud will ensure it. Let's face facts: Your data is probably a mess. There comes a time when Band-Aids can no longer hold together decades of data slices and dices. It's past time for most enterprise data to undergo the surgery required to fix the underlying problems. Moving the problem to the cloud simply creates a bigger problem. Do you really want to be that company?