Moral of the story: Data migration is never simple, cloud or not.

I'm often asked, "How can I relocate data to the cloud, and improve the databases, applications, security, governance, and dataops as the migration occurs?" Everyone is looking for a shortcut or a magical tool that will migrate the data and automatically improve its state. Sorry, that magic does not yet exist. In the meantime, a nonmagical migration process provides the best odds of success.

Before we explore that process, I'll mention a few things:

First, cloud relocation does not use a waterfall approach. Certain tasks need to be completed before you can move on to the next ones, but not all. These dependencies will be readily apparent; otherwise, feel free to do the tasks below out of sequence.

Second, to get this right the first time, follow the process outlined below with the correct mix of talent. You'll need subject matter experts for databases, security, ops, governance, cloud-specific services, etc. Those people are difficult to find right now.

Finally, this is a general approach. You will need to add or remove some items. For instance, if you're a health care company, you will need to deal with more compliance and governance issues around the use, migration, and deployment of data.

With all that said, here's the process:

1. Assess the "as is" state of the data, including models (object, relational, in-memory, special purpose, or other), metadata, application coupling, and requirements (security, governance, business continuity/disaster recovery, and management). Tagging starts here (a small inventory sketch appears at the end of this article). Look for opportunities to reduce redundancy and increase efficiency. This can be as impactful as moving from one model to another (relational to object), which requires a great deal of application refactoring, normalization of all data schemas, definition of a single source of truth, and so on. You also need to consider security, governance, and dataops, which, to be clear, cut across everything listed here.

2. Define the "to be" state based on the changes and requirements identified above. One path I recommend is developing a common data model (CDM). A CDM, at its essence, provides a single source of truth for most, and sometimes all, of the data that exists in an enterprise. It's made up of many different databases that may use different database models, such as relational and object, and many different structures or schemas. However, it appears to everyone who uses it as a single, unified, abstract database that, when asked a question, provides a common and consistent answer. (A simplified CDM sketch appears at the end of this article.)

3. Define a migration and implementation plan, focusing on the target cloud platforms. The devil is in the details, and some changes will still need to be made in flight, but only minor ones.

4. Create a staging and testing platform for applications and databases. This may also include CI/CD (continuous integration/continuous delivery) links. Moving forward, these platforms should be maintained by devsecops teams as well as DBAs, so make a plan for that maintenance.

5. Test deployment on the staging and testing platforms to determine performance, security, governance, fit-to-purpose, etc. Repeat for each application and database. Testing will help determine the cost of ops and provide the data to project those costs for the next several years. Now that real cost metrics exist, this is easy (a projection sketch appears at the end of this article). Projections also help avoid sticker shock when you get your first cloud bill.

6. Implement cost governance.

7. Define ops planning, including monitoring and management approaches, playbooks, and tools.
8. Take advantage of abstraction and automation to remove humans from the ops processes as much as possible. (A small automation sketch appears at the end of this article.)

9. Begin phased deployments, starting with the smallest and least important databases and applications and progressing to the biggest and most important (an ordering sketch appears at the end of this article). Try to deploy with flexible deadlines. Don't worry, you'll get better at this as you learn. Rushing this portion of the process is where failure typically occurs, because important tasks get tossed out in favor of meeting an arbitrary deadline.

10. Execute acceptance testing after each phase.

11. Begin dataops.

12. Take a vacation.

On average, this process takes three weeks for each database. If you have 100 databases to migrate and can run several migrations in parallel, realistically speaking, it will take about 42 to 52 weeks to complete (a quick sanity check on that math appears below). The move-and-improve processes are not magical or automatic, but they can be baked into the migration. Good luck.
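A few illustrative sketches follow for readers who want something more concrete. Everything in them, the names, numbers, and helper functions alike, is an assumption made for illustration, not a prescription. First, for the "as is" assessment in step 1, a minimal Python sketch of a tagged inventory record:

```python
from dataclasses import dataclass, field

@dataclass
class DataAsset:
    """One entry in the 'as is' inventory produced during assessment."""
    name: str                    # e.g., "orders_db"
    model: str                   # "relational", "object", "in-memory", ...
    owner: str                   # accountable team or person
    coupled_apps: list[str] = field(default_factory=list)
    tags: dict[str, str] = field(default_factory=dict)  # security, governance, BC/DR, etc.

# Illustrative inventory entries; the tag keys and values are made up.
inventory = [
    DataAsset(
        name="orders_db",
        model="relational",
        owner="commerce-team",
        coupled_apps=["checkout", "fulfillment"],
        tags={"sensitivity": "pii", "bcdr_tier": "1", "redundant_with": "legacy_orders"},
    ),
]

# Simple queries over the inventory drive later steps, such as spotting redundancy.
print([a.name for a in inventory if "redundant_with" in a.tags])
```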
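For the common data model in step 2, here is a deliberately tiny sketch of the idea: several backends, one consistent answer. The two store classes and the get_customer query are hypothetical stand-ins, not a real API:

```python
class RelationalStore:
    """Stand-in for a relational backend (hypothetical data)."""
    def get_customer(self, customer_id: str) -> dict:
        return {"id": customer_id, "name": "Acme Corp"}

class DocumentStore:
    """Stand-in for a document/object backend (hypothetical data)."""
    def get_customer(self, customer_id: str) -> dict:
        return {"id": customer_id, "segment": "enterprise"}

class CommonDataModel:
    """Presents many stores as one unified, abstract database."""
    def __init__(self, stores):
        self.stores = stores

    def get_customer(self, customer_id: str) -> dict:
        # Merge partial answers into one consistent record; conflict-resolution
        # rules would live here in a real implementation.
        merged: dict = {}
        for store in self.stores:
            merged.update(store.get_customer(customer_id))
        return merged

cdm = CommonDataModel([RelationalStore(), DocumentStore()])
print(cdm.get_customer("c-123"))  # one answer, regardless of where the data lives
```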
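For the cost projections in step 5, a back-of-the-envelope sketch. The measured monthly cost and the growth rate are placeholder assumptions; plug in the metrics your own testing produces:

```python
def project_costs(monthly_cost: float, annual_growth: float, years: int) -> list[float]:
    """Project yearly ops cost from a measured monthly baseline.

    monthly_cost: measured during staging and testing (e.g., dollars per month)
    annual_growth: assumed growth in usage, e.g., 0.15 for 15 percent per year
    """
    yearly = []
    cost = monthly_cost * 12
    for _ in range(years):
        yearly.append(round(cost, 2))
        cost *= 1 + annual_growth
    return yearly

# Illustrative numbers only: $8,000/month measured in testing, 15% assumed growth.
print(project_costs(8_000, 0.15, 3))  # [96000.0, 110400.0, 126960.0]
```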
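For step 8, automation usually starts as a check, a threshold, and an automated response. The metric reader and the remediation call below are placeholders, not a real monitoring API:

```python
import time

# Hypothetical threshold; real values come from the ops planning in step 7.
MAX_REPLICATION_LAG_SECONDS = 30

def read_replication_lag(db_name: str) -> float:
    """Placeholder: in practice this would call your cloud provider's
    monitoring API or query the database directly."""
    return 45.0  # pretend measurement that exceeds the threshold

def remediate(db_name: str) -> None:
    """Placeholder for an automated response: restart a replica,
    trigger a runbook automation, open a ticket, etc."""
    print(f"remediation triggered for {db_name}")

def watch(db_name: str, interval_seconds: int = 60, cycles: int = 1) -> None:
    for _ in range(cycles):
        if read_replication_lag(db_name) > MAX_REPLICATION_LAG_SECONDS:
            remediate(db_name)
        time.sleep(interval_seconds)

watch("orders_db", interval_seconds=0, cycles=1)
```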
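For the phased deployments in step 9, ordering the waves is essentially a sort on size and criticality. The sizes and the 1-to-5 criticality scores here are invented for illustration:

```python
# (name, size_gb, criticality 1=low to 5=high); illustrative values only.
databases = [
    ("audit_archive", 50, 1),
    ("orders_db", 900, 5),
    ("marketing_db", 200, 2),
    ("customer_db", 600, 4),
]

# Deploy the least critical and smallest first, the most critical and biggest last.
waves = sorted(databases, key=lambda d: (d[2], d[1]))
for position, (name, size_gb, criticality) in enumerate(waves, start=1):
    print(f"wave {position}: {name} ({size_gb} GB, criticality {criticality})")
```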
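Finally, the 42-to-52-week estimate only pencils out if several migrations run at once; done strictly one at a time, 100 databases at three weeks each would take 300 weeks. A quick sanity check, assuming six to seven databases in flight at any given time:

```python
import math

databases = 100
weeks_per_database = 3

for parallel_tracks in (6, 7):
    batches = math.ceil(databases / parallel_tracks)
    print(f"{parallel_tracks} parallel tracks: ~{batches * weeks_per_database} weeks")
# 6 tracks: ~51 weeks; 7 tracks: ~45 weeks; roughly the 42-to-52-week range above.
```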