by Sammy Tam

Understand the 3 major approaches to data migration

May 12, 2023

Application-based, file-based, and block-based migration all have their own merits and use cases. Choosing the right solution starts with understanding their differences.


Data migration is a critical and often challenging operation for IT organizations of any size. Whether the organization is small, mid-sized, or a Fortune 500 giant, moving data from one system to another is fraught with risks, ranging from data loss or corruption to extended downtime, and the impacts of those risks can be extremely costly. Regardless of company size, establishing continuity and reliability of the organization’s data mobility functions is a vital undertaking, and selecting the correct approach and solution for data migration is essential.

There are three major approaches to migrating data in enterprise production environments: application-based (logical), file-based, and block-based (physical). Each of these migration methods has its own merits and use cases. We’ll evaluate each of the three approaches individually in this article. To start, we’ll discuss some common reasons why organizations need data migration in the first place.

Common data migration use cases

Migrating to a new location (data relocation). Data migration is needed when data and applications must be moved from one location to another, such as during a data center relocation or consolidation. These migrations are especially popular among large multinational enterprise organizations where data is frequently moved from place to place.

Migration performance and the ability to conduct live data migration are especially important in this type of migration due to the potentially limited bandwidth between the source and destination.

Migrating to new storage (storage refresh). Replacing or adding new storage is possibly the most common use case for data migration. Organizations acquire new storage for many reasons, and each storage refresh requires moving production workloads from old storage to new storage. Cost, features, reliability, and performance are among the popular reasons organizations acquire new storage.

Storage refreshes may include physical storage changes and storage protocol changes (for example, from iSCSI to Fibre Channel, from Fibre Channel to iSCSI, or to and from proprietary protocols).

The ability to transparently and non-disruptively launch and perform data migration without downtime is crucial to this type of migration to eliminate unnecessary impact on business applications in production.

Migrating to a new platform (infrastructure refresh). Infrastructure refreshes occur all the time within organizations, especially when operations scale through natural growth or acquisition or when new technology is available. These refreshes can be prompted by a desire to move application workloads from one hosting location or state to another, from physical environments to virtual environments, to private cloud or hyperconverged infrastructures, to public cloud, between cloud providers, or even when exiting the cloud to a managed data center.

Migrating storage data is usually just one part of a much wider-scoped infrastructure upgrade carried out over a longer period. Many different types of applications, operating systems, file systems, infrastructure platforms, and providers are usually involved.

As a result, having a single integrated migration solution that works natively with many platforms and vendors has become vital for efficiency and manageability for organizations that value data mobility. Using multiple tools and solutions for the scenarios detailed above can introduce unnecessary complexity and increase the risk of human error, factors that can lead to increased cost and downtime.

Application transformation. Data migration is sometimes needed when application environments or applications themselves require transformations. These may include application upgrades, consolidations, expansions, transforming monoliths to microservices, or even moving services from one type of application to another.

When an enterprise decides to transform its applications, it is usually beyond IT infrastructure-level migration as it requires broader business transformation operations.

In the rush to complete a project, developing a strategy to move the data onto the new storage environment is often pushed to the last minute. The resulting scramble often leads an organization to jump into the data migration without taking the necessary preparatory steps. It seems obvious, but to properly design and execute a data migration, an organization first needs to outline the reason for the migration. Once it understands what data needs to be migrated and why, it can explore the best way to approach the migration.

The three major approaches to migrating data are application-level, file-level, and block-level. Let’s look at each in more detail.

Application-level or logical data migration

Application data migration—sometimes called logical data migration or transaction-level migration—is a migration approach that utilizes the data mobility capabilities built natively into the application workload itself.

These capabilities are usually available only for a small number of enterprise-scale applications such as databases, virtualization hypervisors, and file servers, and they are typically designed for data protection purposes.

Technique: Some applications offer proprietary data mobility features. These capabilities usually facilitate or assist with configuring backups or secondary storage. These applications then synchronously or asynchronously ensure that the secondary storage is valid and, when necessary, can be used without the primary copy.

Application examples: PostgreSQL logical replication, Microsoft SQL Server Replication, Oracle GoldenGate, VMware Storage vMotion, and other commercial tools that migrate VMware workloads using VMware APIs.
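To make the idea of a logical, statement-level copy concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for an enterprise database. It is not how any of the products above work internally; the orders table and the logical_migrate helper are hypothetical names chosen for illustration. The point is only that the data moves as SQL statements replayed on the target, not as files or blocks.

```python
import sqlite3


def logical_migrate(source: sqlite3.Connection, target: sqlite3.Connection) -> int:
    """Replay the source database into the target as SQL statements.

    Returns the number of statements replayed.
    """
    count = 0
    # iterdump() yields the schema and the rows as SQL text, so the copy
    # happens at the statement level rather than the file or block level.
    for statement in source.iterdump():
        # Skip the transaction wrappers emitted by iterdump(); the target
        # connection manages its own transaction in this sketch.
        if statement in ("BEGIN TRANSACTION;", "COMMIT;"):
            continue
        target.execute(statement)
        count += 1
    target.commit()
    return count


# Hypothetical source database with a single table.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
source.executemany("INSERT INTO orders (item) VALUES (?)", [("disk",), ("cable",)])
source.commit()

target = sqlite3.connect(":memory:")
replayed = logical_migrate(source, target)
rows = target.execute("SELECT id, item FROM orders ORDER BY id").fetchall()
print(replayed, rows)
```

Because every row travels as its own statement, the cost grows with the number of records, which is the efficiency trade-off discussed under the limitations below.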

Advantages of application-level data migration

User interface. The native data mobility capabilities are usually integrated with the application software and can be configured using the software’s main user interface.

Deployment. With native data mobility in the software, no additional requirements or installations are generally necessary.

Compatibility and support. Native data mobility is designed only for the specific application. There is no need to worry about compatibility. If you run into trouble, the vendor typically has online support. Application-level migration may also enable other application transformation possibilities that other data migration approaches cannot provide. One example would be moving data between major database versions that are not otherwise compatible.

Limitations of application-level data migration

Limited availability. Only major large-scale enterprise applications such as databases and file servers may provide such capabilities. The key word here is “may.” Availability will depend significantly on the age and type of application you want to migrate to the latest version. 

Single-purpose. Since the data mobility features are built specifically for the individual application, the associated costs of licenses, training, and other administrative overhead will add up when used in a large migration operation.

Efficiency. Application-level data synchronization is performed logically. For example, database replications are performed at the database record, transaction, or SQL statement level. While these methods are accurate and versatile, there may be more efficient methods to synchronize data from one storage system to another or from one platform to another, especially when a large amount of data is involved.

Production impact. Logical synchronization is part of the application and therefore can use only the existing available bandwidth between the application and storage. As a result, the ability to perform data migration while simultaneously maintaining the production workload may be limited.

License cost. App-level data migration functionalities are often considered enterprise-grade features and require an additional license. Due to the software’s proprietary and single-purpose nature, there may be no viable lower-cost alternatives.

File-level data migration

File migration is just what it sounds like: a data migration performed at the file system level. It can include local and network-based file systems. File migration tools are usually integrated with popular file system types and file storage providers.

Technique: File migration tools usually scan a file system or file share (ext4, NTFS, NFS, SMB/CIFS, etc.) and copy the files to a secondary file system file by file. When a file is in use, it often cannot be copied and has to be picked up in a subsequent scan.

A few common examples include Rsync (Linux), Robocopy (Windows), Rclone (cloud), and various commercial options.
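The scan-and-copy technique can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm used by Rsync, Robocopy, or Rclone: the sync_tree helper is a hypothetical name, and it assumes that a file with the same size and modification time at the destination is unchanged. Files that raise an error (for example, because they are locked) are returned so a later scan can retry them.

```python
import os
import shutil
import tempfile


def sync_tree(src_root: str, dst_root: str) -> list[str]:
    """Copy new or changed files from src_root to dst_root, one file at a time.

    Returns the relative paths that could not be copied and need a later scan.
    """
    deferred = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        dst_dir = os.path.normpath(os.path.join(dst_root, rel_dir))
        os.makedirs(dst_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(dst_dir, name)
            try:
                src_stat = os.stat(src)
                if os.path.exists(dst):
                    dst_stat = os.stat(dst)
                    # Simplifying assumption: same size and mtime means unchanged.
                    if (src_stat.st_size == dst_stat.st_size
                            and int(src_stat.st_mtime) == int(dst_stat.st_mtime)):
                        continue
                # copy2 copies the data plus timestamps and permission bits.
                shutil.copy2(src, dst)
            except OSError:
                # For example, a file locked by a running application.
                deferred.append(os.path.relpath(src, src_root))
    return deferred


# Small demonstration on temporary directories (hypothetical share names).
with tempfile.TemporaryDirectory() as workdir:
    src_root = os.path.join(workdir, "old_share")
    dst_root = os.path.join(workdir, "new_share")
    os.makedirs(os.path.join(src_root, "reports"))
    with open(os.path.join(src_root, "reports", "q1.txt"), "w") as f:
        f.write("quarterly numbers")
    print("deferred after first pass:", sync_tree(src_root, dst_root))
```

Note that every file costs at least one metadata lookup per scan, even when nothing has changed, which is why file counts matter so much in the limitations below.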

Advantages of file-level data migration

Interoperability. Most applications today are built using files as persistent storage. File migration can be a general mechanism for migrating different applications in different configurations. The migration tool is therefore separate from the application.

Technically simple. File data can be accessed using the same well-established APIs provided by operating systems that most applications already use. Therefore, file migration operations usually involve less specialized knowledge and technique that could introduce errors if not performed correctly.

Available tools. Many file-level data synchronization tools are free or open source, including tools distributed with major operating systems.

Compatibility. During an application or platform transformation, there may be times when the migration must be performed from one type of file system or file share to another. File migration naturally supports these transformations because data synchronization is performed on a file-to-file basis.

Limitations of file-level data migration

Administrative overhead. In a typical application environment, you will find an enormous number of files and file systems. Managing the migration of all files and file systems could incur significant unnecessary administrative and management overhead. For example, if the organization is relocating an entire data center, the time and management required for a file-by-file migration could be burdensome enough to delay the move significantly.

Efficiency. Like migrating at an application record or transaction level, migrating a large amount of data file by file can be inefficient, especially in active environments with a high rate of data change. The resources required to manage such a migration are usually higher as well.

Applications such as databases that frequently change file data (keeping files opened and locked) may in some cases make file migration extremely inefficient or even impossible.

File metadata. File metadata, such as ACLs, can be very complex. Many basic tools do not provide adequate support. The lack of adequate metadata support can be problematic when migrating across platforms.

Data integrity. With file migration, only file data is synchronized. The internal structure and metadata of a file system are not. Leaving metadata behind is a problem for some organizations that must independently verify the data’s integrity after the migration. There is no easy way to discover missing or corrupted files. 

In contrast, if a file system is migrated entirely, including internal file system structures and metadata, any data corruption or missed data would likely render the file system unmountable and could be detected by file system checks. The chance that only file data is corrupted, and not the file system itself, is so small as to be negligible.
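For a sense of what independent verification of a file-level migration involves, the sketch below compares SHA-256 checksums of every file in the source and destination trees. It is one possible approach under simple assumptions (both trees are readable and quiescent), and the helper names are hypothetical. It checks file contents only, not ACLs or other metadata, and it requires a full read of both copies.

```python
import hashlib
import os
import tempfile


def file_digest(path: str) -> str:
    """Return the SHA-256 hex digest of one file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root: str) -> dict[str, str]:
    """Map each file's path (relative to root) to its digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            digests[os.path.relpath(path, root)] = file_digest(path)
    return digests


def compare_trees(src_root: str, dst_root: str) -> tuple[list[str], list[str]]:
    """Return (files missing at the destination, files whose contents differ)."""
    src = tree_digests(src_root)
    dst = tree_digests(dst_root)
    missing = sorted(set(src) - set(dst))
    mismatched = sorted(p for p in src if p in dst and src[p] != dst[p])
    return missing, mismatched


# Demonstration: one file was never copied, another was corrupted in transit.
with tempfile.TemporaryDirectory() as workdir:
    src_root = os.path.join(workdir, "src")
    dst_root = os.path.join(workdir, "dst")
    os.makedirs(src_root)
    os.makedirs(dst_root)
    for name, data in (("a.txt", b"alpha"), ("b.txt", b"bravo")):
        with open(os.path.join(src_root, name), "wb") as f:
            f.write(data)
    with open(os.path.join(dst_root, "a.txt"), "wb") as f:
        f.write(b"alphX")
    missing, mismatched = compare_trees(src_root, dst_root)
    print("missing:", missing, "mismatched:", mismatched)
```

The work scales with the number of files as well as the amount of data, so on very large file systems this check can take as long as the migration itself.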

Block-level data migration

Block-level data migration is performed at the storage volume level. Block-level migrations are not concerned with what the data stored within the storage volume actually is. Rather, they cover everything on the volume: file system data of any kind, partitions of any kind, raw block storage, and data from any applications.

Technique: Block-level migration tools synchronize one storage volume to another storage volume from the beginning of the volume (byte 0) to the end of the entire volume (byte N) without processing any data content. All data is synchronized, resulting in a byte-for-byte identical destination copy of the migrated source volume.

Examples: The dd command (Linux), Cirrus Migrate Cloud, Cirrus Migrate On-Premises, and other commercial migration and disaster recovery tools.
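The byte-0-to-byte-N copy can be illustrated with a short Python sketch in the spirit of dd. Ordinary files stand in for storage volumes here, and the copy_volume and volume_digest helpers are hypothetical names. This is an offline copy only; a live migration would also need to track writes that land on the source while the copy is running, which this sketch does not attempt.

```python
import hashlib
import os
import tempfile

BLOCK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary choice for this sketch


def copy_volume(src_path: str, dst_path: str, block_size: int = BLOCK_SIZE) -> int:
    """Copy src to dst sequentially from byte 0 to the end; return bytes copied."""
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:
                break
            # The block is written as-is; its contents are never interpreted.
            dst.write(block)
            copied += len(block)
    return copied


def volume_digest(path: str, block_size: int = BLOCK_SIZE) -> str:
    """Return one SHA-256 digest covering the whole volume image."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.hexdigest()


# Demonstration with a small image file standing in for a storage volume.
with tempfile.TemporaryDirectory() as workdir:
    src_path = os.path.join(workdir, "source.img")
    dst_path = os.path.join(workdir, "destination.img")
    with open(src_path, "wb") as f:
        f.write(os.urandom(3 * 4096 + 17))
    copied = copy_volume(src_path, dst_path, block_size=4096)
    identical = volume_digest(src_path) == volume_digest(dst_path)
    print(copied, identical)
```

A single digest over the whole image is enough to confirm that the destination is identical, regardless of how many files or file systems the volume contains.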

Advantages of block-level data migration

Administrative efficiency. Organizations relocating their data centers or refreshing their storage typically see material efficiency advantages. In these scenarios, the goal is to create an identical copy of the storage volumes in the new location or storage product. The data migration is performed as one identical unit regardless of how much data is being transferred, how many files are stored within the storage devices, or how many different types of data are on the storage devices.

Performance. Synchronizing data at the block level allows more efficient copying through more granular change tracking, larger block I/O, sequential access, and so on. Migrating an entire storage volume as a unit also enables more advanced data reduction capabilities.

Fundamentally versatile. Block migration migrates data as one unit at the infrastructure level. There are no file system or application support or compatibility concerns because the block-level migration process does not require processing any data that resides on a storage device. Any applications or any file systems—from VMware’s VMFS, to hyperconverged environments, to horizontally scaled software-defined storage—can be migrated without any data content processing necessary.

Data security. Block-level migration is the only genuinely secure approach to data migration because the migration tool does not interpret any application or file data during the entire migration. It is even possible to migrate an encrypted file system without having the key to the file system.

Raw storage support. In specialized applications that do not consume data from a file system or that use a proprietary file system, block-level migration can be the only way to accomplish an accurate and volume-consistent migration.

Data integrity. Block-level migrations are much more straightforward compared to other migration approaches. The block-level data is mostly copied sequentially, and the entire storage device is synchronized as one unit. As a result, the data integrity of a completed migration can be independently verified with much less effort.

True live migration. Migration tools that perform block-level migration can migrate truly live data. It does not matter how that data is used in production. Whether the data is contained in a database or a file archive, whether files are constantly opened and locked, or even if file permissions change, block-level migration is always performed in the same manner.
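The change tracking mentioned under Performance above can be approximated in a short sketch. Real block-level tools typically record which blocks were written as the writes happen; as a stand-in, the hypothetical resync_changed_blocks helper below compares the two volumes block by block and rewrites only the blocks that differ. It assumes both images are the same size and uses ordinary files in place of storage volumes.

```python
import os
import tempfile


def resync_changed_blocks(src_path: str, dst_path: str, block_size: int = 4096) -> list[int]:
    """Rewrite only the destination blocks that differ from the source.

    Returns the indexes of the blocks that were rewritten.
    Assumes both volumes have the same size.
    """
    changed = []
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        index = 0
        while True:
            src_block = src.read(block_size)
            if not src_block:
                break
            dst.seek(index * block_size)
            dst_block = dst.read(len(src_block))
            if src_block != dst_block:
                # Seek back and overwrite just this block.
                dst.seek(index * block_size)
                dst.write(src_block)
                changed.append(index)
            index += 1
    return changed


# Demonstration: after an initial full copy, one block changes on the source.
with tempfile.TemporaryDirectory() as workdir:
    src_path = os.path.join(workdir, "source.img")
    dst_path = os.path.join(workdir, "destination.img")
    image = bytes(4 * 4096)  # four zero-filled blocks
    for path in (src_path, dst_path):
        with open(path, "wb") as f:
            f.write(image)
    # Simulate a production write that lands in block 2 of the source.
    with open(src_path, "r+b") as f:
        f.seek(2 * 4096 + 10)
        f.write(b"x")
    changed = resync_changed_blocks(src_path, dst_path)
    with open(src_path, "rb") as a, open(dst_path, "rb") as b:
        in_sync = a.read() == b.read()
    print(changed, in_sync)
```

Only one block is transferred in the second pass, no matter which file or application produced the write.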

Limitations of block-level data migration

Technically sophisticated. Although conceptually straightforward, block-level migrations are technically sophisticated. Unlike other migration approaches, block-level migration often involves specialized knowledge and techniques instead of the readily available OS-provided APIs. These include knowledge of Fibre Channel and iSCSI protocols, low-level OS-specific kernel operations, etc.

Scarcity of tools. Due to the sophistication and specialized nature of a block-level migration, fewer block-level migration tools are available. There are even fewer purpose-built, block-level migration tools, as most block-level synchronization solutions available today are designed for data protection and disaster recovery purposes.

Application transformation. Block-level migration provides an excellent way to migrate any data. However, when the application is being transformed, and the data needs to be changed, application-specific tools may be necessary. For example, when migrating an Oracle Database instance from an AIX host to a Linux host, an application-level logical migration may be preferable due to the byte-order differences between the two platforms’ architectures (AIX runs on big-endian POWER hardware, while Linux hosts are typically little-endian x86 systems).
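The byte-order issue is easy to see with Python's struct module. The snippet below says nothing about Oracle's actual on-disk formats; it only shows, as a generic illustration, that the same four bytes yield different integers depending on the byte order of the reader, which is why a byte-identical copy is not always usable on a different architecture.

```python
import struct

value = 1

# Encode the integer as a big-endian host would store it.
big_endian_bytes = struct.pack(">I", value)  # b'\x00\x00\x00\x01'

# A little-endian reader interprets the same bytes as a different number.
misread = struct.unpack("<I", big_endian_bytes)[0]  # 16777216

# Reading with the original byte order recovers the value.
correct = struct.unpack(">I", big_endian_bytes)[0]  # 1

print(big_endian_bytes, misread, correct)
```

A block-level copy preserves the bytes exactly, so any conversion of this kind has to be done by a tool that understands the data, which is where application-level migration fits.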

Application, file, or block?

As the volume of data that needs to be stored continues to balloon, organizations across the globe are wrestling with not only where to keep their data but how to optimize their storage environments. As storage technologies continue to advance, and the cloud becomes viable for high-performance databases and applications, data migration and data mobility become significant considerations.

The conversations about data types, goals, and ways to control storage costs are now taking center stage. The first step in the journey starts with understanding the options and then aligning the strategy to the goal.  

Sammy Tam is the vice president of engineering for Cirrus Data Solutions. As a founding member of the R&D team at Cirrus Data, Sammy has been instrumental in developing block-level data migration technologies and software. Based in Syosset, NY, Sammy leads the worldwide engineering and development team. For more information, visit www.cirrusdata.com.

New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to newtechforum@infoworld.com.