“Data is the new gold” is a popular phrase nowadays. Companies increasingly rely on their valuable data assets to optimize their operations and make strategic decisions, both of which have a significant impact on their business roadmap. While their applications are moving to the cloud at an ever faster pace, it is time to consider moving their data assets to the cloud as well. Not every company is confident enough to migrate both at the same time; many choose to operate a hybrid cloud model. When they do want to migrate their data, there are many factors to consider. In this article, we’ll explore key aspects of large data migration projects. By following these guidelines, you make sure your project starts off well.
Definition
As with many technical terms, there are multiple definitions of data migration. Microsoft defines it as follows:
Data migration is the process of selecting, preparing, extracting, and transforming data and permanently transferring it from one computer storage system to another.
So data migration is a lot more than just moving raw data assets from one location to another. To narrow our focus, we’ll only explore guidelines for migrating from an on-premises data center to the cloud. We’ll skip migration projects from one data center to another and from cloud provider A to cloud provider B.
Migration versus conversion
People often mix up data migration and data conversion. The two terms are not synonyms: they serve different purposes, and data conversion is only one step in an overall migration project.
The following statements make the difference between them clearer:
- Data migration might include data conversion but this is not an absolute necessity (think of lift and shift of your data).
- Data conversion transforms the format of the data; migration by itself does not.
- Converted data requires a new application to actually use (read and write) it; the system, environment, or data center does not change.
Projects to make the data available in the public cloud include data conversion and migration.
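To make the conversion step concrete, here is a minimal sketch of one common format change: turning a legacy CSV export into JSON records. The function name and sample data are illustrative, not taken from any specific tool.

```python
import csv
import io
import json

def convert_csv_to_json(csv_text: str) -> str:
    """Convert CSV text into a JSON array of records (one dict per row)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, indent=2)

# A tiny legacy export, converted before being migrated to the new system.
legacy_export = "id,name\n1,Alice\n2,Bob\n"
print(convert_csv_to_json(legacy_export))
```

A full migration project would wrap a step like this in the wider process of selecting, preparing, extracting, and transferring the data.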
Justifications to migrate
Business owners need a proper justification before they will sign off on a data migration project. Large data projects are often called “big data” projects, characterized by high volume, high velocity, and high variety. All of these characteristics require extensive computing power and fast, reliable storage, which cloud providers offer.
Building on these benefits, the following justifications make up the business case to satisfy business owners:
- Getting rid of legacy systems: convert your data into more modern formats and make it available to your end-users through state-of-the-art applications.
- Lower costs: cloud services follow a “pay per use” model. This also gives you the option to try out new, more sophisticated services, which are often released in the cloud first.
- More scalability and flexibility: it’s easier and much faster to use scalable cloud services that are available on demand, offering more flexibility for your data storage and processing needs.
- Better security: this tends to conflict with the arguments of security professionals who still think in “old world” terms. However, various research projects have concluded that data is often more secure in the cloud than in on-premises solutions.
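The “pay per use” argument can be sketched with a toy cost model. All figures below are hypothetical placeholders; a real business case would plug in actual quotes from your provider and your own on-premises accounting.

```python
# Hypothetical figures for illustration only -- substitute your own quotes.
CLOUD_RATE_PER_TB_MONTH = 25.0    # assumed storage price per TB per month
CLOUD_COMPUTE_PER_HOUR = 1.50     # assumed on-demand compute price per hour

def monthly_cloud_cost(storage_tb: float, compute_hours: float) -> float:
    """Pay per use: the bill scales with actual consumption, not capacity."""
    return storage_tb * CLOUD_RATE_PER_TB_MONTH + compute_hours * CLOUD_COMPUTE_PER_HOUR

# A quiet month versus a busy month for the same 100 TB dataset:
print(monthly_cloud_cost(100, 200))    # 2800.0
print(monthly_cloud_cost(100, 2000))   # 5500.0
```

The point of the sketch is that cloud costs move with usage, whereas on-premises hardware is a fixed cost regardless of how busy a given month is.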
After these arguments, it’s time to find out which data types might make the move to the cloud.
Different types of data
The migration types most commonly named on the internet are storage migration, database migration, application migration, data-center migration, business process migration, and cloud migration.
In short, the main characteristics of these migration types are:
- Storage migration: transporting data from one physical or virtual environment to another, often to use upgraded hardware or to replace unstructured data (paper) with structured data in a digital format.
- Database migration: includes upgrading the way the data is stored as well as moving to a completely different database solution. Think of using PostgreSQL in the cloud as opposed to Oracle DB on-premises.
- Application migration: companies adopt new or different software applications that deal with data differently.
- Data-center migration: relocating physical servers, network equipment, storage devices, etc. to a new location.
- Business process migration: transforming and/or transferring business applications along with their customer data, products, processes, statistical data, databases, etc. to a new environment.
- Cloud migration: setting up the entire cloud environment that suits the needs of the organization, as well as incorporating the (base) services that applications and workloads require.
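The database migration type above follows an extract-and-load pattern. As a sketch, the snippet below uses two in-memory SQLite databases to stand in for the source (say, Oracle DB) and the target (say, cloud-hosted PostgreSQL); the real drivers and SQL dialects differ, but the copy pattern is the same.

```python
import sqlite3

# In-memory SQLite databases stand in for the real source and target engines.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")

# Existing data on the source side.
source.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
source.executemany("INSERT INTO customers VALUES (?, ?)",
                   [(1, "Alice"), (2, "Bob")])

# Recreate the schema on the target, then copy the rows over.
target.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
rows = source.execute("SELECT id, name FROM customers").fetchall()
target.executemany("INSERT INTO customers VALUES (?, ?)", rows)

count = target.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(count)  # 2
```

In practice you would batch the reads, map incompatible column types between the two engines, and verify the copy afterwards.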
For every IT product, it’s important to consider which migration types apply. Every type needs a proper approach to make sure the migration is fast, secure, straightforward, and properly executed.
Planning and validation
Data migration efforts require proper planning, and validation of the data once the transfer is complete. Validation is not just an IT activity: business owners need to confirm the data is still usable for them, and IT auditors validate the data (requirements) after the various migration steps have finished.
It is beneficial to have a dedicated, experienced (Agile) team carry out the required tasks. A Product Owner makes sure the right stakeholders are on board and aligned on the right features.
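One simple, automatable validation check is to compare a fingerprint of the source and migrated datasets. The sketch below (an illustrative helper, not a standard tool) combines a row count with an order-insensitive hash, so a reordered but otherwise identical copy still validates.

```python
import hashlib

def dataset_fingerprint(rows):
    """Return (row count, combined hash); insensitive to row order."""
    digests = sorted(hashlib.sha256(repr(r).encode()).hexdigest() for r in rows)
    combined = hashlib.sha256("".join(digests).encode()).hexdigest()
    return len(rows), combined

source_rows = [(1, "Alice"), (2, "Bob")]
migrated_rows = [(2, "Bob"), (1, "Alice")]  # same data, different order

assert dataset_fingerprint(source_rows) == dataset_fingerprint(migrated_rows)
print("validation passed")
```

Checks like this give IT auditors and business owners a concrete artifact to sign off on after each migration step.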
Transferring your data
One of the main tasks of the migration is the actual transfer of your data. This might seem easy, but it can be very challenging due to the following risk factors:
- Data loss: you can lose data if the transfer is not done correctly. For example, if you delete the source data immediately after the transfer finishes and the integrity of your files turns out to be broken, you have to rely on a backup being restored.
- Security: data in transit poses a security threat since it might be transferred over the public internet. Even if you have secured your channels, something can go wrong, such as losing a decryption key. Also take the new storage location into account: data at rest should be secured as well.
- Speed: especially with large quantities of data, the actual file transfer can take very long. Various online tools, such as the Omni Calculator, estimate the transfer time based on the intended file size and transfer speed. On top of this, you need to make sure the transfer does not harm current network usage, so it does not disturb other employees or (critical) systems.
- Cost: it’s often easy to get your data into the cloud, but difficult and costly to get it out. Even so, uploading large quantities of data comes at a cost too. Take into account personnel costs (think of up-front training), product costs (if using a commercial tool), network costs, and storage and operations costs. Furthermore, you need to pay for the cloud platform infrastructure, which needs to be in place up-front.
Other considerations include choosing between online and offline transfer. There are also online calculators that can help with this choice.
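The arithmetic behind those transfer-time calculators is straightforward enough to sketch yourself. The 80% utilization figure below is an assumption; sustained throughput on a shared line is typically well below the nominal rate.

```python
def transfer_days(data_tb: float, mbit_per_s: float, utilization: float = 0.8) -> float:
    """Rough online-transfer estimate in days (decimal TB, nominal Mbit/s)."""
    bits = data_tb * 8 * 10**12                       # terabytes to bits
    seconds = bits / (mbit_per_s * 10**6 * utilization)
    return seconds / 86_400                           # seconds per day

# 50 TB over a 1 Gbit/s line at 80% sustained utilization:
print(round(transfer_days(50, 1000), 1))  # 5.8
```

If an estimate like this runs into weeks or months, an offline transfer (shipping physical appliances) usually becomes the better option.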
Tools
Various tools exist today that can help with the actual transformation and transportation of your data. Some are free of charge, whereas others are very sophisticated and can handle many use cases and scenarios.
Self-scripted tools are often only sufficient for small amounts of data and on-premises destinations where scaling is not required. They only provide local access and do not account for high uptime and reliability, for example.
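A typical self-scripted tool looks something like the sketch below: a naive directory copy with no retries, no resume, and no parallelism. It works for small local moves, which is exactly why it falls short for large-scale or cloud migrations. The function name is illustrative.

```python
import shutil
import tempfile
from pathlib import Path

def naive_sync(src_dir: Path, dst_dir: Path) -> int:
    """Copy every file under src_dir to dst_dir; returns the file count.
    No retries, no resume, no parallelism -- the hallmarks of a self-script."""
    copied = 0
    dst_dir.mkdir(parents=True, exist_ok=True)
    for f in src_dir.rglob("*"):
        if f.is_file():
            rel = f.relative_to(src_dir)
            (dst_dir / rel).parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, dst_dir / rel)
            copied += 1
    return copied

# Demo on a small temporary tree:
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src"
    (src / "sub").mkdir(parents=True)
    (src / "a.txt").write_text("hello")
    (src / "sub" / "b.txt").write_text("world")
    n = naive_sync(src, Path(tmp) / "dst")
    print(n)  # 2
```

Everything a script like this lacks (checksums, scheduling, bandwidth throttling, remote endpoints) is what the dedicated tools below provide.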
On-premises tools are better suited for large quantities of data and common sources and destinations, and they enable scaling through various techniques. Still, they are not suited for an on-premises-to-cloud migration scenario.
In contrast to the previous tools, cloud-based tools support multi-site sources, not only common sources and destinations. They also offer global access plus on-demand compute and storage with high uptime and reliability.
A selection of common tools that organizations use nowadays:
- Flyway database migration tool for Java applications
- Various AWS cloud services to host data on Amazon FSx for Lustre
- Azure DataLake migration considerations and supporting tools
This list is just the tip of the iceberg. Be sure to find the tools most suitable for your large datasets.
Concluding words
As seen in this article, migrating data from an on-premises data center to the cloud is not always as easy as it seems. There are numerous factors to take into account: costs can run high, transfer times can be huge, and you must carefully consider whether to convert the data or just migrate it. Business owners demand a proper financial trigger before you carry out costly projects. Various tools, ranging from self-created scripts to advanced cloud-based ones, help you migrate your large datasets smoothly.