DESIGN
In the Design phase, the team works through each data element in every source data structure, deciding whether to migrate that element and, if so, where it maps into the new structure.
This phase of data migration must advance in parallel with the analysis phase of the main project, because each data element identified as a candidate for migration is likely to result in a change to the data model.
The Design phase is not intended to identify in detail the conversion rules by which old data will be made to fit the new system; rather, it is the act of making a list of the old data elements that we intend to migrate.
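As a concrete illustration, these element-by-element decisions can be captured in a simple inventory. The sketch below is hypothetical: the legacy structure, element names and target mappings are assumptions, not taken from any particular project.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ElementDecision:
    """One row of the Design-phase inventory: a source data element and its fate."""
    source_structure: str        # legacy file or table
    source_element: str          # legacy field name
    migrate: bool                # decision: carry the element forward or not
    target_table: Optional[str]  # where it maps in the new structure, if migrated
    target_column: Optional[str]
    note: str = ""

# Hypothetical inventory entries for a legacy customer master file.
inventory = [
    ElementDecision("CUSTMAST", "CUST-NO",   True,  "customer", "customer_id"),
    ElementDecision("CUSTMAST", "CUST-NAME", True,  "customer", "full_name"),
    ElementDecision("CUSTMAST", "TELEX-NO",  False, None, None, "obsolete - not migrated"),
]

for d in inventory:
    target = f"{d.target_table}.{d.target_column}" if d.migrate else "(dropped)"
    print(f"{d.source_structure}.{d.source_element:<10} -> {target}  {d.note}")
```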
BUILD
The Build phase of migration must proceed in parallel with the design phase of the main project. The project team begins the Build phase by generating the new data structures and creating them in the database.
Data mapping must be performed against the physical data model rather than the logical data model.
With the physical data structures in place, the project team can begin the mapping process.
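A minimal sketch of this step is shown below, using SQLite purely for illustration; the table and column names are assumed, and a real project would generate the DDL for its target DBMS from the physical model.

```python
import sqlite3

# Illustrative DDL derived from the physical data model (names are hypothetical).
DDL = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    full_name   TEXT NOT NULL,
    created_on  TEXT NOT NULL            -- ISO-8601 date, e.g. 2024-01-31
);
CREATE TABLE account (
    account_id  INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    balance     NUMERIC NOT NULL DEFAULT 0
);
"""

conn = sqlite3.connect("target.db")
conn.executescript(DDL)   # create the physical structures the mapping will target
conn.commit()
conn.close()
```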
A team of at least three people from each core business area generally conducts the mapping process. Each team should be composed of:
- Business analyst, ideally an end user possessing intimate knowledge of the old data to be migrated.
- Systems analyst with knowledge of both the old and new systems.
- Systems engineer who performs data research and develops migration programs and scripts based upon the mappings defined jointly by the business analyst and the systems analyst.
During this phase the actual data migration code is developed, or the scripts are written if a specific data migration tool has been chosen.
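The skeleton below illustrates what such a hand-written migration script might look like, reusing the hypothetical customer structures from the earlier sketches; the mapping rules shown are assumptions, not prescribed ones.

```python
import csv
import sqlite3

def transform(row: dict) -> tuple:
    """Apply the mapping rules agreed by the business and systems analysts.
    The field names and rules here are purely illustrative."""
    return (
        int(row["CUST-NO"]),                # legacy key becomes the new surrogate id
        row["CUST-NAME"].strip().title(),   # tidy the name on the way in
        row["OPEN-DATE"],                   # assumed already ISO formatted
    )

def migrate(source_csv: str, target_db: str) -> int:
    """Load one legacy extract into the new customer table; returns rows loaded."""
    conn = sqlite3.connect(target_db)
    loaded = 0
    with open(source_csv, newline="") as f:
        for row in csv.DictReader(f):
            conn.execute(
                "INSERT INTO customer (customer_id, full_name, created_on) VALUES (?, ?, ?)",
                transform(row),
            )
            loaded += 1
    conn.commit()
    conn.close()
    return loaded

# Usage: migrate("CUSTMAST_extract.csv", "target.db")
```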
The following issues must also be addressed during this phase:
- Resolving data incompatibilities, irregularities and exceptions during conversion. The tool should be able to produce reports of data irregularities and integrity violations; the source data will then need to be cleansed by the end users or the legacy system administrators (a report sketch follows this list).
- Maximizing code efficiency by taking full advantage of parallelism and performance tuning, since the timeframe for the migration can be quite tight.
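A sketch of such an irregularity report follows, scanning several legacy extracts in parallel with Python's concurrent.futures; the checks, field names and file layout are assumptions made purely for illustration.

```python
import csv
import concurrent.futures

def check_row(row: dict) -> list[str]:
    """Return the irregularities found in one legacy record (illustrative rules)."""
    problems = []
    if not row.get("CUST-NO", "").isdigit():
        problems.append(f"non-numeric key: {row.get('CUST-NO')!r}")
    if not row.get("CUST-NAME", "").strip():
        problems.append("missing customer name")
    return problems

def scan_file(path: str) -> list[tuple[int, list[str]]]:
    """Scan one extract and collect (line number, problems) pairs."""
    results = []
    with open(path, newline="") as f:
        for line_no, row in enumerate(csv.DictReader(f), start=2):
            problems = check_row(row)
            if problems:
                results.append((line_no, problems))
    return results

def report_irregularities(extract_files: list[str], report_path: str) -> None:
    """Scan the extracts in parallel and write a report for the end users or
    legacy system administrators to act on."""
    with open(report_path, "w") as report, \
         concurrent.futures.ProcessPoolExecutor() as pool:
        for path, findings in zip(extract_files, pool.map(scan_file, extract_files)):
            for line_no, problems in findings:
                report.write(f"{path}:{line_no}: {'; '.join(problems)}\n")
```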
TEST
The Testing phase deals with two main subject areas: logical errors and physical errors.
Physical errors are syntactical in nature and can be straightforwardly identified and corrected. They do not indicate errors in the mapping process; rather, they arise from misapplying the rules of the scripting language used in the conversion process.
Running the data through the scripts is where logical errors are identified and resolved.
The first step is to execute the mapping. Even if the mapping completes successfully, we must still verify the following (a validation sketch appears after this list):
- The number of records the script is expected to create.
- Whether the correct number of records was actually created, and resolve any discrepancies.
- Whether the data was loaded into the correct fields.
- Whether the data was correctly formatted.
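A sketch of these post-load checks is shown below, written against the hypothetical customer table used in the earlier sketches; the expected formats are assumptions.

```python
import sqlite3

def validate_load(expected_rows: int, target_db: str) -> None:
    """Post-load checks: record count, field placement and formatting.
    Table and column names are illustrative."""
    conn = sqlite3.connect(target_db)

    # Did the expected number of records get created?
    (loaded,) = conn.execute("SELECT COUNT(*) FROM customer").fetchone()
    assert loaded == expected_rows, f"expected {expected_rows} rows, loaded {loaded}"

    # Did the data land in the correct fields? (names should not look like keys)
    (bad_names,) = conn.execute(
        "SELECT COUNT(*) FROM customer WHERE full_name GLOB '[0-9]*'"
    ).fetchone()
    assert bad_names == 0, f"{bad_names} rows look like keys loaded into full_name"

    # Was the data correctly formatted? (dates must be ISO yyyy-mm-dd)
    (bad_dates,) = conn.execute(
        "SELECT COUNT(*) FROM customer "
        "WHERE created_on NOT GLOB '[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]'"
    ).fetchone()
    assert bad_dates == 0, f"{bad_dates} rows have a malformed created_on value"

    conn.close()
```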
The most effective test of data mapping is providing the newly loaded databases to the users who were involved in the analysis and design of the core system. Quite often these users will begin to identify other old data elements that should be migrated but that were not identified during the analysis and design phases.
Data mapping often does not make sense to many end users until they can physically interact with the newly loaded databases. Most likely, this is where the majority of conversion and mapping requirements will be identified. Most people do not realize that they have overlooked something until it is no longer there. For this reason, it is extremely important to unleash the end users upon the newly loaded databases sooner rather than later.
The data-migration testing phase must be reached as early as possible to guarantee that it takes place prior to the design and build phases of the main core project. Otherwise, months of development effort can be wasted, since each newly identified migration requirement affects the data model. This, in turn, calls for painful amendments to the applications built upon the data model.
REFINE
The Refinement phase is where data cleansing is managed; it consists of modifications to the data model, adjustments to the conversion rules, and changes to the scripts.
Scripting may not cover all of the data cleansing situations; a certain degree of manual data cleansing may be required, and it is the user community that can most effectively complete this manual task.
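The sketch below illustrates the split: scripted rules handle what they can, and the remainder is queued for manual cleansing by the users. The rules and names are assumed for illustration only.

```python
import sqlite3

def cleanse(target_db: str, manual_queue_path: str) -> None:
    """Apply scripted cleansing rules, then queue whatever remains for manual
    cleansing by the user community. Rules and names are illustrative."""
    conn = sqlite3.connect(target_db)

    # Scripted rule: strip stray whitespace from names.
    conn.execute("UPDATE customer SET full_name = TRIM(full_name)")

    # Whatever the script cannot resolve goes to the manual queue.
    leftovers = conn.execute(
        "SELECT customer_id, full_name FROM customer "
        "WHERE full_name IN ('', 'N/A', 'UNKNOWN')"
    ).fetchall()
    with open(manual_queue_path, "w") as queue:
        for customer_id, name in leftovers:
            queue.write(f"customer {customer_id}: name {name!r} needs manual cleansing\n")

    conn.commit()
    conn.close()
```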
It is important to know that the Testing and Refinement steps will be iterative. Expect to go through these two steps several times before the data can be declared completely clean.
At this point, it is essential that both the logical and the physical data models be maintained concurrently. One has to recognise that this will double the administrative workload for the keeper of the data models. One must maintain continuity between the logical and physical designs.
Tools can be used to maintain the relationship between the logical and physical models, although this will require several in-house reports to be developed. For example, the project team will want reports that indicate discrepancies between entities/tables and attributes/columns. These reports will highlight any mismatch between the number of entities and tables, and between attributes and columns. They should also identify any naming convention violations and point out any data definition discrepancies. It is prudent to select a tool that provides an API (Application Programming Interface) to the metadata; this will definitely be needed.
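The comparison such reports perform can be sketched as follows; here the metadata is hard-coded, whereas in practice it would be read through the tool's metadata API, and the naming convention shown is an assumption.

```python
# Hypothetical metadata extracts; in practice these would be read via the
# modelling tool's API rather than hard-coded.
logical = {
    "Customer": {"Customer Id", "Full Name", "Created On"},
    "Account":  {"Account Id", "Customer Id", "Balance"},
}
physical = {
    "customer": {"customer_id", "full_name", "created_on"},
    "account":  {"account_id", "customer_id"},   # 'balance' column is missing
}

def to_physical_name(name: str) -> str:
    """Naming convention assumed here: lower case with underscores."""
    return name.lower().replace(" ", "_")

# Entity/table and attribute/column discrepancies.
for entity, attributes in logical.items():
    table = to_physical_name(entity)
    if table not in physical:
        print(f"entity {entity!r} has no corresponding table")
        continue
    expected = {to_physical_name(a) for a in attributes}
    for column in sorted(expected - physical[table]):
        print(f"table {table!r} is missing column {column!r}")
    for column in sorted(physical[table] - expected):
        print(f"table {table!r} has column {column!r} not in the logical model")

for table in sorted(set(physical) - {to_physical_name(e) for e in logical}):
    print(f"table {table!r} has no corresponding entity")
```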