THE BUTTERFLY METHOD
Wu et al. (1997) developed the Butterfly methodology; their paper is a seminal publication in the field of data migration. The objective of the methodology is to migrate from a legacy system to a target system from the data perspective. The methodology avoids the problem of keeping the legacy and target systems running together and having to maintain consistency between them: once the target system is deployed in production, the legacy system is no longer used, and during migration the target system is not yet deployed. This strategy was chosen because of the technical challenges that consistency maintenance involves and the lack of a general solution to that problem. To manage this, the methodology makes the legacy data store read-only when the migration process begins and uses temporary data stores (TempStores) for data that must be stored before the target system is put into production. The TempStores are accessed through a Data Access Allocator that redirects requests to the correct store. The migration is supported by a tool developed specifically for this task, called Chrysaliser. During migration, each data store in turn is made read-only and migrated, and this continues incrementally until the amount of data in a data store falls below a threshold value set at the start of the project. As a result, the last data store can be migrated with little effort and in a short time, so the system does not suffer from long downtime. After this last migration step, the target system is put into production.
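The TempStore cycle described above can be illustrated with a small in-memory simulation. This is a minimal sketch under stated assumptions, not Wu et al.'s implementation or the Chrysaliser tool: data stores are plain Python dicts, the `transform` argument stands in for Chrysaliser's transformation, the pre-scripted `update_batches` stand for the writes users make while a store is being migrated, and the names `DataAccessAllocator` and `butterfly_migrate` are hypothetical. The threshold is assumed to be a record count of the newest TempStore.

```python
from dataclasses import dataclass, field


@dataclass
class DataAccessAllocator:
    """Hypothetical Data Access Allocator: routes requests between the frozen
    legacy store and the chain of TempStores."""
    legacy: dict
    temp_stores: list = field(default_factory=list)

    def new_temp_store(self) -> dict:
        # Opening a new TempStore freezes the previous one: from now on,
        # all writes land in the new store.
        store: dict = {}
        self.temp_stores.append(store)
        return store

    def write(self, key, value) -> None:
        # The legacy store and older TempStores are read-only.
        self.temp_stores[-1][key] = value

    def read(self, key):
        # The newest value wins: search TempStores newest-first, then legacy.
        for store in reversed(self.temp_stores):
            if key in store:
                return store[key]
        return self.legacy.get(key)


def butterfly_migrate(legacy, transform, threshold, update_batches):
    """Migrate `legacy`, then each frozen TempStore in turn, until the newest
    TempStore holds at most `threshold` records (assumed size measure)."""
    daa = DataAccessAllocator(legacy)
    target: dict = {}
    frozen = legacy  # the legacy store is read-only once migration starts
    batches = iter(update_batches)
    while True:
        current = daa.new_temp_store()
        # Simulated user writes arriving while `frozen` is being migrated.
        for key, value in next(batches, {}).items():
            daa.write(key, value)
        # Stand-in for Chrysaliser: transform and load the frozen store.
        for key, value in frozen.items():
            target[key] = transform(value)
        if len(current) <= threshold:
            # Short cut-over: migrate the small final TempStore, then switch.
            for key, value in current.items():
                target[key] = transform(value)
            return target, daa
        frozen = current  # freeze this TempStore; it is migrated next round
```

Called with a four-record legacy store, a threshold of 1 and two update batches of two and one record respectively, the sketch migrates the legacy store and the first TempStore while the system stays online, and only the one-record second TempStore is left for the final cut-over.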
The Butterfly method consists of six phases. The first phase (Phase 0), Prepare for migration, identifies the requirements, the benchmarks, and the target architecture and hardware. The second phase (Phase 1) is Understanding the semantics of the legacy system and developing the target data schemas. It analyzes the legacy interfaces, the legacy applications and the legacy data, and, alongside the target data schemas, determines the mapping rules between the legacy and target schemas. The third phase (Phase 2) builds up a Sample Datastore in the target system, based on the Target SampleData. The aim of this phase is to prepare test data for the target system. The fourth phase (Phase 3) incrementally migrates all components of the system except the data to the target architecture. The fifth phase (Phase 4) migrates the legacy data to the target system and trains the users on the target system. The data are migrated gradually via the TempStores, the Data Access Allocator and the data transformer (Chrysaliser). The last phase (Phase 5) is the cut-over to the migrated target system. The phases of the Butterfly methodology are summarized in Figure 7.
Figure 7 - Butterfly Methodology (Wu et al., 1997)
ARCHITECTURE DRIVEN MODERNIZATION
Khusidman & Ulrich (2007) propose the Architecture Driven Modernization (ADM) Horseshoe Model. The model presents an integrated view of the business and IT domains in system modernization. The business domain is represented by the Business Architecture, which can undergo modernization when existing business rules or processes evolve into target business rules and processes. The IT domain comprises the Application and Data Architecture and the Technical Architecture. The application and data architecture refers to the software architecture, while the technical architecture represents the underlying hardware architecture. Each architecture can drive the evolution process; however, regardless of the level of impact, there are three common elements to any modernization:
• Knowledge discovery of the existing solution.
• Target architecture definition.
• Transformation from as-is to the to-be state.
Notably, data is an explicit element of the architecture, so the modernization approach can also be applied at the data level, and the three elements above are identifiable in data migration scenarios. Another interesting aspect is the coupling of modernization efforts across all layers described by ADM. To achieve optimal results, synchronization and alignment are critical in both the vertical and horizontal dimensions: the business, application and data, and technical architectures need to be in sync, and the source and target solutions should also be coupled. For data migration, the relationship to the other layers (technical, application and business) is critical for success.
In practice, modernization can be driven by any of the three layers. Technically driven modernization is the most common in practice, as physical change occurs because of obsolescence, usability problems or limitations of the legacy technology. Application- and data-driven modernization occurs when applications no longer meet business needs or when the data architecture is outdated with regard to strategic information requirements. Business-driven modernization occurs when business models change, introducing new business semantics, rules or processes. This affects the application and data level and the technical level, which must then be aligned with the new business requirements.
THE DATA WAREHOUSE INSTITUTE
Russom (2006) presents best practices in data migration, as well as a data migration development model and deployment model, for The Data Warehouse Institute (TDWI). The development model consists of a preliminary phase followed by five phases that are iterated through. The deployment model consists of five phases covering the requirements for successfully completing a data migration.
The first phase in the development model is Solution pre-design. This phase covers requirements gathering and developing a detailed project plan and timeline. Another activity within this phase is data profiling, which should be done together with business managers and domain experts. The second phase, Solution design, divides the data migration into separate tasks based on their dependencies. The third phase is Data modeling, which concerns building the target database. In some cases, building the target database is a phased approach, with intermediary models before the target is reached. Data modeling can also be an opportunity for improvement at the metadata level. The fourth phase, Data mapping, consists of mapping legacy source fields to target fields. Data is rarely only mapped and copied, so mappings also include transformations. The process is iterative: the final transformation logic and migration order emerge after several cycles of defining, building and testing. Because of the complexity of this task, it is advised not to define mappings manually; rather, a migration support tool with a graphical user interface should be used. The fifth phase, Solution development, refers to building the migration programs that actually migrate the data from source to target using the mappings. The final phase, Solution testing, involves deciding which subset of data is representative enough to test the solution on. As stated earlier, development and testing are iterative and form a cycle for improving the solution. Figure 8 depicts the development model.
Figure 8 - TDWI data migration development model (Russom, 2006)
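The define, build and test cycle of the Data mapping and Solution testing phases can be sketched by keeping mappings as data: each target field points to a legacy source field plus a transformation, and the mappings are run against a representative sample of legacy records to collect failures for the next iteration. This is a minimal sketch, not part of TDWI's model: the field names (`CUST_NO`, `CUST_NAME`, `CTRY`) and the functions `apply_mappings` and `check_sample` are hypothetical, and in practice, as Russom (2006) advises, such mappings would typically live in a migration support tool rather than in hand-written code.

```python
from typing import Any, Callable

# Hypothetical mapping specification: target field -> (legacy field, transformation).
MappingSpec = dict[str, tuple[str, Callable[[Any], Any]]]

MAPPINGS: MappingSpec = {
    "customer_id": ("CUST_NO", int),                           # "0042" -> 42
    "full_name": ("CUST_NAME", lambda s: s.strip().title()),   # trim, normalize case
    "country_code": ("CTRY", lambda s: s.strip().upper()),     # "gb" -> "GB"
}


def apply_mappings(record: dict, mappings: MappingSpec) -> dict:
    """Build one target record from one legacy record."""
    return {target: transform(record[source])
            for target, (source, transform) in mappings.items()}


def check_sample(sample: list[dict], mappings: MappingSpec,
                 checks: dict[str, Callable[[dict], bool]]) -> list[tuple[dict, str]]:
    """Run the mappings on a representative sample and collect failures,
    which feed the next define-build-test iteration."""
    failures = []
    for record in sample:
        try:
            migrated = apply_mappings(record, mappings)
        except (KeyError, ValueError) as exc:
            # Missing source field or a value the transformation cannot handle.
            failures.append((record, f"mapping error: {exc!r}"))
            continue
        for name, check in checks.items():
            if not check(migrated):
                failures.append((record, f"check failed: {name}"))
    return failures
```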
The deployment model starts with the actual Deployment of the solution. Generally, a phased approach is taken, and the timeline for the phases should be approved by both the IT and business units. The second phase, Administration hand-over, concerns transferring responsibility for the data from the development team to the IT department. If the legacy and target applications run simultaneously, an optional Synchronization phase is required in which the data of the two is kept synchronized. After deployment, Monitoring is a very important phase in which end users evaluate the success of the data migration; scalability, outages and data defects are also analyzed. Finally, the Retiring the legacy platform phase takes care of decommissioning the legacy application and data source. Before this happens, the migration must be confirmed as successful, because fallback is no longer possible afterwards.
PRACTICAL DATA MIGRATION
In his book “Practical Data Migration”, Morris (2006) describes practical experience from data migration projects. The perspective is that of a third party brought in to handle the migration for the data owner. Along with a method for carrying out successful data migration projects, the author discusses key concepts and rules of data migration. The first takeaway is the four golden rules of data migration:
• Data migration is a business, not a technical issue – this means that even though the solution for data migration is technical in nature, the data that is migrated needs to make business sense, so the owner of the data migration project must be from the business side.
• The business knows best – because the data is business-relevant, the business is the party that holds the knowledge about it.
• No organization needs, wants or will pay for perfect quality data – this refers to how much data cleaning needs to be done to raise the quality of the legacy data. The target system might implement better workflows and validation so that new data is of better quality; however, existing data should not necessarily be brought up to perfect quality, as that is expensive in both time and money.
• If you can’t count it, it doesn’t count – this means that facts and figures are required for data migration in order to monitor progress and quality. If something is not measurable, then a statement regarding the success of that process is not possible.
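As an illustration of the fourth rule, the sketch below counts records per entity type in a legacy extract and in the target and reports progress as a percentage, so that a claim such as "customers are migrated" is backed by figures. The row format (dicts with an `entity` field) and the function name `migration_metrics` are assumptions for this example; Morris (2006) does not prescribe a particular metric, and a real project could add further measures such as control totals over monetary fields.

```python
from collections import Counter


def migration_metrics(legacy_rows, target_rows, entity_field="entity"):
    """Count records per entity type in legacy and target extracts and report
    how much of each has arrived (hypothetical row format: dicts)."""
    legacy = Counter(row[entity_field] for row in legacy_rows)
    target = Counter(row[entity_field] for row in target_rows)
    metrics = {}
    for entity, expected in legacy.items():
        migrated = target[entity]  # Counter returns 0 for unseen keys
        metrics[entity] = {
            "legacy": expected,
            "target": migrated,
            "missing": max(expected - migrated, 0),
            "progress_pct": round(100 * migrated / expected, 1),
        }
    # Entity types found only in the target may indicate a mapping defect.
    unexpected = sorted(set(target) - set(legacy))
    return metrics, unexpected
```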
There are four key concepts that any data migration project manager needs to be very familiar with. These concepts refer to the existing data and to the strategy used for migrating it. The concepts are:
• Legacy Data Store – is a data repository of any kind that holds data of interest to the target system.
• Key Business Data Areas – is made up of the Legacy Data Stores and business functions that hold data regarding a certain entity of the existing system’s conceptual model.
• System Retirement Policy – is a specification of how a Legacy Data Store is taken offline after migration.
• Data Transitional Rules – are the temporary procedures put in place to cope with the effects of the migration, for example if the existing system is kept live while data migration is being carried out.
These four concepts are the basis for the most important deliverables of the data migration project. Legacy Data Stores have to be analyzed and documented, and the same holds for Key Business Data Areas. Analyzing the business data areas sheds light on which strategy to choose for the technical data migration. System Retirement Policies and Data Transitional Rules make up the core strategy documents for a successful migration project. A successful project is one that ends with all existing data available in the target system. Regarding the technical level of data migration, the author emphasizes the need to compare data structures and perform gap analysis before data mapping. Other non-functional requirements also need to be defined. Some are straightforward, such as data sizing and run times. Data sizing is the amount of data to be loaded. Run times can be reduced by optimizing the queries on the legacy data stores.
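The comparison of data structures and the gap analysis that precede data mapping can be sketched as a comparison of two schemas. In this hypothetical example, schemas are dicts of field name to type name, and `renames` records the legacy-to-target field correspondences already known from analysis; the function name `gap_analysis` and its three report categories are illustrative choices, not Morris's terminology.

```python
def gap_analysis(legacy_schema, target_schema, renames=None):
    """Compare legacy and target structures before mapping.

    Schemas are hypothetical dicts of field name -> type name; `renames`
    maps legacy field names to their target counterparts where known.
    """
    renames = renames or {}
    # Express legacy fields in target vocabulary where a correspondence exists.
    legacy = {renames.get(name, name): typ for name, typ in legacy_schema.items()}
    shared = set(legacy) & set(target_schema)
    return {
        # Target fields no legacy field feeds: data must be derived, defaulted or collected.
        "unfilled_target_fields": sorted(set(target_schema) - set(legacy)),
        # Legacy fields with no target home: potential data loss to confirm with the business.
        "unmapped_legacy_fields": sorted(set(legacy) - set(target_schema)),
        # Same field, different type: a transformation rule is needed.
        "type_mismatches": sorted(f for f in shared if legacy[f] != target_schema[f]),
    }
```

For instance, comparing a legacy schema with `CUST_NO`, `NAME` and `FAX` against a target with `customer_id`, `name` and `email` would report `email` as unfilled, `FAX` as unmapped, and any corresponding field whose type differs as a type mismatch.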
Other requirements are more complex, such as sequencing, hardware and software considerations, windows of opportunity, migration implementation forms, fallback and the one-way street problem. Sequencing is the order in which the update processes are run to obtain a consistent target. The hardware and software of the existing data stores are documented, and the technical experts for each are identified. Windows of opportunity are periods when the systems are used less than normal, for example national holidays, and these can become the deadlines for migration projects. There are three implementation forms for migration. In a Big Bang implementation, all the data is migrated at once and the legacy data stores are taken offline. In a Parallel Running implementation, the data is loaded into the target, but the legacy data stores are kept running until the stakeholders are satisfied with the migration outcome. Phased Delivery involves migrating only the data needed for the system to go live, leaving historical and reporting data to be migrated in later phases. Each approach has its benefits and disadvantages depending on the situation. It is also advisable to carry out a pilot implementation to check the process. In any type of implementation, a Fallback is needed in case something goes wrong. Fallback consists of the steps required to bring the system back to its original state before the migration. The one-way street problem occurs when data has been transformed in such a way that the original values cannot be retrieved. One solution to this problem is to clone the legacy sources.
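Sequencing can be treated as a dependency-ordering problem: a table that references another must be loaded after it for the target to remain consistent. The sketch below derives a load order with a topological sort from Python's standard library (`graphlib`, Python 3.9 or later). The table names and dependencies are hypothetical, and the sketch ignores other sequencing concerns such as run times and windows of opportunity.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical target tables and the tables each one references (its prerequisites).
DEPENDENCIES = {
    "customers": set(),
    "products": set(),
    "orders": {"customers"},
    "order_lines": {"orders", "products"},
    "invoices": {"orders"},
}


def load_sequence(dependencies):
    """Return an order in which every table is loaded after the tables it references."""
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        # Circular references cannot be sequenced directly; one common workaround
        # is to load with the reference empty and fill it in a later pass.
        raise ValueError(f"circular dependency, needs a multi-pass load: {exc.args[1]}") from exc
```

Any order returned places `customers` before `orders`, and both `orders` and `products` before `order_lines`; the relative order of independent tables is not fixed.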
Morris (2006) also presents a comprehensive approach to data migration in the form of a project. The reasons for which data migration should be undertaken as a separate project are:
• Data Migration has specific deliverables, such as Data Quality Rules, System Retirement Policies, Data Transitional Rules and Data Stakeholder Analysis.
• The data migration team negotiates between the business side and the technical side.
• Data migration analysts need specialized skills.
• The project structure is different compared to standard development projects.
The project has four stages: Project Initiation, Data Preparation, Data Preparation (2), and Build and Test. Iteration is to be expected, especially through the preparation stages. Adapting to design changes throughout the project is also essential, as the target system may change or analysis of the legacy data stores may uncover functionality that the target system does not cover. Separating the project into stages allows for formal re-planning between them; to support this, control points at the end of each stage monitor progress.
Stage 0, Project Initiation, is concerned with the organizational activities that put the project on the right track. The data migration strategy is also defined in Stage 0, and data stakeholders and business domain experts are identified. Initial lists and details of Data Store Owners, Legacy Data Stores and Key Business Data Areas are documented. Information and buy-in from the technical and business sides are obtained by knowing what deliverables to expect from these teams, and when and which deliverables are needed from the data migration project. System Retirement Policies are decided on according to the situation of the legacy data stores and how the system is used. Finally, the test strategy should also be considered at this stage.
Stages 1 and 2 concern data preparation. In Stage 1, Legacy Data Stores are analyzed and a definition is written for each in order to decide whether it is migrated, removed or left untouched. While the data stores are analyzed, Data Quality Rules and Key Business Data Area models are also created, along with a first draft of the System Retirement Policies. The focus of Stage 1 is on fixing the issues present in the Legacy Data Stores. Stage 2 is concerned with delivering a polished version of the Legacy Data Stores that is consistent with the conceptual model of the legacy system. Transient data stores are used here to process the data for cleaning. Gap analysis is performed on the Legacy Data Stores and data gaps are filled. Data Transitional Rules defined in Stage 2 solve the problem of managing data that overlaps the go-live period. By the end of these two stages, Data Quality Rules, System Retirement Policies and Conceptual Entity Models are also delivered.
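Data Quality Rules and the use of transient data stores for cleaning can be illustrated together. This is a minimal sketch under illustrative assumptions: each rule is a hypothetical check plus an optional automatic fix, rules without a fix are left for the business to resolve, and the transient store is simply a deep copy of the legacy rows, so the Legacy Data Store itself stays untouched. The rule identifiers and the functions `violations` and `clean_in_transient_store` are illustrative names.

```python
import copy

# Hypothetical Data Quality Rules: rule id -> (check, optional automatic fix).
# Rules without a fix are left for the business to resolve.
RULES = {
    "DQ-01 name present": (lambda r: bool(r["name"].strip()), None),
    "DQ-02 country code upper-case": (
        lambda r: r["country"].isupper(),
        lambda r: {**r, "country": r["country"].upper()},
    ),
}


def violations(rows, rules):
    """List the ids of rows that break each rule."""
    return {rule_id: [row["id"] for row in rows if not check(row)]
            for rule_id, (check, _fix) in rules.items()}


def clean_in_transient_store(legacy_rows, rules):
    """Apply automatic fixes to a copy of the data; the Legacy Data Store is never modified."""
    transient = copy.deepcopy(legacy_rows)
    for check, fix in rules.values():
        if fix is not None:
            transient = [row if check(row) else fix(row) for row in transient]
    return transient
```

Re-running `violations` on the cleaned copy shows which issues the automatic fixes resolved and which still need a business decision.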
Stage 3 is where the actual build and test activities are performed. Even though these activities are standard in any IT development project, there are data migration specifics that need to be accounted for. This is where the ETL definitions, the Data Transitional Rules and the System Retirement Policies are used. First, a physical design of the migration is needed. The design includes a timetable, the technical specification, the fallback specification and the non-functional specification. Next, the test strategy has to be built based on the ETL definitions and the System Retirement Policies. The definitions and policies are also the backbone for building the migration software. Finally, the migration can be executed, though not without a post-implementation review and ongoing data quality enhancement.
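Executing the migration against a fallback specification can be sketched as running named steps and restoring a pre-migration snapshot if any step fails. The target store is assumed to be an in-memory dict that can be copied with `copy.deepcopy`; for a real database, a backup taken before the run would play the role of the snapshot. The function name `execute_with_fallback` and the step format are hypothetical.

```python
import copy


def execute_with_fallback(target, steps):
    """Run named migration steps against `target` (a dict standing in for the
    target data store); on any failure, restore the pre-migration state."""
    snapshot = copy.deepcopy(target)  # state to fall back to
    completed = []
    for name, step in steps:
        try:
            step(target)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            target.clear()
            target.update(snapshot)
            return False, completed, f"{name} failed: {exc}"
        completed.append(name)
    return True, completed, None
```

Returning the list of completed steps alongside the error gives the post-implementation review a record of how far the run got before fallback.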
Although the project plan that Morris (2006) presents is generally detailed, it takes only a project planning perspective, and details regarding change and risk management are not provided. It is also notable that the software built for the migration does not have to be of optimal quality; as the author states, “good is good enough for migration”. However, this can be a shortcoming in some scenarios.
PROCESS MODEL
The data migration process model of Matthes & Schultz (2011) consists of 14 phases. The model has been presented in similar form in several publications: a technical report (Matthes & Schultz, 2011) and two conference papers¹ (Matthes, Schultz & Haller, 2011; Matthes, Schultz & Haller, 2012). The authors developed the model based on the state of the art in data migration literature and on practical experience gained while taking part in several data migration projects.
The model is divided into four stages: Initialization, Development, Testing and Cut-Over. Initialization refers to setting up the organization and the infrastructure for data migration. Development refers to developing the data migration programs. Testing refers to validating the correctness, stability and execution time of both the data and the data migration programs. Cut-Over is the switch to the target application by performing the migration. The model is presented in Figure 9.
The Initialization stage consists of three phases. The first phase, Call for tender and bidding, determines who carries out the migration: an internal department or an external party. There are three main reasons for outsourcing a data migration project. First, this kind of project requires time and effort on top of ordinary IT tasks, and this is generally underestimated. Second, the competencies needed for this task take time and effort to build, so it may not be feasible for every company to have a separate department dedicated to this type of project. Lastly, data migration projects do not occur frequently within the same company, which prevents internal knowledge on the topic from building up.
1 Note that the latest paper was not included in this literature study as it was made available in the final stages of the research. The latest paper includes a similar process model that also presents deliverables and more in