CA2437008A1 - Method and apparatus for data migration between databases - Google Patents

Method and apparatus for data migration between databases Download PDF

Info

Publication number
CA2437008A1
CA2437008A1 CA002437008A CA2437008A CA2437008A1 CA 2437008 A1 CA2437008 A1 CA 2437008A1 CA 002437008 A CA002437008 A CA 002437008A CA 2437008 A CA2437008 A CA 2437008A CA 2437008 A1 CA2437008 A1 CA 2437008A1
Authority
CA
Canada
Prior art keywords
data
target
schema
database
source
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
CA002437008A
Other languages
French (fr)
Inventor
Corrado Balducci
Yumman Chan
Feras Dawisha
Ramzan Khuwaja
Brenda Miu-To Lam
Sapna Mahwal
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
IBM Canada Ltd
Original Assignee
IBM Canada Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by IBM Canada Ltd filed Critical IBM Canada Ltd
Priority to CA002437008A priority Critical patent/CA2437008A1/en
Publication of CA2437008A1 publication Critical patent/CA2437008A1/en
Abandoned legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/214Database migration support
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/12Use of codes for handling textual entities

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention herein provides a method, system, and computer readable articl e for producing a schema mapping transformation for a relational database to allow selected da ta stored in at least one table in a source relational database having a first schema to be transported to a target relational database having a second schema while preserving integrity of the data and satisfying constraints of the target relational database, including, determining the schemas of the source and target relational databases; developing a mapping transformation mapping the first schema to the second schema in a computer interpretable language; and, processing t he mapping transformation into a set of computer readable instructions for directing a computer to transform the selected data from the source relational database into a form suitable f or the schema of the target relational database. The mapping transformation can be conveniently performed in a spread sheet, transformed to the XML computer readable language and the n converted in SQL instructions which can the n be run on a computer to migrate code from the source to target databases.

Description

METHOD AND APPARATUS FOR DATA MIGRATION BETWEEN DATABASES
Field of the Invention This invention relates to a metlua u::a url;u:aW ~ ion migrating data between databases taking into account schema of the databases.
Background The use of databases, such as relational databases in computer systems and networks to store data is well known. As technology and ecommerce has evolved and grown, so to have databases and their sizes. Terabyte database are not unknown and the distinct possibility of them growing larger exists. In this growing environment, particularly ecommerce systems, it has become important to be able to transfer large amounts of data between databases. In transfers between different organizations and different databases it has become apparent that differences in the ways that databases store data; e.g. differences in database schemas have complicated effective data transfer. It is important to have an efficient way of transferring data between databases while taking into account of the differences in the database schemas.
It would be useful to have a system to move data from a source database having one schema to another, a target database having a different schema, where the tables and fields in the target schema can differ from the source schema in the following ways:
~ Target tables have been normauzect ~ Target tables have been de-normalized ~ Target tables / fields are a superset of source tables ~ Target tables / fields are a subset of source tables ~ Target fields have different data types from source fields ~ Data in target fields must be adjusted ~ Target tables require the addition of primary keys ~ Target data must be created based on source information Although techniques may exist that allow the movement and transformation of data, they may have limitations.
~ Techniques that use XML (Extensible Markup Language) computer language cannot transform data or automatically generate nrimary keys S ~ Techniques that use XML do not allow normalization of tables while rr~aintaining integrity of the data ~ Techniques that transform data in memory set severe restrictions on the size of the table that can be transformed ~ Techniques that act on data directly require connection to the source and target database throughout the pre-transformation and transformation process Techniques that act on data directly restrict the transformation process to what is allowed by the tool and do not allow additional manipulation of the data prior to extraction and load.
Patent references:
t.,lS Patent 2003007014~A I
This reference appears to disclose the generation of tables and transfer of data from an XML
document to a relational DB. The relationships between tables of the database are determined by the nesting of the nodes in the XML document. This reference does not deal with the transfer of data between databases. Th~_;, i~:., :..:, ~,..:: a_:a :u:.!es or schema.
Tables are created and populated based on the XML document passed in. This reference requires the data to be available for the pre-transformation. It cannot handle pre-transformation outside the realm of data itself.
~JS Patent 200300787CCA.l This reference apparently ~?~~w-~~~~ rhP ~nt;m;~at;~n of existing data and indexes it to speed up future search queries based on the current query. This reference does not disclose procedures for moving data from one database to another or one schema level to another.
There is no transformation of data.

LIS Patent 20020188G29A1 This reference apparently discloses a method of taking data from distributed spreadsheets and working on the data as if it were local to the users spreadsheet. This invention deals with the data directly. There is no transf~rmati~n of datafrom one schema to another.
The resulting data is used for a specific purpose, within the context of a spreadsheet application.
It appears that prior art techniques have a number of limitations:
Prior art mapped data, not schema.
Prior art did not provide a schema mapping system that could be encapsulated in the spreadsheet without requiring source data, database or a Document Type Definition (DTD).
Prior art systems needed to work with data directly, limiting scalability.
Prior art systems did not use pre-defined source and destination schema, and did not use a transformation system to build native SQL commands to extract data from one database and load data into another database.
One proposed solution in transforming data attempts to use Extensible Stylesheet Language (XML) with Extensible Stylesheet Language Transformations (XSLT) languages, in a process which includes working with large data file sets, having to know both SQL &
XSLT languages and create complex XSLT instructions. Drawbacks of the XML/XSLT solution, especially when working with large and complex data and data schemas, include:
Heavily time consum=-~- ~~ ~~t'~ ~°~~P~~~-~°nt of the XML/XSLT
and running of a transformation Limitations to a single step translation which does not provide the flexibility inherent in a mufti-step transformation Hardware requirements tend to be on the high end of the scale because the XSLT
files work directly with the XML data files which can get very large depending on the DB
therefore causing the transformation to take up more nhvsical and machine memory which cause transformation time to expand exponentially.

Summary of the Invention The present invention provides a method to transfomn data from one data schema to another in an easy, efficient manor, minimizing time and effort. The present invention seeks to overcome the limitations of the n~ior art. anal provide a flexible method and apparatus to move data from source database tables or fields having one schema to a target database having another. In various aspects of the invention herein the tables and fields in the target schema can differ from the source schema in the following ways:
~ Target tables have been normalized ~ Target tables have been de-normalized ~ Target tables / fields are a superset of source tables ~ Target tables / fields are a subset of source tables ~ Target fields have different data types from source fields ~ Data in target fields may be adjusted ~ Target tables may require the addition of primary keys ~ Target data might be created based on source data in order to fulfill target schema and new business rules requirements.
Embodiments of the invention provide a method of producing a schema mapping transformation for a relational database to allow selected data stored in at least one table in a source relational database having a first schema to be transported to a target relational database having a second schema while preserving integrity of the data and sanstying constraints of the target relational database, in accordance with claim l, including, determining the schemas of the source and target relational databases; developing a mapping transformation mapping the first schema to the second schema; expressing the mapping transformation in a spreadsheet;
converting the spreadsheet into a computer interpretable language (XML); and, processing the spreadsheet in the computer interpretable language into a a set of computer readable instructions (SQL) for directing a computer to trans~.,..:: ~:." ;:_:-:._~_ ~ w~.~ :rom the source relational database into a form suitable for the schema of the target relational database.
The selected data should be transformed in accordance with the computer readable instructions.

Conveniently the selected data is stored in the target relational database in accordance with the computer readable instructions.
Generating data integrity instructions fir use with the spreadsheet helps to assure data integrity S when selected data is transported to the target relational database.
It is useful to generate data integrity instructions for use with the spreadsheet to satisfy constraints of the target relational database to assure data integrity and conformity when selected data is transported to the target relational database.
It is useful to specify conversion instructions in the spreadsheet to guide conversion of selected data from the source to target relational databases to assure conformity with constraints of the target relational database and data integrity.
1 S The conversion instructions may include at least one key word specified in the spreadsheet, and the instructions can comprise key words specified in the spreadsheet to cause generation of a group/set/sequence/series of instructions to perform conversion of selected data.
Further aspects of the invention may additionally provide steps including, extracting selected data from the source relational database; loading the selected data into a temporary table in a database; transforming the seI-~~~e~' ~-~t~ ~r~~ ~n- w'°~mediate set of data, for conformation with constraints of the target relational database while preserving data integrity;
extracting the intermediate data; and loading the intermediate data into appropriate/preselected locations in the target relational database.

The steps may include, extracting selected data from the source relational database; transforming the selected data into an intermPdsate set of data, for conformation with constraints of the target relational database while preserving data integrity; and loading the intermediate data into appropriate/preselected locations in the target relational database.

In another aspect of the invention the instructions may include instructions for generation of a primary key for each row of a table in the target database, or cause generation of sequential primary keys for rows of a table in the target database to assure data integrity.
The instructions can cause generation of unique id's for each row of a table in the target database to assure data integrity.
Another aspect of the invention provides a method of producing a schema mapping transformation for a relational database to allow selected data stored in a source relational database having a first schema to be transported to a target relational database having a second schema while preserving integrity of the data and satisfying constraints of the target relational database, further including, selecting tables and fields of tables of the source relational database from which selected data is to be moved; selecting corresponding tables and fields of tables in the target relational database to which the selected data is to be moved; for the mapping transformation a) selecting data transformation rules for conversion of data to be transported from the source to target databases, to satisfy target table constraints; b) selecting schema transformation rules to permit conformation with the schema of the target database; and representing the transformation rules in a transformation spreadsheet.
The method of the invention may include using a suitable computer language parser to create a set of transformer computer in-'-~::'v~~~ ~w'~ ~~ ~'~T tc~ enable data transformation from source to target data.
Usefully for the mapping transformation there may be included c) selecting data and schema transformation rules for augmenting the selected data of the source database to conform to the schema of the target database to enforcelpreserve a parent child relationship within the data which is present in the original rata when transporting the selected data to the target database.
The transformation rules may include inserting an additional column in the table which was not present in the at least one table of the source database to normalize the selected data.

The transformation rules may include removing at least one selected column from the at least one table in the source database when transporting the selected data to the target database.
It may be useful to use a temporary tahl~ to build the additional column.
A primary key may be created for use with the additional column for normalization of the selected data.
Another aspect of the invention provides a method for normalizing the selected data, including, generating an intermediate table corresponding to the selected source table for normalizing the data; providing a table column for generation of primary keys for rows of the table;
extracting selected data from the table of the source database; transforming the selected data in conformity with the data and schema transformation rules; generating primary keys for the selected data from the source database in accordance with the schema transformation rules;
populating the intermediate table with the selected data and the primary keys to form an intermediate set of data; populating the tables of the target database with the intermediate set of data in accordance with the schema transformation rules.
The intermediate set of data in the intermediate may be further transformed based on data present in the intermediate table and the data and schema transformation rules.
The intermediate table may be populated with data from at least one source table or table field in accordance with the data and schema transformation rules to form the intermediate set of data.
The target database table may be populated with selected data from the intermediate database table and at least one source database table in accordance with the data and schema transformation rules.
The intermediate table and a second intermediate table may be used for further transformation and assembly of the selected data.

The further transformation may include the additionlinsertion of additional primary keys.
Another aspect of the invention may provide a method for normalizing the selected data, including, transforming the selected data in conformitv with the data and schema transformation rules; generating primary keys for the selected data from the source database in accordance with the schema transformation rules; populating the target table with the selected data and the primary keys in accordance with the schema transformation rules.
The mapping transformation may include application rules such as business rules for transforming at least a subset of said selected data into a suitable form for use by an application program.
Other embodiments of the invention may be implemented by a data processing system for carrying out the steps of the invention described above or a computer program, or an article including a computer-readable signal-bearing medium; the medium being selected from the group including a recordable data storage medium, magnetic storage medium, optical storage medium, biological data storage medium, and atomic data storage medium, a modulated carrier signal, and a transmission over the Internet; including means, such as computer program instructions in the medium for producing a schema mapping transformation for a relational database to allow selected data stored in at least one table in a source relational database having a first schema to be transportea to a iaige~ re~auonai database having a second schema while preserving integrity of the data and satisfying constraints of the target relational database.
Brief Description of the Drawings Embodiments of the invention will be described in conjunction with the following diagrams, in which:
Fig. 1 depicts a format of a sp:~~a:~~.:;_: '__ _=.v_: __'_-____ with one embodiment of the invention;
Fig. 2 depicts a format of a spreadsheet in accordance with one embodiment of the invention in which the source and target columns of the spreadsheet are identical;

Fig. 3 depicts a format of a spreadsheet in accordance with one embodiment of the invention in which the source fields have constants specified;
Fig. 4 depicts a format of a spreadsheet in accordance with one embodiment of the invention in which it can use the database server to generate a primary key before exporting data;
Fig. 5 depicts a format of a spreadsheet in accordance with one embodiment of the invention in which data is to be moved from a single table in source database to multiple tables in the target database;
Fig. 6 depicts a format of a spreadsheet in accordance with one embodiment of the invention in which the source table contains a primary key and a source table is split into multiple target tables in which there is to be a parent child relationship between the multiple target tables;
Fig. 7 depicts format of a spreadsheet in accordance with one embodiment of the invention in which a more complex form of splitting a source table into multiple tables where a parent child relationship is required in the target tables, where the source table does not have a primary key;
Fig. 8 depicts a flow chart of an embodiment of the method of the invention depicting the process for generation of XML, manipulation of the XML to generate SQL, files used, and created, and the source and ta--°t a~t~'~°~°~y Fig. 9 depicts an overview of the creation of SQL instructions for the migration of data.
Detailed Description of Preferred Embodiments We will now discuss some asrPot~ and embodiments of the present invention.
One advantageous embodiment of the invention provides a system including software for performing data movement and transformation in a database system through the use of a spreadsheet. Although the use of a spreadsheet is discussed in the following description, it will be appreciated that other transformation embodiments can be used. In this embodiment a spreadsheet is used as a transformation mapping to indicate the source database schema and the target database schema and the type of transformation required to move the data from the source to the target, which an SQL code generator can use to create the necessary SQT
. lnnT . anrT / or DML) required to move the data. Although the end result achieves the movement and transformation of data from the source database to the target database, the actual movement of the data is left to the source and target database servers.
A preferred embodiment of the present invention provides a system for the transforming, migrating or moving of simple to complex data and data schemas using a simple mapping file in the form of a spreadsheet which can be created by someone with little or, in some cases, no understanding of SQL or even data transformation technology. Using specific rules in the spreadsheet, which relate to the mapping of tables and columns, when/value clauses, constants, from clauses and where clauses, the tool builds the underlying SQL
instructions required to transform/move the data from one schema to another. The underlying database engine will move and manipulate data based on the generated SQL instructions.
The spreadsheet is converted to a well-defined XML format using a Perl script and a DTD file.
The XML is then used as input to the Java Application which manipulates and transforms the XML file to mufti-step SQL instructions which can be executed against the source schema to _°~irn:late the data as many times defined by the user as needed in orr'--~ '~ ~~,, ~- ,.., ,..:~,_ ~,~P
right result of data files in order to populate the new schema. Once extracted, the load SQL
instructions created by the tool can then be used to populate the target schema which is important because it allows users to manipulate data over and over again using different SQL instructions until the desired result set is achieved. i.e. a user may want to manipulate some data and only extract the rows which satisfy a condition of the result set. This sequence and data process at each interim step is specified solely by the mapping spreadsheet.
Advantages:
Complex data transformations/moves can be accomplished in mufti-step sequence of SQL statements not limited to a single SQL transformation.

Once data has been manipulated or a SQL step in sequence is performed, the data produced can be reused in other mufti-step sequence transformations which relates to performance and accuracy of the migration/transformation listed above.
Sequence of steps can always be split into several independent spreadsheets allowing development of subsystems to work dependently or independently.
Similarly there is no easy way for a user to transfer their data from application Y to application Z. This tool allows the user to complete such a data migration based solely on the mapping spreadsheet.
The process of migrating data is greatly eased since the User only needs to create and maintain the Mapping Spreadsheet and is not required to learn the underlying technologies, such as SQL, even though the spreadsheet can generate complex queries.
It uses well recognized / well known SQL constructs or only simple SQL
instructions, therefore do not need to know or learn XML/XSLT or complex SQL statements.
Tool performance is high since the tool does not directly transform the data -it only creates the mufti-step SQL statements. This allows the user to develop and perform initial testing of the mapping spreadsheet on a low end PC, without the requirement of powerful hardware.
Performance benefits since the actual data movement is performed using the native export/load tools of the source and target database server instead of, as in traditional XML/XSLT based tool, custom tools.
The tool can generate primary keys where required.
The tool can be used across multiple platforms as it produces standard SQL
files specific to the underlying Database.
The tool can be used in a combination of platforms for moving/transforming data from one DB platform to another DB platform i.e. moving data in an Oracle database to a DB2 database without having to change any created spreadsheets.
The generated extracts and load can be extended and manipulated as needed outside of the spreadsheet. Once the tool has generated the extract and load SQL
instructions, they can be modified as required without having to return to the spreadsheet or the tool.

We have found that it is preferable that the actual movement and transformation of the data is done through the use of generated tables rather than in memory to allow increased scalability, where the size of the table to be moved is limited to the size of the storage device, not memory size. Additionally, no data access is required in the use of the tool - the invention discussed here creates SQL instructions which can be later used to do the actual data access.
The steps that can be used in implementing a preferred embodiment of the invention include:
A user creates a spreadsheet, which indicates the tables and fields to be moved from the source database. This can be done manually or can be automated through the use of well-known database tools.
A user augments the spreadsheet with the tables and fields from the target database and any required transformations to satisfy requirements. Examples include:
~ if a source table contains separate fields for, say, first and last name and the target table contains a single field for full name, a transformation could be to concatenate the first and last names before loading into the target full name field ~ If during data movement it is also required that a fields' values be updated - say, if the source table contains employee numbers that start at 0 and the target table requires these employee numbers to start at 1000 - a transformation can be created to add 1000 to all (or a subset) of the source values The format of the spreadsheet is provided in Figure 1.
This spreadsheet is then saved in a standard comma separated value (CSV) form and this CSV
file is used as input to the XML generator tool to generate an XML file.
This XML file is used as input to the XML SQL-generator tool to generate the database specific export DDL and DML scripts and the load DDL and DML scripts.

The export subset of these scripts can now be used against the source database to export the data and the load subset of these scripts can be used against the target database to load the data.
In the simplest case, the data can be moved from one set of tables and f gilds to a similarly structured but differently named set of tables and fields where the source and target columns of the spreadsheet would be identical, except for the names of the source and target tables and fields.
An example of the simplest case of this movement is provided in Fi ure 2.
Another aspect of this would have the data from the source set of tables and fields moved to a subset of the tables and fields. This is accomplished by removing any tables and fields that should not be moved from the spreadsheet. For example, if the source table contains 3 fields and the target table only contains 2 fields, the spreadsheet representation would only list the 2 common fields as a source and target.
Another aspect of this would have the data from the source set of tables and fields moved to a superset of the tables and fields where the additional tables and fields have no relationship to the source database. This is accomplished by specifying constants in the source fields. An example of this is provided in _fi_ rude 3.
Another aspect of this would be when the source table does not have a primary key but the target t~hie requires one. In this case, the tool that generates the SQL instr-wti~r~
'="t~ hmld the nDL.
to generate a temporary table and move the data into this table. The temporary table is built in such a way as to use the database server to automatically generate the primary key before exporting the data. An example of this is provided in fi ure 4.
Another aspect of this would have the data from a single table in the source database moved to multiple tables in the target database. In its simplest form, this can be accomplished by specifying the source and target tables as in fi ure 5.

A more complex form of splitting a source table into multiple target tables is when there is a need to have parent child relationship between the multiple target tables. The simplest form of this is where the source table contains a primary key. An example of this is provided in fi re 6.
A more complex form of splitting a source table into multiple tables where a parent child relationship is required in the target tables is when the source table does not have a primary key.
In this form, the tool that generates the SQL instructions will build a temporary table and automatically generate a primary key for the target table parent and use this primary key as a foreign key in the target table child. The temporary table is built in such a way as to use the database server to automatically generate the primary key before exporting the data. An example of this is provided in figure 7.
Additional transformations can be accomplished by specifying constants to be used to manipulate the data in the fields in the source tables. These constants are declared in the spreadsheet and are used throughout as modifiers to the source fields. For example, the system can be instructed to create DML, which will modify the data in the source tables prior to exporting the data.
Another aspect of this could occur when information or data from source tables can be used to create new information and new data in target tables. For example, for every customer existing '~ ? source table a user could use the process of the invention to crea~r a nP~=.' ~~~? E'.'~''C'P,°'.;~'.'~lr:~
contract data in the target table.
The invention provides a process and apparatus, preferably in the form of a software tool that can provide a relatively easy way to migrate, transform and/or move data. This tool will enable a user to manipulate and move any data from any source table of any schema to any other target table schema by simply creating a mapping file spreadsheet between source and target tables and columns. The process of the invention can be embodied in two tools:
1 ) The first tool discussed is the XML Creator that converts spreadsheets containing the mapping into a well defined XML file. The following table is an illustration of such a spreadsheet.

Target Source Constants Target Table Target Column Type Source Table3ource Column Type When Value When Value When Value Wh merchantToStoreC
onstant=100000~RDERITEMS STATUS SHIPTO STSTAT STSTAT = 'C' 'S' STMENBR +
ORDERITEMS STOREENT_ID SHIPTO $merchantToStoreConstant SHIPTO-STQUANT i~
not null and SHIP t6liIPTO.S i t~UANT
ORDERITEMS TOTALPRODUCT SHIPTO $Zero STPRICE is not nul6HIPTO.STPRICE
2) The second tool is a SQL Generator that converts the XML file created above into Database SQL exportlload files. These SQL files will extract data from the original state and place it in a data file in its target state. Based on how the mapping file is created there may be multiple intermediate steps of export/load before the final target data file is specified.
The SQL instructions generated by these processes can be considered as five distinct steps (as needed):
a) Create table stage in which all required temporary tables are created in the source database as specified in the mapping spreadsheet.
b) Exportl stage in which all intermediate manipulation of data occurs within the source data base and data is extracted to data files.
c) Loadl stage in which all loading of intermediate manipulated data is re-loaded into temp tables from the source database which come from the data files of step b.
Step b and c can be repeated many times as needed and can be initiated under the cover by the application for efficiency i.e. when creating primary keys or can be explicitly used by the user by writing the "freadsheet in that form. i.e. manipulating the data, placing it into a tE_:~~, ~u;;._ ~~L:,'. ~r:~_:~ ,::~. ;he manipulated data in the temp table.
d) Export2 stage in which final manipulation of data can occur from the source database and data is extracted to data files.
e) Load2 stage in which all final data is loaded into the target database and schema in their final form from the data files created in step d.
The XML Creator and SQL Generator can run on either a NT or Llnix platform, although they could be developed to run on other platforms, as will be appreciated by those skilled in the art.
Since no connection to a database is required during these phases we expect these phases to be perfornled on low end, NT desktops to minimize hardware requirements. The XML
Creator can be written in pert, which is pre-loaded on most Unix servers, and the SQL
Generator is a portable, Java based application that will work on any platform. Once the SQL files have been generated they can be moved to the corresponding Source DB machine and Target DB
machine.
Pseudo Code The following section depicts sample pseudo code for processing lines of a source table to construct an XML file to be used in formation of an XML mapping spreadsheet in accordance with one aspect of the invention:
Perl Script Pseudo Code Adding constants to a hash table allows for substitution of constants to be used throughout the spreadsheet during the creation of the XML file:
1. If - the current line contains a constant Add it to a hash of unique constant - value pairs.
The first step of the file creation is to initialize the XML file with the DTD
info and begin with a SQL opening tag:
2. When the first line containing mapping data is found Open a xml file with the same name as the csv file and print the beginning XML
tags to it.
Start the first SQL mapping Conditions for creating a proper spreadsheet and therefore a proper XML file:
~. B~.~ild the xml file with the following conditions Target tables/columns must be known:
4. Target table can not be null 5. Target Column can not be null Source tablesfcolumns must be known:
6. Source Column can not be null 7. All other columns can be null Must replace constants with actual values:
8. When a constant is found in any field that can contain a constant replace it with its value from the hash of constants.

Can only create export/load SQL instructions for one target table at a time:
9. When a target table name changes to a new target table name start a new SQL
mapping.
When Value conditions are optional and describe how to transform column data based on the When condition.
10. If "When" and "Value" conditions are found build them into the xml.
From clause needed to know where to extract data from (table name) 11. Create a new tag in the XML file called From and place the value from the spreadsheet as the element value for "From".
Where clause is optional and should be represented in the XML as a new tag if column contains value. Used as the where clause of the SQL.
12. Create a new tag in the XML file called Where and place the value from the spreadsheet as the element value for "Where".
Close the XML final tag.
13. After the last line has been read Close out the xml file with the proper SQL ending tags.
PSEUDO CODE FOR XML TO SOL INSTRUCTION GENERATOR
The following sample pseudo code depicts an SQL generator that can be used to generate SQL
instructions to transform data from a source database to a target database in accordance with an embodiment of the invention.
XML_SQL Generator Pseudo Code Creating objects that represent the XML format that contain the data makes it easier to work with the data:
1. Parse XML document and create a SQL object, which contains the TableMapping object, fromClause, whereClause.

-Create TableMapping object which consists of the ColumnMapping object -Create the ColumnMappping Object, which consists of two column objects (target column and source column) -Create Column Object, which consists of element data for tablename, S columnname and datatype from the XML file.
Used to get information referring to file creation:
2. Get properties from properties file -check the properties file for values to fields declared in SqlGen Class -if values exist place into fields else use default value.
Drill down to the lowest level object (Column object) and work back up to create the SQL
mapping and looking for key words:
3. For every SQL Object and create export/load SQLs -get TableMapping object 1 S -get Target table name from SQL object -add it to a list, if the name exists place next sequential number next to name an save name -create a statement object, which will contain the export/load SQLs.
-For every ColumnMapping object get target and source column objects -if target column name equals to "SelfGeneratedID" Rr ~o,irce column name equals "Generate ID"
then create temp table called <source table> PRIME SQL
-place target column name in SQL
-if target name equals SelfGeneratedID then use a sequence value for primary key else use column name -create export/load SQL statements for stage 1 -for every column name place into SQL statements -break -if source column equals "Generate ID" and target column doesn't equal SelfGeneratedID
then -generate sequence value -place column name into export SQL statement for stagel -place column name into load SQL statement for stagel -place place column name into export SQL statement for stage2 -place column name into load SQL statement for stage2 -get the whenlvalue values if any for that column mapping -add columns to the export/load object query strings -End column Loop -get From Clause if any -add Frorn Clause to query strings -get Where Clause -add Where Clause to query strings -add Table Mapping Query Object to Vector
4. Purge old files S. Print new files (print all objects in the vector) End SQLs are created for two stages:
Stagel = refer to export/load SQL statements that export manipulated data on the source database and load it to a target table database.
Stage2 = refer to export/load SQL statements that export data from source table database and load it to a target table database.
All temp tables created for data manipulation or primary key reference are done on the source database. Since temp tables are distinguished by the following name pattern <source table> PRIME and contain manipulated data and/or primary keys we can reuse these tables add manipulation to these tables. By repeated references to manipulated temp tables data can go through a multi-step transformation in Stagel where the spreadsheet keeps referencing the new table and data until the final step manipulation would be in Stage2 and loaded on the target database.
The tool allows end users to do complicated manipulation of data in multiple simple steps. Also users can have newly created primary keys used as referential integrity to other tables.
Referring to Figure 9 which depicts an overview of the creation of SQL
instructions for data migration, may be seen that spreadsheet 1 which depicts the transformation to be achieved is converted into an XML file because the XML parser and object representation makes it easier to work with when creating SQL statements. Aslo constant place holders are converted to the actual value. 2 by the XML Creator Tool described above; after which the SQL
Generator tool, which uses Java in this embodiment, generates a set of SQL instructions 3 for later use in transferring data from a source to target database.
Refernng to Figure 8 which depicts a flow chart of an embodiment of the invention depicts the process for generation of an XML mapping, conversion of it to SQL instructions and there use in transfernng data from source database 8 to target database 9, it may be seen that Spread sheet mapping 1 is converted to a corresponding XML mapping 2 by means of the XML
Creator program 4, in conjunction with conversion rules 6. XML SQL Generator 5 which may be embodied in software, using properties file 7 (which may contain the names of the source 8 and target 11 databases, user names, and files to which information and instructions are written) if needed converts the XML spreadsheet mapping into SQL lnstru~t~nnc i,p Rnl.
F.xr,nrt fle 9.
Using the SQL Export file 9 the invention can create (IXF Data Files 10) intermediate tables to transform data in accordance with the schema of source and target databases and final data files from data in source database 8 tables. SQL Load file 3 is used to operate on the IXF data files and load the final data files intended into target database I 1.
This section describes a detailed method for using the data migration tools of the invention herein. It will be useful to provide an overview of the method before we continue. This overview is illustrated in Fig. 8.
Migration of data is accomplished via the following steps:

1 ) Create a spreadsheet that maps Source table columns to Target table columns, then save it as a CSV file.
2) Generate an XML file from the CSV file using the XML Generator Tool.
3) Generate Export / Load files from the XML file (created in previous step) using the XML SQL Generator Tool.
4) Execute the following files on the source DB (SOURCE DB MACHINE); The user must manually copy the files from the /Export directory to the Source DB machine, as these scripts will not work remotely, and the user will want to keep the files in the same directory structures (/export, /IXFData) and make sure the IXFData directory gets created under the source directory a. CREATETABLES.bat b.EXPORTI .bat c. LOAD l .bat d.EXPORT2.bat
5) Drop all Constraints on the Target DB.
6) Execute the following files on the target DB using the generated IXF data files:
a. LOAD2.bat
7) Restore the table Constraints.
Requirements ;. ~ There must be adequate space within the DB to be capable c'_~wut__~.~
,i~.°_p_~; °-, '.w'-'Ps.
~ The file system should be at least double the size of the Source DB.
~ The user must catalog the remote Target DB to the migration tool workstation.
Assumptions 1. The migration tool can provide base table spreadsheets that map specified schema to schema.
2. The user is responsible for creating custom table or schema m,:YY..~s ;,Yr=_=~~?s?_~:,"~~ := _~eeded that conform to the spreadsheet generation rules.

Creating a Table Mapping Spreadsheet The only input required by the Data Migration Tool is a table mapping spreadsheet. This spreadsheet can be easily modified to accommodate unique requirements of the user. By adjusting specific fields andlor adding additional column mappi~g~. t1-~y ~'~~r ,~~~~' ~;~e the spreadsheet to map custom tables to base tables, custom-to-custom tables, as well as base-to-base tables. Using the keyword 'Generate-ID' will generate a primary key directly to that table, whereas the keyword SelfgeneratedID , on the other hand, will generate a unique primary key for each row of the table and it can be used throughout other table mappings.
Definition of the Spreadsheet The spreadsheet will map Source schema columns to Target schema columns, using constants, conditions and generated primary keys.
The first row of the spreadsheet should be formatted in 17 columns as follows:
onstants Target Target Target Source Source Source When Value When Value When Value When Value From Where Table Column Type Table Column Type After filling in the first row, use the first column to declare, employing the format below, all the constants the user will be using throughout the spreadsheet.
t'ocstants One=1 Two=2 Negative_One=-1 Zero=0 actionlog.highbid_id.prefix='HIGHBID_ID:' concat.insert=j merchantToCatalogConstant=40000000000 merchantToMemberConstant=10000000000 NOTE:
~ Constants can either be a String, Integer or Column Name ~~ u~w, .~ ~::L , a_.~ ::, w Wring, then it needs to be encapsulated with single quotes, for example: 'String'.
~ No spaces are allowed in constant names.
~ When the user is using a constant in a row, the user must append "$" to it and separate it using a space. Fer example:

SGRFNBR +
SGRFNBR + $shopperToMerr $shopperToMembel berConstant +
Constant NAME
$TypeG $TypeG
The following table provides the rules outlining why columns may be empty, when it is appropriate, as well as when they may contain constants.
anteConstants TargetTargetVersionVersion4Source ValueWhen ValueWhen ValueWhen ValueFromV'herc Target When TableColumnType 4TableColumnType nptyYes No No Yes, Yes Yes Yes,Yes Yes Yes Yes Yes Yes Yes Yes Yc~sYes, i1 lowed? unlessunlessbut but unlessthere if if it i the the the the is no row row table table the f first is is getcrate generate row "whcn beingbeing of a used useds s mappingclause to to a a declaredeclarePrimacy Primary for the constantconstantKey Key eatraci s s then then or or this this to to separateseparatefield field is is mappinmappinmandato mandato gs. gs. ry, ry.

anstants No No Yes No Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes lowed?

efinitionRepresentTargetTargetTargetSourceSourceSourceA The A The A
The A The s TableColumnColumnTableColumnColumnconditiovalueconditiovalueconditiovaluecondit iovalue the constants Type . Typen mappedn mappedn mappedn mappe that that that that that Selfgen General must to must to must to must d are be the be the be the be to used eratedl e_ID met correspomet comespomet correspomet the througho D cannot when ndingwhen ndingwhen ndingwhen coiTes must ut be be mappincolumnmappincolumnmappincolumnmappinpondin the the spreadshe defined first g for g for g for g g row this this this this here in column,whencolumn.whe~~ .LhPnc~n~~~olam a ~~~~~nn new first table. the the the n for before When When ~i'hen when being conditio conditio conditio the used n n n When is is is met, met. met.

through conditi out. on is mct.

Additional Notes:
~ The user should insert one blank row between each table mapping.
Spreadsheet Notes:

~ Column 1 of the spreadsheet, Constants, represents all the constants used throughout the spreadsheet as well as their values.
~ Column 2 of the spreadsheet, Target Table, represents the name of the target table being mapped.
~ Column 3 of the spreadsheet, Target Column, represents the name of the target table being mapped, If a primary key needs to be generated and used as part of a parent-child relationship, then this column of the first row of the mapping will contain the keyword 'SelfGeneratedID'. A temporary table which has the same name as the source table concatenated with '-prime' is created to generate the key. This table can then be used throughout the spreadsheet to represent the source table with primary key.
~ Column 4 and 7 of the spreadsheet, Target Type and Source Type, represent the data types of the target and source columns and are mandatory when using SelfgeneratedID.
~ Column 5 of the spreadsheet, Source Table, represents the Source table to map from.
~ Column 6 of the spreadsheet, Source Column, represents the Source column that should be mapped to the corresponding Target column. Values may include a column name, a constant variable name from the constant row and/or a co..~~:~~~~ :~~.~u:, :;:. ~~~~b cr an Integer. String values must be encapsulated by single quotes, i.e. 'STRING'.
~ Columns 8 through I S of the spreadsheet, When/Value, represent conditions and values placed on a particular column. If the When condition is meet then the corresponding Value column is used. The Source Column, holds the default value when the condition is not met. The user may set up to four conditions for a colur~~
~ Column 16 of the spreadsheet, From, represents the source tables for supplied information and uses standard SQL syntax. This field is filled out only once per SQL

instruction in the first row of each table mapping. Each From clause will distinguish the beginning and end of each new table mapping.
~ Column 17 of the spreadsheet. Where, represents the "where" clause or condition for the table and uses standard SQL syntax. This column is used to maintain the uniqueness of the rows when joining more than one table and it only needs to be specified once per SQL instruction.
Before running the XML Generator:
The XML Generator points to a Perl prolnam that generates the required XML
file. The user needs to install Perl and the DBI & ODBC packages on the user system before executing this batch file.
Running the XML Generator After completing the spreadsheet, save the file under a <...>/Preprocessor/
DataMigration/Data directory. Then, using a spreadsheet program, save a CSV version of the spreadsheet in the same directory by highlighting columns A to Q and rows 1 to last non-empty row.
Next, open the directory to <...>/Preprocessor/DataMigration/Tools and execute the DB BIULDALL.BAT by double clicking the icon; no parameters are needed since the .bat file is setup to directly look under the Data directory and create one XML file for each CSV file. The output is also placed in the Data directory along with a .log file that reports any problem with the spreadsheet.
Generating Export / Load Files To generate da~ ~I~ ~:. - mi~~~~r ~~rir~ ;, the user must execute the XML SQL
Generator.bat command from the <...>/bin directory. This batch tile will read the XML file created by the XML Generator and generate the database Export and Load scripts that will migrate the data from the Source DB to the Target DB.

Configuration of the tool The XML SQL Generator tool uses a properties file called DataMigration.properties located under the <...>/config directory.
A sample of this file is shown below:
# DataMigration Properties This parameter specifies where to place the Export files:
*Export dirLocation -d:\\Engagements\\MigToolDev\\build\\PreProcessorToolsllDataMigration\\Data\\exp ort\\
This parameter specifies where to place the Load files:
*Load dirLocation -d:\\Engagements\\MigToolDev\\buildl\PreProcessorTools\\DataMigration\\Data\\loa d\\
This parameter specifies where to place the IXF data files:
*IXF dirLocation = ..IIXFDatal This parameter specifies which Database:
*DB Platform = DB2 These parameters specify the Source and Target Dbnames.
*Source DBA User = db2netcs *Source DBA Password = db2netcs *Target~DBA L s~,i = v~ L~ad~n~:
*Target DBA Password = torn 1 lab #*Source DB = reimigr.vulcan.torolab.ibm.com #*Target = ORADEV
DB

*Source DB mservera =

*Target DB MALL
=

*mandatory Pre-execution of the Export / Load files:

1. Drop all the constraints for all tables on the target DB
Executing the tool:
From the bin directory Type XI~1L._adL,Generator -xml 'xmlFile' where xmlFile = directory and filename of the XML file generated by the XMI -Generator tool.
Note: Create the output directories ( /export, /load, /IXFData) as the user initializes the property file Execution of the Export / Load files:
I S The Export directory contains the followin g files:
CREATETABLE.bat EXPORT I .bat LOAD l .bat EXPORT2.bat plus the corresponding .cr, .ex, .ld files These four .bat files must be executed in order provided above. FTP these files to the source machine DB and execute each as listed. These scripts export data as IXF for DB2 and DAT files for Oracle, and place it in the \IXFData directory of the Source DB Machine (Note: create the \IXFData directory at the same level as \export directory BEFORE running the above scripts) Make sure the export of aata was successful by checking the err.log file in the \IXFData directory The Load directory contains the followin files:
LOAD2.bat plus the corresponding .Id files.
FTP the data files (IXFIhA'f) files from \IXFData directory on Source DB
Machine to \IXFData directory on Target DB Machine. Then FTP these files to the target DB and execute the .bat.
Executes LOAD2.bat file by typing ...\load\LOAD2.bat <db name> <db userid>
<db-password>

Make sure the loading of data was successful by checking the err.log file in the \IXFData directory.
Post Migration steps Rebuild all table constraints that were dropped earlier.

Claims (98)

The embodiments of the invention in which an exclusive property or privilege is claimed are defined as follows:
1. A method of producing a schema mapping transformation for a relational database to allow selected data stored in at least one table in a source relational database having a first schema to be transported to a target relational database having a second schema while preserving integrity of said data and satisfying constraints of said target relational database, comprising:
determining said schemas of said source and target relational databases;
developing a mapping transformation mapping said first schema to said second schema in a computer interpretable language; and, processing said mapping transformation into a set of computer readable instructions for directing a computer to transform said selected data from said source relational database into a form suitable for the schema of said target relational database.
2. A method of producing a schema mapping transformation for a relational database to allow selected data stored in at least one table in a source relational database having a first schema to be transported to a target relational database having a second schema while preserving integrity of said data and satisfying constraints of said target relational database, in accordance with claim 1, comprising:
determining said schemas of said source and target relational databases;
developing a mapping transformation mapping said first schema to said second schema;
expressing said mapping transformation in a spreadsheet;
converting said spreadsheet into a computer interpretable language (XML); and, processing said spreadsheet in said computer interpretable language into a a set of computer readable instructions (SQL) for directing a computer to transform said selected data from said source relational database into a form suitable for the schema of said target relational database.
3. The method of claim 2 wherein said selected data is transformed in accordance with said computer readable instructions.
4. The method of claim 3 wherein said selected data is stored in said target relational database in accordance with said computer readable instructions.
5. The method of claim 2 further comprising:
generating data integrity instructions for use with said spreadsheet to assure data integrity when selected data is transported to said target relational database.
6. The method of claim 5 further comprising:
generating data integrity instructions for use with said spreadsheet to satisfy constraints of said target relational database to assure data integrity and conformity when selected data is transported to said target relational database.
7. The method of claim 2 further comprising:
specifying conversion instructions in said spreadsheet to guide conversion of selected data from said source to target relational databases to assure conformity with constraints of said target relational database and data integrity.
8. The method of claim 2 further comprising:
specifying conversion instructions to be used with said spreadsheet to guide conversion of selected data from said source to target relational databases to assure conformity with constraints of said target relational database and data integrity.
9. The method of claim 8 wherein said conversion instructions comprise at least one key word specified in said spreadsheet.
10. The method of claim 8 wherein said instructions comprise key words specified in said spreadsheet to cause generation of a set of instructions to perform conversion of selected data.
11. The method of claim 10 wherein said steps comprise:
extracting selected data from said source relational database;

loading said selected data into a temporary table in a database;
transforming said selected data into an intermediate set of data, for conformation with constraints of said target relational database while preserving data integrity;
extracting said intermediate data;
loading said intermediate data into preselected locations in said target relational database.
12. The method of claim 10 wherein said steps comprise:
extracting selected data from said source relational database;
transforming said selected data into an intermediate set of data, for conformation with constraints of said target relational database while preserving data integrity;
loading said intermediate data into preselected locations in said target relational database.
13. The method of claim 10 wherein said steps comprise:
extracting selected data from said source relational database;
loading said selected data into a temporary table in a database such as said source database;
transforming said selected data into an intermediate set of data, for conformation with constraints of said target relational database while preserving data integrity;
extracting said intermediate data;
loading said intermediate data into preselected locations in said target relational database.
14. The method of claim 10 wherein said instructions may include instructions for generation of a primary key for each row of a table in said target database.
15. The method of claim 10 wherein said instructions cause generation of sequential primary keys for rows of a table in said target database to assure data integrity.
16. The method of claim 8 where said instructions cause generation of unique id's for each row of a table in said target database to assure data integrity.
17. A method in accordance with claim 2 of producing a schema mapping transformation for a relational database to allow selected data stored in a source relational database having a first schema to be transported to a target relational database having a second schema while preserving integrity of said data and satisfying constraints of said target relational database, further comprising:
selecting tables and fields of tables of said source relational database from which selected data is to be moved;
selecting corresponding tables and fields of tables in said target relational database to which said selected data is to be moved;
for said mapping transformation a) selecting data transformation rules for conversion of data to be transported from said source to target databases, to satisfy target table constraints;
b) selecting schema transformation rules to permit conformation with said schema of said target database;
representing said transformation rules in a transformation spreadsheet.
18. The method of claim 17 further comprising:
converting said spreadsheet into XML or other computer interpretable language.
19. The method of claim 18 further comprising:
using a suitable computer language parser to create a set of transformer computer instructions such as SQL to enable data transformation from source to target data.
20. The method of claim 17 further comprising:
for said mapping transformation c) selecting data and schema transformation rules for augmenting said selected data of said source database to conform to said schema of said target database to enforce preserve a parent child relationship within said data which is present in said original data when transporting said selected data to said target database.
21. The method of claim 20 wherein said transformation rules include inserting an additional column in said table which was not present in said at least one table of said source database to normalize said selected data.
22. The method of claim 17 wherein said transformation rules include removing at least one selected column from said at least one table in said source database when transporting said selected data to said target database.
23. The method of claim 21 further comprising:
using a temporary table to build said additional column.
24. The method of claim 23 wherein a primary key is created for use with said additional column for normalization of said selected data.
25. The method of claim 17 further including a method for normalizing said selected data, comprising:
generating an intermediate table corresponding to said selected source table for normalizing said data;
providing a table column for generation of primary keys for rows of said table;
extracting selected data from said table of said source database;
transforming said selected data in conformity with said data and schema transformation rules;
generating primary keys for said selected data from said source database in accordance with said schema transformation rules;
populating said intermediate table with said selected data and said primary keys to form an intermediate set of data;
populating said tables of said target database with said intermediate set of data in accordance with said schema transformation rules.
26. The method of claim 25 wherein said intermediate set of data in said intermediate is further transformed based on data present in said intermediate table and said data and schema transformation rules.
27. The method of claim 25 wherein said intermediate table is populated with data from at least one source table or table field in accordance with said data and schema transformation rules to form said intermediate set of data.
28. The method of claim 25 wherein said target database table is populated with selected data from said intermediate database table and at least one source database table in accordance with said data and schema transformation rules.
29. The method of claim 27 wherein said intermediate table and a second intermediate table is used for further transformation and assembly of said selected data.
30, The method of claim 29 wherein said further transformation includes the addition insertion of additional primary keys,
31. The method of claim 17 further including a method for normalizing said selected data, comprising:
transforming said selected data in conformity with said data and schema transformation rules;
generating primary keys for said selected data from said source database in accordance with said scheme transformation rules:
populating said target table with said selected data and said primary keys in accordance with said schema transformation rules.
32. A data processing system for producing a schema mapping transformation for a relational database to allow selected data stored in at least one table in a source relational database having a first schema to be transported to a target relational database having a second schema while preserving integrity of said data and satisfying constraints of said target relational database, comprising:
means for determining said schemas of said source and target relational databases;

means for developing a mapping transformation mapping said first schema to said second schema in a computer interpretable language; and, means for processing said mapping transformation into a set of computer readable instructions for directing a computer to transform said selected data from said source relational database into a form suitable for the schema of said target relational database.
33. The data processing system for producing a schema mapping transformation for a relational database to allow selected data stored in at least one table in a source relational database having a first schema to be transported to a target relational database having a second schema while preserving integrity of said data and satisfying constraints of said target relational database, in accordance with claim 32, comprising:
means for determining said schemas of said source and target relational databases;
means for developing a mapping transformation mapping said first schema to said second schema;
means for expressing said mapping transformation in a spreadsheet;
means for converting said spreadsheet into a computer interpretable language (XML);
means for processing said spreadsheet in said computer interpretable language into a a set of computer readable instructions (SQL) for directing a computer to transform said selected data from said source relational database into a form suitable for the schema of said target relational database.
34. The data processing system of claim 33 wherein said selected data is transformed in accordance with said computer readable instructions.
35. The data processing system of claim 34 wherein said selected data is stored in said target relational database in accordance with said computer readable instructions.
36. The data processing system of claim 33 further comprising:
means for generating data integrity instructions for use with said spreadsheet to assure data integrity when selected data is transported to said target relational database.
37. The data processing system of claim 36 further comprising:
means for generating data integrity instructions for use with said spreadsheet to satisfy constraints of said target relational database to assure data integrity and conformity when selected data is transported to said target relational database.
38. The data processing system of claim 33 further comprising:
means for specifying conversion instructions in said spreadsheet to guide conversion of selected data from said source to target relational databases to assure conformity with constraints of said target relational database and data integrity.
39. The data processing system of claim 33 further comprising:
means for specifying conversion instructions to be used with said spreadsheet to guide conversion of selected data from said source to target relational databases to assure conformity with constraints of said target relational database and data integrity.
40. The data processing system of claim 39 wherein said conversion instructions comprise at least one key word specified in said spreadsheet.
41. The data processing system of claim 39 wherein said instructions comprise key words specified in said spreadsheet to cause generation of a set of instructions to perform conversion of selected data
42. The data processing system of claim 41 wherein said steps comprise:
means for extracting selected data from said source relational database;
means for loading said selected data into a temporary table in a database;
means for transforming said selected data into an intermediate set of data, for conformation with means for constraints of said target relational database while preserving data integrity;
means for extracting said intermediate data;
means for loading said intermediate data into preselected locations in said target relational database.
43. The data processing system of claim 41 wherein said steps comprise:
means for extracting selected data from said source relational database;
means for transforming said selected data into an intermediate set of data, for conformation with means for constraints of said target relational database while preserving data integrity;
means for loading said intermediate data into preselected locations in said target relational database.
44. The data processing system of claim 41 wherein said steps comprise:
means for extracting selected data from said source relational database;
means for loading said selected data into a temporary table in a database such as said source database;
means for transforming said selected data into an intermediate set of data, for conformation with constraints of said target relational database while preserving data integrity;
means for extracting said intermediate data;
means for loading said intermediate data into preselected locations in said target relational database.
45. The data processing system of claim 41 wherein said instructions may include instructions for generation of a primary key for each row of a table in said target database.
46. The data processing system of claim 41 wherein said instructions cause generation of sequential primary keys for rows of a table in said target database to assure data integrity.
47. The data processing system of claim 39 where said instructions cause generation of unique id's for each row of a table in said target database to assure data integrity.
48. The data processing system of claim 33 for producing a schema mapping transformation for a relational database to allow selected data stored in a source relational database having a first schema to be transported to a target relational database having a second schema while preserving integrity of said data and satisfying constraints of said target relational database, further comprising:
means for selecting tables and fields of tables of said source relational database from which selected data is to be moved;
means for selecting corresponding tables and fields of tables in said target relational database to which said selected data is to be moved;
for said mapping transformation a) means for selecting data transformation rules for conversion of data to be transported from said source to target databases, to satisfy target table constraints;
b) means for selecting schema transformation rules to permit conformation with said schema of said target database;
means for representing said transformation rules in a transformation spreadsheet.
49. The data processing system of claim 48 further comprising:
means for converting said spreadsheet into XML or other computer interpretable language.
50. The data processing system of claim 49 further comprising:
using a suitable computer language parser to create a set of transformer computer instructions such as SQL to enable data transformation from source to target data.
51. The data processing system of claim 48 further comprising:
for said mapping transformation c) means for selecting data and schema transformation rules for augmenting said selected data of said source database to conform to said schema of said target database to enforce a parent child relationship within said data which is present in said original data when transporting said selected data to said target database.
52. The data processing system of claim 51 wherein said transformation rules include inserting an additional column in said table which was not present in said at least one table of said source database to normalize said selected data.
53. The data processing system of claim 48 wherein said transformation rules include removing at least one selected column from said at least one table in said source database when transporting said selected data to said target database.
54. The data processing system of claim 52 further comprising:
means for using a temporary table to build said additional column.
55. The data processing system of claim 54 wherein a primary key is created for use with said additional column for normalization of said selected data.
56. The data processing system of claim 48 further including means for normalizing said selected data, comprising:
means for generating an intermediate table corresponding to said selected source table for normalizing said data;
means for providing a table column for generation of primary keys for rows of said table;
means for extracting selected data from said table of said source database;
means for transforming said selected data in conformity with said data and schema transformation rules;
means for generating primary keys for said selected data from said source database in accordance with said schema transformation rules;
means for populating said intermediate table with said selected data and said primary keys to form an intermediate set of data;
means for populating said tables of said target database with said intermediate set of data in accordance with said schema transformation rules.
57. The data processing system of claim 56 wherein said intermediate set of data in said intermediate is further transformed based on data present in said intermediate table and said data and schema transformation rules.
58. The data processing system of claim 56 wherein said intermediate table is populated with data from at least one source table or table field in accordance with said data and schema transformation rules to form said intermediate set of data.
59. The data processing system of claim 56 wherein said target database table is populated with selected data from said intermediate database table and at least one source database table in accordance with said data and schema transformation rules.
60. The data processing system of claim 58 wherein said intermediate table and a second intermediate table is used for further transformation and assembly of said selected data.
61. The data processing system of claim 60 wherein said further transformation includes the insertion of additional primary keys.
62. The data processing system of claim 48 further including means for normalizing said selected data, comprising:
means for transforming said selected data in conformity with said data and schema transformation rules;
means for generating primary keys for said selected data from said source database in accordance with said schema transformation rules;
means for populating said target table with said selected data and said primary keys in accordance with said schema transformation rules.
63. An article comprising a computer-readable signal-bearing medium; said medium being selected from the group including a recordable data storage medium, magnetic storage medium, optical storage medium, biological data storage medium, and atomic data storage medium, a modulated carrier signal, and a transmission over the Internet;
means in the medium for producing a schema mapping transformation for a relational database to allow selected data stored in at least one table in a source relational database having a first schema to be transported to a target relational database having a second schema while preserving integrity of said data and satisfying constraints of said target relational database, comprising:

means for determining said schemas of said source and target relational databases;
means for developing a mapping transformation mapping said first schema to said second schema in a computer interpretable language; and, means for processing said mapping transformation into a set of computer readable instructions for directing a computer to transform said selected data from said source relational database into a form suitable for the schema of said target relational database.
64. The article for producing a schema mapping transformation for a relational database to allow selected data stored in at least one table in a source relational database having a first schema to be transported to a target relational database having a second schema while presenting integrity of said data and satisfying constraints of said target relational database, in accordance with claim 63, comprising:
means for determining said schemas of said source and target relational databases;
means for developing a mapping transformation mapping said first schema to said second schema;
means for expressing said mapping transformation in a spreadsheet;
means for converting said spreadsheet into a computer interpretable language (XML);
means for processing said spreadsheet in said computer interpretable language into a a set of computer readable instructions (SQL) for directing a computer to transform said selected data from said source relational database into a form suitable for the schema of said target relational database.
65. The article of claim 64 wherein said selected data is transformed in accordance with said computer readable instructions.
66. The article of claim 65 wherein said selected data is stored in said target relational database in accordance with said computer readable instructions.
67. The article of claim 64 further comprising:
means for generating data integrity instructions for use with said spreadsheet to assure data integrity when selected data is transported to said target relational database.
68. The article of claim 67 further comprising:
means for generating data integrity instructions for use with said spreadsheet to satisfy constraints of said target relational database to assure data integrity and conformity when selected data is transported to said target relational database.
69. The article of claim 64 further comprising:
means for specifying conversion instructions in said spreadsheet to guide conversion of selected data from said source to target relational databases to assure conformity with constraints of said target relational database and data integrity.
70. The article of claim 64 further comprising:
means for specifying conversion instructions to be used with said spreadsheet to guide conversion of selected data from said source to target relational databases to assure conformity with constraints of said target relational database and data integrity.
71. The article of claim 70 wherein said conversion instructions comprise at least one key word specified in said spreadsheet.
72. The article of claim 70 wherein said instructions comprise key words specified in said spreadsheet to cause generation of a set of instructions to perform conversion of selected data.
73. The article of claim 72 wherein said steps comprise:
means for extracting selected data from said source relational database;
means for loading said selected data into a temporary table in a database;
means for transforming said selected data into an intermediate set of data, for conformation with means for constraints of said target relational database while preserving data integrity;
means for extracting said intermediate data;
means for loading said intermediate data into preselected locations in said target relational database.
74. The article of claim 72 wherein said steps comprise:
means for extracting selected data from said source relational database;
means for transforming said selected data into an intermediate set of data, for conformation with means for constraints of said target relational database while preserving data integrity;
means for loading said intermediate data into preselected locations in said target relational database.
75. The article of claim 72 wherein said steps comprise:
means for extracting selected data from said source relational database;
means for loading said selected data into a temporary table in a database such as said source database;
means for transforming said selected data into an intermediate set of data, for conformation with constraints of said target relational database while preserving data integrity;
means for extracting said intermediate data;
means for loading said intermediate data into preselected locations in said target relational database.
76. The article of claim 72 wherein said instructions may include instructions for generation of a primary key for each row of a table in said target database.
77. The article of claim 72 wherein said instructions cause generation of sequential primary keys for rows of a table in said target database to assure data integrity.
78. The article of claim 70 where said instructions cause generation of unique id's for each row of a table in said target database to assure data integrity.
79. The article of claim 64 for producing a schema mapping transformation for a relational database to allow selected data stored in a source relational database having a first schema to be transported to a target relational database having a second schema while preserving integrity of said data and satisfying constraints of said target relational database, further comprising:
means for selecting tables and fields of tables of said source relational database from which selected data is to be moved;
means for selecting corresponding tables and fields of tables in said target relational database to which said selected data is to be moved;
for said mapping transformation a) means for selecting data transformation rules for conversion of data to be transported from said source to target databases, to satisfy target table constraints;
b) means for selecting schema transformation rules to permit conformation with said schema of said target database;
means for representing said transformation rules in a transformation spreadsheet.
80. The article of claim 79 further comprising:
means for converting said spreadsheet into XML or other computer interpretable language.
81. The article of claim 80 further comprising:
using a suitable computer language parser to create a set of transformer computer instructions such as SQL to enable data transformation from source to target data.
82. The article of claim 79 further comprising:
for said mapping transformation c) means for selecting data and schema transformation rules for augmenting said selected data of said source database to conform to said schema of said target database to enforce a parent child relationship within said data which is present in said original data when transporting said selected data to said target database.
83. The article of claim 82 wherein said transformation rules include inserting an additional column in said table which was not present in said at least one table of said source database to normalize said selected data.
84. The article of claim 79 wherein said transformation rules include removing at least one selected column from said at least one table in said source database when transporting said selected data to said target database.
85. The article of claim 73 further comprising:
means for using a temporary table to build said additional column.
86. The article of claim 75 wherein a primary key is created for use with said additional column for normalization of said selected data.
87. The article of claim 79 further including means for normalizing said selected data, comprising:
means for generating an intermediate table corresponding to said selected source table for normalizing said data;
means for providing a table column for generation of primary keys for rows of said table;
means for extracting selected data from said table of said source database;
means for transforming said selected data in conformity with said data and schema transformation rules;
means for generating primary keys for said selected data from said source database in accordance with said schema transformation rules;
means for populating said intermediate table with said selected data and said primary keys to form an intermediate set of data;
means for populating said tables of said target database with said intermediate set of data in accordance with said schema transformation rules.
88. The article of claim 87 wherein said intermediate set of data in said intermediate is further transformed based on data present in said intermediate table and said data and schema transformation rules.
89. The article of claim 87 wherein said intermediate table is populated with data from at least one source table or table field in accordance with said data and schema transformation rules to form said intermediate set of data.
90. The article of claim 87 wherein said target database table is populated with selected data from said intermediate database table and at least one source database table in accordance with said data and schema transformation rules.
91. The article of claim 89 wherein said intermediate table and a second intermediate table is used for further transformation and assembly of said selected data.
92. The article of claim 91 wherein said further transformation includes the insertion of additional primary keys.
93. The article of claim 79 further including means for normalizing said selected data, comprising:
means for transforming said selected data in conformity with said data and schema transformation rules;
means for generating primary keys for said selected data from said source database in accordance with said schema transformation rules;
means for populating said target table with said selected data and said primary keys in accordance with said schema transformation rules.
94. The method of claim 1 wherein said mapping transformation includes application rules such as business rules for transforming at least a subset of said selected data into a suitable form for use by an application program.
95. The system of claim 32 wherein said mapping transformation includes application rules such as business rules for transforming at least a subset of said selected data into a suitable form for use by an application program.
96. The article of claim 63 wherein said mapping transformation includes application rules such as business rules for transforming at least a subset of said selected data into a suitable form for use by an application program.
97. The article of claim 63 wherein said means comprises computer program code means.
98. An article comprising a computer-readable signal-bearing medium; said medium being selected from the group including a recordable data storage medium, magnetic storage medium, optical storage medium, biological data storage medium, and atomic data storage medium, a modulated carrier signal, and a transmission over the Internet; including a computer-readable signal comprising computer program code adapted to perform the steps of any of claims 1 to 3 when run on a computer.
CA002437008A 2003-08-12 2003-08-12 Method and apparatus for data migration between databases Abandoned CA2437008A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CA002437008A CA2437008A1 (en) 2003-08-12 2003-08-12 Method and apparatus for data migration between databases

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CA002437008A CA2437008A1 (en) 2003-08-12 2003-08-12 Method and apparatus for data migration between databases

Publications (1)

Publication Number Publication Date
CA2437008A1 true CA2437008A1 (en) 2005-02-12

Family

ID=34120707

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002437008A Abandoned CA2437008A1 (en) 2003-08-12 2003-08-12 Method and apparatus for data migration between databases

Country Status (1)

Country Link
CA (1) CA2437008A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7580941B2 (en) 2006-06-13 2009-08-25 Microsoft Corporation Automated logical database design tuning
CN107958057A (en) * 2017-11-29 2018-04-24 苏宁云商集团股份有限公司 A kind of code generating method and device for being used for Data Migration in heterogeneous database
CN110413672A (en) * 2019-07-03 2019-11-05 平安科技(深圳)有限公司 Data automatically imported method, device and computer readable storage medium
CN110647564A (en) * 2019-08-14 2020-01-03 中国平安财产保险股份有限公司 Hive table establishing method, electronic device and computer readable storage medium
CN111143470A (en) * 2019-12-27 2020-05-12 中国银行股份有限公司 Cross-platform database data synchronization comparison method and device
CN111767350A (en) * 2020-06-30 2020-10-13 平安国际智慧城市科技股份有限公司 Data warehouse testing method and device, terminal equipment and storage medium
CN112035461A (en) * 2020-06-17 2020-12-04 深圳市法本信息技术股份有限公司 Migration method and system for table data file of database

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7580941B2 (en) 2006-06-13 2009-08-25 Microsoft Corporation Automated logical database design tuning
CN107958057A (en) * 2017-11-29 2018-04-24 苏宁云商集团股份有限公司 A kind of code generating method and device for being used for Data Migration in heterogeneous database
CN110413672A (en) * 2019-07-03 2019-11-05 平安科技(深圳)有限公司 Data automatically imported method, device and computer readable storage medium
CN110413672B (en) * 2019-07-03 2023-09-19 平安科技(深圳)有限公司 Automatic data importing method and device and computer readable storage medium
CN110647564A (en) * 2019-08-14 2020-01-03 中国平安财产保险股份有限公司 Hive table establishing method, electronic device and computer readable storage medium
CN110647564B (en) * 2019-08-14 2023-11-24 中国平安财产保险股份有限公司 Hive table building method, electronic device and computer readable storage medium
CN111143470A (en) * 2019-12-27 2020-05-12 中国银行股份有限公司 Cross-platform database data synchronization comparison method and device
CN112035461A (en) * 2020-06-17 2020-12-04 深圳市法本信息技术股份有限公司 Migration method and system for table data file of database
CN112035461B (en) * 2020-06-17 2024-05-10 深圳市法本信息技术股份有限公司 Method and system for migrating table data files of database
CN111767350A (en) * 2020-06-30 2020-10-13 平安国际智慧城市科技股份有限公司 Data warehouse testing method and device, terminal equipment and storage medium

Similar Documents

Publication Publication Date Title
US8275810B2 (en) Making and using abstract XML representations of data dictionary metadata
US6836778B2 (en) Techniques for changing XML content in a relational database
US7219102B2 (en) Method, computer program product, and system converting relational data into hierarchical data structure based upon tagging trees
US7599947B1 (en) Method and system for converting hierarchical database schemas into relational database schemas
US6708186B1 (en) Aggregating and manipulating dictionary metadata in a database system
US7756904B2 (en) Nested conditional relations (NCR) model and algebra
US5742810A (en) System, method and computer program product for passing host variables to a database management system
US6917937B1 (en) Server-side object filtering
KR101331532B1 (en) Dynamic method for generating xml documents from a database
US7634515B2 (en) Data model and schema evolution
US20050071359A1 (en) Method for automated database schema evolution
US20080114803A1 (en) Database System With Path Based Query Engine
US20020120630A1 (en) Method and apparatus for storing semi-structured data in a structured manner
US7707159B2 (en) Method and apparatus for storing semi-structured data in a structured manner
WO2001061566A1 (en) System and method for automatic loading of an xml document defined by a document-type definition into a relational database including the generation of a relational schema therefor
US20100185701A1 (en) Method and system for enabling life cycle maintenance of hierarchical database schemas in modeling tool
US20100030727A1 (en) Technique For Using Occurrence Constraints To Optimize XML Index Access
CN105335366A (en) SQL statement processing method and apparatus and server
US7051041B1 (en) Simplified relational database extension to DBM hash tables and method for using same
CA2437008A1 (en) Method and apparatus for data migration between databases
US8001110B2 (en) Apparatus, method, and computer program product for processing databases
US6925630B1 (en) Method for generating code for processing a database
Fong et al. An interpreter approach for exporting relational data into XML documents with structured export markup language
Raj et al. Implementation of ETL process using Pig and Hadoop
Morishima et al. A data modeling and query processing scheme for integration of structured document repositories and relational databases

Legal Events

Date Code Title Description
EEER Examination request
FZDE Dead