CN105843955A - Data migration system

Info

Publication number: CN105843955A
Application number: CN201610227555.9A
Authority: CN
Inventors: 张建磊; 郭庆; 惠润海; 宋怀明
Original assignee: Dawning Information Industry Beijing Co Ltd
Current assignee: Dawning Information Industry Beijing Co Ltd
Priority date: 2016-04-13
Filing date: 2016-04-13
Publication date: 2016-08-10

Abstract

The invention provides a data migration system. The system comprises a source database, a first input/output module, a memory computing module, a second input/output module and at least two cluster databases, wherein the first input/output module is used for reading source data from the source database; the memory computing module is used for receiving the source data and dividing and transmitting the source data; the second input/output module is used for receiving the divided data and transmitting them to the corresponding cluster database; and the cluster databases are used for receiving and storing the divided data. The data are read from the source database into memory block by block, the data are then divided according to a data division rule, and the divided data are written block by block into the corresponding target database of the cluster databases. Because block-based I/O (input/output) and in-memory computing are used, time-consuming data reading does not have to run at the same time as the efficient data division, and the data migration efficiency is accordingly greatly improved.

Description

Data migration system

Technical field

The present invention relates to the field of computer data processing, and in particular to a data migration system.

Background art

In real production environments it is often necessary to migrate large volumes of data stored in a database into a distributed database system according to a given distribution mode, or to repartition and store data already held in a distributed cluster according to a new distribution mode. At present there are two common methods for such migration or repartitioning. The first exports the data from the source database to text files and then uses SQL (Structured Query Language) statements in the target database to divide and import the text. The second divides the data with SQL statements while exporting from the source database, writes the result to text files, and then imports the data into the corresponding database nodes.

Both methods must write intermediate results to disk. In addition, dividing data with the database's SQL statements requires table-level scans and computation inside the database, so neither method is efficient.

How to migrate or repartition massive data efficiently in a distributed database system has therefore become a problem for users and vendors alike. Vendors have adopted the sqoop tool, which supports filtering and dividing data with SQL statements during migration and can therefore be used to migrate data between hadoop and relational databases. However, although hadoop makes data migration between databases or database clusters possible, it implements no efficient data division strategy: data can only be filtered and divided through SQL statements, so this method is as inefficient as the two methods above.

No effective solution to these problems in the related art has yet been proposed.

Summary of the invention

In view of the problems in the related art, the present invention proposes a data migration system.

The technical solution of the present invention is implemented as follows:

The data migration system includes:

a source database;

a first input/output module, for reading source data from the source database and sending it;

a memory computing module, for receiving the source data, dividing it, and sending the divided data;

a second input/output module, for receiving the divided data and sending them to the corresponding cluster database;

at least two cluster databases, for receiving and storing the divided data.

Preferably, the memory computing module includes:

a task management module, a code conversion module, a character escape module and a data division module.

Preferably, the task management module is used for distributing the tasks for the source data and for monitoring the running status of the code conversion module, the character escape module and the data division module.

Preferably, the code conversion module is used for performing code conversion on the special characters in the source data.

Preferably, the character escape module is used for performing character escaping on the special characters in the source data.

Preferably, the special characters include at least one of the following:

newline, carriage return, form feed.

Preferably, the data division module is connected to a metadata database and is used for dividing the source data.

Preferably, the metadata database is used for providing the list of positions to which the source data are to be sent.

Preferably, the data division mode includes at least one of the following:

hash division, range division, list division, round-robin division.

Preferably, the source database includes at least one of the following:

an Oracle database, a PostgreSQL database.

The present invention reads data from the source database into memory block by block, then divides the data according to the data division rule, and writes the divided data block by block into the corresponding target database of the cluster databases. Because block-based I/O and in-memory computing are used, time-consuming data reading does not have to run at the same time as the efficient data division, so the efficiency of data migration is greatly improved.

Brief description of the drawings

To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the embodiments are briefly introduced below. The drawings described below are obviously only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic diagram of a data migration system according to an embodiment of the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments fall within the scope of protection of the present invention.

As shown in Fig. 1, the data migration system according to an embodiment of the present invention includes:

a source database;

In a preferred embodiment, the memory computing module includes a task management module, a code conversion module, a character escape module and a data division module. The task management module distributes the tasks for the source data and monitors the running status of the code conversion module, the character escape module and the data division module. The code conversion module performs code conversion on the special characters in the source data. The character escape module performs character escaping on the special characters in the source data, where the special characters include at least one of the following: newline, carriage return, form feed. The data division module is connected to a metadata database and divides the source data, where the metadata database provides the list of positions to which the source data are to be sent.

In a preferred embodiment, the data division mode includes at least one of the following: hash division, range division, list division, and round-robin division. That is, the data are divided by a hash algorithm, by range, by list, or in round-robin order.
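
The four division modes can be illustrated with a minimal sketch. The function names, the CRC32 hash and the node numbering below are assumptions chosen for illustration; the patent does not specify which hash function or node numbering is used.

```python
import zlib
from bisect import bisect_right
from itertools import count

def hash_partition(key, n_nodes):
    """Hash division: a stable hash of the key, modulo the node count."""
    return zlib.crc32(str(key).encode("utf-8")) % n_nodes

def range_partition(key, upper_bounds):
    """Range division: upper_bounds is sorted; each bound is exclusive."""
    return bisect_right(upper_bounds, key)

def list_partition(key, value_to_node, default_node=0):
    """List division: an explicit value -> node mapping (hypothetical default)."""
    return value_to_node.get(key, default_node)

def make_round_robin(n_nodes):
    """Round-robin division: records are assigned to the nodes in turn."""
    counter = count()
    return lambda _key=None: next(counter) % n_nodes
```

With `upper_bounds = [100, 200]`, keys below 100 go to node 0, keys from 100 to 199 go to node 1, and all larger keys go to node 2.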

In another preferred embodiment, the source database includes at least one of the following: an Oracle database, a PostgreSQL database.

For a better understanding of this solution, it is explained in detail below with a specific embodiment.

The invention mainly comprises a first I/O (Input/Output) module, a second I/O module and a memory computing module, where the memory computing module includes four submodules: task management, code conversion, character escape and data division. The core idea of the invention is to read data from the database into memory block by block; then, according to the passed-in parameters such as the encoding formats of the source and target databases, the escape character, and the data storage mode in the distributed cluster, to perform code conversion, character escaping and data division on the data in that order; and finally to write the divided data block by block into the corresponding node databases of the distributed cluster.
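
The order of operations described above can be sketched as a small in-memory pipeline. This is only an illustration under stated assumptions: the source is a Python list of `(key, bytes)` rows, the source encoding is GB2312, the target encoding is UTF-8, division is a simple modulo on the key, and the "node databases" are plain lists. None of these choices comes from the patent.

```python
from collections import defaultdict

def read_blocks(rows, block_size):
    """Stand-in for the first I/O module: yield the source rows block by block."""
    for start in range(0, len(rows), block_size):
        yield rows[start:start + block_size]

def migrate(rows, n_nodes, block_size=2):
    """Convert, escape and divide each block in memory, then 'write' it per node."""
    written = defaultdict(list)        # stand-in for the node databases
    for block in read_blocks(rows, block_size):
        per_node = defaultdict(list)   # in-memory container for this block
        for key, raw in block:
            text = raw.decode("gb2312")                              # code conversion
            text = text.replace("\\", "\\\\").replace("\n", "\\n")   # character escaping
            per_node[key % n_nodes].append((key, text.encode("utf-8")))  # division
        for node, records in per_node.items():
            written[node].extend(records)  # stand-in for the block write
    return dict(written)
```

No intermediate file is written at any point in this sketch: each block stays in memory from the read until the write.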

Reading data by block is 5 to 10 times faster than reading data record by record, and the data read from the source database are no longer written to disk, which saves a great deal of time when the data volume is large. In addition, the conversion and the division of the data are both completed in memory, which greatly improves the performance of data migration and division.

The function of each module in Fig. 1 is as follows:

First (or second) I/O module: by encapsulating database I/O interface functions, this module reads data by data block and writes the divided data into the database by data block, and it provides a degree (parallelism) parameter to support concurrent reading. The module contains an interface that encapsulates the I/O interface functions of different databases, so that reading and writing are supported for different types of databases. Encapsulation for Oracle and PostgreSQL databases has been completed so far.
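
One way to picture such an interface is an abstract base class with one subclass per database type. The sketch below is hypothetical: `BlockIO`, `InMemoryIO`, `read_table` and their methods are illustrative names, the backend is a toy in-memory one rather than Oracle or PostgreSQL, and a thread pool stands in for the concurrency that the degree parameter controls.

```python
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor

class BlockIO(ABC):
    """Hypothetical common interface; one subclass per database type."""

    @abstractmethod
    def list_blocks(self, table): ...

    @abstractmethod
    def read_block(self, table, block_id): ...

    @abstractmethod
    def write_block(self, table, records): ...

class InMemoryIO(BlockIO):
    """Toy backend used only to exercise the interface."""

    def __init__(self, tables=None, block_size=2):
        self.tables = tables or {}
        self.block_size = block_size

    def list_blocks(self, table):
        n = len(self.tables.get(table, []))
        return list(range((n + self.block_size - 1) // self.block_size))

    def read_block(self, table, block_id):
        start = block_id * self.block_size
        return self.tables[table][start:start + self.block_size]

    def write_block(self, table, records):
        self.tables.setdefault(table, []).extend(records)

def read_table(io, table, degree=4):
    """Read all blocks of a table with `degree` concurrent workers."""
    with ThreadPoolExecutor(max_workers=degree) as pool:
        blocks = pool.map(lambda b: io.read_block(table, b), io.list_blocks(table))
        return [record for block in blocks for record in block]
```

Because `pool.map` returns results in submission order, the records come back in block order even though the blocks are read concurrently.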

The data reading steps are as follows:

1) establish the corresponding database connection according to the passed-in source database connection parameters;

2) according to the passed-in table name, query the database dictionary to determine the blocks that hold the data to be read;

3) call the interface function to read the database blocks;

4) parse the block data that were read into records and put them into memory.

The steps for writing data to the target node database are as follows:

1) establish a connection to the node database that corresponds to the divided data;

2) query the data dictionary to confirm that the target table exists and that its structure is correct;

3) write the data to the data files block by block, and modify the corresponding data dictionary information of the database to complete the data import.
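
The three write steps can be approximated with the Python standard library. The sketch below swaps in SQLite and batched `INSERT` statements as a stand-in: it does not write database data files or update the data dictionary directly as the patent describes, and `write_in_blocks` is a hypothetical helper name.

```python
import sqlite3

def write_in_blocks(conn, table, columns, records, block_size=1000):
    """Check the target table, then insert the records one block at a time."""
    # Step 2: consult the data dictionary (here: SQLite's table_info pragma).
    found = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    if found != list(columns):
        raise ValueError(f"target table {table!r} is missing or has columns {found}")
    placeholders = ", ".join("?" for _ in columns)
    sql = f'INSERT INTO "{table}" VALUES ({placeholders})'
    # Step 3: write block by block, committing once per block.
    for start in range(0, len(records), block_size):
        conn.executemany(sql, records[start:start + block_size])
        conn.commit()
    return len(records)
```

Step 1, establishing the connection, is left to the caller, for example `conn = sqlite3.connect(":memory:")`.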

The SQL statement for obtaining the data blocks (Oracle database):

Memory computing module: this module mainly includes four submodules (task management, code conversion, character escape and data division) and mainly performs the conversion and the division of the data. The function of each submodule is as follows:

Task management module: responsible for starting and stopping the relevant operations in the memory computing module and for allocating and reclaiming memory resources. Depending on whether the parameters passed in with the migration task include code conversion, an escape character and a data division method, it starts the corresponding code conversion, character escape and data division modules. It also allocates memory resources according to the incoming data volume and monitors the running status of each module's tasks.
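
A minimal sketch of this behaviour follows, assuming the task parameters arrive as a dictionary and each stage is a plain callable. The class name, the parameter keys and the state names are hypothetical, and memory allocation and reclamation are not modelled.

```python
from enum import Enum

class State(Enum):
    PENDING = "pending"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"

class TaskManager:
    """Start only the stages requested by the task parameters; track their state."""

    ORDER = ("code_conversion", "character_escape", "data_division")

    def __init__(self, params, stages):
        # A stage is enabled when the task passes in a parameter for it.
        self.enabled = [name for name in self.ORDER if params.get(name) is not None]
        self.stages = stages
        self.status = {name: State.PENDING for name in self.enabled}

    def run(self, records):
        for name in self.enabled:
            self.status[name] = State.RUNNING
            try:
                records = self.stages[name](records)
            except Exception:
                self.status[name] = State.FAILED
                raise
            self.status[name] = State.DONE
        return records
```

The fixed `ORDER` tuple mirrors the sequence given in the description: code conversion first, then character escaping, then data division.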

Code conversion module: a common problem when migrating data between different databases is that differences in encoding between the databases produce garbled text (especially for Chinese). A code conversion function is therefore implemented: charset and ncharset specify the character encodings of the source database and the target database respectively, and the corresponding code conversion is performed. The character encodings currently supported are GB2312, ISO-8859, UTF-8 and ASCII.
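
In Python the conversion itself is a decode followed by an encode. The sketch below reuses the parameter names charset and ncharset from the text; the function name is hypothetical, and the codec names are Python's, so the ISO-8859 family must be given as a specific part such as "iso-8859-1".

```python
def convert_encoding(raw, charset, ncharset):
    """Re-encode raw bytes from the source charset to the target ncharset."""
    return raw.decode(charset).encode(ncharset)
```

For example, `convert_encoding("中文".encode("gb2312"), "gb2312", "utf-8")` returns the UTF-8 bytes of the same text. A character that the target encoding cannot represent raises `UnicodeEncodeError` rather than being silently replaced.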

Character escape module: if the special characters contained in a record are not processed, the data cannot be stored correctly. For example, if a record contains a newline or a backslash ("\"), the data must be escaped accordingly: a newline is escaped as "\n", a backslash as "\\", and so on. The exact escape result is determined by the escape character specified by the user, for example:

Escape=0x5c

Null=0x5cN

"\"=0x5c0x5c

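The escape settings above can be sketched as a single function. This is an assumption-laden illustration: the function name is hypothetical, the default escape character is the backslash (0x5c), a null field becomes the escape character followed by "N", and the escaped characters are the newline, carriage return and form feed listed in the description.

```python
def escape_field(value, escape="\\"):
    """Escape one field; None becomes the escape character followed by 'N'."""
    if value is None:
        return escape + "N"
    out = value.replace(escape, escape + escape)  # escape the escape character first
    for char, letter in (("\n", "n"), ("\r", "r"), ("\f", "f")):
        out = out.replace(char, escape + letter)
    return out
```

The escape character is doubled before anything else is replaced, so the escape characters inserted for the newline and the other special characters are not doubled a second time.
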
Data division module: the supported data division modes are hash, range, list and round-robin. The data are divided according to the division mode defined for the target table. The division steps are as follows:

1) query the metadata database of the distributed cluster to confirm the division mode of the target table;

2) call the division function that matches the division mode of the target table to divide the data records;

3) store the divided data in in-memory containers.
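
The three steps can be sketched as follows. The metadata layout, the table name and the CRC32-based hash are assumptions made for illustration, and only the hash mode is implemented here to keep the example short.

```python
from collections import defaultdict
import zlib

# Hypothetical metadata: how each target table is divided across the nodes.
METADATA = {"orders": {"mode": "hash", "key": "id", "nodes": 3}}

def divide(table, records, metadata=METADATA):
    meta = metadata[table]                       # step 1: confirm the division mode
    if meta["mode"] != "hash":
        raise NotImplementedError(meta["mode"])  # other modes omitted in this sketch
    containers = defaultdict(list)               # step 3: in-memory containers
    for record in records:
        key = str(record[meta["key"]]).encode("utf-8")
        node = zlib.crc32(key) % meta["nodes"]   # step 2: call the division function
        containers[node].append(record)
    return dict(containers)
```

Each key of the returned dictionary is a node number, and each value is the list of records destined for that node, ready to be handed to the second I/O module.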

The effects of the present invention are as follows:

1) reading and writing data by data block greatly improves the performance of data import and export;

2) encapsulating the I/O interface functions provided by different databases makes it possible to migrate data not only between databases of the same type but also between databases of different types;

3) the data read from the source database are kept in memory, which avoids disk write operations and saves the cost of writing massive data to files;

4) the data read from the source database are converted and divided in memory, which improves the efficiency of data migration and division;

5) the code conversion and character escape modules improve the success rate of data migration between databases that use different encodings.

The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within its scope of protection.

Claims

1. A data migration system, characterized by comprising:

a source database;

a first input/output module, for reading source data from said source database and sending it;

a memory computing module, for receiving said source data, dividing said source data, and sending the divided data;

at least two of said cluster databases, for receiving and storing said divided data.

2. The data migration system according to claim 1, characterized in that said memory computing module includes:

3. The data migration system according to claim 2, characterized in that said task management module is used for distributing the tasks for said source data and for monitoring the running status of said code conversion module, said character escape module and said data division module.

4. The data migration system according to claim 2, characterized in that said code conversion module is used for performing code conversion on the special characters in said source data.

5. The data migration system according to claim 2, characterized in that said character escape module is used for performing character escaping on the special characters in said source data.

6. The data migration system according to claim 5, characterized in that said special characters include at least one of the following:

newline, carriage return, form feed.

7. The data migration system according to claim 2, characterized in that said data division module is connected to a metadata database and is used for dividing said source data.

8. The data migration system according to claim 7, characterized in that said metadata database is used for providing the list of positions to which said source data are to be sent.

9. The data migration system according to claim 7, characterized in that the data division mode includes at least one of the following:

hash division, range division, list division, round-robin division.

10. The data migration system according to claim 1, characterized in that said source database includes at least one of the following:

an Oracle database, a PostgreSQL database.