CN105843955A - Data migration system - Google Patents
- Publication number
- CN105843955A (application number CN201610227555.9A)
- Authority
- CN
- China
- Prior art keywords
- data
- module
- character
- database
- division
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
- G06F16/278—Data partitioning, e.g. horizontal or vertical partitioning
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a data migration system. The system comprises a source database, a first input/output module, an in-memory computing module, a second input/output module and at least two cluster databases. The first input/output module reads source data from the source database; the in-memory computing module receives the source data, partitions it and transmits the result; the second input/output module receives the partitioned data and transmits it to the corresponding cluster database; and the cluster databases receive and store the partitioned data. Data are read from the source database into memory block by block, partitioned according to a data partitioning rule, and then written block by block into the corresponding target databases of the cluster. Because block-based I/O (input/output) and in-memory computation are used, time-consuming data reading no longer runs at the same time as the efficient data partitioning step, and data migration efficiency is therefore greatly improved.
Description
Technical field
The present invention relates to the field of computer data processing, and in particular to a data migration system.
Background
In real production environments, large volumes of data stored in a database often have to be migrated into a distributed database cluster according to some distribution scheme, or data already stored in a distributed cluster have to be repartitioned and stored according to a new distribution scheme. Current approaches to such migration or repartitioning fall broadly into two categories: (1) export the data from the source database to text files, then use SQL (Structured Query Language) statements in the target database to partition the text data and import it; (2) partition the data with SQL statements while exporting it from the source database, write the results to text files, and then import the data into the corresponding database nodes.
Both methods must write intermediate results to disk, and partitioning data with the database's SQL statements requires table-level scans and computation inside the database, so their efficiency is low.
How to migrate or repartition massive data efficiently into a distributed database cluster has therefore become a difficult problem for both users and vendors. Some vendors have adopted the Sqoop tool, which supports filtering and partitioning data with SQL statements during migration and can therefore be used to move data between Hadoop and relational databases. However, although data migration between databases or database clusters can be achieved through Hadoop, there is no efficient data partitioning strategy: partitioning can only be done by filtering with SQL statements, so, like the two methods above, this approach is inefficient.
No effective solution to these problems in the related art has yet been proposed.
Summary of the invention
To address the problems in the related art, the present invention proposes a data migration system.
The technical solution of the present invention is implemented as follows:
The data migration system includes:
a source database;
a first input/output module, configured to read source data from the source database and send it;
an in-memory computing module, configured to receive the source data, partition it and send the result;
a second input/output module, configured to receive the partitioned data and send it to the corresponding cluster database;
at least two cluster databases, configured to receive and store the partitioned data.
Preferably, the in-memory computing module includes:
a task management module, a transcoding module, a character escaping module and a data partitioning module.
Preferably, the task management module is used to distribute tasks for the source data and to monitor the running status of the transcoding module, the character escaping module and the data partitioning module.
Preferably, the transcoding module is used to perform encoding conversion on special characters in the source data.
Preferably, the character escaping module is used to perform character escaping on special characters in the source data.
Preferably, the special characters include at least one of:
newline, carriage return, form feed.
Preferably, the data partitioning module is connected to a metadata database and is used to partition the source data.
Preferably, the metadata database is used to provide the list of destination locations for the source data.
Preferably, the data partitioning method includes at least one of:
hash partitioning, range partitioning, list partitioning, round-robin partitioning.
Preferably, the source database includes at least one of:
an Oracle database, a PostgreSQL database.
The present invention reads data from the source database into memory block by block, partitions the data according to the data partitioning rule, and then writes the partitioned data block by block into the corresponding target databases of the cluster. Because block-based I/O and in-memory computation are used, time-consuming data reading no longer runs at the same time as the efficient partitioning step, which greatly improves data migration efficiency.
Brief description of the drawings
To illustrate the embodiments of the present invention or the prior-art technical solutions more clearly, the drawings used in the embodiments are briefly described below. Obviously, the drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the data migration system according to an embodiment of the present invention.
Detailed description of the invention
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present invention.
As shown in Fig. 1, the data migration system according to an embodiment of the present invention includes:
a source database;
a first input/output module, configured to read source data from the source database and send it;
an in-memory computing module, configured to receive the source data, partition it and send the result;
a second input/output module, configured to receive the partitioned data and send it to the corresponding cluster database;
at least two cluster databases, configured to receive and store the partitioned data.
The present invention reads data from the source database into memory block by block, partitions the data according to the data partitioning rule, and then writes the partitioned data block by block into the corresponding target databases of the cluster. Because block-based I/O and in-memory computation are used, time-consuming data reading no longer runs at the same time as the efficient partitioning step, which greatly improves data migration efficiency.
In a preferred embodiment, the in-memory computing module includes a task management module, a transcoding module, a character escaping module and a data partitioning module. The task management module distributes tasks for the source data and monitors the running status of the transcoding module, the character escaping module and the data partitioning module. The transcoding module performs encoding conversion on special characters in the source data. The character escaping module performs character escaping on special characters in the source data, where the special characters include at least one of: newline, carriage return and form feed. The data partitioning module is connected to a metadata database and partitions the source data; the metadata database provides the list of destination locations for the source data.
In a preferred embodiment, the data partitioning method includes at least one of: hash partitioning, range partitioning, list partitioning and round-robin partitioning; that is, data are partitioned by a hash algorithm, by value range, by value list, or by round-robin assignment.
In another preferred embodiment, the source database includes at least one of: an Oracle database and a PostgreSQL database.
To make the solution easier to understand, it is explained in detail below with a specific embodiment.
The invention mainly comprises a first I/O (input/output) module, a second I/O module and an in-memory computing module. The in-memory computing module contains four submodules: task management, transcoding, character escaping and data partitioning. The core idea of the invention is to read data from the database into memory block by block; then, according to the parameters passed in (such as the encoding formats of the source and target databases, the escape character, and the way data are stored in the distributed cluster), to perform transcoding, character escaping and data partitioning in sequence; and finally to write the partitioned data block by block into the corresponding node databases of the distributed cluster.
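The processing order described above (read blocks, then transcode, escape and partition each record in memory) can be sketched as a small driver. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `migrate_in_memory`, the `Record` type and the idea of passing each stage in as a callable are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

Record = List[str]  # one row as a list of column values (assumption)


def migrate_in_memory(
    blocks: Iterable[bytes],
    parse_block: Callable[[bytes], List[Record]],
    transcode: Callable[[Record], Record],
    escape: Callable[[Record], Record],
    choose_node: Callable[[Record], str],
) -> Dict[str, List[Record]]:
    """Run transcoding, escaping and partitioning, in that order, without touching disk.

    Returns a mapping from target node name to the records destined for it;
    the second I/O module would then write each list block by block.
    """
    per_node: Dict[str, List[Record]] = defaultdict(list)
    for block in blocks:
        for record in parse_block(block):
            record = escape(transcode(record))
            per_node[choose_node(record)].append(record)
    return dict(per_node)
```

Passing the stages as callables keeps the driver independent of any particular database or encoding; a task manager could substitute identity functions for stages that a given migration task does not need.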
Compared with reading records one by one, reading data by block improves read performance by a factor of 5 to 10, and the data read from the source database are no longer written to disk, which saves a great deal of time for massive data volumes. In addition, data conversion and partitioning are both completed in memory, which greatly improves the performance of migration and partitioning.
In Fig. 1, the functions of the modules are as follows:
First (or second) I/O module: by wrapping database I/O interface functions, this module reads data block by block and writes partitioned data into the database block by block, and it also provides a degree-of-parallelism parameter so that reads can run concurrently. The module contains one interface that wraps the I/O interface functions of different databases, so that reading and writing are supported for different types of databases. Wrappers for Oracle and PostgreSQL databases have been completed so far.
The specific data reading steps are as follows:
1) Establish a connection to the source database using the connection parameters passed in;
2) Using the table name passed in, query the database dictionary to determine the blocks that hold the data to be read;
3) Call the interface function to read the database blocks;
4) Parse the blocks that were read into records and place them in memory.
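One plausible way to implement step 2 on Oracle, which is an assumption and not necessarily the query the authors used, is to read extent locations for the table from the `DBA_EXTENTS` dictionary view and then divide the blocks among the readers selected by the degree parameter. In the sketch below, the helper `split_extents` and the `EXTENTS_SQL` constant are hypothetical; the SQL is only shown, not executed.

```python
from typing import List, Sequence, Tuple

# Assumed dictionary query: one row per extent of the table segment.
EXTENTS_SQL = """
SELECT file_id, block_id, blocks
  FROM dba_extents
 WHERE owner = :owner AND segment_name = :table_name
 ORDER BY file_id, block_id
"""

Extent = Tuple[int, int, int]      # (file_id, first block_id, number of blocks)
BlockRange = Tuple[int, int, int]  # (file_id, first_block, last_block), inclusive


def split_extents(extents: Sequence[Extent], degree: int) -> List[List[BlockRange]]:
    """Divide the table's blocks into `degree` groups with near-equal block counts."""
    if degree < 1:
        raise ValueError("degree must be at least 1")
    total = sum(count for _, _, count in extents)
    target = -(-total // degree)  # ceiling division: blocks per worker
    groups: List[List[BlockRange]] = [[] for _ in range(degree)]
    worker, filled = 0, 0
    for file_id, start, count in extents:
        while count > 0:
            # The last worker absorbs whatever remains; others stop at `target`.
            room = count if worker == degree - 1 else target - filled
            take = min(count, room)
            groups[worker].append((file_id, start, start + take - 1))
            start, count, filled = start + take, count - take, filled + take
            if filled == target and worker < degree - 1:
                worker, filled = worker + 1, 0
    return groups
```

Splitting by block count rather than by row count keeps the work units aligned with what the block-level read function consumes.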
The steps for writing data to a target node database are as follows:
1) Establish a connection to the node database that corresponds to the partitioned data;
2) Query the data dictionary to confirm that the target table exists and that its structure is correct;
3) Write the data to the data files block by block and update the corresponding data dictionary information of the database, completing the import.
SQL statement for obtaining the data blocks (Oracle database):
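Steps 2) and 3) of the write path can be simulated without a database to show the order of checks. In this sketch the catalog dictionary stands in for the node's data dictionary and the storage dictionary stands in for its data files; `pack_blocks`, `write_partition` and the `rows_per_block` parameter are all hypothetical.

```python
from typing import Dict, List, Sequence

Record = List[str]


def pack_blocks(records: Sequence[Record], rows_per_block: int) -> List[List[Record]]:
    """Group records into fixed-size blocks (the last block may be shorter)."""
    if rows_per_block < 1:
        raise ValueError("rows_per_block must be at least 1")
    return [list(records[i:i + rows_per_block]) for i in range(0, len(records), rows_per_block)]


def write_partition(catalog: Dict[str, dict], storage: Dict[str, List[List[Record]]],
                    table: str, columns: Sequence[str], records: Sequence[Record],
                    rows_per_block: int = 1000) -> int:
    """Simulated write steps 2) and 3); returns the number of blocks written."""
    # Step 2: the target table must exist and have the expected column list.
    entry = catalog.get(table)
    if entry is None:
        raise LookupError(f"target table {table!r} does not exist")
    if list(entry["columns"]) != list(columns):
        raise ValueError(f"column mismatch for {table!r}")
    # Step 3: append whole blocks, then update the dictionary entry.
    blocks = pack_blocks(records, rows_per_block)
    storage.setdefault(table, []).extend(blocks)
    entry["row_count"] = entry.get("row_count", 0) + len(records)
    return len(blocks)
```

Checking the table structure before any block is written means a mismatch fails the task without leaving partial data on the node.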
In-memory computing module: this module mainly comprises four submodules (task management, transcoding, character escaping and data partitioning) and is chiefly responsible for converting and partitioning the data. The functions of the submodules are as follows:
Task management module: responsible for starting and stopping the operations within the in-memory computing module and for allocating and reclaiming memory resources. Based on the parameters passed in with the migration task, which indicate whether transcoding, an escape character and a data partitioning method are required, it starts the corresponding transcoding, character escaping and data partitioning modules. It also allocates memory according to the volume of data passed in and monitors the running status of each module's tasks.
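The decisions the task manager makes can be written as a small planning function. The rules here are assumptions for illustration only: the parameter keys (`charset`, `ncharset`, `escape`, `partition_method`), the choice to skip transcoding when both encodings are equal, and the buffer-count formula are not taken from the patent, and `TaskPlan` and `plan_task` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TaskPlan:
    stages: List[str]                 # submodules to start, in execution order
    buffers: int                      # number of block-sized memory buffers to allocate
    status: Dict[str, str] = field(default_factory=dict)  # per-stage state being monitored


def plan_task(params: Dict[str, object], data_bytes: int, block_bytes: int = 8192,
              max_buffers: int = 1024) -> TaskPlan:
    """Decide which submodules to start and how much memory to reserve (hypothetical rules)."""
    stages: List[str] = []
    charset, ncharset = params.get("charset"), params.get("ncharset")
    if charset and ncharset and charset != ncharset:
        stages.append("transcode")    # only needed when encodings differ
    if params.get("escape") is not None:
        stages.append("escape")
    if params.get("partition_method"):
        stages.append("partition")
    needed = -(-data_bytes // block_bytes)  # ceiling division
    buffers = max(1, min(needed, max_buffers))
    return TaskPlan(stages, buffers, {stage: "started" for stage in stages})
```

Capping the buffer count keeps memory bounded for very large tables; blocks would then be recycled through the buffers rather than all held at once.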
Transcoding module: a common problem when migrating data between different databases is garbled text (especially Chinese) caused by differing encodings. This module implements encoding conversion: the parameters charset and ncharset specify the character encodings of the source and target databases respectively, and the module performs the corresponding conversion. The currently supported encodings are GB2312, ISO-8859, UTF-8 and ASCII.
Character escaping module: if special characters contained in a record are not handled, the data cannot be stored correctly. For example, if a record contains a newline or a backslash "\", the data must be escaped accordingly: a newline is escaped as "\n", a backslash as "\\", and so on. The exact escape result is determined by the escape character specified by the user, for example:
Escape=0x5c
Null=0x5cN
"\"=0x5c0x5c
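The escape rules above (escape character 0x5c, NULL written as 0x5c followed by N, the escape character doubled) can be sketched as a per-field function. Escaping carriage return and form feed as `\r` and `\f` is an assumption borrowed from PostgreSQL's COPY text format; the function name `escape_field` is hypothetical.

```python
from typing import Optional


def escape_field(value: Optional[str], escape: int = 0x5C) -> str:
    """Escape one field using the user-specified escape character (default backslash, 0x5c)."""
    esc = chr(escape)
    if value is None:
        return esc + "N"          # Null=0x5cN
    table = {
        esc: esc + esc,           # the escape character itself: 0x5c0x5c
        "\n": esc + "n",          # newline
        "\r": esc + "r",          # carriage return (assumed spelling)
        "\f": esc + "f",          # form feed (assumed spelling)
    }
    # A single pass over the characters, so nothing is escaped twice.
    return "".join(table.get(ch, ch) for ch in value)
```

Mapping characters in one pass avoids the ordering bug of chained `str.replace` calls, where escaping newlines before doubling the escape character would corrupt the output.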
Data partitioning module: the supported partitioning methods are hash, range, list and round-robin. Data are partitioned according to the partitioning method defined for the target table. The specific steps are as follows:
1) Query the metadata database of the distributed cluster to determine the partitioning method of the target table;
2) Call the partition function that matches the target table's partitioning method to partition the data records;
3) Store the partitioned data in in-memory containers.
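The four partitioning methods can be sketched as small placement functions plus a dispatcher that fills in-memory lists, as in step 3. The details are assumptions: CRC32 as the hash, exclusive upper bounds for ranges, and the names `hash_partition`, `range_partition`, `list_partition` and `partition` are hypothetical. In the described system the method and its parameters would come from the cluster's metadata database.

```python
import bisect
import itertools
import zlib
from collections import defaultdict
from typing import Dict, List, Sequence


def hash_partition(key: str, nodes: Sequence[str]) -> str:
    # CRC32 keeps placement stable across runs; Python's str hash() is salted per process.
    return nodes[zlib.crc32(key.encode("utf-8")) % len(nodes)]


def range_partition(key: int, upper_bounds: Sequence[int], nodes: Sequence[str]) -> str:
    # nodes[i] receives keys below upper_bounds[i] (exclusive); the last node takes the rest.
    if len(nodes) != len(upper_bounds) + 1:
        raise ValueError("range partitioning needs one more node than upper bounds")
    return nodes[bisect.bisect_right(upper_bounds, key)]


def list_partition(key: str, value_map: Dict[str, str]) -> str:
    # Every listed value maps to a fixed node; an unlisted value raises KeyError.
    return value_map[key]


def partition(records: Sequence[list], key_index: int, method: str,
              nodes: Sequence[str], **options) -> Dict[str, List[list]]:
    """Assign each record to a node and collect the results in in-memory lists."""
    out: Dict[str, List[list]] = defaultdict(list)
    next_node = itertools.cycle(nodes)  # round-robin ignores the key entirely
    for rec in records:
        key = rec[key_index]
        if method == "hash":
            node = hash_partition(str(key), nodes)
        elif method == "range":
            node = range_partition(key, options["upper_bounds"], nodes)
        elif method == "list":
            node = list_partition(key, options["value_map"])
        elif method == "round_robin":
            node = next(next_node)
        else:
            raise ValueError(f"unknown partitioning method: {method!r}")
        out[node].append(rec)
    return dict(out)
```

Hash and round-robin spread rows evenly without configuration, while range and list placement depend on bounds or value maps that must match the target table's definition.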
The effects of the present invention are as follows:
1) Reading and writing data block by block greatly improves import and export performance;
2) By wrapping the I/O interface functions of different databases, the system can migrate data not only between databases of the same type but also between databases of different types;
3) Data read from the source database are kept in memory, which avoids disk writes and saves the cost of writing massive data to files;
4) Conversion and partitioning of the data read from the source are performed in memory, which improves the efficiency of migration and partitioning;
5) The transcoding and character escaping modules improve the success rate of migration between databases that use different encodings.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (10)
1. A data migration system, characterized by comprising:
a source database;
a first input/output module, configured to read source data from the source database and send it;
an in-memory computing module, configured to receive the source data, partition the source data and send it;
a second input/output module, configured to receive the partitioned data and send it to the corresponding cluster database;
at least two of the cluster databases, configured to receive and store the partitioned data.
2. The data migration system according to claim 1, characterized in that the in-memory computing module comprises:
a task management module, a transcoding module, a character escaping module and a data partitioning module.
3. The data migration system according to claim 2, characterized in that the task management module is configured to distribute tasks for the source data and to monitor the running status of the transcoding module, the character escaping module and the data partitioning module.
4. The data migration system according to claim 2, characterized in that the transcoding module is configured to perform encoding conversion on special characters in the source data.
5. The data migration system according to claim 2, characterized in that the character escaping module is configured to perform character escaping on special characters in the source data.
6. The data migration system according to claim 5, characterized in that the special characters include at least one of:
newline, carriage return, form feed.
7. The data migration system according to claim 2, characterized in that the data partitioning module is connected to a metadata database and is configured to partition the source data.
8. The data migration system according to claim 7, characterized in that the metadata database is configured to provide the list of destination locations for the source data.
9. The data migration system according to claim 7, characterized in that the data partitioning method includes at least one of:
hash partitioning, range partitioning, list partitioning, round-robin partitioning.
10. The data migration system according to claim 1, characterized in that the source database includes at least one of:
an Oracle database, a PostgreSQL database.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610227555.9A CN105843955A (en) | 2016-04-13 | 2016-04-13 | Data migration system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105843955A (en) | 2016-08-10 |
Family
ID=56597419
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610227555.9A Pending CN105843955A (en) | 2016-04-13 | 2016-04-13 | Data migration system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105843955A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080155218A1 (en) * | 2006-12-20 | 2008-06-26 | International Business Machines Corporation | Optimized Data Migration with a Support Processor |
CN102938001A (en) * | 2012-12-10 | 2013-02-20 | 曙光信息产业(北京)有限公司 | Data loading device and data loading method |
CN102999537A (en) * | 2011-09-19 | 2013-03-27 | 阿里巴巴集团控股有限公司 | System and method for data migration |
CN103793424A (en) * | 2012-10-31 | 2014-05-14 | 阿里巴巴集团控股有限公司 | Database data migration method and database data migration system |
CN104572862A (en) * | 2014-12-19 | 2015-04-29 | 阳珍秀 | Mass data storage access method and system |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106570314A (en) * | 2016-10-19 | 2017-04-19 | 北京千医健康管理有限公司 | ICCINO (Insurance, Check, Check, Inform, Nursing and Observe) door-to-door nurse service standard |
CN109284335A (en) * | 2018-09-10 | 2019-01-29 | 郑州云海信息技术有限公司 | A kind of method and apparatus of integration across database batch conduct data |
CN110300188A (en) * | 2019-07-25 | 2019-10-01 | 中国工商银行股份有限公司 | Data transmission system, method and apparatus |
WO2021203802A1 (en) * | 2020-04-10 | 2021-10-14 | 苏州浪潮智能科技有限公司 | Character string transmission method and device, computer and readable storage medium |
CN112817944A (en) * | 2021-02-26 | 2021-05-18 | 北京北信源软件股份有限公司 | Data migration method and device, electronic equipment and storage medium |
CN112817944B (en) * | 2021-02-26 | 2024-06-07 | 北京北信源软件股份有限公司 | Data migration method and device, electronic equipment and storage medium |
CN114063914A (en) * | 2021-11-05 | 2022-02-18 | 武汉理工大学 | DRAM-HBM hybrid memory oriented data management method |
CN114063914B (en) * | 2021-11-05 | 2024-04-09 | 武汉理工大学 | Data management method for DRAM-HBM hybrid memory |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20160810 |