CN105843955A - Data migration system - Google Patents

Data migration system

Info

Publication number
CN105843955A
Authority
CN
China
Prior art keywords
data
module
character
database
division
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610227555.9A
Other languages
Chinese (zh)
Inventor
张建磊
郭庆
惠润海
宋怀明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dawning Information Industry Beijing Co Ltd
Original Assignee
Dawning Information Industry Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dawning Information Industry Beijing Co Ltd
Priority to CN201610227555.9A
Publication of CN105843955A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/278 Data partitioning, e.g. horizontal or vertical partitioning

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a data migration system. The system comprises a source database, a first input/output module, an in-memory computing module, a second input/output module, and at least two cluster databases. The first input/output module reads source data from the source database; the in-memory computing module receives the source data, partitions it, and sends it on; the second input/output module receives the partitioned data and sends it to the corresponding cluster database; the cluster databases receive and store the partitioned data. Data is read from the source database into memory block by block, partitioned according to a data partitioning rule, and then written block by block into the corresponding target database among the cluster databases. Because block-based I/O (input/output) and in-memory computing are used, time-consuming data reads are avoided while the data is partitioned efficiently, so the efficiency of data migration is greatly improved.

Description

A data migration system
Technical field
The present invention relates to the field of computer data processing, and in particular to a data migration system.
Background
In production environments it is common to need to migrate large volumes of data stored in a database into a distributed cluster system according to some distribution scheme, or to repartition and store data already held in a distributed cluster according to a new distribution scheme. At present there are two general approaches to such migration or repartitioning. The first exports the data from the source database to a text file and then uses SQL (Structured Query Language) statements in the target database to partition and import the text. The second partitions the data with SQL statements while exporting it from the source database, writes the result to text files, and then imports the data into the corresponding database nodes.
Both approaches are inefficient: they must write intermediate results to disk, and partitioning data with the database's SQL statements requires table-level scans and computation inside the database.
Migrating or repartitioning massive data efficiently in a distributed cluster system has therefore become a problem for users and vendors alike. Some vendors have adopted the sqoop tool, which supports filtering and partitioning data with SQL statements during migration and can therefore be used to migrate data between hadoop and relational databases. However, although hadoop makes data migration between databases or database clusters possible, it implements no efficient data partitioning strategy; data can only be filtered and partitioned through SQL statements, so this method is as inefficient as the two approaches above.
No effective solution to these problems in the related art has yet been proposed.
Summary of the invention
To address the problems in the related art, the present invention proposes a data migration system.
The technical solution of the present invention is implemented as follows:
The data migration system includes:
a source database;
a first input/output module, for reading source data from the source database and sending it;
an in-memory computing module, for receiving the source data, partitioning it, and sending it on;
a second input/output module, for receiving the partitioned data and sending it to the corresponding cluster database;
at least two cluster databases, for receiving and storing the partitioned data.
Preferably, the in-memory computing module includes:
a task management module, an encoding conversion module, a character escape module, and a data partitioning module.
Preferably, the task management module distributes the source data tasks and monitors the running status of the encoding conversion module, the character escape module, and the data partitioning module.
Preferably, the encoding conversion module performs encoding conversion on the special characters in the source data.
Preferably, the character escape module performs character escaping on the special characters in the source data.
Preferably, the special characters include at least one of:
newline, carriage return, form feed.
Preferably, the data partitioning module is connected to a metadata database and partitions the source data.
Preferably, the metadata database provides the list of destinations to which the source data is sent.
Preferably, the data partitioning modes include at least one of:
hash partitioning, range partitioning, list partitioning, round-robin partitioning.
Preferably, the source database includes at least one of:
an Oracle database, a PostgreSQL database.
The present invention reads data from the source database into memory block by block, partitions the data according to the data partitioning rules, and then writes the partitioned data, again block by block, into the corresponding target database among the cluster databases. Because block-based I/O and in-memory computing are used, time-consuming data reads are avoided while the data is partitioned efficiently, which greatly improves the efficiency of data migration.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for the embodiments are briefly introduced below. The drawings described below are clearly only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of a data migration system according to an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are clearly only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention fall within the scope of protection of the present invention.
As shown in Fig. 1, a data migration system according to an embodiment of the present invention includes:
a source database;
a first input/output module, for reading source data from the source database and sending it;
an in-memory computing module, for receiving the source data, partitioning it, and sending it on;
a second input/output module, for receiving the partitioned data and sending it to the corresponding cluster database;
at least two cluster databases, for receiving and storing the partitioned data.
The present invention reads data from the source database into memory block by block, partitions the data according to the data partitioning rules, and then writes the partitioned data, again block by block, into the corresponding target database among the cluster databases. Because block-based I/O and in-memory computing are used, time-consuming data reads are avoided while the data is partitioned efficiently, which greatly improves the efficiency of data migration.
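The data flow described above (block read, in-memory partitioning, block write) can be illustrated with a minimal Python sketch. It is not the patented implementation: the source rows, the block size, the `route` function, and the in-memory lists standing in for the cluster databases are all assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = dict


def read_blocks(rows: Iterable[Record], block_size: int) -> Iterator[List[Record]]:
    """Stand-in for the first I/O module: yield source rows in fixed-size blocks."""
    block: List[Record] = []
    for row in rows:
        block.append(row)
        if len(block) == block_size:
            yield block
            block = []
    if block:  # last, partially filled block
        yield block


def migrate(rows: Iterable[Record], block_size: int, node_count: int,
            route: Callable[[Record], int]) -> Dict[int, List[Record]]:
    """Read by block, partition in memory, then write one batch per node."""
    # Plain lists stand in for the cluster databases (an assumption).
    targets: Dict[int, List[Record]] = {node: [] for node in range(node_count)}
    for block in read_blocks(rows, block_size):
        # In-memory computing module: no intermediate result touches disk.
        buffers: Dict[int, List[Record]] = {node: [] for node in range(node_count)}
        for record in block:
            buffers[route(record)].append(record)
        # Stand-in for the second I/O module: one batched write per node.
        for node, buffered in buffers.items():
            if buffered:
                targets[node].extend(buffered)
    return targets


source = [{"id": i} for i in range(10)]
result = migrate(source, block_size=4, node_count=2, route=lambda r: r["id"] % 2)
```

Here `result[0]` collects the records with even `id` values and `result[1]` the odd ones; a real system would replace the lists with database writes.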
In a preferred embodiment, the in-memory computing module includes a task management module, an encoding conversion module, a character escape module, and a data partitioning module. The task management module distributes the source data tasks and monitors the running status of the encoding conversion module, the character escape module, and the data partitioning module. The encoding conversion module performs encoding conversion on the special characters in the source data. The character escape module performs character escaping on the special characters in the source data, where the special characters include at least one of: newline, carriage return, form feed. The data partitioning module is connected to a metadata database and partitions the source data, where the metadata database provides the list of destinations to which the source data is sent.
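As an illustration of what the character escape module does with newline, carriage return, and form feed, here is a minimal Python sketch. It assumes backslash (0x5c) as the escape character and `\N` as the null marker, matching the `Escape=0x5c` and `Null=0x5cN` settings given later in the description; the letters `r` and `f` for carriage return and form feed are an assumption, and `escape_field` is a hypothetical name.

```python
ESCAPE = b"\x5c"              # backslash, as in the Escape=0x5c setting
NULL_MARKER = ESCAPE + b"N"   # as in the Null=0x5cN setting


def escape_field(value):
    """Escape one field (bytes or None) so it can be stored as a single line."""
    if value is None:
        return NULL_MARKER
    # Escape the escape character first; otherwise the backslashes added
    # below for newline, carriage return, and form feed would be doubled too.
    escaped = value.replace(ESCAPE, ESCAPE + ESCAPE)
    for raw, letter in ((b"\n", b"n"), (b"\r", b"r"), (b"\f", b"f")):
        escaped = escaped.replace(raw, ESCAPE + letter)
    return escaped
```

The order of the replacements matters: doubling the backslash last would corrupt the escape sequences that were just inserted.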
In a preferred embodiment, the data partitioning modes include at least one of: hash partitioning, range partitioning, list partitioning, and round-robin partitioning; that is, the data is partitioned by a hash algorithm, by range, by list, or by round-robin.
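The four partitioning modes can each be written as a small routing function. The following Python sketch shows one common way to do so; the patent does not specify the hash function or the data structures, so `zlib.crc32`, the bounds list, and the value sets are assumptions.

```python
import bisect
import itertools
import zlib


def hash_partition(key: str, node_count: int) -> int:
    """Hash: a stable checksum of the key, modulo the number of nodes."""
    return zlib.crc32(key.encode("utf-8")) % node_count


def range_partition(value: int, upper_bounds: list) -> int:
    """Range: index of the first range whose exclusive upper bound exceeds value."""
    return bisect.bisect_right(upper_bounds, value)


def list_partition(value: str, value_lists: dict) -> int:
    """List: each node owns an explicit set of values."""
    for node, values in value_lists.items():
        if value in values:
            return node
    raise KeyError(f"no partition lists the value {value!r}")


def round_robin_partitioner(node_count: int):
    """Round-robin: hand out nodes in turn, ignoring the record content."""
    counter = itertools.cycle(range(node_count))
    return lambda: next(counter)
```

`zlib.crc32` is used instead of Python's built-in `hash` because string hashes are randomized per process, which would route the same key to different nodes on different runs.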
In another preferred embodiment, the source database includes at least one of: an Oracle database, a PostgreSQL database.
For a better understanding of this solution, a specific embodiment is described in detail below.
The present invention mainly comprises a first I/O (Input/Output) module, a second I/O module, and an in-memory computing module, where the in-memory computing module has four submodules: task management, encoding conversion, character escaping, and data partitioning. The core idea of the invention is to read data from the database into memory block by block; then, according to the supplied parameters, such as the encoding formats of the source and target databases, the escape character, and the way data is stored in the distributed cluster, to perform encoding conversion, character escaping, and data partitioning on the data in that order; and finally to write the partitioned data, block by block, into the corresponding node databases of the distributed cluster.
Reading data by block is 5 to 10 times faster than reading it record by record, and the data read from the source database is no longer written to disk, which saves a great deal of time for massive data. The conversion and partitioning of the data are also done in memory, which greatly improves the performance of data migration and partitioning.
The functions of the modules in Fig. 1 are as follows:
First (or second) I/O module: by encapsulating the databases' I/O interface functions, this module reads data by data block and writes partitioned data to the database by data block, and it provides a degree-of-parallelism parameter to enable concurrent reads. The module contains an interface that encapsulates the I/O interface functions of different databases, so as to support reading and writing data for different types of database. Oracle and PostgreSQL databases have been encapsulated so far.
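One way to picture such an encapsulating interface is an abstract base class with one adapter per database type, plus a read function that takes the degree of parallelism as a parameter. The Python sketch below is only an illustration: `BlockIO`, `InMemoryIO`, and `read_table` are hypothetical names, and the toy adapter keeps blocks in memory instead of talking to Oracle or PostgreSQL.

```python
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor
from typing import List


class BlockIO(ABC):
    """Hypothetical common interface that each database adapter would implement."""

    @abstractmethod
    def list_blocks(self, table: str) -> List[int]: ...

    @abstractmethod
    def read_block(self, table: str, block_id: int) -> List[tuple]: ...

    @abstractmethod
    def write_block(self, table: str, records: List[tuple]) -> None: ...


class InMemoryIO(BlockIO):
    """Toy adapter that stores each table as a list of blocks."""

    def __init__(self, tables=None):
        self.tables = tables or {}

    def list_blocks(self, table):
        return list(range(len(self.tables.get(table, []))))

    def read_block(self, table, block_id):
        return list(self.tables[table][block_id])

    def write_block(self, table, records):
        self.tables.setdefault(table, []).append(list(records))


def read_table(io: BlockIO, table: str, parallelism: int) -> List[tuple]:
    """Read every block of a table using `parallelism` worker threads."""
    block_ids = io.list_blocks(table)
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # Executor.map returns results in the order of block_ids.
        blocks = pool.map(lambda block_id: io.read_block(table, block_id), block_ids)
        return [record for block in blocks for record in block]
```

Because the migration logic only depends on `BlockIO`, supporting another database type would mean adding one more adapter class.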
The data reading steps are as follows:
1) establish the corresponding database connection according to the supplied source database connection parameters;
2) query the database dictionary, according to the supplied table name, to determine the blocks that hold the data to be read;
3) call the interface function to read the database blocks;
4) parse the block data that was read into records and put them into memory.
The steps for writing data to the target node database are as follows:
1) establish a connection to the node database that corresponds to the partitioned data;
2) query the data dictionary to confirm that the target table exists and that its structure is correct;
3) write the data to the data file by block and update the corresponding data dictionary information of the database, completing the import.
The SQL statement for obtaining the data blocks (Oracle database):
In-memory computing module: this module consists mainly of four submodules (task management, encoding conversion, character escaping, and data partitioning) and performs the conversion and partitioning of the data. The function of each submodule is as follows:
Task management module: responsible for starting and stopping the operations within the in-memory computing module and for allocating and reclaiming memory resources. Depending on whether the parameters passed in with the migration task include an encoding conversion, an escape character, and a data partitioning method, it starts the corresponding encoding conversion, character escape, and data partitioning modules. It also allocates memory according to the incoming data volume and monitors the running status of each module's tasks.
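The parameter-driven start-up described for the task management module can be sketched as follows. This Python fragment is an assumption about how such a decision might look; `MigrationTask`, its fields, and `plan_stages` are hypothetical, and memory allocation and monitoring are left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MigrationTask:
    """Hypothetical task parameters; a field left as None disables that stage."""
    charset: Optional[str] = None         # source database encoding
    ncharset: Optional[str] = None        # target database encoding
    escape: Optional[bytes] = None        # escape character
    partition_mode: Optional[str] = None  # e.g. "hash"


def plan_stages(task: MigrationTask) -> List[str]:
    """Return the stages to start, in the order the description gives."""
    stages = []
    if task.charset and task.ncharset:
        stages.append("encoding_conversion")
    if task.escape is not None:
        stages.append("character_escape")
    if task.partition_mode is not None:
        stages.append("data_partition")
    return stages
```

For example, a task that supplies both encodings and a partitioning mode but no escape character would start only the encoding conversion and data partitioning stages.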
Encoding conversion module: a common problem when migrating data between different databases is that differences in encoding between the databases produce garbled text (especially for Chinese). An encoding conversion function is therefore implemented: charset and ncharset specify the character encodings of the source and target databases respectively, and the corresponding conversion is carried out. The character encodings currently supported are GB2312, ISO-8859, UTF-8, and ASCII.
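In Python, a conversion of this kind amounts to decoding with the source encoding and re-encoding with the target encoding. The sketch below borrows the parameter names `charset` and `ncharset` from the description; the function name `convert_encoding` and the sample string are assumptions.

```python
def convert_encoding(raw: bytes, charset: str, ncharset: str) -> bytes:
    """Decode bytes from the source encoding and re-encode them for the target."""
    return raw.decode(charset).encode(ncharset)


# Chinese text stored as GB2312 in the source, converted to UTF-8 for the target.
source_bytes = "数据迁移".encode("gb2312")
target_bytes = convert_encoding(source_bytes, charset="gb2312", ncharset="utf-8")
```

`bytes.decode` raises `UnicodeDecodeError` when the input does not match `charset`, which surfaces a wrongly declared source encoding instead of silently storing garbled text.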
Character escape module: if special characters contained in a record are not handled, the data cannot be stored correctly. For example, when a record contains a newline or a backslash ("\"), the data must be escaped accordingly: a newline is escaped as "\n", a backslash as "\\", and so on. The exact escape result is determined by the escape character that the user specifies, for example:
Escape=0x5c
Null=0x5cN
"\"=0x5c0x5c
Data partitioning module: the supported partitioning modes are hash, range, list, and round-robin. Data is partitioned according to the partitioning mode defined on the target table. The partitioning steps are as follows:
1) query the metadata database of the distributed cluster to confirm the partitioning mode of the target table;
2) call the partition function that matches the target table's partitioning mode to partition the data records;
3) store the partitioned data in in-memory containers.
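The three steps above can be sketched in Python as a metadata lookup followed by routing into per-node containers. The sketch covers only the hash mode, and the `METADATA` dictionary, the table name `orders`, and the use of `zlib.crc32` are assumptions standing in for the cluster's real metadata database and partition function.

```python
import zlib
from collections import defaultdict

# Toy stand-in for the cluster's metadata database (all names are hypothetical).
METADATA = {"orders": {"mode": "hash", "key": "order_id", "nodes": 3}}


def lookup_partitioning(table):
    """Step 1: confirm the partitioning mode of the target table."""
    return METADATA[table]


def partition_records(table, records):
    """Steps 2 and 3: route each record and collect it in a per-node container."""
    meta = lookup_partitioning(table)
    if meta["mode"] != "hash":
        raise NotImplementedError("this sketch only covers hash partitioning")
    containers = defaultdict(list)
    for record in records:
        key = str(record[meta["key"]]).encode("utf-8")
        containers[zlib.crc32(key) % meta["nodes"]].append(record)
    return dict(containers)
```

Each container can then be handed to the second I/O module as one block for the node it belongs to.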
The effects of the present invention are as follows:
1) reading and writing data by data block greatly improves the performance of data import and export;
2) encapsulating the I/O interface functions provided by different databases makes data migration possible not only between databases of the same type but also between databases of different types;
3) data read from the source database is kept in memory, which avoids disk writes and saves the cost of writing massive data to files;
4) data read from the source database is converted and partitioned in memory, which improves the efficiency of data migration and partitioning;
5) the encoding conversion and character escape modules improve the success rate of data migration between databases that use different encodings.
The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

1. A data migration system, characterized by comprising:
a source database;
a first input/output module, for reading source data from the source database and sending it;
an in-memory computing module, for receiving the source data, partitioning the source data, and sending it on;
a second input/output module, for receiving the partitioned data and sending it to the corresponding cluster database;
at least two of the cluster databases, for receiving and storing the partitioned data.
2. The data migration system according to claim 1, characterized in that the in-memory computing module comprises:
a task management module, an encoding conversion module, a character escape module, and a data partitioning module.
3. The data migration system according to claim 2, characterized in that the task management module distributes the source data tasks and monitors the running status of the encoding conversion module, the character escape module, and the data partitioning module.
4. The data migration system according to claim 2, characterized in that the encoding conversion module performs encoding conversion on the special characters in the source data.
5. The data migration system according to claim 2, characterized in that the character escape module performs character escaping on the special characters in the source data.
6. The data migration system according to claim 5, characterized in that the special characters include at least one of:
newline, carriage return, form feed.
7. The data migration system according to claim 2, characterized in that the data partitioning module is connected to a metadata database and partitions the source data.
8. The data migration system according to claim 7, characterized in that the metadata database provides the list of destinations to which the source data is sent.
9. The data migration system according to claim 7, characterized in that the data partitioning modes include at least one of:
hash partitioning, range partitioning, list partitioning, round-robin partitioning.
10. The data migration system according to claim 1, characterized in that the source database includes at least one of:
an Oracle database, a PostgreSQL database.
CN201610227555.9A 2016-04-13 2016-04-13 Data migration system Pending CN105843955A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610227555.9A CN105843955A (en) 2016-04-13 2016-04-13 Data migration system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610227555.9A CN105843955A (en) 2016-04-13 2016-04-13 Data migration system

Publications (1)

Publication Number Publication Date
CN105843955A true CN105843955A (en) 2016-08-10

Family

ID=56597419

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610227555.9A Pending CN105843955A (en) 2016-04-13 2016-04-13 Data migration system

Country Status (1)

Country Link
CN (1) CN105843955A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106570314A (en) * 2016-10-19 2017-04-19 北京千医健康管理有限公司 ICCINO (Insurance, Check, Check, Inform, Nursing and Observe) door-to-door nurse service standard
CN109284335A (en) * 2018-09-10 2019-01-29 郑州云海信息技术有限公司 A kind of method and apparatus of integration across database batch conduct data
CN110300188A (en) * 2019-07-25 2019-10-01 中国工商银行股份有限公司 Data transmission system, method and apparatus
CN112817944A (en) * 2021-02-26 2021-05-18 北京北信源软件股份有限公司 Data migration method and device, electronic equipment and storage medium
WO2021203802A1 (en) * 2020-04-10 2021-10-14 苏州浪潮智能科技有限公司 Character string transmission method and device, computer and readable storage medium
CN114063914A (en) * 2021-11-05 2022-02-18 武汉理工大学 DRAM-HBM hybrid memory oriented data management method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080155218A1 (en) * 2006-12-20 2008-06-26 International Business Machines Corporation Optimized Data Migration with a Support Processor
CN102938001A (en) * 2012-12-10 2013-02-20 曙光信息产业(北京)有限公司 Data loading device and data loading method
CN102999537A (en) * 2011-09-19 2013-03-27 阿里巴巴集团控股有限公司 System and method for data migration
CN103793424A (en) * 2012-10-31 2014-05-14 阿里巴巴集团控股有限公司 Database data migration method and database data migration system
CN104572862A (en) * 2014-12-19 2015-04-29 阳珍秀 Mass data storage access method and system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080155218A1 (en) * 2006-12-20 2008-06-26 International Business Machines Corporation Optimized Data Migration with a Support Processor
CN102999537A (en) * 2011-09-19 2013-03-27 阿里巴巴集团控股有限公司 System and method for data migration
CN103793424A (en) * 2012-10-31 2014-05-14 阿里巴巴集团控股有限公司 Database data migration method and database data migration system
CN102938001A (en) * 2012-12-10 2013-02-20 曙光信息产业(北京)有限公司 Data loading device and data loading method
CN104572862A (en) * 2014-12-19 2015-04-29 阳珍秀 Mass data storage access method and system

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106570314A (en) * 2016-10-19 2017-04-19 北京千医健康管理有限公司 ICCINO (Insurance, Check, Check, Inform, Nursing and Observe) door-to-door nurse service standard
CN109284335A (en) * 2018-09-10 2019-01-29 郑州云海信息技术有限公司 A kind of method and apparatus of integration across database batch conduct data
CN110300188A (en) * 2019-07-25 2019-10-01 中国工商银行股份有限公司 Data transmission system, method and apparatus
WO2021203802A1 (en) * 2020-04-10 2021-10-14 苏州浪潮智能科技有限公司 Character string transmission method and device, computer and readable storage medium
CN112817944A (en) * 2021-02-26 2021-05-18 北京北信源软件股份有限公司 Data migration method and device, electronic equipment and storage medium
CN112817944B (en) * 2021-02-26 2024-06-07 北京北信源软件股份有限公司 Data migration method and device, electronic equipment and storage medium
CN114063914A (en) * 2021-11-05 2022-02-18 武汉理工大学 DRAM-HBM hybrid memory oriented data management method
CN114063914B (en) * 2021-11-05 2024-04-09 武汉理工大学 Data management method for DRAM-HBM hybrid memory


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160810