CN105279280A - Method and tool for quickly migrating oracle data to MPP database - Google Patents

Method and tool for quickly migrating oracle data to MPP database

Info

Publication number
CN105279280A
Authority
CN
China
Prior art keywords
data
metadata
oracle
subtask
database
Legal status
Pending
Application number
CN201510786466.3A
Other languages
Chinese (zh)
Inventor
赵伟
武新
田志敏
杨伟伟
Current Assignee
TIANJIN NANKAI UNIVERSITY GENERAL DATA TECHNOLOGIES Co Ltd
Original Assignee
TIANJIN NANKAI UNIVERSITY GENERAL DATA TECHNOLOGIES Co Ltd
Priority date
2015-11-16
Filing date
2015-11-16
Publication date
2016-01-27
Application filed by TIANJIN NANKAI UNIVERSITY GENERAL DATA TECHNOLOGIES Co Ltd
Priority to CN201510786466.3A
Publication of CN105279280A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/214 Database migration support

Abstract

The invention provides a method and a tool that quickly extract data from an Oracle database, convert it into a format that an MPP database can recognize, and quickly load the converted data into the MPP database, thereby supporting data exchange between an enterprise's online transaction system and its big data platform.

Description

Method and tool for quickly migrating Oracle data to an MPP database
Technical field
The present invention relates to technology for migrating user data in the OLTP and OLAP fields to MPP database clusters, and in particular to a method for quickly migrating data from an Oracle database to an MPP database cluster.
Background
With the rapid development of information technology, both the number of users and the volume of data in an enterprise are growing explosively. As business volume rises, the access volume and data volume of the database grow, and the processing power and computing capacity demanded of the database rise quickly with them. This sharp expansion of data means that a single database can no longer support both an enterprise's business transaction system and its statistical analysis system. The mainstream solution is for the enterprise to build a separate big data platform alongside its existing online transaction system, and the market has split into two major camps of big data platforms: those based on MPP and those based on Hadoop. Against this background, demand has arisen for fast and accurate migration of data from the enterprise's online transaction database into MPP or Hadoop systems. In response to this market demand, this patent describes a method for quickly migrating Oracle data into an MPP database cluster.
Summary of the invention
The technical problem solved by the invention is, building on existing technology, to provide a method and a tool that quickly extract data from an Oracle database, convert it into data that an MPP database can recognize, and quickly load it into the MPP database, thereby supporting data exchange between an enterprise's online transaction system and its big data platform.
The technical solution adopted by the invention is a method for quickly migrating Oracle data to an MPP database, comprising the following steps:
(1) obtain the relevant metadata information from the Oracle database, and split the task into multiple parallel subtasks according to the metadata information and a strategy pattern;
(2) execute the subtasks split out in step (1) concurrently, performing data extraction and conversion;
(3) load the data produced by the extraction and conversion of step (2) into the MPP database.
Further, in step (1), the metadata information includes the table name, partition information, and the maximum and minimum ROWID values of each block of the table.
Further, the subtasks in step (1) are split as follows:
(11) from the obtained metadata information, compute summary information covering all of a table's data;
(12) cut the data of the whole table into segments according to this metadata summary, producing multiple subtasks as output;
(13) each subtask processes one segment, and the union of all subtasks is the complete data of the table.
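Steps (11) to (13) can be sketched as follows, assuming the metadata arrives as a list of extents (contiguous runs of blocks) and that the summary of step (11) is simply the total block count. The Extent type and split_into_subtasks are hypothetical names; this version never splits an extent, so segment sizes are only approximately equal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """Hypothetical metadata row for one contiguous run of table blocks."""
    data_object_id: int
    relative_fno: int
    block_id: int  # first block of the run
    blocks: int    # number of blocks in the run


def split_into_subtasks(extents: list, n_subtasks: int) -> list:
    """Cut a table's extents into at most n_subtasks contiguous segments."""
    if n_subtasks < 1:
        raise ValueError("n_subtasks must be at least 1")
    # (11) table-wide summary: total number of blocks to read.
    total_blocks = sum(e.blocks for e in extents)
    target = -(-total_blocks // n_subtasks)  # ceiling division
    # (12) walk extents in physical order and cut whenever a segment is full.
    segments, current, filled = [], [], 0
    ordered = sorted(extents, key=lambda e: (e.data_object_id, e.relative_fno, e.block_id))
    for ext in ordered:
        current.append(ext)
        filled += ext.blocks
        if filled >= target:
            segments.append(current)
            current, filled = [], 0
    if current:
        segments.append(current)
    # (13) each extent lands in exactly one segment, so together they cover the table.
    return segments
```

Cutting along physical order keeps each subtask's reads contiguous on disk; a greedy "lightest bucket first" assignment would balance sizes better at the cost of scattering each subtask's blocks.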
The migration tool adopted by the invention is a tool for quickly migrating Oracle data to an MPP database, comprising a metadata acquisition and computation module, a parallel extraction and conversion module, and a data loading module. The metadata acquisition and computation module obtains the relevant metadata information from the Oracle database and splits the task into multiple parallel subtasks according to the metadata information and a strategy pattern; the parallel extraction and conversion module executes the split subtasks concurrently, performing data extraction and conversion; and the data loading module loads the data produced by the parallel extraction and conversion module into the MPP database.
Further, the metadata acquisition and computation module comprises a metadata information acquisition unit for obtaining the table name, partition information, and the maximum and minimum ROWID values of each block of the table; a computation unit for computing, from the obtained metadata information, summary information covering all of a table's data; and a splitting unit for cutting the data of the whole table into segments according to the metadata summary, producing multiple subtasks as output.
Further, the metadata acquisition and computation module, the parallel extraction and conversion module, and the data loading module are chained to one another in a producer-consumer pattern, ensuring high-performance pipelined operation.
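A minimal sketch of this producer-consumer chaining, assuming Python threads with bounded queue.Queue objects between stages: one producer emits subtasks, several workers extract and convert, and one loader consumes the results. The function run_pipeline and the callables extract_convert and load are hypothetical stand-ins for the three modules; error handling is omitted, so a worker that raises would leave the loader waiting.

```python
import queue
import threading

_DONE = object()  # sentinel that marks the end of a stream


def run_pipeline(subtasks, extract_convert, load, n_workers=4):
    """Metadata producer -> n extract/convert workers -> single loader."""
    task_q = queue.Queue(maxsize=n_workers * 2)  # bounded queues apply back-pressure
    data_q = queue.Queue(maxsize=n_workers * 2)
    loaded = []

    def producer():
        for task in subtasks:
            task_q.put(task)
        for _ in range(n_workers):
            task_q.put(_DONE)  # one sentinel per worker

    def worker():
        while True:
            task = task_q.get()
            if task is _DONE:
                data_q.put(_DONE)  # tell the loader this worker has finished
                return
            data_q.put(extract_convert(task))

    def loader():
        finished = 0
        while finished < n_workers:
            item = data_q.get()
            if item is _DONE:
                finished += 1
            else:
                loaded.append(load(item))

    threads = [threading.Thread(target=producer), threading.Thread(target=loader)]
    threads += [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return loaded
```

Because all three stages run at once, extraction of later subtasks overlaps with loading of earlier ones, which is the point of the pipelined design.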
Further, the data loading module is provided with a strategy pattern unit for setting the data loading strategy; the strategy pattern gives the tool high extensibility, so that it can adapt to the various data loading modes of MPP databases or to diverse data output modes.
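The strategy pattern mentioned here can be sketched as a loader that delegates to an interchangeable strategy object chosen by configuration. The class names, the mode strings "stream" and "file", and the returned strings are hypothetical placeholders; a real strategy would call the target MPP database's own load interface.

```python
from abc import ABC, abstractmethod


class LoadStrategy(ABC):
    """One interchangeable way of getting converted rows into the target database."""

    @abstractmethod
    def load(self, table: str, rows: list) -> str:
        ...


class StreamLoad(LoadStrategy):
    def load(self, table, rows):
        # Placeholder for pushing rows through a streaming load API.
        return f"stream:{table}:{len(rows)}"


class FileLoad(LoadStrategy):
    def load(self, table, rows):
        # Placeholder for writing a text file and running a bulk loader on it.
        return f"file:{table}:{len(rows)}"


class DataLoader:
    """The loading module: it depends only on the LoadStrategy interface."""

    def __init__(self, strategy: LoadStrategy):
        self.strategy = strategy

    def load(self, table: str, rows: list) -> str:
        return self.strategy.load(table, rows)


# Registry of available strategies; supporting a new mode means adding one entry.
STRATEGIES = {"stream": StreamLoad, "file": FileLoad}


def make_loader(mode: str) -> DataLoader:
    return DataLoader(STRATEGIES[mode]())
```

For example, make_loader("stream").load("test.test", rows) and make_loader("file").load("test.test", rows) share the same calling code and differ only in the configured mode.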
The advantages and beneficial effects of this patent are:
1. Performance: compared with Oracle's traditional export methods, single-table export on a single machine reaches more than 400 GB/h.
2. Flexible deployment: the tool can either rely on an Oracle client or run without one.
3. High scalability: the tool can be deployed in a distributed fashion, and performance scales linearly.
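For scale, the 400 GB/h figure in item 1 works out to roughly the following sustained rate, assuming decimal units (1 GB = 10^9 bytes, 1 MB = 10^6 bytes):

```latex
\frac{400\ \text{GB}}{1\ \text{h}}
  = \frac{400 \times 10^{9}\ \text{B}}{3600\ \text{s}}
  \approx 1.11 \times 10^{8}\ \text{B/s}
  \approx 111\ \text{MB/s}
```

With binary units (400 GiB/h) the same arithmetic gives about 113.8 MiB/s.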
Brief description of the drawings
Fig. 1: overall deployment architecture diagram;
Fig. 2: architecture diagram of the distributed tool for quickly migrating Oracle database data to an MPP database;
Fig. 3: schematic diagram of the fast data export principle.
Embodiment
The overall deployment architecture of the invention is shown in Fig. 1. The tool, which quickly extracts data from an Oracle database, converts it, and loads it into an MPP database cluster, can be deployed in either distributed mode or single-machine mode.
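In distributed mode, step 6 of the embodiment has ZooKeeper manage the tool's own metadata so that servers divide the block ranges between them. The sketch below imitates that coordination with an in-memory registry guarded by a lock, so it runs without a ZooKeeper server. ClaimRegistry and run_servers are hypothetical names, and no ZooKeeper client API is shown or implied; in a real deployment each claim would be an atomic operation on shared state, for example creating a node only if it does not already exist.

```python
import threading


class ClaimRegistry:
    """In-memory stand-in for shared coordination state (ZooKeeper in the text).

    Each work item can be claimed by exactly one server; once every item is
    claimed, next_unclaimed() returns None.
    """

    def __init__(self, items):
        self._owner = {item: None for item in items}
        self._lock = threading.Lock()

    def next_unclaimed(self, server):
        with self._lock:  # atomic check-and-set, like create-if-absent
            for item, owner in self._owner.items():
                if owner is None:
                    self._owner[item] = server
                    return item
            return None

    def owners(self):
        with self._lock:
            return dict(self._owner)


def run_servers(registry, servers, export_item):
    """Each simulated server keeps claiming and exporting until nothing is left."""
    def serve(server):
        while (item := registry.next_unclaimed(server)) is not None:
            export_item(server, item)

    threads = [threading.Thread(target=serve, args=(s,)) for s in servers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Pulling work from shared state, rather than assigning ranges up front, lets faster servers take more ranges, which is one way the claimed near-linear scaling could be approached.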
Overall, the migration tool of the invention is divided into three major modules: the metadata acquisition and computation module, the parallel extraction and conversion module, and the data loading module (the software architecture is shown in Fig. 2).
The principle by which the metadata acquisition and computation module enables fast data extraction is shown in Fig. 3.
The migration flow of the tool is:
1. The migration tool starts and reads its configuration; the task is to export a table's data from Oracle and load it into the MPP cluster;
2. The migration tool connects to the Oracle database; the metadata acquisition and computation module obtains the relevant metadata information and splits the task into multiple parallel subtasks according to the metadata information and the strategy pattern;
3. The migration tool performs the data extraction work: the subtasks split out in step 2 are assigned to the parallel extraction and conversion module, which performs data extraction and conversion concurrently;
4. The migration tool submits the data produced by the parallel extraction and conversion module to the data loading module, which completes the load into the MPP database.
In step 2, the metadata information includes the table name, partition information, and the maximum and minimum ROWID values of each block of the table. From this metadata, summary information covering all of the table's data can be computed; the data of the whole table is then cut into segments according to this summary, producing multiple subtasks as output. Each subtask processes one segment, and the union of all subtasks is the complete data of the table.
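The text names dba_extents and the ROWID_CREATE function but not the query itself. One common way to turn extent metadata into start/end ROWID pairs is sketched below as a Python helper that builds the SQL text. Assumptions: it joins dba_extents to dba_objects (instead of dba_tab_partitions) to obtain each segment's data_object_id, it produces one range per extent rather than per block, it uses 32767 as the last row slot, and it does not handle composite (subpartitioned) tables. The function name and bind variable names are hypothetical.

```python
def extent_rowid_ranges_sql(single_partition: bool = False) -> str:
    """Build a query that yields one (start_rowid, end_rowid) pair per extent.

    single_partition=False -> every extent of the table (non-partitioned, or all partitions)
    single_partition=True  -> only the extents of the partition bound to :partition_name
    """
    sql = (
        "SELECT DBMS_ROWID.ROWID_CREATE(1, o.data_object_id, e.relative_fno,\n"
        "                               e.block_id, 0) AS start_rowid,\n"
        "       DBMS_ROWID.ROWID_CREATE(1, o.data_object_id, e.relative_fno,\n"
        "                               e.block_id + e.blocks - 1, 32767) AS end_rowid\n"
        "  FROM dba_extents e\n"
        "  JOIN dba_objects o\n"
        "    ON o.owner = e.owner\n"
        "   AND o.object_name = e.segment_name\n"
        # Match each partition's extents to that partition's object row.
        "   AND NVL(o.subobject_name, '-') = NVL(e.partition_name, '-')\n"
        "   AND o.object_type LIKE 'TABLE%'\n"
        " WHERE e.owner = :owner\n"
        "   AND e.segment_name = :table_name\n"
        "   AND e.segment_type LIKE 'TABLE%'\n"
    )
    if single_partition:
        # Export only the current partition: restrict to its extents.
        sql += "   AND e.partition_name = :partition_name\n"
    return sql + " ORDER BY o.data_object_id, e.relative_fno, e.block_id"
```

Each returned pair would then drive one subtask query of the form SELECT ... FROM owner.table WHERE ROWID BETWEEN :start_rowid AND :end_rowid.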
In step 2, the metadata acquisition and computation module, the parallel extraction and conversion module, and the data loading module are chained in a producer-consumer pattern, ensuring high-performance pipelined operation. The data loading module uses a strategy pattern, which provides high extensibility, to adapt to the various data loading modes of MPP databases or to diverse data output modes.
An example scenario follows: on the server with IP address 192.168.103.65, in the Oracle database with SID orcl, the user test owns a table named test that contains 331,941,213 rows (about 330 million).
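The scenario gives an IP address and a SID, and step 1 of the walkthrough chooses between the OCI and thin protocols. Those names match Oracle's two JDBC driver types, so one possible way to assemble the connection string is sketched here. The JDBC URL form is an assumption (the text does not say which client library the tool uses), and port 1521 is Oracle's conventional default listener port, not a value stated in the scenario.

```python
DEFAULT_ORACLE_PORT = 1521  # conventional listener port; assumed, not from the text


def oracle_jdbc_url(host: str, sid: str, protocol: str = "thin",
                    port: int = DEFAULT_ORACLE_PORT) -> str:
    """Build a JDBC URL for Oracle's thin or OCI driver using a connect descriptor."""
    if protocol not in ("thin", "oci"):
        raise ValueError("protocol must be 'thin' or 'oci'")
    descriptor = (
        f"(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST={host})(PORT={port}))"
        f"(CONNECT_DATA=(SID={sid})))"
    )
    return f"jdbc:oracle:{protocol}:@{descriptor}"


# Values from the example scenario; the port is assumed.
url = oracle_jdbc_url("192.168.103.65", "orcl", protocol="thin")
```

The thin driver is pure Java and needs no local Oracle client, while the OCI driver relies on installed Oracle client libraries; that split lines up with advantage 2 (the tool can run with or without an Oracle client).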
1. On receiving an export request, the metadata acquisition and computation module creates a database connection using the specified IP address and port number, the Oracle SID, and the chosen connection protocol (OCI or thin).
2. For a non-partitioned table, given the specified table name and user name, the module applies the ROWID_CREATE function to data from Oracle system views such as dba_extents to compute an array of start and end ROWID metadata for all blocks of the table.
3. For a partitioned table exported in full, the ROWID type, block, and block size information found, by the specified table name, in system views such as dba_tab_partitions and dba_extents is passed to the ROWID_CREATE function to compute an array of start and end ROWID metadata for all blocks of all partitions of the table.
4. For a partitioned table of which only the current partition is exported, the ROWID type, block, and block size information found, by the specified table name, in system views such as dba_extents is passed to the ROWID_CREATE function to compute an array of start and end ROWID metadata for all blocks of that partition.
5. Once the table's metadata array has been produced, it is handed to the parallel extraction and conversion module. In single-machine mode, this module starts multiple threads; each thread extracts the data of one block at a time, reading it in binary form, and passes it to the conversion step, which converts the data into a load format the MPP database recognizes.
6. If the parallel extraction and conversion module is deployed in distributed mode, every server runs multiple threads simultaneously, each thread on each server exports the data of one block, and ZooKeeper manages the tool's own metadata across all servers.
7. Once the data has been persisted in the local machine's in-memory cache queue, the data loading module calls the MPP database's fast loading interface to load it. If the MPP database's loading tool does not accept a binary stream directly, the data can instead be written to disk as formatted text and then loaded into the MPP database.
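Steps 5 and 7 can be sketched together: several threads each read one ROWID range, and the combined rows are either handed to a loader that accepts them directly or first rendered as delimited text for a file-based loader. This is an assumption-laden stand-in, not the patented implementation: it reads through an ordinary ROWID-range SELECT rather than the binary block reads of step 5, fetch is a hypothetical callable standing in for a database cursor (real drivers generally need one connection per thread), and rows are collected in memory for brevity instead of flowing through a bounded cache queue. The "|" delimiter is also an assumption.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-subtask query; :lo and :hi are the range's start/end ROWIDs.
RANGE_SQL = "SELECT * FROM {table} WHERE ROWID BETWEEN :lo AND :hi"


def to_delimited_text(rows, delimiter="|"):
    """Render rows as delimited text; csv quotes fields containing the delimiter."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)  # None is written as an empty field
    return buf.getvalue()


def extract_range(fetch, table, rowid_range):
    """Read one ROWID range; fetch(sql, binds) stands in for a DB cursor."""
    lo, hi = rowid_range
    return fetch(RANGE_SQL.format(table=table), {"lo": lo, "hi": hi})


def export_table(fetch, table, rowid_ranges, stream_load=None, file_load=None,
                 n_threads=8):
    """Extract ranges in parallel, then hand the result to one of two loaders."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        chunks = pool.map(lambda r: extract_range(fetch, table, r), rowid_ranges)
        rows = [row for chunk in chunks for row in chunk]  # keeps range order
    if stream_load is not None:  # target accepts rows directly
        return stream_load(rows)
    return file_load(to_delimited_text(rows))  # otherwise land as text first
```

Passing both loaders as parameters keeps the fallback decision of step 7 in one place; in a fuller version these would be two strategies of the loading module.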

Claims (7)

1. A method for quickly migrating Oracle data to an MPP database, characterized in that it comprises the following steps:
(1) obtain the relevant metadata information from the Oracle database, and split the task into multiple parallel subtasks according to the metadata information and a strategy pattern;
(2) execute the subtasks split out in step (1) concurrently, performing data extraction and conversion;
(3) load the data produced by the extraction and conversion of step (2) into the MPP database.
2. The method for quickly migrating Oracle data to an MPP database according to claim 1, characterized in that in step (1), the metadata information includes the table name, partition information, and the maximum and minimum ROWID values of each block of the table.
3. The method for quickly migrating Oracle data to an MPP database according to claim 1 or 2, characterized in that the subtasks in step (1) are split as follows:
(11) from the obtained metadata information, compute summary information covering all of a table's data;
(12) cut the data of the whole table into segments according to this metadata summary, producing multiple subtasks as output;
(13) each subtask processes one segment, and the union of all subtasks is the complete data of the table.
4. A tool for quickly migrating Oracle data to an MPP database, characterized in that it comprises a metadata acquisition and computation module, a parallel extraction and conversion module, and a data loading module; the metadata acquisition and computation module obtains the relevant metadata information from the Oracle database and splits the task into multiple parallel subtasks according to the metadata information and a strategy pattern; the parallel extraction and conversion module executes the split subtasks concurrently, performing data extraction and conversion; and the data loading module loads the data produced by the parallel extraction and conversion module into the MPP database.
5. The tool for quickly migrating Oracle data to an MPP database according to claim 4, characterized in that the metadata acquisition and computation module comprises a metadata information acquisition unit for obtaining the table name, partition information, and the maximum and minimum ROWID values of each block of the table; a computation unit for computing, from the obtained metadata information, summary information covering all of a table's data; and a splitting unit for cutting the data of the whole table into segments according to the metadata summary, producing multiple subtasks as output.
6. The tool for quickly migrating Oracle data to an MPP database according to claim 4 or 5, characterized in that the metadata acquisition and computation module, the parallel extraction and conversion module, and the data loading module are chained to one another in a producer-consumer pattern, ensuring high-performance pipelined operation.
7. The tool for quickly migrating Oracle data to an MPP database according to claim 4 or 5, characterized in that the data loading module is provided with a strategy pattern unit for setting the data loading strategy, the strategy pattern providing high extensibility to adapt to the various data loading modes of MPP databases or to diverse data output modes.
CN201510786466.3A 2015-11-16 2015-11-16 Method and tool for quickly migrating oracle data to MPP database Pending CN105279280A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510786466.3A CN105279280A (en) 2015-11-16 2015-11-16 Method and tool for quickly migrating oracle data to MPP database


Publications (1)

Publication Number Publication Date
CN105279280A (en) 2016-01-27

Family

ID=55148294


Country Status (1)

Country Link
CN (1) CN105279280A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080288498A1 (en) * 2007-05-14 2008-11-20 Hinshaw Foster D Network-attached storage devices
CN102999537A (en) * 2011-09-19 2013-03-27 阿里巴巴集团控股有限公司 System and method for data migration
US20140156666A1 (en) * 2012-11-30 2014-06-05 Futurewei Technologies, Inc. Method for Automated Scaling of a Massive Parallel Processing (MPP) Database
CN105009110A (en) * 2012-11-30 2015-10-28 华为技术有限公司 Method for automated scaling of massive parallel processing (mpp) database
CN103902593A (en) * 2012-12-27 2014-07-02 中国移动通信集团河南有限公司 Data transfer method and device
CN104123392A (en) * 2014-08-11 2014-10-29 吉林禹硕动漫游戏科技股份有限公司 Tool and method for transferring relational database to HBase
CN104899333A (en) * 2015-06-24 2015-09-09 浪潮(北京)电子信息产业有限公司 Cross-platform migrating method and system for Oracle database

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105760212A (en) * 2016-02-02 2016-07-13 贵州大学 Data redistribution method and device based on vessels
CN105760212B (en) * 2016-02-02 2019-04-12 贵州大学 Fast resampling method and device based on container
CN107291764A (en) * 2016-04-05 2017-10-24 中兴通讯股份有限公司 Big data exchange method, device, and system
CN106572172A (en) * 2016-11-07 2017-04-19 湖北省农村信用社联合社网络信息中心 Multi-process data migration method based on Hash algorithm
CN108446145A (en) * 2018-03-21 2018-08-24 苏州提点信息科技有限公司 Method for automatically loading distributed files into an MPP database
CN109213751A (en) * 2018-08-06 2019-01-15 北京所问数据科技有限公司 Oracle database parallel migration technology based on Spark platform
CN109213751B (en) * 2018-08-06 2021-11-23 北京所问数据科技有限公司 Spark platform based Oracle database parallel migration method
CN111581179A (en) * 2019-02-19 2020-08-25 上海云桓信息科技有限公司 Data migration method and tool from Oracle to MySQL
CN113656474A (en) * 2021-08-05 2021-11-16 京东科技控股股份有限公司 Service data access method and device, electronic equipment and storage medium
CN116756150A (en) * 2023-08-16 2023-09-15 浩鲸云计算科技股份有限公司 Mpp database large table association acceleration method
CN116756150B (en) * 2023-08-16 2023-10-31 浩鲸云计算科技股份有限公司 Mpp database large table association acceleration method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160127