CN104572895A - MPP (Massively Parallel Processing) database and Hadoop cluster data interchange method, tool and implementation method - Google Patents

MPP (Massively Parallel Processing) database and Hadoop cluster data interchange method, tool and implementation method

Info

Publication number
CN104572895A
CN104572895A (application CN201410820059.5A)
Authority
CN
China
Prior art keywords
data
mpp
hadoop
importing
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410820059.5A
Other languages
Chinese (zh)
Other versions
CN104572895B (en)
Inventor
陈雨
夏旭东
崔维力
武新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
TIANJIN NANKAI UNIVERSITY GENERAL DATA TECHNOLOGIES Co Ltd
Original Assignee
TIANJIN NANKAI UNIVERSITY GENERAL DATA TECHNOLOGIES Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by TIANJIN NANKAI UNIVERSITY GENERAL DATA TECHNOLOGIES Co Ltd
Priority to CN201410820059.5A
Publication of CN104572895A
Application granted
Publication of CN104572895B
Legal status: Active
Anticipated expiration


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a data interchange method, a tool, and an implementation method for an MPP (Massively Parallel Processing) database and a Hadoop cluster, comprising a method for exchanging data directly between the MPP database and the Hadoop cluster with a data interchange tool, and a method for exchanging the data via TXT relay. In the direct mode, data is exported from (or imported into) the MPP database to (from) the Hadoop cluster without being staged on any storage unit outside the MPP database and the Hadoop cluster, which makes the export process more efficient. If the data needs further processing by the Hadoop cluster before loading, the TXT relay mode is chosen instead. The invention solves the problem that data cannot be exchanged between the MPP database and Hadoop workloads, and enables the combined use of the two platforms, the MPP database and the Hadoop cluster.

Description

MPP database and Hadoop cluster data interchange method, tool and implementation method
Technical field
The present invention belongs to the field of distributed databases, and in particular relates to a data interchange method, a tool, and an implementation method for an MPP database and a Hadoop cluster.
Background technology
Before the internet appeared, data was produced mainly through human-machine interaction and was predominantly structured. For such transactional data, end users mostly care about inserting, deleting, updating, and querying records; the corresponding processing is referred to as OLTP (Online Transaction Processing). Traditional relational databases (RDBMS) were designed and developed mainly for this demand and occupied a dominant position over the past thirty years. During that period data grew slowly, systems were relatively isolated, and traditional databases could essentially satisfy all kinds of application requirements.
With the appearance and rapid development of the internet, and especially of the mobile internet in recent years, data sources have changed qualitatively. Most data is now produced automatically by devices, servers, and applications of all kinds; it is largely unstructured or semi-structured, and it grows geometrically. For this category of data (so-called big data), end users rarely perform insert, delete, or update operations; what they care about is getting data out of the database as fast as possible, organizing it, analyzing it interactively, mining it in depth, and producing reports and predictions. The corresponding processing is referred to as OLAP (Online Analytical Processing).
Traditional databases are, both technically and functionally, almost helpless in the face of this kind of big data analysis. As data sources and processing requirements changed, it became clear that a single platform cannot satisfy all application demands, and users began to choose the most suitable products and technologies according to application requirements, data characteristics, and data volume. The technology landscape of data processing has likewise moved from one dominated by traditional databases (OldSQL) toward segmented development; at present OldSQL, NewSQL, and NoSQL coexist and together support the different classes of applications.
NewSQL databases mainly refer to advanced database clusters with an MPP (Massively Parallel Processing) architecture, aimed at industry big data. They adopt a Shared-Nothing architecture and combine big data processing techniques such as columnar storage and coarse-grained indexing with the efficient distributed computing model of the MPP architecture to support analytical applications. The runtime environment is mostly low-cost PC servers; with their high performance and high scalability they are widely used in enterprise analytical applications.
NoSQL mainly refers to technologies that extend and encapsulate Hadoop, and to the related big data technologies derived around Hadoop, used to store and process the semi-structured and unstructured data that traditional relational databases handle poorly. The most typical scenario today is extending and encapsulating Hadoop to support the storage and analysis of internet-scale big data. Hadoop is better suited to processing semi-structured and unstructured data, complex ETL (Extract-Transform-Load) pipelines, and complex data mining and computation models.
In summary, to address the problem that data cannot be exchanged between an MPP database and Hadoop workloads, the invention provides a method that supports data interchange between an MPP database and Hadoop. In the direct interchange mode the data transfer is highly efficient, which is one of the prerequisites for deploying the MPP database and the Hadoop cluster as a combined platform.
Summary of the invention
The problem to be solved by the present invention is that data cannot be exchanged between an MPP database and Hadoop workloads; to this end a data interchange method and a data interchange tool for an MPP database and a Hadoop cluster are proposed. To solve the above technical problem, the technical solution adopted by the present invention is a method for data interchange between an MPP database and a Hadoop cluster, comprising:
(1) using a data interchange tool to export data directly from the MPP database to the Hadoop cluster, or exporting the data from the MPP database to the Hadoop cluster via TXT relay;
(2) using the data interchange tool to import data directly from the Hadoop cluster into the MPP database, or importing the data from the Hadoop cluster into the MPP database via TXT relay.
Further, the steps of using the data interchange tool to export data directly from the MPP database to the Hadoop cluster are:
(1) the data interchange tool starts;
(2) status check: the data interchange tool sends a status-check SQL command to the MPP database cluster; after receiving it, the MPP database cluster connects to the Hadoop cluster and checks that the specified Hadoop directory is writable, and also checks the state of each of its own nodes and data shards;
(3) export metadata: the data interchange tool sends an export-metadata SQL command to the MPP database; after receiving it, the MPP database cluster exports the metadata to the specified directory of the Hadoop file system;
(4) obtain the tables to be exported from the database;
(5) export data table by table: the data interchange tool sends export SQL commands to the MPP database cluster concurrently, one table at a time; after receiving a table-export SQL command, the MPP database cluster performs the export and writes the table data directly to the specified directory on the data nodes of the Hadoop cluster;
(6) on success, the export exits normally;
(7) on failure, the execution is aborted.
Further, the steps of exporting the data from the MPP database to the Hadoop cluster via TXT relay are:
(1) the data interchange tool starts;
(2) status check: the data interchange tool sends a status-check SQL command to the MPP database cluster; after receiving it, the MPP database cluster checks the state of each of its own nodes and data shards;
(3) export metadata: the data interchange tool sends an export-metadata SQL command to the MPP database; after receiving it, the MPP database cluster exports the metadata to the specified directory of the external storage, in TXT format;
(4) obtain the tables to be exported from the database;
(5) export data table by table: the data interchange tool sends export SQL commands to the MPP database cluster concurrently, one table at a time; after receiving a table-export SQL command, the MPP database cluster performs the export and writes the table data directly to the specified directory of the external storage;
(6) Hadoop imports the data: a Hadoop client installed on the physical machine hosting the external storage executes the Hadoop -put command to load the TXT data files into the specified Hadoop directory;
(7) on success, the Hadoop import exits normally;
(8) on failure, the execution is aborted.
Further, the steps of using the data interchange tool to import data directly from the Hadoop cluster into the MPP database are:
(1) the data interchange tool starts;
(2) status check: the data interchange tool sends a status-check SQL command to the MPP database cluster; after receiving it, the MPP database cluster connects to the Hadoop cluster and checks that the specified Hadoop directory is readable, and at the same time checks the state of each of its own nodes;
(3) import metadata: the import tool sends an import-metadata SQL command to the MPP database; after receiving it, the MPP database cluster imports the metadata from the specified directory of the Hadoop file system;
(4) obtain the tables to be imported in the database;
(5) import data table by table: the data interchange tool sends import SQL commands to the MPP database cluster concurrently, one table at a time; after receiving a table-import SQL command, the MPP database cluster performs the import, accessing the data nodes of the Hadoop cluster directly and loading the data into the MPP database;
(6) on success, the import exits normally;
(7) on failure, the execution is aborted.
Further, the steps of importing the data from the Hadoop cluster into the MPP database via TXT relay are:
(1) Hadoop exports the data: a Hadoop client installed on the physical machine hosting the external storage executes the Hadoop -get command to copy the TXT data files from the specified Hadoop directory to the specified directory of the external storage;
(2) the data interchange tool starts;
(3) status check: the data interchange tool sends a status-check SQL command to the MPP database cluster; after receiving it, the MPP database cluster checks the state of each of its own nodes;
(4) import metadata: the data interchange tool sends an import-metadata SQL command to the MPP database; after receiving it, the MPP database cluster imports the metadata from the specified directory of the external storage;
(5) obtain all tables in the database;
(6) import data table by table: the data interchange tool sends import SQL commands to the MPP database cluster concurrently, one table at a time; after receiving a table-import SQL command, the MPP database cluster performs the import, loading the data from the specified directory of the external storage;
(7) on success, the import exits normally;
(8) on failure, the execution is aborted.
Further, filtered export is supported when exporting data from the MPP database; the filter is expressed by supplying an SQL statement with a WHERE condition.
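As an illustration of the filtered-export option, the minimal sketch below builds an export statement with an optional WHERE condition. The "EXPORT TABLE ... TO ... [WHERE ...]" syntax, the HDFS path, and the table name are assumptions for illustration only; the patent does not specify a concrete SQL dialect.

```python
# Minimal sketch of the filtered ("screening") export option. The EXPORT
# statement syntax is a placeholder; only the idea of appending a WHERE
# condition to restrict the exported rows comes from the patent text.
def build_filtered_export_sql(table, hdfs_dir, where=None):
    """An absent WHERE condition means the whole table is exported."""
    sql = f"EXPORT TABLE {table} TO '{hdfs_dir}/{table}'"
    if where:
        sql += f" WHERE {where}"   # filtered export: only matching rows
    return sql

# Example (hypothetical table and path):
# cursor.execute(build_filtered_export_sql(
#     "orders", "hdfs://namenode:8020/interchange",
#     where="order_date >= '2014-01-01'"))
```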
A data interchange tool for an MPP database and a Hadoop cluster comprises a main control module, a parameter parsing module, connectors, an export/import scheduler, worker threads, a logging module, an SQL construction module, and a worker thread pool. The main control module is connected to the parameter parsing module, the worker thread pool, and the export/import scheduler; the logging module is connected to the parameter parsing module, the SQL construction module, the worker threads, the worker thread pool, and the connectors; the export/import scheduler is connected to the connectors, the worker threads, the worker thread pool, and the SQL construction module; the connectors are connected to the worker threads; and the worker threads are connected to the SQL construction module.
An implementation method of the data interchange tool for an MPP database and a Hadoop cluster comprises the following steps:
(1) the user starts the tool from the command line and supplies configuration information; the main control module starts with the tool, first creates a global tool-run log instance through the logging module, and then initializes the other modules;
(2) the main control module receives the configuration information entered by the user as a character string and passes it to the parameter parsing module for further parsing;
(3) the parameter parsing module parses the user's character-string configuration into configuration information the program can use internally and returns it to the main control module;
(4) the main control module starts the export/import scheduler, which carries out the export (import) work;
(5) the export/import scheduler creates a main connector instance and connects to the MPP database through the main connector;
(6) the export/import scheduler builds the status-check SQL through the SQL construction module and executes the status check through the main connector;
(7) the export/import scheduler builds the export (import) metadata SQL through the SQL construction module and executes it through the main connector;
(8) the export/import scheduler builds, through the SQL construction module, the SQL that queries the tables to be exported (imported), executes the query through the main connector, and obtains the list of tables to be exported (imported);
(9) the export/import scheduler creates a global job-scheduling log instance through the logging module;
(10) the export/import scheduler obtains worker threads from the thread pool module, the number of which equals the configured export (import) parallelism, creates the corresponding number of working connectors, assigns one working connector to each worker thread, and then starts all jobs, with each worker thread processing export (import) jobs in parallel; the export (import) of a single table is called a job, and a job consists of: first, connecting to the MPP database through the working connector; second, building the export (import) SQL through the SQL construction module; third, executing the export (import) through the working connector (a sketch of this scheduling step is given below, after this list);
(11) the export/import scheduler collects the export (import) job results of each worker thread, consolidates them into the overall export (import) result, and returns it to the main control module, which finally returns the export (import) result to the user.
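A minimal sketch of steps (10) and (11), assuming Python's standard thread pool for the worker threads and a caller-supplied connect() callable standing in for the working connectors; the patent names the modules but not their interfaces, so everything here other than the one-job-per-table structure is an assumption.

```python
# Sketch of the per-table job scheduling in steps (10)-(11). `connect` is a
# caller-supplied factory for working connectors and `build_sql` stands in
# for the SQL construction module; both are placeholders.
from concurrent.futures import ThreadPoolExecutor

def run_job(connect, table, build_sql):
    """One job = the export (import) of a single table over its own working connector."""
    conn = connect()                        # working connector for this job
    try:
        with conn.cursor() as cur:
            cur.execute(build_sql(table))   # SQL built by the SQL construction module
        return table, "ok"
    except Exception as exc:                # a failed job is reported, not fatal here
        return table, f"failed: {exc}"
    finally:
        conn.close()

def schedule_jobs(connect, tables, build_sql, parallelism):
    """Run all jobs at the configured parallelism and gather the results."""
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        futures = [pool.submit(run_job, connect, t, build_sql) for t in tables]
        results = dict(f.result() for f in futures)
    return results                          # consolidated result for the main control module
```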
The advantages and positive effects of the present invention are: data interchange between an MPP database and a Hadoop cluster is achieved, and the export/import mode can be chosen flexibly according to actual needs: when no secondary processing by Hadoop is required, the efficient direct mode can be used; otherwise, the TXT relay mode can be used.
Brief description of the drawings
Fig. 1 is a schematic diagram of the data interchange method between an MPP database and a Hadoop cluster;
Fig. 2 is a schematic diagram of exporting data directly from the MPP database to the Hadoop cluster;
Fig. 3 is a schematic diagram of the detailed steps for exporting data directly from the MPP database to the Hadoop cluster;
Fig. 4 is a schematic diagram of exporting data from the MPP database to the Hadoop cluster via TXT relay;
Fig. 5 is a schematic diagram of the detailed steps for exporting data from the MPP database to the Hadoop cluster via TXT relay;
Fig. 6 is a schematic diagram of importing data directly from the Hadoop cluster into the MPP database;
Fig. 7 is a schematic diagram of the detailed steps for importing data directly from the Hadoop cluster into the MPP database;
Fig. 8 is a schematic diagram of importing data from the Hadoop cluster into the MPP database via TXT relay;
Fig. 9 is a schematic diagram of the detailed steps for importing data from the Hadoop cluster into the MPP database via TXT relay;
Fig. 10 is a schematic diagram of the data interchange tool.
Embodiment
The present invention is described in detail below with reference to the accompanying drawings and in conjunction with examples. It should be noted that, as long as they do not conflict, the embodiments of the application and the features of those embodiments may be combined with one another.
The invention provides a data interchange tool and a data interchange method for an MPP database and a Hadoop cluster, comprising a method of exchanging data directly between the MPP database and the Hadoop cluster with the data interchange tool, and a method of exchanging the data via TXT relay.
1. As shown in Fig. 2, data is exported directly from the MPP database to the Hadoop cluster: through the data interchange tool, the computing nodes of the MPP database access the data nodes of the Hadoop cluster and write the data directly to the Hadoop cluster, without staging it on any storage unit outside the MPP database and the Hadoop cluster, which makes the export process more efficient. The detailed procedure is shown in Fig. 3; a code sketch of this flow, as seen from the tool, follows the steps below:
Step 301, the data interchange tool starts;
Step 302, status check. The data interchange tool sends a status-check SQL command to the MPP database cluster. After receiving it, the MPP database cluster connects to the Hadoop cluster and checks that the specified Hadoop directory is writable, and also checks the state of each of its own nodes and data shards. If the status check fails, step 307 is executed; otherwise step 303 is executed;
Step 303, export metadata. The data interchange tool sends an export-metadata SQL command to the MPP database; after receiving it, the MPP database cluster exports the metadata to the specified directory of the Hadoop file system. If this fails, step 307 is executed; otherwise step 304 is executed;
Step 304, obtain the tables to be exported from the database. The data interchange tool sends a table-query SQL command with a WHERE condition to the MPP database (no WHERE condition means export everything). After receiving the command, the MPP database cluster runs the query for the tables matching the WHERE condition; if the query fails, step 307 is executed, otherwise the names of the matching tables in the database are returned to the data interchange tool and step 305 is executed;
Step 305, export data table by table. The data interchange tool sends export SQL commands to the MPP database cluster concurrently, one table at a time; after receiving a table-export SQL command, the MPP database cluster performs the export and writes the table data directly to the specified directory on the data nodes of the Hadoop cluster. If a single table fails to export, it is skipped and processing continues with the next table; if N consecutive tables (N specified by the user) fail to export, step 307 is executed; otherwise exporting continues until all tables are done, and then step 306 is executed;
Step 306, the export succeeds and the tool exits normally;
Step 307, the export fails and the execution is aborted.
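The sketch below drives steps 301 through 307 from the tool side, assuming a DB-API-style connection to the MPP cluster. The statements CHECK EXPORT STATUS, EXPORT METADATA and EXPORT TABLE, as well as the system_tables catalog, are placeholders, since the patent does not specify the SQL dialect; the parallel, per-table execution is shown in the scheduler sketch earlier and is collapsed into a simple loop here.

```python
# Sketch of the direct-export flow (steps 301-307) as seen from the data
# interchange tool. All SQL verbs and catalog names are placeholders.
def direct_export(conn, hdfs_dir, where=None):
    cur = conn.cursor()
    # Step 302: status check (Hadoop directory writable, node and shard states)
    cur.execute(f"CHECK EXPORT STATUS TO '{hdfs_dir}'")
    # Step 303: export the metadata to the specified HDFS directory
    cur.execute(f"EXPORT METADATA TO '{hdfs_dir}'")
    # Step 304: query the tables to export, optionally filtered by a WHERE condition
    cur.execute("SELECT table_name FROM system_tables"
                + (f" WHERE {where}" if where else ""))
    tables = [row[0] for row in cur.fetchall()]
    # Step 305: export table by table, straight to the Hadoop data nodes
    for table in tables:
        cur.execute(f"EXPORT TABLE {table} TO '{hdfs_dir}/{table}'")
    cur.close()   # steps 306/307: success or failure is reported to the caller
```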
2. As shown in Fig. 4, data is exported from the MPP database to the Hadoop cluster via TXT relay: through the data interchange tool, the MPP database exports the data to a storage unit outside the MPP database and the Hadoop cluster, and the Hadoop -put command of a Hadoop client then loads the TXT text data from the external storage into the Hadoop cluster. This allows Hadoop to apply secondary processing to the TXT text data before it is imported. The detailed procedure is shown in Fig. 5; the Hadoop -put invocation of step 506 is sketched in code after the steps below:
Step 501, the data interchange tool starts;
Step 502, status check. The data interchange tool sends a status-check SQL command to the MPP database cluster. After receiving it, the MPP database cluster checks the state of each of its own nodes and data shards. If the status check fails, step 508 is executed; otherwise step 503 is executed;
Step 503, export metadata. The data interchange tool sends an export-metadata SQL command to the MPP database; after receiving it, the MPP database cluster exports the metadata to the specified directory of the external storage, in TXT format. If this fails, step 508 is executed; otherwise step 504 is executed;
Step 504, obtain the tables to be exported from the database (according to the specified condition). The export tool sends a table-query SQL command with a WHERE condition to the MPP database (no WHERE condition means export everything). After receiving the command, the MPP database cluster runs the query for the tables matching the WHERE condition; if the query fails, step 508 is executed, otherwise the names of the matching tables in the database are returned to the export tool and step 505 is executed;
Step 505, export data table by table. The export tool sends export SQL commands to the MPP database cluster concurrently, one table at a time; after receiving a table-export SQL command, the MPP database cluster performs the export and writes the table data directly to the specified directory of the external storage. If a single table fails to export, it is skipped and processing continues with the next table; if N consecutive tables (N specified by the user) fail to export, step 508 is executed; otherwise exporting continues until all tables are done, and then step 506 is executed;
Step 506, Hadoop imports the data. A Hadoop client installed on the physical machine hosting the external storage executes the Hadoop -put command to load the TXT data files into the specified Hadoop directory. If the Hadoop import succeeds, the export of data from the MPP database to Hadoop terminates normally and step 507 is executed; otherwise step 508 is executed;
Step 507, the Hadoop import succeeds and the tool exits normally;
Step 508, the execution is aborted.
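Step 506 relies on the standard Hadoop shell. The sketch below shows a minimal invocation from Python; the local and HDFS directory names are placeholders, and only the `hadoop fs -put` command itself (with `hadoop fs -get` as its counterpart for the import direction) is taken from the Hadoop CLI.

```python
# Sketch of step 506: pushing the relay TXT files from the external storage
# into HDFS with the standard `hadoop fs -put` command. A Hadoop client must
# be installed on the machine hosting the external storage; the paths are
# placeholders.
import subprocess

def put_txt_into_hdfs(local_dir="/data/export_txt", hdfs_dir="/interchange/in"):
    result = subprocess.run(
        ["hadoop", "fs", "-put", local_dir, hdfs_dir],
        capture_output=True, text=True,
    )
    if result.returncode != 0:            # corresponds to step 508: abort on failure
        raise RuntimeError(f"hadoop fs -put failed: {result.stderr}")
    return True                           # corresponds to step 507: normal exit
```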
3. As shown in Fig. 6, data is imported directly from the Hadoop cluster into the MPP database: the data does not need to be staged on any storage unit outside the MPP database and the Hadoop cluster, and the computing nodes of the MPP database access the data nodes of the Hadoop cluster directly, which makes the import process more efficient. The detailed procedure is shown in Fig. 7; the per-table failure-handling rule of step 705 is sketched in code after the steps below:
Step 701, the data interchange tool starts;
Step 702, status check. The import tool sends a status-check SQL command to the MPP database cluster. After receiving it, the MPP database cluster connects to the Hadoop cluster and checks that the specified Hadoop directory is readable, and at the same time checks the state of each of its own nodes. If the status check fails, step 707 is executed; otherwise step 703 is executed;
Step 703, import metadata. The import tool sends an import-metadata SQL command to the MPP database; after receiving it, the MPP database cluster imports the metadata from the specified directory of the Hadoop file system. If this fails, step 707 is executed; otherwise step 704 is executed;
Step 704, obtain all tables in the database. The import tool sends a table-query SQL command to the MPP database; after receiving it, the MPP database cluster runs the query for all tables; if the query fails, step 707 is executed, otherwise all table names in the database are returned to the import tool and step 705 is executed;
Step 705, import data table by table. The import tool sends import SQL commands to the MPP database cluster concurrently, one table at a time; after receiving a table-import SQL command, the MPP database cluster performs the import, accessing the data nodes of the Hadoop cluster directly and loading the data into the MPP database. If a single table fails to import, it is skipped and processing continues with the next table; if N consecutive tables (N specified by the user) fail to import, step 707 is executed; otherwise importing continues until all tables are done, and then step 706 is executed;
Step 706, the import succeeds and the tool exits normally;
Step 707, the import fails and the execution is aborted.
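The skip-one-table / abort-after-N-consecutive-failures rule of step 705 (and of the corresponding export step 305) can be expressed compactly. In the sketch below, `process_one_table` is a caller-supplied callable standing in for the actual per-table import or export SQL, which the patent does not spell out; only the control flow follows the patent text.

```python
# Sketch of the failure-handling rule in steps 305/705: a failed table is
# skipped, but N consecutive failures (N chosen by the user) abort the run.
def process_all_tables(tables, process_one_table, max_consecutive_failures):
    consecutive = 0
    failed = []
    for table in tables:
        try:
            process_one_table(table)
            consecutive = 0                       # a success resets the counter
        except Exception:
            failed.append(table)                  # skip this table, keep going
            consecutive += 1
            if consecutive >= max_consecutive_failures:
                raise RuntimeError(               # steps 307/707: abort the run
                    f"{consecutive} consecutive tables failed, aborting")
    return failed                                 # steps 306/706: normal exit
```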
4. As shown in Fig. 8, data is imported from the Hadoop cluster into the MPP database via TXT relay: the Hadoop cluster exports the data as TXT text to a storage unit outside the MPP database and the Hadoop cluster, and the MPP database then imports the TXT text data into the MPP database. The detailed procedure is shown in Fig. 9:
Step 901, Hadoop exports the data. A Hadoop client installed on the physical machine hosting the external storage executes the Hadoop -get command to copy the TXT data files from the specified Hadoop directory to the specified directory of the external storage. If the Hadoop export fails, step 908 is executed; otherwise step 902 is executed;
Step 902, the data interchange tool starts, and step 903 is executed;
Step 903, status check. The data interchange tool sends a status-check SQL command to the MPP database cluster. After receiving it, the MPP database cluster checks the state of each of its own nodes. If the status check fails, step 908 is executed; otherwise step 904 is executed;
Step 904, import metadata. The data interchange tool sends an import-metadata SQL command to the MPP database; after receiving it, the MPP database cluster imports the metadata from the specified directory of the external storage. If this fails, step 908 is executed; otherwise step 905 is executed;
Step 905, obtain all tables in the database. The data interchange tool sends a table-query SQL command to the MPP database; after receiving it, the MPP database cluster runs the query for all tables; if the query fails, step 908 is executed, otherwise all table names in the database are returned to the data interchange tool and step 906 is executed;
Step 906, import data table by table. The data interchange tool sends import SQL commands to the MPP database cluster concurrently, one table at a time; after receiving a table-import SQL command, the MPP database cluster performs the import, loading the data from the specified directory of the external storage. If a single table fails to import, it is skipped and processing continues with the next table; if N consecutive tables (N specified by the user) fail to import, step 908 is executed; otherwise importing continues until all tables are done, and then step 907 is executed;
Step 907, the import succeeds and the tool exits normally;
Step 908, the execution is aborted.
The embodiments of the present invention have been described in detail above, but the content described is only a preferred embodiment of the present invention and cannot be regarded as limiting the scope of the invention. All equivalent changes and improvements made within the scope of the invention shall still fall within the coverage of this patent.

Claims (8)

1. A method for data interchange between an MPP database and a Hadoop cluster, characterized in that it comprises:
(1) using a data interchange tool to export data directly from the MPP database to the Hadoop cluster, or exporting the data from the MPP database to the Hadoop cluster via TXT relay;
(2) using the data interchange tool to import data directly from the Hadoop cluster into the MPP database, or importing the data from the Hadoop cluster into the MPP database via TXT relay.
2. The MPP database and Hadoop cluster data interchange method according to claim 1, characterized in that the steps of using the data interchange tool to export data directly from the MPP database to the Hadoop cluster are:
(1) the data interchange tool starts;
(2) status check: the data interchange tool sends a status-check SQL command to the MPP database cluster; after receiving it, the MPP database cluster connects to the Hadoop cluster and checks that the specified Hadoop directory is writable, and also checks the state of each of its own nodes and data shards;
(3) export metadata: the data interchange tool sends an export-metadata SQL command to the MPP database; after receiving it, the MPP database cluster exports the metadata to the specified directory of the Hadoop file system;
(4) obtain the tables to be exported from the database;
(5) export data table by table: the data interchange tool sends export SQL commands to the MPP database cluster concurrently, one table at a time; after receiving a table-export SQL command, the MPP database cluster performs the export and writes the table data directly to the specified directory on the data nodes of the Hadoop cluster;
(6) on success, the export exits normally;
(7) on failure, the execution is aborted.
3. The MPP database and Hadoop cluster data interchange method according to claim 1, characterized in that the steps of exporting the data from the MPP database to the Hadoop cluster via TXT relay are:
(1) the data interchange tool starts;
(2) status check: the data interchange tool sends a status-check SQL command to the MPP database cluster; after receiving it, the MPP database cluster checks the state of each of its own nodes and data shards;
(3) export metadata: the data interchange tool sends an export-metadata SQL command to the MPP database; after receiving it, the MPP database cluster exports the metadata to the specified directory of the external storage, in TXT format;
(4) obtain the tables to be exported from the database;
(5) export data table by table: the data interchange tool sends export SQL commands to the MPP database cluster concurrently, one table at a time; after receiving a table-export SQL command, the MPP database cluster performs the export and writes the table data directly to the specified directory of the external storage;
(6) Hadoop imports the data: a Hadoop client installed on the physical machine hosting the external storage executes the Hadoop -put command to load the TXT data files into the specified Hadoop directory;
(7) on success, the Hadoop import exits normally;
(8) on failure, the execution is aborted.
4. The MPP database and Hadoop cluster data interchange method according to claim 1, characterized in that the steps of using the data interchange tool to import data directly from the Hadoop cluster into the MPP database are:
(1) the data interchange tool starts;
(2) status check: the data interchange tool sends a status-check SQL command to the MPP database cluster; after receiving it, the MPP database cluster connects to the Hadoop cluster and checks that the specified Hadoop directory is readable, and at the same time checks the state of each of its own nodes;
(3) import metadata: the import tool sends an import-metadata SQL command to the MPP database; after receiving it, the MPP database cluster imports the metadata from the specified directory of the Hadoop file system;
(4) obtain the tables to be imported in the database;
(5) import data table by table: the data interchange tool sends import SQL commands to the MPP database cluster concurrently, one table at a time; after receiving a table-import SQL command, the MPP database cluster performs the import, accessing the data nodes of the Hadoop cluster directly and loading the data into the MPP database;
(6) on success, the import exits normally;
(7) on failure, the execution is aborted.
5. The MPP database and Hadoop cluster data interchange method according to claim 1, characterized in that the steps of importing the data from the Hadoop cluster into the MPP database via TXT relay are:
(1) Hadoop exports the data: a Hadoop client installed on the physical machine hosting the external storage executes the Hadoop -get command to copy the TXT data files from the specified Hadoop directory to the specified directory of the external storage;
(2) the data interchange tool starts;
(3) status check: the data interchange tool sends a status-check SQL command to the MPP database cluster; after receiving it, the MPP database cluster checks the state of each of its own nodes;
(4) import metadata: the data interchange tool sends an import-metadata SQL command to the MPP database; after receiving it, the MPP database cluster imports the metadata from the specified directory of the external storage;
(5) obtain all tables in the database;
(6) import data table by table: the data interchange tool sends import SQL commands to the MPP database cluster concurrently, one table at a time; after receiving a table-import SQL command, the MPP database cluster performs the import, loading the data from the specified directory of the external storage;
(7) on success, the import exits normally;
(8) on failure, the execution is aborted.
6. The MPP database and Hadoop cluster data interchange method according to claim 1, characterized in that filtered export is supported when exporting data from the MPP database, the filter being expressed by supplying an SQL statement with a WHERE condition.
7. A data interchange tool for an MPP database and a Hadoop cluster, comprising a main control module, a parameter parsing module, connectors, an export/import scheduler, worker threads, a logging module, an SQL construction module, and a worker thread pool; the main control module is connected to the parameter parsing module, the worker thread pool, and the export/import scheduler; the logging module is connected to the parameter parsing module, the SQL construction module, the worker threads, the worker thread pool, and the connectors; the export/import scheduler is connected to the connectors, the worker threads, the worker thread pool, and the SQL construction module; the connectors are connected to the worker threads; and the worker threads are connected to the SQL construction module.
8. An implementation method of the data interchange tool for an MPP database and a Hadoop cluster, characterized in that it comprises the following steps:
(1) the user starts the tool from the command line and supplies configuration information; the main control module starts with the tool, first creates a global tool-run log instance through the logging module, and then initializes the other modules;
(2) the main control module receives the configuration information entered by the user as a character string and passes it to the parameter parsing module for further parsing;
(3) the parameter parsing module parses the user's character-string configuration into configuration information the program can use internally and returns it to the main control module;
(4) the main control module starts the export/import scheduler, which carries out the export (import) work;
(5) the export/import scheduler creates a main connector instance and connects to the MPP database through the main connector;
(6) the export/import scheduler builds the status-check SQL through the SQL construction module and executes the status check through the main connector;
(7) the export/import scheduler builds the export (import) metadata SQL through the SQL construction module and executes it through the main connector;
(8) the export/import scheduler builds, through the SQL construction module, the SQL that queries the tables to be exported (imported), executes the query through the main connector, and obtains the list of tables to be exported (imported);
(9) the export/import scheduler creates a global job-scheduling log instance through the logging module;
(10) the export/import scheduler obtains worker threads from the thread pool module, the number of which equals the configured export (import) parallelism, creates the corresponding number of working connectors, assigns one working connector to each worker thread, and then starts all jobs, with each worker thread processing export (import) jobs in parallel; the export (import) of a single table is called a job, and a job consists of: first, connecting to the MPP database through the working connector; second, building the export (import) SQL through the SQL construction module; third, executing the export (import) through the working connector;
(11) the export/import scheduler collects the export (import) job results of each worker thread, consolidates them into the overall export (import) result, and returns it to the main control module, which finally returns the export (import) result to the user.
CN201410820059.5A 2014-12-24 2014-12-24 MPP database and Hadoop cluster data interchange method, tool and implementation method Active CN104572895B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410820059.5A CN104572895B (en) 2014-12-24 2014-12-24 MPP database and Hadoop cluster data interchange method, tool and implementation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410820059.5A CN104572895B (en) 2014-12-24 2014-12-24 MPP database and Hadoop cluster data interchange method, tool and implementation method

Publications (2)

Publication Number Publication Date
CN104572895A true CN104572895A (en) 2015-04-29
CN104572895B CN104572895B (en) 2018-02-23

Family

ID=53088957

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410820059.5A Active CN104572895B (en) MPP database and Hadoop cluster data interchange method, tool and implementation method

Country Status (1)

Country Link
CN (1) CN104572895B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050044086A1 (en) * 2003-08-18 2005-02-24 Cheng-Hwa Liu Symmetry database system and method for data processing
CN101187937A (en) * 2007-10-30 2008-05-28 北京航空航天大学 Mode multiplexing isomerous database access and integration method under gridding environment
CN101944128A (en) * 2010-09-25 2011-01-12 中兴通讯股份有限公司 Data export and import method and device
US20130110799A1 (en) * 2011-10-31 2013-05-02 Sally Blue Hoppe Access to heterogeneous data sources

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
辛晃 et al.: "Research on a telecom operator network data sharing platform based on the Hadoop+MPP architecture" (基于Hadoop+MPP架构的电信运营商网络数据共享平台研究), 《电信科学》 (Telecommunications Science) *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105320755A (en) * 2015-10-14 2016-02-10 夏君 Secure high-speed data transmission method
CN105933446A (en) * 2016-06-28 2016-09-07 中国农业银行股份有限公司 Service dual-active implementation method and system of big data platform
CN106227862A (en) * 2016-07-29 2016-12-14 浪潮软件集团有限公司 E-commerce data integration method based on distribution
CN107402995A (en) * 2016-09-21 2017-11-28 广州特道信息科技有限公司 A kind of distributed newSQL Database Systems and method
CN107402995B (en) * 2016-09-21 2020-06-09 云润大数据服务有限公司 Distributed newSQL database system and method
CN107622094A (en) * 2017-08-30 2018-01-23 苏州朗动网络科技有限公司 A kind of high-volume data guiding system and method based on search engine
CN107679192A (en) * 2017-10-09 2018-02-09 中国工商银行股份有限公司 More cluster synergistic data processing method, system, storage medium and equipment
CN110019469A (en) * 2017-12-07 2019-07-16 中兴通讯股份有限公司 Distributed data base data processing method, device, storage medium and electronic device
WO2019109854A1 (en) * 2017-12-07 2019-06-13 中兴通讯股份有限公司 Data processing method and device for distributed database, storage medium, and electronic device
CN110019469B (en) * 2017-12-07 2022-06-21 金篆信科有限责任公司 Distributed database data processing method and device, storage medium and electronic device
US11928089B2 (en) 2017-12-07 2024-03-12 Zte Corporation Data processing method and device for distributed database, storage medium, and electronic device
CN108446145A (en) * 2018-03-21 2018-08-24 苏州提点信息科技有限公司 A kind of distributed document loads MPP data base methods automatically
CN112632114A (en) * 2019-10-08 2021-04-09 中国移动通信集团辽宁有限公司 Method and device for MPP database to quickly read data and computing equipment
CN112632114B (en) * 2019-10-08 2024-03-19 中国移动通信集团辽宁有限公司 Method, device and computing equipment for fast reading data by MPP database
CN110716802A (en) * 2019-10-11 2020-01-21 恩亿科(北京)数据科技有限公司 Cross-cluster task scheduling system and method
CN111143403A (en) * 2019-12-10 2020-05-12 跬云(上海)信息科技有限公司 SQL conversion method and device and storage medium
CN111416861A (en) * 2020-03-20 2020-07-14 中国建设银行股份有限公司 Communication management system and method
CN114138750A (en) * 2021-12-03 2022-03-04 无锡星凝互动科技有限公司 AI consultation database cluster building method and system
CN116010337A (en) * 2022-12-05 2023-04-25 广州海量数据库技术有限公司 Method for accessing ORC data by openGauss

Also Published As

Publication number Publication date
CN104572895B (en) 2018-02-23

Similar Documents

Publication Publication Date Title
CN104572895A (en) MPP (Massively Parallel Processor) database and Hadoop cluster data intercommunication method, tool and realization method
JP5298117B2 (en) Data merging in distributed computing
CN104965735B (en) Device for generating upgrading SQL scripts
US9128991B2 (en) Techniques to perform in-database computational programming
US10102039B2 (en) Converting a hybrid flow
US8682876B2 (en) Techniques to perform in-database computational programming
EP3751426A1 (en) System and method for migration of a legacy datastore
CN104133772A (en) Automatic test data generation method
US9043344B1 (en) Data mining and model generation using an in-database analytic flow generator
CN103425762A (en) Telecom operator mass data processing method based on Hadoop platform
US9563650B2 (en) Migrating federated data to multi-source universe database environment
CN106528898A (en) Method and device for converting data of non-relational database into relational database
CN106776962A (en) A kind of general Excel data import multiple database physical table methods
US20170060977A1 (en) Data preparation for data mining
CN105677687A (en) Data processing method and device
CN112214453B (en) Large-scale industrial data compression storage method, system and medium
CN108829884A (en) data mapping method and device
CN114218218A (en) Data processing method, device and equipment based on data warehouse and storage medium
CN111290813B (en) Software interface field data standardization method, device, equipment and medium
CN111126852A (en) BI application system based on big data modeling
CN108255852B (en) SQL execution method and device
CN102023859A (en) Digital development environment-oriented software integration method with reliability, maintainability and supportability
CN111125064A (en) Method and device for generating database mode definition statement
CN105653830A (en) Data analysis method based on model driving
CN109829003A (en) Database backup method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant