CN106649418A - High-performance method for importing data into distributed database through direct connection of fragments in driver - Google Patents

Publication number: CN106649418A
Authority: CN (China)
Prior art keywords: fragment, sql, data, node, performance
Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number: CN201510755541.XA
Other languages: Chinese (zh)
Inventor: Not disclosed (不公告发明人)
Current Assignee: JIANGSU CITED RUN NETWORK TECHNOLOGY Co Ltd (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: JIANGSU CITED RUN NETWORK TECHNOLOGY Co Ltd
Priority date: 2015-11-04 (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2015-11-04
Publication date: 2017-05-10
Application filed by JIANGSU CITED RUN NETWORK TECHNOLOGY Co Ltd
Priority to CN201510755541.XA
Publication of CN106649418A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/217 Database tuning
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a high-performance method for importing data into a distributed database through direct connection of fragments in the driver, and sets out an implementation of the algorithm. The method comprises the following steps: a fragmentation algorithm is integrated into the database driver, which bypasses the master node and connects directly to all fragment nodes; a blocking queue is established for the SQL statements to be executed on each fragment, the statements are re-planned into batches and submitted to the fragment nodes batch by batch, and the data is written; two fragment synchronization modes are provided, performance priority and data security priority. Because the method bypasses the traditional master node and connects directly to the fragment nodes, and converts the existing one-at-a-time execution of SQL statements into concurrent batch execution on all fragments, intermediate processing and transmission steps are removed from the whole process, network load is greatly reduced, network utilization is increased, and the physical resources of all distributed nodes are fully used, so import performance is significantly improved when importing data into a distributed database.

Description

A high-performance method for importing data into a distributed database by connecting directly to fragments in the driver
Technical field
The present invention relates to the field of database technology.
Background
Importing data into a database is a common step in maintenance operations such as backup, recovery, and migration. Distributed databases are generally used for very large data volumes, so an import often takes a long time, and import performance is therefore particularly important in production environments. The import scenario differs from normal use: it is a one-off operation that can be redone if it fails; the database is usually under low load; every fragment is in a good state, with no significant difference between fragments; and the SQL executed on each fragment is independent of the others. Optimizing for these characteristics can yield targeted performance gains.
Summary of the invention
The present disclosure sets out an algorithm implementation that can significantly improve import performance in the data import scenario. The algorithm has the following features:
1. The database driver connects to the master node of the distributed database and obtains the database's fragment configuration, redundant copies, fragmentation rules, and similar information;
2. A fragmentation algorithm is integrated into the database driver, which bypasses the master node and connects directly to each fragment node;
3. A blocking queue is set up for the SQL waiting to be executed on each fragment. The SQL is re-planned into batches, submitted to the fragment nodes batch by batch, and the data is written;
4. Fragment synchronization offers two modes. One: performance priority, in which the fragment that finishes its SQL first sets the pace of the overall transaction. Two: data security priority, in which the overall transaction advances only when all fragments have finished their SQL.
The performance improvement brought by the present invention comes mainly from two aspects. First, the master node is bypassed: the application system importing the data connects directly to the database fragment nodes, which removes one intermediate transfer step, eliminates one source of network latency, and reduces network traffic by 50%, a significant gain in performance. Second, SQL is submitted in batches: the algorithm caches SQL in the driver, in each fragment's queue of pending statements, and as soon as a fragment finishes one batch the next batch is assembled and executed immediately. In performance-priority mode there is no need to wait for the other fragments to finish, which extracts as much of each fragment's performance potential as possible.
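The batch re-planning described above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the class name `BatchPlanner`, the method `nextBatch`, and the batch-size limit are hypothetical. The text only says that cached SQL is assembled into the next batch as soon as a fragment finishes the previous one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical helper: groups the SQL waiting in one fragment's queue into a batch. */
final class BatchPlanner {
    private BatchPlanner() {}

    /**
     * Blocks until at least one statement is available, then also takes up to
     * maxBatch - 1 further statements that are already waiting, without blocking again.
     */
    static List<String> nextBatch(BlockingQueue<String> pending, int maxBatch)
            throws InterruptedException {
        List<String> batch = new ArrayList<>(maxBatch);
        batch.add(pending.take());            // wait for the first statement
        pending.drainTo(batch, maxBatch - 1); // add whatever else is queued
        return batch;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> pending = new ArrayBlockingQueue<>(16);
        for (int i = 0; i < 5; i++) {
            pending.put("INSERT INTO t VALUES (" + i + ")");
        }
        System.out.println(nextBatch(pending, 3).size()); // prints 3
        System.out.println(nextBatch(pending, 3).size()); // prints 2
    }
}
```

In a real driver each batch would presumably be sent over the fragment's own connection, for example with the standard JDBC `Statement.addBatch` and `executeBatch` calls; that part is left out here because it needs a live database.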
Description of the drawings
Fig. 1 is a workflow diagram of the present invention.
Fig. 2 is a schematic diagram of the workflow of the present invention.
Detailed description of the embodiments
The present invention is implemented by modifying the JDBC driver of an existing database. The modifications are made in the following steps:
1. Inherit from the original JDBC Connection object and, in the constructor:
a) call the parent constructor to complete the existing creation operations;
b) after the parent constructor completes, have the driver connect to the master node of the distributed database and use the appropriate management SQL commands to read the database's fragment configuration, redundant copies, fragmentation rules, and similar information;
c) establish a database connection to each fragment node, one by one;
d) create a bounded blocking queue for the database connection of each fragment node, whose elements are of type Runnable;
e) create a worker thread for each bounded blocking queue;
f) start each worker thread. The worker thread takes objects of type Runnable from its queue. If an object is available, the thread checks whether it is an instance of the stop command: if so, the thread exits; otherwise, it runs the Runnable object. If the queue is empty, the thread blocks and waits to be woken.
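The per-fragment queue and worker thread of steps d) to f) might look like the sketch below. It is only an illustration: the class name `FragmentWorker`, the `STOP` sentinel used as the stop command, and the queue capacity are assumptions, and the tasks here are plain counters rather than SQL executions.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical per-fragment worker: one bounded blocking queue, one thread. */
final class FragmentWorker extends Thread {
    /** Sentinel instance standing in for the stop command (an assumption). */
    static final Runnable STOP = () -> { };

    private final BlockingQueue<Runnable> queue;

    FragmentWorker(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity); // bounded blocking queue
    }

    /** Queues a task for this fragment; blocks the caller while the queue is full. */
    void submit(Runnable task) throws InterruptedException {
        queue.put(task);
    }

    @Override
    public void run() {
        try {
            while (true) {
                Runnable task = queue.take(); // blocks while the queue is empty
                if (task == STOP) {
                    return;                   // stop command: leave the thread
                }
                task.run();                   // otherwise run the queued object
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt and exit
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger executed = new AtomicInteger();
        FragmentWorker worker = new FragmentWorker(8);
        worker.start();
        for (int i = 0; i < 3; i++) {
            worker.submit(executed::incrementAndGet); // stands in for running SQL
        }
        worker.submit(STOP);
        worker.join();
        System.out.println(executed.get()); // prints 3
    }
}
```

Comparing against a single shared sentinel keeps the stop check to one reference comparison. A production worker would also need to decide what happens when a task throws, which this sketch does not handle.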
2. In the SQL execution method of this Connection object:
a) determine whether the pending SQL is of a type this method can handle. The method's minimum requirement is that the SQL executed on each fragment is independent of the others, with no cross-node computation. The INSERT INTO statements used for data import satisfy this requirement. Data import is the scenario the algorithm mainly targets, but the method is not limited to it: as long as no cross-node computation is needed, even complex SQL with multi-table joins or subqueries can be supported. Whether a statement qualifies is determined by SQL syntax analysis combined with the table information held on the fragments;
b) for SQL that cannot be handled this way, send the SQL to the database master node for execution, and the method returns;
c) for SQL that can be handled, create a mapping from fragment node connections to Runnable objects;
d) using the integrated fragmentation algorithm, traverse the SQL statements. For each statement, create a Runnable object that carries the SQL and the fragment connection on which it is to be executed;
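Steps c) and d) can be illustrated with the following sketch. The patent does not state which fragmentation rule is used, so the hash-modulo rule here is purely an assumption, as are the names `FragmentRouter`, `KeyedSql`, and `SqlExecutor`. Integer fragment ids stand in for the fragment node connections.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical router: maps each statement to a fragment and wraps it in a Runnable. */
final class FragmentRouter {
    /** Placeholder for "run this SQL on that fragment's connection". */
    interface SqlExecutor {
        void execute(int fragment, String sql);
    }

    /** A statement together with the key that decides its fragment. */
    static final class KeyedSql {
        final Object key;
        final String sql;

        KeyedSql(Object key, String sql) {
            this.key = key;
            this.sql = sql;
        }
    }

    private final int fragmentCount;

    FragmentRouter(int fragmentCount) {
        if (fragmentCount <= 0) {
            throw new IllegalArgumentException("fragmentCount must be positive");
        }
        this.fragmentCount = fragmentCount;
    }

    /** Assumed rule: hash of the key modulo the number of fragments. */
    int fragmentFor(Object key) {
        return Math.floorMod(key.hashCode(), fragmentCount);
    }

    /** Builds the mapping from fragment to the Runnable objects destined for it. */
    Map<Integer, List<Runnable>> plan(List<KeyedSql> statements, SqlExecutor executor) {
        Map<Integer, List<Runnable>> byFragment = new LinkedHashMap<>();
        for (KeyedSql s : statements) {
            int fragment = fragmentFor(s.key);
            // Each Runnable carries the SQL and the fragment it must run on.
            byFragment.computeIfAbsent(fragment, f -> new ArrayList<>())
                      .add(() -> executor.execute(fragment, s.sql));
        }
        return byFragment;
    }

    public static void main(String[] args) {
        FragmentRouter router = new FragmentRouter(3);
        List<KeyedSql> rows = new ArrayList<>();
        for (int id = 1; id <= 6; id++) {
            rows.add(new KeyedSql(id, "INSERT INTO t (id) VALUES (" + id + ")"));
        }
        Map<Integer, List<Runnable>> plan = router.plan(rows, (fragment, sql) ->
                System.out.println("fragment " + fragment + " <- " + sql));
        plan.values().forEach(tasks -> tasks.forEach(Runnable::run));
    }
}
```

Building the whole mapping before anything is queued matches the order of the steps in the text, where the Runnable objects are only pushed to the fragment queues later, inside the synchronization method.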
3. In the synchronization method of the global transaction manager:
a) determine the count value for thread synchronization according to the working mode (performance priority or data security priority). The value is 1 in performance-priority mode and the number of fragments in data-security-priority mode. The count value is an atomic integer object;
b) traverse each entry of the mapping described in 2 c): determine the fragment's SQL queue from the fragment node connection and push the Runnable objects into that queue. Worker threads that were blocked on an empty queue are woken and start executing immediately;
c) after its SQL has been executed, the worker thread decrements the count value described in a) by one and notifies the synchronization method thread monitoring the count value to read it again; the worker thread itself then reads the now-empty queue and blocks again;
d) once the synchronization method thread reads a count value of zero, it exits the synchronization process, and this round of SQL execution is complete.
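The counter-based synchronization of step 3 can be sketched as below. The names `ImportSync`, `Mode`, `fragmentDone`, and `await` are hypothetical, and the monitor-based notification is just one way to implement "notify the thread monitoring the count value". The sketch waits for the counter to be zero or lower, which is an assumption: in performance-priority mode the fragments that finish later keep decrementing the counter past zero.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of the atomic-counter synchronization between fragments. */
final class ImportSync {
    enum Mode { PERFORMANCE_PRIORITY, DATA_SECURITY_PRIORITY }

    private final Object monitor = new Object();
    private final AtomicInteger remaining;

    ImportSync(Mode mode, int fragmentCount) {
        // 1 in performance-priority mode, the fragment count otherwise (step a).
        this.remaining = new AtomicInteger(
                mode == Mode.PERFORMANCE_PRIORITY ? 1 : fragmentCount);
    }

    /** Called by a worker thread once its fragment has finished its SQL (step c). */
    void fragmentDone() {
        remaining.decrementAndGet();
        synchronized (monitor) {
            monitor.notifyAll(); // ask the waiting thread to read the counter again
        }
    }

    /** Blocks the synchronizing thread until the counter has reached zero (step d). */
    void await() throws InterruptedException {
        synchronized (monitor) {
            while (remaining.get() > 0) {
                monitor.wait();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int fragments = 3;
        ImportSync sync = new ImportSync(Mode.DATA_SECURITY_PRIORITY, fragments);
        for (int i = 0; i < fragments; i++) {
            new Thread(sync::fragmentDone).start(); // stands in for a worker finishing
        }
        sync.await();
        System.out.println("all fragments finished");
    }
}
```

With `PERFORMANCE_PRIORITY` the same `await` call returns as soon as any one fragment reports completion, which is how the fastest fragment comes to set the pace of the import.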

Claims (6)

1. A high-performance method for importing data into a distributed database by connecting directly to fragments in the driver, characterized in that:
1) it is a dedicated algorithm for processing SQL operations on the fragment nodes of a distributed database.
2) the algorithm requires that SQL commands can be executed independently on each fragment node, without cross-node computation.
3) the algorithm is implemented by integrating a fragmentation algorithm into the database driver, so that the master node of the distributed database is bypassed and connections are established directly with the fragment nodes to execute SQL.
4) the algorithm provides two working modes, performance priority and data security priority. The former achieves SQL execution performance substantially higher than that of a conventional driver.
2. The dedicated algorithm for processing SQL operations on fragment nodes referred to in the high-performance method for importing data into a distributed database by connecting directly to fragments in the driver as claimed in claim 1, characterized in that: the algorithm is implemented in the database driver; its connection interface is JDBC; it faces the application system and operates between the application system and the database fragment nodes; SQL execution does not depend on the master node, but data such as the relevant fragment configuration and copy redundancy must be read from the master node.
3. The algorithm implementation of the high-performance method for importing data into a distributed database by connecting directly to fragments in the driver as claimed in claim 1, characterized in that: a blocking queue is established for each fragment node.
4. The blocking queue of the high-performance method for importing data into a distributed database by connecting directly to fragments in the driver as claimed in claim 3, characterized in that: SQL batches can be re-planned so that SQL commands can be submitted to the fragment nodes in batches.
5. The blocking queue of the high-performance method for importing data into a distributed database by connecting directly to fragments in the driver as claimed in claim 3, characterized in that: a mapping from fragment connections to the runnable objects for the pending SQL is established, and this mapping together with the fragmentation algorithm determines the fragment node queue into which each SQL statement is pushed.
6. The blocking queue of the high-performance method for importing data into a distributed database by connecting directly to fragments in the driver as claimed in claim 3, characterized in that: a global atomic counter is established, which is assigned the value 1 in performance-priority mode and the number of fragment nodes in data-security-priority mode. This counter controls the synchronization of the threads executing SQL on each fragment.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510755541.XA CN106649418A (en) 2015-11-04 2015-11-04 High-performance method for importing data into distributed database through direct connection of fragments in driver

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510755541.XA CN106649418A (en) 2015-11-04 2015-11-04 High-performance method for importing data into distributed database through direct connection of fragments in driver

Publications (1)

Publication Number Publication Date
CN106649418A true CN106649418A (en) 2017-05-10

Family

ID=58851192

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510755541.XA Pending CN106649418A (en) 2015-11-04 2015-11-04 High-performance method for importing data into distributed database through direct connection of fragments in driver

Country Status (1)

Country Link
CN (1) CN106649418A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20170510