CN106649418A

CN106649418A - High-performance method for importing data into distributed database through direct connection of fragments in driver

Info

Publication number: CN106649418A
Application number: CN201510755541.XA
Authority: CN
Inventors: Inventor name not disclosed
Original assignee: JIANGSU CITED RUN NETWORK TECHNOLOGY Co Ltd
Current assignee: JIANGSU CITED RUN NETWORK TECHNOLOGY Co Ltd
Priority date: 2015-11-04
Filing date: 2015-11-04
Publication date: 2017-05-10

Abstract

The invention discloses a high-performance method for importing data into a distributed database by connecting directly to shards (fragments) from within the driver, and describes an implementation of the algorithm. The method comprises the following: a sharding algorithm is integrated into the database driver, which bypasses the master node and connects directly to every shard node; a blocking queue is established for the SQL pending on each shard, the SQL is re-planned into batches and submitted batch by batch to the shard nodes, and the data is written; two shard synchronization modes are provided, performance priority and data safety priority. Because the method bypasses the traditional master node and converts single-step SQL execution into concurrent batched execution on all shards, intermediate processing and transmission steps are eliminated, network load is greatly reduced, network utilization is increased, and the physical resources of all distributed nodes are fully used, so import performance improves markedly when importing data into a distributed database.

Description

A high-performance method for importing data into a distributed database by connecting directly to shards (fragments) in the driver

Technical field

The present invention relates to the field of database technology.

Background technology

Importing data into a database is a common operation in maintenance tasks such as backup, recovery, and migration. Distributed databases are generally used for massive data volumes, so importing data often takes a long time, and import performance is therefore particularly important in production environments. The data-import scenario differs from normal use: it is a one-off operation that can be redone if it fails, the database is typically under low load, every shard is in a good state with no significant differences between shards, and the SQL executed on each shard is independent of the others. Optimizing for these characteristics makes targeted performance gains possible.

Summary of the invention

This disclosure describes an algorithm implementation that can significantly improve import performance in data-import scenarios. The algorithm has the following features:

1. The database driver connects to the master node of the distributed database and obtains the database's shard configuration, redundant copies, sharding rules, and related information;

2. A sharding algorithm is integrated into the database driver, which bypasses the master node and connects directly to each shard node;

3. A blocking queue is established for the SQL pending on each shard. The SQL is re-planned into batches and submitted batch by batch to the shard nodes, where the data is written;

4. Two shard synchronization modes are offered: (1) performance priority, in which the pace of the first shard to finish its SQL execution controls overall transaction progress; (2) data safety priority, in which overall transaction progress is governed by all shards having completed their SQL execution.

The performance improvement brought by the invention comes mainly from two aspects. First, the master node is bypassed: the application system importing the data connects directly to the database shard nodes, which removes one intermediate transfer step, eliminates one network-latency hop, and also reduces network traffic by 50%, with a significant effect on performance. Second, SQL is submitted in batches: the algorithm caches SQL in the driver in each shard's pending blocking queue, and as soon as a shard finishes executing, the next batch of SQL is assembled and executed immediately. In performance-priority mode there is no need to wait for other shards to finish, so the performance potential of each shard can be exploited to the fullest.
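The batch re-planning mentioned above can be illustrated with a minimal sketch. The patent does not state how batches are sized, so the fixed batch size, the class name BatchPlanner, and the method plan are all hypothetical and serve only to show the idea of regrouping one shard's pending SQL into batches.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: regroups the SQL pending for one shard into fixed-size batches. */
final class BatchPlanner {
    private BatchPlanner() { }

    /**
     * Splits pendingSql into consecutive batches of at most batchSize statements.
     * The fixed batch size is an assumption; the source does not specify a sizing rule.
     */
    static List<List<String>> plan(List<String> pendingSql, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < pendingSql.size(); start += batchSize) {
            int end = Math.min(start + batchSize, pendingSql.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(pendingSql.subList(start, end)));
        }
        return batches;
    }
}
```

In a real driver each batch could then be sent over the shard's own connection, for example with the standard JDBC Statement.addBatch and Statement.executeBatch calls; that step is omitted here because it needs a live database.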

Description of the drawings

Fig. 1 is a workflow diagram of the present invention.

Fig. 2 is the workflow schematic diagram of the present invention.

Specific embodiment

The invention is implemented by modifying the JDBC driver of an existing database. The specific modifications are made in the following steps:

1. Inherit from the original JDBC Connection object and implement the following in the constructor:

a) Call the parent constructor to complete the existing creation operations;

b) After the parent constructor completes, the driver connects to the master node of the distributed database and uses appropriate management SQL commands to read the database's shard configuration, redundant copies, sharding rules, and related information;

c) Establish a database connection to each shard node, one by one;

d) Create a bounded blocking queue for the database connection of each shard node; its elements are of type Runnable;

e) Create one worker thread for each bounded blocking queue;

f) Start each worker thread. A worker thread takes objects of type Runnable from its queue. If an object is available, the thread checks whether it is an instance of the stop command: if so, the thread exits; otherwise, the thread runs the Runnable object. If the queue is empty, the thread blocks and waits to be woken.
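Steps d) to f) can be sketched as follows. This is a minimal illustration, not the patented driver code: the class name ShardWorker, its method names, and the use of a private sentinel object as the "stop command" are assumptions, and the database connection of the shard is left out.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical per-shard worker: one bounded blocking queue plus one worker thread. */
final class ShardWorker {
    /** Sentinel standing in for the "stop command"; compared by identity. */
    private static final Runnable STOP = new Runnable() {
        @Override public void run() { }
    };

    private final BlockingQueue<Runnable> queue;
    private final Thread thread;

    ShardWorker(String name, int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity); // bounded queue (step d)
        this.thread = new Thread(this::drain, name);     // one thread per queue (step e)
    }

    /** Starts the worker thread (step f). */
    void start() {
        thread.start();
    }

    /** Enqueues a task; blocks while the bounded queue is full. */
    void submit(Runnable task) {
        try {
            queue.put(task);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while enqueuing", e);
        }
    }

    /** Enqueues the stop sentinel and waits for the worker thread to exit. */
    void stopAndJoin() {
        submit(STOP);
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Worker loop: take() blocks while the queue is empty. */
    private void drain() {
        try {
            while (true) {
                Runnable task = queue.take();
                if (task == STOP) {
                    return; // stop command received: exit the thread
                }
                task.run();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Because the queue is first-in, first-out and has a single consumer, tasks submitted to one shard run in submission order, and the stop sentinel takes effect only after everything queued before it has run.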

2. Implement the following in the SQL-execution method of this Connection object:

a) Determine whether the pending SQL is of a type that can be processed. The minimum requirement the method places on SQL is that execution on each shard be independent, with no cross-node computation. The INSERT INTO statements used in the data-import scenario satisfy this requirement. This is the main scenario targeted by the algorithm, but its application is not limited to it: as long as no cross-node computation is required, even complex SQL such as multi-table joins or subqueries can be supported. Making this determination requires SQL syntax analysis combined with the database table information on the shards;

b) For SQL that cannot be processed, send the SQL to the database master node for execution; the method then returns;

c) For SQL that can be processed, create a mapping from shard-node connections to Runnable objects;

d) Using the integrated sharding algorithm, traverse each SQL statement. For each statement, create a Runnable object that carries the SQL and the shard connection on which it is to be executed.
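Steps c) and d) can be sketched as below. The patent does not disclose the sharding rule, so the key-modulo rule in shardOf is purely hypothetical, as are the class name ShardRouter and the executor callback that stands in for running SQL on a shard connection. A shard index is used in place of a real connection object so that the sketch runs without a database.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/** Hypothetical router: maps each SQL statement to a shard and wraps it in a Runnable. */
final class ShardRouter {
    private ShardRouter() { }

    /** Assumed shard rule (not from the source): shard key modulo shard count. */
    static int shardOf(long shardKey, int shardCount) {
        return (int) Math.floorMod(shardKey, (long) shardCount);
    }

    /**
     * Builds the shard-to-Runnable mapping. Each Runnable carries its SQL text and
     * its target shard, and hands both to the executor callback when it is run.
     */
    static Map<Integer, List<Runnable>> route(Map<Long, String> sqlByShardKey,
                                              int shardCount,
                                              BiConsumer<Integer, String> executor) {
        Map<Integer, List<Runnable>> tasksByShard = new LinkedHashMap<>();
        for (Map.Entry<Long, String> entry : sqlByShardKey.entrySet()) {
            int shard = shardOf(entry.getKey(), shardCount);
            String sql = entry.getValue();
            tasksByShard.computeIfAbsent(shard, k -> new ArrayList<>())
                        .add(() -> executor.accept(shard, sql));
        }
        return tasksByShard;
    }
}
```

Nothing is executed while the mapping is built; the Runnable objects run only when a worker thread later takes them from the shard's queue.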

3. Implement the following in the synchronization method of the global transaction manager:

a) Determine the thread-synchronization count according to the working mode (performance priority or data safety priority). In performance-priority mode the value is 1; in data-safety-priority mode the value is the number of shards. The count is an atomic integer object;

b) Traverse each entry of the mapping described in step 2 c): use the shard-node connection to identify the shard's SQL queue and push the Runnable objects onto that queue. A worker thread that was blocked on the empty queue is woken immediately and begins executing;

c) After completing its SQL execution, a worker thread decrements the count described in a) by one and notifies the synchronization-method thread monitoring the count to read it again; the worker thread itself returns to the blocked state of reading from an empty queue;

d) When the synchronization-method thread reads a count of zero, it exits the synchronization process, and this round of SQL execution is complete.
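The counting scheme in step 3 can be sketched as follows. The class name ImportBarrier, its method names, and the use of an object monitor for notification are assumptions. One further assumption is labeled in the code: the waiting thread exits when the count is zero or lower, because in performance-priority mode several workers may decrement a counter that started at 1 before the waiting thread reads it again.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical synchronization helper built on an atomic counter. */
final class ImportBarrier {
    enum Mode { PERFORMANCE_PRIORITY, DATA_SAFETY_PRIORITY }

    private final AtomicInteger remaining;
    private final Object lock = new Object();

    ImportBarrier(Mode mode, int shardCount) {
        // Performance priority waits for the first shard only; data safety waits for all.
        this.remaining = new AtomicInteger(mode == Mode.PERFORMANCE_PRIORITY ? 1 : shardCount);
    }

    /** Called by a worker thread after its shard's SQL has finished. */
    void shardDone() {
        remaining.decrementAndGet();
        synchronized (lock) {
            lock.notifyAll(); // prompt the waiting thread to read the counter again
        }
    }

    /**
     * Blocks the synchronizing thread until the counter is zero or lower.
     * Accepting values below zero is an assumption added for this sketch.
     */
    void await() {
        synchronized (lock) {
            while (remaining.get() > 0) {
                try {
                    lock.wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }

    int remaining() {
        return remaining.get();
    }
}
```

The waiting thread checks the counter while holding the monitor, and a worker must acquire the same monitor to notify, so a decrement that lands between the check and the wait cannot be missed.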

Claims

1. A high-performance method for importing data into a distributed database by connecting directly to shards in the driver, characterized in that:

1) It is a dedicated algorithm for processing SQL operations on the shard nodes of a distributed database.

2) The algorithm requires that SQL commands can be executed independently on each shard node, with no cross-node computation.

3) The algorithm is implemented by integrating a sharding algorithm into the database driver, bypassing the master node of the distributed database and connecting directly to the shard nodes to execute SQL.

4) The algorithm provides two working modes, performance priority and data safety priority. The former achieves SQL execution performance significantly higher than that of a conventional driver.

2. The dedicated algorithm for processing shard-node SQL operations referred to in the high-performance method of claim 1 for importing data into a distributed database by connecting directly to shards in the driver, characterized in that: the algorithm is implemented in the database driver, its interface is JDBC, and it faces the application system, operating between the application system and the database shard nodes; SQL execution does not depend on the master node, but the relevant shard configuration, replica redundancy, and other data must be read from the master node.

3. The algorithm implementation of the high-performance method of claim 1 for importing data into a distributed database by connecting directly to shards in the driver, characterized in that: a blocking queue is established for each shard node.

4. The blocking queue of the high-performance method of claim 3 for importing data into a distributed database by connecting directly to shards in the driver, characterized in that: SQL batches can be re-planned so that SQL commands can be submitted to the shard nodes in batches.

5. The blocking queue of the high-performance method of claim 3 for importing data into a distributed database by connecting directly to shards in the driver, characterized in that: a mapping from shard connections to the runnable objects of the pending SQL is established, and this mapping together with the sharding algorithm determines the shard-node queue onto which each SQL is pushed.

6. The blocking queue of the high-performance method of claim 3 for importing data into a distributed database by connecting directly to shards in the driver, characterized in that: a global atomic counter is established, set to 1 in performance-priority mode and to the number of shard nodes in data-safety mode. This counter controls the thread synchronization of SQL execution across the shards.