Summary of the invention
In view of the above problems, this specification provides a partition-level join method for a distributed database. The distributed database includes multiple data tables that are partitioned based on the same partition key; each data table is divided into multiple logical partitions, and logical partitions belonging to different data tables are joined based on the same partition key. The join method includes:
Receiving a join rule planned for M logical partitions located on a first physical machine, wherein the M logical partitions belong to M data tables respectively;
Checking whether the physical machine on which any of the M logical partitions is located has changed;
If so, obtaining the second physical machine on which the logical partition whose location has changed is now located, and performing a data migration cost evaluation, wherein the data migration cost evaluation calculates the cost value of migrating the logical partition from the second physical machine to the first physical machine;
Determining, according to the result of the data migration cost evaluation, whether to execute the join rule.
Preferably, the join rule includes obtaining, from the M logical partitions located on the first physical machine, the logical partitions having the same partition key value, and performing an equijoin on the logical partitions having the same partition key value.
Preferably, determining whether to execute the join rule according to the result of the data migration cost evaluation includes: comparing the cost value with a preset cost threshold;
If the cost value is less than the preset cost threshold, migrating the logical partition from the second physical machine back to the first physical machine, and executing the join rule on the logical partitions of the multiple data tables;
If the cost value is greater than the preset cost threshold, not executing the join rule.
Preferably, the data migration cost includes the duration required to migrate the logical partition from the second physical machine back to the first physical machine.
Preferably, the data migration cost includes the number of data exchanges required to migrate the logical partition from the second physical machine back to the first physical machine.
This specification also provides a partition-level join device for a distributed database. The distributed database includes multiple data tables that are partitioned based on the same partition key; each data table is divided into multiple logical partitions, and logical partitions belonging to different data tables are joined based on the same partition key. The join device includes:
A receiving module, which receives a join rule planned for M logical partitions located on a first physical machine, wherein the M logical partitions belong to M data tables respectively;
A checking module, which checks whether the physical machine on which any of the M logical partitions is located has changed;
A cost evaluation module, which obtains the second physical machine on which the logical partition whose location has changed is now located, and performs a data migration cost evaluation, wherein the data migration cost evaluation calculates the cost value of migrating the logical partition from the second physical machine to the first physical machine;
A judgment module, which determines, according to the result of the data migration cost evaluation, whether to execute the join rule.
Preferably, the join rule includes obtaining, from the M logical partitions located on the first physical machine, the logical partitions having the same partition key value, and performing an equijoin on the logical partitions having the same partition key value.
Preferably, the judgment module further compares the cost value with a preset cost threshold;
If the cost value is less than the preset cost threshold, the logical partition is migrated from the second physical machine back to the first physical machine, and the join rule is executed on the logical partitions of the multiple data tables;
If the cost value is greater than the preset cost threshold, the join rule is not executed.
Preferably, the data migration cost includes the duration required to migrate the logical partition from the second physical machine back to the first physical machine.
Preferably, the data migration cost includes the number of data exchanges required to migrate the logical partition from the second physical machine back to the first physical machine.
Correspondingly, this specification also provides a computer device including a memory and a processor. The memory stores a computer program that can be run by the processor; when the processor runs the computer program, it executes the steps of the partition-level join method for a distributed database described above.
Correspondingly, this specification also provides a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, the steps of the partition-level join method for a distributed database described above are executed.
With the partition-level join method and device provided by this specification, when a parallel partition-level join plan adapted to a distributed database is generated statically, the system gains the ability to automatically handle data migration, or the case in which a few partitions do not satisfy the conditions of the partition-level join plan. Based on a built-in cost model, changes in the physical distribution of partitions in the distributed system can be handled adaptively, which improves the execution efficiency of database operations such as user queries.
Detailed description of the embodiments
Database partitioning is a common physical database design technique for reducing the total amount of data read and written in a given database operation and thereby shortening response time. Partitioning a database (or data table) means distributing the data of a large table across different system partitions, disks, or server devices according to a partitioning strategy, so that the data is balanced across storage media and each partition holds a share of the data. An operation on the data table can then be directed to the specific partition it needs. Partitioning also makes data tables easier to manage: for example, to delete the data of a certain period, the table can be partitioned by date and the corresponding date partition deleted directly. Data partitioning is therefore an important means of improving operations and maintenance efficiency and system performance.
For operations that involve data in multiple partitions, a database often executes concurrently with multiple threads or processes to improve execution efficiency. To reduce the data exchange cost that parallel execution brings, the database optimizer places operations that can be merged into the same thread or process wherever possible. One such way of merging operations is called a "partition-level join": when two or more tables are joined, if the join key of the tables is consistent with their partition key (that is, the columns referenced by the join operation are also the columns referenced when the tables were partitioned), the join can be executed inside multiple partitions at the same time, with no data exchange between partitions. However, when the database is a distributed database composed of one or more physical machines interconnected by a network, data migration occurs frequently in the distributed database system. Data partitions that were on the same physical machine during the optimization phase may have had some partitions migrated to other machines by the execution phase, and a statically generated parallel partition-level join plan cannot handle this situation automatically.
To address the above problems, an exemplary embodiment of this specification proposes a partition-level join method for a distributed database, as shown in Fig. 1. The distributed database includes multiple data tables that are partitioned based on the same partition key; each data table is divided into multiple logical partitions, and logical partitions belonging to different data tables are joined based on the same partition key. The join method includes:
Step 102: receive a join rule planned for M logical partitions located on a first physical machine, wherein the M logical partitions belong to M data tables respectively;
Step 104: check whether the physical machine on which any of the M logical partitions is located has changed;
If so,
Step 106: obtain the second physical machine on which the logical partition whose location has changed is now located, and perform a data migration cost evaluation, wherein the data migration cost evaluation calculates the cost value of migrating the logical partition from the second physical machine to the first physical machine;
Step 108: determine, according to the result of the data migration cost evaluation, whether to execute the join rule.
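The flow of steps 102 to 108 can be sketched in Python as follows. This is only an illustrative outline under stated assumptions: the function name `handle_join_rule`, the `(table, partition)` tuples, and the `estimate_cost` and `decide` callables are hypothetical names introduced here, and the specification does not prescribe any particular interface.

```python
from typing import Callable, Dict, List, Tuple

Partition = Tuple[str, str]  # (table name, partition name), e.g. ("t2", "p3")


def handle_join_rule(
    planned_machine: str,
    partitions: List[Partition],
    current_location: Dict[Partition, str],
    estimate_cost: Callable[[Partition, str, str], float],
    decide: Callable[[List[float]], bool],
) -> bool:
    """Return True if the planned join rule should be executed (hypothetical helper)."""
    # Step 102 is modelled by the arguments: the rule targets `partitions`,
    # which were planned to be on `planned_machine` (the first physical machine).
    # Step 104: find partitions that are no longer on the planned machine.
    moved = [p for p in partitions if current_location[p] != planned_machine]
    if not moved:
        return True  # nothing changed, so the rule can run as planned
    # Step 106: evaluate the cost of migrating each moved partition back
    # from its current (second) machine to the planned (first) machine.
    costs = [estimate_cost(p, current_location[p], planned_machine) for p in moved]
    # Step 108: decide from the evaluation result; the policy is pluggable.
    return decide(costs)
```

Keeping the cost model and the decision policy as parameters mirrors the text, which leaves both open to the implementer.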
The distributed database described in the embodiments of this specification is a distributed database composed of one or more physical machines interconnected by a network. Each physical machine may hold a complete copy or a partial copy of the data. The physical machines, located at different physical addresses, are interconnected by the network and together form a complete, global database that is logically centralized and physically distributed. After the data tables in the database are logically partitioned based on a partition key with the same attributes, the logical partitions of each data table may be located on different physical machines. When formulating a join rule (or join plan) for the "partition-level join" described above, the distributed database needs to formulate the plan for the corresponding logical partitions of the multiple data tables that are located on the same physical machine, joining these corresponding logical partitions so that, after the join, further operations on the data, such as inserts, deletes, updates, and queries, can be carried out in the same thread or process of that physical machine.
Fig. 2 illustrates the logical architecture in which multiple physical machines in a distributed database execute partition-level joins in parallel, according to an embodiment of this specification. For simplicity, each physical machine in Fig. 2 shows only the join of corresponding logical partitions (such as the p0 partition of t1 and the p0 partition of t2) of two data tables, t1 and t2, that are partitioned on a partition key with the same attributes. Those skilled in the art will understand that, in practice, one physical machine may hold several different logical partitions of one data table; this specification does not limit the number of logical partitions of a data table on one physical machine. A partition-level join rule (the join logical operation shown in Fig. 2) is, however, executed for M logical partitions belonging to M data tables (such as the p1 partition of t1 and the p1 partition of t2, or the p2 partition of t1 and the p2 partition of t2), where M is a natural number. "Corresponding logical partitions" are logical partitions that can be joined based on the same partition key and whose data, once joined, can be operated on in the same thread or process of the same physical machine. The specific process of partitioning on the same partition key is not limited in this specification; partitioning strategies such as hash partitioning or range partitioning may be used.
For example, in the embodiment shown in Fig. 2, t1 and t2 may be partitioned using hash partitioning. The first column c1 of t1 is its partition key, the first column c1 of t2 is also its partition key, and column c1 of t1 has the same attributes as column c1 of t2. The partitioning statements and the join query may be:
create table t1(c1 int, c2 int) partition by hash(c1) partitions 4;
create table t2(c1 int, c2 int) partition by hash(c1) partitions 4;
select * from t1, t2 where t1.c1 = t2.c1;
Tables t1 and t2 are thereby each divided into four logical partitions: p0, p1, p2, and p3.
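To illustrate why hash partitioning on the same key makes partitions of t1 and t2 correspond, here is a minimal Python sketch. It assumes a plain modulo as a stand-in for the database's real hash function, and the sample rows are invented for illustration only.

```python
def partition_index(key: int, num_partitions: int = 4) -> int:
    """Map a partition-key value to a logical partition number (0..3 for p0..p3)."""
    # Assumption: modulo stands in for whatever hash function the database uses.
    return key % num_partitions


def partition_rows(rows, num_partitions=4):
    """Split (c1, c2) rows into logical partitions by the partition key c1."""
    parts = {i: [] for i in range(num_partitions)}
    for row in rows:
        parts[partition_index(row[0], num_partitions)].append(row)
    return parts


# Invented sample data for tables t1(c1, c2) and t2(c1, c2).
t1 = [(1, 10), (2, 20), (5, 50), (7, 70)]
t2 = [(1, 100), (5, 500), (6, 600), (7, 700)]
t1_parts = partition_rows(t1)
t2_parts = partition_rows(t2)
```

Because both tables use the same function on c1, any two rows with equal c1 values fall into partitions with the same number, so a row in partition p1 of t1 can only match rows in partition p1 of t2.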
In general, the specific partition-level join rule is planned by the optimizer of the distributed database. Based on the physical machine locations of the logical partitions recorded in the partition table of the current database, the optimizer formulates partition-level join rules (join 0, join 1, join 2, or join 3 as shown in Fig. 2) for the corresponding logical partitions (such as the p3 partition of t1 and the p3 partition of t2) of multiple data tables (t1 and t2 in Fig. 2) on the same physical machine. A join rule may include the corresponding logical partitions, on the same physical machine, of the data tables to be joined, the join type, the join process, and so on, and is generally embodied as an execution plan tree generated by the optimizer. The join type includes, but is not limited to, inner join, outer join, and cross join of the logical partitions, and in a partition-level join the join should be based on the partition key used when the data tables were partitioned. To further improve data processing efficiency after partition-level joins of multiple data tables, the join type in the join rule is preferably an equijoin: from the logical partitions, belonging to multiple data tables, that are located on the same physical machine, the logical partitions having the same partition key value are obtained, and an equijoin is performed on them.
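A partition-level equijoin can be sketched in Python as follows: partition i of one table is joined only with partition i of the other. The function names and the sample partitions are assumptions made for illustration; a real optimizer would express this as an execution plan tree rather than as Python functions.

```python
from collections import defaultdict


def equijoin(left, right):
    """Hash equijoin of two row lists on their first column (the partition key)."""
    index = defaultdict(list)
    for row in right:
        index[row[0]].append(row)
    return [(l, r) for l in left for r in index.get(l[0], [])]


def partition_wise_join(left_parts, right_parts):
    """Join each partition only with the same-numbered partition of the other table."""
    result = []
    for i in sorted(left_parts):
        result.extend(equijoin(left_parts[i], right_parts.get(i, [])))
    return result


# Invented partitions: keys were assigned by key % 2, so equal keys share a number.
left_parts = {0: [(4, "a")], 1: [(1, "b"), (5, "c")]}
right_parts = {0: [(4, "x")], 1: [(5, "y"), (9, "z")]}
joined = partition_wise_join(left_parts, right_parts)
```

When both tables are partitioned on the join key, this per-partition join yields the same rows as an equijoin over the whole tables, while each partition pair can be processed independently with no data exchanged between partitions.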
To prevent physical migration of logical partition data within the distributed database from affecting the accurate execution of the join plan, in an exemplary embodiment of this specification a logical operator RX may be added, at the logical level of the database, for each logical partition in the database. After receiving the join rule (or join plan) sent by the optimizer, the logical operator RX checks whether the current physical location of its logical partition has changed relative to the physical location that the join rule expects for that logical partition:
If the logical operator RX verifies that the physical machine on which its logical partition is located is the same as the physical machine that the join rule expects for that logical partition, the RX operator executes in "short-circuit mode": the join rule is usable for the logical partition (in Fig. 2, join rules join 0, join 1, join 2, and join 3 are all usable for their logical partitions), and the operator can return the result of its data scan directly to the database system (or optimizer).
Because data migration caused by manual operations or by other system instructions occurs easily in a distributed database, the logical operator RX may find that its logical partition is no longer on the physical machine that the join rule expects. In Fig. 2, for example, the p3 partition of t2 is not on physical machine 3, where it was located when the join rule was generated. In that case, RX can communicate with the module of the distributed database that maintains data partition information to obtain the logical partition's current location, physical machine 4, and send that location to the database's cost evaluation module for data migration, which performs the data migration cost evaluation. The evaluation calculates the cost value (cost) to the database system of migrating the logical partition corresponding to RX from physical machine 4 back to physical machine 3. In Fig. 2, for example, the database system (usually the optimizer), at the request of the RX operator for the p3 partition of t2, evaluates the cost value of migrating the p3 partition of t2 from physical machine 4, where it is now, back to physical machine 3, which join plan join 3 expects.
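The behaviour of the logical operator RX described above can be sketched in Python. All names here (`rx_operator`, `MigrationRequest`, the `locate` and `scan` callables) are hypothetical; the specification treats RX as an abstract logical operator and does not fix an interface.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union

Partition = Tuple[str, str]  # (table name, partition name)


@dataclass
class MigrationRequest:
    """What RX hands to the cost evaluation module when its partition has moved."""
    partition: Partition
    source_machine: str  # where the partition is now (second physical machine)
    target_machine: str  # where the join rule expects it (first physical machine)


def rx_operator(
    partition: Partition,
    planned_machine: str,
    locate: Callable[[Partition], str],
    scan: Callable[[Partition], List],
) -> Union[List, MigrationRequest]:
    """Check the partition's location and either short-circuit or request evaluation."""
    # `locate` stands in for the module that maintains partition location information.
    current = locate(partition)
    if current == planned_machine:
        # Short-circuit mode: the join rule is usable, return the scan result.
        return scan(partition)
    # The partition has moved: report its current and expected machines.
    return MigrationRequest(partition, current, planned_machine)
```

For the Fig. 2 scenario, the RX operator of the p3 partition of t2 would produce a request with physical machine 4 as the source and physical machine 3 as the target.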
The cost evaluation module for data migration is usually a functional module of the distributed database's optimizer. The data migration cost may include the duration (system command latency) required for the system to migrate the logical partition from the second physical machine (physical machine 4 in Fig. 2) back to the first physical machine (physical machine 3 in Fig. 2); it may also include cost measures commonly used in computer systems, such as the number of data exchanges required for that migration. This specification does not limit the mathematical model or algorithm on which the cost evaluation is based. Those skilled in the art can set up a cost evaluation model for data migration based on the specific application scenario and set a data migration cost threshold suited to that scenario. The threshold represents the migration cost, for moving logical partition data between physical machines, that the database system is willing to accept in order to keep the original partition-level join plan (rule).
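Since the specification leaves the cost model open, the following Python sketch shows just one possible model, offered purely as an assumption: the number of data exchanges is the partition size divided by a batch size, and the duration is the transfer time plus a fixed round-trip delay per exchange. The parameter names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class MigrationCost:
    exchanges: int     # number of data exchanges needed for the migration
    duration_s: float  # estimated duration of the migration, in seconds


def estimate_migration_cost(
    partition_bytes: int,
    batch_bytes: int,
    bandwidth_bytes_per_s: float,
    round_trip_s: float,
) -> MigrationCost:
    """Estimate the cost of moving one logical partition between two machines."""
    # One exchange per batch; a partial final batch still needs an exchange.
    exchanges = math.ceil(partition_bytes / batch_bytes)
    # Transfer time plus a fixed round-trip delay for every exchange.
    duration_s = partition_bytes / bandwidth_bytes_per_s + exchanges * round_trip_s
    return MigrationCost(exchanges, duration_s)
```

Either field, or a weighted combination of both, could serve as the cost value that is compared against the scenario-specific threshold.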
After the migration cost value of the logical partition data has been evaluated, the database system can judge, according to the result of the data migration cost evaluation, whether the join rule can be executed. The judgment can be made in several ways. For example, the logic judgment module of the database system may use a cost threshold comparison, comparing the calculated data migration cost value with the migration cost threshold that the system has preset for logical partition data migration:
If the cost value calculated by the cost evaluation model is less than the preset cost threshold, the database system can issue a migration instruction to migrate the logical partition from the second physical machine back to the first physical machine, and decide to execute the join rule on the logical partitions of the multiple data tables. The migration can be carried out by means such as a remote procedure call (RPC).
If the cost value calculated by the cost evaluation model is greater than the preset cost threshold, the database system does not execute the join rule; in Fig. 2, for example, join rule join 3 is no longer executed. In this case, the optimizer can generate a new join rule, formulating a new partition-level join plan for the logical partitions that are currently on the same physical machine. For logical partitions that are not on the same physical machine, if the application scenario requires joining logical partitions on different physical machines, a new join plan should be formulated, such as an execution plan that hashes the left and right tables separately for the join, broadcasts the left table, or sends the right table at random.
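The threshold comparison can be sketched in Python as follows. The enum and function names are hypothetical. The text only covers the "less than" and "greater than" cases, so treating a cost exactly equal to the threshold as a reason to replan is an assumption of this sketch.

```python
from enum import Enum


class Decision(Enum):
    MIGRATE_AND_EXECUTE = "migrate the partition back, then run the original join rule"
    REPLAN = "do not run the original join rule; have the optimizer produce a new plan"


def decide_by_threshold(cost_value: float, cost_threshold: float) -> Decision:
    """Compare the evaluated migration cost with the preset cost threshold."""
    if cost_value < cost_threshold:
        return Decision.MIGRATE_AND_EXECUTE
    # Assumption: a cost equal to the threshold is handled like a greater cost.
    return Decision.REPLAN
```

For example, with a threshold of 10, an evaluated cost of 4 leads to migration followed by the original join rule, while a cost of 25 leads to replanning.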
Of course, the database system may also judge whether the join rule can be executed using a scoring and ranking scheme. For example, the logical partition data migration costs required to execute the partition-level join rules on different physical machines are sorted by cost value; the database system selects the partition-level join rules whose cost values fall within an acceptable range, and the corresponding RX operators migrate the data of the respective logical partitions back to the physical machines indicated by the original partition-level join rules so that those rules can be executed.
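The ranking alternative can be sketched in Python as below. The function name, the rule labels, and the cost figures are invented for illustration, and interpreting the "acceptable range" as an inclusive upper bound is an assumption.

```python
def rank_join_rules(migration_costs: dict, acceptable_cost: float) -> list:
    """Return the join rules whose migration cost is acceptable, cheapest first."""
    # Sort the rules by the cost of migrating their partitions back.
    ranked = sorted(migration_costs.items(), key=lambda item: item[1])
    # Assumption: "acceptable" means cost <= acceptable_cost (inclusive bound).
    return [rule for rule, cost in ranked if cost <= acceptable_cost]


# Invented migration costs for three partition-level join rules.
selected = rank_join_rules({"join 1": 8.0, "join 2": 3.0, "join 3": 20.0}, 10.0)
```

Only the selected rules would have their partitions migrated back by the corresponding RX operators; the remaining rules would be left to the optimizer to replan.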
The above embodiments of this specification handle partition-level join plans for a distributed database by introducing the logical operator RX. When the optimizer of the database system statically generates a parallel partition-level join plan adapted to the distributed database, the interaction between RX and the functional modules of the database system gives the system the ability to automatically handle data migration, or the case in which a few partitions do not satisfy the conditions of the partition-level join plan. Based on a built-in cost evaluation model, changes in the physical distribution of the logical partitions of the distributed database can be handled adaptively, which improves the execution efficiency of database operations such as user queries. Those skilled in the art will understand that the logical operator RX is only an abstract representation of the database system at the logical operation level, and the actual implementation of this logical processing is not limited to any form of expression in any computer language.
Corresponding to the process described above, the embodiments of this specification also provide a partition-level join device for a distributed database. The device may be implemented in software, in hardware, or in a combination of software and hardware. Taking a software implementation as an example, the device in the logical sense is formed when the CPU (Central Processing Unit) of the host equipment reads the corresponding computer program instructions into memory and runs them. At the hardware level, in addition to the CPU, memory, and storage shown in Fig. 4, the equipment hosting the device typically also includes other hardware such as a chip for wireless signal transmission and reception, and/or a board for network communication.
Fig. 3 shows a partition-level join device 30 for a distributed database provided by this specification. The distributed database includes multiple data tables that are partitioned based on the same partition key; each data table is divided into multiple logical partitions, and logical partitions belonging to different data tables are joined based on the same partition key. The join device 30 includes:
A receiving module 302, which receives a join rule planned for M logical partitions located on a first physical machine, wherein the M logical partitions belong to M data tables respectively;
A checking module 304, which checks whether the physical machine on which any of the M logical partitions is located has changed;
A cost evaluation module 306, which obtains the second physical machine on which the logical partition whose location has changed is now located, and performs a data migration cost evaluation, wherein the data migration cost evaluation calculates the cost value of migrating the logical partition from the second physical machine to the first physical machine;
A judgment module 308, which determines, according to the result of the data migration cost evaluation, whether to execute the join rule.
Preferably, to further improve the execution efficiency of parallel partition-level joins in the distributed database, the join rule includes obtaining, from the M logical partitions located on the first physical machine, the logical partitions having the same partition key value, and performing an equijoin on the logical partitions having the same partition key value.
Preferably, the judgment module further compares the cost value with a preset cost threshold;
If the cost value is less than the preset cost threshold, the logical partition is migrated from the second physical machine back to the first physical machine, and the join rule is executed on the logical partitions of the multiple data tables;
If the cost value is greater than the preset cost threshold, the join rule is not executed.
Preferably, the data migration cost includes the duration required to migrate the logical partition from the second physical machine back to the first physical machine.
Preferably, the data migration cost includes the number of data exchanges required to migrate the logical partition from the second physical machine back to the first physical machine.
For the implementation of the functions and roles of the modules in the above device, see the implementation of the corresponding steps in the above method; for related details, refer to the description of the method embodiments, which is not repeated here.
The device embodiments described above are merely illustrative. The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules; they may be located in one place or distributed across multiple network modules. Some or all of the units or modules may be selected according to actual needs to achieve the purpose of the solution of this specification. Those of ordinary skill in the art can understand and implement it without creative effort.
The devices and modules described in the above embodiments may be implemented by a computer chip or entity, or by a product having a certain function. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or a combination of any of these devices.
Corresponding to the above method embodiments, the embodiments of this specification also provide a computer device that includes a memory and a processor. The memory stores a computer program that can be run by the processor; when running the stored computer program, the processor executes the steps of the partition-level join method for a distributed database in the embodiments of this specification. For a detailed description of the steps of the partition-level join method, refer to the preceding content, which is not repeated here.
Corresponding to the above method embodiments, the embodiments of this specification also provide a computer-readable storage medium on which computer programs are stored. When run by a processor, these computer programs execute the steps of the partition-level join method for a distributed database in the embodiments of this specification. For a detailed description of the steps of the partition-level join method, refer to the preceding content, which is not repeated here.
The above are merely preferred embodiments of this specification and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of this specification shall fall within its scope of protection.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include volatile memory in computer-readable media, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, product, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, product, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, product, or device that includes the element.
Those skilled in the art will understand that the embodiments of this specification may be provided as a method, a system, or a computer program product. Accordingly, the embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the embodiments may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) that contain computer-usable program code.