CN109416683A - Data processing device, database system, and communication operation method for a database system
- Publication number
- CN109416683A (application CN201680084285.9A)
- Authority
- CN
- China
- Prior art keywords
- data
- data processing
- operation symbol
- processing equipment
- operator
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2471—Distributed queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24532—Query optimisation of parallel queries
Abstract
A data processing device (40) for performing part of the operations of a distributed database system (404). The data processing device (40) comprises: a logical planner (42) for generating a logical plan based on a database query; a physical planner (43) for generating a physical plan based on the logical plan; and a marking unit (44) for determining the communication operators in the physical plan, a communication operator being an operator that involves communication, determining the communication pattern of each communication operator based on its operator type, and marking each determined communication operator with a data marker that includes the determined communication pattern. The data processing device (40) further comprises: a code generator (45) for generating executable code based on the physical plan and converting the data markers into communicator instructions; a code executor (46) for executing the executable code; and a communicator (47) for communicating, based on the communicator instructions, with other data processing devices (402 and 403) in the distributed database system (404).
Description
Technical field
The present invention relates to the field of computer software engineering and, more particularly, to distributed database systems.
Background
A distributed database system has multiple distinct nodes, also referred to as data processing devices. When a database query is executed, these nodes communicate with one another. Particularly in database systems with many nodes, this communication can become the bottleneck of the system.
As shown in Fig. 1, the execution pipeline of an SQL query can be divided into several steps.
1. Planning: in a first step 12, the plain query text 11 is converted into a logical plan, which is an intermediate tree-like representation of the query. In a second step 13, the logical plan is optimized and converted into a physical plan, which is further optimized taking data parameters into account. The physical plan consists of physical plan operators, each representing a basic operation performed on a data set through the low-level database interface.
2. Code generation: in a third step 14, executable code is generated from the physical plan. This improves the performance of the database system. Example frameworks for distributed SQL query execution that use this approach include SparkSQL and Cassandra.
3. Execution: in a fourth step 15, the code prepared in step 14 is executed. In a distributed database, it is executed simultaneously on a cluster of workstations connected by a network.
Fig. 2 depicts the relationship between data 20, which consist of data blocks 21 to 23, and an execution plan 24 that uses multiple nodes 25 to 27. In particular, the different data blocks 21 to 23 are stored on different nodes 25 to 27, and interaction takes place only between these nodes 25 to 27. The execution plan 24 produces a result 28.
When a distributed query is executed, the data to be processed are spread across the cluster so that each machine stores only part of the source data set. Nevertheless, some operations on the data set require data exchange between cluster nodes, for example an aggregation that accumulates a single value over the entire data set. Such network communication can greatly reduce the performance of the database system if it involves a large data set or is not carried out in an optimal way.
Different SQL physical-layer operators generate network traffic that matches different communication patterns. These patterns are shown in Figs. 3a to 3c. Fig. 3a shows the end-to-end communication pattern between two nodes 30. Fig. 3b shows the multicast communication pattern among multiple nodes 31, in which communication starts at one node 31 and ends at several nodes 31. Fig. 3c shows the many-to-many communication pattern among multiple nodes 32, in which each node can communicate with every other node.
All of these patterns are widely used in distributed query execution. Multicast is used for data replication; the many-to-many pattern is used for shuffling; the end-to-end pattern serves as the basis for all other types of communication. Exemplary solutions for distributed query execution do not distinguish between these patterns. However, network performance depends heavily on how a communication pattern is implemented, and a dedicated transport protocol may perform better for certain specific patterns.
TCP, an example protocol that can be used at the transport layer, is well suited to end-to-end communication, because communication in this protocol is carried out over a previously established end-to-end connection. A multicast pattern implemented on top of TCP is inefficient, because the same data must be transmitted several times, once to each destination over its own connection, which generates a large amount of duplicate traffic in the network.
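The duplicate traffic described above can be estimated with a short calculation. The following Python sketch is illustrative only: the function names are hypothetical, protocol headers and retransmissions are ignored, and the multicast case assumes an idealized transport in which the sender emits each payload once.

```python
def unicast_bytes_sent(payload_bytes: int, receivers: int) -> int:
    """Bytes the sender emits when it emulates multicast with one
    connection per destination (for example, one TCP connection each)."""
    return payload_bytes * receivers


def multicast_bytes_sent(payload_bytes: int, receivers: int) -> int:
    """Bytes the sender emits when the transport delivers a single copy
    to a whole group (idealized native multicast, an assumption)."""
    return payload_bytes if receivers > 0 else 0


if __name__ == "__main__":
    # Replicating a 1 MB table fragment to 8 nodes.
    print(unicast_bytes_sent(1_000_000, 8))    # 8 copies leave the sender
    print(multicast_bytes_sent(1_000_000, 8))  # 1 copy leaves the sender
```

Under these assumptions the sender's load grows linearly with the number of receivers in the per-connection case and stays constant in the multicast case.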
The Spark framework implements broadcast communication using the BitTorrent application-layer protocol, which is itself based on TCP. This approach can speed up broadcasting but has several drawbacks:
1. Although some nodes can receive the broadcast data from neighboring nodes, which reduces the load on the sending node, the general problem of duplicated packets is not solved. The network still has to transmit as many copies of each packet as there are receiving nodes.
2. An additional protocol running on top of the transport layer introduces extra overhead, which seriously affects broadcast performance for small messages.
In contrast, the TIPC transport-layer protocol natively supports the multicast communication pattern and performs much better than TCP in that mode. However, the end-to-end performance of TIPC is worse than that of TCP, so there is no clear answer as to which protocol should be used.
We have identified the following three main problems with exemplary solutions:
1. Exemplary solutions rely on a single network transport protocol that is selected statically according to certain parameters. This causes overhead for data exchange in communication patterns for which the selected protocol is not well suited.
2. General-purpose protocols are intended to fit all possible cases. Using a general-purpose protocol incurs overhead caused by its generality, even if it is used for only a few usage patterns.
3. Additional logic on top of the transport layer introduces additional overhead.
Summary of the invention
It is therefore an object of the present invention to provide a device and a method that allow efficient communication in a distributed database.
This object is achieved by the features of claim 1 for the device and of claim 13 for the method. The object is further achieved by the features of claim 14 for the associated computer program. The dependent claims contain further developments.
A first aspect of the present invention provides a data processing device for performing part of the operations of a distributed database system. The data processing device comprises: a logical planner for generating a logical plan based on a database query; a physical planner for generating a physical plan based on the logical plan; and a marking unit for determining the communication operators in the physical plan, a communication operator being an operator that involves communication, determining the communication pattern of each communication operator based on its operator type, and marking each determined communication operator with a data marker that includes the determined communication pattern. The data processing device further comprises a code generator for generating executable code based on the physical plan and converting the data markers into communicator instructions, and a communicator for communicating, based on the communicator instructions, with other data processing devices in the distributed database system. Communication tasks can thus be separated from regular database operations, which enables very efficient communication and therefore very efficient database operation.
In a first implementation form of the first aspect, the database query is an SQL query. This allows readily available database components to be used.
In an implementation form of the preceding implementation form of the first aspect, the marking unit is configured to determine, as communication operators, operators that implement communication in the distributed database system, in particular replication operators and/or MapReduce operators and/or sort operators and/or shuffle join operators and/or hash join operators and/or broadcast hash join operators and/or merge join operators. This allows very efficient database operation.
In an implementation form of the preceding implementation form of the first aspect, the marking unit is configured to distinguish a set of network communication patterns based on the communication operators. Additionally or alternatively, the marking unit is configured to determine an end-to-end communication pattern for replication operators and/or MapReduce operators and/or sort operators and/or shuffle join operators and/or hash join operators and/or broadcast hash join operators and/or merge join operators. Additionally or alternatively, the marking unit is configured to determine a multicast or broadcast communication pattern for replication operators and/or broadcast hash join operators. Additionally or alternatively, the marking unit is configured to determine a many-to-many communication pattern for shuffle join operators and/or hash join operators and/or merge join operators. A highly flexible, operator-based selection of the communication pattern can thus be achieved.
In another implementation form of the first aspect or of any of the implementation forms described above, the communicator is configured to dynamically determine the communication protocol to be used for each operator based at least on the communicator instructions. This enables extremely efficient database operation.
In an implementation form of the preceding implementation form, the data marker further includes the total amount of data to be transmitted for the operator; the communicator is further configured to dynamically determine the communication protocol to be used for each operator based on that total amount of data. The most suitable communication protocol can thus be selected more accurately, improving the operating efficiency of the database.
In another implementation form of the preceding two implementation forms, the communicator is configured to communicate using the communication protocol determined for each operator. This allows extremely efficient database operation.
In another implementation form of the first aspect or of any of its implementation forms described above, the data processing device further comprises a storage unit for storing at least part of the data held in the distributed database system. Dividing the data among the different data processing devices of the distributed database system enables extremely efficient database operation.
In another implementation form of the first aspect or of any of its implementation forms described above, the data processing device further comprises a query receiver for receiving the database query. This allows standardized database queries to be processed.
In another implementation form of the first aspect or of any of the implementation forms described above, the communicator is configured to transmit at least part of the data to be processed to at least one other data processing device. In this way, not all data need to be stored on every data processing device, which saves storage space.
A second aspect of the present invention provides a database system comprising at least a first data processing device according to the first aspect or any implementation form of the first aspect and a second data processing device according to the first aspect or any implementation form of the first aspect. The communicator of the first data processing device is configured to communicate at least with the second data processing device based on the determined communicator instructions. An extremely efficient database system is thus achieved.
In an implementation form of the second aspect, the database system comprises at least a third data processing device. The communicator of the first data processing device is configured to communicate with at least the second data processing device and the third data processing device based on the determined communicator instructions. An extremely efficient database system is thus achieved.
A third aspect of the present invention provides a method for operating a database system comprising multiple data processing devices. The method comprises: generating a logical plan based on a database query; generating a physical plan based on the logical plan; determining the communication operators in the physical plan; determining the communication pattern of each communication operator based on its operator type; and marking each determined communication operator with a data marker that includes the determined communication pattern. The method further comprises: generating executable code based on the physical plan; converting the data markers into communicator instructions; executing the executable code; and communicating, based on the communicator instructions, with other data processing devices in the distributed database system. Extremely efficient operation of the distributed database system is thus achieved.
A fourth aspect of the present invention provides a computer program with program code for performing the method according to the third aspect of the invention when the computer program runs on a computer, thereby achieving extremely efficient database operation.
In summary, to solve the problems identified above, a new method of implementing network communication in a distributed database query execution pipeline is proposed. Advantageously, the following steps are performed:
1. The planning stage is extended as follows:
Identify and mark the nodes (physical operators) of the query physical plan that involve communication.
2. The code generation stage is extended as follows:
Based on the data markers added in step 1, add code that creates and fills application-level messages containing: (a) a communication pattern identifier; (b) additional service information; (c) the data to be exchanged.
3. The execution stage is extended as follows:
A. The query execution system is extended with a communicator, which encapsulates all transport-layer protocols to be used at run time.
B. The communication pattern identifier and the additional service information are passed to the communicator together with the data.
4. The optimal communication protocol for the data exchange in the specified communication pattern is selected dynamically.
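The application-level message from step 2 can be pictured as a small record with three parts. The following Python sketch is one possible layout, not the format used by the invention: the class names, the numeric pattern IDs, and the dictionary layout of the data marker are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict


class CommPattern(Enum):
    """Communication patterns named in the text; the numeric IDs are hypothetical."""
    END_TO_END = 1
    MULTICAST = 2
    MANY_TO_MANY = 3


@dataclass
class AppMessage:
    """Application-level message handed to the communicator."""
    pattern: CommPattern                                         # (a) pattern identifier
    service_info: Dict[str, Any] = field(default_factory=dict)  # (b) additional service information
    payload: bytes = b""                                         # (c) data to be exchanged


def build_message(marker: Dict[str, Any], payload: bytes) -> AppMessage:
    """Fill a message from a data marker (hypothetical dict layout) and the data."""
    service_info = {k: v for k, v in marker.items() if k != "pattern"}
    return AppMessage(CommPattern[marker["pattern"]], service_info, payload)


if __name__ == "__main__":
    msg = build_message({"pattern": "MULTICAST", "total_bytes": 4}, b"rows")
    print(msg)
```

Keeping the pattern identifier separate from the payload lets the communicator choose a protocol without inspecting the data itself.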
In general, it should be noted that all arrangements, devices, components, units, and means described in this application can be implemented by software or hardware elements or any combination thereof. Furthermore, a device may be a processor or may comprise a processor, in which case the functions of the elements, units, and means described in this application may be implemented by one or more processors. All steps performed by the various entities described in this application, and the functions described as being performed by those entities, are intended to mean that the respective entity is adapted or configured to perform the respective steps and functions. Even if, in the following description or specific embodiments, a specific function or step performed by a general entity is not reflected in the description of a specific detailed element of the entity that performs that step or function, it should be clear to the skilled person that these methods and functions can be implemented by software or hardware elements or any combination thereof.
Brief description of the drawings
The present invention is explained in detail below with respect to embodiments of the invention and with reference to the accompanying drawings, in which:
Fig. 1 shows an exemplary database query execution pipeline;
Fig. 2 shows exemplary distributed data processing;
Fig. 3a shows an exemplary end-to-end communication pattern;
Fig. 3b shows an exemplary multicast communication pattern;
Fig. 3c shows an exemplary many-to-many communication pattern;
Fig. 4 shows a first embodiment of the data processing device according to the first aspect of the invention;
Fig. 5 shows the extension of the planning stage used by a second embodiment of the first aspect of the invention;
Fig. 6 shows an exemplary physical plan generated in a third embodiment of the data processing device according to the first aspect of the invention;
Fig. 7 shows an exemplary physical plan extended with data markers, as used by a fourth embodiment of the data processing device according to the first aspect of the invention;
Fig. 8 shows details of a fifth embodiment of the data processing device according to the first aspect of the invention;
Fig. 9 shows an exemplary database query execution process used by a sixth embodiment of the data processing device according to the first aspect of the invention;
Fig. 10 shows communication over different networks with multiple transport-layer protocols, as used in a seventh embodiment of the data processing device according to the first aspect of the invention;
Fig. 11 shows details of an eighth embodiment of the data processing device according to the first aspect of the invention;
Fig. 12 shows exemplary traffic multiplexing between different nodes in a distributed database system;
Fig. 13 shows a first embodiment of the method for operating a database system according to the third aspect of the invention;
Fig. 14 shows the results achievable for the entire set of test cases when using the data processing device according to the first aspect of the invention or the method for operating a distributed database system according to the third aspect of the invention;
Fig. 15 shows the results achievable for a single broadcast hash join operation when using the data processing device according to the first aspect of the invention or the method for operating a distributed database system according to the third aspect of the invention;
Fig. 16 shows the results achievable in terms of processing time (% relative to baseline) when using the data processing device according to the first aspect of the invention or the method for operating a distributed database system according to the third aspect of the invention.
Detailed description of embodiments
First, Figs. 1 and 2 illustrate the function of an exemplary distributed database system. Figs. 3a to 3c depict different communication patterns. Different embodiments of the data processing device according to the first aspect of the invention are described with reference to Figs. 4 to 12. The function of the method according to the third aspect of the invention is described in detail with reference to Fig. 13. Finally, Figs. 14 to 16 show the achievable efficiency gains.
Similar entities and reference numerals are partly omitted in different figures.
Terms and their meanings are as follows:
Fig. 4 shows a first embodiment of the database system 404 according to the second aspect of the invention, which includes a first embodiment of the data processing device 40 according to the first aspect of the invention. The data processing device 40 comprises a query receiver 41, which is connected to a logical planner 42, which in turn is connected to a physical planner 43. The physical planner 43 is connected to a marking unit 44, which is connected to a code generator 45. The code generator 45 is connected to a code executor 46, which is connected to a communicator 47. All units 41 to 47 are connected to a control unit 48. In addition, the communicator 47 is connected to a network 401, which is connected to other data processing devices 402 and 403. The network 401 and the data processing devices 402 and 403 are not part of the data processing device 40, but together with the data processing device 40 they form the distributed database system 404. The control unit 48 controls the function of all other units of the data processing device 40.
In distributed data base system, inquire-receive device 41 receives data base querying, especially SQL query.Inquire quilt
It handles and passes to logic planner 42.Logic planner 42 is based on data base querying and generates logic plan.The logic plan quilt
Physics planner 43 is passed to, physics planner 43 generates physics plan from logic in the works.Physics scheduled transfer is single to label
Member, the marking unit determine the traffic operation symbol of physics in the works, and traffic operation symbol is the operator comprising communication.Then it marks
The operator types that unit 44 is accorded with based on traffic operation determine the communication pattern of traffic operation symbol.Particularly, marking unit determines
The operator of communication is realized in distributed data base system, be especially intended to traffic operation symbol duplication operator, and/
Or mapping reduction operation symbol and/or sorting operation accord with, and/or reset attended operation symbol and/or Hash attended operation symbol, and/
Or broadcast Hash attended operation symbol, and/or merge attended operation symbol.Finally, the traffic operation symbol that marking unit label is determining, often
A operator all has the data markers of the communication pattern of the determination including traffic operation symbol.
In addition, the marking unit 44 distinguishes a set of network communication patterns based on the communication operators. In particular, the marking unit determines an end-to-end communication pattern for replication operators and/or MapReduce operators and/or sort operators and/or shuffle join operators and/or hash join operators and/or broadcast hash join operators and/or merge join operators. The marking unit also determines a multicast or broadcast communication pattern for replication operators and/or broadcast hash join operators. Furthermore, the marking unit 44 is configured to determine a many-to-many communication pattern for shuffle join operators and/or hash join operators and/or merge join operators.
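The marking step can be sketched as a traversal of the physical plan tree that attaches a data marker to every communication operator. The Python sketch below is illustrative only. The operator type names, the `PlanNode` class, and the marker layout are hypothetical, and the table picks one preferred pattern per operator type (the most specific pattern the text associates with it), which is an assumption; the text itself lists several candidate patterns for some operators.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumed preference: the most specific pattern the text lists for each operator type.
PREFERRED_PATTERN: Dict[str, str] = {
    "Replication": "multicast",
    "BroadcastHashJoin": "multicast",
    "ShuffleJoin": "many-to-many",
    "HashJoin": "many-to-many",
    "MergeJoin": "many-to-many",
    "MapReduce": "end-to-end",
    "Sort": "end-to-end",
}


@dataclass
class PlanNode:
    """Node of a physical plan tree (hypothetical structure)."""
    op_type: str
    children: List["PlanNode"] = field(default_factory=list)
    marker: Optional[Dict[str, str]] = None  # data marker, set only on communication operators


def mark_plan(node: PlanNode) -> PlanNode:
    """Attach a data marker to every communication operator in the tree."""
    pattern = PREFERRED_PATTERN.get(node.op_type)
    if pattern is not None:
        node.marker = {"pattern": pattern}
    for child in node.children:
        mark_plan(child)
    return node


if __name__ == "__main__":
    plan = PlanNode("BroadcastHashJoin", [PlanNode("Scan"), PlanNode("Sort", [PlanNode("Scan")])])
    mark_plan(plan)
    print(plan.marker, plan.children[0].marker, plan.children[1].marker)
```

Operators that do not appear in the table, such as the scans in the usage example, are left unmarked and are treated as purely local.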
The marked physical plan is then passed to the code generator 45, which generates executable code from the physical plan and converts the data markers into communicator instructions. These communicator instructions are passed to the code executor 46, which executes the executable code. The communicator 47 communicates with the other data processing devices 402 and 403 based on the communicator instructions.
In addition, the marking unit 44 records in the data marker the total amount of data to be transmitted for the operator. The communicator 47 then determines the communication protocol for each operator based on the total amount of data to be sent and on the communication pattern.
SQL is the de facto standard for database access. The following is an example, query Q3 of the TPC-H benchmark:
The SQL query text is processed by the database engine. In a first step, the database engine converts the SQL query text into a tree-like representation called the logical plan. The database engine then performs logical optimization, producing an optimized logical plan, which is converted into a plan expressed in terms of the low-level database API. This plan is called the physical plan and is further optimized taking the physical parameters of the database into account. The leaves of the physical plan tree represent data sources, and the nodes are physical plan operators, which represent different basic operations on the relational database.
An additional physical plan processing step, shown in detail in Fig. 5, is added at the end of the planning chain. The input 50 consists of the physical plan 51 and data-related information 52. The input 50 is passed to a functional block 53 that constitutes the extended planning stage. Functional block 53 includes the detection 54 of communication-related physical plan operators. The correspondence between operators and communication patterns, stored in a knowledge base 56, is used as input for identifying communication-related operators. The extended planning stage 53 also includes marking the detected operators in functional block 55. Finally, the extended physical plan is delivered as output 57.
Fig. 6 shows an exemplary physical plan 60, which comprises multiple operators 601 to 614. The communication operators 601, 603, 604, and 605 are marked with dashed lines; these are the operators that are detected.
Fig. 7 shows an exemplary extended physical plan 70. Here, additional data markers 71, 72, 73, 74, and 75 are integrated into the extended physical plan 70. Each of the data markers 71 to 75 contains information about the communication pattern to be used. Further information can be stored in the data markers.
Each data marker is strictly associated with its physical plan operator and conveys information about the communication pattern, such as its communication pattern ID, for use in the subsequent data exchange.
The communication pattern ID associated with a specific physical plan operator is selected using a table such as the following, which defines the correspondence between physical plan operators and communication patterns.
Fig. 8 shows the code generation stage in detail. The extended physical plan 81 generated by the marking unit 44 of Fig. 4 is used as input. It is passed to a code generator 82, which corresponds to the code generator 45 of Fig. 4. The code generator 82 uses information stored in a code generation library 84, in particular a set of conversion rules for physical plan operators 85 and a set of library modifications, which includes the data marker converter and modifications of existing methods. As output, the code generator 82 produces executable code 83.
One possible code generation approach is to convert the physical plan into code written in a general-purpose language such as C++. The advantage of this approach is that the generated code can be compiled into executable code with the additional optimizations provided by the compiler. Code generation is performed by a dedicated module, the code generator 82, which converts the tree-like representation of the physical plan into plain executable code. The extended physical plan produced in the previous step contains a new kind of physical plan operator, the data marker, which is also converted into executable code. The code generator is therefore extended with a converter for data marker physical plan operators.
In addition to the converter for data markers, the converters for existing physical plan operators must also be modified so that they give the communication layer access to communication-related data.
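The idea of extending a code generator with a converter for data markers can be sketched as a table of per-operator converters. The Python sketch below is a simplification under stated assumptions: it walks a flat list of plan steps instead of a tree, it emits Python-like text rather than C++, and the node layout and the emitted `scan` and `communicator.send` calls are hypothetical names, not part of the described system.

```python
from typing import Callable, Dict, List

Node = Dict[str, str]


def convert_scan(node: Node) -> List[str]:
    """Converter for an existing operator: read rows from a local table."""
    return [f"rows = scan({node['table']!r})"]


def convert_data_marker(node: Node) -> List[str]:
    """Added converter: turn a data marker into a communicator instruction."""
    return [f"communicator.send(rows, pattern={node['pattern']!r})"]


# One converter per node kind; the data marker entry is the extension.
CONVERTERS: Dict[str, Callable[[Node], List[str]]] = {
    "Scan": convert_scan,
    "DataMarker": convert_data_marker,
}


def generate(plan_steps: List[Node]) -> str:
    """Convert a linearized plan into program text, one converter per step."""
    lines: List[str] = []
    for node in plan_steps:
        lines.extend(CONVERTERS[node["kind"]](node))
    return "\n".join(lines)


if __name__ == "__main__":
    print(generate([
        {"kind": "Scan", "table": "orders"},
        {"kind": "DataMarker", "pattern": "multicast"},
    ]))
```

Registering the marker converter alongside the existing ones means the rest of the generator does not need to know that markers are a special kind of node.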
An exemplary prototype query execution system is called Flint. Flint can execute query physical plans expressed in C++, which can be regarded as the output of the code generation stage. This approach is described in detail below.
In the execution stage, the generated program runs on a distributed cluster. All nodes in the cluster process their part of the input data synchronously, meaning that they perform the same operation on the data at the same time. Fig. 9 shows the SQL query execution process for one node of the cluster; the figure presents the execution 90 of one particular marked physical operator.
The data 91 to be processed and the data marker 92 are used as input. The data marker 92 contains communication-related information 98, such as the communication pattern ID and additional service information. The generated executable code 93 is executed to process the local data 94, the data to be exchanged 95, and the result data 96. The data to be exchanged 95 are transmitted by a communicator 99, which corresponds to the communicator 47 of Fig. 4. After the executable code 93 has been processed, the next operator 97 is processed.
Fig. 10 shows the processing performed by the communicator. The code execution 101 includes an executor application 102, which is extended with a communicator 103. The communicator 103 encapsulates all transport-layer protocols 104, 105, and 106 to be used at run time. Each of the transport protocols 104, 105, and 106 forms a network 1001 among all cluster nodes 107, 108, and 109, and its addressing is transparently converted into the addressing used by the application. For this purpose, the communicator 103 maintains a translation table that stores the correspondence between application-layer and transport-layer addresses.
Application-layer address | TCP address | .... | TIPC address |
0 | 192.168.1.0:5555 | .... | 1.1.1:(100,1) |
1 | 192.168.1.1:5555 | .... | 1.1.2:(100,2) |
... | ... | .... | ... |
N | 192.168.1.N:5555 | .... | 1.1.3:(100,2) |
The communicator 103 can be based on any number of transport layer protocols 104, 105 and 106, and can even encapsulate other application layer protocols.
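The translation table kept by the communicator can be pictured as a simple lookup structure. The following C++ sketch is an illustration only: the names `Transport`, `AddressTable`, `add` and `resolve` are hypothetical and are not taken from the described system, and the addresses used with it would be those of the table above.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical transport identifiers; the text names TCP and TIPC as examples.
enum class Transport { TCP, TIPC };

// Maps an application-layer address (node index) to one address per transport.
class AddressTable {
public:
    void add(std::uint32_t appAddr, Transport t, std::string transportAddr) {
        table_[std::make_pair(appAddr, t)] = std::move(transportAddr);
    }

    // Returns the transport-layer address, or nullptr if no mapping exists.
    const std::string* resolve(std::uint32_t appAddr, Transport t) const {
        auto it = table_.find(std::make_pair(appAddr, t));
        return it == table_.end() ? nullptr : &it->second;
    }

private:
    std::map<std::pair<std::uint32_t, Transport>, std::string> table_;
};
```

With such a table, the application only ever refers to nodes by their application-layer address, and the lookup for the chosen transport happens inside the communicator.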
Fig. 11 shows a generalized scheme of a communicator 113, which corresponds to the communicator 103 of Fig. 10 and the communicator 47 of Fig. 4. The communication-related information 111 associated with the input data 110 is used for the subsequent communication protocol selection 114. The selection is made based on the information stored in the knowledge base 116. The selected protocol ID is passed to the transmitter 115 together with the data. The receiver 117 needs no special operation to receive data. The received data is simply passed to the higher protocol layer as output data 112.
To select the communication protocol, all data to be exchanged is passed from the application to the communicator 113. The data may or may not carry related additional information. If the data carries no service information, a default transport protocol is used to transmit it. The following table shows possible correspondences between the determined communication patterns and the resulting communication protocols.
If the data carries additional information, it is transmitted using a transport protocol that better matches the communication pattern of the data exchange.
The decision on which transport protocol to use takes the following into account:
static knowledge about the transport protocols, obtained in advance;
dynamic data exchange parameters, such as the communication pattern ID, the total amount of data within the pattern, and the optimization level.
Finally, the determined communication protocol is used for the data transmission.
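The selection rule described above can be sketched in C++ as follows. This is a minimal sketch under stated assumptions: the enumerators, the `ServiceInfo` and `KnowledgeBase` layouts, the size threshold, and the choice of TIPC for broadcast traffic are all hypothetical illustrations; the text only states that a default protocol is used when no service information is present and that static knowledge and dynamic parameters drive the choice otherwise.

```cpp
#include <cstddef>

// Communication patterns named in the text; the enumerators are assumptions.
enum class Pattern { EndToEnd, Broadcast, ManyToMany };
enum class Protocol { TCP, TIPC };

// Service information attached to data to be exchanged (hypothetical layout).
struct ServiceInfo {
    Pattern pattern;
    std::size_t totalBytes;  // estimated total amount of data in the pattern
};

// Static knowledge about transports, obtained in advance (illustrative values).
struct KnowledgeBase {
    Protocol defaultProtocol = Protocol::TCP;
    Protocol broadcastProtocol = Protocol::TIPC;
    std::size_t minBroadcastBytes = 64 * 1024;  // assumed threshold
};

// Picks the default protocol when no service information is attached,
// otherwise a protocol that matches the pattern and the data volume.
inline Protocol selectProtocol(const KnowledgeBase& kb, const ServiceInfo* info) {
    if (info == nullptr) return kb.defaultProtocol;
    if (info->pattern == Pattern::Broadcast &&
        info->totalBytes >= kb.minBroadcastBytes) {
        return kb.broadcastProtocol;
    }
    return kb.defaultProtocol;
}
```

Keeping the static knowledge in a separate structure means the rule can be tuned per cluster without touching the code that calls `selectProtocol`.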
In particular, at the sending end, the traffic generated by the application is multiplexed across the supported transport layer protocols according to the protocol ID. At the receiving end, the data received over the different protocols is merged into a single stream to the corresponding application. No transport-related information needs to be passed to the higher protocol layer, so the data is delivered as is, without any related additional fields.
Fig. 12 shows the transmitter of node X 1201 and the receiver of node Y 1208. The transmitter of node X includes a multiplexer 1204 and multiple protocol stacks 1205, 1206 and 1207. The input data 1202 and the corresponding protocol ID 1203 are passed to the multiplexer 1204, which selects the corresponding protocol stack 1205 to 1207 based on the protocol ID 1203 and uses it to send the data 1202 to the receiver of node Y 1208. The receiver of node Y 1208 includes multiple protocol stacks 1209 to 1211 and a demultiplexer 1212. When data sent via a particular protocol stack 1205 to 1207 of the transmitter of node X 1201 is received, the corresponding protocol stack 1209 to 1211 of the receiver of node Y 1208 decodes the data, which is then demultiplexed by the demultiplexer 1212 and provided as output data 1213.
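The sender and receiver sides described above can be illustrated with a small C++ sketch. The class names `Multiplexer` and `Demultiplexer`, the use of integer protocol IDs, and the callback-based protocol stacks are assumptions made for illustration; real protocol stacks would wrap sockets rather than callbacks.

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using ProtocolId = int;
using Payload = std::string;

// Sender side: routes each payload to the protocol stack chosen by protocol ID.
class Multiplexer {
public:
    using Stack = std::function<void(const Payload&)>;

    void registerStack(ProtocolId id, Stack send) { stacks_[id] = std::move(send); }

    void send(ProtocolId id, const Payload& data) const {
        auto it = stacks_.find(id);
        if (it == stacks_.end()) throw std::invalid_argument("unknown protocol ID");
        it->second(data);
    }

private:
    std::map<ProtocolId, Stack> stacks_;
};

// Receiver side: merges data arriving from all stacks into one output stream.
// No protocol ID or other transport detail is attached to the output.
class Demultiplexer {
public:
    void onReceive(const Payload& data) { output_.push_back(data); }
    const std::vector<Payload>& output() const { return output_; }

private:
    std::vector<Payload> output_;
};
```

In this sketch the application above the receiver sees only the payloads, which mirrors the statement that data is delivered without any transport-related fields.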
Fig. 13 shows a flow chart of an embodiment of the method for operating a distributed database system. In a first step 130, a logical plan is generated from a received database query. In a second step 131, the generated logical plan is used to generate a physical plan. In a third step 132, the communication operators within the physical plan are determined. In a fourth step 133, the communication patterns of the communication operators within the physical plan are determined. In a fifth step 134, the determined communication operators are marked within the physical plan, which yields an extended physical plan. In a sixth step 135, executable code is generated from the extended physical plan. In a seventh step 136, the data markers within the extended physical plan are converted into communicator instructions. In an eighth step 137, the executable code is executed. In a final step 138, communication between different nodes of the distributed database system is performed using the communicator instructions generated in the seventh step 136.
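Steps 132 to 134 (finding communication operators, determining their pattern from the operator type, and marking them) can be sketched in C++ as below. The enumerations and the `PhysicalOperator` layout are hypothetical. The mapping follows the operator groups listed in the claims, choosing the most specific pattern for each operator type; that choice is an assumption of this sketch.

```cpp
#include <vector>

enum class OperatorType { Scan, Filter, Replicate, MapReduce, Sort,
                          ShuffleJoin, HashJoin, BroadcastHashJoin, MergeJoin };
enum class CommPattern { None, EndToEnd, Broadcast, ManyToMany };

struct PhysicalOperator {
    OperatorType type;
    CommPattern marker;  // data marker set by the marking step
};

// Determines the communication pattern from the operator type alone.
inline CommPattern patternFor(OperatorType type) {
    switch (type) {
        case OperatorType::Replicate:
        case OperatorType::BroadcastHashJoin:
            return CommPattern::Broadcast;
        case OperatorType::ShuffleJoin:
        case OperatorType::HashJoin:
        case OperatorType::MergeJoin:
            return CommPattern::ManyToMany;
        case OperatorType::MapReduce:
        case OperatorType::Sort:
            return CommPattern::EndToEnd;
        default:
            return CommPattern::None;  // operator involves no communication
    }
}

// Marks every operator of the plan, producing the extended physical plan.
inline void markPlan(std::vector<PhysicalOperator>& plan) {
    for (PhysicalOperator& op : plan) op.marker = patternFor(op.type);
}
```

Operators that do not communicate keep the `None` marker, so the later conversion into communicator instructions can skip them.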
It should also be noted that the explanations given for the data processing device apply equally to the method for operating a distributed database system.
Figs. 14 to 16 show the speed-up of database queries achieved with the method described above, compared in particular on the basis of two protocols, TCP and TIPC. As a baseline, a standard approach was applied: in this case, an end-to-end oriented protocol is used. As a benchmark, the widely used TPC-H decision support benchmark was used. The TPC-H decision support benchmark consists of a suite of business-oriented ad hoc queries and concurrent data modifications. The queries and the data populating the database were chosen to have broad industry-wide relevance. The benchmark illustrates decision support systems that examine large volumes of data, execute highly complex queries, and give answers to critical business questions. Results for query Q8 are shown here, using a scale factor of 100, which generates about 100 GB of data in the tables. Fig. 14 shows the results of an exemplary execution of query Q8. In particular, the results for Q8 without broadcast hash join are also presented, to show the performance benefit of both approaches, the standard one and the one according to the invention, relative to not using broadcast hash join. Note that the number of nodes is shown on the x-axis and the query execution time on the y-axis.
It can be clearly seen that, regardless of the number of nodes, the method according to the invention is more advantageous than the two exemplary solutions, although the improvement is not as large as expected. The reason is that, in the test of Fig. 14, the broadcast hash join operation processes a relatively small portion of the data, so its duration does not greatly affect the overall result. To show the benefit of the invention more accurately, Fig. 15 compares the speed-up of a single broadcast hash join operation. Here, the number of nodes is plotted on the x-axis and the speed-up factor on the y-axis.
It is also of interest to analyze scalability. Therefore, the dependence of the reduction in execution time on the number of nodes in the cluster was analyzed; it is shown in Fig. 16. In Fig. 16, the y-axis shows the processing time as a percentage (relative to the TCP baseline without broadcast), and the x-axis shows the number of nodes.
The main conclusion is as follows: with the standard approach, using broadcast hash join on 32 nodes can even reduce performance, whereas the method of the invention still shows a performance gain.
As a further alternative, more communication patterns and corresponding transport layer protocols than those proposed above can be used. Different transport layer solutions can target particular use cases or characteristics. For example, a possible solution for the many-to-many communication pattern could provide a transport protocol that ensures fair communication between all nodes.
The proposed method can also be applied to other distributed computations with a different set of physical operators, such as reduce and prefix scan. For example, an MPI transport layer can be used.
Some further details on an implementation of an embodiment of the computer program according to the third aspect of the invention are given below. The software framework is named Flint. Flint is a distributed SQL query execution framework that allows executing physical query plans expressed in C++. The query execution plan written in C++ can be assumed to be the output of the code generation phase. The following example code for TPC-H benchmark query Q3 illustrates how the proposed code generation method can be realized. The top-level physical plan of the Flint Q3 query has the following form:
The data markers added in the previous step are converted into special operators (lines 7, 12, 18 and 24 of the previous listing), which attach the communication pattern ID and the total data size estimate to the data to be exchanged.
The query implementation in the previous listing is based on the Dataset base class:
The method marker() has the following implementation:
Thus, calling the method marker() creates an instance of the MarkedDataset class and returns a pointer to it. The MarkedDataset class has the following definition:
The overridden method next() simply forwards to the next() method of the input dataset and passes the record pointer through. Most important is the overridden getServiceInfo() method, which now returns a pointer to the service information that was supplied to the marker() method.
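The relationship between Dataset, marker(), MarkedDataset, next() and getServiceInfo() described above can be illustrated with the following C++ sketch. It is not the Flint source, which is not reproduced here: only the five names just mentioned come from the text, while `Record`, `ServiceInfo`, `VectorDataset`, the method signatures, and the ownership model (a non-owning input pointer) are assumptions made to keep the example self-contained.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical service information: pattern ID and estimated total data size.
struct ServiceInfo {
    int patternId;
    std::size_t totalBytes;
};

struct Record { int key; };

class Dataset {
public:
    virtual ~Dataset() = default;
    virtual const Record* next() = 0;  // returns nullptr at end of data
    // Unmarked datasets carry no service information.
    virtual const ServiceInfo* getServiceInfo() const { return nullptr; }
    // Wraps this dataset in a MarkedDataset and returns a pointer to it.
    std::unique_ptr<Dataset> marker(ServiceInfo info);
};

class MarkedDataset : public Dataset {
public:
    MarkedDataset(Dataset* input, ServiceInfo info) : input_(input), info_(info) {}
    // Forwards to the input dataset and passes the record pointer through.
    const Record* next() override { return input_->next(); }
    // Returns the service information supplied to marker().
    const ServiceInfo* getServiceInfo() const override { return &info_; }

private:
    Dataset* input_;  // non-owning; the input must outlive this wrapper
    ServiceInfo info_;
};

inline std::unique_ptr<Dataset> Dataset::marker(ServiceInfo info) {
    return std::unique_ptr<Dataset>(new MarkedDataset(this, info));
}

// Minimal in-memory dataset, used only to exercise the sketch.
class VectorDataset : public Dataset {
public:
    explicit VectorDataset(std::vector<Record> rows) : rows_(std::move(rows)) {}
    const Record* next() override {
        return pos_ < rows_.size() ? &rows_[pos_++] : nullptr;
    }

private:
    std::vector<Record> rows_;
    std::size_t pos_ = 0;
};
```

Because the marker is a thin wrapper, downstream operators can read records exactly as before and consult getServiceInfo() only when data has to be exchanged.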
The newly created service information can be used in the replication phase of the broadcast hash join. Below is one implementation of the broadcastHashJoin() method of the Dataset class:
Like the marker() method, the broadcastHashJoin() method creates an instance of a class, here BroadcastHashJoin, and returns a pointer to it.
During construction of the BroadcastHashJoin class, a hash table is created from the inner table that has been replicated to each node in the cluster (line 19 of the previous listing).
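The local part of a broadcast hash join, as described above, builds a hash table from the replicated inner table and probes it with the node's own partition of the outer table. The following C++ sketch shows that idea only; the class name `BroadcastHashJoinSketch`, the `Row` layout, and the `probe` method are hypothetical and do not reproduce the Flint BroadcastHashJoin class.

```cpp
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

struct Row {
    int key;
    std::string value;
};

// Local part of a broadcast hash join, run on every node after the inner
// table has been replicated to it.
class BroadcastHashJoinSketch {
public:
    // Build phase: hash the full replicated inner table by join key.
    explicit BroadcastHashJoinSketch(const std::vector<Row>& replicatedInner) {
        for (const Row& r : replicatedInner) hash_.emplace(r.key, r.value);
    }

    // Probe phase: join the node's local partition of the outer table.
    // Each result pairs the outer value with a matching inner value.
    std::vector<std::pair<std::string, std::string>>
    probe(const std::vector<Row>& localOuter) const {
        std::vector<std::pair<std::string, std::string>> out;
        for (const Row& o : localOuter) {
            auto range = hash_.equal_range(o.key);
            for (auto it = range.first; it != range.second; ++it) {
                out.emplace_back(o.value, it->second);
            }
        }
        return out;
    }

private:
    std::unordered_multimap<int, std::string> hash_;
};
```

Only the inner table crosses the network in this scheme, which is why the replication step is the one that benefits from a broadcast-capable transport.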
Below is the implementation of the replicate() method and the ReplicateDataset class:
Replication is implemented by BroadcastJob (line 17 of the previous listing), which prepares and sends the specified dataset:
Finally, the service information associated with the MarkedDataset is collected (line 12 of the previous listing) and passed to the communicator together with the data to be sent (line 20 of the previous listing).
Other communication patterns use the same approach. One example is ScatterJob, the shuffle procedure used to implement the physical plan operators hash join and shuffle join:
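The ScatterJob listing itself is not reproduced here. As a rough illustration of what a shuffle step does, the C++ sketch below partitions rows by a hash of the join key so that equal keys end up on the same node. The function `scatterByKey` and the `KeyedRow` type are hypothetical, and the actual ScatterJob may partition and send data differently.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

struct KeyedRow {
    int key;
    int payload;
};

// Assigns each row to a destination node by hashing its join key, so that
// rows with equal keys always meet on the same node.
inline std::vector<std::vector<KeyedRow>>
scatterByKey(const std::vector<KeyedRow>& rows, std::size_t nodeCount) {
    std::vector<std::vector<KeyedRow>> buckets(nodeCount);
    if (nodeCount == 0) return buckets;  // nothing to scatter to
    std::hash<int> hasher;
    for (const KeyedRow& r : rows) {
        buckets[hasher(r.key) % nodeCount].push_back(r);
    }
    return buckets;
}
```

Each bucket would then be handed to the communicator together with service information describing a many-to-many exchange.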
The invention is not limited to the examples shown above. The features of the exemplary embodiments can be used in any advantageous combination.
The invention has been described herein in conjunction with various embodiments. However, other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed invention, from a study of the drawings, the disclosure and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. A single processor or other unit may fulfil the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage. A computer program may be stored or distributed on a suitable medium, such as an optical storage medium or a solid-state medium supplied together with or as part of other hardware, but may also be distributed in other forms, for example via the Internet or other wired or wireless telecommunication systems.
Claims (14)
1. A data processing device (40), characterized in that it is configured to perform part of the operations of a distributed database system (404), comprising:
a logical planner (42), configured to generate a logical plan based on a database query;
a physical planner (43), configured to generate a physical plan (70) based on the logical plan;
a marker unit (44), configured to:
determine communication operators (601, 602, 604 and 605) in the physical plan (70), wherein the communication operators (601, 602, 604 and 605) are operators that involve communication;
determine the communication patterns of the communication operators (601, 602, 604 and 605) based on the operator types of the communication operators (601, 602, 604 and 605);
mark the determined communication operators (601, 602, 604 and 605), each operator having a data marker (71, 72, 73, 74 and 75) that includes the communication pattern of the determined communication operator (601, 602, 604 and 605);
a code generator (45), configured to:
generate executable code based on the physical plan (70);
convert the data markers (71, 72, 73, 74 and 75) into communicator instructions;
a code executor (46), configured to execute the executable code;
a communicator (47), configured to communicate, based on the communicator instructions, with other data processing devices (402 and 403) in the distributed database system (404).
2. The data processing device (40) according to claim 1, characterized in that the database query is an SQL query.
3. The data processing device (40) according to claim 2, characterized in that the marker unit (44) is configured to determine, as communication operators (601, 602, 604 and 605), operators that implement communication in the distributed database system, in particular replicate operators, and/or map-reduce operators, and/or sort operators, and/or shuffle join operators, and/or hash join operators, and/or broadcast hash join operators, and/or merge join operators.
4. The data processing device (40) according to claim 3, characterized in that
the marker unit (44) is configured to distinguish a set of network communication patterns based on the communication operators; and/or
the marker unit (44) is configured to determine an end-to-end communication pattern for replicate operators, and/or map-reduce operators, and/or sort operators, and/or shuffle join operators, and/or hash join operators, and/or broadcast hash join operators, and/or merge join operators; and/or
the marker unit (44) is configured to determine a multicast or broadcast communication pattern for replicate operators and/or broadcast hash join operators; and/or
the marker unit (44) is configured to determine a many-to-many communication pattern for shuffle join operators, and/or hash join operators, and/or merge join operators.
5. The data processing device (40) according to any one of claims 1 to 4, characterized in that
the communicator (47) is configured to dynamically determine the communication protocol to be used for each operator, based at least on the communicator instructions.
6. The data processing device (40) according to claim 5, characterized in that
the data markers (71, 72, 73, 74 and 75) further include the total amount of data to be transmitted by the operator;
the communicator (47) is further configured to dynamically determine the communication protocol to be used for each operator based on the total amount of data to be transmitted by the operator.
7. The data processing device (40) according to claim 5 or 6, characterized in that
the communicator (47) is configured to communicate based on the communication protocol determined for each operator.
8. The data processing device (40) according to any one of claims 1 to 7, characterized in that
the data processing device (40) further comprises a storage unit, configured to store at least part of the data stored in the distributed database system.
9. The data processing device (40) according to any one of claims 1 to 8, characterized in that
the data processing device (40) further comprises a query receiver (41), configured to receive the database query.
10. The data processing device (40) according to any one of claims 1 to 9, characterized in that
the communicator (47) is configured to transfer at least part of the data to be processed to other data processing devices.
11. A database system (404), characterized in that it comprises at least a first data processing device (40) according to any one of claims 1 to 10 and a second data processing device (402) according to any one of claims 1 to 10, wherein
the communicator (47) of the first data processing device (40) is configured to communicate at least with the second data processing device (402) based on the determined communicator instructions.
12. The database system (404) according to claim 11, characterized in that
the database system (404) comprises at least a third data processing device (403), wherein
the communicator (47) of the first data processing device (40) is configured to communicate, based on the determined communicator instructions, at least with the second data processing device (402) and the third data processing device (403).
13. A method for operating a database system comprising multiple data processing devices, characterized in that it comprises:
generating (130) a logical plan based on a database query;
generating (131) a physical plan (70) based on the logical plan;
determining (132) communication operators (601, 602, 604 and 605) in the physical plan (70);
determining (133) the communication patterns of the communication operators based on the operator types of the communication operators (601, 602, 604 and 605);
marking (134) the determined communication operators, each operator having a data marker (71, 72, 73, 74 and 75) that includes the communication pattern of the determined communication operator;
generating (135) executable code based on the physical plan (70);
converting (136) the data markers (71, 72, 73, 74 and 75) into communicator instructions;
executing (137) the executable code;
communicating (138), based on the communicator instructions, with other data processing devices in the distributed database system.
14. A computer program with program code, characterized in that it is for performing the method according to claim 13 when the computer program runs on a computer.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/RU2016/000191 WO2017176144A1 (en) | 2016-04-05 | 2016-04-05 | Data handling device, database system and method for operating a database system with efficient communication |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109416683A (en) | 2019-03-01 |
CN109416683B (en) | 2022-04-05 |
Family
ID=57200064
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201680084285.9A | Active | CN109416683B (en) | 2016-04-05 | 2016-04-05 | Data processing apparatus, database system, and communication operation method of database system |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109416683B (en) |
WO (1) | WO2017176144A1 (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102201010A (en) * | 2011-06-23 | 2011-09-28 | 清华大学 | Distributed database system without sharing structure and realizing method thereof |
US20110302583A1 (en) * | 2010-06-04 | 2011-12-08 | Yale University | Systems and methods for processing data |
CN105279286A (en) * | 2015-11-27 | 2016-01-27 | 陕西艾特信息化工程咨询有限责任公司 | Interactive large data analysis query processing method |
CN105426504A (en) * | 2015-11-27 | 2016-03-23 | 陕西艾特信息化工程咨询有限责任公司 | Distributed data analysis processing method based on memory computation |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9465844B2 (en) * | 2012-04-30 | 2016-10-11 | Sap Se | Unified table query processing |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117251472A (en) * | 2023-11-16 | 2023-12-19 | 中邮消费金融有限公司 | Cross-source data processing method, device, equipment and storage medium |
CN117251472B (en) * | 2023-11-16 | 2024-02-27 | 中邮消费金融有限公司 | Cross-source data processing method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN109416683B (en) | 2022-04-05 |
WO2017176144A1 (en) | 2017-10-12 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||