CN109416683A - Data processing device, database system, and communication operation method of a database system - Google Patents


Publication number
CN109416683A
Authority
CN
China
Prior art keywords
data
data processing
operation symbol
processing equipment
operator
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201680084285.9A
Other languages
Chinese (zh)
Other versions
CN109416683B (en
Inventor
德米特里·谢尔盖耶维奇·科尔马科夫
亚历山大·弗拉基米罗维奇·斯莱萨连科
张学仓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Publication of CN109416683A
Application granted
Publication of CN109416683B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24532 Query optimisation of parallel queries

Abstract

A data processing device (40) for performing part of the operation of a distributed database system (404). The data processing device (40) comprises: a logical planner (42) for generating a logical plan based on a database query; a physical planner (43) for generating a physical plan based on the logical plan; and a marking unit (44) configured to: determine communication operators in the physical plan, a communication operator being an operator that involves communication; determine the communication pattern of each communication operator based on its operator type; and mark the determined communication operators, each such operator receiving a data marker that includes the determined communication pattern. The data processing device (40) further comprises a code generator (45) configured to generate executable code based on the physical plan and to convert the data markers into communicator instructions. The data processing device (40) further comprises a code executor (46) for executing the executable code, and a communicator (47) for communicating, based on the communicator instructions, with other data processing devices (402 and 403) in the distributed database system (404).

Description

Data processing device, database system, and communication operation method of a database system
Technical field
The present invention relates to the field of computer software engineering and, more particularly, to distributed database systems.
Background
A distributed database system has multiple distinct nodes, also referred to as data processing devices. When a database query is executed, these nodes communicate with one another. Especially in database systems with many nodes, this communication may become the bottleneck of the database system.
As shown in Fig. 1, an SQL query execution pipeline can be divided into several steps.
1. Planning: in a first step 12, the plain query text 11 is converted into a logical plan, which is an intermediate tree representation of the query. In a second step 13, the logical plan is optimized and converted into a physical plan, which is also optimized taking data parameters into account. The physical plan consists of physical plan operators, each representing a basic operation performed on a data set through the low-level database interface.
2. Code generation: in a third step 14, executable code is generated based on the physical plan. This improves the performance of the database system. Example frameworks for distributed SQL query execution that use this approach include SparkSQL and Cassandra.
3. Execution: in a fourth step 15, the code prepared in the preceding step 14 is executed. In a distributed database, it is executed simultaneously on a cluster of workstations connected by a network.
Fig. 2 depicts the relationship between data 20, consisting of data blocks 21 to 23, and an execution plan 24 that uses multiple nodes 25 to 27. In particular, it shows that the different data blocks 21 to 23 are stored on different nodes 25 to 27, and that interaction takes place only between these nodes 25 to 27. The execution plan 24 yields a result 28.
When a distributed query is executed, the data to be processed is spread across the cluster, so that each machine stores only part of the source data set. Nevertheless, some operations on the data set may require data exchange between cluster nodes, for example an aggregation that accumulates a single value over the entire data set. Such network communication can greatly reduce the performance of the database system if it involves large data sets or is not performed in an optimal way.
Different SQL physical-level operators generate network traffic that matches different communication patterns. These patterns are shown in Figs. 3a to 3c. Fig. 3a shows the end-to-end communication pattern between two nodes 30. Fig. 3b shows the multicast communication pattern between multiple nodes 31, in which communication starts at one node 31 and terminates at multiple nodes 31. Fig. 3c shows the many-to-many communication pattern between multiple nodes 32, in which each node can communicate with every other node.
All of these patterns are widely used in distributed query execution. Multicast is used for data replication; the many-to-many pattern is used for shuffling; the end-to-end pattern serves as the basis for all other types of communication. Exemplary solutions for distributed query execution do not distinguish between these patterns. However, network performance depends heavily on how a communication pattern is implemented, and a dedicated transport protocol may perform better for certain specific patterns.
TCP, an example protocol that can be used at the transport layer, is well suited to end-to-end communication, because communication in this protocol takes place over previously established end-to-end connections. A multicast communication pattern implemented with TCP is inefficient, because the same data must be transmitted several times, once to each destination over the corresponding connection, which generates a large amount of duplicate traffic in the network.
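The duplication cost described above can be made concrete with a small calculation. The following is a minimal sketch that counts only payload bytes leaving the sender and ignores headers, acknowledgements and retransmissions; the function names are illustrative and do not come from the patent.

```python
def sender_bytes_unicast(payload_bytes: int, receivers: int) -> int:
    """Payload sent when the same data goes out once per destination connection."""
    return payload_bytes * receivers


def sender_bytes_multicast(payload_bytes: int, receivers: int) -> int:
    """Payload sent when the transport delivers one transmission to all receivers."""
    return payload_bytes if receivers > 0 else 0


# Replicating a 1 MiB table fragment to 8 nodes:
unicast = sender_bytes_unicast(1 << 20, 8)      # one copy per connection
multicast = sender_bytes_multicast(1 << 20, 8)  # a single copy
```

Under these simplifying assumptions the sender's outgoing payload grows linearly with the number of receivers in the per-connection case and stays constant in the multicast case.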
The Spark framework implements broadcast communication using the BitTorrent application-layer protocol, which is itself based on TCP. This approach can speed up broadcasting, but it has several drawbacks:
1. Although some nodes can receive the broadcast data from neighbouring nodes, which reduces the load on the sending node, this does not solve the problem of duplicated packets in the general case. The network still has to carry as many copies of the data as there are nodes in the network.
2. An additional protocol running on top of the transport layer introduces extra overhead, which severely affects broadcast performance for small messages.
In contrast, the TIPC transport-layer protocol natively supports the multicast communication pattern and performs much better than TCP in that pattern. However, the end-to-end performance of TIPC is worse than that of TCP, so there is no clear answer to the question of which protocol should be used.
We have found that exemplary solutions suffer from the following three main problems:
1. Exemplary solutions are based on a single network transport protocol that is selected statically according to certain parameters. This approach incurs overhead for data exchange in communication patterns for which the selected protocol is not the best fit.
2. General-purpose protocols are designed to suit all possible cases. Using a general-purpose protocol incurs overhead caused by the protocol's generality, even if it is used for only a few usage patterns.
3. Additional logic on top of the transport layer incurs additional overhead.
Summary of the invention
It is therefore an object of the present invention to provide a device and a method that allow efficient communication in a distributed database.
This object is achieved by the features of claim 1 for the device and of claim 13 for the method. The object is further achieved by the features of claim 14 for the associated computer program. The dependent claims contain further developments.
A first aspect of the invention provides a data processing device for performing part of the operation of a distributed database system. The data processing device comprises: a logical planner for generating a logical plan based on a database query; a physical planner for generating a physical plan based on the logical plan; and a marking unit configured to determine communication operators in the physical plan, a communication operator being an operator that involves communication, to determine the communication pattern of each communication operator based on its operator type, and to mark the determined communication operators, each such operator receiving a data marker that includes the determined communication pattern. The data processing device further comprises a code generator for generating executable code based on the physical plan and for converting the data markers into communicator instructions, and a communicator for communicating, based on the communicator instructions, with other data processing devices in the distributed database system. Communication tasks can thus be separated from regular database operations, which enables very efficient communication and hence very efficient database operation.
In a first implementation of the first aspect, the database query is an SQL query. This allows readily available database components to be used.
In an implementation of the preceding implementation of the first aspect, the marking unit is configured to determine, as communication operators, operators that effect communication in the distributed database system, in particular replication operators, and/or map-reduce operators, and/or sort operators, and/or shuffle join operators, and/or hash join operators, and/or broadcast hash join operators, and/or merge join operators. This allows very efficient database operation.
In an implementation of the preceding implementation of the first aspect, the marking unit is configured to distinguish a set of network communication patterns based on the communication operators. Additionally or alternatively, the marking unit is configured to determine an end-to-end communication pattern for replication operators, and/or map-reduce operators, and/or sort operators, and/or shuffle join operators, and/or hash join operators, and/or broadcast hash join operators, and/or merge join operators. Additionally or alternatively, the marking unit is configured to determine a multicast or broadcast communication pattern for replication operators and/or broadcast hash join operators. Additionally or alternatively, the marking unit is configured to determine a many-to-many communication pattern for shuffle join operators, and/or hash join operators, and/or merge join operators. This allows a highly flexible, operator-based selection of the communication pattern.
In a further implementation of the first aspect or of any of the above implementations, the communicator is configured to dynamically determine the communication protocol to be used for each operator based at least on the communicator instructions. This enables extremely efficient database operation.
In an implementation of the preceding implementation, the data marker further includes the total amount of data to be transmitted for the operator, and the communicator is further configured to dynamically determine the communication protocol to be used for each operator based on that total amount of data. This allows the most suitable communication protocol to be selected more accurately, improving the operating efficiency of the database.
In a further implementation of the preceding two implementations, the communicator is configured to communicate using the communication protocol determined for each operator. This allows extremely efficient database operation.
In a further implementation of the first aspect or of any of the above implementations of the first aspect, the data processing device further comprises a storage unit for storing at least part of the data held in the distributed database system. Dividing the data among the different data processing devices of the distributed database system enables extremely efficient database operation.
In a further implementation of the first aspect or of any of the above implementations of the first aspect, the data processing device further comprises a query receiver for receiving the database query. This allows standardized database queries to be processed.
In a further implementation of the first aspect or of any of the above implementations, the communicator is configured to transmit at least part of the data to be processed to other data processing devices. In this way, not all data has to be stored on every data processing device, which saves storage space.
A second aspect of the invention provides a database system comprising at least a first data processing device according to the first aspect or any implementation of the first aspect, and a second data processing device according to the first aspect or any implementation of the first aspect. The communicator of the first data processing device is configured to communicate at least with the second data processing device based on the determined communicator instructions. An extremely efficient database system is thereby achieved.
In an implementation of the second aspect, the database system comprises at least a third data processing device. The communicator of the first data processing device is configured to communicate with at least the second data processing device and the third data processing device based on the determined communicator instructions. An extremely efficient database system is thereby achieved.
A third aspect of the invention provides a method for operating a database system comprising multiple data processing devices. The method comprises: generating a logical plan based on a database query; generating a physical plan based on the logical plan; determining communication operators in the physical plan; determining the communication pattern of each communication operator based on its operator type; and marking the determined communication operators, each such operator receiving a data marker that includes the determined communication pattern. The method further comprises: generating executable code based on the physical plan; converting the data markers into communicator instructions; executing the executable code; and communicating, based on the communicator instructions, with other data processing devices in the distributed database system. Extremely efficient operation of the distributed database system is thereby achieved.
A fourth aspect of the invention provides a computer program with program code for performing the method of the third aspect when the computer program runs on a computer, thereby achieving extremely efficient database operation.
In summary, to solve the problems identified above, a new method of implementing network communication in a distributed database query execution pipeline is proposed. Advantageously, the following steps are performed:
1. The planning stage is extended as follows:
Nodes (physical operators) of the query physical plan that involve communication are identified and marked.
2. The code generation stage is extended as follows:
Based on the data markers added in step 1, data-wrapping code is added that creates application-level messages comprising: (a) a communication pattern identifier; (b) additional service information; (c) the data to be exchanged.
3. The execution stage is extended as follows:
a. The query execution system is extended with a communicator, which encapsulates all transport-layer protocols to be used at run time.
b. The communication pattern identifier and the additional service information are passed to the communicator together with the data.
4. The optimal communication protocol for data exchange in the specified communication pattern is selected dynamically.
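The application-level message created in step 2 can be pictured as a small record with the three parts listed there. The sketch below is illustrative only: the class name `AppMessage`, the helper `wrap`, the integer pattern IDs and the `total_bytes` service field are assumptions, not names taken from the patent.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class AppMessage:
    """Hypothetical application-level message handed to the communicator."""
    pattern_id: int               # (a) communication pattern identifier
    service_info: Dict[str, int]  # (b) additional service information
    payload: bytes                # (c) data to be exchanged


def wrap(payload: bytes, pattern_id: int, **service_info: int) -> AppMessage:
    """Wrap raw operator output so the communicator also sees its pattern."""
    return AppMessage(pattern_id, dict(service_info), payload)


rows = b"serialized rows"
# Pattern ID 2 stands for "multicast" here purely by assumption.
msg = wrap(rows, pattern_id=2, total_bytes=len(rows))
```

Keeping the pattern identifier next to the payload is what would let a communicator choose a transport per message instead of once per system.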
It should generally be noted that all arrangements, devices, components, units and means described in this application can be implemented by software or hardware elements or any combination thereof. Furthermore, a device may be or may include a processor, in which case the functions of the elements, units and means described in this application may be implemented by one or more processors. All steps performed by the various entities described in this application, and the functions described as being performed by them, are intended to mean that the respective entity is adapted or configured to perform the respective steps and functions. Even if, in the following description of specific embodiments, a specific function or step performed by a general entity is not reflected in the description of a specific detailed element of that entity, the skilled person will understand that these methods and functions can be implemented in software or hardware elements or any combination thereof.
Brief description of the drawings
The invention is explained in detail below with reference to embodiments of the invention and to the accompanying drawings, in which:
Fig. 1 shows an exemplary database query execution pipeline;
Fig. 2 shows exemplary distributed data processing;
Fig. 3a shows an exemplary end-to-end communication pattern;
Fig. 3b shows an exemplary multicast communication pattern;
Fig. 3c shows an exemplary many-to-many communication pattern;
Fig. 4 shows a first embodiment of the data processing device according to the first aspect of the invention;
Fig. 5 shows the extension of the planning stage used by a second embodiment of the first aspect of the invention;
Fig. 6 shows an exemplary physical plan generated in a third embodiment of the data processing device according to the first aspect of the invention;
Fig. 7 shows an exemplary physical plan extended with data markers, as used by a fourth embodiment of the data processing device according to the first aspect of the invention;
Fig. 8 shows details of a fifth embodiment of the data processing device according to the first aspect of the invention;
Fig. 9 shows an exemplary database query execution process used by a sixth embodiment of the data processing device according to the first aspect of the invention;
Fig. 10 shows communication over different networks with multiple transport-layer protocols, as used in a seventh embodiment of the data processing device according to the first aspect of the invention;
Fig. 11 shows details of an eighth embodiment of the data processing device according to the first aspect of the invention;
Fig. 12 shows exemplary traffic multiplexing between different nodes in a distributed database system;
Fig. 13 shows a first embodiment of the method for operating a database system according to the third aspect of the invention;
Fig. 14 shows results achievable for an entire test case when using the data processing device according to the first aspect of the invention or the method for operating a distributed database system according to the third aspect of the invention;
Fig. 15 shows results achievable for a single broadcast hash join operation when using the data processing device according to the first aspect of the invention or the method for operating a distributed database system according to the third aspect of the invention;
Fig. 16 shows results achievable in terms of processing time (% relative to baseline) when using the data processing device according to the first aspect of the invention or the method for operating a distributed database system according to the third aspect of the invention.
Detailed description
First, Figs. 1 and 2 illustrate the function of an exemplary distributed database system. Figs. 3a to 3c depict different communication patterns. Different embodiments of the data processing device according to the first aspect of the invention are described with reference to Figs. 4 to 12. The function of the method according to the third aspect of the invention is described in detail with reference to Fig. 13. Finally, Figs. 14 to 16 show the achievable efficiency gains.
Similar entities and reference numerals are partly omitted in different figures.
Terms and their meanings are described below:
Fig. 4 shows a first embodiment of the database system 404 of the second aspect of the invention, including a first embodiment of the data processing device 40 of the first aspect. The data processing device 40 comprises a query receiver 41, which is connected to a logical planner 42, which in turn is connected to a physical planner 43. The physical planner 43 is connected to a marking unit 44, which is connected to a code generator 45. The code generator 45 is connected to a code executor 46, which is connected to a communicator 47. All units 41 to 47 are connected to a control unit 48. The communicator 47 is also connected to a network 401, which is connected to further data processing devices 402 and 403. The network 401 and the data processing devices 402 and 403 are not part of the data processing device 40 but, together with it, form the distributed database system 404. The control unit 48 controls the function of all other units of the data processing device 40.
In the distributed database system, the query receiver 41 receives a database query, in particular an SQL query. The query is processed and passed to the logical planner 42, which generates a logical plan based on it. The logical plan is passed to the physical planner 43, which generates a physical plan from the logical plan. The physical plan is passed to the marking unit 44, which determines the communication operators in the physical plan, a communication operator being an operator that involves communication. The marking unit 44 then determines the communication pattern of each communication operator based on its operator type. In particular, the marking unit determines, as communication operators, operators that effect communication in the distributed database system, notably replication operators, and/or map-reduce operators, and/or sort operators, and/or shuffle join operators, and/or hash join operators, and/or broadcast hash join operators, and/or merge join operators. Finally, the marking unit marks the determined communication operators, each of which receives a data marker that includes its determined communication pattern.
In addition, the marking unit 44 distinguishes a set of network communication patterns based on the communication operators. In particular, the marking unit determines an end-to-end communication pattern for replication operators, and/or map-reduce operators, and/or sort operators, and/or shuffle join operators, and/or hash join operators, and/or broadcast hash join operators, and/or merge join operators. The marking unit also determines a multicast or broadcast communication pattern for replication operators and/or broadcast hash join operators. Furthermore, the marking unit 44 is configured to determine a many-to-many communication pattern for shuffle join operators, and/or hash join operators, and/or merge join operators.
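The operator-to-pattern relationships listed above can be written down as a lookup table. The following is a minimal sketch; the operator type strings and the names `Pattern`, `OPERATOR_PATTERNS` and `candidate_patterns` are assumptions chosen for illustration, and the table records only the candidate patterns named in the text, without deciding which one a real marking unit would pick.

```python
from enum import Enum
from typing import Dict, FrozenSet


class Pattern(Enum):
    END_TO_END = "end-to-end"
    MULTICAST = "multicast"
    MANY_TO_MANY = "many-to-many"


E, M, A = Pattern.END_TO_END, Pattern.MULTICAST, Pattern.MANY_TO_MANY

# Candidate patterns per communication operator type, as enumerated in the text.
OPERATOR_PATTERNS: Dict[str, FrozenSet[Pattern]] = {
    "Replication": frozenset({E, M}),
    "MapReduce": frozenset({E}),
    "Sort": frozenset({E}),
    "ShuffleJoin": frozenset({E, A}),
    "HashJoin": frozenset({E, A}),
    "BroadcastHashJoin": frozenset({E, M}),
    "MergeJoin": frozenset({E, A}),
}


def candidate_patterns(op_type: str) -> FrozenSet[Pattern]:
    """Return the candidate patterns; an empty set means no communication."""
    return OPERATOR_PATTERNS.get(op_type, frozenset())


def is_communication_operator(op_type: str) -> bool:
    return bool(candidate_patterns(op_type))
```

An operator type that is absent from the table, such as a local filter, is treated as involving no communication and would receive no data marker.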
The marked physical plan is then passed to the code generator 45, which generates executable code from the physical plan and converts the data markers into communicator instructions. These communicator instructions are passed to the code executor 46, which executes the executable code. The communicator 47 communicates with the other data processing devices 402 and 403 based on the communicator instructions.
In addition, the marking unit 44 records in the marker the total amount of data to be transmitted for the operator. The communicator 47 then determines the communication protocol for each operator based on the total amount of data to be sent and on the communication pattern.
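A protocol choice that depends on both the communication pattern and the data volume could be organized as an ordered rule list per pattern. The sketch below is a hypothetical policy, not the patent's selection table: TCP for end-to-end and TIPC for multicast follow the background discussion, while the many-to-many rules, the 64 KiB threshold and the default protocol are invented placeholders that only show where the data volume would enter.

```python
from typing import Dict, List, Optional, Tuple

DEFAULT_PROTOCOL = "TCP"  # assumption: fallback when nothing else applies

# pattern -> ordered rules (max_total_bytes or None for "any size", protocol)
Rule = Tuple[Optional[int], str]
KNOWLEDGE_BASE: Dict[str, List[Rule]] = {
    "end_to_end": [(None, "TCP")],
    "multicast": [(None, "TIPC")],
    # Placeholder rules: the threshold is made up for illustration.
    "many_to_many": [(64 * 1024, "TIPC"), (None, "TCP")],
}


def select_protocol(pattern: Optional[str],
                    total_bytes: Optional[int] = None,
                    kb: Dict[str, List[Rule]] = KNOWLEDGE_BASE) -> str:
    """Return the first matching protocol for the pattern and data volume."""
    if pattern is None:
        return DEFAULT_PROTOCOL
    for max_bytes, protocol in kb.get(pattern, []):
        if max_bytes is None:
            return protocol
        if total_bytes is not None and total_bytes <= max_bytes:
            return protocol
    return DEFAULT_PROTOCOL
```

Because the rules are data rather than code, the same function can be driven by a different knowledge base without changing the selection logic.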
SQL is the de facto standard for database access. The following is an example, query Q3 from the TPC-H benchmark:
The SQL query text is processed by the database engine, which in a first step converts it into a tree representation called the logical plan. The database engine then performs logical optimization, producing an optimized logical plan, which is then expressed in terms of the low-level database API. This plan is called the physical plan and is also optimized taking the physical parameters of the database into account.
The leaves of the physical plan tree represent data sources, and the nodes are physical plan operators, which represent different basic operations on the relational database.
An additional physical plan processing step is added at the end of the planning chain; it is shown in detail in Fig. 5. The input 50 consists of a physical plan 51 and data-related information 52. The input 50 is passed to a functional block 53, which constitutes the extended planning stage. The functional block 53 includes detection 54 of communication-related physical plan operators. To identify communication-related operators, the correspondence between operators and communication patterns stored in a knowledge base 56 is used as input. The extended planning stage 53 also includes marking the detected operators in functional block 55. Finally, the extended physical plan is delivered as output 57.
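The detect-and-mark step of the extended planning stage amounts to a tree walk over the physical plan. The following is a minimal sketch under stated assumptions: `PlanNode`, `mark_plan`, the dictionary-shaped marker and the one-entry knowledge base are hypothetical and stand in for whatever structures an actual planner uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PlanNode:
    """Hypothetical physical plan operator with an optional data marker."""
    op_type: str
    children: List["PlanNode"] = field(default_factory=list)
    marker: Optional[dict] = None


def mark_plan(node: PlanNode, kb: Dict[str, str]) -> int:
    """Attach a marker to every operator found in the knowledge base.

    Returns the number of operators that were marked.
    """
    marked = 0
    pattern = kb.get(node.op_type)
    if pattern is not None:
        node.marker = {"pattern_id": pattern}
        marked = 1
    for child in node.children:
        marked += mark_plan(child, kb)
    return marked


kb = {"BroadcastHashJoin": "multicast"}  # assumed knowledge-base entry
plan = PlanNode("Project", [
    PlanNode("BroadcastHashJoin", [PlanNode("Scan"), PlanNode("Scan")]),
])
count = mark_plan(plan, kb)
```

Operators that the knowledge base does not list keep `marker` set to `None`, so later stages can tell communicating operators from purely local ones.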
Fig. 6 shows an exemplary physical plan 60. The physical plan 60 comprises a plurality of operators 601 to 614. The communication operators 601, 603, 604 and 605 are marked with dashed lines; these are the operators that are detected.
Fig. 7 shows an exemplary extended physical plan 70. Here, additional data markers 71, 72, 73, 74 and 75 are integrated into the extended physical plan 70. Each of the data markers 71 to 75 contains information about the communication pattern to be used. Further information can also be stored in the data markers.
Each data marker is strictly tied to its associated physical plan operator and conveys information about the communication pattern, such as its communication pattern ID, for use in the subsequent data exchange.
The communication pattern ID associated with a specific physical plan operator is selected using a table such as the following, which defines the correspondence between physical plan operators and communication patterns.
Fig. 8 shows details of the code generation stage. The extended physical plan 81 produced by the marking unit 44 of Fig. 4 is used as input. It is passed to a code generator 82, which corresponds to the code generator 45 of Fig. 4. The code generator 82 uses information stored in a code generation library 84, in particular a set of conversion rules 85 for physical plan operators and a set of library modifications comprising a data marker converter and modifications of the existing converters. As output, the code generator 82 produces executable code 83.
One possible exemplary code generation approach is to convert the physical plan into code written in a general-purpose language such as C++. The advantage of this approach is that the generated code is compiled into executable code by a compiler, which provides additional optimizations. Code generation is performed by a dedicated module, the code generator 82, which converts the tree representation of the physical plan into plain executable code. The extended physical plan produced in the previous step contains a new kind of physical plan operator, the data marker, which is also converted into executable code. The code generator is therefore extended with a converter for the data marker physical plan operator.
In addition to the converter for data markers, the converters for existing physical plan operators must also be modified so that they give the communication layer access to communication-related data.
An exemplary prototype query execution system is called Flint. Flint allows query physical plans expressed in C++ to be executed; such plans can be regarded as the output of the code generation stage. This approach is described in detail below.
In the execution stage, the generated program runs on the distributed cluster. All nodes in the cluster process their part of the input data synchronously, meaning that they perform the same operation on the data at the same time. Fig. 9 shows the SQL query execution process for one individual node of the cluster; the figure presents the execution 90 of one particular marked physical operator.
As input, the data 91 to be processed and a data marker 92 are used. The data marker 92 contains communication-related information 98, such as the communication pattern ID and additional service information. The generated executable code 93 is then run, processing local data 94, data to be exchanged 95 and result data 96. The data to be exchanged 95 is transmitted by the communicator 99, which corresponds to the communicator 47 of Fig. 4. After the executable code 93 has been processed, the next operator 97 is processed.
Fig. 10 shows the processing performed by the communicator. Code execution 101 comprises an executor application 102, which is extended with a communicator 103. The communicator 103 encapsulates all transport layer protocols 104, 105 and 106 to be used at run time. Each of the transport protocols 104, 105 and 106 forms a network 1001 between all cluster nodes 107, 108 and 109, and their addressing is explicitly converted into the addressing used by the application. For this purpose, communicator 103 builds a translation table that stores the correspondence between application-layer and transport-layer addresses.
Application-layer address | TCP address      | ... | TIPC address
0                         | 192.168.1.0:5555 | ... | 1.1.1:(100,1)
1                         | 192.168.1.1:5555 | ... | 1.1.2:(100,2)
...                       | ...              | ... | ...
N                         | 192.168.1.N:5555 | ... | 1.1.3:(100,2)
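The translation table described above can be illustrated with a minimal C++ sketch. It is not taken from the patent: the names AddressTable, Transport, add and resolve are hypothetical, and the addresses are treated as opaque strings.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical transport identifiers; the text names TCP and TIPC as examples.
enum class Transport { TCP, TIPC };

// Maps an application-layer node index to one transport-layer address per
// transport protocol, mirroring the rows and columns of the table above.
class AddressTable {
public:
    // Registers the transport-layer address of a node for one transport.
    void add(std::size_t appAddr, Transport t, std::string transportAddr) {
        table_[{appAddr, t}] = std::move(transportAddr);
    }

    // Returns the registered address, or nullptr if the pair is unknown.
    const std::string* resolve(std::size_t appAddr, Transport t) const {
        auto it = table_.find({appAddr, t});
        return it == table_.end() ? nullptr : &it->second;
    }

private:
    std::map<std::pair<std::size_t, Transport>, std::string> table_;
};
```

With such a table, the application only ever refers to node indices 0 to N, and the communicator looks up the address that matches whichever transport it has chosen for a given transfer.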
Communicator 103 can be based on any number of transport layer protocols 104, 105 and 106, and can even encapsulate other application layer protocols.
Fig. 11 shows a generalized scheme of the communicator 113, which corresponds to communicator 103 of Fig. 10 and communicator 47 of Fig. 4. The communication-related information 111 associated with input data 110 is used for the subsequent communication protocol selection 114. The selection is made based on information stored in the knowledge base 116. The selected protocol ID is passed to the transmitter 115 together with the data. The receiver 117 needs no special operation to receive data. The received data is simply passed to the higher protocol layer as output data 112.
To select the communication protocol, all data to be exchanged is passed from the application to communicator 113. The data may or may not carry associated additional information. If the data carries no service information, it is transferred using the default transport protocol. The following table shows possible correspondences between identified communication patterns and the resulting communication protocols.
If the data carries additional information, it is transferred using the transport protocol that best matches the communication pattern of the data exchange. The decision about which transport protocol to use involves the following:
Static knowledge about the transport protocols, obtained in advance;
Dynamic data exchange parameters, such as the communication pattern ID, the total amount of data within the pattern, and the optimization level.
Finally, the determined communication protocol is used for the data transmission.
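The selection logic just described can be sketched in C++ as follows. This is only an illustration under stated assumptions: the types ServiceInfo, KnowledgeEntry and ProtocolSelector, the size threshold, and the idea of mapping the broadcast pattern to TIPC are all hypothetical and do not come from the patent text.

```cpp
#include <cstddef>
#include <map>

// Hypothetical identifiers for communication patterns and transport protocols.
enum class Pattern { EndToEnd, Broadcast, ManyToMany };
enum class Protocol { TCP, TIPC };

// Dynamic parameters attached to the data (assumed layout).
struct ServiceInfo {
    Pattern pattern;
    std::size_t totalBytes;
};

// Static knowledge about one pattern (assumed layout): which protocol suits
// it, and from which data volume onwards the switch is considered worthwhile.
struct KnowledgeEntry {
    Protocol protocol;
    std::size_t minBytes;
};

class ProtocolSelector {
public:
    explicit ProtocolSelector(Protocol defaultProtocol) : default_(defaultProtocol) {}

    // Adds static knowledge obtained in advance.
    void learn(Pattern p, KnowledgeEntry e) { knowledge_[p] = e; }

    // Data without service information always uses the default protocol.
    Protocol select(const ServiceInfo* info) const {
        if (info == nullptr) return default_;
        auto it = knowledge_.find(info->pattern);
        if (it == knowledge_.end()) return default_;
        if (info->totalBytes < it->second.minBytes) return default_;
        return it->second.protocol;
    }

private:
    Protocol default_;
    std::map<Pattern, KnowledgeEntry> knowledge_;
};
```

The sketch keeps the static knowledge base and the dynamic parameters separate, so that new patterns or protocols could be added by calling learn() without changing the selection code.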
In particular, at the sending end, the traffic generated by the application is multiplexed between the supported transport layer protocols according to the protocol ID. At the receiving end, the data received over the different protocols is merged into one stream and delivered to the corresponding application. No transport-related information needs to be passed to the higher protocol layer, so the data is delivered as is, without any additional fields.
Fig. 12 shows the transmitter of node X 1201 and the receiver of node Y 1208. The transmitter of node X comprises a multiplexer 1204 and multiple protocol stacks 1205, 1206 and 1207. Input data 1202 and the corresponding protocol ID 1203 are sent to the multiplexer 1204, which selects the corresponding protocol stack 1205 to 1207 based on the protocol ID 1203 and uses that protocol stack to send the data 1202 to the receiver of node Y 1208. The receiver of node Y 1208 comprises multiple protocol stacks 1209 to 1211 and a demultiplexer 1212. When data is received that was sent over a particular protocol stack 1205 to 1207 of the transmitter of node X 1201, the corresponding protocol stack 1209 to 1211 of the receiver of node Y 1208 decodes the data, which is then demultiplexed by the demultiplexer 1212 and provided as output data 1213.
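The multiplexing scheme of Fig. 12 can be sketched in C++ as follows. The sketch replaces real protocol stacks with in-memory queues, and all names (ProtocolStack, Multiplexer, demultiplex) are hypothetical. A real receiver would merge data in arrival order; here the stacks are simply drained one after another.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Stand-in for a real protocol stack: an in-memory queue of payloads.
struct ProtocolStack {
    std::vector<std::string> wire;
};

// Sender side: routes each payload to the stack selected by its protocol ID.
class Multiplexer {
public:
    explicit Multiplexer(std::vector<ProtocolStack>& stacks) : stacks_(stacks) {}

    void send(std::size_t protocolId, const std::string& payload) {
        if (protocolId >= stacks_.size())
            throw std::out_of_range("unknown protocol ID");
        stacks_[protocolId].wire.push_back(payload);
    }

private:
    std::vector<ProtocolStack>& stacks_;
};

// Receiver side: merges the payloads of all stacks into a single stream.
// No protocol ID or other transport information is kept in the output.
inline std::vector<std::string> demultiplex(const std::vector<ProtocolStack>& stacks) {
    std::vector<std::string> out;
    for (const ProtocolStack& s : stacks)
        out.insert(out.end(), s.wire.begin(), s.wire.end());
    return out;
}
```

The point of the sketch is that the output of demultiplex() contains payloads only, which matches the statement that the data reaches the application without any transport-related fields.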
Fig. 13 shows a flowchart of an embodiment of the method for operating a distributed database system. In a first step 130, a logical plan is generated from a received database query. In a second step 131, the generated logical plan is used to generate a physical plan. In a third step 132, the communication operators within the physical plan are determined. In a fourth step 133, the communication patterns of the communication operators within the physical plan are determined. In a fifth step 134, the determined communication operators are marked within the physical plan, producing an extended physical plan. In a sixth step 135, executable code is generated from the extended physical plan. In a seventh step 136, the data markers within the extended physical plan are converted into communicator instructions. In an eighth step 137, the executable code is executed. In a final step 138, the communication between the different nodes of the distributed database system is carried out using the communicator instructions generated in the seventh step 136.
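Steps 132 to 134 (finding communication operators, determining their patterns and marking them) can be sketched as a tree walk in C++. The sketch is illustrative only: the PlanNode type and the function names are hypothetical, and the mapping from operator type to pattern is an assumption loosely based on the operator groups named in the claims.

```cpp
#include <memory>
#include <vector>

// Hypothetical operator types; Scan and Filter stand for non-communicating ones.
enum class OpType { Scan, Filter, Replicate, BroadcastHashJoin, ShuffleJoin, HashJoin, MergeJoin };

// Pattern::None means "not a communication operator, no marker attached".
enum class Pattern { None, Broadcast, ManyToMany };

struct PlanNode {
    OpType type;
    Pattern marker;
    std::vector<std::unique_ptr<PlanNode>> children;
    explicit PlanNode(OpType t) : type(t), marker(Pattern::None) {}
};

// Steps 132 and 133: decide from the operator type alone whether the operator
// communicates and, if so, which pattern it uses (assumed mapping).
inline Pattern patternFor(OpType t) {
    switch (t) {
        case OpType::Replicate:
        case OpType::BroadcastHashJoin:
            return Pattern::Broadcast;
        case OpType::ShuffleJoin:
        case OpType::HashJoin:
        case OpType::MergeJoin:
            return Pattern::ManyToMany;
        default:
            return Pattern::None;
    }
}

// Step 134: walks the plan tree, attaches a marker to every communication
// operator and returns how many operators were marked.
inline int markPlan(PlanNode& node) {
    node.marker = patternFor(node.type);
    int marked = (node.marker != Pattern::None) ? 1 : 0;
    for (auto& child : node.children)
        marked += markPlan(*child);
    return marked;
}
```

After markPlan() has run, the tree corresponds to the extended physical plan: the code generator can visit it and emit a communicator instruction wherever a marker other than Pattern::None is present.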
It should also be noted that the explanations given for the data processing device apply equally to the method for operating the distributed database system.
Figs. 14 to 16 show the speed-up of database queries achieved with the method described above, compared in particular on the basis of the two protocols TCP and TIPC. As the baseline, a standard approach was applied in which end-to-end oriented protocols are used. As the benchmark, the very popular TPC-H decision support benchmark was used. The TPC-H decision support benchmark consists of a suite of business-oriented ad hoc queries and concurrent data modifications. The queries and the data populating the database were chosen to have broad industry-wide relevance. The benchmark illustrates decision support systems that examine large volumes of data, execute highly complex queries, and give answers to critical business questions. The results for query Q8 are shown here, using scale factor 100, which generates about 100 GB of data in the tables. Fig. 14 shows the results of an exemplary execution of query Q8. In particular, it presents the results for Q8 without broadcast hash join and with broadcast hash join under both approaches, the standard one and the one according to the invention, in order to show the performance benefit relative to not using broadcast hash join. Note that the number of nodes is shown on the x-axis and the query execution time on the y-axis.
It can clearly be seen that, regardless of the number of nodes, the method according to the invention is advantageous compared with the two other exemplary solutions, although the improvement is not as large as might be expected. This is because, in the test of Fig. 14, the broadcast hash join operation processes a relatively small portion of the data, so its duration does not strongly affect the result. To show the benefit of the invention more precisely, Fig. 15 compares the speed-up of a single broadcast hash join operation. Here, the number of nodes is plotted on the x-axis and the speed-up factor on the y-axis.
It is also of interest to analyze scalability. Therefore, the dependence of the execution-time reduction on the number of nodes in the cluster was analyzed; it is shown in Fig. 16. In Fig. 16, the y-axis shows the processing time as a percentage (relative to the baseline, TCP without broadcast), and the x-axis shows the number of nodes.
The main conclusion is that, with the standard approach, using broadcast hash join on 32 nodes can even reduce performance, whereas the method of the invention still shows a performance gain.
As a further alternative, more communication patterns and corresponding transport layer protocols than those proposed above can be used. Different transport layer solutions can target particular use cases or characteristics. For example, a possible solution for the many-to-many communication pattern may provide a transport protocol that ensures fair communication between all nodes.
The proposed method can also be applied to other distributed computations with a different set of physical operators, such as reduce and prefix scan. For example, an MPI transport layer can be used.
Some further details of an implementation of an embodiment of the computer program according to the third aspect of the invention are given below. The software framework is named Flint. Flint is a distributed SQL query execution framework that allows executing physical query plans expressed in C++. A query execution plan written in C++ can be assumed to be the output of the code generation phase. The following example code for query Q3 of the TPC-H benchmark suite illustrates how the proposed code generation method can be implemented. The top-level physical plan of the Flint Q3 query has the following form:
The data markers added in the previous step are converted into special operators (lines 7, 12, 18 and 24 of the preceding listing), which attach the communication pattern ID and the estimated total data size to the data to be exchanged.
The implementation of the query in the preceding listing is built on the Dataset base class:
The method marker() has the following implementation:
Thus, calling the method marker() creates an instance of the MarkedDataset class and returns a pointer to it. The MarkedDataset class is defined as follows:
The overridden method next() simply forwards to the next() method of the input dataset and passes on the record pointer. Most important is the overridden getServiceInfo() method, which now returns a pointer to the service information that was supplied to the marker() method.
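The relationship between Dataset, marker() and MarkedDataset described above can be illustrated with the following C++ sketch. It is not the Flint code from the original listings. Only the names Dataset, MarkedDataset, marker, next and getServiceInfo come from the text; the Record and ServiceInfo layouts, the VectorDataset helper, the use of a free function for marker() and the use of std::unique_ptr are assumptions made to keep the sketch self-contained.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Assumed layout of the service information carried by a data marker.
struct ServiceInfo {
    int patternId;
    std::size_t totalBytes;
};

// Assumed record type; real records would hold table columns.
struct Record {
    int value;
};

class Dataset {
public:
    virtual ~Dataset() = default;
    // Returns the next record, or nullptr when the dataset is exhausted.
    virtual const Record* next() = 0;
    // Unmarked datasets carry no service information.
    virtual const ServiceInfo* getServiceInfo() const { return nullptr; }
};

// Wraps an input dataset and attaches service information to it.
class MarkedDataset : public Dataset {
public:
    MarkedDataset(std::unique_ptr<Dataset> input, ServiceInfo info)
        : input_(std::move(input)), info_(info) {}
    // Forwards to the input dataset unchanged.
    const Record* next() override { return input_->next(); }
    // Returns the information supplied when the marker was created.
    const ServiceInfo* getServiceInfo() const override { return &info_; }

private:
    std::unique_ptr<Dataset> input_;
    ServiceInfo info_;
};

// Stand-in for the marker() method: wraps a dataset in a MarkedDataset.
inline std::unique_ptr<Dataset> marker(std::unique_ptr<Dataset> input, ServiceInfo info) {
    return std::unique_ptr<Dataset>(new MarkedDataset(std::move(input), info));
}

// Hypothetical in-memory dataset, used only to exercise the sketch.
class VectorDataset : public Dataset {
public:
    explicit VectorDataset(std::vector<Record> rows) : rows_(std::move(rows)), pos_(0) {}
    const Record* next() override {
        return pos_ < rows_.size() ? &rows_[pos_++] : nullptr;
    }

private:
    std::vector<Record> rows_;
    std::size_t pos_;
};
```

Because MarkedDataset only wraps its input, an operator further down the plan can call getServiceInfo() on whatever dataset it receives and gets either a null pointer (unmarked data, default protocol) or the marker's information.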
The newly created service information can be used in the replication phase of the broadcast hash join. Below is one implementation of the broadcastHashJoin() method of the Dataset class:
Like the marker() method, the broadcastHashJoin() method creates an instance of the BroadcastHashJoin class and returns a pointer to it.
During construction of the BroadcastHashJoin class, a hash table is created from the inner table, which is replicated to every node in the cluster (line 19 of the preceding listing).
Below is the implementation of the replicate() method and the ReplicateDataset class:
Replication is implemented by BroadcastJob (line 17 of the preceding listing), which prepares and sends the specified dataset:
Finally, the service information associated with the MarkedDataset is collected (line 12 of the preceding listing) and passed to the communicator together with the data to be sent (line 20 of the preceding listing).
Other communication patterns use the same approach. For example, the following is the ScatterJob used by the shuffle procedure that implements the physical plan operators hash join and shuffle join:
The invention is not limited to the examples shown above. The features of the exemplary embodiments can be used in any advantageous combination.
The invention has been described herein in conjunction with various embodiments. However, other variations of the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed invention, from a study of the drawings, the disclosure and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. A single processor or other unit may fulfil the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage. A computer program may be stored or distributed on a suitable medium, such as an optical storage medium or a solid-state medium supplied together with or as part of other hardware, but may also be distributed in other forms, such as via the Internet or other wired or wireless telecommunication systems.

Claims (14)

1. A data processing device (40) for performing part of the operation of a distributed database system (404), characterized by comprising:
a logical planner (42) for generating a logical plan based on a database query;
a physical planner (43) for generating a physical plan (70) based on the logical plan;
a marking unit (44) for:
determining communication operators (601, 602, 604 and 605) in the physical plan (70), wherein the communication operators (601, 602, 604 and 605) are operators that involve communication;
determining the communication patterns of the communication operators (601, 602, 604 and 605) based on the operator types of the communication operators (601, 602, 604 and 605);
marking the determined communication operators (601, 602, 604 and 605), each operator being given a data marker (71, 72, 73, 74 and 75) that includes the communication pattern of the determined communication operator (601, 602, 604 and 605);
a code generator (45) for:
generating executable code based on the physical plan (70);
converting the data markers (71, 72, 73, 74 and 75) into communicator instructions;
a code executor (46) for executing the executable code;
a communicator (47) for communicating with other data processing devices (402 and 403) in the distributed database system (404) based on the communicator instructions.
2. The data processing device (40) according to claim 1, characterized in that the database query is an SQL query.
3. The data processing device (40) according to claim 2, characterized in that the marking unit (44) is configured to determine, as communication operators (601, 602, 604 and 605), operators that perform communication in the distributed database system, in particular replication operators, and/or map-reduce operators, and/or sort operators, and/or shuffle join operators, and/or hash join operators, and/or broadcast hash join operators, and/or merge join operators.
4. The data processing device (40) according to claim 3, characterized in that
the marking unit (44) is configured to distinguish a set of network communication patterns based on the communication operators; and/or
the marking unit (44) is configured to determine an end-to-end communication pattern for replication operators, and/or map-reduce operators, and/or sort operators, and/or shuffle join operators, and/or hash join operators, and/or broadcast hash join operators, and/or merge join operators; and/or
the marking unit (44) is configured to determine a multicast or broadcast communication pattern for replication operators and/or broadcast hash join operators; and/or
the marking unit (44) is configured to determine a many-to-many communication pattern for shuffle join operators, and/or hash join operators, and/or merge join operators.
5. The data processing device (40) according to any one of claims 1 to 4, characterized in that
the communicator (47) is configured to dynamically determine the communication protocol to be used for each operator based at least on the communicator instructions.
6. The data processing device (40) according to claim 5, characterized in that
the data markers (71, 72, 73, 74 and 75) further include the total amount of data to be transmitted for the operator;
the communicator (47) is further configured to dynamically determine the communication protocol to be used for each operator based on the total amount of data to be transmitted for the operator.
7. The data processing device (40) according to claim 5 or 6, characterized in that
the communicator (47) is configured to communicate based on the communication protocol determined for each operator.
8. The data processing device (40) according to any one of claims 1 to 7, characterized in that
the data processing device (40) further comprises a storage unit for storing at least part of the data stored in the distributed database system.
9. The data processing device (40) according to any one of claims 1 to 8, characterized in that
the data processing device (40) further comprises a query receiver (41) for receiving database queries.
10. The data processing device (40) according to any one of claims 1 to 9, characterized in that
the communicator (47) is configured to transmit at least part of the data to be processed to other data processing devices.
11. A database system (404), characterized by comprising at least a first data processing device (40) according to any one of claims 1 to 10 and a second data processing device (402) according to any one of claims 1 to 10, wherein
the communicator (47) of the first data processing device (40) is configured to communicate at least with the second data processing device (402) based on the determined communicator instructions.
12. The database system (404) according to claim 11, characterized in that
the database system (404) comprises at least a third data processing device (403), wherein
the communicator (47) of the first data processing device (40) is configured to communicate at least with the second data processing device (402) and the third data processing device (403) based on the determined communicator instructions.
13. A method for operating a database system comprising multiple data processing devices, characterized by comprising:
generating (130) a logical plan based on a database query;
generating (131) a physical plan (70) based on the logical plan;
determining (132) communication operators (601, 602, 604 and 605) in the physical plan (70);
determining (133) the communication patterns of the communication operators based on the operator types of the communication operators (601, 602, 604 and 605);
marking (134) the determined communication operators, each operator being given a data marker (71, 72, 73, 74 and 75) that includes the communication pattern of the determined communication operator;
generating (135) executable code based on the physical plan (70);
converting (136) the data markers (71, 72, 73, 74 and 75) into communicator instructions;
executing (137) the executable code;
communicating (138) with other data processing devices in the distributed database system based on the communicator instructions.
14. A computer program with program code, characterized in that it is for performing the method according to claim 13 when the computer program runs on a computer.
CN201680084285.9A 2016-04-05 2016-04-05 Data processing apparatus, database system, and communication operation method of database system Active CN109416683B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/RU2016/000191 WO2017176144A1 (en) 2016-04-05 2016-04-05 Data handling device, database system and method for operating a database system with efficient communication

Publications (2)

Publication Number Publication Date
CN109416683A true CN109416683A (en) 2019-03-01
CN109416683B CN109416683B (en) 2022-04-05

Family

ID=57200064

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201680084285.9A Active CN109416683B (en) 2016-04-05 2016-04-05 Data processing apparatus, database system, and communication operation method of database system

Country Status (2)

Country Link
CN (1) CN109416683B (en)
WO (1) WO2017176144A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102201010A (en) * 2011-06-23 2011-09-28 清华大学 Distributed database system without sharing structure and realizing method thereof
US20110302583A1 (en) * 2010-06-04 2011-12-08 Yale University Systems and methods for processing data
CN105279286A (en) * 2015-11-27 2016-01-27 陕西艾特信息化工程咨询有限责任公司 Interactive large data analysis query processing method
CN105426504A (en) * 2015-11-27 2016-03-23 陕西艾特信息化工程咨询有限责任公司 Distributed data analysis processing method based on memory computation

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9465844B2 (en) * 2012-04-30 2016-10-11 Sap Se Unified table query processing

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117251472A (en) * 2023-11-16 2023-12-19 中邮消费金融有限公司 Cross-source data processing method, device, equipment and storage medium
CN117251472B (en) * 2023-11-16 2024-02-27 中邮消费金融有限公司 Cross-source data processing method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN109416683B (en) 2022-04-05
WO2017176144A1 (en) 2017-10-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant