CN102323946B - Implementation method for operator reuse in parallel database
- Publication number: CN102323946B
- Authority: CN (China)
- Legal status: Active
Abstract
The invention discloses an implementation method for operator reuse in a parallel database, comprising the following steps: step 1, generating a serial query plan for the query with an ordinary query-planning method, the query plan being a binary tree; step 2, scanning the query plan top-down to find reusable materialized operators and changing the plan structure so that thread-level materialized operators become globally reusable materialized operators; step 3, parallelizing the plan changed in step 2 to generate a plan forest for execution by multiple threads in parallel; step 4, merging the globally reusable operators of the plan forest generated in step 3 to produce a directed-graph plan whose reusable materialized operators can be executed by multiple threads in parallel; step 5, each thread executing its own part of the directed graph in parallel, where the first thread to execute a globally reusable operator, called the main thread, locks that operator and actually executes it together with its subplan while the other threads wait; step 6, the main thread unlocking the globally reusable operator after execution, whereupon the other threads begin reading data from it and continue executing their own plan trees; and step 7, the main thread releasing the operator's materialized data once all plans have read it.
Description
Technical field
The present invention relates to database systems, and in particular to a method for implementing operator reuse in a parallel database.
Background technology
With the development and spread of information technology, data volumes are growing exponentially, and processing massive data has become a major problem facing the computer field. The database community's research into OLAP, DSS, data mining and the like is, in essence, research into massive-data processing.
The mainstream technologies for the massive-data problem are currently parallel query processing and clustering. Parallel query processing has always been a research hotspot in the database field, and academia has proposed several parallel architectures: Share-Everything (fully shared), Share-Memory (shared memory), Share-Disk (shared disk), and Share-Nothing. Under the Share-Memory and Share-Everything architectures, processes or threads share main memory and can exchange data through it. However, today's popular Share-Memory and Share-Everything parallel database systems use shared memory only for communication and data exchange; they do not exploit operator reuse among concurrent processes or threads. Under partition-based parallelism, multiple threads or processes execute their tasks independently, and within a single query the parallel processes or threads often execute statements of almost identical structure, differing only in which partitions of a table they touch.
In this case, each process or thread executes its whole query independently; ignoring operator reuse is a significant waste of resources.
Summary of the invention
To address the above problem, the invention provides an operator-reuse technique that improves system resource utilization and performance in parallel databases of the Share-Memory (SM) architecture.
The present invention adopts following technical scheme:
Step 1, generating a serial query plan for the query with an ordinary query-planning method, the query plan being a binary tree;
Step 2, scanning the query plan top-down, looking for reusable materialized operators, and changing the plan structure so that thread-level materialized operators become globally reusable materialized operators;
Step 3, parallelizing the query plan changed in step 2 to generate a plan forest for execution by multiple threads in parallel;
Step 4, merging the globally reusable operators of the plan forest generated in step 3 to produce a directed-graph plan whose reusable materialized operators can be executed by multiple threads in parallel;
Step 5, each thread executing its own plan tree in the directed graph in parallel, where the first thread to execute a globally reusable operator is called the main thread; the main thread locks the operator and actually executes it together with the plan tree beneath it, while the other threads wait;
Step 6, the main thread unlocking the operator after execution, whereupon the other threads begin reading data from the globally reusable operator and continue executing their own plan trees;
Step 7, the main thread waiting until all plan trees have read the data of the globally reusable operator, then releasing the operator's materialized data. The criterion for a reusable materialized operator is: if a materialized operator and the plan tree beneath it contain no partition table, the operator can be reused.
The present invention optimizes the query-execution flow of SM-architecture parallel databases; its key feature is sharing identical materialized operators among multiple threads. Compared with an ordinary parallel query execution flow, it not only saves CPU and memory resources but also performs fewer I/O reads, without adding any cost.
Description of drawings
The present invention is further illustrated below with reference to the drawings and specific embodiments.
Fig. 1 is a schematic diagram of the plan tree generated in step 1;
Fig. 2 is a schematic diagram of the plan tree after step 2;
Fig. 3 is a schematic diagram of the plan forest after step 3;
Fig. 4 is a schematic diagram of the directed-graph plan after step 4;
Fig. 5 is the data-flow diagram of the plan execution phase.
Embodiment
The present invention is described in further detail below with reference to the drawings and specific embodiments:
During the database's query-optimization stage, the multithreaded parallel plan is scanned to find operators that can be reused. Such operators are rewritten as globally shared operators, the plan structure is changed so that the plan tree becomes a plan forest, and the forest is further rewritten into a directed graph. By executing this directed-graph plan in parallel, the intermediate results of the materialized reusable operators are reused during execution.
This method mainly comprises the following steps:
The plan generation phase:
Step 1:
An ordinary query-planning method is used to generate a serial query plan for the query; this plan is a binary tree. The query involves a partition table, so some leaf nodes are scans of the partition table. As shown in Fig. 1, suppose the query is select * from A, B, P where A.a=B.b and A.a=P.p; where A and B are not partition tables, P is a partition table, and its child partitions are P1 and P2. The plan tree shows that for this query, a hash table is first built on B, A is joined with B by HashJoin, a hash table is then built on the A-B join result, and that result is joined with P by HashJoin.
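For illustration only (not part of the patented method), the binary plan tree of Fig. 1 can be sketched as nested dictionaries; the `node` helper and the operator names are assumptions:

```python
# Leaf nodes scan a base table; inner nodes are join/materialization operators.
def node(op, table=None, *children):
    return {"op": op, "table": table, "children": list(children)}

# Serial plan of Fig. 1: hash B, HashJoin(A, B), hash that join result,
# then HashJoin the result with partition table P.
plan = node("HashJoin", None,
            node("SeqScan", "P"),                     # probe side: partition table P
            node("Hash", None,                        # build side: materialized A-B join
                 node("HashJoin", None,
                      node("SeqScan", "A"),
                      node("Hash", None, node("SeqScan", "B")))))
```

The probe side of the topmost HashJoin is the scan of partition table P; the build side is the materialized hash over the A-B join result.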
Step 2:
The execution plan is scanned top-down for reusable materialized operators, and the plan structure is changed: thread-level materialized operators become globally reusable materialized operators. Once one is found, its subtree is not scanned further. The criterion for a reusable materialized operator is: if a materialized operator and the plan tree beneath it contain no partition table, the operator can be reused. In the example plan, as shown in Fig. 2, the top-down scan first finds that the Hash built on the join result of A and B is a reusable materialized operator, so that Hash operator is rewritten as a GlobalHash operator. Continuing the scan downward would find that the Hash built on table B is also a reusable materialized operator; however, because it lies inside the subtree of the Hash built on the A-B join result, and scanning stops once a reusable operator is found, the Hash on B is not reused.
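The top-down scan and rewrite of step 2 can be sketched as follows; the set of materializing operator kinds and the partition-table set are assumptions for illustration:

```python
def node(op, table=None, *children):
    return {"op": op, "table": table, "children": list(children)}

PARTITION_TABLES = {"P"}      # assumed: P is the only partition table
MATERIALIZING = {"Hash"}      # assumed: operator kinds that materialize results

def has_partition_table(n):
    return n["table"] in PARTITION_TABLES or any(
        has_partition_table(c) for c in n["children"])

def mark_reusable(n):
    """Top-down scan of step 2: promote the first materializing operator on
    each path whose subtree contains no partition table, then stop descending."""
    if n["op"] in MATERIALIZING and not has_partition_table(n):
        n["op"] = "Global" + n["op"]      # thread-level Hash -> GlobalHash
        return                            # do not scan this subtree further
    for c in n["children"]:
        mark_reusable(c)

plan = node("HashJoin", None,
            node("SeqScan", "P"),
            node("Hash", None,
                 node("HashJoin", None,
                      node("SeqScan", "A"),
                      node("Hash", None, node("SeqScan", "B")))))
mark_reusable(plan)
# The Hash over the A-B join result becomes GlobalHash; the Hash on B,
# being inside that subtree, is deliberately left unchanged.
```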
Step 3:
The query plan generated in step 2 is parallelized to produce plans for execution by multiple threads. Concretely, the plan is scanned for partition tables; if all partition tables involved use the same partitioning scheme, the plan is copied once per partition, and the partition master table in each copy is replaced with the corresponding child partition, forming a plan forest. As shown in Fig. 3, the partition table in this example is P, with 2 partitions, so the plan from step 2 is copied twice and the P table in the two plan trees is replaced with P1 and P2 respectively.
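A minimal sketch of the step 3 parallelization, assuming a single partitioned table P with two child partitions (the plan subtree is abbreviated for brevity):

```python
import copy

def node(op, table=None, *children):
    return {"op": op, "table": table, "children": list(children)}

PARTITIONS = {"P": ["P1", "P2"]}    # assumed: P has two child partitions

def replace_table(n, old, new):
    """Substitute every scan of table `old` with a scan of `new`."""
    if n["table"] == old:
        n["table"] = new
    for c in n["children"]:
        replace_table(c, old, new)

def parallelize(plan, part_map):
    """Step 3: copy the plan once per child partition and substitute the
    partition master table with each child, yielding a plan forest."""
    (master, parts), = part_map.items()      # assume one partitioned table
    forest = []
    for part in parts:
        tree = copy.deepcopy(plan)
        replace_table(tree, master, part)
        forest.append(tree)
    return forest

plan = node("HashJoin", None,
            node("SeqScan", "P"),
            node("GlobalHash", None, node("SeqScan", "A")))  # abbreviated subtree
forest = parallelize(plan, PARTITIONS)
```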
Step 4:
The plan forest generated in step 3 then undergoes global-reusable-operator merging, producing a directed-graph plan whose reusable materialized operators can be executed by multiple threads in parallel. Concretely, each plan tree in the forest is scanned; whenever a global materialized operator is encountered, the global materialized operators at the same position in all plan trees are merged into one. As shown in Fig. 4, the plan forest of this example is merged into a single directed-graph plan.
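The merge of step 4 can be sketched by walking the trees of the forest in lockstep and sharing one instance of each global operator; the forest built below corresponds to Fig. 3, and the helper names are assumptions:

```python
def node(op, table=None, *children):
    return {"op": op, "table": table, "children": list(children)}

def merge_global(forest):
    """Step 4: point every plan tree at a single shared instance of each
    Global* operator at the same position, turning the forest into a DAG."""
    def merge(parents, idx):
        nodes = [p["children"][idx] for p in parents]
        if nodes[0]["op"].startswith("Global"):
            for p in parents[1:]:
                p["children"][idx] = nodes[0]    # all parents share one node
            return                               # shared subtree needs no more work
        for i in range(len(nodes[0]["children"])):
            merge(nodes, i)
    for i in range(len(forest[0]["children"])):  # roots are never Global* here
        merge(forest, i)
    return forest

def subplan(part):                               # one tree per partition, as in Fig. 3
    return node("HashJoin", None,
                node("SeqScan", part),
                node("GlobalHash", None,
                     node("HashJoin", None,
                          node("SeqScan", "A"),
                          node("Hash", None, node("SeqScan", "B")))))

forest = merge_global([subplan("P1"), subplan("P2")])
# Both trees now reference the same GlobalHash node object.
```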
The plan execute phase:
Executing the parallel plan means multiple threads run concurrently and pass data to each other. During execution, the materialized-operator-reuse technique ensures that only one thread actually executes the shared operator and its subplan; the other threads execute nothing there and simply reuse its result. Concretely: each thread executes its own part of the directed graph in parallel; the first thread to execute the globally reusable operator, called the main thread, locks the operator and actually executes it and the plan beneath it, while the other threads wait. After finishing, the main thread unlocks the operator, and the other threads begin reading data from it and continue executing their own plan trees. The main thread waits until all plans have read the data of the globally reusable operator, then releases the operator's materialized data. The data flow in this example is shown in Fig. 5. As the figure shows, two threads execute their HashJoins in parallel: the main thread completes the three-way join of P1, A, and B, while the other thread completes the join of P2 with the same A and B. Only the main thread executes the HashJoin of A and B and builds the hash table on the A-B join result; the other thread reads data directly from the GlobalHash operator to hash-join with P2. Clearly, thanks to operator reuse, building the hash table on B and on the A-B join result is done only once, saving memory and CPU resources; the join of A and B is done only once, saving CPU resources; and the data of tables A and B is read only once, saving I/O resources.
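The locking protocol of steps 5 to 7 can be sketched with standard thread primitives; the `GlobalOperator` class, its counters, and the election of the main thread via a non-blocking lock are an assumed design, not the patent's concrete implementation:

```python
import threading

class GlobalOperator:
    """Sketch of steps 5-7: the first thread to arrive becomes the main
    thread and really builds the materialized result; the others wait and
    then read it; the last reader triggers release of the materialization."""
    def __init__(self, build_fn, n_readers):
        self._build_fn = build_fn
        self._claim = threading.Lock()       # won by the main thread only
        self._built = threading.Event()      # set when the result is ready
        self._remaining = n_readers          # number of plan trees that will read
        self._count_lock = threading.Lock()
        self.result = None

    def get(self):
        if self._claim.acquire(blocking=False):  # step 5: first thread = main thread
            self.result = self._build_fn()       # really execute the shared subplan
            self._built.set()                    # step 6: "unlock" for the others
        else:
            self._built.wait()                   # other threads wait
        data = self.result
        with self._count_lock:
            self._remaining -= 1
            if self._remaining == 0:
                self.result = None               # step 7: release materialized data
        return data

# Two worker threads (one per partition) share one GlobalHash-like operator.
op = GlobalOperator(lambda: {"a": [1, 2], "b": [3]}, n_readers=2)
results = []
workers = [threading.Thread(target=lambda: results.append(op.get()))
           for _ in range(2)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

Each worker captures the shared result before decrementing the reader count, so even the last reader sees the data before it is released.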
The present invention optimizes the query-execution flow of SM-architecture parallel databases; its key feature is sharing identical materialized operators among multiple threads. Compared with an ordinary parallel query execution flow, it not only saves CPU and memory resources but also performs fewer I/O reads, without adding any cost. In TPC-H 100G benchmark tests, the operator-reuse technique reduced CPU utilization by 16%, memory use by 32%, and I/O volume by 35%.
Of course, the present invention may have various other embodiments. Without departing from the spirit and essence of the present invention, those of ordinary skill in the art may make various corresponding changes and variations according to the present invention, but all such changes and variations shall fall within the protection scope of the appended claims of the present invention.
Claims (5)
1. An implementation method for operator reuse in a parallel database, comprising the steps of:
step 1, generating a serial query plan for the query with an ordinary query-planning method, the query plan being a binary tree;
step 2, scanning the query plan top-down, looking for reusable materialized operators, and changing the plan structure so that thread-level materialized operators become globally reusable materialized operators;
step 3, parallelizing the query plan changed in step 2 to generate a plan forest for execution by multiple threads in parallel;
step 4, merging the globally reusable operators of the plan forest generated in step 3 to produce a directed-graph plan whose reusable materialized operators can be executed by multiple threads in parallel;
step 5, each thread executing its own plan tree in the directed graph in parallel, the first thread to execute a globally reusable operator being called the main thread, which locks the operator and actually executes it together with the plan tree beneath it while the other threads wait;
step 6, the main thread unlocking the operator after execution, whereupon the other threads begin reading data from the globally reusable operator and continue executing their own plan trees;
step 7, the main thread waiting until all plan trees have read the data of the globally reusable operator, then releasing the operator's materialized data; wherein the criterion for a reusable materialized operator is: if a materialized operator and the plan tree beneath it contain no partition table, the operator can be reused.
2. The implementation method for operator reuse in a parallel database of claim 1, wherein step 1 specifically comprises: the query involves a partition table, and some leaf nodes are scans of the partition table.
3. The implementation method for operator reuse in a parallel database of claim 1, wherein step 2 specifically comprises: after a reusable materialized operator is found, its subtree is not scanned further.
4. The implementation method for operator reuse in a parallel database of claim 1, wherein step 3 specifically comprises: scanning the plan for partition tables; if all partition tables involved use the same partitioning scheme, copying the plan once per partition and replacing the partition master table in each copy with the corresponding child partition, forming a plan forest.
5. The implementation method for operator reuse in a parallel database of claim 1, wherein step 4 specifically comprises: scanning each plan tree in the plan forest and, whenever a global materialized operator is encountered, merging the global materialized operators at the same position in all plan trees into one.
Priority application: CN 201110259524, filed 2011-09-05.
Publications: CN102323946A (published 2012-01-18); CN102323946B (granted 2013-03-27).