WO2015149350A1 - A data processing method and system - Google Patents

A data processing method and system

Info

Publication number
WO2015149350A1
WO2015149350A1 (PCT/CN2014/074798)
Authority
WO
WIPO (PCT)
Prior art keywords
operator
processor
operators
performance
allocation
Prior art date
Application number
PCT/CN2014/074798
Other languages
English (en)
French (fr)
Inventor
李俊
施会华
方帆
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to CN201480000560.5A priority Critical patent/CN105210059B/zh
Priority to PCT/CN2014/074798 priority patent/WO2015149350A1/zh
Publication of WO2015149350A1 publication Critical patent/WO2015149350A1/zh

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations

Definitions

  • The embodiments of the present invention relate to the field of communications, and in particular to a data processing method and system. Background
  • A database is a warehouse that organizes, stores, and manages data according to data structures. Today, data management is no longer merely about storing and managing data, but about transforming the data into the various forms of data management that users need.
  • An existing database follows a fixed execution process: it first transforms a Structured Query Language (SQL) statement into an executable relation tree, then maps the relation tree to the corresponding execution functions, and finally executes those functions to complete the query process. The tree nodes are executed on the processor by a query engine (QE); these tree nodes are called operators.
  • A database is system software, and it needs to run on some hardware platform. The hardware form of a database platform may change greatly: in addition to the central processing unit (CPU), it may add processors with other functions, such as accelerated processors, typically graphics processing units (GPUs) and field-programmable gate arrays (FPGAs). Processors that add such other functions are called heterogeneous processors.
  • Existing database software is implemented for a common processor (i.e., the CPU), and each node of the relation tree is executed on an ordinary processor, so a processor with new functions, that is, a heterogeneous processor, cannot show its computational advantages, such as acceleration. Therefore, how to reasonably assign different operators to the appropriate execution subject in a database system that includes heterogeneous processors has become an urgent problem to be solved.
  • Embodiments of the present invention provide a data processing method and system that, in a database system including heterogeneous processors, can reasonably allocate different operators to the appropriate execution subject.
  • A first aspect provides a data processing method, including: forming an execution tree according to the obtained query statement, where each tree node of the execution tree is an operator; traversing the operators on the execution tree to obtain an allocation policy for each operator, where the allocation policy is determined according to the performance parameters, in a set allocation rule table, of each type of operator corresponding to each processor, and the processors include heterogeneous processors; and allocating each operator to a processor according to that operator's allocation policy, so that the processor performs the calculation of the assigned operator.
  • Optionally, before forming the execution tree according to the obtained query statement, the method further includes: obtaining, according to the type of each operator, the performance parameters of each type of operator corresponding to each processor, and writing those performance parameters into the allocation rule table, where the types of operators include at least selection, projection, and grouping.
  • Optionally, the performance parameters are a weight value and a performance score, and determining the allocation policy according to the performance parameters in the set allocation rule table includes: computing, for each operator, the product of the weight value and the performance score corresponding to each processor, and determining the processor with the highest product as the processor of that operator.
  • Optionally, allocating each operator to a processor according to that operator's allocation policy includes: determining, according to the allocation policy of each operator, the processor allocated to each operator; and adjusting, according to the load condition of each processor and/or the parallelizable condition of each operator, the processor allocated to each operator.
  • A second aspect provides a data processing system, comprising: an execution tree building module, configured to form an execution tree according to the obtained query statement, where each tree node of the execution tree is an operator;
  • an allocation module, configured to traverse the operators on the execution tree to obtain an allocation policy for each operator, where the allocation policy is determined according to the performance parameters, in the set allocation rule table, of each type of operator corresponding to each processor, and the processors include heterogeneous processors; and
  • an execution module, configured to allocate each operator to a processor according to that operator's allocation policy, so that the processor performs the calculation of the assigned operator.
  • Optionally, the system further includes:
  • an obtaining module, configured to obtain, according to the type of each operator, the performance parameters of each type of operator corresponding to each processor; and
  • an allocation rule table construction module, configured to write the performance parameters of each type of operator corresponding to each processor into the allocation rule table, where the types of operators include at least selection, projection, and grouping, the initial values of the performance parameters are obtained from a specified file or evenly distributed according to the performance of the current system, and the performance parameters remain dynamically updated.
  • Optionally, the performance parameters are a weight value and a performance score, and the allocation module is specifically configured to: traverse the operators on the execution tree to obtain the allocation policy of each operator, where the allocation policy computes, for each operator in the set allocation rule table, the product of the weight value and the performance score corresponding to each processor, and determines the processor with the highest product as the processor of that operator.
  • Optionally, the execution module includes:
  • a determining submodule, configured to determine, according to the allocation policy of each operator, the processor allocated to each operator; and
  • a scheduling submodule, configured to adjust, according to the load condition of each processor and/or the parallelizable condition of each operator, the processor allocated to each operator.
  • A second aspect of the present invention provides a data processing system, including: a first processor, configured to form an execution tree according to the obtained query statement, where each tree node of the execution tree is an operator, and to traverse the operators on the execution tree to obtain an allocation policy for each operator, where the allocation policy is determined according to the performance parameters, in the set allocation rule table, of each type of operator corresponding to each second processor, and the second processors include heterogeneous processors; and
  • an allocator, configured to allocate each operator to a second processor according to that operator's allocation policy, so that the second processor performs the calculation of the assigned operator.
  • Optionally, the first processor is further configured to: obtain, according to the type of each operator, the performance parameters of each type of operator corresponding to each second processor; and write the performance parameters of each type of operator corresponding to each second processor into the allocation rule table, where the types of operators include at least selection, projection, and grouping.
  • Optionally, the performance parameters are a weight value and a performance score, and the first processor is specifically configured to: traverse the operators on the execution tree to obtain the allocation policy of each operator, where the allocation policy computes, for each operator in the set allocation rule table, the product of the weight value and the performance score corresponding to each second processor, and determines the second processor with the highest product as the second processor of that operator.
  • Optionally, the allocator is specifically configured to: determine, according to the allocation policy of each operator, the second processor allocated to each operator; and adjust, according to the load condition of each second processor and/or the parallelizable condition of each operator, the second processor allocated to each operator.
  • In the embodiments, the data processing system forms an execution tree according to the obtained query statement, where each tree node of the execution tree is an operator; traverses the operators on the execution tree to obtain the allocation policy of each operator, where the allocation policy is determined according to the performance parameters of each type of operator in the set allocation rule table, and the processors include heterogeneous processors; and allocates each operator to a processor according to that operator's allocation policy, so that the processor performs the calculation of the assigned operator.
  • FIG. 1 is a flowchart of an embodiment of a data processing method provided by the present invention.
  • FIG. 2 is a flowchart of another embodiment of a data processing method provided by the present invention.
  • FIG. 3 is a schematic structural diagram of an embodiment of a data processing system provided by the present invention.
  • FIG. 4 is a schematic structural diagram of another embodiment of a data processing system according to the present invention.
  • FIG. 5 is a schematic structural diagram of still another embodiment of a data processing system according to the present invention.
  • FIG. 6 is a schematic structural diagram of another embodiment of a data processing system according to the present invention.
  • FIG. 1 is a flowchart of an embodiment of a data processing method according to the present invention. As shown in FIG. 1, the method includes:
  • S101: The data processing system forms an execution tree according to the obtained query statement, where each tree node of the execution tree is an operator.
  • Specifically, a user can submit an SQL statement to the data processing system through a terminal, and the database management system (DBMS) in the data processing system parses the SQL into a corresponding execution tree according to the grammar and optimizes the execution tree according to existing rules, for example, pushing down select operations or merging repeated joins. Here, select and join are commonly used functions in SQL databases.
  • S102: The data processing system traverses the operators on the execution tree to obtain the allocation policy of each operator, where the allocation policy is determined according to the performance parameters, in the set allocation rule table, of each type of operator corresponding to each processor, and the processors include heterogeneous processors.
  • Specifically, the processors include at least two types of processors. In one scenario, the processors may include a CPU and at least one type of heterogeneous processor; in another scenario, the processors may include at least two types of heterogeneous processors.
  • The database system in the data processing system performs the operator calculations on the processors, and finally summarizes the calculated data and feeds it back to the user.
  • The database system in the data processing system uses a common processor as the main processor, and processors added by the system, such as a GPU or an FPGA, serve as coprocessors (also called heterogeneous processors, although heterogeneous processors are not limited to these two types). Once heterogeneous processors are added, the existing scheme cannot reasonably allocate operators between the heterogeneous processors and the ordinary processor. Therefore, in this embodiment, the execution tree is optimized according to the existing rules and the allocation policy of each operator is obtained. The allocation policy is set according to the running parameters of each operator on each processor and the performance of the processor itself, that is, according to the performance parameters, in the set allocation rule table, of each operator corresponding to each processor. The assignment of the operators is thereby improved, so that the operators perform calculations on each processor more evenly and the overall performance reaches its optimum.
  • The performance parameter described above may be the performance score of each processor for the corresponding operator, or the performance parameters may be a weight value and a performance score.
  • The weight value is the weight of the performance score and is determined according to at least one influencing factor such as the data volume, the degree of parallelism, the hardware model, and the network speed. For example, if the weight value is determined according to the data volume and the hardware model, then when the data volume changes, the weight value in the allocation rule table also needs to change accordingly; likewise, when the hardware model changes permanently, the weight value needs to change accordingly. Changes in influencing factors such as the data volume, the degree of parallelism, the hardware model, and the network speed not only affect the weight value but also have some impact on the performance score.
  • The initial values of the performance parameters may be obtained from a specified file or evenly distributed according to the performance of the current system, and the performance parameters may remain dynamically updated.
  • The dynamic update of the performance score may be performed by normalizing the running speed of an operator on each processor and writing the normalized values into the performance scores of that operator corresponding to each processor in the allocation rule table. For example, if the operators on the execution tree are operator A, operator B, and operator C, then the performance score of operator A on each processor in the allocation rule table is updated with the values normalized from operator A's running speed on each processor, and the same applies to operator B and operator C.
  • For example, suppose there are three operators on the execution tree: operator A, operator B, and operator C. The performance parameter of operator A is 9 on the ordinary processor and 8 on the heterogeneous processor; the performance parameter of operator B is 5 on the ordinary processor and 7 on the heterogeneous processor; and the performance parameter of operator C on the heterogeneous processor is 8. Then the allocation policy of operator A can be set to the ordinary processor, the allocation policy of operator B to the heterogeneous processor, and the allocation policy of operator C to the heterogeneous processor. This is only an example of one situation and is not intended to be limiting.
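The allocation-policy lookup in the example above can be sketched as follows; the table layout and processor names are illustrative assumptions, not the patent's actual data structures:

```python
# Hypothetical allocation rule table mirroring the example above:
# each operator maps to its performance parameter on each available processor.
rule_table = {
    "A": {"ordinary": 9, "heterogeneous": 8},
    "B": {"ordinary": 5, "heterogeneous": 7},
    "C": {"heterogeneous": 8},
}

def allocation_policy(operator):
    """Return the processor with the highest performance parameter for this operator."""
    scores = rule_table[operator]
    return max(scores, key=scores.get)

policies = {op: allocation_policy(op) for op in rule_table}
# operator A goes to the ordinary processor, B and C to the heterogeneous processor
```

In a real system the table would be populated from the specified file or the initial even distribution described below, and refreshed as the scores are dynamically updated.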
  • In this scheme, the processor on which an operator runs may be heterogeneous, and the rules and weights optimize the combination of processing units, which can effectively improve the overall execution performance. Each operator is executed on the processing unit where it performs best, which helps guarantee the highest overall performance of the query.
  • S103: The data processing system allocates each operator to a processor according to that operator's allocation policy, so that the processor performs the calculation of the assigned operator. Different processors perform different calculations on the operators assigned to them.
  • Optionally, taking an ordinary processor, a GPU-based heterogeneous processor, and an FPGA-based heterogeneous processor as examples, the execution process is as follows: the FPGA-based heterogeneous processor is responsible for setting the FPGA configuration information and selecting the corresponding algorithm, and after execution completes, it retrieves the data and passes the execution result to the downstream operator; the GPU-based heterogeneous processor loads the execution code into the GPU and is responsible for retrieving the executed data, ensuring the data is passed to the next node; and the ordinary processor creates a thread and executes the operator.
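A minimal sketch of this per-processor dispatch, with each processor's actions reduced to placeholder functions (all names and return strings are assumptions for illustration, not the patent's implementation):

```python
def run_on_cpu(operator):
    # The ordinary processor creates a thread and executes the operator.
    return f"thread executes {operator}"

def run_on_gpu(operator):
    # The GPU-based heterogeneous processor loads the execution code into the GPU
    # and retrieves the executed data, passing it on to the next node.
    return f"GPU executes {operator}"

def run_on_fpga(operator):
    # The FPGA-based heterogeneous processor sets the FPGA configuration,
    # selects the corresponding algorithm, then retrieves the result data.
    return f"FPGA executes {operator}"

DISPATCH = {"cpu": run_on_cpu, "gpu": run_on_gpu, "fpga": run_on_fpga}

def execute(operator, processor):
    """Route an operator to the handler for its assigned processor type."""
    return DISPATCH[processor](operator)
```

For example, `execute("select", "fpga")` routes the selection operator through the FPGA handler.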
  • In this embodiment, the data processing system forms an execution tree according to the obtained query statement, where each tree node of the execution tree is an operator; traverses the operators on the execution tree to obtain the allocation policy of each operator, where the allocation policy is determined according to the performance parameters, in the set allocation rule table, of each type of operator corresponding to each processor, and the processors include heterogeneous processors; and allocates each operator to a processor according to that operator's allocation policy, so that the processor performs the calculation of the assigned operator. Adding this allocation process for the operators on the execution tree allows, in a data processing system with heterogeneous processors, the operators to perform calculations on each processor more evenly, so that the data processing system reaches its optimal overall performance.
  • FIG. 2 is a flowchart of another embodiment of a data processing method according to the present invention. As shown in FIG. 2, this embodiment is illustrated with the performance parameters being a weight value and a performance score, but is not limited thereto. The method includes:
  • S201: The data processing system obtains, according to the type of each operator, the weight value and performance score of each type of operator corresponding to each processor.
  • S202: The data processing system writes the weight value and performance score of each type of operator corresponding to each processor into the allocation rule table.
  • The types of operators include at least selection, projection, and grouping. When the data processing system is initialized, the execution tree has not yet been generated, so the weight value and performance score of each operator on each processor are written into the allocation rule table in advance, according to the type of the operator. This produces a rule table containing the processing capabilities of the operators on the common processor and on the heterogeneous processors, where the processing capability is represented by the performance score. The rule table can also include weight values to correct performance deviations observed during running. In other words, a rule base is built that maintains the performance score of each operator on each of the common and heterogeneous processors, and these scores can be dynamically updated.
  • Table 1 is an allocation rule table. As shown in Table 1, the data processing system constructs an allocation rule table that stores the performance score and weight value of each operator on each processor; other information can also be written into it, which is not listed here. It should be noted that the allocation rule table is only one example in this embodiment; apart from the core performance fields, other parts of the table may be dynamically added or deleted.
  • The performance scores in Table 1 can be obtained by two methods. The first is to read them from a specified file; the specified file can be generated once the hardware is fixed, and the generated values can be based on testing the database system in the data processing system, such that a processor tested to have stronger processing capability receives a higher performance score and a processor tested to have weaker processing capability receives a lower one. The second is to initialize to an average, that is, the performance scores of all processors are set to the same number: for example, if the performance score of the entire system is 100 and there are 20 processors, the performance score of each processor can be set to 5. This embodiment uses the first method as an example; its advantage is that the performance scores can be initialized according to tested values, which ensures a degree of performance optimization. The initial weight values are obtained by the same two methods and are not described again here.
  • The weight value changes dynamically during running; for example, it changes according to the data volume, the degree of parallelism, hardware model changes, and so on.
  • After the allocation rule table obtains the performance scores by the above methods, the scores can be normalized according to the running speed of each operator on the different processors, and the normalized values are written into the corresponding performance-score fields of the allocation rule table. The performance score can also be dynamically updated: when the actual running speeds of the same operator on different processors are obtained at runtime, they are normalized, and the normalized results are written into the corresponding performance-score fields of the allocation rule table. Because the performance scores of the processors are dynamically updated, the data processing system is guaranteed to converge to more accurate performance indicators after a period of time. Because the hardware, interconnects, and data volume of the data processing system all affect the evaluation of the performance scores, weight values are introduced to adjust the performance scores for these factors. The benefit of this fine-tuning is that the best processing performance approaches the true optimal performance of the database system in the data processing system.
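The dynamic update described above can be sketched as follows: measure the run time of the same operator on each processor, convert the times to speeds, and normalize so the scores sum to a fixed total. The exact normalization formula is an assumption; the patent only states that normalized values are written back into the table:

```python
def update_scores(run_times, total=10.0):
    """run_times: {processor: seconds to run the same operator on the same data}.
    Faster processors receive proportionally higher normalized performance scores."""
    speeds = {p: 1.0 / t for p, t in run_times.items()}
    norm = sum(speeds.values())
    return {p: total * s / norm for p, s in speeds.items()}

scores = update_scores({"cpu": 2.0, "gpu": 1.0, "fpga": 4.0})
# the fastest processor (gpu) receives the largest share of the total score
```

Each returned score would overwrite the corresponding performance-score field in the allocation rule table, so the table converges toward the processors' measured behavior.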
  • The data processing system forms an execution tree according to the obtained query statement.
  • The data processing system traverses the operators on the execution tree to obtain the allocation policy of each operator, where the allocation policy computes, for each operator in the set allocation rule table, the product of the weight value and the performance score corresponding to each processor, and determines the processor with the highest product as the processor of that operator; the processors include heterogeneous processors. Traversing the operators on the execution tree to obtain the allocation policy of each operator may also form an allocation tree, so that each operator performs the next calculation according to the tree nodes and structure of the allocation tree.
  • For example, suppose the performance score of the selection operator on the GPU-based heterogeneous processor is 10 with a weight value of 1, so the product for the selection operator on that processor is 10; the performance score of the selection operator on the ordinary processor is 5 with a weight value of 1, so its product on the ordinary processor is 5; the performance score of the projection operator on the GPU-based heterogeneous processor is 5 with a weight value of 1, so its product there is 5; and the performance score of the projection operator on the ordinary processor is 8 with a weight value of 1, so its product on the ordinary processor is 8. The allocation strategy then has four possible combinations: the selection operator and the projection operator can be on different processors or on the same processor.
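Using the numbers from this example, the product rule can be sketched as below (the table layout is illustrative):

```python
# (performance score, weight value) of each operator on each processor,
# taken from the example above.
rule_table = {
    "selection":  {"gpu": (10, 1), "ordinary": (5, 1)},
    "projection": {"gpu": (5, 1),  "ordinary": (8, 1)},
}

def best_processor(operator):
    """Return the processor with the highest weight * performance-score product."""
    entries = rule_table[operator]
    return max(entries, key=lambda p: entries[p][0] * entries[p][1])

# selection: product 10 on the GPU vs 5 on the ordinary processor
# projection: product 5 on the GPU vs 8 on the ordinary processor
```

Under this rule the selection operator lands on the GPU-based heterogeneous processor and the projection operator on the ordinary processor, before any load-balancing adjustment.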
  • The data processing system determines, according to the allocation policy of each operator, the processor allocated to each operator. Further, the data processing system considers the parallelism and load balancing of the processors: since all operators might otherwise be aggregated on one processor, load balancing is required, so the system reads the allocation policy of each operator and determines which processor each operator is allocated to.
  • The data processing system adjusts the processor allocated to each operator according to the load condition of each processor and/or the parallelizable condition of each operator. If at least two of the operators can be calculated in parallel, those operators are assigned to different processors to perform the calculations. For example, if two operators are sibling child nodes, the system determines whether the two sibling operators are on the same processor, and if so, moves the lower-ranked operator to another processor; the other processor can be chosen as the one whose processing capability for that operator is second only to the original processor. If at least two of the operators must be calculated serially, those operators are assigned to the same processor to perform the calculations. Alternatively, if there is no association between the operators, each operator is assigned to a different processor to perform the calculation.
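The sibling-operator adjustment can be sketched as below; the pairwise check and the "next-best processor" rule are simplified assumptions about the scheduling behavior described above:

```python
def rebalance(assignment, rule_table):
    """If two parallelizable operators share a processor, move the one that scores
    lower on that processor to its next-best processor.
    assignment: {operator: processor}; rule_table: {operator: {processor: score}}."""
    ops = list(assignment)
    for i, a in enumerate(ops):
        for b in ops[i + 1:]:
            if assignment[a] != assignment[b]:
                continue
            # the operator ranked lower on the shared processor is the one moved
            mover = a if rule_table[a][assignment[a]] <= rule_table[b][assignment[b]] else b
            others = {p: s for p, s in rule_table[mover].items() if p != assignment[mover]}
            if others:
                assignment[mover] = max(others, key=others.get)
    return assignment

result = rebalance({"X": "cpu", "Y": "cpu"},
                   {"X": {"cpu": 9, "gpu": 7}, "Y": {"cpu": 8, "gpu": 6}})
# Y scores lower on the shared cpu, so Y is moved to the gpu
```

The moved operator goes to its next-best processor by score, matching the rule that the alternative processor's capability is second only to the original one.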
  • The dynamic balancing of the processors ensures the parallelism of the data processing system. Because different processors can run in parallel, dynamic load balancing across processors can speed up execution and improve overall system performance.
  • The data processing system stores important operational information, which may include the time the assigned processor takes to perform the calculation, the data volume at runtime, the processor on which each operator resides, and so on. For example, the data processing system stores the data volume at runtime as follows: when a certain operator A has run on all the processors with the same data volume, the running times of operator A are normalized, and the normalized data is recorded under that data volume to serve as the reference influencing factor for the weight value of the next run.
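The recording step can be sketched as follows; the record format and the use of normalized run times as the weight's influencing factor are assumptions consistent with the description above:

```python
from collections import defaultdict

# runs[(operator, data_volume)] maps processor -> observed run time in seconds
runs = defaultdict(dict)

def record(operator, processor, data_volume, seconds):
    runs[(operator, data_volume)][processor] = seconds

def influence_factor(operator, data_volume):
    """Once the operator has run on all processors with the same data volume,
    normalize the run times to serve as the weight reference for the next run."""
    times = runs[(operator, data_volume)]
    total = sum(times.values())
    return {p: t / total for p, t in times.items()}

record("A", "cpu", 1_000_000, 3.0)
record("A", "gpu", 1_000_000, 1.0)
factors = influence_factor("A", 1_000_000)
# cpu accounts for 0.75 and gpu for 0.25 of the total observed run time
```

Keying the records by data volume lets the system keep separate references per workload size, since the weight value is sensitive to the data volume.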
  • In this embodiment, the data processing system forms an execution tree according to the obtained query statement, where each tree node of the execution tree is an operator; traverses the operators on the execution tree to obtain the allocation policy of each operator, where the allocation policy is determined according to the performance parameters of each type of operator in the set allocation rule table, and the processors include heterogeneous processors; and allocates each operator to a processor according to that operator's allocation policy, so that the processor performs the calculation of the assigned operator. Adding this allocation process for the operators on the execution tree allows, in a data processing system with heterogeneous processors, the operators to perform calculations on each processor more evenly, so that the data processing system reaches its optimal overall performance.
  • FIG. 3 is a schematic structural diagram of an embodiment of a data processing system according to the present invention. As shown in FIG. 3, the data processing system 30 includes: an execution tree construction module 301, an allocation module 302, and an execution module 303.
  • The execution tree building module 301 is configured to form an execution tree according to the obtained query statement, where each tree node of the execution tree is an operator.
  • The allocation module 302 is configured to traverse the operators on the execution tree to obtain the allocation policy of each operator, where the allocation policy is determined according to the performance parameters, in the set allocation rule table, of each type of operator corresponding to each processor, and the processors include heterogeneous processors.
  • The execution module 303 is configured to allocate each operator to a processor according to that operator's allocation policy, so that the processor performs the calculation of the assigned operator.
  • FIG. 4 is a schematic structural diagram of another embodiment of a data processing system according to the present invention.
  • The data processing system 30 further includes: an obtaining module 304 and an allocation rule table construction module 305.
  • The obtaining module 304 is configured to obtain, according to the type of each operator, the performance parameters of each type of operator corresponding to each processor.
  • The allocation rule table construction module 305 is configured to write the performance parameters of each type of operator corresponding to each processor into the allocation rule table, where the types of operators include at least selection, projection, and grouping.
  • The performance parameter is a performance score, or the performance parameters are a weight value and a performance score; the weight value is the weight of the performance score and is determined according to factors that affect performance, where such a factor is at least one of the data volume, the degree of parallelism, the hardware model, and the network speed.
  • The allocation rule table construction module 305 is further configured to dynamically update the performance score, where the dynamic update writes, for each operator, the values normalized from that operator's running speed on each processor into the performance scores of the operator corresponding to each processor in the allocation rule table.
  • FIG. 5 is a schematic structural diagram of another embodiment of a data processing system according to the present invention.
  • The execution module 303 includes: a determining submodule 3031 and a scheduling submodule 3032.
  • The determining submodule 3031 is configured to determine, according to the allocation policy of each operator, the processor allocated to each operator.
  • The scheduling submodule 3032 is configured to adjust, according to the load condition of each processor and/or the parallelizable condition of each operator, the processor allocated to each operator. Specifically, when at least two of the operators can be calculated in parallel, the scheduling submodule 3032 assigns them to different processors to perform the calculations; when at least two of the operators must be calculated serially, it assigns them to the same processor to perform the calculations; and when there is no association between the operators, it assigns each operator to a different processor to perform the calculation.
  • the data processing system forms an execution tree according to the obtained query statement, where each tree node of the execution tree is an operator; traverses the operators on the execution tree to obtain an allocation policy for each operator, where the policy is determined according to the performance parameters of each type of operator on each processor in the set allocation rule table, and the processors include heterogeneous processors; and allocates each operator to a processor according to its allocation policy, so that the processor performs computation on the operators allocated to it.
  • adding an allocation step for the operators on the execution tree means that, in a data processing system with heterogeneous processors, the operators can execute relatively uniformly across the processors, so that the data processing system approaches its optimal overall performance.
  • FIG. 6 is a schematic structural diagram of another embodiment of a data processing system according to the present invention.
  • the data processing system 40 includes: a first processor 401, a second processor 402, and an allocator 403.
  • a first processor 401, configured to form an execution tree according to the obtained query statement, where each tree node of the execution tree is an operator, and to traverse the operators on the execution tree to obtain an allocation policy for each operator, where the policy is determined according to the performance parameters of each type of operator on each second processor 402 in the set allocation rule table, and the second processors 402 include heterogeneous processors.
  • the allocator 403 is configured to allocate each operator to the second processor 402 according to the allocation strategy of each operator, so that the second processor 402 performs calculation on the assigned operator.
  • the first processor 401 is further configured to obtain, according to operator type, the performance parameters of each type of operator on each second processor 402, and to write the performance parameters of each type of operator on each second processor 402 into the allocation rule table.
  • the types of operators include at least selection, projection, and grouping.
  • the performance parameter is the performance score, or the performance parameters are the weight value and the performance score; the weight value is the weight of the performance score and is determined according to at least one performance-affecting factor such as data volume, degree of parallelism, hardware model, or network speed.
  • the first processor 401 is further configured to dynamically update the performance scores, where a dynamic update normalizes an operator's running speed on each second processor 402 and updates that operator's performance scores for the second processors 402 in the distribution rule table with the normalized values.
  • the performance parameter is a weight value and a performance score
  • the first processor 401 is specifically configured to traverse the operators on the execution tree to obtain each operator's allocation policy, where the policy computes, from the set allocation rule table, the product of the operator's weight value and performance score for each second processor 402, and determines the second processor with the highest product as the operator's second processor.
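  • the highest-product selection described above can be sketched minimally in Python. The rule table contents below are illustrative assumptions, not values from the patent; only the selection rule (weight value × performance score, highest product wins) comes from the text:

```python
# Illustrative allocation rule table:
# operator type -> {processor: (performance_score, weight_value)}
RULE_TABLE = {
    "select":  {"GPU": (10, 1.0), "FPGA": (8, 1.0), "CPU": (5, 1.0)},
    "project": {"GPU": (5, 1.0),  "FPGA": (8, 1.0), "CPU": (8, 1.0)},
    "group":   {"GPU": (10, 1.0), "FPGA": (8, 1.0), "CPU": (5, 1.0)},
}

def allocate(operator_type):
    """Return the processor whose weight x score product is highest."""
    candidates = RULE_TABLE[operator_type]
    return max(candidates, key=lambda p: candidates[p][0] * candidates[p][1])
```

  On ties, Python's `max` keeps the first candidate in table order; a real system would break ties with load information.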
  • the allocator 403 may be configured to determine, according to each operator's allocation policy, the second processor 402 allocated to that operator, and to adjust the allocated second processors 402 according to the load of each second processor 402 and/or the parallelizability of the operators.
  • the allocator 403 is specifically configured to: when at least two of the operators can be computed in parallel, allocate the at least two operators to different second processors 402 for computation; or, when at least two of the operators are computed serially, allocate the at least two operators to the same second processor 402 for computation; or, when the operators are unrelated to one another, allocate each operator to a different second processor 402 for computation.
  • the data processing system forms an execution tree according to the obtained query statement, where each tree node of the execution tree is an operator; traverses the operators on the execution tree to obtain an allocation policy for each operator, where the policy is determined according to the performance parameters of each type of operator on each processor in the set allocation rule table, and the processors include heterogeneous processors; and allocates each operator to a processor according to its allocation policy, so that the processor performs computation on the operators allocated to it.
  • adding an allocation step for the operators on the execution tree means that, in a data processing system with heterogeneous processors, the operators can execute relatively uniformly across the processors, so that the data processing system approaches its optimal overall performance.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Embodiments of the present invention provide a data processing method and system. The method includes: forming an execution tree according to an obtained query statement, where each tree node of the execution tree is an operator; traversing the operators on the execution tree to obtain an allocation policy for each operator, where the allocation policy is determined according to performance parameters, recorded in a set allocation rule table, of each type of operator on each processor, and the processors include heterogeneous processors; and allocating each operator to a processor according to its allocation policy, so that the processor performs computation on the operators allocated to it. Different operators can thus be reasonably assigned to suitable execution bodies in a database system that includes heterogeneous processors.

Description

Data Processing Method and System

Technical Field

Embodiments of the present invention relate to the field of communications, and in particular to a data processing method and system.

Background

A database is a warehouse that organizes, stores, and manages data according to data structures. Data management today is no longer limited to storing and managing data; it has evolved into carrying out whatever forms of data management users require. An existing database follows a fixed execution process: a Structured Query Language (SQL) statement is first transformed into an executable relation tree, the relation tree is then mapped onto the corresponding execution functions, and finally the query is completed by executing these functions. The tree nodes are executed on the processor by a query engine (QE) and are called operators.

A database is system software that must run on a hardware platform, and that hardware can vary considerably. Besides the central processing unit (CPU), processors with additional functions may be added, such as accelerators — typically a graphics processing unit (GPU) or a field-programmable gate array (FPGA). Processors that add such functions are called heterogeneous processors.

Existing database software is implemented for ordinary processors (i.e., CPUs): every node of the relation tree executes on an ordinary processor, so processors with added functions — heterogeneous processors — cannot exploit their computational advantages, such as acceleration. How to reasonably assign different operators to suitable execution bodies in a database system that includes heterogeneous processors has therefore become a pressing problem.

Summary

Embodiments of the present invention provide a data processing method and system capable of reasonably assigning different operators to suitable execution bodies in a database system that includes heterogeneous processors.
According to a first aspect of the present invention, a data processing method is provided, including:

forming an execution tree according to an obtained query statement, where each tree node of the execution tree is an operator;

traversing the operators on the execution tree to obtain an allocation policy for each operator, where the allocation policy is determined according to performance parameters, recorded in a set allocation rule table, of each type of operator on each processor, and the processors include heterogeneous processors; and

allocating each operator to a processor according to its allocation policy, so that the processor performs computation on the operators allocated to it.
In a first possible implementation of the first aspect, before the forming an execution tree according to the obtained query statement, the method further includes:

obtaining, according to operator type, the performance parameters of each type of operator on each processor; and

writing the performance parameters of each type of operator on each processor into the allocation rule table, where the operator types include at least selection, projection, and grouping.

In a second possible implementation, with reference to the first aspect and the first possible implementation, the performance parameters are the weight value and the performance score, and determining the allocation policy according to the performance parameters in the set allocation rule table includes:

computing, for each processor in the set allocation rule table, the product of the operator's weight value and performance score, and determining the processor with the highest product as the operator's processor.

In a third possible implementation, with reference to the first aspect and the first and second possible implementations, allocating each operator to a processor according to its allocation policy includes:

determining, according to each operator's allocation policy, the processor allocated to that operator; and adjusting the processors allocated to the operators according to the load of each processor and/or the parallelizability of the operators.

According to a second aspect of the present invention, a data processing system is provided, including:

an execution tree construction module, configured to form an execution tree according to an obtained query statement, where each tree node of the execution tree is an operator;
an allocation module, configured to traverse the operators on the execution tree to obtain an allocation policy for each operator, where the allocation policy is determined according to performance parameters, recorded in a set allocation rule table, of each type of operator on each processor, and the processors include heterogeneous processors; and

an execution module, configured to allocate each operator to a processor according to its allocation policy, so that the processor performs computation on the operators allocated to it.

In a first possible implementation of the second aspect, the system further includes:

an obtaining module, configured to obtain, according to operator type, the performance parameters of each type of operator on each processor; and

an allocation rule table construction module, configured to write the performance parameters of each type of operator on each processor into the allocation rule table, where the operator types include at least selection, projection, and grouping, the initial values of the performance parameters are read from a specified file or obtained by evenly dividing the current system's performance, and the performance parameters are kept dynamically updated.

In a second possible implementation, with reference to the second aspect and the first possible implementation, the performance parameters are the weight value and the performance score, and the allocation module is specifically configured to traverse the operators on the execution tree to obtain each operator's allocation policy, where the policy computes, for each processor in the set allocation rule table, the product of the operator's weight value and performance score, and determines the processor with the highest product as the operator's processor.

In a third possible implementation, with reference to the second aspect and the first and second possible implementations, the execution module includes:

a determining submodule, configured to determine, according to each operator's allocation policy, the processor allocated to that operator; and

a scheduling submodule, configured to adjust the processors allocated to the operators according to the load of each processor and/or the parallelizability of the operators.

According to a third aspect of the present invention, a data processing system is provided, including:

a first processor, configured to form an execution tree according to an obtained query statement, where each tree node of the execution tree is an operator, and to traverse the operators on the execution tree to obtain an allocation policy for each operator, where the allocation policy is determined according to performance parameters, recorded in a set allocation rule table, of each type of operator on each second processor, and the second processors include heterogeneous processors; and

an allocator, configured to allocate each operator to a second processor according to its allocation policy, so that the second processor performs computation on the operators allocated to it.

In a first possible implementation of the third aspect, the first processor is further configured to obtain, according to operator type, the performance parameters of each type of operator on each second processor, and to write these performance parameters into the allocation rule table, where the operator types include at least selection, projection, and grouping.

In a second possible implementation, with reference to the third aspect and the first possible implementation, the performance parameters are the weight value and the performance score, and the first processor is specifically configured to traverse the operators on the execution tree to obtain each operator's allocation policy, where the policy computes, for each second processor in the set allocation rule table, the product of the operator's weight value and performance score, and determines the second processor with the highest product as the operator's second processor.

In a third possible implementation, with reference to the third aspect and the first and second possible implementations, the allocator is specifically configured to determine, according to each operator's allocation policy, the second processor allocated to that operator, and to adjust the second processors allocated to the operators according to the load of each second processor and/or the parallelizability of the operators.

With the data processing method and system provided by the embodiments of the present invention, the data processing system forms an execution tree from the obtained query statement, where each tree node is an operator; traverses the operators on the execution tree to obtain each operator's allocation policy, determined from the performance parameters of each operator type on each processor in a set allocation rule table, the processors including heterogeneous processors; and allocates each operator to a processor according to its policy, so that the processor computes the operators allocated to it. Adding this allocation step for the operators on the execution tree lets a data processing system containing heterogeneous processors spread the operators relatively evenly across the processors, so that the system approaches its optimal overall performance.

Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present invention or of the prior art more clearly, the accompanying drawings used in their description are briefly introduced below. Evidently, the drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.

FIG. 1 is a flowchart of an embodiment of a data processing method according to the present invention;

FIG. 2 is a flowchart of another embodiment of a data processing method according to the present invention;

FIG. 3 is a schematic structural diagram of an embodiment of a data processing system according to the present invention;

FIG. 4 is a schematic structural diagram of another embodiment of a data processing system according to the present invention;

FIG. 5 is a schematic structural diagram of still another embodiment of a data processing system according to the present invention;

FIG. 6 is a schematic structural diagram of a further embodiment of a data processing system according to the present invention.
Detailed Description

To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

FIG. 1 is a flowchart of an embodiment of a data processing method according to the present invention. As shown in FIG. 1, the method includes:

S101: The data processing system forms an execution tree according to the obtained query statement, where each tree node of the execution tree is an operator.

For example, a user may submit an SQL statement to the data processing system through a terminal. The database management system (DBMS) in the data processing system parses the SQL into the corresponding execution tree according to the grammar and optimizes this tree according to established rules, such as select push-down or join merging, where select and join are functions commonly used in SQL databases.

S102: The data processing system traverses the operators on the execution tree to obtain each operator's allocation policy, where the allocation policy is determined according to the performance parameters of each operator type on each processor in a set allocation rule table, and the processors include heterogeneous processors.
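The S101 step — parse a query statement into an execution tree whose nodes are operators — can be sketched minimally. The `Operator` dataclass and the fixed project → select → scan tree below are illustrative assumptions; a real DBMS builds the tree from the SQL grammar and then optimizes it:

```python
from dataclasses import dataclass, field

@dataclass
class Operator:
    kind: str                      # e.g. "select", "project", "group"
    children: list = field(default_factory=list)

def plan(sql):
    # Stand-in for real parsing and optimization (select push-down,
    # join merging): always emits project -> select -> scan.
    return Operator("project", [Operator("select", [Operator("scan")])])

tree = plan("SELECT name FROM t WHERE age > 30")
```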
It should be noted that in the embodiments of the present invention the processors include at least two types of processor. In one specific scenario, the processors may include a CPU and at least one type of heterogeneous processor; in another scenario, the processors may include at least two types of heterogeneous processor.

For example, in the prior art, after the above optimization of the execution tree, the database system in the data processing system executes the operators on the processors and finally aggregates the computed data and returns it to the user. The database system uses an ordinary processor as the main processor, while added processors such as GPUs and FPGAs serve as coprocessors (also called heterogeneous processors, although heterogeneous processors are not limited to these two kinds). Heterogeneous processors are added, yet operators cannot be reasonably divided between heterogeneous and ordinary processors. Therefore, after optimizing the execution tree according to the established rules, this embodiment obtains each operator's allocation policy, set according to each operator's running parameters on each processor and the performance of the processor itself — that is, determined from the performance parameters of each operator on each processor in the set allocation rule table. This improved allocation lets the operators execute relatively evenly across the processors, achieving optimal overall performance.
Further, the above performance parameter may be the performance score of the operator on each processor, or the performance parameters may be a weight value together with a performance score. The weight value is determined from at least one influencing factor such as data volume, degree of parallelism, hardware model, or network speed. For example, if the weight value is determined from data volume and hardware model, the weight values in the allocation rule table must change accordingly when the data volume changes, or when the hardware model changes permanently. Because the weight value is the weight of the performance score, changes in factors such as data volume, parallelism, hardware model, and network speed affect not only the weight value but also, to some extent, the performance score. The initial values of the performance parameters may be read from a specified file or obtained by evenly dividing the current system's performance, and the performance parameters may be kept dynamically updated. A dynamic update of the performance scores normalizes an operator's running speed on each processor and uses the normalized values to update that operator's performance scores for each processor in the allocation rule table. For example, if the execution tree contains operators A, B, and C, the performance scores of operator A for each processor in the table are updated with values normalized from A's running speed on each processor, and likewise for operators B and C.
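The dynamic score update just described — normalize one operator's measured speeds across processors and write the results back into the rule table — can be sketched as follows. The measured speeds, the 0–10 scale, and the table layout are illustrative assumptions:

```python
def update_scores(rule_table, operator_type, speeds, scale=10):
    """speeds: {processor: throughput}. Rescale so the fastest gets `scale`,
    then write the normalized scores back into the rule table."""
    fastest = max(speeds.values())
    for proc, speed in speeds.items():
        rule_table[operator_type][proc] = round(scale * speed / fastest, 2)
    return rule_table

# Illustrative table and measurements for the "select" operator.
table = {"select": {"GPU": 10, "FPGA": 8, "CPU": 5}}
update_scores(table, "select", {"GPU": 2000.0, "FPGA": 1500.0, "CPU": 500.0})
```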
For example, suppose the execution tree has three operators A, B, and C, and the database system has one ordinary processor and one heterogeneous processor. Operator A's performance parameter is 9 on the ordinary processor and 8 on the heterogeneous processor; operator B's is 5 on the ordinary processor and 7 on the heterogeneous processor; operator C's is 8 on the heterogeneous processor and 6 on the ordinary processor. Then operator A's allocation policy may be set to the ordinary processor, operator B's to the heterogeneous processor, and operator C's to the heterogeneous processor. This is merely one illustrative case and is not limiting.
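The worked example above (operators A, B, and C on one ordinary and one heterogeneous processor) can be replayed directly. The numbers come from the text; the dictionary layout is illustrative scaffolding:

```python
# Performance parameters from the example: operator -> {processor: score}.
PERF = {
    "A": {"CPU": 9, "HET": 8},
    "B": {"CPU": 5, "HET": 7},
    "C": {"CPU": 6, "HET": 8},
}

# Each operator goes to the processor where its parameter is highest.
policy = {op: max(procs, key=procs.get) for op, procs in PERF.items()}
```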
Further, in this solution the processors on which the operators run may be heterogeneous. This rule-and-weight approach can select the preferred combination of processing units and effectively improve overall execution performance: every operator executes on the processing unit where it performs best, guaranteeing the highest overall performance of the query statement.

S103: The data processing system allocates each operator to a processor according to its allocation policy, so that the processor performs computation on the operators allocated to it.
Further, different processors execute the allocated operators in different ways. Taking an ordinary processor, a GPU-capable heterogeneous processor, and an FPGA-capable heterogeneous processor as examples, the execution process is as follows:

If the allocation policy assigns an operator to an FPGA-capable heterogeneous processor, that processor sets the FPGA configuration information and selects the corresponding algorithm; after execution completes it retrieves the data and applies the execution result to the downstream operator.

If the allocation policy assigns an operator to a GPU-capable heterogeneous processor, that processor loads the execution code onto the GPU for execution, retrieves the resulting data, and ensures the data is passed to the next node.

If the allocation policy assigns an operator to an ordinary processor, the ordinary processor creates a thread to execute the operator.
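A hedged sketch of these per-processor execution paths: FPGA-capable processors configure and fetch results, GPU-capable processors load and launch code, and ordinary CPUs spawn a thread. All handler bodies are placeholders for illustration; real backends would drive actual hardware:

```python
import threading

def run_on_fpga(op):
    # Would set FPGA configuration, pick the algorithm, then read data back.
    return f"fpga:{op}"

def run_on_gpu(op):
    # Would load the operator's code onto the GPU and copy results back.
    return f"gpu:{op}"

def run_on_cpu(op):
    # An ordinary processor creates a thread to execute the operator.
    result = {}
    t = threading.Thread(target=lambda: result.setdefault("v", f"cpu:{op}"))
    t.start()
    t.join()
    return result["v"]

DISPATCH = {"FPGA": run_on_fpga, "GPU": run_on_gpu, "CPU": run_on_cpu}

def execute(op, processor):
    return DISPATCH[processor](op)
```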
With the data processing method provided by this embodiment of the present invention, the data processing system forms an execution tree from the obtained query statement, where each tree node is an operator; traverses the operators on the execution tree to obtain each operator's allocation policy, determined from the performance parameters of each operator type on each processor in a set allocation rule table, the processors including heterogeneous processors; and allocates each operator to a processor according to its policy, so that the processor computes the operators allocated to it. Adding this allocation step for the operators on the execution tree lets a data processing system containing heterogeneous processors spread the operators relatively evenly across the processors, so that the system approaches its optimal overall performance.

FIG. 2 is a flowchart of another embodiment of a data processing method according to the present invention. As shown in FIG. 2, this embodiment is illustrated, without limitation, with the performance parameters being a weight value and a performance score. The method includes:
S201: The data processing system obtains, according to operator type, the weight value and performance score of each type of operator on each processor.

S202: The data processing system writes the weight value and performance score of each type of operator on each processor into the allocation rule table.

The operator types include at least selection, projection, and grouping. Further, when the data processing system is initialized no execution tree has been generated yet, so the weight values and performance scores of the operators on the processors are written into the allocation rule table in advance, according to operator type.

For example, a rule table is built that records each operator's processing capability on the ordinary and heterogeneous processors, expressed as a performance score; the table may also include weight values used to correct performance deviations during operation. A rule base is built that stores each operator's running performance score on every ordinary and heterogeneous processor; these scores can be updated dynamically.

Table 1 is an allocation rule table. As shown in Table 1, the data processing system builds an allocation rule table that stores each operator's performance score and weight value on each processor; other information may also be written into it, which is not enumerated here. It should be noted that this allocation rule table is only one example in this embodiment; apart from the core performance scores, the other parts of the table may be dynamically added or removed.
Operator    | Performance score | Weight value | Processor
selection   | 10                | 1            | GPU
selection   | 8                 | 1            | FPGA
selection   | 5                 | 1            | CPU
projection  | 5                 | 1            | GPU
projection  | 8                 | 1            | FPGA
projection  | 8                 | 1            | CPU
grouping    | 10                | 1            | GPU
grouping    | 8                 | 1            | FPGA
grouping    | 5                 | 1            | CPU

Table 1
The initial performance scores in Table 1 can be obtained in two ways. The first is to read them from a specified file, which may be generated once the hardware is fixed; its generation rule may allocate scores according to how performance-dense the database system measures — a processor that tests as performance-dense gets a higher score, and one that tests as sparse gets a lower score. The second is to initialize them to an average: every processor's score is set to the same number. For example, if the measured performance score of the whole system is 100 and there are 20 processors, each processor's score may be set to 5. This embodiment uses the first method as an example; its advantage is that the scores are initialized to measured values, preserving the degree of performance optimization.
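Both initialization routes described above can be sketched briefly. The one-pair-per-line file format is an assumption made for illustration; the even-split rule (100 points over 20 processors gives 5 each) comes from the text:

```python
def init_scores_average(total_score, processors):
    """Split an overall benchmark score evenly across all processors."""
    per = total_score / len(processors)
    return {p: per for p in processors}

def init_scores_from_text(text):
    """Parse 'name score' pairs, one per line, from a specified file's text."""
    scores = {}
    for line in text.strip().splitlines():
        name, score = line.split()
        scores[name] = float(score)
    return scores

avg = init_scores_average(100, [f"p{i}" for i in range(20)])
from_file = init_scores_from_text("GPU 10\nFPGA 8\nCPU 5")
```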
The initial weight values of the performance scores are obtained with the same two methods, which are not repeated here. Weight values change dynamically during operation, e.g., with the data volume, parallelism, or hardware model. The system's performance scores may also need to change permanently, for instance when the hardware model changes and processing capability rises or falls, or when the database's data scale changes significantly; such changes are permanent, and the performance scores must then be updated. If the hardware changes, the scores are adjusted by the ratio of core counts: for example, if a 100-core system had a performance score of 10 and now has 80 cores, the score becomes 80/100 × 10 = 8.
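The core-count adjustment above (a 100-core score of 10 shrinking to 8 at 80 cores) is a simple proportional update:

```python
def rescore_for_cores(old_score, old_cores, new_cores):
    """Adjust a performance score by the ratio of new to old core counts."""
    return old_score * new_cores / old_cores
```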
After the allocation rule table obtains the performance scores by the above method, each operator's running speeds on the different processors can be normalized and the normalized values written into the corresponding performance-score fields of the table. The scores can be updated dynamically: when the actual running speeds of the same operator on different processors are obtained at run time, they are normalized and the normalized results are written back into the corresponding performance-score fields.

Dynamically updating the processors' performance weights ensures that, after the data processing system has run for some time, the values converge to fairly accurate performance indicators. Because factors such as the system's hardware, interconnect, and data volume all affect the evaluation of the performance scores, weight values are introduced to adjust for these factors; the benefit of this fine-tuning is that the best processing performance approaches the true optimal performance of the database system within the data processing system.

S203: The data processing system forms an execution tree according to the obtained query statement.
S204: The data processing system traverses the operators on the execution tree to obtain each operator's allocation policy, where the policy computes, for each processor in the set allocation rule table, the product of the operator's weight value and performance score and determines the processor with the highest product as the operator's processor; the processors include heterogeneous processors.

It should be noted that traversing the operators on the execution tree to obtain the allocation policies can also produce an allocation tree, so that the operators carry out the next step of computation according to that tree's nodes and structure.
For example, suppose the set allocation rule table is as shown in Table 2. The selection operator scores 10 with weight 1 on the GPU-capable heterogeneous processor, so its product there is 10; it scores 5 with weight 1 on the ordinary processor, so its product there is 5. The projection operator scores 5 with weight 1 on the GPU-capable heterogeneous processor, product 5; it scores 8 with weight 1 on the ordinary processor, product 8. The allocation policy then has four options: the selection and projection operators may each be on a different processor, or both on the same one. The policy can be determined as follows. Selection and projection both on the GPU-capable heterogeneous processor gives a total of 10 + 5 = 15; both on the ordinary processor gives 5 + 8 = 13; selection on the GPU-capable heterogeneous processor and projection on the ordinary processor gives 10 + 8 = 18; selection on the ordinary processor and projection on the GPU-capable heterogeneous processor gives 5 + 5 = 10. The highest total is chosen: selection on the GPU-capable heterogeneous processor and projection on the ordinary processor.

Operator    | Performance score | Weight value | Processor
selection   | 10                | 1            | GPU
selection   | 5                 | 1            | CPU
projection  | 5                 | 1            | GPU
projection  | 8                 | 1            | CPU

Table 2
It should be noted that the above example is illustrative only and not limiting.
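The Table 2 enumeration can be replayed as a small exhaustive search: score every combined placement of the selection and projection operators and keep the combination with the highest total. Weight values are 1 here, so scores stand in for the weight × score products:

```python
from itertools import product

# Weight x score products from Table 2 (weights are all 1).
SCORES = {"select": {"GPU": 10, "CPU": 5}, "project": {"GPU": 5, "CPU": 8}}

def best_placement(scores):
    """Enumerate all processor combinations; return (placement, total)."""
    ops = list(scores)
    best, best_total = None, float("-inf")
    for combo in product(*(scores[op] for op in ops)):
        total = sum(scores[op][proc] for op, proc in zip(ops, combo))
        if total > best_total:
            best, best_total = dict(zip(ops, combo)), total
    return best, best_total

placement, total = best_placement(SCORES)
```

An exhaustive search like this is exponential in the number of operators; it is only meant to mirror the four-option example above.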
S205: The data processing system determines, according to each operator's allocation policy, the processor allocated to that operator. Further, the data processing system considers the parallelism and load balance of the processors: because all operators might otherwise cluster on a single processor, load balancing is needed; the system reads each operator's allocation policy and determines which processor has been allocated to it.
S206: The data processing system adjusts the processors allocated to the operators according to the load of each processor and/or the parallelizability of the operators.
For example, if at least two of the operators can be computed in parallel, they are allocated to different processors for execution. For instance, the two operators may be identified as the operators of twin child nodes; if both twin-node operators are on the same processor, the lower-scoring operator is moved to another processor, which may be chosen as the one whose capability for that operator is second only to the original processor's. Alternatively,

if at least two of the operators are computed serially, they are allocated to the same processor for execution; or,

if the operators are unrelated to one another, each operator is allocated to a different processor for execution.

It should be noted that when an operator needs to be moved to another processor, it may also be assigned randomly rather than by the above rules, which are not limiting. Dynamically balancing the processors preserves the parallelism of the data processing system: because different processors can run in parallel, dynamic load balancing further accelerates execution and improves overall system performance.
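A minimal sketch of the S206 adjustment rules, assuming the operator relation ("parallel", "serial", or unrelated) is supplied as an input; a real system would derive it from the execution tree, and the round-robin spreading below is one illustrative choice among many:

```python
def adjust(assignment, relation, processors):
    """Rebalance {operator: processor} according to the relation between ops."""
    ops = list(assignment)
    if relation == "serial":
        # Serially dependent operators are co-located on one processor.
        target = assignment[ops[0]]
        return {op: target for op in ops}
    # Parallelizable or unrelated operators each get their own processor
    # where possible (simple deterministic round-robin).
    return {op: processors[i % len(processors)] for i, op in enumerate(ops)}

adj = adjust({"A": "GPU", "B": "GPU"}, "parallel", ["GPU", "CPU"])
```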
S207: The data processing system stores important runtime information.

For example, the important runtime information may be the time at which the allocated processor finished the computation, the runtime data volume, the processor an operator resides on, and so on. Storing the runtime data volume may work as follows: when some operator A has been run on all processors with the same data volume, its running times are normalized and recorded under that data volume as normalized data, to serve as a reference factor for computing the weight values in the next run.
With the data processing method provided by this embodiment of the present invention, the data processing system forms an execution tree from the obtained query statement, where each tree node is an operator; traverses the operators on the execution tree to obtain each operator's allocation policy, determined from the performance parameters of each operator type on each processor in a set allocation rule table, the processors including heterogeneous processors; and allocates each operator to a processor according to its policy, so that the processor computes the operators allocated to it. Adding this allocation step for the operators on the execution tree lets a data processing system containing heterogeneous processors spread the operators relatively evenly across the processors, so that the system approaches its optimal overall performance.

FIG. 3 is a schematic structural diagram of an embodiment of a data processing system according to the present invention. As shown in FIG. 3, the data processing system 30 includes an execution tree construction module 301, an allocation module 302, and an execution module 303.
The execution tree construction module 301 is configured to form an execution tree according to the obtained query statement, where each tree node of the execution tree is an operator.

The allocation module 302 is configured to traverse the operators on the execution tree to obtain each operator's allocation policy, where the policy is determined according to the performance parameters of each operator type on each processor in a set allocation rule table, and the processors include heterogeneous processors.

The execution module 303 is configured to allocate each operator to a processor according to its allocation policy, so that the processor performs computation on the operators allocated to it.

Further, FIG. 4 is a schematic structural diagram of another embodiment of a data processing system according to the present invention. As shown in FIG. 4, the data processing system 30 further includes an obtaining module 304 and an allocation rule table construction module 305.

The obtaining module 304 is configured to obtain, according to operator type, the performance parameters of each type of operator on each processor.

The allocation rule table construction module 305 is configured to write the performance parameters of each type of operator on each processor into the allocation rule table, where the operator types include at least selection, projection, and grouping.

For example, the performance parameter is a performance score, or the performance parameters are a weight value together with a performance score, where the weight value is the weight of the performance score, determined according to performance-affecting factors: at least one of data volume, degree of parallelism, hardware model, and network speed.

Further, the allocation rule table construction module 305 is also configured to dynamically update the performance scores, where a dynamic update normalizes an operator's running speed on each processor and updates that operator's performance scores for each processor in the allocation rule table with the normalized values.

For example, if the performance parameters are a weight value and a performance score, the allocation module 302 may be configured to traverse the operators on the execution tree to obtain each operator's allocation policy, where the policy computes, for each processor in the set allocation rule table, the product of the operator's weight value and performance score and determines the processor with the highest product as the operator's processor.

Further, FIG. 5 is a schematic structural diagram of still another embodiment of a data processing system according to the present invention. As shown in FIG. 5, the execution module 303 includes a determining submodule 3031 and a scheduling submodule 3032.

The determining submodule 3031 is configured to determine, according to each operator's allocation policy, the processor allocated to that operator.

The scheduling submodule 3032 is configured to adjust the processors allocated to the operators according to the load of each processor and/or the parallelizability of the operators.

For example, the scheduling submodule 3032 may be configured to allocate at least two operators that can be computed in parallel to different processors for computation; or to allocate at least two operators that are computed serially to the same processor for computation; or, when the operators are unrelated to one another, to allocate each operator to a different processor for computation.

With the data processing system provided by this embodiment of the present invention, the data processing system forms an execution tree from the obtained query statement, where each tree node is an operator; traverses the operators on the execution tree to obtain each operator's allocation policy, determined from the performance parameters of each operator type on each processor in a set allocation rule table, the processors including heterogeneous processors; and allocates each operator to a processor according to its policy, so that the processor computes the operators allocated to it. Adding this allocation step for the operators on the execution tree lets a data processing system containing heterogeneous processors spread the operators relatively evenly across the processors, so that the system approaches its optimal overall performance.

FIG. 6 is a schematic structural diagram of a further embodiment of a data processing system according to the present invention. As shown in FIG. 6, the data processing system 40 includes a first processor 401, a second processor 402, and an allocator 403.
The first processor 401 is configured to form an execution tree according to the obtained query statement, where each tree node of the execution tree is an operator, and to traverse the operators on the execution tree to obtain each operator's allocation policy, where the policy is determined according to the performance parameters of each operator type on each second processor 402 in a set allocation rule table, and the second processors 402 include heterogeneous processors.

The allocator 403 is configured to allocate each operator to a second processor 402 according to its allocation policy, so that the second processor 402 performs computation on the operators allocated to it.

Further, the first processor 401 is also configured to obtain, according to operator type, the performance parameters of each type of operator on each second processor 402, and to write these performance parameters into the allocation rule table, where the operator types include at least selection, projection, and grouping.

For example, the performance parameter is a performance score, or the performance parameters are a weight value together with a performance score, where the weight value is the weight of the performance score, determined according to at least one performance-affecting factor such as data volume, degree of parallelism, hardware model, or network speed.

Further, the first processor 401 is also configured to dynamically update the performance scores, where a dynamic update normalizes an operator's running speed on each second processor 402 and updates that operator's performance scores for each second processor 402 in the allocation rule table with the normalized values.

For example, the performance parameters are a weight value and a performance score, and the first processor 401 is specifically configured to traverse the operators on the execution tree to obtain each operator's allocation policy, where the policy computes, for each second processor 402 in the set allocation rule table, the product of the operator's weight value and performance score and determines the second processor with the highest product as the operator's second processor.

The allocator 403 may be configured to determine, according to each operator's allocation policy, the second processor 402 allocated to that operator, and to adjust the second processors 402 allocated to the operators according to the load of each second processor 402 and/or the parallelizability of the operators.

For example, the allocator 403 is specifically configured to allocate at least two operators that can be computed in parallel to different second processors 402 for computation; or to allocate at least two operators that are computed serially to the same second processor 402 for computation; or, when the operators are unrelated to one another, to allocate each operator to a different second processor 402 for computation.

With the data processing system provided by this embodiment of the present invention, the data processing system forms an execution tree from the obtained query statement, where each tree node is an operator; traverses the operators on the execution tree to obtain each operator's allocation policy, determined from the performance parameters of each operator type on each processor in a set allocation rule table, the processors including heterogeneous processors; and allocates each operator to a processor according to its policy, so that the processor computes the operators allocated to it. Adding this allocation step for the operators on the execution tree lets a data processing system containing heterogeneous processors spread the operators relatively evenly across the processors, so that the system approaches its optimal overall performance.
A person of ordinary skill in the art will understand that all or some of the steps of the above method embodiments can be carried out by hardware instructed by a program. The aforementioned program may be stored in a computer-readable storage medium; when executed, it performs the steps of the above method embodiments. The aforementioned storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.

Finally, it should be noted that the above embodiments are merely intended to describe the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and that such modifications or replacements do not depart, in essence, from the scope of the technical solutions of the embodiments of the present invention.

Claims

Claims

1. A data processing method, comprising:

forming an execution tree according to an obtained query statement, wherein each tree node of the execution tree is an operator;

traversing the operators on the execution tree to obtain an allocation policy for each operator, wherein the allocation policy is determined according to performance parameters, in a set allocation rule table, of each type of operator on each processor, and the processors comprise heterogeneous processors; and

allocating each operator to a processor according to its allocation policy, so that the processor performs computation on the operators allocated to it.
2. The method according to claim 1, wherein before the forming an execution tree according to the obtained query statement, the method further comprises:

obtaining, according to operator type, the performance parameters of each type of operator on each processor; and

writing the performance parameters of each type of operator on each processor into the allocation rule table, wherein the operator types comprise at least selection, projection, and grouping.

3. The method according to claim 1 or 2, wherein the performance parameter is a performance score, or the performance parameters are a weight value and a performance score, the weight value being the weight of the performance score and being determined according to a performance-affecting factor, the performance-affecting factor being at least one of data volume, degree of parallelism, hardware model, and network speed.

4. The method according to claim 3, further comprising:

dynamically updating the performance scores, wherein the dynamic update uses values normalized from an operator's running speed on each processor to update that operator's performance scores for each processor in the allocation rule table.

5. The method according to any one of claims 1 to 4, wherein the performance parameters are the weight value and the performance score, and determining the allocation policy according to the performance parameters of each operator on each processor in the set allocation rule table comprises:

computing, for each processor in the set allocation rule table, the product of the operator's weight value and performance score, and determining the processor with the highest product as the operator's processor.

6. The method according to any one of claims 1 to 5, wherein allocating each operator to a processor according to its allocation policy comprises:

determining, according to each operator's allocation policy, the processor allocated to that operator; and adjusting the processors allocated to the operators according to the load of each processor and/or the parallelizability of the operators.

7. The method according to claim 6, wherein adjusting the processors allocated to the operators according to the parallelizability of the operators comprises:

when at least two of the operators can be computed in parallel, allocating the at least two operators to different processors for computation; or,

when at least two of the operators are computed serially, allocating the at least two operators to the same processor for computation; or,

when the operators are unrelated to one another, allocating each operator to a different processor for computation.
8. A data processing system, comprising:

an execution tree construction module, configured to form an execution tree according to an obtained query statement, wherein each tree node of the execution tree is an operator;

an allocation module, configured to traverse the operators on the execution tree to obtain an allocation policy for each operator, wherein the allocation policy is determined according to performance parameters, in a set allocation rule table, of each type of operator on each processor, and the processors comprise heterogeneous processors; and

an execution module, configured to allocate each operator to a processor according to its allocation policy, so that the processor performs computation on the operators allocated to it.

9. The system according to claim 8, further comprising:

an obtaining module, configured to obtain, according to operator type, the performance parameters of each type of operator on each processor; and

an allocation rule table construction module, configured to write the performance parameters of each type of operator on each processor into the allocation rule table, wherein the operator types comprise at least selection, projection, and grouping.

10. The system according to claim 8 or 9, wherein the performance parameter is a performance score, or the performance parameters are a weight value and a performance score, the weight value being the weight of the performance score and being determined according to a performance-affecting factor, the performance-affecting factor being at least one of data volume, degree of parallelism, hardware model, and network speed.

11. The system according to claim 10, wherein the allocation rule table construction module is further configured to dynamically update the performance scores, the dynamic update using values normalized from an operator's running speed on each processor to update that operator's performance scores for each processor in the allocation rule table.

12. The system according to any one of claims 8 to 11, wherein the performance parameters are the weight value and the performance score, and the allocation module is specifically configured to traverse the operators on the execution tree to obtain each operator's allocation policy, the policy computing, for each processor in the set allocation rule table, the product of the operator's weight value and performance score and determining the processor with the highest product as the operator's processor.

13. The system according to any one of claims 8 to 12, wherein the execution module comprises:

a determining submodule, configured to determine, according to each operator's allocation policy, the processor allocated to that operator; and

a scheduling submodule, configured to adjust the processors allocated to the operators according to the load of each processor and/or the parallelizability of the operators.

14. The system according to claim 13, wherein the scheduling submodule is specifically configured to: when at least two of the operators can be computed in parallel, allocate the at least two operators to different processors for computation; or, when at least two of the operators are computed serially, allocate the at least two operators to the same processor for computation; or, when the operators are unrelated to one another, allocate each operator to a different processor for computation.
15. A data processing system, comprising:

a first processor, configured to form an execution tree according to an obtained query statement, wherein each tree node of the execution tree is an operator, and to traverse the operators on the execution tree to obtain an allocation policy for each operator, wherein the allocation policy is determined according to performance parameters, in a set allocation rule table, of each type of operator on each second processor, and the second processors comprise heterogeneous processors; and

an allocator, configured to allocate each operator to a second processor according to its allocation policy, so that the second processor performs computation on the operators allocated to it.

16. The system according to claim 15, wherein the first processor is further configured to obtain, according to operator type, the performance parameters of each type of operator on each second processor, and to write the performance parameters of each type of operator on each second processor into the allocation rule table, wherein the operator types comprise at least selection, projection, and grouping.

17. The system according to claim 15 or 16, wherein the performance parameter is a performance score, or the performance parameters are a weight value and a performance score, the weight value being the weight of the performance score and being determined according to a performance-affecting factor, the performance-affecting factor being at least one of data volume, degree of parallelism, hardware model, and network speed.

18. The system according to claim 17, wherein the first processor is further configured to dynamically update the performance scores, the dynamic update using values normalized from an operator's running speed on each second processor to update that operator's performance scores for each second processor in the allocation rule table.

19. The system according to any one of claims 15 to 18, wherein the performance parameters are the weight value and the performance score, and the first processor is specifically configured to traverse the operators on the execution tree to obtain each operator's allocation policy, the policy computing, for each second processor in the set allocation rule table, the product of the operator's weight value and performance score and determining the second processor with the highest product as the operator's second processor.

20. The system according to any one of claims 15 to 19, wherein the allocator is specifically configured to determine, according to each operator's allocation policy, the second processor allocated to that operator, and to adjust the second processors allocated to the operators according to the load of each second processor and/or the parallelizability of the operators.

21. The system according to claim 20, wherein the allocator is specifically configured to: when at least two of the operators can be computed in parallel, allocate the at least two operators to different second processors for computation; or, when at least two of the operators are computed serially, allocate the at least two operators to the same second processor for computation; or, when the operators are unrelated to one another, allocate each operator to a different second processor for computation.
PCT/CN2014/074798 2014-04-04 2014-04-04 一种数据处理方法及系统 WO2015149350A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201480000560.5A CN105210059B (zh) 2014-04-04 2014-04-04 一种数据处理方法及系统
PCT/CN2014/074798 WO2015149350A1 (zh) 2014-04-04 2014-04-04 一种数据处理方法及系统

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2014/074798 WO2015149350A1 (zh) 2014-04-04 2014-04-04 一种数据处理方法及系统

Publications (1)

Publication Number Publication Date
WO2015149350A1 true WO2015149350A1 (zh) 2015-10-08

Family

ID=54239321

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/074798 WO2015149350A1 (zh) 2014-04-04 2014-04-04 一种数据处理方法及系统

Country Status (2)

Country Link
CN (1) CN105210059B (zh)
WO (1) WO2015149350A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106980533A (zh) * 2016-01-18 2017-07-25 杭州海康威视数字技术股份有限公司 基于异构处理器的任务调度方法、装置及电子设备
CN114490193A (zh) * 2022-04-14 2022-05-13 网络通信与安全紫金山实验室 一种面向异构冗余系统的恢复方法及装置

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10877795B2 (en) * 2018-07-25 2020-12-29 Microsoft Technology Licensing, Llc Automatic tuning of a dataflow execution graph
CN110069527B (zh) * 2019-04-22 2021-05-14 电子科技大学 一种面向数据库的gpu和cpu异构加速方法
CN112699140B (zh) * 2019-10-23 2023-12-26 阿里巴巴集团控股有限公司 数据处理方法、装置、设备和存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102789488A (zh) * 2012-06-29 2012-11-21 用友软件股份有限公司 数据查询处理系统和数据查询处理方法
CN102831011A (zh) * 2012-08-10 2012-12-19 上海交通大学 一种基于众核系统的任务调度方法及装置
CN103034614A (zh) * 2012-12-19 2013-04-10 哈尔滨理工大学 基于关键路径和任务复制的单任务多核调度方法
WO2013080152A1 (en) * 2011-12-01 2013-06-06 International Business Machines Corporation Dynamically configurable placement engine

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8688622B2 (en) * 2008-06-02 2014-04-01 The Boeing Company Methods and systems for loading data into a temporal data warehouse
US8683468B2 (en) * 2011-05-16 2014-03-25 Advanced Micro Devices, Inc. Automatic kernel migration for heterogeneous cores

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013080152A1 (en) * 2011-12-01 2013-06-06 International Business Machines Corporation Dynamically configurable placement engine
CN102789488A (zh) * 2012-06-29 2012-11-21 用友软件股份有限公司 数据查询处理系统和数据查询处理方法
CN102831011A (zh) * 2012-08-10 2012-12-19 上海交通大学 一种基于众核系统的任务调度方法及装置
CN103034614A (zh) * 2012-12-19 2013-04-10 哈尔滨理工大学 基于关键路径和任务复制的单任务多核调度方法

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106980533A (zh) * 2016-01-18 2017-07-25 杭州海康威视数字技术股份有限公司 基于异构处理器的任务调度方法、装置及电子设备
CN106980533B (zh) * 2016-01-18 2020-04-28 杭州海康威视数字技术股份有限公司 基于异构处理器的任务调度方法、装置及电子设备
CN114490193A (zh) * 2022-04-14 2022-05-13 网络通信与安全紫金山实验室 一种面向异构冗余系统的恢复方法及装置
CN114490193B (zh) * 2022-04-14 2022-09-02 网络通信与安全紫金山实验室 一种面向异构冗余系统的恢复方法及装置

Also Published As

Publication number Publication date
CN105210059A (zh) 2015-12-30
CN105210059B (zh) 2018-12-07

Similar Documents

Publication Publication Date Title
CN110168516B (zh) 用于大规模并行处理的动态计算节点分组方法及系统
US10558672B1 (en) System and method for executing queries on multi-graphics processing unit systems
WO2019128475A1 (zh) 数据训练方法及装置、存储介质、电子装置
WO2015149350A1 (zh) 一种数据处理方法及系统
US20130151535A1 (en) Distributed indexing of data
CN105550274B (zh) 双副本并行数据库的查询方法和装置
JP2021513694A (ja) ダークローンチ実現方法、装置、計算ノード及びシステム
US9329906B2 (en) Virtual machine mobility using resource pools
US20170193077A1 (en) Load balancing for large in-memory databases
TWI786564B (zh) 任務調度方法和裝置、儲存媒體及計算機設備
CN112328378A (zh) 任务调度方法、计算机设备及存储介质
JP2018515844A (ja) データ処理方法及びシステム
US20190056942A1 (en) Method and apparatus for hardware acceleration in heterogeneous distributed computing
WO2021057465A1 (zh) 一种对深度学习模型进行并行处理的方法及装置
US20240289015A1 (en) Data access of distributed graph learning architecture
US20210390405A1 (en) Microservice-based training systems in heterogeneous graphic processor unit (gpu) cluster and operating method thereof
CN109241193B (zh) 分布式数据库的处理方法和装置,及服务器集群
CN106502790A (zh) 一种基于数据分布的任务分配优化方法
US11016947B1 (en) Apparatus and method for recommending and maintaining analytical views
US20240176657A1 (en) Task processing method and apparatus, electronic device, storage medium and program product
US11675515B2 (en) Intelligent partitioning engine for cluster computing
US9298769B1 (en) Method and apparatus to facilitate discrete-device accelertaion of queries on structured data
CN109582461A (zh) 一种针对Linux容器的资源部署方法及系统
CN110196879B (zh) 数据处理方法、装置、计算设备及存储介质
CN104156268B (zh) 一种GPU上MapReduce的负载分配和线程结构优化方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14888118

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14888118

Country of ref document: EP

Kind code of ref document: A1