WO2020211718A1 - A data processing method, apparatus and device


Info

Publication number: WO2020211718A1
Application number: PCT/CN2020/084425
Authority: WO, WIPO (PCT)
Prior art keywords: operator, resource, processed, target, cost value
Other languages: English (en), French (fr)
Inventors: 周祥, 王烨, 李鸣翔
Original assignee: 阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
Application filed by 阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
Publication of WO2020211718A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005: Allocation of resources to service a request
    • G06F 9/5027: Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals

Description

  • This application relates to the field of Internet technology, and in particular to a data processing method, device and equipment.
  • Data Lake Analytics is used to provide users with serverless query analysis services, which can analyze and query massive amounts of data in any dimension, supporting high concurrency, low latency (millisecond response), real-time online analysis, massive data queries, and other functions.
  • The data lake analysis system may include a database and a computing node: the database is used to store a large amount of data, and the computing node is used to receive an execution plan and process the data in the database according to the execution plan.
  • The data lake analysis system provides multiple types of computing resources, for example CPU (Central Processing Unit) resources, FPGA (Field Programmable Gate Array) resources, GPU (Graphics Processing Unit) resources, etc.; computing nodes can use these computing resources to process the data.
  • This application provides a data processing method, the method includes:
  • obtaining a to-be-processed operator corresponding to a data processing request; obtaining cost values of the to-be-processed operator under multiple resource types; selecting a target resource type from the multiple resource types according to the cost values; and executing the to-be-processed operator through the computing resource corresponding to the target resource type.
  • This application provides a data processing method applied to a data lake analysis platform, the data lake analysis platform is used to provide users with serverless data processing services, and the method includes:
  • obtaining a to-be-processed operator corresponding to a data processing request; obtaining cost values of the to-be-processed operator under multiple resource types; selecting a target resource type from the multiple resource types according to the cost values; and executing the to-be-processed operator through the computing resource corresponding to the target resource type, where the computing resource corresponding to the target resource type executes the to-be-processed operator based on the cloud database provided by the data lake analysis platform.
  • This application provides a data processing method, the method includes:
  • executing a designated operator through the computing resource corresponding to a designated resource type, and obtaining the cost value of the computing resource during the execution process; wherein the designated resource type is any one of multiple resource types, and the designated operator is any one of multiple operators;
  • generating an operator resource registry, where the operator resource registry includes the correspondence between the designated operator, the designated resource type, and the cost value of the computing resource;
  • wherein the operator resource registry is used to determine the cost value corresponding to the to-be-processed operator of a data processing request, determine the target resource type of the to-be-processed operator according to the cost value, and execute the to-be-processed operator through the computing resource corresponding to the target resource type.
  • This application provides a data processing method, the method includes:
  • Target execution plan corresponding to the original execution plan, the target execution plan including the to-be-processed operator and the target resource type corresponding to the to-be-processed operator;
  • the target execution plan is sent to the computing resource corresponding to the target resource type, so that the computing resource executes the to-be-processed operator according to the target execution plan.
  • This application provides a data processing device, the device includes:
  • the obtaining module is configured to obtain the to-be-processed operator corresponding to the data processing request, and to obtain the cost value of the to-be-processed operator under multiple resource types;
  • the selection module is used to select a target resource type from the multiple resource types according to the cost value
  • the processing module is configured to execute the to-be-processed operator through the computing resource corresponding to the target resource type.
  • This application provides a data processing device, including: a processor and a machine-readable storage medium, where the machine-readable storage medium stores several computer instructions, and the processor performs the following processing when executing the computer instructions:
  • obtaining a to-be-processed operator corresponding to a data processing request; obtaining cost values of the to-be-processed operator under multiple resource types; selecting a target resource type from the multiple resource types according to the cost values; and executing the to-be-processed operator through the computing resource corresponding to the target resource type.
  • It can be seen from the above technical solution that, in the embodiments of this application, the cost values of the to-be-processed operator under multiple resource types can be obtained, a target resource type can be selected from the multiple resource types according to the cost values, and the computing resource corresponding to the target resource type executes the to-be-processed operator.
  • In this way, a target computing resource, such as a CPU resource, an FPGA resource, or a GPU resource, can be selected for the to-be-processed operator, and different to-be-processed operators can correspond to different target computing resources. The target computing resources can therefore be selected reasonably and an optimal execution plan obtained, achieving higher processing performance and a better user experience.
  • FIG. 1 is a schematic flowchart of a data processing method in an embodiment of the present application.
  • FIGS. 2 and 3 are schematic diagrams of application scenarios in an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of a data processing method in another embodiment of the present application.
  • FIG. 5 is a processing schematic diagram of the optimizer of the front-end node in an embodiment of the present application.
  • FIGS. 6 and 7 are schematic diagrams of the processing of the SQL operator execution unit in an embodiment of the present application.
  • FIG. 8 is a schematic diagram of processing a target execution plan in an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a data processing device in an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a data processing device in an embodiment of the present application.
  • The terms first, second, third, etc. may be used in the embodiments of the present application to describe various information, but the information should not be limited to these terms. These terms are only used to distinguish information of the same type from each other. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information.
  • Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
  • This embodiment of the application proposes a data processing method, which can be applied to any device, such as any device of a data lake analysis system.
  • FIG. 1 is a flowchart of the method, and the method may include the following steps:
  • Step 101 Obtain a to-be-processed operator corresponding to a data processing request.
  • Step 102 Obtain the corresponding cost value of the to-be-processed operator under multiple resource types.
  • For example, the operator resource registry can be queried with the to-be-processed operator to obtain the cost values of the to-be-processed operator under the multiple resource types.
  • The operator resource registry can be pre-generated, and it may include the correspondence between operators, resource types, and cost values.
  • Before obtaining the cost values of the to-be-processed operator under multiple resource types, the method may also include: executing a designated operator through the computing resource corresponding to a designated resource type, and acquiring the cost value of the computing resource during the execution process; the designated resource type is any one of multiple resource types, and the designated operator is any one of multiple operators.
  • Then, an operator resource registry may be generated, which includes the correspondence between the designated operator, the designated resource type, and the cost value of the computing resource.
  • Step 103 Select a target resource type from multiple resource types according to the cost value.
  • the minimum cost value can be selected from the cost values corresponding to the multiple resource types of the operator to be processed, and the resource type corresponding to the minimum cost value can be determined as the target resource type of the operator to be processed.
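The selection in Steps 102 and 103 can be sketched as follows. This is a minimal illustration, assuming the operator resource registry is modeled as a plain dictionary keyed by (operator, resource type); only the scan operator's cost values of 0.001 under FPGA and 0.01 under GPU come from the example in this application, and the sort entries are hypothetical placeholders.

```python
# Sketch of Steps 102-103: look up the cost values of a to-be-processed
# operator under multiple resource types, then pick the resource type
# with the minimum cost value as the target resource type.
# Scan costs (FPGA 0.001, GPU 0.01) follow this document's example;
# the sort entries are hypothetical.
OPERATOR_RESOURCE_REGISTRY = {
    ("scan", "FPGA"): 0.001,
    ("scan", "GPU"): 0.01,
    ("sort", "FPGA"): 0.02,   # hypothetical
    ("sort", "GPU"): 0.005,   # hypothetical
}

def select_target_resource_type(operator):
    """Return the resource type whose cost value is minimal for `operator`."""
    candidates = {rtype: cost
                  for (op, rtype), cost in OPERATOR_RESOURCE_REGISTRY.items()
                  if op == operator}
    if not candidates:
        raise KeyError("operator %r is not in the registry" % operator)
    return min(candidates, key=candidates.get)

print(select_target_resource_type("scan"))  # prints FPGA (minimum cost 0.001)
```

With these entries, the scan operator resolves to the FPGA resource type and the sort operator to the GPU resource type, matching the per-operator selection described above.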
  • Step 104 Execute the to-be-processed operator through the computing resource corresponding to the target resource type.
  • obtaining the to-be-processed operator corresponding to the data processing request may include: obtaining an original execution plan corresponding to the data processing request, where the original execution plan includes the to-be-processed operator.
  • Executing the to-be-processed operator through the computing resource corresponding to the target resource type may include, but is not limited to: obtaining a target execution plan corresponding to the original execution plan, the target execution plan including the to-be-processed operator and the target resource type, and sending the target execution plan to the computing resource corresponding to the target resource type, so that the computing resource executes the target execution plan; the process of the computing resource executing the target execution plan is the process of executing each to-be-processed operator in the target execution plan.
  • For each to-be-processed operator in the target execution plan, data processing needs to be performed according to that operator. For example, if the operator to be processed is a scan operator, the data needs to be scanned; if it is a filter operator, the data needs to be filtered; if it is a sort operator, the data needs to be sorted; there is no restriction on this.
  • Obtaining the target execution plan corresponding to the original execution plan may include, but is not limited to: if there are multiple original execution plans corresponding to the data processing request, then for each of the multiple original execution plans, the total cost value corresponding to that original execution plan can be obtained according to the cost values, under the target resource types, of the to-be-processed operators in that plan. The original execution plan with the smallest total cost value is then selected from the multiple original execution plans, and the target execution plan corresponding to it is obtained.
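This plan selection can be sketched as follows, assuming each candidate plan is a list of (operator, target resource type) pairs whose costs are looked up in a registry dictionary; all names and numbers here are illustrative.

```python
# Sketch: compute each candidate original execution plan's total cost
# value (summing each operator's cost under its target resource type)
# and keep the plan with the smallest total. Costs are illustrative.
def total_cost(plan, registry):
    return sum(registry[(op, rtype)] for op, rtype in plan)

def pick_cheapest_plan(plans, registry):
    return min(plans, key=lambda plan: total_cost(plan, registry))

registry = {("scan", "FPGA"): 0.001, ("sort", "GPU"): 0.005,
            ("scan", "GPU"): 0.01, ("sort", "FPGA"): 0.02}
plan_a = [("scan", "FPGA"), ("sort", "GPU")]   # total approx. 0.006
plan_b = [("scan", "GPU"), ("sort", "FPGA")]   # total approx. 0.03
assert pick_cheapest_plan([plan_a, plan_b], registry) == plan_a
```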
  • The resource type may include, but is not limited to, one or more of the following: CPU resource type, FPGA resource type, GPU resource type.
  • The CPU resource type, FPGA resource type, and GPU resource type are just examples; other resource types may also be used, and there is no restriction on this.
  • the above execution order is just an example given for the convenience of description. In practical applications, the execution order between the steps can also be changed, and the execution order is not limited. Moreover, in other embodiments, the steps of the corresponding method are not necessarily executed in the order shown and described in this specification, and the steps included in the method may be more or less than those described in this specification. In addition, a single step described in this specification may be decomposed into multiple steps for description in other embodiments; multiple steps described in this specification may also be combined into a single step for description in other embodiments.
  • It can be seen from the above technical solution that, in the embodiments of this application, the cost values of the to-be-processed operator under multiple resource types can be obtained, a target resource type can be selected from the multiple resource types according to the cost values, and the computing resource corresponding to the target resource type executes the to-be-processed operator.
  • In this way, a target computing resource, such as a CPU resource, an FPGA resource, or a GPU resource, can be selected for the to-be-processed operator, and different to-be-processed operators can correspond to different target computing resources. The target computing resources can therefore be selected reasonably and an optimal execution plan obtained, achieving higher processing performance and a better user experience.
  • Another data processing method is proposed in the embodiments of this application, which can be applied to a data lake analysis platform used to provide users with serverless data processing services.
  • The method includes: obtaining a to-be-processed operator corresponding to a data processing request; obtaining cost values of the to-be-processed operator under multiple resource types; selecting a target resource type from the multiple resource types according to the cost values; and executing the to-be-processed operator through the computing resource corresponding to the target resource type, wherein the computing resource corresponding to the target resource type executes the to-be-processed operator based on the cloud database provided by the data lake analysis platform.
  • the computing resource may specifically be: a CPU cloud server for providing CPU resources; or, an FPGA cloud server for providing FPGA resources; or, a GPU cloud server for providing GPU resources.
  • the cloud database in this embodiment refers to the database provided by the data lake analysis platform.
  • The data lake analysis platform may be a storage cloud platform that mainly stores data, a computing cloud platform that focuses on data processing, or a comprehensive cloud computing platform that takes into account both computing and data storage and processing; there is no restriction on the data lake analysis platform.
  • The cloud database provided by the data lake analysis platform can be used to provide users with serverless query analysis services, which can analyze and query massive data in any dimension, supporting high concurrency, low latency (millisecond response), real-time online analysis, massive data queries, and other functions.
  • In this embodiment, before obtaining the cost values of the to-be-processed operator under multiple resource types, the method includes: executing a designated operator through the computing resource corresponding to a designated resource type, and obtaining the cost value of the computing resource during the execution process; the designated resource type is any one of the multiple resource types, and the designated operator is any one of the multiple operators.
  • Then, the operator resource registry is generated; it can include the correspondence between the designated operator, the designated resource type, and the cost value of the computing resource. The operator resource registry is used to determine the cost value corresponding to the to-be-processed operator of a data processing request, determine the target resource type of the to-be-processed operator according to the cost value, and execute the to-be-processed operator through the computing resource corresponding to the target resource type.
  • In this embodiment, the method includes: obtaining an original execution plan corresponding to the data processing request, where the original execution plan may include the to-be-processed operator; obtaining the cost values of the to-be-processed operator under multiple resource types; selecting a target resource type from the multiple resource types according to the cost values; obtaining a target execution plan corresponding to the original execution plan, the target execution plan including the to-be-processed operator and the target resource type corresponding to it; and sending the target execution plan to the computing resource corresponding to the target resource type, so that the computing resource executes the to-be-processed operator according to the target execution plan.
  • Obtaining the target execution plan corresponding to the original execution plan may include: if there are multiple original execution plans corresponding to the data processing request, then for each of the multiple original execution plans, obtaining the total cost value of that plan according to the cost values, under the target resource types, of its to-be-processed operators; selecting the original execution plan with the smallest total cost value from the multiple original execution plans; and obtaining the target execution plan corresponding to that plan.
  • FIG. 2 is a schematic diagram of the application scenario of this embodiment of the application.
  • As shown in FIG. 2, the scenario can include clients, a load balancing device, front-end nodes (front nodes, which can also be called front-end servers), computing nodes (compute nodes, which can also be called computing servers), and data sources; together these can form a system such as a data lake analysis system.
  • The data lake analysis system can also include other servers, and there is no restriction on the system structure.
  • FIG. 2 shows a number of front-end nodes as an example; in practical applications, the number of front-end nodes can also be other numbers, which is not limited. Similarly, FIG. 2 shows 4 computing nodes as an example; in actual applications, the number of computing nodes can also be other numbers, and there is no restriction on this.
  • The client can be an APP (application) included in a terminal device (such as a PC (personal computer), notebook computer, or mobile terminal), or a browser included in the terminal device; this is not limited.
  • the load balancing device is used to load balance the data processing request, for example, after receiving the data processing request, load balance the data processing request to each front-end node.
  • multiple front-end nodes are used to provide the same function and form a resource pool of front-end nodes.
  • Each front-end node in the resource pool is used to receive the data processing request sent by the client, perform SQL (Structured Query Language) parsing on the data processing request, generate an execution plan based on the parsing result, and process the execution plan.
  • the front-end node may send the execution plan to the computing node, and the computing node will process the execution plan.
  • For example, the execution plan can be sent to one computing node, which processes the whole plan; or the execution plan can be disassembled into multiple sub-plans, which are sent to multiple computing nodes so that each node handles a sub-plan.
  • multiple computing nodes are used to provide the same function and form a resource pool of computing nodes. For each computing node in the resource pool, if the execution plan sent by the front-end node is received, the execution plan can be processed; or, if the sub-plan sent by the front-end node is received, the sub-plan can be processed.
  • the data source is used to store various types of data, and there is no restriction on the data type, such as user data, product data, map data, video data, image data, audio data, etc.
  • the data source may include a database.
  • it may be a scenario for heterogeneous data sources, that is, these data sources may be the same type of database or different types of databases.
  • the data source can be a relational database or a non-relational database.
  • The type of data source may include, but is not limited to: OSS (Object Storage Service), Table Store (table storage), HBase (Hadoop Database), HDFS (Hadoop Distributed File System), MySQL (a relational database), RDS (Relational Database Service), DRDS (Distributed Relational Database Service), RDBMS (Relational Database Management System), SQL Server (a relational database), PostgreSQL (an object-relational database), MongoDB (a database based on distributed file storage), etc.
  • In this scenario, hardware acceleration technology can be used to accelerate specific computing tasks, such as SQL operators.
  • For example, FPGAs and GPUs can be used to accelerate specific computing tasks: FPGAs provide acceleration through programmable logic gate arrays, while GPUs provide acceleration for large-scale multi-core computing tasks.
  • computing resources may include a CPU cloud server pool, a GPU cloud server pool, and an FPGA cloud server pool.
  • the CPU cloud server pool may include multiple CPU cloud servers, and these CPU cloud servers are used to provide CPU resources, that is, perform data processing through the CPU resources.
  • the GPU cloud server pool includes multiple GPU cloud servers, which are used to provide GPU resources, that is, perform data processing through GPU resources.
  • the FPGA cloud server pool includes multiple FPGA cloud servers, and these FPGA cloud servers are used to provide FPGA resources, that is, perform data processing through FPGA resources.
  • On the basis of the above application scenario, a data processing method is provided in this embodiment of the application.
  • In this method, a cloud server can be reasonably selected from the CPU cloud servers, GPU cloud servers, and FPGA cloud servers, fusing multiple hardware resources for heterogeneous SQL computing under unified scheduling.
  • In this embodiment, an operator resource registry can be generated in advance; the operator resource registry can include the correspondence between operators, resource types, and cost values.
  • The cost value can be a time cost value, a resource cost value, a combined time-and-resource cost value, or another type of cost value; there is no restriction on the type of cost value.
  • For convenience, the time cost value is used as an example in the following description; other cost value types can be handled by analogy with the time cost value.
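As a hedged sketch of how such a time cost value might be measured (not necessarily the exact procedure of this application): run an operator of the SQL reference task on a given resource and record the elapsed time per processed block. `run_operator` below is a hypothetical stand-in for dispatching the operator to a CPU, GPU, or FPGA cloud server.

```python
import time

def measure_time_cost(run_operator, num_blocks):
    """Run an operator once and return its time cost per block processed."""
    start = time.perf_counter()
    run_operator()                      # e.g. execute the scan operator
    elapsed = time.perf_counter() - start
    return elapsed / num_blocks         # the "time cost unit" per block

# Registering the measured value under (operator, resource type):
registry = {}
registry[("scan", "CPU")] = measure_time_cost(lambda: sorted(range(100000)), 100)
```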
  • A SQL reference task can include various types of operators, such as the scan operator, filter operator, hash operator (for hash join and aggregation), sort operator, input operator, output operator, join operator, and agg (aggregation) operator.
  • First, the SQL reference task is processed through the computing resource corresponding to the CPU resource type (i.e., a CPU cloud server), that is, each operator in the SQL reference task is executed by the CPU cloud server. For example, a CPU cloud server with a fixed number of computing units and fixed network bandwidth can be used to run the SQL reference task and execute each of its operators.
  • During execution, the cost value of each operator (the SQL operator cost unit) can be counted; Table 1 shows the cost value of each operator under the CPU resource type, taking the time cost value as an example. In addition, the CPU cloud server's cost value for processing one block (data block, i.e., the time cost unit) and for processing one page (a data page composed of multiple blocks) can also be counted.
  • Similarly, the SQL reference task is processed by the computing resource corresponding to the GPU resource type (i.e., a GPU cloud server), that is, each operator in the SQL reference task is executed by the GPU cloud server, for example a GPU cloud server with a fixed number of computing units and fixed network bandwidth.
  • During execution, the cost value of each operator (the SQL operator cost unit) can be counted; Table 1 shows the cost value of each operator under the GPU resource type, taking the time cost value as an example. The GPU cloud server's cost value for processing one block (the time cost unit) and for processing one page (a data page composed of multiple blocks) can also be counted.
  • Likewise, the SQL reference task is processed through the computing resource corresponding to the FPGA resource type (i.e., an FPGA cloud server), that is, each operator in the SQL reference task is executed by the FPGA cloud server, for example an FPGA cloud server with a fixed number of computing units and fixed network bandwidth.
  • During execution, the cost value of each operator (the SQL operator cost unit) can be counted; Table 1 shows the cost value of each operator under the FPGA resource type, taking the time cost value as an example. The FPGA cloud server's cost value for processing one block (the time cost unit) and for processing one page (a data page composed of multiple blocks) can also be counted.
  • Table 1 is an example of the operator resource registry. This operator resource registry is just an example; in practical applications, there can be more resource types and more operators.
  • FIG. 4 is a schematic flowchart of the data processing method proposed in the embodiment of this application, the method can be applied to a front-end node, and the method can include the following steps:
  • Step 401 Obtain a data processing request, such as a SQL (Structured Query Language) data processing request; there is no restriction on the type of the data processing request.
  • Step 402 Obtain an original execution plan according to the data processing request, that is, the original execution plan corresponding to the data processing request, and the original execution plan may include multiple to-be-processed operators.
  • the original execution plan may include the following operators to be processed: scan operator, filter operator, hash operator, sort operator, input operator, and output operator. These operators will be described as examples in the following.
  • For example, the data processing request is a SQL data processing request written by the user. This data processing request can be converted into a machine-executable execution plan, which describes the specific execution steps and can be generated and optimized by the optimizer of the front-end node.
  • In this embodiment, this execution plan is called the original execution plan, and there is no restriction on the generation process of the original execution plan.
  • The original execution plan may include multiple to-be-processed operators (also called nodes), and each to-be-processed operator may represent a calculation step; there is no restriction on the types of the to-be-processed operators.
  • Step 403 Query the operator resource registry through the to-be-processed operator (that is, each to-be-processed operator in the original execution plan) to obtain the corresponding cost value of the to-be-processed operator under multiple resource types.
  • Step 404 Select the minimum cost value from the cost value corresponding to the multiple resource types of the operator to be processed, and determine the resource type corresponding to the minimum cost value as the target resource type of the operator to be processed.
  • For example, suppose the original execution plan includes the following to-be-processed operators: scan operator, filter operator, hash operator, sort operator, input operator, and output operator. Taking the scan operator as an example, suppose its cost value under the FPGA resource type is 0.001 and its cost value under the GPU resource type is 0.01; then the FPGA resource type corresponding to the minimum cost value 0.001 can be determined as the target resource type of the scan operator.
  • the target resource type of the filter operator is the FPGA resource type
  • the target resource type of the hash operator is the FPGA resource type
  • the target resource type of the sort operator is the GPU resource type
  • the target resource type of the input operator is the FPGA resource type.
  • the target resource type of the output operator is the FPGA resource type.
  • Step 405 Obtain a target execution plan corresponding to the original execution plan, where the target execution plan includes to-be-processed operators and target resource types.
  • The target execution plan includes the to-be-processed operators and can also include the target resource type corresponding to each to-be-processed operator, indicating that the computing resource corresponding to that target resource type needs to execute the operator.
  • For example, the target execution plan may include, but is not limited to: the correspondence between the scan operator and the FPGA resource type, between the filter operator and the FPGA resource type, between the hash operator and the FPGA resource type, between the sort operator and the GPU resource type, between the input operator and the FPGA resource type, and between the output operator and the FPGA resource type.
  • the target execution plan may also include other content, and there is no restriction on the content of the target execution plan.
  • Step 406 Send the target execution plan to the computing resource corresponding to the target resource type, so that the computing resource executes the target execution plan, that is, executes each to-be-processed operator in the target execution plan.
  • For example, the target execution plan can be sent to the FPGA cloud server corresponding to the FPGA resource type, so that the FPGA cloud server obtains from the target execution plan the to-be-processed operators it needs to process, that is, the operators whose target resource type is the FPGA resource type, such as the scan, filter, hash, input, and output operators. The FPGA cloud server can then use the target execution plan to process these to-be-processed operators; there is no restriction on this processing process.
  • In addition, the target execution plan can also be sent to the GPU cloud server corresponding to the GPU resource type, so that the GPU cloud server obtains from the target execution plan the to-be-processed operators it needs to process, that is, the operators whose target resource type is the GPU resource type, such as the sort operator.
  • The GPU cloud server can then use the target execution plan to process the sort operator; there is no restriction on this.
  • As another example, the target execution plan can be divided into sub-plan 1 and sub-plan 2.
  • Sub-plan 1 can include the to-be-processed operators whose target resource type is the FPGA resource type, such as the scan, filter, hash, input, and output operators.
  • Sub-plan 2 may include the to-be-processed operators whose target resource type is the GPU resource type, such as the sort operator. Sub-plan 1 can then be sent to the FPGA cloud server corresponding to the FPGA resource type, so that the FPGA cloud server uses sub-plan 1 to process the scan, filter, hash, input, and output operators; there is no restriction on this process.
  • Similarly, sub-plan 2 can be sent to the GPU cloud server corresponding to the GPU resource type, so that the GPU cloud server uses sub-plan 2 to process the sort operator; this processing process is likewise not limited.
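The sub-plan splitting described above can be sketched as grouping the operators of the target execution plan by their target resource type. The function name and the plan representation below are illustrative assumptions, not the actual implementation.

```python
# Hypothetical sketch: split a target execution plan (operator -> target
# resource type) into one sub-plan per resource type, so each sub-plan can
# be sent to the corresponding cloud server (FPGA, GPU, or CPU).
from collections import defaultdict

def split_into_subplans(target_plan: dict) -> dict:
    subplans = defaultdict(list)
    for operator, resource_type in target_plan.items():
        subplans[resource_type].append(operator)
    return dict(subplans)

target_plan = {"scan": "FPGA", "filter": "FPGA", "hash": "FPGA",
               "sort": "GPU", "input": "FPGA", "output": "FPGA"}
subplans = split_into_subplans(target_plan)
# subplans["FPGA"] corresponds to sub-plan 1 (scan, filter, hash, input,
# output); subplans["GPU"] corresponds to sub-plan 2 (sort).
```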
  • If there are multiple original execution plans corresponding to the data processing request, obtaining the target execution plan may include: for each original execution plan, determine the target resource type of each to-be-processed operator in that plan, and determine the cost value of each operator under its target resource type, that is, obtain the cost value of every to-be-processed operator. Sum the cost values of all the to-be-processed operators to obtain the total cost value corresponding to that original execution plan; in this way, the total cost value of each original execution plan is obtained. Then select the original execution plan with the smallest total cost value, and obtain the target execution plan corresponding to it.
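Choosing among multiple original execution plans, as described above, reduces to summing per-operator cost values and taking the minimum. A minimal sketch, with illustrative names and example cost values:

```python
# Hypothetical sketch: each original execution plan is represented by the
# cost values of its operators under their target resource types; the plan
# with the smallest total cost value is selected.
def total_cost(operator_costs: list) -> float:
    return sum(operator_costs)

def cheapest_plan(plans: dict) -> str:
    """plans maps a plan identifier to its per-operator cost values."""
    return min(plans, key=lambda name: total_cost(plans[name]))

plans = {
    "plan_a": [0.001, 0.001, 0.002, 0.02],  # e.g. scan, filter, hash, sort
    "plan_b": [0.001, 0.002, 0.03, 0.04],
}
print(cheapest_plan(plans))  # plan_a (total 0.024 vs 0.073)
```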
  • Optionally, for each to-be-processed operator, if the computing resource corresponding to its target resource type is limited and cannot process the operator, the target resource type corresponding to the to-be-processed operator can be re-determined after excluding that resource type.
  • For the specific determination method, refer to the above-mentioned embodiment, which will not be repeated here.
  • Optionally, if neither the FPGA cloud server nor the GPU cloud server can process a to-be-processed operator, the CPU cloud server can be used to process it, that is, the target resource type of that to-be-processed operator is the CPU resource type.
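The two fallback rules above (re-determining the target resource type after excluding an unavailable resource, and defaulting to the CPU resource) can be sketched together. The function name is an illustrative assumption.

```python
# Hypothetical sketch: exclude unavailable resource types and re-pick the
# minimum-cost type; if no candidate remains, fall back to the CPU resource.
def redetermine_target(costs: dict, unavailable: set) -> str:
    candidates = {r: c for r, c in costs.items() if r not in unavailable}
    if not candidates:
        return "CPU"
    return min(candidates, key=candidates.get)

scan_costs = {"CPU": 0.1, "FPGA": 0.001, "GPU": 0.01}
print(redetermine_target(scan_costs, {"FPGA"}))         # GPU
print(redetermine_target(scan_costs, {"FPGA", "GPU"}))  # CPU
```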
  • As shown in FIG. 5, the input data of the optimizer of the front-end node are the data processing request and the operator resource registry.
  • Based on these, the target execution plan can be obtained.
  • The target execution plan includes multiple to-be-processed operators and the target resource types corresponding to those operators.
  • Then, the target execution plan is output to the computing resource corresponding to each target resource type, such as a CPU cloud server, a GPU cloud server, or an FPGA cloud server.
  • the optimizer can generate a SQL distributed execution plan that integrates the CPU cloud server pool, FPGA cloud server pool, and GPU cloud server pool.
  • As shown in FIG. 6, the SQL operator execution unit may include an input buffer, an output buffer, a software processing module, an FPGA processing module, and a GPU processing module. Further, for the CPU cloud server, the SQL operator execution unit may include an input buffer, an output buffer, and a software processing module. For the FPGA cloud server, it may include an input buffer, an output buffer, a software processing module, and an FPGA processing module. For the GPU cloud server, it may include an input buffer, an output buffer, a software processing module, and a GPU processing module.
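The per-server composition of the SQL operator execution unit described above can be summarized as a base set of modules plus an optional hardware processing module. The structure below is an illustrative sketch, not the actual implementation.

```python
# Hypothetical sketch: every SQL operator execution unit has input/output
# buffers and a software processing module; the FPGA and GPU cloud servers
# additionally carry their respective hardware processing module.
BASE_MODULES = ["input buffer", "output buffer", "software processing module"]

EXECUTION_UNIT = {
    "CPU cloud server":  list(BASE_MODULES),
    "FPGA cloud server": BASE_MODULES + ["FPGA processing module"],
    "GPU cloud server":  BASE_MODULES + ["GPU processing module"],
}
```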
  • FIG. 8 shows a schematic diagram of the processing of a target execution plan.
  • In it, the scan operator and the filter operator are processed by the FPGA cloud server, the agg operator is processed by the CPU cloud server, the hash join operator is processed by the GPU cloud server, and the output operator is processed by the CPU cloud server.
  • Based on the above technical solution, a target computing resource (such as a CPU resource, an FPGA resource, or a GPU resource) can be selected from multiple computing resources, and the target computing resource is used to execute the to-be-processed operator.
  • Different to-be-processed operators can correspond to different target computing resources, so that target computing resources are selected reasonably and an optimal execution plan is obtained, yielding higher processing performance and a better user experience.
  • In addition, the software processing module, FPGA processing module, and GPU processing module of the SQL operator execution unit can be adapted more universally to run on CPU cloud servers, FPGA cloud servers, and GPU cloud servers in the cloud.
  • an embodiment of the present application also provides a data processing device.
  • FIG. 9 is a structural diagram of the data processing device.
  • the data processing device includes:
  • The obtaining module 91 is configured to obtain the to-be-processed operator corresponding to the data processing request, and to obtain the cost values of the to-be-processed operator under multiple resource types; the selecting module 92 is configured to select the target resource type from the multiple resource types according to the cost values; the processing module 93 is configured to execute the to-be-processed operator through the computing resource corresponding to the target resource type.
  • When the acquiring module 91 acquires the cost values of the to-be-processed operator under multiple resource types, it is specifically configured to query the pre-generated operator resource registry with the to-be-processed operator to obtain the cost values corresponding to the multiple resource types; the operator resource registry includes the correspondence between operators, resource types, and cost values.
  • An embodiment of the present application further provides a data processing device, including a processor and a machine-readable storage medium; the machine-readable storage medium stores several computer instructions, and the processor performs the following processing when executing the computer instructions:
  • obtain the to-be-processed operator corresponding to the data processing request; obtain the cost values of the to-be-processed operator under multiple resource types; select the target resource type from the multiple resource types according to the cost values; and execute the to-be-processed operator through the computing resource corresponding to the target resource type.
  • An embodiment of the present application also provides a machine-readable storage medium on which several computer instructions are stored; when the computer instructions are executed, the following processing is performed:
  • obtain the to-be-processed operator corresponding to the data processing request; obtain the cost values of the to-be-processed operator under multiple resource types; select the target resource type according to the cost values; and execute the to-be-processed operator through the computing resource corresponding to the target resource type.
  • the data processing device 100 may include: a processor 110, a network interface 120, a bus 130, and a memory 140.
  • the memory 140 may be any electronic, magnetic, optical, or other physical storage device, and may contain or store information, such as executable instructions, data, and so on.
  • For example, the memory 140 may be: RAM (Random Access Memory), volatile memory, non-volatile memory, flash memory, a storage drive (such as a hard disk drive), a solid-state drive, or any type of storage disk (such as an optical disc or a DVD).
  • a typical implementation device is a computer.
  • The specific form of the computer can be a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email receiving and sending device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
  • The embodiments of the present application can be provided as methods, systems, or computer program products. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the embodiments of the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
  • These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device,
  • where the instruction device implements the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operating steps are executed on the computer or other programmable equipment to produce computer-implemented processing, such that the instructions executed on the computer or other programmable equipment
  • provide steps for implementing the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram.

Abstract

The present application provides a data processing method, apparatus, and device. The method includes: obtaining a to-be-processed operator corresponding to a data processing request; obtaining the cost values of the to-be-processed operator under multiple resource types; selecting a target resource type from the multiple resource types according to the cost values; and executing the to-be-processed operator through the computing resource corresponding to the target resource type. The technical solution of the present application achieves higher processing performance and a better user experience.

Description

一种数据处理方法、装置及设备
本申请要求2019年04月18日递交的申请号为201910314361.6、发明名称为“一种数据处理方法、装置及设备”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及互联网技术领域,尤其涉及一种数据处理方法、装置及设备。
背景技术
数据湖分析(Data Lake Analytics)用于为用户提供无服务器化(Serverless)的查询分析服务,能够对海量的数据进行任意维度的分析和查询,支持高并发、低延时(毫秒级响应)、实时在线分析、海量数据查询等功能。在数据湖分析系统中,可以包括数据库和计算节点,数据库用于存储大量的数据,计算节点用于接收执行计划,并根据该执行计划对数据库中的数据进行相应处理。
为了加速数据处理和计算性能,数据湖分析系统提供多种类型的计算资源。例如,CPU(Central Processing Unit,中央处理器)资源、FPGA(Field Programmable Gate Array,现场可编程逻辑门阵列)资源、GPU(Graphics Processing Unit,图形处理器)资源等,计算节点可以利用这些计算资源对数据进行处理。
但是,当数据湖分析系统同时支持CPU资源、FPGA资源、GPU资源时,应该选取哪个计算资源对数据进行处理,目前并没有合理的选取方式。
发明内容
本申请提供一种数据处理方法,所述方法包括:
获取与数据处理请求对应的待处理算子;
获取所述待处理算子在多个资源类型下分别对应的代价值;
根据所述代价值从所述多个资源类型中选取目标资源类型;
通过所述目标资源类型对应的计算资源执行所述待处理算子。
本申请提供一种数据处理方法,应用于数据湖分析平台,所述数据湖分析平台用于为用户提供无服务器化的数据处理服务,所述方法包括:
获取与数据处理请求对应的待处理算子;
获取所述待处理算子在多个资源类型下分别对应的代价值;
根据所述代价值从所述多个资源类型中选取目标资源类型;
通过所述目标资源类型对应的计算资源执行所述待处理算子;
其中,所述目标资源类型对应的计算资源,用于基于所述数据湖分析平台提供的云数据库,执行所述待处理算子。
本申请提供一种数据处理方法,所述方法包括:
通过指定资源类型对应的计算资源执行指定算子,并获取执行过程中的所述计算资源的代价值;其中,所述指定资源类型为多个资源类型中的任一资源类型,所述指定算子为多个算子中的任一算子;
生成算子资源注册表;其中,所述算子资源注册表包括所述指定算子、所述指定资源类型和所述计算资源的代价值之间的对应关系;
其中,所述算子资源注册表用于确定与数据处理请求对应的待处理算子对应的代价值,并根据所述代价值确定所述待处理算子的目标资源类型,通过所述目标资源类型对应的计算资源执行所述待处理算子。
本申请提供一种数据处理方法,所述方法包括:
获取与数据处理请求对应的原始执行计划,原始执行计划包括待处理算子;
获取所述待处理算子在多个资源类型下分别对应的代价值;
根据所述代价值从所述多个资源类型中选取目标资源类型;
获取与所述原始执行计划对应的目标执行计划,所述目标执行计划包括所述待处理算子、所述待处理算子对应的所述目标资源类型;
将所述目标执行计划发送给所述目标资源类型对应的计算资源,以使所述计算资源根据所述目标执行计划执行所述待处理算子。
本申请提供一种数据处理装置,所述装置包括:
获取模块,用于获取与数据处理请求对应的待处理算子,并获取所述待处理算子在多个资源类型下分别对应的代价值;
选取模块,用于根据所述代价值从所述多个资源类型中选取目标资源类型;
处理模块,用于通过所述目标资源类型对应的计算资源执行所述待处理算子。
本申请提供一种数据处理设备,包括:
处理器和机器可读存储介质,所述机器可读存储介质上存储有若干计算机指令,所述处理器执行所述计算机指令时进行如下处理:
获取与数据处理请求对应的待处理算子;
获取所述待处理算子在多个资源类型下分别对应的代价值;
根据所述代价值从所述多个资源类型中选取目标资源类型;
通过所述目标资源类型对应的计算资源执行所述待处理算子。
基于上述技术方案,本申请实施例中,可以获取待处理算子在多个资源类型下分别对应的代价值,并根据所述代价值从多个资源类型中选取目标资源类型,通过目标资源类型对应的计算资源执行待处理算子。经过上述方式,能够从多个计算资源中选择一个目标计算资源(如CPU资源、FPGA资源、GPU资源等),并使用该目标计算资源执行该待处理算子,针对不同的待处理算子,还可以对应不同的目标计算资源,从而合理的选取目标计算资源,得到最优的执行计划,能够获得更高的处理性能,用户体验更好。在云上具备CPU云服务器、FPGA云服务器、GPU云服务器时,能够融合多种硬件资源进行异构计算和统一调度,满足分布式计算任务的混合执行和加速需求,能够大大提升云上多种异构云计算服务器的任务自动化调度效率。
附图说明
为了更加清楚地说明本申请实施例或者现有技术中的技术方案,下面将对本申请实施例或者现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请中记载的一些实施例,对于本领域普通技术人员来讲,还可以根据本申请实施例的这些附图获得其它的附图。
图1是本申请一种实施方式中的数据处理方法的流程示意图;
图2和图3是本申请一种实施方式中的应用场景的示意图;
图4是本申请另一种实施方式中的数据处理方法的流程示意图;
图5是本申请一种实施方式中的前端节点的优化器的处理示意图;
图6和图7是本申请一种实施方式中的SQL算子执行单元的处理示意图;
图8是本申请一种实施方式中的目标执行计划的处理示意图;
图9是本申请一种实施方式中的数据处理装置的结构示意图;
图10是本申请一种实施方式中的数据处理设备的结构示意图。
具体实施方式
在本申请实施例使用的术语仅仅是出于描述特定实施例的目的,而非限制本申请。 本申请和权利要求书中所使用的单数形式的“一种”、“所述”和“该”也旨在包括多数形式,除非上下文清楚地表示其它含义。还应当理解,本文中使用的术语“和/或”是指包含一个或多个相关联的列出项目的任何或所有可能组合。
应当理解,尽管在本申请实施例可能采用术语第一、第二、第三等来描述各种信息,但这些信息不应限于这些术语。这些术语仅用来将同一类型的信息彼此区分开。例如,在不脱离本申请范围的情况下,第一信息也可以被称为第二信息,类似地,第二信息也可以被称为第一信息。取决于语境,此外,所使用的词语“如果”可以被解释成为“在……时”或“当……时”或“响应于确定”。
本申请实施例提出一种数据处理方法,可以应用于任意设备,如数据湖分析系统的任意设备,参见图1所示,为该方法的流程图,该方法可以包括:
步骤101,获取与数据处理请求对应的待处理算子。
步骤102,获取该待处理算子在多个资源类型下分别对应的代价值。
具体的,可以通过该待处理算子查询算子资源注册表,得到该待处理算子在多个资源类型下分别对应的代价值,该算子资源注册表可以为预先生成的;其中,算子资源注册表可以包括算子、资源类型与代价值的对应关系。
可选的,在一个例子中,在获取该待处理算子在多个资源类型下分别对应的代价值之前,还可以包括:通过指定资源类型对应的计算资源执行指定算子,并获取执行过程中的所述计算资源的代价值;所述指定资源类型为多个资源类型中的任一资源类型,所述指定算子为多个算子中的任一算子。进一步的,可以生成算子资源注册表,该算子资源注册表可以包括所述指定算子、所述指定资源类型和所述计算资源的代价值之间的对应关系。
步骤103,根据所述代价值从多个资源类型中选取目标资源类型。
具体的,可以从待处理算子在多个资源类型下分别对应的代价值中选择最小代价值,将最小代价值对应的资源类型确定为待处理算子的目标资源类型。
步骤104,通过目标资源类型对应的计算资源执行待处理算子。
在一个例子中,获取与数据处理请求对应的待处理算子,可以包括:获取与数据处理请求对应的原始执行计划,该原始执行计划包括待处理算子。
进一步的,通过目标资源类型对应的计算资源执行待处理算子,可以包括但不限于:获取与该原始执行计划对应的目标执行计划,所述目标执行计划包括待处理算子和目标资源类型,并将所述目标执行计划发送给目标资源类型对应的计算资源,以使所述计算 资源执行该目标执行计划,而所述计算资源执行该目标执行计划的过程,就是执行该目标执行计划中的每个待处理算子的过程。
在执行目标执行计划中的每个待处理算子时,需要根据该待处理算子进行数据处理。例如,若待处理算子是scan(扫描)算子,则需要对数据进行扫描处理;若待处理算子是filter(过滤)算子,则需要对数据进行过滤处理;若待处理算子是sort(分类)算子,则需要对数据进行分类处理,对此不做限制。
在一个例子中,获取与该原始执行计划对应的目标执行计划,可以包括但不限于:若存在与数据处理请求对应的多个原始执行计划,则针对多个原始执行计划中的每个原始执行计划,可以根据该原始执行计划中的待处理算子在目标资源类型下对应的代价值,获取该原始执行计划对应的总代价值,即获取每个原始执行计划对应的总代价值。从多个原始执行计划中选取总代价值最小的原始执行计划,并获取与总代价值最小的原始执行计划对应的目标执行计划。
在上述实施例中,资源类型可以包括但不限于以下一种或者多种:CPU资源类型、FPGA资源类型、GPU资源类型。当然,CPU资源类型、FPGA资源类型、GPU资源类型只是一个示例,还可以为其它资源类型,对此不做限制。
在一个例子中,上述执行顺序只是为了方便描述给出的一个示例,在实际应用中,还可以改变步骤之间的执行顺序,对此执行顺序不做限制。而且,在其它实施例中,并不一定按照本说明书示出和描述的顺序来执行相应方法的步骤,其方法所包括的步骤可以比本说明书所描述的更多或更少。此外,本说明书中所描述的单个步骤,在其它实施例中可能被分解为多个步骤进行描述;本说明书中所描述的多个步骤,在其它实施例也可能被合并为单个步骤进行描述。
基于上述技术方案,本申请实施例中,可以获取待处理算子在多个资源类型下分别对应的代价值,并根据所述代价值从多个资源类型中选取目标资源类型,通过目标资源类型对应的计算资源执行待处理算子。经过上述方式,能够从多个计算资源中选择一个目标计算资源(如CPU资源、FPGA资源、GPU资源等),并使用该目标计算资源执行该待处理算子,针对不同的待处理算子,还可以对应不同的目标计算资源,从而合理的选取目标计算资源,得到最优的执行计划,能够获得更高的处理性能,用户体验更好。在云上具备CPU云服务器、FPGA云服务器、GPU云服务器时,能够融合多种硬件资源进行异构计算和统一调度,满足分布式计算任务的混合执行和加速需求,能够大大提升云上多种异构云计算服务器的任务自动化调度效率。
基于与上述方法同样的申请构思,本申请实施例中提出另一种数据处理方法,可以应用于数据湖分析平台,数据湖分析平台用于为用户提供无服务器化的数据处理服务,所述方法包括:获取与数据处理请求对应的待处理算子;获取待处理算子在多个资源类型下分别对应的代价值;根据所述代价值从多个资源类型中选取目标资源类型;通过目标资源类型对应的计算资源执行待处理算子;其中,所述目标资源类型对应的计算资源,用于基于数据湖分析平台提供的云数据库,执行待处理算子。其中,所述计算资源具体可以为:用于提供CPU资源的CPU云服务器;或者,用于提供FPGA资源的FPGA云服务器;或者,用于提供GPU资源的GPU云服务器。
本实施例中的云数据库,是指数据湖分析平台提供的数据库,数据湖分析平台可以是以数据存储为主的存储型云平台,或者,以数据处理为主的计算型云平台,或者,计算和数据存储处理兼顾的综合云计算平台,对此数据湖分析平台不做限制。针对数据湖分析平台提供的云数据库,可以用于为用户提供无服务器化的查询分析服务,能够对海量的数据进行任意维度的分析和查询,支持高并发、低延时(毫秒级响应)、实时在线分析、海量数据查询等功能。
基于与上述方法同样的申请构思,本申请实施例中提出另一种数据处理方法,所述方法包括:通过指定资源类型对应的计算资源执行指定算子,并获取执行过程中的所述计算资源的代价值;指定资源类型为多个资源类型中的任一资源类型,指定算子为多个算子中的任一算子;生成算子资源注册表;算子资源注册表可以包括该指定算子、该指定资源类型和所述计算资源的代价值之间的对应关系;其中,算子资源注册表用于确定与数据处理请求对应的待处理算子对应的代价值,并根据所述代价值确定待处理算子的目标资源类型,通过目标资源类型对应的计算资源执行待处理算子。
其中,针对利用算子资源注册表确定与待处理算子对应的代价值、根据所述代价值确定目标资源类型等过程,可以参见上述实施例,在此不再赘述。
基于与上述方法同样的申请构思,本申请实施例中提出另一种数据处理方法,所述方法包括:获取与数据处理请求对应的原始执行计划,该原始执行计划可以包括待处理算子;获取该待处理算子在多个资源类型下分别对应的代价值;根据所述代价值从多个资源类型中选取目标资源类型;获取与该原始执行计划对应的目标执行计划,该目标执行计划包括该待处理算子、该待处理算子对应的目标资源类型;将该目标执行计划发送给目标资源类型对应的计算资源,以使所述计算资源根据该目标执行计划执行该待处理算子。
获取与原始执行计划对应的目标执行计划,可以包括:若存在与数据处理请求对应的多个原始执行计划,针对多个原始执行计划中的每个原始执行计划,根据该原始执行计划中的待处理算子在目标资源类型下对应的代价值,获取该原始执行计划对应的总代价值;从多个原始执行计划中选取总代价值最小的原始执行计划,并获取与总代价值最小的原始执行计划对应的目标执行计划。
以下结合具体实施例,对本申请实施例的数据处理方法进行说明。参见图2所示,为本申请实施例的应用场景示意图,该方法可以应用于包括客户端、负载均衡设备、前端节点(front node,也可以称为前端服务器)、计算节点(compute node,也可以称为计算服务器)和数据源的系统,如数据湖分析系统。当然,数据湖分析系统还可以包括其它服务器,对此系统结构不做限制。
参见图2所示,是以3个前端节点为例,在实际应用中,前端节点的数量还可以为其它数量,对此不做限制。参见图2所示,是以4个计算节点为例,在实际应用中,计算节点的数量还可以为其它数量,对此不做限制。
其中,客户端可以是终端设备(如PC(Personal Computer,个人计算机)、笔记本电脑、移动终端等)包括的APP(Application,应用),也可以是终端设备包括的浏览器,对此不做限制。负载均衡设备用于对数据处理请求进行负载均衡,如接收到数据处理请求后,将数据处理请求负载均衡到各前端节点。
其中,多个前端节点用于提供相同功能,形成前端节点的资源池。针对资源池中的每个前端节点,用于接收客户端发送的数据处理请求,并对数据处理请求进行SQL(Structured Query Language,结构化查询语言)解析,根据解析结果生成执行计划,并处理该执行计划。例如,前端节点可以将该执行计划发送给计算节点,由计算节点处理该执行计划。具体的,可以将执行计划发送给一个计算节点,由该计算节点处理该执行计划;或者,将该执行计划拆解为多个子计划,并将多个子计划发送给多个计算节点,每个计算节点处理子计划。
其中,多个计算节点用于提供相同的功能,形成计算节点的资源池。针对该资源池中的每个计算节点,若接收到前端节点发送的执行计划,则可以处理该执行计划;或者,若接收到前端节点发送的子计划,则可以处理该子计划。
其中,数据源用于存储各种类型的数据,对此数据类型不做限制,如可以是用户数据、商品数据、地图数据、视频数据、图像数据、音频数据等。
参见图2所示,数据源可以包括数据库,本实施例中,可以是针对异构数据源的场 景,也就是说,这些数据源可以是相同类型的数据库,也可以是不同类型的数据库,这些数据源可以是关系型数据库,也可以是非关系型数据库。
进一步的,对于每个数据源来说,数据源的类型可以包括但不限于:OSS(Object Storage Service,对象存储服务)、TableStore(表格存储)、HBase(Hadoop Database,Hadoop数据库)、HDFS(Hadoop Distributed File System,Hadoop分布式文件系统)、MySQL(即关系型数据库)、RDS(Relational Database Service,关系型数据库服务)、DRDS(Distribute Relational Database Service,分布式关系型数据库服务)、RDBMS(Relational Database Management System,关系数据库管理系统)、SQLServer(即关系型数据库)、PostgreSQL(即对象关系型数据库),MongoDB(即基于分布式文件存储的数据库)等,当然,上述类型只是数据源类型的几个示例,对此数据源的类型不做限制。
在一个例子中,为了加速数据处理和计算性能(如SQL计算任务等),可以利用硬件加速技术来对特定计算任务进行加速,如对SQL算子进行加速。例如,可以采用FPGA和GPU来对特定计算任务进行加速,FPGA是面向可编程逻辑门阵列的计算任务加速,GPU是面向大规模多核的计算任务加速。
综上所述,参见图3所示,计算资源可以包括CPU云服务器池、GPU云服务器池、FPGA云服务器池。CPU云服务器池可以包括多个CPU云服务器,这些CPU云服务器用于提供CPU资源,即通过CPU资源进行数据处理。GPU云服务器池包括多个GPU云服务器,这些GPU云服务器用于提供GPU资源,即通过GPU资源进行数据处理。FPGA云服务器池包括多个FPGA云服务器,这些FPGA云服务器用于提供FPGA资源,即通过FPGA资源进行数据处理。
当然,在实际应用中,还可以包括其它类型的计算资源,对此不做限制。
但是,当同时支持CPU云服务器、GPU云服务器、FPGA云服务器时,应该选取哪个类型的云服务器对数据进行处理,目前没有合理的选取方式。本申请实施例中,提供一种数据处理方法,在云上具备CPU云服务器、GPU云服务器、FPGA云服务器时,能够从CPU云服务器、GPU云服务器、FPGA云服务器中合理的选取云服务器,融合多种硬件资源进行异构SQL计算和统一调度。
为了实现本申请实施例的技术方案,可以预先生成算子资源注册表,该算子资源注册表可以包括算子、资源类型与代价值的对应关系,该代价值可以为时间代价值,也可以为资源代价值,还可以为时间代价值和资源代价值,还可以为其它类型的代价值,对 此代价值的类型不做限制。为了方便描述,后续以时间代价值为例进行说明,其它代价值的实现方式可以参见时间代价值。
假设资源类型为CPU资源、GPU资源和FPGA资源,为了生成算子资源注册表,可以采用如下方式:首先,获取CPU云服务器、GPU云服务器、FPGA云服务器均能够运行的SQL参照工作任务(即benchmark workload),这个SQL参照工作任务可以包括各种类型的算子,如scan(扫描)算子、filter(过滤)算子、hash(哈希)算子(用于hash join和聚合)、sort(分类)算子、input(输入)算子、output(输出)算子、join(加入)算子、agg(聚合)算子。当然,上述只是算子类型的几个示例,还可以有其它类型的算子,对此不做限制。
然后,通过CPU资源类型对应的计算资源(即CPU云服务器)对SQL参照工作任务进行处理,也就是说,通过CPU云服务器执行SQL参照工作任务中的各个算子。例如,可以使用固定计算单元数和网络带宽的CPU云服务器,运行SQL参照工作任务,从而执行SQL参照工作任务中的各个算子。在算子执行过程中,可以统计每个算子的代价值(即SQL算子开销单位)。参见表1所示,示出了CPU资源对应的每个算子的代价值,这里是以时间代价值为例。参见表1所示,在算子执行过程中,还可以统计处理一个block(数据块)的代价值(即时间开销单位)、处理一个page(数据页,由多个block组成)的代价值。
通过GPU资源类型对应的计算资源(即GPU云服务器)对SQL参照工作任务进行处理,也就是说,通过GPU云服务器执行SQL参照工作任务中的各个算子。例如,可以使用固定计算单元数和网络带宽的GPU云服务器,运行SQL参照工作任务,从而执行SQL参照工作任务中的各个算子。在算子执行过程中,可以统计每个算子的代价值(即SQL算子开销单位)。参见表1所示,示出了GPU资源对应的每个算子的代价值,这里是以时间代价值为例。参见表1所示,在算子执行过程中,还可以统计GPU云服务器处理一个block的代价值(即时间开销单位)、处理一个page(数据页,由多个block组成)的代价值。
通过FPGA资源类型对应的计算资源(即FPGA云服务器)对SQL参照工作任务进行处理,也就是说,通过FPGA云服务器执行SQL参照工作任务中的各个算子。例如,可以使用固定计算单元数和网络带宽的FPGA云服务器,运行SQL参照工作任务,从而执行SQL参照工作任务中的各个算子。在算子执行过程中,可以统计每个算子的代价值(即SQL算子开销单位)。参见表1所示,示出了FPGA资源对应的每个算子的代价值, 这里是以时间代价值为例。参见表1所示,在算子执行过程中,可以统计FPGA云服务器处理一个block的代价值(即时间开销单位)、处理一个page(数据页,由多个block组成)的代价值。
参见表1所示,为算子资源注册表的示例,当然,这个算子资源注册表只是一个示例,实际应用中,还可以有更多的资源类型,还可以有更多的算子。
表1
算子类型 CPU资源 FPGA资源 GPU资源
Block 0.5微秒/block 0.03微秒/block 0.2微秒/block
Page 1微秒/page 0.1微秒/page 0.5微秒/page
Scan 0.1微秒/record 0.001微秒/record 0.01微秒/record
Filter 0.01微秒/block 0.001微秒/block 0.005微秒/block
Hash 0.02微秒/block 0.002微秒/block 0.008微秒/block
Sort 0.05微秒/block 0.03微秒/block 0.02微秒/block
Input 0.05微秒/block 0.04微秒/block 0.1微秒/block
Output 0.05微秒/block 0.04微秒/block 0.1微秒/block
在上述应用场景下,参见图4所示,为本申请实施例中提出的数据处理方法的流程示意图,该方法可以应用于前端节点,该方法可以包括以下步骤:
步骤401,获取数据处理请求,如SQL(Structured Query Language,结构化查询语言)类型的数据处理请求等,对此数据处理请求的类型不做限制。
步骤402,根据该数据处理请求获取原始执行计划,即与该数据处理请求对应的原始执行计划,该原始执行计划可以包括多个待处理算子。
例如,原始执行计划可以包括如下待处理算子:scan算子、filter算子、hash算子、sort算子、input算子、output算子,后续以这些算子为例进行说明。
在一个例子中,数据处理请求是用户编写的SQL类型的数据处理请求,可以将这个数据处理请求转换为机器可执行的执行计划,这个执行计划描述了一个具体执行步骤,可以由前端节点的优化器来生成执行计划,为了区分方便,将这个执行计划称为原始执行计划,对此原始执行计划的生成过程不做限制。
其中,原始执行计划可以包括多个待处理算子(也可以称为节点),每个待处理算子可以表示一个计算步骤,对此待处理算子的类型不做限制。
步骤403,通过待处理算子(即原始执行计划中的每个待处理算子)查询算子资源注册表,得到该待处理算子在多个资源类型下分别对应的代价值。
步骤404,从待处理算子在多个资源类型下分别对应的代价值中选择最小代价值,将最小代价值对应的资源类型确定为该待处理算子的目标资源类型。
例如,假设原始执行计划可以包括如下待处理算子:scan算子、filter算子、hash算子、sort算子、input算子、output算子。可以通过scan算子查询表1所示的算子资源注册表,得到scan算子在CPU资源类型下对应的代价值为0.1,scan算子在FPGA资源类型下对应的代价值为0.001,scan算子在GPU资源类型下对应的代价值为0.01。显然,由于最小代价值是0.001,因此,可以将最小代价值0.001对应的FPGA资源类型确定为scan算子的目标资源类型。
此外,可以通过filter算子、hash算子、sort算子、input算子、output算子查询表1所示的算子资源注册表,得到这些算子在多个资源类型下分别对应的代价值,继而将最小代价值对应的资源类型确定为算子的目标资源类型。例如,filter算子的目标资源类型是FPGA资源类型,hash算子的目标资源类型是FPGA资源类型,sort算子的目标资源类型是GPU资源类型,input算子的目标资源类型是FPGA资源类型,output算子的目标资源类型是FPGA资源类型。
步骤405,获取与原始执行计划对应的目标执行计划,目标执行计划包括待处理算子和目标资源类型。其中,与原始执行计划相比,目标执行计划除了包括待处理算子,还可以包括该待处理算子对应的目标资源类型,表示需要由该目标资源类型对应的计算资源执行该待处理算子。
例如,目标执行计划可以包括但不限于:scan算子和FPGA资源类型的对应关系、filter算子与FPGA资源类型的对应关系、hash算子与FPGA资源类型的对应关系、sort算子与GPU资源类型的对应关系、input算子与FPGA资源类型的对应关系、output算子与FPGA资源类型的对应关系。当然,在实际应用中,目标执行计划还可以包括其它内容,对此目标执行计划的内容不做限制。
步骤406,将目标执行计划发送给目标资源类型对应的计算资源,以使该计算资源执行该目标执行计划,即执行该目标执行计划中的每个待处理算子。
例如,可以将该目标执行计划发送给FPGA资源类型对应的FPGA云服务器,以使FPGA云服务器从该目标执行计划中获取需要由FPGA云服务器处理的待处理算子,即目标资源类型为FPGA资源类型的待处理算子,如scan算子、filter算子、hash算子、input 算子和output算子。然后,FPGA云服务器可以利用目标执行计划对这些待处理算子进行处理,对此处理过程不做限制。
此外,还可以将目标执行计划发送给GPU资源类型对应的GPU云服务器,以使GPU云服务器从目标执行计划中获取需要由GPU云服务器处理的待处理算子,即目标资源类型为GPU资源类型的待处理算子,如sort算子。然后,GPU云服务器可以利用目标执行计划对sort算子进行处理,对此不做限制。
又例如,可以将目标执行计划拆分为子计划1和子计划2,子计划1可以包括目标资源类型为FPGA资源类型的待处理算子,如scan算子、filter算子、hash算子、input算子和output算子等,子计划2可以包括目标资源类型为GPU资源类型的待处理算子,如sort算子等。然后,可以将子计划1发送给FPGA资源类型对应的FPGA云服务器,以使FPGA云服务器利用子计划1对scan算子、filter算子、hash算子、input算子和output算子进行处理,对此处理过程不做限制。此外,可以将子计划2发送给GPU资源类型对应的GPU云服务器,以使GPU云服务器利用子计划2对sort算子进行处理,对此处理过程不做限制。
当然,上述方式只是执行目标执行计划的几个示例,对此不做限制。
可选的,在一个例子中,若存在与数据处理请求对应的多个原始执行计划,在获取与原始执行计划对应的目标执行计划时,可以包括:针对每个原始执行计划,可以确定该原始执行计划中的每个待处理算子的目标资源类型,并确定待处理算子在目标资源类型下对应的代价值,即得到每个待处理算子的代价值。对所有待处理算子的代价值求和,得到该原始执行计划对应的总代价值。这样,可以获取每个原始执行计划对应的总代价值。然后,选取总代价值最小的原始执行计划,并获取与总代价值最小的原始执行计划对应的目标执行计划。
可选的,在一个例子中,针对每个待处理算子,在得到该待处理算子对应的目标资源类型后,若该目标资源类型对应的计算资源有限,无法处理这个待处理算子,则可以在排除这个目标资源类型的基础上,重新确定该待处理算子对应的目标资源类型,具体确定方式参见上述实施例,在此不再赘述。
可选的,在一个例子中,针对某个待处理算子来说,若FPGA云服务器和GPU云服务器均无法对该待处理算子进行处理,则可以使用CPU云服务器对该待处理算子进行处理,即该待处理算子的目标资源类型为CPU资源。
以下结合几个具体应用场景,对上述数据处理方法进行说明。
参见图5所示,前端节点的优化器的输入数据是数据处理请求、算子资源注册表,基于数据处理请求和算子资源注册表,可以得到目标执行计划,目标执行计划包括多个待处理算子以及这些待处理算子对应的目标资源类型。然后,将目标执行计划输出给目标资源类型对应的计算资源,如CPU云服务器、GPU云服务器、或FPGA云服务器等。综上所述,优化器可以生成融合CPU云服务器池、FPGA云服务器池、GPU云服务器池的SQL分布式执行计划。
参见图6所示,SQL算子执行单元可以包括输入buffer(缓冲区)、输出buffer、软件处理模块、FPGA处理模块、GPU处理模块。进一步的,对于CPU云服务器来说,SQL算子执行单元可以包括输入buffer、输出buffer、软件处理模块。对于FPGA云服务器来说,SQL算子执行单元可以包括输入buffer、输出buffer、软件处理模块和FPGA处理模块。对于GPU云服务器来说,SQL算子执行单元可以包括输入buffer、输出buffer、软件处理模块和GPU处理模块。
参见图7所示,SQL算子执行单元的执行过程中,在CPU云服务器、FPGA云服务器和GPU云服务器执行时,分别采用对应的处理模块。例如,在FPGA云服务器执行时,使用软件处理模块和FPGA处理模块对SQL算子进行处理;在GPU云服务器执行时,使用软件处理模块和GPU处理模块对SQL算子进行处理;在CPU云服务器执行时,使用软件处理模块对SQL算子进行处理。
参见图8所示,示出了一个目标执行计划的处理示意图,scan算子和filter算子由FPGA云服务器进行处理,agg算子由CPU云服务器进行处理,hash join算子由GPU云服务器进行处理,output算子由CPU云服务器进行处理。
基于上述技术方案,本申请实施例中,能够从多个计算资源中选择一个目标计算资源(如CPU资源、FPGA资源、GPU资源等),并使用该目标计算资源执行该待处理算子,针对不同的待处理算子,可以对应不同目标计算资源,从而合理选取目标计算资源,得到最优执行计划,能够获得更高的处理性能,用户体验更好。在云上具备CPU云服务器、FPGA云服务器、GPU云服务器时,能够融合多种硬件资源进行异构计算和统一调度,能够融合多种异构云计算服务器硬件资源,满足分布式计算任务的混合执行和加速需求,能够大大提升云上多种异构云计算服务器的任务自动化调度效率。此外,SQL算子执行单元的软件处理模块、FPGA处理模块、GPU处理模块能更加通用的适配运行到云上的CPU云服务器、FPGA云服务器、GPU云服务器。
基于与上述方法同样的申请构思,本申请实施例还提供一种数据处理装置,如图9 所示,为所述数据处理装置的结构图,所述数据处理装置包括:
获取模块91,用于获取与数据处理请求对应的待处理算子,并获取所述待处理算子在多个资源类型下分别对应的代价值;选取模块92,用于根据所述代价值从所述多个资源类型中选取目标资源类型;处理模块93,用于通过所述目标资源类型对应的计算资源执行所述待处理算子。
所述获取模块91获取所述待处理算子在多个资源类型下分别对应的代价值时具体用于:通过所述待处理算子查询算子资源注册表,得到所述待处理算子在多个资源类型下分别对应的代价值,所述算子资源注册表为预先生成的;其中,所述算子资源注册表包括算子、资源类型与代价值的对应关系。
基于与上述方法同样的申请构思,本申请实施例还提供一种数据处理设备,包括:处理器和机器可读存储介质,所述机器可读存储介质上存储有若干计算机指令,所述处理器执行所述计算机指令时进行如下处理:
获取与数据处理请求对应的待处理算子;
获取所述待处理算子在多个资源类型下分别对应的代价值;
根据所述代价值从所述多个资源类型中选取目标资源类型;
通过所述目标资源类型对应的计算资源执行所述待处理算子。
本申请实施例还提供一种机器可读存储介质,所述机器可读存储介质上存储有若干计算机指令;所述计算机指令被执行时进行如下处理:
获取与数据处理请求对应的待处理算子;
获取所述待处理算子在多个资源类型下分别对应的代价值;
根据所述代价值从所述多个资源类型中选取目标资源类型;
通过所述目标资源类型对应的计算资源执行所述待处理算子。
参见图10所示,为本申请实施例中提出的数据处理设备的结构图,所述数据处理设备100可以包括:处理器110,网络接口120,总线130,存储器140。存储器140可以是任何电子、磁性、光学或其它物理存储装置,可以包含或存储信息,如可执行指令、数据等等。例如,存储器140可以是:RAM(Random Access Memory,随机存取存储器)、易失存储器、非易失性存储器、闪存、存储驱动器(如硬盘驱动器)、固态硬盘、任何类型的存储盘(如光盘、dvd等)。
上述实施例阐明的系统、装置、模块或单元,具体可以由计算机芯片或实体实现,或者由具有某种功能的产品来实现。一种典型的实现设备为计算机,计算机的具体形式 可以是个人计算机、膝上型计算机、蜂窝电话、相机电话、智能电话、个人数字助理、媒体播放器、导航设备、电子邮件收发设备、游戏控制台、平板计算机、可穿戴设备或者这些设备中的任意几种设备的组合。
为了描述的方便,描述以上装置时以功能分为各种单元分别描述。当然,在实施本申请时可以把各单元的功能在同一个或多个软件和/或硬件中实现。
本领域内的技术人员应明白,本申请的实施例可提供为方法、系统、或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请实施例可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。
本申请是参照根据本申请实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可以由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其它可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其它可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
而且,这些计算机程序指令也可以存储在能引导计算机或其它可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或者多个流程和/或方框图一个方框或者多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其它可编程数据处理设备上,使得在计算机或者其它可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其它可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
以上所述仅为本申请的实施例而已,并不用于限制本申请。对于本领域技术人员来说,本申请可以有各种更改和变化。凡在本申请的精神和原理之内所作的任何修改、等同替换、改进等,均应包含在本申请的权利要求范围之内。

Claims (15)

  1. 一种数据处理方法,其特征在于,所述方法包括:
    获取与数据处理请求对应的待处理算子;
    获取所述待处理算子在多个资源类型下分别对应的代价值;
    根据所述代价值从所述多个资源类型中选取目标资源类型;
    通过所述目标资源类型对应的计算资源执行所述待处理算子。
  2. 根据权利要求1所述的方法,其特征在于,
    所述获取所述待处理算子在多个资源类型下分别对应的代价值,包括:
    通过所述待处理算子查询算子资源注册表,得到所述待处理算子在多个资源类型下分别对应的代价值,所述算子资源注册表为预先生成的;
    其中,所述算子资源注册表包括算子、资源类型与代价值的对应关系。
  3. 根据权利要求2所述的方法,其特征在于,所述获取所述待处理算子在多个资源类型下分别对应的代价值之前,所述方法还包括:
    通过指定资源类型对应的计算资源执行指定算子,并获取执行过程中的所述计算资源的代价值;其中,所述指定资源类型为所述多个资源类型中的任一资源类型,所述指定算子为多个算子中的任一算子;
    生成所述算子资源注册表,所述算子资源注册表包括所述指定算子、所述指定资源类型和所述计算资源的代价值之间的对应关系。
  4. 根据权利要求1所述的方法,其特征在于,
    所述根据所述代价值从所述多个资源类型中选取目标资源类型,包括:
    从所述待处理算子在多个资源类型下分别对应的代价值中选择最小代价值,将最小代价值对应的资源类型确定为所述待处理算子的目标资源类型。
  5. 根据权利要求1所述的方法,其特征在于,
    所述获取与数据处理请求对应的待处理算子,包括:获取与所述数据处理请求对应的原始执行计划,所述原始执行计划包括所述待处理算子;
    通过所述目标资源类型对应的计算资源执行所述待处理算子,包括:获取与所述原始执行计划对应的目标执行计划,所述目标执行计划包括所述待处理算子和所述目标资源类型,并将所述目标执行计划发送给所述目标资源类型对应的计算资源,以使所述计算资源执行所述目标执行计划。
  6. 根据权利要求5所述的方法,其特征在于,
    所述获取与所述原始执行计划对应的目标执行计划,包括:
    若存在与所述数据处理请求对应的多个原始执行计划,针对所述多个原始执行计划中的原始执行计划,根据该原始执行计划中的所述待处理算子在所述目标资源类型下对应的代价值,获取该原始执行计划对应的总代价值;
    从所述多个原始执行计划中选取总代价值最小的原始执行计划;
    获取与所述总代价值最小的原始执行计划对应的目标执行计划。
  7. 根据权利要求1-6任一所述的方法,其特征在于,
    所述资源类型包括以下一种或者多种:中央处理器CPU资源类型、现场可编程逻辑门阵列FPGA资源类型、图形处理器GPU资源类型。
  8. 一种数据处理方法,其特征在于,应用于数据湖分析平台,所述数据湖分析平台用于为用户提供无服务器化的数据处理服务,所述方法包括:
    获取与数据处理请求对应的待处理算子;
    获取所述待处理算子在多个资源类型下分别对应的代价值;
    根据所述代价值从所述多个资源类型中选取目标资源类型;
    通过所述目标资源类型对应的计算资源执行所述待处理算子;
    其中,所述目标资源类型对应的计算资源,用于基于所述数据湖分析平台提供的云数据库,执行所述待处理算子。
  9. 根据权利要求8所述的方法,其特征在于,所述计算资源具体为:
    用于提供中央处理器CPU资源的CPU云服务器;或者,
    用于提供现场可编程逻辑门阵列FPGA资源的FPGA云服务器;或者,
    用于提供图形处理器GPU资源的GPU云服务器。
  10. 一种数据处理方法,其特征在于,所述方法包括:
    通过指定资源类型对应的计算资源执行指定算子,并获取执行过程中的所述计算资源的代价值;其中,所述指定资源类型为多个资源类型中的任一资源类型,所述指定算子为多个算子中的任一算子;
    生成算子资源注册表;其中,所述算子资源注册表包括所述指定算子、所述指定资源类型和所述计算资源的代价值之间的对应关系;
    其中,所述算子资源注册表用于确定与数据处理请求对应的待处理算子对应的代价值,并根据所述代价值确定所述待处理算子的目标资源类型,通过所述目标资源类型对应的计算资源执行所述待处理算子。
  11. 一种数据处理方法,其特征在于,所述方法包括:
    获取与数据处理请求对应的原始执行计划,原始执行计划包括待处理算子;
    获取所述待处理算子在多个资源类型下分别对应的代价值;
    根据所述代价值从所述多个资源类型中选取目标资源类型;
    获取与所述原始执行计划对应的目标执行计划,所述目标执行计划包括所述待处理算子、所述待处理算子对应的所述目标资源类型;
    将所述目标执行计划发送给所述目标资源类型对应的计算资源,以使所述计算资源根据所述目标执行计划执行所述待处理算子。
  12. 根据权利要求11所述的方法,其特征在于,
    所述获取与所述原始执行计划对应的目标执行计划,包括:
    若存在与所述数据处理请求对应的多个原始执行计划,针对所述多个原始执行计划中的原始执行计划,根据该原始执行计划中的所述待处理算子在所述目标资源类型下对应的代价值,获取该原始执行计划对应的总代价值;
    从所述多个原始执行计划中选取总代价值最小的原始执行计划;
    获取与所述总代价值最小的原始执行计划对应的目标执行计划。
  13. 一种数据处理装置,其特征在于,所述装置包括:
    获取模块,用于获取与数据处理请求对应的待处理算子,并获取所述待处理算子在多个资源类型下分别对应的代价值;
    选取模块,用于根据所述代价值从所述多个资源类型中选取目标资源类型;
    处理模块,用于通过所述目标资源类型对应的计算资源执行所述待处理算子。
  14. 根据权利要求13所述的装置,其特征在于,
    所述获取模块获取所述待处理算子在多个资源类型下分别对应的代价值时具体用于:通过所述待处理算子查询算子资源注册表,得到所述待处理算子在多个资源类型下分别对应的代价值,所述算子资源注册表为预先生成的;其中,所述算子资源注册表包括算子、资源类型与代价值的对应关系。
  15. 一种数据处理设备,其特征在于,包括:
    处理器和机器可读存储介质,所述机器可读存储介质上存储有若干计算机指令,所述处理器执行所述计算机指令时进行如下处理:
    获取与数据处理请求对应的待处理算子;
    获取所述待处理算子在多个资源类型下分别对应的代价值;
    根据所述代价值从所述多个资源类型中选取目标资源类型;
    通过所述目标资源类型对应的计算资源执行所述待处理算子。
PCT/CN2020/084425 2019-04-18 2020-04-13 一种数据处理方法、装置及设备 WO2020211718A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910314361.6A CN111831425A (zh) 2019-04-18 2019-04-18 一种数据处理方法、装置及设备
CN201910314361.6 2019-04-18

Publications (1)

Publication Number Publication Date
WO2020211718A1 true WO2020211718A1 (zh) 2020-10-22

Family

ID=72837000

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/084425 WO2020211718A1 (zh) 2019-04-18 2020-04-13 一种数据处理方法、装置及设备

Country Status (2)

Country Link
CN (1) CN111831425A (zh)
WO (1) WO2020211718A1 (zh)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101232511A (zh) * 2006-12-07 2008-07-30 丛林网络公司 基于服务器功耗的网络通信分配
CN103377087A (zh) * 2012-04-27 2013-10-30 北大方正集团有限公司 一种数据任务处理方法、装置及系统
CN104144183A (zh) * 2013-05-08 2014-11-12 株式会社日立制作所 数据中心系统及数据中心系统的管理方法
CN105049268A (zh) * 2015-08-28 2015-11-11 东方网力科技股份有限公司 分布式计算资源分配系统和任务处理方法
CN106936925A (zh) * 2017-04-17 2017-07-07 广州孩教圈信息科技股份有限公司 负载均衡方法和系统
CN107431630A (zh) * 2015-01-30 2017-12-01 卡尔加里科学公司 高度可扩展、容错的远程访问架构和与之连接的方法
WO2018111987A1 (en) * 2016-12-13 2018-06-21 Amazon Technologies, Inc. Reconfigurable server

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7707207B2 (en) * 2006-02-17 2010-04-27 Microsoft Corporation Robust cardinality and cost estimation for skyline operator
CN103235974B (zh) * 2013-04-25 2015-10-28 中国科学院地理科学与资源研究所 一种提高海量空间数据处理效率的方法
WO2016041126A1 (zh) * 2014-09-15 2016-03-24 华为技术有限公司 基于gpu的数据流处理方法和装置
US9898337B2 (en) * 2015-03-27 2018-02-20 International Business Machines Corporation Dynamic workload deployment for data integration services
US10771538B2 (en) * 2015-10-08 2020-09-08 International Business Machines Corporation Automated ETL resource provisioner
CN108536692B (zh) * 2017-03-01 2022-03-11 华为技术有限公司 一种执行计划的生成方法、装置及数据库服务器
CN109241093B (zh) * 2017-06-30 2021-06-08 华为技术有限公司 一种数据查询的方法、相关装置及数据库系统
CN108491274A (zh) * 2018-04-02 2018-09-04 深圳市华傲数据技术有限公司 分布式数据管理的优化方法、装置、存储介质及设备
CN108959510B (zh) * 2018-06-27 2022-04-19 北京奥星贝斯科技有限公司 一种分布式数据库的分区级连接方法和装置


Also Published As

Publication number Publication date
CN111831425A (zh) 2020-10-27

Similar Documents

Publication Publication Date Title
US11372888B2 (en) Adaptive distribution for hash operations
CN110168516B (zh) 用于大规模并行处理的动态计算节点分组方法及系统
Ranjan Streaming big data processing in datacenter clouds
US11614970B2 (en) High-throughput parallel data transmission
WO2020211717A1 (zh) 一种数据处理方法、装置及设备
US10691695B2 (en) Combined sort and aggregation
CN111723161A (zh) 一种数据处理方法、装置及设备
CN111782404A (zh) 一种数据处理方法及相关设备
US10891271B2 (en) Optimized execution of queries involving early terminable database operators
CN108319604B (zh) 一种hive中大小表关联的优化方法
CN112506887B (zh) 车辆终端can总线数据处理方法及装置
CN110909072B (zh) 一种数据表建立方法、装置及设备
WO2020211718A1 (zh) 一种数据处理方法、装置及设备
US10268727B2 (en) Batching tuples
Win et al. An efficient big data analytics platform for mobile devices
CN110866052A (zh) 一种数据分析方法、装置及设备
US11550793B1 (en) Systems and methods for spilling data for hash joins
CN111221858B (zh) 一种数据处理方法、装置及设备
CN116842225A (zh) 数据库查询方法、装置、设备、介质和程序产品
Radhakrishnan et al. ADAPTIVE HANDLING OF 3V’S OF BIG DATA TO IMPROVE EFFICIENCY USING HETEROGENEOUS CLUSTERS
Sharma et al. Querying capability comparison of Hadoop technologies to find the more sustainable platform for big data

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20791389

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20791389

Country of ref document: EP

Kind code of ref document: A1