CN111723112A - Data task execution method and device, electronic equipment and storage medium - Google Patents

Data task execution method and device, electronic equipment and storage medium

Info

Publication number
CN111723112A
Authority
CN
China
Prior art keywords
target
data
task
query task
data query
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010529804.6A
Other languages
Chinese (zh)
Other versions
CN111723112B (en
Inventor
黄琼峰
桂祖宏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, MIGU Culture Technology Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN202010529804.6A priority Critical patent/CN111723112B/en
Publication of CN111723112A publication Critical patent/CN111723112A/en
Application granted granted Critical
Publication of CN111723112B publication Critical patent/CN111723112B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24552Database cache management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471Distributed queries
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiment of the invention provides a data task execution method, a data task execution device, electronic equipment and a storage medium, wherein the method comprises the following steps: determining the data volume of a target data query task and the resource consumption cost of the target data query task under each preset computing engine; and determining a target calculation engine for executing the target data query task according to the data volume and the resource consumption cost, and executing the target data query task according to the target calculation engine. According to the data task execution method, the data task execution device, the electronic equipment and the storage medium, the matched computing engines are selected from the computing engines to execute the data query task according to the data volume of the target data query task and the resource consumption cost of the target data query task under the preset computing engines, the situation that the unmatched computing engines influence the execution of the data query task is avoided, and the execution efficiency and the stability of the data query task are improved.

Description

Data task execution method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data task execution method and apparatus, an electronic device, and a storage medium.
Background
Big data task processing falls mainly into two forms: batch task processing and real-time task processing. Batch task processing, of which Hive is a typical example, is also called offline task processing because of its long processing time. Real-time task processing provides fast response times and has several implementations, including Storm stream computing and Spark real-time interactive computing.
Both batch and real-time task processing systems may be referred to as computing engines, and different computing engines suit different application scenarios. Offline task processing is suitable for data that is extremely large in volume and rarely updated, but its drawbacks are obvious: slow response, poor interactivity, and a single programming model. Real-time task processing responds quickly, but because most real-time computing engines rely on memory, they place high demands on hardware and tolerate large data volumes poorly.
Existing data platforms support a mix of batch and real-time computing engines. For example, a single data platform can simultaneously support Hive, Spark, Storm, Kylin and other real-time and offline computing engines, each with its own application scenarios. Consequently, a specific computing engine must be designated manually before a data query task is executed. The task is then bound to that one engine, and the engine cannot be selected dynamically, so an unsuitable engine can impair the data query task and reduce its execution efficiency and stability.
Disclosure of Invention
To solve the problems in the prior art, embodiments of the present invention provide a data task execution method and apparatus, an electronic device, and a storage medium.
In a first aspect, an embodiment of the present invention provides a data task execution method, including:
determining the data volume of a target data query task and the resource consumption cost of the target data query task under each preset computing engine;
and determining a target calculation engine for executing the target data query task according to the data volume and the resource consumption cost, and executing the target data query task according to the target calculation engine.
Further, the target data query task is obtained by parsing a target query statement and consists of a plurality of subtasks. Accordingly, determining the resource consumption cost of the target data query task under each preset computing engine includes:
determining IO read-write data volume, CPU resources and memory resources required by each subtask in the target data query task under each computing engine, and IO bandwidth, CPU resources and memory resources currently available for a data platform;
and determining the resource consumption cost of the target data query task under each computing engine according to the IO read-write data volume, the CPU resource and the memory resource required by each subtask under each computing engine, and the IO bandwidth, the CPU resource and the memory resource currently available for the data platform.
Further, the determining the resource consumption cost of the target data query task under each computing engine according to the IO read-write data volume, the CPU resource and the memory resource required by each subtask under each computing engine, and the IO bandwidth, the CPU resource and the memory resource currently available for the data platform includes:
determining the sum of the ratio of IO read-write data volume of each subtask under each computing engine to the current available IO bandwidth of the data platform, the sum of the ratio of CPU resources to the current available CPU resources of the data platform, and the sum of the ratio of memory resources to the current available memory resources of the data platform;
and determining the resource consumption cost of the target data query task under each computing engine according to each sum.
Further, determining the target computing engine for executing the target data query task according to the data volume and the resource consumption cost includes:
when the data volume is determined to exceed the first threshold, determining the Hive computing engine as the target computing engine;
and when the data volume is determined not to exceed the first threshold, determining the computing engine with the minimum resource consumption cost among the computing engines as the target computing engine.
Further, before determining a target computing engine for executing the target data query task according to the data amount and the resource consumption cost, the method further includes:
acquiring an execution cache table corresponding to the target data query task;
and executing the target data query task according to a calculation engine prestored in an execution cache table.
Further, the execution cache table includes a computing engine execution table and data cache validity time information. Accordingly, executing the target data query task according to the computing engine pre-stored in the execution cache table includes:
after determining, according to the data cache validity time information, that the target data query task is cached, parsing the computing engine execution table to obtain the pre-stored computing engine and executing the target data query task with it.
In a second aspect, an embodiment of the present invention provides a data task execution device, including:
the determining module is used for determining the data volume of the target data query task and the resource consumption cost of the target data query task under each preset computing engine;
and the execution module is used for determining a target calculation engine for executing the target data query task according to the data volume and the resource consumption cost, and executing the target data query task according to the target calculation engine.
In a third aspect, an embodiment of the present invention provides a data task execution system, including:
the task parser is used for determining a target data query task;
the task compiler is used for determining the data volume of the target data query task and the resource consumption cost of the target data query task under each preset computing engine;
the computing engine selector is used for determining a target computing engine for executing the target data query task according to the data volume and the resource consumption cost, and executing the target data query task according to the target computing engine;
and the metadata repository is used for storing data required by executing the target data query task.
In a fourth aspect, an embodiment of the present invention provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the data task execution method when executing the program.
In a fifth aspect, an embodiment of the present invention provides a non-transitory computer readable storage medium, on which a computer program is stored, the computer program, when being executed by a processor, implementing the steps of the data task execution method as described above.
According to the data task execution method, the data task execution device, the electronic equipment and the storage medium, the matched computing engines are selected from the computing engines to execute the data query task according to the data volume of the target data query task and the resource consumption cost of the target data query task under the preset computing engines, the situation that the unmatched computing engines influence the execution of the data query task is avoided, and the execution efficiency and the stability of the data query task are improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a flow chart of an embodiment of a data task execution method of the present invention;
FIG. 2 is a schematic diagram of a complete flow chart of a data task execution method according to the present invention;
FIG. 3 is a block diagram of an embodiment of a data task execution device according to the present invention;
FIG. 4 is a block diagram of a data task execution system according to an embodiment of the present invention;
FIG. 5 is a block diagram of an embodiment of an electronic device according to the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 illustrates a data task execution method provided by an embodiment of the present invention, including:
S11, determining the data volume of the target data query task and the resource consumption cost of the target data query task under each preset computing engine;
S12, determining a target calculation engine for executing the target data query task according to the data volume and the resource consumption cost, and executing the target data query task according to the target calculation engine.
With respect to step S11 and step S12, it should be noted that, in the embodiment of the present invention, the target data query task is formed by parsing and converting a data query statement input by a system interface. The data query statements comprise SQL statements, SPARQL statements, DQL statements and the like, so the data query tasks obtained by analyzing and converting the statements can be SQL tasks, SPARQL tasks, DQL tasks and the like.
The method of this embodiment is explained below using an SQL task as an example:
The SQL statement is parsed and converted into a task tree. The task tree has a plurality of tree nodes, and each tree node corresponds to one subtask. During parsing, preliminary checks, including syntax checking, are performed; once the statement passes syntax checking, the target SQL task is determined to be valid.
Executing each subtask in the SQL task requires processing the underlying data, and the data volume can be calculated from that underlying data. The data volume corresponding to the target SQL task is obtained in this way.
In the embodiment of the present invention, the underlying data corresponding to the SQL task may be stored in a metadata repository, where the metadata repository further includes names of data tables related to the SQL task, columns and partitions of the tables, attributes of the tables, a directory in which the data of the tables is located, and the like.
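The task tree and the data volume calculation described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `TaskNode` class, the example tree, the table names and the byte counts are all hypothetical, and the rule "add up the sizes of the distinct tables read by the subtasks" is an assumption, since the text does not state how the data volume is derived from the underlying data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TaskNode:
    """One node of the task tree; each node corresponds to one subtask."""
    name: str
    table: Optional[str] = None  # table read by this subtask, if any
    children: List["TaskNode"] = field(default_factory=list)


def subtasks(root: TaskNode) -> List[TaskNode]:
    """Flatten the task tree into a list of subtasks (pre-order)."""
    nodes = [root]
    for child in root.children:
        nodes.extend(subtasks(child))
    return nodes


def data_volume(root: TaskNode, table_bytes: Dict[str, int]) -> int:
    """Assumed rule: sum the sizes of the distinct tables the subtasks read."""
    tables = {n.table for n in subtasks(root) if n.table is not None}
    return sum(table_bytes[t] for t in tables)


# Hypothetical tree for a query that joins two tables and aggregates the result.
tree = TaskNode("aggregate", children=[
    TaskNode("join", children=[
        TaskNode("scan_orders", table="orders"),
        TaskNode("scan_users", table="users"),
    ]),
])

# Hypothetical table sizes (bytes), standing in for the metadata repository.
metadata_sizes = {"orders": 8_000, "users": 2_000}
total = data_volume(tree, metadata_sizes)
```

Here the table sizes play the role of the per-table information that the metadata repository is said to hold.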
In embodiments of the present invention, the computing engines include Hive (offline), Spark (real-time), Storm (streaming), Kylin and others. The resource consumption cost of the target SQL task differs from one computing engine to another. The resource consumption cost of the target SQL task is therefore calculated separately under each computing engine.
And then, according to the obtained data volume and the resource consumption cost under each computing engine, determining a target computing engine for executing the target SQL task through a preset computing engine selection strategy, and executing the target SQL task according to the target computing engine.
In this regard, the calculation engine selection policy uses the data volume and the resource consumption cost under each calculation engine as screening conditions to determine one of the calculation engines as the target calculation engine for executing the target SQL task.
According to the data task execution method provided by the embodiment of the invention, the matched computing engines are selected from the computing engines to execute the data query task according to the data volume of the target data query task and the resource consumption cost of the target data query task under the preset computing engines, so that the influence of the unmatched computing engines on the execution of the data query task is avoided, and the execution efficiency and the stability of the data query task are improved.
In a further embodiment of the foregoing method embodiment, the determination of the resource consumption cost of the target SQL task under each preset computing engine is explained, specifically as follows:
S111, determining IO read-write data volume, CPU resources and memory resources required by each subtask in the target SQL task under each computing engine, and IO bandwidth, CPU resources and memory resources currently available to the data platform;
S112, determining the resource consumption cost of the target SQL task under each computing engine according to the IO read-write data volume, the CPU resources and the memory resources required by each subtask under each computing engine, and the IO bandwidth, the CPU resources and the memory resources currently available to the data platform.
With respect to steps S111 and S112, it should be noted that, as mentioned in the above embodiment, the target SQL task includes a plurality of subtasks. Each subtask is analyzed to obtain the IO read-write data volume, CPU resources and memory resources it requires, and the IO bandwidth, CPU resources and memory resources currently available to the data platform are also obtained. Here, the data platform is the computing environment needed to execute data query tasks.
Determining the sum of the ratio of IO read-write data volume of each subtask under each computing engine to the current available IO bandwidth of the data platform, the sum of the ratio of CPU resources to the current available CPU resources of the data platform, and the sum of the ratio of memory resources to the current available memory resources of the data platform, and determining the resource consumption cost of the target data query task under each computing engine according to each sum.
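The three sums of ratios can be sketched in code as follows. The text only says that the cost is determined "according to each sum", so combining the three sums by plain addition is an assumption made for this sketch. The `Demand` and `Available` types, the engine names and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Demand:
    """Resources one subtask needs under a given computing engine."""
    io: float   # IO read-write data volume
    cpu: float  # CPU resources
    mem: float  # memory resources


@dataclass(frozen=True)
class Available:
    """Resources currently available to the data platform."""
    io: float   # IO bandwidth
    cpu: float  # CPU resources
    mem: float  # memory resources


def resource_cost(demands: Iterable[Demand], avail: Available) -> float:
    """Sum each subtask's demand/available ratio per resource, then combine."""
    demands = list(demands)
    io_sum = sum(d.io / avail.io for d in demands)
    cpu_sum = sum(d.cpu / avail.cpu for d in demands)
    mem_sum = sum(d.mem / avail.mem for d in demands)
    # Assumption: the three sums are combined by unweighted addition.
    return io_sum + cpu_sum + mem_sum


# Hypothetical platform state and per-engine demands for a two-subtask task.
platform = Available(io=128.0, cpu=8.0, mem=64.0)
demands_by_engine = {
    "spark": [Demand(io=32, cpu=2, mem=16), Demand(io=32, cpu=2, mem=16)],
    "hive": [Demand(io=64, cpu=1, mem=4), Demand(io=64, cpu=1, mem=4)],
}
costs = {name: resource_cost(d, platform) for name, d in demands_by_engine.items()}
```

Because the denominators are the resources currently available, the same task yields a higher cost on a busy platform than on an idle one, which is what lets the cost reflect the platform's present load.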
In a further embodiment of the method in the above embodiment, a resource consumption cost obtaining formula is used to determine the resource consumption cost of the target SQL task under each computing engine.
Wherein the resource consumption cost obtaining formula includes:
[Formula image BDA0002534802630000061 not reproduced]
C_compute-engine is the resource consumption cost;
IO_i is the IO read-write data volume required by the i-th subtask in the target SQL task under the computing engine, and IO_useable is the IO bandwidth currently available to the data platform;
CPU_i is the CPU resource required by the i-th subtask in the target SQL task under the computing engine, and CPU_useable is the CPU resource currently available to the data platform;
Mem_i is the memory resource required by the i-th subtask in the target SQL task under the computing engine, and Mem_useable is the memory resource currently available to the data platform.
It should be noted that the amount of IO read and write data, CPU resources, and memory resources required by each subtask under different computing engines may be different. For this reason, the resource consumption cost of the target SQL task is different under each computing engine.
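The formula itself survives only as an image reference in this text. Based on the symbol definitions and on the description of three sums of ratios, one plausible reading is given below. The unweighted addition of the three sums and the index range 1..n (with n the number of subtasks) are assumptions, not confirmed by the source.

```latex
C_{\text{compute-engine}}
  = \sum_{i=1}^{n} \frac{IO_i}{IO_{\text{useable}}}
  + \sum_{i=1}^{n} \frac{CPU_i}{CPU_{\text{useable}}}
  + \sum_{i=1}^{n} \frac{Mem_i}{Mem_{\text{useable}}}
```

Under this reading, the CPU and memory terms are fractions of the available resource, while the IO term divides a data volume by a bandwidth and therefore behaves like an estimated transfer time.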
In a further embodiment of the foregoing method embodiment, the process of determining the target computing engine for executing the target SQL task according to the data volume, the resource consumption cost and a preset computing engine selection policy is explained. Both the data volume of the target SQL task and its resource consumption cost under each computing engine are numerical values, so the computing engine selection policy evaluates these values to determine a matching computing engine as the target computing engine of the target SQL task. The specific steps are as follows:
The computing engine selection policy comprises the following:
When the data volume of the SQL task is determined to exceed the first threshold, the Hive computing engine is determined as the target computing engine.
When the data volume of the SQL task is determined not to exceed the first threshold, the computing engine with the minimum resource consumption cost among the computing engines is determined as the target computing engine.
In this regard, the Hive computing engine is suited to offline computing scenarios with large data volumes, where task execution times are relatively long. Computing engines such as Spark and Storm are suited to real-time computing scenarios and process data quickly, but an SQL task may fail outright when its demand for computing resources is large and the available computing resources are insufficient.
Here, the first threshold is the boundary value for determining whether the SQL task is a large-data-volume task. For example, tasks whose data volume exceeds 10 TB use the Hive computing engine.
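The selection policy can be sketched as a small function. This is an illustrative sketch only: the function name, the engine labels and the cost values are hypothetical, and the example boundary is taken as 10 terabytes counted as 10 × 10^12 bytes, which is an assumption about the unit.

```python
from typing import Mapping

TB = 10**12                 # assumed: decimal terabyte, in bytes
FIRST_THRESHOLD = 10 * TB   # example boundary for a large-data-volume task


def select_engine(data_volume: int,
                  costs: Mapping[str, float],
                  first_threshold: int = FIRST_THRESHOLD) -> str:
    """Pick the target engine from the data volume and per-engine costs."""
    # Large tasks go to Hive regardless of the computed costs.
    if data_volume > first_threshold:
        return "hive"
    # Otherwise take the engine with the minimum resource consumption cost.
    return min(costs, key=costs.get)


example_costs = {"hive": 2.0, "spark": 1.5, "storm": 3.0}
large_choice = select_engine(20 * TB, example_costs)
small_choice = select_engine(1 * TB, example_costs)
```

A data volume exactly equal to the threshold does not exceed it, so it falls through to the minimum-cost branch. With an empty `costs` mapping, `min` raises `ValueError`; a real system would need to decide how to handle that case.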
In a further embodiment of the foregoing method embodiment, a step performed before the target computing engine is determined according to the data volume of the SQL task and its resource consumption cost under each computing engine is explained, in which the computing engine for executing the target SQL task is selected from a cache, specifically as follows:
An execution cache table corresponding to the target SQL task is acquired.
In the above embodiments, the underlying data corresponding to the SQL task may be stored in a metadata repository, where names of data tables related to the SQL task, columns and partitions of the tables, attributes of the tables, directories where the data of the tables are located, and the like are also included in the metadata repository. In addition, the metadata repository also comprises an execution cache table of the SQL task, and the execution cache table comprises information of a computing engine adopted by a certain SQL task within the cache effective time.
And after the data information on the execution cache table is determined to be valid, executing the target SQL task according to a calculation engine prestored in the execution cache table.
In a further embodiment of the foregoing embodiment method, the execution cache table includes a calculation engine execution table and data cache valid time information, and the calculation engine execution table includes a calculation engine used by the SQL task within the cache valid time. The data cache validity time information includes a cache validity time of the SQL task.
Therefore, after the target SQL task is determined to be cached according to the data cache validity time information, the pre-stored computing engine is obtained by parsing the computing engine execution table and is used to execute the target SQL task.
In addition, after the data information in the execution cache table is determined to be invalid, the data volume of the target SQL task and the resource consumption cost of the target SQL task under each preset computing engine are determined, the target computing engine is determined from them, and the target SQL task is executed according to the target computing engine.
It should be noted that, after determining a target computing engine for executing a target data query task according to the data amount and the resource consumption cost, and executing the target data query task according to the target computing engine, the target computing engine needs to be placed in an execution cache table and stored in the metadata repository.
In an embodiment, since the execution cache table is stored in the metadata repository, the system may directly access the metadata repository before determining the target computing engine for executing the target SQL task according to the data volume and the resource consumption cost, so as to quickly select a suitable computing engine according to the execution cache table.
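An execution cache table with a validity time can be sketched as follows. This is a hedged illustration: the class and method names, the use of a string key per SQL task, and the expiry rule based on a time-to-live in seconds are all assumptions. The text states only that the table records the computing engine used by a task and its cache validity time.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CacheEntry:
    engine: str        # computing engine used for the task
    expires_at: float  # end of the cache validity time (epoch seconds)


class ExecutionCacheTable:
    """In-memory stand-in for the execution cache table in the metadata repository."""

    def __init__(self) -> None:
        self._entries: Dict[str, CacheEntry] = {}

    def lookup(self, task_key: str, now: Optional[float] = None) -> Optional[str]:
        """Return the cached engine if the entry exists and is still valid."""
        now = time.time() if now is None else now
        entry = self._entries.get(task_key)
        if entry is None or now >= entry.expires_at:
            return None  # no entry, or the validity time has passed
        return entry.engine

    def store(self, task_key: str, engine: str, ttl_seconds: float,
              now: Optional[float] = None) -> None:
        """Record the engine chosen for a task, valid for ttl_seconds."""
        now = time.time() if now is None else now
        self._entries[task_key] = CacheEntry(engine, now + ttl_seconds)


# Fixed timestamps are passed in so that the example is deterministic.
table = ExecutionCacheTable()
table.store("query-1", "spark", ttl_seconds=60, now=1000.0)
hit = table.lookup("query-1", now=1030.0)
expired = table.lookup("query-1", now=1060.0)
```

A lookup that returns `None` corresponds to the case where the data volume and resource consumption costs must be computed afresh, after which `store` records the newly chosen engine.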
With reference to the foregoing embodiments, Fig. 2 shows an overall flowchart of a data task execution method according to an embodiment of the present invention. Referring to Fig. 2, the details are as follows:
If the SQL task has a cache entry in the execution cache table, the cached computing engine is used: if the Hive computing engine is cached in the execution cache table, the SQL task is executed by the Hive computing engine; if the Spark computing engine is cached in the execution cache table, the SQL task is executed by the Spark computing engine.
If the SQL task has no cache entry in the execution cache table and the data volume exceeds the first threshold, the SQL task is executed by the Hive computing engine.
If the data volume does not exceed the first threshold, the resource consumption costs of the SQL task under the Hive and Spark computing engines are calculated.
If the resource consumption cost under the Hive computing engine is lower, the SQL task is executed by the Hive computing engine.
If the resource consumption cost under the Spark computing engine is lower, the SQL task is executed by the Spark computing engine.
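The order of checks in this flow can be sketched as a single function that also reports why an engine was chosen. The names, the plain mapping used for cached engines, and the cost values are hypothetical. The cost estimator is passed in as a callable so that, as in the flow, costs are computed only when the cache misses and the data volume is at or below the threshold.

```python
from typing import Callable, Dict, Mapping, Tuple


def choose_engine(task_key: str,
                  cached_engines: Mapping[str, str],
                  data_volume: int,
                  first_threshold: int,
                  compute_costs: Callable[[], Dict[str, float]]) -> Tuple[str, str]:
    """Return (engine, reason), following the order of checks in the flow."""
    # 1. A valid cache entry decides the engine directly.
    cached = cached_engines.get(task_key)
    if cached is not None:
        return cached, "cache"
    # 2. Without a cache entry, large tasks go to Hive.
    if data_volume > first_threshold:
        return "hive", "data volume exceeds first threshold"
    # 3. Otherwise compute the costs now and take the cheapest engine.
    costs = compute_costs()
    return min(costs, key=costs.get), "lowest resource consumption cost"


calls = []


def estimate_costs() -> Dict[str, float]:
    """Hypothetical estimator; records each call to show it runs lazily."""
    calls.append("called")
    return {"hive": 2.0, "spark": 1.5}


cache = {"q_cached": "spark"}   # assumed to hold only still-valid entries
THRESHOLD = 10 * 10**12         # assumed example threshold, in bytes

r1 = choose_engine("q_cached", cache, 50 * 10**12, THRESHOLD, estimate_costs)
r2 = choose_engine("q_big", cache, 50 * 10**12, THRESHOLD, estimate_costs)
r3 = choose_engine("q_small", cache, 10**9, THRESHOLD, estimate_costs)
```

In this sketch a cached engine takes precedence even for a large task, because the cache check comes first in the flow.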
Fig. 3 shows a data task execution device according to an embodiment of the present invention, which includes a determining module 21 and an executing module 22, where:
The determining module 21 is configured to determine the data volume of the target data query task and the resource consumption cost of the target data query task under each preset computing engine;
The execution module 22 is configured to determine a target computing engine for executing the target data query task according to the data volume and the resource consumption cost, and execute the target data query task according to the target computing engine.
In a further embodiment of the apparatus in the above embodiment, the determining module, in the process of determining the resource consumption cost of the target data query task under each preset computing engine, is specifically configured to:
determining IO read-write data volume, CPU resources and memory resources required by each subtask in the target data query task under each computing engine, and IO bandwidth, CPU resources and memory resources currently available for the data platform;
and determining the resource consumption cost of the target data query task under each computing engine according to the IO read-write data volume, the CPU resource and the memory resource required by each subtask under each computing engine, and the IO bandwidth, the CPU resource and the memory resource currently available for the data platform.
In a further embodiment of the apparatus in the above embodiment, determining resource consumption cost of the target data query task in each computing engine according to an IO read-write data amount, a CPU resource, and a memory resource required by each subtask in each computing engine, and an IO bandwidth, a CPU resource, and a memory resource currently available for the data platform, includes:
determining the sum of the ratio of IO read-write data volume of each subtask under each computing engine to the current available IO bandwidth of the data platform, the sum of the ratio of CPU resources to the current available CPU resources of the data platform, and the sum of the ratio of memory resources to the current available memory resources of the data platform;
and determining the resource consumption cost of the target data query task under each computing engine according to each sum.
In a further embodiment of the apparatus in the foregoing embodiment, a resource consumption cost obtaining formula is used to determine resource consumption costs of the target data query task under each computing engine, where the resource consumption cost obtaining formula includes:
[Formula image BDA0002534802630000101 not reproduced]
C_compute-engine is the resource consumption cost;
IO_i is the IO read-write data volume required by the i-th subtask in the target data query task under the computing engine, and IO_useable is the IO bandwidth currently available to the data platform;
CPU_i is the CPU resource required by the i-th subtask in the target data query task under the computing engine, and CPU_useable is the CPU resource currently available to the data platform;
Mem_i is the memory resource required by the i-th subtask in the target data query task under the computing engine, and Mem_useable is the memory resource currently available to the data platform.
In a further embodiment of the foregoing device embodiment, the execution module, in determining a target computing engine for executing the target data query task according to the data volume and the resource consumption cost, is specifically configured to:
when the data volume is determined to exceed the first threshold, determine the Hive computing engine as the target computing engine;
and when the data volume is determined not to exceed the first threshold, determine the computing engine with the minimum resource consumption cost among the computing engines as the target computing engine.
In a further embodiment of the foregoing device embodiment, the device further includes an obtaining module which, before the target computing engine for executing the target data query task is determined according to the data volume, the resource consumption cost and a preset computing engine selection policy, is configured to perform the following:
acquiring an execution cache table corresponding to the target data query task;
and executing the target data query task according to the calculation engine prestored in the execution cache table.
In a further embodiment of the apparatus in the foregoing embodiment, the execution cache table includes a calculation engine execution table and data cache valid time information, and accordingly, the obtaining module is specifically configured to, in a process of executing a target data query task according to a calculation engine prestored in the execution cache table:
and after the target data query task is determined to be cached according to the effective time information of the data cache, analyzing and obtaining a pre-stored computing engine execution target data query task from the computing engine execution table.
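A minimal Python sketch of the cache lookup is given below. It assumes that the execution cache table is keyed by a task identifier and that the validity time is stored as an expiry timestamp; both choices, and the names `CacheEntry` and `cached_engine`, are hypothetical, since the text does not describe the table layout.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class CacheEntry:
    engine: str         # engine recorded in the computing engine execution table
    valid_until: float  # data cache validity time, as a Unix timestamp (assumed)


def cached_engine(
    cache: Dict[str, CacheEntry],
    task_key: str,
    now: Optional[float] = None,
) -> Optional[str]:
    """Return the prestored engine if the task has a still-valid cache entry.

    Returns None when the task is not cached or the entry has expired, in
    which case the caller falls back to cost-based engine selection.
    """
    current = time.time() if now is None else now
    entry = cache.get(task_key)
    if entry is None or current >= entry.valid_until:
        return None
    return entry.engine
```

Checking the cache first lets a repeated query skip the cost computation entirely while its entry remains valid.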
Since the apparatus according to the embodiment of the present invention operates on the same principle as the method of the above embodiment, the details are not repeated here.
It should be noted that, in the embodiment of the present invention, the relevant functional modules may be implemented by a hardware processor.
According to the data task execution device provided by the embodiment of the invention, the matched computing engine is selected from the computing engines to execute the data query task according to the data volume of the target data query task and the resource consumption cost of the target data query task under the preset computing engines, so that the unmatched computing engines are prevented from influencing the execution of the data query task, and the execution efficiency and the stability of the data query task are improved.
Fig. 4 shows a schematic structural diagram of a data task execution system provided by an embodiment of the present invention. Referring to fig. 4, the system includes a task parser 31, a task compiler 32, a computing engine selector 33, and a metadata repository 34, where:
the task parser 31 is configured to determine a target data query task;
the task compiler 32 is configured to determine the data volume of the target data query task and the resource consumption cost of the target data query task under each preset computing engine;
the computing engine selector 33 is configured to determine a target computing engine for executing the target data query task according to the data volume and the resource consumption cost, and to execute the target data query task according to the target computing engine;
and the metadata repository 34 is configured to store the data required for executing the target data query task.
Since the system according to the embodiment of the present invention operates on the same principle as the method of the above embodiment, the details are not repeated here.
It should be noted that, in the embodiment of the present invention, the relevant functional units may be implemented by a hardware processor.
According to the data task execution system provided by the embodiment of the invention, the matched computing engine is selected from the computing engines to execute the data query task according to the data volume of the target data query task and the resource consumption cost of the target data query task under the preset computing engines, so that the situation that the execution of the data query task is influenced by the unmatched computing engines is avoided, and the execution efficiency and the stability of the data query task are improved.
Fig. 5 illustrates the physical structure of an electronic device. As shown in fig. 5, the electronic device may include a processor 41, a communication interface 42, a memory 43, and a communication bus 44, where the processor 41, the communication interface 42, and the memory 43 communicate with one another through the communication bus 44. The processor 41 may call logic instructions in the memory 43 to perform the following method: determining the data volume of a target data query task and the resource consumption cost of the target data query task under each preset computing engine; and determining a target computing engine for executing the target data query task according to the data volume and the resource consumption cost, and executing the target data query task according to the target computing engine.
Furthermore, the logic instructions in the memory 43 may be implemented in the form of software functional units and, when sold or used as a stand-alone product, stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
Embodiments of the present invention further provide a non-transitory computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program performs the method provided in the foregoing embodiments, the method including: determining the data volume of a target data query task and the resource consumption cost of the target data query task under each preset computing engine; and determining a target computing engine for executing the target data query task according to the data volume and the resource consumption cost, and executing the target data query task according to the target computing engine.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for performing a data task, comprising:
determining the data volume of a target data query task and the resource consumption cost of the target data query task under each preset computing engine;
and determining a target calculation engine for executing the target data query task according to the data volume and the resource consumption cost, and executing the target data query task according to the target calculation engine.
2. The data task execution method of claim 1, wherein the target data query task is obtained by parsing a target query statement and is composed of a plurality of subtasks, and accordingly, determining the resource consumption cost of the target data query task under each preset computing engine comprises:
determining IO read-write data volume, CPU resources and memory resources required by each subtask in the target data query task under each computing engine, and IO bandwidth, CPU resources and memory resources currently available for a data platform;
and determining the resource consumption cost of the target data query task under each computing engine according to the IO read-write data volume, the CPU resource and the memory resource required by each subtask under each computing engine, and the IO bandwidth, the CPU resource and the memory resource currently available for the data platform.
3. The data task execution method of claim 2, wherein the determining the resource consumption cost of the target data query task under each computing engine according to the IO read-write data volume, the CPU resource and the memory resource required by each subtask under each computing engine, and the IO bandwidth, the CPU resource and the memory resource currently available for the data platform comprises:
determining the sum of the ratio of IO read-write data volume of each subtask under each computing engine to the current available IO bandwidth of the data platform, the sum of the ratio of CPU resources to the current available CPU resources of the data platform, and the sum of the ratio of memory resources to the current available memory resources of the data platform;
and determining the resource consumption cost of the target data query task under each computing engine according to each sum.
4. The data task execution method of claim 1, wherein determining a target computing engine to execute the target data query task according to the data volume and the resource consumption cost comprises:
when the data volume is determined to exceed the first threshold value, determining the Hive calculation engine as a target calculation engine;
and when the data volume is determined not to exceed the first threshold value, determining the computing engine corresponding to the minimum value in the resource consumption cost under each computing engine as a target computing engine.
5. The data task execution method of claim 1, further comprising, prior to determining a target computing engine to execute the target data query task according to the data volume and the resource consumption cost:
acquiring an execution cache table corresponding to the target data query task;
and executing the target data query task according to a calculation engine prestored in an execution cache table.
6. The data task execution method of claim 5, wherein the execution cache table comprises a calculation engine execution table and data cache validity time information, and accordingly, the target data query task is executed according to a calculation engine pre-stored in the execution cache table, comprising:
and after the target data query task is determined to be cached according to the data caching effective time information, analyzing the target data query task from the computing engine execution table to obtain a pre-stored computing engine, and executing the target data query task according to the pre-stored computing engine.
7. A data task execution apparatus, comprising:
the determining module is used for determining the data volume of the target data query task and the resource consumption cost of the target data query task under each preset computing engine;
and the execution module is used for determining a target calculation engine for executing the target data query task according to the data volume and the resource consumption cost, and executing the target data query task according to the target calculation engine.
8. A data task execution system, comprising:
the task analyzer is used for determining a target data query task;
the task compiler is used for determining the data volume of the target data query task and the resource consumption cost of the target data query task under each preset computing engine;
the computing engine selector is used for determining a target computing engine for executing the target data query task according to the data volume and the resource consumption cost, and executing the target data query task according to the target computing engine;
and the metadata repository is used for storing data required by executing the target data query task.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the data task execution method according to any of claims 1 to 6 are implemented when the processor executes the program.
10. A non-transitory computer readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the data task execution method according to any one of claims 1 to 6.
CN202010529804.6A 2020-06-11 2020-06-11 Data task execution method and device, electronic equipment and storage medium Active CN111723112B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010529804.6A CN111723112B (en) 2020-06-11 2020-06-11 Data task execution method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010529804.6A CN111723112B (en) 2020-06-11 2020-06-11 Data task execution method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111723112A true CN111723112A (en) 2020-09-29
CN111723112B CN111723112B (en) 2023-07-07

Family

ID=72567982

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010529804.6A Active CN111723112B (en) 2020-06-11 2020-06-11 Data task execution method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111723112B (en)

Also Published As

Publication number Publication date
CN111723112B (en) 2023-07-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant