CN111723112A - Data task execution method and device, electronic equipment and storage medium - Google Patents
- Publication number
- CN111723112A (application CN202010529804.6A)
- Authority
- CN
- China
- Prior art keywords
- target
- data
- task
- query task
- data query
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24552—Database cache management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2471—Distributed queries
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
An embodiment of the invention provides a data task execution method and device, an electronic device, and a storage medium. The method comprises: determining the data volume of a target data query task and the resource consumption cost of the target data query task under each preset computing engine; and determining, according to the data volume and the resource consumption cost, a target computing engine for executing the target data query task, and executing the target data query task with the target computing engine. Because a matching computing engine is selected from the preset computing engines according to the data volume of the target data query task and its resource consumption cost under each engine, an unsuitable computing engine no longer impairs the execution of the data query task, which improves the execution efficiency and stability of the data query task.
Description
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data task execution method and apparatus, an electronic device, and a storage medium.
Background
Big data task processing falls mainly into batch task processing and real-time task processing. Because of its long processing time, batch task processing is also called offline task processing, with Hive as a typical example. Real-time task processing provides fast response times for data processing and has several implementations, including Storm streaming computation and Spark real-time interactive computation.
Both batch and real-time task processing frameworks may be referred to as computing engines, and different computing engines suit different application scenarios. Offline task processing suits data that is extremely large in volume and rarely updated, but it has obvious drawbacks: slow response, poor interactivity, and a single programming model. Real-time task processing responds quickly, but most real-time computing engines depend on memory, so they place high demands on hardware and tolerate large data volumes poorly.
Existing data platforms support a mix of batch and real-time computing engines. For example, one data platform can simultaneously support Hive, Spark, Storm, Kylin, and other real-time and offline computing engines, each with its own application scenarios. Currently, a specific computing engine is manually designated before a data query task is executed. The task is thus bound to one fixed computing engine and the engine cannot be selected dynamically, so an unsuitable computing engine may impair the data query task and reduce its execution efficiency and stability.
Disclosure of Invention
To solve the problems in the prior art, embodiments of the present invention provide a data task execution method and apparatus, an electronic device, and a storage medium.
In a first aspect, an embodiment of the present invention provides a data task execution method, including:
determining the data volume of a target data query task and the resource consumption cost of the target data query task under each preset computing engine;
and determining a target calculation engine for executing the target data query task according to the data volume and the resource consumption cost, and executing the target data query task according to the target calculation engine.
Further, the target data query task is obtained by parsing a target query statement and is composed of a plurality of subtasks; accordingly, determining the resource consumption cost of the target data query task under each preset computing engine includes:
determining IO read-write data volume, CPU resources and memory resources required by each subtask in the target data query task under each computing engine, and IO bandwidth, CPU resources and memory resources currently available for a data platform;
and determining the resource consumption cost of the target data query task under each computing engine according to the IO read-write data volume, the CPU resource and the memory resource required by each subtask under each computing engine, and the IO bandwidth, the CPU resource and the memory resource currently available for the data platform.
Further, the determining the resource consumption cost of the target data query task under each computing engine according to the IO read-write data volume, the CPU resource and the memory resource required by each subtask under each computing engine, and the IO bandwidth, the CPU resource and the memory resource currently available for the data platform includes:
determining the sum of the ratio of IO read-write data volume of each subtask under each computing engine to the current available IO bandwidth of the data platform, the sum of the ratio of CPU resources to the current available CPU resources of the data platform, and the sum of the ratio of memory resources to the current available memory resources of the data platform;
and determining the resource consumption cost of the target data query task under each computing engine according to each sum.
Further, the determining a target computing engine for executing the target data query task according to the data volume and the resource consumption cost includes:
when the data volume is determined to exceed a first threshold, determining the Hive computing engine as the target computing engine;
and when the data volume is determined not to exceed the first threshold, determining the computing engine with the minimum resource consumption cost among the computing engines as the target computing engine.
Further, before determining a target computing engine for executing the target data query task according to the data amount and the resource consumption cost, the method further includes:
acquiring an execution cache table corresponding to the target data query task;
and executing the target data query task according to a calculation engine prestored in an execution cache table.
Further, the execution cache table includes a computing engine execution table and data cache validity time information; accordingly, the executing of the target data query task according to the computing engine pre-stored in the execution cache table includes:
after it is determined, according to the data cache validity time information, that a valid cache exists for the target data query task, parsing the computing engine execution table to obtain the pre-stored computing engine, and executing the target data query task with it.
In a second aspect, an embodiment of the present invention provides a data task execution device, including:
the determining module is used for determining the data volume of the target data query task and the resource consumption cost of the target data query task under each preset computing engine;
and the execution module is used for determining a target calculation engine for executing the target data query task according to the data volume and the resource consumption cost, and executing the target data query task according to the target calculation engine.
In a third aspect, an embodiment of the present invention provides a data task execution system, including:
the task analyzer is used for determining a target data query task;
the task compiler is used for determining the data volume of the target data query task and the resource consumption cost of the target data query task under each preset computing engine;
the computing engine selector is used for determining a target computing engine for executing the target data query task according to the data volume and the resource consumption cost, and executing the target data query task according to the target computing engine;
and the metadata repository is used for storing data required by executing the target data query task.
In a fourth aspect, an embodiment of the present invention provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the data task execution method when executing the program.
In a fifth aspect, an embodiment of the present invention provides a non-transitory computer readable storage medium, on which a computer program is stored, the computer program, when being executed by a processor, implementing the steps of the data task execution method as described above.
According to the data task execution method and device, electronic device, and storage medium provided by the embodiments of the invention, a matching computing engine is selected from the preset computing engines to execute the data query task according to the data volume of the target data query task and its resource consumption cost under each preset computing engine. This prevents an unsuitable computing engine from impairing the execution of the data query task and improves the execution efficiency and stability of the data query task.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a flow chart of an embodiment of a data task execution method of the present invention;
FIG. 2 is a schematic diagram of a complete flow chart of a data task execution method according to the present invention;
FIG. 3 is a block diagram of an embodiment of a data task execution device according to the present invention;
FIG. 4 is a block diagram of a data task execution system according to an embodiment of the present invention;
FIG. 5 is a block diagram of an embodiment of an electronic device according to the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 illustrates a data task execution method provided by an embodiment of the present invention, including:
S11, determining the data volume of the target data query task and the resource consumption cost of the target data query task under each preset computing engine;
S12, determining a target calculation engine for executing the target data query task according to the data volume and the resource consumption cost, and executing the target data query task according to the target calculation engine.
With respect to step S11 and step S12, it should be noted that, in the embodiment of the present invention, the target data query task is formed by parsing and converting a data query statement input by a system interface. The data query statements comprise SQL statements, SPARQL statements, DQL statements and the like, so the data query tasks obtained by analyzing and converting the statements can be SQL tasks, SPARQL tasks, DQL tasks and the like.
The following explains the method of this embodiment with an SQL task, specifically as follows:
The SQL statement is parsed and converted into a task tree. The task tree has a plurality of tree nodes, and each tree node corresponds to one subtask. During parsing, preliminary checks including syntax checking are performed; once the statement passes syntax parsing, the target SQL task is determined to be valid.
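The task tree described above can be pictured with a minimal sketch. The names `TaskNode` and `iter_subtasks`, the example tree, and the post-order traversal are illustrative assumptions; the text does not specify the node layout or the traversal order.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class TaskNode:
    """One node of the task tree; each node corresponds to one subtask."""
    name: str
    children: List["TaskNode"] = field(default_factory=list)


def iter_subtasks(root: TaskNode) -> Iterator[TaskNode]:
    """Yield subtasks in post-order, so inputs come before their consumers."""
    for child in root.children:
        yield from iter_subtasks(child)
    yield root


# Hypothetical tree for a statement that scans a table, filters rows,
# and then aggregates them. A real parser would build this from SQL text.
tree = TaskNode("aggregate", [TaskNode("filter", [TaskNode("scan_orders")])])
subtask_names = [node.name for node in iter_subtasks(tree)]
```

Flattening the tree this way gives the list of subtasks over which per-subtask resource demands can later be summed.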
Executing each subtask of the SQL task requires processing underlying data, and the data volume can be obtained from that underlying data; the data volume corresponding to the target SQL task is thus calculated.
In the embodiment of the present invention, the underlying data corresponding to the SQL task may be stored in a metadata repository, where the metadata repository further includes names of data tables related to the SQL task, columns and partitions of the tables, attributes of the tables, a directory in which the data of the tables is located, and the like.
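As an illustration only, the data volume could be estimated from sizes recorded in the metadata repository. The record layout `TableMeta`, the per-partition byte counts, and the rule of summing every partition of every referenced table are assumptions; the text does not state how the data volume is derived.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class TableMeta:
    """Hypothetical metadata record for one table."""
    name: str
    directory: str                   # directory in which the table data is located
    partition_bytes: Dict[str, int]  # partition name -> stored size in bytes (assumed field)


def estimate_data_volume(tables: Iterable[TableMeta]) -> int:
    """Sum the sizes of all partitions of all tables the task touches (assumed rule)."""
    return sum(sum(t.partition_bytes.values()) for t in tables)


orders = TableMeta("orders", "/warehouse/orders",
                   {"dt=2020-06-01": 4_000, "dt=2020-06-02": 6_000})
users = TableMeta("users", "/warehouse/users", {"all": 500})
volume = estimate_data_volume([orders, users])
```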
In embodiments of the present invention, the computing engines include Hive (offline), Spark (real-time), Storm (streaming), Kylin, and others. The resource consumption cost of the target SQL task differs from one computing engine to another. Therefore, the resource consumption cost of the target SQL task is calculated separately under each computing engine.
Then, according to the obtained data volume and the resource consumption cost under each computing engine, a target computing engine for executing the target SQL task is determined through a preset computing engine selection policy, and the target SQL task is executed with the target computing engine.
In this regard, the computing engine selection policy uses the data volume and the resource consumption cost under each computing engine as screening conditions to determine one of the computing engines as the target computing engine for executing the target SQL task.
According to the data task execution method provided by the embodiment of the invention, a matching computing engine is selected from the preset computing engines to execute the data query task according to the data volume of the target data query task and its resource consumption cost under each preset computing engine. This prevents an unsuitable computing engine from impairing the execution of the data query task and improves the execution efficiency and stability of the data query task.
A further embodiment of the above method mainly explains how the resource consumption cost of the target SQL task under each preset computing engine is determined, specifically as follows:
S111, determining the IO read-write data volume, CPU resources and memory resources required by each subtask in the target SQL task under each computing engine, and the IO bandwidth, CPU resources and memory resources currently available to the data platform;
S112, determining the resource consumption cost of the target SQL task under each computing engine according to the IO read-write data volume, CPU resources and memory resources required by each subtask under each computing engine, and the IO bandwidth, CPU resources and memory resources currently available to the data platform.
With respect to steps S111 and S112, it should be noted that, as mentioned in the above embodiment, the target SQL task includes a plurality of subtasks. Each subtask is analyzed to obtain the IO read-write data volume, CPU resources and memory resources it requires, and the IO bandwidth, CPU resources and memory resources currently available to the data platform are also obtained. Here, the data platform is the computing environment needed to execute data query tasks.
Determining the sum of the ratio of IO read-write data volume of each subtask under each computing engine to the current available IO bandwidth of the data platform, the sum of the ratio of CPU resources to the current available CPU resources of the data platform, and the sum of the ratio of memory resources to the current available memory resources of the data platform, and determining the resource consumption cost of the target data query task under each computing engine according to each sum.
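The three sums of ratios can be sketched as follows. The type `Resources`, the function names, and the example numbers are hypothetical. The text says only that the cost is determined "according to each sum", so adding the three sums with equal weight in `consumption_cost` is an assumption.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Resources:
    """Demand of one subtask, or what the data platform currently has available."""
    io: float   # IO read-write volume (demand) or IO bandwidth (platform)
    cpu: float
    mem: float


def ratio_sums(subtasks: Sequence[Resources],
               available: Resources) -> Tuple[float, float, float]:
    """Return the IO, CPU and memory sums of per-subtask demand / available ratios."""
    if min(available.io, available.cpu, available.mem) <= 0:
        raise ValueError("available resources must be positive")
    io_sum = sum(s.io / available.io for s in subtasks)
    cpu_sum = sum(s.cpu / available.cpu for s in subtasks)
    mem_sum = sum(s.mem / available.mem for s in subtasks)
    return io_sum, cpu_sum, mem_sum


def consumption_cost(subtasks: Sequence[Resources], available: Resources) -> float:
    """Assumption: the three sums are simply added with equal weight."""
    return sum(ratio_sums(subtasks, available))


# Hypothetical demands of two subtasks under one engine, in arbitrary units.
available = Resources(io=128.0, cpu=8.0, mem=64.0)
spark_subtasks = [Resources(io=32.0, cpu=2.0, mem=16.0),
                  Resources(io=64.0, cpu=4.0, mem=16.0)]
spark_cost = consumption_cost(spark_subtasks, available)
```

Running the same function once per engine, with that engine's per-subtask demands, yields one cost per engine for comparison.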
In a further embodiment of the method in the above embodiment, a resource consumption cost obtaining formula is used to determine the resource consumption cost of the target SQL task under each computing engine.
The symbols in the resource consumption cost formula are defined as follows:
$C_{\text{compute-engine}}$ is the resource consumption cost;
$IO_i$ is the IO read-write data volume required by the $i$-th subtask in the target SQL task under the computing engine, and $IO_{useable}$ is the IO bandwidth currently available to the data platform;
$CPU_i$ is the CPU resource required by the $i$-th subtask in the target SQL task under the computing engine, and $CPU_{useable}$ is the CPU resource currently available to the data platform;
$Mem_i$ is the memory resource required by the $i$-th subtask in the target SQL task under the computing engine, and $Mem_{useable}$ is the memory resource currently available to the data platform.
It should be noted that the amount of IO read and write data, CPU resources, and memory resources required by each subtask under different computing engines may be different. For this reason, the resource consumption cost of the target SQL task is different under each computing engine.
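The formula itself does not appear in this text. One form consistent with the symbol definitions and with the three sums of ratios described above is given below as a sketch. It assumes that the three sums are added with equal weight and that $n$ denotes the number of subtasks; neither assumption is stated in the text.

```latex
C_{\text{compute-engine}}
  = \sum_{i=1}^{n} \frac{IO_i}{IO_{useable}}
  + \sum_{i=1}^{n} \frac{CPU_i}{CPU_{useable}}
  + \sum_{i=1}^{n} \frac{Mem_i}{Mem_{useable}}
```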
A further embodiment of the above method mainly explains how the target computing engine for executing the target SQL task is determined according to the data volume, the resource consumption cost, and a preset computing engine selection policy. The data volume of the target SQL task and its resource consumption cost under each computing engine are both numerical values, so the computing engine selection policy evaluates these values to determine a matching computing engine as the target computing engine of the target SQL task. The details are as follows:
The computing engine selection policy includes the following:
When the data volume of the SQL task is determined to exceed a first threshold, the Hive computing engine is determined as the target computing engine.
When the data volume of the SQL task is determined not to exceed the first threshold, the computing engine with the minimum resource consumption cost among the computing engines is determined as the target computing engine.
In this regard, the Hive computing engine suits offline computing scenarios with large data volumes, where task execution time is relatively long. Computing engines such as Spark and Storm suit real-time computing scenarios and process data quickly, but when the demand for computing resources is large and the available resources are insufficient, the SQL task may fail outright.
Here, the first threshold is the boundary value for deciding whether the SQL task is a large-data-volume task. For example, tasks whose data volume exceeds 10 TB use the Hive computing engine.
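The selection policy above can be sketched in a few lines. The function name `select_engine`, the engine labels, and the reading of the 10 T example as 10 × 10^12 bytes are assumptions made for illustration.

```python
from typing import Mapping

HIVE = "hive"
# The 10 T example from the text, read here as decimal terabytes (assumption).
FIRST_THRESHOLD_BYTES = 10 * 10**12


def select_engine(data_volume: int,
                  costs: Mapping[str, float],
                  first_threshold: int = FIRST_THRESHOLD_BYTES) -> str:
    """Pick Hive for large tasks, otherwise the engine with the lowest cost."""
    if data_volume > first_threshold:
        return HIVE
    if not costs:
        raise ValueError("no candidate computing engines")
    return min(costs, key=lambda engine: costs[engine])


# Hypothetical per-engine resource consumption costs.
example_costs = {"hive": 5.0, "spark": 1.0, "storm": 2.0}
large_choice = select_engine(11 * 10**12, example_costs)
small_choice = select_engine(1024, example_costs)
```

Note that the threshold check comes first, so a large task goes to Hive even when another engine has a lower computed cost.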
A further embodiment of the above method mainly explains the computing engine selection process performed before the target computing engine is determined according to the data volume of the SQL task and its resource consumption cost under each computing engine, as follows:
An execution cache table corresponding to the target SQL task is acquired.
In the above embodiments, the underlying data corresponding to the SQL task may be stored in a metadata repository, where names of data tables related to the SQL task, columns and partitions of the tables, attributes of the tables, directories where the data of the tables are located, and the like are also included in the metadata repository. In addition, the metadata repository also comprises an execution cache table of the SQL task, and the execution cache table comprises information of a computing engine adopted by a certain SQL task within the cache effective time.
After the data in the execution cache table is determined to be valid, the target SQL task is executed with the computing engine pre-stored in the execution cache table.
In a further embodiment of the foregoing embodiment method, the execution cache table includes a calculation engine execution table and data cache valid time information, and the calculation engine execution table includes a calculation engine used by the SQL task within the cache valid time. The data cache validity time information includes a cache validity time of the SQL task.
Therefore, after it is determined, according to the data cache validity time information, that a valid cache exists for the target SQL task, the pre-stored computing engine is obtained by parsing the computing engine execution table and is used to execute the target SQL task.
In addition, after the data in the execution cache table is determined to be invalid, the data volume of the target SQL task and its resource consumption cost under each preset computing engine are determined, the target computing engine is determined from them, and the target SQL task is executed with the target computing engine.
It should be noted that, after determining a target computing engine for executing a target data query task according to the data amount and the resource consumption cost, and executing the target data query task according to the target computing engine, the target computing engine needs to be placed in an execution cache table and stored in the metadata repository.
In an embodiment, since the execution cache table is stored in the metadata repository, the system may directly access the metadata repository before determining the target computing engine for executing the target SQL task according to the data volume and the resource consumption cost, so as to quickly select a suitable computing engine according to the execution cache table.
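A minimal sketch of an execution cache table with a validity time follows. The class and method names, the use of a string task key, and the expiry-timestamp representation of the cache validity time are assumptions; the text does not describe the storage format.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CacheEntry:
    engine: str        # computing engine used for the task
    expires_at: float  # end of the cache validity time, seconds since the epoch


class ExecutionCacheTable:
    """Hypothetical in-memory stand-in for the table kept in the metadata repository."""

    def __init__(self) -> None:
        self._entries: Dict[str, CacheEntry] = {}

    def put(self, task_key: str, engine: str, ttl_seconds: float,
            now: Optional[float] = None) -> None:
        """Record the engine chosen for a task, valid for ttl_seconds."""
        now = time.time() if now is None else now
        self._entries[task_key] = CacheEntry(engine, now + ttl_seconds)

    def get(self, task_key: str, now: Optional[float] = None) -> Optional[str]:
        """Return the cached engine, or None if the entry is missing or expired."""
        now = time.time() if now is None else now
        entry = self._entries.get(task_key)
        if entry is None or now >= entry.expires_at:
            return None
        return entry.engine


table = ExecutionCacheTable()
table.put("query-1", "spark", ttl_seconds=60, now=1000.0)
```

A `None` result corresponds to the invalid-cache case, in which the data volume and resource consumption costs are computed afresh.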
With reference to the content of the foregoing embodiments, fig. 2 shows an overall flowchart of a data task execution method according to an embodiment of the present invention, and with reference to fig. 2, the following details are shown:
If the SQL task has a cache entry in the execution cache table: when the Hive computing engine is cached in the execution cache table, the SQL task is executed with the Hive computing engine; when the Spark computing engine is cached in the execution cache table, the SQL task is executed with the Spark computing engine.
If the SQL task has no cache entry in the execution cache table and the data volume exceeds the first threshold, the SQL task is executed with the Hive computing engine.
If the data volume does not exceed the first threshold, the resource consumption costs of the SQL task under the Hive and Spark computing engines are calculated.
If the resource consumption cost under the Hive computing engine is lower, the SQL task is executed with the Hive computing engine.
If the resource consumption cost under the Spark computing engine is lower, the SQL task is executed with the Spark computing engine.
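The decision order of Fig. 2 (cache first, then the threshold, then the cost comparison) can be sketched as one function. The name `choose_engine`, its parameters, and the lazy `compute_costs` callback are illustrative assumptions; the callback makes visible that costs are computed only when the first two checks do not decide.

```python
from typing import Callable, List, Mapping, Optional


def choose_engine(cached_engine: Optional[str],
                  data_volume: int,
                  first_threshold: int,
                  compute_costs: Callable[[], Mapping[str, float]]) -> str:
    # Step 1: a valid cache entry decides immediately.
    if cached_engine is not None:
        return cached_engine
    # Step 2: large tasks go to Hive without any cost computation.
    if data_volume > first_threshold:
        return "hive"
    # Step 3: otherwise compute the per-engine costs and take the cheapest.
    costs = compute_costs()
    return min(costs, key=lambda engine: costs[engine])


calls: List[int] = []


def fake_costs() -> Mapping[str, float]:
    """Hypothetical cost source that records how often it is invoked."""
    calls.append(1)
    return {"hive": 3.0, "spark": 1.5}


from_cache = choose_engine("spark", 10**15, 10**13, fake_costs)
from_threshold = choose_engine(None, 10**15, 10**13, fake_costs)
calls_before_cost_step = len(calls)
from_costs = choose_engine(None, 10**9, 10**13, fake_costs)
```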
Fig. 3 shows a data task execution device according to an embodiment of the present invention, which includes a determining module 21 and an executing module 22, where:
the determining module 21 is configured to determine a data amount of the target data query task and resource consumption costs of the target data query task under preset computing engines;
and the execution module 22 is configured to determine a target computing engine for executing the target data query task according to the data volume and the resource consumption cost, and execute the target data query task according to the target computing engine.
In a further embodiment of the apparatus in the above embodiment, the determining module, in the process of determining the resource consumption cost of the target data query task under each preset computing engine, is specifically configured to:
determining IO read-write data volume, CPU resources and memory resources required by each subtask in the target data query task under each computing engine, and IO bandwidth, CPU resources and memory resources currently available for the data platform;
and determining the resource consumption cost of the target data query task under each computing engine according to the IO read-write data volume, the CPU resource and the memory resource required by each subtask under each computing engine, and the IO bandwidth, the CPU resource and the memory resource currently available for the data platform.
In a further embodiment of the apparatus in the above embodiment, determining resource consumption cost of the target data query task in each computing engine according to an IO read-write data amount, a CPU resource, and a memory resource required by each subtask in each computing engine, and an IO bandwidth, a CPU resource, and a memory resource currently available for the data platform, includes:
determining the sum of the ratio of IO read-write data volume of each subtask under each computing engine to the current available IO bandwidth of the data platform, the sum of the ratio of CPU resources to the current available CPU resources of the data platform, and the sum of the ratio of memory resources to the current available memory resources of the data platform;
and determining the resource consumption cost of the target data query task under each computing engine according to each sum.
In a further embodiment of the above apparatus, a resource consumption cost formula is used to determine the resource consumption cost of the target data query task under each computing engine. The symbols in the formula are defined as follows:
$C_{\text{compute-engine}}$ is the resource consumption cost;
$IO_i$ is the IO read-write data volume required by the $i$-th subtask in the target data query task under the computing engine, and $IO_{useable}$ is the IO bandwidth currently available to the data platform;
$CPU_i$ is the CPU resource required by the $i$-th subtask in the target data query task under the computing engine, and $CPU_{useable}$ is the CPU resource currently available to the data platform;
$Mem_i$ is the memory resource required by the $i$-th subtask in the target data query task under the computing engine, and $Mem_{useable}$ is the memory resource currently available to the data platform.
In a further embodiment of the foregoing embodiment device, the execution module, in the determination process of determining a target computing engine for executing a target data query task according to the data volume and the resource consumption cost, is specifically configured to:
when the data volume is determined to exceed a first threshold, determining the Hive computing engine as the target computing engine;
and when the data volume is determined not to exceed the first threshold, determining the computing engine with the minimum resource consumption cost among the computing engines as the target computing engine.
In a further embodiment of the above apparatus, the apparatus further includes an obtaining module which, before the target computing engine for executing the target data query task is determined according to the data volume, the resource consumption cost, and a preset computing engine selection policy, is configured for:
acquiring an execution cache table corresponding to the target data query task;
and executing the target data query task according to the calculation engine prestored in the execution cache table.
In a further embodiment of the above apparatus, the execution cache table includes a computing engine execution table and data cache validity time information; accordingly, in the process of executing the target data query task according to the computing engine pre-stored in the execution cache table, the obtaining module is specifically configured for:
after it is determined, according to the data cache validity time information, that a valid cache exists for the target data query task, parsing the computing engine execution table to obtain the pre-stored computing engine, and executing the target data query task with it.
Since the apparatus of this embodiment works on the same principle as the method of the above embodiments, the details are not repeated here.
It should be noted that, in the embodiment of the present invention, the relevant functional modules may be implemented by a hardware processor.
According to the data task execution device provided by the embodiment of the invention, the matched computing engine is selected from the computing engines to execute the data query task according to the data volume of the target data query task and the resource consumption cost of the target data query task under the preset computing engines, so that the unmatched computing engines are prevented from influencing the execution of the data query task, and the execution efficiency and the stability of the data query task are improved.
Fig. 4 shows a schematic structural diagram of a data task execution system provided by an embodiment of the present invention. Referring to Fig. 4, the system includes a task parser 31, a task compiler 32, a computing engine selector 33, and a metadata repository 34, where:
the task parser 31 is configured to determine a target data query task;
the task compiler 32 is configured to determine the data volume of the target data query task and the resource consumption cost of the target data query task under each preset computing engine;
the computing engine selector 33 is configured to determine a target computing engine for executing the target data query task according to the data volume and the resource consumption cost, and to execute the target data query task according to the target computing engine;
the metadata repository 34 is configured to store the data required for executing the target data query task.
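As an illustrative sketch, the data flow among the four components of Fig. 4 may be wired together as follows. The components are modeled as plain callables, and the metadata repository is modeled as a dictionary passed to the compile step; these modeling choices and all names (`QueryTask`, `CompiledTask`, `run_pipeline`) are assumptions for illustration and are not taken from the embodiment.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class QueryTask:
    """Target data query task produced by the task parser."""
    statement: str


@dataclass
class CompiledTask:
    """Output of the task compiler: data volume plus per-engine costs."""
    task: QueryTask
    data_volume: float
    cost_by_engine: Dict[str, float]


def run_pipeline(
    statement: str,
    parse: Callable[[str], QueryTask],
    compile_task: Callable[[QueryTask, Dict[str, float]], CompiledTask],
    select_engine: Callable[[CompiledTask], str],
    execute: Callable[[str, QueryTask], Any],
    metadata: Dict[str, float],
) -> Any:
    """Parser -> compiler (reads metadata) -> engine selector -> execution."""
    task = parse(statement)
    compiled = compile_task(task, metadata)
    engine = select_engine(compiled)
    return execute(engine, task)
```

Keeping each component behind a callable makes it possible to substitute a different selection policy or cost estimator without changing the surrounding flow.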
Since the system according to the embodiment of the present invention operates on the same principle as the method according to the above embodiment, the details are not repeated here.
It should be noted that, in the embodiment of the present invention, the relevant functional units may be implemented by a hardware processor.
According to the data task execution system provided by the embodiment of the invention, the matched computing engine is selected from the computing engines to execute the data query task according to the data volume of the target data query task and the resource consumption cost of the target data query task under the preset computing engines, so that the situation that the execution of the data query task is influenced by the unmatched computing engines is avoided, and the execution efficiency and the stability of the data query task are improved.
Fig. 5 illustrates the physical structure of an electronic device. As shown in Fig. 5, the electronic device may include a processor 41, a communication interface 42, a memory 43, and a communication bus 44, where the processor 41, the communication interface 42, and the memory 43 communicate with one another through the communication bus 44. The processor 41 may call logic instructions in the memory 43 to perform the following method: determining the data volume of the target data query task and the resource consumption cost of the target data query task under each preset computing engine; and determining a target computing engine for executing the target data query task according to the data volume and the resource consumption cost, and executing the target data query task according to the target computing engine.
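As an illustrative sketch, the engine determination step may follow the rule recited in claim 4: the Hive computing engine is chosen when the data volume exceeds a first threshold, and otherwise the engine with the minimum resource consumption cost is chosen. The function name, the string identifiers for the engines, and the threshold value in the usage note are assumptions for illustration.

```python
from typing import Dict

HIVE = "hive"  # identifier assumed for the Hive computing engine


def select_engine(
    data_volume: float,
    cost_by_engine: Dict[str, float],
    first_threshold: float,
) -> str:
    """Pick the target computing engine for a data query task."""
    # Data volume above the first threshold: use Hive regardless of cost.
    if data_volume > first_threshold:
        return HIVE
    if not cost_by_engine:
        raise ValueError("no computing engine costs were provided")
    # Otherwise use the engine with the minimum resource consumption cost.
    return min(cost_by_engine, key=cost_by_engine.get)
```

For example, with a hypothetical threshold of 1e12, a task with data volume 2e12 is sent to Hive even if another engine has a lower cost, whereas a task with data volume 1e9 is sent to whichever engine has the lowest cost.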
Furthermore, the logic instructions in the memory 43 may be implemented in the form of software functional units and, when sold or used as a stand-alone product, stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
Embodiments of the present invention further provide a non-transitory computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, performs the method provided in the foregoing embodiments, the method including: determining the data volume of the target data query task and the resource consumption cost of the target data query task under each preset computing engine; and determining a target computing engine for executing the target data query task according to the data volume and the resource consumption cost, and executing the target data query task according to the target computing engine.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. A method for performing a data task, comprising:
determining the data volume of a target data query task and the resource consumption cost of the target data query task under each preset computing engine;
and determining a target calculation engine for executing the target data query task according to the data volume and the resource consumption cost, and executing the target data query task according to the target calculation engine.
2. The data task execution method of claim 1, wherein the target data query task is obtained by parsing a target query statement and is composed of a plurality of subtasks, and accordingly, determining the resource consumption cost of the target data query task under each preset computing engine comprises:
determining IO read-write data volume, CPU resources and memory resources required by each subtask in the target data query task under each computing engine, and IO bandwidth, CPU resources and memory resources currently available for a data platform;
and determining the resource consumption cost of the target data query task under each computing engine according to the IO read-write data volume, the CPU resource and the memory resource required by each subtask under each computing engine, and the IO bandwidth, the CPU resource and the memory resource currently available for the data platform.
3. The data task execution method of claim 2, wherein the determining the resource consumption cost of the target data query task under each computing engine according to the IO read-write data volume, the CPU resource and the memory resource required by each subtask under each computing engine, and the IO bandwidth, the CPU resource and the memory resource currently available for the data platform comprises:
determining the sum of the ratio of IO read-write data volume of each subtask under each computing engine to the current available IO bandwidth of the data platform, the sum of the ratio of CPU resources to the current available CPU resources of the data platform, and the sum of the ratio of memory resources to the current available memory resources of the data platform;
and determining the resource consumption cost of the target data query task under each computing engine according to each sum.
4. The data task execution method of claim 1, wherein determining a target computing engine to execute the target data query task according to the data volume and the resource consumption cost comprises:
when the data volume is determined to exceed a first threshold value, determining the Hive computing engine as the target computing engine;
and when the data volume is determined not to exceed the first threshold value, determining the computing engine corresponding to the minimum value among the resource consumption costs under the computing engines as the target computing engine.
5. The data task execution method of claim 1, further comprising, prior to determining a target computing engine to execute the target data query task based on the data volume and the resource consumption cost:
acquiring an execution cache table corresponding to the target data query task;
and executing the target data query task according to a computing engine pre-stored in the execution cache table.
6. The data task execution method of claim 5, wherein the execution cache table comprises a calculation engine execution table and data cache validity time information, and accordingly, the target data query task is executed according to a calculation engine pre-stored in the execution cache table, comprising:
and after the target data query task is determined to be cached according to the data cache validity time information, parsing the calculation engine execution table to obtain the pre-stored calculation engine, and executing the target data query task according to the pre-stored calculation engine.
7. A data task execution apparatus, comprising:
the determining module is used for determining the data volume of the target data query task and the resource consumption cost of the target data query task under each preset computing engine;
and the execution module is used for determining a target calculation engine for executing the target data query task according to the data volume and the resource consumption cost, and executing the target data query task according to the target calculation engine.
8. A data task execution system, comprising:
the task analyzer is used for determining a target data query task;
the task compiler is used for determining the data volume of the target data query task and the resource consumption cost of the target data query task under each preset computing engine;
the computing engine selector is used for determining a target computing engine for executing the target data query task according to the data volume and the resource consumption cost, and executing the target data query task according to the target computing engine;
and the metadata repository is used for storing data required by executing the target data query task.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the data task execution method according to any of claims 1 to 6 are implemented when the processor executes the program.
10. A non-transitory computer readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the data task execution method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010529804.6A CN111723112B (en) | 2020-06-11 | 2020-06-11 | Data task execution method and device, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111723112A (en) | 2020-09-29 |
CN111723112B (en) | 2023-07-07 |
Family
ID=72567982
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010529804.6A Active CN111723112B (en) | Data task execution method and device, electronic equipment and storage medium | 2020-06-11 | 2020-06-11 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111723112B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112507029A (en) * | 2020-12-18 | 2021-03-16 | 上海哔哩哔哩科技有限公司 | Data processing system and data real-time processing method |
CN113139205A (en) * | 2021-04-06 | 2021-07-20 | 华控清交信息科技(北京)有限公司 | Secure computing method, general computing engine, device for secure computing and secure computing system |
CN113641487A (en) * | 2021-07-06 | 2021-11-12 | 多点生活(成都)科技有限公司 | Intelligent automatic switching method for SQL task execution engine of big data platform |
CN116450757A (en) * | 2023-06-19 | 2023-07-18 | 深圳索信达数据技术有限公司 | Method, device, equipment and storage medium for determining evaluation index of data asset |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104750690A (en) * | 2013-12-25 | 2015-07-01 | 中国移动通信集团公司 | Query processing method, device and system |
CN104834561A (en) * | 2015-04-29 | 2015-08-12 | 华为技术有限公司 | Data processing method and device |
CN105824957A (en) * | 2016-03-30 | 2016-08-03 | 电子科技大学 | Query engine system and query method of distributive memory column-oriented database |
CN107168806A (en) * | 2017-06-29 | 2017-09-15 | 上海联影医疗科技有限公司 | Resource regulating method, system and the computer equipment of distribution scheduling machine |
CN107609130A (en) * | 2017-09-18 | 2018-01-19 | 链家网(北京)科技有限公司 | A kind of method and server for selecting data query engine |
CN108363746A (en) * | 2018-01-26 | 2018-08-03 | 福建星瑞格软件有限公司 | A kind of unified SQL query system for supporting multi-source heterogeneous data |
CN108549683A (en) * | 2018-04-03 | 2018-09-18 | 联想(北京)有限公司 | data query method and system |
CN108985367A (en) * | 2018-07-06 | 2018-12-11 | 中国科学院计算技术研究所 | Computing engines selection method and more computing engines platforms based on this method |
CN110222072A (en) * | 2019-06-06 | 2019-09-10 | 江苏满运软件科技有限公司 | Data Query Platform, method, equipment and storage medium |
US20200073987A1 (en) * | 2018-09-04 | 2020-03-05 | Salesforce.Com, Inc. | Technologies for runtime selection of query execution engines |
CN111190932A (en) * | 2019-12-16 | 2020-05-22 | 北京淇瑀信息科技有限公司 | Privacy cluster query method and device and electronic equipment |
Non-Patent Citations (1)
Title |
---|
薛荷: "大数据存储优化及快速检索技术研究", 《中国优秀硕士学位论文全文数据库信息科技辑》 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112507029A (en) * | 2020-12-18 | 2021-03-16 | 上海哔哩哔哩科技有限公司 | Data processing system and data real-time processing method |
CN112507029B (en) * | 2020-12-18 | 2022-11-04 | 上海哔哩哔哩科技有限公司 | Data processing system and data real-time processing method |
CN113139205A (en) * | 2021-04-06 | 2021-07-20 | 华控清交信息科技(北京)有限公司 | Secure computing method, general computing engine, device for secure computing and secure computing system |
CN113641487A (en) * | 2021-07-06 | 2021-11-12 | 多点生活(成都)科技有限公司 | Intelligent automatic switching method for SQL task execution engine of big data platform |
CN116450757A (en) * | 2023-06-19 | 2023-07-18 | 深圳索信达数据技术有限公司 | Method, device, equipment and storage medium for determining evaluation index of data asset |
Also Published As
Publication number | Publication date |
---|---|
CN111723112B (en) | 2023-07-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111723112A (en) | Data task execution method and device, electronic equipment and storage medium | |
US11449570B2 (en) | Data caching method and apparatus | |
CN111414389B (en) | Data processing method and device, electronic equipment and storage medium | |
Abramova et al. | Testing cloud benchmark scalability with cassandra | |
CN113419824A (en) | Data processing method, device, system and computer storage medium | |
CN110727890A (en) | Page loading method and device, computer equipment and storage medium | |
CN111581234A (en) | RAC multi-node database query method, device and system | |
US20220300477A1 (en) | Data Read/Write Method and Apparatus for Database | |
CN110046181B (en) | Data routing method and device based on database distributed storage | |
US11055223B2 (en) | Efficient cache warm up based on user requests | |
CN109388651B (en) | Data processing method and device | |
CN109240998B (en) | Configurable file parsing method | |
CN108647102B (en) | Service request processing method and device of heterogeneous system and electronic equipment | |
CN108763421B (en) | Data searching method and system based on logic circuit | |
CN111435327A (en) | Log record processing method, device and system | |
CN112395437A (en) | 3D model loading method and device, electronic equipment and storage medium | |
US20210149960A1 (en) | Graph Data Storage Method, System and Electronic Device | |
CN114064725A (en) | Data processing method, device, equipment and storage medium | |
CN114637969A (en) | Target object authentication method and device | |
US20210360005A1 (en) | Inferring watchpoints for understandable taint reports | |
CN115794806A (en) | Gridding processing system, method and device for financial data and computing equipment | |
CN108984615B (en) | Data query method and system and storage medium | |
CN112114962A (en) | Memory allocation method and device | |
CN111953813A (en) | IP address identification method, system, electronic device and storage medium | |
CN113282405B (en) | Load adjustment optimization method and terminal |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||