CN117520112A

CN117520112A - Method, device, equipment and storage medium for efficiency analysis processing of computing task

Info

Publication number: CN117520112A
Application number: CN202210908156.4A
Authority: CN
Inventors: 薛文伟; 蒋杰
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2022-07-29
Filing date: 2022-07-29
Publication date: 2024-02-06

Abstract

The application relates to a method, an apparatus, a computer device, a storage medium and a computer program product for efficiency analysis processing of computing tasks. The method relates to cloud technology and comprises: acquiring a query signature corresponding to a current computing task; acquiring history execution records within a preset time period and a target engine; extracting, from the history execution records and according to the query signature, a target history record set equivalent to the current computing task; and performing efficiency analysis processing based on the target history record set to obtain an efficiency analysis processing result for executing the current computing task with the target engine. With this method, efficiency-improvement analysis can be performed on the target history record set equivalent to the current computing task, so that whether the current computing task can benefit from efficiency-improvement processing is determined in advance. In other words, the resulting analysis accurately indicates the details of the improvement in execution efficiency when the current computing task is executed on the matched target engine.

Description

Method, device, equipment and storage medium for efficiency analysis processing of computing task

Technical Field

The application relates to the field of cloud technology, in particular to a method, a device, equipment and a storage medium for efficiency analysis and processing of a computing task.

Background

With the development of cloud technology and different types of data sources involved in actual business processing, in order to ensure normal and stable business processing, joint analysis is generally required to be performed on various types of data in different types of data sources, and in order to improve the processing efficiency of unified analysis on various types of heterogeneous data, resource allocation and processing engine allocation are required to be adjusted.

Because actual service scenarios are complex and varied, and the technical characteristics, strengths and weaknesses of the various processing engines differ markedly, a big data joint analysis platform generally provides several different engines that users select and switch between manually to meet actual service demands. However, because this approach relies on users selecting and switching engines themselves, some users do not understand the processing characteristics of a specific engine, or factors such as the complexity and diversity of the service data to be processed, and therefore cannot accurately determine the optimal processing engine or resource allocation. Query optimization analysis based on historical load has therefore emerged: for periodically scheduled data query and computing tasks, the historical execution behavior is used to dynamically tune cluster resource allocation and to determine the optimal processing engine, so as to improve execution efficiency.

However, the inventors found that current query optimization analysis based on historical load is only suitable for database scripts or data warehouse scenarios with clearly periodic, repetitive execution, and that the collection and analysis of statistics are both performed offline. When applied to real-time services, it cannot meet the requirements of real-time online processing, and it cannot guarantee that the selected processing engine meets the processing requirements of the actual service or effectively improves execution efficiency.

Disclosure of Invention

In view of the foregoing, it is desirable to provide a method, an apparatus, a computer device, a computer-readable storage medium, and a computer program product for efficiency analysis processing of a computing task that can accurately determine the details of the improvement in execution efficiency when the computing task is executed by a processing engine.

In a first aspect, the present application provides a method for performing efficiency analysis processing on a computing task. The method comprises the following steps:

acquiring a query signature corresponding to a current computing task;

acquiring a history execution record and a target engine within a preset time period;

extracting a target history record set equivalent to the current computing task from the history execution record according to the query signature;

and carrying out efficiency analysis processing based on the target history record set to obtain efficiency analysis processing results of executing the current computing task by using the target engine.

In a second aspect, the application further provides a device for efficiency analysis and processing of the computing task. The device comprises:

the query signature acquisition module is used for acquiring a query signature corresponding to the current calculation task;

the history execution record obtaining module is used for obtaining a history execution record and a target engine in a preset time period;

the target history record set extraction module is used for extracting a target history record set equivalent to the current calculation task from the history execution record according to the query signature;

and the efficiency analysis processing result obtaining module is used for carrying out efficiency analysis processing based on the target history record set to obtain the efficiency analysis processing result of executing the current calculation task by utilizing the target engine.

In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor which when executing the computer program performs the steps of:

acquiring a query signature corresponding to a current computing task;

In a fourth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:

acquiring a query signature corresponding to a current computing task;

In a fifth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of:

acquiring a query signature corresponding to a current computing task;

In the above method, apparatus, computer device, storage medium and computer program product for efficiency analysis processing of a computing task, a query signature corresponding to the current computing task is acquired, history execution records within a preset time period and a target engine are acquired, and a target history record set equivalent to the current computing task is extracted from the history execution records according to the query signature. Efficiency analysis processing is then performed based on the target history record set to obtain the efficiency analysis processing result of executing the current computing task with the target engine. By performing efficiency-improvement analysis on the target history record set equivalent to the current computing task, whether the current computing task can benefit from efficiency-improvement processing is determined in advance. In other words, the obtained analysis result accurately indicates the details of the improvement in execution efficiency when the current computing task is executed on the matched target engine.

Drawings

FIG. 1 is an application environment diagram of a method for efficiency analysis processing of computing tasks in one embodiment;

FIG. 2 is a flow diagram of a method for efficiency analysis processing of computing tasks in one embodiment;

FIG. 3 is a schematic diagram of a process for generating an efficiency analysis processing result in one embodiment;

FIG. 4 is a flowchart of acquiring a query signature corresponding to a current computing task in one embodiment;

FIG. 5 is a schematic diagram of query signatures corresponding to computing tasks in one embodiment;

FIG. 6 is a flow diagram of obtaining an efficiency analysis processing result for executing a current computing task using a target engine in one embodiment;

FIG. 7 is a flowchart of obtaining an efficiency analysis processing result for executing a current computing task using a target engine in another embodiment;

FIG. 8 is a flow chart of a method for efficiency analysis processing of computing tasks in another embodiment;

FIG. 9 is a block diagram of an apparatus for efficiency analysis processing of computing tasks in one embodiment;

FIG. 10 is a schematic architecture diagram of a system for efficiency analysis processing of computing tasks in one embodiment;

FIG. 11 is an internal block diagram of a computer device in one embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.

The method for efficiency analysis processing of computing tasks provided by the embodiments of the application relates to cloud technology. Cloud technology refers to a hosting technology that unifies a series of resources such as hardware, software and networks within a wide area network or a local area network to realize the computing, storage, processing and sharing of data. Cloud technology is also a general term for the network technology, information technology, integration technology, management platform technology, application technology and the like that are applied under the cloud computing business model; these can form a resource pool that is used on demand, flexibly and conveniently. Cloud computing technology will become an important support. The background services of technical network systems, such as video websites, picture websites and other portals, require large amounts of computing and storage resources. With the rapid development and application of the internet industry, every item may in the future have its own identification mark that needs to be transmitted to a background system for logic processing; data of different levels will be processed separately, and all kinds of industry data will need strong back-end system support, which can only be realized through cloud computing.

A database may be regarded as an electronic filing cabinet, that is, a place where electronic files are stored, in which a user may add, query, update and delete data. A "database" is a collection of data that is stored together in a certain manner, can be shared by multiple users, has as little redundancy as possible, and is independent of the application. A database management system (DBMS) is a computer software system designed for managing databases, and generally provides basic functions such as storage, retrieval, security and backup. Database management systems may be classified by the database model they support, e.g., relational or XML (Extensible Markup Language); by the type of computer they support, e.g., server cluster or mobile phone; by the query language used, e.g., SQL (Structured Query Language) or XQuery; by their performance emphasis, e.g., maximum scale or maximum speed; or by other means. Regardless of the classification used, some DBMSs span categories, for example by supporting multiple query languages at the same time.

The method for efficiency analysis processing of the computing task provided by the embodiments of the application relates to database technology within cloud technology, and can be applied to the application environment shown in fig. 1, in which the terminal 102 communicates with the server 104 via a network. A data storage system may store the data that the server 104 needs to process; it may be integrated on the server 104 or located on a cloud or other network server. The server 104 obtains a query signature corresponding to the current computing task, and obtains history execution records within a preset time period and a target engine, so that a target history record set equivalent to the current computing task can be extracted from the history execution records according to the query signature. Data such as the query signature and the history execution records may be stored in the local storage of the terminal 102, in cloud storage, or in the data storage system of the server 104. When efficiency analysis processing is needed, the server 104 performs it based on the target history record set to obtain the efficiency analysis processing result of executing the current computing task with the target engine.

The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smart phone, tablet computer, internet of things device or portable wearable device. The internet of things device may be a smart speaker, smart television, smart air conditioner, smart vehicle-mounted device, aircraft or the like. The portable wearable device may be a smart watch, smart bracelet, headset or the like. The server 104 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. The terminal and the server may be connected directly or indirectly through wired or wireless communication, which is not limited herein. The embodiments of the invention can be applied to various scenarios, including but not limited to cloud technology, artificial intelligence, intelligent traffic and assisted driving.

In one embodiment, as shown in fig. 2, a method for performing efficiency analysis processing on a computing task is provided, and the method is applied to the server in fig. 1 for illustration, and includes the following steps:

step S202, acquiring a query signature corresponding to the current computing task.

The current computing task is a computing task that performs unified data analysis on heterogeneous data from different types of data sources. Different data source types and different service scenarios correspond to different big data joint analysis platforms, and therefore to different processing requirements or task details for the computing task.

For example, the cross-engine intelligent fusion big data analysis platform of an enterprise, such as enterprise A, supports multiple processing engines for data analysis, such as the Spark processing engine (a big data distributed computing engine), the Presto processing engine (a memory-based MPP distributed SQL (Structured Query Language) execution engine, where MPP stands for Massively Parallel Processing), and the Hive processing engine (a distributed computing engine based on the MapReduce framework). The cross-engine intelligent fusion big data analysis platform executes the computing tasks, and a computing task may be an SQL statement.

Specifically, the cross-engine intelligent fusion big data analysis platform can match an optimal processing engine to each SQL statement. For example, a small SQL statement is routed to the Presto processing engine, which executes faster; a complex SQL statement that accesses a large volume of data is routed to the Spark processing engine, which executes more stably; and a DDL (data definition language) statement that creates a table or modifies table attributes is routed to the Hive processing engine. This improves execution efficiency and resource utilization.
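The routing described above can be sketched as follows. This is a minimal illustration only: the `StatementProfile` fields, the two cut-off constants and the function name are assumptions introduced here, since the text does not say how "small" or "complex" is measured.

```python
from dataclasses import dataclass


@dataclass
class StatementProfile:
    """Hypothetical summary of one SQL statement (not defined in the source)."""
    is_ddl: bool          # creates a table or modifies table attributes
    scanned_bytes: int    # estimated volume of data accessed
    join_count: int       # rough proxy for statement complexity


# Assumed cut-offs for what counts as a "small" statement.
SMALL_SCAN_BYTES = 10 * 1024 ** 3
MAX_SIMPLE_JOINS = 2


def choose_engine(profile: StatementProfile) -> str:
    """Return the engine name that the routing rule in the text would pick."""
    if profile.is_ddl:
        return "Hive"
    small = profile.scanned_bytes <= SMALL_SCAN_BYTES
    simple = profile.join_count <= MAX_SIMPLE_JOINS
    if small and simple:
        return "Presto"   # faster for small statements
    return "Spark"        # more stable for large, complex statements
```

A real platform would derive the profile from the parsed statement and table statistics; the fixed constants only stand in for that step.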

The query signature represents a preset query template corresponding to an SQL statement. It is used to place all SQL statements that conform to the same preset query template (that is, the same query signature) into one equivalence class; in other words, the query signatures of all SQL statements in an equivalence class are equal as character strings (a complete match). This avoids the semantic limitation of judging equivalence only by a complete match of the SQL text itself, and leaves room for later optimization by modifying the signature generation algorithm.

Specifically, the computing tasks may be all compute-type SQL statements (whose handling includes parsing, checking, optimizing and executing), such as DQL statements (Data Query Language, i.e., data retrieval statements, including select/with, filtering, etc.) and DML statements (Data Manipulation Language, i.e., data manipulation statements, whose keywords include insert into, insert overwrite, and create-table-as-select, which queries a source table synchronously or asynchronously, creates a new table based on the query result, and inserts the query result into the new table).

After each computing task is executed, whether it succeeds or fails, a Query Signature (QS) field is generated and added when its execution record is written to storage. For other types of SQL statements, such as DDL statements (Data Definition Language, including create table, drop table, alter table, etc.) and metadata command statements (desc for viewing a table definition, show, use database, etc.), no query signature is generated.
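As a rough illustration of the distinction above, the following sketch decides from the leading keywords whether a statement is a compute-type statement that would receive a query signature. The function name and the regular expressions are assumptions made for this example; a real platform would classify statements from the parsed syntax tree rather than from text patterns.

```python
import re

# Leading-keyword patterns for compute-type statements (DQL and DML).
_COMPUTE_PATTERNS = (
    r"^\s*select\b",
    r"^\s*with\b",
    r"^\s*insert\s+(into|overwrite)\b",
    r"^\s*create\s+table\b.*\bas\s+select\b",   # create-table-as-select
)


def needs_query_signature(sql: str) -> bool:
    """True for DQL/DML statements, False for DDL and metadata commands."""
    text = sql.lower()
    return any(re.search(p, text, flags=re.DOTALL) for p in _COMPUTE_PATTERNS)
```

Plain `create table`, `drop table`, `alter table`, `desc`, `show` and `use` statements match none of the patterns and therefore get no signature.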

Further, the query signature field specifically includes the database and table names accessed by the SQL statement and the column names contained in its key subtasks. Through the query signature, the current SQL statement can be matched against historical SQL statements to determine which of them it is equivalent to. A key subtask can be understood as a key clause in an SQL statement; the key clauses specifically include the Filter (where) clause, the Join clause, the GroupBy (grouping) clause, and the OrderBy (sorting) clause.
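One way to picture such a signature is sketched below. The text specifies what the signature contains (table names plus the column names of the Filter, Join, GroupBy and OrderBy clauses) but not its string layout, so the `clause=columns|...` format, the function name and the lower-casing and sorting steps are assumptions of this example. The inputs are taken as already extracted from a parsed statement.

```python
from typing import Dict, Iterable

# Fixed clause order so that equal inputs always yield the same string.
CLAUSE_ORDER = ("filter", "join", "groupby", "orderby")


def build_query_signature(tables: Iterable[str],
                          clause_columns: Dict[str, Iterable[str]]) -> str:
    """Build a canonical signature string from table and column names.

    Literal values in the SQL text are deliberately left out, so two
    statements that differ only in constants share one signature.
    """
    parts = ["tables=" + ",".join(sorted({t.lower() for t in tables}))]
    for clause in CLAUSE_ORDER:
        columns = sorted({c.lower() for c in clause_columns.get(clause, ())})
        parts.append(clause + "=" + ",".join(columns))
    return "|".join(parts)
```

Under this layout, `where dt = '2022-07-01' and status = 1` and `where status = 2 and dt = '2022-07-02'` on the same table produce the same signature string, which is the equivalence-class behavior the text describes.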

Step S204, a history execution record and a target engine in a preset time period are obtained.

The history execution records are the execution records of historical computing tasks executed before the current computing task. The preset time period can be adjusted according to actual requirements and is not limited to one or more specific values. The target engine is the processing engine that the current big data analysis platform needs to focus on, determined according to the actual scenario; that is, the analysis specifically judges whether execution efficiency can be improved when the current computing task is executed on the target engine. In this embodiment, the target engine may specifically be the Presto processing engine, and may also be changed to the Spark processing engine or the Hive processing engine according to the actual service scenario.

In this embodiment, the HBO process (History Based Optimization, i.e., the query optimization process based on the historical load) in the big data analysis platform is specifically considered, that is, the query optimization of the historical load is mainly performed, so as to determine whether the execution efficiency of the target engine can be improved when the target engine is used to execute the current computing task.

Further, in this embodiment, an index wide table is persisted and cached by a unified metadata service component (such as Hive MetaStore, a metadata service extension of the data warehouse tool), so that the history execution records within a preset time period, and the target engine matching each history execution record, can be extracted from the index wide table. Each record of the index wide table corresponds to one historical SQL query and includes information such as the query signature, execution time, engine type, result state, data volume, and engine shuffle data (for example, the input and output data volumes of redistribution operations, obtained after the engine redistributes the data).

Specifically, the execution state of each computing task is obtained in real time, and the execution state of each computing task together with the corresponding target engine is stored as history execution record information in a historical computing task flow reservoir. The history execution record information stored in the historical computing task flow reservoir is then written in real time into the index wide table of the unified metadata service component, so that the history execution records within a preset time period, and the target engine that executed each historical computing task, can be extracted from the index wide table.

The execution state of each computing task may be execution success or execution failure. The execution state of each computing task and the corresponding target engine, for example the details of a successful or failed execution on the Presto processing engine, are stored as history execution record information in the historical computing task flow reservoir.

Further, because the history execution record information stored in the historical computing task flow reservoir is written in real time into the index wide table of the unified metadata service component, the history execution records within a preset time period and the target engine that executed each historical computing task can be extracted from the index wide table. In this embodiment, because the focus is on query optimization based on historical load, the corresponding index wide table may be an HBO (history-based optimization) index wide table. Its storage and retrieval functions are supported by combining a persistence function based on HBase (a distributed, column-oriented open source database) with a cache acceleration function based on Redis (Remote Dictionary Server).

In one embodiment, the schema (set of database objects) of the HBO index wide table is shown in table 1 below:

TABLE 1

Further, referring to table 1, each record in the HBO index wide table holds the detailed execution characteristics of one historical SQL query, including the query signature, task ID, success/failure status, engine type, execution time, CPU/memory/disk usage, input/output/intermediate data volumes, and stage cache information. The HBO index wide table synchronizes data from the historical computing task flow reservoir through a message queue at regular intervals; specifically, each record is first written to HBase and then loaded into Redis. In this embodiment, when the result set is moderate in size (for example, fewer than 100 records), retrieval from the HBO index wide table responds within 100 ms on average.
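The record layout and the write order described above (persist first, then refresh the cache) can be sketched as follows. The field names are assumptions chosen to mirror the listed contents, and plain dictionaries stand in for HBase and Redis so that the example runs without either system; no real HBase or Redis client call is shown.

```python
from dataclasses import dataclass, asdict
from typing import Dict, List


@dataclass
class HboRecord:
    """One row of the index wide table (field names are illustrative)."""
    query_signature: str
    task_id: str
    succeeded: bool
    engine: str
    execution_seconds: float
    cpu_seconds: float = 0.0
    peak_memory_bytes: int = 0
    disk_bytes: int = 0
    input_bytes: int = 0
    output_bytes: int = 0
    intermediate_bytes: int = 0


class HboIndexTable:
    """In-memory stand-in: one dict plays HBase, another plays Redis."""

    def __init__(self) -> None:
        self._persistent: Dict[str, dict] = {}   # keyed by task ID
        self._cache: Dict[str, List[dict]] = {}  # keyed by query signature

    def write(self, record: HboRecord) -> None:
        row = asdict(record)
        self._persistent[record.task_id] = row                      # persist first
        self._cache.setdefault(record.query_signature, []).append(row)  # then cache

    def lookup(self, query_signature: str) -> List[dict]:
        """Return cached rows whose signature matches exactly."""
        return list(self._cache.get(query_signature, []))
```

Keying the cache by signature is a design choice of this sketch: it makes the exact-match lookup used later a single dictionary access.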

Step S206, extracting a target history record set equivalent to the current calculation task from the history execution records according to the query signature.

The target history record set is the set of historical computing tasks whose query signatures are equal, as character strings, to the query signature of the current computing task. According to the query signature of the current computing task, every historical computing task with a string-equal query signature can be identified in the history execution records, which yields the target history record set equivalent to the current computing task.

Specifically, the history execution records within the preset time period are loaded into a cache component of the unified metadata service component. The history execution records in the cache component are sorted and then matched against the query signature under a complete-match requirement, to obtain the target history record set equivalent to the current computing task. The corresponding target history record set is then extracted according to a target parameter set by calling a query interface of the unified metadata service component.

Further, the unified metadata service component provides a cache component, which is used to load the history execution records written into the HBO index wide table within the preset time period. The unified metadata service component then sorts the history execution records in the cache component. Specifically, the records may be sorted by the time at which each was written into the HBO index wide table within the preset time period, so as to obtain the history execution records whose write time is closest to the current computing task.

After the history execution records whose write time is closest to the current computing task are obtained, each of them is matched against the query signature under the complete-match requirement to obtain the target history record set equivalent to the current computing task. Here, a complete match means that the query signature of the current computing task and the query signature of a historical computing task in the history execution records are equal as character strings, so that the target history record set equivalent to the query signature of the current computing task can be determined from the history execution records.

In one embodiment, a mode of calling a query interface (i.e., REST API) provided by the unified metadata service component and transmitting a designated target parameter set is specifically adopted, so as to search in real time according to the target parameter set and obtain a target history record set in a preset time period corresponding to a current SQL statement. Wherein the search parameters are listed in the following table 2:

TABLE 2

As can be seen from table 2, the search parameters may specifically include a query signature, an engine type, an execution state, a constraint (e.g., the first N records after sorting, N may be set according to actual requirements), a search time, and a call timeout time.
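A retrieval driven by such a parameter set might look like the sketch below. It filters an in-memory list instead of calling the REST interface, whose path and payload are not given in the text. The field names, the reading of "search time" as a lower bound on the record write time, and the `written_at` record key are all assumptions of this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SearchParams:
    """Illustrative parameter set; names are not taken from table 2."""
    query_signature: str
    engine: Optional[str] = None    # None means any engine
    status: Optional[str] = None    # "success", "failure", or None for any
    limit: int = 100                # first N records after sorting
    since_ts: float = 0.0           # assumed meaning of "search time"
    timeout_ms: int = 100           # carried along, not enforced here


def search_history(records: List[dict], params: SearchParams) -> List[dict]:
    """Exact-match the signature, apply filters, newest first, cut to N."""
    hits = [
        r for r in records
        if r["query_signature"] == params.query_signature
        and r["written_at"] >= params.since_ts
        and (params.engine is None or r["engine"] == params.engine)
        and (params.status is None or r["status"] == params.status)
    ]
    hits.sort(key=lambda r: r["written_at"], reverse=True)
    return hits[:params.limit]
```

Sorting by write time before truncating keeps the records closest in time to the current task, which matches the ordering described for the cache component.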

In one embodiment, partial matching of the query signature is implemented as an extension based on Redis full-text retrieval. This retrieves additional historical data that is not fully equivalent to the current SQL statement but is partially similar to it (for example, the signature substrings corresponding to the Join clause are the same, while the substrings of the other clauses are not all the same). The summarized analysis data from both kinds of match are then considered together and fused, which improves the accuracy of the final efficiency-improvement analysis result and yields the specific details of the corresponding improvement in execution efficiency.
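The idea of partial matching can be illustrated without Redis. The sketch below swaps the full-text retrieval for a plain in-memory comparison of one clause of the signature, and it assumes a hypothetical signature layout of `clause=columns` segments separated by `|`; neither the layout nor the function names come from the text.

```python
from typing import Dict, Iterable


def split_signature(signature: str) -> Dict[str, str]:
    """Split 'a=x|b=y' into {'a': 'x', 'b': 'y'} (assumed layout)."""
    return dict(part.split("=", 1) for part in signature.split("|") if "=" in part)


def partially_matches(current: str, historical: str,
                      clauses: Iterable[str] = ("join",)) -> bool:
    """True when every listed clause is non-empty and identical in both."""
    cur, hist = split_signature(current), split_signature(historical)
    return all(bool(cur.get(c)) and cur.get(c) == hist.get(c) for c in clauses)
```

Requiring the clause to be non-empty avoids treating two statements with no Join clause at all as similar.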

And step S208, performing efficiency analysis processing based on the target history record set to obtain efficiency analysis processing results of executing the current calculation task by using the target engine.

Specifically, the target history record set is analyzed statistically; detailed parameters such as execution time, failure rate and engine distribution are combined and compared with preset parameter thresholds to determine whether efficiency-improvement processing should be applied to the current computing task (such as the current SQL statement). Efficiency-improvement analysis can be understood as follows: for the current SQL statement, the analysis judges whether executing the statement on the target engine would yield higher execution efficiency.
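A minimal sketch of this statistics-and-threshold step is given below. The record keys, the two threshold values and the policy of declining when there is no history on the target engine are assumptions introduced for illustration; the text only states that execution time, failure rate and engine distribution are compared with preset thresholds.

```python
from statistics import mean
from typing import List, Optional


def summarize(records: List[dict], target_engine: str) -> Optional[dict]:
    """Aggregate the target history record set for one engine."""
    on_target = [r for r in records if r["engine"] == target_engine]
    if not on_target:
        return None
    failures = sum(1 for r in on_target if not r["succeeded"])
    ok_times = [r["execution_seconds"] for r in on_target if r["succeeded"]]
    return {
        "runs": len(on_target),
        "failure_rate": failures / len(on_target),
        "avg_seconds": mean(ok_times) if ok_times else None,
        "engine_share": len(on_target) / len(records),   # engine distribution
    }


def should_use_target(summary: Optional[dict],
                      max_failure_rate: float = 0.2,
                      max_avg_seconds: float = 60.0) -> bool:
    """Compare the summary with (assumed) preset thresholds."""
    if summary is None or summary["avg_seconds"] is None:
        return False   # assumed policy: no successful history, no promotion
    if summary["failure_rate"] > max_failure_rate:
        return False
    return summary["avg_seconds"] <= max_avg_seconds
```

For example, a history of three successful and one failed run on the target engine gives a failure rate of 0.25, which the default threshold of 0.2 rejects.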

In this embodiment, the query optimization processing based on historical load (HBO) in the big data analysis platform focuses on identifying and blocking SQL statements for which efficiency improvement on the Presto processing engine is likely to fail. That is, for each SQL statement it judges whether there is any possibility of improving execution efficiency, specifically by using the Presto processing engine. If there is no such possibility, the SQL statement is removed in advance from the set of SQL statements subject to efficiency-improvement analysis. If there is such a possibility, the success rate of the efficiency improvement is judged further; if the success rate is low, that is, the improvement is likely to fail, the SQL statement is likewise removed in advance from the set, and no efficiency-improvement analysis is performed on it.

The efficiency-improvement analysis can be extended to other types of processing engines according to actual service requirements, such as Livy + Spark 3 and Hive on MapReduce (a distributed computing engine based on a distributed computing framework). Here Livy is an open source REST service based on Apache Spark, Apache Spark is a distributed open source processing system for big data workloads, and REST is a Web service architecture.

In one embodiment, after the target history record set corresponding to the current SQL statement is obtained, efficiency-improvement analysis is performed based on the target history record set to obtain an efficiency-improvement execution success result (HBO, History Based Optimization) for the current computing task. This result is combined with the query optimization analysis result based on rule matching (RBO, Rule Based Optimization) and the query optimization analysis result based on cost estimation (CBO, Cost Based Optimization) for the current computing task, to obtain the efficiency-improvement analysis processing result of executing the current computing task with the target engine.

Specifically, fig. 3 shows a process for generating the efficiency improvement analysis processing result. As can be seen from fig. 3, the execution state of each computing task and the corresponding target engine are stored as history execution record information in a historical computing task flow reservoir, and the target history record set equivalent to the current computing task is retrieved from that reservoir. Efficiency analysis processing is then performed on the execution records of these computing tasks to obtain an HBO analysis result (i.e., a history-load-based query optimization analysis result, which can be understood as the efficiency improvement execution success result). Finally, comprehensive weighted analysis is performed on the rule-matching-based query optimization analysis result (i.e., RBO), the cost-estimation-based query optimization analysis result (i.e., CBO) and the efficiency improvement execution success result to obtain the final efficiency analysis processing result.

When efficiency improvement analysis processing is performed, RBO, CBO and HBO are considered together: RBO and CBO can be added to HBO to strengthen or assist its effect, and HBO can likewise be fused on top of RBO/CBO, so that the overall effect is improved.

Further, the historical computing task flow reservoir specifically includes an execution success SQL set, an efficiency improvement failure SQL set, an execution failure SQL set and other failure SQL sets. The execution success SQL set contains the SQL statements executed successfully by the target engine (such as the Presto processing engine). The efficiency improvement failure SQL set contains the SQL statements that were selected for efficiency improvement but failed when subsequently submitted to the processing engine (such as Presto); after the failure, a failover selects another computing engine (such as the Spark processing engine or the Hive processing engine) to execute the statement, but limited Presto computing resources have already been wasted. The execution failure SQL set contains the SQL statements whose execution failed, including the cases where the target engine fails to execute, the target engine execution time exceeds a threshold, and the resources used by the target engine exceed a threshold.

In one embodiment, the HBO processing function of the big data analysis platform is turned on by default and is controlled by a corresponding HBO switch parameter (for example, the HBO processing function is turned on when the switch parameter = true). When the switch parameter is set to false, the HBO processing function is disabled: no query signature is generated for any SQL statement, the signature field of the corresponding flow record is empty, and neither history retrieval nor HBO efficiency improvement judgment is performed.

The switch parameter of HBO processing does not affect the switch parameters of the platform's original RBO/CBO processing. HBO processing and RBO/CBO processing operate independently, but HBO processing can be fused on top of the original RBO/CBO processing to improve the efficiency of the platform's processing analysis.

Further, if HBO processing in the current efficiency improvement flow ends abnormally, by default only an alarm log is recorded and execution of the SQL statement is not interrupted. In that case HBO produces no output, and the efficiency improvement analysis result is determined from the output of RBO and/or CBO.
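The switch and fallback behaviour described above can be illustrated with a minimal Python sketch. The function name `run_hbo_safely`, the `hbo_enabled` flag and the `hbo_analysis` callable are hypothetical names introduced only for illustration; the platform's actual parameter names are not given in this description.

```python
import logging

logger = logging.getLogger("hbo")


def run_hbo_safely(hbo_enabled, hbo_analysis, sql):
    """Return the HBO result, or None when HBO is disabled or fails.

    hbo_enabled  -- hypothetical switch parameter (True means HBO is on)
    hbo_analysis -- hypothetical callable that performs the HBO analysis
    """
    if not hbo_enabled:
        # Switch is off: no signature, no history lookup, no HBO judgment.
        return None
    try:
        return hbo_analysis(sql)
    except Exception:
        # Only record an alarm log; the SQL statement itself keeps running
        # and the caller decides from RBO and/or CBO alone.
        logger.warning("HBO analysis failed, falling back to RBO/CBO", exc_info=True)
        return None
```

A caller that receives `None` would simply leave the HBO term out of the later combined analysis.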

In the above method for efficiency analysis processing of a computing task, the query signature corresponding to the current computing task is obtained, the history execution records within a preset time period and the target engine are obtained, and the target history record set equivalent to the current computing task is extracted from the history execution records according to the query signature. Efficiency analysis processing is then performed based on the target history record set to obtain the efficiency analysis processing result of executing the current computing task by using the target engine. By performing efficiency improvement analysis processing on the target history record set equivalent to the current computing task, it can be determined in advance whether efficiency improvement processing can be applied to the current computing task; that is, the obtained efficiency improvement analysis processing result accurately indicates how the execution efficiency of the current computing task would be improved when it is executed by the matched target engine.
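As an illustration of the extraction step, the following Python sketch selects the history execution records that carry the same query signature and fall within a time window. The record layout (dictionaries with `signature` and `finished_at` keys) and the 7-day default window are assumptions made for this sketch, not values taken from this description.

```python
from datetime import datetime, timedelta


def select_target_history(records, signature, now, window=timedelta(days=7)):
    """Return the records equivalent to the current task within the window.

    records   -- iterable of dicts with hypothetical keys 'signature' and
                 'finished_at' (a datetime)
    signature -- query signature of the current computing task
    window    -- assumed look-back period; the description only says "preset"
    """
    start = now - window
    return [
        record
        for record in records
        if record["signature"] == signature and start <= record["finished_at"] <= now
    ]
```

Equality of signatures, rather than equality of SQL text, is what makes a historical record count as equivalent here.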

In one embodiment, as shown in fig. 4, the step of obtaining a query signature corresponding to a current computing task specifically includes:

Step S402, obtaining the library table information accessed by the current computing task and the column information used in the key subtasks carried by the current computing task, wherein the library table information includes a library table number and a library table name, and the column information includes a column name and a column category symbol.

Specifically, the library table information to be accessed by the current computing task is obtained, including the library table names and the library table numbers. A library table name is a data table name prefixed with the name of its database, i.e., it takes the form "database name.table name" (for example, "db1.t1"). The library table numbers are assigned by sorting the library table names in dictionary order, from first to last. Dictionary order is the order in which entries appear in a dictionary; in a computer, for single characters consisting of the digits and the 26 letters, the dictionary order is: '0' < '1' < '2' < ... < '9' < 'a' < 'b' < ... < 'z'.
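The numbering rule can be sketched in a few lines of Python. The function name `number_tables` is hypothetical, and the sketch assumes lower-case library table names so that Python's default string ordering coincides with the dictionary order given above.

```python
def number_tables(table_names):
    """Assign numbers to 'database.table' names in dictionary order."""
    # Duplicates are removed first; sorted() compares strings character by
    # character, so digits come before lower-case letters.
    ordered = sorted(set(table_names))
    return {name: index for index, name in enumerate(ordered)}


numbers = number_tables(["db2.t2", "db1.t1", "db1.t1"])
print(numbers)  # {'db1.t1': 0, 'db2.t2': 1}
```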

Likewise, the column information used in the key subtasks carried by the current computing task is obtained, including the column names and the column category symbols. A column name is the field name of a data column in a data table of the database. A column category symbol identifies the key subtask (key clause) in which the column is used; the key clauses currently considered are the Filter (or Where, i.e., filtering) clause, the Join (i.e., connection) clause, the GroupBy (i.e., grouping) clause and the OrderBy (i.e., sorting) clause, which are represented by the characters f, j, g and o respectively.

Step S404, splicing the library table number (i.e., the access number) and the library table name to obtain a first sub-query signature.

Specifically, all library table names are spliced in sequence according to the access numbers determined by the dictionary order. Each entry takes the form "number:database name.table name" followed by a space character, and the entries are spliced in sequence to obtain the first sub-query signature.

Step S406, splicing the column category symbol, the library table number to which the key subtask belongs and the column name to obtain a second sub-query signature.

Specifically, for each column category symbol and column name, an entry of the form "column category symbol:library table number:column name" followed by a space character is generated, using the number of the library table to which the column used in the key subtask belongs, and the entries are spliced in sequence to obtain the second sub-query signature.

Step S408, based on the first sub-query signature and the second sub-query signature, a query signature corresponding to the current computing task is obtained.

Specifically, a computing task may generally include multiple SQL statements, i.e., it executes different sub-computing tasks. These SQL statements differ in composition, in the library tables they access and in the key subtasks they involve, so each SQL statement is analyzed in turn to obtain its sub-query signatures: a first sub-query signature is generated for the library tables that are accessed, and a second sub-query signature is generated for the columns processed by the key subtasks. Finally, the query signature corresponding to the current computing task is obtained by combining the first and second sub-query signatures of each SQL statement. One computing task corresponds to one query signature, which is obtained by splicing the sub-query signatures of all its sub-computing tasks.

In one embodiment, as shown in FIG. 5, a simple example of the query signature corresponding to a computing task is provided. For each user SQL statement, during the validation phase of the logical plan tree (i.e., sqlNode), all nodes of the tree are traversed, all library table names accessed by the SQL statement and all column names used in its key clauses are extracted, and they are spliced in order to generate the query signature.

Specifically, referring to fig. 5, an SQL statement sample of one computing task is as follows:

SELECT count(distinct(col1))
FROM (SELECT * FROM db1.t1
      WHERE col2 = 20211116) t1
LEFT JOIN
     (SELECT * FROM db2.t2
      WHERE col3 = 20211115 AND col4 < 100) t2
ON t1.id1 = t2.id2
GROUP BY t2.col5, t1.col6
ORDER BY t1.col7 DESC, t2.col8

Specifically, each library table accessed by the SQL statement yields an entry of the first sub-query signature, and each column used in a key clause yields an entry of the second sub-query signature. For example, "FROM (SELECT * FROM db1.t1 WHERE col2 = 20211116) t1" is represented by "0:db1.t1" and "f:0:col2". Here "0:db1.t1" means that the library table named "db1.t1" is accessed and that, when the library tables of the current computing task are sorted in dictionary order, its access number is "0". "f:0:col2" means that data is filtered on the column named "col2" of the library table "db1.t1": the WHERE key clause expresses filtering and is represented by the column category symbol "f", and the number "0" is the number of the library table "db1.t1".

Likewise, "(SELECT * FROM db2.t2 WHERE col3 = 20211115 AND col4 < 100) t2" is represented by "1:db2.t2", "f:1:col3" and "f:1:col4". Here "1:db2.t2" means that the library table named "db2.t2" is accessed and that its access number in dictionary order within the current computing task is "1". "f:1:col3" and "f:1:col4" mean that data is filtered on the columns named "col3" and "col4" of the library table "db2.t2": the WHERE key clause is represented by the column category symbol "f", and the number "1" is the number of the library table "db2.t2".

For another example, "ON t1.id1 = t2.id2" is represented by "j:0:id1" and "j:1:id2", meaning that the column named "id1" of the library table "db1.t1" is joined with the column named "id2" of the library table "db2.t2". The JOIN key clause expresses connection and is represented by the column category symbol "j"; the number "0" is the number of the library table "db1.t1", and the number "1" is the number of the library table "db2.t2".

Similarly, "GROUP BY t2.col5, t1.col6" is represented by "g:0:col6" and "g:1:col5", meaning that data is grouped by the column named "col6" of the library table "db1.t1" and the column named "col5" of the library table "db2.t2". The GROUP BY key clause expresses grouping and is represented by the column category symbol "g"; as before, the number "0" is the number of the library table "db1.t1", and the number "1" is the number of the library table "db2.t2".

For another example, "ORDER BY t1.col7 DESC, t2.col8" is represented by "o:0:col7" and "o:1:col8", meaning that data is sorted by the column named "col7" of the library table "db1.t1" and the column named "col8" of the library table "db2.t2". The ORDER BY key clause expresses sorting and is represented by the column category symbol "o"; as before, the number "0" is the number of the library table "db1.t1", and the number "1" is the number of the library table "db2.t2".
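The entries walked through above can be assembled into one signature string with a short Python sketch. The function name `build_query_signature` and its input format are hypothetical, and the ordering of the second sub-query signature (by category f, j, g, o, then table number, then column name) is an assumption chosen so that the result does not depend on where a column appears in the SQL text. Extracting the tables and columns from a real SQL parse tree is outside the scope of this sketch.

```python
CATEGORY_ORDER = {"f": 0, "j": 1, "g": 2, "o": 3}  # Filter, Join, GroupBy, OrderBy


def build_query_signature(tables, column_refs):
    """Build a signature from table names and (category, table, column) tuples."""
    # First sub-query signature: "number:database.table", numbered in dictionary order.
    ordered_tables = sorted(set(tables))
    number = {table: index for index, table in enumerate(ordered_tables)}
    first = [f"{number[table]}:{table}" for table in ordered_tables]

    # Second sub-query signature: "category:number:column".
    # The sort order below is an assumption of this sketch.
    keys = sorted({(CATEGORY_ORDER[c], number[t], col, c) for c, t, col in column_refs})
    second = [f"{c}:{n}:{col}" for _, n, col, c in keys]

    return " ".join(first + second)


refs = [
    ("f", "db1.t1", "col2"), ("f", "db2.t2", "col3"), ("f", "db2.t2", "col4"),
    ("j", "db1.t1", "id1"), ("j", "db2.t2", "id2"),
    ("g", "db2.t2", "col5"), ("g", "db1.t1", "col6"),
    ("o", "db1.t1", "col7"), ("o", "db2.t2", "col8"),
]
print(build_query_signature(["db1.t1", "db2.t2"], refs))
# 0:db1.t1 1:db2.t2 f:0:col2 f:1:col3 f:1:col4 j:0:id1 j:1:id2 g:0:col6 g:1:col5 o:0:col7 o:1:col8
```

Because the references are sorted and de-duplicated, listing the same clauses in a different order produces the same string.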

In one embodiment, the sub-strings in the query signature may be separated by a single space character, and the separation may be flexibly extended to support 1 to N whitespace characters (space, tab, line feed, etc.); that is, the manner of separation is not specifically limited.
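For instance, in Python the built-in `str.split()` called without arguments already treats any run of whitespace as one separator, so a signature written with mixed spaces, tabs and line feeds can be normalized as follows (the sample signature string is made up for illustration):

```python
raw_signature = "0:db1.t1  1:db2.t2\tf:0:col2\nj:0:id1"

# split() with no argument splits on runs of spaces, tabs and line feeds.
entries = raw_signature.split()
normalized = " ".join(entries)

print(entries)      # ['0:db1.t1', '1:db2.t2', 'f:0:col2', 'j:0:id1']
print(normalized)   # 0:db1.t1 1:db2.t2 f:0:col2 j:0:id1
```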

The query signature does not contain the columns of the SQL Select clause; that is, two SQL statements that differ only in their Select lists are considered equivalent by HBO and have the same query signature. In addition, the query signature does not consider the order in which the Filter/Join/GroupBy/OrderBy fields appear in the SQL text, nor whether they appear in the main query or in a nested sub-query; it only considers which columns are referenced in which clauses. This abstraction makes it easier to judge the similarity between two SQL queries, and thus whether the SQL statement of the current computing task is equivalent to that of a historical computing task.

In this embodiment, the library table numbers and library table names accessed by the current computing task, and the column names and column category symbols used in the key subtasks carried by the current computing task, are obtained. The access numbers and library table names are spliced to obtain the first sub-query signature, and the column category symbols, the library table numbers to which the key subtasks belong and the column names are spliced to obtain the second sub-query signature, so that the query signature corresponding to the current computing task is obtained based on the first and second sub-query signatures. The query signature is thus obtained by quickly splicing the library table information and column information related to the current computing task, and equivalence between the current computing task and historical computing tasks can be judged from the query signatures. The judgment is not limited to exact matching of the SQL text, which avoids the semantic limitations of exact matching, and comprehensive consideration and fusion of the summarized analysis data improves the accuracy of the final efficiency improvement analysis result and yields accurate details of the improvement in execution efficiency.

In one embodiment, as shown in fig. 6, the step of performing efficiency analysis processing based on the target history record set to obtain the efficiency analysis processing result of executing the current computing task by using the target engine specifically includes:

Step S602, dividing the target history record set into a first subset and a second subset according to the execution state.

Specifically, the number of history execution records in the target history record set and a preset number threshold are obtained and compared, to judge whether the number of history execution records is greater than the preset number threshold. The preset number threshold can be flexibly set and adjusted according to the actual service scenario and is not limited to particular values; in this embodiment it may be set to 3.

If the number of history execution records in the target history record set is greater than the preset number threshold, the execution state of each history execution record in the target history record set is obtained. If the execution state of a history execution record is execution failure, the record is assigned to the first subset. Execution failure includes target engine execution failure, target engine execution time exceeding a threshold, and target engine resource usage exceeding a threshold.

Specifically, target engine execution failure means that an error occurred while the target engine was executing the computing task, causing the execution to fail. Execution time exceeding a threshold means that the time consumed by the target engine in executing the computing task exceeds the corresponding preset execution time threshold. Similarly, resource usage exceeding a threshold means that the resources used by the target engine in executing the computing task exceed the corresponding preset resource consumption threshold.

In one embodiment, when the execution state of each history execution record in the target history record set is determined, it is first judged whether the execution state is execution failure. If it is not, it is further judged whether the record is a case in which the target engine failed to execute but, after failover, another non-target engine executed the task successfully. A history execution record whose execution state is execution failure is added to the first subset, and a history execution record in which the target engine failed but another non-target engine succeeded after failover is also assigned to the first subset.

If the history execution record is not a target engine execution failure, it is further judged whether the execution time of the target engine for the corresponding computing task exceeds a threshold and whether its resource usage exceeds a threshold. Specifically, if the record is not a target engine execution failure but the execution time of the target engine exceeds the threshold, the record is assigned to the first subset; the preset execution time threshold can be adjusted and is not specifically limited, and in this embodiment it may be set to 3 minutes. If the record is not a target engine execution failure and the execution time of the target engine does not exceed the threshold, it is further judged whether the resource usage of the target engine exceeds the threshold, and if so, the record is assigned to the first subset.

Further, if the execution state of the history execution record is that the target engine executed successfully, and neither the execution time nor the resource usage exceeds its threshold, the record is assigned to the second subset, which yields the divided first subset and second subset. If the execution state of the history execution record is that a non-target engine executed successfully (i.e., the task was not executed by the target engine at all, and it is not a case in which the target engine failed and another non-target engine then succeeded after failover), the record is filtered out and is not included in the efficiency analysis processing.
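The division rules of step S602 can be sketched in Python as follows. The record layout (dictionaries with the keys `ran_on_target`, `target_succeeded`, `exec_seconds` and `resource_units`) and the resource limit value are hypothetical; only the 3-minute time limit is taken from this embodiment.

```python
TIME_LIMIT_SECONDS = 180.0  # 3 minutes, as in this embodiment
RESOURCE_LIMIT = 100.0      # hypothetical resource budget in arbitrary units


def split_by_execution_state(records):
    """Divide history records into (first_subset, second_subset)."""
    first_subset, second_subset = [], []
    for record in records:
        if not record["ran_on_target"]:
            # Executed only by a non-target engine: filtered out entirely.
            continue
        if not record["target_succeeded"]:
            # Target engine failed, including failures later rescued by failover.
            first_subset.append(record)
        elif record["exec_seconds"] > TIME_LIMIT_SECONDS:
            first_subset.append(record)   # execution time over threshold
        elif record["resource_units"] > RESOURCE_LIMIT:
            first_subset.append(record)   # resource usage over threshold
        else:
            second_subset.append(record)  # clean success on the target engine
    return first_subset, second_subset
```

The checks are ordered as in the text: failure first, then execution time, then resource usage.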

For example, taking the Presto processing engine as the target engine on which the current big data analysis platform focuses, the success and failure cases of executing computing tasks with the Presto processing engine are obtained, so that the target history record set is divided into the first subset and the second subset according to whether execution succeeded or failed. Specifically, the "state" field in table 2 may be checked to obtain the record of execution success or execution failure, so as to determine the execution state of the current computing task.

Specifically, it is first judged whether the execution state is execution failure. If it is not, it is further judged whether the record is a case in which the Presto processing engine failed to execute but, after failover, another non-target engine executed the task successfully. A history execution record whose execution state is execution failure is added to the first subset, and a history execution record in which the Presto processing engine failed but another non-target engine succeeded after failover is also assigned to the first subset.

Further, if the history execution record is not a Presto processing engine execution failure, it is further judged whether the execution time of the Presto processing engine for the corresponding computing task exceeds a threshold and whether its resource usage exceeds a threshold. Specifically, if the record is not a Presto processing engine execution failure but the execution time of the Presto processing engine exceeds the threshold, the record is assigned to the first subset. The preset execution time threshold can be adjusted and is not specifically limited; in this embodiment it may be set to 3 minutes.

Likewise, if the history execution record is not a Presto processing engine execution failure and the execution time of the Presto processing engine does not exceed the threshold, it is further judged whether the resource usage of the Presto processing engine exceeds the threshold, and if so, the record is assigned to the first subset. Whether the resource usage exceeds the threshold can be determined by checking CPU/memory/disk usage, data volume, engine shuffle data and the like.

If the execution state of the history execution record is that the Presto processing engine executed successfully, and neither the execution time nor the resource usage exceeds its threshold, the record is assigned to the second subset. Similarly, if the execution state of the history execution record is that a non-target engine (such as the Spark processing engine or the Hive processing engine) executed successfully, that is, the task was not processed by the Presto processing engine, the record is filtered out and no efficiency analysis processing is performed on it. The processing engine of a history execution record is determined by looking up the query_id field in table 1, judging whether the query_id field corresponding to the computing task contains both a Presto query ID and query IDs of other processing engines, and then obtaining the execution states of the Presto processing engine and of the other non-Presto processing engines from the corresponding query IDs.

Step S604, performing efficiency analysis processing based on the first subset and the second subset to obtain an efficiency execution success result corresponding to the current computing task.

Specifically, the number of computing tasks in the first subset (the first task number) and the number of computing tasks in the second subset (the second task number) are obtained, and the efficiency improvement execution success rate corresponding to the current computing task is calculated based on the first task number and the second task number.

Further, after the efficiency improving execution success rate is obtained through calculation, a preset execution success rate threshold is further obtained, and the efficiency improving execution success rate is compared with the corresponding execution success rate threshold to obtain a corresponding efficiency improving execution success result. The execution success rate threshold can be adjusted and modified according to the actual service, and is not limited to a certain or some specific values, but in this embodiment, 0.75 is specifically preferable.

Specifically, the efficiency improvement execution success rate p corresponding to the current computing task is calculated by the following formula (1):

$$p = \frac{s.\mathrm{size}}{s.\mathrm{size} + f.\mathrm{size}} \tag{1}$$

where p is the efficiency improvement execution success rate, s.size is the number of computing tasks in the second subset (the second task number), and f.size is the number of computing tasks in the first subset (the first task number). The calculated success rate is derived from the history execution records in the target history record set, and the target history record set is equivalent to the current computing task, so the calculated success rate corresponds to the current computing task and can be understood as its predicted efficiency improvement execution success rate.

In one embodiment, when the efficiency improvement execution success rate p is greater than the corresponding execution success rate threshold, the generated efficiency improvement execution success result is that execution by the target engine is not prohibited; if p is less than the execution success rate threshold, the generated result is that execution by the target engine is prohibited.

When the number of history execution records in the target history record set is judged, if that number is smaller than the preset number threshold, the generated efficiency improvement execution success result is that execution by the target engine is not prohibited.
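The decision rule can be sketched in Python as below, using the example values of this embodiment (number threshold 3, success rate threshold 0.75). The function name `hbo_decision` and the string results are hypothetical. The text does not say what happens when a value equals its threshold; this sketch assumes that a record count equal to the threshold is treated as insufficient history and that a success rate equal to the threshold leads to prohibition.

```python
def hbo_decision(record_count, failed, succeeded, min_records=3, threshold=0.75):
    """Return 'allow' or 'prohibit' for running the task on the target engine.

    record_count -- number of records in the target history record set
    failed       -- size of the first subset (f.size)
    succeeded    -- size of the second subset (s.size)
    """
    if record_count <= min_records or failed + succeeded == 0:
        # Too little usable history: do not prohibit the target engine.
        return "allow"
    p = succeeded / (succeeded + failed)  # predicted success rate
    return "allow" if p > threshold else "prohibit"


print(hbo_decision(record_count=10, failed=2, succeeded=8))  # allow (p = 0.8)
print(hbo_decision(record_count=10, failed=5, succeeded=5))  # prohibit (p = 0.5)
print(hbo_decision(record_count=2, failed=2, succeeded=0))   # allow (too few records)
```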

Step S606, extracting the rule-matching-based query optimization analysis result and the cost-estimation-based query optimization analysis result corresponding to the current computing task.

Specifically, when performing efficiency analysis processing by using the cross-engine intelligent fusion big data analysis platform, after obtaining an efficiency execution success result (i.e., HBO, historical load-based query optimization analysis result) corresponding to a current computing task, an original rule matching-based query optimization analysis result (i.e., RBO) and a cost estimation-based query optimization analysis result (i.e., CBO) need to be further obtained.

Step S608, performing comprehensive analysis according to the efficiency improvement execution success result, the rule-matching-based query optimization analysis result and the cost-estimation-based query optimization analysis result to obtain the efficiency analysis processing result of executing the current computing task by using the target engine.

Specifically, after the efficiency improvement execution success result, the rule-matching-based query optimization analysis result and the cost-estimation-based query optimization analysis result are obtained, a weight is set for each of them according to actual service requirements, comprehensive weighted analysis is performed, and the efficiency analysis processing result is obtained by weighted summation. The weights can be flexibly adjusted and are not limited to particular values.
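A minimal Python sketch of such a weighted summation is given below. It assumes, purely for illustration, that each analysis result has been expressed as a score between 0 and 1, and it renormalizes the weights when a result is missing (for example, when HBO produced no output); neither the score scale, the renormalization, nor the weight values are specified in this description.

```python
def combine_scores(scores, weights):
    """Weighted combination of HBO/RBO/CBO scores; None marks a missing result."""
    available = {name: value for name, value in scores.items() if value is not None}
    total_weight = sum(weights[name] for name in available)
    if total_weight == 0:
        raise ValueError("no analysis result available")
    # Weighted sum, renormalized over the results that are actually present.
    weighted = sum(weights[name] * value for name, value in available.items())
    return weighted / total_weight


weights = {"hbo": 0.5, "rbo": 0.25, "cbo": 0.25}   # hypothetical weights
print(combine_scores({"hbo": 1.0, "rbo": 0.5, "cbo": 0.0}, weights))   # 0.625
print(combine_scores({"hbo": None, "rbo": 0.5, "cbo": 0.0}, weights))  # 0.25
```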

According to this embodiment, the target history record set is divided into the first subset and the second subset according to the execution state, and efficiency analysis processing is performed based on the two subsets to obtain the efficiency improvement execution success result corresponding to the current computing task. The rule-matching-based query optimization analysis result and the cost-estimation-based query optimization analysis result corresponding to the current computing task are extracted, and comprehensive analysis is performed on these two results together with the efficiency improvement execution success result to obtain the efficiency analysis processing result of executing the current computing task by using the target engine. This improves both the efficiency of the efficiency improvement analysis processing of the current computing task and the reliability of its results, yields accurate efficiency improvement details, reduces the probability of efficiency improvement failure during data analysis processing on the big data analysis platform, avoids the resource waste caused by such failures, and improves resource utilization.

In one embodiment, as shown in fig. 7, the step of performing efficiency analysis processing based on the target history record set to obtain the efficiency analysis processing result of executing the current computing task by using the target engine specifically includes:

Judging whether the number of history execution records in the target history record set is greater than a preset number threshold.

If the number of history execution records in the target history record set is smaller than the preset number threshold, the generated efficiency improvement execution success result is that execution by the target engine is not prohibited.

And if the number of the history execution records in the target history record set is greater than a preset number threshold, acquiring the execution state of each history execution record in the target history record set.

And judging whether the execution state of each history execution record is the execution failure or not.

If the execution state of the history execution record is the execution failure, dividing the history execution record into a first subset.

If the execution state of the history execution record is not execution failure, judging whether the record is a case in which the target engine failed to execute but, after failover, another non-target engine executed the task successfully.

If the target engine failed to execute but another non-target engine executed the task successfully after failover, the history execution record is assigned to the first subset.

If the history execution record is not the failure of the target engine execution, judging whether the execution time exceeds the threshold value when the target engine is used for executing the corresponding calculation task.

If the historical execution record is not the execution failure of the target engine but the execution time of the target engine exceeds a threshold value, the historical execution record is divided into a first subset.

If the history execution record is not the execution failure of the target engine and the execution time of the target engine does not exceed the threshold value, judging whether the execution use resource of the target engine exceeds the threshold value.

If the historical execution record is not the target engine execution failure, the execution time of the target engine does not exceed the threshold value, but the execution use resource of the target engine exceeds the threshold value, the historical execution record is divided into a first subset.

If the execution state of the history execution record is that the target engine is successfully executed and the execution time and the execution use resource do not exceed the threshold value, dividing the history execution record into a second subset.

When the processing engine corresponding to the history execution record is the other target engine and the execution state of the history execution record is that the execution of the other non-target engine is successful, the execution record is filtered, and the efficiency analysis processing is not performed.

The divided first subset and second subset are obtained, and the efficiency-improving execution success rate corresponding to the current computing task is calculated based on the first task number in the first subset and the second task number in the second subset.

Judging whether the efficiency-improving execution success rate is greater than an execution success rate threshold.

If the efficiency-improving execution success rate is greater than the execution success rate threshold, the generated efficiency-improving execution success result is that the target engine is not prohibited from executing.

If the efficiency-improving execution success rate is smaller than the execution success rate threshold, the generated efficiency-improving execution success result is that the target engine is prohibited from executing.

The rule-matching query optimization analysis result and the cost-estimation query optimization analysis result corresponding to the current computing task are extracted.

Comprehensive analysis is performed on the efficiency-improving execution success result, the rule-matching query optimization analysis result and the cost-estimation query optimization analysis result to obtain the efficiency analysis processing result of executing the current computing task by using the target engine.
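The subset division in the flow above can be illustrated with a minimal Python sketch. The record fields, the `Bucket` names, the threshold parameters and the handling of a record count exactly equal to the threshold are assumptions made for illustration only; the application does not specify them.

```python
from dataclasses import dataclass
from enum import Enum


class Bucket(Enum):
    FIRST = "first"        # counts against the target engine
    SECOND = "second"      # counts in favour of the target engine
    FILTERED = "filtered"  # takes no part in the analysis


@dataclass
class HistoryRecord:
    # All field names are hypothetical.
    engine: str                 # engine that produced the final outcome
    succeeded: bool             # final execution state
    target_engine_failed: bool  # target engine failed before any failover
    exec_seconds: float
    resource_units: float


def classify(rec, target_engine, max_seconds, max_resources):
    """Place one history execution record following the decision flow above."""
    if not rec.succeeded:
        return Bucket.FIRST
    if rec.target_engine_failed:
        # Target engine failed; the task succeeded elsewhere after failover.
        return Bucket.FIRST
    if rec.engine != target_engine:
        # Ran successfully on a non-target engine only: filtered out.
        return Bucket.FILTERED
    if rec.exec_seconds > max_seconds or rec.resource_units > max_resources:
        return Bucket.FIRST
    return Bucket.SECOND


def split(records, target_engine, min_records, max_seconds, max_resources):
    """Return (first, second), or None when there is too little history."""
    if len(records) <= min_records:  # assumption: equality counts as too few
        return None  # the target engine is not prohibited
    first, second = [], []
    for rec in records:
        bucket = classify(rec, target_engine, max_seconds, max_resources)
        if bucket is Bucket.FIRST:
            first.append(rec)
        elif bucket is Bucket.SECOND:
            second.append(rec)
    return first, second
```

A caller would pass the target history record set to `split` and use the sizes of the two returned lists as the first task number and the second task number.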

In this embodiment, a query signature corresponding to the current computing task is acquired, history execution records and the target engine within a preset time period are acquired, and a target history record set equivalent to the current computing task is extracted from the history execution records according to the query signature. Efficiency analysis processing is then performed based on the target history record set to obtain the efficiency analysis processing result of executing the current computing task by using the target engine. By performing efficiency-improving analysis on the target history record set equivalent to the current computing task, whether efficiency improvement can be applied to the current computing task is determined in advance; that is, the obtained analysis result accurately indicates how the execution efficiency of the current computing task would improve when it is executed by the matched target engine.

In one embodiment, as shown in fig. 8, a flow chart of a method for efficiency analysis processing of a computing task is provided, which specifically includes:

Step S801, obtain the database table information accessed by the current computing task and the column information used in the key subtasks carried by the current computing task, where the database table information includes a library table number and a library table name, and the column information includes a column name and a column category symbol.

Step S802, splice the access number and the library table name to obtain a first sub-query signature, and splice the column category symbol, the library table number to which the key subtask belongs, and the column name to obtain a second sub-query signature.

Step S803, based on the first sub-query signature and the second sub-query signature, a query signature corresponding to the current computing task is obtained.

Step S804, the execution state of each calculation task is obtained in real time, and the execution state of each calculation task and the corresponding target engine are stored into a historical calculation task flow reservoir as historical execution record information.

Step S805, write the history execution record information stored in the historical calculation task flow reservoir into an index wide table of the unified metadata service component in real time.

Step S806, extracting the history execution record in the preset time period and the target engine matched with each history execution record from the index wide table, and loading the history execution record in the preset time period into the cache component of the unified metadata service component.

Step S807, according to the complete matching requirement with the query signature, sorting and matching are carried out on each history execution record in the cache component, and a target history record set equivalent to the current calculation task is obtained.

Step S808, a query interface of the unified metadata service component is called, and a corresponding target history record set is extracted according to the target parameter set.

Step S809, if the number of history execution records in the target history record set is greater than the preset number threshold, obtain the execution state of each history execution record in the target history record set.

Step S810, if the execution state of the history execution record is execution failure, place the history execution record in the first subset, where execution failure includes failure of the target engine, target engine execution time exceeding the threshold, and target engine resource usage exceeding the threshold.

Step S811, if the execution state of the history execution record is that the target engine executed successfully and neither the execution time nor the resources used exceed their thresholds, place the history execution record in the second subset.

Step S812, obtain the divided first subset and second subset, and calculate the efficiency-improving execution success rate corresponding to the current computing task based on the first task number in the first subset and the second task number in the second subset.

Step S813, compare the efficiency-improving execution success rate with the corresponding execution success rate threshold to obtain the corresponding efficiency-improving execution success result.

Step S814, extract the rule-matching query optimization analysis result and the cost-estimation query optimization analysis result corresponding to the current computing task.

Step S815, perform comprehensive analysis on the efficiency-improving execution success result, the rule-matching query optimization analysis result and the cost-estimation query optimization analysis result to obtain the efficiency analysis processing result of executing the current computing task by using the target engine.

It should be understood that, although the steps in the flowcharts of the above embodiments are shown in sequence as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the order of execution is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

Based on the same inventive concept, the embodiments of the application also provide a device for efficiency analysis processing of a computing task, which implements the method for efficiency analysis processing of a computing task described above. The solution provided by the device is similar to that described for the method, so for the specific limitations in the one or more device embodiments provided below, reference may be made to the limitations of the method above, which are not repeated here.

In one embodiment, as shown in fig. 9, there is provided a device for efficiency analysis processing of a computing task, including: a query signature acquisition module 902, a history execution record acquisition module 904, a target history record set extraction module 906, and an efficiency analysis processing result acquisition module 908, wherein:

The query signature acquisition module 902 is configured to acquire a query signature corresponding to a current computing task.

The history execution record acquisition module 904 is configured to acquire a history execution record and a target engine within a preset time period.

The target history record set extraction module 906 is configured to extract, from the history execution record, a target history record set equivalent to the current computing task according to the query signature.

The efficiency analysis processing result acquisition module 908 is configured to perform efficiency analysis processing based on the target history record set, and obtain an efficiency analysis processing result of executing the current computing task by using the target engine.

In the above device for efficiency analysis processing of a computing task, the query signature corresponding to the current computing task is acquired, the history execution records and the target engine within the preset time period are acquired, and the target history record set equivalent to the current computing task is extracted from the history execution records according to the query signature. Efficiency analysis processing is then performed based on the target history record set to obtain the efficiency analysis processing result of executing the current computing task by using the target engine. By performing efficiency-improving analysis on the target history record set equivalent to the current computing task, whether efficiency improvement can be applied to the current computing task is determined in advance; that is, the obtained analysis result accurately indicates how the execution efficiency of the current computing task would improve when it is executed by the matched target engine.

In one embodiment, the query signature acquisition module is further configured to: acquire database table information accessed by the current computing task and column information used in a key subtask carried by the current computing task, where the database table information comprises a library table number and a library table name, and the column information comprises a column name and a column category symbol; splice the access number and the library table name to obtain a first sub-query signature; splice the column category symbol, the library table number to which the key subtask belongs and the column name to obtain a second sub-query signature; and obtain the query signature corresponding to the current computing task based on the first sub-query signature and the second sub-query signature.

In one embodiment, the history execution record acquisition module is further configured to: acquire the execution state of each calculation task in real time, and store the execution state of each calculation task and the corresponding target engine into a historical calculation task flow reservoir as historical execution record information; write the history execution record information stored in the historical calculation task flow reservoir into an index wide table of the unified metadata service component in real time; and extract, from the index wide table, the history execution records within a preset time period and the target engine that executed each historical calculation task.

In one embodiment, the target history set extraction module is further configured to: loading the history execution record in a preset time period into a cache component of the unified metadata service component; according to the complete matching requirement with the query signature, sequencing and matching the history execution records in the cache assembly to obtain a target history record set equivalent to the current calculation task; and calling a query interface of the unified metadata service component, and extracting a corresponding target history record set according to the target parameter set.

In one embodiment, the efficiency analysis processing result obtaining module is further configured to: dividing the target history record set into a first subset and a second subset according to the execution state; performing efficiency analysis processing based on the first subset and the second subset to obtain efficiency execution success results corresponding to the current computing task; extracting a query optimization analysis result matched with a rule corresponding to the current calculation task and a query optimization analysis result of cost estimation; and comprehensively analyzing according to the successful efficiency execution result, the rule matching query optimization analysis result and the cost estimation query optimization analysis result to obtain the efficiency analysis processing result of executing the current calculation task by using the target engine.

In one embodiment, the efficiency analysis processing result obtaining module is further configured to: if the number of the history execution records in the target history record set is greater than a preset number threshold, acquiring the execution state of each history execution record in the target history record set; if the execution state of the history execution record is the execution failure, dividing the history execution record into a first subset; the execution failure comprises target engine execution failure, target engine execution time exceeding a threshold value and target engine execution use resource exceeding a threshold value; if the execution state of the history execution record is that the target engine is successfully executed and the execution time and the execution use resource do not exceed the threshold value, dividing the history execution record into a second subset; the divided first subset and second subset are obtained.

In one embodiment, the efficiency analysis processing result obtaining module is further configured to: calculating to obtain the efficiency-improving execution success rate corresponding to the current calculation task based on the first task number in the first subset and the second task number in the second subset; and comparing the efficiency-improving execution success rate with a corresponding execution success rate threshold value to obtain a corresponding efficiency-improving execution success result.

The above-mentioned various modules in the computing task efficiency analysis processing device may be implemented in whole or in part by software, hardware, and combinations thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.

In one embodiment, as shown in fig. 10, there is provided an architecture diagram of a computing task efficiency analysis processing system, and referring to fig. 10, the computing task efficiency analysis processing system specifically includes:

P1, query signature generation module: the HBO processing component of the big data analysis platform extracts the database table information accessed by the current computing task (that is, an SQL statement) and the column information used in the related key subtasks, and splices the database table information and the column information in sequence to obtain the query signature corresponding to the current computing task.

Specifically, for every computing-class SQL statement, after the computing task finishes, whether it succeeded or failed, a Query Signature (QS for short) field is generated when the execution record is stored. The query signature field comprises the library table names accessed by the SQL statement and the column names contained in the key subtasks, and is used to match the current SQL statement against historical SQL statements and decide which of them it is equivalent to.

The database table information that the current computing task needs to access is obtained, including the library table names to be accessed and their numbers. A library table name is a data table name carrying its database name as a prefix, and can be written in the form "database name + table name"; the library table numbers are obtained by sorting the library table names in dictionary order and numbering them from first to last. Likewise, the column information used in the key subtasks carried by the current computing task is obtained, including column names and column category symbols, where a column name is the field name of a data column in a data table of the database, and a column category symbol is the symbol denoting the kind of key subtask currently under consideration.

Further, all library table names are spliced in the order of the access numbers determined by dictionary order, each in the form "access number: library name, table name" followed by a space character, to obtain the first sub-query signature. Each column category symbol and column name are then spliced in the order of the number of the library table to which the key subtask belongs, each in the form "column category symbol: library table number: column name" followed by a space character, to obtain the second sub-query signature. The query signature corresponding to the current computing task is then obtained based on the first sub-query signature and the second sub-query signature.
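As an illustration of the splicing described above, the following Python sketch builds a query signature from a list of accessed tables and key-subtask columns. The function name, the "." between database and table name, the category symbols such as "J" and "F", and the " | " used to join the two sub-signatures are all assumptions; the application only states that the final signature is obtained "based on" the two sub-query signatures.

```python
def build_query_signature(tables, key_columns):
    """Build a query signature (illustrative sketch).

    tables: iterable of (database, table) pairs accessed by the SQL statement.
    key_columns: iterable of (category_symbol, (database, table), column_name)
        triples, one per column used in a key subtask.
    """
    # Library table names in dictionary order, numbered from 1.
    names = sorted({f"{db}.{tbl}" for db, tbl in tables})
    number = {name: i for i, name in enumerate(names, start=1)}

    # First sub-query signature: "number:library.table" items, space separated.
    first = " ".join(f"{number[n]}:{n}" for n in names)

    # Second sub-query signature: "symbol:table number:column" items,
    # ordered by the number of the table the key subtask belongs to.
    parts = sorted(
        (number[f"{db}.{tbl}"], symbol, column)
        for symbol, (db, tbl), column in key_columns
    )
    second = " ".join(f"{symbol}:{num}:{column}" for num, symbol, column in parts)

    # Assumption: the two sub-signatures are simply concatenated.
    return f"{first} | {second}"
```

Because both the table names and the column entries are sorted before splicing, two statements that access the same tables and key columns in a different textual order produce the same signature, which is what makes exact matching against history usable.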

P2, index wide table retrieval module: the execution state of each computing task acquired in real time, together with the corresponding target engine, is written in real time into the historical calculation task flow reservoir over JDBC (Java Database Connectivity) as history execution record information, and the history execution record information stored there is in turn written in real time into the index wide table of the unified metadata service component through a message queue (tdbank queue). The history execution records within a preset time period can be loaded from the index wide table into a cache component of the unified metadata service component.

The index wide table is persisted and cached by the unified metadata service component (for example, an extension of Hive MetaStore, the metadata service of the Hive data warehouse tool), so that the history execution records within the preset time period, and the target engine matching each history execution record, can be extracted from it. Each record of the index wide table corresponds to one historical SQL statement query and includes information such as the query signature, execution time, engine type, result state, data volume, and engine shuffle data (the input and output data volumes of the redistribution operation obtained after the engine redistributes the data).
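One way to picture a record of the index wide table, and the extraction of records within a preset time period, is the Python sketch below. The class name, the field names, the `written_at` column and the string encoding of the result state are assumptions; only the kinds of information listed above come from the description.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class WideTableRow:
    # One row per historical SQL statement query (field names are hypothetical).
    query_signature: str
    written_at: datetime    # assumed column: when the row was written
    exec_seconds: float     # execution time
    engine_type: str
    result_state: str       # e.g. "SUCCESS" or "FAILED" (assumed encoding)
    data_bytes: int         # data volume
    shuffle_in_bytes: int   # engine shuffle input volume
    shuffle_out_bytes: int  # engine shuffle output volume


def rows_in_window(rows, now, window):
    """Return (row, engine_type) pairs written within `window` before `now`."""
    start = now - window
    return [(r, r.engine_type) for r in rows if start <= r.written_at <= now]
```

For example, `rows_in_window(rows, datetime.now(), timedelta(days=7))` would keep the last week of history together with the engine that handled each record.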

Because the process focuses on query optimization based on historical load, the index wide table may be an HBO (history load-based query optimization) index wide table. Storage and retrieval of the HBO index wide table are supported by combining HBase persistence with Redis cache acceleration.

P3, historical data retrieval module: the history execution records in the cache component are sorted and matched according to the requirement of a complete match with the query signature to obtain the target history record set equivalent to the current computing task, and the corresponding target history record set is then extracted according to a target parameter set by calling the query interface of the unified metadata service component. The history execution records may be sorted by different sorting logic, for example by the execution time of each record or by a preset weight order; this is not specifically limited. The target history record set may be the top N history execution records after sorting, where N can be set according to actual requirements and is not specifically limited.

Specifically, the query interface (REST API) provided by the unified metadata service component is called with a specified target parameter set, the search is performed in real time according to that parameter set, and the target history record set within the preset time period corresponding to the current SQL statement is obtained. The query interface responds at millisecond level and can quickly return the target history record set for query signature matching.

Further, after the history execution records whose write times are closest to the current computing task are obtained, each of them is matched against the query signature under the complete-match requirement to obtain the target history record set equivalent to the current computing task.
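The matching and top-N selection can be sketched in Python as follows. This is a simplified in-memory stand-in for the cache component and query interface, not the actual REST API; the record type, its fields, and the choice of sorting by most recent write time are assumptions (the description allows other sorting logic).

```python
from typing import NamedTuple


class CachedRecord(NamedTuple):
    # Hypothetical shape of a cached history execution record.
    query_signature: str
    written_at: float  # epoch seconds, assumed field
    engine_type: str
    result_state: str


def target_history_set(cache, signature, top_n):
    """Return the top_n most recent records whose signature matches exactly."""
    matches = [r for r in cache if r.query_signature == signature]
    matches.sort(key=lambda r: r.written_at, reverse=True)
    return matches[:top_n]
```

Exact string equality implements the complete-match requirement: a record is treated as equivalent to the current computing task only when its whole query signature is identical.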

P4, efficiency-improving analysis and judgment module (including subset division, success rate calculation and comprehensive analysis): the retrieved target history record set is analyzed, detailed parameters such as execution time, failure rate and engine distribution are combined in a multi-factor statistical analysis and compared with preset parameter thresholds, and it is determined whether to apply computing efficiency improvement to the current computing task (for example, the current SQL statement).

Specifically, the target history record set is divided into a first subset and a second subset according to the execution state. If the execution state of a history execution record is execution failure, the record is placed in the first subset, where execution failure includes failure of the target engine, target engine execution time exceeding the threshold, and target engine resource usage exceeding the threshold. If the execution state of a history execution record is that the target engine executed successfully and neither the execution time nor the resources used exceed their thresholds, the record is placed in the second subset. The divided first subset and second subset are thus obtained.

Further, the number of computing tasks included in the first subset, namely the first task number, and the number of computing tasks included in the second subset, namely the second task number, are obtained, the efficiency-improving execution success rate corresponding to the current computing task is obtained through calculation based on the first task number and the second task number, and the efficiency-improving execution success rate is compared with the corresponding execution success rate threshold to obtain a corresponding efficiency-improving execution success result.
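The description does not give the formula for the success rate, so the Python sketch below assumes the natural reading: the share of records in the second subset among all records in both subsets. The function names, the result labels, and the treatment of a rate exactly equal to the threshold are likewise assumptions.

```python
def success_rate(first_count, second_count):
    """Assumed formula: second / (first + second); None when there is no data."""
    total = first_count + second_count
    if total == 0:
        return None
    return second_count / total


def execution_result(first_count, second_count, rate_threshold):
    """Compare the success rate with the threshold and return a label."""
    rate = success_rate(first_count, second_count)
    # Assumption: with no usable history the target engine is not prohibited,
    # and a rate equal to the threshold does not count as exceeding it.
    if rate is None or rate > rate_threshold:
        return "ALLOW_TARGET_ENGINE"
    return "PROHIBIT_TARGET_ENGINE"
```

With 1 record in the first subset, 9 in the second and a threshold of 0.8, the rate is 0.9 and the target engine is allowed; with 3 and 1 the rate is 0.25 and it is prohibited.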

After the efficiency-improving execution success result is obtained, a comprehensive weighted analysis is performed on the efficiency-improving execution success result, the rule-matching query optimization analysis result and the cost-estimation query optimization analysis result to obtain the efficiency analysis processing result of executing the current computing task by using the target engine.
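The application does not disclose how the three results are weighted. Purely as an illustration, the Python sketch below treats each result as a yes/no vote and combines them with hypothetical weights and a hypothetical cutoff; none of these values come from the source.

```python
def combined_decision(history_allows, rule_allows, cost_allows,
                      weights=(0.5, 0.25, 0.25), cutoff=0.5):
    """Weighted vote over the three analysis results (illustrative only).

    history_allows: efficiency-improving execution success result.
    rule_allows:    rule-matching query optimization analysis result.
    cost_allows:    cost-estimation query optimization analysis result.
    """
    votes = (history_allows, rule_allows, cost_allows)
    # Sum the weights of the analyses that favour the target engine.
    score = sum(w for w, vote in zip(weights, votes) if vote)
    return {"score": score, "use_target_engine": score >= cutoff}
```

Giving the history-based result the largest weight reflects the emphasis the description places on historical load, but a real system could equally let a history-based prohibition act as a veto.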

In one embodiment, a computer device is provided, which may be a server, and the internal structure of which may be as shown in fig. 11. The computer device includes a processor, a memory, an input/output (I/O) interface and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used for storing data such as the current computing task, the query signature, the history execution records within the preset time period, the target engine, the target history record set and the efficiency analysis processing result. The input/output interface of the computer device is used to exchange information between the processor and external devices. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a method for efficiency analysis processing of a computing task.

It will be appreciated by those skilled in the art that the structure shown in fig. 11 is merely a block diagram of a portion of the structure associated with the present application and is not limiting of the computer device to which the present application applies, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.

In an embodiment, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.

In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the method embodiments described above.

In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.

It should be noted that, the user information (including, but not limited to, user equipment information, user personal information, etc.) and the data (including, but not limited to, data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data are required to comply with the related laws and regulations and standards of the related countries and regions.

Those skilled in the art will appreciate that all or part of the methods described above may be implemented by a computer program stored on a non-transitory computer-readable storage medium, and that the program, when executed, may comprise the steps of the method embodiments described above. Any reference to memory, database, or other medium used in the various embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take a variety of forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the various embodiments provided herein may include at least one of relational databases and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database. The processors referred to in the embodiments provided herein may be, without limitation, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, or data processing logic units based on quantum computing.

The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any combination of them that involves no contradiction should be considered to fall within the scope of this description.

The above examples represent only a few embodiments of the present application. They are described in relative detail, but are not to be construed as limiting the scope of the present application. It should be noted that those skilled in the art could make various modifications and improvements without departing from the spirit of the present application, all of which fall within its scope of protection. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims

1. A method for efficiency analysis processing of a computing task, the method comprising:

acquiring a query signature corresponding to a current computing task;

2. The method of claim 1, wherein the acquiring a query signature corresponding to a current computing task comprises:

acquiring database table information accessed by the current computing task and column information used in a key subtask carried by the current computing task; the database table information comprises a database table number and a database table name; the column information comprises a column name and a column category;

splicing the database table number and the database table name to obtain a first sub-query signature;

splicing the column category symbol, the database table number to which the key subtask belongs, and the column name to obtain a second sub-query signature;

and obtaining a query signature corresponding to the current computing task based on the first sub-query signature and the second sub-query signature.
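
The splicing steps of claim 2 can be sketched as follows. This is a minimal illustration, not the claimed implementation: the class names `TableInfo` and `ColumnInfo`, the function `build_query_signature`, the separators `:`, `.`, `|` and `#`, the sorting used to make the result order-independent, and the example category symbols `F` and `J` are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TableInfo:
    number: int  # database table number (hypothetical field name)
    name: str    # database table name


@dataclass(frozen=True)
class ColumnInfo:
    category: str      # column category symbol, e.g. "F" or "J" (assumed symbols)
    table_number: int  # number of the table the key subtask's column belongs to
    name: str          # column name


def build_query_signature(tables: Iterable[TableInfo],
                          columns: Iterable[ColumnInfo]) -> str:
    """Splice table and column information into one signature string."""
    # First sub-query signature: table number spliced with table name.
    first = "|".join(
        f"{t.number}:{t.name}" for t in sorted(tables, key=lambda t: t.number)
    )
    # Second sub-query signature: category symbol, table number, column name.
    second = "|".join(
        f"{c.category}:{c.table_number}.{c.name}"
        for c in sorted(columns, key=lambda c: (c.category, c.table_number, c.name))
    )
    # Combine both parts; the "#" separator is an arbitrary choice.
    return f"{first}#{second}"


tables = [TableInfo(2, "orders"), TableInfo(1, "users")]
columns = [ColumnInfo("J", 2, "user_id"), ColumnInfo("F", 1, "age")]
signature = build_query_signature(tables, columns)
```

Sorting before splicing is one way to let two tasks that list the same tables and columns in a different order yield the same signature; the claims do not state whether ordering is normalized.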

3. The method of claim 1, wherein the acquiring a history execution record and a target engine in a preset time period comprises:

acquiring the execution state of each calculation task in real time, and storing the execution state of each calculation task and a corresponding target engine into a historical calculation task flow reservoir as history execution record information;

writing the history execution record information stored in the historical calculation task flow reservoir into an index wide table of the unified metadata service component in real time;

and extracting a history execution record in a preset time period from the index wide table, and a target engine for executing the history calculation task.
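
The final extraction step of claim 3 amounts to a time-window filter over the index wide table. The sketch below models the wide table as a plain in-memory list, which is an assumption made only for illustration; `ExecutionRecord`, its field names, and `records_in_window` are hypothetical and do not come from the application.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class ExecutionRecord:
    signature: str         # query signature of the historical task
    engine: str            # target engine that executed the task
    state: str             # execution state, e.g. "success" or "failed" (assumed values)
    finished_at: datetime  # completion time used for the window filter


def records_in_window(wide_table: List[ExecutionRecord],
                      now: datetime,
                      window: timedelta) -> List[ExecutionRecord]:
    """Return the records whose completion time lies in [now - window, now]."""
    start = now - window
    return [r for r in wide_table if start <= r.finished_at <= now]


now = datetime(2022, 7, 29)
wide_table = [
    ExecutionRecord("sig-a", "engine-x", "success", now - timedelta(days=1)),
    ExecutionRecord("sig-a", "engine-x", "failed", now - timedelta(days=40)),
]
recent = records_in_window(wide_table, now, timedelta(days=30))
```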

4. The method of claim 3, wherein the extracting a target history record set equivalent to the current computing task from the history execution record according to the query signature comprises:

loading the history execution record in the preset time period into a cache component of the unified metadata service component;

sorting and matching each history execution record in the cache component according to a requirement of complete matching with the query signature, to obtain a target history record set equivalent to the current calculation task;

and calling a query interface of the unified metadata service component, and extracting a corresponding target history record set according to the target parameter set.
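
A minimal sketch of the complete-match lookup in claim 4, assuming the cache component can be modeled as a dictionary keyed by query signature. The record layout (dictionaries with `signature` and `finished_at` keys), the function names, and the newest-first sort order are assumptions; the claim does not specify the sort key.

```python
from collections import defaultdict
from typing import Dict, List


def build_cache(records: List[dict]) -> Dict[str, List[dict]]:
    """Group history execution records by their query signature."""
    cache: Dict[str, List[dict]] = defaultdict(list)
    for record in records:
        cache[record["signature"]].append(record)
    return cache


def match_equivalent(cache: Dict[str, List[dict]], signature: str) -> List[dict]:
    """Return records whose signature matches exactly, newest first."""
    matched = cache.get(signature, [])  # .get avoids creating an empty entry
    return sorted(matched, key=lambda r: r["finished_at"], reverse=True)


history = [
    {"signature": "sig-a", "finished_at": 10, "state": "success"},
    {"signature": "sig-b", "finished_at": 20, "state": "success"},
    {"signature": "sig-a", "finished_at": 30, "state": "failed"},
]
cache = build_cache(history)
target_set = match_equivalent(cache, "sig-a")
```

Keying the cache by the full signature makes the complete-match requirement a single dictionary lookup; a partial or fuzzy match would need a different structure.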

5. The method according to any one of claims 1 to 4, wherein the performing efficiency analysis processing based on the target history record set to obtain an efficiency analysis processing result of executing the current computing task with the target engine comprises:

dividing the target history record set into a first subset and a second subset according to the execution state;

performing efficiency analysis processing based on the first subset and the second subset to obtain an efficiency execution success result corresponding to the current computing task;

extracting a query optimization analysis result matched with a rule corresponding to the current calculation task and a query optimization analysis result of cost estimation;

6. The method of claim 5, wherein the dividing the target history record set into a first subset and a second subset according to the execution state comprises:

if the number of the history execution records in the target history record set is greater than a preset number threshold, acquiring the execution state of each history execution record in the target history record set;

if the execution state of the history execution record is execution failure, dividing the history execution record into the first subset; the execution failure comprises failure of execution by the target engine, execution time of the target engine exceeding a threshold value, and resources used by the target engine during execution exceeding a threshold value;

if the execution state of the history execution record is that the target engine executed successfully and neither the execution time nor the resources used during execution exceed the threshold value, dividing the history execution record into the second subset;

obtaining the first subset and the second subset after the division.
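
The division in claim 6 can be sketched as below. The record fields (`succeeded`, `seconds`, `resource`), the parameter names, and the choice to return `None` when the record count does not exceed the preset number threshold are assumptions; the claim does not say what happens in that case.

```python
from typing import List, Optional, Tuple


def partition_records(records: List[dict],
                      min_records: int,
                      max_seconds: float,
                      max_resource: float) -> Optional[Tuple[List[dict], List[dict]]]:
    """Split records into (first subset: failures, second subset: successes)."""
    # Only analyze when there are more records than the preset number threshold.
    if len(records) <= min_records:
        return None  # assumed behavior; not specified in the claim
    first, second = [], []
    for record in records:
        failed = (
            not record["succeeded"]               # engine execution failed
            or record["seconds"] > max_seconds    # execution time over threshold
            or record["resource"] > max_resource  # resource usage over threshold
        )
        (first if failed else second).append(record)
    return first, second


records = [
    {"succeeded": True, "seconds": 5.0, "resource": 1.0},
    {"succeeded": True, "seconds": 50.0, "resource": 1.0},   # too slow
    {"succeeded": False, "seconds": 2.0, "resource": 1.0},   # engine failure
    {"succeeded": True, "seconds": 5.0, "resource": 9.0},    # too many resources
]
result = partition_records(records, min_records=3, max_seconds=30.0, max_resource=4.0)
```

Note that a run that completed but exceeded either threshold is counted as a failure, which matches the three failure cases listed in the claim.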

7. The method of claim 6, wherein the performing efficiency analysis processing based on the first subset and the second subset to obtain the efficiency execution success result corresponding to the current computing task includes:

calculating to obtain the efficiency-improving execution success rate corresponding to the current calculation task based on the first task number in the first subset and the second task number in the second subset;

and comparing the efficiency-improving execution success rate with a corresponding execution success rate threshold value to obtain a corresponding efficiency-improving execution success result.
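
As a hedged illustration of claim 7, the success rate can be taken as the share of records in the second subset among all divided records. That formula, the inclusive `>=` comparison, and the name `uplift_success` are assumptions; the claim only states that the rate is calculated from the two task numbers and compared with a threshold.

```python
from typing import Tuple


def uplift_success(first_count: int,
                   second_count: int,
                   rate_threshold: float) -> Tuple[float, bool]:
    """Return (success rate, whether the rate meets the threshold)."""
    total = first_count + second_count
    if total == 0:
        raise ValueError("no history execution records to analyze")
    # Assumed formula: successful records divided by all divided records.
    rate = second_count / total
    return rate, rate >= rate_threshold


rate, passed = uplift_success(first_count=1, second_count=3, rate_threshold=0.7)
```

With one failed and three successful records the rate is 0.75, which meets a 0.7 threshold under the assumed inclusive comparison.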

8. An apparatus for efficiency analysis processing of a computing task, the apparatus comprising:

9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 7 when the computer program is executed.

10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.

11. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.