CN115905293A - Switching method and device of job execution engine - Google Patents

Switching method and device of job execution engine

Info

Publication number
CN115905293A
CN115905293A
Authority
CN
China
Prior art keywords
engine
job execution
execution engine
executed
classification model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211660710.8A
Other languages
Chinese (zh)
Inventor
顾光晔
朱超
倪颖婷
钟亚洲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC filed Critical Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202211660710.8A priority Critical patent/CN115905293A/en
Publication of CN115905293A publication Critical patent/CN115905293A/en
Pending legal-status Critical Current


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application provides a method and an apparatus for switching a job execution engine, usable in the technical field of deep learning. The method comprises the following steps: acquiring a query statement to be executed; determining a target job execution engine and engine parameters corresponding to the query statement to be executed according to a preset engine classification model, wherein the preset engine classification model is obtained by pre-training a multi-classification model on a batch of historical query statements and the engine labels corresponding to those statements; and applying the target job execution engine and the engine parameters to execute the query statement to be executed. The method and the apparatus can improve the reliability and efficiency of switching between job execution engines, and thereby improve job efficiency.

Description

Switching method and device of job execution engine
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method and an apparatus for switching job execution engines.
Background
Currently, big data ecological execution engines are mainly divided into two types of job execution engines: a distributed offline execution engine and a memory-level job execution engine.
The distributed offline execution engine is suited to executing offline data tasks, mainly large-batch cluster jobs; when the data volume is too large or cluster resources are busy, it keeps the production system stable where a memory-level job execution engine may not.
The memory-level job execution engine is a general memory-based parallel computing framework that improves data-processing performance through in-memory iterative computation. It is a fast, general-purpose execution engine designed for large-scale data processing and can perform many kinds of computation. Intermediate task output is kept in memory, so the distributed file system no longer needs to be read and written, which greatly improves working efficiency. However, this kind of job execution engine is unsuitable when the data volume is too large or cluster resources are busy.
Therefore, applying either of the two job execution engines alone cannot adapt to a sudden surge in data volume on a given day, and cannot meet the growing business demands on data volume, iteration and timeliness.
Disclosure of Invention
In view of at least one problem in the prior art, the present application provides a method and an apparatus for switching job execution engines, which can improve reliability and efficiency of switching job execution engines, and further improve job efficiency.
In order to solve the technical problem, the present application provides the following technical solutions:
in a first aspect, the present application provides a method for switching job execution engines, including:
acquiring a query statement to be executed;
determining a target job execution engine and engine parameters corresponding to the query statement to be executed according to a preset engine classification model, wherein the preset engine classification model is obtained by pre-training a multi-classification model on a batch of historical query statements and the engine labels corresponding to those statements;
and applying the target job execution engine and the engine parameters to execute the query statement to be executed.
Further, the obtaining of the query statement to be executed includes:
acquiring a target query script, and sequentially selecting query statements in the target query script as the query statement to be executed;
correspondingly, the executing the query statement to be executed by applying the target job execution engine and the engine parameter includes:
if the query statement to be executed is the first query statement in the target query script, determining the target job execution engine as a current job execution engine, and applying the current job execution engine and the engine parameters to execute the query statement to be executed, otherwise determining whether the target job execution engine is the same as the current job execution engine, if not, switching the current job execution engine to the target job execution engine, and applying the current job execution engine and the engine parameters to execute the query statement to be executed.
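The per-statement switching flow described above can be sketched as follows. This is a minimal Python illustration; the classifier stub and the engine names are hypothetical, and actual execution of each statement is elided:

```python
def run_script(statements, classify):
    """Walk the statements of a script, switching engines only when the
    classifier's target engine differs from the current engine."""
    current_engine = None
    switches = 0
    for i, stmt in enumerate(statements):
        target_engine, params = classify(stmt)  # preset engine classification model
        if i == 0:
            current_engine = target_engine      # first statement: adopt the target directly
        elif target_engine != current_engine:
            switches += 1                       # switch only on a change of target
            current_engine = target_engine
        # execution with (current_engine, params) is elided in this sketch
    return current_engine, switches
```

With a stub classifier that routes one middle statement to a different engine, the loop performs two switches: one away from the initial engine and one back.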
Further, before determining the target job execution engine and the engine parameter corresponding to the query statement to be executed according to the preset engine classification model, the method further includes:
acquiring a batch of historical query statements and their corresponding engine tags;
and training the multi-classification model on the batch of historical query statements and their corresponding engine labels to obtain the preset engine classification model.
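As a rough illustration of the training step, the sketch below fits a toy multi-class model (one centroid per engine label) in place of the patent's multi-classification model, which the drawings depict as a three-layer fully connected network with ReLU activations; the numeric feature encoding is an assumption:

```python
def train_engine_classifier(samples):
    """Fit a toy multi-class model from (feature_vector, engine_label) pairs:
    one centroid per engine label, nearest centroid wins at prediction time."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    centroids = {lbl: [v / counts[lbl] for v in acc] for lbl, acc in sums.items()}

    def predict(features):
        def dist(c):
            return sum((a - b) ** 2 for a, b in zip(features, c))
        return min(centroids, key=lambda lbl: dist(centroids[lbl]))

    return predict
```

Any multi-class learner with the same fit/predict shape (including the fully connected network of FIG. 5) could be substituted without changing the surrounding switching logic.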
Further, the obtaining of the batch of historical query statements and their corresponding engine tags includes:
obtaining a batch of historical query statements;
applying a first job execution engine and a second job execution engine respectively to execute each historical query statement, obtaining an execution result of the historical query statement, and determining the engine tag of the historical query statement according to the execution result, wherein the job execution engine tag comprises: the first job execution engine and the second job execution engine.
Further, after the target job execution engine corresponding to the query statement to be executed is determined according to the preset engine classification model, the method further includes:
obtaining a validation set, the validation set comprising: a batch of historical query statements and their corresponding engine tags;
applying the query statement to be executed together with its corresponding target job execution engine and engine parameters to retrain the preset engine classification model;
and applying the validation set to determine whether the retrained engine classification model is superior to the engine classification model before retraining, and if so, replacing the preset engine classification model with the retrained engine classification model.
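The retrain-and-compare step above can be sketched as follows; the retraining routine and the evaluation metric are left abstract, since the patent does not fix a specific metric:

```python
def maybe_replace(preset_model, retrain, validation_set, score):
    """Retrain the preset model on new samples and keep whichever model
    scores better on the held-out validation set."""
    candidate = retrain(preset_model)
    if score(candidate, validation_set) > score(preset_model, validation_set):
        return candidate   # retrained model is strictly better: replace
    return preset_model    # otherwise keep the preset model unchanged
```

Because the comparison is strict, ties keep the existing model, which avoids churning the deployed classifier on noise.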
Further, the determining, according to a preset engine classification model, a target job execution engine and engine parameters corresponding to the query statement to be executed includes:
obtaining feature data of the query statement to be executed through an explain command;
and inputting the feature data into the preset engine classification model, and determining the output result of the preset engine classification model as the target job execution engine and the engine parameters.
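A minimal sketch of this determining step, assuming the feature data comes from an explain-style query plan; the feature names and the model interface are illustrative, not taken from the patent:

```python
def choose_engine(statement, explain, model):
    """Extract feature data for a statement via an explain command, then
    let the classification model pick the engine and its parameters."""
    plan = explain(statement)                      # e.g. parsed output of EXPLAIN <statement>
    features = {
        "num_joins": plan.get("num_joins", 0),     # hypothetical plan-derived features
        "input_bytes": plan.get("input_bytes", 0),
        "num_stages": plan.get("num_stages", 1),
    }
    return model(features)                         # -> (target_engine, engine_params)
```

Keeping `explain` and `model` as injected callables makes the step testable without a live cluster.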
Further, the execution result includes: an error flag, an error cause and a first execution time length from applying the first job execution engine, and a second execution time length from applying the second job execution engine;
correspondingly, the determining the engine tag of the historical query statement according to the execution result includes:
if the error flag of the historical query statement indicates an error and the error cause is memory overflow, determining that the job execution engine tag of the historical query statement is the second job execution engine;
if the ratio of the first execution time length to the second execution time length of the historical query statement is less than or equal to a ratio threshold, or the difference between the first execution time length and the second execution time length is less than a difference threshold, determining that the job execution engine tag of the historical query statement is the second job execution engine;
and if the ratio of the first execution time length to the second execution time length of the historical query statement is greater than the ratio threshold, or the difference between the first execution time length and the second execution time length is greater than the difference threshold, determining that the job execution engine tag of the historical query statement is the first job execution engine.
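Read literally, the three labelling rules can be sketched as below. They are applied in the order the claims state them, and the threshold values and the direction of the duration difference are illustrative assumptions:

```python
def label_engine(result, ratio_threshold=0.8, diff_threshold=60.0):
    """Derive the engine label for a historical statement from its execution
    result, following the three claim rules in order."""
    if result.get("error") and result.get("error_cause") == "memory_overflow":
        return "second"                       # OOM on the first engine -> second engine
    t1, t2 = result["first_duration"], result["second_duration"]
    if t1 / t2 <= ratio_threshold or (t1 - t2) < diff_threshold:
        return "second"
    if t1 / t2 > ratio_threshold or (t1 - t2) > diff_threshold:
        return "first"
    return "second"                           # fallback; the claims leave ties open
```

Because the second and third rules overlap when only one of the two conditions holds, evaluating them in claim order is the disambiguation chosen here.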
In a second aspect, the present application provides a switching apparatus for a job execution engine, including:
the acquisition module is used for acquiring the query statement to be executed;
the determining module is used for determining a target job execution engine and engine parameters corresponding to the query statement to be executed according to a preset engine classification model, wherein the preset engine classification model is obtained by pre-training a multi-classification model on a batch of historical query statements and their corresponding engine labels;
and the application module is used for applying the target job execution engine and the engine parameters to execute the query statement to be executed.
In one embodiment, the obtaining module includes:
the acquisition unit is used for acquiring a target query script and sequentially selecting query statements in the target query script as the query statement to be executed; correspondingly, the application module comprises:
and the switching unit is used for determining the target job execution engine as a current job execution engine if the query statement to be executed is the first query statement in the target query script, and executing the query statement to be executed by applying the current job execution engine and the engine parameters, otherwise determining whether the target job execution engine is the same as the current job execution engine, if not, switching the current job execution engine to the target job execution engine, and executing the query statement to be executed by applying the current job execution engine and the engine parameters.
In one embodiment, the switching device of the job execution engine further includes:
the historical data acquisition module is used for acquiring a batch of historical query statements and their corresponding engine tags;
and the training module is used for training the multi-classification model on the batch of historical query statements and their corresponding engine labels to obtain the preset engine classification model.
In one embodiment, the historical data acquisition module comprises:
the history statement acquisition unit is used for acquiring batch history query statements;
a determining unit, configured to apply a first job execution engine and a second job execution engine respectively, execute each historical query statement, obtain an execution result of the historical query statement, and determine an engine tag of the historical query statement according to the execution result, where the job execution engine tag includes: a first job execution engine and a second job execution engine.
In one embodiment, the switching device of the job execution engine further includes:
the verification set acquisition module is used for obtaining a verification set, the verification set comprising: a batch of historical query statements and their corresponding engine tags;
the iterative update module is used for applying the query statement to be executed, its corresponding target job execution engine and the corresponding engine parameters to retrain the preset engine classification model;
and the comparison module is used for applying the verification set to determine whether the trained engine classification model is superior to the engine classification model before training, and if so, replacing the preset engine classification model with the trained engine classification model.
In one embodiment, the determining module comprises:
the execution unit is used for obtaining feature data of the query statement to be executed through an explain command;
and the input unit is used for inputting the feature data into the engine classification model and determining the output result of the preset engine classification model as the target job execution engine and the engine parameter.
In one embodiment, the execution result includes: an error flag, an error cause and a first execution time length from applying the first job execution engine, and a second execution time length from applying the second job execution engine; correspondingly, the determining unit comprises:
a first determining subunit, configured to determine that a job execution engine tag of the historical query statement is a second job execution engine if the error flag of the historical query statement is an error flag and the error reason is a cause of memory overflow;
a second determining subunit, configured to determine that the job execution engine tag of the historical query statement is a second job execution engine if a ratio between the first execution time length and a second execution time length of the historical query statement is less than or equal to a ratio threshold or a difference between the first execution time length and the second execution time length is less than a difference threshold;
a third determining subunit, configured to determine that the job execution engine tag of the historical query statement is the first job execution engine if a ratio between the first execution time length and the second execution time length of the historical query statement is greater than a ratio threshold or a difference between the first execution time length and the second execution time length is greater than a difference threshold.
In a third aspect, the present application provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the method for switching the job execution engine when executing the program.
In a fourth aspect, the present application provides a computer-readable storage medium having stored thereon computer instructions which, when executed, implement the method for switching job execution engines.
In view of the foregoing technical solutions, the present application provides a method and an apparatus for switching job execution engines. The method comprises the following steps: acquiring a query statement to be executed; determining a target job execution engine and engine parameters corresponding to the query statement according to a preset engine classification model, wherein the preset engine classification model is obtained by pre-training a multi-classification model on a batch of historical query statements and their corresponding engine labels; and executing the query statement with the target job execution engine and the engine parameters. This improves the reliability and efficiency of switching between job execution engines and thereby improves job efficiency; specifically, production-system efficiency is improved while production-system stability is preserved. The method casts the search for the optimal execution engine and the assignment of spark parameters as a multi-classification problem in a deep learning setting. It lowers the barrier for application teams to use the SPARKSQL execution engine, increases the number of spark jobs in the production system, and improves the overall batch performance of the cluster. It addresses the current production problems of improperly set spark parameters and the resulting waste of cluster resources, as well as the job performance degradation, interruptions and errors caused by fixed parameter settings on days with a sudden surge in data volume, thereby improving production-system stability. The scripts a developer writes for hive and spark are identical, so two sets of code need not be developed; the method adjusts and adapts automatically, and cluster resource utilization is improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a first flowchart of a job execution engine switching method according to an embodiment of the present application;
fig. 2 is a second flowchart of a switching method of a job execution engine in the embodiment of the present application;
fig. 3 is a third flowchart of a switching method of a job execution engine in the embodiment of the present application;
FIG. 4 is a block diagram of an engine classification model in one example of the present application;
FIG. 5 is a block diagram of a three-layer fully connected neural network with Relu activation functions according to an example of the present application;
FIG. 6 is a logic diagram of a switching method of a job execution engine according to an embodiment of the present application;
FIG. 7 is a first block diagram of a switching device of a job execution engine according to an embodiment of the present application;
FIG. 8 is a second schematic diagram of a switching device of a job execution engine according to an embodiment of the present application;
fig. 9 is a schematic block diagram of a system configuration of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the present specification, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.
Against the background of the mobile internet era, more and more internet players are transforming traditional industries with internet thinking, and customer profiling is increasingly effective in fields such as marketing enhancement, customer insight, channel optimization, product innovation, operations improvement and risk prevention. What sets big data apart from traditional data is not only volume; it has two further key characteristics: multi-dimensionality and timeliness. The multidimensional customer-profile label system is built as a two-dimensional hierarchy, classified by business level and by depth of use. By business level, labels divide into first-level labels (individuals; legal persons, i.e. large and medium legal persons and small and micro enterprises), second-level labels (14 types) and third-level labels. By depth of use, labels divide into fact labels, model labels (including manual-rule models and artificial-intelligence models) and prediction labels. Given the endless evolution of business-logic requirements and the desire to profile a customer (personal or corporate) more quickly and accurately, the data-timeliness requirements for batch label-processing jobs keep increasing.
In order to facilitate understanding of the present solution, technical contents related to the present solution are explained below.
At present, big data ecological hive execution engines mainly divide into the SPARKSQL memory-level execution engine and the MapReduce-based HIVESQL execution engine; Spark and Hadoop are highly integrated and work together well. The distributed offline execution engine may be a HIVESQL job execution engine, and the memory-level job execution engine may be a SPARKSQL memory-level job execution engine.
HIVESSQL: setting a HIVE execution engine method set HIVE. If the execution engine is set to mr, the MapReduce programming model of Hadoop is called to run the JOB of the task needing to be executed. HIVE converts the calculation task of data into a MapReduce task in a SQL mode. The distributed offline execution engine, mr for short, is suitable for executing offline data tasks, and is mainly suitable for large-batch cluster tasks. However, when the data is too large or the cluster resources are busy, the HIVESQL job execution engine can ensure the stability of the production system compared with the SPARKSQL memory-level job execution engine.
SPARKSQL: a general memory-based parallel computing framework that improves data-processing performance through in-memory iterative computation; a fast, general-purpose execution engine designed for large-scale data processing that can perform many kinds of computation, such as SQL queries. SPARKSQL has the advantages of Hadoop MapReduce, but unlike MapReduce, intermediate JOB output is kept in memory and HDFS reads and writes are no longer needed, so operating efficiency can be greatly improved. However, SPARKSQL does not hold up when the data is too large or cluster resources are busy.
The root cause of SPARKSQL's large efficiency advantage over HIVESQL is memory. However, SPARKSQL does not hold up when the data is too large or cluster resources are busy, for the following reasons:
1. If the data is too large during SPARKSQL processing, errors such as OOM (out-of-memory) can occur.
2. If several large tables are combined by join or similar operations over large data volumes, then even with sufficient memory, the SPARKSQL execution engine is not necessarily faster than HIVESQL's MapReduce engine.
With more and more high-timeliness data-lake ingestion jobs, more and more applications run quasi-real-time daytime batches against high-timeliness lake tables. The data lake is configured with 96 high-timeliness batches; the application side likewise configures 96 batches, and each batch must complete within 15 minutes, which is difficult to support in every scenario with HIVESQL alone.
Running analysis jobs on the open-source spark2x version of SPARKSQL has a high barrier to entry, so few applications use it. Counting the jobs deployed as HIVE programs versus SPARK programs: there are 24,660 HIVE jobs and 441 SPARK jobs, a ratio of close to 56:1. The 441 spark programs involve only 23 applications, and only 7 applications have more than 10 jobs. Table 1 analyzes the top 5 applications by job count; application 1 alone accounts for nearly half of the jobs, at 46.03%.
TABLE 1

Application     Number of jobs
Application 1   203
Application 2   51
Application 3   44
Application 4   40
Application 5   29
The current development process is as follows:
(a) Additionally maintain an oracle configuration table bdsp. containing the following fields:
ETL_SYSTEM: job group, for example: F-BCAS_000002;
ETL_JOB: job name, for example: BCAS_JZCS_FIRST_030;
JOB_QUEUE: configures the resource queue to use, for example: queue T;
NUM_EXECUTORS: for example, 128 means 128 num-executors are configured (128 executor processes are launched);
EXECUTOR_CORES: for example, 2 means 2 executor-cores are configured (how many threads each process starts);
EXECUTOR_MEMORY: for example, 20G means 20G of executor memory is configured (the memory available to each executor process);
DRIVER_MEMORY: for example, 4G configures 4G of driver-memory (driver-side memory size);
DRIVER_CORES: for example, 1 means 1 driver-core is configured (the number of cores used on the driver side);
MAIN_CLASS: for example, load.f2.BcasJzcs_TEST_FIRST, the class in the jar package to call;
JAR_NAME: the jar package to use;
JAVA_PARA: for example, 512|${process_date}|${version_num}|${cur_batch}, the parameters passed to MAIN_CLASS.
(b) Develop a jar package. For example, a job calls the load.f2.BcasJzcs_TEST_FIRST class in the jar package; the specific SQL statements are written in the java program, and java_para is the entry parameter of load.f2.BcasJzcs_TEST_FIRST. All SQL logic lives in the java program, so adding a job means adding a class to the jar package or adding a new jar package.
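Under a configuration row like the one in step (a), a scheduler could assemble the spark-submit invocation roughly as follows. The submission mechanism and the example jar name are assumptions; the flags themselves are standard spark-submit options:

```python
def build_spark_submit(cfg):
    """Assemble spark-submit arguments from one row of the configuration table.
    Field names follow the table in step (a)."""
    return [
        "spark-submit",
        "--queue", str(cfg["JOB_QUEUE"]),
        "--num-executors", str(cfg["NUM_EXECUTORS"]),
        "--executor-cores", str(cfg["EXECUTOR_CORES"]),
        "--executor-memory", cfg["EXECUTOR_MEMORY"],
        "--driver-memory", cfg["DRIVER_MEMORY"],
        "--driver-cores", str(cfg["DRIVER_CORES"]),
        "--class", cfg["MAIN_CLASS"],
        cfg["JAR_NAME"],                     # jar name here is a hypothetical example
    ] + cfg["JAVA_PARA"].split("|")          # JAVA_PARA fields become program arguments
```

Keeping the assembly in one place means a parameter change in the table never requires touching the jar itself.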
As described above, the cost of adding a SPARK job is much higher than that of adding a HIVE job. If the SPARK parameters are set unreasonably, or the data volume surges on some day, memory-overflow errors are likely; after all, SPARK loads the data on HDFS into memory for computation. Naturally, the number of SPARK jobs lags far behind the number of HIVE jobs.
In summary, applying only the HIVESQL job execution engine or only the SPARKSQL job execution engine can neither guarantee stable production nor adapt to a sudden surge in data volume, and therefore cannot meet the growing business demands on data volume, iteration and timeliness. A method and an apparatus for switching job execution engines are therefore urgently needed: ones that automatically convert HIVESQL jobs to the SPARKSQL job execution engine, improve job efficiency severalfold at low cost, and at the same time preserve the safety and stability of the production system.
Aiming at the defects of the prior art, the method improves the current SPARKSQL execution script so that an HQL script can be provided just as with HIVE. Its key feature is that the scripts a developer writes for hive and spark are identical, so two sets of code need not be developed.
The entry-parameter combination can be generated automatically: num-executors, the number of executor processes to launch; executor-cores, the number of threads each process starts; executor-memory, the memory each process may use; driver-memory, the memory of the driver (main process); driver-cores, the number of cores the driver uses; and the computed maximum available concurrency of spark.
Initially the program generates the SPARK parameters by rule, from the number and size of the input table's files on HDFS; it then evolves to use artificial-intelligence machine learning to judge automatically whether a submitted HQL script can be accelerated by the SPARKSQL job execution engine. If so, a multi-classification algorithm automatically generates the best-matching SPARKSQL entry-parameter combination, lowering the cost for users to improve job performance with SPARKSQL. If the SPARKSQL job execution engine errors out during execution because of memory overflow or other reasons, the job automatically switches to HIVE, ensuring the safety and stability of the production system.
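The initial rule-based stage and the OOM fallback described above might be sketched like this; the sizing formulas and the fallback trigger are illustrative assumptions, not the patent's actual rules:

```python
def rule_based_params(input_files, input_bytes):
    """Derive initial SPARK parameters from the number and total size of the
    input table's files on HDFS (illustrative sizing rules)."""
    gib = input_bytes / 2**30
    num_executors = max(2, min(128, input_files // 4))        # scale with file count
    executor_memory_gb = max(2, min(20, int(gib // 10) + 2))  # scale with data size
    return {
        "num-executors": num_executors,
        "executor-cores": 2,
        "executor-memory": f"{executor_memory_gb}g",
    }

def run_with_fallback(stmt, params, run_spark, run_hive):
    """Try SPARKSQL first; fall back to HIVE if it fails (e.g. OOM)."""
    try:
        return "SPARKSQL", run_spark(stmt, params)
    except MemoryError:
        return "HIVESQL", run_hive(stmt)
```

The fallback keeps the production batch running even on a day when the data volume surges past what the SPARK parameters were sized for.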
The method and the apparatus for switching job execution engines disclosed in the present application may be used in the field of financial technology, and may also be used in any field other than the field of financial technology.
The following examples are intended to illustrate the details.
In order to improve the reliability and efficiency of switching between job execution engines and further improve the job efficiency, the present embodiment provides a method for switching between job execution engines, where the execution subject is a switching device of the job execution engines, the switching device of the job execution engines includes but is not limited to a server, and as shown in fig. 1, the method specifically includes the following contents:
step 100: and acquiring the query statement to be executed.
Specifically, the query statement to be executed may be an HQL statement in a Hive Query Language (HQL) script of a job task, for example: SELECT BANKCODE, COUNT(1) NUM FROM bdpviewb.DCM_NAP_SFBPAMBK_S WHERE PT_DT = '2021-12-31' GROUP BY BANKCODE ORDER BY NUM DESC LIMIT 1.
Step 200: determining a target job execution engine and engine parameters corresponding to the query statement to be executed according to a preset engine classification model, wherein the preset engine classification model is obtained by pre-training a multi-classification model on a batch of historical query statements and their corresponding engine labels.
Step 300: applying the target job execution engine and the engine parameters to execute the query statement to be executed.
Specifically, applying the target job execution engine and the engine parameters improves the efficiency of executing the query statement to be executed, and thereby the efficiency of the job task. When the target job execution engine is the HIVESQL job execution engine, the engine parameters may be null. The engine parameters may include parameter names and parameter values; when the target job execution engine is the SPARKSQL job execution engine, the engine parameters are SPARK parameters, and the parameter values may differ between query statements. The SPARK parameters may include: num-executors, which sets how many Executor processes the Spark job uses in total; executor-memory, which sets the memory of each Executor process; and executor-cores, which sets the number of CPU cores of each Executor process.
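The rule-based parameter generation described above can be sketched as follows. This is a hypothetical illustration: the thresholds, the executor-scaling rule, and the 1 T cutoff for keeping the HIVE engine are assumptions inspired by the text, not the patent's actual rules.

```python
def generate_spark_params(num_files: int, total_size_gb: float) -> dict:
    """Map input-table statistics under HDFS to a SPARK parameter combination."""
    if total_size_gb > 1024:
        # data volume above roughly 1 TB: keep the HIVE engine (empty parameters)
        return {}
    # scale the executor count with the number of HDFS files, within a fixed band
    num_executors = max(2, min(50, num_files // 10))
    executor_memory_gb = 4 if total_size_gb < 100 else 8
    return {
        "num-executors": num_executors,        # total Executor processes
        "executor-memory": f"{executor_memory_gb}g",  # memory per Executor
        "executor-cores": 2,                   # CPU cores per Executor
    }

params = generate_spark_params(num_files=120, total_size_gb=50.0)
```

A caller would pass these as `--conf`-style arguments when submitting the job; the exact band limits would in practice come from the knowledge base.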
When a query operation is performed on multidimensional client portrait labels, the query statement to be executed corresponding to those labels can be obtained; the target job execution engine corresponding to the query statement is determined according to the preset engine classification model; and the query statement is executed by applying the target job execution engine and the engine parameters, so as to improve the accuracy of the queried labels and portray the client more quickly and accurately. The multidimensional client portrait labels may include: basic user attributes, user association relations, user interests and hobbies, user value information, and the like. The basic user attributes may include sex, age, region, etc.; the user association relations may include family relations, whether there are children, co-worker relations, friend relations, etc.; the user interests may include the product holding proportion, the product holding amount, a liking for golf, etc.; the user value information may include whether there is a car, the annual income interval, etc.
To further increase the flexibility of job execution engine switching, in one embodiment of the present application, step 100 comprises:
Step 101: acquire a target query script, and sequentially select the query statements in the target query script as the query statements to be executed. Correspondingly, step 300 includes:
Step 301: if the query statement to be executed is the first query statement in the target query script, determine the target job execution engine as the current job execution engine, and apply the current job execution engine and the engine parameters to execute the query statement. Otherwise, determine whether the target job execution engine is the same as the current job execution engine; if not, switch the current job execution engine to the target job execution engine, and apply the current job execution engine and the engine parameters to execute the query statement.
Specifically, the target query script may be an HQL script. The HQL statements in the target query script are sequentially selected as the query statements to be executed, according to their order of appearance in the script. The job execution engine corresponding to the current query statement is compared with that of the previous query statement, and if they differ, the job execution engine is switched. Further, if the target job execution engine is determined to be the same as the current job execution engine, the query statement may be executed with the current job execution engine and the engine parameters.
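The per-statement switching logic of step 301 can be sketched as follows. `classify` (standing in for the preset engine classification model) and `execute` are hypothetical helpers supplied by the caller; only the switching control flow follows the text.

```python
def run_script(statements, classify, execute):
    """Walk an HQL script statement by statement, switching engines as predicted."""
    current_engine = None
    for i, stmt in enumerate(statements):
        target_engine, params = classify(stmt)
        # first statement initializes the engine; later statements switch only on change
        if i == 0 or target_engine != current_engine:
            current_engine = target_engine
        execute(current_engine, params, stmt)

log = []
run_script(
    ["s1", "s2", "s3"],
    classify=lambda s: ("HIVESQL", {}) if s == "s2" else ("SPARKSQL", {"num-executors": 4}),
    execute=lambda engine, params, stmt: log.append((engine, stmt)),
)
```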
In order to obtain a reliable engine classification model and then apply it to improve the accuracy of determining the job execution engine, as shown in fig. 2, in an embodiment of the present application, before step 200, the method further includes:
Step 021: obtain a batch of historical query statements and their corresponding engine tags.
Specifically, the historical query statements may be historical HQL statements; the engine tag corresponding to a historical HQL statement may include the job execution engine and the engine parameters that were determined in advance to execute that statement with the best effect.
Step 022: train the multi-classification model with the batch of historical query statements and their corresponding engine labels to obtain the preset engine classification model.
Specifically, an explain command can be applied to process the batch of historical HQL statements to obtain their respective feature data; the multi-classification model is then trained on the feature data and engine labels of the batch of historical HQL statements to obtain the preset engine classification model. The engine labels may be determined based on expert experience.
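The shape of step 022's training data can be sketched as follows. The toy "classifier" here, a majority vote per stage-count bucket, is a deliberately simple stand-in for the patent's multi-classification deep network; the feature key `num_stages` is one of the explain-derived features named in the text.

```python
from collections import Counter, defaultdict

def train_engine_classifier(samples):
    """samples: list of (feature_dict, engine_label) pairs from historical HQL runs."""
    votes = defaultdict(Counter)
    for feats, label in samples:
        votes[feats["num_stages"]][label] += 1
    # per bucket, keep the majority engine label
    return {stages: counts.most_common(1)[0][0] for stages, counts in votes.items()}

model = train_engine_classifier([
    ({"num_stages": 2}, "SPARKSQL"),
    ({"num_stages": 2}, "SPARKSQL"),
    ({"num_stages": 8}, "HIVESQL"),
])
```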
To improve the accuracy of determining the job execution engine tag of a historical HQL statement, as shown in fig. 3, in one embodiment of the present application, step 021 includes:
Step 0211: acquire a batch of historical query statements.
Step 0212: execute each historical query statement with a first job execution engine and a second job execution engine respectively to obtain the execution results of that statement, and determine the engine tag of the statement according to the execution results, where the job execution engine tag is one of: the first job execution engine and the second job execution engine.
Specifically, the first job execution engine may be a SPARKSQL job execution engine, and the second job execution engine may be a HIVESQL job execution engine.
In order to further improve the reliability of the engine classification model, in an embodiment of the present application, after step 200, the method further includes:
Step 400: obtain a validation set comprising a batch of historical query statements and their corresponding engine tags.
Step 500: apply the query statement to be executed, together with its corresponding target job execution engine and engine parameters, to retrain the preset engine classification model.
Step 600: apply the validation set to determine whether the retrained engine classification model outperforms the model before retraining; if so, replace the preset engine classification model with the retrained model.
Specifically, if the retrained engine classification model performs better than the model before retraining, the preset engine classification model may be replaced with the retrained model, which is then applied to determine the job execution engine for subsequent statements.
Specifically, in order to solve the problem of model accuracy degrading over time, the engine classification model may be updated iteratively through self-learning. For example, the self-learning update is triggered automatically each month: the HQL statements of the latest month are acquired and a complete end-to-end model training is performed again, including automatic hyper-parameter tuning (the number of neurons in the three fully connected layers, the dimensionality of the embeddings, and so on). Each monthly training run requires more than 9 rounds, and the learning_rate parameter is dynamically reduced as the number of rounds increases, to obtain higher model accuracy. After training, the performance of the new and old models on the latest validation set is compared (an A/B TEST), and the better-performing new model is released to the prediction service. If the old model performs better on the validation set than the retrained model, the model is not updated and no new prediction service is released, i.e. the prediction service remains unchanged.
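The update-only-if-better rule of the A/B comparison can be sketched as follows. `evaluate` returning plain accuracy on the validation set is an assumption; the patent does not name its comparison metric.

```python
def evaluate(model, validation_set):
    """Fraction of validation statements whose engine the model predicts correctly."""
    correct = sum(1 for feats, label in validation_set if model(feats) == label)
    return correct / len(validation_set)

def maybe_update(serving_model, retrained_model, validation_set):
    """Release the retrained model only when it beats the serving one."""
    if evaluate(retrained_model, validation_set) > evaluate(serving_model, validation_set):
        return retrained_model
    return serving_model  # prediction service remains unchanged

old_model = lambda feats: "HIVESQL"
new_model = lambda feats: feats["hint"]
validation = [({"hint": "SPARKSQL"}, "SPARKSQL"), ({"hint": "HIVESQL"}, "HIVESQL")]
chosen = maybe_update(old_model, new_model, validation)
```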
To further improve the reliability of determining the job execution engine, in one embodiment of the present application, step 200 comprises:
obtain the feature data of the query statement to be executed through an interpretation (explain) command; input the feature data into the preset engine classification model, and take the model's output as the target job execution engine and the engine parameters.
In particular, the interpretation command may be the explain command, whose main role is to obtain the execution plan of an HQL statement; execution can be optimized by analyzing these plans. The feature data may include: the number of stages obtained from the explain result; whether an MR (MapReduce) job is involved; the data tables involved and their associated fields; the data skew ratio; the Num rows count of the data tables shown by explain; the Data size of the data tables shown by explain; the time the statement spends waiting to be scheduled; and so on.
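Extracting a few of these fields from the text of an explain result might look like the following. The regexes are illustrative assumptions only; the real extraction depends on the engine's exact explain output format.

```python
import re

def extract_features(explain_text: str) -> dict:
    """Pull a handful of the explain-derived features named in the text."""
    return {
        "num_stages": len(re.findall(r"Stage-\d+", explain_text)),
        "num_rows": sum(int(n) for n in re.findall(r"Num rows:\s*(\d+)", explain_text)),
        "involves_mr": "Map Reduce" in explain_text,
    }

feats = extract_features(
    "Stage-1 is a root stage\n  Map Reduce\n  Num rows: 120\nStage-2\n  Num rows: 30\n"
)
```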
In order to further improve the accuracy of determining the job execution engine tag of a historical HQL statement, in an embodiment of the present application, the execution result includes: the error flag, error cause, and first execution duration corresponding to the first job execution engine, and the second execution duration corresponding to the second job execution engine. Correspondingly, step 0212 includes:
Step 201: if the historical query statement carries an error flag and the error cause is memory overflow, determine that its job execution engine tag is the second job execution engine;
Step 202: if the ratio of the first execution duration to the second execution duration is smaller than or equal to a ratio threshold, or the difference between them is smaller than a difference threshold, determine that its job execution engine tag is the second job execution engine;
Step 203: if the ratio of the first execution duration to the second execution duration is greater than the ratio threshold, or the difference between them is greater than the difference threshold, determine that its job execution engine tag is the first job execution engine.
Specifically, when the historical query statement carries an error flag, its first execution duration is null.
Specifically, the first job execution engine may be the SPARKSQL job execution engine and the second the HIVESQL job execution engine. If a statement reports an error after being switched to the SPARKSQL job execution engine, and the error information indicates not a cluster problem but memory overflow or the like, the statement can be marked so that the SPARKSQL engine is no longer used and the HIVESQL engine is kept. If a statement does not report an error after the switch, but its execution efficiency is neither improved to more than 1.5 times nor shortened by more than 1 hour, it is likewise marked to keep using the HIVESQL job execution engine.
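The labeling criterion just described can be sketched as follows, using the concrete thresholds from the worked example (more than 1.5x speedup, or more than 1 hour saved). Field names and the minutes unit are assumptions; `None` marks a failed SPARK run, and a cluster-side failure yields no label, since the knowledge base only keeps statements from a normally behaving cluster.

```python
def label_statement(spark_minutes, hive_minutes, oom_error=False):
    """Choose the engine label from the two engines' execution results."""
    if spark_minutes is None:           # the SPARK run errored out
        return "HIVESQL" if oom_error else None   # None: cluster problem, exclude sample
    speedup = hive_minutes / spark_minutes
    saved = hive_minutes - spark_minutes
    if speedup > 1.5 or saved > 60:     # clearly worth switching
        return "SPARKSQL"
    return "HIVESQL"
```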
To further illustrate the present solution, the present application provides an application example of a job execution engine switching method, which is specifically described as follows:
Step 1: construct a knowledge base, where the input can be HQL statements and the output is the statement information collected into the base. The knowledge base only stores statement information collected while the cluster is in a normal state; otherwise, the statement information is not included. The purpose of the knowledge base is to provide a decision basis for the subsequent rule model, and sample data for the subsequent supervised deep-learning model training.
Specifically, the skew ratio is mainly used to determine whether the associated fields are skewed; if the cross product of two tables exceeds 100 million rows, there is a risk of memory overflow. An example statement is as follows:
SELECT BANKCODE,COUNT(1)NUM
FROM bdpviewb.DCM_NAP_SFBPAMBK_S
WHERE PT_DT='2021-12-31'
GROUP BY BANKCODE
ORDER BY NUM DESC
LIMIT 1
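The skew/memory-overflow check described above reduces to a simple row-count cross product against the 100-million threshold stated in the text. The row counts below are illustrative only.

```python
OOM_RISK_ROWS = 100_000_000  # cross-product threshold from the text

def join_has_oom_risk(left_rows: int, right_rows: int) -> bool:
    """Flag a join whose cross product of row counts exceeds the risk threshold."""
    return left_rows * right_rows > OOM_RISK_ROWS

risky = join_has_oom_risk(20_000, 6_000)   # 120 million cross-product rows
```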
the knowledge base information collected is as follows (i.e., the output of the knowledge base):
(1) The number of stages obtained from the explain result; extracted from the execution path information.
(2) Whether an MR job is involved; also extracted from the execution path information.
(3) The data tables involved and their associated fields; extracted from the execution path information.
(4) The data skew ratio, obtained by running the example statement above after the latest data of the business date has been generated.
(5) The Num rows count of the data tables shown by explain; extracted from the execution path information.
(6) The Data size of the data tables shown by explain; extracted from the execution path information.
(7) The time the statement spends waiting to be scheduled, obtained from the log of a normal execution.
(8) The waiting time of the statement in the cluster (note: for a statement involving more than one data file, a first running state with user core = 1 and user container = 1 still represents a waiting state, not execution time), obtained from the log of a normal execution. When the cluster is busy or short of resources, the statement waits, and this waiting time must be excluded when the execution time is finally calculated.
(9) The execution duration of the statement in the cluster, obtained from the log of a normal execution.
(10) The number of output records, used to judge whether spark acceleration can be applied in the future, read from the result table after a normal execution.
(11) The total recent (30-day) and historical execution counts, i.e. how many times the statement executed normally.
(12) The user's recent (30-day) and historical execution counts, i.e. how many times the tenant executed the statement normally.
For the safety and stability of the production system, and according to the experience currently held by the team, jobs involving more than 1 T of data require a great deal of memory when run on the SPARKSQL job execution engine, and are likely to be less efficient than on the HIVESQL engine, or even fail to finish. In this application example, after the core job group finishes the T-1-day batch tasks (between 15:00 and 24:00 each day on the main cluster, or on the standby cluster), all statements of jobs in the non-low-timeliness job groups whose data volume is below 1 T are selected to form the knowledge information. In one example, the feature warehousing process is: the client submits an HQL statement; the statement is analyzed with the explain command; the feature data obtained from the analysis are submitted to the cluster; the statement waits and then executes normally; the log data from the normal execution are stored in the knowledge base; and when the scheduled date arrives, the execution path information, the data tables and associated fields among the feature data, and the skew ratio calculated from those tables and fields are stored in the knowledge base.
Step 2: apply a rule model. The input of the rule model can be an HQL statement, and the output is whether the execution time is significantly improved after switching to the SPARKSQL job execution engine; if so, the parameter combination is recorded.
Specifically, on the HADOOP platform, after the core job group finishes the T-1-day batch tasks between 15:00 and 24:00, statements whose total data volume is below 1 T are automatically switched to the SPARKSQL job execution engine, and the SPARKSQL engine parameters are automatically generated with simple rules according to the number and size of the files in the input table under HDFS (mainly according to the number of files). Jobs whose execution on the SPARKSQL engine is more than 1.5 times as efficient as on the conventional HIVESQL engine are put into the knowledge base.
After a period of data accumulation (several weeks), similar statements (for example, 50 of them) are recalled from the knowledge base for the core jobs by referring to the EXPLAIN result and the associated tables/fields, in order to predict the actual execution time on the SPARKSQL job execution engine. If the predicted improvement does not reach the threshold (more than 1.5 times faster, or shortened by more than 1 hour), the HIVESQL engine is kept and no switch is made; for jobs whose execution time can be improved to more than 1.5 times or shortened by more than 1 hour, the SPARKSQL execution parameters are generated by rules according to the number of files under HDFS.
Once a statement converted to spark2x reports an error, and the error information indicates not a cluster problem but memory overflow or the like, the parameter configuration is promoted by one level in the following days; if problems occur at three consecutive levels, the statement is marked so that the SPARKSQL job execution engine is no longer used and the HIVESQL engine is kept. The levels are preset according to empirical values, mainly derived from the jobs currently configured as spark in production.
Once a statement converted to spark2x executes without error, but its execution efficiency is neither improved to more than 1.5 times nor shortened by more than 1 hour, the parameter configuration is likewise promoted by one level in the following days; if the target is not reached on three consecutive days, the statement is marked so that the SPARKSQL job execution engine is no longer used and the HIVESQL engine is kept.
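The escalation policy of the two paragraphs above can be sketched as follows. The level table itself is a hypothetical placeholder; in the text the levels are preset from empirical values taken from production spark jobs.

```python
PARAM_LEVELS = [
    {"executor-memory": "4g"},
    {"executor-memory": "8g"},
    {"executor-memory": "16g"},
]

def escalate(daily_results):
    """daily_results: per-day booleans, True once a spark run met the target.
    Each failed day promotes the configuration one level; three failed levels
    fall back to the HIVESQL engine."""
    for level, met_target in zip(PARAM_LEVELS, daily_results):
        if met_target:
            return ("SPARKSQL", level)
    return ("HIVESQL", None)

outcome = escalate([False, True])   # day 1 failed, day 2 met the target at level 2
```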
Step 3: data labeling. The input of the data labeling process can be an HQL statement, and the output is which group of spark parameters suits it, or the suggestion to keep using the HIVESQL job execution engine; this serves as the multi-classification label, i.e. the labeling result of the training sample. The labeled sample data are provided mainly for the subsequent supervised deep learning, which is a multi-class model. The classification criteria may be preset according to empirical values, derived mainly from the jobs manually configured as spark in current production; counting the jobs deployed as hive programs versus spark programs, the ratio is close to 56:1, with the spark jobs numbering only in the hundreds. Alternative classifications are shown in table 2.
TABLE 2
The labeled data come from the statements selected by the rule model. After post-hoc verification, statements whose execution time is improved to more than 1.5 times or shortened by more than 1 hour are assigned classes 1-6; class 7 is used for statements that cannot reach such improvement, or whose spark execution reports errors for non-system-side reasons.
Step 4: feature engineering. The input of the feature engineering can be an HQL statement; the feature engineering mainly comprises 5 parts of data, and the combination of its output with the output of the data labeling process can be used as the final training sample. The input to the feature engineering may be taken from the information in the knowledge base.
This application example constructs the feature engineering at the granularity of each HQL statement in a given HQL script. As in an intelligent recommendation system, where the user portrait must be constructed first, in the current scenario HQL-statement-related features containing enough information need to be constructed. It should be noted that the information in the knowledge base is mainly used for the rule model; if the subsequent deep-learning model were built on the raw knowledge-base features alone, the model's representation could hardly achieve an accurate effect. The feature engineering is taken from the information in the knowledge base and mainly contains 5 parts of data, as follows:
(1) Data-volume-related features of the HQL statement; these can be acquired directly from the knowledge base.
(2) Features related to the statement's execution plan: the stage information it contains, the numbers of maps and reduces involved, and the maximum numbers of map and reduce processes involved at one time; plus the bucketed level values of the stage count, map process count, and reduce process count (bucketed in the same way as the input/output table data volumes).
(3) Cluster-related features: as of the current prediction time point, the number of HQL statements running, the cores and memory already allocated, and the cores and memory currently remaining.
(4) Tenant-related features: as of the current prediction time point, how many jobs are running under the tenant, the total cores and memory in use, the core/memory resources that can theoretically be applied for at most, and how many statements are currently waiting to execute under the tenant.
(5) Job-history execution features: the average runtime and waiting time of the last 30 hive executions, and of the last 30 spark executions (if any).
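Assembling the five feature groups into one training sample might look like the following. All field names are made-up illustrations; the real fields come from the knowledge base as listed above.

```python
def build_sample(stmt_feats, plan_feats, cluster_feats, tenant_feats, history_feats):
    """Merge the five feature groups into one flat feature dict per HQL statement."""
    sample = {}
    for group in (stmt_feats, plan_feats, cluster_feats, tenant_feats, history_feats):
        sample.update(group)
    return sample

sample = build_sample(
    {"input_rows": 1_000_000},            # (1) data-volume features
    {"num_stages": 3, "num_maps": 12},    # (2) execution-plan features
    {"free_cores": 400},                  # (3) cluster features
    {"tenant_running_jobs": 5},           # (4) tenant features
    {"hive_avg_minutes": 38.7},           # (5) job-history features
)
```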
Step 5: construct a multi-classification deep learning model. The input when applying the model is the feature data, and the output is a specific class. The functions realized by this model can be regarded as equivalent to those of the preset engine classification model.
Before an HQL statement is executed, on the basis of the collected feature engineering, the model must combine information on the current cluster, the tenant, the statement, and historical execution performance, and automatically select the most suitable execution engine for the statement; if the SPARKSQL job execution engine is selected, corresponding execution parameters must also be assigned to the SPARK job. This is a multi-class supervised learning problem, i.e. the multi-class decision problem needs to be cast as a deep-learning scenario.
(I) Feature preprocessing
Feature preprocessing is the process of converting features through transformation functions into a form better suited to the algorithmic model. In this application example, all features are first divided into two categories: continuous features and discrete features.
For continuous features, standardization is used to unify the dimensions (the aim is that, after processing, the features follow a standard normal distribution as closely as possible). Different features have different dimensions and units, such as height and age; these differences affect the result of the model. To remove them and make the indicators comparable, the data are standardized so that the features fall on the same order of magnitude, which suits the subsequent comprehensive evaluation and comparison of models.
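The preprocessing of the two feature categories can be sketched with the standard library alone: z-score standardization for continuous features, One-Hot encoding for discrete ones. The sample values are illustrative.

```python
import math

def standardize(values):
    """Z-score: subtract the mean, divide by the (population) standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]

def one_hot(value, categories):
    """N-bit encoding with exactly one valid bit per state."""
    return [1 if value == c else 0 for c in categories]

z = standardize([160.0, 170.0, 180.0])   # e.g. heights
oh = one_hot("spark", ["hive", "spark"])
```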
For discrete features, One-Hot encoding is applied (an N-bit state register encodes N states, each state having its own independent register bit, with only one bit valid at any time), so that distances between features can be computed more reasonably. After One-Hot encoding, each dimension of the encoded features can be treated as a continuous feature. Step 5 specifically comprises the following steps:
Step 51: in one example, the model structure is shown in fig. 4. The environment in fig. 4 comprises all the features collected in the feature engineering, which are input into the model. A vector feat_vals composed of the continuous features is extracted, and a vector feat_ids composed of all the discrete features is extracted. For feat_ids, embedding is used in the model to reduce dimensionality and obtain a dense vector, which solves the sparse-matrix problem brought by One-Hot encoding, reduces the number of model training parameters, and improves training speed.
Step 52: perform second-order feature combination by taking the inner product of the dense vectors after Embedding with the vector feat_vals of continuous features: embeddings = tf.nn.embedding_lookup(embeddings_V, feat_ids); embeddings = tf.multiply(embeddings, feat_vals). The purpose is to feed the model's generalizing stacked fully connected layers, i.e. a multi-layer MLP, which obtains higher-order feature combinations. After the fully connected layers, a newly arrived HQL can still be predicted: a feature combination never present in the training samples can be represented after embedding, and combinations seen in training remain valid. The combined feature vectors are also concatenated into a hidden layer as the input of the next step.
Step 53: the hidden layer from the previous step is fed into a three-layer fully connected neural network with Relu activation functions. In one example, the structure of this network is shown in FIG. 5: data inputs enter on one side and data outputs leave on the other, and the number of neurons in the hidden layers is obtained by automatic hyper-parameter tuning. Because the problem is multi-classification with 7 classes, the last layer outputs 7 neurons; the result is recorded as y_deep.
Step 54: finally, the probability value of each class is obtained through softmax. Both low-order and high-order feature combinations are taken into account in the predicted probabilities, further improving the generalization ability of the model.
Step 55: since this is a multi-classification problem, the loss value is defined with a cross-entropy loss function over the annotated label and the predicted value, i.e. the loss function:
loss=tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y, labels=labels))
The parameters of the neural network are then updated by backpropagation with the Adam gradient-descent method so as to minimize the loss function, i.e. to bring the engine predicted by the network closer to the labeled engine.
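The softmax of step 54 and the cross-entropy loss of step 55 can be reproduced with the standard library, mirroring the TF expression without the network itself. The 7-element logits vector below is an arbitrary illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one logits vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, one_hot_label):
    """-sum(y * log(p)) against a one-hot label, as in step 55."""
    probs = softmax(logits)
    return -sum(y * math.log(p) for y, p in zip(one_hot_label, probs))

logits = [0.1, 2.0, -1.0, 0.0, 0.4, -0.5, 0.3]   # 7 output neurons, one per class
label = [0, 1, 0, 0, 0, 0, 0]                     # annotated class 2
probs = softmax(logits)
loss = cross_entropy(logits, label)
```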
Step 6: perform self-learning iterative updating of the model. To solve the problem of model accuracy degrading over time, self-learning iterative updating is adopted: the training samples of the last month are obtained and the latest multi-classification deep-learning model is trained.
The model self-learning and self-updating is triggered automatically every month: the HQL statements of the latest month are obtained and a complete end-to-end model training is performed again, including automatic hyper-parameter tuning (the number of neurons of the three fully connected layers, the dimensionality of the embeddings, and so on). Each monthly training requires more than 9 rounds, and the learning_rate parameter is dynamically reduced as the number of rounds increases, to obtain higher model accuracy. After training, the performance of the new and old models on the latest validation set is compared (an A/B TEST), and the better-performing new model is released to the prediction service. If the old model performs better on the validation set than the retrained model, the model is not updated and no new service is released, i.e. the prediction service remains unchanged.
As shown in table 3, in one example the average running time of the whole pilot job group improved significantly: the average execution time of the pilot jobs in the daily batch fell from 38.668 minutes per job to 23.532 minutes, a reduction of 39.14%. Besides this marked improvement, intelligently selecting the execution engine for each HQL statement eliminated spark job errors in the pilot jobs (thanks to safeguards such as choosing suitable execution parameters and automatically switching back to hive), ensuring the stability of the production system.
TABLE 3
To further explain the present solution, the present application also provides an application example of the switching method, as shown in fig. 6, specifically described as follows: the EXPLAIN result is obtained from the knowledge base, feature engineering is performed, and the result is input into the deep learning model; EXPLAIN results are written into the knowledge base; the EXPLAIN result obtained from the knowledge base is processed by the rule model and data labeling and then input into the deep learning model to obtain a model file, i.e. the trained deep learning model; the trained model is updated iteratively on a monthly basis; and the final execution engine and parameter combination predicted by the trained model is applied.
In terms of software, in order to improve the reliability and efficiency of job execution engine switching and thereby improve job efficiency, the present application provides an embodiment of a switching device of a job execution engine for implementing all or part of the contents of the switching method of a job execution engine. Referring to fig. 7, the switching device of the job execution engine specifically includes the following contents:
the obtaining module 01 is used for obtaining a query statement to be executed;
the determining module 02 is configured to determine a target job execution engine and engine parameters corresponding to the query statement to be executed according to a preset engine classification model, where the preset engine classification model is obtained by pre-training a multi-classification model based on batch historical query statements and engine labels corresponding to the batch historical query statements;
and the application module 03 is configured to apply the target job execution engine and the engine parameters to execute the query statement to be executed.
In one embodiment, the obtaining module includes:
the device comprises an acquisition unit, a query unit and a query unit, wherein the acquisition unit is used for acquiring a target query script and sequentially selecting query sentences in the target query script as query sentences to be executed; correspondingly, the application module comprises:
and the switching unit, configured to: if the query statement to be executed is the first query statement in the target query script, determine the target job execution engine as the current job execution engine and apply the current job execution engine and the engine parameters to execute the query statement to be executed; otherwise, determine whether the target job execution engine is the same as the current job execution engine, and if not, switch the current job execution engine to the target job execution engine and apply the current job execution engine and the engine parameters to execute the query statement to be executed.
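The switching unit's behaviour can be sketched as a statement-by-statement loop. Here `predict` stands in for the engine classification model and `engines` maps engine names to execution callables; all names are illustrative:

```python
def run_script(statements, predict, engines):
    """Execute a script statement by statement, adopting the predicted engine
    for the first statement and thereafter switching only when the predicted
    target engine differs from the current one."""
    current = None
    for stmt in statements:
        target, params = predict(stmt)
        if current is None:
            current = target        # first statement: adopt the target engine
        elif target != current:
            current = target        # switch the current engine to the target
        engines[current](stmt, params)
    return current                  # engine in effect after the last statement
```

Because consecutive statements predicted for the same engine never trigger a switch, the cost of engine start-up is paid only at actual transitions.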
As shown in fig. 8, in an embodiment, the switching device of the job execution engine further includes:
the historical data acquisition module 04 is used for acquiring batch historical query sentences and corresponding engine tags thereof;
and the training module 05 is configured to train multiple classification models by applying the batch of historical query sentences and the engine labels corresponding to the batch of historical query sentences to obtain the preset engine classification models.
In one embodiment, the historical data acquisition module comprises:
the history statement acquisition unit is used for acquiring batch history query statements;
a determining unit, configured to apply a first job execution engine and a second job execution engine respectively, execute each historical query statement, obtain an execution result of the historical query statement, and determine an engine tag of the historical query statement according to the execution result, where the job execution engine tag includes: a first job execution engine and a second job execution engine.
In one embodiment, the switching device of the job execution engine further includes:
a verification set acquisition module, configured to obtain a verification set, the verification set comprising: batch historical query statements and their corresponding engine tags;
the iteration updating module, configured to apply the query statement to be executed, the target job execution engine corresponding to the query statement to be executed, and the engine parameters to train the preset engine classification model again;
and the comparison module is used for applying the verification set to determine whether the trained engine classification model is superior to the engine classification model before training, and if so, replacing the preset engine classification model with the trained engine classification model.
In one embodiment, the determining module comprises:
the execution unit, configured to obtain the feature data of the query statement to be executed by means of an EXPLAIN command;
and the input unit is used for inputting the feature data into the engine classification model and determining the output result of the preset engine classification model as the target job execution engine and the engine parameter.
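Taken together, the execution unit and the input unit amount to a small pipeline. All callables below are stand-ins for the real EXPLAIN command, the feature engineering, and the trained classifier:

```python
def determine_engine(stmt, explain_cmd, featurize, classify):
    """Run EXPLAIN on the statement, build feature data from the plan, and
    read the target engine and engine parameters off the classifier output."""
    plan = explain_cmd(stmt)          # execution unit: EXPLAIN the statement
    features = featurize(plan)        # feature engineering over the plan
    engine, params = classify(features)  # input unit: model output
    return engine, params
```

The classifier is assumed to return both an engine choice and a parameter set, matching the engine-plus-parameters output described above.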
In one embodiment, the execution result includes: an error identifier, an error cause, and a first execution time length obtained by applying the first job execution engine, and a second execution time length obtained by applying the second job execution engine; correspondingly, the determining unit comprises:
a first determining subunit, configured to determine that the job execution engine tag of the historical query statement is the second job execution engine if the execution result of the historical query statement contains an error identifier and the error cause is memory overflow;
a second determining subunit, configured to determine that the job execution engine tag of the historical query statement is a second job execution engine if a ratio between the first execution time length and a second execution time length of the historical query statement is less than or equal to a ratio threshold or a difference between the first execution time length and the second execution time length is less than a difference threshold;
a third determining subunit, configured to determine that the job execution engine tag of the historical query statement is the first job execution engine if a ratio between the first execution time length and the second execution time length of the historical query statement is greater than a ratio threshold or a difference between the first execution time length and the second execution time length is greater than a difference threshold.
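The three labelling rules above can be written out directly, checked in order. The threshold values below are placeholders, since the application does not fix them:

```python
def engine_label(errored, reason, t_first, t_second,
                 ratio_threshold=1.2, diff_threshold=60.0):
    """Derive the training label for one historical query statement from its
    execution results under the first and second job execution engines,
    applying the three rules in sequence (thresholds are illustrative)."""
    # Rule 1: error under the first engine with memory overflow -> second engine
    if errored and reason == "memory_overflow":
        return "second"
    ratio = t_first / t_second
    diff = t_first - t_second
    # Rule 2: ratio at or below threshold, or difference below threshold
    if ratio <= ratio_threshold or diff < diff_threshold:
        return "second"
    # Rule 3: ratio or difference above threshold -> first engine
    return "first"
```

Checking the rules sequentially resolves the overlap between rules 2 and 3 when only one of the two conditions holds.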
The embodiment of the switching apparatus of the job execution engine provided in this specification may be specifically used to execute the processing flow of the embodiment of the switching method of the job execution engine. Its functions are not described again here; reference may be made to the detailed description of the method embodiment.
In terms of hardware, in order to improve the reliability and efficiency of switching the job execution engines and further improve the job efficiency, the present application provides an embodiment of an electronic device for implementing all or part of the contents of the method for switching the job execution engines, where the electronic device specifically includes the following contents:
a processor (processor), a memory (memory), a communication interface (Communications Interface), and a bus; the processor, the memory, and the communication interface communicate with one another through the bus; the communication interface is used to transmit information between the switching device of the job execution engine and related equipment such as a user terminal; the electronic device may be a desktop computer, a tablet computer, a mobile terminal, or the like, although this embodiment is not limited thereto. In this embodiment, the electronic device may be implemented with reference to the embodiments of the switching method of the job execution engine and of the switching device of the job execution engine, the contents of which are incorporated herein; repeated descriptions are omitted.
Fig. 9 is a schematic block diagram of a system configuration of an electronic device 9600 according to an embodiment of the present application. As shown in fig. 9, the electronic device 9600 can include a central processor 9100 and a memory 9140; the memory 9140 is coupled to the central processor 9100. Notably, this fig. 9 is exemplary; other types of structures may also be used in addition to or in place of the structure to implement telecommunications or other functions.
In one or more embodiments of the present application, the switching function of the job execution engine may be integrated into the central processor 9100. The central processor 9100 may be configured to control as follows:
step 100: acquiring a query statement to be executed;
step 200: determining a target job execution engine and engine parameters corresponding to the query sentence to be executed according to a preset engine classification model, wherein the preset engine classification model is obtained by pre-training a multi-classification model based on batch historical query sentences and engine labels corresponding to the batch historical query sentences;
step 300: and applying the target job execution engine and the engine parameters to execute the query statement to be executed.
As can be seen from the above description, the electronic device provided in the embodiments of the present application can improve the reliability and efficiency of job execution engine switching, and thus can improve job efficiency.
In another embodiment, the switching device of the job execution engine may be configured separately from the central processor 9100, for example, the switching device of the job execution engine may be configured as a chip connected to the central processor 9100, and the switching function of the job execution engine may be realized by the control of the central processor.
As shown in fig. 9, the electronic device 9600 may further include: a communication module 9110, an input unit 9120, an audio processor 9130, a display 9160, and a power supply 9170. It is noted that the electronic device 9600 also does not necessarily include all of the components shown in fig. 9; in addition, the electronic device 9600 may further include components not shown in fig. 9, which may be referred to in the prior art.
As shown in fig. 9, the central processor 9100, which is sometimes referred to as a controller or operational control, can include a microprocessor or other processor device and/or logic device. The central processor 9100 receives input and controls the operation of the various components of the electronic device 9600.
The memory 9140 can be, for example, one or more of a buffer, a flash memory, a hard drive, a removable medium, a volatile memory, a non-volatile memory, or other suitable device. It may store relevant information as well as programs for processing that information, and the central processor 9100 can execute the programs stored in the memory 9140 to realize information storage, processing, and the like.
The input unit 9120 provides input to the central processor 9100. The input unit 9120 is, for example, a key or a touch input device. The power supply 9170 is used to provide power to the electronic device 9600. The display 9160 is used for displaying display objects such as images and characters. The display may be, for example, an LCD display, but is not limited thereto.
The memory 9140 may be a solid-state memory, e.g., read-only memory (ROM), random access memory (RAM), a SIM card, or the like. It may also be a memory that retains information even when powered off, can be selectively erased, and can be rewritten with new data; an example of such a memory is sometimes called an EPROM or the like. The memory 9140 could also be some other type of device. The memory 9140 includes a buffer memory 9141 (sometimes referred to as a buffer) and may include an application/function storage portion 9142, which is used for storing application programs and function programs or for executing the operation flow of the electronic device 9600 via the central processor 9100.
The memory 9140 can also include a data store 9143, the data store 9143 being used to store data, such as contacts, digital data, pictures, sounds, and/or any other data used by an electronic device. The driver storage portion 9144 of the memory 9140 may include various drivers for the electronic device for communication functions and/or for performing other functions of the electronic device (e.g., messaging applications, contact book applications, etc.).
The communication module 9110 is a transmitter/receiver 9110 that transmits and receives signals via an antenna 9111. The communication module (transmitter/receiver) 9110 is coupled to the central processor 9100 to provide input signals and receive output signals, which may be the same as in the case of a conventional mobile communication terminal.
Based on different communication technologies, a plurality of communication modules 9110, such as a cellular network module, a bluetooth module, and/or a wireless local area network module, may be provided in the same electronic device. The communication module (transmitter/receiver) 9110 is also coupled to a speaker 9131 and a microphone 9132 via an audio processor 9130 to provide audio output via the speaker 9131 and receive audio input from the microphone 9132, thereby implementing ordinary telecommunications functions. The audio processor 9130 may include any suitable buffers, decoders, amplifiers and so forth. In addition, the audio processor 9130 is also coupled to the central processor 9100, thereby enabling recording locally through the microphone 9132 and enabling locally stored sounds to be played through the speaker 9131.
As can be seen from the above description, the electronic device provided in the embodiments of the present application can improve the reliability and efficiency of job execution engine switching, and thus can improve job efficiency.
An embodiment of the present application further provides a computer-readable storage medium capable of implementing all the steps in the switching method of the job execution engine in the above embodiment, where the computer-readable storage medium stores thereon a computer program, and when the computer program is executed by a processor, the computer program implements all the steps of the switching method of the job execution engine in the above embodiment, for example, the processor implements the following steps when executing the computer program:
step 100: acquiring a query statement to be executed;
step 200: determining a target job execution engine and engine parameters corresponding to the query sentence to be executed according to a preset engine classification model, wherein the preset engine classification model is obtained by pre-training a multi-classification model based on batch historical query sentences and engine labels corresponding to the batch historical query sentences;
step 300: and applying the target job execution engine and the engine parameters to execute the query statement to be executed.
As can be seen from the foregoing description, the computer-readable storage medium according to the embodiments of the present application can improve the reliability and efficiency of job execution engine switching, and thus can improve job efficiency.
In the present application, the method embodiments are described in a progressive manner; identical or similar parts among the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. For related parts, reference is made to the description of the method embodiments.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The principle and the implementation of the present application are explained by applying specific embodiments in the present application, and the description of the above embodiments is only used to help understand the method and the core idea of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (10)

1. A method for switching job execution engines, comprising:
acquiring a query statement to be executed;
determining a target job execution engine and engine parameters corresponding to the query sentence to be executed according to a preset engine classification model, wherein the preset engine classification model is obtained by pre-training a multi-classification model based on batch historical query sentences and engine labels corresponding to the batch historical query sentences;
and applying the target job execution engine and the engine parameters to execute the query statement to be executed.
2. The job execution engine switching method according to claim 1, wherein the acquiring the query statement to be executed includes:
acquiring a target query script, and sequentially selecting query sentences in the target query script as query sentences to be executed;
correspondingly, the executing the query statement to be executed by applying the target job execution engine and the engine parameter includes:
if the query statement to be executed is the first query statement in the target query script, determining the target job execution engine as a current job execution engine, and applying the current job execution engine and the engine parameters to execute the query statement to be executed, otherwise determining whether the target job execution engine is the same as the current job execution engine, if not, switching the current job execution engine to the target job execution engine, and applying the current job execution engine and the engine parameters to execute the query statement to be executed.
3. The method for switching job execution engines according to claim 1, wherein before determining the target job execution engine and the engine parameters corresponding to the query statement to be executed according to a preset engine classification model, the method further comprises:
acquiring batch historical query sentences and corresponding engine tags thereof;
and training the multi-classification model by applying batch historical query sentences and the engine labels corresponding to the batch historical query sentences to obtain the preset engine classification model.
4. The method for switching job execution engines according to claim 3, wherein the obtaining of the batch historical query statements and their respective corresponding engine tags comprises:
obtaining batch historical query sentences;
respectively applying a first job execution engine and a second job execution engine, executing each historical query statement to obtain an execution result of the historical query statement, and determining an engine tag of the historical query statement according to the execution result, wherein the job execution engine tag comprises: a first job execution engine and a second job execution engine.
5. The method for switching job execution engines according to claim 1, wherein after determining the target job execution engine corresponding to the query statement to be executed according to a preset engine classification model, the method further comprises:
obtaining a verification set, the verification set comprising: batch historical query sentences and corresponding engine tags thereof;
applying the query statement to be executed, the target job execution engine corresponding to the query statement to be executed, and the engine parameters to train the preset engine classification model again;
and applying the verification set to determine whether the trained engine classification model is superior to the engine classification model before training, and if so, replacing the preset engine classification model with the trained engine classification model.
6. The method for switching between job execution engines according to claim 1, wherein the determining target job execution engine and engine parameters corresponding to the query statement to be executed according to a preset engine classification model comprises:
obtaining feature data of the query statement to be executed by means of an EXPLAIN command;
and inputting the characteristic data into the engine classification model, and determining an output result of the preset engine classification model as the target job execution engine and the engine parameter.
7. The job execution engine switching method according to claim 4, wherein the execution result includes: an error identifier, an error cause, and a first execution time length obtained by applying the first job execution engine, and a second execution time length obtained by applying the second job execution engine;
correspondingly, the determining an engine tag of the historical query statement according to the execution result includes:
if the execution result of the historical query statement contains an error identifier and the error cause is memory overflow, determining that the job execution engine tag of the historical query statement is the second job execution engine;
if the ratio of the first execution time length to the second execution time length of the historical query statement is smaller than or equal to a ratio threshold or the difference between the first execution time length and the second execution time length is smaller than a difference threshold, determining that the job execution engine label of the historical query statement is a second job execution engine;
and if the ratio of the first execution time length to the second execution time length of the historical query statement is greater than a ratio threshold or the difference between the first execution time length and the second execution time length is greater than a difference threshold, determining that the job execution engine label of the historical query statement is a first job execution engine.
8. A job execution engine switching apparatus, comprising:
the acquisition module is used for acquiring the query statement to be executed;
the determining module is used for determining a target job execution engine and engine parameters corresponding to the query sentence to be executed according to a preset engine classification model, wherein the preset engine classification model is obtained by pre-training a multi-classification model based on batch historical query sentences and engine labels corresponding to the batch historical query sentences;
and the application module is used for applying the target job execution engine and the engine parameters to execute the query statement to be executed.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of switching the job execution engine according to any one of claims 1 to 7 when executing the program.
10. A computer-readable storage medium having stored thereon computer instructions, wherein the instructions, when executed, implement the method of switching a job execution engine of any of claims 1 to 7.
CN202211660710.8A 2022-12-23 2022-12-23 Switching method and device of job execution engine Pending CN115905293A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211660710.8A CN115905293A (en) 2022-12-23 2022-12-23 Switching method and device of job execution engine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211660710.8A CN115905293A (en) 2022-12-23 2022-12-23 Switching method and device of job execution engine

Publications (1)

Publication Number Publication Date
CN115905293A true CN115905293A (en) 2023-04-04

Family

ID=86488175

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211660710.8A Pending CN115905293A (en) 2022-12-23 2022-12-23 Switching method and device of job execution engine

Country Status (1)

Country Link
CN (1) CN115905293A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116701350A (en) * 2023-05-19 2023-09-05 阿里云计算有限公司 Automatic optimization method, training method and device, and electronic equipment
CN116701350B (en) * 2023-05-19 2024-03-29 阿里云计算有限公司 Automatic optimization method, training method and device, and electronic equipment


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination