CN112860652B - Task state prediction method and device and electronic equipment - Google Patents


Info

Publication number
CN112860652B
CN112860652B CN202110349909.8A
Authority
CN
China
Prior art keywords
log
job
word segmentation
prediction
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110349909.8A
Other languages
Chinese (zh)
Other versions
CN112860652A (en)
Inventor
门大洲
周誉
楼彩英
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC filed Critical Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202110349909.8A priority Critical patent/CN112860652B/en
Publication of CN112860652A publication Critical patent/CN112860652A/en
Application granted granted Critical
Publication of CN112860652B publication Critical patent/CN112860652B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The disclosure provides a job state prediction method and apparatus and an electronic device, which can be used in fields such as artificial intelligence and finance. The job state prediction method comprises the following steps: collecting attribute information of a node that executes the job and a log associated with the job; processing the attribute information of the node with a trained job state prediction model to obtain a first prediction result, and determining a second prediction result based on at least one keyword extracted from the log; and determining the job state based on the first prediction result, a first prediction result weight, the second prediction result, and a second prediction result weight.

Description

Task state prediction method and device and electronic equipment
Technical Field
The present disclosure relates to the field of artificial intelligence technology and finance, and more particularly, to a job status prediction method, apparatus, and electronic device.
Background
With the current rapid development of big data technology, the data that enterprises must process is growing geometrically. For example, the transaction data generated daily by large institutions reaches the TB or even PB level. This data needs to be collected for subsequent statistics, analysis, marketing, and so on, and batch jobs play a central role in moving and processing it.
In the process of implementing the disclosed concept, the applicant found at least the following problems in the related art: when batch jobs are processed, scheduling must be performed based on the dependency relationships among batch jobs. The related art cannot predict possible job abnormalities in time; for example, only after an abnormality such as a job interruption occurs can it be handled by manual intervention, so the timeliness of handling is low.
Disclosure of Invention
In view of the above, the present disclosure provides a job state prediction method, apparatus, and electronic device to at least partially solve the problem that job exception handling has poor timeliness.
One aspect of the present disclosure provides a job state prediction method, which collects attribute information of a node that executes a job and a log associated with the job; processes the attribute information of the node using a trained job state prediction model to obtain a first prediction result, and determines a second prediction result based on at least one keyword extracted from the log; and determines the job state based on the first prediction result, a first prediction result weight, the second prediction result, and a second prediction result weight.
According to an embodiment of the present disclosure, determining the second prediction result based on the at least one keyword extracted from the log includes: determining a second prediction result representing a job abnormality based on the matching result of the log's word segmentation result against the job anomaly knowledge base.
According to an embodiment of the present disclosure, each keyword in the job anomaly knowledge base has a corresponding keyword weight, determined based on the number of occurrences of the keyword in logs of abnormal job scenarios and the total number of feature words in the knowledge base, as well as prediction attribute information, which includes the number of successful matches and a prediction result attribute.
According to an embodiment of the present disclosure, the job anomaly knowledge base is constructed by: acquiring a first log word segmentation result for abnormal job scenarios and a second log word segmentation result for normal job scenarios; for each log sentence of an abnormal job scenario, determining, based on the first and second log word segmentation results, the sentence similarity between that log sentence and each of the log sentences of the normal job scenarios; determining candidate log sentences of the abnormal job scenario whose sentence similarity is less than or equal to a preset similarity threshold; acquiring, from the candidate log sentences and based on the first and second log word segmentation results, feature words that exist in the abnormal job scenario but not in the normal job scenario; and adding the feature words to the job anomaly knowledge base.
According to an embodiment of the present disclosure, acquiring, from the candidate log sentences and based on the first and second log word segmentation results, the feature words that exist in the abnormal job scenario but not in the normal job scenario includes: constructing an abnormal job information set and a normal job information set, where the abnormal job information set comprises the job attribute information of the abnormal job scenario and the first log word segmentation result, and the normal job information set comprises the job attribute information of the normal job scenario and the second log word segmentation result; and taking as feature words the word segmentation results that appear in the abnormal job information set but whose matching result in the normal job information set is empty.
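The two construction steps above can be sketched in Python under assumed data shapes: first filter abnormal-scenario log sentences by their similarity to every normal-scenario sentence, then take the set difference of the candidate tokens against the normal-scenario vocabulary. Jaccard similarity is used here only as one plausible similarity metric; the patent does not specify which metric it uses.

```python
def jaccard_similarity(tokens_a, tokens_b):
    """Sentence similarity between two token lists (one possible metric)."""
    a, b = set(tokens_a), set(tokens_b)
    return len(a & b) / len(a | b) if a | b else 0.0

def extract_feature_words(abnormal_segmented, normal_segmented, threshold=0.6):
    """abnormal_segmented / normal_segmented: one token list per log sentence.
    The threshold value 0.6 is an illustrative assumption."""
    # Step 1: keep abnormal-scenario sentences whose similarity to EVERY
    # normal-scenario sentence is <= the preset similarity threshold.
    candidates = [
        tokens for tokens in abnormal_segmented
        if all(jaccard_similarity(tokens, n) <= threshold for n in normal_segmented)
    ]
    # Step 2: feature words = tokens of the candidates whose matching result
    # in the normal-scenario vocabulary is empty (a set difference).
    normal_vocab = {w for tokens in normal_segmented for w in tokens}
    feature_words = set()
    for tokens in candidates:
        feature_words |= set(tokens) - normal_vocab
    return feature_words

abnormal = [["job", "failed", "oom"], ["task", "interrupted", "timeout"]]
normal = [["job", "finished", "ok"], ["task", "completed"]]
print(sorted(extract_feature_words(abnormal, normal)))
# ['failed', 'interrupted', 'oom', 'timeout']
```

The resulting feature words are what would be added to the job anomaly knowledge base.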
According to an embodiment of the present disclosure, the method further includes: after a second prediction result representing the operation abnormality is determined, updating keyword weights and prediction attribute information corresponding to the keywords in an operation abnormality knowledge base; and updating the keywords in the operation anomaly knowledge base in response to the number of prediction errors in the prediction attribute information being greater than or equal to a preset error number threshold.
According to an embodiment of the present disclosure, a node for executing a job includes a bottom layer node, a middle layer node, and a top layer node, the middle layer node being associated with at least one bottom layer node; collecting attribute information of a node for executing a job includes: transmitting attribute information acquired from the bottom layer node to the middle layer node through a first message queue; and the middle layer node stores the attribute information in the attribute data set so that the top layer node reads the attribute information of the node for executing the job from the attribute data set.
According to an embodiment of the present disclosure, the attribute data set includes at least one of job identification, lot date, and run time information to determine input data of the job status prediction model from the attribute data set based on the attribute information.
According to an embodiment of the present disclosure, the job state prediction model includes a plurality of trees, an output of the job state prediction model is determined based on weights and sub-output results of each of the plurality of trees, and parameters of the job state prediction model include: the maximum depth of the tree, the learning rate, and the defined loss function.
One aspect of the present disclosure provides a job state prediction apparatus, including: an acquisition module for collecting attribute information of a node that executes the job and a log associated with the job; a processing module for processing the attribute information of the node using the trained job state prediction model to obtain a first prediction result, and for determining a second prediction result based on at least one keyword extracted from the log; and an output module for determining the job state based on the first prediction result, a first prediction result weight, the second prediction result, and a second prediction result weight.
Another aspect of the present disclosure provides an electronic device including one or more processors and a storage device, where the storage device is configured to store executable instructions that, when executed by the processors, implement a job status prediction method as above.
Another aspect of the present disclosure provides a computer-readable storage medium storing computer-executable instructions that, when executed, are configured to implement a job status prediction method as above.
Another aspect of the present disclosure provides a computer program comprising computer executable instructions which, when executed, are adapted to implement the job status prediction method as above.
According to the job state prediction method, apparatus, and electronic device of the embodiments of the present disclosure, the job state is predicted from two dimensions, system performance and log information, by collecting the attribute information of the node executing the job and the log associated with the job. In the embodiments of the present disclosure, the log-analysis features and the system-performance indexes are not merged into a single feature set for job state prediction, which, on the one hand, avoids the computational complexity caused by increased feature dimensionality. On the other hand, application-side logs are more complex and changeable than system performance indexes and differ from them greatly in form. Weighting and combining the predictions from system performance and from the log therefore accommodates the prediction of different types of jobs in various scenarios and improves the prediction effect.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments thereof with reference to the accompanying drawings in which:
FIG. 1 schematically illustrates an exemplary system architecture to which job state prediction methods, apparatus, and electronic devices may be applied, in accordance with embodiments of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a job status prediction method in accordance with an embodiment of the present disclosure;
FIG. 3 schematically illustrates an architectural diagram of data acquisition according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a process diagram of predicting based on attribute information in accordance with an embodiment of the present disclosure;
FIG. 5 schematically illustrates a flow diagram for building a job exception repository in accordance with an embodiment of the present disclosure;
FIG. 6 schematically illustrates a data flow diagram for building a job exception knowledge base in accordance with an embodiment of the present disclosure;
FIG. 7 schematically illustrates a process diagram of determining a second predicted outcome based on keywords in a log, according to an embodiment of the disclosure;
FIG. 8 schematically illustrates a block diagram of a job status prediction apparatus according to an embodiment of the present disclosure;
Fig. 9 schematically illustrates a block diagram of an electronic device according to an embodiment of the disclosure; and
Fig. 10 schematically illustrates a schematic diagram of a computer-readable storage medium according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is only exemplary and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.
Where a formulation similar to at least one of "A, B or C, etc." is used, in general such a formulation should be interpreted in accordance with the ordinary understanding of one skilled in the art (e.g. "a system with at least one of A, B or C" would include but not be limited to systems with a alone, B alone, C alone, a and B together, a and C together, B and C together, and/or A, B, C together, etc.). The terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may include one or more features, either explicitly or implicitly.
With the current rapid development of big data technology, institutions in finance, the internet industry, and elsewhere have massive amounts of information to exchange and process. This information is carried in data, which grows geometrically along with it. For example, the transaction data generated daily by large banks reaches the TB or even PB level. This data needs to be collected for subsequent statistics, analysis, marketing, and so on, and batch jobs play a central role in moving and processing it.
During batch processing, the core of batch scheduling is ensuring that the batch jobs at each layer execute successfully, so that the jobs at the next layer of the pipeline can continue to run normally. A batch job interruption terminates this data processing. At present there is no very effective means of predicting job interruption; intervention happens manually only after the interruption, so timeliness is relatively low.
The job state prediction method, the job state prediction device and the electronic equipment provided by the embodiment of the disclosure comprise a multi-dimensional prediction process and a prediction result output process, wherein in the multi-dimensional prediction process, firstly, attribute information of a node for executing a job and a log associated with the job are collected, then, the attribute information of the node is processed by using a trained job state prediction model to obtain a first prediction result, and a second prediction result is determined based on at least one keyword extracted from the log. Entering a prediction result output process after the multi-dimensional prediction process is completed, and determining a job state based on the first prediction result, the first prediction result weight, the second prediction result and the second prediction result weight.
The operation state prediction method, the operation state prediction device and the electronic equipment provided by the embodiment of the disclosure can be used in the related aspect of operation state prediction in the field of artificial intelligence, and can also be used in various fields except the field of artificial intelligence, such as the financial field, and the application fields of the operation state prediction method, the operation state prediction device and the electronic equipment provided by the embodiment of the disclosure are not limited.
Fig. 1 schematically illustrates an exemplary system architecture to which job state prediction methods, apparatuses, and electronic devices may be applied, according to embodiments of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which embodiments of the present disclosure may be applied to assist those skilled in the art in understanding the technical content of the present disclosure, but does not mean that embodiments of the present disclosure may not be used in other devices, systems, environments, or scenarios.
As shown in fig. 1, a system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 may include a number of gateways, routers, hubs, network cables, etc. to provide a medium for communication links between the end devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user can interact with other terminal devices and the server 105 through the network 104 using the terminal devices 101, 102, 103 to receive or transmit information or the like, such as a model training instruction, a job state prediction instruction, image data, a job state prediction result, or the like. The terminal devices 101, 102, 103 may be installed with various communication client applications, such as, for example, a job status prediction class application, a banking class application, an electronic commerce class application, a web browser application, a search class application, an office class application, an instant messaging tool, a mailbox client, social platform software, etc. (by way of example only).
Terminal devices 101, 102, 103 include, but are not limited to, smart phones, desktop computers, augmented reality devices, tablet computers, remote video monitoring terminals, laptop computers, etc. electronic devices that can support job status prediction, human-computer interaction. The terminal device may have a neural network stored thereon for job status prediction.
Server 105 may receive model training requests, job state prediction requests, model download requests, etc., and process the requests. For example, server 105 may be a background management server, a server cluster, or the like. The background management server can analyze and process the received service request, information request and the like, and feed back processing results (such as a work state prediction result, model parameters obtained by training a model, feature vectors of object identification and the like) to the terminal equipment.
It should be noted that the neural network training method and the job state prediction method provided by the embodiments of the present disclosure may be executed by the server 105. Accordingly, the training neural network device and the operation state prediction device provided by the embodiments of the present disclosure may be provided in the server 105. Further, the job status prediction method may be executed by the terminal apparatuses 101, 102, 103. It should be understood that the number of terminal devices, networks and servers is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
FIG. 2 schematically illustrates a flow chart of a job status prediction method in accordance with an embodiment of the present disclosure.
As shown in fig. 2, the above method includes operations S201 to S203.
In operation S201, attribute information of a node for executing a job, and a log associated with the job are collected.
In this embodiment, attribute information collection may be performed by a collector provided in a node. The nodes may be servers, hosts, etc. The attribute information may be a value of a relevant quantitative performance index of the operating system and the cluster in real time, etc. For example, attribute information includes, but is not limited to: quantitative indexes such as Central Processing Unit (CPU) utilization rate, memory occupancy rate, SWAP exchange partition size, disk space utilization rate, TCP connection number, disk IOPS (hard disk performance index), java Virtual Machine (JVM) initial memory, JVM maximum memory, JVM new generation size, JVM old generation size, ORACLE current connection number, cluster disk capacity and the like.
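As a rough illustration of gathering a few of the quantitative indexes listed above on a single node, a stdlib-only Python sketch follows. The patent itself relies on collection agents such as FileBeat or Logstash rather than any particular API, and richer indexes (JVM, ORACLE, cluster capacity) would require such agents; the field names below are assumptions.

```python
import os
import shutil

def collect_node_attributes(path="/"):
    """Gather a small, illustrative subset of node-level quantitative indexes."""
    du = shutil.disk_usage(path)
    load1, _, _ = os.getloadavg()  # POSIX-only 1-minute load average
    return {
        "load_avg_1m": load1,
        "disk_total_bytes": du.total,
        "disk_usage_pct": round(du.used / du.total * 100, 2),
        "cpu_count": os.cpu_count(),
    }

print(collect_node_attributes())
```

In a real deployment each collector would ship these records into the message queue described below rather than print them.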
The log associated with the job may be a log generated during the job, such as a job scheduling log, an alarm log, a cluster log, and the like.
In operation S202, attribute information of the node is processed using the trained job status prediction model to obtain a first prediction result, and a second prediction result is determined based on at least one keyword extracted from the log.
In this embodiment, the attribute information of the node may be processed by using the trained job state prediction model to implement prediction based on the dimension of the node attribute information. The output of the job state prediction model may be a probability that the current job is abnormal (such as interrupted) during the processing.
The second prediction result may be predicted based on the keywords extracted from the log. For example, if a keyword is included in the log that characterizes the possible abnormal state of the job, a probability of an abnormal job at the current job may be given.
In operation S203, a job status is determined based on the first prediction result, the first prediction result weight, the second prediction result, and the second prediction result weight.
In this embodiment, depending on the scenario, some interruptions correlate more with the system performance characteristics of the job while others correlate more with the content reported in the log. Thus, the first prediction result weight and the second prediction result weight may be determined based on the characteristics of the job. For example, the ratio of the first prediction result weight to the second prediction result weight may be 1:1, less than 1, or greater than 1.
For example, weights W1 and W2 may be assigned to the prediction probabilities of the xgboost model and of the application log according to the characteristics of the job; the first prediction result P1 and the second prediction result P2 are normalized, and W1×P1 + W2×P2 is then used as the final prediction value.
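The weighted fusion W1×P1 + W2×P2 can be sketched as follows; the 1:2 weight ratio in the usage example is purely illustrative.

```python
def combine_predictions(p1, p2, w1, w2):
    """Weighted fusion of the model-based (p1) and log-based (p2) predictions.
    Weights are normalized so the final value stays a probability."""
    total = w1 + w2
    w1, w2 = w1 / total, w2 / total
    return w1 * p1 + w2 * p2

# A job whose interruptions historically correlate more with log content
# might weight the log-based result higher, e.g. W1:W2 = 1:2.
final = combine_predictions(p1=0.30, p2=0.90, w1=1.0, w2=2.0)
print(round(final, 4))  # 0.7
```

The final value can then be compared against a decision threshold to flag a likely interruption.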
According to the embodiment of the disclosure, the comprehensive prediction of the system performance and the application log weighting is adopted, so that the prediction of different types of jobs under different conditions is satisfied, and a better prediction effect is achieved.
In one embodiment, the nodes for executing a job include a bottom tier node, a middle tier node, and a top tier node, the middle tier node being associated with at least one bottom tier node.
Accordingly, collecting attribute information of a node for executing a job may include the following operations.
First, attribute information collected from the bottom tier nodes is sent to the middle tier nodes through a first Message Queue (MQ).
Then, the middle tier node stores the attribute information in the attribute data set so that the top tier node can read the attribute information of the nodes executing the job from the attribute data set. The middle tier may comprise multiple sub-layers; for a large bank, for example, it may include outlet, branch, and head-office levels. Both the server clusters of the branches and of the head office can act as a middle tier, since they aggregate the attribute information collected by the bottom-tier nodes.
For example, operating system and cluster quantitative indexes: the collected data mainly comprises quantitative indexes such as the system's current CPU utilization, memory occupancy, SWAP partition size, disk space utilization, TCP connection count, disk IOPS, JVM initial memory, JVM maximum memory, JVM new-generation size, JVM old-generation size, ORACLE current connection count, and cluster disk capacity. The collected data is then converted into a format that the job state prediction model can process.
Fig. 3 schematically illustrates an architectural diagram of data acquisition according to an embodiment of the present disclosure.
As shown in fig. 3, taking a large banking institution as an example, the performance data of each server may be collected by a client deployed on it (for example, the open-source tools FileBeat or Logstash). The client sends the collected data to the branch Storm layer through the branch MQ message queue; the branch Storm data is forwarded to the head-office Storm layer through the head-office MQ; and after the head-office Storm layer consumes the data produced by the branch MQ, it writes the data to Kafka and a MySQL database. The deployed xgboost data-collection tool then reads the real-time performance data from Kafka or MySQL and converts it into a processable text format. This design makes full use of Storm's low latency, high performance, distributed and scalable architecture, and good fault tolerance, and ensures that messages are not lost and are processed strictly in order.
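Downstream of this pipeline, the prediction tool consumes messages and turns them into model input rows. The following sketch assumes a JSON message shape with illustrative field names, mirroring the indexes above; the patent does not specify the wire format.

```python
import json

# Shape of a performance record as it might arrive from the message queue
# (field names are illustrative assumptions).
raw_message = json.dumps({
    "etl_job": "JOB_ACCT_SUM", "batch_date": "2021-03-31",
    "cpu_usage": 73.5, "mem_usage": 61.2, "tcp_conn": 482,
})

FEATURE_ORDER = ["cpu_usage", "mem_usage", "tcp_conn"]

def to_model_row(message: str):
    """Convert one consumed MQ/Kafka message into a numeric feature row."""
    rec = json.loads(message)
    return [float(rec[k]) for k in FEATURE_ORDER]

print(to_model_row(raw_message))  # [73.5, 61.2, 482.0]
```

A fixed feature order matters here: the prediction model expects the same column layout at training and inference time.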
In one embodiment, the set of attribute data includes at least one of job identification, lot date, run time information to determine input data for the job state prediction model from the set of attribute data based on the attribute information.
For example, the collected data structures are shown in table 1.
TABLE 1
In one embodiment, the job state prediction model includes a plurality of trees, the output of the job state prediction model is determined based on the weights and sub-output results of each of the plurality of trees, and the parameters of the job state prediction model include: the maximum depth of the tree, the learning rate, and the defined loss function.
For example, a work state prediction model is exemplarily described with xgboost as an example.
The key of the xgboost ensemble learning algorithm is: all sample features are split continuously, each split forms a tree, and K trees are obtained by training. When predicting the score of a sample, the sample falls into a corresponding leaf node of each tree according to its features; each leaf node corresponds to a score, and the scores of all trees are summed to obtain the sample's predicted value. In this patent, one or several performance parameters of system performance are learned each time, multiple trees are built from indexes such as CPU, memory, and TCP connection count, the results of the trees are then summed with comprehensive weights, and the best-predicting model is obtained by iteration. The formula is shown as formula (1).
where Ω(f) represents the complexity of each tree, T is the number of leaf nodes, γ and λ are coefficients, and "constant" is a constant term; the complexity of the constructed tree is related to the number of its leaf nodes and the weight value of each leaf node.
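Formula (1) itself did not survive extraction. A plausible reconstruction, following the standard xgboost objective and consistent with the symbols described above (K trees f_k, per-tree complexity with T leaf nodes, coefficients γ and λ, leaf weights w_j, and a constant term), is:

```latex
% Reconstruction of formula (1) from the standard xgboost objective.
\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \qquad
\mathcal{L} = \sum_{i} l\!\left(y_i, \hat{y}_i\right)
            + \sum_{k=1}^{K} \Omega(f_k) + \text{constant}, \qquad
\Omega(f) = \gamma T + \frac{1}{2}\,\lambda \sum_{j=1}^{T} w_j^{2}
```

Here l is the training loss over samples and Ω penalizes tree complexity, exactly the trade-off the surrounding text describes.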
The xgboost prediction process can be roughly divided into the following steps:
1. As mentioned above, the data read by xgboost is the data that the Storm layer, fed by the collection clients, outputs to Kafka, and the transmitted system performance indexes are all numeric. It is therefore unnecessary to map non-numeric types (for example with one-hot coding). The collected performance index data are all stored in the xgboost.csv file; within the time and date range of the jobs described above, 70% of the data is stored in a file named xgboost_train.csv as the training set and 30% in xgboost_test.csv as the test set.
For example, from the data set shown in Table 1, 30% may be extracted as training data and 70% as test data. The selected data is a system performance index data set covering about six months of log records; besides the system performance indexes, each job also has a corresponding batch date and time period, as shown in Table 2.
TABLE 2
To select the training and test data sets from the samples more scientifically, based on the data in the form of the table, (etl_job, batch_date) is taken as the primary key and the jobs are divided into three batches by (start time, end time); from each batch, 1/3 of the jobs are selected as the training set and the other 2/3 as the test set.
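The split keyed on (etl_job, batch_date) can be sketched as follows. This is a simplified stand-in: the grouping key and the 1/3 training fraction follow the text, while the ordering and batch handling are assumptions.

```python
from collections import defaultdict

def split_train_test(records, train_fraction=1 / 3):
    """Group records by the (etl_job, batch_date) primary key, then take
    train_fraction of the key groups as training data and the rest as test
    data (a simplified stand-in for the three-batch division)."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["etl_job"], rec["batch_date"])].append(rec)
    keys = sorted(groups)
    n_train = max(1, int(len(keys) * train_fraction))
    train = [r for k in keys[:n_train] for r in groups[k]]
    test = [r for k in keys[n_train:] for r in groups[k]]
    return train, test
```

Splitting by key group, rather than by raw row, keeps all records of one job run on the same side of the split, which avoids leaking a job's own history into its test predictions.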
A model is trained using the xgboost algorithm; with the trained model parameters, the system predicts whether a job will be interrupted after evaluating the CPU usage, memory occupancy, TCP connection count, JVM heap, and cluster disk capacity indexes in the actual real-time environment.
Fig. 4 schematically illustrates a process illustration of predicting based on attribute information according to an embodiment of the present disclosure.
As shown in fig. 4, first, the training data source is read from an upstream data store (e.g., Kafka, MySQL, etc.). Each row of records in the source data text may be stored in the order: CPU usage, memory occupancy, TCP connection count, JVM heap, cluster disk capacity, ORACLE current connection count, disk IOPS, JVM new-generation size.
Then the model is configured: xgboost has many trainable parameters, most of which have default values. The following main parameters are selected for tuning because they play a more important role in the model's predictions and noticeably improve its performance. It should be noted that prediction accuracy may be further improved by means of feature engineering, model combination, and the like.
Here max_depth is the maximum depth of the tree. eta is the learning rate; setting eta prevents overfitting by weakening the influence of each tree, avoiding the problem that later trees have no learning space. The loss function defined by objective is typically reg:linear (linear regression), reg:logistic (logistic regression), or binary:logistic (a two-class logistic regression problem, with probability output). For example, embodiments of the present disclosure may employ multi:softmax (multi-class classification). silent: the default value is 0; if set to 1, silent mode is enabled and the program prints nothing.
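The parameters discussed here can be collected into an xgboost-style parameter dictionary. The concrete values below are illustrative assumptions, not the patent's tuned settings; the training calls in the trailing comment show roughly how the dictionary would be used with the xgboost package.

```python
# xgboost-style parameter dictionary (values are illustrative assumptions).
params = {
    "max_depth": 6,                # maximum depth of each tree
    "eta": 0.1,                    # learning rate; damps each tree's influence
    "objective": "multi:softmax",  # multi-class objective, per the text
    "num_class": 2,                # e.g. normal vs. interrupted
    "silent": 0,                   # 0 = print messages (legacy parameter)
}

# With the xgboost package installed, training would look roughly like:
#   import xgboost as xgb
#   dtrain = xgb.DMatrix("xgboost_train.csv?format=csv&label_column=0")
#   model = xgb.train(params, dtrain, num_boost_round=100)
print(params["objective"])
```

Keeping the parameters in one dictionary makes it easy to log the exact configuration alongside each trained model version.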
Through the above steps in fig. 4, xgboost can be used to predict whether a job will be interrupted given the current system's CPU usage, memory occupancy, and other indicators. According to sample-data statistics, the model's current prediction probability is denoted P1.
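As a rough illustration, the parameter configuration and feature-row layout described above could be set up as follows (the feature names, boosting-round count, and the commented training call are illustrative stand-ins, not the embodiment's actual values):

```python
# Feature fields in the order stated above for each training record.
FEATURES = ["cpu_usage", "mem_occupancy", "tcp_conns", "jvm_heap",
            "cluster_disk", "oracle_conns", "disk_iops", "jvm_young_gen"]

# The main xgboost parameters selected for training, per the description.
params = {
    "max_depth": 6,                 # maximum depth of each tree (illustrative value)
    "eta": 0.1,                     # learning rate; shrinks each tree's influence
    "objective": "multi:softmax",   # multi-class objective used in this embodiment
    "silent": 0,                    # 0 = print progress, 1 = silent mode
}

# With the xgboost package installed, training would then look like:
#   import xgboost as xgb
#   dtrain = xgb.DMatrix(X_train, label=y_train, feature_names=FEATURES)
#   model = xgb.train(params, dtrain, num_boost_round=100)
#   p1 = model.predict(xgb.DMatrix(X_live))   # probability that the job breaks
print(params["objective"])
```

The concrete values would be tuned per deployment; only the parameter names come from the description above.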
In addition, in order to further improve the prediction accuracy of the job status, analysis may also be performed from the dimension of the log associated with the job.
For example, determining the second prediction result based on the at least one keyword extracted from the log may include: determining a second prediction result characterizing a job anomaly based on the result of matching the log's word segmentation against the job anomaly knowledge base. The job anomaly knowledge base may be a pre-built knowledge base.
For example, keywords in the job anomaly knowledge base have corresponding keyword weights, determined from the number of occurrences of each keyword in logs of abnormal job scenarios and the total number of feature words in the knowledge base, together with prediction attribute information including the number of successful matches and the prediction result attributes. The keyword weights in the job anomaly knowledge base may be normalized.
Specifically, the application side performs word segmentation analysis on the log and matches the result against the interrupt keywords in the existing job anomaly knowledge base. The table structure of the job anomaly knowledge base may be as shown in Table 3.
Table 3
In one embodiment, the job exception repository may be constructed as follows.
FIG. 5 schematically illustrates a flow chart for building a job exception repository in accordance with an embodiment of the present disclosure.
As shown in fig. 5, operations S501 to S505 may be included.
In operation S501, a first log word segmentation result for an abnormal job scenario and a second log word segmentation result for a normal job scenario are acquired.
FIG. 6 schematically illustrates a data flow diagram for building a job exception repository in accordance with an embodiment of the present disclosure.
As shown in fig. 6, log collection is performed first, comprising collection of stock (historical) logs and collection of real-time logs. Collection of stock histories mainly serves to build the job anomaly knowledge base for the job-interruption domain from historical data. Real-time logs are collected to judge whether a running job will be interrupted.
Log collection process for stock histories: job run history logs are collected from channels including the job scheduling log, alarm log, cluster log, and so on. After collection, punctuation marks are removed, function words are removed, stemming is applied, word segmentation is carried out with tools such as the Lucene or jieba segmenters, and the segmented jobs are separated according to whether the job was interrupted. For each job, the following processing steps are performed:
Interrupted-scenario job log segmentation: [{word 1, word 2, ...}, {word 1, word 2, ...}, ...]
Non-interrupted-scenario job log segmentation: [{word 1, word 2, ...}, {word 1, word 2, ...}, ...]
where each inner set holds the segmented words of one log sentence.
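The pre-processing steps above can be sketched as follows (the regex-based splitting, the tiny stop list, and the sample log line are illustrative stand-ins for the Lucene/jieba tooling):

```python
import re

# Illustrative stop list standing in for the removed function words.
STOP_WORDS = {"the", "a", "an", "of", "to", "is"}

def tokenize_log(text):
    """Split a raw log into per-sentence word sets: strip punctuation,
    lowercase, and drop function words, as in the description above."""
    sentences = [s for s in re.split(r"[.!?\n]+", text) if s.strip()]
    result = []
    for s in sentences:
        words = re.findall(r"[A-Za-z0-9_]+", s.lower())
        result.append({w for w in words if w not in STOP_WORDS})
    return result

failed_log = "Connection to node refused. Job aborted after timeout."
print(tokenize_log(failed_log))
```

Running this over interrupted and non-interrupted job logs separately yields the two segmentation lists shown above.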
In operation S502, for each log sentence of the abnormal job scenario, the sentence similarity between that log sentence and each of the log sentences of the normal job scenario is determined based on the first log word segmentation result and the second log word segmentation result.
For example, the log segmentations of the non-interrupted jobs are traversed, and the similarity between each segmented sentence and each segmented sentence of the interrupted job is computed. The results form the two-dimensional array shown in Table 4.
Table 4
Here S_MiNj is the sentence similarity between row i (the i-th sentence of the non-interrupted job) and column j (the j-th sentence of the interrupted job). The initial values of i and j are both 1.
In operation S503, candidate log sentences for the abnormal job scene whose sentence similarity is less than or equal to a preset similarity threshold are determined.
If the S_MiNj values between a sentence of the interrupted job and all sentences of the non-interrupted job are lower than the threshold δ, that interrupted sentence differs from the non-interrupted job's wording and is selected. This can be expressed as formula (2):

f(N_j) = 1, if S_MiNj < δ for every i

where f is a discrete indicator function that marks the j-th sentence N_j of the interrupted job as a candidate whenever each of its similarity values falls below the threshold δ.
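The selection rule of formula (2) can be sketched as follows (the word-overlap similarity and the δ value below are illustrative placeholders; the disclosed method computes S_MiNj with formula (3)):

```python
def select_candidates(failed_sents, succ_sents, sim, delta):
    """A sentence of the interrupted job becomes a candidate only if its
    similarity to every sentence of the non-interrupted job is below delta."""
    return [fs for fs in failed_sents
            if all(sim(fs, ss) < delta for ss in succ_sents)]

# Toy similarity (shared-word fraction) just to exercise the rule.
def overlap(a, b):
    return len(a & b) / max(len(a | b), 1)

failed = [{"job", "aborted", "timeout"}, {"job", "started"}]
succ = [{"job", "started"}, {"job", "finished"}]
print(select_candidates(failed, succ, overlap, delta=0.5))
```

Here the sentence {"job", "aborted", "timeout"} stays below the threshold against every normal sentence and is kept, while {"job", "started"} matches a normal sentence too closely and is dropped.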
In operation S504, feature words that exist in the abnormal job scenario and do not exist in the normal job scenario are acquired from the candidate log sentences based on the first log word segmentation result and the second log word segmentation result.
For sentence pairs whose similarity S_MiNj is above the threshold δ, the differing words can be screened out first. Such a word is one that exists in the sentences of the interrupted job but does not exist in the sentences of the non-interrupted job.
In operation S505, feature words are added to the job abnormality knowledge base.
In one embodiment, obtaining the feature words that exist in the abnormal job scenario but not in the normal job scenario from the candidate log sentences, based on the first log word segmentation result and the second log word segmentation result, may include the following operations.
Firstly, an abnormal operation information set and a normal operation information set are constructed, wherein the abnormal operation information set comprises operation attribute information of an abnormal operation scene and a first log word segmentation result, and the normal operation information set comprises operation attribute information of a normal operation scene and a second log word segmentation result.
Then, the segmented words in the abnormal job information set whose matching result against the normal job information set comes back empty are taken as feature words.
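A minimal sketch of this empty-match extraction, including the near-synonym check described later, might look like the following (the synonym table and sample words are hypothetical):

```python
def feature_words(abnormal_words, normal_words, synonyms=None):
    """Keep a word from the abnormal set when neither it nor any of its
    near-synonyms appears among the normal set's words (empty match)."""
    synonyms = synonyms or {}
    out = set()
    for w in abnormal_words:
        candidates = {w} | set(synonyms.get(w, ()))
        if not candidates & normal_words:   # match result is empty
            out.add(w)
    return out

print(feature_words({"job", "aborted", "halted"}, {"job", "finished"},
                    synonyms={"halted": ("stopped",)}))
```

"job" matches the normal set directly and is excluded; "aborted" and "halted" (whose hypothetical near-synonym "stopped" also finds no match) are kept as feature words.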
In one embodiment, the job interrupt table (JOBS_FAILED) is as shown in Table 5.
TABLE 5
For example, the job non-interrupt table (JOBS_SUCC) is shown in Table 6.
TABLE 6
The jobs are divided into these two tables according to whether they were interrupted. For example, the unmatched words can be selected with the following query:
SELECT t1.etl_job, t1.batch_date,
    CASE WHEN t1.word1 NOT IN (t2.word1, t2.word2, t2.word3, ..., t2.wordN) THEN t1.word1 END,
    CASE WHEN t1.word2 NOT IN (t2.word1, t2.word2, t2.word3, ..., t2.wordN) THEN t1.word2 END,
    CASE WHEN t1.word3 NOT IN (t2.word1, t2.word2, t2.word3, ..., t2.wordN) THEN t1.word3 END,
    ...
    CASE WHEN t1.wordK NOT IN (t2.word1, t2.word2, t2.word3, ..., t2.wordN) THEN t1.wordK END
FROM JOBS_FAILED t1
INNER JOIN JOBS_SUCC t2 ON t1.etl_job = t2.etl_job
Here K is the total number of segmented words in each record of the t1 table (JOBS_FAILED), and N is the total number of segmented words in each record of the t2 table (JOBS_SUCC). The selected words are then put through near-synonym matching, i.e., checking whether a near-synonym of each unmatched word of the interrupted job appears among the words of the corresponding non-interrupted job. Near-synonym matching is performed on the unmatched segmented words of the two sentences; if a word still fails to match, it is extracted as an unmatched word of the interrupted job's sentence.
For example, the similarity S_MiNj in the above embodiment may be calculated as formula (3):

S_MiNj = |{w_k : w_k ∈ S_i and w_k ∈ S_j}| / (log(|S_i|) + log(|S_j|))

where S_i and S_j are the two sentences and w_k is a word in a sentence. The numerator is the number of words that appear in both sentences at the same time, and the denominator is the sum of the logarithms of the two sentences' word counts; the log operation suppresses the advantage longer sentences would otherwise have in the similarity calculation.
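A direct transcription of formula (3), assuming each sentence has at least two words so the logarithms are positive:

```python
import math

def sentence_similarity(si, sj):
    """Formula (3): shared-word count over the sum of the logs of the
    sentence lengths; the log damps the advantage of long sentences."""
    shared = len(set(si) & set(sj))
    return shared / (math.log(len(si)) + math.log(len(sj)))

a = ["job", "aborted", "connection", "refused"]
b = ["job", "finished", "connection", "closed", "cleanly"]
print(round(sentence_similarity(a, b), 3))
```

Here the two sentences share two words ("job", "connection"), giving 2 / (log 4 + log 5).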
Through the above process, a subset of the interrupted jobs' sentences is obtained (the words in these sentences can be understood as being strongly characteristic of job interruption). These words are traversed once, and counts are accumulated per segmented word, yielding the results shown in Table 7.
Table 7

Segmented word    Count
WORD1             COUNT1
WORD2             COUNT2
WORD3             COUNT3
......            ......
The association relation then screens out words that appear only in the interrupt logs of the same job; such words are more discriminative when judging interruptions. (Word segmentation comparison is meaningful only between logs produced by the same job: a job's log output is largely driven by its own business logic, different jobs have different business logic, and whether the business-logic vocabulary of two different programs happens to coincide has nothing to do with interruption features.)
The words screened by these rules are deduplicated and stored in the job anomaly knowledge base. The structure fields of the knowledge base table JOB_KEYWORDS_LIST can be designed as: keyword unique number (KEY_ID), keyword (KEYWORD), keyword weight (WEIGHT), number of successful matches (SUCC_COUNT), and number of successful predictions (PREDICT_SUCC_COUNT). The initial values of the match-success and prediction-success counts are 0, and the keyword weight is calculated as

w_i = p_i / Σ_k p_k

where p_i is the count of segmented word i and the denominator is the total count of all segmented words screened for the interrupted jobs. The knowledge base data storage format is shown in Table 8.
TABLE 8
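The weight rule and field layout above can be sketched as follows (the field names follow the KEY_ID/KEYWORD/WEIGHT/SUCC_COUNT/PREDICT_SUCC_COUNT design; the word counts are illustrative):

```python
def build_knowledge_base(word_counts):
    """Turn screened word counts into knowledge-base rows: each weight is
    the word's count over the total count, so weights sum to 1 (normalized),
    and the match/prediction counters start at 0."""
    total = sum(word_counts.values())
    return [{"KEY_ID": i + 1,
             "KEYWORD": w,
             "WEIGHT": c / total,
             "SUCC_COUNT": 0,
             "PREDICT_SUCC_COUNT": 0}
            for i, (w, c) in enumerate(sorted(word_counts.items()))]

kb = build_knowledge_base({"aborted": 3, "timeout": 1})
print(kb)
```

With counts 3 and 1, "aborted" receives weight 0.75 and "timeout" 0.25, matching the normalization described above.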
The real-time log collection process is basically the same as that of the stock history logs: after collection, punctuation is removed, function words are removed, stemming is applied, and job interruption is predicted by matching the segmented words. At the same time, the match-success count (SUCC_COUNT) of each matched keyword in the knowledge base is incremented; when a prediction succeeds, the prediction-success count (PREDICT_SUCC_COUNT) is incremented, and when it fails, the prediction-failure count (PREDICT_FAIL_COUNT) is incremented.
TABLE 9
In one embodiment, after determining the second prediction result characterizing the job exception, the above method may further include the following operations.
First, keyword weights and prediction attribute information corresponding to each keyword in the job exception repository are updated.
And then, updating the keywords in the operation anomaly knowledge base in response to the number of prediction errors in the prediction attribute information being greater than or equal to a preset error number threshold.
Fig. 7 schematically illustrates a process of determining a second prediction result based on keywords in a log according to an embodiment of the present disclosure.
As shown in fig. 7, in the knowledge-base-building process: first, the segmented words of interrupted-job sentences (format per Table 5) and of non-interrupted-job sentences (format per Table 6) are determined. Sentence similarity is then compared based on the segmentation results. If the similarity is less than the preset threshold, the sentence is added to the interrupt candidate sentences. If the similarity is greater than or equal to the preset threshold, the differing vocabulary of the two sentences is checked for near-synonyms; if none are found, the sentence is also added to the candidates, and if a near-synonym exists, the sentence is discarded. Next, the vocabulary and its counts are tallied (per Table 7), stored, and the knowledge base (format per Table 8) is updated. In general, as predictions are made, the prediction-success count (PREDICT_SUCC_COUNT) and prediction-failure count (PREDICT_FAIL_COUNT) keep growing. By setting a threshold ε, when the prediction-failure count exceeds the threshold, the job interrupt logs in the candidate library are iteratively updated again according to the above process.
When predicting for a new job, the knowledge base may be used to analyze its log (per Table 9). The knowledge base may be iteratively updated whenever the prediction-failure count exceeds the threshold.
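A minimal sketch of matching a new job's log tokens against the knowledge base to obtain the log-side score (entry fields and sample weights are illustrative):

```python
def log_side_score(tokens, knowledge_base):
    """Match the new log's tokens against stored keywords, bump each
    matched keyword's SUCC_COUNT, and sum the matched weights into P2."""
    matched = [e for e in knowledge_base if e["KEYWORD"] in tokens]
    for e in matched:
        e["SUCC_COUNT"] += 1   # record a successful match
    return sum(e["WEIGHT"] for e in matched)

kb = [{"KEYWORD": "aborted", "WEIGHT": 0.6, "SUCC_COUNT": 0},
      {"KEYWORD": "timeout", "WEIGHT": 0.4, "SUCC_COUNT": 0}]
print(log_side_score({"job", "aborted", "now"}, kb))
```

Only "aborted" matches here, so P2 is that keyword's weight and its match counter advances to 1.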
A job initially maintains a real-time information table including CPU usage, memory occupancy, TCP connection count, log word 1, log word 2, and so on. The job's log segmentations (log word 1, log word 2, etc.) are matched against the existing keywords in the knowledge base, and the weights of the matched keywords are summed to give P2. Once the xgboost prediction probability P1 from system performance and the application-side log prediction probability P2 are obtained, some interruptions in the current job's profile turn out to be more related to system performance, while others relate more to log report content. Weights W1 and W2 can therefore be assigned to the xgboost and application-log predictions according to the job's characteristics; after normalizing P1 and P2, W1*P1 + W2*P2 is used as the final prediction result value.
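The final weighted fusion can be sketched as follows (the probability and weight values are illustrative; in practice W1 and W2 are assigned per job type):

```python
def final_prediction(p1, p2, w1, w2):
    """Combine the system-performance score P1 and the log-side score P2
    with per-job weights W1 and W2, assumed here to sum to 1."""
    assert abs(w1 + w2 - 1.0) < 1e-9
    return w1 * p1 + w2 * p2

# A job whose interruptions track system load more than log content:
score = final_prediction(p1=0.8, p2=0.4, w1=0.7, w2=0.3)
print(round(score, 2))
```

Weighting P1 more heavily here reflects a job whose breaks correlate with system performance; a log-driven job would flip the weights.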
According to the embodiments of the present disclosure, the interrupt-prediction problem for different types of jobs in current production is addressed by combining the ensemble learning algorithm xgboost with log semantic analysis into a comprehensive measure: xgboost focuses on interrupt prediction from the system-performance side, while log semantics focuses on prediction from application-side log analysis. In principle a job interruption relates both to system performance and to errors reported in the log, but with different weights of dependence. The present disclosure deliberately does not fold log-analysis features and system-performance indicators into a single feature set: first, this avoids the computational complexity brought by increased feature dimensionality; second, application-side logs are far more complex and variable than system performance indicators, and their form differs greatly from that of performance metrics. The weighted combination of system-performance and application-log predictions accommodates the prediction of different job types under different conditions, giving the system a better prediction effect.
Fig. 8 schematically shows a block diagram of a job status prediction apparatus according to an embodiment of the present disclosure.
As shown in fig. 8, the job status prediction apparatus 800 may include: an acquisition module 810, a processing module 820, and an output module 830.
The collection module 810 is used to collect attribute information of nodes used to execute the job, as well as logs associated with the job.
The processing module 820 is configured to process attribute information of the node using the trained job status prediction model to obtain a first prediction result, and determine a second prediction result based on at least one keyword extracted from the log.
The output module 830 is configured to determine a job status based on the first prediction result, the first prediction result weight, the second prediction result, and the second prediction result weight.
It should be noted that, the implementation manner, the solved technical problems, the realized functions and the obtained technical effects of each module/unit and the like in the apparatus portion embodiment are the same as or similar to the implementation manner, the solved technical problems, the realized functions and the obtained technical effects of each corresponding step in the method portion embodiment, and are not described in detail herein.
According to embodiments of the present disclosure, any number of the modules or units, or at least part of the functionality of any number of them, may be implemented in one module. Any one or more of the modules or units may be split into multiple modules. Any one or more of the modules or units may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on chip, a system on substrate, a system in package, or an Application Specific Integrated Circuit (ASIC), or in hardware or firmware in any other reasonable manner of integrating or packaging circuits, or in any one of, or a suitable combination of, software, hardware, and firmware. Alternatively, one or more of the modules or units may be at least partially implemented as computer program modules which, when executed, perform the corresponding functions.
For example, any of the acquisition module 810, the processing module 820, and the output module 830 may be combined and implemented in one module, or any of the modules may be split into a plurality of modules. Or at least some of the functionality of one or more of the modules may be combined with, and implemented in, at least some of the functionality of other modules. According to embodiments of the present disclosure, at least one of the acquisition module 810, the processing module 820, and the output module 830 may be implemented at least in part as hardware circuitry, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system-on-chip, a system-on-substrate, a system-on-package, an Application Specific Integrated Circuit (ASIC), or in hardware or firmware in any other reasonable manner of integrating or packaging the circuitry, or in any one of or a suitable combination of any of the three. Or at least one of the acquisition module 810, the processing module 820 and the output module 830 may be at least partially implemented as a computer program module which, when executed, performs the corresponding functions.
Fig. 9 schematically illustrates a block diagram of an electronic device according to an embodiment of the disclosure. The electronic device shown in fig. 9 is merely an example, and should not impose any limitations on the functionality and scope of use of embodiments of the present disclosure.
As shown in fig. 9, an electronic device 900 according to an embodiment of the present disclosure includes a processor 901 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 902 or a program loaded from a storage portion 908 into a Random Access Memory (RAM) 903. The processor 901 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. Processor 901 may also include on-board memory for caching purposes. The processor 901 may include a single processing unit or a plurality of processing units for performing different actions of the method flow according to the embodiment of the present disclosure, and the plurality of processing units may be integrated in one processor or may be distributed in a plurality of processors, which is not limited herein.
In the RAM 903, various programs and data necessary for the operation of the electronic device 900 are stored. The processor 901, the ROM 902, and the RAM 903 are communicatively connected to each other via a bus 904. The processor 901 performs various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM 902 and/or the RAM 903. Note that the program may be stored in one or more memories other than the ROM 902 and the RAM 903. The processor 901 may also perform various operations of the method flow according to embodiments of the present disclosure by executing programs stored in one or more memories.
According to an embodiment of the disclosure, the electronic device 900 may also include an input/output (I/O) interface 905, the input/output (I/O) interface 905 also being connected to the bus 904. The electronic device 900 may also include one or more of the following components connected to the I/O interface 905: an input section 906 including a keyboard, a mouse, and the like; an output portion 907 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and a speaker; a storage portion 908 including a hard disk or the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the internet. The drive 910 is also connected to the I/O interface 905 as needed. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed as needed on the drive 910 so that a computer program read out therefrom is installed into the storage section 908 as needed.
According to embodiments of the present disclosure, the method flow according to embodiments of the present disclosure may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from the network via the communication portion 909 and/or installed from the removable medium 911. The above-described functions defined in the system of the embodiments of the present disclosure are performed when the computer program is executed by the processor 901. The systems, devices, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.
The present disclosure also provides a computer-readable storage medium.
Fig. 10 schematically illustrates a schematic diagram of a computer-readable storage medium according to an embodiment of the present disclosure.
As shown in fig. 10, the computer-readable storage medium 1000 may be included in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium 1000 described above carries one or more programs, which when executed, implement methods according to embodiments of the present disclosure.
According to embodiments of the present disclosure, computer-readable storage medium 1000 may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, the computer-readable storage medium may include ROM 902 and/or RAM 903 and/or one or more memories other than ROM 902 and RAM 903 described above.
Embodiments of the present disclosure also include a computer program product comprising a computer program comprising program code for performing the methods provided by the embodiments of the present disclosure, the program code for causing an electronic device to implement the image model training method or the job state prediction method provided by the embodiments of the present disclosure when the computer program product is run on the electronic device.
The above-described functions defined in the system/apparatus of the embodiments of the present disclosure are performed when the computer program is executed by the processor 901. The systems, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.
In one embodiment, the computer program may be carried on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may also be transmitted and distributed over a network medium in the form of a signal, downloaded and installed via the communication portion 909, and/or installed from the removable medium 911. The computer program may include program code that may be transmitted using any appropriate network medium, including but not limited to wireless, wired, and the like, or any suitable combination of the foregoing.
According to embodiments of the present disclosure, program code for carrying out the computer programs provided by the embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, such computer programs may be implemented in high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. Programming languages include, but are not limited to, Java, C++, Python, "C", or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, via the Internet using an Internet service provider).
Those skilled in the art will appreciate that the features recited in the various embodiments of the disclosure and/or in the claims may be combined in various ways, even if such combinations are not explicitly recited in the disclosure. These examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used advantageously in combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be made by those skilled in the art without departing from the scope of the disclosure, and such alternatives and modifications are intended to fall within the scope of the disclosure.

Claims (10)

1. A job status prediction method, comprising:
collecting attribute information of a node for executing a job and a log associated with the job;
processing attribute information of the node using a trained job state prediction model to obtain a first prediction result, and determining a second prediction result based on at least one keyword extracted from the log,
Wherein said determining a second prediction result based on at least one keyword extracted from said log comprises:
determining a second predicted result representing the operation abnormality based on the matching result of the word segmentation result of the log in the operation abnormality knowledge base,
The operation anomaly knowledge base is constructed in the following mode:
acquiring a first log word segmentation result aiming at an abnormal operation scene and a second log word segmentation result aiming at a normal operation scene;
For each log sentence of the abnormal operation scene, determining the sentence similarity between the log sentence for the abnormal operation scene and each of all log sentences for the normal operation scene based on the first log word segmentation result and the second log word segmentation result;
determining candidate log sentences for the abnormal operation scene, wherein the sentence similarity is smaller than or equal to a preset similarity threshold value;
acquiring feature words which exist in an abnormal operation scene and do not exist in a normal operation scene from the candidate log sentences based on the first log word segmentation result and the second log word segmentation result; and
Adding the feature words to the job anomaly knowledge base,
Wherein the obtaining, based on the first log word segmentation result and the second log word segmentation result, feature words that exist in an abnormal job scenario and do not exist in a normal job scenario from the candidate log sentences includes:
constructing an abnormal operation information set and a normal operation information set, wherein the abnormal operation information set comprises operation attribute information of an abnormal operation scene and a first log word segmentation result, and the normal operation information set comprises operation attribute information of a normal operation scene and a second log word segmentation result; and
The word segmentation results in the abnormal operation information set whose near-synonym matching result against the normal operation information set is empty are taken as the feature words; and
The job status is determined based on the first predictor, a first predictor weight, the second predictor, and a second predictor weight.
2. The method of claim 1, wherein keywords in the job anomaly knowledge base have corresponding keyword weights, determined based on the number of occurrences of the keywords in logs for abnormal job scenarios and the total number of feature words in the job anomaly knowledge base, and prediction attribute information comprising: the number of successful matches and the prediction result attribute.
3. The method of claim 2, further comprising: after determining the second prediction result characterizing the job exception,
Updating keyword weight and prediction attribute information corresponding to each keyword in the operation anomaly knowledge base; and
And updating the keywords in the operation anomaly knowledge base in response to the number of prediction errors in the prediction attribute information being greater than or equal to a preset number of error threshold.
4. The method of claim 1, wherein the nodes for executing jobs comprise a bottom layer node, a middle layer node, and a top layer node, the middle layer node being associated with at least one bottom layer node;
The collecting attribute information of a node for executing a job includes:
Transmitting attribute information acquired from the bottom layer node to the middle layer node through a first message queue; and
The middle layer node stores the attribute information in an attribute data set so that the top layer node can read the attribute information of the node for executing the job from the attribute data set.
5. The method of claim 4, wherein the set of attribute data includes at least one of job identification, batch date, run time information to determine input data for the job status prediction model from the set of attribute data based on the attribute information.
6. The method of any of claims 1-5, wherein the job status prediction model comprises a plurality of trees, the output of the job status prediction model is determined based on the respective weights and sub-output results of the plurality of trees, and the parameters of the job status prediction model comprise a maximum tree depth, a learning rate, and a defined loss function.
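Claim 6 describes an ensemble whose output is a weighted sum of per-tree sub-outputs, in the style of gradient boosting (libraries such as XGBoost expose exactly a maximum depth, a learning rate, and an objective/loss). The hand-written depth-1 stumps below are placeholders, not a trained model.

```python
# Sketch of claim 6: the model output is the weighted sum of the
# sub-output results of a plurality of trees.

def ensemble_predict(x, trees, weights):
    """Weighted sum of per-tree outputs for a single input."""
    return sum(w * tree(x) for tree, w in zip(trees, weights))

# Two illustrative depth-1 "trees" (decision stumps) over one feature.
stump_a = lambda x: 1.0 if x > 0.5 else 0.0
stump_b = lambda x: 1.0 if x > 0.8 else 0.0

score = ensemble_predict(0.9, [stump_a, stump_b], [0.5, 0.5])
```

In a boosted model each tree would be fit to the residual of the previous ones and the learning rate would scale each tree's contribution; here the weights are simply given.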
7. A job status prediction apparatus comprising:
an acquisition module configured to collect attribute information of a node for executing a job and a log associated with the job;
a processing module configured to process attribute information of the node using a trained job status prediction model to obtain a first prediction result, and to determine a second prediction result based on at least one keyword extracted from the log,
wherein said determining a second prediction result based on at least one keyword extracted from said log comprises:
determining a second prediction result characterizing a job anomaly based on a matching result of the word segmentation result of the log against the job anomaly knowledge base,
wherein the job anomaly knowledge base is constructed as follows:
acquiring a first log word segmentation result for an abnormal job scenario and a second log word segmentation result for a normal job scenario;
for each log sentence of the abnormal job scenario, determining, based on the first log word segmentation result and the second log word segmentation result, the sentence similarity between that log sentence and each of the log sentences for the normal job scenario;
determining, as candidate log sentences for the abnormal job scenario, the log sentences whose sentence similarity is less than or equal to a preset similarity threshold;
acquiring, from the candidate log sentences and based on the first log word segmentation result and the second log word segmentation result, feature words that exist in the abnormal job scenario but not in the normal job scenario; and
adding the feature words to the job anomaly knowledge base,
wherein the acquiring, based on the first log word segmentation result and the second log word segmentation result, feature words that exist in the abnormal job scenario but not in the normal job scenario from the candidate log sentences comprises:
constructing an abnormal job information set and a normal job information set, wherein the abnormal job information set comprises job attribute information of the abnormal job scenario and the first log word segmentation result, and the normal job information set comprises job attribute information of the normal job scenario and the second log word segmentation result; and
using, as the feature words, word segmentation results in the abnormal job information set whose synonym matching results in the normal job information set are empty; and
an output module configured to determine the job status based on the first prediction result, a first prediction result weight, the second prediction result, and a second prediction result weight.
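The knowledge-base construction recited above (similarity filtering of abnormal log sentences, then feature words absent from the normal set) can be sketched as below. The Jaccard similarity measure, the 0.5 threshold, and the synonym table are assumptions: the claims fix neither the similarity metric nor the synonym source.

```python
# Hedged sketch of the job anomaly knowledge base construction:
# keep abnormal log sentences dissimilar to every normal sentence,
# then take their words whose synonym match in the normal set is empty.

def jaccard(a, b):
    """Assumed sentence similarity over word-segmentation results."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def extract_feature_words(abnormal_sentences, normal_sentences,
                          sim_threshold=0.5, synonyms=None):
    synonyms = synonyms or {}          # word -> list of synonyms (assumed)
    normal_words = {w for s in normal_sentences for w in s}
    feature_words = set()
    for sent in abnormal_sentences:
        # Candidate sentences: similarity to every normal sentence is
        # at or below the preset threshold.
        if all(jaccard(sent, n) <= sim_threshold for n in normal_sentences):
            for word in sent:
                # A word qualifies when neither it nor any synonym
                # occurs in the normal job information set.
                candidates = {word} | set(synonyms.get(word, []))
                if not candidates & normal_words:
                    feature_words.add(word)
    return feature_words
```

For segmented sentences such as `["job", "failed", "timeout"]` (abnormal) versus `["job", "finished", "ok"]` (normal), only "failed" and "timeout" survive as feature words.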
8. An electronic device, comprising:
One or more processors;
Storage means for storing executable instructions which, when executed by the processor, implement a job status prediction method according to any one of claims 1 to 6.
9. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, implement the job status prediction method according to any one of claims 1 to 6.
10. A computer program product comprising a computer program comprising executable instructions which, when executed by a processor, implement the job status prediction method according to any one of claims 1 to 6.
CN202110349909.8A 2021-03-31 Task state prediction method and device and electronic equipment Active CN112860652B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110349909.8A CN112860652B (en) 2021-03-31 Task state prediction method and device and electronic equipment


Publications (2)

Publication Number Publication Date
CN112860652A CN112860652A (en) 2021-05-28
CN112860652B true CN112860652B (en) 2024-07-09


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004192273A (en) * 2002-12-10 2004-07-08 Fuji Xerox Co Ltd Job management system
JP2009134441A (en) * 2007-11-29 2009-06-18 Canon Inc Job inspection system



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant