CN114676021A - Job log monitoring method and device, computer equipment and storage medium - Google Patents

Job log monitoring method and device, computer equipment and storage medium

Info

Publication number
CN114676021A
Authority
CN
China
Prior art keywords
job
log data
analyzed
log
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210456220.XA
Other languages
Chinese (zh)
Inventor
聂文俊
石雪
邱玉华
徐忠民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202210456220.XA
Publication of CN114676021A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3452 Performance evaluation by statistical analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466 Performance evaluation by tracing or monitoring

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application relates to a job log monitoring method and device, computer equipment, and a storage medium. The method comprises: acquiring job log data to be analyzed from a job log center, the job log data to be analyzed being data obtained by processing log data generated while a target job runs; inputting the job log data to be analyzed into a log analysis model to obtain a job log data analysis result, and judging whether the target job is interrupted based on the job log data analysis result to obtain a first judgment result, the log analysis model being a multi-layer bidirectional gated recurrent unit (GRU) network; if the first judgment result is that the target job is not interrupted, inputting the job log data to be analyzed into an alarm tracking model to obtain a second judgment result corresponding to the analysis result of the uninterrupted job log data; and if the second judgment result is an abnormal alarm, outputting abnormal job data for the target job. The method can improve monitoring efficiency and reduce the recognition error rate.

Description

Job log monitoring method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method and an apparatus for monitoring a job log, a computer device, and a storage medium.
Background
With the development of artificial intelligence technology, cloud technology has emerged. Cloud technology is a hosting technology that unifies hardware, software, network, and other resources within a wide area network or a local area network in order to compute, store, process, and share data.
In the conventional technology, the manual or semi-automatic operation and maintenance mode is increasingly unable to meet requirements as data volume grows. When existing log monitoring and analysis is applied to large batches of logs, it mostly still relies on passive, time-consuming operation and maintenance approaches such as watching for error reports and keyword retrieval, so monitoring efficiency is low and the recognition error rate is high.
Disclosure of Invention
In view of the foregoing, there is a need to provide a job log monitoring method, apparatus, computer device, computer-readable storage medium, and computer program product based on an attention mechanism and a neural network that can provide abnormality alarms.
In a first aspect, the present application provides a job log monitoring method. The method comprises: acquiring job log data to be analyzed from a job log center, the job log data to be analyzed being data obtained by processing log data generated while a target job runs; inputting the job log data to be analyzed into a log analysis model to obtain a job log data analysis result, and judging whether the target job is interrupted based on the job log data analysis result to obtain a first judgment result, the log analysis model being a multi-layer bidirectional gated recurrent unit (GRU) network; if the first judgment result is that the target job is not interrupted, inputting the job log data to be analyzed into an alarm tracking model to obtain a second judgment result corresponding to the analysis result of the uninterrupted job log data; and if the second judgment result is an abnormal alarm, outputting abnormal job data for the target job.
In one embodiment, inputting the job log data to be analyzed into the log analysis model to obtain the job log data analysis result, and judging whether the target job is interrupted based on the job log data analysis result to obtain the first judgment result, includes: acquiring the log data record corresponding to the job name of the target job, and the start time and end time corresponding to the log data record; inputting the log data record corresponding to the job name, together with the start time and end time corresponding to the log data record, into the log analysis model to obtain the job log data analysis result; and judging whether the target job is interrupted based on the job log data analysis result to obtain the first judgment result.
In one embodiment, inputting the log data record corresponding to the job name, together with the start time and end time corresponding to the log data record, into the log analysis model to obtain the job log data analysis result includes: calculating, based on the log data record corresponding to the job name, the job success rate and the historical problem-repair duration for the time period corresponding to the job name; obtaining the run duration corresponding to the log data record based on the start time and end time corresponding to the log data record; and inputting the job success rate, the historical problem-repair duration, and the run duration corresponding to the log data record into the log analysis model to obtain the job log data analysis result.
In one embodiment, if the first judgment result indicates that the target job is not interrupted, inputting the job log data to be analyzed into the alarm tracking model to obtain the second judgment result corresponding to the analysis result of the uninterrupted job log data includes: acquiring the table name or file name corresponding to the job log data to be analyzed, where the table name or file name reflects the load condition of the job log data to be analyzed; and inputting the table name or file name corresponding to the job log data to be analyzed into the alarm tracking model to obtain the second judgment result corresponding to the analysis result of the uninterrupted job log data, where the second judgment result indicates whether the target job has an abnormal alarm.
In one embodiment, inputting the table name or file name of the job log data to be analyzed into the alarm tracking model to obtain the second judgment result corresponding to the analysis result of the uninterrupted job log data includes: obtaining the data volume corresponding to the job log data to be analyzed according to the table name or file name of the job log data to be analyzed; and inputting the start time, the run duration, the data volume corresponding to the job log data to be analyzed, and the environment information into an abnormal state calculation formula to obtain the second judgment result corresponding to the analysis result of the uninterrupted job log data.
In one embodiment, the method further comprises: inputting the job log data to be analyzed, output for the target job corresponding to the abnormal alarm, into a tracking algorithm model to obtain a third judgment result; if the third judgment result is abnormal, updating the corresponding job data in an abnormal job information table, the abnormal job information table being used to record abnormal information corresponding to the target job; and if the third judgment result is normal, repeating the step of inputting the job log data to be analyzed, output for the target job corresponding to the abnormal alarm, into the tracking algorithm model, and recording the number of times the step is repeated.
In one embodiment, repeating the step of inputting the job log data to be analyzed, output for the target job corresponding to the abnormal alarm, into the tracking algorithm model to obtain a fourth judgment result includes: if the number of times is less than a preset count threshold, outputting the job log data to be analyzed again for the target job and judging whether the job log data to be analyzed is abnormal; and if the number of times is greater than or equal to the count threshold, removing the abnormal monitoring state corresponding to the target job and outputting normal job information.
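The tracking behaviour in the two embodiments above can be read as a bounded re-check loop. The following is a minimal sketch under stated assumptions, not the claimed implementation: `fetch_log`, `is_abnormal`, `update_abnormal_table`, and `threshold` are hypothetical names standing in for the re-output of job log data, the tracking algorithm model, the abnormal job information table update, and the preset count threshold. Returning as soon as an abnormal result is recorded is also an assumption, since the text does not say what follows the table update.

```python
from typing import Any, Callable


def track_alarmed_job(fetch_log: Callable[[], Any],
                      is_abnormal: Callable[[Any], bool],
                      update_abnormal_table: Callable[[Any], None],
                      threshold: int) -> str:
    """Re-check a job that raised an abnormal alarm.

    Returns "abnormal" once the tracking model confirms the abnormality,
    or "normal" after `threshold` consecutive normal results.
    """
    normal_count = 0  # number of times the re-check step has been repeated
    while normal_count < threshold:
        log_data = fetch_log()  # job log data output again for the target job
        if is_abnormal(log_data):
            # Abnormal: update the job's entry in the abnormal job information table.
            update_abnormal_table(log_data)
            return "abnormal"  # assumption: stop after recording the abnormality
        normal_count += 1
    # Count reached the threshold: clear the abnormal monitoring state.
    return "normal"
```

Passing the model and table update in as callables keeps the sketch independent of how the tracking algorithm model is actually realised.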
In a second aspect, the present application further provides a job log monitoring apparatus. The apparatus comprises: a to-be-analyzed job log data acquisition module, configured to acquire job log data to be analyzed from a job log center, the job log data to be analyzed being data obtained by processing log data generated while a target job runs; a first judgment result obtaining module, configured to input the job log data to be analyzed into a log analysis model to obtain a job log data analysis result, and judge whether the target job is interrupted based on the job log data analysis result to obtain a first judgment result, the log analysis model being a multi-layer bidirectional gated recurrent unit (GRU) network; and an abnormal alarm module, configured to input the job log data to be analyzed into an alarm tracking model if the first judgment result indicates that the target job is not interrupted, to obtain a second judgment result corresponding to the analysis result of the uninterrupted job log data, and, if the second judgment result is an abnormal alarm, to output abnormal job data for the target job.
In one embodiment, the first judgment result obtaining module is configured to acquire the log data record corresponding to the job name of the target job, and the start time and end time corresponding to the log data record; input the log data record corresponding to the job name, together with the start time and end time corresponding to the log data record, into the log analysis model to obtain the job log data analysis result; and judge whether the target job is interrupted based on the job log data analysis result to obtain the first judgment result.
In one embodiment, the first judgment result obtaining module is configured to calculate, based on the log data record corresponding to the job name, the job success rate and the historical problem-repair duration for the time period corresponding to the job name; obtain the run duration corresponding to the log data record based on the start time and end time corresponding to the log data record; and input the job success rate, the historical problem-repair duration, and the run duration corresponding to the log data record into the log analysis model to obtain the job log data analysis result.
In one embodiment, the second judgment result obtaining module is configured to acquire the table name or file name corresponding to the job log data to be analyzed, where the table name or file name reflects the load condition of the job log data to be analyzed; and input the table name or file name corresponding to the job log data to be analyzed into the alarm tracking model to obtain the second judgment result corresponding to the analysis result of the uninterrupted job log data, where the second judgment result indicates whether the target job has an abnormal alarm.
In one embodiment, the second judgment result obtaining module is configured to obtain the data volume corresponding to the job log data to be analyzed according to the table name or file name of the job log data to be analyzed; and input the start time, the run duration, the data volume corresponding to the job log data to be analyzed, and the environment information into an abnormal state calculation formula to obtain the second judgment result corresponding to the analysis result of the uninterrupted job log data.
In one embodiment, the tracking module is configured to input the job log data to be analyzed, output for the target job corresponding to the abnormal alarm, into a tracking algorithm model to obtain a third judgment result; if the third judgment result is abnormal, update the corresponding job data in an abnormal job information table, the abnormal job information table being used to record abnormal information corresponding to the target job; and if the third judgment result is normal, repeat the step of inputting the job log data to be analyzed, output for the target job corresponding to the abnormal alarm, into the tracking algorithm model, and record the number of times the step is repeated.
In one embodiment, the tracking module is configured to, if the number of times is less than a preset count threshold, output the job log data to be analyzed again for the target job and judge whether the job log data to be analyzed is abnormal; and, if the number of times is greater than or equal to the count threshold, remove the abnormal monitoring state corresponding to the target job and output normal job information.
In a third aspect, the present application further provides a computer device. The computer device comprises a memory storing a computer program and a processor that implements the following steps when executing the computer program: acquiring job log data to be analyzed from a job log center, the job log data to be analyzed being data obtained by processing log data generated while a target job runs; inputting the job log data to be analyzed into a log analysis model to obtain a job log data analysis result, and judging whether the target job is interrupted based on the job log data analysis result to obtain a first judgment result, the log analysis model being a multi-layer bidirectional gated recurrent unit (GRU) network; if the first judgment result is that the target job is not interrupted, inputting the job log data to be analyzed into an alarm tracking model to obtain a second judgment result corresponding to the analysis result of the uninterrupted job log data; and if the second judgment result is an abnormal alarm, outputting abnormal job data for the target job.
In a fourth aspect, the present application further provides a computer-readable storage medium. The computer-readable storage medium stores a computer program which, when executed by a processor, performs the following steps: acquiring job log data to be analyzed from a job log center, the job log data to be analyzed being data obtained by processing log data generated while a target job runs; inputting the job log data to be analyzed into a log analysis model to obtain a job log data analysis result, and judging whether the target job is interrupted based on the job log data analysis result to obtain a first judgment result, the log analysis model being a multi-layer bidirectional gated recurrent unit (GRU) network; if the first judgment result is that the target job is not interrupted, inputting the job log data to be analyzed into an alarm tracking model to obtain a second judgment result corresponding to the analysis result of the uninterrupted job log data; and if the second judgment result is an abnormal alarm, outputting abnormal job data for the target job.
In a fifth aspect, the present application further provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, performs the following steps: acquiring job log data to be analyzed from a job log center, the job log data to be analyzed being data obtained by processing log data generated while a target job runs; inputting the job log data to be analyzed into a log analysis model to obtain a job log data analysis result, and judging whether the target job is interrupted based on the job log data analysis result to obtain a first judgment result, the log analysis model being a multi-layer bidirectional gated recurrent unit (GRU) network; if the first judgment result is that the target job is not interrupted, inputting the job log data to be analyzed into an alarm tracking model to obtain a second judgment result corresponding to the analysis result of the uninterrupted job log data; and if the second judgment result is an abnormal alarm, outputting abnormal job data for the target job.
According to the above job log monitoring method, apparatus, computer device, storage medium, and computer program product, job log data to be analyzed is acquired from the job log center, the job log data to be analyzed being data obtained by processing log data generated while a target job runs; the job log data to be analyzed is input into a log analysis model to obtain a job log data analysis result, and whether the target job is interrupted is judged based on the job log data analysis result to obtain a first judgment result, the log analysis model being a multi-layer bidirectional gated recurrent unit (GRU) network; if the first judgment result is that the target job is not interrupted, the job log data to be analyzed is input into an alarm tracking model to obtain a second judgment result corresponding to the analysis result of the uninterrupted job log data; and if the second judgment result is an abnormal alarm, abnormal job data is output for the target job.
Acquiring the job log data to be analyzed from the job log center allows the data to be obtained from the terminal accurately and quickly, so that the job log center and the analysis system are closely matched and the stability of both systems is improved. Obtaining the job log data analysis result through the log analysis model, and deriving the first judgment result from it, subjects the job log data to be analyzed to a multi-step analysis that yields a credible result; every item of data goes through the same analysis process, so the analysis is uniform, and the judgment result derived from it inherits these properties, which improves the efficiency and accuracy of analyzing the job log data to be analyzed. Applying a second judgment to the job log data whose first judgment result is "not interrupted", to decide between a normal job and an abnormal alarm, classifies uninterrupted jobs more finely and prevents defective data from flowing into the next step, which further improves both the accuracy and the efficiency of the system implementing the method.
The intelligent log monitoring system based on the attention mechanism and the neural network addresses the problems of real-time log monitoring and intelligent log analysis and processing in existing operation and maintenance systems, and can effectively reduce the cost of manual analysis and monitoring.
Drawings
FIG. 1 is a diagram of an application environment of a job log monitoring method in one embodiment;
FIG. 2 is a flowchart of a method for monitoring job logs, according to one embodiment;
FIG. 3 is a flowchart illustrating a method for monitoring job logs according to another embodiment;
FIG. 4 is a flowchart illustrating a method for monitoring job logs according to another embodiment;
FIG. 5 is a flowchart illustrating a method for monitoring and alarming job logs according to an embodiment;
FIG. 6 is a flowchart illustrating a method for monitoring and alarming job logs according to another embodiment;
FIG. 7 is a flowchart illustrating a method for monitoring and tracking job logs according to an embodiment;
FIG. 8 is a flowchart illustrating a method for monitoring and tracking job logs according to another embodiment;
FIG. 9 is a schematic diagram of a deep bidirectional GRU network model based on an attention mechanism in one embodiment;
FIG. 10 is a logic diagram of an alarm model in one embodiment;
FIG. 11 is a logic diagram of a tracking model in one embodiment;
FIG. 12 is a flow diagram of an intelligent log monitoring system based on an attention mechanism and a neural network in one embodiment;
FIG. 13 is a block diagram showing the structure of a job log monitoring apparatus in one embodiment;
FIG. 14 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit it.
The job log monitoring method provided by the embodiments of the application can be applied to the application environment shown in FIG. 1. The terminal 102 acquires data; the server 104, in response to an instruction from the terminal 102, receives the data from the terminal 102 and performs calculations on it; the server 104 then transmits the calculation result back to the terminal 102, which displays it. The terminal 102 communicates with the server 104 via a network. A data storage system may store the data that the server 104 needs to process; it may be integrated on the server 104, or located on the cloud or another network server. The server 104 acquires job log data to be analyzed of a job log center from the terminal 102, the job log data to be analyzed being data obtained by processing log data generated while a target job runs; inputs the job log data to be analyzed into a log analysis model to obtain a job log data analysis result, and judges whether the target job is interrupted based on the job log data analysis result to obtain a first judgment result, the log analysis model being a multi-layer bidirectional gated recurrent unit (GRU) network; if the first judgment result is that the target job is not interrupted, inputs the job log data to be analyzed into an alarm tracking model to obtain a second judgment result corresponding to the analysis result of the uninterrupted job log data; and if the second judgment result is an abnormal alarm, outputs abnormal job data for the target job.
The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smart phone, tablet computer, internet of things device, or portable wearable device. The internet of things device may be a smart speaker, smart television, smart air conditioner, smart vehicle-mounted device, or the like. The portable wearable device may be a smart watch, smart bracelet, head-mounted device, or the like. The server 104 may be implemented as a stand-alone server or as a server cluster composed of multiple servers.
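The two-stage flow the server carries out can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the log analysis model and the alarm tracking model are represented by the hypothetical callables `is_interrupted` and `raises_alarm`, and the three status strings are illustrative labels.

```python
from typing import Callable, Dict

# Hypothetical interfaces: the source does not define concrete model APIs.
LogData = Dict[str, object]
FirstStage = Callable[[LogData], bool]   # True if the target job was interrupted
SecondStage = Callable[[LogData], bool]  # True if an abnormal alarm should be raised


def monitor_job(log_data: LogData,
                is_interrupted: FirstStage,
                raises_alarm: SecondStage) -> str:
    """Apply the first judgment, then the second judgment only to uninterrupted jobs."""
    if is_interrupted(log_data):
        return "interrupted"      # first judgment result: job was interrupted
    if raises_alarm(log_data):
        return "abnormal_alarm"   # second judgment result: abnormal alarm
    return "normal"
```

The second stage is only evaluated when the first stage reports no interruption, which mirrors the ordering of the two judgments in the method.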
In one embodiment, as shown in fig. 2, a method for monitoring a job log is provided, which is described by taking the method as an example applied to the server in fig. 1, and includes the following steps:
Step 202, obtaining job log data to be analyzed of the job log center.
The job log center may be a system for running target jobs. The system generates data related to job logs; by analyzing this data, the specific situation of each target job of a different nature in the job log center can be understood, and the job log center can be adjusted accordingly.
The job log data to be analyzed may be data obtained by processing the log data that the target job generates while running. It includes various kinds of data, for example log data records, table names, start times, and end times. The processing that produces the job log data to be analyzed from the target job's logs may be key information extraction using a neural network with an attention mechanism.
Specifically, the server responds to an instruction from the terminal, obtains from the terminal the unanalyzed job log data corresponding to at least one log center, and extracts key information from the unanalyzed job log data through an attention-mechanism neural network to obtain the job log data to be analyzed. The job log data obtained by the terminal from the log center may be real-time or quasi-real-time, where the quasi-real-time delay does not exceed 1 minute. Over the whole acquisition process, a large number of job run logs in the log center can be passed through the attention-mechanism neural network to extract the key information of the logs. The extracted key information includes the start time and end time of the job (the end time being the time point of either a normal end or an error interruption), the table name or file name of the data source, the name of the generated file or of the table written to, the name of the previous job that this job depends on, the error keyword or log segment (empty if the job ran without errors), the availability of the server, and the availability of the database. After acquiring the data, the server stores it in a storage unit of the server and, when the data is needed, loads it from the storage unit into the central processing unit of the server for calculation.
For example, job log center A generates unanalyzed job log data corresponding to target job B. After detecting the unanalyzed job log data, the terminal inputs it into the server, where a pre-trained attention-mechanism neural network extracts the key log information to obtain the job log data C to be analyzed. Data that needs to be calculated on is loaded directly into volatile memory, and data that does not need to be processed for the time being is written to non-volatile memory for storage.
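One way to picture the extracted key information is as a single record per job run. The sketch below is illustrative only: the class name `JobLogRecord` and all field names are hypothetical, chosen to mirror the items listed in the text; the source does not specify a storage schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class JobLogRecord:
    """Key information extracted from one job run log (field names are hypothetical)."""
    job_name: str
    start_time: datetime
    end_time: datetime               # time point of normal end or error interruption
    source_name: str                 # table name or file name of the data source
    output_name: str                 # generated file name or written table name
    upstream_job: str                # name of the previous job this job depends on
    error_segment: str = ""          # error keyword or log segment; empty if no error
    server_available: bool = True
    database_available: bool = True

    @property
    def run_seconds(self) -> float:
        """Run duration derived from the start and end times."""
        return (self.end_time - self.start_time).total_seconds()
```

Deriving the run duration from the stored start and end times, rather than storing it separately, keeps the record consistent with how the run duration is obtained later in the method.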
Step 204, inputting the job log data to be analyzed into the log analysis model to obtain a job log data analysis result, and judging whether the target job is interrupted based on the job log data analysis result to obtain a first judgment result.
The log analysis model may be an artificial intelligence model (a multi-layer bidirectional gated recurrent unit network) for analyzing the job log data to be analyzed. The artificial intelligence model may be an analytical, functional, interactive, text, or visual artificial intelligence model; the server selects one type of model automatically according to the type of the data and the requirements of the target task, and the model may also be selected through manual intervention.
The job log data analysis result may be a result obtained by analyzing the job log data to be analyzed by the artificial intelligence model, and the job log data analysis result includes various data, for example: running time, file size, file transcoding, transmission time and other information.
The first judgment result may be the result output by the judgment unit of the server when judging the job log data analysis result. The first judgment result may be of several types, such as normal job, abnormal job, or interrupted job, and the judgment results used in the judgment unit may be adjusted according to the actual situation.
Specifically, the job log data to be analyzed is input into the log analysis model. The log analysis model looks up records in a database by job name and calculates the success rate of the job over a recent period of time and the historical problem-repair duration; obtains the run duration from the start time and the end time (or the time point of interruption); obtains the volume of processed data from the table name or file name of the data source; and obtains the file size and the file transcoding and transmission durations from the generated file name. A judgment is then made on the job data produced by the preceding steps to obtain the first judgment result. First, it is determined whether the job was interrupted; if so, the job is labeled as interrupted. Otherwise, an abnormal state calculation formula is applied to the start time, the run duration, the data volume, and the environment information (the availability of the server and the running state of the previous job) to label the job either as normal or as uninterrupted but with abnormal running information. Jobs labeled as normal are classified as normal jobs; jobs labeled as interrupted, or as uninterrupted but with abnormal running information, are classified as abnormal jobs.
The model is a deep bidirectional GRU (DBGRU) network model based on an attention mechanism. It is composed of an input layer, a DBGRU layer, a fully connected layer, and an output layer. The DBGRU layer is formed by interleaving and stacking three BGRU network layers with three attention layers, and each layer has 128 neural units. The fully connected layer consists of two fully connected networks: the first has 64 neurons and the second has 32 neurons. The job run log is input into the model and, after passing through the DBGRU layer and the fully connected layer, the output is an error flag (0 = normal, 1 = error), an analysis of the cause of the error, and a proposed solution. A schematic diagram of the deep bidirectional GRU network model based on the attention mechanism is shown in FIG. 9.
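The feature calculations and the final classification described above can be sketched as follows. This is a minimal sketch under stated assumptions: the history layout `(end_time, succeeded)`, the `window` parameter standing in for "a recent period of time", and the label strings are all hypothetical. The abnormal state calculation formula itself is not given in the text, so it is not reproduced here.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple

# Hypothetical history layout: one (end_time, succeeded) pair per past run of a job.
History = Iterable[Tuple[datetime, bool]]


def success_rate(history: History, now: datetime, window: timedelta) -> Optional[float]:
    """Fraction of runs that succeeded within the recent window, or None if there were none."""
    recent = [ok for end, ok in history if now - window <= end <= now]
    if not recent:
        return None
    return sum(recent) / len(recent)


def run_duration_seconds(start: datetime, end: datetime) -> float:
    """Run duration from the start time to the end (or interruption) time."""
    return (end - start).total_seconds()


def classify_job(label: str) -> str:
    """Map a label to the final class: only 'normal' counts as a normal job."""
    return "normal job" if label == "normal" else "abnormal job"
```

For instance, `classify_job("interrupted")` and `classify_job("uninterrupted_abnormal")` both fall into the abnormal class, matching the rule that interrupted jobs and uninterrupted jobs with abnormal running information are treated together.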
The training process is as follows:
The GRU network stores and updates history information through an update gate $z^{(t)}$ and a reset gate $r^{(t)}$. Here $x$ is the input data and $h$ is the cell output state of the GRU.
$$z^{(t)} = \sigma\left(W_{xz} x^{(t)} + W_{hz} h^{(t-1)} + b_z\right)$$
Wherein: $z^{(t)}$ is the current state value of the update gate; it is determined by the current input data $x^{(t)}$ and the output $h^{(t-1)}$ of the GRU at the previous time; $W_{xz}$ and $W_{hz}$ are the corresponding update gate weights, and $b_z$ is a bias parameter. The update gate controls the extent to which the state information of the previous time is carried into the current state; a larger update gate value means that more state information of the previous time is carried in.
$$r^{(t)} = \sigma\left(W_{xr} x^{(t)} + W_{hr} h^{(t-1)} + b_r\right)$$
Wherein: $r^{(t)}$ is the current state value of the reset gate; it is determined by the current input data $x^{(t)}$ and the output $h^{(t-1)}$ of the GRU at the previous time; $W_{xr}$ and $W_{hr}$ are the corresponding reset gate weights, and $b_r$ is a bias parameter. The reset gate controls the extent to which the state information of the previous time is ignored; a smaller reset gate value means that more is ignored.
Figure BDA0003620556720000091
Wherein: the quantity shown in Figure BDA0003620556720000092 is the hidden state value of the memory cell at the current time; this value (Figure BDA0003620556720000093) is determined by the current input data $x^{(t)}$ and by the product of the current reset gate $r^{(t)}$ and the output $h^{(t-1)}$ of the GRU at the previous time; the quantities in Figure BDA0003620556720000094 are the corresponding weights, and the quantity in Figure BDA0003620556720000095 is a bias parameter.
Figure BDA0003620556720000096
Wherein: $h^{(t)}$ is the output value of the GRU unit at time t.
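The gate equations can be traced numerically with a minimal pure-Python sketch of a single GRU step. The candidate-state and output equations are available here only as formula images, so the sketch assumes the widely used GRU form: a tanh candidate state computed from the input and the reset-gated previous state, and an output that mixes the previous state and the candidate through the update gate. The mixing direction is chosen to agree with the statement that a larger update gate value carries in more of the previous state; the patent's own formula may differ. The names `gru_step`, `matvec` and the parameter keys (`W_xh`, `W_hh`, `b_h` in particular) are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def gate(W_x, W_h, b, x, h, activation):
    # activation(W_x x + W_h h + b), applied element-wise.
    return [activation(a + c + d)
            for a, c, d in zip(matvec(W_x, x), matvec(W_h, h), b)]


def gru_step(x, h_prev, p):
    """One GRU step; p is a dict of weights and biases (hypothetical key names)."""
    # Update gate z(t) and reset gate r(t), as in the two gate equations.
    z = gate(p["W_xz"], p["W_hz"], p["b_z"], x, h_prev, sigmoid)
    r = gate(p["W_xr"], p["W_hr"], p["b_r"], x, h_prev, sigmoid)
    # Assumed candidate state: tanh(W_xh x + W_hh (r * h_prev) + b_h).
    gated = [ri * hi for ri, hi in zip(r, h_prev)]
    h_cand = gate(p["W_xh"], p["W_hh"], p["b_h"], x, gated, math.tanh)
    # Assumed output: larger z keeps more of the previous state.
    return [zi * hp + (1.0 - zi) * hc
            for zi, hp, hc in zip(z, h_prev, h_cand)]
```

With all weights and biases set to zero, both gates equal 0.5 and the candidate state is zero, so the step simply halves the previous state, which makes the role of the update gate easy to check by hand.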
The gate mechanism of the GRU enables the model to capture long-distance history information. To obtain context information at the same time, a bidirectional GRU is used, so the output value $h^{(t)}$ in the BGRU can be expressed as follows:
Figure BDA0003620556720000097
Wherein: the two quantities in Figure BDA0003620556720000098 are, respectively, the forward and backward output values of the input data at time t in the GRU, and the operator in Figure BDA0003620556720000099 indicates an integration operation.
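The bidirectional combination can be sketched as follows. The sketch reads the "integration operation" as concatenation of the forward and backward outputs at each time step, which is an assumption, since the operator is shown only as a formula image. The helper names `run_direction` and `bidirectional_outputs` are hypothetical, and `toy_step` is a running-sum stand-in for a real GRU cell, used only to keep the example short.

```python
def run_direction(step, xs, h0):
    # Apply a recurrent step over the sequence and collect every state.
    h, outputs = h0, []
    for x in xs:
        h = step(x, h)
        outputs.append(h)
    return outputs


def bidirectional_outputs(step_fwd, step_bwd, xs, h0):
    # Forward pass reads the sequence left to right.
    fwd = run_direction(step_fwd, xs, h0)
    # Backward pass reads it right to left; re-reverse to align time steps.
    bwd = run_direction(step_bwd, xs[::-1], h0)[::-1]
    # Assumed integration: concatenate forward and backward states per step.
    return [f + b for f, b in zip(fwd, bwd)]


def toy_step(x, h):
    # Stand-in for a GRU cell: one-dimensional running sum.
    return [h[0] + x]
```

For the input sequence 1, 2, 3 and a zero initial state, each output pairs the sum of everything seen so far from the left with the sum of everything seen so far from the right, which is the sense in which the bidirectional output carries context from both sides.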
The output values $h^{(t)}$ of all BGRU units are obtained from the above formulas and are input into the attention mechanism model. The attention mechanism adopted in this patent is the Soft Attention model, whose formula is as follows:
Figure BDA0003620556720000101
Wherein: $C_i^{(t)}$ is the output value of the i-th memory cell after the attention mechanism at time t, $a_i^{(t)}$ is the attention distribution coefficient of the i-th memory cell at time t, and $h_i^{(t)}$ is the output value of the GRU unit at time t. At this point the first BGRU layer and its attention mechanism model have obtained a first batch of parameters. All $C_i^{(t)}$ are then input into the second BGRU layer, and the above process is repeated until all three BGRU layers and their attention mechanism models have obtained a first batch of parameters. The output of the last attention mechanism model is taken as the input of the fully connected network, training of the fully connected network begins, and finally the loss function is set to the mean absolute error.
For example, the job log data C to be analyzed is input into a pre-trained log analysis model for analysis, the log analysis model being a multi-layer bidirectional gated recurrent unit (GRU) network, to obtain a job log data analysis result D; the job log data analysis result D is then input into the determination unit in the server for type determination to obtain a first determination result O.
Step 206, if the first judgment result is that the target job is not interrupted, inputting the job log data to be analyzed into an alarm tracking model to obtain a second judgment result corresponding to the uninterrupted job log data analysis result; and if the second judgment result is an abnormal alarm, outputting abnormal operation data aiming at the target operation.
The alarm tracking model may be a model for further determining the job log data to be analyzed, which has the first determination result that is not interrupted, to obtain a determination result. The alarm tracking model can adopt an artificial intelligence model and can also adopt common judgment logic, and the specific selection is determined according to the specific representation form of the log data of the operation to be analyzed.
The second judgment result may be the result of a further judgment on job log data to be analyzed whose first judgment result is not interrupted. It mainly comprises a normal job and an abnormal alarm; further judgment categories may be introduced for special requirements, and the judgment results in the judgment unit may be adjusted according to the actual situation.
The abnormal operation data can be job log data to be analyzed corresponding to the target job with the second judgment result of the abnormal alarm, the data can reflect the specific condition of the abnormal alarm, and if the abnormal operation data has a requirement, the abnormal operation data can be input into a visualization terminal for visualization display.
Specifically, the job log data to be analyzed corresponding to a target job whose first judgment result is not interrupted is input into the alarm tracking model for further judgment, yielding a second judgment result corresponding to the uninterrupted job log data analysis result. The second judgment result is generally either a normal job or an abnormal alarm, although target classifications other than these two can be added as needed. If the second judgment result is an abnormal alarm, the server outputs the job log data to be analyzed corresponding to the target job as abnormal operation data. The alarm tracking model first obtains a judgment result. If the job is neither interrupted nor abnormal, the log analysis model may have misjudged it; in this case the job information is first collated and written into the job normal operation information table, and the job's emergency contact is then notified by mail to confirm whether the job is normal. If the job is abnormal, the recent running information of the job is first retrieved from the database and the abnormal information is sent to the visualization module for display. A mail is then sent, and the job's emergency contact is contacted automatically, to report that the job is abnormal. Finally, the job information is written into the job abnormal operation information table. The abnormal state calculation formula is as follows:
Figure BDA0003620556720000111
Wherein: $S_i$ denotes the i-th feature in the latest running information of job S; $S_{ji}$ denotes the i-th feature in the j-th recent running information of job S; the quantity in Figure BDA0003620556720000112 denotes the mean of the i-th feature over the most recent n runs of job S; and δ is a manually set threshold. If the value exceeds the threshold, the job is judged to be abnormal; otherwise it is normal. A logic diagram of the alarm algorithm model is shown in FIG. 10.
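The threshold test can be illustrated with a short sketch. The exact expression is available here only as a formula image, so the sketch assumes one plausible reading of the surrounding definitions: for each feature, the relative deviation of the latest value from the mean of the recent n runs is compared with δ, and the job is flagged if any feature exceeds it. Both the relative-deviation form and the "any feature" rule are assumptions, and the function names are hypothetical.

```python
def feature_means(history):
    # history: list of recent runs, each a list of feature values.
    n = len(history)
    return [sum(run[i] for run in history) / n
            for i in range(len(history[0]))]


def is_abnormal(latest, history, delta):
    """Return True if any feature deviates from its recent mean by more than delta."""
    means = feature_means(history)
    for s_i, mean_i in zip(latest, means):
        # Assumed measure: relative deviation; fall back to the absolute
        # deviation when the mean is zero to avoid division by zero.
        denom = abs(mean_i) if mean_i != 0 else 1.0
        if abs(s_i - mean_i) / denom > delta:
            return True
    return False
```

For instance, with recent running durations of 100 and 120 seconds and δ = 0.2, a latest duration of 115 seconds stays within the threshold, whereas 200 seconds is flagged.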
For example, if the first determination result O is uninterrupted, the job log data C to be analyzed is input into the alarm tracking model for the next determination, so as to obtain a second determination result P corresponding to the analysis result of the uninterrupted job log data, if the second determination result P is a normal job, the information of the normal job (including information of the job operation start time, the operation duration, the data size, the front-back dependency and the like) is recorded into the job normal operation information table in the database according to the operation of the normal job, and if the second determination result P is an abnormal alarm, the job log data to be analyzed corresponding to the target job is output as abnormal operation data.
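The handling of the second judgment result can be sketched as a small routing function. This is a simplified illustration only: the in-memory lists stand in for the job normal operation information table, the job abnormal operation information table, the visualization module and the mail channel, and all names (`route_second_judgment`, the `"normal"` and `"abnormal_alarm"` labels, the dictionary keys) are hypothetical.

```python
def route_second_judgment(result, job_info, normal_table, abnormal_table, outbox):
    """Route a job according to the second judgment result.

    normal_table / abnormal_table: lists standing in for database tables.
    outbox: list collecting (channel, job name, message) notifications.
    """
    name = job_info["name"]
    if result == "normal":
        # Record the normal run, then ask the contact to confirm by mail.
        normal_table.append(job_info)
        outbox.append(("mail", name, "please confirm the job ran normally"))
    elif result == "abnormal_alarm":
        # Show the abnormal run, notify the contact, then record it.
        outbox.append(("visualization", name, "abnormal run data"))
        outbox.append(("mail", name, "job is abnormal"))
        abnormal_table.append(job_info)
    else:
        raise ValueError(f"unknown second judgment result: {result!r}")
    # The job log data is output as abnormal operation data only for alarms.
    return job_info if result == "abnormal_alarm" else None
```

Keeping the routing separate from the judgment itself means additional classifications could be added as further branches without touching the model that produces the result.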
In the job log monitoring method, job log data to be analyzed is obtained from a job log center, the job log data to be analyzed being data obtained by processing log data generated while the target job runs; the job log data to be analyzed is input into a log analysis model to obtain a job log data analysis result, and whether the target job is interrupted is judged based on the job log data analysis result to obtain a first judgment result, the log analysis model being a multi-layer bidirectional gated recurrent unit (GRU) network; if the first judgment result is that the target job is not interrupted, the job log data to be analyzed is input into an alarm tracking model to obtain a second judgment result corresponding to the uninterrupted job log data analysis result; and if the second judgment result is an abnormal alarm, abnormal operation data for the target job is output.
By acquiring the job log data to be analyzed of the job log center, the analyzed data can be accurately and quickly acquired from the terminal, so that the job log center and the analysis system are highly matched, and the stability of the two systems is improved; the analysis result of the job log data is obtained through the analysis of the log analysis model, the first judgment result is obtained based on the result, the analysis of multiple steps can be carried out on the job log data to be analyzed, the job log data to be analyzed has a credible analysis result, each data has the same analysis process and has uniformity, the judgment result obtained from the analysis result also continues the characteristics, and the analysis efficiency and the accuracy of the job log data to be analyzed are improved; the log data of the operation to be analyzed, which is not interrupted by the first judgment result, is judged for the second time to obtain the judgment result of normal operation or abnormal alarm, so that the log data of the operation to be analyzed, which is not interrupted by the system, can be classified more carefully, defective data is prevented from flowing into the next step, the judgment accuracy of the system corresponding to the method is further improved, and the judgment efficiency is improved.
Aimed at the problems of real-time log monitoring and intelligent log analysis and processing in existing operation and maintenance systems, the intelligent log monitoring system based on the attention mechanism and the neural network can effectively reduce the cost of manual analysis and monitoring.
In one embodiment, as shown in fig. 3, inputting the job log data to be analyzed into the log analysis model to obtain a job log data analysis result, and judging whether the target job is interrupted based on the job log data analysis result to obtain a first judgment result, includes:
step 302, obtaining the log data record corresponding to the job name corresponding to the target job, and the start time and the end time corresponding to the log data record.
The log data record may be a record of generation, invocation, and the like of the log data in a storage unit of the server, and the record includes at least one message of log data change.
The start time may be the time when the task corresponding to the target job starts processing; for example, the time recorded when the server starts processing the target job is t1.
The end time may be the time when the task corresponding to the target job ends; for example, the time recorded when the server finishes processing the target job is t2.
Specifically, the server responds to an instruction of the terminal, acquires a log data record corresponding to a job name corresponding to a target job, a start time corresponding to the log data record and an end time corresponding to the log data record from the terminal, and performs corresponding processing on the three data, for example, log data record 1 corresponds to the start time, log data record n corresponds to the end time, the server stores the data in a storage unit of the server after acquiring the data, and the server calls the corresponding data from the storage unit to a central processing unit of the server for calculation when the data needs to be used.
For example, the server acquires from the terminal three data corresponding to the job name corresponding to the target job, which are log data record H and start time t1 and end time t2 corresponding to the log data record.
Step 304, inputting the log data record corresponding to the job name, the start time and the end time corresponding to the log data record into the log analysis model to obtain the job log data analysis result.
Specifically, the log data record corresponding to the job name, and the start time and end time corresponding to the log data record (all three are part of the job log data to be analyzed), are input into the log analysis model. The records in the database are searched by job name, and the job success rate over a recent period and the historical problem repair duration are calculated; the running duration is obtained from the start time and the end time (or the time point of interruption); the size of the processed data is obtained from the table name or file name of the data source; and the file size and the file transcoding and transmission durations are obtained from the generated file name. An analysis result corresponding to the job log data is thereby obtained.
For example, the log data record H corresponding to the job name, and the start time t1 and the end time t2 corresponding to the log data record H are input into the log analysis model for calculation, and the calculation result obtains data such as the file size, the file transcoding, the transmission duration, and the like, and the set of the data is the job log data analysis result.
Step 306, based on the job log data analysis result, determining whether the target job is interrupted, and obtaining a first determination result.
Specifically, the job log data analysis result is input into the judgment function module in the log analysis model. The analysis result contains the job running data derived from the log data record and its start and end times: the size of the processed data volume, obtained from the table name or file name of the data source, and the file size and file transcoding and transmission durations, obtained from the generated file name. A judgment is then made on this running data to obtain the first judgment result. It is first judged whether the job was interrupted; if so, the job is labeled as interrupted, and if the first judgment result is not interrupted, the data is passed to the next step.
For example, the job log data analysis result D is input into the judgment function module in the log analysis model for judgment to obtain a first judgment result O, where the first judgment result O mainly includes an interruption operation or a non-interruption operation, if the interruption operation is the interruption operation, the calculation is stopped, and the job log data C to be analyzed, which causes the interruption operation to occur, is output, and if the interruption operation is the non-interruption operation, the job log data C to be analyzed, which corresponds to the judgment, is input to the next step.
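The branch taken after the first judgment can be sketched as follows. The sketch assumes that the analysis result exposes a boolean field named `interrupted`; that field, the label strings and the function names are hypothetical, since the text does not specify how the judgment function module represents its output.

```python
def first_judgment(analysis_result):
    # Assumed field: True when the analysis found that the job was interrupted.
    return "interrupted" if analysis_result["interrupted"] else "not_interrupted"


def dispatch_first_judgment(analysis_result, job_log_data):
    """Stop and output the log data on interruption, else pass it on."""
    label = first_judgment(analysis_result)
    if label == "interrupted":
        # Calculation stops; the job log data that led to the interruption
        # is output.
        return {"label": label, "output": job_log_data, "next_input": None}
    # Not interrupted: the same job log data goes to the next step.
    return {"label": label, "output": None, "next_input": job_log_data}
```

Returning both the label and the routed data in one structure keeps the two possible outcomes explicit for whatever component consumes the first judgment result.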
In this embodiment, the log data analysis result is obtained by inputting the log data record, the start time, and the end time into the log analysis model, and the first determination result is further obtained, so that the job log data analysis result and the influence factor of the first determination result can be determined, and the job log center cannot be corrected by the parameter of the first determination result.
In one embodiment, as shown in fig. 4, inputting the log data record corresponding to the job name, the start time and the end time corresponding to the log data record into the log analysis model to obtain a job log data analysis result, including:
step 402, calculating the job success rate and the historical repair problem duration of the time period corresponding to the job name based on the log data record corresponding to the job name.
The job success rate may be the success rate of the target jobs run in the job log center. The success rate can be obtained by analyzing and calculating the job log data to be analyzed, and if it is greater than a preset success rate, the target jobs run by the job log center are regarded as qualified.
The historical problem repair duration can be a time length corresponding to the repair of hardware or software of the log operation center when the log operation center has a problem, and since all the situations of the log operation center are recorded, the historical problem repair duration is also recorded in the log data of the operation to be analyzed.
Specifically, the job success rate for the recent time period corresponding to the job name and the historical problem repair duration are calculated from the log data record corresponding to the job name. The job success rate is calculated by counting all job items and the successful job items within the recent time period corresponding to the job name, dividing the number of successful job items by the number of all job items, and multiplying by 100%. The historical problem repair duration is calculated by subtracting the fault start time from the repair time of the job log center, which gives a time period.
For example, the total number of job items M and the number of successful job items m are read from the log data record H corresponding to the job name, the fault start time is T3 and the repair time is T4; the calculated job success rate is (m/M) × 100%, and the calculated historical problem repair duration is T2 = T4 − T3.
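The two calculations in this step can be written out as a short sketch. The function names and the use of `datetime` values for the fault start time and the repair time are assumptions made for illustration; the patent does not specify how these times are stored.

```python
from datetime import datetime, timedelta


def job_success_rate(total_items, successful_items):
    """Success rate in percent: successful job items over all job items."""
    if total_items <= 0:
        raise ValueError("at least one job item is required")
    return 100.0 * successful_items / total_items


def repair_duration(fault_start, repaired_at):
    """Historical problem repair duration: repair time minus fault start time."""
    if repaired_at < fault_start:
        raise ValueError("repair time precedes fault start time")
    return repaired_at - fault_start
```

For example, 19 successful job items out of 20 give a success rate of 95%, and a fault that starts at 08:00 and is repaired at 09:30 gives a repair duration of one and a half hours.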
And step 404, obtaining an operation duration corresponding to the log data record based on the start time and the end time corresponding to the log data record.
The running time length may be a difference between a start time and an end time corresponding to a log data record generated by the target job, and the running time length of the target job is also the running time length of the log data record corresponding to the target job.
Specifically, according to the start time and the end time in the log data record, the difference between the two time points is calculated to obtain the running duration corresponding to the log data record.
For example, the start time and the end time in the log data record are t1 and t2; the difference between the two time points, T1 = t2 − t1, is the running duration corresponding to the log data record.
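As a brief illustration, the running duration can be computed from two timestamps taken from a log data record. The timestamp format and the function name are assumptions; actual job logs may record times differently.

```python
from datetime import datetime


def running_duration_seconds(start, end, fmt="%Y-%m-%d %H:%M:%S"):
    """Running duration in seconds: end time (or interruption time) minus start time."""
    t1 = datetime.strptime(start, fmt)
    t2 = datetime.strptime(end, fmt)
    if t2 < t1:
        raise ValueError("end time precedes start time")
    return (t2 - t1).total_seconds()
```

For an interrupted job, the time point of interruption would be passed as the end time, so the same function covers both cases.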
And step 406, inputting the job success rate, the historical repair problem duration and the run-time duration corresponding to the log data record into the log analysis model to obtain a job log data analysis result.
Specifically, the operation success rate, the history problem repairing time length and the running time length corresponding to the log data record are simultaneously input into a log analysis model for calculation, and a set of the file size and the file transcoding and transmission time length is obtained by combining the table name or the file name, namely the operation log data analysis result.
For example, the job success rate (m/M) × 100% (successful job items m over all job items M), the historical problem repair duration T2 and the running duration T1 corresponding to the log data record are input into the log analysis model together for calculation, and, in combination with the table name or file name, a set consisting of the file size and the file transcoding and transmission durations is obtained; this set is the job log data analysis result.
In this embodiment, by describing the detailed process and the input parameters obtained by the job log data analysis result, the influence factors of the job log data analysis result can be accurately reflected, which is helpful for setting the judgment classification for the first judgment in the following.
In one embodiment, as shown in fig. 5, if the first determination result indicates that the target job is not interrupted, the step of inputting the job log data to be analyzed into the alarm tracking model to obtain a second determination result corresponding to the analysis result of the job log data that is not interrupted includes:
step 502, obtaining a table name or a file name corresponding to the job log data to be analyzed.
The table name or the file name may be a name corresponding to a table or a file stored in the job log data to be analyzed, and details corresponding to the job log data to be analyzed, including the job log data to be analyzed and attributes of the job log data to be analyzed, may be obtained by locating the table name or the file name.
Specifically, the server responds to an instruction of the terminal, obtains a table name or a file name containing job log data to be analyzed from the terminal, further locates to a file corresponding to the stored job log data to be analyzed, stores the data in a storage unit of the server after the server obtains the data, and calls the corresponding data from the storage unit to a central processing unit of the server to perform calculation when the data is needed to be used.
For example, the server responds to the instruction of the terminal, acquires a table name C1 or a file name C2 corresponding to the job log data C to be analyzed from the terminal, locates the file corresponding to the stored job log data C to be analyzed according to the table name C1 or the file name C2, calls the data which needs to be used into the central processing unit, and stores the data which does not need to be processed temporarily into the storage unit of the server.
Step 504, inputting the table name or file name corresponding to the job log data to be analyzed into the alarm tracking model, and obtaining a second judgment result corresponding to the uninterrupted job log data analysis result.
Specifically, the table name or file name corresponding to the job log data to be analyzed is input into the alarm tracking model, and the job log data to be analyzed in the table or file is judged based on the information, content, parameters and the like associated with the table name or file name, to obtain a second judgment result corresponding to the uninterrupted job log data analysis result. If the second judgment result is a normal job, the information of the normal job (including the job start time, running duration, data size, upstream and downstream dependencies and the like) is output and recorded in the job normal operation information table in the database. If the second judgment result is an abnormal alarm, the recent normal running data of the job and the current abnormal running data are transmitted to the visualization module.
For example, the table name C1 or the file name C2 corresponding to the job log data C to be analyzed is input to the alarm tracking model for the second analysis determination, and since the job data to be analyzed entering the alarm tracking model is determined to be uninterrupted for the first time, the second determination result P is a normal job or an abnormal alarm. If the second judgment result P is normal operation, outputting information of the normal operation (including information of operation starting time, operation duration, data size, front-back dependency and the like) and recording the information into an operation normal operation information table in the database, and if the second judgment result P is abnormal alarm, transmitting the recent normal operation data of the operation and the current abnormal operation data to the visualization module.
In this embodiment, through the description of the obtained specific path of the second determination result, the input parameter and the output parameter of the alarm tracking model can be made clear, which indicates that there is a clear determination mechanism for the specific determination condition of the target job that has not been interrupted but has an abnormal alarm or a normal job, and the reason for the abnormal alarm can be further understood.
In one embodiment, as shown in fig. 6, inputting the table name or file name corresponding to the job log data to be analyzed into the alarm tracking model to obtain a second determination result corresponding to the analysis result of the uninterrupted job log data, including:
step 602, obtaining the data volume corresponding to the job log data to be analyzed according to the table name or the file name of the job log data to be analyzed.
The data volume may be the amount of data contained in each piece of job log data to be analyzed, or equivalently the amount of information it contains.
Specifically, the server locates a table name or a file name corresponding to the job log data to be analyzed, extracts information in the table or the file to obtain data in the table or the file, and performs processing such as classification and statistics on the data in the table or the file to obtain a data volume corresponding to the job log data to be analyzed.
For example, the server locates the table name C1 or the file name C2 corresponding to the job log data C to be analyzed according to the information, extracts the data in the table name C1 or the file name C2 according to the acquisition rule to obtain the data corresponding to the table name C1 or the file name C2, where the data are part or all of the job log data C to be analyzed, and obtains the data amount corresponding to the job log data C to be analyzed after the data are classified, counted, and the like.
Step 604, inputting the starting time, the running time, the data amount corresponding to the job log data to be analyzed and the environment information into an abnormal state calculation formula to obtain a second judgment result corresponding to the uninterrupted job log data analysis result.
The environment information may be a server availability status, a previous job running status, and the like, and generally represents the situation of the current computing environment and the amount that can be calculated.
The abnormal state calculation formula may be the analysis formula in the abnormality analysis model that integrates and jointly evaluates the individual items of the job log data to be analyzed in order to determine the condition of the target job.
Specifically, the start time, running duration and data volume of the job log data to be analyzed for a job that was not interrupted, together with the environment information of the current server, are input into the abnormal state calculation formula for the second analysis, and the target job is judged according to the analysis result. This yields the second judgment result corresponding to the uninterrupted job log data analysis result: the job is labeled either as normal or as not interrupted but with abnormal running information. A job labeled as normal is classified as a normal job, and a job that is not interrupted but has abnormal running information is classified as an abnormal job.
For example, the start time t1, the running duration T1 and the data volume of the job log data C to be analyzed for a job that was not interrupted, together with the environment information of the current server (e.g., the available state of the server and the running state of the preceding job), are input into the abnormal state calculation formula for the second analysis. The formula processes t1, T1, the data volume and the environment information to obtain the corresponding analysis results, and the second judgment is made on these results to obtain the second judgment result P corresponding to the uninterrupted job log data analysis result.
In this embodiment, a plurality of different parameters are input into the abnormal state calculation formula to obtain the second judgment result, so that the way in which the second judgment result is obtained is made explicit; for a target job whose second judgment result is an abnormal alarm, the cause of the abnormal alarm can be traced back through the parameters corresponding to the second judgment result.
In one embodiment, as shown in fig. 7, the method further comprises:
step 702, inputting the log data of the job to be analyzed, which is output based on the target job corresponding to the abnormal alarm, into the tracking algorithm model to obtain a third judgment result.
The tracking algorithm model may monitor jobs that were recently judged abnormal. If a job whose second judgment result was abnormal shows no abnormal condition during monitoring by the tracking algorithm model, it is reclassified as a normal job; if the monitoring result agrees with the second judgment result, the job remains classified as abnormal.
The third determination result may be a new determination result obtained after the target job with the second determination result being abnormal is tracked, and the determination result may change the conclusion of the second determination result, so as to prevent the data from being sent to the wrong target location due to the occurrence of erroneous determination of the second determination result.
Specifically, the job log data to be analyzed that was output for the target job corresponding to the abnormal alarm is input into the tracking algorithm model, which tracks the job over a preset time period and produces a result for that period, namely the third judgment result. Like the second judgment result, the third judgment result may be a normal job or an abnormal alarm. The tracking algorithm first pulls the list of abnormal jobs from the job abnormal operation information table and obtains the day's execution information for each job on the list. It then judges whether the job is still in an abnormal state. If so, the job's data in the job abnormal operation information table is updated. If not, the count of normal executions is incremented and compared with N (N successful executions are required); once the count is no longer less than N, the job's status is updated to release it from abnormal monitoring.
The abnormal state is calculated by the following formula:
Figure BDA0003620556720000181
where S_i denotes the i-th feature in the latest run information of job S, S'_ji denotes the i-th feature in the j-th recent normal run of job S, the symbol
Figure BDA0003620556720000182
denotes the mean of the i-th feature over the most recent n normal runs of job S, and δ is a manually set threshold. If the computed value exceeds the threshold, the job is judged to be abnormal; otherwise it is judged to be normal. A logic diagram of the tracking algorithm model is shown in FIG. 11.
For example, for a target job whose second determination result P is an abnormal alarm, the corresponding job log data to be analyzed is input into the tracking algorithm model, which tracks the job for a period of time (which may be a preset duration) and yields the corresponding third determination result Q.
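The abnormal state check described above can be illustrated with a minimal sketch. The exact formula is only available as an image in the source, so the test used here (relative deviation of each feature from the mean of the recent normal runs, compared with δ) is an assumption, as are the function name `is_abnormal` and the list-based feature layout.

```python
from statistics import mean
from typing import List, Sequence


def is_abnormal(latest: Sequence[float],
                recent_normal: List[Sequence[float]],
                delta: float) -> bool:
    """Return True if any feature of the latest run deviates too far.

    ASSUMPTION: the deviation test below (relative distance from the mean of
    the recent normal runs) is illustrative only; the patent's own formula is
    not reproduced in the extracted text.
    """
    if not recent_normal:
        raise ValueError("at least one recent normal run is required")
    for i, s_i in enumerate(latest):
        # Mean of the i-th feature over the n most recent normal runs.
        avg_i = mean(run[i] for run in recent_normal)
        # Avoid division by zero when the historical mean is exactly 0.
        scale = abs(avg_i) if avg_i != 0 else 1.0
        if abs(s_i - avg_i) / scale > delta:
            return True
    return False


# Features here are (running time in seconds, success rate); both hypothetical.
recent = [[100.0, 1.0], [110.0, 0.98], [90.0, 1.02]]
print(is_abnormal([120.0, 0.99], recent, delta=0.15))
```

A per-feature threshold, or a single aggregate score over all features, would fit the same description equally well; the source does not settle which one is intended.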
Step 704, if the third determination result is abnormal, updating the corresponding job data in the abnormal operation information table.
The abnormal operation information table records abnormal information for the target job, for example: the success rate is too low, the start time does not match, the running time is too long, the server is unavailable, the preceding job has not completed, or the repair time is too long.
Specifically, if the third determination result is an abnormal alarm, the job data that the abnormal alarm of the second determination result wrote into the abnormal operation information table is updated, that is, overwritten with the job data generated by the third determination result.
For example, if the third determination result Q is an abnormal alarm and job data corresponding to Q is generated at the same time, the job data generated for the second determination result P, which was also an abnormal alarm, is overwritten by the job data generated for Q.
Step 706, if the third determination result is normal, repeating the step of inputting the job log data to be analyzed, which is output by the target job corresponding to the abnormal alarm, into the tracking algorithm model, and recording the number of repetitions.
Specifically, if the third determination result is normal, the job log data to be analyzed of the target job whose second determination result was abnormal is input into the tracking algorithm model repeatedly. The number of repetitions is at least 2 and is generally bounded by a preset count threshold N; the number of repetitions performed is recorded.
For example, if the third determination result Q is a normal job, the job log data C to be analyzed of the target job whose second determination result P was abnormal is input into the tracking algorithm model again. A threshold N on the number of repetitions is set, and the number of repetitions actually performed, N', is recorded.
In this embodiment, the job log data to be analyzed corresponding to the abnormal alarm is input into the tracking algorithm model repeatedly to obtain the third determination result. Job log data that carries an abnormal alarm is thus tracked and analyzed multiple times, which avoids data being routed to the wrong destination because of a misjudgment in the second determination result.
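Steps 704 and 706 can be sketched as a single tracking pass. This is a minimal illustration under stated assumptions: the abnormal operation information table is modeled as an in-memory dictionary, and `TrackedJob`, `track_once`, and the record fields are hypothetical names. The source does not say whether the normal-run counter is reset when a later pass is abnormal, so the sketch leaves it unchanged.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TrackedJob:
    name: str
    normal_runs: int = 0  # N' in the text: repetitions that came back normal


def track_once(job: TrackedJob,
               third_result_abnormal: bool,
               run_data: dict,
               table: Dict[str, dict]) -> int:
    """Apply one tracking pass and return the updated normal-run count.

    ASSUMPTION: `table` is an in-memory stand-in for the abnormal operation
    information table, keyed by job name.
    """
    if third_result_abnormal:
        # Step 704: overwrite the record written for the earlier alarm.
        table[job.name] = dict(run_data)
    else:
        # Step 706: count this pass as one more normal execution.
        job.normal_runs += 1
    return job.normal_runs


table = {"etl_daily": {"reason": "running time too long"}}
job = TrackedJob("etl_daily")
track_once(job, True, {"reason": "success rate too low"}, table)
track_once(job, False, {}, table)
print(job.normal_runs, table["etl_daily"])
```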
In one embodiment, as shown in FIG. 8, repeating the step of inputting the job log data to be analyzed, which is output by the target job corresponding to the abnormal alarm, into the tracking algorithm model to obtain a fourth determination result includes:
Step 802, if the number of times is less than a preset count threshold, outputting the job log data to be analyzed again for the target job and determining that the job log data is abnormal.
Specifically, if the third determination result is an abnormal alarm while the number of times the job log data to be analyzed has been repeatedly input into the tracking algorithm model is still below the preset count threshold, tracking by the tracking algorithm model does not change the original determination for the job log data output by the target job, which remains an abnormal alarm.
For example, suppose the job log data C to be analyzed has been repeatedly input into the tracking algorithm model 8 times when the third determination result Q is abnormal again, and the preset repetition threshold is 10. Because 8 is less than 10, the original determination for job log data C is maintained, that is, an abnormal alarm.
Step 804, if the number of times is greater than or equal to the count threshold, releasing the abnormal monitoring state corresponding to the target job and outputting normal job information.
Specifically, if the third determination result is a normal job when the number of times the job log data to be analyzed has been repeatedly input into the tracking algorithm model reaches or exceeds the preset count threshold, tracking by the tracking algorithm model changes the original determination for the job log data output by the target job, that is, the original abnormal alarm is changed to a normal job.
For example, suppose the job log data C to be analyzed has been repeatedly input into the tracking algorithm model 10 times, the third determination result Q is a normal job, and the preset repetition threshold is 10. Because the repetition count equals the threshold, the original determination for job log data C is changed from an abnormal alarm to a normal job.
In this embodiment, the number of loop iterations performed on the job log data carrying an abnormal alarm is compared with a preset value: log data that reaches the iteration threshold is determined to be a normal job, and log data that does not reach it is determined to be an abnormal job.
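The threshold comparison in steps 802 and 804 reduces to a small decision function. The sketch below is illustrative only; the function name `monitoring_state` and the string labels are assumptions, and the two calls reuse the counts from the examples in the text (8 and 10 against a threshold of 10).

```python
def monitoring_state(repeat_count: int, threshold: int) -> str:
    """Decide whether a tracked job stays under abnormal monitoring.

    Step 802: fewer repetitions than the threshold keeps the abnormal alarm.
    Step 804: reaching or exceeding the threshold releases the monitoring
    state and reports a normal job.
    """
    if repeat_count < 0 or threshold < 1:
        raise ValueError("repeat_count must be >= 0 and threshold >= 1")
    return "normal_job" if repeat_count >= threshold else "abnormal_alarm"


print(monitoring_state(8, 10))   # below the threshold
print(monitoring_state(10, 10))  # equal to the threshold
```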
In one embodiment, the job log monitoring method further includes a visualization component that can display various kinds of job-related information according to user requirements, for example charts of job start time, job duration, data size, and file size. Below the charts, an error analysis and a suggested solution for interrupted jobs are provided for developers' reference, which is helpful for new employees or developers who have just joined the project.
As for the logic of the job log monitoring scheme, FIG. 12 shows a flow chart of an intelligent log monitoring system based on an attention mechanism and a neural network. First, a log analysis module is provided. Built around a deep bidirectional recurrent neural network with an attention mechanism, it analyzes the logs of running application jobs in real time, monitors the daily running state of each job, and records it in a database. The bidirectional recurrent neural network performs deep contextual analysis of the logs, and the attention mechanism improves the network's ability to analyze logs with very large data volumes, which raises analysis efficiency and enables real-time or near-real-time analysis. The run information of each job is classified and sorted: information on normally running jobs is sent directly to the database and written into the normal operation information table, while the analysis results for abnormally running jobs are passed to the next module for subsequent processing.
Next, an alarm tracking module is provided, which classifies abnormal jobs according to an alarm algorithm. For an interrupted job, it raises an alarm immediately, captures the error log, collects related data (success rate, response time, and so on), compares that data with the related data from the job's recent normal runs, and passes the comparison result to the visualization module. For a job that is not interrupted but whose run information is abnormal, it issues an early warning, collects related data (success rate, response time, and so on), compares that data with the related data from the job's recent normal runs, and passes the comparison result to the visualization module. Monitored jobs are then closely monitored and analyzed according to the tracking algorithm, and a job's monitoring state is released once it has run normally N times.
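The routing performed by the log analysis and alarm tracking modules can be sketched as follows. This is a hedged illustration, not the patented alarm algorithm: the names `Action`, `route`, and `compare_with_recent`, the metric keys, and the choice of "difference from the mean of recent normal runs" as the comparison are all assumptions.

```python
from enum import Enum
from statistics import mean
from typing import Dict, List


class Action(Enum):
    ALARM = "alarm"                  # interrupted job: alarm immediately
    EARLY_WARNING = "early_warning"  # not interrupted, but run info abnormal
    RECORD_NORMAL = "record_normal"  # write to the normal operation table


def route(interrupted: bool, abnormal: bool) -> Action:
    """Pick the handling path for one analyzed job run."""
    if interrupted:
        return Action.ALARM
    if abnormal:
        return Action.EARLY_WARNING
    return Action.RECORD_NORMAL


def compare_with_recent(current: Dict[str, float],
                        recent_normal: List[Dict[str, float]]) -> Dict[str, float]:
    """Difference between each current metric and its recent normal mean.

    ASSUMPTION: a plain difference is used as the "comparison result" that is
    passed on to the visualization module. Requires a non-empty history.
    """
    return {key: value - mean(run[key] for run in recent_normal)
            for key, value in current.items()}


history = [{"success_rate": 1.0, "response_time": 10.0},
           {"success_rate": 1.0, "response_time": 20.0}]
print(route(interrupted=False, abnormal=True))
print(compare_with_recent({"success_rate": 0.5, "response_time": 30.0}, history))
```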
Finally, a visualization module is provided, which visualizes the job run information and displays it as charts. The chart type can be changed according to the user's specific requirements.
It should be understood that, although the steps in the flowcharts of the above embodiments are shown in the sequence indicated by the arrows, they are not necessarily executed in that sequence. Unless explicitly stated otherwise herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; these sub-steps or stages are not necessarily performed sequentially, and may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, an embodiment of the present application further provides a job log monitoring apparatus for implementing the job log monitoring method described above. The apparatus solves the problem in a manner similar to that described for the method, so for the specific limitations of the one or more embodiments of the job log monitoring apparatus provided below, reference may be made to the limitations of the job log monitoring method above, which are not repeated here.
In one embodiment, as shown in FIG. 13, a job log monitoring apparatus is provided, including a to-be-analyzed job log data acquisition module, a first judgment result obtaining module, and an abnormal alarm module, wherein:
the to-be-analyzed job log data acquisition module 1302 is configured to acquire job log data to be analyzed from a job log center; the job log data to be analyzed is data obtained by processing log data generated while the target job runs;
the first judgment result obtaining module 1304 is configured to input the job log data to be analyzed into the log analysis model to obtain a job log data analysis result, and to judge whether the target job is interrupted based on the job log data analysis result to obtain a first judgment result; the log analysis model is a multi-layer bidirectional gated recurrent unit (GRU) network;
the abnormal alarm module 1306 is configured to, if the first judgment result is that the target job is not interrupted, input the job log data to be analyzed into an alarm tracking model to obtain a second judgment result corresponding to the analysis result of the uninterrupted job log data; and, if the second judgment result is an abnormal alarm, output abnormal job data for the target job.
In one embodiment, the first judgment result obtaining module is configured to obtain the log data record corresponding to the job name of the target job, together with the start time and end time corresponding to the log data record; input the log data record corresponding to the job name and the start time and end time corresponding to the log data record into the log analysis model to obtain a job log data analysis result; and judge whether the target job is interrupted based on the job log data analysis result to obtain a first judgment result.
In one embodiment, the first judgment result obtaining module is configured to calculate, based on the log data record corresponding to the job name, the job success rate and the historical problem repair duration for the time period corresponding to the job name; obtain the running duration corresponding to the log data record based on the start time and end time corresponding to the log data record; and input the job success rate, the historical problem repair duration, and the running duration corresponding to the log data record into the log analysis model to obtain the job log data analysis result.
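The three model inputs named above (job success rate, historical problem repair duration, running duration) can be derived from log records roughly as sketched below. The record layout is hypothetical: `LogRecord`, its fields, and in particular the `repair_seconds` field and the use of an average for the repair duration are assumptions, since the source does not specify how these values are stored or aggregated.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class LogRecord:
    job_name: str
    start: datetime
    end: datetime
    succeeded: bool
    # ASSUMPTION: time spent repairing a failed run, when one was recorded.
    repair_seconds: Optional[float] = None


def build_features(records: List[LogRecord], job_name: str) -> Dict[str, float]:
    """Compute the three inputs to the log analysis model for one job."""
    rows = [r for r in records if r.job_name == job_name]
    if not rows:
        raise ValueError(f"no log records for job {job_name!r}")
    success_rate = sum(r.succeeded for r in rows) / len(rows)
    repairs = [r.repair_seconds for r in rows if r.repair_seconds is not None]
    repair_duration = sum(repairs) / len(repairs) if repairs else 0.0
    # Running duration of the most recent record: end time minus start time.
    latest = max(rows, key=lambda r: r.start)
    run_duration = (latest.end - latest.start).total_seconds()
    return {"success_rate": success_rate,
            "repair_duration": repair_duration,
            "run_duration": run_duration}


logs = [
    LogRecord("etl_daily", datetime(2022, 4, 28, 10, 0), datetime(2022, 4, 28, 10, 5), True),
    LogRecord("etl_daily", datetime(2022, 4, 28, 11, 0), datetime(2022, 4, 28, 11, 10), False, 600.0),
]
print(build_features(logs, "etl_daily"))
```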
In one embodiment, the second judgment result obtaining module is configured to obtain the table name or file name corresponding to the job log data to be analyzed, where the table name or file name reflects the load condition of the job log data to be analyzed; and input the table name or file name corresponding to the job log data to be analyzed into the alarm tracking model to obtain a second judgment result corresponding to the analysis result of the uninterrupted job log data; the second judgment result indicates whether the target job has an abnormal alarm.
In one embodiment, the second judgment result obtaining module is configured to obtain the data volume corresponding to the job log data to be analyzed according to the table name or file name of the job log data to be analyzed; and input the start time, the running duration, the data volume corresponding to the job log data to be analyzed, and the environment information into an abnormal state calculation formula to obtain the second judgment result corresponding to the analysis result of the uninterrupted job log data.
In one embodiment, the tracking module is configured to input the job log data to be analyzed, which is output by the target job corresponding to the abnormal alarm, into the tracking algorithm model to obtain a third judgment result; if the third judgment result is abnormal, update the corresponding job data in the abnormal operation information table, where the abnormal operation information table records abnormal information corresponding to the target job; and if the third judgment result is normal, repeat the step of inputting the job log data to be analyzed, which is output by the target job corresponding to the abnormal alarm, into the tracking algorithm model, and record the number of repetitions.
In one embodiment, the tracking module is configured to, if the number of times is less than a preset count threshold, output the job log data to be analyzed again for the target job and determine that the job log data is abnormal; and, if the number of times is greater than or equal to the count threshold, release the abnormal monitoring state corresponding to the target job and output normal job information.
Each module in the above job log monitoring apparatus may be implemented wholly or partly in software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke them and perform the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 14. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing server data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a job log monitoring method.
Those skilled in the art will appreciate that the architecture shown in FIG. 14 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer devices to which the solution applies; a particular computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.
In one embodiment, a computer device is further provided, which includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the above method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, in which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
In one embodiment, a computer program product or computer program is provided that includes computer instructions stored in a computer-readable storage medium. The computer instructions are read by a processor of a computer device from a computer-readable storage medium, and the computer instructions are executed by the processor to cause the computer device to perform the steps in the above-mentioned method embodiments.
It should be noted that, the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party.
It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the above method embodiments. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, Resistive Random Access Memory (ReRAM), Magnetoresistive Random Access Memory (MRAM), Ferroelectric Random Access Memory (FRAM), Phase Change Memory (PCM), graphene memory, and the like. Volatile memory may include Random Access Memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of relational and non-relational databases. A non-relational database may include, but is not limited to, a blockchain-based distributed database. The processors referred to in the embodiments provided herein may be, without limitation, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, or data processing logic devices based on quantum computing.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but is not to be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the scope of protection of the present application shall be subject to the appended claims.

Claims (11)

1. A method for monitoring a job log, the method comprising:
acquiring job log data to be analyzed of a job log center; the log data of the job to be analyzed is data obtained by processing log data generated by the running of the target job;
inputting the job log data to be analyzed into a log analysis model to obtain a job log data analysis result, and judging whether the target job is interrupted based on the job log data analysis result to obtain a first judgment result; the log analysis model is a multi-layer bidirectional gated recurrent unit (GRU) network;
if the first judgment result is that the target job is not interrupted, inputting the job log data to be analyzed into an alarm tracking model to obtain a second judgment result corresponding to the analysis result of the uninterrupted job log data; and if the second judgment result is an abnormal alarm, outputting abnormal job data for the target job.
2. The method according to claim 1, wherein the inputting the job log data to be analyzed into a log analysis model to obtain a job log data analysis result, and judging whether the target job is interrupted based on the job log data analysis result to obtain a first judgment result comprises:
acquiring the log data record corresponding to the job name of the target job, and the start time and end time corresponding to the log data record;
inputting the log data record corresponding to the job name and the start time and end time corresponding to the log data record into the log analysis model to obtain the job log data analysis result;
and judging whether the target job is interrupted based on the job log data analysis result to obtain the first judgment result.
3. The method according to claim 2, wherein the inputting the log data record corresponding to the job name, the start time and the end time corresponding to the log data record into the log analysis model to obtain the job log data analysis result comprises:
calculating the job success rate and the historical problem repair duration for the time period corresponding to the job name based on the log data record corresponding to the job name;
obtaining the running duration corresponding to the log data record based on the start time and the end time corresponding to the log data record;
and inputting the job success rate, the historical problem repair duration, and the running duration corresponding to the log data record into the log analysis model to obtain the job log data analysis result.
4. The method according to claim 1, wherein the inputting, if the first judgment result is that the target job is not interrupted, the job log data to be analyzed into an alarm tracking model to obtain a second judgment result corresponding to the analysis result of the uninterrupted job log data comprises:
acquiring a table name or a file name corresponding to the job log data to be analyzed, wherein the table name or the file name reflects the load condition of the job log data to be analyzed;
inputting the table name or file name corresponding to the job log data to be analyzed into the alarm tracking model to obtain the second judgment result corresponding to the analysis result of the uninterrupted job log data; the second judgment result indicates whether the target job has an abnormal alarm.
5. The method of claim 4, wherein the server availability status and the running status of the preceding job are environment information; and the inputting the table name or the file name corresponding to the job log data to be analyzed into the alarm tracking model to obtain a second judgment result corresponding to the analysis result of the uninterrupted job log data comprises:
obtaining the data volume corresponding to the job log data to be analyzed according to the table name or the file name of the job log data to be analyzed;
and inputting the starting time, the running time length, the data volume corresponding to the job log data to be analyzed and the environment information into an abnormal state calculation formula to obtain a second judgment result corresponding to the uninterrupted job log data analysis result.
6. The method of claim 1, further comprising:
inputting the job log data to be analyzed, which is output by the target job corresponding to the abnormal alarm, into a tracking algorithm model to obtain a third judgment result;
if the third judgment result is abnormal, updating the corresponding job data in the abnormal operation information table; the abnormal operation information table is used for recording abnormal information corresponding to the target job;
and if the third judgment result is normal, repeating the step of inputting the job log data to be analyzed, which is output by the target job corresponding to the abnormal alarm, into the tracking algorithm model, and recording the number of repetitions.
7. The method according to claim 6, wherein the repeating the step of inputting the job log data to be analyzed, which is output by the target job corresponding to the abnormal alarm, into the tracking algorithm model to obtain a fourth judgment result comprises:
if the number of times is less than a preset count threshold, outputting the job log data to be analyzed again for the target job, and determining that the job log data to be analyzed is abnormal;
and if the number of times is greater than or equal to the count threshold, releasing the abnormal monitoring state corresponding to the target job, and outputting normal job information.
8. An apparatus for monitoring a job log, the apparatus comprising:
a to-be-analyzed job log data acquisition module, configured to acquire job log data to be analyzed from a job log center; the job log data to be analyzed is data obtained by processing log data generated while the target job runs;
a first judgment result obtaining module, configured to input the job log data to be analyzed into a log analysis model to obtain a job log data analysis result, and judge whether the target job is interrupted based on the job log data analysis result to obtain a first judgment result; the log analysis model is a multi-layer bidirectional gated recurrent unit (GRU) network;
an abnormal alarm module, configured to input the job log data to be analyzed into an alarm tracking model if the first judgment result is that the target job is not interrupted, to obtain a second judgment result corresponding to the analysis result of the uninterrupted job log data; and, if the second judgment result is an abnormal alarm, output abnormal job data for the target job.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
11. A computer program product comprising a computer program, characterized in that the computer program realizes the steps of the method of any one of claims 1 to 7 when executed by a processor.
CN202210456220.XA 2022-04-28 2022-04-28 Job log monitoring method and device, computer equipment and storage medium Pending CN114676021A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210456220.XA CN114676021A (en) 2022-04-28 2022-04-28 Job log monitoring method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210456220.XA CN114676021A (en) 2022-04-28 2022-04-28 Job log monitoring method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114676021A true CN114676021A (en) 2022-06-28

Family

ID=82079672

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210456220.XA Pending CN114676021A (en) 2022-04-28 2022-04-28 Job log monitoring method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114676021A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115277180A (en) * 2022-07-26 2022-11-01 电子科技大学 Block chain log anomaly detection and tracing system
CN115277180B (en) * 2022-07-26 2023-04-28 电子科技大学 Block chain log anomaly detection and tracing system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination