CN116361104A

CN116361104A - Big data-based application fault prediction method, device, equipment and storage medium

Info

Publication number: CN116361104A
Application number: CN202310130883.7A
Authority: CN
Inventors: 汤文鹏; 朱桂林; 王青召; 翟钧; 苏琳珂
Original assignee: Chongqing Changan New Energy Automobile Technology Co Ltd
Current assignee: Chongqing Changan New Energy Automobile Technology Co Ltd
Priority date: 2023-02-17
Filing date: 2023-02-17
Publication date: 2023-06-30

Abstract

A big data-based application fault prediction method, device, equipment and storage medium are provided. Historical log data of an application to be predicted are collected at a plurality of historical moments and labeled; the labeled historical log data are used as a training set to train a preset basic prediction model, yielding an application fault prediction model for predicting a fault prediction result of the application to be predicted in a future time period. The acquired current log data of the application to be predicted are input into the application fault prediction model to obtain the fault prediction result. In this way application faults can be predicted actively: the scheme provides an active defense for application faults, turns passive operation and maintenance into active operation and maintenance, prevents faults before they occur, and improves operation and maintenance efficiency.

Description

Big data-based application fault prediction method, device, equipment and storage medium

Technical Field

The application relates to the technical field of intelligent operation and maintenance, in particular to an application fault prediction method, device and equipment based on big data and a storage medium.

Background

The development of operation and maintenance can be divided into four stages: manual operation and maintenance, operation and maintenance with scripts and open source tools, automated operation and maintenance, and intelligent operation and maintenance. At present most companies are in the second or third stage, both of which are passive operation and maintenance.

In the prior art, document 1 (CN 108123834 A) describes a log analysis system based on a big data platform. It proposes analyzing network data packets in real time, matching data features against a network data protocol feature library, sending the network log data confirmed as abnormal by the matching to the big data platform for storage, performing cluster analysis and classification training, and dynamically updating the network data protocol feature library. However, document 1 only provides a scheme for collecting and analyzing abnormal logs in order to update the network data protocol feature library dynamically, and only addresses determining whether the current log is abnormal by comparison with individual abnormal cases. In the field of application fault prediction, taking precautions before a fault occurs is far more important than judging whether a fault has occurred.

At present, the related art can only raise an alarm once an application fault has occurred and thus remains at the level of passive operation and maintenance; an intelligent operation and maintenance scheme for predicting application faults is lacking.

Disclosure of Invention

In view of the above-mentioned drawbacks of the prior art, the present application provides an application fault prediction method, apparatus, device and storage medium based on big data, so as to solve the above-mentioned technical problem that the related art lacks an intelligent operation and maintenance scheme for predicting an application fault.

The application provides an application fault prediction method based on big data, which comprises the following steps: acquiring current log data of an application to be predicted; inputting the current log data into an application fault prediction model to obtain a fault prediction result of the application to be predicted; the training mode of the application fault prediction model comprises the steps of collecting historical log data of a plurality of historical moments of the application to be predicted, and marking to obtain marking results of the historical log data, wherein at least one marking result of the historical log data is a fault, the marked historical log data is used as a training data set, and the training data set is used for training a preset basic prediction model to obtain the application fault prediction model for predicting the fault prediction result of the application to be predicted in a future time period.

In an embodiment of the present application, collecting and labeling the historical log data of the plurality of historical moments of the application to be predicted, to obtain a labeling result of each historical log data, includes: collecting historical operation data of the application to be predicted at a plurality of historical moments, and labeling the abnormal state of the historical operation data according to a preset labeling standard to obtain labeling results of the historical operation data; and collecting a plurality of historical fault data at a plurality of historical moments within a fault time period of the application to be predicted, wherein the fault time period comprises the fault moment, a first preset time period before the fault moment and a second preset time period after the fault moment, the historical fault data of the fault moment are labeled as faults, the historical fault data of the remaining historical moments are labeled as normal, and labeling results of the historical fault data are obtained, the historical log data comprising the historical operation data and the historical fault data.

In an embodiment of the present application, collecting historical operation data of the plurality of historical moments of the application to be predicted includes: and reading process related data of a plurality of processes of the application to be predicted, memory operation data of a host, disk operation data of a server and use condition data of preset components in the server at intervals of preset time to obtain historical operation data of a plurality of historical moments.

In an embodiment of the present application, labeling the abnormal state of the historical operation data according to a preset labeling standard to obtain a labeling result of each historical operation data includes: if the preset labeling standard is met, labeling the abnormal state of the historical operation data as a fault, and if the preset labeling standard is not met, labeling the abnormal state of the historical operation data as normal, so as to obtain labeling results of the historical operation data; the preset labeling standard comprises at least one of: a central processing unit utilization rate greater than a first preset threshold, a memory occupation greater than a second preset threshold, a single garbage collection frequency greater than a third preset threshold, a single garbage collection time greater than a first preset duration, a garbage collection cycle frequency greater than a fourth preset threshold, and a disk input/output utilization rate greater than a preset utilization rate; and the historical operation data comprises at least one of the central processing unit utilization rate, the memory occupation, the single garbage collection frequency, the single garbage collection time, the garbage collection cycle frequency and the disk input/output utilization rate.

In an embodiment of the present application, collecting a plurality of historical fault data of a plurality of historical moments in the application fault period to be predicted includes: the method comprises the steps of collecting historical fault data of fault time when an application to be predicted breaks down, collecting historical fault data of a plurality of historical time points of a first preset time period before the fault time, and collecting historical fault data of a plurality of historical time points of a second preset time period after the fault time, wherein the fault comprises at least one of application running, process blocking and process absence, and the historical fault data comprises at least one of central processing unit utilization rate, memory occupation, disk input and output utilization rate, garbage collection single-time data and garbage collection period data.

In an embodiment of the present application, training the preset basic prediction model through the training data set includes: predicting each history log data in the training data set through the preset basic prediction model to obtain an initial prediction result, wherein the initial prediction result comprises at least one initial fault result at a prediction moment, and the prediction moment is later than the history moment of the history log data; determining a target loss function according to the labeling result of each history log data and the initial prediction result; and training the preset basic prediction model according to the target loss function.

In an embodiment of the present application, the initial prediction result includes at least two initial failure results at prediction moments, and the application failure prediction model is used for predicting failure prediction results of at least two future moments of the application to be predicted in a future time period.

The application also provides an application fault prediction device based on big data, which comprises: the acquisition module is used for acquiring current log data of the application to be predicted; the result prediction module is used for inputting the current log data into an application fault prediction model to obtain a fault prediction result of the application to be predicted; the model training module is used for collecting the historical log data of the plurality of historical moments of the application to be predicted, and labeling the historical log data to obtain labeling results of the historical log data, wherein the labeling result of at least one of the historical log data is a fault, the labeled historical log data is used as a training data set, and the training data set is used for training a preset basic prediction model to obtain an application fault prediction model for predicting the fault prediction result of the application to be predicted in a future time period.

The application also provides an application fault prediction device based on big data, the device comprising: one or more processors; storage means for storing one or more programs which, when executed by the one or more processors, cause the apparatus to implement the big data based application failure prediction method as claimed in any of the preceding claims.

The present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor of a computer, causes the computer to perform the big data based application failure prediction method as set forth in any of the above.

As described above, the application fault prediction method, apparatus, device and storage medium based on big data have the following beneficial effects:

the method comprises the steps of collecting historical log data of a plurality of historical moments of an application to be predicted, marking, taking marked historical log data as a training set to train a preset basic prediction model to obtain an application fault prediction model for predicting a fault prediction result of the application to be predicted in a future time period, inputting the obtained current log data of the application to be predicted into the application fault prediction model to obtain a fault prediction result, and actively predicting an application fault in the mode.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application. It is apparent that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art. In the drawings:

FIG. 1 is a flow chart of a big data based application failure prediction method according to an embodiment of the present application;

FIG. 2 is a schematic image of a sigmoid function according to an embodiment of the present application;

FIG. 3 is a graphical representation of a log (x) function provided by an embodiment of the present application;

FIG. 4 is a graphical representation of a log (1-x) function provided by an embodiment of the present application;

FIG. 5 is a flowchart of a big data based application failure prediction method according to another embodiment of the present application;

FIG. 6 is a schematic hardware structure of a big data based application failure prediction device according to an embodiment of the present application;

FIG. 7 is a schematic diagram of a hardware architecture of a big data based application failure prediction device suitable for implementing one or more embodiments of the present application.

Detailed Description

Further advantages and effects of the present application will be readily apparent to those skilled in the art from the disclosure in the present specification, by describing embodiments of the present application with reference to the accompanying drawings and preferred examples. The present application may be embodied or carried out in other specific embodiments, and the details of the present application may be modified or changed from various points of view and applications without departing from the spirit of the present application. It should be understood that the preferred embodiments are presented by way of illustration only and not by way of limitation to the scope of the present application.

It should be noted that, the illustrations provided in the following embodiments merely illustrate the basic concepts of the application by way of illustration, and only the components related to the application are shown in the drawings and are not drawn according to the number, shape and size of the components in actual implementation, and the form, number and proportion of the components in actual implementation may be arbitrarily changed, and the layout of the components may be more complex.

In the present application, "and/or" describing the association relationship of the association object, it means that there may be three relationships, for example, a and/or B may mean: a exists alone, A and B exist together, and B exists alone. The character "/" generally indicates that the context-dependent object is an "or" relationship.

The term "plurality" as used herein refers to two or more.

In the description of this application, the words "first," "second," and the like are used solely for the purpose of distinguishing between descriptions and not necessarily for the purpose of indicating or implying a relative importance or order.

In addition, in the embodiments of the present application, the term "exemplary" is used to mean serving as an example, instance, or illustration. Any embodiment or implementation described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments or implementations. Rather, the term use of an example is intended to present concepts in a concrete fashion.

In the following description, numerous details are set forth to provide a more thorough explanation of embodiments of the present application, however, it will be apparent to one skilled in the art that embodiments of the present application may be practiced without these specific details, in other embodiments, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the embodiments of the present application.

Fig. 1 shows a flowchart of an application failure prediction method based on big data according to an embodiment of the present application. Specifically, in an exemplary embodiment, as shown in fig. 1, the present embodiment provides an application failure prediction method based on big data, the method including the steps of:

step S110, current log data of an application to be predicted is obtained.

The application to be predicted may be an application preset by a person skilled in the art, and the current log data may be data generated by running the application to be predicted on one or more hardware devices.

It should be noted that the current log data are the data of the application to be predicted at the moment from which a prediction is required, and are not necessarily data collected in real time.

And step S120, inputting the current log data into an application fault prediction model to obtain a fault prediction result of the application to be predicted.

The application fault prediction model is trained as follows:

collecting historical log data of a plurality of historical moments of an application to be predicted, and marking to obtain marking results of the historical log data, wherein the marking result of at least one historical log data is a fault;

and training the preset basic prediction model through the training data set by taking the marked historical log data as the training data set to obtain an application fault prediction model for predicting a fault prediction result of the application to be predicted in a future time period.

It can be seen that this embodiment provides a scheme for predicting application faults based on big data: whether the application will fail at a certain future point in time is predicted from log analysis of the server and the application. Historical log data collected through a big data platform, such as server logs (CPU, memory and disk usage), application log information (GC frequency and time consumption) and historical fault information, are used to train a model. After training, the current log information (that is, the current log data, for a certain day or a certain time period) is input into the application fault prediction model to predict whether the application to be predicted is likely to fail. The method can predict application faults actively, provides an active defense scheme for application faults, turns passive operation and maintenance into active operation and maintenance, prevents faults before they occur, and improves operation and maintenance efficiency. By combining big data analysis with machine learning, fault points can be located quickly and the state of the system and possible faults can be predicted, changing the traditional approach to operation and maintenance.

In an embodiment of the present application, collecting and annotating history log data of a plurality of history moments of an application to be predicted, to obtain an annotation result of each history log data, including:

collecting historical operation data of a plurality of historical moments of the application to be predicted, and marking the abnormal state of the historical operation data according to a preset marking standard to obtain marking results of the historical operation data;

collecting a plurality of historical fault data at a plurality of historical moments within a fault time period of the application to be predicted, wherein the fault time period comprises a fault moment, a first preset time period before the fault moment and a second preset time period after the fault moment, labeling the historical fault data of the fault moment as a fault, labeling the historical fault data of the remaining historical moments as normal, and obtaining labeling results of the historical fault data, wherein the historical log data comprises the historical operation data and the historical fault data.

For example, the CPU (central processing unit) utilization rate, the memory occupation, the JVM GC (garbage collection on the Java virtual machine) frequency, and the like of each node of the application system can be collected as the historical operation data. In this embodiment, collecting historical operation data of a plurality of historical moments of an application to be predicted includes:

and reading process related data of a plurality of processes of the application to be predicted, memory operation data of a host, disk operation data of a server and use condition data of preset components in the server at intervals of preset time to obtain historical operation data of a plurality of historical moments.

In one example, the process-related data of the plurality of processes are collected by reading, once every 30 seconds, the process-related information provided by the /proc interface of the Linux system (30 seconds is only an example; a person skilled in the art may set another interval, which is not repeated herein). The relevant fields of the extracted information (the process-related data) can be found in table 1; they mainly reflect the user of each process, the memory and display memory it occupies, its start and end time, and the like. After the data are collected, they are classified by process number and stored in a preset storage space, such as Elasticsearch. Specific examples of the process-related data (one or more groups of which may be selected as the process-related data) are as follows:

TABLE 1
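As a minimal sketch of the collection step described above, the following Python parses text in the layout of a Linux `/proc/<pid>/status` file and groups the resulting records by process number before storage. The choice of the `status` file, the fields kept (`Name`, `Pid`, `Uid`, `VmRSS`) and the helper names are illustrative assumptions; the fields of table 1 are not reproduced here.

```python
from collections import defaultdict

# Hypothetical subset of fields to keep; not the field list of table 1.
FIELDS = ("Name", "Pid", "Uid", "VmRSS")


def parse_status(text):
    """Parse the 'Key: value' lines of a /proc/<pid>/status file."""
    record = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in FIELDS:
            record[key] = value.strip()
    return record


def group_by_pid(records):
    """Classify collected records by process number, as done before storage."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["Pid"]].append(rec)
    return dict(grouped)


# Sample text in the layout of /proc/<pid>/status (values are made up).
SAMPLE = (
    "Name:\tjava\n"
    "Pid:\t4242\n"
    "Uid:\t1000\t1000\t1000\t1000\n"
    "VmRSS:\t  524288 kB\n"
)

record = parse_status(SAMPLE)
grouped = group_by_pid([record, dict(record)])  # two samples of the same process
```

In a real collector the sample text would come from reading the file for each process of interest on the preset interval, for example every 30 seconds.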

Data can be inserted into Elasticsearch in two ways, with a specified ID or with an automatically generated ID: a specified ID uses a PUT operation, and an automatically generated ID uses a POST operation. A mapping is generated automatically on POST; if the fields of the POST are not strictly defined in advance, the corresponding mapping is established automatically from the posted content. Data can be queried by specified index information, by specified document information, by retrieving all data under an index, by query-string search, or by structured query, and one or more conditions may be combined. Matching may be fuzzy, similar to LIKE in SQL, or exact. For example, a query for all people aged 22 whose name contains "ston" outputs the 10 results closest to the query, sorted from high to low by degree of match.

In one example, the memory operation data of the host are collected by code that reads the content of the host's /proc/meminfo file once every 30 seconds; the data mainly reflect information about the host's memory operation. After the data are collected, they are stored by collection time in a preset storage space such as Elasticsearch. Specific parameters of the memory operation data of the host (one or more groups of which may be selected as the memory operation data) are shown in table 2:
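The insertion and query modes described above can be illustrated by building the corresponding Elasticsearch REST requests. The sketch below only constructs the method, path and JSON body and sends nothing; the index name `people`, the field names `name` and `age`, and the use of a `wildcard` query as the LIKE-style match are assumptions made for illustration.

```python
import json


def index_request(index, doc, doc_id=None):
    """Return (method, path, body) for indexing one document.

    A specified ID uses PUT; an automatically generated ID uses POST.
    """
    if doc_id is not None:
        return "PUT", f"/{index}/_doc/{doc_id}", json.dumps(doc)
    return "POST", f"/{index}/_doc", json.dumps(doc)


def search_request(index, name_fragment, age, size=10):
    """Return (method, path, body) combining a LIKE-style and an exact condition."""
    body = {
        "size": size,  # number of hits returned; default order is by relevance score
        "query": {
            "bool": {
                # pattern match on the name, comparable to LIKE '%ston%' in SQL
                "must": [{"wildcard": {"name": {"value": f"*{name_fragment}*"}}}],
                # exact match on the age
                "filter": [{"term": {"age": age}}],
            }
        },
    }
    return "GET", f"/{index}/_search", json.dumps(body)


put_req = index_request("people", {"name": "Winston", "age": 22}, doc_id=1)
post_req = index_request("people", {"name": "Winston", "age": 22})
query_req = search_request("people", "ston", 22)
```

Each tuple can be handed to any HTTP client; how well the pattern match behaves in practice depends on how the `name` field is mapped and analyzed.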

TABLE 2
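Reading the host memory data can be sketched as a small parser over the text of `/proc/meminfo`. The sample text, the function names and the use of `MemAvailable` to estimate usage are assumptions for illustration; the fields of table 2 are not reproduced here.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into a dict of integers (kB where a unit is given)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            info[key.strip()] = int(parts[0])
    return info


def memory_usage_ratio(info):
    """Fraction of memory in use, treating MemAvailable as the free estimate."""
    return 1.0 - info["MemAvailable"] / info["MemTotal"]


# Sample text in the layout of /proc/meminfo (values are made up).
SAMPLE = """MemTotal:       16384000 kB
MemFree:         2048000 kB
MemAvailable:    4096000 kB
Buffers:          512000 kB
"""

info = parse_meminfo(SAMPLE)
usage = memory_usage_ratio(info)
```

On a Linux host the same parser could be applied to `open("/proc/meminfo").read()` on each collection tick, with the result stored under its collection time.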

In one example, the disk operation data of the server are collected by code that reads the content of the host's /proc/disks file every 30 seconds. The fields of the extracted information differ according to the number of disks in the server and mainly reflect information about the server's disk operation. After the data are collected, they are stored by collection time in Elasticsearch (a real-time data storage engine).

In one example, the usage data of the preset component in the server are collected by code that obtains the JVM GC status of the application every 30 seconds; the data mainly reflect the usage of the JVM (a Java-related component) in the server. After the data are collected, they are classified by process number and stored by collection time in Elasticsearch. The extracted fields can be found in table 3, and one or more fields in table 3 may be selected as the preset component usage data:

TABLE 3

In an embodiment, marking the abnormal state of the historical operation data according to a preset marking standard to obtain a marking result of each historical operation data, including:

if the preset labeling standard is met, labeling the abnormal state of the historical operation data as a fault, and if the preset labeling standard is not met, labeling the abnormal state of the historical operation data as normal, so as to obtain labeling results of the historical operation data;

the preset labeling standard comprises at least one of: a central processing unit utilization rate greater than a first preset threshold, a memory occupation greater than a second preset threshold, a single garbage collection frequency greater than a third preset threshold, a single garbage collection time greater than a first preset duration, a garbage collection cycle frequency greater than a fourth preset threshold, and a disk input/output utilization rate greater than a preset utilization rate; and the historical operation data comprises at least one of the central processing unit utilization rate, the memory occupation, the single garbage collection frequency, the single garbage collection time, the garbage collection cycle frequency and the disk input/output utilization rate.

For example, in Elasticsearch, data are marked when the CPU occupation (central processing unit utilization rate) exceeds 80%, when the memory utilization (memory occupation) exceeds 80%, when the disk input/output utilization rate exceeds 80%, or when the Young GC frequency (single garbage collection frequency) exceeds once per minute, the GC time (single garbage collection time) exceeds 200 ms, or the Full GC frequency (garbage collection cycle frequency) exceeds once per day.
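The threshold-based labeling can be sketched as a single function that marks a sample as a fault when any criterion is exceeded. The thresholds follow the example values in this embodiment (80% for CPU, memory and disk IO; once per minute for Young GC; 200 ms for a single GC; once per day for Full GC); the `Sample` type, its field names and the 1/0 encoding are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    cpu_pct: float           # CPU utilization, percent
    mem_pct: float           # memory occupation, percent
    disk_io_pct: float       # disk input/output utilization, percent
    young_gc_per_min: float  # Young GC count per minute
    gc_time_ms: float        # duration of a single GC, milliseconds
    full_gc_per_day: float   # Full GC count per day


# Example thresholds from the embodiment; in general these are preset values.
THRESHOLDS = {
    "cpu_pct": 80.0,
    "mem_pct": 80.0,
    "disk_io_pct": 80.0,
    "young_gc_per_min": 1.0,
    "gc_time_ms": 200.0,
    "full_gc_per_day": 1.0,
}


def label(sample):
    """Return (label, exceeded): label is 1 (fault) if any threshold is exceeded, else 0."""
    exceeded = [name for name, limit in THRESHOLDS.items()
                if getattr(sample, name) > limit]
    return (1 if exceeded else 0), exceeded


hot = label(Sample(85.0, 40.0, 10.0, 0.5, 50.0, 0.0))
calm = label(Sample(50.0, 40.0, 10.0, 0.5, 50.0, 0.0))
gc_heavy = label(Sample(50.0, 40.0, 10.0, 0.5, 250.0, 2.0))
```

Returning the list of exceeded criteria alongside the label is a design choice of this sketch; it makes it easy to see why a sample was marked.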

In an embodiment, collecting a plurality of historical fault data for a plurality of historical moments in a period of application fault to be predicted includes:

the method comprises the steps of collecting historical fault data of fault time when an application to be predicted breaks down, collecting historical fault data of a plurality of historical time points of a first preset time period before the fault time, and collecting historical fault data of a plurality of historical time points of a second preset time period after the fault time, wherein the fault comprises at least one of application running, process blocking and process absence, and the historical fault data comprises at least one of central processing unit utilization rate, memory occupation, disk input and output utilization rate, garbage recycling single-time data and garbage recycling period data.

The fault may arise naturally during operation of the application or may be simulated manually, which is not limited herein. For example, the CPU, memory and disk IO utilization and the Young GC and Full GC log data in the five minutes before and after an application exception (application running, process stuck, JVM process absent, etc.) may be labeled.
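The fault-period labeling can be sketched as follows: samples inside the window around the fault moment are kept, the sample at the fault moment is labeled as a fault, and the remaining samples in the window are labeled as normal. The five-minute window lengths follow the example above; the function name, the 1/0 encoding and the assumption that one sample falls exactly on the fault moment are illustrative.

```python
from datetime import datetime, timedelta


def label_fault_window(timestamps, fault_time,
                       before=timedelta(minutes=5),
                       after=timedelta(minutes=5)):
    """Label samples in [fault_time - before, fault_time + after].

    The sample at the fault moment gets 1 (fault); other samples in the
    window get 0 (normal). Samples outside the window are not returned.
    """
    labels = {}
    for ts in timestamps:
        if fault_time - before <= ts <= fault_time + after:
            labels[ts] = 1 if ts == fault_time else 0
    return labels


fault = datetime(2023, 2, 17, 12, 0, 0)
samples = [
    datetime(2023, 2, 17, 11, 54, 0),  # outside the window
    datetime(2023, 2, 17, 11, 56, 0),
    fault,
    datetime(2023, 2, 17, 12, 3, 0),
    datetime(2023, 2, 17, 12, 7, 0),   # outside the window
]
labels = label_fault_window(samples, fault)
```

With data sampled every 30 seconds, a real implementation might instead treat the sample nearest to the fault moment as the fault sample.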

In one embodiment, training the pre-set base prediction model with the training dataset includes:

predicting each history log data in the training data set through a preset basic prediction model to obtain an initial prediction result, wherein the initial prediction result comprises at least one initial fault result at a prediction moment, and the prediction moment is later than the history moment of the history log data;

determining a target loss function according to the labeling result and the initial prediction result of each history log data;

and training a preset basic prediction model according to the target loss function.
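One way to make the prediction moment later than the historical moment of each sample is to pair every sample with the labels observed a fixed number of sampling intervals later. The sketch below assumes this shifting scheme; the function name `make_training_pairs` and the `horizons` parameter are hypothetical and not specified in the application.

```python
def make_training_pairs(samples, labels, horizons=(1,)):
    """Pair the sample at index t with the labels at indices t + h for each horizon h.

    Horizons are counted in sampling intervals, so with 30-second sampling
    a horizon of 2 means a prediction moment one minute later.
    """
    max_h = max(horizons)
    pairs = []
    for t in range(len(samples) - max_h):
        targets = [labels[t + h] for h in horizons]
        pairs.append((samples[t], targets))
    return pairs


# Made-up feature vectors (CPU %, memory %) and their labels (1 = fault).
features = [(40, 50), (55, 60), (82, 70), (60, 65), (90, 88)]
marks = [0, 0, 1, 0, 1]

pairs = make_training_pairs(features, marks, horizons=(1, 2))
```

Using more than one horizon gives each sample several targets, which corresponds to predicting fault results at two or more future moments.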

In an embodiment, the initial prediction result comprises initial fault results at at least two prediction moments, and the application fault prediction model is used for predicting fault prediction results at at least two future moments of the application to be predicted in the future time period. For example, the preset basic prediction model may be a logistic regression model. Logistic regression is a generalized linear model which assumes that the data obey a Bernoulli distribution and solves its parameters by gradient descent, thereby achieving binary classification.

The Bernoulli distribution, also known as the two-point distribution or 0-1 distribution, is the simplest discrete probability distribution. Denoting the success probability as p (0 ≤ p ≤ 1) and the failure probability as q = 1 - p, the following holds:

$$P(x) = p^{x}(1 - p)^{1 - x} = \begin{cases} p, & x = 1 \\ q, & x = 0 \end{cases}$$

where P(x) is the probability, the positive class is 1 and the negative class is 0, so the label obviously obeys a 0-1 distribution.

An exemplary model training and prediction process is as follows:

Training: the model is trained on the training data; this is the learning process, in which the parameters of the model are determined.

Prediction: after training, the model parameters are fixed, and inputting the data to be predicted yields a result.

In ordinary linear regression, y = wx + b. Training the model on the training set means obtaining the model parameters w and b, which determines a straight line or, when x is multidimensional, a hyperplane. Then, for a data point x in the test set, w and b having been learned, substituting into y = wx + b gives a value y, which is the predicted value.

The linear regression above yields y = wx + b, a real number whose range is (-∞, +∞). Such a large range is not wanted here, so the value is compressed into (0, 1). A sigmoid function achieves this, and is therefore introduced to act on y.
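The compression of the linear output into (0, 1) can be checked numerically with a short sketch. The weights, bias and input values below are made up, and the function names are illustrative; the two-branch form of the sigmoid is only there to avoid floating-point overflow for large negative inputs.

```python
import math


def sigmoid(z):
    """Map any real z into (0, 1); written in two branches for numerical stability."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(w, b, x):
    """Compute the linear output wx + b and compress it with the sigmoid."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)


# Made-up parameters and a made-up sample (CPU and memory usage as fractions).
w, b = [3.0, 2.0], -4.0
p = predict_probability(w, b, [0.9, 0.85])  # linear output is 0.4 before compression
```

However large or small wx + b becomes, the result stays between 0 and 1, so it can be read as a probability.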

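An exemplary sigmoid function is

$$\mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}}$$

The graph of the sigmoid function is shown in fig. 2. To perform the compression, y = wx + b is substituted into sigmoid(x). The output of this function is also denoted y, namely:

$$y = \frac{1}{1 + e^{-(wx + b)}} \tag{2}$$

Thus y takes a value in (0, 1), and equation (2) can be transformed as follows:

$$\ln\frac{y}{1 - y} = wx + b$$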

The loss function measures the difference between the true value and the predicted value, so the smaller it is the better; the minimum loss is 0. Taking binary classification (0, 1) as an example: when the true value is 1, the loss should be 0 if the prediction output of the model is 1 and as large as possible if the prediction is 0. Similarly, when the true value is 0, the loss should be 0 if the prediction output is 0 and largest if the prediction is 1. Minimizing the loss function therefore means that the smaller the loss, the more accurate the prediction. An example loss function, in which x denotes the predicted output in (0, 1) and y the true label, is:

$$L(x, y) = \begin{cases} -\log(x), & y = 1 \\ -\log(1 - x), & y = 0 \end{cases}$$

The graph of the function -log(x) is shown in fig. 3 and the graph of -log(1-x) in fig. 4; after compression, the prediction lies between 0 and 1. By using this loss function and reducing the loss as far as possible, a good effect can be achieved.

These two losses are combined:

$$-\left[\, y\log(x) + (1 - y)\log(1 - x) \,\right] \tag{5}$$

where y is the label, taking the value 0 or 1, and x is the predicted output.

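Total loss for m samples:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[\, y^{(i)}\log f\left(x^{(i)}\right) + \left(1 - y^{(i)}\right)\log\left(1 - f\left(x^{(i)}\right)\right) \right] \tag{6}$$

In this equation, m is the number of samples, y is the label, taking the value 0 or 1, i denotes the i-th sample, and f(x) denotes the predicted output, which indicates the probability of a fault occurring. J(θ) is the final loss value of the model, and its minimum value is 0.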
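The per-sample loss of formula (5) and the total loss over m samples can be computed directly. The sketch below assumes the total is averaged over the m samples and clips predictions away from exactly 0 and 1 so the logarithm stays finite; the clipping constant and the function names are illustrative choices.

```python
import math


def sample_loss(y, p, eps=1e-12):
    """Loss for one sample: -[y*log(p) + (1 - y)*log(1 - p)], with p clipped into (0, 1)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def total_loss(labels, predictions):
    """Average of the per-sample losses over all m samples."""
    m = len(labels)
    return sum(sample_loss(y, p) for y, p in zip(labels, predictions)) / m


good = total_loss([1, 0, 1], [0.9, 0.1, 0.8])  # predictions close to the labels
bad = total_loss([1, 0, 1], [0.2, 0.7, 0.3])   # predictions far from the labels
```

Predictions that agree with the labels give a loss near 0, and confident wrong predictions are penalized heavily, which is the behaviour described for figs. 3 and 4.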

In the model testing stage, the data to be predicted are substituted into the model obtained by minimizing the loss function of formula (6), and the model is solved to obtain a predicted value.
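
The training and testing stages can be sketched as follows. The present application does not state which optimizer is used, so plain batch gradient descent is an assumption here, as are the learning rate, the number of iterations, the two input features and every data value. The sketch fits w and b by reducing the mean cross-entropy loss and then evaluates f(x) on new data.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """f(x) = sigmoid(w . x + b), read as the probability of failure."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train(samples: Sequence[Sequence[float]], labels: Sequence[int],
          lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Reduce the mean cross-entropy loss by batch gradient descent."""
    m, n = len(samples), len(samples[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for x, y in zip(samples, labels):
            # Gradient of the per-sample loss with respect to (w . x + b).
            err = predict_proba(w, b, x) - y
            for j in range(n):
                grad_w[j] += err * x[j] / m
            grad_b += err / m
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Hypothetical training data: [CPU utilization, memory occupation] as fractions;
# label 1 = fault, label 0 = normal.
samples = [[0.30, 0.40], [0.45, 0.35], [0.50, 0.55],
           [0.85, 0.90], [0.92, 0.80], [0.88, 0.95]]
labels = [0, 0, 0, 1, 1, 1]
w, b = train(samples, labels)

p_high = predict_proba(w, b, [0.95, 0.90])  # heavily loaded node
p_low = predict_proba(w, b, [0.20, 0.30])   # lightly loaded node
```

In this toy setting the heavily loaded node receives a failure probability above 0.5 and the lightly loaded node a probability below 0.5; a threshold on f(x) then yields the predicted label.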

Referring to fig. 5, which is a flow chart of a specific big data based failure prediction method according to the present application, the specific method includes:

System logs and application logs of each node of the application system are collected continuously, together with data in three dimensions: CPU utilization rate, memory occupation and JVM GC frequency. First, the CPU utilization rate, memory occupation and JVM GC frequency of each node of the application system are collected. Second, the data are labeled: data are marked when the CPU utilization rate exceeds 80%, the memory occupation exceeds 80%, the disk IO utilization rate exceeds 80%, Young GC occurs more than once per minute or takes more than 200 ms, or Full GC occurs more than once per day or takes more than 300 ms. Then, the CPU, memory and disk IO utilization data at the time the application fails are collected and labeled, with particular attention to the JVM GC frequency in the 5 minutes before and after the failure. Next, the model is trained: the collected data are input into the model for training. Finally, the model is tested: index data of an application node at a certain time point are input, and the model predicts whether the system will fail within a certain future time period.
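
The labeling step above can be sketched as a simple rule check. The thresholds are those stated in the preceding paragraph; the `NodeMetrics` type, its field names and the choice to flag a sample when any single threshold is exceeded are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class NodeMetrics:
    """One sample of node metrics (field names are hypothetical)."""
    cpu_util: float          # CPU utilization rate, percent
    mem_util: float          # memory occupation, percent
    disk_io_util: float      # disk IO utilization rate, percent
    young_gc_per_min: float  # Young GC count per minute
    young_gc_ms: float       # duration of a Young GC, milliseconds
    full_gc_per_day: float   # Full GC count per day
    full_gc_ms: float        # duration of a Full GC, milliseconds


def is_abnormal(m: NodeMetrics) -> bool:
    """Mark a sample as abnormal if any stated threshold is exceeded."""
    return (
        m.cpu_util > 80
        or m.mem_util > 80
        or m.disk_io_util > 80
        or m.young_gc_per_min > 1
        or m.young_gc_ms > 200
        or m.full_gc_per_day > 1
        or m.full_gc_ms > 300
    )


# Hypothetical samples, for illustration only.
healthy = NodeMetrics(35, 50, 10, 0.5, 40, 0, 0)
overloaded = NodeMetrics(92, 60, 15, 0.5, 40, 0, 0)
slow_gc = NodeMetrics(35, 50, 10, 0.5, 40, 1, 450)
marks = [is_abnormal(healthy), is_abnormal(overloaded), is_abnormal(slow_gc)]
```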

In summary, this embodiment provides a big data based application fault prediction method. Historical log data of an application to be predicted are collected at a plurality of historical moments and labeled, and the labeled historical log data are used as a training set to train a preset basic prediction model, yielding an application fault prediction model for predicting a fault prediction result of the application to be predicted in a future time period. The acquired current log data of the application to be predicted are then input into the application fault prediction model to obtain the fault prediction result, which includes one or more prediction results indicating whether the application to be predicted will fail at a future time. In this way, application faults can be predicted actively: an active defense scheme for application fault prediction is provided, operation and maintenance changes from passive to active, problems are prevented before they occur, and operation and maintenance efficiency is improved.

As shown in fig. 6, the present application further provides an application failure prediction device based on big data, where the device includes:

an obtaining module 601, configured to obtain current log data of an application to be predicted;

the result prediction module 602 is configured to input current log data into an application fault prediction model to obtain a fault prediction result of an application to be predicted;

the model training module 603 is configured to collect historical log data of the application to be predicted at a plurality of historical moments and label the historical log data to obtain a labeling result for each piece of historical log data, where the labeling result of at least one piece of historical log data is a fault; to use the labeled historical log data as a training data set; and to train a preset basic prediction model with the training data set, so as to obtain an application fault prediction model for predicting a fault prediction result of the application to be predicted in a future time period.
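
One possible way to arrange the three modules in software is sketched below. All class names, method names and type aliases are hypothetical, and the labeling rule and the "trained" model in the usage part are trivial stand-ins for illustration rather than the model of the present application. The sketch mainly shows how the modules hand data to one another and how the requirement that at least one labeling result is a fault can be enforced.

```python
from typing import Callable, Dict, List, Tuple

LogRecord = Dict[str, float]           # one log sample, e.g. {"cpu": 91.0}
Model = Callable[[LogRecord], float]   # returns a failure probability
Dataset = List[Tuple[LogRecord, int]]  # (record, label), label 1 = fault


class AcquisitionModule:
    """Obtains the current log data of the application to be predicted."""

    def __init__(self, source: Callable[[], LogRecord]) -> None:
        self._source = source

    def acquire(self) -> LogRecord:
        return self._source()


class ModelTrainingModule:
    """Labels historical log data and trains a base prediction model."""

    def __init__(self, annotate: Callable[[LogRecord], int],
                 fit: Callable[[Dataset], Model]) -> None:
        self._annotate = annotate
        self._fit = fit

    def train(self, history: List[LogRecord]) -> Model:
        dataset = [(record, self._annotate(record)) for record in history]
        # At least one labeling result must be a fault.
        if not any(label == 1 for _, label in dataset):
            raise ValueError("training set needs at least one fault-labeled record")
        return self._fit(dataset)


class ResultPredictionModule:
    """Feeds current log data to the trained model to get a prediction."""

    def __init__(self, model: Model, threshold: float = 0.5) -> None:
        self._model = model
        self._threshold = threshold

    def predict(self, record: LogRecord) -> bool:
        return self._model(record) >= self._threshold


# Stand-ins for illustration only; a real system would plug in real components.
def annotate(record: LogRecord) -> int:
    return int(record["cpu"] > 80)


def fit(dataset: Dataset) -> Model:
    return lambda record: record["cpu"] / 100.0


history = [{"cpu": 35.0}, {"cpu": 92.0}]
model = ModelTrainingModule(annotate, fit).train(history)
current = AcquisitionModule(lambda: {"cpu": 95.0}).acquire()
will_fail = ResultPredictionModule(model).predict(current)
```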

Therefore, this embodiment provides a scheme for predicting application faults based on big data. The scheme predicts application faults actively and provides an active defense scheme for application fault prediction, so that operation and maintenance changes from passive to active, problems are prevented before they occur, and operation and maintenance efficiency is improved.

In summary, this embodiment provides a big data based application fault prediction device. Historical log data of an application to be predicted are collected at a plurality of historical moments and labeled, and the labeled historical log data are used as a training set to train a preset basic prediction model, yielding an application fault prediction model for predicting a fault prediction result of the application to be predicted in a future time period. The acquired current log data of the application to be predicted are input into the application fault prediction model to obtain the fault prediction result. In this way, application faults can be predicted actively: an active defense scheme for application fault prediction is provided, operation and maintenance changes from passive to active, problems are prevented before they occur, and operation and maintenance efficiency is improved.

It should be noted that, the big data based application fault prediction device provided in the foregoing embodiment and the big data based application fault prediction method provided in the foregoing embodiment belong to the same concept, and the specific manner in which each module and unit perform the operation has been described in detail in the method embodiment, which is not repeated herein. In practical application, the application fault prediction device based on big data provided in the above embodiment may allocate the functions to different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above, which is not limited herein.

The embodiment of the application also provides an application fault prediction device based on big data, which comprises: one or more processors; and a storage means for storing one or more programs that, when executed by the one or more processors, cause the big data based application failure prediction device to implement the big data based application failure prediction method provided in the above embodiments.

Fig. 7 shows a schematic structural diagram of a computer apparatus suitable for use in implementing the big data based application failure prediction device of the embodiments of the present application. It should be noted that, the computer system 1000 of the big data based application failure prediction device shown in fig. 7 is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present application.

As shown in fig. 7, the computer system 1000 includes a central processing unit (Central Processing Unit, CPU) 1001 which can perform various appropriate actions and processes according to a program stored in a read-only memory (Read-Only Memory, ROM) 1002 or a program loaded from a storage section 1008 into a random access memory (Random Access Memory, RAM) 1003, for example, performing the method described in the above embodiment. In the RAM 1003, various programs and data required for system operation are also stored. The CPU 1001, ROM 1002, and RAM 1003 are connected to each other by a bus 1004. An input/output (Input/Output, I/O) interface 1005 is also connected to the bus 1004.

The following components are connected to the I/O interface 1005: an input section 1006 including a keyboard, a mouse, and the like; an output section 1007 including a cathode ray tube (Cathode Ray Tube, CRT), a liquid crystal display (Liquid Crystal Display, LCD), and a speaker; a storage section 1008 including a hard disk or the like; and a communication section 1009 including a network interface card such as a LAN (Local Area Network) card, a modem, or the like. The communication section 1009 performs communication processing via a network such as the internet. The drive 1010 is also connected to the I/O interface 1005 as needed. A removable medium 1011, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is installed on the drive 1010 as needed, so that a computer program read out therefrom is installed into the storage section 1008 as needed.

In particular, according to embodiments of the present application, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 1009, and/or installed from the removable medium 1011. When executed by the central processing unit (CPU) 1001, the computer program performs various functions defined in the apparatus of the present application.

It should be noted that the computer readable medium shown in the embodiments of the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium may be, for example, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (Erasable Programmable Read-Only Memory, EPROM), a flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with a computer readable computer program embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. A computer program embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. Each block in the flowcharts or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or by combinations of special purpose hardware and computer instructions.

The units involved in the embodiments of the present application may be implemented by means of software or by means of hardware, and the described units may also be provided in a processor. In some cases, the names of the units do not constitute a limitation of the units themselves.

Another aspect of the present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor of a computer, causes the computer to perform the big data based application failure prediction method as described above. The computer-readable storage medium may be contained in the big data based application failure prediction apparatus described in the above embodiment or may exist alone without being assembled into the big data based application failure prediction apparatus.

Another aspect of the present application also provides a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions so that the computer device performs the big data based application failure prediction method provided in the above embodiments.

The above embodiments are merely illustrative of the principles of the present application and its effectiveness and are not intended to limit the present application. Modifications and variations may be made to the above-described embodiments by those of ordinary skill in the art without departing from the spirit and scope of the present application. It is therefore contemplated that the appended claims will cover all such equivalent modifications and changes as fall within the true spirit and scope of the disclosure.

Claims

1. An application fault prediction method based on big data, the method comprising:

acquiring current log data of an application to be predicted;

inputting the current log data into an application fault prediction model to obtain a fault prediction result of the application to be predicted;

the training mode of the application fault prediction model comprises the steps of collecting historical log data of a plurality of historical moments of the application to be predicted, and marking to obtain marking results of the historical log data, wherein at least one marking result of the historical log data is a fault, the marked historical log data is used as a training data set, and the training data set is used for training a preset basic prediction model to obtain the application fault prediction model for predicting the fault prediction result of the application to be predicted in a future time period.

2. The big data-based application fault prediction method of claim 1, wherein collecting and annotating historical log data of the plurality of historical moments of the application to be predicted to obtain an annotation result of each historical log data comprises:

collecting historical operation data of the application to be predicted at a plurality of historical moments, and marking the abnormal state of the historical operation data according to a preset marking standard to obtain marking results of the historical operation data;

and collecting a plurality of historical fault data at a plurality of historical moments within a fault time period of the application to be predicted, wherein the fault time period comprises the fault moment, a first preset time period before the fault moment and a second preset time period after the fault moment; marking the historical fault data of the fault moment as a fault and the historical fault data of the remaining historical moments as normal, to obtain marking results of the historical fault data, wherein the historical log data comprise the historical operation data and the historical fault data.

3. The big data based application failure prediction method of claim 2, wherein collecting historical operating data for a plurality of historical moments of the application to be predicted comprises:

4. The big data-based application fault prediction method as claimed in claim 2, wherein labeling the abnormal state of the historical operation data according to a preset labeling standard to obtain labeling results of the historical operation data, comprises:

the preset labeling standard comprises at least one of: a central processing unit utilization rate being larger than a first preset threshold, a memory occupation being larger than a second preset threshold, a single garbage collection frequency being larger than a third preset threshold, a single garbage collection time being larger than a first preset duration, a garbage collection cycle frequency being larger than a fourth preset threshold, and a disk input/output utilization rate being larger than a preset utilization rate; and the historical operation data comprises at least one of the central processing unit utilization rate, the memory occupation, the single garbage collection frequency, the single garbage collection time, the garbage collection cycle frequency and the disk input/output utilization rate.

5. The big data based application failure prediction method according to claim 2, wherein collecting a plurality of historical fault data at a plurality of historical moments within the fault time period of the application to be predicted includes:

the method comprises the steps of collecting historical fault data of fault time when an application to be predicted breaks down, collecting historical fault data of a plurality of historical time points of a first preset time period before the fault time, and collecting historical fault data of a plurality of historical time points of a second preset time period after the fault time, wherein the fault comprises at least one of application running, process blocking and process absence, and the historical fault data comprises at least one of central processing unit utilization rate, memory occupation, disk input and output utilization rate, garbage collection single-time data and garbage collection period data.

6. The big data based application failure prediction method of any of claims 1-5, wherein training a pre-set base prediction model with the training data set comprises:

predicting each history log data in the training data set through the preset basic prediction model to obtain an initial prediction result, wherein the initial prediction result comprises at least one initial fault result at a prediction moment, and the prediction moment is later than the history moment of the history log data;

determining a target loss function according to the labeling result of each history log data and the initial prediction result;

and training the preset basic prediction model according to the target loss function.

7. The big data based application failure prediction method of claim 6, wherein the initial prediction results include initial failure results at least two prediction moments, and the application failure prediction model is used to predict failure prediction results of at least two future moments of the application to be predicted in a future time period.

8. An application failure prediction apparatus based on big data, the apparatus comprising:

the acquisition module is used for acquiring current log data of the application to be predicted;

the result prediction module is used for inputting the current log data into an application fault prediction model to obtain a fault prediction result of the application to be predicted;

the model training module is used for collecting the historical log data of the plurality of historical moments of the application to be predicted, and labeling the historical log data to obtain labeling results of the historical log data, wherein the labeling result of at least one of the historical log data is a fault, the labeled historical log data is used as a training data set, and the training data set is used for training a preset basic prediction model to obtain an application fault prediction model for predicting the fault prediction result of the application to be predicted in a future time period.

9. An application failure prediction apparatus based on big data, the apparatus comprising:

one or more processors;

storage means for storing one or more programs which, when executed by the one or more processors, cause the apparatus to implement the method of any of claims 1 to 7.

10. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor of a computer, causes the computer to perform the method of any of claims 1 to 7.