CN110955586A - System fault prediction method, device and equipment based on log - Google Patents

Info

Publication number
CN110955586A
Authority
CN
China
Prior art keywords
data
preset
log
performance
artificial intelligence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911181749.XA
Other languages
Chinese (zh)
Inventor
代朝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bank of China Ltd
Original Assignee
Bank of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bank of China Ltd
Priority to CN201911181749.XA
Publication of CN110955586A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3452 Performance evaluation by statistical analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466 Performance evaluation by tracing or monitoring
    • G06F11/3476 Data logging

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a log-based system fault prediction method, apparatus and device. The method comprises the following steps: acquiring system log data according to a preset data capture rule; classifying the acquired log data to obtain abnormal data, performance data and service data; and analyzing the change trends of the abnormal data, the performance data and the service data based on a preset artificial intelligence model and outputting an analysis result, thereby realizing fault prediction for the system and reducing its operation and maintenance cost.

Description

System fault prediction method, device and equipment based on log
Technical Field
The invention relates to the technical field of data processing, in particular to a system fault prediction method, device and equipment based on logs.
Background
In the prior art, when a computer system or other system fails, the logs of that system, which store its historical operating data, are analyzed, and the type of the fault is determined from the analysis result.
Log analysis in the prior art is therefore passive: logs are analyzed only after a problem has occurred in production, and system parameters and the deployment strategy are then adjusted according to the analysis result. Because problems are addressed only once they have occurred, the operation and maintenance cost of the system is high.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method, an apparatus, and a device for predicting a system fault based on a log, so as to implement fault prediction of a system.
In order to achieve the above purpose, the embodiments of the present invention provide the following technical solutions:
a log-based system failure prediction method comprises the following steps:
acquiring system log data according to a preset data capturing rule;
performing data classification on the acquired log data to obtain abnormal data, performance data and service data;
and analyzing the change trends of the abnormal data, the performance data and the service data based on a preset artificial intelligence model, and outputting an analysis result.
Optionally, in the log-based system fault prediction method, analyzing the change trend of the abnormal data based on a preset artificial intelligence model includes:
analyzing the increment of the abnormal data based on a preset artificial intelligence model to obtain the moment when the increment of the abnormal data reaches a preset warning value, wherein the increment of the abnormal data includes, but is not limited to, the occurrence frequency of the abnormal data within a preset time period and the difference between the value of the abnormal data and the value of normal data.
Optionally, in the log-based system fault prediction method, analyzing the variation trend of the performance data based on a preset artificial intelligence model includes:
analyzing the variation trend of the performance data of the system based on a preset artificial intelligence model to obtain the time when the performance data reaches a preset performance threshold, wherein the performance data includes, but is not limited to, system memory usage and CPU utilization.
Optionally, in the log-based system fault prediction method, analyzing the change trend of the service data based on a preset artificial intelligence model includes:
analyzing the change trend of the business data of the system based on a preset artificial intelligence model to obtain the change trend of the data volume of various business data.
Optionally, in the log-based system fault prediction method, the obtained performance data is compared with a preset performance threshold, and when the performance data reaches the preset performance threshold, an expansion request is output to the upper-level system, so as to increase system resources of the system.
A log-based system failure prediction apparatus, comprising:
the log data capturing unit is used for acquiring system log data according to a preset data capturing rule;
the data classification unit is used for performing data classification on the acquired log data to obtain abnormal data, performance data and service data;
and the data analysis unit is used for analyzing the change trends of the abnormal data, the performance data and the service data based on a preset artificial intelligence model and outputting an analysis result.
Optionally, in the log-based system failure prediction apparatus, when the data analysis unit analyzes the variation trend of the abnormal data based on a preset artificial intelligence model, the data analysis unit is specifically configured to:
analyzing the increment of the abnormal data based on a preset artificial intelligence model to obtain the moment when the increment of the abnormal data reaches a preset warning value, wherein the increment of the abnormal data includes, but is not limited to, the occurrence frequency of the abnormal data within a preset time period and the difference between the value of the abnormal data and the value of normal data.
Optionally, in the log-based system failure prediction apparatus, when the data analysis unit analyzes the variation trend of the performance data based on a preset artificial intelligence model, the data analysis unit is specifically configured to:
analyzing the variation trend of the performance data of the system based on a preset artificial intelligence model to obtain the time when the performance data reaches a preset performance threshold, wherein the performance data includes, but is not limited to, system memory usage and CPU utilization.
Optionally, in the log-based system failure prediction apparatus, when the data analysis unit analyzes the change trend of the service data based on a preset artificial intelligence model, the data analysis unit is specifically configured to:
analyzing the change trend of the business data of the system based on a preset artificial intelligence model to obtain the change trend of the data volume of various business data.
Optionally, in the log-based system failure prediction apparatus, the data analysis unit is further configured to: and comparing the acquired performance data with a preset performance threshold, and outputting an expansion request to the upper-level system when the performance data reaches the preset performance threshold so as to increase system resources of the system.
A log-based system failure prediction device, comprising:
a memory and a processor;
the memory is configured to store program code, and the processor is configured to invoke the program code, which, when executed, implements any of the log-based system failure prediction methods described above.
Based on the above technical solution, the embodiments of the present invention use a preset artificial intelligence model to predict how the abnormal data, the performance data and the service data will change over a future period of time. This provides early warning of the system's working condition and allows system operation and maintenance personnel to manage and maintain the system in a targeted manner according to those trends, thereby reducing the operation and maintenance cost of the system.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a schematic flow chart illustrating a log-based system failure prediction method disclosed in an embodiment of the present application;
FIG. 2 is a schematic flow chart illustrating the prediction of abnormal data by the log-based system failure prediction method according to the embodiment of the present application;
FIG. 3 is a schematic flow chart illustrating performance data prediction performed by the log-based system failure prediction method according to the embodiment of the present disclosure;
FIG. 4 is a schematic flow chart illustrating the prediction of service data by the log-based system failure prediction method according to the embodiment of the present application;
FIG. 5 is a schematic structural diagram of a log-based system failure prediction apparatus disclosed in an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a log-based system failure prediction device disclosed in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
To address the problem in the prior art that log data are analyzed only after a system has failed, which causes great loss to production, the present application discloses a log-based system failure prediction method, which comprises the following steps:
step S101: acquiring system log data according to a preset data capturing rule;
in this scheme, the system stores a plurality of log files, and different log files store the log information of different modules in the system. A data capture rule is configured in advance for each log file, and log data are captured from the log files based on these preset capture rules. The log files include, but are not limited to, middleware logs (Apache, JBoss and the like), application logs, system operation indexes and system operation logs;
step S102: performing data classification on the acquired log data to obtain abnormal data, performance data and service data;
in this step, the data processing module is adopted to carry out data filtering and processing on the collected log data, and the log data processed finally is divided into three parts: abnormal data, performance data, and business data;
in addition to the abnormal data itself, the abnormal data may include one or more of: the time at which the abnormality occurred, the abnormal content, the abnormal operation, the abnormality level, and whether the abnormality is recoverable;
in addition to the system performance data itself, the performance data may include the response time for request data received by the system and other data characterizing the system's ability to respond to data;
in addition to the service data processed by the system, the service data may include information such as the request time distribution and request frequency of the different types of service data.
To facilitate data management, the technical solution disclosed in the embodiment of the present application further configures data tables in one-to-one correspondence with the abnormal data, the performance data and the service data, and each of the three kinds of data may be stored in its corresponding data table.
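Steps S101 and S102 can be sketched as follows. This is a minimal illustration and not the patent's implementation: the log line layout, the regular expressions in CAPTURE_RULES, the keyword heuristics in classify, and all function names are assumptions introduced for the example; the patent only states that each log file has a preset capture rule and that captured data are split into three tables.

```python
import re
from collections import defaultdict

# Hypothetical capture rules: one regular expression per log file type.
# The "<date> <time> <LEVEL> <message>" layout is an assumed log format.
LINE_RULE = r"^(?P<ts>\S+ \S+) (?P<level>ERROR|WARN|INFO) (?P<msg>.*)$"
CAPTURE_RULES = {
    "middleware": re.compile(LINE_RULE),
    "application": re.compile(LINE_RULE),
}


def capture(source, lines):
    """Apply the preset rule for `source`; lines that do not match are dropped."""
    rule = CAPTURE_RULES[source]
    records = []
    for line in lines:
        match = rule.match(line)
        if match:
            records.append({"source": source, **match.groupdict()})
    return records


def classify(record):
    """Assign a record to one of the three classes (assumed keyword heuristics)."""
    msg = record["msg"]
    if record["level"] == "ERROR" or "Exception" in msg:
        return "abnormal"
    if "cpu=" in msg or "mem=" in msg or "response_ms=" in msg:
        return "performance"
    return "service"


def build_tables(records):
    """Store each record in the table that corresponds to its class."""
    tables = defaultdict(list)
    for record in records:
        tables[classify(record)].append(record)
    return tables


sample_lines = [
    "2019-11-27 10:00:01 ERROR NullPointerException in OrderService",
    "2019-11-27 10:00:02 INFO cpu=71 mem=64 response_ms=120",
    "2019-11-27 10:00:03 INFO transfer request accepted",
    "malformed line",
]
tables = build_tables(capture("application", sample_lines))
```

In a real deployment the three tables would more likely be database tables than in-memory lists; the dictionary is used here only to keep the sketch self-contained.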
Step S103: analyzing the change trends of the abnormal data, the performance data and the service data based on a preset artificial intelligence model, and outputting an analysis result;
An artificial intelligence model is a data processing and prediction scheme commonly used in the prior art; it is obtained by constructing a basic model and then training that model with training data. The basic model preloaded differs according to the data object to be processed, but the training processes are basically the same. Before the scheme is executed, artificial intelligence models in one-to-one correspondence with the abnormal data, the performance data and the service data may be constructed in advance. Once the abnormal data, the performance data and the service data have been obtained, each is loaded into its corresponding preset artificial intelligence model, which predicts its change trend over a future period of time. This provides early warning of the system's working condition, so that system operation and maintenance personnel can manage and maintain the system in a targeted manner according to these trends, which reduces the operation and maintenance cost of the system.
Further, in the technical solution disclosed in the embodiment of the present application, different types of abnormal data are monitored against different standards. Generally speaking, monitoring falls into two kinds: monitoring the magnitude of an abnormal data value, and monitoring the frequency with which abnormal data occur. For this reason, before the abnormal data are analyzed with the preset artificial intelligence model, they may be classified according to the monitored object, so that the model analyzes abnormal data of a single type each time. The output of the preset artificial intelligence model may include the growth rate of the abnormal data value and the occurrence frequency of the abnormal data input on that occasion.
Referring to fig. 2, in the above method, analyzing the variation trend of the abnormal data based on a preset artificial intelligence model may include:
step S1031, analyzing the increment of the loaded abnormal data based on a preset artificial intelligence model, and predicting to obtain an increment trend curve of the abnormal data in a future preset time;
step S1032: acquiring, from the increment trend curve of the abnormal data, the moment when the increment of the abnormal data reaches a preset warning value.
The increment of the abnormal data includes, but is not limited to, the frequency of the abnormal data, i.e., the number of times the abnormal data occur within a preset time period, and the difference between the value of the abnormal data and the value of normal data. In this scheme, the preset artificial intelligence model predicts both the growth trend of the occurrence frequency of the loaded abnormal data and the change trend of the magnitude of the abnormal data, giving the change trend of the abnormal data over a certain future period. From this, the times at which the value of the abnormal data reaches the preset warning value and its occurrence frequency reaches the preset warning frequency are predicted. System faults are thereby predicted, so that shutdown of the system due to faults can be effectively prevented.
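The two increment measures named above can be computed from classified abnormal records before any model is applied. The sketch below is illustrative only: it assumes timestamps are given in seconds, that a single "normal" reference value is known, and that the function names are hypothetical. It checks an already computed series against a warning value; producing the future part of that series is the job of whatever model is used.

```python
from collections import Counter


def occurrence_counts(timestamps, window_seconds):
    """Number of abnormal records in each consecutive fixed-length window.

    Windows with no abnormal record are reported as 0 so that the result
    is an evenly spaced series from the first to the last occupied window.
    """
    if not timestamps:
        return []
    counts = Counter(int(t // window_seconds) for t in timestamps)
    first, last = min(counts), max(counts)
    return [counts[w] for w in range(first, last + 1)]


def deviation_from_normal(values, normal_value):
    """Absolute difference between each abnormal value and the normal value."""
    return [abs(v - normal_value) for v in values]


def first_alert_index(series, warning_value):
    """Index of the first element that reaches the warning value, or None."""
    for index, value in enumerate(series):
        if value >= warning_value:
            return index
    return None


# Example: six abnormal records, counted per 60-second window.
counts = occurrence_counts([1, 5, 61, 62, 63, 130], window_seconds=60)
alert_window = first_alert_index(counts, warning_value=3)
```

Feeding the per-window counts or the deviations into a forecasting model, and then applying first_alert_index to the forecast, would give the predicted moment described in step S1032.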
In this scheme, as the amount of data processed by the system grows, so do its requirements for resources such as memory and CPU. A fault in this scheme therefore refers not only to a system fault caused by a data processing error during computer data processing, but also to a fault caused by insufficient data processing capability, for example a response timeout for some data or requests because the system processes data too slowly. The system performance may accordingly be monitored to predict the change trend of the system's data processing speed and response speed. In this case, referring to FIG. 3, analyzing the change trend of the performance data based on the preset artificial intelligence model may include:
step S1033: analyzing the change trend of the performance data of the system based on a preset artificial intelligence model, and predicting to obtain a change trend curve of the performance data in a future preset time;
step S1034: acquiring, from the change trend curve of the performance data, the time when the performance data reaches a preset performance threshold.
The performance data includes, but is not limited to, system memory usage and CPU utilization. That is, before the performance data is predicted, it also needs to be classified, and performance data of a single type is loaded into the preset artificial intelligence model each time. For example, performance data representing the memory usage rate of the system is loaded into the preset artificial intelligence model, which predicts the change trend of the memory usage rate to obtain its growth trend; the time at which the memory usage rate reaches the preset memory usage threshold is then obtained from that growth trend. Likewise, performance data representing the CPU utilization rate of the system is loaded into the preset artificial intelligence model, which predicts the change trend of the CPU utilization rate to obtain its growth trend; the time at which the CPU utilization rate reaches the preset CPU utilization threshold is then obtained from that growth trend.
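Steps S1033 and S1034 can be illustrated with a deliberately simple stand-in for the preset artificial intelligence model. The patent does not specify the model; the sketch below substitutes an ordinary least-squares straight line, fits it to one type of performance data at a time, and solves for the time at which the line reaches the threshold. The function names and the sample figures are assumptions made for the example.

```python
def fit_line(times, values):
    """Ordinary least-squares fit of values = slope * time + intercept."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two (time, value) pairs")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("all samples share the same time")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def time_to_threshold(times, values, threshold):
    """Time at which the fitted trend line reaches `threshold`.

    Returns None when the trend is flat or falling, since the line then
    never reaches the threshold from below. A result earlier than the
    last sample time means the fitted line has already crossed it.
    """
    slope, intercept = fit_line(times, values)
    if slope <= 0:
        return None
    return (threshold - intercept) / slope


# Example: memory usage (%) sampled hourly, threshold at 90 %.
hours = [0, 1, 2, 3]
memory_usage = [50, 55, 60, 65]
predicted_hour = time_to_threshold(hours, memory_usage, threshold=90)
```

The same function would be called separately for CPU utilization with its own threshold, matching the requirement that only one type of performance data is loaded at a time. A straight line is a crude model for bursty metrics, so a production system would likely use something more robust.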
With this scheme, the trend of user demand can be predicted from the change trend of the service data, so that the system side can allocate system resources reasonably. As with the other data, the service data needs to be classified before its change trend is predicted, and service data of a single type is loaded into the preset artificial intelligence model each time. Analyzing the change trend of the service data based on the preset artificial intelligence model includes: analyzing the change trend of the service data of the system based on the preset artificial intelligence model to obtain the change trend of the data volume of each type of service data. The change trend of the data volume indicates how the service data required by users will change over a future period of time, and the system can reasonably adjust the system resources occupied by each service accordingly.
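As a rough illustration of the per-type data volume trend, the sketch below groups service requests by type, counts them per time period, and reports the average change per period. It describes only the observed trend; projecting it forward is left to the model. The input shape (period index, service type), the type names, and the function names are assumptions for the example.

```python
from collections import defaultdict


def volume_by_type(requests):
    """Count requests per period for each service type.

    `requests` is an iterable of (period_index, service_type) pairs with
    period indexes starting at 0. Returns {service_type: [count per period]}.
    """
    requests = list(requests)
    if not requests:
        return {}
    n_periods = max(period for period, _ in requests) + 1
    volumes = defaultdict(lambda: [0] * n_periods)
    for period, service_type in requests:
        volumes[service_type][period] += 1
    return dict(volumes)


def average_change(volumes):
    """Average period-over-period change in volume for each service type."""
    return {
        service_type: (counts[-1] - counts[0]) / (len(counts) - 1)
        for service_type, counts in volumes.items()
        if len(counts) > 1
    }


requests = [
    (0, "transfer"), (1, "transfer"), (1, "transfer"),
    (2, "transfer"), (2, "transfer"), (2, "transfer"),
    (0, "query"), (0, "query"), (1, "query"), (2, "query"),
]
trend = average_change(volume_by_type(requests))
```

A positive value suggests a service whose demand is growing and which may need more resources; a negative value suggests resources that could be reallocated.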
Further, in the technical solution disclosed in the embodiment of the present application, when analyzing the performance data and detecting that the performance data of the system has reached the preset performance threshold, an expansion request may be sent to a superior system to increase system resources of the system to which the method is applied, that is, referring to fig. 4, the method may further include:
step S104: comparing the acquired performance data with a preset performance threshold value, and judging whether the performance data is greater than the preset performance threshold value; if yes, go to step S105;
step S105: when the performance data reaches the preset performance threshold, outputting an expansion request to an upper-level system so as to increase system resources of the system;
in this scheme, the performance data may be judged greater than the preset performance threshold either when the performance data acquired within a preset time period is greater than the preset performance threshold, or when the average value of the performance data acquired within the preset time period is greater than the preset performance threshold.
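Steps S104 and S105, with the two comparison modes just described, might look like the sketch below. It is an illustration under assumptions: the "any" mode interprets the first case as at least one sample in the window exceeding the threshold, the request payload fields are invented, and send_request is a hypothetical callable standing in for whatever interface the upper-level system exposes.

```python
def exceeds_threshold(samples, threshold, mode="any"):
    """Compare a window of performance samples with the preset threshold.

    mode="any":  true if at least one sample in the window is greater
                 than the threshold (assumed reading of the first case).
    mode="mean": true if the average of the window is greater than it.
    """
    if not samples:
        return False
    if mode == "any":
        return any(sample > threshold for sample in samples)
    if mode == "mean":
        return sum(samples) / len(samples) > threshold
    raise ValueError(f"unknown mode: {mode!r}")


def maybe_request_expansion(samples, threshold, send_request, mode="any"):
    """Send an expansion request to the upper-level system when warranted.

    `send_request` is a hypothetical callable; the payload layout below
    is illustrative and not defined by the patent.
    """
    if exceeds_threshold(samples, threshold, mode):
        send_request({
            "action": "expand",
            "threshold": threshold,
            "samples": list(samples),
        })
        return True
    return False


# Example: collect requests in a list instead of calling a real system.
sent = []
requested = maybe_request_expansion([70, 92, 75], 90, sent.append, mode="any")
```

The "mean" mode is less sensitive to a single spike, which may be preferable when expansion is expensive.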
Corresponding to the above method, the present application also discloses a log-based system failure prediction apparatus. For the specific working contents of each unit of the apparatus in this embodiment, refer to the above method embodiment. The apparatus provided in the embodiment of the present invention is described below; the apparatus described below and the method described above may be read with reference to each other.
Referring to fig. 5, the log-based system failure prediction apparatus disclosed in the embodiment of the present application may include:
a log data capture unit 100, a data classification unit 200, and a data analysis unit 300;
corresponding to the above method, the log data capture unit 100 is configured to obtain system log data according to a preset data capture rule;
corresponding to the above method, the data classification unit 200 is configured to perform data classification on the acquired log data to obtain abnormal data, performance data, and service data, specifically, the abnormal data may include, in addition to the abnormal data itself, one or more of information such as time of occurrence of an abnormality of the abnormal data, abnormal content, abnormal operation, abnormal level, and whether the abnormality is a recoverable abnormality; the performance data can comprise the response time of the request data acquired by the system and other data used for representing the response capability of the system to the data besides the system performance data; the service data may include information such as request time distribution and request frequency of different service data, in addition to the service data itself processed by the system. In this scheme, in order to facilitate data management, the data classification unit 200 is further configured with a data table corresponding to the abnormal data, the performance data, and the service data one to one, and the abnormal data, the performance data, and the service data may all be stored in the data table corresponding to them;
corresponding to the above method, the data analysis unit 300 is configured to analyze the variation trend of the abnormal data, the performance data, and the service data based on a preset artificial intelligence model, and output an analysis result.
In this embodiment, the data analysis unit 300 stores artificial intelligence models in one-to-one correspondence with the abnormal data, the performance data and the service data. After the abnormal data, the performance data and the service data are obtained, each is loaded into its corresponding preset artificial intelligence model, which predicts its change trend over a future period of time. This provides early warning of the system's working condition, so that system operation and maintenance personnel can manage and maintain the system in a targeted manner according to these trends, which reduces the operation and maintenance cost of the system.
Corresponding to the above method, when the data analysis unit analyzes the change trend of the abnormal data based on a preset artificial intelligence model, the abnormal data may first be classified according to the monitored object, so that the model analyzes abnormal data of a single type each time. The output of the preset artificial intelligence model may include the growth rate of the abnormal data value and the occurrence frequency of the abnormal data input on that occasion. The analysis of the change trend of the abnormal data by the data analysis unit 300 may include: analyzing the increment of the loaded abnormal data based on the preset artificial intelligence model, and predicting the moment when the increment reaches a preset warning value, wherein the increment of the abnormal data includes, but is not limited to, the frequency of the abnormal data, i.e., the number of times the abnormal data occur within a preset time period, and the difference between the value of the abnormal data and the value of normal data. Through the preset artificial intelligence model, the data analysis unit 300 predicts the growth trend of the occurrence frequency of the loaded abnormal data and the change trend of its magnitude, obtains the change trend of the abnormal data over a certain future period, and predicts the times at which the value of the abnormal data reaches the preset warning value and its occurrence frequency reaches the preset warning frequency. System faults are thereby predicted, so that shutdown of the system due to faults can be effectively prevented.
Corresponding to the above method, when the data analysis unit analyzes the variation trend of the performance data based on a preset artificial intelligence model, the data analysis unit is specifically configured to:
analyzing the variation trend of the performance data of the system based on a preset artificial intelligence model to obtain the time when the performance data reaches a preset performance threshold, wherein the performance data includes, but is not limited to, system memory usage and CPU utilization. That is, as in the method embodiment described above, before the data analysis unit predicts the performance data, the performance data also needs to be classified, and performance data of a single type is loaded into the preset artificial intelligence model each time. For example, performance data representing the memory usage rate of the system is loaded into the preset artificial intelligence model, which predicts the change trend of the memory usage rate to obtain its growth trend; the time at which the memory usage rate reaches the preset memory usage threshold is then obtained from that growth trend. Likewise, performance data representing the CPU utilization rate of the system is loaded into the preset artificial intelligence model, which predicts the change trend of the CPU utilization rate to obtain its growth trend; the time at which the CPU utilization rate reaches the preset CPU utilization threshold is then obtained from that growth trend.
Corresponding to the above method, when the data analysis unit analyzes the change trend of the service data based on a preset artificial intelligence model, the data analysis unit is specifically configured to:
the change trend of the business data of the system is analyzed based on a preset artificial intelligence model to obtain the change trend of the data volume of various business data, the change trend of the data volume can indicate the change condition of the business data required by a user in a period of time in the future, and the system can reasonably adjust the system resources occupied by each business according to the change trend of the business data. As in the foregoing method embodiment, when the service data is obtained, the data analysis unit may also classify the service data first.
Further, in the above scheme disclosed in the embodiment of the present application, the data analysis unit may be further configured to: and comparing the acquired performance data with a preset performance threshold, and outputting an expansion request to the upper-level system when the performance data reaches the preset performance threshold so as to increase system resources of the system.
Further, referring to FIG. 5, the apparatus disclosed in the embodiment of the present application may further include a log display system, which is configured to display the prediction result of the data analysis unit 300.
Correspondingly, the present application also discloses a log-based system failure prediction device, referring to fig. 6, the device may include:
a memory 400 and a processor 500;
the device further comprises a communication interface 600 and a communication bus 700, wherein the memory 400, the processor 500 and the communication interface 600 communicate with one another through the communication bus 700.
The memory 400 is used for storing program codes; the program code includes computer operational instructions.
The memory 400 may comprise high-speed RAM and may also include non-volatile memory, such as at least one disk memory.
The processor 500 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The processor 500 is configured to call the program code and, when the program code is executed, to perform the method according to any of the embodiments of the present application.
For convenience of description, the above system is described with its functions divided into modules that are described separately. Of course, when implementing the present invention, the functions of the modules may be implemented in one or more pieces of software and/or hardware.
The embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the device and system embodiments are substantially similar to the method embodiments and are therefore described relatively briefly; for related points, refer to the corresponding descriptions of the method embodiments. The device and system embodiments described above are merely illustrative: the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units; they may be located in one place or distributed across a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), internal memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
It is further noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A log-based system failure prediction method is characterized by comprising the following steps:
acquiring system log data according to a preset data capturing rule;
performing data classification on the acquired log data to obtain abnormal data, performance data and service data;
and analyzing the change trends of the abnormal data, the performance data and the service data based on a preset artificial intelligence model, and outputting an analysis result.
2. The log-based system failure prediction method of claim 1, wherein analyzing the variation trend of the abnormal data based on a preset artificial intelligence model comprises:
analyzing the increment of the abnormal data based on a preset artificial intelligence model to obtain the moment when the increment of the abnormal data reaches a preset warning value, wherein the increment of the abnormal data comprises but is not limited to the occurrence frequency of the abnormal data in a preset time period and the difference value of the abnormal data and the value of the normal data.
3. The log-based system failure prediction method of claim 1, wherein analyzing the trend of the performance data based on a preset artificial intelligence model comprises:
analyzing the variation trend of the performance data of the system based on a preset artificial intelligence model to obtain the time when the performance data reaches a preset performance threshold, wherein the performance data comprises but is not limited to the system memory utilization rate and the CPU utilization rate.
4. The log-based system failure prediction method of claim 1, wherein analyzing the change trend of the business data based on a preset artificial intelligence model comprises:
analyzing the change trend of the business data of the system based on a preset artificial intelligence model to obtain the change trend of the data volume of various business data.
5. The log-based system failure prediction method of claim 1, further comprising:
and comparing the acquired performance data with a preset performance threshold, and outputting an expansion request to an upper-level system when the performance data reaches the preset performance threshold.
6. A log-based system failure prediction apparatus, comprising:
the log data capturing unit is used for acquiring system log data according to a preset data capturing rule;
the data classification unit is used for performing data classification on the acquired log data to obtain abnormal data, performance data and service data;
and the data analysis unit is used for analyzing the change trends of the abnormal data, the performance data and the service data based on a preset artificial intelligence model and outputting an analysis result.
7. The log-based system failure prediction device of claim 6, wherein the data analysis unit, when analyzing the variation trend of the abnormal data based on a preset artificial intelligence model, is specifically configured to:
analyzing the increment of the abnormal data based on a preset artificial intelligence model to obtain the moment when the increment of the abnormal data reaches a preset warning value, wherein the increment of the abnormal data comprises but is not limited to the occurrence frequency of the abnormal data in a preset time period and the difference value of the abnormal data and the value of the normal data.
8. The log-based system failure prediction device of claim 6, wherein the data analysis unit, when analyzing the variation trend of the performance data based on a preset artificial intelligence model, is specifically configured to:
analyzing the variation trend of the performance data of the system based on a preset artificial intelligence model to obtain the time when the performance data reaches a preset performance threshold, wherein the performance data comprises but is not limited to the system memory utilization rate and the CPU utilization rate.
9. The log-based system failure prediction device of claim 6, wherein the data analysis unit is further configured to:
and comparing the acquired performance data with a preset performance threshold, and outputting an expansion request to an upper-level system when the performance data reaches the preset performance threshold.
10. A log-based system failure prediction device, comprising:
a memory and a processor;
the memory is configured to store program code, and the processor is configured to invoke the program code and, when the program code is executed, to implement the log-based system failure prediction method of any of claims 1-5.
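The capture and classification steps of claim 1, together with the occurrence-frequency measure named in claim 2, can be illustrated with a minimal sketch. The claims do not define the capture rule or the classification criteria, so the regular expressions, the log line layout and the warning value of 2 below are assumptions made for illustration only.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical capture rule: keep only lines that start with a timestamp.
CAPTURE_RULE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} ")

# Hypothetical classification rules, checked in order; anything else is business data.
CLASS_RULES = [
    ("abnormal", re.compile(r"\b(ERROR|Exception|FATAL)\b")),
    ("performance", re.compile(r"\b(mem|cpu)=\d+(\.\d+)?%")),
]


def classify(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Split captured log lines into abnormal, performance and business data."""
    buckets: Dict[str, List[str]] = {"abnormal": [], "performance": [], "business": []}
    for line in lines:
        if not CAPTURE_RULE.match(line):
            continue  # not covered by the capture rule
        for name, pattern in CLASS_RULES:
            if pattern.search(line):
                buckets[name].append(line)
                break
        else:
            buckets["business"].append(line)
    return buckets


def abnormal_per_hour(abnormal_lines: Iterable[str]) -> Counter:
    """Count abnormal lines per hour, keyed by 'YYYY-MM-DD HH'."""
    return Counter(line[:13] for line in abnormal_lines)


log_lines = [
    "2019-11-27 10:00:01 ERROR db connection lost",
    "2019-11-27 10:05:00 INFO mem=71.5% cpu=40%",
    "2019-11-27 10:06:12 INFO transfer order=1001 ok",
    "2019-11-27 11:15:30 ERROR timeout calling ledger",
    "2019-11-27 11:20:00 FATAL out of memory",
    "debug noise without timestamp",
]

buckets = classify(log_lines)
frequency = abnormal_per_hour(buckets["abnormal"])
WARNING_VALUE = 2  # hypothetical preset warning value
alerts = [hour for hour, count in frequency.items() if count >= WARNING_VALUE]
print(alerts)  # ['2019-11-27 11']
```

This sketch only counts occurrences per period. Predicting the moment at which the count will reach the warning value would additionally require a trend model applied to these per-period counts.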

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911181749.XA CN110955586A (en) 2019-11-27 2019-11-27 System fault prediction method, device and equipment based on log

Publications (1)

Publication Number Publication Date
CN110955586A true CN110955586A (en) 2020-04-03

Family

ID=69977035

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911181749.XA Pending CN110955586A (en) 2019-11-27 2019-11-27 System fault prediction method, device and equipment based on log

Country Status (1)

Country Link
CN (1) CN110955586A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination