CN103761173A - Log based computer system fault diagnosis method and device - Google Patents

Publication number
CN103761173A
Authority
CN
China
Prior art keywords
daily record
fault
log
word
analysis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310740549.XA
Other languages
Chinese (zh)
Inventor
邹德清
金海
秦昊
羌卫中
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2013-12-28
Filing date
2013-12-28
Publication date
2014-04-30
Application filed by Huazhong University of Science and Technology
Priority to CN201310740549.XA
Publication of CN103761173A

Abstract

The invention discloses a log management and fault diagnosis method and device under multiple hosts. The method comprises fault log collection, fault log analysis and fault log correlation analysis. Fault log collection comprises collecting fault logs of all hardware and software in the cluster and storing the logs in a log server uniformly. Fault log analysis comprises filtering the fault logs, extracting log template information and classifying the logs according to log types. Fault log correlation analysis comprises performing fault reason analysis by using log analysis results and combining time windows, clustering the related faults caused by the same fault into a category and trying to find the root of the type of the faults. The method and the device are capable of analyzing the system operation condition effectively, assisting administrators to process system faults and improving the fault type determination accuracy.

Description

A log-based computer system fault diagnosis method and device
Technical field
The invention belongs to the field of fault analysis within computer system security and, more specifically, relates to a log-based computer system fault diagnosis method and device.
Background
With the introduction of cloud computing architectures, computer systems have become increasingly complex, the software running on them increasingly varied, interactions more frequent, and coupling tighter, which makes the causes and types of system failures hard to analyze. To meet the stringent demands that some critical applications place on processing capacity and fault tolerance, the day-to-day behavior of the system must be monitored and recorded. Then, after an error occurs, the fault can be diagnosed and located and its cause found promptly, so that the error can be repaired, system performance improved, and similar faults prevented from recurring.
In a complete information system, the logging system is a very important component. Logs are a faithful record of a computer system's execution and are widely used in system debugging, monitoring, and security detection. Log management and analysis underpin system administration and intrusion detection, and are a necessary means of evaluating system operating conditions and verifying the effectiveness of network security policies. System logs can be used not only to detect and analyze system faults but also to monitor system status, optimize system performance, and adjust system behavior.
Log analysis usually has three main stages: (1) log filtering, which filters out logs unrelated to faults; (2) log fault analysis, which identifies the fault type from the logs and reconstructs in detail how the error occurred; (3) log correlation analysis, which assigns logs from the same fault source to the same group and analyzes how the fault propagates between nodes and components.
Existing log-based diagnosis methods have several problems:
(1) Extensive manual intervention is required. Because log classification and fault diagnosis are centered on human judgment, automatic processing by computer alone rarely produces the desired result. Traditional fault diagnosis systems therefore tend to rely on expert systems, for which a large amount of basic data must be collected through manual processing. The computer produces satisfactory results only after long learning and correction, which wastes a great deal of system administrators' time;
(2) Response is slow. Current analysis systems are usually based on a complex set of decision rules and can hardly respond to faults in the system in real time;
(3) Accuracy is low. The accuracy of log classification is often low, and there is no algorithm for correcting misclassifications.
Summary of the invention
In view of the defects of the prior art, the object of the invention is to classify the logs produced in a computer system quickly and to achieve high-precision classification with little manual intervention. In addition, the classification results can be corrected at any time, which makes it easy to learn new classification knowledge.
To achieve this object, the invention proposes a log-based computer system fault diagnosis method comprising the following steps:
(S1) Fault log analysis: analyze the logs in the computer system in real time, use a fault keyword matrix to quantify the fault classification results obtained from manual learning, and assign a fault type to each fault log according to the different fault types;
(S2) Fault log correlation: perform fault cause analysis using the results of fault log analysis combined with time windows, aggregate the related fault logs caused by the same fault into one class, and find the root cause of that class of faults.
The invention also proposes a log-based computer system fault diagnosis device comprising the following modules:
A fault log analysis module, which analyzes the logs in the computer system in real time, uses a fault keyword matrix to quantify the fault classification results obtained from manual learning, and assigns a fault type to each fault log according to the different fault types;
A fault log correlation module, which performs fault cause analysis using the results of fault log analysis combined with time windows, aggregates the related fault logs caused by the same fault into one class, and finds the root cause of that class of faults.
Compared with the prior art, the system of the invention has the following beneficial effects:
(1) A fault keyword matrix stores the results of machine learning, so logs can be classified by fault quickly and easily, and the decision rules can be modified and new fault types added at any time. Compared with traditional approaches this improves processing speed and saves the time otherwise spent relearning;
(2) A new log classification method is proposed that classifies logs accurately by fault type;
(3) An improved time-based log correlation analysis is used, in which the fault type of a log guides the correlation analysis, improving classification accuracy and reducing the false positive and false negative rates;
(4) The time windows of the different fault types are stored in the fault keyword matrix and can be modified in real time, which improves the ability to learn new fault types.
Brief description of the drawings
Fig. 1 is a structural diagram of the fault-log-based system fault diagnosis device for a multi-host environment according to an embodiment of the invention;
Fig. 2 is a flowchart of the log fault classification method according to an embodiment of the invention;
Fig. 3 is a flowchart of the time-based fault correlation analysis method according to an embodiment of the invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. The specific embodiments described here serve only to explain the invention and are not intended to limit it.
As shown in Fig. 1, the invention provides a system fault diagnosis device suitable for large-scale cloud platforms. The device collects the logs produced across the whole computer system, gathers statistics on and learns from the logs, and labels each log with its fault type. When a fault occurs, the administrator can click the corresponding log; the system finds the logs associated with it and builds a fault tree in the order of fault propagation, helping the administrator find the root cause of the fault. The device mainly comprises three components:
1) A fault log analysis module, which analyzes the logs in the computer system in real time, uses a fault keyword matrix to quantify the fault classification results obtained from manual learning, and assigns a fault type to each fault log according to the different fault types;
2) A fault log correlation module, which performs fault cause analysis using the results of log analysis combined with time windows, aggregates the related fault logs caused by the same fault into one class, and attempts to find the root cause of that class of faults.
Correspondingly, a log-based computer system fault diagnosis method according to an embodiment of the invention comprises the following steps:
(S1) Fault log analysis: analyze the logs in the computer system in real time, use a fault keyword matrix to quantify the fault classification results obtained from manual learning, and assign a fault type to each fault log according to the different fault types;
(S2) Fault log correlation: perform fault cause analysis using the results of fault log analysis combined with time windows, aggregate the related fault logs caused by the same fault into one class, and find the root cause of that class of faults.
As shown in Fig. 2, fault log analysis in the embodiment of the invention comprises the following steps:
(1) Log preprocessing
Log preprocessing has two parts: filtering repeated logs and filtering meaningless words out of each log. Specifically, it comprises the following steps:
(1-1) Filter the repeated logs produced by the same process within a certain period.
Statistics on the logs show that such redundant logs are produced close together in time and by the same process. Therefore, during preprocessing, similar logs produced by the same process within a certain period are filtered out. The period is set by a threshold chosen for the particular computer system. When a log produced by a device is identical to the previous log and is produced within the specified threshold, it is removed. If a different log appears within the time threshold, or an identical log falls outside the time range, the log is retained.
In practice, a suitable threshold must be chosen for the period over which logs may be merged. If the time span is too large, similar log messages produced by different faults may be filtered out, causing missed reports. In this embodiment, because this filtering is only a preprocessing step, only exactly identical logs that occur consecutively are filtered out so that nothing is missed, which allows the time threshold to be set fairly loosely.
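Step (1-1) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `LogEntry` fields, the function name `filter_repeated`, and the choice to measure the time gap from the immediately preceding entry of the same process are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    # Hypothetical record layout; the source does not define one.
    timestamp: float  # seconds
    process: str
    message: str


def filter_repeated(entries, threshold_seconds):
    """Drop an entry that repeats the previous entry of the same process
    and arrives within threshold_seconds of it."""
    last_seen = {}  # process -> (timestamp, message) of its most recent entry
    kept = []
    for entry in sorted(entries, key=lambda e: e.timestamp):
        previous = last_seen.get(entry.process)
        is_repeat = (
            previous is not None
            and previous[1] == entry.message
            and entry.timestamp - previous[0] <= threshold_seconds
        )
        if not is_repeat:
            kept.append(entry)
        # Track every entry, kept or dropped, so a long burst stays filtered.
        last_seen[entry.process] = (entry.timestamp, entry.message)
    return kept
```

Because only consecutive identical messages are dropped, a different message in between resets the comparison, which matches the conservative filtering described above.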
(1-2) Filter meaningless words using an English stop-word list, retaining only words that carry real meaning.
Meaningless words such as English function words can be filtered with an English stop-word list, which contains most English function words. Given the particular nature of logs, most adjectives and adverbs can also be filtered out, so that only words that carry real meaning are retained.
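A minimal sketch of the stop-word filtering in step (1-2). The tiny `STOP_WORDS` set is a stand-in chosen for illustration; the source does not say which stop-word list is used, and adjective and adverb filtering is not shown.

```python
# Illustrative subset only; a real deployment would load a full stop-word list.
STOP_WORDS = {"the", "a", "an", "is", "was", "to", "of", "on", "in", "for", "and"}


def remove_stop_words(message, stop_words=STOP_WORDS):
    """Split a log message on whitespace and drop stop words (case-insensitive)."""
    return [word for word in message.split() if word.lower() not in stop_words]
```

For example, `remove_stop_words("The connection to the server was refused")` keeps only `connection`, `server`, and `refused`.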
(2) Extract the log invariants, that is, filter out the variables in the log text to extract the structural information of the log.
After preprocessing, the template information of the logs is extracted. Template information is the fault-related content distilled from a log. Templated logs are used for rule learning; templating shrinks the solution space and simplifies updates to the resulting fault keyword matrix. Templates can also be matched against logs produced at run time for fast classification.
The variables in a log are the words in the log text that can change, including numbers, IP addresses, memory addresses, directories, file names, program names, ports, and so on.
Specifically, extracting the log invariants takes two steps.
(2-1) Use regular expressions to remove numbers, directories, IP addresses, and memory addresses from the log text. These parts are easy to remove. Each deleted element is replaced by a symbol denoting its kind. Because words follow a grammatical order, and different positions need to be distinguished, replacing with symbols better preserves the original information of the log and reduces distortion during extraction.
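Step (2-1) might look like the sketch below. The regular expressions and the placeholder symbols (`<IP>`, `<ADDR>`, `<PATH>`, `<NUM>`) are assumptions made for illustration; the source only says that each removed element is replaced by a symbol with a distinct meaning.

```python
import re

# Order matters: IP addresses and hex addresses must be masked before bare numbers.
PATTERNS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<ADDR>"),
    (re.compile(r"(?:/[\w.-]+)+/?"), "<PATH>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def mask_variables(message):
    """Replace variable parts of a log message with typed placeholder symbols."""
    for pattern, symbol in PATTERNS:
        message = pattern.sub(symbol, message)
    return message
```

Keeping a typed placeholder in each position, rather than deleting the token, preserves word order, which is the reason the text gives for substitution.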
(2-2) Use a method based on word frequency statistics to screen the log information further, leaving the template information of the log.
In this embodiment, the method based on word frequency statistics can use an improved TF-IDF to compute the importance of each word within the whole log. The procedure is as follows:
Term frequency (TF) here is the frequency with which a given word occurs in all the logs output by a device. It is the term count normalized to avoid a bias toward long files. (The same word may have a higher term count in a long file than in a short one, regardless of how important the word is.) For a word $t_i$ in a particular device $d_j$, its importance can be expressed as:
$$tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$
In the formula above, $n_{i,j}$ is the number of occurrences of the word in device $d_j$, and the denominator is the total number of occurrences of all words in device $d_j$.
Inverse document frequency (IDF) is a measure of the general importance of a word. The IDF of a particular word $t_i$ is obtained by dividing the total number of logs produced by device $d_j$ by the number of logs containing the word, and taking the logarithm of the quotient:
$$idf_i = \log \frac{|D|}{|\{j : t_i \in d_j\}|}$$
where:
$|D|$ is the total number of logs produced by the device;
$|\{j : t_i \in d_j\}|$ is the number of logs containing the word $t_i$ (that is, the number of logs with $n_{i,j} \neq 0$). If the word appears in no log, the divisor is zero, so $1 + |\{j : t_i \in d_j\}|$ is generally used instead. Then:
$$tfidf_{i,j} = tf_{i,j} \times idf_i$$
Computing TF-IDF gives the importance of each word within the whole log, after which the logs are screened against a predefined threshold.
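The formulas above can be sketched for a single device as follows. The function name `tf_idf` and the input shape (a list of token lists) are assumptions, and the natural logarithm is assumed because the source does not state the base. The threshold screening itself is left out because the source does not say which side of the threshold is kept.

```python
import math
from collections import Counter


def tf_idf(device_logs):
    """device_logs: list of token lists, all produced by one device.
    Returns a dict mapping each word to tf * idf with the 1 + count smoothing."""
    term_counts = Counter(word for log in device_logs for word in log)
    total_terms = sum(term_counts.values())
    logs_containing = Counter(word for log in device_logs for word in set(log))
    n_logs = len(device_logs)  # |D|
    scores = {}
    for word, count in term_counts.items():
        tf = count / total_terms
        idf = math.log(n_logs / (1 + logs_containing[word]))
        scores[word] = tf * idf
    return scores
```

Note that with the `1 +` smoothing a word that appears in every log gets a slightly negative IDF, so such words score below zero.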
(3) Log template information filtering
Because the number of templates is still large before manual classification, this embodiment groups the logs by automatic clustering to further shrink the space that requires human judgment.
In the automatic clustering stage, logs with similar content and format are placed in the same class, so that during manual classification each class only needs to be checked and finely adjusted.
Automatic classification here uses the DBSCAN algorithm to cluster the log templates. The main benefit of this algorithm is that the number of clusters need not be set in advance; clusters are formed automatically as required during clustering.
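For readers unfamiliar with DBSCAN, a compact textbook version is sketched below. It takes any distance function as a parameter; the parameter names `eps` and `min_pts` follow common usage and their values are not given in the source, so this is an illustration of the algorithm rather than the patent's configuration.

```python
def dbscan(items, distance, eps, min_pts):
    """Return one cluster label per item (0, 1, ...), or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(items)

    def neighbors(index):
        # Includes the point itself, as in the standard formulation.
        return [k for k in range(len(items))
                if distance(items[index], items[k]) <= eps]

    cluster = -1
    for index in range(len(items)):
        if labels[index] is not UNVISITED:
            continue
        seeds = neighbors(index)
        if len(seeds) < min_pts:
            labels[index] = NOISE
            continue
        cluster += 1
        labels[index] = cluster
        queue = [k for k in seeds if k != index]
        while queue:
            k = queue.pop()
            if labels[k] == NOISE:
                labels[k] = cluster  # border point: joins but does not expand
            if labels[k] is not UNVISITED:
                continue
            labels[k] = cluster
            k_neighbors = neighbors(k)
            if len(k_neighbors) >= min_pts:
                queue.extend(k_neighbors)  # core point: keep expanding
    return labels
```

The number of clusters falls out of the data: it is simply one more than the largest label returned.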
When choosing the distance formula, given the particular nature of log text, edit distance is used to measure the distance between two logs. In the actual calculation the edit distance is modified: a coefficient is added to reduce the influence of length on the distance. The log edit distance (LLD) between log A and log B is defined as:
$$LLD(A, B) = \frac{2 \times LD(A, B)}{length(A) + length(B)}$$
where $LD(A, B)$ is the original edit distance and $length(\cdot)$ is the length of a log.
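The LLD formula can be sketched as below. The source does not say whether length is counted in characters or in words, so the functions accept any sequence (a string or a list of tokens); returning 0 for two empty inputs is an added assumption to avoid division by zero.

```python
def levenshtein(a, b):
    """Classic edit distance between two sequences (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def lld(a, b):
    """Length-normalized log edit distance: 2 * LD(A, B) / (len(A) + len(B))."""
    if not a and not b:
        return 0.0
    return 2 * levenshtein(a, b) / (len(a) + len(b))
```

On token lists, `lld(["disk", "full"], ["disk", "error"])` is 0.5: one substitution over a combined length of four.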
(4) Obtain the fault keyword matrix
(4-1) Manually classify the logs in the sample. In this step the administrator first replaces the labels assigned automatically by the system with labels that have practical meaning, then adjusts the log templates in each class to correct the result of automatic clustering, ensuring that the logs of each type are relevant to that type's label.
(4-2) Learn from the manual classification results and build the fault keyword matrix.
The fault keyword matrix (matrix A) is a two-dimensional matrix formed from the probabilities that the words in the templates appear in each fault type, as shown below:
$$A = \begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1} & a_{m,2} & \cdots & a_{m,n} \end{pmatrix}$$
The fault keyword matrix A is an $m \times n$ matrix, where $m$ is the number of distinct words that appear in the sample log templates, $n$ is the number of log fault types, and $a_{i,j}$ is the probability that the i-th word belongs to the j-th fault class. (Note: the probability here is a relative probability coefficient, not a true probability.)
The computation of the probability $a_{i,w}$ is described in detail below:
$a_{i,w}$ is used to judge whether a word belongs to a given type; its value reflects how frequently the word occurs in a specific fault type. The probability that the i-th word occurs in fault type w can be taken as the ratio of the number of occurrences of word i in type w to the total number of words in type w, which gives the basic formula:
$$P(i, w) = \frac{count(i, w)}{\sum_{j=1}^{m} count(j, w)}$$
where $P(i, w)$ is the probability that the i-th word occurs in fault type w, and $count(i, w)$ is the number of occurrences of word i in fault type w.
The formula is then corrected by a scale coefficient:
$$K(i, w) = -\log\left(\frac{sum(i) - count(i, w)}{sum(i)}\right) + 1$$
where $sum(i)$ is the total number of times word i occurs across all fault types in the template library, that is:
$$sum(i) = \sum_{t=1}^{n} count(i, t)$$
The probability coefficient $a_{i,w}$ is therefore the product of the word's importance and its frequency of occurrence:
$$a_{i,w} = P(i, w) \times K(i, w)$$
Substituting the two formulas above gives:
$$a_{i,w} = \frac{count(i, w)}{\sum_{j=1}^{m} count(j, w)} \left[ -\log\left(1 - \frac{count(i, w)}{sum(i)}\right) + 1 \right]$$
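A direct transcription of the final formula, under stated assumptions: the nested-dict input layout and the name `keyword_weight` are illustrative, and the natural logarithm is assumed. One point the text does not address is that the logarithm diverges when a word occurs in only one fault type (so that count(i, w) equals sum(i)); this sketch returns infinity in that case, which is a choice made here and not something the source specifies.

```python
import math


def keyword_weight(counts, word, fault_type):
    """counts: dict fault_type -> dict word -> occurrences in that type's templates.
    Returns a_{i,w} = P(i, w) * K(i, w)."""
    count_iw = counts.get(fault_type, {}).get(word, 0)
    if count_iw == 0:
        return 0.0
    type_total = sum(counts[fault_type].values())  # denominator of P(i, w)
    word_total = sum(per_type.get(word, 0) for per_type in counts.values())  # sum(i)
    p = count_iw / type_total
    remainder = (word_total - count_iw) / word_total
    if remainder == 0:
        return math.inf  # assumption: word is exclusive to this type, log(0) diverges
    k = -math.log(remainder) + 1
    return p * k
```

The coefficient K grows as a word becomes more concentrated in one type, so words shared evenly across types contribute little beyond their raw frequency.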
On this basis, the procedure for learning fault types with the fault keyword matrix is as follows:
After the manual classification results are obtained, matrix A is computed column by column according to the formula. Take column w as an example.
A) The program counts the total number of words belonging to type w in the template library, denoted $T(w)$:
$$T(w) = \sum_{j=1}^{m} count(j, w)$$
In this embodiment, to make the fault keyword matrix easy to modify and update, an extra row is added to store $T(w)$, which reduces repeated computation.
B) For each word i in type w, $sum(i)$ is computed. As with $T(w)$, extra space is added to store $sum(i)$ in order to reduce computation. The fault keyword matrix is therefore extended by one row and one column to hold these statistics, and the final matrix A is:
$$A = \begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n} & sum(1) \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} & sum(2) \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ a_{m,1} & a_{m,2} & \cdots & a_{m,n} & sum(m) \\ T(1) & T(2) & \cdots & T(n) & \end{pmatrix}$$
C) Computing a specific $a_{i,w}$ requires $count(i, w)$ according to the formula. Because $count(i, w)$ changes as the fault keyword matrix is updated, in actual storage matrix A holds $count(i, w)$ directly and $a_{i,w}$ is computed when needed. Since all the variables needed to compute $a_{i,w}$ are stored in matrix A, this computation adds no extra overhead to the system.
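The storage scheme in A) to C) can be sketched as a small class that keeps raw counts together with the cached totals and derives the coefficient on demand. The class and method names are hypothetical, dictionaries stand in for the extended matrix, and returning infinity for a word exclusive to one type is an assumption, as the source does not cover that case.

```python
import math
from collections import defaultdict


class FaultKeywordMatrix:
    """Stores count(i, w) plus the extra column sum(i) and extra row T(w)."""

    def __init__(self):
        self.count = defaultdict(lambda: defaultdict(int))  # count[word][fault_type]
        self.word_total = defaultdict(int)  # sum(i), the extra column
        self.type_total = defaultdict(int)  # T(w), the extra row

    def add_template(self, words, fault_type):
        for word in words:
            self.count[word][fault_type] += 1
            self.word_total[word] += 1
            self.type_total[fault_type] += 1

    def weight(self, word, fault_type):
        """Compute a_{i,w} lazily from the stored counts and cached totals."""
        count_iw = self.count.get(word, {}).get(fault_type, 0)
        if count_iw == 0:
            return 0.0
        p = count_iw / self.type_total[fault_type]
        remainder = (self.word_total[word] - count_iw) / self.word_total[word]
        if remainder == 0:
            return math.inf  # assumption: the formula diverges here
        return p * (-math.log(remainder) + 1)
```

Each lookup touches three stored numbers, which is why deferring the computation of the coefficient costs little.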
In addition, once the fault keyword matrix has been obtained, if the classification is modified while the system is running, the fault keyword matrix must be adjusted. There are three cases:
A) When the class of a template needs to be changed, the system scans all the words in the template and modifies the matrix accordingly.
Suppose word i occurs n times in the template, which originally belonged to class w and is now to be changed to class u. The following changes are then needed:
[Formula image BDA0000448968650000093, not reproduced in this text]
B) When a new template needs to be added, the system modifies the fault keyword matrix directly, in much the same way as the matrix is built during manual learning.
C) When a new template needs to be added to a new class, the system adds a new column to store the new class (call it class u), then updates sum(i) for each word i in the template, computes T(u), and finally computes the count(i, u) value of each word in the template.
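The update formulas for case A) did not survive in this text, so the sketch below is a guess at the bookkeeping implied by the definitions of count, sum, and T: moving a template shifts its word counts and the class totals from the old class to the new one, while each word's overall total is unchanged. The function name and the plain-dict layout are hypothetical.

```python
def move_template(count, type_total, words, old_type, new_type):
    """Reassign one template from old_type to new_type.

    count: dict word -> dict fault_type -> occurrences, i.e. count(i, w)
    type_total: dict fault_type -> T(w)
    words: the template's words, with repeats listed once per occurrence
    sum(i) needs no change because the word's overall total is unaffected.
    """
    for word in words:
        count[word][old_type] -= 1
        count[word][new_type] = count[word].get(new_type, 0) + 1
        type_total[old_type] -= 1
        type_total[new_type] = type_total.get(new_type, 0) + 1
```

The same function covers case C) when `new_type` has not been seen before, since missing entries start from zero.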
(5) Fault type determination
With the fault keyword matrix, the fault type of logs in the system can be determined easily. The procedure is as follows:
Suppose the fault type of log L is to be determined. For each word in log L, look up the corresponding row in the fault keyword matrix A and compute the probability values that log L belongs to the different fault types. If there is a maximum probability value, the fault type of log L is judged to be the fault type corresponding to that maximum. The pseudocode of the computation is given below:
[Pseudocode images BDA0000448968650000101 and BDA0000448968650000111, not reproduced in this text]
Further, the last three lines of the code check whether L could also belong to other types. When the difference between another probability value and the maximum probability value is less than a predetermined threshold, the fault type of L cannot be determined accurately with the current fault keyword matrix. The administrator must therefore be notified and asked to choose among the most likely candidates. The system records the administrator's decision and uses the template of L and the decision to correct the fault keyword matrix as described in step (4), so that similar fault logs can be classified accurately afterwards.
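Since the original pseudocode is not available here, the following is only a plausible reconstruction of the decision rule described in the text. It assumes the score of a type is the sum of the coefficients of the words in the log, which the source does not state, and the names `classify`, `weights`, and `margin` are hypothetical.

```python
def classify(words, weights, fault_types, margin):
    """weights: dict (word, fault_type) -> coefficient a_{i,w}.
    Returns (best_type, candidates). best_type is None when another type
    scores within `margin` of the maximum, meaning the administrator decides."""
    scores = {t: sum(weights.get((w, t), 0.0) for w in words) for t in fault_types}
    best = max(scores, key=scores.get)
    candidates = [t for t in fault_types
                  if t == best or scores[best] - scores[t] < margin]
    candidates.sort(key=lambda t: -scores[t])
    if len(candidates) > 1:
        return None, candidates  # ambiguous: defer to the administrator
    return best, candidates
```

When the result is ambiguous, the administrator's choice would then be fed back into the matrix as in step (4).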
In addition, this embodiment uses fault correlation analysis to gather together all the fault logs caused by the same fault, forming a fault propagation tree ordered by time of occurrence, which helps the administrator determine the root cause of the fault.
As shown in Fig. 3, the improved time-based fault correlation analysis method of the embodiment of the invention comprises the following steps:
(1) Aggregate the logs by traditional time-based correlation analysis. This method rests on the idea that faults from the same source all occur within a short period of one another. Specifically, a time window is set up; when a fault occurs, the logs are scanned in chronological order, and logs that fall within the same time window are regarded as fault logs produced by the same fault source and are grouped into one class (called a "tuple" here).
(2) Manually analyze the logs in each tuple to determine the time window size for each fault type.
(3) Add a row to the fault keyword matrix to store the time window size of each fault type.
(4) Select the log to be diagnosed. After the fault type of this log has been confirmed, look up the time window corresponding to that fault in the fault keyword matrix, then use the time window to diagnose the log.
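Steps (1) and (4) can be sketched as follows. Two details are assumptions because the source leaves them open: a tuple's window is measured from its first log, and the per-type window in step (4) is applied symmetrically around the selected log. The function names and the `(timestamp, fault_type, message)` record layout are hypothetical.

```python
def group_into_tuples(timestamps, window):
    """Step (1): scan in time order and start a new tuple whenever a log
    falls outside the window that began at the current tuple's first log."""
    tuples = []
    for ts in sorted(timestamps):
        if tuples and ts - tuples[-1][0] <= window:
            tuples[-1].append(ts)
        else:
            tuples.append([ts])
    return tuples


def related_logs(entries, selected, windows):
    """Step (4): collect the logs within the window stored for the selected
    log's fault type. entries are (timestamp, fault_type, message) triples;
    windows maps fault_type -> window size in the same time unit."""
    window = windows[selected[1]]
    return sorted(e for e in entries if abs(e[0] - selected[0]) <= window)
```

Sorting the related logs by time gives the ordering from which a fault propagation tree could be built.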
Those skilled in the art will readily understand that the foregoing is only a preferred embodiment of the invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (10)

1. A log-based computer system fault diagnosis method, comprising the following steps:
(S1) fault log analysis: analyzing the logs in the computer system in real time, using a fault keyword matrix to quantify the fault classification results obtained from manual learning, and assigning a fault type to each fault log according to the different fault types;
(S2) fault log correlation: performing fault cause analysis using the results of fault log analysis combined with time windows, aggregating the related fault logs caused by the same fault into one class, and finding the root cause of that class of faults.
2. The fault diagnosis method according to claim 1, wherein step (S1) comprises the following steps:
(1) log preprocessing, comprising filtering repeated logs and filtering meaningless words out of each log;
(2) extracting the log invariants, that is, filtering out the variables in the log text to extract the structural information of the log;
(3) log template information filtering: placing logs with similar content and format in the same class by automatic clustering;
(4) obtaining the fault keyword matrix, wherein the fault keyword matrix is a two-dimensional matrix formed from the probabilities that the words in the templates appear in each fault type;
(5) determining the fault type of a log using the fault keyword matrix.
3. The fault diagnosis method according to claim 2, wherein step (1) comprises the following sub-steps:
(1-1) filtering the repeated logs produced by the same process within a certain period;
(1-2) filtering English function words using an English stop-word list, retaining only words that carry real meaning.
4. The fault diagnosis method according to claim 2, wherein the variables are the words in the log text that can change, including numbers, IP addresses, memory addresses, directories, file names, program names, and ports.
5. The fault diagnosis method according to claim 4, wherein step (2) comprises the following sub-steps:
(2-1) using regular expressions to remove numbers, directories, IP addresses, and memory addresses from the log text;
(2-2) using a method based on word frequency statistics to screen the log information further, leaving the template information of the log.
6. The fault diagnosis method according to claim 5, wherein in step (2-2) the method based on word frequency statistics uses an improved TF-IDF to compute the importance of each word $t_i$ within the whole log, $tfidf_{i,j} = tf_{i,j} \times idf_i$, wherein
$tf_{i,j}$ is the importance of word $t_i$ in a particular device $d_j$:
$$tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$
where $n_{i,j}$ is the number of occurrences of the word in device $d_j$, and the denominator is the total number of occurrences of all words in device $d_j$;
$idf_i$ is the inverse document frequency of the word $t_i$, obtained by dividing the total number of logs produced by device $d_j$ by the number of logs containing the word and taking the logarithm of the quotient.
7. The fault diagnosis method according to claim 2, wherein step (4) comprises the following sub-steps:
(4-1) manually classifying the logs in the sample and adjusting the log templates in each class to correct the result of automatic clustering;
(4-2) learning from the manual classification results and building the fault keyword matrix A:
$$A = \begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1} & a_{m,2} & \cdots & a_{m,n} \end{pmatrix}$$
wherein $m$ is the number of distinct words that appear in the sample log templates, $n$ is the number of log fault types, and $a_{i,j}$ is the probability that the i-th word belongs to the j-th fault class.
8. The fault diagnosis method according to claim 2, wherein step (5) is specifically: to determine the fault type of a log L, for each word in log L, looking up the corresponding row in the fault keyword matrix A and computing the probability values that log L belongs to the different fault types; if there is a maximum probability value, the fault type of log L is judged to be the fault type corresponding to that maximum.
9. The fault diagnosis method according to claim 1, wherein step (S2) comprises the following steps:
(1) aggregating the logs by traditional time-based correlation analysis;
(2) manually analyzing the logs in each tuple to determine the time window size for each fault type;
(3) adding a row to the fault keyword matrix to store the time window size of each fault type;
(4) selecting the log to be diagnosed, and after the fault type of this log has been confirmed, looking up the time window corresponding to that fault in the fault keyword matrix, then using the time window to diagnose the log.
10. A log-based computer system fault diagnosis device, comprising the following modules:
a fault log analysis module, configured to analyze the logs in the computer system in real time, use a fault keyword matrix to quantify the fault classification results obtained from manual learning, and assign a fault type to each fault log according to the different fault types;
a fault log correlation module, configured to perform fault cause analysis using the results of fault log analysis combined with time windows, aggregate the related fault logs caused by the same fault into one class, and find the root cause of that class of faults.
CN201310740549.XA 2013-12-28 2013-12-28 Log based computer system fault diagnosis method and device Pending CN103761173A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310740549.XA CN103761173A (en) 2013-12-28 2013-12-28 Log based computer system fault diagnosis method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310740549.XA CN103761173A (en) 2013-12-28 2013-12-28 Log based computer system fault diagnosis method and device

Publications (1)

Publication Number Publication Date
CN103761173A true CN103761173A (en) 2014-04-30

Family

ID=50528415

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310740549.XA Pending CN103761173A (en) 2013-12-28 2013-12-28 Log based computer system fault diagnosis method and device

Country Status (1)

Country Link
CN (1) CN103761173A (en)

Cited By (62)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104462606A (en) * 2014-12-31 2015-03-25 中国科学院深圳先进技术研究院 Method for determining diagnosis treatment measures based on log data
CN104461844A (en) * 2014-10-31 2015-03-25 大唐移动通信设备有限公司 Log service method based on rule
CN104657622A (en) * 2015-03-12 2015-05-27 浪潮集团有限公司 Cluster fault analysis method based on event-driven analysis
CN105049247A (en) * 2015-07-06 2015-11-11 中国科学院信息工程研究所 Network safety log template extraction method and device
CN105335277A (en) * 2014-06-27 2016-02-17 可牛网络技术(北京)有限公司 Fault information processing method and device as well as terminal
CN105471659A (en) * 2015-12-25 2016-04-06 华为技术有限公司 Root fault cause analysis method and analysis device
CN105468677A (en) * 2015-11-13 2016-04-06 国家计算机网络与信息安全管理中心 Log clustering method based on graph structure
CN105577440A (en) * 2015-12-24 2016-05-11 华为技术有限公司 Network fault time location method and analyzing device
CN105589800A (en) * 2015-12-25 2016-05-18 中国银联股份有限公司 Application system for predicting faults of complex system
CN105830060A (en) * 2014-02-06 2016-08-03 富士施乐株式会社 Information processing device, information processing program, storage medium, and information processing method
CN105893225A (en) * 2015-08-25 2016-08-24 乐视网信息技术(北京)股份有限公司 Automatic error processing method and device
CN106155827A (en) * 2016-06-28 2016-11-23 浪潮(北京)电子信息产业有限公司 A kind of cpu fault its diagnosis processing method based on Linux system and system
CN106953759A (en) * 2017-03-22 2017-07-14 联想(北京)有限公司 Cluster control method and cluster control facility
CN107181630A (en) * 2017-07-24 2017-09-19 郑州云海信息技术有限公司 The treating method and apparatus of service fault in cloud system
CN107241212A (en) * 2017-04-20 2017-10-10 努比亚技术有限公司 A kind of log processing method and device, equipment
CN107301120A (en) * 2017-07-12 2017-10-27 北京京东尚科信息技术有限公司 Method and device for handling unstructured daily record
CN107301118A (en) * 2017-06-15 2017-10-27 中国科学院计算技术研究所 A kind of fault indices automatic marking method and system based on daily record
CN107391727A (en) * 2017-08-01 2017-11-24 北京航空航天大学 The method for digging and device of equipment fault sequence pattern
CN107402863A (en) * 2016-03-28 2017-11-28 阿里巴巴集团控股有限公司 A kind of method and apparatus for being used for the daily record by log system processing business system
CN107577547A (en) * 2017-08-08 2018-01-12 国家超级计算深圳中心(深圳云计算中心) A kind of urgent operation of High-Performance Computing Cluster continues calculation method and system
CN107861856A (en) * 2017-11-08 2018-03-30 郑州云海信息技术有限公司 The processing method and computer-readable storage medium of warning information in cloud data system
CN108038049A (en) * 2017-12-13 2018-05-15 西安电子科技大学 Real-time logs control system and control method, cloud computing system and server
CN108153804A (en) * 2017-11-17 2018-06-12 极道科技(北京)有限公司 A kind of metadata daily record update method of symmetric distributed file system
CN108268473A (en) * 2016-12-30 2018-07-10 北京国双科技有限公司 A kind of log processing method and device
CN108377255A (en) * 2017-01-31 2018-08-07 欧姆龙株式会社 Information processing unit, information processing method and recording medium
CN108600007A (en) * 2018-04-24 2018-09-28 山东乾云启创信息科技股份有限公司 A kind of cloud platform Liability Retroact method and system
CN108845560A (en) * 2018-05-30 2018-11-20 国网浙江省电力有限公司宁波供电公司 A kind of power scheduling log Fault Classification
CN109213773A (en) * 2017-07-06 2019-01-15 阿里巴巴集团控股有限公司 A kind of diagnostic method, device and the electronic equipment of online failure
CN109343990A (en) * 2018-09-25 2019-02-15 江苏润和软件股份有限公司 A kind of cloud computing system method for detecting abnormality based on deep learning
CN109344060A (en) * 2018-09-20 2019-02-15 迈普通信技术股份有限公司 A kind of analysis method and device of automatic test
CN109510721A (en) * 2018-11-01 2019-03-22 郑州云海信息技术有限公司 A kind of network log management method and system based on Syslog
CN109684181A (en) * 2018-11-20 2019-04-26 华为技术有限公司 Alarm root is because of analysis method, device, equipment and storage medium
CN109685217A (en) * 2017-10-17 2019-04-26 博彦科技股份有限公司 Data processing method, device, storage medium and processor
CN110187992A (en) * 2019-04-11 2019-08-30 阿里巴巴集团控股有限公司 Failure analysis methods and device
CN110377576A (en) * 2019-07-24 2019-10-25 中国工商银行股份有限公司 Create method and apparatus, the log analysis method of log template
CN110427306A (en) * 2019-08-12 2019-11-08 吉林吉大通信设计院股份有限公司 A kind of big data log Intelligent routing and storage system and method
CN110659175A (en) * 2018-06-30 2020-01-07 中兴通讯股份有限公司 Log trunk extraction method, log trunk classification method, log trunk extraction equipment and log trunk storage medium
CN110688448A (en) * 2019-09-18 2020-01-14 上海擎创信息技术有限公司 Real-time log clustering analysis method based on reverse table
CN110855503A (en) * 2019-11-22 2020-02-28 叶晓斌 Fault cause determining method and system based on network protocol hierarchy dependency relationship
CN110928718A (en) * 2019-11-18 2020-03-27 上海维谛信息科技有限公司 Exception handling method, system, terminal and medium based on correlation analysis
CN111125164A (en) * 2018-10-30 2020-05-08 千寻位置网络有限公司 Reference station troubleshooting method and system and fault elimination terminal
CN111324583A (en) * 2018-12-17 2020-06-23 中国移动通信集团广西有限公司 Method and device for classifying service logs
CN111444156A (en) * 2020-04-20 2020-07-24 南阳理工学院 Fault diagnosis method based on cloud computing
CN111541559A (en) * 2020-03-23 2020-08-14 广东工业大学 Fault positioning method based on causal rule
CN111830931A (en) * 2020-07-15 2020-10-27 中国科学院微电子研究所 Fault diagnosis method of DCS (distributed control system)
CN112000502A (en) * 2020-08-11 2020-11-27 杭州安恒信息技术股份有限公司 Processing method and device for mass error logs, electronic device and storage medium
CN112306787A (en) * 2019-07-24 2021-02-02 阿里巴巴集团控股有限公司 Error log processing method and device, electronic equipment and intelligent sound box
CN112445775A (en) * 2019-08-15 2021-03-05 上海微电子装备(集团)股份有限公司 Fault analysis method, device, equipment and storage medium of photoetching machine
CN112800016A (en) * 2020-12-31 2021-05-14 武汉思普崚技术有限公司 Log data classification and sorting method and device
CN113254255A (en) * 2021-07-15 2021-08-13 苏州浪潮智能科技有限公司 Cloud platform log analysis method, system, device and medium
CN113448811A (en) * 2021-05-31 2021-09-28 山东英信计算机技术有限公司 Method, device, equipment and readable medium for lighting fault lamp of server system
CN113656323A (en) * 2021-08-27 2021-11-16 国家计算机网络与信息安全管理中心 Method for automatically testing, positioning and repairing fault and storage medium
CN113656354A (en) * 2021-08-06 2021-11-16 杭州安恒信息技术股份有限公司 Log classification method, system, computer device and readable storage medium
US11243835B1 (en) 2020-12-03 2022-02-08 International Business Machines Corporation Message-based problem diagnosis and root cause analysis
US11366712B1 (en) 2020-12-02 2022-06-21 International Business Machines Corporation Adaptive log analysis
US11403326B2 (en) 2020-12-03 2022-08-02 International Business Machines Corporation Message-based event grouping for a computing operation
CN114844778A (en) * 2022-04-25 2022-08-02 中国联合网络通信集团有限公司 Core network anomaly detection method and device, electronic equipment and readable storage medium
US11474892B2 (en) 2020-12-03 2022-10-18 International Business Machines Corporation Graph-based log sequence anomaly detection and problem diagnosis
US11513930B2 (en) 2020-12-03 2022-11-29 International Business Machines Corporation Log-based status modeling and problem diagnosis for distributed applications
US11599404B2 (en) 2020-12-03 2023-03-07 International Business Machines Corporation Correlation-based multi-source problem diagnosis
CN115277230B (en) * 2022-07-30 2023-07-07 重庆长安汽车股份有限公司 Method, device, equipment and storage medium for monitoring server login abnormality
US11797538B2 (en) 2020-12-03 2023-10-24 International Business Machines Corporation Message correlation extraction for mainframe operation

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5469463A (en) * 1988-03-30 1995-11-21 Digital Equipment Corporation Expert system for identifying likely failure points in a digital data processing system
CN101325520A (en) * 2008-06-17 2008-12-17 南京邮电大学 Method for locating and analyzing fault of intelligent self-adapting network based on log
CN101714928A (en) * 2008-10-07 2010-05-26 中兴通讯股份有限公司 Method and system for realizing fault detection and location of communication products

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
周浩: "基于机器学习的E级系统故障预测关键技术研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *
彭剑等: "基于聚类矩阵的入侵日志关联规则算法", 《计算机工程》 *
郝春风等: "一种用于大规模文本分类的特征表示方法", 《计算机工程与应用》 *

Cited By (85)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105830060A (en) * 2014-02-06 2016-08-03 富士施乐株式会社 Information processing device, information processing program, storage medium, and information processing method
CN105335277A (en) * 2014-06-27 2016-02-17 可牛网络技术(北京)有限公司 Fault information processing method and device as well as terminal
CN104461844A (en) * 2014-10-31 2015-03-25 大唐移动通信设备有限公司 Log service method based on rule
CN104462606A (en) * 2014-12-31 2015-03-25 中国科学院深圳先进技术研究院 Method for determining diagnosis treatment measures based on log data
CN104462606B (en) * 2014-12-31 2018-06-22 中国科学院深圳先进技术研究院 A kind of method that diagnostic process measure is determined based on daily record data
CN104657622A (en) * 2015-03-12 2015-05-27 浪潮集团有限公司 Cluster fault analysis method based on event-driven analysis
CN105049247A (en) * 2015-07-06 2015-11-11 中国科学院信息工程研究所 Network safety log template extraction method and device
CN105049247B (en) * 2015-07-06 2019-04-26 中国科学院信息工程研究所 A kind of network security log template abstracting method and device
CN105893225A (en) * 2015-08-25 2016-08-24 乐视网信息技术(北京)股份有限公司 Automatic error processing method and device
CN105468677A (en) * 2015-11-13 2016-04-06 国家计算机网络与信息安全管理中心 Log clustering method based on graph structure
CN105468677B (en) * 2015-11-13 2019-11-19 国家计算机网络与信息安全管理中心 A kind of Log Clustering method based on graph structure
CN105577440A (en) * 2015-12-24 2016-05-11 华为技术有限公司 Network fault time location method and analyzing device
CN105577440B (en) * 2015-12-24 2019-06-11 华为技术有限公司 A kind of network downtime localization method and analytical equipment
CN105589800A (en) * 2015-12-25 2016-05-18 中国银联股份有限公司 Application system for predicting faults of complex system
CN105471659B (en) * 2015-12-25 2019-03-01 华为技术有限公司 A kind of failure root cause analysis method and analytical equipment
CN105471659A (en) * 2015-12-25 2016-04-06 华为技术有限公司 Root fault cause analysis method and analysis device
CN107402863B (en) * 2016-03-28 2021-03-09 阿里巴巴集团控股有限公司 Method and equipment for processing logs of service system through log system
CN107402863A (en) * 2016-03-28 2017-11-28 阿里巴巴集团控股有限公司 A kind of method and apparatus for being used for the daily record by log system processing business system
CN106155827A (en) * 2016-06-28 2016-11-23 浪潮(北京)电子信息产业有限公司 A kind of cpu fault its diagnosis processing method based on Linux system and system
CN108268473A (en) * 2016-12-30 2018-07-10 北京国双科技有限公司 A kind of log processing method and device
CN108377255A (en) * 2017-01-31 2018-08-07 欧姆龙株式会社 Information processing unit, information processing method and recording medium
CN108377255B (en) * 2017-01-31 2021-06-11 欧姆龙株式会社 Information processing apparatus, information processing method, and recording medium
CN106953759A (en) * 2017-03-22 2017-07-14 联想(北京)有限公司 Cluster control method and cluster control facility
CN106953759B (en) * 2017-03-22 2020-05-26 联想(北京)有限公司 Cluster control method and cluster control equipment
CN107241212A (en) * 2017-04-20 2017-10-10 努比亚技术有限公司 A kind of log processing method and device, equipment
CN107301118A (en) * 2017-06-15 2017-10-27 中国科学院计算技术研究所 A kind of fault indices automatic marking method and system based on daily record
CN107301118B (en) * 2017-06-15 2019-11-19 中国科学院计算技术研究所 A kind of fault indices automatic marking method and system based on log
CN109213773A (en) * 2017-07-06 2019-01-15 阿里巴巴集团控股有限公司 A kind of diagnostic method, device and the electronic equipment of online failure
CN107301120A (en) * 2017-07-12 2017-10-27 北京京东尚科信息技术有限公司 Method and device for handling unstructured daily record
CN107301120B (en) * 2017-07-12 2021-04-30 北京京东尚科信息技术有限公司 Method and device for processing unstructured log
CN107181630A (en) * 2017-07-24 2017-09-19 郑州云海信息技术有限公司 The treating method and apparatus of service fault in cloud system
CN107391727A (en) * 2017-08-01 2017-11-24 北京航空航天大学 The method for digging and device of equipment fault sequence pattern
CN107577547A (en) * 2017-08-08 2018-01-12 国家超级计算深圳中心(深圳云计算中心) A kind of urgent operation of High-Performance Computing Cluster continues calculation method and system
CN109685217A (en) * 2017-10-17 2019-04-26 博彦科技股份有限公司 Data processing method, device, storage medium and processor
CN107861856A (en) * 2017-11-08 2018-03-30 郑州云海信息技术有限公司 The processing method and computer-readable storage medium of warning information in cloud data system
CN108153804B (en) * 2017-11-17 2021-03-16 极道科技(北京)有限公司 Metadata log updating method for symmetric distributed file system
CN108153804A (en) * 2017-11-17 2018-06-12 极道科技(北京)有限公司 A kind of metadata daily record update method of symmetric distributed file system
CN108038049A (en) * 2017-12-13 2018-05-15 西安电子科技大学 Real-time logs control system and control method, cloud computing system and server
CN108600007B (en) * 2018-04-24 2021-07-23 山东乾云启创信息科技股份有限公司 Cloud platform responsibility tracing method and system
CN108600007A (en) * 2018-04-24 2018-09-28 山东乾云启创信息科技股份有限公司 A kind of cloud platform Liability Retroact method and system
CN108845560B (en) * 2018-05-30 2021-07-13 国网浙江省电力有限公司宁波供电公司 Power dispatching log fault classification method
CN108845560A (en) * 2018-05-30 2018-11-20 国网浙江省电力有限公司宁波供电公司 A kind of power scheduling log Fault Classification
CN110659175A (en) * 2018-06-30 2020-01-07 中兴通讯股份有限公司 Log trunk extraction method, log trunk classification method, log trunk extraction equipment and log trunk storage medium
CN109344060A (en) * 2018-09-20 2019-02-15 迈普通信技术股份有限公司 A kind of analysis method and device of automatic test
CN109343990A (en) * 2018-09-25 2019-02-15 江苏润和软件股份有限公司 A kind of cloud computing system method for detecting abnormality based on deep learning
CN111125164A (en) * 2018-10-30 2020-05-08 千寻位置网络有限公司 Reference station troubleshooting method and system and fault elimination terminal
CN109510721A (en) * 2018-11-01 2019-03-22 郑州云海信息技术有限公司 A kind of network log management method and system based on Syslog
CN109684181A (en) * 2018-11-20 2019-04-26 华为技术有限公司 Alarm root is because of analysis method, device, equipment and storage medium
CN111324583A (en) * 2018-12-17 2020-06-23 中国移动通信集团广西有限公司 Method and device for classifying service logs
CN111324583B (en) * 2018-12-17 2023-10-27 中国移动通信集团广西有限公司 Service log classification method and device
CN110187992A (en) * 2019-04-11 2019-08-30 阿里巴巴集团控股有限公司 Failure analysis methods and device
CN110187992B (en) * 2019-04-11 2023-01-24 创新先进技术有限公司 Fault analysis method and device
CN110377576A (en) * 2019-07-24 2019-10-25 中国工商银行股份有限公司 Create method and apparatus, the log analysis method of log template
CN112306787B (en) * 2019-07-24 2022-08-09 阿里巴巴集团控股有限公司 Error log processing method and device, electronic equipment and intelligent sound box
CN112306787A (en) * 2019-07-24 2021-02-02 阿里巴巴集团控股有限公司 Error log processing method and device, electronic equipment and intelligent sound box
CN110377576B (en) * 2019-07-24 2021-10-29 中国工商银行股份有限公司 Method and device for creating log template and log analysis method
CN110427306A (en) * 2019-08-12 2019-11-08 吉林吉大通信设计院股份有限公司 A kind of big data log Intelligent routing and storage system and method
CN112445775A (en) * 2019-08-15 2021-03-05 上海微电子装备(集团)股份有限公司 Fault analysis method, device, equipment and storage medium of photoetching machine
CN112445775B (en) * 2019-08-15 2024-04-19 上海微电子装备(集团)股份有限公司 Fault analysis method, device, equipment and storage medium of photoetching machine
CN110688448B (en) * 2019-09-18 2023-03-31 上海擎创信息技术有限公司 Real-time log clustering analysis method based on reverse table
CN110688448A (en) * 2019-09-18 2020-01-14 上海擎创信息技术有限公司 Real-time log clustering analysis method based on reverse table
CN110928718B (en) * 2019-11-18 2024-01-30 上海维谛信息科技有限公司 Abnormality processing method, system, terminal and medium based on association analysis
CN110928718A (en) * 2019-11-18 2020-03-27 上海维谛信息科技有限公司 Exception handling method, system, terminal and medium based on correlation analysis
CN110855503A (en) * 2019-11-22 2020-02-28 叶晓斌 Fault cause determining method and system based on network protocol hierarchy dependency relationship
CN111541559A (en) * 2020-03-23 2020-08-14 广东工业大学 Fault positioning method based on causal rule
CN111444156A (en) * 2020-04-20 2020-07-24 南阳理工学院 Fault diagnosis method based on cloud computing
CN111444156B (en) * 2020-04-20 2023-01-24 南阳理工学院 Fault diagnosis method based on cloud computing
CN111830931B (en) * 2020-07-15 2021-08-20 中国科学院微电子研究所 Fault diagnosis method of DCS (distributed control system)
CN111830931A (en) * 2020-07-15 2020-10-27 中国科学院微电子研究所 Fault diagnosis method of DCS (distributed control system)
CN112000502A (en) * 2020-08-11 2020-11-27 杭州安恒信息技术股份有限公司 Processing method and device for mass error logs, electronic device and storage medium
US11366712B1 (en) 2020-12-02 2022-06-21 International Business Machines Corporation Adaptive log analysis
US11797538B2 (en) 2020-12-03 2023-10-24 International Business Machines Corporation Message correlation extraction for mainframe operation
US11403326B2 (en) 2020-12-03 2022-08-02 International Business Machines Corporation Message-based event grouping for a computing operation
US11243835B1 (en) 2020-12-03 2022-02-08 International Business Machines Corporation Message-based problem diagnosis and root cause analysis
US11474892B2 (en) 2020-12-03 2022-10-18 International Business Machines Corporation Graph-based log sequence anomaly detection and problem diagnosis
US11513930B2 (en) 2020-12-03 2022-11-29 International Business Machines Corporation Log-based status modeling and problem diagnosis for distributed applications
US11599404B2 (en) 2020-12-03 2023-03-07 International Business Machines Corporation Correlation-based multi-source problem diagnosis
CN112800016A (en) * 2020-12-31 2021-05-14 武汉思普崚技术有限公司 Log data classification and sorting method and device
CN113448811A (en) * 2021-05-31 2021-09-28 山东英信计算机技术有限公司 Method, device, equipment and readable medium for lighting fault lamp of server system
CN113254255A (en) * 2021-07-15 2021-08-13 苏州浪潮智能科技有限公司 Cloud platform log analysis method, system, device and medium
CN113656354A (en) * 2021-08-06 2021-11-16 杭州安恒信息技术股份有限公司 Log classification method, system, computer device and readable storage medium
CN113656323A (en) * 2021-08-27 2021-11-16 国家计算机网络与信息安全管理中心 Method for automatically testing, positioning and repairing fault and storage medium
CN114844778B (en) * 2022-04-25 2023-05-30 中国联合网络通信集团有限公司 Abnormality detection method and device for core network, electronic equipment and readable storage medium
CN114844778A (en) * 2022-04-25 2022-08-02 中国联合网络通信集团有限公司 Core network anomaly detection method and device, electronic equipment and readable storage medium
CN115277230B (en) * 2022-07-30 2023-07-07 重庆长安汽车股份有限公司 Method, device, equipment and storage medium for monitoring server login abnormality

Similar Documents

Publication Publication Date Title
CN103761173A (en) Log based computer system fault diagnosis method and device
JP7100155B2 (en) Alarm log compression methods, devices and systems, and storage media
CN110928718B (en) Abnormality processing method, system, terminal and medium based on association analysis
US10795753B2 (en) Log-based computer failure diagnosis
CN108415789B (en) Node fault prediction system and method for large-scale hybrid heterogeneous storage system
US10417072B2 (en) Scalable predictive early warning system for data backup event log
CN108427720B (en) System log classification method
US11651375B2 (en) Below-the-line thresholds tuning with machine learning
CN107147639A (en) A kind of actual time safety method for early warning based on Complex event processing
US8112667B2 (en) Automated system problem diagnosing
CN111209131A (en) Method and system for determining fault of heterogeneous system based on machine learning
US11681282B2 (en) Systems and methods for determining relationships between defects
CN104598367A (en) System and method for automatically managing fault events of data center
CN111027615B (en) Middleware fault early warning method and system based on machine learning
AU2019275633B2 (en) System and method of automated fault correction in a network environment
WO2019051042A1 (en) Apparatus and method for real time analysis, predicting and reporting of anomalous database transaction log activity
US20190171644A1 (en) Efficient event searching
CN110633371A (en) Log classification method and system
CN111581056B (en) Software engineering database maintenance and early warning system based on artificial intelligence
CN112951311A (en) Hard disk fault prediction method and system based on variable weight random forest
US11025478B2 (en) Method and apparatus for analysing performance of a network by managing network data relating to operation of the network
Zou et al. Improving log-based fault diagnosis by log classification
CN112131090B (en) Service system performance monitoring method, device, equipment and medium
CN112699005A (en) Server hardware fault monitoring method, electronic equipment and storage medium
CN114721861B (en) Log differentiation comparison-based fault positioning method and system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20140430