CN103761173A - Log based computer system fault diagnosis method and device - Google Patents



Publication number
CN103761173A
Authority
CN
China
Prior art keywords
daily record
fault
log
word
analysis
Prior art date
Application number
CN201310740549.XA
Other languages
Chinese (zh)
Inventor
邹德清
金海
秦昊
羌卫中
Original Assignee
华中科技大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华中科技大学
Priority to CN201310740549.XA
Publication of CN103761173A publication Critical patent/CN103761173A/en


Abstract

The invention discloses a log management and fault diagnosis method and device for multi-host environments. The method comprises fault log collection, fault log analysis, and fault log correlation analysis. Fault log collection gathers the fault logs of all hardware and software in the cluster and stores them uniformly on a log server. Fault log analysis filters the fault logs, extracts log template information, and classifies the logs by log type. Fault log correlation analysis uses the log analysis results together with time windows to analyze fault causes, clusters the related faults caused by the same fault into one category, and attempts to find the root cause of that category of faults. The method and device can analyze system operating conditions effectively, assist administrators in handling system faults, and improve the accuracy of fault type determination.

Description

A log-based computer system fault diagnosis method and device

Technical field

The invention belongs to the field of fault analysis within computer system security and, more specifically, relates to a log-based computer system fault diagnosis method and device.

Background

With the introduction of cloud computing architectures, computer systems have become increasingly complex. The software running on them is more varied, interacts more frequently, and is more tightly coupled, which makes the causes and types of system faults harder to analyze. To meet the strict processing-capacity and fault-tolerance requirements of critical applications, the routine behavior of the system must be monitored and recorded. When an error occurs, the fault can then be diagnosed and located promptly and its cause identified, so that the error is repaired, system performance is improved, and similar faults do not recur.

The logging system is an essential component of a complete information system. Logs are a faithful record of a computer system's execution trace and are widely used for system debugging, monitoring, and security detection. Log management and analysis underpin system management and intrusion detection, and are a necessary means of evaluating system operating conditions and checking the effectiveness of network security policies. System logs can be used not only to detect and analyze system faults but also to monitor system status, optimize system performance, and adjust system behavior.

Log analysis usually has three main stages: (1) log filtering, which discards logs unrelated to faults; (2) log fault analysis, which determines the fault type from the logs and reconstructs in detail how the error occurred; (3) log correlation analysis, which assigns logs from the same fault source to the same group and analyzes how the fault propagates across nodes and components.

Existing log-based diagnosis methods have several problems:

(1) They require extensive manual intervention. Because log classification and fault diagnosis are centered on human judgment, purely automatic processing by computer rarely yields the desired result. Traditional fault diagnosis systems therefore tend to rely on expert systems: a large amount of manual processing is needed to collect the basic data, and the computer produces satisfactory results only after long periods of learning and correction, which wastes a great deal of system administrators' time;

(2) Response is slow. Current analysis systems are usually based on a complex set of decision patterns and struggle to respond to system faults in real time;

(3) Accuracy is low. Log classification accuracy is often low, and there is no algorithm for correcting misclassifications.

Summary of the invention

In view of the defects of the prior art, the object of the invention is to classify the logs produced in a computer system quickly and to achieve high-accuracy classification with little manual intervention. In addition, the classification results can be corrected at any time, which makes it easy to learn new classification knowledge.

To achieve the above object, the invention proposes a log-based computer system fault diagnosis method comprising the following steps:

(S1) Fault log analysis: analyze the logs in the computer system in real time, use a fault keyword matrix to quantify the fault classification results learned from manual classification, and assign a fault type to each fault log according to the different fault types;

(S2) Fault log correlation: use the results of fault log analysis together with time windows to analyze fault causes, aggregate the related fault logs caused by the same fault into one class, and find the root cause of that class of faults.

The invention also proposes a log-based computer system fault diagnosis device comprising the following modules:

A fault log analysis module, which analyzes the logs in the computer system in real time, uses a fault keyword matrix to quantify the fault classification results learned from manual classification, and assigns a fault type to each fault log according to the different fault types;

A fault log correlation module, which uses the results of fault log analysis together with time windows to analyze fault causes, aggregates the related fault logs caused by the same fault into one class, and finds the root cause of that class of faults.

Compared with the prior art, the invention has the following beneficial effects:

(1) A fault keyword matrix stores the results of machine learning, so logs can be classified by fault quickly and easily, and the decision rules can be modified and new fault types added at any time. Compared with traditional approaches, processing is faster and the time spent relearning is saved;

(2) A new log classification method is proposed that classifies logs accurately by fault type;

(3) An improved time-based log correlation analysis is adopted in which the fault type of each log guides the correlation analysis; this improves classification accuracy and reduces the false positive and false negative rates;

(4) The fault keyword matrix also stores the time window of each fault type, which can be modified in real time, improving the ability to learn new fault types.

Brief description of the drawings

Fig. 1 is a structural diagram of the fault-log-based system fault diagnosis device for a multi-host environment according to an embodiment of the invention;

Fig. 2 is a flowchart of the log fault classification method according to an embodiment of the invention;

Fig. 3 is a flowchart of the time-based fault correlation analysis method according to an embodiment of the invention.

Detailed description of the embodiments

To make the objects, technical solutions, and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.

As shown in Fig. 1, the invention provides a system fault diagnosis device suitable for large-scale cloud platforms. The device collects the logs produced throughout the computer system, gathers statistics on them and learns from them, and labels each log with its fault type. When a fault occurs, the administrator can click the corresponding log; the system then finds the logs associated with it and arranges them into a fault tree in order of fault propagation, helping the administrator find the root cause of the fault. The device mainly comprises three components:

1) A fault log analysis module, which analyzes the logs in the computer system in real time, uses a fault keyword matrix to quantify the fault classification results learned from manual classification, and assigns a fault type to each fault log according to the different fault types;

2) A fault log correlation module, which uses the results of log analysis together with time windows to analyze fault causes, aggregates the related fault logs caused by the same fault into one class, and attempts to find the root cause of that class of faults.

Correspondingly, the log-based computer system fault diagnosis method of the embodiment comprises the following steps:

(S1) Fault log analysis: analyze the logs in the computer system in real time, use a fault keyword matrix to quantify the fault classification results learned from manual classification, and assign a fault type to each fault log according to the different fault types;

(S2) Fault log correlation: use the results of fault log analysis together with time windows to analyze fault causes, aggregate the related fault logs caused by the same fault into one class, and find the root cause of that class of faults.

As shown in Fig. 2, the fault log analysis of the embodiment comprises the following steps:

(1) Log preprocessing

Log preprocessing has two parts: filtering out repeated logs, and filtering out meaningless words within each log. Specifically, it comprises the following steps:

(1-1) Filter out repeated logs produced by the same process within a given period.

Statistics on the logs show that such redundant logs are produced close together in time and by the same process. During preprocessing, therefore, similar logs produced by the same process within a given period are filtered out. The length of the period is a threshold chosen for each computer system. When a log from a device is found to be identical to the previous log and was produced within the specified threshold, it is removed. If a different log appears within the time threshold, or an identical log falls outside the time range, the log is kept.

In practice, a suitable threshold must be chosen for the period within which logs may be merged. If the time span is too large, similar log messages produced by different faults may be filtered out, causing missed reports. In this embodiment, because this filtering is only a preprocessing step, only identical logs that occur consecutively are filtered out so that nothing is missed; the time threshold of the filter can then be set more loosely.
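A minimal sketch of step (1-1) follows. The `LogEntry` fields, the function name `filter_repeated`, and the choice to track "the previous log" separately for each process are illustrative assumptions; the text only states that a log identical to the previous one and produced within the threshold is removed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    timestamp: float  # seconds since an arbitrary epoch (assumed unit)
    process: str      # name or id of the emitting process
    message: str      # raw log text


def filter_repeated(entries, threshold_seconds):
    """Drop a log when it repeats the previous log of the same process
    within threshold_seconds; keep everything else."""
    last_seen = {}  # process -> most recent LogEntry from that process
    kept = []
    for entry in sorted(entries, key=lambda e: e.timestamp):
        previous = last_seen.get(entry.process)
        is_repeat = (
            previous is not None
            and previous.message == entry.message
            and entry.timestamp - previous.timestamp <= threshold_seconds
        )
        if not is_repeat:
            kept.append(entry)
        # Always remember the latest log, so a run of repeats is fully dropped.
        last_seen[entry.process] = entry
    return kept
```

Because only consecutive identical logs are dropped, a different message in between resets the comparison, which matches the cautious behavior described for this embodiment.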

(1-2) Filter out meaningless words using an English stop word list, retaining only words that carry real meaning.

Meaningless words such as English function words can be filtered out with an English stop word list, which contains most English function words. Given the particular nature of logs, most adjectives and adverbs can additionally be filtered out, retaining only words that carry real meaning.
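Step (1-2) can be sketched as below. The short `STOP_WORDS` set is a hypothetical stand-in for a full English stop word list, and whitespace tokenization is an assumption; the text does not name a specific list or tokenizer.

```python
# Hypothetical, deliberately tiny stop word list for illustration only.
STOP_WORDS = frozenset({
    "a", "an", "the", "is", "are", "was", "to", "of", "in", "on",
    "at", "for", "and", "or", "by", "with", "has", "been",
})


def remove_stop_words(message, stop_words=STOP_WORDS):
    """Split a log message on whitespace and drop stop words (case-insensitive)."""
    return [word for word in message.split() if word.lower() not in stop_words]
```

Filtering adjectives and adverbs as well would need a part-of-speech tagger or a larger hand-built list, which is left out of this sketch.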

(2) Extract log invariants, that is, extract the structural information in a log by filtering out the variables in the log text.

After preprocessing, the template information of the logs must be extracted. Template information is the fault-related content further extracted from a log. The templated logs are used for rule learning; templating reduces the solution space and simplifies updates to the resulting fault keyword matrix. In addition, templates can be used to match logs produced while the system is running, for fast classification.

A variable is a word in the log text that can change, including numbers, IP addresses, memory addresses, directories, file names, program names, ports, and so on.

Specifically, log invariants are extracted in two steps.

(2-1) Use regular expressions to remove the numbers, directories, IP addresses, and memory addresses from the log text. These are the parts that are easy to remove. Each removed element is replaced by a symbol with a distinct meaning. Because words follow a grammatical order, and in order to distinguish different positions, symbol replacement better reflects the original information in the log and reduces distortion during extraction.
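A minimal sketch of step (2-1), assuming IPv4 addresses, `0x`-prefixed memory addresses, and Unix-style absolute paths. The placeholder symbols `<IP>`, `<ADDR>`, `<PATH>`, and `<NUM>` are illustrative choices; the text only says that each removed element is replaced by a symbol with a distinct meaning.

```python
import re

# Order matters: IP addresses and hex addresses are masked before bare numbers.
_RULES = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<ADDR>"),
    (re.compile(r"(?<!\S)/[\w./-]+"), "<PATH>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def mask_variables(message):
    """Replace variable parts of a log message with typed placeholder symbols."""
    for pattern, symbol in _RULES:
        message = pattern.sub(symbol, message)
    return message
```

For example, `mask_variables("Connection from 10.0.0.5 port 22 failed")` yields `"Connection from <IP> port <NUM> failed"`, which keeps the word order of the original log.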

(2-2) Use a method based on word frequency statistics to screen the log information further, leaving the template information of the log.

In this embodiment, the word-frequency method can use an improved TF-IDF to compute the importance of each word relative to the whole log, as follows:

Term frequency (TF) here refers to how often a given word occurs in all the logs output by a device. The number is a normalization of the term count that prevents a bias toward long files. (The same word may have a higher term count in a long file than in a short one, regardless of whether the word is important.) For word $t_i$ in a particular device $d_j$, its importance can be expressed as:

$$tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$

where $n_{i,j}$ is the number of occurrences of the word in device $d_j$, and the denominator is the total number of occurrences of all words in device $d_j$.

Inverse document frequency (IDF) measures the general importance of a word. The IDF of a particular word $t_i$ is obtained by dividing the total number of logs produced by device $d_j$ by the number of logs containing the word, and taking the logarithm of the quotient:

$$idf_i = \log\frac{|D|}{|\{j : t_i \in d_j\}|}$$

where:

$|D|$: the total number of logs produced by the device

$|\{j : t_i \in d_j\}|$: the number of logs containing word $t_i$ (that is, the number of logs with $n_{i,j} \neq 0$). If the word does not occur in any log, the divisor would be zero, so $1 + |\{j : t_i \in d_j\}|$ is generally used instead. Then:

$$tfidf_{i,j} = tf_{i,j} \times idf_i$$

Computing TF-IDF gives the importance of each word relative to the whole log; the logs are then screened against a predefined threshold.
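The per-device TF-IDF above can be sketched as follows. The function name `tf_idf`, the input shape (a list of tokenized logs from one device), and the use of the natural logarithm are assumptions; the sketch applies the $1 + |\{j : t_i \in d_j\}|$ divisor mentioned in the text.

```python
import math
from collections import Counter


def tf_idf(device_logs):
    """device_logs: list of tokenized logs (lists of words) from one device.
    Returns {word: tf * idf} for every word seen in those logs."""
    term_counts = Counter(word for log in device_logs for word in log)
    total_terms = sum(term_counts.values())
    # Number of logs that contain each word at least once.
    logs_containing = Counter(word for log in device_logs for word in set(log))
    n_logs = len(device_logs)
    scores = {}
    for word, count in term_counts.items():
        tf = count / total_terms
        idf = math.log(n_logs / (1 + logs_containing[word]))
        scores[word] = tf * idf
    return scores
```

With the smoothed divisor, a word that appears in nearly every log scores at or below zero. The text does not say which side of the threshold is kept, so the screening step itself is left out of this sketch.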

(3) Log template information filtering

Before manual classification, the number of templates is still large, so this embodiment groups the logs by automatic clustering to further shrink the space that requires human judgment.

In the automatic clustering stage, logs with similar content and format are placed in the same category, so that during manual classification each category only needs to be reviewed and finely adjusted.

Automatic classification here uses the DBSCAN algorithm to cluster the log templates. The main benefit of this algorithm is that the number of clusters does not have to be set in advance; clusters are formed automatically as needed during classification.

For the distance formula, given the particular nature of log text, edit distance is used to measure the distance between two logs. In the calculation, the edit distance is modified by adding a coefficient that reduces the effect of length on the distance. The log edit distance (LLD) between log A and log B is defined as:

$$LLD(A, B) = \frac{2 \times LD(A, B)}{length(A) + length(B)}$$

where $LD(A, B)$ is the original edit distance and $length(\cdot)$ is the length of a log.
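The clustering step can be sketched as below, under several assumptions: templates are token lists (the text does not say whether length is counted in words or characters), `eps` and `min_pts` are hypothetical tuning parameters, and the `dbscan` function is a minimal textbook version written for illustration rather than the embodiment's own implementation.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance between two sequences."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (item_a != item_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def lld(a, b):
    """Length-normalized log edit distance: 2 * LD(A, B) / (len(A) + len(B))."""
    if not a and not b:
        return 0.0
    return 2 * edit_distance(a, b) / (len(a) + len(b))


def dbscan(items, eps, min_pts, distance=lld):
    """Minimal DBSCAN; returns one cluster label per item, -1 meaning noise."""
    labels = [None] * len(items)

    def neighbours(p):
        return [q for q in range(len(items)) if distance(items[p], items[q]) <= eps]

    cluster = -1
    for p in range(len(items)):
        if labels[p] is not None:
            continue
        seeds = neighbours(p)
        if len(seeds) < min_pts:
            labels[p] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[p] = cluster
        queue = [q for q in seeds if q != p]
        while queue:
            q = queue.pop()
            if labels[q] == -1:
                labels[q] = cluster  # former noise becomes a border point
            if labels[q] is not None:
                continue
            labels[q] = cluster
            q_neighbours = neighbours(q)
            if len(q_neighbours) >= min_pts:
                queue.extend(q_neighbours)  # q is a core point, keep expanding
    return labels
```

Because LLD is normalized by the combined length, one `eps` value can be applied to short and long templates alike, which is the purpose of the added coefficient.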

(4) Obtain the fault keyword matrix

(4-1) Manually classify the logs in the sample. In this step, the administrator first replaces the system's automatic labels with labels that have practical meaning, then adjusts the log templates in each category to correct the results of automatic clustering, ensuring that the logs of each type are relevant to that type's label.

(4-2) Learn from the manual classification results and build the fault keyword matrix.

The fault keyword matrix (matrix A) is a two-dimensional matrix formed by the probabilities that each word in the templates occurs in each fault type, as shown below:

$$A = \begin{bmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1} & a_{m,2} & \cdots & a_{m,n} \end{bmatrix}$$

The fault keyword matrix A is an $m \times n$ matrix, where m is the number of distinct words appearing in the sample log templates, n is the number of log fault types, and $a_{i,j}$ is the probability that the i-th word belongs to the j-th fault class. (Note: the probability here is a relative probability coefficient, not a true probability.)

The computation of $a_{i,w}$ is detailed below:

$a_{i,w}$ is used to judge whether a word belongs to a given type; its value reflects how often the word occurs in a specific fault type. The probability that the i-th word occurs in fault type w can be taken as the ratio of the number of occurrences of word i in type w to the total number of words in type w, which gives the basic formula:

$$P(i, w) = \frac{count(i, w)}{\sum_{j=1}^{m} count(j, w)}$$

where $P(i, w)$ is the probability that the i-th word occurs in fault type w, and $count(i, w)$ is the number of occurrences of word i in fault type w.

The formula is then corrected by a scale factor:

$$K(i, w) = -\log\left(\frac{sum(i) - count(i, w)}{sum(i)}\right) + 1$$

where $sum(i)$ is the total number of occurrences of word i across all fault types in the template library, that is:

$$sum(i) = \sum_{t=1}^{n} count(i, t)$$

The probability coefficient $a_{i,w}$ is thus the product of the word's significance and its frequency of occurrence:

$$a_{i,w} = P(i, w) \times K(i, w)$$

Substituting the two formulas above gives:

$$a_{i,w} = \frac{count(i, w)}{\sum_{j=1}^{m} count(j, w)} \left[ -\log\left(1 - \frac{count(i, w)}{sum(i)}\right) + 1 \right]$$
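The coefficient formula can be sketched as follows. The nested-dictionary layout of `counts`, the natural logarithm, and the `epsilon` floor are assumptions. The floor is needed because the formula is undefined when a word occurs only in type w (the logarithm's argument becomes zero), a case the text does not address.

```python
import math


def coefficient(counts, word, fault_type, epsilon=1e-9):
    """Compute a_{i,w} = P(i,w) * K(i,w) from raw occurrence counts.
    counts: {word: {fault_type: occurrences}} (assumed layout)."""
    count_iw = counts.get(word, {}).get(fault_type, 0)
    if count_iw == 0:
        return 0.0
    # T(w): total occurrences of all words in this fault type.
    total_in_type = sum(per_type.get(fault_type, 0) for per_type in counts.values())
    # sum(i): total occurrences of this word across all fault types.
    total_for_word = sum(counts[word].values())
    p = count_iw / total_in_type
    remainder = (total_for_word - count_iw) / total_for_word
    # Assumption: clamp the log argument so a word exclusive to one type stays finite.
    k = -math.log(max(remainder, epsilon)) + 1
    return p * k
```

With counts `{"disk": {"storage": 3, "network": 1}, "timeout": {"network": 4, "storage": 1}}`, the word "disk" receives a much larger coefficient for "storage" than for "network", since it is both frequent in that type and concentrated there.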

On this basis, the process of learning fault types with the fault keyword matrix is as follows:

After the manual classification results are obtained, matrix A is computed column by column according to the formula. Take solving column w as an example.

a) The program counts the total number of words in the template library that belong to type w, denoted $T(w)$:

$$T(w) = \sum_{j=1}^{m} count(j, w)$$

In this embodiment, to make the fault keyword matrix easier to modify and update, an extra row is added to store $T(w)$, which reduces repeated computation.

b) For each word i in type w, $sum(i)$ is computed. As with $T(w)$, extra space can be added to store $sum(i)$ and reduce computation. The fault keyword matrix is therefore extended with one row and one column that hold these statistics, and the final matrix A is:

$$A = \begin{bmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n} & sum(1) \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} & sum(2) \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ a_{m,1} & a_{m,2} & \cdots & a_{m,n} & sum(m) \\ T(1) & T(2) & \cdots & T(n) & \end{bmatrix}$$

c) Computing a specific $a_{i,w}$ requires $count(i, w)$. Because $count(i, w)$ changes as the fault keyword matrix is updated, matrix A actually stores $count(i, w)$ directly, and $a_{i,w}$ is computed when needed. Since all the variables used to compute $a_{i,w}$ are stored in matrix A, this computation adds no extra overhead to the system.

In addition, once the fault keyword matrix has been obtained, if a classification is modified while the system is running, the matrix must be adjusted. There are three cases:

a) When the category of a template needs to be changed, the system scans all the words in the template and modifies the matrix accordingly.

Suppose word i occurs n times in the template, and the template originally belonged to category w and is now to be changed to category u. The matrix must then be updated for word i in both categories (the update formulas were given as a figure in the original and are not reproduced in this text).

b) When a new template needs to be added, the system modifies the fault keyword matrix directly, in much the same way as when the matrix is built from manual learning.

c) When a new template needs to be added to a new category, the system adds a new column to store the new category (call it category u), updates $sum(i)$ for each word i in the template, computes $T(u)$, and then computes the count value of each word in the template for the new category.
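The storage scheme and the three adjustment cases can be sketched as below. The class name `FaultKeywordCounts` and its methods are hypothetical, and the `move_template` arithmetic is derived from the definitions of $count(i, w)$, $T(w)$, and $sum(i)$ rather than taken from the original update formulas, which are not available in this text.

```python
from collections import Counter, defaultdict


class FaultKeywordCounts:
    """Stores count(i, w) plus the cached totals T(w) and sum(i)."""

    def __init__(self):
        self.count = defaultdict(Counter)  # count[word][fault_type]
        self.type_total = Counter()        # T(w): words per fault type
        self.word_total = Counter()        # sum(i): occurrences per word

    def add_template(self, words, fault_type):
        """Cases b) and c): add a template to an existing or brand-new type."""
        for word, n in Counter(words).items():
            self.count[word][fault_type] += n
            self.type_total[fault_type] += n
            self.word_total[word] += n

    def move_template(self, words, old_type, new_type):
        """Case a): reassign a template from old_type to new_type."""
        for word, n in Counter(words).items():
            self.count[word][old_type] -= n
            self.count[word][new_type] += n
            self.type_total[old_type] -= n
            self.type_total[new_type] += n
            # sum(i) is unchanged: the word still occurs n times overall.
```

Keeping raw counts and cached totals means a reclassification touches only the words of the affected template, so coefficients can be recomputed on demand without rescanning the template library.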

(5) Fault log type determination

With the fault keyword matrix, the logs in the system can easily be classified by fault. The process is as follows:

Suppose the fault type of log L is to be determined. For each word in log L, look up the corresponding row in the fault keyword matrix A and compute the probability value that log L belongs to each fault type. If there is a single maximum probability value, the fault type of log L is judged to be the fault type corresponding to that maximum. The original gives pseudocode for this computation, which is not reproduced in this text.

Further, the last three lines of that pseudocode check whether L could belong to other types. When another probability value differs from the maximum by less than a predetermined threshold, the current fault keyword matrix cannot determine the fault type of L accurately. The administrator must therefore be notified and asked to choose among the most likely candidates. The system records the administrator's decision and uses the template of L and the decision to correct the fault keyword matrix as described in step (4), so that similar fault logs can subsequently be classified accurately.
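Since the original pseudocode is not available, the following is only a sketch of one plausible reading. It assumes that the score of a fault type is the sum of the coefficients $a_{i,w}$ of the words in L, which the text does not confirm; the function name `classify`, the dictionary layout of `matrix`, and the parameter `ambiguity_threshold` are likewise hypothetical.

```python
def classify(words, matrix, ambiguity_threshold):
    """matrix: {word: {fault_type: coefficient a_iw}} (assumed layout).
    Returns (best_type, candidates); more than one candidate means the
    decision is ambiguous and should be referred to the administrator."""
    scores = {}
    for word in words:
        for fault_type, a in matrix.get(word, {}).items():
            # Assumption: a log's score for a type is the sum of its word coefficients.
            scores[fault_type] = scores.get(fault_type, 0.0) + a
    if not scores:
        return None, []
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best_type, best_score = ranked[0]
    candidates = [t for t, s in ranked if best_score - s < ambiguity_threshold]
    return best_type, candidates
```

When `candidates` holds several types, the administrator's choice can be fed back into the matrix as a template reassignment, as described in step (4).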

In addition, this embodiment uses fault correlation analysis to gather all the fault logs caused by the same fault into a fault propagation tree, sorted in chronological order of occurrence, to help the administrator determine the root cause of the fault.

As shown in Fig. 3, the improved time-based fault correlation analysis method of the embodiment comprises the following steps:

(1) Aggregate the logs using traditional time-based correlation analysis. This method rests on the idea that faults from the same source all occur within a short span of time. Concretely, a time window is set up; when a fault occurs, the logs are scanned in chronological order, and logs that fall within the same time window are considered to be fault logs produced by the same fault source and are grouped into one class (here called a "tuple").
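A minimal sketch of the time-window aggregation in step (1). It assumes that a window opens at the first log of a tuple and has a fixed length; the text does not say whether the window is anchored this way or slides with each new log. The `(timestamp, message)` pair layout is also an assumption.

```python
def group_into_tuples(logs, window_seconds):
    """logs: iterable of (timestamp, message) pairs.
    Returns a list of tuples (lists of logs), each spanning at most
    window_seconds measured from its first log."""
    tuples = []
    for log in sorted(logs, key=lambda item: item[0]):
        if tuples and log[0] - tuples[-1][0][0] <= window_seconds:
            tuples[-1].append(log)  # still inside the current window
        else:
            tuples.append([log])    # open a new window at this log
    return tuples
```

A single global window is the weakness the later steps address: faults of different types propagate at different speeds, so one window size either splits slow faults or merges unrelated fast ones.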

(2) Manually analyze the logs in each tuple to determine the time window size for each fault type.

(3) Add a row to the fault keyword matrix to store the time window size of each fault type.

(4) Select the log to be diagnosed. After its fault type is confirmed, look up the time window for that fault type in the fault keyword matrix, then use that time window to diagnose the log.
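Steps (3) and (4) can be sketched as a lookup of the per-type window followed by a time filter. The `(timestamp, fault_type, message)` triple layout, the symmetric window around the selected log, and the `default_window` fallback are all assumptions; the text only states that the window is looked up by fault type and then used for diagnosis.

```python
def related_logs(selected, logs, window_by_type, default_window=60.0):
    """selected and each item of logs are (timestamp, fault_type, message) triples.
    window_by_type plays the role of the extra matrix row of window sizes."""
    window = window_by_type.get(selected[1], default_window)
    # Assumption: logs within `window` seconds on either side are related.
    nearby = [log for log in logs if abs(log[0] - selected[0]) <= window]
    return sorted(nearby, key=lambda log: log[0])
```

Sorting the result chronologically gives the order in which the fault is presumed to have propagated, which is the ordering used for the fault propagation tree.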

Those skilled in the art will readily understand that the foregoing is only a preferred embodiment of the invention and is not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (10)

1. A log-based computer system fault diagnosis method, comprising the following steps:
(S1) fault log analysis: analyzing the logs in the computer system in real time, using a fault keyword matrix to quantify the fault classification results learned from manual classification, and assigning a fault type to each fault log according to the different fault types;
(S2) fault log correlation: using the results of fault log analysis together with time windows to analyze fault causes, aggregating the related fault logs caused by the same fault into one class, and finding the root cause of that class of faults.
2. The fault diagnosis method according to claim 1, wherein step (S1) comprises the following steps:
(1) log preprocessing, comprising filtering out repeated logs and filtering out meaningless words in each log;
(2) extracting log invariants, that is, extracting the structural information in a log by filtering out the variables in the log text;
(3) log template information filtering: using automatic clustering to place logs with similar content and format in the same category;
(4) obtaining a fault keyword matrix, wherein the fault keyword matrix is a two-dimensional matrix formed by the probabilities that each word in the templates occurs in each fault type;
(5) using the fault keyword matrix to determine the fault type of a log.
3. The fault diagnosis method according to claim 2, wherein step (1) comprises the following sub-steps:
(1-1) filtering out repeated logs produced by the same process within a given period;
(1-2) filtering out English function words using an English stop word list, retaining only words that carry real meaning.
4. The fault diagnosis method according to claim 2, wherein a variable is a word in the log text that can change, including numbers, IP addresses, memory addresses, directories, file names, program names, and ports.
5. The fault diagnosis method according to claim 4, wherein step (2) comprises the following sub-steps:
(2-1) using regular expressions to remove the numbers, directories, IP addresses, and memory addresses from the log text;
(2-2) using a method based on word frequency statistics to screen the log information further, leaving the template information of the log.
6. The fault diagnosis method according to claim 5, wherein in step (2-2) the method based on word frequency statistics uses an improved TF-IDF to compute the importance of each word $t_i$ relative to the whole log, $tfidf_{i,j} = tf_{i,j} \times idf_i$, wherein
$tf_{i,j}$ is the importance of word $t_i$ in a particular device $d_j$, $tf_{i,j} = n_{i,j} / \sum_k n_{k,j}$, where $n_{i,j}$ is the number of occurrences of the word in device $d_j$ and the denominator is the total number of occurrences of all words in device $d_j$;
$idf_i$ is the inverse document frequency of word $t_i$, obtained by dividing the total number of logs produced by device $d_j$ by the number of logs containing the word and taking the logarithm of the quotient.
7. The fault diagnosis method according to claim 2, wherein step (4) comprises the following sub-steps:
(4-1) manually classifying the logs in the sample, adjusting the log templates in each category, and correcting the results of automatic clustering;
(4-2) learning from the manual classification results and building the fault keyword matrix A:
$$A = \begin{bmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1} & a_{m,2} & \cdots & a_{m,n} \end{bmatrix}$$
wherein m is the number of distinct words appearing in the sample log templates, n is the number of log fault types, and $a_{i,j}$ is the probability that the i-th word belongs to the j-th fault class.
8. The fault diagnosis method according to claim 2, wherein step (5) is specifically: to determine the fault type of log L, for each word in log L, looking up the corresponding row in the fault keyword matrix A and computing the probability value that log L belongs to each fault type; if there is a single maximum probability value, the fault type of log L is judged to be the fault type corresponding to that maximum.
9. The fault diagnosis method according to claim 1, wherein step (S2) comprises the following steps:
(1) aggregating the logs using traditional time-based correlation analysis;
(2) manually analyzing the logs in each tuple to determine the time window size for each fault type;
(3) adding a row to the fault keyword matrix to store the time window size of each fault type;
(4) selecting the log to be diagnosed and, after its fault type is confirmed, looking up the time window for that fault type in the fault keyword matrix, then using that time window to diagnose the log.
10. A log-based computer system fault diagnosis device, comprising the following modules:
a fault log analysis module, which analyzes the logs in the computer system in real time, uses a fault keyword matrix to quantify the fault classification results learned from manual classification, and assigns a fault type to each fault log according to the different fault types;
a fault log correlation module, which uses the results of fault log analysis together with time windows to analyze fault causes, aggregates the related fault logs caused by the same fault into one class, and finds the root cause of that class of faults.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310740549.XA CN103761173A (en) 2013-12-28 2013-12-28 Log based computer system fault diagnosis method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310740549.XA CN103761173A (en) 2013-12-28 2013-12-28 Log based computer system fault diagnosis method and device

Publications (1)

Publication Number Publication Date
CN103761173A true CN103761173A (en) 2014-04-30

Family

ID=50528415

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310740549.XA CN103761173A (en) 2013-12-28 2013-12-28 Log based computer system fault diagnosis method and device

Country Status (1)

Country Link
CN (1) CN103761173A (en)

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104461844A (en) * 2014-10-31 2015-03-25 大唐移动通信设备有限公司 Log service method based on rule
CN104462606A (en) * 2014-12-31 2015-03-25 中国科学院深圳先进技术研究院 Method for determining diagnosis treatment measures based on log data
CN104657622A (en) * 2015-03-12 2015-05-27 浪潮集团有限公司 Cluster fault analysis method based on event-driven analysis
CN105049247A (en) * 2015-07-06 2015-11-11 中国科学院信息工程研究所 Network safety log template extraction method and device
CN105335277A (en) * 2014-06-27 2016-02-17 可牛网络技术(北京)有限公司 Fault information processing method and device as well as terminal
CN105471659A (en) * 2015-12-25 2016-04-06 华为技术有限公司 Root fault cause analysis method and analysis device
CN105468677A (en) * 2015-11-13 2016-04-06 国家计算机网络与信息安全管理中心 Log clustering method based on graph structure
CN105577440A (en) * 2015-12-24 2016-05-11 华为技术有限公司 Network fault time location method and analyzing device
CN105589800A (en) * 2015-12-25 2016-05-18 中国银联股份有限公司 Application system for predicting faults of complex system
CN105830060A (en) * 2014-02-06 2016-08-03 富士施乐株式会社 Information processing device, information processing program, storage medium, and information processing method
CN105893225A (en) * 2015-08-25 2016-08-24 乐视网信息技术(北京)股份有限公司 Automatic error processing method and device
CN106155827A (en) * 2016-06-28 2016-11-23 浪潮(北京)电子信息产业有限公司 A kind of cpu fault its diagnosis processing method based on Linux system and system
CN106953759A (en) * 2017-03-22 2017-07-14 联想(北京)有限公司 Cluster control method and cluster control facility
CN107181630A (en) * 2017-07-24 2017-09-19 郑州云海信息技术有限公司 The treating method and apparatus of service fault in cloud system
CN107241212A (en) * 2017-04-20 2017-10-10 努比亚技术有限公司 A kind of log processing method and device, equipment
CN107301118A (en) * 2017-06-15 2017-10-27 中国科学院计算技术研究所 A kind of fault indices automatic marking method and system based on daily record
CN107301120A (en) * 2017-07-12 2017-10-27 北京京东尚科信息技术有限公司 Method and device for handling unstructured daily record
CN107391727A (en) * 2017-08-01 2017-11-24 北京航空航天大学 The method for digging and device of equipment fault sequence pattern
CN107402863A (en) * 2016-03-28 2017-11-28 阿里巴巴集团控股有限公司 A kind of method and apparatus for being used for the daily record by log system processing business system
CN107577547A (en) * 2017-08-08 2018-01-12 国家超级计算深圳中心(深圳云计算中心) A kind of urgent operation of High-Performance Computing Cluster continues calculation method and system
CN107861856A (en) * 2017-11-08 2018-03-30 郑州云海信息技术有限公司 The processing method and computer-readable storage medium of warning information in cloud data system
CN108153804A (en) * 2017-11-17 2018-06-12 极道科技(北京)有限公司 A kind of metadata daily record update method of symmetric distributed file system
CN109684181A (en) * 2018-11-20 2019-04-26 华为技术有限公司 Alarm root is because of analysis method, device, equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5469463A (en) * 1988-03-30 1995-11-21 Digital Equipment Corporation Expert system for identifying likely failure points in a digital data processing system
CN101325520A (en) * 2008-06-17 2008-12-17 南京邮电大学 Method for locating and analyzing fault of intelligent self-adapting network based on log
CN101714928A (en) * 2008-10-07 2010-05-26 中兴通讯股份有限公司 Method and system for realizing fault detection and location of communication products

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
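Zhou Hao (周浩): "Research on Key Technologies of Machine-Learning-Based Fault Prediction for Exascale Systems", China Master's Theses Full-text Database, Information Science and Technology *
Peng Jian (彭剑) et al.: "An Association Rule Algorithm for Intrusion Logs Based on a Clustering Matrix", Computer Engineering *
Hao Chunfeng (郝春风) et al.: "A Feature Representation Method for Large-Scale Text Classification", Computer Engineering and Applications *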

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105830060A (en) * 2014-02-06 2016-08-03 富士施乐株式会社 Information processing device, information processing program, storage medium, and information processing method
CN105335277A (en) * 2014-06-27 2016-02-17 可牛网络技术(北京)有限公司 Fault information processing method and device as well as terminal
CN104461844A (en) * 2014-10-31 2015-03-25 大唐移动通信设备有限公司 Log service method based on rule
CN104462606A (en) * 2014-12-31 2015-03-25 中国科学院深圳先进技术研究院 Method for determining diagnosis treatment measures based on log data
CN104462606B (en) * 2014-12-31 2018-06-22 中国科学院深圳先进技术研究院 A kind of method that diagnostic process measure is determined based on daily record data
CN104657622A (en) * 2015-03-12 2015-05-27 浪潮集团有限公司 Cluster fault analysis method based on event-driven analysis
CN105049247B (en) * 2015-07-06 2019-04-26 中国科学院信息工程研究所 A kind of network security log template abstracting method and device
CN105049247A (en) * 2015-07-06 2015-11-11 中国科学院信息工程研究所 Network safety log template extraction method and device
CN105893225A (en) * 2015-08-25 2016-08-24 乐视网信息技术(北京)股份有限公司 Automatic error processing method and device
CN105468677A (en) * 2015-11-13 2016-04-06 国家计算机网络与信息安全管理中心 Log clustering method based on graph structure
CN105468677B (en) * 2015-11-13 2019-11-19 国家计算机网络与信息安全管理中心 A kind of Log Clustering method based on graph structure
CN105577440B (en) * 2015-12-24 2019-06-11 华为技术有限公司 A kind of network downtime localization method and analytical equipment
CN105577440A (en) * 2015-12-24 2016-05-11 华为技术有限公司 Network fault time location method and analyzing device
CN105471659B (en) * 2015-12-25 2019-03-01 华为技术有限公司 A kind of failure root cause analysis method and analytical equipment
CN105471659A (en) * 2015-12-25 2016-04-06 华为技术有限公司 Root fault cause analysis method and analysis device
CN105589800A (en) * 2015-12-25 2016-05-18 中国银联股份有限公司 Application system for predicting faults of complex system
CN107402863A (en) * 2016-03-28 2017-11-28 阿里巴巴集团控股有限公司 A kind of method and apparatus for being used for the daily record by log system processing business system
CN106155827A (en) * 2016-06-28 2016-11-23 浪潮(北京)电子信息产业有限公司 A kind of cpu fault its diagnosis processing method based on Linux system and system
CN106953759A (en) * 2017-03-22 2017-07-14 联想(北京)有限公司 Cluster control method and cluster control facility
CN106953759B (en) * 2017-03-22 2020-05-26 联想(北京)有限公司 Cluster control method and cluster control equipment
CN107241212A (en) * 2017-04-20 2017-10-10 努比亚技术有限公司 A kind of log processing method and device, equipment
CN107301118B (en) * 2017-06-15 2019-11-19 中国科学院计算技术研究所 A kind of fault indices automatic marking method and system based on log
CN107301118A (en) * 2017-06-15 2017-10-27 中国科学院计算技术研究所 A kind of fault indices automatic marking method and system based on daily record
CN107301120A (en) * 2017-07-12 2017-10-27 北京京东尚科信息技术有限公司 Method and device for handling unstructured daily record
CN107181630A (en) * 2017-07-24 2017-09-19 郑州云海信息技术有限公司 The treating method and apparatus of service fault in cloud system
CN107391727A (en) * 2017-08-01 2017-11-24 北京航空航天大学 The method for digging and device of equipment fault sequence pattern
CN107577547A (en) * 2017-08-08 2018-01-12 国家超级计算深圳中心(深圳云计算中心) A kind of urgent operation of High-Performance Computing Cluster continues calculation method and system
CN107861856A (en) * 2017-11-08 2018-03-30 郑州云海信息技术有限公司 The processing method and computer-readable storage medium of warning information in cloud data system
CN108153804A (en) * 2017-11-17 2018-06-12 极道科技(北京)有限公司 A kind of metadata daily record update method of symmetric distributed file system
CN109684181A (en) * 2018-11-20 2019-04-26 华为技术有限公司 Alarm root is because of analysis method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
US9128916B2 (en) Machine data web
Bailis et al. Macrobase: Prioritizing attention in fast data
EP3107026A1 (en) Event anomaly analysis and prediction
US20190079993A1 (en) Method and system for implementing efficient classification and exploration of data
CN104461842B (en) Based on daily record similitude come the method and apparatus of handling failure
US8386854B2 (en) Automatic analysis of log entries through use of clustering
US9910727B2 (en) Detecting anomalous accounts using event logs
US8024617B2 (en) Method and apparatus for cause analysis involving configuration changes
Bodik et al. Fingerprinting the datacenter: automated classification of performance crises
Gainaru et al. Taming of the shrew: Modeling the normal and faulty behaviour of large-scale hpc systems
US10599957B2 (en) Systems and methods for detecting data drift for data used in machine learning models
Oliner et al. Alert detection in system logs
Fung et al. Time-dependent event hierarchy construction
Lo et al. SMArTIC: towards building an accurate, robust and scalable specification miner
CN106570513B (en) The method for diagnosing faults and device of big data network system
CN102724059B (en) Website operation state monitoring and abnormal detection based on MapReduce
US8898092B2 (en) Leveraging user-to-tool interactions to automatically analyze defects in it services delivery
CN106104496A (en) The abnormality detection not being subjected to supervision for arbitrary sequence
Aharon et al. One graph is worth a thousand logs: Uncovering hidden structures in massive system event logs
KR101593910B1 (en) System for online monitering individual information and method of online monitering the same
US7464068B2 (en) System and method for continuous diagnosis of data streams
US8533193B2 (en) Managing log entries
CN105653444B (en) Software defect fault recognition method and system based on internet daily record data
US9298538B2 (en) Methods and systems for abnormality analysis of streamed log data
WO2015020922A1 (en) Dynamic collection analysis and reporting of telemetry data

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20140430