CN105471659B - Fault root cause analysis method and analysis device - Google Patents

Fault root cause analysis method and analysis device

Info

Publication number
CN105471659B
Authority
CN
China
Prior art keywords
log
class
log information
root
information
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510990742.8A
Other languages
Chinese (zh)
Other versions
CN105471659A (en)
Inventor
宋跃忠
林程勇
张星辰
谭屯子
高随祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Application filed by Huawei Technologies Co Ltd
Priority to CN201510990742.8A
Publication of CN105471659A
Application granted
Publication of CN105471659B
Current legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L41/064 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis involving time analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/02 Standardisation; Integration
    • H04L41/024 Standardisation; Integration using relational databases for representation of network management data, e.g. managing via structured query language [SQL]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis

Abstract

The invention discloses a fault root cause analysis method and an analysis device, relating to the fields of data mining and network management. It addresses the problem that existing fault root cause analysis requires a large amount of manpower and time to analyze network logs, which makes the analysis inefficient and prevents network faults from being cleared promptly. The method comprises: determining the fault time point of a network device; obtaining a first log information set generated by the network device in a first time period, the first log information set comprising M classes of log information; analyzing each class of log information in the M classes according to a preset analysis strategy to obtain N classes of root-cause logs among the M classes, where a root-cause log is log information generated when the network device fails, M >= N >= 1, and the preset analysis strategy is a predetermined rule describing how logs occur when the network device fails; and determining the cause of the network device fault according to the N classes of root-cause logs.

Description

Fault root cause analysis method and analysis device
Technical field
The present invention relates to the fields of data mining and network management, and in particular to a fault root cause analysis method and an analysis device.
Background
With the development of network technology, broadband routers are used ever more widely in networks and play an important role in them. However, a broadband router inevitably fails at times during operation. If the cause of a failure is not determined and the fault is not cleared promptly, the network may be temporarily interrupted, causing inconvenience and loss to enterprises. It is therefore necessary to determine the cause of a network fault in time and to clear the fault of the broadband router.
Because the network logs generated by a broadband router contain most of the information related to its operation, technicians can locate the cause of a broadband router fault (that is, the fault root cause) by analyzing the network logs. In the course of making the present invention, however, the inventors found that current fault root cause analysis mostly relies on manual log analysis. The analysis involves many manual steps and consumes a large amount of manpower and time, and locating the root cause also requires extensive professional knowledge. As a result, fault root cause analysis is inefficient, and network faults cannot be cleared quickly and promptly.
Summary of the invention
To solve the above problem, embodiments of the present invention provide a fault root cause analysis method and an analysis device, so as to address the problem that existing fault root cause analysis requires a large amount of manpower and time to analyze network logs, which makes the analysis inefficient and prevents network faults from being cleared promptly.
To achieve the above objective, embodiments of the present invention adopt the following technical solutions:
According to a first aspect, an embodiment of the present invention provides a fault root cause analysis method, performed by an analysis device. The method may include:
determining the fault time point of a network device;
obtaining a first log information set generated by the network device in a first time period, where the first log information set comprises M classes of log information, M is an integer greater than or equal to 1, and the first time period is the period from a first moment before the fault time point to a second moment after the fault time point;
analyzing each class of log information in the M classes according to a preset analysis strategy to obtain N classes of root-cause logs among the M classes, where a root-cause log is log information generated when the network device fails, M >= N >= 1, and the preset analysis strategy is a predetermined rule describing how logs occur when the network device fails; and
determining, according to the N classes of root-cause logs, the cause of the network device fault.
When a fault occurs, the network device may generate at least one class of log information (that is, root-cause logs), and the occurrence of these classes of log information near the fault time point follows distinct patterns. Drawing on extensive experience, the inventors analyzed in advance the log information generated near a large number of fault time points and identified the following characteristic patterns of root-cause logs: (1) at least one class of log information generated when a fault occurs usually appears in combination, repeatedly and without interruption, near the fault point; (2) for a class of log information generated when a fault occurs, its frequency of occurrence over a long period usually shows a sudden increase at the fault time point.
Therefore, in one possible implementation of the first aspect, analyzing each class of log information in the M classes according to the preset analysis strategy to obtain the N classes of root-cause logs among the M classes may include:
dividing the M log types corresponding to the M classes of log information into i different log combinations, where each log combination contains at least one of the M log types, no two log combinations contain the same set of log types, and i is an integer greater than or equal to 1;
traversing the i log combinations and determining at least one root-cause log combination among them, where a root-cause log combination is a log combination that occurs frequently and persistently in the first time period;
processing the at least one root-cause log combination; and
determining at least one class of log information corresponding to the processed at least one root-cause log combination as the N classes of root-cause logs.
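The text does not say how the i log combinations are formed. One simple possibility, shown here purely as an assumption, is to enumerate every non-empty combination of log types up to a small size limit; the `max_size` parameter and the function name are hypothetical.

```python
from itertools import combinations

def log_combinations(log_types, max_size=2):
    """Return every non-empty combination of log types with at most max_size members.

    Each combination is a frozenset, so no two combinations hold the same set of types.
    The size cap is an assumption made to keep the number of combinations manageable.
    """
    types = sorted(set(log_types))
    combos = []
    for size in range(1, min(max_size, len(types)) + 1):
        combos.extend(frozenset(c) for c in combinations(types, size))
    return combos
```

With three log types and `max_size=2`, this yields three single-type combinations and three pairs, so i is 6.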
Optionally, for any log combination, determining that the log combination is a root-cause log combination may include:
dividing the first time period into at least one time window, and dividing each of the at least one time window into at least one small time window;
calculating a first frequency with which the log combination occurs in any one time window, where the first frequency is the ratio of the number of small time windows in that time window in which the log combination occurs to the number of small time windows that the time window contains;
if the first frequency is greater than a first preset threshold, determining that the log combination is a frequent log combination in that time window;
calculating a second frequency with which the frequent log combination occurs in the first time period, where the second frequency is the ratio of the number of time windows in the first time period in which the frequent log combination occurs to the number of time windows that the first time period contains; and
if the second frequency is greater than a second preset threshold, determining that the frequent log combination is a root-cause log combination.
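The two-level frequency test above can be sketched as follows. This is a minimal illustration under several assumptions that the text leaves open: all windows have equal size, a combination "occurs" in a small time window only if every one of its log types has at least one record there, and a time window counts toward the second frequency when the combination is frequent in it. The function names and the `events` layout (log type mapped to a list of timestamps) are hypothetical.

```python
def combo_in_interval(combo, events, start, end):
    """True if every log type in combo has at least one timestamp in [start, end)."""
    return all(any(start <= ts < end for ts in events.get(t, ())) for t in combo)

def first_frequency(combo, events, win_start, win_len, n_small):
    """Share of the small time windows of one time window in which combo occurs."""
    small = win_len / n_small
    hits = sum(
        combo_in_interval(combo, events,
                          win_start + k * small, win_start + (k + 1) * small)
        for k in range(n_small)
    )
    return hits / n_small

def is_root_cause_combo(combo, events, start, n_windows, win_len, n_small, thr1, thr2):
    """Return (decision, second_frequency) for one log combination."""
    frequent_windows = sum(
        first_frequency(combo, events, start + w * win_len, win_len, n_small) > thr1
        for w in range(n_windows)
    )
    second = frequent_windows / n_windows
    return second > thr2, second
```

Both thresholds are left to the caller, in line with the statement that they may be set as required.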
Optionally, processing the at least one root-cause log combination may include the following.
If the at least one root-cause log combination determined by traversing the i log combinations includes a first root-cause log combination and a second root-cause log combination, and the first root-cause log combination is contained in the second root-cause log combination, then processing the at least one root-cause log combination includes:
when the second frequency of the first root-cause log combination is greater than the second frequency of the second root-cause log combination, retaining the first root-cause log combination; and
when the second frequency of the first root-cause log combination is less than the second frequency of the second root-cause log combination, rejecting the first root-cause log combination.
Alternatively, if the at least one root-cause log combination determined by traversing the i log combinations includes a third root-cause log combination, and the third root-cause log combination is a root-cause log combination before the fault time point, then processing the at least one root-cause log combination includes:
rejecting the third root-cause log combination.
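The two rejection rules can be sketched as a single filtering pass. The sketch assumes that root-cause combinations are held as frozensets mapped to their second frequencies, and that combinations already identified as root-cause combinations before the fault time point are supplied as a separate set; both layouts are hypothetical. The text does not cover the case of equal second frequencies, and the sketch retains the contained combination in that case as an arbitrary choice.

```python
def prune_combos(combos, pre_fault=frozenset()):
    """Apply the two rejection rules to root-cause log combinations.

    combos: dict mapping frozenset of log types -> second frequency.
    pre_fault: combinations that were already root-cause combinations before the fault.
    """
    kept = {}
    for combo, freq in combos.items():
        if combo in pre_fault:
            continue  # rule 2: present before the fault, so not specific to it
        dominated = any(
            combo < other and freq < other_freq  # proper subset with lower frequency
            for other, other_freq in combos.items()
        )
        if not dominated:  # rule 1: reject only when the larger combination is more frequent
            kept[combo] = freq
    return kept
```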
It should be noted that, in embodiments of the present invention, a log combination is a combination containing at least one log type, and a time window is a time interval. When time windows are divided, the time windows may be equal or unequal in size, as may the small time windows. A log combination occurring in a time window may mean that log information of the log types contained in the log combination occurs in the time interval corresponding to that time window.
The first preset threshold and the second preset threshold may be set as required, which is not limited in embodiments of the present invention. If the first frequency of a log combination is greater than the first preset threshold, the log combination occurs frequently and in a concentrated manner at a certain time, and it is determined to be a frequent log combination. If the first frequency is less than or equal to the first preset threshold, the log information corresponding to the log combination can be regarded as log information generated when the network device is operating normally. If the second frequency of a frequent log combination is greater than the second preset threshold, the logs that occur frequently at a certain time also occur persistently and frequently throughout the first time period; that is, the frequent log combination consists of logs that recur without interruption in the first time period, which matches the occurrence pattern of fault root-cause logs. The frequent log combination is therefore determined to be a root-cause log combination, and the log information of the log types it contains is determined to be log information generated when the network device fails. If the second frequency of the frequent log combination is less than or equal to the second preset threshold, the logs that occur frequently at a certain time do not occur persistently in the first time period, and the corresponding log information can be regarded as log information generated when the network device is operating normally.
In addition, analysis of a large number of fault-related logs and normal logs shows that normal logs tend to occur periodically, are distributed fairly evenly, and appear relatively frequently throughout the log, whereas fault root-cause logs increase suddenly at the fault point and almost never appear in logs corresponding to non-fault conditions. This agrees with the principle from information theory that content occurring more frequently carries less information. Accordingly, in another possible implementation of the first aspect, analyzing each class of log information in the M classes according to the preset analysis strategy to obtain the N classes of root-cause logs among the M classes may include:
determining M anomaly values in one-to-one correspondence with the M classes of log information, where an anomaly value indicates how frequently a class of log information occurs in a second time period and how abruptly its count changes, and the second time period includes the first time period; and
obtaining the N largest anomaly values from the M anomaly values, and determining the N classes of log information in the first log information set that correspond to the N largest anomaly values as the N classes of root-cause logs.
Optionally, a second log information set generated by the network device in the second time period may be obtained, where the second log information set includes at least one piece of log information and each piece of log information corresponds to one time point;
the second log information set is preprocessed to obtain a first log behavior matrix, where the first log behavior matrix includes Q log behavior vectors, each log behavior vector occupies one time interval and contains R elements, R is the number of log types in the second log information set, R >= M, and the j-th element of a log behavior vector is the number of pieces of class-j log information in the time interval of that log behavior vector;
the anomaly values of the R classes of log information are calculated according to a formula (the formula is not reproduced in this text), giving R anomaly values in one-to-one correspondence with the R classes of log information; and
the M anomaly values in one-to-one correspondence with the M classes of log information are obtained from the R anomaly values.
Here, the time interval is relatively long, usually tens of minutes, and the time intervals of the log behavior vectors may be equal or unequal.
In the formula, one term (symbol not reproduced in this text) indicates how frequently class-j log information among the R classes occurs in the second log information set, and another term (symbol not reproduced in this text) indicates how abruptly the amount of class-j log information changes in the second log information set; q_j is the number of log behavior vectors that contain class-j log information, c_{k+1,j} is the total number of pieces of class-j log information in the (k+1)-th time interval, and c_{k,j} is the total number of pieces of class-j log information in the k-th time interval.
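The formula itself is missing from this text, so the sketch below does not reproduce it. It only illustrates the surrounding steps (building the log behavior matrix and taking the N largest anomaly values) and substitutes an assumed stand-in score: a rarity term log(Q / q_j), chosen because rarer content carries more information, multiplied by the largest increase c_{k+1,j} - c_{k,j} between consecutive intervals. The stand-in score, the equal-length intervals, and all function names are assumptions.

```python
import math

def behaviour_matrix(logs, start, interval, q, types):
    """Build Q rows (time intervals) by R columns (log types) of record counts."""
    index = {t: j for j, t in enumerate(types)}
    matrix = [[0] * len(types) for _ in range(q)]
    for ts, log_type in logs:
        k = int((ts - start) // interval)
        if 0 <= k < q and log_type in index:
            matrix[k][index[log_type]] += 1
    return matrix

def anomaly_scores(matrix):
    """Assumed stand-in score per log type: rarity times largest upward jump."""
    q = len(matrix)
    scores = []
    for j in range(len(matrix[0])):
        col = [row[j] for row in matrix]
        q_j = sum(1 for c in col if c > 0)  # intervals that contain class j
        if q_j == 0:
            scores.append(0.0)
            continue
        rarity = math.log(q / q_j)  # assumption, not the patent's term
        jump = max((col[k + 1] - col[k] for k in range(q - 1)), default=0)
        scores.append(rarity * max(jump, 0))
    return scores

def top_n(types, scores, n):
    """Return the n log types with the largest anomaly values."""
    ranked = sorted(zip(types, scores), key=lambda p: p[1], reverse=True)
    return [t for t, _ in ranked[:n]]
```

Under this stand-in, a log type that appears evenly in every interval scores zero, while one that appears only as a burst in a single interval ranks first.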
It should be noted that the two ways described above of obtaining the N classes of root-cause logs among the M classes of log information may be performed separately or in combination, so as to locate the exact cause of a network fault more accurately.
Finally, because log information records the activity of the network device at a point in time, the N classes of root-cause logs may be obtained directly, and the record information corresponding to the N classes of root-cause logs may be taken as the cause of the network device fault. Alternatively, an existing analysis method may be applied to the N classes of root-cause logs to determine the most fundamental fault cause that gave rise to them.
Alternatively, on the basis of the first way of obtaining the N classes of root-cause logs among the M classes of log information, each log type in the merged set of at least one log type may be recorded together with the number of times it appears in the at least one root-cause log combination, and the log information of the log type with the highest count may be taken directly as the root cause of the network device fault.
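The counting step in the last paragraph can be sketched as follows, assuming the root-cause log combinations are available as collections of log type names. The function name is hypothetical, and the handling of ties, which the text does not address, simply follows the behavior of `Counter.most_common`.

```python
from collections import Counter

def most_frequent_type(root_combos):
    """Count how often each log type appears across root-cause log combinations.

    Returns the log type with the highest count together with all counts.
    """
    counts = Counter(t for combo in root_combos for t in combo)
    top_type, _ = counts.most_common(1)[0]
    return top_type, counts
```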
According to a second aspect, an embodiment of the present invention further provides an analysis device for analyzing the fault root cause of a network device. The analysis device may include:
a determining unit, configured to determine the fault time point of the network device;
an obtaining unit, configured to obtain a first log information set generated by the network device in a first time period, where the first log information set comprises M classes of log information, M is an integer greater than or equal to 1, and the first time period is the period from a first moment before the fault time point to a second moment after the fault time point; and
an analysis unit, configured to analyze, according to a preset analysis strategy, each class of log information in the M classes obtained by the obtaining unit, to obtain N classes of root-cause logs among the M classes, where a root-cause log is log information generated when the network device fails, M >= N >= 1, and the preset analysis strategy is a predetermined rule describing how logs occur when the network device fails;
where the determining unit is further configured to determine, according to the N classes of root-cause logs, the cause of the network device fault.
When a fault occurs, the network device may generate at least one class of log information (that is, root-cause logs), and the occurrence of these classes of log information near the fault time point follows distinct patterns. Drawing on extensive experience, the inventors analyzed in advance the log information generated near a large number of fault time points and identified the following characteristic patterns of root-cause logs: (1) at least one class of log information generated when a fault occurs usually appears in combination, repeatedly and without interruption, near the fault point; (2) for a class of log information generated when a fault occurs, its frequency of occurrence over a long period usually shows a sudden increase at the fault time point.
Therefore, in one possible implementation of the second aspect, the analysis unit may be configured to:
divide the M log types corresponding to the M classes of log information into i different log combinations, where each log combination contains at least one of the M log types, no two log combinations contain the same set of log types, and i is an integer greater than or equal to 1;
traverse the i log combinations and determine at least one root-cause log combination among them, where a root-cause log combination is a log combination that occurs frequently and persistently in the first time period;
process the at least one root-cause log combination; and
determine at least one class of log information corresponding to the processed at least one root-cause log combination as the N classes of root-cause logs.
For any log combination, the analysis unit may be configured to:
divide the first time period into at least one time window, and divide each of the at least one time window into at least one small time window;
calculate a first frequency with which the log combination occurs in any one time window, where the first frequency is the ratio of the number of small time windows in that time window in which the log combination occurs to the number of small time windows that the time window contains;
if the first frequency is greater than a first preset threshold, determine that the log combination is a frequent log combination in that time window;
calculate a second frequency with which the frequent log combination occurs in the first time period, where the second frequency is the ratio of the number of time windows in the first time period in which the frequent log combination occurs to the number of time windows that the first time period contains; and
if the second frequency is greater than a second preset threshold, determine that the frequent log combination is a root-cause log combination.
It should be noted that, in embodiments of the present invention, a log combination is a combination containing at least one log type, and a time window is a time interval. When time windows are divided, the time windows may be equal or unequal in size, as may the small time windows. A log combination occurring in a time window may mean that log information of the log types contained in the log combination occurs in the time interval corresponding to that time window.
The first preset threshold and the second preset threshold may be set as required, which is not limited in embodiments of the present invention. If the first frequency of a log combination is greater than the first preset threshold, the log combination occurs frequently and in a concentrated manner at a certain time, and it is determined to be a frequent log combination. If the first frequency is less than or equal to the first preset threshold, the log information corresponding to the log combination can be regarded as log information generated when the network device is operating normally. If the second frequency of a frequent log combination is greater than the second preset threshold, the logs that occur frequently at a certain time also occur persistently and frequently throughout the first time period; that is, the frequent log combination consists of logs that recur without interruption in the first time period, which matches the occurrence pattern of fault root-cause logs. The frequent log combination is therefore determined to be a root-cause log combination, and the log information of the log types it contains is determined to be log information generated when the network device fails. If the second frequency of the frequent log combination is less than or equal to the second preset threshold, the logs that occur frequently at a certain time do not occur persistently in the first time period, and the corresponding log information can be regarded as log information generated when the network device is operating normally.
In addition, analysis of a large number of fault-related logs and normal logs shows that normal logs tend to occur periodically, are distributed fairly evenly, and appear relatively frequently throughout the log, whereas fault root-cause logs increase suddenly at the fault point and almost never appear in logs corresponding to non-fault conditions. This agrees with the principle from information theory that content occurring more frequently carries less information. Accordingly, in another possible implementation of the second aspect, the analysis unit may be configured to:
determine M anomaly values in one-to-one correspondence with the M classes of log information, where an anomaly value indicates how frequently a class of log information occurs in a second time period and how abruptly its count changes, and the second time period includes the first time period; and
obtain the N largest anomaly values from the M anomaly values, and determine the N classes of log information in the first log information set that correspond to the N largest anomaly values as the N classes of root-cause logs.
Optionally, a second log information set generated by the network device in the second time period may be obtained, where the second log information set includes at least one piece of log information and each piece of log information corresponds to one time point;
the second log information set is preprocessed to obtain a first log behavior matrix, where the first log behavior matrix includes Q log behavior vectors, each log behavior vector occupies one time interval and contains R elements, R is the number of log types in the second log information set, R >= M, and the j-th element of a log behavior vector is the number of pieces of class-j log information in the time interval of that log behavior vector;
the anomaly values of the R classes of log information are calculated according to a formula (the formula is not reproduced in this text), giving R anomaly values in one-to-one correspondence with the R classes of log information; and
the M anomaly values in one-to-one correspondence with the M classes of log information are obtained from the R anomaly values.
Here, the time interval is relatively long, usually tens of minutes, and the time intervals of the log behavior vectors may be equal or unequal.
In the formula, one term (symbol not reproduced in this text) indicates how frequently class-j log information among the R classes occurs in the second log information set, and another term (symbol not reproduced in this text) indicates how abruptly the amount of class-j log information changes in the second log information set; q_j is the number of log behavior vectors that contain class-j log information, c_{k+1,j} is the total number of pieces of class-j log information in the (k+1)-th time interval, and c_{k,j} is the total number of pieces of class-j log information in the k-th time interval.
It should be noted that the two ways described above of obtaining the N classes of root-cause logs among the M classes of log information may be performed separately or in combination, so as to locate the exact cause of a network fault more accurately.
Finally, because log information records the activity of the network device at a point in time, the determining unit may be configured to:
obtain the N classes of root-cause logs directly, and take the record information corresponding to the N classes of root-cause logs as the cause of the network device fault.
The determining unit may also be configured to apply an existing analysis method to the N classes of root-cause logs to determine the most fundamental fault cause that gave rise to them.
The determining unit may further be configured, where the analysis unit uses the first way of obtaining the N classes of root-cause logs among the M classes of log information, to record each log type in the merged set of at least one log type together with the number of times it appears in the at least one root-cause log combination, and to take the log information of the log type with the highest count directly as the root cause of the network device fault.
According to a third aspect, an embodiment of the present invention further provides an analysis device for analyzing the fault root cause of a network device. The analysis device may include:
a processor, configured to determine the fault time point of the network device; and
a receiver, configured to obtain a first log information set generated by the network device in a first time period, where the first log information set comprises M classes of log information, M is an integer greater than or equal to 1, and the first time period is the period from a first moment before the fault time point to a second moment after the fault time point;
where the processor is further configured to analyze, according to a preset analysis strategy, each class of log information in the M classes obtained by the receiver, to obtain N classes of root-cause logs among the M classes, where a root-cause log is log information generated when the network device fails, M >= N >= 1, and the preset analysis strategy is a predetermined rule describing how logs occur when the network device fails; and
the processor is further configured to determine, according to the N classes of root-cause logs, the cause of the network device fault.
When a fault occurs, the network device may generate at least one class of log information (that is, root-cause logs), and the occurrence of these classes of log information near the fault time point follows distinct patterns. Drawing on extensive experience, the inventors analyzed in advance the log information generated near a large number of fault time points and identified the following characteristic patterns of root-cause logs: (1) at least one class of log information generated when a fault occurs usually appears in combination, repeatedly and without interruption, near the fault point; (2) for a class of log information generated when a fault occurs, its frequency of occurrence over a long period usually shows a sudden increase at the fault time point.
Therefore, in one possible implementation of the third aspect, the processor may be configured to:
divide the M log types corresponding to the M classes of log information into i different log combinations, where each log combination includes at least one of the M log types, the log types included in the respective log combinations differ from one another, and i is an integer greater than or equal to 1;
traverse the i log combinations and determine at least one root-cause log combination among them, where a root-cause log combination is a log combination that occurs frequently and persistently in the first time period;
process the at least one root-cause log combination;
determine the at least one class of log information corresponding to the processed at least one root-cause log combination as the N classes of root-cause logs.
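The division of log types into log combinations can be sketched as follows. This is a minimal illustration only: the text does not say how the i combinations are chosen, so enumerating every non-empty subset up to an optional size limit is an assumption, and the function name `enumerate_log_combinations` and the parameter `max_size` are hypothetical.

```python
from itertools import combinations


def enumerate_log_combinations(log_types, max_size=None):
    """Return distinct, non-empty combinations of log types.

    Assumption: the combinations are all subsets of the M log types with
    between 1 and max_size members; the source only requires that each
    combination holds at least one log type and that no two are identical.
    """
    types = sorted(set(log_types))
    limit = len(types) if max_size is None else min(max_size, len(types))
    result = []
    for size in range(1, limit + 1):
        # frozenset makes each combination hashable and order-independent
        result.extend(frozenset(c) for c in combinations(types, size))
    return result


# Hypothetical log types, used only for illustration
log_combinations = enumerate_log_combinations(["LINK_DOWN", "BGP_FLAP", "CPU_HIGH"])
```

Because the number of subsets grows exponentially with M, a practical implementation would likely cap `max_size` or prune candidates early.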
For any log combination, the processor may be configured to:
divide the first time period into at least one time window, and divide each of the at least one time window into at least one small time window;
calculate a first frequency with which the log combination occurs in any time window, where the first frequency is the ratio of the number of small time windows in that time window in which the log combination occurs to the number of small time windows that the time window contains;
if the first frequency is greater than a first preset threshold, determine that the log combination is a frequent log combination in that time window;
calculate a second frequency with which the frequent log combination occurs in the first time period, where the second frequency is the ratio of the number of time windows in the first time period in which the frequent log combination occurs to the number of time windows that the first time period contains;
if the second frequency is greater than a second preset threshold, determine that the frequent log combination is a root-cause log combination.
It should be noted that, in the embodiments of the present invention, a log combination is a combination that includes at least one log type, and a time window is a time interval. When the time windows are divided, they may or may not be equal in size, and the same holds for the small time windows. A log combination occurring in a time window may mean that log information corresponding to the log types included in the log combination occurs in the time interval corresponding to that time window.
The first preset threshold and the second preset threshold may be set as required, which is not limited in the embodiments of the present invention. If the first frequency of a log combination is greater than the first preset threshold, the log combination occurs frequently in a concentrated manner at a certain moment and is determined to be a frequent log combination; if the first frequency is less than or equal to the first preset threshold, the log information corresponding to the log combination is considered to be log information generated while the network device is operating normally. If the second frequency of a frequent log combination is greater than the second preset threshold, the logs that occur frequently at a certain moment continue to occur frequently throughout the first time period; that is, the frequent log combination repeats without interruption in the first time period, which matches the rule by which failure root-cause logs occur. The frequent log combination is therefore determined to be a root-cause log combination, and the log information corresponding to the log types it includes is determined to be log information generated when the network device fails. If the second frequency is less than or equal to the second preset threshold, the logs that occur frequently at a certain moment do not occur persistently in the first time period, and the corresponding log information is considered to be log information generated while the network device is operating normally.
In addition, analysis of a large number of failure-related logs and normal logs shows that normal logs tend to occur periodically, are relatively evenly distributed, and occur fairly frequently across the entire log, whereas failure root-cause logs show a sudden increase at the fault point and almost never occur in logs corresponding to non-failure modes. This is consistent with the principle in information theory that content occurring more frequently carries less information. For this reason, in another possible implementation of the third aspect, the processor may be configured to:
determine M outlier values in one-to-one correspondence with the M classes of log information, where an outlier value indicates how frequently a class of log information occurs in a second time period and how abruptly it changes, and the second time period includes the first time period;
obtain the N largest outlier values from the M outlier values, and determine the N classes of log information in the first log information set that correspond to the N largest outlier values as the N classes of root-cause logs.
Optionally, the processor may obtain a second log information set generated by the network device in the second time period, where the second log information set includes at least one piece of log information and each piece of log information corresponds to a time point;
preprocess the second log information set to obtain a first log behavior matrix, where the first log behavior matrix includes Q log behavior vectors, each log behavior vector occupies one time interval and contains R elements, R is the number of log types corresponding to the second log information set, R >= M, and the j-th element of a log behavior vector indicates the number of pieces of class-j log information in the time interval of that log behavior vector;
calculate, according to the formula [formula missing from this text], the outlier value of each of the R classes of log information, to obtain R outlier values in one-to-one correspondence with the R classes of log information;
obtain, from the R outlier values, the M outlier values in one-to-one correspondence with the M classes of log information.
The time interval here is relatively large, usually tens of minutes, and the time intervals of the log behavior vectors may or may not be equal.
In the formula, the expression [missing from this text] indicates how frequently class-j log information among the R classes of log information occurs in the second log information set, and the expression [missing from this text] indicates the degree of abrupt change of class-j log information among the R classes of log information in the second log information set; q_j is the number of log behavior vectors that contain class-j log information, c_{k+1,j} is the total quantity of class-j log information in the (k+1)-th time interval, and c_{k,j} is the total quantity of class-j log information in the k-th time interval.
It should be noted that the two ways described above of obtaining the N classes of root-cause logs among the M classes of log information may be performed separately or in combination, so as to locate the exact cause of a network failure more accurately.
Finally, because log information is a record of the activity of the network device at a time point, the processor may be configured to:
directly obtain the N classes of root-cause logs and take the record information corresponding to the N classes of root-cause logs as the cause of the failure of the network device;
or analyze the N classes of root-cause logs with an existing analysis method to determine the most fundamental failure cause that gave rise to the N classes of root-cause logs;
or, where the processor obtains the N classes of root-cause logs in the first way, record, for each log type among the at least one log type after merging, the number of times it occurs in the at least one root-cause log combination, and take the log information corresponding to the log type with the highest count directly as the root cause of the failure of the network device.
As can be seen from the above, the embodiments of the present invention provide a failure root cause analysis method and an analysis device, which determine the fault time point of a network device; obtain a first log information set generated by the network device in a first time period, where the first log information set includes M classes of log information, M is an integer greater than or equal to 1, and the first time period is the period from a first moment before the fault time point to a second moment after the fault time point; analyze each class of log information in the M classes according to a preset analysis strategy to obtain N classes of root-cause logs among the M classes of log information; and determine the cause of the failure of the network device according to the N classes of root-cause logs. In this way, the classes of log information near the fault time point are analyzed automatically, the log information that matches the rule by which root-cause logs occur during a failure is obtained, and the root cause of the failure of the network device is determined from that log information. The root cause of a network device failure is thus analyzed automatically, which improves the efficiency of failure root cause analysis.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the accompanying drawings in the following description show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is the functional block diagram of failure root cause analysis provided in an embodiment of the present invention;
Fig. 2 is the structure chart of analytical equipment 20 provided in an embodiment of the present invention;
Fig. 3 is the flow chart of failure root cause analysis method provided in an embodiment of the present invention;
Fig. 4 is the structure chart of analytical equipment 30 provided in an embodiment of the present invention.
Detailed description of the embodiments
The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Evidently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
The basic principle of the present invention is as follows. First, data mining and machine learning are applied to the way root-cause logs occur near a large number of fault points, to find the rule by which failure root causes appear when a failure occurs. A suitable mathematical analysis method is then chosen according to this rule and used to analyze, in real time, the log information near the fault point in the online logs. If a class of log information near the fault point satisfies the rule, that log information is determined to be the root-cause log of the network device failure, and the root cause of the network failure is then determined from that root-cause log. In this way, the log information is analyzed automatically with a suitable analysis method to find the root-cause logs, which improves the efficiency of failure root cause analysis.
For example, Fig. 1 is a functional block diagram of failure root cause analysis provided by an embodiment of the present invention. As shown in Fig. 1, data mining and machine learning are performed on offline logs to obtain the occurrence rules of failure root-cause logs (for example, root-cause logs repeat without interruption near the fault point, or root-cause logs show a sudden increase at the fault point). The online logs then go through three stages, namely log preprocessing, fault point confirmation, and root cause analysis, to locate the precise cause of the network failure, and the failure cause is fed back to testing personnel in an analysis report. Log preprocessing mainly includes log clustering, which is used to process logs of the same type uniformly. Fault point confirmation mainly means confirming the time point at which the network device fails. Root cause analysis mainly includes analyzing the log information near the fault point according to the occurrence rules of failure root-cause logs obtained by data mining and machine learning on the offline logs, obtaining the log information that satisfies those rules, and determining from it the precise cause of the network device failure. It should be noted that, in the functional block diagram shown in Fig. 1, offline logs are the logs used to train the present invention, and online logs are the actual logs to which the present invention is applied.
The method provided by the present invention may be performed by the analysis device 20 shown in Fig. 2, which performs failure analysis and localization for the network device 10. The analysis device 20 may be any of a switch, a router, a network management device, a Web server, a software-defined network (Software Defined Network, SDN) controller, or a similar device. Optionally, as shown in Fig. 2, the analysis device 20 may include a processor 2011, a memory 2012, a receiver 2013, a transmitter 2014, and at least one communication bus 2015 used to connect these components and enable communication among them.
The receiver 2013 may be used for data exchange with external network elements, for example, to collect the log information generated by the network device 10.
The memory 2012 may be a volatile memory, such as a random access memory (random-access memory, RAM); or a non-volatile memory, such as a read-only memory (read-only memory, ROM), a flash memory, a hard disk drive (hard disk drive, HDD), or a solid-state drive (solid-state drive, SSD); or a combination of the foregoing kinds of memory.
The processor 2011 may be a central processing unit (central processing unit, CPU), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention, for example, one or more digital signal processors (digital signal processor, DSP) or one or more field programmable gate arrays (Field Programmable Gate Array, FPGA). The processor 2011 is configured to first perform data mining and machine learning on offline logs to obtain the occurrence rules of failure root-cause logs, then determine the fault point at which the network device fails by applying log preprocessing and fault point confirmation to the online logs, obtain the log information near the fault point, perform root cause analysis on that log information according to the occurrence rules of root-cause logs, obtain the log information that satisfies those rules, and determine from it the root cause of the network device failure.
The transmitter 2014 may be used for data exchange with external network elements; for example, it may be a human-computer interaction interface used to feed back to testing personnel the failure cause located by the processor 2011.
The communication bus 2015 may be divided into an address bus, a data bus, a control bus, and so on, and may be an Industry Standard Architecture (Industry Standard Architecture, ISA) bus, a Peripheral Component Interconnect (Peripheral Component Interconnect, PCI) bus, an Extended Industry Standard Architecture (Extended Industry Standard Architecture, EISA) bus, or the like. For ease of illustration, the bus is shown as a single thick line in Fig. 2, but this does not mean that there is only one bus or only one type of bus.
Optionally, the processor 2011 is configured to determine the fault time point of the network device.
The fault time point is the time point at which the network device fails. Because the network device may fail several times within a period, the fault time point may refer to the time point of any one of those failures.
The receiver 2013 is configured to obtain a first log information set generated by the network device in a first time period, where the first log information set includes M classes of log information, M is an integer greater than or equal to 1, and the first time period is the period from a first moment before the fault time point determined by the processor 2011 to a second moment after that fault time point.
The duration from the first moment to the fault time point and the duration from the fault time point to the second moment may be set as required, which is not limited in the embodiments of the present invention. The present invention determines the first moment and the second moment solely on the principle of obtaining the log information near the fault time point.
Preferably, the log information from the 20 minutes before the fault time point and the log information from the 40 minutes after the fault time point may be obtained and used as the first log information set in the first time period. Alternatively, only the log information in a period after the fault time point (for example, within 60 minutes) may be obtained and used as the first log information set.
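Selecting the first log information set can be sketched as a simple time filter. This is a minimal illustration under stated assumptions: the record layout `(timestamp, log_type, message)` and the function name `first_log_information_set` are hypothetical, and the 20-minute and 40-minute defaults simply mirror the preferred values mentioned above.

```python
from datetime import datetime, timedelta


def first_log_information_set(records, fault_time,
                              before=timedelta(minutes=20),
                              after=timedelta(minutes=40)):
    """Keep the records whose timestamp lies in [fault_time - before, fault_time + after].

    Each record is assumed to be a (timestamp, log_type, message) tuple.
    """
    first_moment = fault_time - before
    second_moment = fault_time + after
    return [r for r in records if first_moment <= r[0] <= second_moment]


# Hypothetical data, used only for illustration
fault = datetime(2015, 12, 24, 12, 0, 0)
sample = [
    (fault - timedelta(minutes=30), "CPU_HIGH", "too early, dropped"),
    (fault - timedelta(minutes=10), "LINK_DOWN", "kept"),
    (fault + timedelta(minutes=39), "BGP_FLAP", "kept"),
    (fault + timedelta(minutes=41), "CPU_HIGH", "too late, dropped"),
]
selected = first_log_information_set(sample, fault)
```

Passing `before=timedelta(0)` and `after=timedelta(minutes=60)` would give the alternative in which only logs after the fault time point are used.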
The processor 2011 is configured to analyze, according to a preset analysis strategy, each class of log information in the M classes obtained by the receiver 2013, to obtain N classes of root-cause logs among the M classes of log information, where the N classes of root-cause logs are the log information generated when the network device fails, M >= N >= 1, and the preset analysis strategy is a predetermined rule describing how logs occur when the network device fails.
The processor 2011 is further configured to determine, according to the N classes of root-cause logs it has obtained, the cause of the failure of the network device.
A network device may generate at least one class of log information (that is, root-cause logs) when a failure occurs, and the occurrence of such log information near the fault time point shows clear characteristic patterns. Drawing on extensive experience, the inventors therefore analyzed in advance the log information generated near a large number of fault time points and identified the following characteristic rules for the occurrence of failure root-cause logs: (1) the at least one class of log information generated when a failure occurs usually appears together, repeatedly and without interruption, near the fault point; (2) a class of log information generated when a failure occurs usually appears frequently over a long period and shows a sudden increase at the fault time point. On this basis, the inventors propose an analysis strategy that can identify log information matching the occurrence rules of root-cause logs. The M classes of log information obtained are analyzed according to this strategy to determine the classes that match those rules. Optionally, the processor 2011 may obtain the root-cause logs in the following two ways:
(1) Divide the M log types corresponding to the M classes of log information into i different log combinations, where each log combination includes at least one of the M log types, the log types included in the respective log combinations differ from one another, and i is an integer greater than or equal to 1;
traverse the i log combinations and determine at least one root-cause log combination among them, where a root-cause log combination is a log combination that occurs frequently and persistently in the first time period;
process the at least one root-cause log combination;
determine the at least one class of log information corresponding to the processed at least one root-cause log combination as the N classes of root-cause logs.
For any log combination, determining that the log combination is a root-cause log combination may include:
dividing the first time period into at least one time window, and dividing each of the at least one time window into at least one small time window;
calculating a first frequency with which the log combination occurs in any time window, where the first frequency is the ratio of the number of small time windows in that time window in which the log combination occurs to the number of small time windows that the time window contains;
if the first frequency is greater than a first preset threshold, determining that the log combination is a frequent log combination in that time window;
calculating a second frequency with which the frequent log combination occurs in the first time period, where the second frequency is the ratio of the number of time windows in the first time period in which the frequent log combination occurs to the number of time windows that the first time period contains;
if the second frequency is greater than a second preset threshold, determining that the frequent log combination is a root-cause log combination.
It should be noted that, in the embodiments of the present invention, a log combination is a combination that includes at least one log type, and a time window is a time interval. When the time windows are divided, they may or may not be equal in size, and the same holds for the small time windows. A log combination occurring in a time window may mean that log information corresponding to the log types included in the log combination occurs in the time interval corresponding to that time window.
The first preset threshold and the second preset threshold may be set as required, which is not limited in the embodiments of the present invention. If the first frequency of a log combination is greater than the first preset threshold, the log combination occurs frequently in a concentrated manner at a certain moment and is determined to be a frequent log combination; if the first frequency is less than or equal to the first preset threshold, the log information corresponding to the log combination is considered to be log information generated while the network device is operating normally. If the second frequency of a frequent log combination is greater than the second preset threshold, the logs that occur frequently at a certain moment continue to occur frequently throughout the first time period; that is, the frequent log combination repeats without interruption in the first time period, which matches the rule by which failure root-cause logs occur. The frequent log combination is therefore determined to be a root-cause log combination, and the log information corresponding to the log types it includes is determined to be log information generated when the network device fails. If the second frequency is less than or equal to the second preset threshold, the logs that occur frequently at a certain moment do not occur persistently in the first time period, and the corresponding log information is considered to be log information generated while the network device is operating normally.
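The two-level frequency test described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: events are `(timestamp, log_type)` pairs with numeric timestamps, windows have equal length (the text also allows unequal ones), a combination is taken to "occur" in a small time window when every one of its log types appears there, and a time window counts toward the second frequency when the combination is frequent in it. All function and parameter names are hypothetical.

```python
def _windows(start, end, length):
    """Yield consecutive [lo, hi) windows of the given length covering [start, end)."""
    if length <= 0 or end <= start:
        raise ValueError("need a positive window length and end > start")
    lo = start
    while lo < end:
        hi = min(lo + length, end)
        yield lo, hi
        lo = hi


def first_frequency(events, combo, win_start, win_end, small_len):
    """Share of small time windows in one time window in which the combination occurs."""
    combo = set(combo)
    smalls = list(_windows(win_start, win_end, small_len))
    hits = 0
    for lo, hi in smalls:
        seen = {log_type for ts, log_type in events if lo <= ts < hi}
        if combo <= seen:  # assumption: every log type of the combination is present
            hits += 1
    return hits / len(smalls)


def is_root_cause_combination(events, combo, start, end,
                              window_len, small_len, threshold1, threshold2):
    """Apply the first-frequency test per time window, then the second-frequency test."""
    windows = list(_windows(start, end, window_len))
    frequent = sum(
        1 for lo, hi in windows
        if first_frequency(events, combo, lo, hi, small_len) > threshold1
    )
    second_frequency = frequent / len(windows)
    return second_frequency > threshold2
```

With these definitions, a combination that appears in most small windows of most time windows passes both thresholds, while a log type that shows up only once in the first time period fails the first test in every window.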
In addition, analysis of a large number of failure-related logs and normal logs shows that normal logs tend to occur periodically, are relatively evenly distributed, and occur fairly frequently across the entire log, whereas failure root-cause logs show a sudden increase at the fault point and almost never occur in logs corresponding to non-failure modes. This is consistent with the principle in information theory that content occurring more frequently carries less information. An outlier-value calculation method that fits log behavior patterns is therefore proposed, and fault logs are selected based on the resulting outlier values. The specific implementation is shown in (2):
(2) Determine M outlier values in one-to-one correspondence with the M classes of log information, where an outlier value indicates how frequently a class of log information occurs in a second time period and how abruptly it changes, and the second time period includes the first time period;
obtain the N largest outlier values from the M outlier values, and determine the N classes of log information in the first log information set that correspond to the N largest outlier values as the N classes of root-cause logs.
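Selecting the N classes with the largest outlier values can be sketched as below. The outlier values are assumed to have been computed already and to be held in a dictionary keyed by log class; the function name `top_n_root_cause_classes` and the sample values are hypothetical.

```python
import heapq


def top_n_root_cause_classes(outlier_values, n):
    """Return the n log classes with the largest outlier values, largest first."""
    return heapq.nlargest(n, outlier_values, key=outlier_values.get)


# Hypothetical outlier values for M = 3 classes of log information
example_values = {"LINK_DOWN": 3.1, "CPU_HIGH": 0.2, "BGP_FLAP": 1.5}
root_cause_classes = top_n_root_cause_classes(example_values, 2)
```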
Optionally, a second log information set generated by the network device in the second time period may be obtained, where the second log information set includes at least one piece of log information and each piece of log information corresponds to a time point;
the second log information set is preprocessed to obtain a first log behavior matrix, where the first log behavior matrix includes Q log behavior vectors, each log behavior vector occupies one time interval and contains R elements, R is the number of log types corresponding to the second log information set, R >= M, and the j-th element of a log behavior vector indicates the number of pieces of class-j log information in the time interval of that log behavior vector;
the outlier value of each of the R classes of log information is calculated according to the formula [formula missing from this text], to obtain R outlier values in one-to-one correspondence with the R classes of log information;
the M outlier values in one-to-one correspondence with the M classes of log information are obtained from the R outlier values.
The time interval here is relatively large, usually tens of minutes, and the time intervals of the log behavior vectors may or may not be equal.
In the formula, the expression [missing from this text] indicates how frequently class-j log information among the R classes of log information occurs in the second log information set, and the expression [missing from this text] indicates the degree of abrupt change of class-j log information among the R classes of log information in the second log information set; q_j is the number of log behavior vectors that contain class-j log information, c_{k+1,j} is the total quantity of class-j log information in the (k+1)-th time interval, and c_{k,j} is the total quantity of class-j log information in the k-th time interval.
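The quantities named in this explanation can be read off a log behavior matrix as sketched below. The formula that combines them is not reproduced in this text, so the sketch stops at the ingredients and does not compute an outlier value. The matrix is assumed to be a list of Q rows with R counts each, and the function name `class_statistics` is hypothetical.

```python
def class_statistics(matrix, j):
    """Return (q_j, deltas) for log class j.

    q_j    : number of log behavior vectors in which class j occurs at least once
    deltas : c_{k+1,j} - c_{k,j} for each pair of adjacent time intervals
    How these are combined into an outlier value is left open here, because
    the source formula is not available.
    """
    counts = [row[j] for row in matrix]
    q_j = sum(1 for c in counts if c > 0)
    deltas = [counts[k + 1] - counts[k] for k in range(len(counts) - 1)]
    return q_j, deltas


# Hypothetical matrix with Q = 4 time intervals and R = 2 log classes:
# class 0 is steady background logging, class 1 bursts in the third interval
behavior = [
    [5, 0],
    [6, 0],
    [5, 9],
    [6, 1],
]
steady = class_statistics(behavior, 0)
bursty = class_statistics(behavior, 1)
```

In this example the steady class appears in every interval with small changes, while the bursty class appears in only two intervals and shows one large jump, which is the contrast between normal logs and root-cause logs that the outlier value is meant to capture.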
It should be noted that the two ways described above may be performed separately or in combination, so as to locate the exact cause of a network failure more accurately. For example, way (1) may first be used to determine that class-1 and class-5 log information form a log combination that occurs frequently and persistently; the outlier values of class-1 and class-5 log information are then calculated according to way (2). If the first log information set corresponds to class-1 and class-5 logs and the outlier value of the class-1 log is among the top M largest outlier values, the class-1 log information is determined to be the root-cause log of the failure. This improves the accuracy of network failure cause analysis.
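The combined use of the two ways in this example can be sketched as follows. It assumes that way (1) has already produced the classes found in frequent, persistent combinations and that way (2) has produced an outlier value per class; the function name `confirm_with_outliers`, the parameter `top_k`, and the sample values are hypothetical.

```python
def confirm_with_outliers(frequent_classes, outlier_values, top_k):
    """Keep the classes found by way (1) whose outlier value ranks in the top_k."""
    ranked = sorted(outlier_values, key=outlier_values.get, reverse=True)
    top = set(ranked[:top_k])
    return [c for c in frequent_classes if c in top]


# Classes 1 and 5 come from way (1); the outlier values are illustrative only
confirmed = confirm_with_outliers([1, 5], {1: 9.0, 2: 0.5, 5: 0.1, 7: 4.0}, top_k=2)
```

Here only class 1 survives, matching the example in which class-1 log information is confirmed as the root-cause log.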
Because log information is a record of the activity of the network device at a time point, the processor 2011 may directly obtain the N classes of root-cause logs and take the record information corresponding to the N classes of root-cause logs as the cause of the failure of the network device;
it may also analyze the N classes of root-cause logs with an existing analysis method to determine the most fundamental failure cause that gave rise to the N classes of root-cause logs;
or, where the processor 2011 uses way (1), it may record, for each log type among the at least one log type after merging, the number of times that log type occurs in the at least one root-cause log combination, and take the log information corresponding to the log type with the highest count directly as the root cause of the failure of the network device.
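The last option, counting how often each log type occurs across the root-cause log combinations, can be sketched as below. The input is assumed to be an iterable of combinations of log type names, and the function name `most_frequent_log_type` is hypothetical.

```python
from collections import Counter


def most_frequent_log_type(root_cause_combinations):
    """Return (log_type, count) for the type found in the most root-cause combinations."""
    counts = Counter(
        log_type
        for combo in root_cause_combinations
        for log_type in set(combo)  # count each type once per combination
    )
    return counts.most_common(1)[0]


# Hypothetical root-cause combinations produced by way (1)
winner = most_frequent_log_type([
    {"LINK_DOWN", "BGP_FLAP"},
    {"LINK_DOWN", "CPU_HIGH"},
    {"LINK_DOWN"},
])
```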
As can be seen from the above, the embodiments of the present invention provide an analysis device that determines the fault time point of a network device; obtains a first log information set generated by the network device in a first time period, where the first log information set includes M classes of log information, M is an integer greater than or equal to 1, and the first time period is the period from a first moment before the fault time point to a second moment after the fault time point; analyzes each class of log information in the M classes according to a preset analysis strategy to obtain N classes of root-cause logs among the M classes of log information; and determines the cause of the failure of the network device according to the N classes of root-cause logs. In this way, the classes of log information near the fault time point are analyzed automatically, the log information that matches the rule by which root-cause logs occur during a failure is obtained, and the root cause of the failure of the network device is determined from that log information. The root cause of a network device failure is thus analyzed automatically, which improves the efficiency of failure root cause analysis.
For ease of description, Embodiment 1 below shows and describes in detail, in the form of steps, the failure root cause analysis method performed by the analysis device 20 of the present invention. The steps shown may also be performed in a computer system other than the analysis device 20, such as one that runs a set of executable instructions. For example, the method of the present invention may also be performed by the network device 10; that is, the units included in the analysis device 20 shown in Fig. 2 for performing the method provided by the present invention may also be included in the network device 10, so that the network device 10 performs the failure root cause analysis method provided by the present invention. In addition, although a logical order is shown in the figure, in some cases the steps shown or described may be performed in a different order.
Embodiment one
Fig. 3 is a flowchart of the fault root cause analysis method provided by an embodiment of the present invention. The method is executed by the analysis device 20 shown in Fig. 2 and is used to perform fault root cause analysis on the network device 10. As shown in Fig. 3, the method may include:
S101: Determine the fault time point of the network device.
The fault time point is the time point at which the network device fails. Because the network device may fail multiple times within a period, the fault time point may refer to the time point of any one of those failures.
Optionally, the fault time point of the network device may be determined using an existing manual method, or located using the following method:
Obtain at least one piece of log information generated by the network device within a period;
Process the at least one piece of log information to form a second log behavior matrix, where the second log behavior matrix contains X log behavior vectors, each log behavior vector occupies one small time interval, and each log behavior vector contains Y elements; Y is the number of log types, and the y-th element of a log behavior vector represents the number of pieces of log information that fall within the small time interval of that log behavior vector and belong to class y;
Compute over the log behavior vectors in the second log behavior matrix according to a preset model to determine the fault occurrence time of the network device, where the preset model is used to screen out the log behavior vectors that match the behavioral characteristics exhibited when the network device fails.
The at least one piece of log information records the activity of the network device within a period. Each piece of log information describes one individual activity of the network device and may include information such as the timestamp of the event, the host or module name, the event level, a brief description, and the event message. It should be noted that the network device may fail multiple times within one period.
Optionally, the analysis device may obtain the at least one piece of log information of the network device using an existing log capture technique, for example a web crawler technique, which is not described in detail here.
Optionally, the analysis device may process the at least one piece of log information using the following method to form the second log behavior matrix:
Convert the content format of each piece of the obtained log information into a preset log format;
Classify the log information after format conversion, and replace each piece of log information with the class identifier of the class it belongs to, forming a time series composed of class identifiers;
Divide the time series according to a preset small time interval;
For each small time interval, count the occurrences of each class identifier within that interval, and arrange the counts into a Y-dimensional log behavior vector;
Arrange all log behavior vectors in chronological order to form the second log behavior matrix.
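The steps above can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the function name `build_log_behavior_matrix`, the representation of each classified log as a `(timestamp_seconds, class_id)` pair, and the use of equal-sized intervals are all assumptions made for the example.

```python
def build_log_behavior_matrix(events, num_types, interval_seconds, start, end):
    """Build an X-by-Y matrix of per-interval class counts.

    events: iterable of (timestamp_seconds, class_id), class_id in 1..num_types
            (hypothetical representation of already-classified log entries).
    Returns a list of X rows; row t holds the Y counts for small time interval t.
    """
    # Number of small time intervals X, rounded up so the tail is not lost.
    num_intervals = -(-(end - start) // interval_seconds)
    matrix = [[0] * num_types for _ in range(num_intervals)]
    for timestamp, class_id in events:
        if not (start <= timestamp < end):
            continue  # ignore entries outside the observed period
        row = (timestamp - start) // interval_seconds
        matrix[row][class_id - 1] += 1
    return matrix


# 100 entries in the first minute: 10 of class 1, 20 of class 3, 70 of class 7.
events = [(5, 1)] * 10 + [(30, 3)] * 20 + [(59, 7)] * 70 + [(61, 2)] * 4
matrix = build_log_behavior_matrix(events, num_types=10,
                                   interval_seconds=60, start=0, end=120)
```

Each row of `matrix` is one log behavior vector, and the rows are already in chronological order because the row index is derived from the timestamp.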
The preset log format may be set in advance as required, which is not limited in the embodiments of the present invention. For example, log information may include fields such as timestamp/host name/event level/brief description/event message (Timestamp/Device/Event severity/Briefly information/Event message), and the format of each field may be standardized as shown in Table 1 below. For example, the "timestamp" in log information is represented as time information in the format "Apr 21 2015 02:34:25", and the "event level" is represented by a number indicating the level. In this case, if the timestamp of a piece of log information is 2015-11-11 09:00:00, it needs to be converted to "Nov 11 2015 09:00:00".
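The timestamp conversion in this example might look like the following sketch. The function name `normalize_timestamp`, the assumed input format, and the choice not to pad single-digit days are assumptions; the text only gives the target form "Nov 11 2015 09:00:00".

```python
from datetime import datetime

# English month abbreviations are listed explicitly so that the result
# does not depend on the locale of the machine running the conversion.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def normalize_timestamp(raw, input_format="%Y-%m-%d %H:%M:%S"):
    """Convert a raw timestamp string to the 'Mon DD YYYY HH:MM:SS' form."""
    parsed = datetime.strptime(raw, input_format)
    return f"{MONTHS[parsed.month - 1]} {parsed.day} {parsed.year} {parsed:%H:%M:%S}"


converted = normalize_timestamp("2015-11-11 09:00:00")
```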
Table 1: Standardized log format
The class identifier indicates the log type. For example, if the log information "Apr 21 2015 12:12:12 User login" belongs to log type 1, this piece of log information can be represented by the number "1".
Preferably, each piece of log information after format conversion may be classified using hierarchical clustering, a classic algorithm in artificial intelligence. The q-gram algorithm is used as the clustering tool to measure string similarity: the q-gram distance serves as the dissimilarity measure between different logs, each piece of converted log information is clustered accordingly, and the clustering parameter q is adjusted to obtain the optimal number of log types. Different values of q lead to different results; extensive experiments indicate that q is preferably 3 in the present invention, a value that has little effect on the log clustering result. The specific implementation is not repeated here.
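Since the specific implementation is not given, the following is only a sketch of how q-gram distance and hierarchical clustering could be combined. It assumes the common definition of q-gram distance (the L1 distance between q-gram count profiles), single-linkage agglomerative merging, and a hypothetical stopping parameter `max_distance`; none of these choices is stated in the text.

```python
from collections import Counter


def qgram_profile(text, q=3):
    """Count every substring of length q in the text."""
    return Counter(text[i:i + q] for i in range(len(text) - q + 1))


def qgram_distance(a, b, q=3):
    """L1 distance between the q-gram count profiles of two strings."""
    pa, pb = qgram_profile(a, q), qgram_profile(b, q)
    return sum(abs(pa[g] - pb[g]) for g in pa.keys() | pb.keys())


def cluster_logs(messages, max_distance, q=3):
    """Single-linkage agglomerative clustering; returns one class id per message."""
    clusters = [[i] for i in range(len(messages))]

    def linkage(c1, c2):
        return min(qgram_distance(messages[i], messages[j], q)
                   for i in c1 for j in c2)

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        dist, a, b = min((linkage(clusters[a], clusters[b]), a, b)
                         for a in range(len(clusters))
                         for b in range(a + 1, len(clusters)))
        if dist > max_distance:
            break  # remaining clusters are too dissimilar to merge
        clusters[a].extend(clusters[b])
        del clusters[b]

    labels = [0] * len(messages)
    for class_id, members in enumerate(clusters, start=1):
        for index in members:
            labels[index] = class_id
    return labels


messages = [
    "User login from 10.0.0.1",
    "User login from 10.0.0.2",
    "Interface GE0/0/1 down",
    "Interface GE0/0/2 down",
]
labels = cluster_logs(messages, max_distance=10)
```

The returned labels play the role of the class identifiers described above. The pairwise search is quadratic per merge, which is acceptable for a sketch but would need a faster scheme for large log volumes.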
The small time intervals corresponding to the log behavior vectors may be equal or unequal. The small time interval may be set as required, which is not limited in the embodiments of the present invention; for example, it may be 1 minute or 5 minutes. For example, if the number of log types is Y and X periods are marked off according to the preset small time interval, the constructed second log behavior matrix is:
$$\begin{pmatrix} x_{T1,1} & x_{T1,2} & \cdots & x_{T1,Y} \\ x_{T2,1} & x_{T2,2} & \cdots & x_{T2,Y} \\ \vdots & \vdots & \ddots & \vdots \\ x_{TX,1} & x_{TX,2} & \cdots & x_{TX,Y} \end{pmatrix}$$
where $(x_{T1,1}, x_{T1,2}, \ldots, x_{T1,Y})$ is the log behavior vector of small time interval T1, and its y-th element $x_{T1,y}$ represents the number of pieces of log information belonging to class y. For example, suppose the number of log types is 10 and the numbers 1 to 10 are used as class identifiers for log types 1 to 10, one to one. If 100 pieces of log information are obtained in small time interval T1, of which 10 have class identifier 1, 20 have class identifier 3, and 70 have class identifier 7, the log behavior vector of small time interval T1 is (10,0,20,0,0,0,70,0,0,0).
Optionally, computing over the log behavior vectors in the second log behavior matrix according to the preset model to determine the fault occurrence time of the network device may specifically use either of the following two modes:
(1) Calculate the log frequency and the log type of each log behavior vector separately;
For any log behavior vector in the second log behavior matrix, calculate the log frequency variance and the log type variance between that log behavior vector and at least one log behavior vector adjacent to it;
If the mean of the log frequency variance and the log type variance is greater than a preset threshold, determine the small time interval corresponding to that log behavior vector as the fault occurrence time of the network device.
The preset threshold may be obtained by analyzing a large number of fault logs, which is not limited herein. If the mean of the log frequency variance and the log type variance is greater than the preset threshold, the log frequency and log type of the log behavior vector have changed abruptly, and this abrupt change indicates that a network fault has occurred. If the mean is less than or equal to the preset threshold, the log frequency and log type of the log behavior vector reflect the behavioral characteristics of normal operation of the network device.
It should be noted that the at least one log behavior vector adjacent to the log behavior vector may be several log behavior vectors before it, several log behavior vectors after it, or several log behavior vectors both before and after it. The number may be set as required, which is not limited in the embodiments of the present invention. Preferably, according to extensive experiments, the at least one adjacent log behavior vector may be the four adjacent log behavior vectors after the log behavior vector.
For example, if the calculated log frequency variance and log type variance are $a_i$ and $b_i$ respectively, and $(a_i + b_i)/2 > \lambda_1$, where $\lambda_1$ is the preset threshold obtained by analyzing a large number of fault logs, the time corresponding to this vector is determined as the fault time.
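Mode (1) can be illustrated with the sketch below. The text does not define "log frequency" or "log type" numerically, so the sketch assumes that the log frequency of a vector is its total entry count, that the log type is the number of classes with a non-zero count, and that the variance is the population variance over the vector and the (up to) four vectors after it. The function name and threshold value are likewise hypothetical.

```python
from statistics import pvariance


def detect_fault_intervals(matrix, threshold, neighbors=4):
    """Return indices of small time intervals whose mean variance exceeds threshold."""
    frequencies = [sum(row) for row in matrix]                      # assumed: total count
    type_counts = [sum(1 for c in row if c > 0) for row in matrix]  # assumed: distinct classes
    flagged = []
    for i in range(len(matrix) - 1):
        # The vector itself plus up to `neighbors` following vectors.
        window = slice(i, i + neighbors + 1)
        a_i = pvariance(frequencies[window])   # log frequency variance
        b_i = pvariance(type_counts[window])   # log type variance
        if (a_i + b_i) / 2 > threshold:
            flagged.append(i)
    return flagged


quiet = [5, 5, 0]
burst = [50, 40, 30]
matrix = [quiet] * 6 + [burst] + [quiet] * 3
flagged = detect_fault_intervals(matrix, threshold=100)
```

Because each window looks forward, the intervals leading up to the burst are flagged together with the burst itself. Under these assumptions mode (1) yields a set of candidate intervals, which is one reason a second, narrowing pass can be useful.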
It should be noted that, in periodic logs, the number of pieces of log information generated per unit time does not change; that is, the log frequency is constant. For periodic logs, the frequency change in the fault detection of the above mode is therefore not significant, which affects the fault detection result. To solve this problem, the present invention proposes a log weighting method based on information retrieval techniques, which takes the distribution of all classes of logs into account and effectively improves the accuracy of fault time localization. Optionally, when the network device generates periodic logs, the following process is also performed before the log frequency and log type of each log behavior vector are calculated:
Perform weighted assignment on the y-th element of each log behavior vector according to the weighting formula;
Here, the y-th element is any element of the log behavior vector; $n_y$ is the number of small time intervals in which class-y log information appears, that is, class-y log information appeared in $n_y$ small time intervals; and Std(y) is the distribution variance of class-y log information.
The distribution variance of class-y log information is the variance between the number of pieces of class-y log information in the log behavior vector and the numbers of pieces of class-y log information in all other log behavior vectors of the second log behavior matrix.
For example, consider two log behavior vectors (10,0,20,0,0,0,70,0,0,0) and (10,0,20,0,20,0,30,0,10,10). Each is generated within an identical small time interval and contains 100 pieces of log information, so the log frequencies are identical. Applying the above weighting formula to each element of the two log behavior vectors yields (11.7307,0,4.79,0,0,0,2.348,0,0,0) and (2.5597,0,3.9780,0,2.67,0,30,0,5.648,10). The characterization values of the two log behaviors now differ, and using them in place of the original log frequency makes fault time localization more accurate.
(2) Traverse each log behavior vector among the X log behavior vectors and compare the similarity between the log behavior vector and the adjacent log behavior vector that follows it in time, obtaining a comparison value corresponding to the log behavior vector;
After every log behavior vector among the X log behavior vectors has been traversed, sort the comparison values, which correspond one to one to the log behavior vectors, in descending order;
Determine the small time intervals of the log behavior vectors corresponding to the first k values after sorting as the fault occurrence time of the network device, where k is an integer greater than or equal to 1.
Optionally, the similarity between the log behavior vector and the adjacent log behavior vector that follows it in time may be compared according to the comparison formula to obtain the comparison value corresponding to the log behavior vector, where t is the small time interval of the log behavior vector and $x_{t,y}$ is the y-th element of the log behavior vector in row t.
In the embodiments of the present invention, k is an integer greater than or equal to 1 and may be chosen empirically. Alternatively, a threshold may be set, and the k log behavior vectors whose comparison values are greater than the threshold are determined to be abnormal log behavior vectors, that is, the fault occurrence points of the network device.
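A sketch of mode (2) follows. The comparison formula itself is not reproduced in this text, so the sketch substitutes the Euclidean distance between row t and row t+1 as the comparison value; this is an assumption chosen only because it grows as adjacent vectors become less similar. The function name `top_k_change_points` is hypothetical.

```python
import math


def top_k_change_points(matrix, k=1):
    """Rank intervals by dissimilarity to the next interval and return the top k."""
    scores = []
    for t in range(len(matrix) - 1):
        # Assumed comparison value: Euclidean distance between rows t and t+1.
        distance = math.sqrt(sum((a - b) ** 2
                                 for a, b in zip(matrix[t], matrix[t + 1])))
        scores.append((distance, t))
    # Descending by comparison value; ties keep chronological order.
    scores.sort(key=lambda item: item[0], reverse=True)
    return [t for _, t in scores[:k]]


matrix = [[5, 5, 0]] * 3 + [[50, 40, 30]] + [[48, 41, 29]] * 2
suspects = top_k_change_points(matrix, k=1)
```

With this definition the returned index is the interval just before the change, because each vector is compared with the one that follows it.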
It should be noted that the above two modes may be executed individually or in combination to locate the exact time of a network fault more accurately. For example, mode (1) may first determine that the frequency and type of the log behavior vectors in row 1 and row 5 change abruptly and are therefore candidate fault occurrence points; mode (2) then compares similarity only for row 1 and row 5 to determine whether row 1 or row 5 is the fault occurrence point.
S102: Obtain the first log information set generated by the network device in the first time period. The first log information set contains M classes of log information, M is an integer greater than or equal to 1, and the first time period is the period from a first moment before the fault time point to a second moment after the fault time point.
The duration from the first moment to the fault time point and the duration from the fault time point to the second moment may be set as required, which is not limited in the embodiments of the present invention. The first moment and the second moment are determined solely on the principle of obtaining the log information near the fault time point.
Preferably, the log information from the 20 minutes before the fault time point and the 40 minutes after the fault time point may be obtained and used as the first log information set in the first time period. Alternatively, only the log information in a period after the fault time point (for example, within 60 minutes) may be obtained and used as the first log information set.
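Selecting the first log information set amounts to a time-range filter, as in the sketch below. The function name, the `(timestamp, message)` tuple layout, and the default offsets of 20 minutes before and 40 minutes after are illustrative assumptions; both offsets are configurable, in line with the statement that the durations may be set as required.

```python
from datetime import datetime, timedelta


def select_first_log_set(logs, fault_time,
                         before=timedelta(minutes=20),
                         after=timedelta(minutes=40)):
    """Keep the entries whose timestamp lies in [fault_time - before, fault_time + after]."""
    first_moment = fault_time - before
    second_moment = fault_time + after
    return [entry for entry in logs
            if first_moment <= entry[0] <= second_moment]


fault_time = datetime(2015, 4, 21, 12, 0, 0)
logs = [
    (datetime(2015, 4, 21, 11, 30, 0), "too early"),
    (datetime(2015, 4, 21, 11, 45, 0), "shortly before the fault"),
    (datetime(2015, 4, 21, 12, 35, 0), "after the fault"),
    (datetime(2015, 4, 21, 12, 50, 0), "too late"),
]
first_log_set = select_first_log_set(logs, fault_time)
```

Passing `before=timedelta(0)` and `after=timedelta(minutes=60)` gives the alternative in which only the period after the fault time point is used.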
Next, the content format of each piece of log information in the obtained first log information set may be converted into the preset log format, and each piece of converted log information may then be classified using hierarchical clustering to determine that the first log information set contains M classes of log information. Alternatively, directly on the basis of S101, the log types contained in the second log behavior matrix may be queried: if the second log behavior matrix contains M classes of log information within the first time period, the first log information set is determined to correspond to M classes of log information.
S103: Analyze each class of log information in the M classes according to the preset analysis strategy to obtain the N classes of root-cause logs among the M classes of log information. The N classes of root-cause logs are the log information generated when the network device fails, M ≥ N ≥ 1, and the preset analysis strategy is a predetermined occurrence pattern of logs when a network device fault occurs.
When a fault occurs, the network device may generate at least one class of log information (that is, root-cause logs), and the appearance of these classes of log information near the fault time point follows clear characteristic patterns. The inventors therefore analyzed in advance a large amount of log information generated near a large number of fault time points and extracted the following characteristic patterns of root-cause logs: (1) at least one class of log information generated when a fault occurs usually appears together as a combination, repeatedly and without interruption, near the fault point; (2) a class of log information generated when a fault occurs usually appears frequently over a long period and increases suddenly at the fault time point. On this basis, the inventors propose an analysis strategy based on the occurrence patterns of root-cause logs. The M classes of log information obtained in S102 are analyzed according to this strategy to determine the classes of log information that conform to the occurrence patterns of root-cause logs. Optionally, the root-cause logs may be obtained in the following two modes:
(1) Divide the M log types corresponding to the M classes of log information into i different log combinations. Each log combination contains at least one of the M log types, no two log combinations contain the same set of log types, and i is an integer greater than or equal to 1;
Traverse the i log combinations and determine at least one root-cause log combination among them, where a root-cause log combination is a log combination that appears frequently and persistently within the first time period;
Process the at least one root-cause log combination;
Determine the at least one class of log information corresponding to the processed at least one root-cause log combination as the N classes of root-cause logs.
For any log combination, determining that the log combination is a root-cause log combination may include:
Divide the first time period into at least one time window, and divide each time window into at least one small time window;
Calculate the first frequency with which the log combination appears in any time window, where the first frequency is the ratio of the number of small time windows in that time window in which the log combination appears to the number of small time windows the time window contains;
If the first frequency is greater than a first preset threshold, determine that the log combination is a frequent log combination in that time window;
Calculate the second frequency with which the frequent log combination appears in the first time period, where the second frequency is the ratio of the number of time windows in the first time period in which the log combination is a frequent log combination to the number of time windows the first time period contains;
If the second frequency is greater than a second preset threshold, determine that the frequent log combination is a root-cause log combination.
It should be noted that, in the embodiments of the present invention, a log combination is a combination containing at least one log type, and a time window is a time interval. When the time windows are divided, the time windows may be equal or unequal in size, and the small time windows may likewise be equal or unequal in size. A log combination appearing in a time window means that the log information corresponding to the log types in the log combination appears within the time interval corresponding to that time window.
The first preset threshold and the second preset threshold may be set as required, which is not limited in the embodiments of the present invention. If the first frequency of a log combination is greater than the first preset threshold, the log combination appears frequently and in concentration at a certain time and is determined to be a frequent log combination. If the first frequency is less than or equal to the first preset threshold, the corresponding log information is considered to be log information generated during normal operation of the network device. If the second frequency of a frequent log combination is greater than the second preset threshold, the logs that appear frequently at a certain time also keep appearing frequently throughout the first time period; that is, the frequent log combination appears repeatedly and without interruption in the first time period, conforming to the occurrence pattern of root-cause logs. The frequent log combination is then determined to be a root-cause log combination, and the log information corresponding to the log types it contains is determined to be the log information generated when the network device fails. If the second frequency of the frequent log combination is less than or equal to the second preset threshold, the logs that appear frequently at a certain time do not appear persistently in the first time period, and the corresponding log information is considered to be log information generated during normal operation of the network device.
For example, suppose the first log information set corresponds to three log types (1, 2, 3), the first time period is divided into 7 time windows, each time window is divided into 12 small time windows, the first preset threshold is set to 3/4, and the second preset threshold is set to 1/2. The three log types can then be divided into seven log combinations: (1,2,3), (1,2), (1,3), (2,3), (1), (2), and (3). If log combination (1,2,3) appears in 10 small time windows of the first time window, its first frequency in the first time window is 10/12 > 3/4, and log combination (1,2,3) is determined to be a frequent log combination in the first time window. Each time window is traversed in this way. If log combination (1,2,3) is determined to be a frequent log combination in the first, second, fourth, sixth, and seventh time windows, its second frequency is 5/7 > 1/2, and log combination (1,2,3) is determined to be a root-cause log combination; that is, the log information of types 1, 2, and 3 in log combination (1,2,3) is log information generated when the network device fails. Similarly, the other log combinations (1,2), (1,3), (2,3), (1), (2), and (3) are traversed, and the three log combinations (1,2), (1,3), and (2) are determined to be root-cause log combinations.
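The two-level frequency test can be sketched as follows. The sketch assumes that a log combination "appears" in a small time window when every log type in the combination occurs in that window, and it represents each small time window as a set of log types; both are assumptions, as are all function names. Under this reading any sub-combination of a root-cause combination also qualifies, so the sketch reproduces the 10/12 and 5/7 arithmetic for (1,2,3) but not the example's final selection of (1,2), (1,3), and (2).

```python
from itertools import combinations


def log_combinations(log_types):
    """Enumerate every non-empty combination of the given log types."""
    types = sorted(log_types)
    return [combo
            for size in range(len(types), 0, -1)
            for combo in combinations(types, size)]


def second_frequency(combo, windows, first_threshold):
    """Share of time windows in which the combination is a frequent log combination.

    windows: list of time windows, each a list of small time windows,
             each small time window a set of the log types seen in it.
    """
    frequent_windows = 0
    for window in windows:
        hits = sum(1 for small in window if set(combo) <= small)
        first_frequency = hits / len(window)
        if first_frequency > first_threshold:
            frequent_windows += 1
    return frequent_windows / len(windows)


def root_cause_combinations(log_types, windows, first_threshold, second_threshold):
    """Map each root-cause log combination to its second frequency."""
    result = {}
    for combo in log_combinations(log_types):
        frequency = second_frequency(combo, windows, first_threshold)
        if frequency > second_threshold:
            result[combo] = frequency
    return result


# 7 time windows of 12 small time windows; windows 1, 2, 4, 6, 7 are "busy".
busy = [{1, 2, 3}] * 10 + [set()] * 2
calm = [{3}] * 2 + [set()] * 10
windows = [busy, busy, calm, busy, calm, busy, busy]
roots = root_cause_combinations({1, 2, 3}, windows,
                                first_threshold=3 / 4, second_threshold=1 / 2)
```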
Further, the root-cause log combinations obtained above may contain one another; for example, root-cause log combination (1,2,3) contains root-cause log combination (1,2). In this case, if the second frequency of the contained log combination is less than that of the log combination containing it, the contained root-cause log combination may be rejected. However, if the second frequency of the contained log combination is much greater than that of the log combination containing it, both the contained log combination and the log combination containing it may be retained, each as an independent fault phenomenon. Optionally, if the at least one root-cause log combination determined by traversing the i log combinations includes a first root-cause log combination and a second root-cause log combination, and the first root-cause log combination is contained in the second root-cause log combination, processing the at least one root-cause log combination may include:
When the second frequency of the first root-cause log combination is greater than the second frequency of the second root-cause log combination, do not reject the first root-cause log combination;
When the second frequency of the first root-cause log combination is less than the second frequency of the second root-cause log combination, reject the first root-cause log combination.
For example, among the root-cause log combinations (1,2,3), (1,2), (1,3), and (2) determined above, if the second frequency of (1,2,3) is greater than those of (1,2) and (2), and the second frequency of (1,3) is greater than that of (1,2,3), root-cause log combinations (1,2) and (2) are rejected. Only root-cause log combinations (1,2,3) and (1,3) are merged, so that class-1, class-2, and class-3 log information is determined to be the root-cause logs.
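The rejection rule for contained combinations can be sketched as below. The second-frequency values in the usage example are invented purely to satisfy the ordering described above, the function names are hypothetical, and the handling of equal frequencies (the contained combination is kept) is an assumption, since the text covers only the greater-than and less-than cases.

```python
def prune_contained(combos):
    """Drop a combination when a strict superset has a higher second frequency.

    combos: dict mapping a tuple of log types to its second frequency.
    """
    kept = {}
    for combo, frequency in combos.items():
        rejected = any(
            set(combo) < set(other) and frequency < other_frequency
            for other, other_frequency in combos.items()
        )
        if not rejected:
            kept[combo] = frequency
    return kept


def merged_log_types(combos):
    """Merge the surviving combinations into one sorted list of log types."""
    return sorted({log_type for combo in combos for log_type in combo})


# Hypothetical second frequencies consistent with the example's ordering.
combos = {(1, 2, 3): 5 / 7, (1, 2): 4 / 7, (1, 3): 6 / 7, (2,): 4 / 7}
kept = prune_contained(combos)
root_cause_types = merged_log_types(kept)
```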
Finally, the logs before the fault occurs also contain several log combinations that appear frequently and persistently. Such log combinations are usually normal routine inspection logs and are unrelated to the fault. Therefore, if the at least one determined root-cause log combination includes a third root-cause log combination, and the third root-cause log combination is also a root-cause log combination before the fault time point, processing the at least one root-cause log combination includes:
Reject the third root-cause log combination.
A root-cause log combination before the fault time point is a log combination that appears frequently and persistently within a period before the fault time point. Optionally, it may be determined using the same method used above to determine root-cause log combinations in the first time period. For example: obtain a period before the fault time point, divide it into at least one time window, and divide each time window into at least one small time window; calculate the first frequency with which a log combination appears in a time window; if the first frequency is greater than the first preset threshold, calculate the second frequency with which the log combination appears in that period; and if the second frequency is greater than the second preset threshold, determine that the log combination is a root-cause log combination.
For example, if the root-cause log combinations in the first time period are determined to be (1,2,3) and (1,4), and log combination (1,2,3) is also a root-cause log combination before the fault time point, log combination (1,2,3) is rejected.
In addition, analysis of a large number of fault-related logs and normal logs shows that normal logs tend to appear periodically, are distributed relatively evenly, and appear relatively frequently throughout the log. Root-cause logs, by contrast, increase suddenly at the fault point and almost never appear in the logs corresponding to non-fault modes. This is consistent with the principle in information theory that content appearing more frequently carries less information. An anomaly value calculation method that fits log behavior patterns is therefore proposed, and fault logs are selected based on the resulting anomaly values. The specific implementation is shown in (2):
(2) Determine M anomaly values in one-to-one correspondence with the M classes of log information, where an anomaly value indicates the frequent degree and the mutation degree with which a class of log information appears in a second time period, and the second time period includes the first time period;
Obtain the N largest anomaly values from the M anomaly values, and determine the N classes of log information in the first log information set that correspond to those N largest anomaly values as the N classes of root-cause logs.
Optionally, the M anomaly values may be determined as follows: obtain a second log information set generated by the network device in the second time period, where the second log information set contains at least one piece of log information and each piece of log information corresponds to a time point;
Preprocess the second log information set to obtain a first log behavior matrix. The first log behavior matrix contains Q groups of log behavior vectors; each group of log behavior vectors occupies one time interval and contains R elements, where R is the number of log types corresponding to the second log information set and R ≥ M. The j-th element of a log behavior vector represents the number of pieces of class-j log information within the time interval of that log behavior vector;
Calculate the anomaly value of each of the R classes of log information according to the anomaly value formula, obtaining R anomaly values in one-to-one correspondence with the R classes of log information;
Obtain, from the R anomaly values, the M anomaly values in one-to-one correspondence with the M classes of log information.
The time interval here is relatively long, usually tens of minutes, and the time intervals of the groups of log behavior vectors may be equal or unequal.
The first log behavior matrix is constructed in the same way as the second log behavior matrix constructed in S101 when the fault time point is determined: convert the content format of each piece of log information in the second log information set into the preset log format; classify the converted log information and replace each piece of log information with the class identifier of the class it belongs to, forming a time series composed of class identifiers; divide the time series according to a preset time interval; for each time interval, count the occurrences of each class identifier within that interval and arrange the counts into an R-dimensional log behavior vector; and arrange all log behavior vectors in chronological order to form the first log behavior matrix. The difference is that the time interval corresponding to each log behavior vector in the second log behavior matrix, constructed for fault time point determination, is short, usually a few seconds or a few minutes, whereas the time interval corresponding to each group of log behavior vectors in the first log behavior matrix, constructed for anomaly value calculation, is long, usually tens of minutes. It can therefore be understood that, when the second log behavior matrix constructed in S101 is used directly as the first log behavior matrix, the X log behavior vectors need to be combined several at a time into Q groups of log behavior vectors, Q < X, so that the time interval corresponding to each group of log behavior vectors is longer when the anomaly values are calculated.
In the anomaly value formula, one term represents the frequent degree with which class-j log information among the R classes appears in the second log information set, and another term represents the mutation degree of class-j log information among the R classes in the second log information set. $q_j$ is the number of groups of log behavior vectors that contain class-j log information, $c_{k+1,j}$ is the total number of pieces of class-j log information in the (k+1)-th time interval, and $c_{k,j}$ is the total number of pieces of class-j log information in the k-th time interval.
For example, suppose the first log behavior matrix contains 100 groups of log behavior vectors and five log types 1, 2, 3, 4, and 5, and class-1 log information appears in five groups of log behavior vectors, namely groups 1, 5, 6, 9, and 10, with 100, 20, 30, 60, and 90 pieces of class-1 log information in the corresponding time intervals respectively. The frequent degree and the mutation degree of class-1 log information are calculated from these values, giving an anomaly value for class-1 log information of log20*log100.
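The anomaly value formula is not reproduced in this text, so the sketch below is a reconstruction from the worked example only. It assumes that the frequent degree is $\log(Q/q_j)$, that the mutation degree is $\log(\max_k |c_{k+1,j} - c_{k,j}|)$, and that the anomaly value is their product; with Q = 100, $q_j$ = 5, and a largest jump of 100 this gives log20*log100, matching the example. The natural logarithm and the function name are further assumptions.

```python
import math


def anomaly_value(counts):
    """Assumed anomaly value: log(Q / q_j) * log(max_k |c[k+1] - c[k]|).

    counts: the number of entries of one log class in each of the Q groups.
    """
    total_groups = len(counts)                            # Q
    groups_with_class = sum(1 for c in counts if c > 0)   # q_j
    jumps = [abs(b - a) for a, b in zip(counts, counts[1:])]
    if groups_with_class == 0 or not jumps or max(jumps) == 0:
        return 0.0  # the class never appears or never changes
    frequent_degree = math.log(total_groups / groups_with_class)
    mutation_degree = math.log(max(jumps))
    return frequent_degree * mutation_degree


# Class 1 appears in groups 1, 5, 6, 9, 10 (1-based) out of 100 groups.
class_1_counts = [0] * 100
for group, count in zip([1, 5, 6, 9, 10], [100, 20, 30, 60, 90]):
    class_1_counts[group - 1] = count
value = anomaly_value(class_1_counts)
```

Under this reading a class scores highly when it appears in few groups but changes sharply between adjacent groups, which fits the stated pattern that root-cause logs are rare in normal operation and surge at the fault point.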
It should be noted that above two mode can be individually performed, execution can also be combined, with more accurate Locating network fault occur exact cause, such as: mode (1) can be first passed through and determine that the 1st class, the 5th class log information are The log combination frequently and persistently occurred, then, further according to mode (2) to the exceptional value of the log information of the 1st class, the 5th class into Row calculates, if the corresponding 1st class log of the first log information collection and the 5th class log, and the exceptional value of the 1st class log is a most in preceding M In big exceptional value, it is determined that the 1st class log information is the root of failure generation because of log, in this way, improving network failure reason The accuracy of analysis.
S104: Determine the cause of the network device fault according to the N classes of root-cause logs.
Because a log entry records the activity of the network device at a point in time, the N classes of root-cause logs can be obtained directly and the record information corresponding to them taken as the cause of the network device fault. In addition, because a single fault of the network device may produce several kinds of log information, in the embodiment of the present invention an existing analysis method may also be applied to the N classes of root-cause logs to determine the most fundamental cause of the fault that gave rise to them.
Alternatively, building on mode (1) described in S103, for each log type among the at least one log type remaining after merging, the number of times that log type occurs in the at least one root-cause log combination may be recorded, and the log information corresponding to the log type with the highest count taken directly as the root cause of the network device fault.
For example, suppose mode (1) of S103 determines seven root-cause log combinations: (11, 12, 13, 14, 15, 16), (11, 14, 16, 17), (11, 28, 35), (11, 28, 8), (11, 31), (11, 34), and (11, 35, 8). The 11th log type occurs in all seven combinations and is therefore the log type with the highest count, so the record information of the 11th class of log information can be taken directly as the most fundamental cause of the network device fault for subsequent analysis.
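The counting step in this example can be sketched as follows. The variable names are illustrative only; the seven combinations are those listed in the text.

```python
from collections import Counter

root_cause_combinations = [
    (11, 12, 13, 14, 15, 16),
    (11, 14, 16, 17),
    (11, 28, 35),
    (11, 28, 8),
    (11, 31),
    (11, 34),
    (11, 35, 8),
]

# Count in how many root-cause log combinations each log type occurs.
occurrences = Counter(
    log_type for combo in root_cause_combinations for log_type in set(combo)
)

# The log type with the highest count is taken as the most fundamental cause.
top_type, top_count = occurrences.most_common(1)[0]
```

Here log type 11 occurs in all seven combinations, while every other log type occurs at most twice.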
As can be seen from the above, the embodiment of the present invention provides a fault root cause analysis method that: determines the fault time point of a network device; obtains a first log information set generated by the network device in a first time period, where the first log information set contains M classes of log information, M is an integer greater than or equal to 1, and the first time period runs from a first moment before the fault time point to a second moment after the fault time point; analyzes each of the M classes of log information according to a preset analysis strategy to obtain N classes of root-cause logs among the M classes of log information; and determines the cause of the network device fault according to the N classes of root-cause logs. In this way, the classes of log information near the fault time point are analyzed automatically, the log information that conforms to the rules by which root-cause logs appear when a fault occurs is obtained, and the root cause of the network device fault is determined from that log information. The root cause of a network device fault is thus analyzed automatically, which improves the efficiency of fault root cause analysis.
According to the embodiments of the present invention, the following embodiment further provides an analysis device 30, preferably used to implement the method of the foregoing method embodiment.
Embodiment two
Fig. 4 is a structural diagram of an analysis device 30 according to an embodiment of the present invention. The analysis device 30 may be any of a switch, a router, a network management device, a Web server, a software-defined network (Software Defined Network, SDN) controller, or similar equipment, and is configured to perform the method described in embodiment one. As shown in Fig. 4, the analysis device 30 may include:
A determining unit 201, configured to determine the fault time point of the network device.
The fault time point is the time point at which the network device fails. Because the network device may fail several times within a period, the fault time point may refer to the time point of any one of those faults.
An obtaining unit 202, configured to obtain a first log information set generated by the network device in a first time period. The first log information set contains M classes of log information, where M is an integer greater than or equal to 1, and the first time period runs from a first moment before the fault time point determined by the determining unit 201 to a second moment after the fault time point.
The duration from the fault time point to the first moment and the duration from the fault time point to the second moment may be set as required, and the embodiment of the present invention does not limit them. The first moment and the second moment are determined solely on the principle of obtaining the log information near the fault time point.
Preferably, the log information from the 20 minutes before the fault time point and the 40 minutes after it may be obtained and used as the first log information set in the first time period. Alternatively, only the log information in a period after the fault time point (for example, within 60 minutes) may be obtained and used as the first log information set.
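As a minimal sketch of this selection step, assuming the 20-minute figure applies before the fault time point and the 40-minute figure after it, the first log information set can be obtained by filtering timestamped entries. The function name `select_window` and the sample entries are hypothetical.

```python
from datetime import datetime, timedelta

def select_window(logs, fault_time,
                  before=timedelta(minutes=20), after=timedelta(minutes=40)):
    """Keep the (timestamp, message) entries between the first and second moments."""
    first_moment = fault_time - before
    second_moment = fault_time + after
    return [(ts, msg) for ts, msg in logs if first_moment <= ts <= second_moment]

fault_time = datetime(2015, 4, 21, 12, 0, 0)
logs = [  # illustrative entries only
    (datetime(2015, 4, 21, 11, 30), "too early"),
    (datetime(2015, 4, 21, 11, 45), "before fault"),
    (datetime(2015, 4, 21, 12, 12, 12), "User Login"),
    (datetime(2015, 4, 21, 12, 50), "too late"),
]
first_log_set = select_window(logs, fault_time)
```

Setting `before` to zero and `after` to 60 minutes gives the alternative in which only the period after the fault time point is used.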
An analysis unit 203, configured to analyze, according to a preset analysis strategy, each of the M classes of log information obtained by the obtaining unit 202, to obtain N classes of root-cause logs among the M classes of log information. The N classes of root-cause logs are the log information generated when the network device fails, M ≥ N ≥ 1, and the preset analysis strategy is a predefined rule describing how logs appear when the network device fails.
The determining unit 201 is further configured to determine the cause of the network device fault according to the N classes of root-cause logs obtained by the analysis unit 203.
Further, the determining unit 201 may determine the fault time point of the network device using an existing manual method, or may locate the fault time point of the network device using the following method:
Obtain at least one log entry generated by the network device within a period;
Process the at least one log entry to form a second log behavior matrix, where the second log behavior matrix includes X log behavior vectors, each log behavior vector occupies one small time interval, and each log behavior vector contains Y elements; Y is the number of log types, and the y-th element of a log behavior vector is the number of log entries of class y within the small time interval of that log behavior vector;
Perform calculations on the log behavior vectors in the second log behavior matrix according to a preset model to determine the time at which the network device fault occurred, where the preset model is used to filter out the log behavior vectors that match the behavioral characteristics of a failing network device.
The at least one log entry is the record of the activity of the network device within a period. Each log entry describes a single activity of the network device and may include information such as the timestamp of the event executed by the network device, the host or module name, the event level, an information summary, and the event message. It should be noted that the network device may fail several times within one period.
Optionally, the determining unit 201 may process the at least one log entry using the following method to form the second log behavior matrix:
Convert the content format of each of the obtained log entries into a preset log format;
Classify the format-converted log entries and replace each log entry with the class identifier of the class to which it belongs, forming a time series of class identifiers;
Divide the time series according to a preset small time interval;
For each small time interval, count the occurrences of each class identifier within the small time interval and arrange the counts into a Y-dimensional log behavior vector;
Arrange all the log behavior vectors in chronological order to form the second log behavior matrix.
The preset log format may be set in advance as required, and the embodiment of the present invention does not limit it.
The class identifier indicates the log type. For example, if the log entry "Apr 21 2015 12:12:12 User Login" belongs to log type 1, the digit "1" may be used to represent this log entry.
Preferably, each format-converted log entry may be classified using hierarchical clustering, a classic algorithm in artificial intelligence. A clustering tool based on the q-gram algorithm measures string similarity: the q-gram distance serves as the measure of dissimilarity between different log entries, and the format-converted log entries are clustered accordingly. Adjusting the clustering parameter q yields the optimal number of log types. Different values of q lead to different results; extensive experiments indicate that q is preferably 3 in the present invention, a value at which the influence on the log clustering result is small. The specific implementation is not described further here.
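The text does not spell out the distance it uses. A minimal sketch, assuming the common definition of q-gram distance as the sum of absolute differences between the q-gram counts of two strings, is shown below with q = 3. The function names are hypothetical, and the clustering step that would consume these pairwise distances is not shown.

```python
from collections import Counter

def qgram_profile(text, q=3):
    """Count every substring of length q in text."""
    return Counter(text[i:i + q] for i in range(len(text) - q + 1))

def qgram_distance(a, b, q=3):
    """Sum of absolute differences between the q-gram counts of a and b."""
    profile_a, profile_b = qgram_profile(a, q), qgram_profile(b, q)
    return sum(abs(profile_a[g] - profile_b[g])
               for g in profile_a.keys() | profile_b.keys())

same = qgram_distance("User Login", "User Login")
small = qgram_distance("abcd", "abce")
```

Identical strings have distance 0; "abcd" and "abce" share the 3-gram "abc" and differ in one 3-gram each, so their distance is 2.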
The small time intervals corresponding to the log behavior vectors may be equal or unequal and may be set as required, for example to 1 minute or 5 minutes; the embodiment of the present invention does not limit this. For example, if the number of log types is Y and X periods are marked off according to the preset small time interval, the second log behavior matrix constructed is:
$$\begin{pmatrix} x_{T1,1} & x_{T1,2} & \cdots & x_{T1,Y} \\ x_{T2,1} & x_{T2,2} & \cdots & x_{T2,Y} \\ \vdots & \vdots & \ddots & \vdots \\ x_{TX,1} & x_{TX,2} & \cdots & x_{TX,Y} \end{pmatrix}$$
Here, (x_{T1,1}, x_{T1,2}, ..., x_{T1,Y}) is the log behavior vector of small time interval T1, and its y-th element x_{T1,y} is the number of log entries belonging to class y. For example, suppose there are 10 log types, identified one-to-one by the class identifiers 1 to 10. If 100 log entries are obtained in small time interval T1, of which 10 have class identifier 1, 20 have class identifier 3, and 70 have class identifier 7, the log behavior vector of small time interval T1 is (10, 0, 20, 0, 0, 0, 70, 0, 0, 0).
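The construction of the log behavior matrix from a time series of class identifiers can be sketched as follows. This is an illustration only: it assumes equal small time intervals and timestamps given in seconds, and the function name `build_behavior_matrix` is hypothetical.

```python
from collections import Counter

def build_behavior_matrix(events, interval, num_types):
    """events: iterable of (timestamp_seconds, class_id), class_id in 1..num_types.

    Returns one count vector per small time interval, in chronological order.
    """
    buckets = {}
    for ts, class_id in events:
        buckets.setdefault(int(ts // interval), Counter())[class_id] += 1
    if not buckets:
        return []
    first, last = min(buckets), max(buckets)
    return [
        [buckets.get(b, Counter())[y] for y in range(1, num_types + 1)]
        for b in range(first, last + 1)
    ]

# Interval T1 from the text: 10 entries of class 1, 20 of class 3, 70 of class 7.
events = ([(i, 1) for i in range(10)]
          + [(10 + i, 3) for i in range(20)]
          + [(30 + i, 7) for i in range(70)])
matrix = build_behavior_matrix(events, interval=300, num_types=10)
```

All 100 entries fall within the first 300-second interval, so the matrix has a single row equal to the vector given in the text.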
Optionally, the determining unit 201 is specifically configured to determine the time at which the network device fault occurred in either of the following two modes:
(1) Calculate the log frequency and the log type of each log behavior vector;
For any log behavior vector in the second log behavior matrix, calculate the log frequency variance and the log type variance between that log behavior vector and at least one log behavior vector adjacent to it;
If the mean of the log frequency variance and the log type variance is greater than a preset threshold, determine the small time interval corresponding to the log behavior vector as the time at which the network device fault occurred.
The preset threshold may be obtained by analyzing a large number of fault logs, and the present invention does not limit it here. If the mean of the log frequency variance and the log type variance is greater than the preset threshold, the log frequency and log type of the log behavior vector have mutated, and the mutation indicates that a network fault has occurred. If the mean is less than or equal to the preset threshold, the log frequency and log type of the log behavior vector reflect the behavior of a network device operating normally.
It should be noted that the at least one log behavior vector adjacent to the log behavior vector may be several log behavior vectors before it, several log behavior vectors after it, or several log behavior vectors both before and after it. The number may be set as required, and the embodiment of the present invention does not limit it. Preferably, many experiments show that the at least one adjacent log behavior vector may be the four adjacent log behavior vectors that follow the log behavior vector.
For example, if the calculated log frequency variance and log type variance are a_i and b_i respectively, and (a_i + b_i) / 2 > λ1, where λ1 is the preset threshold obtained by analyzing a large number of fault logs, the time corresponding to this vector is determined as the fault time.
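Mode (1) can be sketched as below. The definitions used here are assumptions, since this passage does not define them: the log frequency of a vector is taken to be its total entry count, the log type of a vector is taken to be the number of classes with a nonzero count, and each variance is the population variance over the vector and the four vectors that follow it. The function name and the sample matrix are hypothetical.

```python
from statistics import pvariance

def detect_fault_intervals(matrix, threshold, lookahead=4):
    """Return indices i for which (a_i + b_i) / 2 exceeds the threshold."""
    frequency = [sum(row) for row in matrix]                  # assumed "log frequency"
    kinds = [sum(1 for c in row if c > 0) for row in matrix]  # assumed "log type"
    flagged = []
    for i in range(len(matrix) - lookahead):
        a_i = pvariance(frequency[i:i + lookahead + 1])
        b_i = pvariance(kinds[i:i + lookahead + 1])
        if (a_i + b_i) / 2 > threshold:
            flagged.append(i)
    return flagged

# Five quiet intervals followed by five busy ones (illustrative data).
matrix = [[5, 5, 0]] * 5 + [[50, 40, 30]] * 5
flagged = detect_fault_intervals(matrix, threshold=100)
```

With a forward-looking window, every interval whose window reaches the change is flagged (indices 1 to 4 here), while windows lying entirely before or entirely after the change are not. How a single fault time is chosen among the flagged intervals is left open in this sketch.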
It should be noted that for periodic logs the number of log entries generated per unit time does not change, that is, the log frequency is constant. For periodic logs the frequency mutation in the fault detection of the above mode is therefore not significant, which affects the fault detection result. To solve this problem, the present invention proposes a log weighting method based on information retrieval technology that takes the distribution of every class of log into account and effectively improves the accuracy of fault time delimitation. Optionally, when the network device generates periodic logs, the following process is also carried out before the log frequency and log type of each log behavior vector are calculated:
Weight the y-th element of each log behavior vector according to the weighting formula;
Here, the y-th element is any element of the log behavior vector; n_y is the number of small time intervals in which the y-th class of log information appears, that is, the y-th class of log information appears in n_y small time intervals; and std(y) is the distribution variance of the y-th class of log information.
The distribution variance of the y-th class of log information is the variance between the number of log entries of class y in the log behavior vector and the numbers of log entries of class y in all the other log behavior vectors of the second log behavior matrix.
(2) Traverse each of the X log behavior vectors, comparing the similarity between the log behavior vector and the temporally adjacent log behavior vector that follows it, to obtain a comparison value corresponding to the log behavior vector;
Sort, from largest to smallest, the comparison values obtained by traversing the X log behavior vectors, one for each log behavior vector;
Determine the small time intervals of the log behavior vectors corresponding to the first k values after sorting as the time at which the network device fault occurred, where k is an integer greater than or equal to 1.
Optionally, the similarity between the log behavior vector and the temporally adjacent log behavior vector that follows it may be compared according to the similarity formula to obtain the comparison value corresponding to the log behavior vector, where t is the small time interval of the log behavior vector and x_{t,y} is the y-th element of the log behavior vector in row t.
In the embodiment of the present invention, k is an integer greater than or equal to 1 and may be chosen from experience. Alternatively, a threshold may be set, and the k log behavior vectors whose comparison values exceed the threshold are determined to be the abnormal log behavior vectors, that is, the points at which the network device fault occurred.
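The ranking step of mode (2) can be sketched as below. The comparison formula itself is not reproduced in this text, so the sketch substitutes the Euclidean distance between a vector and the vector that follows it as a stand-in comparison value; this is an assumption made only for illustration, and the function name is hypothetical.

```python
import math

def top_k_changes(matrix, k=1):
    """Rank each row by its distance to the next row; return the top-k row indices."""
    scores = [
        (math.dist(matrix[t], matrix[t + 1]), t)  # stand-in comparison value
        for t in range(len(matrix) - 1)
    ]
    scores.sort(reverse=True)  # largest comparison value first
    return [t for _, t in scores[:k]]

matrix = [  # illustrative data
    [10, 0, 20],
    [10, 0, 21],
    [90, 5, 20],
    [91, 5, 20],
]
suspects = top_k_changes(matrix, k=1)
```

In this toy matrix the largest change lies between rows 1 and 2, so row index 1 is returned. A threshold on the comparison value could replace the fixed k, as the text notes.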
It should be noted that the two modes above may be performed separately or in combination to locate the exact time of a network fault more accurately. For example, mode (1) may first determine that the frequency and type of the log behavior vectors in row 1 and row 5 have mutated and that these rows are fault points; mode (2) is then applied only to row 1 and row 5 to compare similarity and determine whether row 1 or row 5 is the fault point.
Further, when a fault occurs the network device may generate at least one class of log information (that is, root-cause logs), and the appearance of these classes of log information near the fault time point follows clear characteristic rules. Drawing on extensive experience, the inventors analyzed in advance the log information generated near a large number of fault time points and mined the characteristic rules by which root-cause logs appear: (1) the at least one class of log information generated when a fault occurs usually appears in combination, repeatedly and without interruption, near the fault point; (2) a class of log information generated when a fault occurs usually appears frequently within a longer period and shows a sudden increase at the fault time point. On this basis, the inventors propose an analysis strategy built on the rules by which root-cause logs appear. The M classes of log information obtained are analyzed according to this strategy to determine the classes of log information that conform to those rules. Optionally, the analysis unit 203 may obtain the root-cause logs in either of the following two modes:
(1) Divide the M log types corresponding to the M classes of log information into i different log combinations, where each log combination contains at least one of the M log types, the log types contained in the log combinations differ from one combination to another, and i is an integer greater than or equal to 1;
Traverse the i log combinations to determine at least one root-cause log combination among them, where a root-cause log combination is a log combination that appears frequently and persistently in the first time period;
Process the at least one root-cause log combination;
Determine the at least one class of log information corresponding to the processed root-cause log combinations as the N classes of root-cause logs.
For any log combination, the analysis unit 203 may be configured to:
Divide the first time period into at least one time window, and divide each of the at least one time window into at least one small time window;
Calculate the first frequency with which the log combination appears in any time window, where the first frequency is the ratio of the number of small time windows in the time window in which the log combination appears to the number of small time windows that the time window contains;
If the first frequency is greater than a first preset threshold, determine that the log combination is a frequent log combination in that time window;
Calculate the second frequency with which the frequent log combination appears in the first time period, where the second frequency is the ratio of the number of time windows in the first time period in which the frequent log combination appears to the number of time windows that the first time period contains;
If the second frequency is greater than a second preset threshold, determine that the frequent log combination is a root-cause log combination.
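The two-level frequency test above can be sketched as follows. The data layout is an assumption made for illustration: the first time period is a list of time windows, each time window is a list of small time windows, and each small time window is the set of log types observed in it. A combination is assumed to appear in a small time window only if all of its log types are present. The function name is hypothetical.

```python
def is_root_cause_combination(combo, windows, first_threshold, second_threshold):
    """Apply the first-frequency and second-frequency tests to one log combination."""
    wanted = set(combo)
    frequent_windows = 0
    for window in windows:
        # First frequency: share of small time windows containing the combination.
        hits = sum(1 for small in window if wanted <= small)
        if hits / len(window) > first_threshold:
            frequent_windows += 1
    # Second frequency: share of time windows in which the combination is frequent.
    return frequent_windows / len(windows) > second_threshold

windows = [  # illustrative data
    [{1, 2}, {1, 2, 3}, {1, 2}, {4}],
    [{1, 2}, {1, 2}, {5}, {1, 2}],
    [{3}, {4}, {1, 2}, {5}],
]
result_12 = is_root_cause_combination((1, 2), windows, 0.5, 0.5)
result_34 = is_root_cause_combination((3, 4), windows, 0.5, 0.5)
```

Combination (1, 2) is frequent in two of the three time windows (first frequency 0.75 in each), giving a second frequency of about 0.67, so it passes both thresholds. Combination (3, 4) never appears together in a small time window and is rejected.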
It should be noted that, in the embodiment of the present invention, a log combination is a combination containing at least one log type, and a time window is a time interval. When the time windows are divided, the time windows may be equal or unequal in size, and the small time windows may likewise be equal or unequal in size. A log combination appearing in a time window means that the log information corresponding to the log types contained in the log combination appears in the time interval corresponding to that time window.
The first preset threshold and the second preset threshold may be set as required, and the embodiment of the present invention does not limit them. If the first frequency of a log combination is greater than the first preset threshold, the log combination appears frequently around a certain moment and is determined to be a frequent log combination. If the first frequency is less than or equal to the first preset threshold, the log information corresponding to the log combination is regarded as log information generated while the network device is operating normally. If the second frequency of a frequent log combination is greater than the second preset threshold, the logs that appear frequently around a certain moment also appear frequently and persistently throughout the first time period; that is, the frequent log combination consists of logs that appear repeatedly and without interruption in the first time period, which conforms to the rule by which root-cause logs appear. The frequent log combination is then determined to be a root-cause log combination, and the log information corresponding to the log types it contains is determined to be the log information generated when the network device fails. If the second frequency of the frequent log combination is less than or equal to the second preset threshold, the logs that appear frequently around a certain moment do not appear persistently in the first time period, and the log information corresponding to the log combination is regarded as log information generated while the network device is operating normally.
Further, the root-cause log combinations obtained may contain one another; for example, root-cause log combination (1, 2, 3) contains root-cause log combination (1, 2). In that case, if the second frequency of the contained log combination is less than that of the log combination containing it, the contained root-cause log combination may be rejected. If, however, the second frequency of the contained log combination is much greater than that of the log combination containing it, both the contained log combination and the log combination containing it may be retained, each as an independent fault phenomenon. Optionally, if the at least one root-cause log combination determined by traversing the i log combinations includes a first root-cause log combination and a second root-cause log combination, and the first root-cause log combination is contained in the second root-cause log combination, the analysis unit 203 may be further configured to:
When the second frequency of the first root-cause log combination is greater than the second frequency of the second root-cause log combination, not reject the first root-cause log combination;
When the second frequency of the first root-cause log combination is less than the second frequency of the second root-cause log combination, reject the first root-cause log combination.
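The rejection rule for contained combinations can be sketched as follows. The mapping from combinations to second frequencies and the function name are hypothetical; when the two second frequencies are equal the text gives no rule, and this sketch keeps the contained combination.

```python
def prune_contained(second_frequency):
    """second_frequency: dict mapping frozenset of log types -> second frequency.

    Drops a combination when it is strictly contained in another combination
    whose second frequency is higher.
    """
    kept = dict(second_frequency)
    for inner, inner_freq in second_frequency.items():
        for outer, outer_freq in second_frequency.items():
            if inner < outer and inner_freq < outer_freq:
                kept.pop(inner, None)
    return kept

second_frequency = {  # illustrative values
    frozenset({1, 2}): 0.6,
    frozenset({1, 2, 3}): 0.8,
    frozenset({4}): 0.9,
    frozenset({4, 5}): 0.7,
}
kept = prune_contained(second_frequency)
```

Combination (1, 2) is rejected because the containing combination (1, 2, 3) has a higher second frequency, whereas combination (4) is retained alongside (4, 5) because its second frequency is higher.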
Finally, the logs preceding the fault also contain some log combinations that appear frequently and persistently. Such log combinations are usually normal inspection logs unrelated to the fault. Therefore, if the at least one root-cause log combination determined by traversing the i log combinations includes a third root-cause log combination, and the third root-cause log combination is a root-cause log combination from before the fault time point, the analysis unit 203 may be further configured to:
Reject the third root-cause log combination.
In addition, analysis of a large number of fault-related logs and normal logs shows that normal logs tend to appear periodically, are distributed relatively evenly, and occur fairly frequently throughout the log as a whole. Root-cause logs, by contrast, increase suddenly at the fault point and almost never appear in the logs corresponding to non-fault conditions. This is consistent with the principle described in information theory that content which occurs more frequently carries less information. An anomaly value calculation method that fits this log behavior pattern is therefore proposed, and fault logs are selected on the basis of the resulting anomaly values, as set out in (2):
(2) The determining unit 201 is further configured to determine M anomaly values in one-to-one correspondence with the M classes of log information, where an anomaly value indicates how frequently a class of log information appears in a second time period and its degree of mutation, and the second time period includes the first time period;
The analysis unit 203 is further configured to obtain the N largest anomaly values from the M anomaly values and to determine the N classes of log information in the first log information set that correspond to those N largest anomaly values as the N classes of root-cause logs.
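Selecting the classes with the N largest anomaly values can be sketched as follows. The anomaly values shown are made up for illustration, and the function name is hypothetical.

```python
import heapq

def top_n_classes(anomaly_values, n):
    """anomaly_values: dict mapping log class -> anomaly value.

    Returns the n log classes with the largest anomaly values, largest first.
    """
    return heapq.nlargest(n, anomaly_values, key=anomaly_values.get)

anomaly_values = {1: 2.6, 2: 0.4, 3: 1.9, 4: 0.1, 5: 2.1}  # illustrative values
root_cause_classes = top_n_classes(anomaly_values, n=2)
```

With these values and N = 2, classes 1 and 5 are selected as the root-cause logs.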
Optionally, the obtaining unit 202 may be further configured to:
Before the determining unit 201 determines the M anomaly values in one-to-one correspondence with the M classes of log information, obtain a second log information set generated by the network device in the second time period, where the second log information set contains at least one log entry and each log entry corresponds to a time point;
The determining unit 201 is configured to preprocess the second log information set obtained by the obtaining unit 202 to obtain a first log behavior matrix. The first log behavior matrix includes Q log behavior vectors, each occupying one time interval and containing R elements, where R is the number of log types corresponding to the second log information set and R ≥ M. The j-th element of a log behavior vector is the number of log entries of class j within the time interval of that log behavior vector;
Calculate the anomaly value of each of the R classes of log information according to the anomaly value formula, to obtain R anomaly values in one-to-one correspondence with the R classes of log information;
Obtain, from the R anomaly values, the M anomaly values in one-to-one correspondence with the M classes of log information.
The time interval here is relatively large, usually several tens of minutes, and the time intervals of the log behavior vectors may be equal or unequal.
In the formula, one term denotes how frequently the j-th class of log information, among the R classes of log information, appears in the second log information set, and the other term denotes the degree of mutation of the j-th class of log information within the second log information set. q_j is the number of log behavior vectors that contain the j-th class of log information, c_{k+1,j} is the total number of log entries of the j-th class in the (k+1)-th time interval, and c_{k,j} is the total number of log entries of the j-th class in the k-th time interval.
It should be noted that the two modes above may be performed separately or in combination to locate the exact cause of a network fault more accurately. For example, mode (1) may first determine that the 1st and 5th classes of log information form a log combination that appears frequently and persistently; mode (2) is then used to calculate the anomaly values of the 1st and 5th classes of log information. If the first log information set corresponds to the 1st and 5th classes of logs, and the anomaly value of the 1st class of log is among the M largest anomaly values, the 1st class of log information is determined to be the root-cause log of the fault. This improves the accuracy of network fault cause analysis.
Because a log entry records the activity of the network device at a point in time, the determining unit 201 may be configured to obtain the N classes of root-cause logs directly and take the record information corresponding to them as the cause of the network device fault;
Alternatively, an existing analysis method may be applied to the N classes of root-cause logs to determine the most fundamental cause of the fault that gave rise to them;
Alternatively, building on mode (1) used by the analysis unit 203, for each log type among the at least one log type remaining after merging, the number of times that log type occurs in the at least one root-cause log combination may be recorded, and the log information corresponding to the log type with the highest count taken directly as the root cause of the network device fault.
As can be seen from the above, the embodiment of the present invention provides an analysis device that: determines the fault time point of a network device; obtains a first log information set generated by the network device in a first time period, where the first log information set contains M classes of log information, M is an integer greater than or equal to 1, and the first time period runs from a first moment before the fault time point to a second moment after the fault time point; analyzes each of the M classes of log information according to a preset analysis strategy to obtain N classes of root-cause logs among the M classes of log information; and determines the cause of the network device fault according to the N classes of root-cause logs. In this way, the classes of log information near the fault time point are analyzed automatically, the log information that conforms to the rules by which root-cause logs appear when a fault occurs is obtained, and the root cause of the network device fault is determined from that log information. The root cause of a network device fault is thus analyzed automatically, which improves the efficiency of fault root cause analysis.
It is apparent to those skilled in the art that for convenience and simplicity of description, the unit of foregoing description It with the specific work process of system, can refer to corresponding processes in the foregoing method embodiment, details are not described herein.
In several embodiments provided herein, it should be understood that disclosed system, device and method can be with It realizes by another way.For example, apparatus embodiments described above are merely indicative, for example, the unit It divides, only a kind of logical function partition, there may be another division manner in actual implementation, such as multiple units or components It can be combined or can be integrated into another system, or some features can be ignored or not executed.
The unit as illustrated by the separation member may or may not be physically separated, aobvious as unit The component shown may or may not be physical unit, it can and it is in one place, or may be distributed over multiple In network unit.It can select some or all of unit therein according to the actual needs to realize the mesh of this embodiment scheme 's.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware, or in hardware plus software functional units.
An integrated unit implemented as a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Those of ordinary skill in the art will understand that all or some of the steps of the methods in the above embodiments may be completed by a program instructing relevant hardware (such as a processor). The program may be stored in a computer-readable storage medium, which may include a read-only memory, a random access memory, a magnetic disk, an optical disc, or the like.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (16)

1. A fault root cause analysis method, characterized by comprising:
an analysis device determining a fault time point of a network device;
the analysis device obtaining a first log information set generated by the network device within a first time period, wherein the first log information set contains M classes of log information, M is an integer greater than or equal to 1, and the first time period is the period from a first moment before the fault time point to a second moment after the fault time point;
the analysis device analyzing each class of log information in the M classes of log information according to a preset analysis strategy to obtain N classes of root-cause logs among the M classes of log information, wherein the N classes of root-cause logs are log information generated when the network device fails, M ≥ N ≥ 1, and the preset analysis strategy is a predetermined rule by which logs occur when the network device fails; and
determining, according to the N classes of root-cause logs, the cause of the fault of the network device.
2. The method according to claim 1, characterized in that analyzing each class of log information in the M classes of log information according to the preset analysis strategy to obtain the N classes of root-cause logs among the M classes of log information comprises:
dividing the M log types corresponding to the M classes of log information into i different log combinations, wherein each log combination includes at least one of the M log types, the log types included in the log combinations differ from one combination to another, and i is an integer greater than or equal to 1;
traversing the i log combinations and determining at least one root-cause log combination among the i log combinations, wherein a root-cause log combination is a log combination that occurs frequently and persistently within the first time period;
processing the at least one root-cause log combination; and
determining at least one class of log information corresponding to the processed at least one root-cause log combination as the N classes of root-cause logs.
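One possible way to form the log combinations of claim 2 is to enumerate non-empty subsets of the M log types. The claim does not prescribe this enumeration; the function name `log_combinations` and the optional `max_size` cap are assumptions introduced for the sketch.

```python
from itertools import combinations

def log_combinations(log_types, max_size=None):
    """Enumerate distinct non-empty combinations of log types.

    Each combination is a frozenset so it can be used as a dict key
    and compared with subset operators later on.
    """
    types = sorted(set(log_types))
    limit = len(types) if max_size is None else max_size
    return [
        frozenset(combo)
        for size in range(1, limit + 1)
        for combo in combinations(types, size)
    ]

all_combos = log_combinations(["A", "B", "C"])
small_combos = log_combinations(["A", "B", "C"], max_size=2)
```

With three log types this yields i = 7 combinations, or i = 6 when combinations are capped at two types. Capping the size is one way to keep i manageable when M is large.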
3. The method according to claim 2, characterized in that, for any log combination, determining that the log combination is a root-cause log combination comprises:
dividing the first time period into at least one time window, and dividing each of the at least one time window into at least one small time window;
calculating a first frequency with which the log combination occurs in any time window, wherein the first frequency is the ratio of the number of small time windows in that time window in which the log combination occurs to the number of small time windows that the time window contains;
if the first frequency is greater than a first preset threshold, determining that the log combination is a frequent log combination in that time window;
calculating a second frequency with which the frequent log combination occurs in the first time period, wherein the second frequency is the ratio of the number of time windows in the first time period in which the frequent log combination occurs to the number of time windows that the first time period contains; and
if the second frequency is greater than a second preset threshold, determining that the frequent log combination is a root-cause log combination.
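The two-level frequency test of claim 3 can be sketched as below. The data layout is an assumption: each time window is a list of small time windows, and each small time window is the set of log types seen in it. The sketch also assumes that a combination "occurs" in a small window when all of its log types appear there, and that a time window counts toward the second frequency when the combination is frequent in it.

```python
def first_frequency(combo, window):
    """Share of small time windows in one time window that contain the combination."""
    hits = sum(1 for small in window if combo <= small)
    return hits / len(window)

def second_frequency(combo, windows, first_threshold):
    """Share of time windows in which the combination is frequent."""
    frequent = sum(
        1 for window in windows
        if first_frequency(combo, window) > first_threshold
    )
    return frequent / len(windows)

def is_root_cause_combination(combo, windows, first_threshold, second_threshold):
    return second_frequency(combo, windows, first_threshold) > second_threshold

# Hypothetical first time period: 3 time windows of 4 small windows each.
windows = [
    [{"A", "B"}, {"A", "B"}, {"A"}, {"A", "B"}],
    [{"A"}, {"B"}, {"A", "B"}, set()],
    [{"A", "B"}, {"A", "B"}, {"A", "B"}, {"C"}],
]
ab = frozenset({"A", "B"})
ab_is_root_cause = is_root_cause_combination(ab, windows, 0.5, 0.6)
```

In this example the combination {A, B} is frequent in the first and third time windows (first frequency 0.75 each) but not in the second (0.25), so its second frequency is 2/3.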
4. The method according to claim 2 or 3, characterized in that, if the at least one root-cause log combination determined by traversing the i log combinations includes a first root-cause log combination and a second root-cause log combination, and the first root-cause log combination is contained in the second root-cause log combination, processing the at least one root-cause log combination comprises:
when the second frequency corresponding to the first root-cause log combination is greater than the second frequency corresponding to the second root-cause log combination, not rejecting the first root-cause log combination; and
when the second frequency corresponding to the first root-cause log combination is less than the second frequency corresponding to the second root-cause log combination, rejecting the first root-cause log combination.
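The containment rule of claim 4 can be sketched as a filter over a mapping from root-cause log combination to its second frequency. The function name `prune_contained` and the sample frequencies are assumptions; the claim does not say what happens when the two frequencies are equal, and this sketch keeps the contained combination in that case.

```python
def prune_contained(second_freq):
    """Reject a combination that is strictly contained in another one
    whose second frequency is higher; keep everything else."""
    kept = {}
    for combo, freq in second_freq.items():
        rejected = any(
            combo < other and freq < other_freq  # '<' on frozensets is proper subset
            for other, other_freq in second_freq.items()
        )
        if not rejected:
            kept[combo] = freq
    return kept

# Hypothetical second frequencies for three root-cause log combinations.
candidates = {
    frozenset({"A"}): 0.9,        # contained in {A, B} but more frequent: kept
    frozenset({"B"}): 0.5,        # contained in {A, B} and less frequent: rejected
    frozenset({"A", "B"}): 0.7,
}
remaining = prune_contained(candidates)
```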
5. The method according to claim 2 or 3, characterized in that, if the at least one root-cause log combination determined by traversing the i log combinations includes a third root-cause log combination, and the third root-cause log combination is a root-cause log combination before the fault time point, processing the at least one root-cause log combination determined by traversing the i log combinations comprises:
rejecting the third root-cause log combination.
6. The method according to claim 1, characterized in that analyzing each class of log information in the M classes of log information according to the preset analysis strategy to obtain the N classes of root-cause logs among the M classes of log information comprises:
determining M anomaly values in one-to-one correspondence with the M classes of log information, wherein an anomaly value indicates how frequently a class of log information occurs, and how abruptly it changes, within a second time period, and the second time period contains the first time period; and
obtaining the N largest anomaly values from the M anomaly values, and determining the N classes of log information in the first log information set that correspond to the N largest anomaly values as the N classes of root-cause logs.
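The selection step of claim 6 amounts to a top-N pick over the anomaly values. A minimal sketch, assuming the anomaly values are already available as a mapping from log class to score (the class names and scores below are made up for illustration):

```python
import heapq

def top_n_root_cause_classes(anomaly_values, n):
    """Return the n log classes with the largest anomaly values, largest first."""
    return heapq.nlargest(n, anomaly_values, key=anomaly_values.get)

# Hypothetical anomaly values for M = 4 log classes.
scores = {"LINK_DOWN": 0.91, "BGP_FLAP": 0.74, "FAN_ALARM": 0.12, "LOGIN": 0.05}
root_cause_classes = top_n_root_cause_classes(scores, 2)
```

`heapq.nlargest` avoids sorting all M values when N is small relative to M; for small M a plain `sorted(..., reverse=True)[:n]` would do equally well.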
7. The method according to claim 6, characterized in that, before determining the M anomaly values in one-to-one correspondence with the M classes of log information, the method further comprises:
obtaining a second log information set generated by the network device within the second time period, wherein the second log information set contains at least one piece of log information, and each piece of log information corresponds to one time point;
preprocessing the second log information set to obtain a first log behavior matrix, wherein the first log behavior matrix includes Q log behavior vectors, each log behavior vector occupies one time interval, each log behavior vector includes R elements, R is the number of log types corresponding to the second log information set, R ≥ M, and the j-th element of a log behavior vector denotes the number of pieces of log information of the j-th class within the time interval of that log behavior vector;
calculating the anomaly value of each of the R classes of log information according to the formula [formula not reproduced in this text], to obtain R anomaly values in one-to-one correspondence with the R classes of log information;
wherein determining the M anomaly values in one-to-one correspondence with the M classes of log information comprises:
obtaining, from the R anomaly values, the M anomaly values in one-to-one correspondence with the M classes of log information;
wherein the first term of the formula [expression not reproduced in this text] denotes how frequently the j-th class of log information among the R classes occurs in the second log information set, the second term [expression not reproduced in this text] denotes the degree of abrupt change of the j-th class of log information among the R classes in the second log information set, q_j is the number of log behavior vectors that contain the j-th class of log information, c_{k+1,j} denotes the total number of pieces of log information of the j-th class in the (k+1)-th time interval, and c_{k,j} denotes the total number of pieces of log information of the j-th class in the k-th time interval.
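The quantities that claim 7 defines for the anomaly value, namely q_j and the counts c_{k,j} and c_{k+1,j} in consecutive time intervals, can be read directly off the log behavior matrix. The sketch below computes only these inputs. How the formula combines them is not reproduced in this text, so no combination is attempted; the function name `formula_inputs` and the sample matrix are assumptions.

```python
def formula_inputs(matrix, j):
    """Return (q_j, differences) for log class j.

    matrix: Q rows (one log behavior vector per time interval), R columns.
    q_j: number of log behavior vectors in which class j appears at least once.
    differences: c_{k+1,j} - c_{k,j} for each pair of consecutive intervals.
    """
    column = [row[j] for row in matrix]
    q_j = sum(1 for count in column if count > 0)
    differences = [column[k + 1] - column[k] for k in range(len(column) - 1)]
    return q_j, differences

# Hypothetical matrix with Q = 4 intervals and R = 2 log classes.
matrix = [
    [0, 3],
    [2, 3],
    [9, 3],
    [1, 0],
]
q0, diffs0 = formula_inputs(matrix, 0)
q1, diffs1 = formula_inputs(matrix, 1)
```

Class 0 shows a burst (2 to 9 and back to 1), which appears as large differences, while class 1 is steady until it disappears in the last interval.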
8. The method according to claim 7, characterized in that preprocessing the second log information set to obtain the first log behavior matrix comprises:
converting the content format of each piece of log information in the second log information set into a preset log format;
classifying the format-converted log information, and replacing each piece of log information with the identifier of the class to which it belongs, to form a time series composed of class identifiers;
dividing the time series according to a preset time interval;
for each time interval, counting the occurrences of each class identifier within the time interval, and arranging the counts into an R-dimensional log behavior vector; and
arranging all log behavior vectors in chronological order to form the first log behavior matrix.
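The preprocessing steps of claim 8 can be sketched end to end as follows. Everything specific here is an assumption made for illustration: the raw line format (`"<seconds> <message>"`), the keyword-based `classify` helper, the class identifiers, and the 10-second interval.

```python
def normalize(raw_line):
    """Step 1: convert a raw line into a preset format (timestamp, lowercase message)."""
    timestamp, message = raw_line.split(" ", 1)
    return int(timestamp), message.strip().lower()

def classify(message):
    """Step 2: map a normalized message to a class identifier (keyword rules are hypothetical)."""
    if "link" in message:
        return "LINK"
    if "bgp" in message:
        return "BGP"
    return "OTHER"

def log_behavior_matrix(raw_lines, classes, start, interval, num_intervals):
    # Time series of class identifiers, ordered by timestamp.
    series = sorted((ts, classify(msg)) for ts, msg in map(normalize, raw_lines))
    column = {cls: j for j, cls in enumerate(classes)}
    # One R-dimensional count vector per time interval, in chronological order.
    matrix = [[0] * len(classes) for _ in range(num_intervals)]
    for ts, cls in series:
        k = (ts - start) // interval  # index of the interval this entry falls in
        if 0 <= k < num_intervals:
            matrix[k][column[cls]] += 1
    return matrix

raw_lines = [
    "0 Link down on port 1",
    "5 BGP peer reset",
    "12 LINK flap",
    "14 link down on port 2",
    "25 fan speed high",
]
matrix = log_behavior_matrix(raw_lines, ["LINK", "BGP", "OTHER"], 0, 10, 3)
```

The result has Q = 3 rows (one per 10-second interval) and R = 3 columns (one per class), matching the shape of the first log behavior matrix described in claim 7.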
9. An analysis device for analyzing the root cause of a fault of a network device, characterized by comprising:
a determining unit, configured to determine a fault time point of the network device;
an obtaining unit, configured to obtain a first log information set generated by the network device within a first time period, wherein the first log information set contains M classes of log information, M is an integer greater than or equal to 1, and the first time period is the period from a first moment before the fault time point determined by the determining unit to a second moment after the fault time point; and
an analysis unit, configured to analyze, according to a preset analysis strategy, each class of log information in the M classes of log information obtained by the obtaining unit, to obtain N classes of root-cause logs among the M classes of log information, wherein the N classes of root-cause logs are log information generated when the network device fails, M ≥ N ≥ 1, and the preset analysis strategy is a predetermined rule by which logs occur when the network device fails;
wherein the determining unit is further configured to determine, according to the N classes of root-cause logs obtained by the analysis unit, the cause of the fault of the network device.
10. The analysis device according to claim 9, characterized in that the analysis unit is configured to:
divide the M log types corresponding to the M classes of log information into i different log combinations, wherein each log combination includes at least one of the M log types, the log types included in the log combinations differ from one combination to another, and i is an integer greater than or equal to 1;
traverse the i log combinations and determine at least one root-cause log combination among the i log combinations, wherein a root-cause log combination is a log combination that occurs frequently and persistently within the first time period;
process the at least one root-cause log combination; and
determine at least one class of log information corresponding to the processed at least one root-cause log combination as the N classes of root-cause logs.
11. The analysis device according to claim 10, characterized in that, for any log combination, the analysis unit is configured to:
divide the first time period into at least one time window, and divide each of the at least one time window into at least one small time window;
calculate a first frequency with which the log combination occurs in any time window, wherein the first frequency is the ratio of the number of small time windows in that time window in which the log combination occurs to the number of small time windows that the time window contains;
if the first frequency is greater than a first preset threshold, determine that the log combination is a frequent log combination in that time window;
calculate a second frequency with which the frequent log combination occurs in the first time period, wherein the second frequency is the ratio of the number of time windows in the first time period in which the frequent log combination occurs to the number of time windows that the first time period contains; and
if the second frequency is greater than a second preset threshold, determine that the frequent log combination is a root-cause log combination.
12. The analysis device according to claim 10 or 11, characterized in that, if the at least one root-cause log combination determined by traversing the i log combinations includes a first root-cause log combination and a second root-cause log combination, and the first root-cause log combination is contained in the second root-cause log combination, the analysis unit is configured to:
when the second frequency corresponding to the first root-cause log combination is greater than the second frequency corresponding to the second root-cause log combination, not reject the first root-cause log combination; and
when the second frequency corresponding to the first root-cause log combination is less than the second frequency corresponding to the second root-cause log combination, reject the first root-cause log combination.
13. The analysis device according to claim 10 or 11, characterized in that, if the at least one root-cause log combination determined by traversing the i log combinations includes a third root-cause log combination, and the third root-cause log combination is a root-cause log combination before the fault time point, the analysis unit is configured to:
reject the third root-cause log combination.
14. The analysis device according to claim 9, characterized in that:
the determining unit is further configured to determine M anomaly values in one-to-one correspondence with the M classes of log information, wherein an anomaly value indicates how frequently a class of log information occurs, and how abruptly it changes, within a second time period, and the second time period contains the first time period; and
the analysis unit is configured to obtain the N largest anomaly values from the M anomaly values, and determine the N classes of log information in the first log information set that correspond to the N largest anomaly values as the N classes of root-cause logs.
15. The analysis device according to claim 14, characterized in that the obtaining unit is further configured to:
before the determining unit determines the M anomaly values in one-to-one correspondence with the M classes of log information, obtain a second log information set generated by the network device within the second time period, wherein the second log information set contains at least one piece of log information, and each piece of log information corresponds to one time point;
the determining unit is configured to preprocess the second log information set obtained by the obtaining unit to obtain a first log behavior matrix, wherein the first log behavior matrix includes Q log behavior vectors, each log behavior vector occupies one time interval, each log behavior vector includes R elements, R is the number of log types corresponding to the second log information set, R ≥ M, and the j-th element of a log behavior vector denotes the number of pieces of log information of the j-th class within the time interval of that log behavior vector; and
calculate the anomaly value of each of the R classes of log information according to the formula [formula not reproduced in this text], to obtain R anomaly values in one-to-one correspondence with the R classes of log information;
wherein determining the M anomaly values in one-to-one correspondence with the M classes of log information comprises:
obtaining, from the R anomaly values, the M anomaly values in one-to-one correspondence with the M classes of log information;
wherein the first term of the formula [expression not reproduced in this text] denotes how frequently the j-th class of log information among the R classes occurs in the second log information set, the second term [expression not reproduced in this text] denotes the degree of abrupt change of the j-th class of log information among the R classes in the second log information set, q_j is the number of log behavior vectors that contain the j-th class of log information, c_{k+1,j} denotes the total number of pieces of log information of the j-th class in the (k+1)-th time interval, and c_{k,j} denotes the total number of pieces of log information of the j-th class in the k-th time interval.
16. The analysis device according to claim 14, characterized in that the determining unit is configured to:
convert the content format of each piece of log information in the second log information set into a preset log format;
classify the format-converted log information, and replace each piece of log information with the identifier of the class to which it belongs, to form a time series composed of class identifiers;
divide the time series according to a preset time interval;
for each time interval, count the occurrences of each class identifier within the time interval, and arrange the counts into an R-dimensional log behavior vector; and
arrange all log behavior vectors in chronological order to form the first log behavior matrix.
CN201510990742.8A 2015-12-25 2015-12-25 A kind of failure root cause analysis method and analytical equipment Active CN105471659B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510990742.8A CN105471659B (en) 2015-12-25 2015-12-25 A kind of failure root cause analysis method and analytical equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510990742.8A CN105471659B (en) 2015-12-25 2015-12-25 A kind of failure root cause analysis method and analytical equipment

Publications (2)

Publication Number Publication Date
CN105471659A CN105471659A (en) 2016-04-06
CN105471659B true CN105471659B (en) 2019-03-01

Family

ID=55608973

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510990742.8A Active CN105471659B (en) 2015-12-25 2015-12-25 A kind of failure root cause analysis method and analytical equipment

Country Status (1)

Country Link
CN (1) CN105471659B (en)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105847065B (en) * 2016-05-24 2019-05-10 华为技术有限公司 A kind of network element device misconfiguration detection method and detection device
CN107548086B (en) * 2016-06-24 2022-09-27 中兴通讯股份有限公司 Root cause positioning method and device
CN108268473A (en) * 2016-12-30 2018-07-10 北京国双科技有限公司 A kind of log processing method and device
CN108322318B (en) * 2017-01-16 2021-04-09 华为技术有限公司 Alarm analysis method and equipment
US11132620B2 (en) 2017-04-20 2021-09-28 Cisco Technology, Inc. Root cause discovery engine
CN109428741A (en) * 2017-08-22 2019-03-05 中兴通讯股份有限公司 A kind of detection method and device of network failure
CN108011752B (en) * 2017-11-21 2020-06-16 江苏天联信息科技发展有限公司 Fault positioning analysis method and device and computer readable storage medium
CN109905270B (en) * 2018-03-29 2021-09-14 华为技术有限公司 Method, apparatus and computer readable storage medium for locating root cause alarm
CN108664374B (en) * 2018-05-17 2020-05-08 腾讯科技(深圳)有限公司 Fault alarm model creating method and device and fault alarm method and device
CN110545195A (en) * 2018-05-29 2019-12-06 华为技术有限公司 network fault analysis method and device
CN110932878A (en) * 2018-09-20 2020-03-27 中国移动通信有限公司研究院 Management method, equipment and system of distributed network
CN111669282B (en) * 2019-03-08 2023-10-24 华为技术有限公司 Method, device and computer storage medium for identifying suspected root cause alarm
CN111858291B (en) * 2019-04-30 2022-04-22 华为技术有限公司 Root cause determination method, equipment and system for data abnormity in charging system migration test
CN111181812B (en) * 2020-01-03 2022-04-08 四川新网银行股份有限公司 Link fault detection method based on network flow
CN114285730A (en) * 2020-09-18 2022-04-05 华为技术有限公司 Method and device for determining fault root cause and related equipment
CN112242929B (en) * 2020-10-16 2023-03-24 中国联合网络通信集团有限公司 Log detection method and device
CN113254255B (en) * 2021-07-15 2021-10-29 苏州浪潮智能科技有限公司 Cloud platform log analysis method, system, device and medium
CN113806196B (en) * 2021-09-17 2022-04-15 北京九章云极科技有限公司 Root cause analysis method and system
CN115118580B (en) * 2022-05-20 2023-10-31 阿里巴巴(中国)有限公司 Alarm analysis method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101325520A (en) * 2008-06-17 2008-12-17 南京邮电大学 Method for locating and analyzing fault of intelligent self-adapting network based on log
CN103761173A (en) * 2013-12-28 2014-04-30 华中科技大学 Log based computer system fault diagnosis method and device
CN104794136A (en) * 2014-01-22 2015-07-22 华为技术有限公司 Fault analysis method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104461842B (en) * 2013-09-23 2018-02-16 伊姆西公司 Based on daily record similitude come the method and apparatus of handling failure

Also Published As

Publication number Publication date
CN105471659A (en) 2016-04-06

Similar Documents

Publication Publication Date Title
CN105471659B (en) A kind of failure root cause analysis method and analytical equipment
CN105577440B (en) A kind of network downtime localization method and analytical equipment
CN105812177B (en) A kind of network failure processing method and processing equipment
CN107832355B (en) A kind of method and device that the agency of crawlers obtains
CN105512799B (en) Power system transient stability evaluation method based on mass online historical data
CN111131304B (en) Cloud platform-oriented large-scale virtual machine fine-grained abnormal behavior detection method and system
DE112016001742T5 (en) Integrated community and role discovery in enterprise networks
CN107203450A (en) The sorting technique and equipment of failure
CN106685750A (en) System anomaly detection method and device
CN104967629A (en) Network attack detection method and apparatus
CN107145959A (en) A kind of electric power data processing method based on big data platform
CN108964960A (en) A kind of processing method and processing device of alarm event
EP2392120B1 (en) Method and sensor network for attribute selection for an event recognition
DE102022201746A1 (en) MANAGE DATA CENTERS WITH MACHINE LEARNING
CN106209856B (en) Method for generating big data security posture map based on trusted computing
CN108021509B (en) Test case dynamic sequencing method based on program behavior network aggregation
CN108111463A (en) The self study of various dimensions baseline and abnormal behaviour analysis based on average value and standard deviation
CN111310139A (en) Behavior data identification method and device and storage medium
CN109587000B (en) High-delay anomaly detection method and system based on crowd-sourcing network measurement data
CN109740634A (en) Disaggregated model training method and terminal device
CN110213087B (en) Complex system fault positioning method based on dynamic multilayer coupling network
CN105184403B (en) The workflow allocation optimum optimization method examined based on machine learning and statistical model
CN107222497A (en) Network traffic anomaly monitor method and electronic equipment
DE112021003657T5 (en) FAULT LOCATION FOR CLOUD-NATIVE APPLICATIONS
CN111614520B (en) IDC flow data prediction method and device based on machine learning algorithm

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant