CN117149500A - Abnormal root cause obtaining method and system based on index data and log data - Google Patents

Abnormal root cause obtaining method and system based on index data and log data

Info

Publication number
CN117149500A
Authority
CN
China
Prior art keywords
log
data
index
abnormal
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311417601.8A
Other languages
Chinese (zh)
Other versions
CN117149500B (en)
Inventor
张竞超
张泽锟
余螯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Sigao Intelligent Technology Co ltd
Original Assignee
Anhui Sigao Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Sigao Intelligent Technology Co ltd filed Critical Anhui Sigao Intelligent Technology Co ltd
Priority to CN202311417601.8A priority Critical patent/CN117149500B/en
Publication of CN117149500A publication Critical patent/CN117149500A/en
Application granted granted Critical
Publication of CN117149500B publication Critical patent/CN117149500B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0766 Error or fault reporting or storing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466 Performance evaluation by tracing or monitoring
    • G06F11/3476 Data logging
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2433 Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0709 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Biomedical Technology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Hardware Design (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a method for obtaining an abnormal root cause based on index data and log data, comprising the following steps: S1: acquiring index data and log data of a micro-service system; S2: calculating an index anomaly score sequence set MASS of the index data through the BIRCH clustering algorithm; S3: calculating a log anomaly score sequence LAS of the log data through the DeepLog algorithm; S4: performing association analysis between the clustering result of each index data in the index anomaly score sequence set MASS and the log anomaly score sequence LAS to obtain a degree of association; S5: obtaining the abnormal root cause index by ranking the degrees of association. By analyzing the association between the clustering results of the index data and the log anomaly score sequence, the method quantifies the abnormal root cause through association ranking, helps operation and maintenance personnel quickly locate the root cause of a problem, and reduces an enterprise's operation and maintenance losses.

Description

Abnormal root cause obtaining method and system based on index data and log data
Technical Field
The invention relates to the field of intelligent operation and maintenance, in particular to a method and a system for obtaining an abnormal root cause based on index data and log data.
Background
The rapid growth of the Internet has dramatically expanded the scale and complexity of micro-service systems. Most Internet enterprises rely on a narrow set of operation and maintenance methods and remain at the stage of manual analysis. This traditional manual mode is increasingly outdated and cannot cope with large scale and high complexity.
In recent years, with the development of artificial intelligence, data-driven automated algorithms have been applied successfully in a variety of complex scenarios, which offers an opportunity to solve these problems. Such algorithms rest on data, and for micro-service systems, logs and metrics (index data) are important components of operational observability. Logs are an important data source for detecting anomalies in a micro-service system: they record detailed information produced while the system runs, including event timestamps, the methods involved, parameters and so on. Inspecting logs helps maintenance staff understand the system's behavior and find possible anomalies. System operation metrics are time-series data collected at fixed intervals, such as CPU usage and response latency, and are commonly collected in the form (timestamp, value). When a value behaves abnormally, for example by rising or dropping suddenly, the micro-service associated with it is experiencing some anomaly, and operation and maintenance personnel must locate the root cause promptly and take effective measures.
However, existing automatic detection methods have shortcomings: some root cause analysis for micro-service systems works only at the index level, some works only for log alarm problems, and cause analysis that uses both log data and index data does not consider the anomaly attributes that arise while logs are being produced.
Disclosure of Invention
In order to solve the technical problems, the invention provides a method for obtaining an abnormal root cause based on index data and log data, comprising the following steps:
S1: acquiring index data and log data of a micro-service system;
S2: calculating an index anomaly score sequence set MASS of the index data through the BIRCH clustering algorithm;
S3: calculating a log anomaly score sequence LAS of the log data through the DeepLog algorithm;
S4: performing association analysis between the clustering result of each index data in the index anomaly score sequence set MASS and the log anomaly score sequence LAS to obtain a degree of association;
S5: obtaining the abnormal root cause index by ranking the degrees of association.
Preferably, step S2 specifically includes:
S21: normalizing the N acquired index data to convert them into an index vector M = {m_1, m_2, ..., m_N} within the range [0, 1];
S22: clustering each index data m_u in the index vector M separately through the BIRCH clustering algorithm, and taking the set of clustering results of all index data as the index anomaly score sequence set MASS = {MAS_1, MAS_2, ..., MAS_N}, where MAS_u is the clustering result of the u-th index data and u ranges from 1 to N.
Preferably, the step S3 specifically includes:
S31: parsing the log data into a log key sequence and a parameter vector sequence according to log category;
S32: analyzing the log key sequence based on DeepLog to obtain a log key anomaly score sequence LAS_t;
S33: analyzing the parameter vector sequence based on DeepLog to obtain a parameter vector anomaly score sequence LAS_p;
S34: calculating the log anomaly score sequence LAS of the log data from the log key anomaly score sequence LAS_t and the parameter vector anomaly score sequence LAS_p.
Preferably, step S32 specifically includes:
S321: setting a first time window, and acquiring the log key set window_h = {k_{h-H}, k_{h-H+1}, ..., k_h} of the log key sequence within the first time window, where h is the time, H is the length of the first time window, and k_h is the h-th log key; predicting, through DeepLog, the log key k_{h+1} that follows the log key set at time h+1;
S322: calculating, through a standard multinomial logistic (softmax) function, the probability distribution set P = {k_1: p_1, k_2: p_2, ..., k_i: p_i, ..., k_g: p_g} of the log key k_{h+1}, where i is the index of a log key, p_i is the probability that the log key k_{h+1} is the log key k_i, and g is the number of log key types;
S323: if the true log key at time h+1 is k_i and p_i is smaller than Threshold, judging that the log execution path is abnormal, and setting the log key anomaly score at time h+1 to AS_{th} = Threshold - p_i; if p_i is not smaller than Threshold, judging that the log is normal, and setting the log key anomaly score at time h+1 to AS_{th} = 0;
S324: letting h = h + 1;
S325: repeating steps S321 to S324, and constructing the log key anomaly score sequence LAS_t from the anomaly scores of all log keys in the log key sequence.
Preferably, step S33 specifically includes:
S331: setting a second time window, and acquiring the parameter vector set e_q = {v_{q-Q}, v_{q-Q+1}, ..., v_q} of the parameter vector sequence within the second time window, where q is the index of a parameter vector, Q is the length of the second time window, and v_q is the q-th parameter vector; predicting from the parameter vector set e_q through DeepLog to obtain a predicted parameter vector set \hat{e}_{q+1}; calculating the parameter vector error z_{q+1} between \hat{e}_{q+1} and e_{q+1};
S332: modeling the parameter vector error z_{q+1} as a Gaussian distribution; if z_{q+1} lies within the high-confidence interval of the Gaussian distribution, judging that the parameter vector v_q is normal and setting its anomaly score AS_{pq} = 0; otherwise, judging that the parameter vector v_q is abnormal and setting its anomaly score AS_{pq} = 1;
S333: letting q = q + 1;
S334: repeating steps S331 to S333, and constructing the parameter vector anomaly score sequence LAS_p from the anomaly scores of all parameter vectors in the parameter vector sequence.
Preferably, the log anomaly score sequence LAS is calculated from LAS_t and LAS_p by a formula in which w is a hyperparameter.
Preferably, the degree of association is calculated as:
MI(MAS_u, LAS) = \sum_{x \in MAS_u} \sum_{y \in LAS} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)}
where MI(MAS_u, LAS) is the degree of association between MAS_u and LAS, x is a value in MAS_u, y is a log anomaly score in LAS, p(x, y) is the joint probability distribution function of x and y, p(x) is the marginal probability distribution function of x, and p(y) is the marginal probability distribution function of y.
An abnormal root cause acquisition system based on index data and log data, comprising:
a data acquisition module, configured to acquire index data and log data of the micro-service system;
an index anomaly score calculation module, configured to calculate the index anomaly score sequence set MASS of the index data through the BIRCH clustering algorithm;
a log anomaly score calculation module, configured to calculate the log anomaly score sequence LAS of the log data through the DeepLog algorithm;
an association analysis module, configured to perform association analysis between the clustering result of each index data in the index anomaly score sequence set MASS and the log anomaly score sequence LAS to obtain a degree of association;
an abnormal root cause index acquisition module, configured to obtain the abnormal root cause index by ranking the degrees of association.
The invention has the following beneficial effects:
Anomaly analysis uses both the index data and the log data of the micro-service system, so more types of anomalies can be covered and the missed anomaly reports caused by relying on a single data source are reduced; association analysis is performed between the clustering results of the index data and the log anomaly score sequence, and the abnormal root cause is quantified through association ranking, which helps operation and maintenance personnel quickly locate the root cause of a problem and reduces the enterprise's operation and maintenance losses.
Drawings
FIG. 1 is a flow chart of a method according to an embodiment of the present invention;
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Referring to FIG. 1, the invention provides a method for obtaining an abnormal root cause based on index data and log data. It addresses the low compatibility of observability data (index + log) in micro-service systems, reduces the missed anomaly reports caused by a single data source, and helps operation and maintenance personnel quickly locate the root cause of a problem.
The method comprises the following steps:
S1: acquiring index data and log data of a micro-service system;
S2: calculating an index anomaly score sequence set MASS of the index data through the BIRCH clustering algorithm;
S3: calculating a log anomaly score sequence LAS of the log data through the DeepLog algorithm;
S4: performing association analysis between the clustering result of each index data in the index anomaly score sequence set MASS and the log anomaly score sequence LAS to obtain a degree of association;
S5: obtaining the abnormal root cause index by ranking the degrees of association.
Further, the step S1 specifically includes:
Step S11: setting an oversampling parameter and oversampling the anomaly data (log data + index data) by extending the anomaly period: if the anomaly period has length L, it is extended to (1 + α)L during data collection, where α = 0.4;
Step S12: oversampling the log data Raw_Logs of the micro-service system, which comprise the log timestamp, cmdb_id, log file name and log content; the log data Raw_Logs are stored in an Elasticsearch database;
Step S13: oversampling the index data raw_metrics of the micro-service system at a time interval of 5 s, the index data comprising performance index data and business index data; the performance index data record the state of server components, such as CPU utilization, memory utilization and network packet loss rate; the business index data comprise system response rate, success rate, average response time and so on; the index data raw_metrics are stored in the Elasticsearch database.
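The window extension in step S11 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `expand_anomaly_window` is hypothetical, and splitting the extra α·L evenly before and after the original period is an assumption, since the text only states that a period of length L becomes (1 + α)L.

```python
from datetime import datetime, timedelta

ALPHA = 0.4  # oversampling parameter given in step S11


def expand_anomaly_window(start, end, alpha=ALPHA):
    """Expand an anomaly period of length L to (1 + alpha) * L.

    Assumption: the extra alpha * L is split evenly before and after the
    original period; the source does not say where the extension is placed.
    """
    length = end - start
    pad = length * alpha / 2
    return start - pad, end + pad


# Hypothetical anomaly period of 100 seconds.
start = datetime(2023, 10, 30, 12, 0, 0)
end = start + timedelta(seconds=100)
new_start, new_end = expand_anomaly_window(start, end)
print((new_end - new_start).total_seconds())
```

Logs and metrics whose timestamps fall inside the expanded window would then be collected for the later steps.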
Further, the step S2 specifically includes:
S21: normalizing the N acquired index data to convert them into an index vector M = {m_1, m_2, ..., m_N} within the range [0, 1];
Specifically, the normalization maps each source index value x to a normalization result x' in [0, 1]; this converts the index data into index vectors so that different index data are comparable;
S22: clustering each index data m_u in the index vector M separately through the BIRCH clustering algorithm, and taking the set of clustering results of all index data as the index anomaly score sequence set MASS = {MAS_1, MAS_2, ..., MAS_N}, where MAS_u is the clustering result of the u-th index data and u ranges from 1 to N.
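Steps S21 and S22 can be illustrated with a small standard-library sketch. Two assumptions are made here. First, min-max scaling is used for the normalization because it maps values into [0, 1]; the text does not reproduce the exact formula. Second, `cf_cluster_1d` is a simplified, flat pass over clustering features (CFs) in the spirit of BIRCH, not a full CF-tree; a complete implementation such as `sklearn.cluster.Birch` would normally be used instead. All function names and sample values are hypothetical.

```python
def min_max_normalize(values):
    """Scale a metric series into [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def cf_cluster_1d(values, threshold=0.1):
    """Simplified BIRCH-style pass for one-dimensional data.

    Each clustering feature stores [n, linear_sum], which is enough to
    recover the centroid. A point joins the nearest CF if it lies within
    `threshold` of that centroid; otherwise it opens a new CF.
    Returns one cluster label per input point.
    """
    cfs = []
    labels = []
    for v in values:
        best, best_dist = None, None
        for idx, (n, linear_sum) in enumerate(cfs):
            dist = abs(v - linear_sum / n)
            if best_dist is None or dist < best_dist:
                best, best_dist = idx, dist
        if best is not None and best_dist <= threshold:
            cfs[best][0] += 1
            cfs[best][1] += v
        else:
            cfs.append([1, v])
            best = len(cfs) - 1
        labels.append(best)
    return labels


cpu_usage = [20.0, 21.0, 19.5, 20.5, 95.0, 20.0]  # hypothetical samples
m_u = min_max_normalize(cpu_usage)
mas_u = cf_cluster_1d(m_u, threshold=0.1)
print(mas_u)
```

In this toy series the spike at 95.0 lands in its own cluster, which is the kind of separation the clustering result MAS_u is meant to expose.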
Further, the step S3 specifically includes:
S31: parsing the log data into a log key sequence and a parameter vector sequence according to log category;
Specifically, the Drain log parsing tool is applied to parse the log data Raw_Logs into the form "log key + parameter vector" according to log category;
S32: analyzing the log key sequence based on DeepLog to obtain a log key anomaly score sequence LAS_t;
S33: analyzing the parameter vector sequence based on DeepLog to obtain a parameter vector anomaly score sequence LAS_p;
S34: calculating the log anomaly score sequence LAS of the log data from the log key anomaly score sequence LAS_t and the parameter vector anomaly score sequence LAS_p.
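To make the "log key + parameter vector" form of step S31 concrete, the sketch below splits a log line into a template and its variable parts. It is a regex stand-in and does not implement Drain, which builds a fixed-depth parse tree; the masking rule (numbers, hexadecimal IDs and IPv4 addresses count as parameters) and the sample log line are assumptions made for illustration.

```python
import re

# Assumption: numbers, hex IDs and IPv4 addresses are the variable parts.
_PARAM = re.compile(
    r"\b(?:\d{1,3}(?:\.\d{1,3}){3}|0x[0-9a-fA-F]+|\d+(?:\.\d+)?)\b"
)


def parse_log_line(content):
    """Return (log_key, parameter_vector) for one log message."""
    params = _PARAM.findall(content)
    log_key = _PARAM.sub("<*>", content)
    return log_key, params


key, params = parse_log_line(
    "Received block 4321 of size 67108864 from 10.0.0.7"
)
print(key)
print(params)
```

Lines that share a template yield the same log key, so a stream of logs becomes a sequence of keys plus one parameter vector per line.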
Further, the step S32 specifically includes:
S321: setting a first time window, and acquiring the log key set window_h = {k_{h-H}, k_{h-H+1}, ..., k_h} of the log key sequence within the first time window, where h is the time, H is the length of the first time window, and k_h is the h-th log key; predicting, through DeepLog, the log key k_{h+1} that follows the log key set at time h+1;
S322: calculating, through a standard multinomial logistic (softmax) function, the probability distribution set P = {k_1: p_1, k_2: p_2, ..., k_i: p_i, ..., k_g: p_g} of the log key k_{h+1}, where i is the index of a log key, p_i is the probability that the log key k_{h+1} is the log key k_i, and g is the number of log key types;
S323: if the true log key at time h+1 is k_i and p_i is smaller than Threshold, judging that the log execution path is abnormal, and setting the log key anomaly score at time h+1 to AS_{th} = Threshold - p_i; if p_i is not smaller than Threshold, judging that the log is normal, and setting the log key anomaly score at time h+1 to AS_{th} = 0;
S324: letting h = h + 1;
S325: repeating steps S321 to S324, and constructing the log key anomaly score sequence LAS_t from the anomaly scores of all log keys in the log key sequence.
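The scoring rule in steps S322 and S323 can be sketched in isolation from the sequence model. In the snippet below the raw scores passed to `softmax` stand in for the output of a trained DeepLog model, which is not reproduced here; the key names, the raw scores and the threshold value 0.1 are all hypothetical.

```python
import math


def softmax(logits):
    """Multinomial logistic (softmax) function over a list of raw scores."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def log_key_anomaly_score(probabilities, true_key, threshold):
    """Score from step S323: threshold - p_i if p_i < threshold, else 0."""
    p_i = probabilities[true_key]
    return threshold - p_i if p_i < threshold else 0.0


# Hypothetical model output for g = 3 log key types at time h+1.
keys = ["k1", "k2", "k3"]
probs = dict(zip(keys, softmax([2.0, 0.0, -2.0])))
THRESHOLD = 0.1  # assumed value; the source does not fix it

score_expected = log_key_anomaly_score(probs, "k1", THRESHOLD)
score_unlikely = log_key_anomaly_score(probs, "k3", THRESHOLD)
print(score_expected, score_unlikely)
```

A key the model considers likely scores 0, while an improbable key scores by how far its probability falls below the threshold, so LAS_t carries a graded signal rather than a binary flag.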
Further, step S33 specifically includes:
S331: setting a second time window, and acquiring the parameter vector set e_q = {v_{q-Q}, v_{q-Q+1}, ..., v_q} of the parameter vector sequence within the second time window, where q is the index of a parameter vector, Q is the length of the second time window, and v_q is the q-th parameter vector; predicting from the parameter vector set e_q through DeepLog to obtain a predicted parameter vector set \hat{e}_{q+1}; calculating the parameter vector error z_{q+1} between \hat{e}_{q+1} and e_{q+1};
S332: modeling the parameter vector error z_{q+1} as a Gaussian distribution; if z_{q+1} lies within the high-confidence interval of the Gaussian distribution, judging that the parameter vector v_q is normal and setting its anomaly score AS_{pq} = 0; otherwise, judging that the parameter vector v_q is abnormal and setting its anomaly score AS_{pq} = 1;
S333: letting q = q + 1;
S334: repeating steps S331 to S333, and constructing the parameter vector anomaly score sequence LAS_p from the anomaly scores of all parameter vectors in the parameter vector sequence.
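The Gaussian test in step S332 might look like the sketch below. Several details are assumptions: the error is taken to be the mean squared error between the predicted and observed vectors, the Gaussian is fitted to errors measured on normal data, and the "high confidence interval" is read as a two-sided 99% interval (critical value 2.576). The function names and sample errors are hypothetical.

```python
import statistics


def vector_error(predicted, actual):
    """Mean squared error between a predicted and an observed parameter vector."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def fit_error_model(errors):
    """Fit a Gaussian (mean, sample standard deviation) to normal-data errors."""
    return statistics.mean(errors), statistics.stdev(errors)


def parameter_anomaly_score(z, mean, std, z_crit=2.576):
    """Return 0 if z lies inside mean +/- z_crit * std, otherwise 1.

    z_crit = 2.576 corresponds to a two-sided 99% interval (assumed level).
    """
    return 0 if abs(z - mean) <= z_crit * std else 1


# Hypothetical prediction errors observed during normal operation.
normal_errors = [0.010, 0.012, 0.009, 0.011, 0.010, 0.013, 0.008]
mean, std = fit_error_model(normal_errors)
las_p = [parameter_anomaly_score(z, mean, std) for z in (0.011, 0.050)]
print(las_p)
```

Unlike the log key score, this score is binary, which matches the 0 or 1 values assigned in step S332.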
Further, the log anomaly score sequence LAS is calculated from LAS_t and LAS_p by a formula in which w is a hyperparameter (w is set to 0.6).
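The combining formula itself is not reproduced in this text, so the sketch below shows only one plausible reading: a convex combination in which w weights the log key scores and (1 - w) weights the parameter vector scores. Both the form of the combination and the assumption that LAS_t and LAS_p are aligned element by element are guesses, not the patent's stated formula; only the value w = 0.6 comes from the source.

```python
def combine_scores(las_t, las_p, w=0.6):
    """Assumed combination: LAS[j] = w * LAS_t[j] + (1 - w) * LAS_p[j]."""
    if len(las_t) != len(las_p):
        raise ValueError("LAS_t and LAS_p must have the same length")
    return [w * t + (1 - w) * p for t, p in zip(las_t, las_p)]


las_t = [0.0, 0.05]  # hypothetical log key anomaly scores
las_p = [0, 1]       # hypothetical parameter vector anomaly scores
las = combine_scores(las_t, las_p)
print(las)
```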
Further, the degree of association is calculated as:
MI(MAS_u, LAS) = \sum_{x \in MAS_u} \sum_{y \in LAS} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)}
where MI(MAS_u, LAS) is the degree of association between MAS_u and LAS, x is a value in MAS_u, y is a log anomaly score in LAS, p(x, y) is the joint probability distribution function of x and y, p(x) is the marginal probability distribution function of x, and p(y) is the marginal probability distribution function of y.
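The symbols defined here (a joint distribution p(x, y) and marginals p(x), p(y)) match the standard definition of mutual information, which can be estimated from two aligned discrete sequences by counting. The sketch below assumes that MAS_u and LAS have the same length and take discrete values; continuous anomaly scores would need to be binned first, a step the text does not describe.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) of two aligned discrete sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


# Hypothetical sequences: cluster labels of one metric vs. binned log scores.
mi_dependent = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
mi_independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
print(mi_dependent, mi_independent)
```

Sequences that move together score log 2 in this toy case, while unrelated ones score 0, which is why a larger value points to a metric more closely tied to the log anomalies.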
Further, the step S5 specifically includes:
After the degrees of association between the clustering results of all index data and the log anomaly score sequence have been calculated, they are ranked from high to low; the greater the degree of association, the higher the index data ranks in the list and the more likely it is to be the abnormal root cause.
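The ranking in step S5 amounts to a descending sort. In the sketch below the metric names and association values are hypothetical placeholders for the per-metric degrees of association computed in step S4.

```python
def rank_root_causes(associations):
    """Sort (metric, association) pairs from highest to lowest association."""
    return sorted(associations.items(), key=lambda item: item[1], reverse=True)


# Hypothetical degrees of association for three metrics.
associations = {
    "cpu_utilization": 0.12,
    "memory_utilization": 0.47,
    "network_packet_loss": 0.05,
}
ranking = rank_root_causes(associations)
for position, (metric, score) in enumerate(ranking, start=1):
    print(position, metric, score)
```

The first entry in the ranking is the candidate most likely to be the abnormal root cause index.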
An abnormal root cause acquisition system based on index data and log data, comprising:
a data acquisition module, configured to acquire index data and log data of the micro-service system;
an index anomaly score calculation module, configured to calculate the index anomaly score sequence set MASS of the index data through the BIRCH clustering algorithm;
a log anomaly score calculation module, configured to calculate the log anomaly score sequence LAS of the log data through the DeepLog algorithm;
an association analysis module, configured to perform association analysis between the clustering result of each index data in the index anomaly score sequence set MASS and the log anomaly score sequence LAS to obtain a degree of association;
an abnormal root cause index acquisition module, configured to obtain the abnormal root cause index by ranking the degrees of association.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the terms first, second, third, etc. do not denote any order, but rather the terms first, second, third, etc. are used to interpret the terms as labels.
The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the invention, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.

Claims (8)

1. An abnormal root cause obtaining method based on index data and log data, comprising the steps of:
S1: acquiring index data and log data of a micro-service system;
S2: calculating an index anomaly score sequence set MASS of the index data through the BIRCH clustering algorithm;
S3: calculating a log anomaly score sequence LAS of the log data through the DeepLog algorithm;
S4: performing association analysis between the clustering result of each index data in the index anomaly score sequence set MASS and the log anomaly score sequence LAS to obtain a degree of association;
S5: obtaining the abnormal root cause index by ranking the degrees of association.
2. The method for obtaining an abnormal root cause based on index data and log data according to claim 1, wherein step S2 is specifically:
S21: normalizing the N acquired index data to convert them into an index vector M = {m_1, m_2, ..., m_N} within the range [0, 1];
S22: clustering each index data m_u in the index vector M separately through the BIRCH clustering algorithm, and taking the set of clustering results of all index data as the index anomaly score sequence set MASS = {MAS_1, MAS_2, ..., MAS_N}, where MAS_u is the clustering result of the u-th index data and u ranges from 1 to N.
3. The method for obtaining an abnormal root cause based on index data and log data according to claim 1, wherein step S3 is specifically:
S31: parsing the log data into a log key sequence and a parameter vector sequence according to log category;
S32: analyzing the log key sequence based on DeepLog to obtain a log key anomaly score sequence LAS_t;
S33: analyzing the parameter vector sequence based on DeepLog to obtain a parameter vector anomaly score sequence LAS_p;
S34: calculating the log anomaly score sequence LAS of the log data from the log key anomaly score sequence LAS_t and the parameter vector anomaly score sequence LAS_p.
4. The method for obtaining an abnormal root cause based on index data and log data according to claim 3, wherein step S32 is specifically:
S321: setting a first time window, and acquiring the log key set window_h = {k_{h-H}, k_{h-H+1}, ..., k_h} of the log key sequence within the first time window, where h is the time, H is the length of the first time window, and k_h is the h-th log key; predicting, through DeepLog, the log key k_{h+1} that follows the log key set at time h+1;
S322: calculating, through a standard multinomial logistic (softmax) function, the probability distribution set P = {k_1: p_1, k_2: p_2, ..., k_i: p_i, ..., k_g: p_g} of the log key k_{h+1}, where i is the index of a log key, p_i is the probability that the log key k_{h+1} is the log key k_i, and g is the number of log key types;
S323: if the true log key at time h+1 is k_i and p_i is smaller than Threshold, judging that the log execution path is abnormal, and setting the log key anomaly score at time h+1 to AS_{th} = Threshold - p_i; if p_i is not smaller than Threshold, judging that the log is normal, and setting the log key anomaly score at time h+1 to AS_{th} = 0;
S324: letting h = h + 1;
S325: repeating steps S321 to S324, and constructing the log key anomaly score sequence LAS_t from the anomaly scores of all log keys in the log key sequence.
5. The method for obtaining an abnormal root cause based on index data and log data according to claim 3, wherein step S33 is specifically:
S331: setting a second time window, and acquiring the parameter vector set e_q = {v_{q-Q}, v_{q-Q+1}, ..., v_q} of the parameter vector sequence within the second time window, where q is the index of a parameter vector, Q is the length of the second time window, and v_q is the q-th parameter vector; predicting from the parameter vector set e_q through DeepLog to obtain a predicted parameter vector set \hat{e}_{q+1}; calculating the parameter vector error z_{q+1} between \hat{e}_{q+1} and e_{q+1};
S332: modeling the parameter vector error z_{q+1} as a Gaussian distribution; if z_{q+1} lies within the high-confidence interval of the Gaussian distribution, judging that the parameter vector v_q is normal and setting its anomaly score AS_{pq} = 0; otherwise, judging that the parameter vector v_q is abnormal and setting its anomaly score AS_{pq} = 1;
S333: letting q = q + 1;
S334: repeating steps S331 to S333, and constructing the parameter vector anomaly score sequence LAS_p from the anomaly scores of all parameter vectors in the parameter vector sequence.
6. The method for obtaining an abnormal root cause based on index data and log data according to claim 3, wherein the log anomaly score sequence LAS is calculated from LAS_t and LAS_p by a formula in which w is a hyperparameter.
7. The method for obtaining an abnormal root cause based on index data and log data according to claim 1, wherein the calculation formula of the degree of association is:
wherein MI (MAS) u LAS) is MAS u Correlation with LAS, x is MAS u In (2), y is the log anomaly score in LAS, p (x, y) is the joint probability distribution function of x and y, p (x) is the edge probability distribution function of x, and p (y) is the edge probability distribution function of y.
8. An abnormal root cause acquisition system based on index data and log data, comprising:
the data acquisition module is used for acquiring index data and log data of the micro service system;
the index anomaly score calculation module is used for calculating an index anomaly score sequence set MASS of the index data through a BIRCH clustering algorithm;
the log anomaly score calculation module is used for calculating the log anomaly score sequence LAS of the log data through the DeepLog algorithm;
the association degree analysis module is used for carrying out association degree analysis between the clustering result of each index data in the index anomaly score sequence set MASS and the log anomaly score sequence LAS to obtain the degrees of association;
the abnormal root cause index acquisition module is used for obtaining the abnormal root cause index by ranking the degrees of association.
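How the last two modules of claim 8 fit together can be sketched as below. The association measure is passed in as a function so that any measure can be plugged in; the `agreement` function used in the example is a toy stand-in and not the formula of claim 7. The index names and all score values are invented for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Association = Callable[[Sequence[float], Sequence[float]], float]


def rank_root_cause_indices(
    index_scores: Dict[str, Sequence[float]],
    log_scores: Sequence[float],
    association: Association,
) -> List[Tuple[str, float]]:
    """Score every index anomaly sequence against LAS and sort by association, highest first."""
    degrees = {
        name: association(sequence, log_scores)
        for name, sequence in index_scores.items()
    }
    return sorted(degrees.items(), key=lambda item: item[1], reverse=True)


def agreement(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Toy association measure: fraction of time steps where both agree on anomalous or normal."""
    return sum((x > 0) == (y > 0) for x, y in zip(xs, ys)) / len(xs)


# Usage with invented data: MASS maps each index name to its anomaly score sequence.
mass = {
    "cpu_usage": [0, 1, 1, 0, 0, 1],
    "memory_usage": [0, 0, 0, 1, 0, 0],
    "disk_io": [1, 1, 1, 0, 0, 1],
}
las = [0, 1, 1, 0, 0, 1]
ranking = rank_root_cause_indices(mass, las, agreement)
print(ranking)
```

The first entries of `ranking` are the candidate root cause indices, since their anomaly scores follow the log anomaly scores most closely.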
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311417601.8A CN117149500B (en) 2023-10-30 2023-10-30 Abnormal root cause obtaining method and system based on index data and log data

Publications (2)

Publication Number Publication Date
CN117149500A true CN117149500A (en) 2023-12-01
CN117149500B CN117149500B (en) 2024-01-26

Family

ID=88899118

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311417601.8A Active CN117149500B (en) 2023-10-30 2023-10-30 Abnormal root cause obtaining method and system based on index data and log data

Country Status (1)

Country Link
CN (1) CN117149500B (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019063812A1 (en) * 2017-09-29 2019-04-04 Siemens Aktiengesellschaft Method and device for detecting abnormalities of discrete production equipment
US20210026722A1 (en) * 2019-07-25 2021-01-28 International Business Machines Corporation Detecting and responding to an anomaly in an event log
CN113014421A (en) * 2021-02-08 2021-06-22 武汉大学 Micro-service root cause positioning method for cloud native system
US20210192586A1 (en) * 2019-12-20 2021-06-24 Cintra Holding US Corp. Systems and Methods for Detecting and Responding to Anomalous Traffic Conditions
CN113282635A (en) * 2021-04-12 2021-08-20 国电南瑞科技股份有限公司 Micro-service system fault root cause positioning method and device
CN113312447A (en) * 2021-03-10 2021-08-27 天津大学 Semi-supervised log anomaly detection method based on probability label estimation
CN114201326A (en) * 2021-12-02 2022-03-18 中国神华国际工程有限公司 Micro-service abnormity diagnosis method based on attribute relation graph
CN114598539A (en) * 2022-03-16 2022-06-07 京东科技信息技术有限公司 Root cause positioning method and device, storage medium and electronic equipment
CN115604082A (en) * 2022-10-19 2023-01-13 北银金融科技有限责任公司(Cn) Fault diagnosis system based on AIOps
US20230153825A1 (en) * 2019-12-20 2023-05-18 Capital One Services, Llc Transaction exchange platform with a validation microservice for validating transactions before being processed
US20230153826A1 (en) * 2019-12-20 2023-05-18 Capital One Services, Llc Detecting and preventing duplicate transactions on a transaction exchange platform
CN116418653A (en) * 2023-03-17 2023-07-11 圣麦克思智能科技(江苏)有限公司 Fault positioning method and device based on multi-index root cause positioning algorithm
CN116450399A (en) * 2023-06-13 2023-07-18 西华大学 Fault diagnosis and root cause positioning method for micro service system
CN116737436A (en) * 2023-05-17 2023-09-12 武汉大学 Root cause positioning method and system for micro-service system facing mixed deployment scene

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
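CSDN啊阿啊阿啊: "Implementation of applied research based on hierarchical clustering algorithms" (基于层次聚类算法应用研究的实现), Retrieved from the Internet <URL:https://blog.csdn.net/weixin_36217085/article/details/119016411> *
SCHOOL OF COMPUTING, UNIVERSITY OF UTAH: "DeepLog: Anomaly Detection and Diagnosis from System Logs through Deep Learning", PROCEEDINGS OF THE 2017 ACM SIGSAC CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY *
夏禹: "Research on log anomaly detection algorithms based on deep learning" (基于深度学习的日志异常检测算法研究), China Master's Theses Full-text Database, Information Science and Technology *
杨勇; 李影; 吴中海: "A survey of distributed tracing technology" (分布式追踪技术综述), Journal of Software (软件学报), no. 07 *
贾统; 李影; 吴中海: "A survey of fault diagnosis for distributed software systems based on log data" (基于日志数据的分布式软件系统故障诊断综述), Journal of Software (软件学报), no. 07 *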

Also Published As

Publication number Publication date
CN117149500B (en) 2024-01-26

Similar Documents

Publication Publication Date Title
Lan et al. Toward automated anomaly identification in large-scale systems
US8630962B2 (en) Error detection method and its system for early detection of errors in a planar or facilities
US8078913B2 (en) Automated identification of performance crisis
CN113282461B (en) Alarm identification method and device for transmission network
US20060188011A1 (en) Automated diagnosis and forecasting of service level objective states
CN110381079B (en) Method for detecting network log abnormity by combining GRU and SVDD
CN116450399B (en) Fault diagnosis and root cause positioning method for micro service system
Lim et al. Identifying recurrent and unknown performance issues
US20060010093A1 (en) System and method for continuous diagnosis of data streams
CN114185760A (en) System risk assessment method and device and charging equipment operation and maintenance detection method
CN115858794B (en) Abnormal log data identification method for network operation safety monitoring
CN114968727B (en) Database through infrastructure fault positioning method based on artificial intelligence operation and maintenance
CN114741369A (en) System log detection method of graph network based on self-attention mechanism
CN115237717A (en) Micro-service abnormity detection method and system
CN116361059A (en) Diagnosis method and diagnosis system for abnormal root cause of banking business
Chen et al. Exploiting local and global invariants for the management of large scale information systems
US11665185B2 (en) Method and apparatus to detect scripted network traffic
CN117149500B (en) Abnormal root cause obtaining method and system based on index data and log data
CN114816962B (en) ATTENTION-LSTM-based network fault prediction method
CN110808947A (en) Automatic vulnerability quantitative evaluation method and system
Febriansyah et al. Outlier detection and decision tree for wireless sensor network fault diagnosis
Liu et al. MTAD: Tools and Benchmarks for Multivariate Time Series Anomaly Detection
CN116302883A (en) Full-link pressure measurement monitoring method and system
CN117669594B (en) Big data relation network analysis method and system for abnormal information
Landolfi et al. Cloud telemetry modeling via residual gauss-markov random fields

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant