CN114201326A - Micro-service abnormity diagnosis method based on attribute relation graph - Google Patents


Info

Publication number
CN114201326A
Authority
CN
China
Prior art keywords
node
service
calling
graph
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111461547.8A
Other languages
Chinese (zh)
Inventor
何明栋
曹阳
王宝会
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Shenhua International Engineering Co ltd
Original Assignee
China Shenhua International Engineering Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Shenhua International Engineering Co ltd
Priority to CN202111461547.8A
Publication of CN114201326A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751 Error or fault detection not based on redundancy
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention relates to a micro-service anomaly diagnosis method based on an attribute relation graph, and aims to address the poor robustness and poor diagnostic accuracy of existing algorithms when a micro-service system becomes abnormal. First, anomalies are detected in real time from the service call trace information collected by a monitoring agent service. Once an anomaly is found, a micro-service call topology graph is built from the call and deployment information at the time the anomaly occurred, depicting how the anomaly propagates through the micro-services in real time. Next, comprehensive monitoring information from before and after the anomaly is collected, personalized weight attributes of the nodes and edges of the graph are calculated with custom formulas, and a micro-service attribute relation graph is established. Finally, the nodes of the graph are scored with a PageRank-based algorithm to infer the most likely root-cause nodes. The invention thus detects micro-service anomalies in real time, builds the attribute relation graph automatically, and infers the degree of abnormality of each service node, thereby diagnosing micro-service anomalies.

Description

Micro-service abnormity diagnosis method based on attribute relation graph
Technical Field
The invention relates to an anomaly diagnosis method for micro-service software systems and belongs to the technical field of software.
Background
Monolithic and SOA architectures are the software architectures most commonly adopted by software companies. After more than a decade of development, such systems have become extremely complex, are hard to extend and maintain, and carry heavy technical debt. Competition on today's internet is fierce, and user requirements and market conditions change rapidly; under these conditions the extensibility and flexibility of traditional architectures are clearly insufficient, and the costs of design, development, testing, and operations rise markedly. The micro-service concept was proposed in response: a software architecture that treats a single application as a suite of small services, each running in its own process and communicating with the others over lightweight protocols. Micro-service architecture is well suited to agile development and continuous integration, relieves the pain points of traditional architectures, and has attracted wide attention and research in academia and industry.
After a software system is migrated to micro-services, maintainability and flexibility improve, but the dependencies between services become more complicated, and both the probability of faults and the losses they cause increase. For example, on a high-traffic website, latency in a single service component may exhaust all application resources and trigger a so-called avalanche effect, which in severe cases brings down the whole system. An effective monitoring system and rapid localization of fault causes are therefore among the key technologies for ensuring the reliability and performance of micro-services.
Existing work on micro-service fault diagnosis falls mainly into the following categories. (1) Diagnosis based on metric monitoring. System operating metrics such as CPU, memory, and network are collected to reflect the current state of the application and its trend over a period of time. If a metric exceeds a preset threshold, the system is considered to have a problem and an alarm is triggered; the administrator then resolves the problem using the monitoring data and personal experience. (2) Monitoring and analysis based on logs. Logs record the running state of the system clearly, are easy to persist and search, and are an effective means of finding fault causes and supporting further business goals. (3) Diagnosis based on distributed call trace information. A service call topology graph is built from the call traces, and a search algorithm is used to infer the root cause.
Fault diagnosis based on metric and log monitoring is simple to implement, but it cannot reflect the overall state of the system or track a service flow; faults are usually located only at the level of a service component, and with complex micro-service interactions an administrator spends a great deal of time searching for and locating problems. Algorithms based on distributed call traces build an incomplete topology graph, because server nodes that have only a deployment relation and no call relation do not appear in it; and because they use trace information alone, they are not robust and their diagnoses are inaccurate.
Disclosure of Invention
Technical problem solved by the invention: to overcome the shortcomings of the prior art by providing an efficient anomaly diagnosis system for micro-services. The topology graph is built automatically by parsing call trace information, so that the overall state of the system is reflected in real time and the extensibility of the system is improved. Personalized weights are calculated as a mixed weighting of three anomaly scores, which improves the robustness of the algorithm and makes the diagnosis more accurate.
The method monitors service calls transparently, processes call trace information in real time, and detects anomalies online. Based on the detection result, it automatically parses the call relations, combines them with the deployment relations, and builds a mixed-weight attribute relation graph from multiple monitoring metrics, including the proportion of failed calls. Nodes that have only a deployment relation can therefore also be represented in the graph, which improves the extensibility of the system, and inference over the attribute graph locates the root cause of an anomaly at the service level.
The technical solution adopted by the invention comprises the following steps:
Step one, real-time anomaly detection on the data stream: call trace information is collected by the monitoring agent service and processed, anomaly detection is performed on the resulting time series with an online clustering algorithm, and the time point at which the anomaly occurred is determined.
The call trace information mainly comprises the following fields:
(callType, startTime, elapsedTime, success, traceId, id, pid, cmdb_id, serviceName), where callType is the call type, startTime is the call start time, elapsedTime is the duration of this call, success indicates whether the call succeeded, traceId is the call id of a complete request, id is the id of the current call, pid is the id of the parent node, cmdb_id is the id of the network element to which the service belongs, and serviceName is the service name. The specific steps are as follows:
Step 101, obtain all call trace data of a single request by its call id, and subtract the time consumed by the child nodes from the time of the parent node to obtain the execution time of the node itself;
Step 102, apply a 30-second time window to the call times between two micro-service child nodes, and take the median of the call times in each window as a single data point of a time series, thereby obtaining a set of call time series;
Step 103, perform online real-time anomaly detection on each call time series with the BIRCH online clustering algorithm to determine the time point at which the anomaly occurred;
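Steps 101 and 102 can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the dictionary layout of a span, the function names `self_times` and `windowed_median`, and the use of fixed (non-overlapping) 30-second windows are all assumptions made for the example; only the field names `id`, `pid`, and `elapsedTime` come from the text.

```python
from collections import defaultdict
from statistics import median


def self_times(spans):
    """Step 101 (sketch): execution time of each span in ONE trace.

    `spans` is a list of dicts with the keys id, pid, elapsedTime
    (hypothetical in-memory layout). The self time of a span is its
    elapsed time minus the elapsed time of its direct children.
    """
    child_total = defaultdict(float)
    for span in spans:
        if span["pid"] is not None:
            child_total[span["pid"]] += span["elapsedTime"]
    return {s["id"]: s["elapsedTime"] - child_total[s["id"]] for s in spans}


def windowed_median(samples, window_s=30):
    """Step 102 (sketch): one median value per 30-second window.

    `samples` is an iterable of (start_time_in_seconds, call_time) pairs.
    Fixed windows are an assumption of this sketch.
    """
    buckets = defaultdict(list)
    for start, value in samples:
        buckets[int(start // window_s)].append(value)
    return [(w * window_s, median(vs)) for w, vs in sorted(buckets.items())]


trace = [
    {"id": "a", "pid": None, "elapsedTime": 100},
    {"id": "b", "pid": "a", "elapsedTime": 30},
    {"id": "c", "pid": "a", "elapsedTime": 50},
]
print(self_times(trace))
print(windowed_median([(0, 10), (10, 30), (20, 20), (35, 5), (50, 7)]))
```

Taking the median rather than the mean keeps a single very slow call inside a window from dominating that window's data point.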
Step two, building the real-time topology graph: for the anomaly time point determined in step one, the parent and child service nodes involved in calls during the current period are determined by parsing the call trace data at the time of the anomaly, and a real-time topology graph that includes servers, containers, and databases is built by combining this with the deployment information of the services;
Step 201, parse the call trace time series at the anomaly time point determined in step one. Each series is a key-value pair of the form "parent node-child node: call time". Split each key into its parent and child service nodes, then read the deployment configuration file to obtain the deployment server of each parent and child service node, yielding all the nodes used to build the topology graph.
Step 202, for all nodes of the topology graph obtained in step 201: if two nodes are in a call relation, the parent node is the out-degree node of the edge, the child node is the in-degree node of the edge, and the edge points from the parent node to the child node; if two nodes are in a deployment relation, the node hosting the service is the out-degree node of the edge, the deployment server node is the in-degree node of the edge, and the edge points from the service node to the deployment server.
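Steps 201 and 202 amount to building a directed graph with two kinds of edges. The sketch below assumes a hypothetical in-memory shape for the inputs: `call_series` maps "parent-child" keys to call times, and `deployment` maps a service name to its server, standing in for the deployment configuration file. It also assumes service names do not contain the separator character.

```python
def build_topology(call_series, deployment, sep="-"):
    """Return (nodes, edges) for the real-time topology graph (sketch).

    Edges are (source, target, kind) triples:
      - kind "call":   parent service -> child service
      - kind "deploy": service        -> deployment server
    """
    edges = set()
    # Step 201: split each "parent-child" key into its two service nodes.
    for key in call_series:
        parent, child = key.split(sep, 1)
        edges.add((parent, child, "call"))
    services = {name for src, dst, _ in edges for name in (src, dst)}
    # Step 202: add a deployment edge from each service to its server.
    for service in sorted(services):
        server = deployment.get(service)
        if server is not None:
            edges.add((service, server, "deploy"))
    nodes = services | {dst for _, dst, kind in edges if kind == "deploy"}
    return nodes, edges


nodes, edges = build_topology(
    {"frontend-orders": [12.0, 15.0], "orders-payment": [8.0]},
    {"orders": "server1", "payment": "server2"},
)
print(sorted(nodes))
print(sorted(edges))
```

Because deployment servers enter the graph through their own edge type, a server that never appears in any call trace is still a node that later scoring can reach.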
Step three, calculating personalized weights and building the attribute relation graph: after the anomaly time point is determined in step one, monitoring data from a period before and after the anomaly is collected, personalized weights are calculated according to the formulas, and the attribute relation graph is established;
Step 301, node weights of the topology graph: construct for each node of the topology graph a feature vector N = (node_on-off, node_s-connect, node_network, node_CPU, node_memory), where node_on-off is the database monitoring switch, node_s-connect is the maximum number of database connections, node_network is the network traffic, node_CPU is the CPU usage of the container or server, and node_memory is the memory usage of the container or server, and define the corresponding metric weight formula:
Figure BDA0003388964180000031
where $\lambda_i$ represents the similarity of the metric feature vectors before and after the anomaly,
Figure BDA0003388964180000032
The cosine similarity is computed between the mean of the feature vectors in the 5 minutes before the anomaly and the mean in the 5 minutes after it; the larger the similarity, the lower the degree of abnormality. $T_i$ is the attenuation coefficient. After a single anomaly such as a network failure, the effect may show up in the metrics of several service nodes at different times, so a decaying sequence of parameters is defined such that the earlier an anomaly appears, the higher its anomaly score.
Step 302, weights of the edges connecting the nodes of the topology graph: a mixed weighting of the service call time over a period after the anomaly, the resource utilization of the container or server to which the service belongs, the call failure rate over a period, and other information is used to define the weight formula $W_{ij} = c_1 S_t + c_2 S_m + c_3 S_f$, where:
① $S_t$ is the call delay anomaly score, $S_t = \max(-\log P_x)$, where $P_x$ is a kernel density estimate. The density function $P_x$ of the call time series is estimated from normal data, and the delay anomaly score is then obtained by evaluating it on the call times of the anomalous period;
② $S_m$ is the server resource utilization metric,
Figure BDA0003388964180000033
whose parameters have the same meaning and are calculated in the same way as in step 301;
③ $S_f$ is the failed-call anomaly score, defined by the following formula, where n is the number of failed calls and m is the total number of calls;
Figure BDA0003388964180000041
The coefficients $c_1$, $c_2$, $c_3$ control how strongly each metric influences the different types of nodes from step two, such as server nodes, container nodes, and database nodes. For servers that have only a deployment relation, the resource utilization feature matters more; for database nodes, the call failure proportion matters more.
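The call delay score $S_t = \max(-\log P_x)$ from step 302 can be illustrated with a small kernel density estimate. The text does not name the kernel or the bandwidth, so the Gaussian kernel, the `bandwidth` value, and the `floor` guard against taking the logarithm of zero are assumptions of this sketch, as are the function names.

```python
import math


def gaussian_kde(normal_samples, bandwidth):
    """Return a density function estimated from normal-period call times.

    A Gaussian kernel is assumed; the source only says "kernel density
    estimate".
    """
    n = len(normal_samples)
    norm = n * bandwidth * math.sqrt(2.0 * math.pi)

    def density(x):
        total = sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in normal_samples
        )
        return total / norm

    return density


def delay_anomaly_score(normal_samples, abnormal_samples, bandwidth=5.0,
                        floor=1e-300):
    """S_t = max(-log P_x) over the call times of the anomalous period."""
    p = gaussian_kde(normal_samples, bandwidth)
    # `floor` avoids log(0) when a call time is far outside the normal data.
    return max(-math.log(max(p(x), floor)) for x in abnormal_samples)


normal = [100, 102, 98, 101, 99]
near = delay_anomaly_score(normal, [100])
far = delay_anomaly_score(normal, [100, 400])
print(near, far)
```

A call time that is unlikely under the normal-period density yields a large negative log-likelihood, so the maximum over the anomalous period is driven by the most unusual call.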
Step four, diagnosing the root cause: for the attribute relation graph established in step three, the PageRank algorithm is used to evaluate the degree of abnormality of each service on the graph, producing a ranked list of the most likely root-cause nodes.
The principle of the invention is as follows. Given the real-time requirements of an operations system, anomaly detection on the data stream uses unsupervised online clustering. Because micro-service systems are multi-language, multi-node, and dynamic, once an anomaly occurs the call relations currently being executed are determined by parsing the parent-child relations among the call traces, and a real-time system topology graph is built by combining them with the deployment information of the services, so that the propagation of the anomaly through the system is described accurately. Since an anomaly can show up in several monitoring metrics at once, the invention uses mixed weighting to evaluate the degree of abnormality on the graph and build the attribute relation graph, and finally applies the PageRank algorithm to score the abnormality of each service and find the service node that caused the anomaly.
Compared with the prior art, the method has the following advantages. First, it proposes a new anomaly evaluation metric, the failure-proportion anomaly feature, which makes the characterization of anomalies richer and more accurate. Second, it evaluates the anomaly score between adjacent nodes of the topology graph by mixed weighting. Compared with earlier methods that compute the anomaly score from call time alone, servers that have only a deployment relation and no direct interface calls can now appear in the topology graph, which makes topology-based diagnosis more capable; and because several metrics are combined by mixed weighting, the diagnosis is more accurate and more robust.
Drawings
FIG. 1 is the overall flowchart of the micro-service anomaly diagnosis method based on an attribute relation graph according to the invention;
FIG. 2 shows the environment in which an embodiment of the method is used.
Detailed Description
The present invention will be described in detail below with reference to specific embodiments and the accompanying drawings.
FIG. 2 shows the environment in which the method of the embodiment is used. The target micro-service application is Sock-Shop, with Kubernetes as the underlying runtime environment and service instances deployed on pods: each of the 10 core services has one instance, the MongoDB service has three instances, and MySQL has one instance. Each pod runs a monitoring Agent that records service call information and metric changes within the service. The load generator simulates user requests to generate load; the fault injector injects faults into the system through preset scripts in order to test the effectiveness of the fault diagnosis system; and the fault diagnosis system performs fault diagnosis on the collected data. The method provided by the invention is implemented in the fault diagnosis system.
As shown in FIG. 1, the method of the embodiment proceeds as follows:
Step one, collect call trace information between all child nodes of the micro-service through the deployed monitoring Agent. The call trace information mainly comprises (callType, startTime, elapsedTime, success, traceId, id, pid, cmdb_id, serviceName), where callType is the call type, startTime is the call start time, elapsedTime is the duration of this call, success indicates whether the call succeeded, traceId is the call id of a complete request, id is the id of the current call, pid is the id of the parent node, cmdb_id is the id of the network element to which the service belongs, and serviceName is the service name. Process the call trace information by subtracting the call times of all child nodes from the call time of the parent node, reduce noise in the data by taking the median over a 30-second sliding window, perform real-time anomaly detection on the processed time series with an online clustering algorithm, and determine the time point at which the anomaly occurred.
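The online clustering step can be illustrated with a deliberately simplified stand-in. The sketch below is not BIRCH: it keeps only a flat list of one-dimensional clustering features (count, linear sum, squared sum), which is the summary BIRCH stores per subcluster, and it omits the CF tree, branching factor, and global clustering phase. The class names, the `threshold` value, and the rule "a point that opens a new cluster is flagged" are assumptions made for the example.

```python
import math


class CFCluster:
    """Clustering feature (n, linear sum, squared sum) of a 1-D cluster."""

    def __init__(self, x):
        self.n = 1
        self.ls = x
        self.ss = x * x

    def centroid(self):
        return self.ls / self.n

    def radius_if_added(self, x):
        """Radius (standard deviation) the cluster would have with x added."""
        n = self.n + 1
        ls = self.ls + x
        ss = self.ss + x * x
        return math.sqrt(max(ss / n - (ls / n) ** 2, 0.0))

    def add(self, x):
        self.n += 1
        self.ls += x
        self.ss += x * x


class OnlineDetector:
    """Flag a data point when no existing cluster can absorb it (sketch)."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.clusters = []

    def update(self, x):
        if self.clusters:
            nearest = min(self.clusters, key=lambda c: abs(c.centroid() - x))
            if nearest.radius_if_added(x) <= self.threshold:
                nearest.add(x)
                return False
        self.clusters.append(CFCluster(x))
        # The very first point only initialises the model.
        return len(self.clusters) > 1


detector = OnlineDetector(threshold=10.0)
flags = [detector.update(x) for x in [100, 102, 98, 101, 300, 99]]
print(flags)
```

Each point is processed once and only the three sums per cluster are stored, which is what makes this style of clustering usable on a stream. With this simple rule a sustained shift is flagged only at its first point, which matches the goal of locating the time at which the anomaly began.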
Step two, based on the anomaly time point detected in step one, determine the call relations among the sub-service nodes of the micro-service and the deployment relations between sub-service nodes and servers by parsing the call trace information and combining it with the deployment information of the micro-service. Nodes of the topology graph represent sub-service nodes, edges between nodes represent call or deployment relations, and the real-time topology graph of the micro-service application system is constructed automatically.
Step three, based on the anomaly time point detected in step one, collect monitoring data from a period before and after the anomaly: query the monitoring data for the 5 minutes before and the 5 minutes after the anomaly, including the resource utilization of servers or containers and the service call traces. Calculate the weight attribute of each node of the topology graph by constructing the feature vector N = (node_on-off, node_s-connect, node_network, node_CPU, node_memory), where node_on-off is the database monitoring switch, node_s-connect is the maximum number of database connections, node_network is the network traffic, node_CPU is the CPU usage of the container or server, and node_memory is the memory usage of the container or server, and compute the similarity of the metric feature vectors before and after the anomaly:
Figure BDA0003388964180000051
where $r_i$ is the feature vector of the i-th data point after the anomaly and
Figure BDA0003388964180000052
is the mean of the feature vectors of the 5 data points in the 5 minutes before the anomaly. The cosine similarity is computed between the mean of the feature vectors in the 5 minutes before the anomaly and the mean in the 5 minutes after it. Then, according to the metric weight formula:
Figure BDA0003388964180000053
calculate the weight attribute of each node, where $T_i$ is the attenuation coefficient of the i-th data point after the anomaly and n is the number of data points. The coefficients are chosen according to the data collection frequency and the number of vectors after processing; here [0.95, 0.85, 0.75, 0.65, 0.55] is used. Next, calculate the edge weights of the topology graph with the formula $W_{ij} = c_1 S_t + c_2 S_m + c_3 S_f$, where $S_t$ is the call trace anomaly score, $S_m$ is the resource utilization anomaly score of the container or server, $S_f$ is the failed-call proportion anomaly score, and the weight coefficients $c_1$, $c_2$, $c_3$ are determined by the type of node. For general server, container, and database nodes, $c_1 = c_2 = c_3 = 0.33$; for deployment server nodes, which have no direct service calls, $c_1 = 1$, $c_2 = c_3 = 0$.
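The weighting in step three can be sketched as follows. The node-weight formula itself appears only as a figure that is not reproduced in this text, so the combination used in `node_weight`, the mean of $T_i (1 - \lambda_i)$, is an assumption chosen so that lower similarity and earlier data points give a higher score; it should not be read as the patent's exact formula. The attenuation coefficients and the values of $c_1$, $c_2$, $c_3$ are taken from the text; the function names are hypothetical.

```python
import math

# Attenuation coefficients T_i quoted in the embodiment.
DECAY = [0.95, 0.85, 0.75, 0.65, 0.55]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_vector(vectors):
    """Component-wise mean of a list of feature vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def node_weight(before, after, decay=DECAY):
    """ASSUMED form: mean over i of T_i * (1 - lambda_i).

    lambda_i is the cosine similarity between the i-th post-anomaly
    feature vector r_i and the mean pre-anomaly feature vector.
    """
    baseline = mean_vector(before)
    terms = [t * (1.0 - cosine(r, baseline)) for t, r in zip(decay, after)]
    return sum(terms) / len(terms)


def edge_weight(s_t, s_m, s_f, coeffs=(0.33, 0.33, 0.33)):
    """W_ij = c1*S_t + c2*S_m + c3*S_f, coefficients as given in the text."""
    c1, c2, c3 = coeffs
    return c1 * s_t + c2 * s_m + c3 * s_f


# Feature order: on-off, s-connect, network, CPU, memory (illustrative values).
before = [[1, 0, 10, 0.2, 0.3]] * 5
after_same = [[1, 0, 10, 0.2, 0.3]] * 5
after_diff = [[1, 0, 10, 0.9, 0.3]] * 5
print(node_weight(before, after_same), node_weight(before, after_diff))
print(edge_weight(1.0, 2.0, 3.0))
```

Cosine similarity on raw values is dominated by the largest-magnitude feature, so in practice the features would likely need to be brought to a comparable scale first; the text does not say how this is done.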
Step four, for the attribute relation graph established in step three, locate the abnormal service node with the PageRank algorithm. Initially, the anomaly weight of each service node is used as the initial PR value of that service, and $P = [P_0, P_1, \ldots, P_n]^T$ is the column vector of the initial PR values of the services. The formula
Figure BDA0003388964180000061
is used to calculate the PR value of each service, where q is the damping coefficient, typically 0.85, $I(p_j)$ is the set of micro-service child nodes that point to $p_j$, $O(p_j)$ is the set of micro-service child nodes that $p_j$ points to, and $P_k(p_i)$ is the score of service $p_i$ at the k-th iteration. The iteration ends once $P_k(p_i)$ has converged, that is, when $|P_k - P_{k-1}| < \delta$. The services are then ranked by anomaly score, and the service with the highest score is the one most likely to have caused the anomaly.
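The iteration in step four can be sketched with the standard PageRank update. The patent's own update formula appears only as a figure that is not reproduced in this text, so the textbook form used here, the even redistribution of rank from nodes without outgoing edges, the normalisation of the initial values, and the L1 convergence test are assumptions of this sketch. The damping coefficient q = 0.85 and the stopping rule $|P_k - P_{k-1}| < \delta$ come from the text.

```python
def pagerank(nodes, edges, init, q=0.85, delta=1e-6, max_iter=100):
    """Standard PageRank over directed (source, target) edges (sketch).

    `init` maps each node to its anomaly weight, used as the initial PR
    value after normalisation.
    """
    nodes = sorted(nodes)
    outgoing = {n: [] for n in nodes}
    incoming = {n: [] for n in nodes}
    for src, dst in edges:
        outgoing[src].append(dst)
        incoming[dst].append(src)

    total = sum(init[n] for n in nodes)
    count = len(nodes)
    rank = {n: (init[n] / total if total else 1.0 / count) for n in nodes}

    for _ in range(max_iter):
        # Rank held by nodes with no outgoing edges is spread evenly.
        dangling = sum(rank[n] for n in nodes if not outgoing[n])
        new_rank = {}
        for n in nodes:
            flow = sum(rank[m] / len(outgoing[m]) for m in incoming[n])
            new_rank[n] = (1.0 - q) / count + q * (flow + dangling / count)
        change = sum(abs(new_rank[n] - rank[n]) for n in nodes)
        rank = new_rank
        if change < delta:  # |P_k - P_{k-1}| < delta
            break
    return rank


scores = pagerank(
    nodes={"frontend", "orders", "db"},
    edges=[("frontend", "orders"), ("orders", "db")],
    init={"frontend": 1.0, "orders": 1.0, "db": 1.0},
)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Because edges point from caller to callee and from service to server, rank flows toward the services and servers that others depend on, which is why a downstream dependency tends to appear at the top of the list.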
In short, the method detects service anomalies in real time by monitoring the stream of service call trace data, constructs the service topology graph from the service call relations and deployment information in the anomalous period, calculates mixed weights for the nodes and edges of the topology graph from the call trace information, resource utilization information, and failure proportion observed over a period of service monitoring, establishes the attribute relation graph describing how the anomaly propagates, and finally infers the most likely root-cause node with the PageRank algorithm. The invention thus detects micro-service anomalies in real time, builds the attribute relation graph automatically, and infers the degree of abnormality of each service node, thereby diagnosing micro-service anomalies.

Claims (4)

1. A micro-service anomaly diagnosis method based on an attribute relation graph, characterized by comprising the following steps:
step one, collecting call trace information through a monitoring agent service, processing the call trace information, performing real-time anomaly detection on the processed time series with an online clustering algorithm, and determining the time point at which the anomaly occurred;
step two, based on the anomaly time point detected in step one, determining the call relations among the sub-service nodes of the micro-service and the deployment relations between sub-service nodes and servers by parsing the call trace information and combining it with the deployment information of the micro-service, representing sub-service nodes by nodes of the topology graph, representing call or deployment relations by edges between the nodes, and automatically constructing the real-time topology graph of the micro-service application system;
step three, based on the anomaly time point detected in step one, collecting monitoring data from a period before and after the anomaly, calculating personalized anomaly weights, and establishing the attribute relation graph;
step four, for the attribute relation graph established in step three, evaluating the degree of abnormality of each micro-service child node on the graph with the PageRank algorithm, obtaining a ranked list of the most likely root-cause nodes, and diagnosing the root cause of the anomaly.
2. The micro-service anomaly diagnosis method based on an attribute relation graph according to claim 1, characterized in that: in step one, when the call trace information is processed, a 30-second time window is applied to the call times between two micro-service child nodes, and the median of the call times in each window is taken as a single data point of a time series; taking the median reduces noise in the data and improves data quality.
3. The micro-service anomaly diagnosis method based on an attribute relation graph according to claim 1, characterized in that the topology graph of step two is built as follows:
step 201, parsing the call trace time series at the anomaly time point determined in step one, each series being a key-value pair of the form "parent node-child node: call time", splitting each key into its parent and child service nodes, reading the configuration file, and obtaining the deployment server of each parent and child service node, thereby obtaining all the nodes used to build the topology graph;
step 202, for all nodes of the topology graph obtained in step 201: if two nodes are in a call relation, the parent node is the out-degree node of the edge, the child node is the in-degree node of the edge, and the edge points from the parent node to the child node; if two nodes are in a deployment relation, the node hosting the service is the out-degree node of the edge, the deployment server node is the in-degree node of the edge, and the edge points from the service node to the deployment server.
4. The micro-service anomaly diagnosis method based on an attribute relation graph according to claim 1, characterized in that: in step three, weights are calculated for the sub-service nodes and edges of the micro-service, the weights being divided into node weights and edge weights, with the following specific calculation steps:
step 301, calculating the node weights of the topology graph by constructing for each node a feature vector N = (node_on-off, node_s-connect, node_network, node_CPU, node_memory), where node_on-off is the database monitoring switch, node_s-connect is the maximum number of database connections, node_network is the network traffic, node_CPU is the CPU usage of the container or server, and node_memory is the memory usage of the container or server, and defining the corresponding metric weight formula:
Figure FDA0003388964170000021
where $\lambda_i$ is the similarity of the metric feature vectors before and after the anomaly and $T_i$ is the attenuation coefficient, the degree-of-abnormality attribute of the node being calculated by this formula;
step 302, calculating the weights of the edges connecting the nodes of the topology graph, the personalized weight being a mixed weighting of three anomaly scores, namely the service call time over a period before and after the anomaly, the resource utilization of the container or server to which the service belongs, and the call failure rate over a period, with the weight formula $W_{ij} = c_1 S_t + c_2 S_m + c_3 S_f$, calculated as follows:
$S_t$ is the call delay anomaly score, with formula $S_t = \max(-\log P_x)$, where $P_x$ is a kernel density estimate;
$S_m$ is the server resource utilization metric, calculated as
Figure FDA0003388964170000022
$S_f$ is the failed-call anomaly score, a new anomaly evaluation metric proposed by the invention, calculated as follows, where n is the number of failed calls and m is the total number of calls;
Figure FDA0003388964170000023
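The edge weight of step 302 can be illustrated with a minimal Python sketch. It covers only the two formulas that survive as text in the claim: the call delay score S_t = max(-log P_x) and the mixed weight w_ij = c1*S_t + c2*S_m + c3*S_f. Everything else is an assumption made for illustration: the Gaussian kernel, the fixed bandwidth, the density floor, the equal default coefficients, and the function names are hypothetical and not taken from the patent. The formulas for S_m and S_f are given only as figures, so the sketch treats both scores as precomputed inputs.

```python
import math


def gaussian_kde_pdf(samples, bandwidth):
    """Build a density estimate P(x) from baseline latency samples.

    Assumption: Gaussian kernel with a fixed bandwidth; the claim only
    says that P_x is a kernel density estimation function.
    """
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def pdf(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return pdf


def latency_score(baseline, observed, bandwidth=5.0, floor=1e-300):
    """S_t = max(-log P_x) over the observed call latencies.

    The floor (an assumption) avoids log(0) when the density underflows.
    """
    pdf = gaussian_kde_pdf(baseline, bandwidth)
    return max(-math.log(max(pdf(x), floor)) for x in observed)


def edge_weight(s_t, s_m, s_f, c=(1 / 3, 1 / 3, 1 / 3)):
    """w_ij = c1*S_t + c2*S_m + c3*S_f.

    s_m and s_f are passed in as already computed scores, and the equal
    default coefficients are hypothetical.
    """
    c1, c2, c3 = c
    return c1 * s_t + c2 * s_m + c3 * s_f


# Hypothetical latencies in milliseconds for one calling edge.
baseline = [100, 102, 98, 101, 99, 103, 97]
normal_score = latency_score(baseline, [100, 101])
anomalous_score = latency_score(baseline, [100, 180])
weight = edge_weight(anomalous_score, s_m=0.6, s_f=0.3)
```

Because the score takes the maximum of -log P_x, a single latency far from the baseline distribution is enough to raise S_t, and therefore the weight of that calling edge.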
CN202111461547.8A 2021-12-02 2021-12-02 Micro-service abnormity diagnosis method based on attribute relation graph Pending CN114201326A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111461547.8A CN114201326A (en) 2021-12-02 2021-12-02 Micro-service abnormity diagnosis method based on attribute relation graph

Publications (1)

Publication Number Publication Date
CN114201326A (en) 2022-03-18

Family

ID=80650220

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111461547.8A Pending CN114201326A (en) 2021-12-02 2021-12-02 Micro-service abnormity diagnosis method based on attribute relation graph

Country Status (1)

Country Link
CN (1) CN114201326A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114461437A (en) * 2022-04-11 2022-05-10 中航信移动科技有限公司 Data processing method, electronic equipment and storage medium
CN114461437B (en) * 2022-04-11 2022-06-10 中航信移动科技有限公司 Data processing method, electronic equipment and storage medium
CN115033477A (en) * 2022-06-08 2022-09-09 山东省计算中心(国家超级计算济南中心) Large-scale micro-service-oriented active performance anomaly detection and processing method and system
CN115033477B (en) * 2022-06-08 2023-06-27 山东省计算中心(国家超级计算济南中心) Performance abnormality active detection and processing method and system for large-scale micro-service
CN117149500A (en) * 2023-10-30 2023-12-01 安徽思高智能科技有限公司 Abnormal root cause obtaining method and system based on index data and log data
CN117149500B (en) * 2023-10-30 2024-01-26 安徽思高智能科技有限公司 Abnormal root cause obtaining method and system based on index data and log data

Similar Documents

Publication Publication Date Title
CN109933452B (en) Micro-service intelligent monitoring method facing abnormal propagation
CN114201326A (en) Micro-service abnormity diagnosis method based on attribute relation graph
Zhang et al. Ensembles of models for automated diagnosis of system performance problems
JP5699206B2 (en) System and method for determining application dependent paths in a data center
Liu et al. Microhecl: High-efficient root cause localization in large-scale microservice systems
Hu et al. Web service recommendation based on time series forecasting and collaborative filtering
JP5380528B2 (en) Ranking the importance of alarms for problem determination within large-scale equipment
US20100324869A1 (en) Modeling a computing entity
CN112698975A (en) Fault root cause positioning method and system of micro-service architecture information system
US20060188011A1 (en) Automated diagnosis and forecasting of service level objective states
US7782792B2 (en) Apparatus and methods for determining availability and performance of entities providing services in a distributed system using filtered service consumer feedback
US20180121275A1 (en) Method and apparatus for detecting and managing faults
US8204719B2 (en) Methods and systems for model-based management using abstract models
WO2010044797A1 (en) Performance analysis of applications
CN113900845A (en) Method and storage medium for micro-service fault diagnosis based on neural network
CN114528175A (en) Micro-service application system root cause positioning method, device, medium and equipment
CN115118621B (en) Dependency graph-based micro-service performance diagnosis method and system
CN111884859B (en) Network fault diagnosis method and device and readable storage medium
CN113467421B (en) Method for acquiring micro-service health status index and micro-service abnormity diagnosis method
KR20190096706A (en) Method and Apparatus for Monitoring Abnormal of System through Service Relevance Tracking
CN116719664B (en) Application and cloud platform cross-layer fault analysis method and system based on micro-service deployment
CN113392893A (en) Method, device, storage medium and computer program product for positioning service fault
CN111769974A (en) Cloud system fault diagnosis method
Toka et al. Predicting cloud-native application failures based on monitoring data of cloud infrastructure
US20060287739A1 (en) Method and apparatus of capacity learning for computer systems and applications

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination