CN117560275A - Root cause positioning method and device for micro-service system based on graph neural network model - Google Patents


Info

Publication number
CN117560275A
Authority
CN
China
Legal status
Granted
Application number
CN202311854026.8A
Other languages
Chinese (zh)
Other versions
CN117560275B (en)
Inventor
袁水平
余螯
朱雨涵
张泽锟
王健
Current Assignee
Anhui Sigao Intelligent Technology Co ltd
Original Assignee
Anhui Sigao Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Anhui Sigao Intelligent Technology Co ltd
Priority to CN202311854026.8A
Publication of CN117560275A
Application granted
Publication of CN117560275B
Legal status: Active (current)
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0677 Localisation of faults
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/16 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to a root cause positioning method and device for a micro-service system based on a graph neural network model. The method comprises: constructing a graph neural network model and training it with historical multi-dimensional time-series performance metrics of faults to obtain a trained graph neural network model; constructing an instance-level heterogeneous topology graph of the micro-service system from the collected real-time micro-service topology and call relationships; adjusting the anomaly weight of each micro-service node in combination with the service request links; and inputting the root cause candidate set and the real-time metric feature data of the anomaly time window into the graph neural network model, which yields the final root cause and its anomaly type after feature weighting. The device is used to implement the method. The beneficial effects of the invention are that anomalies of the micro-service system can be detected quickly and accurately, with the localization granularity refined to the instance level, and that, by effectively combining a machine learning model with a dynamic graph computation method, the method adapts well to dynamic changes in the micro-service system.

Description

Root cause positioning method and device for micro-service system based on graph neural network model
Technical Field
The invention relates to the field of fault positioning of server systems, in particular to a root cause positioning method and device of a micro-service system based on a graph neural network model.
Background
With the development of the internet, cloud computing and the computer industry, more and more systems are designed and built on a micro-service architecture, which is widely used in practice, for example in large enterprise applications, internet-of-things applications and cloud services. A micro-service architecture gives a system high availability, high scalability and elastic scaling, so that it better meets the demands of today's large-scale software applications. In recent years, cloud-native software architecture has emerged as an approach to building and running applications that requires an application to be designed for the cloud environment in which it will run. Micro-services are one of the core elements of cloud-native architecture: the application is designed and built as a set of micro-services that communicate and interact through RESTful APIs, so that it can take full advantage of the cloud's high availability and elasticity and ultimately be deployed and run in the cloud as containers. Built on micro-services, cloud-native systems can scale out horizontally to a very large scale while remaining highly available and secure. However, guaranteeing the reliability and observability of large-scale micro-service systems, and locating the root-cause service when an anomaly occurs, still face many difficulties. Designing an effective method that automatically helps operations and maintenance personnel locate the root cause of a fault is therefore of great significance.
At present, micro-service root cause localization faces the following challenges. 1) The localization granularity is too coarse: current methods can essentially only localize the root cause to a micro-service, not to a micro-service instance. In real scenarios, however, it is often an anomaly in a particular micro-service instance, or in the container where that instance runs, that ultimately causes jitter, and when a micro-service has multiple instances, operators cannot tell which instance should be inspected or restarted. 2) Monitoring metrics are under-used: a monitoring system can collect not only micro-service-level metrics but also metrics of each micro-service instance, of the container in which it runs, and of the host on which that container runs; making full use of these multi-dimensional data makes it possible to localize the abnormal root cause at a finer granularity, namely the service-instance level. 3) The type of the root-cause anomaly is unclear: root cause localization is the first step when a micro-service system becomes abnormal, and identifying the type of the root-cause anomaly provides key information for the subsequent maintenance and repair process, yet existing root cause localization methods rarely study or discuss this aspect.
In a micro-service system, a service is a collection of service instances, which are the smallest units that carry and run the actual business processes. After the service receives the request, the request is routed to the designated service instance through a variety of different load balancing policies. Dynamic changes in service instances are frequent and difficult to predict, coupled with constraints on system resources, traffic size, and bearer capability, and different resource constraints on different instances are often the root cause of anomalies in single or multiple service instances.
Disclosure of Invention
To solve the problems in the prior art, the invention provides a root cause positioning method for a micro-service system based on a graph neural network model, comprising the following steps:
S1, collecting historical multi-dimensional time-series performance metrics of faults in the micro-service system;
S2, constructing a graph neural network model and training it with the historical multi-dimensional time-series performance metrics of faults to obtain a trained graph neural network model;
S3, constructing an instance-level heterogeneous topology graph of the micro-service system from the collected real-time micro-service topology and call relationships;
S4, adjusting the anomaly weight of each micro-service node in combination with the service request links;
S5, inputting the root cause candidate set and the real-time metric feature data of the anomaly time window into the graph neural network model, and obtaining the final root cause and its anomaly type after feature weighting.
A micro-service system root cause positioning device based on a graph neural network model, comprising: a processor and a storage device; the processor loads and executes instructions and data in the storage device, and the instructions and data are used for realizing the root cause positioning method of the micro-service system based on the graph neural network model.
The beneficial effects of the invention are as follows. With the method for locating the abnormal root cause of a micro-service system based on a time-series-node-sampling graph neural network model and a random walk algorithm, 1) anomalies of the micro-service system can be detected quickly and accurately, with the localization granularity refined to the instance level; and 2) by effectively combining a machine learning model with a dynamic graph computation method, the method adapts well to dynamic changes in the micro-service system.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of the graph neural network model design of an embodiment of the present invention;
FIG. 3 is a schematic diagram of the heterogeneous topology graph of an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of the device of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be further described with reference to the accompanying drawings.
Referring to FIG. 1, which is a schematic flow chart of the method, the invention provides a root cause positioning method for a micro-service system based on a graph neural network model, which specifically comprises the following steps:
S1, collecting historical multi-dimensional time-series performance metrics of faults in the micro-service system.
It should be noted that the multi-dimensional time-series performance metrics in step S1 include micro-service-level metrics, micro-service-instance-level metrics and host-level metrics.
As an embodiment, the micro-service-level metrics cover faults in the service-to-service (service mesh) relationships, namely network delay faults and network packet loss faults;
the service-instance-level metrics cover instance CPU high-load faults, memory high-load faults, instance network delay faults and abnormal instance termination faults;
the host-level metrics cover host CPU high-load faults, memory high-load faults, file read/write high-load faults, and the like.
S2, constructing a graph neural network model and training it with the historical multi-dimensional time-series performance metrics of faults to obtain a trained graph neural network model.
Referring to FIG. 2, FIG. 2 is a schematic diagram of the graph neural network model design of an embodiment of the invention.
Step S2 is specifically as follows:
S21, sampling the multi-dimensional time-series performance metrics at a fixed time interval to obtain sampling points, and inputting the sampling points into the encoder of the graph neural network model to obtain the metric data features of each sampling point.
Before being input to the encoder, the multi-dimensional metric data of each sampling point are normalized to values in the interval (0, 1). The encoder borrows the idea of a word-embedding model: it receives the metric data of a certain number of nodes, where the nodes play the role of words and the multi-dimensional micro-service metric data of each time-series node play the role of a word's feature vector. Because the dimension of the micro-service metric data depends on the micro-service topology at the current moment, a dimension conversion is required: the data are mapped to a unified feature dimension that represents each time-series node for the subsequent training of the graph neural network model.
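Step S21 can be sketched as follows. This is a minimal illustration, not the patented encoder: it assumes the metrics arrive as a uniformly scraped (time x metric) matrix, so fixed-interval sampling reduces to keeping every `step`-th row; it scales each metric into [0, 1] with min-max normalization (a constant metric maps to 0); and it stands in for the word-embedding-style encoder with a hypothetical `PaddedLinearEncoder` that zero-pads a variable-length metric vector to a maximum length and projects it to a unified dimension.

```python
import random
from typing import List, Sequence


def downsample(rows: Sequence[Sequence[float]], step: int) -> List[List[float]]:
    """Keep every `step`-th row of a (time x metric) matrix, i.e. sample at a fixed interval."""
    if step < 1:
        raise ValueError("step must be >= 1")
    return [list(row) for row in rows[::step]]


def min_max_normalize(rows: List[List[float]]) -> List[List[float]]:
    """Scale each metric column into [0, 1]; a constant column maps to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


class PaddedLinearEncoder:
    """Hypothetical encoder: zero-pads a metric vector whose length depends on the
    current topology to `max_in`, then projects it to a unified `out_dim`."""

    def __init__(self, max_in: int, out_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.max_in = max_in
        # Randomly initialised here; a trained model would learn these weights.
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(max_in)] for _ in range(out_dim)]

    def encode(self, x: Sequence[float]) -> List[float]:
        if len(x) > self.max_in:
            raise ValueError("metric vector is longer than max_in")
        padded = list(x) + [0.0] * (self.max_in - len(x))
        return [sum(w * v for w, v in zip(row, padded)) for row in self.weights]
```

Zero-padding is only one way to reach a unified dimension; a learned embedding per metric name would be an equally plausible reading of the word-embedding analogy.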
S22, taking the metric data features of the sampling points within the same anomaly interval as nodes of the graph network, and connecting the nodes within the same anomaly interval in time order to form the edges among them.
It should be noted that nodes belonging to different anomaly intervals of the same anomaly type are also connected according to the time elapsed since anomaly injection, forming the edges between similar anomaly-feature intervals of the graph network.
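A sketch of the graph built in step S22, under stated assumptions: each sampling point is identified by a hypothetical `(interval_id, index)` pair, consecutive points in the same interval are chained in time order, and each interval is linked to the next interval of the same fault type by connecting points at the same offset from the anomaly injection time. That last rule is one possible reading of "according to the time elapsed since anomaly injection", not a confirmed detail of the method.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Node = Tuple[str, int]  # (anomaly interval id, index of the sampling point in that interval)


def build_sample_graph(intervals: Dict[str, Tuple[str, int]]) -> Dict[Node, Set[Node]]:
    """`intervals` maps an interval id to (fault type, number of sampling points).
    Returns an undirected adjacency mapping over all sampling-point nodes."""
    adj: Dict[Node, Set[Node]] = {}

    def link(a: Node, b: Node) -> None:
        adj[a].add(b)
        adj[b].add(a)

    by_type: Dict[str, List[str]] = defaultdict(list)
    for iid, (fault_type, n_points) in intervals.items():
        by_type[fault_type].append(iid)
        for t in range(n_points):
            adj.setdefault((iid, t), set())
        # Edges inside one anomaly interval: chain the points in time order.
        for t in range(n_points - 1):
            link((iid, t), (iid, t + 1))

    # Assumed rule: link each same-type interval to the next one (sorted by id),
    # point by point at equal offsets since anomaly injection.
    for iids in by_type.values():
        iids.sort()
        for a, b in zip(iids, iids[1:]):
            for t in range(min(intervals[a][1], intervals[b][1])):
                link((a, t), (b, t))
    return adj
```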
S23, inputting the feature vector of each node, together with the feature vectors of its adjacent nodes, into the aggregator of the graph neural network model according to a fixed sampling number, and aggregating them with convolution layers.
The feature vector of each node and those of its adjacent nodes are input into the aggregator for convolutional aggregation according to a fixed sampling number n, meaning that each node aggregates features from n adjacent nodes; if a node has fewer than n adjacent nodes, the number of adjacent nodes is used as the sampling number instead. The feature vectors of the node and its n sampled neighbours are then fed into the encoder again for a second round of encoding and aggregation, so that the metric features of the adjacent nodes are aggregated into the current node recursively. Each recursion step is called a convolution layer, and the number of convolution layers can be chosen according to the scale of the micro-service cluster and the dimension of the metric data, to balance model training time against the degree of node feature aggregation.
In the k-th graph convolution, the micro-service metric data, micro-service instance metric data and host metric data of the same time-series node V are merged with CONCAT, and the features of the associated metric data are extracted with a method similar to word-embedding representation.
S24, selecting an appropriate number of convolution layers, labelling the different anomaly time windows with the corresponding fault types, and training the graph neural network model; the trained model is output when the classification loss function converges to the expected value.
For each time-series node V, a set N(V) of a fixed number n of adjacent nodes is randomly selected from all time-series nodes in the same anomaly interval, and their features are aggregated with the MEAN method. The whole process of k rounds of graph convolution is expressed by a formula in which CONCAT denotes concatenating the feature dimensions of a node and its adjacent nodes, and the neighbourhood term denotes the feature set of the adjacent nodes at the k-th graph convolution.
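The fixed-size neighbour sampling, MEAN aggregation and CONCAT described in S23 resemble a GraphSAGE-style mean aggregator. The sketch below assumes that standard form, h_v <- ReLU(W . CONCAT(h_v, MEAN of sampled neighbours)), applied once per convolution layer. The weight matrices, the ReLU nonlinearity and all function names are illustrative assumptions, not the patent's own formula.

```python
import random
from typing import Dict, Hashable, List, Sequence, Set

Vec = List[float]


def sample_neighbors(neighbors: Sequence[Hashable], n: int, rng: random.Random) -> List[Hashable]:
    """Fixed-size sampling: n random neighbours, or all of them when there are fewer than n."""
    neighbors = list(neighbors)
    return neighbors if len(neighbors) <= n else rng.sample(neighbors, n)


def mean(vectors: List[Vec], dim: int) -> Vec:
    """Element-wise MEAN; an isolated node gets a zero neighbourhood vector."""
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def sage_layer(h: Dict[Hashable, Vec], adj: Dict[Hashable, Set[Hashable]],
               weight: List[Vec], n: int, rng: random.Random) -> Dict[Hashable, Vec]:
    """One convolution layer: h_v <- ReLU(W . CONCAT(h_v, MEAN(h_u for sampled u)))."""
    dim = len(next(iter(h.values())))
    out: Dict[Hashable, Vec] = {}
    for v, hv in h.items():
        sampled = sample_neighbors(sorted(adj.get(v, set())), n, rng)
        z = hv + mean([h[u] for u in sampled], dim)  # CONCAT of self and neighbourhood
        out[v] = [max(0.0, sum(w * x for w, x in zip(row, z))) for row in weight]
    return out


def sage_forward(h0: Dict[Hashable, Vec], adj: Dict[Hashable, Set[Hashable]],
                 weights: List[List[Vec]], n: int, seed: int = 0) -> Dict[Hashable, Vec]:
    """Stack len(weights) layers; weights[k] needs 2 * (input dim of layer k) columns."""
    rng = random.Random(seed)
    h = h0
    for w in weights:
        h = sage_layer(h, adj, w, n, rng)
    return h
```

Neighbours are sorted before sampling only to make runs reproducible for a given seed; in training the weights would be learned against the fault-type labels of S24.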
S3, constructing an instance-level heterogeneous topology graph of the micro-service system from the collected real-time micro-service topology and call relationships.
Step S3 is specifically as follows:
S31, when a fault occurs, constructing a real-time topology graph from the topology of the micro-service system and the call link data;
S32, using the micro-service-level metrics collected in step S1, assigning a weight to the m-th service node s_m; where a direct call relationship exists between service s_m and service s_n, assigning a weight to the edge [s_m - s_n] between the two service nodes based on the service call delay metric;
S33, using the micro-service-instance-level metrics collected in step S1, assigning a weight to the j-th instance node si_mj of the m-th service, based on the container CPU load, container memory load, container network load, container throughput and container request-response success rate of instance node si_mj; the correlations among these different metric sequences are then calculated with the Pearson correlation coefficient, and the maximum correlation is finally assigned to the edge between the m-th service and its j-th instance;
S34, using the host-level metrics collected in step S1, assigning a weight to the k-th host node n_k, based on the host CPU load, host memory load and host network load; the correlations among these different metric sequences are then calculated with the Pearson correlation coefficient, and the maximum correlation is finally assigned to the edges between the host and all instance nodes on it.
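The Pearson correlation coefficient is calculated as follows:
$$r_{xy}=\frac{\sum_{i=1}^{N}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{N}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{N}(y_i-\bar{y})^2}}$$
where x and y are the two sequences whose correlation is to be calculated, N is their length, and $\bar{x}$ and $\bar{y}$ are their means.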
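The Pearson coefficient and the max-correlation edge weights of steps S33 and S34 can be sketched as below. The text does not state which series the instance and host metrics are correlated against; this sketch assumes a hypothetical reference series (for example, the call latency of the service the instance belongs to) and takes the largest absolute coefficient as the edge weight. The helper names are illustrative, and the service-to-service weights of S32 are not modelled because their formula is not given.

```python
import math
from typing import Dict, Iterable, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def max_correlation(metrics: Dict[str, Sequence[float]], reference: Sequence[float]) -> float:
    """Largest |r| between any metric series and the (assumed) reference series."""
    return max(abs(pearson(series, reference)) for series in metrics.values())


def instance_edge_weight(service: str, instance: str,
                         instance_metrics: Dict[str, Sequence[float]],
                         reference: Sequence[float]) -> Dict[Tuple[str, str], float]:
    """S33 sketch: weight of the service -> instance membership edge."""
    return {(service, instance): max_correlation(instance_metrics, reference)}


def host_edge_weights(host: str, instances: Iterable[str],
                      host_metrics: Dict[str, Sequence[float]],
                      reference: Sequence[float]) -> Dict[Tuple[str, str], float]:
    """S34 sketch: one max-correlation value shared by every host -> instance edge."""
    w = max_correlation(host_metrics, reference)
    return {(host, inst): w for inst in instances}
```

Taking the absolute value is a design assumption: a falling success rate is negatively correlated with rising latency but is just as informative.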
S4, adjusting the anomaly weight of each micro-service node in combination with the service request links.
Step S4 is specifically as follows:
S41, assigning each service node s_m a personalization value equal to the average weight of all edges connected to it, which include the direct-call edges between service nodes and the membership edges between the service and each of its instances;
S42, assigning each instance node si_mj a personalization value equal to the weight of its edge to the service it belongs to;
S43, assigning each host node n_k a personalization value equal to the average weight of its edges to all instances on that host;
S44, running a personalized random walk algorithm on the heterogeneous topology graph and sorting all nodes by anomaly degree in descending order to generate a preliminary root cause candidate set.
The personalized random walk is calculated with a formula in which v denotes the final score of a node, which also gives the ranking used for instance-level root cause localization; P is the personalization array; c is the probability of continuing the random walk forward; and u is the score of the next node. After several rounds of walk iterations the score of each node converges, producing the preliminary root cause candidate set.
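Steps S41 to S44 can be sketched as a personalized-PageRank-style random walk. Assumptions: the heterogeneous graph is treated as undirected, transition probabilities are proportional to edge weights, the personalization values from S41 to S43 are normalized to sum to 1, `c` is the continue probability (0.85 is an arbitrary default), and a node with no weighted edges restarts according to the personalization vector. This illustrates the technique; it is not the exact update rule of the patent.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def personalization(kind: Dict[str, str], edges: Dict[Edge, float]) -> Dict[str, float]:
    """S41-S43: kind[v] is 'service', 'instance' or 'host'; edges are undirected."""
    incident: Dict[str, List[Tuple[str, float]]] = {v: [] for v in kind}
    for (a, b), w in edges.items():
        incident[a].append((b, w))
        incident[b].append((a, w))
    p: Dict[str, float] = {}
    for v, k in kind.items():
        if k == "instance":  # S42: only the edge to the owning service counts
            ws = [w for u, w in incident[v] if kind[u] == "service"]
        else:                # S41 / S43: average over all incident edges
            ws = [w for _, w in incident[v]]
        p[v] = sum(ws) / len(ws) if ws else 0.0
    return p


def personalized_random_walk(kind: Dict[str, str], edges: Dict[Edge, float],
                             c: float = 0.85, max_iter: int = 100,
                             tol: float = 1e-10) -> Tuple[List[str], Dict[str, float]]:
    """S44 sketch: power iteration of score = (1 - c) * p + c * (one weighted walk step)."""
    nodes = sorted(kind)
    raw = personalization(kind, edges)
    total = sum(raw.values()) or 1.0
    p = {v: raw[v] / total for v in nodes}
    nbrs: Dict[str, List[Tuple[str, float]]] = {v: [] for v in nodes}
    for (a, b), w in edges.items():
        nbrs[a].append((b, w))
        nbrs[b].append((a, w))
    out_w = {v: sum(w for _, w in nbrs[v]) for v in nodes}

    score = dict(p)
    for _ in range(max_iter):
        new = {v: (1 - c) * p[v] for v in nodes}
        for u in nodes:
            if out_w[u] == 0:  # dangling node: send its mass back via the personalization
                for v in nodes:
                    new[v] += c * score[u] * p[v]
            else:
                for v, w in nbrs[u]:
                    new[v] += c * score[u] * w / out_w[u]
        delta = sum(abs(new[v] - score[v]) for v in nodes)
        score = new
        if delta < tol:
            break
    ranking = sorted(nodes, key=lambda v: score[v], reverse=True)
    return ranking, score
```

The returned `ranking` plays the role of the preliminary root cause candidate set; keeping the scores lets S5 weight them further.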
S5, inputting the root cause candidate set and the real-time metric feature data of the anomaly time window into the graph neural network model, and obtaining the final root cause and its anomaly type after feature weighting.
Step S5 is specifically as follows:
S51, when the micro-service system fails during real-time operation, collecting the multi-dimensional metric data of the whole cluster within the anomaly time window;
S52, inputting the multi-dimensional metric data into the graph neural network model trained in step S2 to obtain the classification weights of the different root cause types for the real-time anomaly interval;
S53, multiplying the root cause candidate set obtained in step S4 by the classification weights obtained in step S52 to obtain the final root cause ranking and anomaly types; the higher a candidate ranks, the more likely it is the root cause.
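A sketch of the product in step S53. It assumes that each fault type output by the classifier applies only to certain node kinds (a hypothetical `applicable` mapping, for example instance-level CPU faults to instance nodes) and scores every compatible (node, fault type) pair as random-walk score multiplied by classification weight.

```python
from typing import Dict, List, Tuple


def final_ranking(candidates: Dict[str, float],
                  node_kind: Dict[str, str],
                  class_weights: Dict[str, float],
                  applicable: Dict[str, Tuple[str, ...]]) -> List[Tuple[str, str, float]]:
    """S53 sketch: score every compatible (node, fault type) pair as
    random-walk score x classification weight, highest first."""
    ranked: List[Tuple[str, str, float]] = []
    for node, score in candidates.items():
        for fault, weight in class_weights.items():
            # Hypothetical compatibility rule: a fault type only applies to some node kinds.
            if node_kind[node] in applicable.get(fault, ()):
                ranked.append((node, fault, score * weight))
    ranked.sort(key=lambda item: item[2], reverse=True)
    return ranked
```

Each output tuple names both a candidate root cause and its anomaly type, which is how a single product can yield the ranking and the type together.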
As an example, the invention is demonstrated on Hipster Shop.
Experimental preparation: the experimental environment consists of three Ubuntu physical machines with Kubernetes, Istio and Prometheus installed. The Hipster Shop micro-service system is used as the example: Hipster Shop is a micro-service demonstration application comprising 12 micro-services. It is a web-based e-commerce application in which users can browse products, add them to a shopping cart and make purchases, and it includes 8 business micro-services and 4 simulated micro-services that implement the shopping process. The hardware and software information of the environment is shown in Table 1.
The injected fault types and data set sizes are shown in Table 2.
To simulate real user scenarios, this embodiment uses Locust as a concurrent load generator to produce different workloads that simulate concurrent user behaviour for different business scenarios. Meanwhile, to simulate the performance problems of a real environment, the chaos engineering tool Chaos Mesh is used to inject the following common anomalies: 1) delay; 2) container instance CPU load; 3) container instance memory load; 4) container instance network packet loss; 5) container process stop. The historical multi-dimensional metric data recorded while these anomalies occur are collected.
Table 1 Hardware and software information of the environment of the embodiment of the present invention
Table 2 Injected fault types and data set sizes
The metric data reported in the embodiment are shown in Table 3.
The training parameters of the graph neural network model trained in the present invention are shown in Table 4.
Table 3 Metric data reported in the embodiment
Table 4 Training parameters of the graph neural network model
Referring to FIG. 3, when a fault occurs in real time, a heterogeneous topology graph containing all micro-service nodes, instance nodes and host nodes is constructed, and weights are assigned to its nodes and edges using the multi-dimensional metric data of the cluster in that time interval.
The personalization array value of each node is then calculated, and the personalized random walk algorithm is executed to obtain the final root cause ranking list.
Finally, 20 root cause localization results of the embodiment are reported. Rank1, Rank3 and Rank5 indicate whether the true root cause is found among the top 1, 3 and 5 root cause candidates respectively; 1 means the root cause is successfully located and 0 means it is not. The results are shown in Table 5.
Table 5 Final results of the invention
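The Rank1/Rank3/Rank5 indicators described above can be computed per case and then averaged over all cases, as sketched below. Averaging the per-case 0/1 hits into a single accuracy figure is an assumption about how such results would be summarized; the function names are illustrative.

```python
from typing import Dict, List, Sequence, Tuple


def hit_at_k(ranking: Sequence[str], truth: str, k: int) -> int:
    """1 if the true root cause is among the top-k candidates, else 0."""
    return int(truth in ranking[:k])


def rank_accuracy(cases: List[Tuple[Sequence[str], str]],
                  ks: Tuple[int, ...] = (1, 3, 5)) -> Dict[str, float]:
    """Average the per-case 0/1 hits into RankK accuracies."""
    return {f"Rank{k}": sum(hit_at_k(r, t, k) for r, t in cases) / len(cases) for k in ks}
```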
Referring to fig. 4, fig. 4 is a schematic structural diagram of the device of the present invention.
The apparatus 401 specifically includes: processor 402 and storage device 403.
Micro-service system root cause positioning device 401 based on a graph neural network model: the device 401 implements the root cause positioning method for a micro-service system based on a graph neural network model.
Processor 402: the processor 402 loads and executes the instructions and data in the storage device 403 to implement the root cause positioning method for a micro-service system based on a graph neural network model.
Storage device 403: the storage device 403 stores the instructions and data used to implement the root cause positioning method for a micro-service system based on a graph neural network model.
In summary, the beneficial effects of the invention are as follows. With the method for locating the abnormal root cause of a micro-service system based on a time-series-node-sampling graph neural network model and a random walk algorithm, 1) anomalies of the micro-service system can be detected quickly and accurately, with the localization granularity refined to the instance level; and 2) by effectively combining a machine learning model with a dynamic graph computation method, the method adapts well to dynamic changes in the micro-service system.
The foregoing describes preferred embodiments of the invention and is not intended to limit the invention to the precise form disclosed; any modifications, equivalents and alternatives falling within the spirit and scope of the invention are intended to be included within its scope.

Claims (8)

1. A root cause positioning method for a micro-service system based on a graph neural network model, characterized by comprising the following steps:
S1, collecting historical multi-dimensional time-series performance metrics of faults in the micro-service system;
S2, constructing a graph neural network model and training it with the historical multi-dimensional time-series performance metrics of faults to obtain a trained graph neural network model;
S3, constructing an instance-level heterogeneous topology graph of the micro-service system from the collected real-time micro-service topology and call relationships;
S4, adjusting the anomaly weight of each micro-service node in combination with the service request links;
S5, inputting the root cause candidate set and the real-time metric feature data of the anomaly time window into the graph neural network model, and obtaining the final root cause and its anomaly type after feature weighting.
2. The root cause positioning method for a micro-service system based on a graph neural network model according to claim 1, wherein the multi-dimensional time-series performance metrics in step S1 include micro-service-level metrics, micro-service-instance-level metrics and host-level metrics.
3. The root cause positioning method for a micro-service system based on a graph neural network model according to claim 1, wherein step S2 specifically comprises:
S21, sampling the multi-dimensional time-series performance metrics at a fixed time interval to obtain sampling points, and inputting the sampling points into the encoder of the graph neural network model to obtain the metric data features of each sampling point;
S22, taking the metric data features of the sampling points within the same anomaly interval as nodes of the graph network, and connecting the nodes within the same anomaly interval in time order to form the edges among them;
S23, inputting the feature vector of each node, together with the feature vectors of its adjacent nodes, into the aggregator of the graph neural network model according to a fixed sampling number, and aggregating them with convolution layers;
S24, selecting an appropriate number of convolution layers, labelling the different anomaly time windows with the corresponding fault types, and training the graph neural network model; the trained model is output when the classification loss function converges to the expected value.
4. The root cause positioning method for a micro-service system based on a graph neural network model according to claim 2, wherein step S3 is specifically as follows:
S31, when a fault occurs, constructing a real-time topology graph from the topology of the micro-service system and the call link data;
S32, using the micro-service-level metrics collected in step S1, assigning a weight to the m-th service node s_m; where a direct call relationship exists between service s_m and service s_n, assigning a weight to the edge [s_m - s_n] between the two service nodes based on the service call delay metric;
S33, using the micro-service-instance-level metrics collected in step S1, assigning a weight to the j-th instance node si_mj of the m-th service, based on the container CPU load, container memory load, container network load, container throughput and container request-response success rate of instance node si_mj; the correlations among these different metric sequences are then calculated with the Pearson correlation coefficient, and the maximum correlation is finally assigned to the edge between the m-th service and its j-th instance;
S34, using the host-level metrics collected in step S1, assigning a weight to the k-th host node n_k, based on the host CPU load, host memory load and host network load; the correlations among these different metric sequences are then calculated with the Pearson correlation coefficient, and the maximum correlation is finally assigned to the edges between the host and all instance nodes on it.
5. The root cause positioning method for a micro-service system based on a graph neural network model according to claim 4, wherein the Pearson correlation coefficient is calculated as:
$$\rho_{x,y} = \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y}$$
where $x$ and $y$ are the two data sequences whose correlation is to be calculated, $\operatorname{cov}$ denotes covariance, and $\sigma$ denotes standard deviation.
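Written out with sample means, the Pearson coefficient of two sequences of length $N$ is $\sum_{i}(x_i-\bar{x})(y_i-\bar{y}) \big/ \sqrt{\sum_{i}(x_i-\bar{x})^2 \sum_{i}(y_i-\bar{y})^2}$. A direct pure-Python transcription is sketched below; raising an error for constant sequences (where the coefficient is undefined) is a choice made here, not something the claim specifies.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: sum of co-deviations from the means.
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator terms: sums of squared deviations.
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

For instance, `pearson([1, 2, 3], [1, 3, 2])` gives 0.5: the co-deviation sum is 1 and each squared-deviation sum is 2.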
6. The root cause positioning method for a micro-service system based on a graph neural network model according to claim 1, wherein step S4 specifically comprises:
S41, assigning each service node $s_m$ a personalization value equal to the average weight of all edges connected to it, namely the direct-call edges between service nodes and the membership edges between the service and all of its instances;
S42, assigning each instance node $si_{mj}$ a personalization value equal to the weight of its edge to the service it belongs to;
S43, assigning each host node $n_k$ a personalization value equal to the average weight of its edges to all instances deployed on that host;
S44, ranking all nodes of the heterogeneous topology graph in descending order of anomaly degree using a personalized random walk algorithm, to generate a preliminary root cause candidate set.
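Steps S41 to S44 can be sketched as follows. The sketch assumes undirected weighted edges stored as `{(u, v): weight}` and realizes the "personalized random walk" as personalized PageRank (a random walk with restart to the personalization distribution) computed by power iteration; the damping factor `alpha`, the iteration count, and all function names are illustrative assumptions rather than the claimed implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edges = Dict[Tuple[str, str], float]


def personalization(edges: Edges, services: List[str],
                    instance_of: Dict[str, str], hosts: List[str]) -> Dict[str, float]:
    """S41-S43: personalization value for every service, instance and host node."""
    incident: Dict[str, List[float]] = defaultdict(list)
    for (u, v), w in edges.items():
        incident[u].append(w)
        incident[v].append(w)
    p: Dict[str, float] = {}
    for s in services:  # S41: mean of all edges touching the service
        p[s] = sum(incident[s]) / len(incident[s]) if incident[s] else 0.0
    for inst, svc in instance_of.items():  # S42: weight of the service edge
        p[inst] = edges.get((svc, inst), edges.get((inst, svc), 0.0))
    for h in hosts:  # S43: mean of the host's edges to its instances
        p[h] = sum(incident[h]) / len(incident[h]) if incident[h] else 0.0
    return p


def personalized_random_walk(edges: Edges, p: Dict[str, float],
                             alpha: float = 0.85, iters: int = 100):
    """S44: rank nodes by stationary score of a weighted walk with restart to p."""
    nodes = sorted(p)
    nbrs: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for (u, v), w in edges.items():
        nbrs[u].append((v, w))
        nbrs[v].append((u, w))
    total = sum(p.values())
    restart = {n: (p[n] / total if total > 0 else 1.0 / len(nodes)) for n in nodes}
    score = dict(restart)
    for _ in range(iters):
        nxt = {n: (1 - alpha) * restart[n] for n in nodes}
        for u in nodes:
            out_w = sum(w for _, w in nbrs[u])
            if out_w > 0:  # move along edges in proportion to their weights
                for v, w in nbrs[u]:
                    nxt[v] += alpha * score[u] * w / out_w
            else:  # dangling node: send its mass back to the restart distribution
                for n in nodes:
                    nxt[n] += alpha * score[u] * restart[n]
        score = nxt
    ranking = sorted(nodes, key=lambda n: score[n], reverse=True)
    return ranking, score
```

Because every iteration redistributes exactly the current mass, the scores keep summing to 1, so they can be read directly as the relative anomaly degree used to build the candidate set.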
7. The root cause positioning method for a micro-service system based on a graph neural network model according to claim 1, wherein step S5 specifically comprises:
S51, when a failure occurs while the micro-service system is running, collecting the multi-dimensional metric data of the whole cluster within the abnormal time window;
S52, inputting the multi-dimensional metric data into the graph neural network model trained in step S2 to obtain the classification weights of the different root cause types for the real-time abnormal interval;
S53, multiplying the root cause candidate set obtained in step S4 by the classification weights obtained in step S52 to obtain the final root cause ranking and anomaly type; the higher a candidate ranks, the more likely it is to be the root cause.
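One plausible reading of S52 and S53 is sketched below: the model's per-fault-type outputs are turned into weights with a softmax, and each candidate's random-walk score is multiplied by each fault-type weight, so that (node, fault type) pairs are ranked by the product. The softmax, the pairing scheme, and the function names are assumptions for illustration only.

```python
import math
from typing import Dict, List, Optional, Tuple


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Turn raw per-fault-type outputs into weights that sum to 1."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}


def final_ranking(candidate_scores: Dict[str, float],
                  type_weights: Dict[str, float],
                  top_k: Optional[int] = None) -> List[Tuple[Tuple[str, str], float]]:
    """Rank (node, fault type) pairs by walk score times fault-type weight."""
    pairs = [((node, fault), s * w)
             for node, s in candidate_scores.items()
             for fault, w in type_weights.items()]
    pairs.sort(key=lambda kv: kv[1], reverse=True)
    return pairs[:top_k] if top_k is not None else pairs
```

The first element of the result gives both the most likely root cause node and its most likely anomaly type.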
8. A root cause positioning device for a micro-service system based on a graph neural network model, comprising a processor and a storage device, wherein the processor loads and executes the instructions and data in the storage device to implement the root cause positioning method for a micro-service system based on a graph neural network model according to any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311854026.8A CN117560275B (en) 2023-12-29 2023-12-29 Root cause positioning method and device for micro-service system based on graphic neural network model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311854026.8A CN117560275B (en) 2023-12-29 2023-12-29 Root cause positioning method and device for micro-service system based on graphic neural network model

Publications (2)

Publication Number Publication Date
CN117560275A true CN117560275A (en) 2024-02-13
CN117560275B CN117560275B (en) 2024-03-12

Family

ID=89813030

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311854026.8A Active CN117560275B (en) 2023-12-29 2023-12-29 Root cause positioning method and device for micro-service system based on graphic neural network model

Country Status (1)

Country Link
CN (1) CN117560275B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118503001A (en) * 2024-07-17 2024-08-16 安徽思高智能科技有限公司 RPA task flow-oriented fault diagnosis method and equipment
CN118708395A (en) * 2024-08-27 2024-09-27 深圳开鸿数字产业发展有限公司 Super equipment fault detection method and system based on multidimensional data analysis

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112929223A (en) * 2021-03-08 2021-06-08 北京邮电大学 Method and system for training neural network model based on federal learning mode
CN113014421A (en) * 2021-02-08 2021-06-22 武汉大学 Micro-service root cause positioning method for cloud native system
CN113285831A (en) * 2021-05-24 2021-08-20 广州大学 Network behavior knowledge intelligent learning method and device, computer equipment and storage medium
CN113467421A (en) * 2021-07-01 2021-10-01 中国科学院计算技术研究所 Method for acquiring micro-service health status index and micro-service abnormity diagnosis method
WO2021217855A1 (en) * 2020-04-30 2021-11-04 平安科技(深圳)有限公司 Abnormal root cause positioning method and apparatus, and electronic device and storage medium
CN113900845A (en) * 2021-09-28 2022-01-07 大唐互联科技(武汉)有限公司 Method and storage medium for micro-service fault diagnosis based on neural network
CN114385397A (en) * 2021-12-31 2022-04-22 广西大学 Micro-service fault root cause positioning method based on fault propagation diagram
CN114615019A (en) * 2022-02-15 2022-06-10 北京云集智造科技有限公司 Anomaly detection method and system based on micro-service topological relation generation
CN114721860A (en) * 2022-05-23 2022-07-08 北京航空航天大学 Micro-service system fault positioning method based on graph neural network
CN115640159A (en) * 2022-11-03 2023-01-24 香港中文大学深圳研究院 Micro-service fault diagnosis method and system
US20230069074A1 (en) * 2021-08-20 2023-03-02 Nec Laboratories America, Inc. Interdependent causal networks for root cause localization
CN115859143A (en) * 2022-11-14 2023-03-28 之江实验室 Graph neural network anomaly detection method and device based on neighborhood node structure coding
CN115981902A (en) * 2022-12-16 2023-04-18 武汉大学 Fine-grained distributed micro-service system abnormal root cause positioning method and device
CN116633758A (en) * 2023-03-21 2023-08-22 湖北工业大学 Network fault prediction method and system based on full-heterogeneous element comparison learning model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
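蒋宗礼; 李苗苗; 张津丽: "基于融合元路径图卷积的异质网络表示学习" (Heterogeneous network representation learning based on fused meta-path graph convolution), 计算机科学 (Computer Science), no. 07, 31 December 2020 (2020-12-31) *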

Also Published As

Publication number Publication date
CN117560275B (en) 2024-03-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant