CN113467421A - Method for acquiring micro-service health status index and micro-service abnormity diagnosis method - Google Patents

Method for acquiring micro-service health status index and micro-service abnormity diagnosis method

Info

Publication number
CN113467421A
CN113467421A CN202110740846.9A CN202110740846A CN113467421A CN 113467421 A CN113467421 A CN 113467421A CN 202110740846 A CN202110740846 A CN 202110740846A CN 113467421 A CN113467421 A CN 113467421A
Authority
CN
China
Prior art keywords
micro
service
node
nodes
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110740846.9A
Other languages
Chinese (zh)
Other versions
CN113467421B (en)
Inventor
周朋朋
王阳
李振宇
谢高岗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN202110740846.9A priority Critical patent/CN113467421B/en
Publication of CN113467421A publication Critical patent/CN113467421A/en
Application granted granted Critical
Publication of CN113467421B publication Critical patent/CN113467421B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B23/00 Testing or monitoring of control systems or parts thereof
    • G05B23/02 Electric testing or monitoring
    • G05B23/0205 Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults
    • G05B23/0259 Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults characterized by the response to fault detection
    • G05B23/0262 Confirmation of fault detection, e.g. extra checks to confirm that a failure has indeed occurred
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00 Program-control systems
    • G05B2219/20 Pc systems
    • G05B2219/24 Pc safety
    • G05B2219/24065 Real time diagnostics

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Debugging And Monitoring (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides a method for acquiring a microservice health status indicator and a microservice anomaly diagnosis method. The diagnosis method comprises the following steps: acquiring a health status indicator of the microservice; acquiring hardware state data from the physical machine on which the microservice runs; and judging whether the microservice is abnormal by using a BP neural network based on the health status indicator of the microservice and the hardware state data of the physical machine on which the microservice runs, wherein the BP neural network is trained on sample data of the health status indicator and the hardware state data. The invention can provide fast and highly accurate anomaly diagnosis for complex microservice modules.

Description

Method for acquiring micro-service health status index and micro-service abnormity diagnosis method
Technical Field
The invention relates to the field of microservices, and in particular to a method for acquiring a microservice health status indicator and a microservice anomaly diagnosis method.
Background
The rapid development of computer technology has brought greater processing power, more storage space, and faster networks. This allows applications to serve more users, but also exposes them to greater load. As the internal logic of applications grows more complex, their development, operation, and maintenance face increasing challenges. To reduce the cost of development and operation, more and more applications are developed and deployed as services in a microservice architecture. A microservice architecture splits the functionality of an application into many small functional modules, which interact with one another over the network. However, as microservice modules are subdivided further, the stability of microservices also faces greater challenges. A single application may be composed of tens or hundreds of heterogeneous microservice modules, so microservices are often subject to various anomalies.
To improve the availability of microservices and reduce the overhead of recovering from microservice anomalies, existing work has studied microservice anomaly diagnosis in depth. Part of the existing work verifies the stability and robustness of microservices by adding and removing microservice modules, automatically injecting specific anomalies, customizing test scenarios, and so on, and provides single-point anomaly diagnosis for microservice modules that may be abnormal. Most of these methods only test microservice performance and diagnose single-point anomalies; they fail to analyze the root cause of a microservice anomaly. Other work locates the root cause of microservice anomalies by constructing the call relationships or dependency relationships among microservices. However, the way these methods construct the microservice relationship graph lacks generality: they either need specific information in the data or require a great deal of manual effort to build the graph, so graph construction is inefficient and the application scenarios are limited.
Microservice modules complete complex services through network interaction. Some research focuses on the microservices themselves, treating each microservice module as an independent point to be analyzed separately. Other methods take the interaction among microservices into account and try to build the call relationships of the microservices in order to locate the root cause of an anomaly. According to whether correlation between modules is used for diagnosis, the above research can be divided into two broad categories: single-module diagnosis and root cause localization.
Single-module diagnosis mainly focuses on the stability and performance of each microservice module and on whether its behavior is abnormal. Most of these methods use hardware data or system log data as input metrics. Düllmann (2017) [1] studied how the performance of microservices in the cloud changes. That method periodically adds and removes microservice modules deployed in the cloud to observe the influence of architectural changes on overall performance. Düllmann and van Hoorn (2017) [2] proposed a framework for defining a microservice topology and verified the robustness of microservice modules by injecting anomalies. Similarly, Gremlin [3] provides a more flexible method for verifying microservice anomalies. It lets the user define test scenarios, which Gremlin then translates into a corresponding network environment in order to test and diagnose the microservices. However, most of these methods diagnose the microservice module itself or the microservice network, and cannot clearly and specifically identify the root cause of an anomaly.
Root cause localization methods must consider the interaction among microservice modules in addition to monitoring and analyzing the modules themselves. They diagnose anomalies of microservice modules by building a topology graph of the call relationships among the modules from network data, log data, and the like. TraceAnomaly [4] proposed a root cause localization method for microservice anomalies based on unsupervised learning. The method first traces the flow of task execution and diagnoses anomalies through the response times between modules. It requires that the data exchanged between microservice modules contain UUID information (for example, the ID of a task), and is therefore difficult to apply on a cloud computing platform.
Brandón et al. [5] proposed a microservice root cause diagnosis method based on call relationships. The method first constructs a topology graph of microservice call relationships and then builds a complete library of anomaly patterns from that graph to diagnose microservice anomalies. However, constructing the topology requires the assistance of an expert with in-depth knowledge of the system's microservice modules, which severely limits the usage scenarios. Sieve [6] proposed a root cause localization method for microservice anomalies that is also based on microservice call relationships. Sieve first provides a platform for collecting the data generated by the microservice modules. Unimportant data is filtered out of the collected data, and possible dependencies between microservice modules are then inferred. The call relationships constructed in this way have low accuracy, so the accuracy of root cause localization is also low. In summary, these methods locate the root cause of microservice anomalies by constructing a topology graph of microservice call relationships, but they suffer from limited usage scenarios and inaccurate graph construction, and therefore from low accuracy in root cause localization.
The above analysis shows that existing single-point anomaly diagnosis methods based on log data and hardware data can adapt the diagnosis to different microservice modules, which makes the diagnosis more targeted. However, because it is difficult for them to construct a dependency graph between microservice modules, these methods have great difficulty locating the root cause of a microservice anomaly. Anomaly diagnosis methods based on network data can construct a call relationship graph of the microservices and therefore have an advantage in locating the root cause. However, these methods can hardly adapt their analysis to different microservice modules, so the diagnosis is poorly targeted and its accuracy needs to be improved.
References:
[1] Düllmann T F. Performance anomaly detection in microservice architectures under continuous change[D]. 2017.
[2] Düllmann T F, van Hoorn A. Model-driven generation of microservice architectures for benchmarking performance and resilience engineering approaches[C]//Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering Companion. 2017: 171-172.
[3] Heorhiadi V, Rajagopalan S, Jamjoom H, et al. Gremlin: Systematic resilience testing of microservices[C]//2016 IEEE 36th International Conference on Distributed Computing Systems (ICDCS). IEEE, 2016: 57-66.
[4] Liu P, Xu H, Ouyang Q, et al. Unsupervised detection of microservice trace anomalies through service-level deep bayesian networks[C]//2020 IEEE 31st International Symposium on Software Reliability Engineering (ISSRE). IEEE, 2020: 48-58.
[5] Brandón Á, Solé M, Huélamo A, et al. Graph-based root cause analysis for service-oriented and microservice architectures[J]. Journal of Systems and Software, 2020, 159: 110432.
[6] Thalheim J, Rodrigues A, Akkus I E, et al. Sieve: Actionable insights from monitored metrics in microservices[J]. arXiv preprint arXiv:1709.06686, 2017.
[7] Chen T, Guestrin C. XGBoost: A scalable tree boosting system[C]//Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016: 785-794.
Disclosure of Invention
To solve the problems in the prior art that microservice anomaly diagnosis is insufficiently tailored to different modules and that root cause localization is difficult, the invention provides a method for quantifying the network health status of microservices based on traffic profiling, which comprises the following steps:
step H1), constructing a microservice call relationship graph based on the data packets exchanged among the microservice modules;
step H2), generating a call matrix of the microservice nodes based on the microservice call relationship graph, wherein each entry of the call matrix represents the number of data packets sent between two microservice nodes;
step H3), acquiring the health status indicator of the microservice by using a convolutional neural network based on the call matrix, wherein the convolutional neural network is trained on samples of the call matrix.
Preferably, the nodes in the call relationship graph represent microservice nodes, edges between the nodes represent that data interaction exists between the nodes, and the numerical value of the edges represents the number of data packets transmitted between the corresponding nodes; two directed edges in different directions are included between the nodes.
Preferably, the number of rows and columns of the call matrix is equal to the number of nodes in the call relationship graph, and an element in the ith row and the jth column represents the number of data packets sent by the source node i to the destination node j.
Preferably, the health status indicator is a network status vector of the microservice acquired by using a convolutional neural network.
The invention provides a microservice single-point anomaly diagnosis method based on hardware state, which comprises the following steps:
step D1), acquiring the health status indicator of the microservice by the method described above;
step D2), acquiring hardware state data from the physical machine on which the microservice runs;
step D3), judging whether the microservice is abnormal by using a BP neural network based on the health status indicator of the microservice and the hardware state data of the physical machine on which the microservice runs, wherein the BP neural network is trained on sample data of the health status indicator and the hardware state data.
Preferably, step D2) includes:
filtering the hardware state data by using the XGBoost algorithm and selecting the top N hardware data metrics, wherein N is an integer greater than or equal to 1.
The invention provides a root cause localization method based on microservice call relationships, which comprises the following steps:
step R1), acquiring the abnormal microservice nodes by the microservice anomaly diagnosis method described above;
step R2), calculating the influence factor of each abnormal microservice node based on the microservice call relationship graph by using the following formula:
$$f_i = \frac{1}{\sum_{j \neq i} d_{i,j}}$$
where i and j denote abnormal nodes, $f_i$ denotes the influence factor of node i, and $d_{i,j}$ denotes the distance from node i to node j;
step R3), ranking the abnormal microservice nodes by influence factor and taking the top K nodes as root cause nodes, wherein K is an integer greater than or equal to 1.
Preferably, when node j is reachable from node i, $d_{i,j}$ is the total hop count from node i to node j; when node j is unreachable from node i, $d_{i,j} = M + 1$, where M is the total number of nodes in the microservice call relationship graph.
The invention provides a computer-readable storage medium having a computer program stored thereon, wherein the program realizes the steps of any of the above-mentioned methods when executed by a processor.
The invention provides a computer device comprising a memory and a processor, a computer program being stored on the memory and being executable on the processor, characterized in that the processor implements the steps of any of the above-mentioned methods when executing the program.
The invention has the following characteristics and beneficial effects: it can provide fast and highly accurate anomaly diagnosis for complex microservice modules. The invention first constructs a microservice call relationship graph from the traffic characteristics among microservice modules and uses a convolutional neural network to quantify the network state of the modules as features. At the same time, the invention collects the hardware data metrics of the microservice modules. Because the hardware data metrics contain a large number of redundant and invalid features, the invention first uses the XGBoost [7] algorithm to filter and select them. The selected hardware data metrics and the quantified network feature are used as the input of a BP neural network for single-point anomaly detection of microservices. Because anomalies propagate spatially between microservice modules, a single anomaly may spread widely. To find the node where the anomaly propagation started, the invention uses the call relationship graph of the microservices, combined with the result of single-point anomaly detection, to give every abnormal node a quantitative root cause score, and reports the Top N nodes to the administrator as the root cause candidate set.
Drawings
FIG. 1 illustrates a microservice call relationship graph construction method according to one embodiment of the invention.
FIG. 2 illustrates a network state quantization method according to one embodiment of the invention.
FIG. 3 illustrates a microservice single-point anomaly diagnosis method based on hardware state according to one embodiment of the invention.
FIG. 4 illustrates a microservice root cause localization method according to one embodiment of the invention.
FIG. 5 illustrates a system architecture diagram according to one embodiment of the invention.
FIG. 6 shows experimental results on Web Serving for one embodiment of the invention.
FIG. 7 shows experimental results on Sock Shop for one embodiment of the invention.
Detailed Description
The invention is described below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In researching microservice anomaly diagnosis, the inventors found that microservice anomalies are strongly correlated with network data and hardware state. When a microservice becomes abnormal, certain data metrics often fluctuate dramatically. For example, when a network anomaly occurs, the microservice module exhibits abnormal traffic characteristics, and when the microservice module itself becomes abnormal, its hardware data metrics usually fluctuate severely. Therefore, by monitoring the traffic characteristics and the hardware state of the microservice modules in real time, anomalies of the microservice modules can be diagnosed and located quickly and accurately.
When analyzing the network data characteristics of microservice modules under abnormal conditions, the inventors found that complex call relationships exist among the microservice modules. Accurately extracting and analyzing these call relationships is the key to locating the root cause of a microservice anomaly. Therefore, based on the network interaction characteristics of microservices, the invention provides a method for constructing microservice call relationships based on traffic profiling.
According to one embodiment of the invention, the invention provides a method for constructing a microservice call relationship graph based on traffic profiling. FIG. 1 illustrates the construction method according to one embodiment of the invention. The specific construction method is as follows.
First, the data packets exchanged among the microservice modules are collected, and the collected packets are divided into a number of sets according to a given time window size. A call relationship graph is constructed separately for each set. The required information is extracted from each data packet; according to one embodiment of the invention, it mainly comprises three parts: a timestamp, a source IP, and a destination IP. The timestamp determines the time of the data packet, and the source IP and destination IP are used to construct the directed edges of the microservice call relationship graph. Each node of the graph represents a specific microservice module, that is, the carrier (for example, a container) that hosts the microservice. Nodes are connected by directed edges. If a directed edge exists from node M to node N, node M exchanges data with node N and the information is transmitted from M to N. For each data packet in the set, the invention checks whether the graph constructed so far already has a directed edge from the source IP to the destination IP. If so, the value of that edge is increased by 1. If not, the directed edge is added to the graph and its value is set to 1. In this way the construction does not depend on the particular service being run, so an accurate call relationship graph can be constructed.
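The construction procedure above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_call_graphs`, the tuple layout of a packet, the 10-second window, and the sample IP addresses are all assumptions made for the example.

```python
from collections import defaultdict

def build_call_graphs(packets, window_seconds):
    """Group packets into fixed time windows and count packets per directed edge.

    packets: iterable of (timestamp, src_ip, dst_ip) tuples (assumed layout).
    Returns {window_index: {(src_ip, dst_ip): packet_count}}.
    """
    graphs = defaultdict(lambda: defaultdict(int))
    for timestamp, src_ip, dst_ip in packets:
        window = int(timestamp // window_seconds)
        # A missing edge starts at 0, so the first packet sets its value to 1;
        # every later packet on the same edge increases the value by 1.
        graphs[window][(src_ip, dst_ip)] += 1
    return {w: dict(edges) for w, edges in graphs.items()}

# Hypothetical capture: three packets in the first window, one in the second.
packets = [
    (0.5, "10.0.0.1", "10.0.0.2"),
    (1.2, "10.0.0.1", "10.0.0.2"),
    (3.0, "10.0.0.2", "10.0.0.1"),
    (12.4, "10.0.0.1", "10.0.0.3"),
]
graphs = build_call_graphs(packets, window_seconds=10)
```

Each window yields its own edge dictionary, so one call relationship graph is obtained per packet set, as the text describes.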
Based on the call relationship graph of the microservices, the invention uses a convolutional neural network model to quantify the network state of each microservice module as a feature. The invention thus provides a microservice network state quantization method based on a convolutional neural network. FIG. 2 illustrates the network state quantization method according to one embodiment of the invention. The method starts from the call relationship graph constructed as in FIG. 1 and first generates a call matrix from it. The size of the call matrix is determined by the number of nodes in the call relationship graph: if the graph has N nodes, an N x N call matrix is generated, whose rows and columns both represent nodes. The entry in the ith row and jth column is the number of data packets sent with the ith node as the source and the jth node as the destination. For example, the entry in row 1, column 2 of the call matrix in FIG. 2 indicates that node A sent 18 data packets to node B.
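As a small illustration of the mapping from graph to matrix, the sketch below converts an edge dictionary into an N x N matrix. Only the A to B count of 18 comes from the text; the other edges, the node order, and the helper name `call_matrix` are assumptions.

```python
def call_matrix(edges, nodes):
    """Turn {(src, dst): count} into an N x N list-of-lists matrix.

    Row i is the source node and column j is the destination node.
    """
    index = {node: k for k, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for (src, dst), count in edges.items():
        matrix[index[src]][index[dst]] = count
    return matrix

# Hypothetical window with three nodes; A -> B carries 18 packets.
nodes = ["A", "B", "C"]
edges = {("A", "B"): 18, ("B", "A"): 7, ("C", "A"): 4}
matrix = call_matrix(edges, nodes)
```

Pairs of nodes that exchanged no packets in the window keep the value 0, so the matrix is generally asymmetric and sparse.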
Changes in the call matrix reflect, to a certain extent, the health of the microservice modules. For example, when the amount of data sent from node A to node B is stable but the amount of data sent from node B to node A decreases, node B has probably become abnormal. To learn how the data in the matrix changes, according to one embodiment of the invention, a convolutional neural network is used to extract and learn features from the matrix. The convolutional neural network takes the N x N call matrix as input and outputs an N-dimensional feature vector. During model training, the invention learns the normal and abnormal patterns of system operation from historical data, with the abnormal state labeled 1 and the normal state labeled 0. After training is completed, the convolutional neural network outputs an N-dimensional vector in which each value corresponds to a particular microservice node. Each value is a decimal between 0 and 1 that indicates the health status of the corresponding microservice node: the closer the value is to 1, the higher the probability of an anomaly, and the closer it is to 0, the healthier the node. The output of the convolutional neural network is a key metric for, and plays a major role in, the final anomaly decision.
In summary, the inventors construct a microservice call relationship graph from the data exchanged among microservice modules, generate a call matrix from the call relationship graph, and extract the health status indicators of the microservices from the call matrix by using a convolutional neural network.
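The source does not specify the architecture of the convolutional neural network, so the following is only a shape-level sketch under stated assumptions: one 3 x 3 convolution with zero padding and ReLU, followed by one fully connected layer with a sigmoid output. The weights are random and untrained, the input is scaled by its largest entry, and all function names are hypothetical. It shows how an N x N call matrix can be mapped to N scores between 0 and 1.

```python
import math
import random

def conv2d_same(matrix, kernel):
    """3x3 convolution (cross-correlation) with zero padding, then ReLU."""
    n = len(matrix)
    out = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            total = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < n and 0 <= cc < n:
                        total += matrix[rr][cc] * kernel[dr + 1][dc + 1]
            out[r][c] = max(0.0, total)
    return out

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def health_vector(matrix, kernel, dense_weights, dense_bias):
    """Map an N x N call matrix to N scores in (0, 1), one per node."""
    features = [v for row in conv2d_same(matrix, kernel) for v in row]
    return [
        sigmoid(sum(w * f for w, f in zip(weights, features)) + b)
        for weights, b in zip(dense_weights, dense_bias)
    ]

# Untrained, randomly initialised parameters (assumption, for shape only).
random.seed(0)
matrix = [[0, 18, 0], [7, 0, 0], [4, 0, 0]]
peak = max(max(row) for row in matrix)
scaled = [[v / peak for v in row] for row in matrix]
n = len(matrix)
kernel = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(3)]
dense_weights = [[random.uniform(-0.5, 0.5) for _ in range(n * n)] for _ in range(n)]
dense_bias = [0.0] * n
scores = health_vector(scaled, kernel, dense_weights, dense_bias)
```

With trained weights, each entry of `scores` would be read as the anomaly probability of the corresponding node; with the random weights used here the values carry no diagnostic meaning.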
When analyzing the hardware data metrics of abnormal microservices, the inventors observed the following. The hardware data metrics reflect the hardware state of the microservice module and of the physical machine on which it runs, and cover multiple dimensions such as memory, CPU, disk, and cache. Microservice modules are diverse, and different modules are sensitive to different data metrics to different degrees; some are CPU-sensitive, others memory-sensitive, and so on. To diagnose anomalies of microservice modules more accurately in view of this diversity, the invention provides a microservice single-point anomaly diagnosis method based on hardware state and network metrics.
FIG. 3 illustrates a microservice single-point anomaly diagnosis method based on hardware state and network metrics according to an embodiment of the invention. According to one embodiment of the invention, information is collected from Prometheus and PCM. Prometheus is an open source system monitoring and alerting framework. PCM (Performance Counter Monitor) is a tool for monitoring resource utilization on Intel platform processors. The collected information contains dozens or even hundreds of data metrics, many of which carry no useful information, such as near-duplicate metrics, irrelevant metrics, and metrics that never change. If all data metrics were fed into the model for training, this would not only incur a huge storage overhead but also make training very time-consuming. More importantly, a large number of data metrics does not improve diagnostic accuracy, and the interference they introduce can sometimes reduce it. To reduce the difficulty caused by the excessive number of data metrics, according to one embodiment of the invention, the XGBoost algorithm is used to filter them. The XGBoost algorithm evaluates and scores the data metric of each dimension according to how it fluctuates. The score represents the degree to which that dimension influences the final result. According to one embodiment of the invention, the Top N data metrics are selected as the input of the subsequent model.
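The Top N selection step can be sketched as follows, assuming per-metric importance scores are already available (for instance, from a trained XGBoost model). The metric names and score values below are hypothetical, and `select_top_n` is an illustrative helper rather than part of any library.

```python
def select_top_n(importance, n):
    """Return the names of the n highest-scoring metrics, best first."""
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:n]]

# Hypothetical importance scores for five hardware metrics.
importance = {
    "cpu_usage": 0.41,
    "mem_usage": 0.27,
    "disk_io": 0.02,
    "l3_cache_miss": 0.19,
    "uptime": 0.0,
}
selected = select_top_n(importance, n=3)
```

Metrics that never change, such as the hypothetical `uptime` entry, would be expected to receive a score near zero and fall out of the selection.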
According to one embodiment of the invention, all data is divided into a number of time windows of a given length, and the hardware data of each dimension is averaged within each time window. After filtering by the XGBoost algorithm, N hardware data metrics are obtained for each time window, and the network metric produced by the convolutional neural network of FIG. 2 is added as the (N + 1)-th data metric. Together these metrics form the input of the subsequent model.
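A minimal sketch of the windowed averaging and of assembling the (N + 1)-dimensional input is given below. The sample layout, the metric names, the window length, and the network score of 0.125 are assumptions chosen for the example.

```python
def window_means(samples, selected, window_seconds):
    """Average each selected metric inside fixed time windows.

    samples: list of (timestamp, {metric_name: value}) tuples (assumed layout).
    Returns {window_index: [mean of each selected metric, in order]}.
    """
    sums, counts = {}, {}
    for timestamp, metrics in samples:
        w = int(timestamp // window_seconds)
        row = sums.setdefault(w, [0.0] * len(selected))
        for k, name in enumerate(selected):
            row[k] += metrics[name]
        counts[w] = counts.get(w, 0) + 1
    return {w: [total / counts[w] for total in row] for w, row in sums.items()}

def model_input(hardware_means, network_score):
    """Append the network score as the (N + 1)-th feature."""
    return hardware_means + [network_score]

# Hypothetical samples: two in the first 10 s window, one in the second.
samples = [
    (1, {"cpu": 0.25, "mem": 0.5}),
    (4, {"cpu": 0.75, "mem": 1.0}),
    (11, {"cpu": 0.875, "mem": 0.625}),
]
means = window_means(samples, ["cpu", "mem"], window_seconds=10)
vector = model_input(means[0], network_score=0.125)
```

Here N is 2, so `vector` has three entries: the two averaged hardware metrics followed by the network metric for the same window.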
After the hardware data metrics are combined with the network data metric, the fluctuation characteristics of the metrics in the normal and abnormal states need to be extracted. According to one embodiment of the invention, a BP neural network is used to learn how the data metrics vary. The input of the BP neural network is a data vector of dimension N + 1, consisting of N hardware data metrics and 1 network data metric, so the input layer of the BP neural network has N + 1 nodes. In addition to the input layer, the neural network comprises bp_layer hidden layers and an output layer with 1 node. Every node in a layer is connected by an edge to every node in the adjacent layer. The specific topology is shown as the BP neural network in FIG. 3. The weight of each connecting edge represents the degree to which the node influences the final result. At the start of training, each weight is set to a random decimal between -0.5 and 0.5. Training the BP neural network is a process of continually adjusting the weights of the connecting edges so that the output of the network matches the expected output more closely. The output of the BP neural network represents the anomaly status of the current node. It is a decimal between 0 and 1, and the larger the value, the more likely it is that the current node is abnormal.
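The following sketch shows one way such a network could be written: fully connected layers, weights initialised uniformly in [-0.5, 0.5], a single sigmoid output, and plain backpropagation on squared error. The single hidden layer of 5 nodes, the sigmoid activations, the learning rate, and the sample input are assumptions, since the source does not give these details.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_layer(n_in, n_out, rng):
    """Weights (plus one bias per node) drawn uniformly from [-0.5, 0.5]."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]

def forward(layers, x):
    """Return the activations of every layer, input first."""
    activations = [list(x)]
    for layer in layers:
        prev = activations[-1] + [1.0]  # constant 1.0 feeds the bias weight
        activations.append(
            [sigmoid(sum(w * a for w, a in zip(node, prev))) for node in layer]
        )
    return activations

def train_step(layers, x, target, lr=0.5):
    """One backpropagation update for squared error on a single sample."""
    acts = forward(layers, x)
    # Output delta: derivative of 0.5 * (out - target)^2 through the sigmoid.
    deltas = [(o - t) * o * (1.0 - o) for o, t in zip(acts[-1], target)]
    for k in range(len(layers) - 1, -1, -1):
        prev = acts[k] + [1.0]
        # Propagate the error backwards using the weights before they change.
        back = [sum(d * node[i] for d, node in zip(deltas, layers[k]))
                for i in range(len(acts[k]))]
        for node, d in zip(layers[k], deltas):
            for i in range(len(node)):
                node[i] -= lr * d * prev[i]
        if k > 0:
            deltas = [b * a * (1.0 - a) for b, a in zip(back, acts[k])]
    return acts[-1][0]

# N = 3 hardware metrics plus 1 network metric; one hidden layer of 5 nodes.
rng = random.Random(0)
layers = [init_layer(4, 5, rng), init_layer(5, 1, rng)]
x = [0.5, 0.75, 0.125, 0.9]          # hypothetical input labeled abnormal (1)
before = forward(layers, x)[-1][0]
for _ in range(200):
    train_step(layers, x, [1.0])
after = forward(layers, x)[-1][0]
```

Repeated updates on a sample labeled abnormal move the output towards 1, which is the "continually adjusting the weights" behaviour described above; a real training run would iterate over many labeled windows.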
To further narrow the scope of the anomaly and find the starting point of anomaly propagation in time or space, the invention provides a root cause localization method based on the microservice call relationship, which uses the microservice call relationship graph in combination with the single-point anomaly diagnosis method. FIG. 4 shows the root cause localization method according to an embodiment of the invention. As shown in FIG. 4, the system comprises 6 nodes, each representing a microservice module, and the nodes are connected by directed edges. A double-headed arrow represents two-way data communication, while a single-headed arrow represents data communication in one direction only. Nodes B, C, and E are the abnormal nodes detected by microservice single-point detection. To perform root cause analysis on these three nodes, the influence factor $f_i$ is first calculated for every abnormal node, where i is the node identifier. The influence factor is calculated as follows:
$$f_i = \frac{1}{\sum_{j \neq i} d_{i,j}}$$
where $d_{i,j}$ denotes the distance from node i to node j, and j ranges over all detected abnormal nodes with j ≠ i. The distance is calculated in one of the following two ways:
1) if node j is reachable from node i, $d_{i,j}$ is the total hop count from node i to node j;
2) if node j is unreachable from node i, $d_{i,j} = M + 1$, where M is the total number of nodes in the current graph.
Take the example shown in the figure. The distance from C to B is 2 (C->A->B) and the distance from C to E is 3 (C->A->B->E), so the influence factor of C is 1/(2 + 3) = 0.2. In the same way, the influence factor of B is 0.5. Node E cannot reach node B or node C, so its distance to each is M + 1 = 7, where M is 6, and the influence factor of E is 1/(7 + 7) = 1/14. After the influence factor of each node has been calculated, the nodes are sorted by influence factor in descending order, and the Top N nodes are output as the candidate set most likely to contain the root cause of the anomaly, for the administrator's reference.
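The worked example can be reproduced with a short breadth-first search. FIG. 4 is not available in the text, so the edge set below is a hypothetical graph chosen only to be consistent with the stated distances (C->A->B, B->E, B one hop from C, E unable to reach B or C); the function names are illustrative, and the sketch assumes at least two abnormal nodes.

```python
from collections import deque
from fractions import Fraction

def hop_counts(graph, source):
    """Breadth-first search: hops from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def influence_factors(graph, nodes, abnormal):
    """f_i = 1 / sum of d(i, j) over the other abnormal nodes j.

    Unreachable pairs use the distance M + 1, with M the number of nodes.
    Assumes at least two abnormal nodes, otherwise the sum would be zero.
    """
    m = len(nodes)
    factors = {}
    for i in abnormal:
        dist = hop_counts(graph, i)
        total = sum(dist.get(j, m + 1) for j in abnormal if j != i)
        factors[i] = Fraction(1, total)
    return factors

# Hypothetical 6-node graph consistent with the distances quoted in the text.
nodes = ["A", "B", "C", "D", "E", "F"]
graph = {"A": ["B", "D"], "B": ["C", "E"], "C": ["A"],
         "D": ["A"], "E": ["F"], "F": []}
factors = influence_factors(graph, nodes, abnormal=["B", "C", "E"])
ranked = sorted(factors, key=factors.get, reverse=True)
```

Under this assumed graph the factors are 1/2 for B, 1/5 for C, and 1/14 for E, so B ranks first in the candidate set.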
Based on the above methods, the invention provides a micro-service anomaly diagnosis system based on traffic characteristics and hardware states. The overall architecture of the system is shown in Fig. 5. First, the network interaction information among the micro-service modules is extracted from the micro-service cluster. A micro-service call relation graph is constructed from the traffic characteristics of the network interaction, and a convolutional neural network is used to quantify the network state of each micro-service module as features. The hardware data indexes collected from the micro-service cluster are then combined with the network feature quantification result of the convolutional neural network, and a BP neural network performs micro-service single-point anomaly detection, which detects all micro-service modules that may be abnormal. To further narrow the scope of the anomaly and find the starting point of its propagation in time or space, the invention uses the micro-service call relation together with the result of single-point anomaly detection to locate the root cause of the micro-service anomaly. During root cause positioning, all abnormal modules are scored quantitatively, and the Top N modules by score are reported to the administrator as the root cause candidate set.
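As a rough illustration of the first stage, the call matrix that feeds the convolutional neural network can be built by counting packets per (source, destination) pair, matching the definition in claims 2 and 3. The sketch below assumes the captured traffic has already been reduced to (source, destination) module-name pairs; that record format, the function name and the sample data are hypothetical.

```python
from collections import Counter


def build_call_matrix(packets, nodes):
    """Return an n x n matrix where entry [i][j] is the number of packets
    sent by source node i to destination node j."""
    index = {name: k for k, name in enumerate(nodes)}
    counts = Counter(packets)  # counts each (source, destination) pair
    matrix = [[0] * len(nodes) for _ in nodes]
    for (src, dst), n in counts.items():
        matrix[index[src]][index[dst]] = n
    return matrix


# Hypothetical capture: A sent two packets to B, B answered once, C called A once.
sample_packets = [("A", "B"), ("A", "B"), ("B", "A"), ("C", "A")]
call_matrix = build_call_matrix(sample_packets, ["A", "B", "C"])
```

Each non-zero entry corresponds to one directed edge of the call relation graph, so the same counts serve both as the graph's edge weights and as the network input.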
Experiments were carried out on the open-source micro-service frameworks Web Serving and Sock Shop. In the experiments, a micro-service call relation graph is constructed from the traffic characteristics among the micro-service modules, and a convolutional neural network is used to quantify the network state of each micro-service module as features. Hardware data index features of the micro-service modules are also collected, and the XGBoost algorithm is used to filter and select the hardware data indexes, removing a large number of redundant and invalid features. The selected hardware data indexes and the quantified network features serve as the input of a BP neural network, which performs micro-service single-point anomaly detection. All abnormal nodes are then scored quantitatively as root cause candidates using the micro-service call relation graph together with the result of single-point anomaly detection, and the Top N nodes are selected as the root cause candidate set. Fig. 6 shows the experimental results on Web Serving for an embodiment of the invention, and Fig. 7 shows the experimental results on Sock Shop, where Ateller denotes the present invention. The experimental results show that the proposed method achieves higher diagnosis accuracy than existing methods, improving the F-Measure by up to 17%.
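The text does not state which F-Measure variant was used. Assuming the common balanced form (the harmonic mean of precision and recall), it can be computed from detection counts as sketched below; the function name and the example counts are illustrative only and are not taken from the experiments.

```python
def f_measure(true_pos, false_pos, false_neg):
    """Balanced F-Measure: harmonic mean of precision and recall.

    Assumes the balanced (beta = 1) variant; the source does not say which
    variant the experiments report.
    """
    if true_pos == 0:
        return 0.0  # no correct detections: precision and recall are both 0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Illustrative counts: 8 abnormal modules found, 2 false alarms, 2 missed.
score = f_measure(true_pos=8, false_pos=2, false_neg=2)
```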
It is to be noted and understood that various modifications and improvements can be made to the invention described in detail above without departing from the spirit and scope of the invention as claimed in the appended claims. Accordingly, the scope of the claimed subject matter is not limited by any of the specific exemplary teachings provided.

Claims (10)

1. A method for obtaining a micro-service health status index includes:
step H1), constructing a micro-service calling relation graph based on the interactive data packets among the micro-service modules;
step H2), generating a calling matrix of the micro service nodes based on the micro service calling relation graph, wherein each item of the calling matrix represents the number of data packets sent between the micro service nodes;
step H3), acquiring the health status index of the micro service by utilizing a convolutional neural network based on the calling matrix, wherein the convolutional neural network is generated by utilizing sample training of the calling matrix.
2. The method of claim 1, wherein nodes in the call relation graph represent microservice nodes, edges between the nodes represent data interaction between the nodes, and the numerical value of an edge represents the number of data packets transmitted between corresponding nodes; two directed edges in different directions are included between the nodes.
3. The method according to claim 1, wherein the number of rows and columns of the call matrix is equal to the number of nodes in the call relation graph, and an element in an ith row and a jth column represents the number of data packets sent by a source node i to a destination node j.
4. The method of claim 1, the health indicator being a network state vector of the microservice obtained using a convolutional neural network.
5. A microservice anomaly diagnostic method, comprising:
step D1), obtaining a health status index of the micro-service based on the method according to any one of claims 1 to 4;
step D2), obtaining hardware state data from the physical machine where the micro-service is located;
step D3), determining whether the micro-service is abnormal by using a BP neural network based on the health status index of the micro-service and the hardware state data of the physical machine where the micro-service is located, wherein the BP neural network is trained using sample data of the health status index of the micro-service and of the hardware state data of the physical machine where the micro-service is located.
6. The method of claim 5, wherein said step D2) comprises:
filtering the hardware state data by using the XGBoost algorithm and selecting the first N hardware data indexes, wherein N is an integer greater than or equal to 1.
7. A root cause positioning method based on micro-service calling relation comprises the following steps:
step R1), obtaining abnormal micro-service nodes based on the method of claim 5 or 6;
step R2), calculating the influence factor of each abnormal micro-service node based on the micro-service calling relation graph by using the following formula:
$$f_i = \frac{1}{\sum_{j \neq i} d_{i,j}}$$
where i and j denote abnormal nodes, f_i represents the influence factor of node i, and d_{i,j} represents the distance from node i to node j;
step R3), taking the first K nodes as root cause nodes according to the influence factors of the abnormal micro-service nodes, wherein K is an integer greater than or equal to 1.
8. The root cause positioning method of claim 7, wherein when node j is reachable from node i, d_{i,j} is the total hop count from node i to node j; and when node j is unreachable from node i, d_{i,j} = M + 1, where M is the total number of nodes in the micro-service calling relation graph.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.
10. A computer device comprising a memory and a processor, on which memory a computer program is stored which is executable on the processor, characterized in that the steps of the method according to any of claims 1 to 8 are implemented when the processor executes the program.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110740846.9A CN113467421B (en) 2021-07-01 2021-07-01 Method for acquiring micro-service health status index and micro-service abnormity diagnosis method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110740846.9A CN113467421B (en) 2021-07-01 2021-07-01 Method for acquiring micro-service health status index and micro-service abnormity diagnosis method

Publications (2)

Publication Number Publication Date
CN113467421A true CN113467421A (en) 2021-10-01
CN113467421B CN113467421B (en) 2022-10-11

Family

ID=77876845

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110740846.9A Active CN113467421B (en) 2021-07-01 2021-07-01 Method for acquiring micro-service health status index and micro-service abnormity diagnosis method

Country Status (1)

Country Link
CN (1) CN113467421B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109088762A (en) * 2018-08-14 2018-12-25 安徽云才信息技术有限公司 A kind of service health monitor method based on micro services
CN110888755A (en) * 2019-11-15 2020-03-17 亚信科技(中国)有限公司 Method and device for searching abnormal root node of micro-service system
CN111078688A (en) * 2019-11-18 2020-04-28 福建天泉教育科技有限公司 Method for micro-service health check and storage medium
US10685286B1 (en) * 2019-07-30 2020-06-16 SparkCognition, Inc. Automated neural network generation using fitness estimation
CN111737033A (en) * 2020-05-26 2020-10-02 复旦大学 Micro-service fault positioning method based on runtime map analysis
CN112698975A (en) * 2020-12-14 2021-04-23 北京大学 Fault root cause positioning method and system of micro-service architecture information system

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114024837A (en) * 2022-01-06 2022-02-08 杭州大乘智能科技有限公司 Fault root cause positioning method of micro-service system
CN114024837B (en) * 2022-01-06 2022-04-05 杭州乘云数字技术有限公司 Fault root cause positioning method of micro-service system
CN114598539A (en) * 2022-03-16 2022-06-07 京东科技信息技术有限公司 Root cause positioning method and device, storage medium and electronic equipment
CN114598539B (en) * 2022-03-16 2024-03-01 京东科技信息技术有限公司 Root cause positioning method and device, storage medium and electronic equipment
CN117560275A (en) * 2023-12-29 2024-02-13 安徽思高智能科技有限公司 Root cause positioning method and device for micro-service system based on graphic neural network model
CN117560275B (en) * 2023-12-29 2024-03-12 安徽思高智能科技有限公司 Root cause positioning method and device for micro-service system based on graphic neural network model

Also Published As

Publication number Publication date
CN113467421B (en) 2022-10-11

Similar Documents

Publication Publication Date Title
Wu et al. Microrca: Root cause localization of performance issues in microservices
CN113467421B (en) Method for acquiring micro-service health status index and micro-service abnormity diagnosis method
Meng et al. Localizing failure root causes in a microservice through causality inference
Yang et al. A time efficient approach for detecting errors in big sensor data on cloud
Wang et al. Groot: An event-graph-based approach for root cause analysis in industrial settings
Wu et al. Microdiag: Fine-grained performance diagnosis for microservice systems
US20080148242A1 (en) Optimizing an interaction model for an application
US20080148039A1 (en) Selecting instrumentation points for an application
US9122784B2 (en) Isolation of problems in a virtual environment
Oliner et al. Online detection of multi-component interactions in production systems
US20170034001A1 (en) Isolation of problems in a virtual environment
Lee et al. Eadro: An end-to-end troubleshooting framework for microservices on multi-source data
CN113516174B (en) Call chain abnormality detection method, computer device, and readable storage medium
CN115237717A (en) Micro-service abnormity detection method and system
Cai et al. A real-time trace-level root-cause diagnosis system in alibaba datacenters
CN116166505A (en) Monitoring platform, method, storage medium and equipment for dual-state IT architecture in financial industry
Bogatinovski et al. Artificial intelligence for it operations (aiops) workshop white paper
Gamage et al. Using dependency graph and graph theory concepts to identify anti-patterns in a microservices system: A tool-based approach
CN115118621A (en) Micro-service performance diagnosis method and system based on dependency graph
Aggarwal et al. Causal modeling based fault localization in cloud systems using golden signals
Song et al. Autonomous selection of the fault classification models for diagnosing microservice applications
Toka et al. Predicting cloud-native application failures based on monitoring data of cloud infrastructure
Wang et al. A methodology for root-cause analysis in component based systems
Samarakoon et al. System abnormality detection in stock market complex trading systems using machine learning techniques
Wang et al. Online reliability time series prediction for service-oriented system of systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant