CN111737033A - Micro-service fault positioning method based on runtime map analysis - Google Patents

Publication number
CN111737033A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010457981.8A
Other languages
Chinese (zh)
Other versions
CN111737033B (en)
Inventor
彭鑫
冀超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Original Assignee
Fudan University
Application filed by Fudan University
Priority to CN202010457981.8A
Publication of CN111737033A
Application granted
Publication of CN111737033B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703: Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079: Root cause analysis, i.e. error or fault diagnosis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/30: Monitoring
    • G06F11/3003: Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006: Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/30: Monitoring
    • G06F11/3058: Monitoring arrangements for monitoring environmental properties or parameters of the computing system or of the computing system component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367: Ontology
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Animal Behavior & Ethology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention belongs to the technical field of software engineering and cloud computing, and particularly relates to a micro-service fault positioning method based on runtime map analysis. The method automatically updates and maintains a micro-service runtime map from the runtime data of the micro-service system; when a request fault occurs, it evaluates the abnormality degree of each system component using the data in the map, analyzes how the abnormality degree propagates, and finally obtains a fault positioning result. The method comprises two parts: real-time construction and dynamic updating of the micro-service runtime map, and fault positioning based on the runtime map. The method uses data such as service deployment, service calls and monitoring indexes of the micro-service system to construct a runtime map describing the running state of the system. After a fault occurs, each component of the system is analyzed according to the map data and the most probable fault positions are provided to developers, which speeds up fault positioning and reduces manual workload.

Description

Micro-service fault positioning method based on runtime map analysis
Technical Field
The invention belongs to the technical field of software engineering and cloud computing, and particularly relates to a micro-service fault positioning method.
Background
Microservice architecture is an architectural style that decomposes an entire application into several decoupled functional modules. Each functional module has its own process and execution environment, and the modules exchange information through a lightweight communication protocol (such as RPC or HTTP); such functional modules are called microservices. Fine-grained service partitioning and isolated operating environments allow the services of an application based on a micro-service architecture to be developed and deployed independently and scaled flexibly as required. Micro-service architecture has become a key cloud-native technology and is widely used in many enterprises.
Fault location is an important part of system operation and maintenance. When a monolithic system or a distributed system fails, developers can search for the fault source by means such as incremental debugging, execution trace comparison and fault feature learning. A micro-service system is more complex and dynamic than a typical monolithic or distributed system, which makes these approaches less effective. When a micro-service system fails, developers have to face complex service interaction relationships, diverse operating environments, and dynamically created and destroyed instances, which makes fault location difficult and inefficient.
Disclosure of Invention
The invention aims to provide a micro-service fault positioning method based on runtime map analysis, which can accelerate fault positioning speed and reduce manual workload.
The method uses data such as service deployment, service calling and monitoring indexes of the micro-service system to construct a runtime map for describing the running state of the micro-service system; after a fault occurs, the invention analyzes each component of the system according to the map data and provides the most probable fault position for developers.
The method automatically updates and maintains the micro-service runtime map based on the runtime data of the micro-service system; when a request fault occurs, it evaluates the abnormality degree of each system component using the data in the map, analyzes the propagation of the abnormality degree, and finally obtains a fault positioning result. The method mainly comprises two parts: the real-time construction and dynamic updating of the micro-service runtime map, and fault positioning by means of the runtime map. The former continuously updates the map data as the system operates, while the latter reads the map data and analyzes the location of the fault source when a fault occurs.
The runtime map continuously collects and associates runtime data on multiple aspects of the micro-service system, and aims to describe the running state of the system. The raw data of the map comprises service deployment, service invocation, monitoring indexes and the like. The high-level structure of the micro-service runtime map constructed by the invention is shown in FIG. 1. The virtual machine, micro-service instance, API and container are the running components of a micro-service system and are also called nodes. There are three kinds of monitoring index nodes: the container monitoring index, the micro-service instance monitoring index and the API monitoring index, which correspond to the container, the micro-service instance and the API respectively and store the monitoring index data of the corresponding node. The relationships between the running components are as follows. "Contained in", "deployed in" and "belongs to" describe the deployment architecture of the micro-service system: a container is contained in a micro-service instance, a micro-service instance is deployed in a virtual machine, and a micro-service instance belongs to a micro-service. "Calls" and "belongs to" between micro-services and APIs describe the call relation statistics between micro-services and APIs. "Calls in a request" and "completes in a request" between micro-service instances and APIs record the calls made and completed within an individual request. Three monitoring relationships indicate which map nodes (micro-service instances, APIs and containers) are monitored by the monitoring system. The real-time construction and dynamic updating of the runtime map is executed cyclically at a fixed time interval to keep the data up to date. The process comprises the following steps:
(1) Extracting the deployment architecture. The deployment architecture refers to the deployment position of each micro-service instance on a virtual machine, the logical relationship between micro-service instances and micro-services, and the containers that make up each micro-service instance. This data is mainly provided by the container orchestration platform. The step comprises the following substeps:
1) acquiring data on containers, micro-service instances, micro-services and virtual machines, creating a new map and adding the nodes;
2) acquiring the deployment position of each micro-service instance from the instance, and adding the "deployed in" relationship of the instance to the map;
3) acquiring the micro-service to which each micro-service instance belongs from the instance, and adding the "belongs to" relationship of the instance to the map;
4) acquiring the micro-service instance to which each container belongs from the container, and adding the "contained in" relationship of the container to the map.
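The four substeps above can be sketched as a function that turns platform records into relationship triples. This is only an illustration: the record fields (`pod`, `node`, `service`), the relation names and the sample values are hypothetical and stand in for whatever the container orchestration platform actually returns.

```python
# Hypothetical input records; a real orchestration platform would supply them.
containers = [{"name": "c1", "pod": "ts-order-1"}]
instances = [{"name": "ts-order-1", "node": "vm-a", "service": "ts-order"}]


def build_deployment_edges(containers, instances):
    """Return (source, relation, target) triples for the deployment architecture."""
    edges = set()
    for inst in instances:
        # Substep 2: where the instance is deployed.
        edges.add((inst["name"], "deployed_in", inst["node"]))
        # Substep 3: which micro-service the instance belongs to.
        edges.add((inst["name"], "belongs_to", inst["service"]))
    for c in containers:
        # Substep 4: which instance the container is contained in.
        edges.add((c["name"], "contained_in", c["pod"]))
    return edges
```

Storing the relationships as plain triples keeps the sketch independent of any particular graph database.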
(2) Parsing the call relationships. At the macro level, a call relationship is a call between micro-services. Because each request is actually served by micro-service instances, at the micro level a call relationship is a call from one micro-service instance to the API of another micro-service instance within a given request. The step comprises the following substeps:
1) parsing each cross-service call in each call chain, and adding the API node and the "calls in a request" and "completes in a request" relationships to the map;
2) for each "calls in a request" and "completes in a request" relationship, adding the corresponding "calls" and "belongs to" relationships to the map.
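One way to picture substep 2) is as an aggregation from per-request, instance-level calls to service-level statistics. The sketch below assumes each cross-service call has already been reduced to a hypothetical `(caller_instance, api, callee_instance)` tuple and that an instance-to-service mapping is available from the deployment step; none of these names come from the source.

```python
from collections import Counter


def aggregate_calls(request_calls, instance_to_service):
    """Derive service-level relations from per-request instance-level calls.

    request_calls: iterable of (caller_instance, api, callee_instance).
    Returns (calls, belongs): call counts per (service, api), and api -> owning service.
    """
    calls = Counter()
    belongs = {}
    for caller, api, callee in request_calls:
        # "calls": the caller's micro-service calls the API.
        calls[(instance_to_service[caller], api)] += 1
        # "belongs to": the API belongs to the micro-service that completed the call.
        belongs[api] = instance_to_service[callee]
    return calls, belongs
```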
(3) Collecting monitoring indexes. Monitoring index data refers to the resource usage and certain performance indexes of system components at different moments. For containers and micro-service instances, the monitoring indexes are mainly CPU usage and memory usage; for APIs, the index is the request response time. This data is used to evaluate the running state of a component and to judge whether the component is running normally. The step comprises the following substeps:
1) obtaining the name of the monitored component and the name of the monitoring index;
2) acquiring the monitoring data of the corresponding component from the monitoring platform;
3) adding monitoring nodes to the map and storing the data.
(4) Updating the map data. The newly constructed runtime map is compared with the old map in the database; new data is added to the database and changed data is modified. After a fixed time interval, the process returns to step (1) and is executed again.
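The comparison in step (4) can be sketched as a dictionary diff. The representation here is an assumption made for illustration: each map (old and new) is a dictionary from a node or relationship key to its attribute dictionary, and the function only reports what to add and what to modify, as the step describes.

```python
def diff_map(old, new):
    """Compare a newly built map with the stored one.

    old, new: dict mapping a node/edge key to its attribute dict.
    Returns (added, changed): entries missing from old, and entries whose attributes differ.
    """
    added = {k: v for k, v in new.items() if k not in old}
    changed = {k: v for k, v in new.items() if k in old and old[k] != v}
    return added, changed


def apply_update(old, added, changed):
    """Write the additions and modifications into the stored map (in place)."""
    old.update(added)
    old.update(changed)
    return old
```

Writing only the difference keeps each cycle cheap when most of the system is unchanged between two consecutive collections.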
When a fault occurs, the position of the fault source is analyzed by means of the runtime map data. The main idea is to calculate, from the monitoring data of each node, the degree to which the node deviates from its normal operating state (called the abnormality degree), to analyze the common cause among the nodes with a high abnormality degree using the relationships between nodes, and finally to output the result. Fault positioning comprises the following four steps:
(1) Triggering the fault positioning process. The fault positioning process can be triggered when the system shows an explicit request error. Explicit request errors include an incorrect request result, a request response time significantly outside the normal range, and the like.
(2) Calculating the abnormality degree of each map node. The node abnormality degree measures how far the operating condition of each node in the map deviates from its normal state. The monitoring nodes in the map store the monitoring index data of their corresponding nodes at each past moment. The invention uses the ratio of the difference between the index value at a given moment and the mean of the index values over a past period to the standard deviation as the abnormality degree of the component. Define $A(t)$ as the abnormality degree of a monitored node at time $t$, where $t$ is the monitoring data acquisition time closest to the fault occurrence time; $v_t$ is the value of the node's monitoring index at time $t$, $v_{t-1}$ is the first monitoring index value before time $t$, $v_{t-2}$ is the second monitoring index value before time $t$, and so on; $\mu_t$ is the mean of the $n$ monitoring index values before time $t$; and $\sigma_t$ is the standard deviation of the $n$ monitoring index values before time $t$. Then, for a monitored node, the abnormality degree $A(t)$ at time $t$ is calculated as:
$$A(t) = \frac{v_t - \mu_t}{\sigma_t}$$
The step comprises the following substeps:
1) acquiring the monitoring index data of several acquisitions before the acquisition time closest to the fault occurrence;
2) calculating the mean and the standard deviation of the data before the fault occurred;
3) calculating the ratio of the difference between the monitoring index value at the acquisition time closest to the fault occurrence and the mean to the standard deviation, and taking this ratio as the abnormality degree of the node.
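The three substeps amount to a standard score of the latest sample against its recent history. The sketch below makes two assumptions that the source does not settle: it uses the population standard deviation, and it returns 0.0 or infinity when the history is constant so that the division is always defined.

```python
from statistics import fmean, pstdev


def abnormality(history, current):
    """Abnormality degree of `current` relative to the n samples in `history`."""
    mu = fmean(history)       # mean of the samples before the fault
    sigma = pstdev(history)   # population standard deviation (an assumption)
    if sigma == 0:
        # Constant history: any deviation at all is treated as unbounded.
        return 0.0 if current == mu else float("inf")
    return (current - mu) / sigma
```

For example, with a hypothetical history of `[10, 12, 8, 10]` and a current value of `16`, the mean is 10, the standard deviation is √2, and the abnormality degree is 6/√2.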
(3) Analyzing the propagation of the abnormality degree. The position where a system fault directly manifests is often not the root position of the fault, and several fault positions may share a factor that jointly causes the fault. This step combines the topology of the map with the abnormality degree of each component and analyzes how the abnormality degree propagates through the map, in order to find the common cause among several abnormal components and thereby determine the final fault positioning result. Pseudocode for the abnormality degree propagation analysis is given in the Appendix. The step comprises the following substeps:
1) taking each monitoring node as a starting point, traversing the map layer by layer in breadth-first order and propagating the abnormality degree of the monitoring node outward, multiplying it by a damping coefficient at each layer;
2) after propagation finishes, each node has received several abnormality degree values; these values are summed to obtain the total for each node, which is taken as the final accumulated abnormality degree of the node.
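As a worked illustration of the two substeps, assume (as an interpretation, not something the source states) that each node receives one value per monitoring node, attenuated once per layer. With damping coefficient $d$, a monitoring node $i$ with abnormality degree $A_i$ that lies $k_i(v)$ layers away from node $v$ contributes $A_i d^{k_i(v)}$, and the accumulated abnormality degree $S(v)$ is the sum of these contributions. The numbers in the example are made up; $d = 0.7$ is the value used in the embodiment.

```latex
S(v) = \sum_{i} A_i \, d^{\,k_i(v)}
% Example with d = 0.7: A_1 = 4.0 at two layers, A_2 = 2.0 at one layer
S(v) = 4.0 \times 0.7^{2} + 2.0 \times 0.7^{1} = 1.96 + 1.40 = 3.36
```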
(4) Sorting and outputting the results. The results are sorted and output. The developer then examines the fault positions in the order given in the results and determines the final fault position.
The invention has three main advantages.
First, the invention provides developers with an ordered list of suspected fault positions and narrows the search range for fault positioning, which speeds up fault positioning, reduces the time it takes, and spares developers from searching for the fault position among the large number of running components of the micro-service system.
Second, the invention is deployed non-invasively and does not interfere with the normal operation of the micro-service system.
Third, the data collection and result output of the invention are performed in real time and do not require excessive system resources or time.
The method of the invention can greatly speed up fault positioning and reduce the manual workload required. Three common faults of different types were injected into the open-source micro-service benchmark system TrainTicket and a comparative fault positioning experiment was carried out; the fault positioning time of the method was on average 64% lower than that of a fault positioning method based on manual analysis of system logs.
Drawings
FIG. 1 is a diagram of a high-level structure of a micro-service runtime graph constructed by the present invention.
Detailed Description
The following describes embodiments of runtime map construction and runtime-map-based fault positioning for a micro-service system that uses Docker and Kubernetes to deploy and orchestrate containers and uses Prometheus and Zipkin to collect monitoring data and call chain data.
For the real-time construction and dynamic update of the micro-service runtime map, the implementation method comprises the following steps:
(1) Extracting the deployment architecture. State and attribute data of the virtual machines, micro-service instances and micro-services in the cluster are acquired from the Kubernetes platform interface, and the state and attribute data of each container are acquired from the interface provided by the Docker daemon on each virtual machine. The io.kubernetes.pod.name attribute of a container indicates the micro-service instance to which it belongs; the nodeName attribute of a micro-service instance specifies its deployment location; the labels of a micro-service instance and the selector attribute of a micro-service determine which micro-service the instance belongs to. The deployment architecture is constructed from this data, and the node states and attributes are stored.
(2) Parsing the call relationships. Trace data for the most recent period is obtained from the Zipkin platform interface, and each Span in each Trace is analyzed one by one. The url property of a Span specifies the called API, and the node_id property specifies the micro-service instance that initiated or received the call. Call relationship data is added to the map using these properties, and the related attributes of the API nodes are stored.
(3) Collecting monitoring indexes. CPU usage and memory usage monitoring data for micro-service instances and containers is acquired from the Prometheus platform and stored in the corresponding monitoring nodes. The duration attribute of each Span in each Zipkin Trace from the most recent period is read as the response time data of the interface and stored in the monitoring node corresponding to the API.
(4) Updating the map data. The newly acquired data is compared with the old map and the data is updated in the Neo4j database. After a fixed time interval of 5 seconds, the process returns to step (1) and is executed again.
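The label-and-selector matching mentioned in step (1) can be sketched as follows. The sketch assumes the labels and selectors have already been fetched and are plain key/value dictionaries (equality-based selectors only); the function names and sample service names are hypothetical and it does not call the Kubernetes API.

```python
def matches(selector, labels):
    """True if every key/value pair in the selector also appears in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def owning_services(pod_labels, services):
    """Names of the micro-services whose selector matches a micro-service instance.

    services: dict mapping a service name to its selector dict.
    A service with an empty selector is treated as selecting no instances.
    """
    return [
        name
        for name, selector in services.items()
        if selector and matches(selector, pod_labels)
    ]
```

Each match would then become a "belongs to" relationship from the micro-service instance to the micro-service.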
For the fault positioning method, the implementation mode is as follows:
(1) Triggering the fault positioning process. When a request fails, a developer can enter the call chain ID to trigger the fault positioning process.
(2) Calculating the abnormality degree of each map node. The time series data of the monitored nodes is read from the runtime map, the mean and standard deviation of the 20 data points before the acquisition time closest to the fault are calculated, and the difference between the index value at that time and the mean is then expressed as a multiple of the standard deviation to obtain the abnormality degree of each node.
(3) Analyzing the propagation of the abnormality degree. Taking each monitoring node as a starting point, the abnormality degree is propagated outward layer by layer along the topology of the map. The damping coefficient for each layer of outward propagation is 0.7. The sum of the abnormality degrees received by each node in the map is then calculated.
(4) Sorting and outputting the results. The map nodes are grouped by node type and sorted by their summed abnormality degree. Nodes whose abnormality degree is lower than the mean for their type are removed from the results, and the rest is output as the result. The developer can then troubleshoot in the order of the output and determine the fault source.
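The grouping, filtering and sorting in step (4) can be sketched as below. The input shape, a list of hypothetical `(name, node_type, score)` tuples, is an assumption; nodes scoring exactly the type mean are kept, since the text only removes those below it.

```python
from collections import defaultdict
from statistics import fmean


def rank(nodes):
    """Group nodes by type, drop those below the type mean, sort descending.

    nodes: list of (name, node_type, accumulated_abnormality).
    Returns a dict mapping node_type to a list of (name, score).
    """
    by_type = defaultdict(list)
    for name, node_type, score in nodes:
        by_type[node_type].append((name, score))
    result = {}
    for node_type, items in by_type.items():
        mean = fmean([score for _, score in items])
        kept = [(name, score) for name, score in items if score >= mean]
        result[node_type] = sorted(kept, key=lambda item: item[1], reverse=True)
    return result
```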
Three common faults of different types were injected into the open-source micro-service benchmark system TrainTicket and a comparative fault positioning experiment was carried out; the fault positioning time of the method was on average 64% lower than that of a fault positioning method based on manual analysis of system logs.
Appendix
Propagation analysis algorithm for the abnormality degree on the map:
Input: the set C of map nodes in the runtime map that have monitoring indexes, and the set V of all map nodes in the runtime map.
Output: V // the node set V, with the accumulated abnormality degree of each node computed.
function FaultAnalysis(C, V)
    for c in C:
        dfsQueue.offer(c)    // FIFO queue, so the traversal is breadth-first despite the name
        baseAbnormality = c.abnormality
        while dfsQueue ≠ ∅:
            layerSize = dfsQueue.size()
            baseAbnormality = baseAbnormality * 0.7    // damping coefficient per layer
            for (i = 0; i < layerSize; i++)
                currNode <- dfsQueue.poll()
                for neighborNode in currNode.nextNeighbors
                    neighborNode.scoreList.add(baseAbnormality)
                    dfsQueue.offer(neighborNode)
                end for
            end for
        end while
    end for
    for v in V:
        v.abnormality = average(v.scoreList) * log2(size(v.scoreList) + 1)
    end for
    return V
end function
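A runnable sketch of the appendix algorithm is given below. It departs from the pseudocode in two labeled ways: it keeps a per-source `seen` set so that the traversal terminates on maps containing cycles, and it assigns 0.0 to nodes that received no values. The dictionary-based input format and the parameter names are assumptions made for illustration.

```python
import math
from collections import deque


def fault_analysis(monitored, neighbors, damping=0.7):
    """Accumulated abnormality per node, following the appendix pseudocode.

    monitored: dict mapping a monitored node to its abnormality degree.
    neighbors: dict mapping every node to the list of its next neighbours.
    """
    score_list = {node: [] for node in neighbors}
    for start, base in monitored.items():
        queue, seen = deque([start]), {start}  # `seen` is an added assumption
        while queue:
            base *= damping                    # attenuate once per layer
            for _ in range(len(queue)):        # process exactly one layer
                curr = queue.popleft()
                for nb in neighbors[curr]:
                    if nb in seen:
                        continue
                    seen.add(nb)
                    score_list[nb].append(base)
                    queue.append(nb)
    # average(scoreList) * log2(size(scoreList) + 1); 0.0 if nothing was received
    return {
        node: (sum(s) / len(s)) * math.log2(len(s) + 1) if s else 0.0
        for node, s in score_list.items()
    }
```

The `log2` factor rewards nodes that receive values from several monitoring nodes, which matches the stated aim of finding a common cause among multiple abnormal components.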

Claims (3)

1. A micro-service fault positioning method based on runtime map analysis, characterized in that a micro-service runtime map is automatically updated and maintained based on the runtime data of a micro-service system; when a request fault occurs, the abnormality degree of each system component is evaluated using the data in the map, the propagation of the abnormality degree is analyzed, and a fault positioning result is finally obtained; the method comprises two stages: (I) real-time construction and dynamic updating of the micro-service runtime map, and (II) fault positioning based on the runtime map; the former continuously updates the map data as the system operates, and the latter reads the map data and analyzes the position of the fault source when a fault occurs;
the raw data of the micro-service runtime map comprises service deployment, service calls and monitoring indexes; the micro-service runtime map comprises a virtual machine, a micro-service instance, an API (application programming interface) and a container, which are the running components of the micro-service system and are also called nodes; there are three kinds of monitoring index nodes: the container monitoring index, the micro-service instance monitoring index and the API monitoring index, which correspond to the container, the micro-service instance and the API respectively and store the monitoring index data of the corresponding node; the relationships between the running components include: "contained in", "deployed in" and "belongs to", which describe the deployment architecture of the micro-service system, wherein the container is contained in the micro-service instance, the micro-service instance is deployed in the virtual machine, and the micro-service instance belongs to the micro-service; "calls" and "belongs to" between micro-services and APIs, which describe the call relation statistics between micro-services and APIs; "calls in a request" and "completes in a request" between micro-service instances and APIs, which record the calls made and completed within an individual request; and three monitoring relationships, which indicate that the map nodes, namely micro-service instances, APIs and containers, are monitored by the monitoring system;
the position of the fault root cause is analyzed by means of the runtime map data: the degree to which each node deviates from its normal operating state, namely the abnormality degree, is calculated from the monitoring data of the nodes, the common cause among the nodes with a high abnormality degree is analyzed using the relationships between nodes, and the result is finally obtained.
2. The micro-service fault positioning method based on runtime map analysis according to claim 1, wherein the detailed flow of the real-time construction and dynamic updating stage of the micro-service runtime map is as follows:
(1) Extracting the deployment architecture
The deployment architecture refers to the deployment position of each micro-service instance on a virtual machine, the logical relationship between micro-service instances and micro-services, and the containers that make up each micro-service instance; this data is mainly provided by the container orchestration platform; the step comprises the following substeps:
1) acquiring data on containers, micro-service instances, micro-services and virtual machines, creating a new map and adding the nodes;
2) acquiring the deployment position of each micro-service instance from the instance, and adding the "deployed in" relationship of the instance to the map;
3) acquiring the micro-service to which each micro-service instance belongs from the instance, and adding the "belongs to" relationship of the instance to the map;
4) acquiring the micro-service instance to which each container belongs from the container, and adding the "contained in" relationship of the container to the map;
(2) Parsing the call relationships
At the macro level, a call relationship is a call between micro-services; because each request is actually served by micro-service instances, at the micro level a call relationship is a call from one micro-service instance to the API of another micro-service instance within a given request; the step comprises the following substeps:
1) parsing each cross-service call in each call chain, and adding the API node and the "calls in a request" and "completes in a request" relationships to the map;
2) for each "calls in a request" and "completes in a request" relationship, adding the corresponding "calls" and "belongs to" relationships to the map;
(3) Collecting monitoring indexes
Monitoring index data refers to the resource usage and certain performance indexes of system components at different moments; for containers and micro-service instances, the monitoring indexes are mainly CPU usage and memory usage; for APIs, the index is the request response time; this data is used to evaluate the running state of a component and to judge whether the component is running normally; the step comprises the following substeps:
1) obtaining the name of the monitored component and the name of the monitoring index;
2) acquiring the monitoring data of the corresponding component from the monitoring platform;
3) adding monitoring nodes to the map and storing the data;
(4) Updating the map data
The newly constructed runtime map is compared with the old map in the database, new data is added to the database and changed data is modified; after a fixed time interval, the process returns to step (1) and is executed again.
3. The microservice fault location method based on runtime atlas analysis of claim 1, wherein the fault location phase based on runtime atlas is as follows:
(1) triggering fault location procedures
When the system has an explicit request error, the fault positioning process can be triggered; explicit request errors include request result error, request response time significantly out of normal range;
(2) calculating the abnormal degree of each node of the map
The node abnormality degree is a measure of the degree of deviation of the operating conditions of each node in the graph from the normal state; monitoring index data of each past moment of the corresponding node is stored in the monitoring node in the graph; using the ratio of the difference value of the index value at a certain moment and the mean value of the index values in a past period of time relative to the standard deviation as the abnormality degree of the component; defining A (t) as the abnormal degree of a certain monitored node at the time t; t is the monitoring data acquisition time closest to the fault occurrence time; v. oftValue v of the monitoring index of the node at time tt-1Taking a value of a first monitoring index from time t onward, vt-2Taking a value for the monitoring index from the moment t to the front for the second time, and so on; mu.stThe average value of the monitoring index values n times before the time t; sigmatThe standard deviation of the monitoring indexes n times before the time t; then, for a certain monitored node, the calculation method of the abnormality degree a (t) at the time t is as follows:
Figure DEST_PATH_IMAGE002
the method specifically comprises the following substeps:
1) acquiring the monitoring index data from several acquisitions preceding the acquisition time closest to the fault occurrence;
2) calculating the mean and the standard deviation of this pre-fault data;
3) calculating the ratio of the difference between the monitoring index value at the acquisition time closest to the fault occurrence and the mean to the standard deviation, and taking this ratio as the abnormality degree of the node;
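The three sub-steps above amount to a standard score of the latest sample against its recent history. The sketch below is illustrative only and rests on several assumptions that the claim does not state: the population standard deviation is used, the sign of the deviation is kept, and a flat history (zero standard deviation) is handled by returning 0 or a signed infinity. The function name `abnormality_degree` and the sample values are hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import Sequence


def abnormality_degree(history: Sequence[float], current: float) -> float:
    """Compute (current - mean(history)) / std(history).

    `history` holds the n samples taken before time t and `current`
    is the sample at time t, the acquisition closest to the fault.
    """
    if len(history) < 2:
        raise ValueError("need at least two past samples")
    mu = mean(history)
    sigma = pstdev(history)  # assumption: population standard deviation
    if sigma == 0:
        # Assumption: with a perfectly flat history, any change is
        # treated as unboundedly abnormal and no change as normal.
        if current == mu:
            return 0.0
        return math.copysign(math.inf, current - mu)
    return (current - mu) / sigma


# Assumed response-time samples (ms) before the fault, then the value
# observed at the acquisition time closest to the fault.
past = [100.0, 102.0, 98.0, 101.0, 99.0]
score = abnormality_degree(past, 130.0)
print(f"abnormality degree: {score:.2f}")
```

Taking the absolute value of the result is a possible variation when only the magnitude of the deviation matters; the text above does not say which form is intended.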
(3) analyzing abnormality degree propagation relationships
Combining the topological structure of the atlas with the abnormality degree of each component, analyzing how the abnormality degree propagates through the atlas, searching for the common cause among the abnormal components, and thereby determining the final fault location result; the method specifically comprises the following substeps:
1) taking each monitoring node as a starting point, propagating its abnormality degree outward layer by layer in a breadth-first traversal, and multiplying the abnormality degree by a damping coefficient for each layer of outward propagation;
2) after propagation is finished, each node has received several abnormality degree values; these values are summed to obtain the total for each node, which serves as the node's final accumulated abnormality degree;
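The two propagation sub-steps above can be sketched as a breadth-first traversal per monitoring node. The sketch makes assumptions the claim leaves open: the atlas is given as an adjacency mapping with edges listed in both directions, each node is reached once per source at its shortest hop distance, the source node keeps its own undamped value, and the damping coefficient is 0.5. The names `propagate`, `m1`, `p1`, `h1` and so on are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Mapping

Graph = Mapping[str, Iterable[str]]


def propagate(
    graph: Graph, degrees: Mapping[str, float], damping: float = 0.5
) -> Dict[str, float]:
    """Spread each source's abnormality degree outward and sum per node."""
    total: Dict[str, float] = defaultdict(float)
    for source, degree in degrees.items():
        visited = {source}
        queue = deque([(source, degree)])
        while queue:
            node, value = queue.popleft()
            total[node] += value
            for neighbour in graph.get(node, ()):
                if neighbour not in visited:
                    visited.add(neighbour)
                    # One more layer outward: apply the damping coefficient.
                    queue.append((neighbour, value * damping))
    return dict(total)


# Assumed toy atlas: two monitoring nodes (m1, m2) attached to two pods
# (p1, p2) that share one host (h1).
graph = {
    "m1": ["p1"],
    "m2": ["p2"],
    "p1": ["m1", "h1"],
    "p2": ["m2", "h1"],
    "h1": ["p1", "p2"],
}
degrees = {"m1": 4.0, "m2": 2.0}
totals = propagate(graph, degrees)
```

In this toy atlas the shared host `h1` accumulates contributions from both monitoring nodes, which is how a common cause among abnormal components would show up in the accumulated values.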
(4) sorting and outputting the results
Collating and sorting the results and outputting them; the developer examines the candidate fault locations in the order given in the result and determines the final fault location.
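The sorting and output step can be sketched as a simple ranking of the accumulated abnormality degrees. The ordering is an assumption: the sketch ranks nodes from highest to lowest accumulated value and breaks ties by name, neither of which is specified above. The function name `rank_candidates`, the `top_k` parameter, and the node names are hypothetical.

```python
from typing import Dict, List, Tuple


def rank_candidates(
    accumulated: Dict[str, float], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Return the top_k nodes, highest accumulated abnormality first."""
    ordered = sorted(
        accumulated.items(),
        key=lambda item: (-item[1], item[0]),  # assumption: ties by name
    )
    return ordered[:top_k]


# Assumed accumulated abnormality degrees from the propagation step.
accumulated = {
    "order-service": 2.25,
    "host-1": 1.5,
    "db-pod": 3.0,
    "gateway": 1.5,
}
ranking = rank_candidates(accumulated)
for position, (node, score) in enumerate(ranking, start=1):
    print(f"{position}. {node}: {score:.2f}")
```

The developer would then inspect the listed nodes in order, as the step describes.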
CN202010457981.8A 2020-05-26 2020-05-26 Microservice fault positioning method based on runtime pattern analysis Active CN111737033B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010457981.8A CN111737033B (en) 2020-05-26 2020-05-26 Microservice fault positioning method based on runtime pattern analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010457981.8A CN111737033B (en) 2020-05-26 2020-05-26 Microservice fault positioning method based on runtime pattern analysis

Publications (2)

Publication Number Publication Date
CN111737033A true CN111737033A (en) 2020-10-02
CN111737033B CN111737033B (en) 2024-03-08

Family

ID=72647746

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010457981.8A Active CN111737033B (en) 2020-05-26 2020-05-26 Microservice fault positioning method based on runtime pattern analysis

Country Status (1)

Country Link
CN (1) CN111737033B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112540832A (en) * 2020-12-24 2021-03-23 中山大学 Cloud native system fault analysis method based on knowledge graph
CN113360722A (en) * 2021-06-25 2021-09-07 杭州优云软件有限公司 Fault root cause positioning method and system based on multidimensional data map
CN113467421A (en) * 2021-07-01 2021-10-01 中国科学院计算技术研究所 Method for acquiring micro-service health status index and micro-service abnormity diagnosis method
CN114124738A (en) * 2021-11-04 2022-03-01 昆明理工大学 Cloud environment service fault probability calculation method, system and terminal based on service interaction graph
CN114201231A (en) * 2021-11-29 2022-03-18 江苏金农股份有限公司 Distributed micro-service arranging system and method
CN114598539A (en) * 2022-03-16 2022-06-07 京东科技信息技术有限公司 Root cause positioning method and device, storage medium and electronic equipment
CN114675988A (en) * 2022-03-15 2022-06-28 科大讯飞股份有限公司 Component analysis method and related device
US20220308869A1 (en) * 2021-03-26 2022-09-29 International Business Machines Corporation Computer management of microservices for microservice based applications
CN115509789B (en) * 2022-09-30 2023-08-11 中国科学院重庆绿色智能技术研究院 Method and system for predicting faults of computing system based on component call analysis

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180159747A1 (en) * 2016-12-05 2018-06-07 General Electric Company Automated feature deployment for active analytics microservices
CN109213616A (en) * 2018-09-25 2019-01-15 江苏润和软件股份有限公司 A kind of micro services software systems method for detecting abnormality based on calling map analysis
CN109800127A (en) * 2019-01-03 2019-05-24 众安信息技术服务有限公司 A kind of system fault diagnosis intelligence O&M method and system based on machine learning
CN110636108A (en) * 2019-08-16 2019-12-31 南方电网科学研究院有限责任公司 Micro-service architecture for electric power metering and implementation method thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
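Zhao Jiantao (赵建涛); Huang Lisong (黄立松): "Research and discussion on technologies related to microservice fault diagnosis" (微服务故障诊断相关技术研究探讨), 网络新媒体技术 (Network New Media Technology), no. 01, 15 January 2020 (2020-01-15) *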

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112540832A (en) * 2020-12-24 2021-03-23 中山大学 Cloud native system fault analysis method based on knowledge graph
US20220308869A1 (en) * 2021-03-26 2022-09-29 International Business Machines Corporation Computer management of microservices for microservice based applications
CN113360722A (en) * 2021-06-25 2021-09-07 杭州优云软件有限公司 Fault root cause positioning method and system based on multidimensional data map
CN113360722B (en) * 2021-06-25 2022-08-09 杭州优云软件有限公司 Fault root cause positioning method and system based on multidimensional data map
CN113467421A (en) * 2021-07-01 2021-10-01 中国科学院计算技术研究所 Method for acquiring micro-service health status index and micro-service abnormity diagnosis method
CN114124738A (en) * 2021-11-04 2022-03-01 昆明理工大学 Cloud environment service fault probability calculation method, system and terminal based on service interaction graph
CN114124738B (en) * 2021-11-04 2024-03-19 昆明理工大学 Cloud environment service fault probability calculation method, system and terminal based on service interaction diagram
CN114201231A (en) * 2021-11-29 2022-03-18 江苏金农股份有限公司 Distributed micro-service arranging system and method
CN114675988A (en) * 2022-03-15 2022-06-28 科大讯飞股份有限公司 Component analysis method and related device
CN114598539A (en) * 2022-03-16 2022-06-07 京东科技信息技术有限公司 Root cause positioning method and device, storage medium and electronic equipment
CN114598539B (en) * 2022-03-16 2024-03-01 京东科技信息技术有限公司 Root cause positioning method and device, storage medium and electronic equipment
CN115509789B (en) * 2022-09-30 2023-08-11 中国科学院重庆绿色智能技术研究院 Method and system for predicting faults of computing system based on component call analysis

Also Published As

Publication number Publication date
CN111737033B (en) 2024-03-08

Similar Documents

Publication Publication Date Title
CN111737033A (en) Micro-service fault positioning method based on runtime map analysis
Debnath et al. LogLens: A real-time log analysis system
CN111459766A (en) Calling chain tracking and analyzing method for micro-service system
Balsamo et al. A review on queueing network models with finite capacity queues for software architectures performance prediction
US20220358023A1 (en) Method And System For The On-Demand Generation Of Graph-Like Models Out Of Multidimensional Observation Data
Rugina et al. The ADAPT tool: From AADL architectural models to stochastic petri nets through model transformation
CN114095273A (en) Deep learning-based internet vulnerability mining method and big data mining system
CN109936479B (en) Control plane fault diagnosis system based on differential detection and implementation method thereof
CN111459698A (en) Database cluster fault self-healing method and device
CN115118621B (en) Dependency graph-based micro-service performance diagnosis method and system
WO2021109874A1 (en) Method for generating topology diagram, anomaly detection method, device, apparatus, and storage medium
CN115145751A (en) Method, device, equipment and storage medium for positioning fault root cause of micro-service system
CN115374595A (en) Automatic software process modeling method and system based on process mining
Rodrigues et al. Component identification through program slicing
Saluja et al. Optimized approach for antipattern detection in service computing architecture
CN111459984B (en) Log data processing system and method based on streaming processing
CN116097226A (en) Apparatus and method for injecting faults into a distributed system
Helal et al. Runtime deduction of case ID for unlabeled business process execution events
D’Ambrogio et al. A method for the prediction of software reliability
US8478575B1 (en) Automatic anomaly detection for HW debug
Chen et al. MFRL-CA: Microservice fault root cause location based on correlation analysis
CN111523921B (en) Funnel analysis method, analysis device, electronic device, and readable storage medium
Zuo et al. Temporal relations extraction and analysis of log events for micro-service framework
del Foyo et al. Improving the verification of real-time systems using time Petri nets
Klenik et al. Adding semantics to measurements: Ontology-guided, systematic performance analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant