CN110134539A - Method for diagnosing the root cause of faults in a distributed system - Google Patents
Method for diagnosing the root cause of faults in a distributed system
- Publication number
- CN110134539A (application CN201910398251.2A)
- Authority
- CN
- China
- Prior art keywords
- faults
- calling
- root
- diagnostic method
- fault
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0706—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
- G06F11/0709—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/079—Root cause analysis, i.e. error or fault diagnosis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Test And Diagnosis Of Digital Computers (AREA)
Abstract
The invention discloses a method for diagnosing the root cause of faults in a distributed system, comprising the following steps: constructing a call graph of the distributed system; and locating the root cause of the fault based on the call graph. By automatically building an anomaly propagation graph and using a random walk algorithm to simulate the manual process of tracing an anomaly to its root cause, the invention reduces human error, shortens the time operations staff spend investigating faults, improves the efficiency of fault resolution, and improves overall online operations quality.
Description
Technical field
The present invention relates to the field of computer science and software information technology, and in particular to a method for diagnosing the root cause of faults in a distributed system.
Background art
A distributed system is a software system that supports distributed processing and executes tasks on a multiprocessor architecture interconnected by a communication network. It includes distributed operating systems, distributed programming languages and their compilers (or interpreters), distributed file systems, distributed database systems, and so on. A fault in a distributed system can occur in any of its modules.
When diagnosing the root cause of a fault manually, an operations engineer typically inspects the system according to a mental picture of the module call graph. Often a fault is discovered because many failed requests are seen on the most upstream front-end module, A. The engineer then investigates downstream from A. Because A calls module B, the metrics of B must be checked; if any of them are abnormal, B is suspected of causing the fault. The engineer then checks C, the direct downstream module of B, and so on. In this process the engineer's suspicion is passed down along the call relationships between modules until it can be passed no further.
Manual diagnosis requires operations engineers to have business domain knowledge so that they can recognize service anomalies, and it cannot satisfy operations scenarios that demand fast localization and resolution of problems.
Summary of the invention
To address the shortcomings of the prior art, the present invention provides a method for diagnosing the root cause of faults in a distributed system. It reduces the dependence of operations staff on business domain knowledge and solves the problems that manual fault localization is difficult and slow.
To solve the above technical problem, the invention adopts the following technical solution:
A method for diagnosing the root cause of faults in a distributed system, comprising the following steps:
Constructing a call graph of the distributed system;
Locating the root cause of the fault based on the call graph.
Optionally, constructing the call graph of the distributed system comprises:
Building the call relationships between the modules of the distributed system into a call graph.
Optionally, the call graph is stored in a database.
Optionally, the database is a graph database.
Optionally, locating the root cause of the fault based on the call graph comprises:
Marking the anomalies of the distributed system on the vertices of the call graph to form an anomaly propagation graph.
Optionally, locating the root cause of the fault based on the call graph further comprises:
Searching for the root cause of the fault in the anomaly propagation graph.
Optionally, the method further comprises:
Displaying the root cause of the fault on an interface.
Optionally, the root cause of the fault is located using a random walk algorithm.
Optionally, displaying the root cause of the fault on the interface comprises local root cause display.
Optionally, displaying the root cause of the fault on the interface comprises global root cause display.
Optionally, local root cause display specifically comprises: entering the module name of an abnormal module, then querying the downstream modules of that abnormal module for faults and displaying the results.
Optionally, global root cause display specifically comprises: querying the downstream modules of all abnormal modules for faults and displaying the results.
In this way, the user can see the root causes of the faults of all abnormal modules in the front-end interface.
The beneficial effects of the invention are as follows: by automatically building an anomaly propagation graph and using a random walk algorithm to simulate the manual process of tracing an anomaly to its root cause, the invention reduces human error, shortens the time operations staff spend investigating faults, improves the efficiency of fault resolution, and improves overall online operations quality.
Brief description of the drawings
To illustrate the specific embodiments of the invention or the technical solutions in the prior art more clearly, the drawings needed to describe them are briefly introduced below. In all drawings, similar elements or parts are generally identified by similar reference numerals. Elements and parts in the drawings are not necessarily drawn to scale.
Fig. 1 is a flowchart of a method for diagnosing the root cause of faults in a distributed system according to one embodiment of the invention;
Fig. 2 is a flowchart of a method for diagnosing the root cause of faults in a distributed system according to another embodiment of the invention;
Fig. 3 is an example of a call graph.
Detailed description
Embodiments of the technical solutions of the invention are described in detail below with reference to the drawings. The following embodiments serve only to illustrate the technical solutions more clearly; they are examples only and do not limit the scope of protection of the invention.
It should be noted that, unless otherwise stated, technical or scientific terms used in this application have the ordinary meaning understood by those skilled in the art to which the invention belongs.
As shown in Fig. 1, a method for diagnosing the root cause of faults in a distributed system comprises the following steps:
Constructing a call graph of the distributed system;
Locating the root cause of the fault based on the call graph.
Optionally, constructing the call graph of the distributed system comprises:
Building the call relationships between the modules of the distributed system into a call graph.
Optionally, the call graph is stored in a database.
Optionally, the database is a graph database.
Optionally, locating the root cause of the fault based on the call graph comprises:
Marking the anomalies of the distributed system on the vertices of the call graph to form an anomaly propagation graph.
Optionally, locating the root cause of the fault based on the call graph further comprises:
Searching for the root cause of the fault in the anomaly propagation graph.
Optionally, the method further comprises:
Displaying the root cause of the fault on an interface.
Optionally, the root cause of the fault is located using a random walk algorithm.
Optionally, displaying the root cause of the fault on the interface comprises local root cause display.
Optionally, displaying the root cause of the fault on the interface comprises global root cause display.
Optionally, local root cause display specifically comprises: entering the module name of an abnormal module, then querying the downstream modules of that abnormal module for faults and displaying the results.
Optionally, global root cause display specifically comprises: querying the downstream modules of all abnormal modules for faults and displaying the results.
As shown in Fig. 2, a method for diagnosing the root cause of faults in a distributed system comprises the following steps:
S1. Build the call relationships between the modules of the distributed system into a call graph and store it in a database.
In this embodiment, the database is a graph database. The distributed remote procedure call (RPC) framework is modified to upload call relationships to a message queue; the messages are then consumed to obtain the call relationships between the modules of the system. With system modules as vertices and call relationships as directed edges, a directed graph is formed and stored in the graph database. Here, a module is a system project.
Fig. 3 is an example of a call graph: each point is a module, and each directed arrow between modules is a call relationship. The arrowhead points to the called module.
S2. Mark the anomalies of the distributed system on the vertices of the call graph to form an anomaly propagation graph.
Depending on whether each module is raising an alarm, a "Yes/No" attribute is added to the vertex of that module, which yields an anomaly propagation graph. A module with an anomaly is marked "Yes"; a module without one is marked "No".
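As a rough sketch of S2, the Yes/No attribute can be attached to each vertex of an adjacency-list call graph. The dictionary layout, the `alarming` set, and the name `build_anomaly_graph` are assumptions made for illustration; the source does not specify how alarms are collected.

```python
def build_anomaly_graph(call_graph, alarming):
    """Attach a Yes/No anomaly attribute to every vertex of the call graph."""
    return {
        module: {
            "abnormal": "Yes" if module in alarming else "No",
            "calls": list(callees),
        }
        for module, callees in call_graph.items()
    }


# Hypothetical call graph and set of modules that are currently alarming.
call_graph = {"A": ["B", "C"], "B": ["D"], "C": [], "D": []}
alarming = {"A", "B", "D"}

anomaly_graph = build_anomaly_graph(call_graph, alarming)
abnormal_modules = [m for m, v in anomaly_graph.items() if v["abnormal"] == "Yes"]
print(abnormal_modules)
```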
S3. Trace the root cause of the fault in the anomaly propagation graph.
In this embodiment, the root cause is traced with a random walk algorithm that simulates the manual search process on the anomaly propagation graph. Specifically, N people are simulated. Each of them starts at an arbitrary abnormal module vertex of the anomaly propagation graph and searches downward through abnormal modules until no further move is possible. The vertices passed through are then counted: each time a simulated person passes through a vertex, its count is incremented by one. Finally, a list ranked by number of visits is produced, and the most-visited vertices are regarded as possible root causes.
For example, consider the process of querying abnormal modules downstream from one abnormal node:
Define A as the upstream abnormal module and B, C, and D as downstream abnormal modules, and define the number of errors reported by B, C, and D as their degree of anomaly: the more errors a module reports, the greater its degree of anomaly.
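The walk in S3 can be sketched as below. The source does not state how a walker chooses among several downstream abnormal modules; this sketch assumes the choice is weighted by error count (the "degree of anomaly"), and it treats any module with a positive error count as abnormal. The function name, parameters, and sample data are hypothetical.

```python
import random
from collections import Counter


def random_walk_rank(calls, error_counts, n_walkers=1000, seed=0):
    """Rank modules by how often simulated walkers pass through them."""
    rng = random.Random(seed)
    # Assumption: a module is abnormal when it has reported at least one error.
    abnormal = [m for m, count in error_counts.items() if count > 0]
    visits = Counter()
    if not abnormal:
        return []
    for _ in range(n_walkers):
        node = rng.choice(abnormal)  # start at an arbitrary abnormal module
        seen = {node}
        visits[node] += 1
        while True:
            # Only abnormal, not-yet-visited downstream modules are candidates.
            candidates = [
                m for m in calls.get(node, [])
                if error_counts.get(m, 0) > 0 and m not in seen
            ]
            if not candidates:
                break  # the walker cannot continue downward
            # Assumed rule: prefer modules that report more errors.
            weights = [error_counts[m] for m in candidates]
            node = rng.choices(candidates, weights=weights, k=1)[0]
            seen.add(node)
            visits[node] += 1
    return visits.most_common()


# Hypothetical data: A is upstream; B, C, and D are downstream.
calls = {"A": ["B", "C", "D"], "B": ["D"], "C": [], "D": []}
error_counts = {"A": 5, "B": 20, "C": 5, "D": 50}

ranking = random_walk_rank(calls, error_counts)
print(ranking)
```

With this sample data, D is reached from every start except C and accumulates the most visits, so it heads the ranked list of candidate root causes. The `seen` set is a guard against cycles in the call graph, which the source does not discuss.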
S4. Display the root cause of the fault on an interface.
In this embodiment, the display methods include local root cause display and global root cause display. Specifically:
The operations engineer views root causes through a front-end interface, which distinguishes local root causes from global root causes. Local root cause display requires a specific module name as input and localizes the fault behind the anomaly of that module; that is, the simulated people walk only along the abnormal paths related to that module. Global root cause display localizes the faults currently present in the whole system; that is, the simulated people walk along all abnormal paths.
In other words, when the system fails, the fault is located through steps S1-S3 and the root cause is displayed through step S4, either globally or locally. Global display shows the applications marked as faulty; for all of these applications, the fault source is found by the random walk algorithm and displayed. Local display allows a search for a single application: after an application is selected, its root cause is found by random walk.
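The difference between the local and global modes comes down to which abnormal paths the walkers may use. The sketch below computes that scope only, under the assumption that "paths related to a module" means the abnormal modules reachable downstream from it; the names `abnormal_paths_from` and `walk_scope` and the sample data are hypothetical.

```python
def abnormal_paths_from(calls, abnormal, start):
    """Abnormal modules reachable from `start` through abnormal modules only."""
    if start not in abnormal:
        return set()
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for callee in calls.get(node, []):
            if callee in abnormal and callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return seen


def walk_scope(calls, abnormal, module=None):
    """Local scope when a module name is given, global scope otherwise."""
    starts = [module] if module is not None else sorted(abnormal)
    scope = set()
    for start in starts:
        scope |= abnormal_paths_from(calls, abnormal, start)
    return scope


# Hypothetical system with two independent abnormal chains.
calls = {"A": ["B", "C"], "B": ["D"], "C": [], "D": [], "X": ["Y"], "Y": []}
abnormal = {"A", "B", "D", "X", "Y"}

local_scope = walk_scope(calls, abnormal, module="B")
global_scope = walk_scope(calls, abnormal)
print(sorted(local_scope), sorted(global_scope))
```

In local mode the walkers would be confined to `local_scope`; in global mode they may start from and move through every abnormal module in `global_scope`.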
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some or all of their technical features replaced by equivalents. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the invention, and they should all be covered by the claims and the description of the invention.
Claims (10)
1. A method for diagnosing the root cause of faults in a distributed system, characterized by comprising the following steps:
Constructing a call graph of the distributed system;
Locating the root cause of the fault based on the call graph.
2. The method for diagnosing the root cause of faults in a distributed system according to claim 1, characterized in that constructing the call graph of the distributed system comprises:
Building the call relationships between the modules of the distributed system into a call graph.
3. The method for diagnosing the root cause of faults in a distributed system according to claim 1, characterized in that the call graph is stored in a database.
4. The method for diagnosing the root cause of faults in a distributed system according to claim 3, characterized in that the database is a graph database.
5. The method for diagnosing the root cause of faults in a distributed system according to claim 1, characterized in that locating the root cause of the fault based on the call graph comprises:
Marking the anomalies of the distributed system on the vertices of the call graph to form an anomaly propagation graph.
6. The method for diagnosing the root cause of faults in a distributed system according to claim 3, characterized in that locating the root cause of the fault based on the call graph further comprises:
Searching for the root cause of the fault in the anomaly propagation graph.
7. The method for diagnosing the root cause of faults in a distributed system according to claim 1, characterized in that the method further comprises:
Displaying the root cause of the fault on an interface.
8. The method for diagnosing the root cause of faults in a distributed system according to claim 1, characterized in that the root cause of the fault is located using a random walk algorithm.
9. The method for diagnosing the root cause of faults in a distributed system according to claim 7, characterized in that displaying the root cause of the fault on the interface comprises local root cause display.
10. The method for diagnosing the root cause of faults in a distributed system according to claim 7, characterized in that displaying the root cause of the fault on the interface comprises global root cause display.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910398251.2A CN110134539A (en) | 2019-05-14 | 2019-05-14 | Method for diagnosing the root cause of faults in a distributed system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910398251.2A CN110134539A (en) | 2019-05-14 | 2019-05-14 | Method for diagnosing the root cause of faults in a distributed system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110134539A true CN110134539A (en) | 2019-08-16 |
Family
ID=67573755
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910398251.2A Withdrawn CN110134539A (en) | 2019-05-14 | 2019-05-14 | Method for diagnosing the root cause of faults in a distributed system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110134539A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110609761A (en) * | 2019-09-06 | 2019-12-24 | 北京三快在线科技有限公司 | Method and device for determining fault source, storage medium and electronic equipment |
CN111597070A (en) * | 2020-07-27 | 2020-08-28 | 北京必示科技有限公司 | Fault positioning method and device, electronic equipment and storage medium |
CN111679953A (en) * | 2020-06-09 | 2020-09-18 | 平安科技(深圳)有限公司 | Fault node identification method, device, equipment and medium based on artificial intelligence |
CN113162787A (en) * | 2020-01-23 | 2021-07-23 | 华为技术有限公司 | Method for fault location in a telecommunication network, node classification method and related device |
CN114064344A (en) * | 2022-01-18 | 2022-02-18 | 苏州浪潮智能科技有限公司 | Root cause positioning method, device and medium |
- 2019-05-14: CN application CN201910398251.2A (CN110134539A), status: not active, Withdrawn
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110609761A (en) * | 2019-09-06 | 2019-12-24 | 北京三快在线科技有限公司 | Method and device for determining fault source, storage medium and electronic equipment |
CN110609761B (en) * | 2019-09-06 | 2020-10-16 | 北京三快在线科技有限公司 | Method and device for determining fault source, storage medium and electronic equipment |
CN113162787A (en) * | 2020-01-23 | 2021-07-23 | 华为技术有限公司 | Method for fault location in a telecommunication network, node classification method and related device |
CN113162787B (en) * | 2020-01-23 | 2023-09-29 | 华为技术有限公司 | Method for fault location in a telecommunication network, node classification method and related devices |
CN111679953A (en) * | 2020-06-09 | 2020-09-18 | 平安科技(深圳)有限公司 | Fault node identification method, device, equipment and medium based on artificial intelligence |
WO2021114613A1 (en) * | 2020-06-09 | 2021-06-17 | 平安科技(深圳)有限公司 | Artificial intelligence-based fault node identification method, device, apparatus, and medium |
CN111679953B (en) * | 2020-06-09 | 2022-04-12 | 平安科技(深圳)有限公司 | Fault node identification method, device, equipment and medium based on artificial intelligence |
CN111597070A (en) * | 2020-07-27 | 2020-08-28 | 北京必示科技有限公司 | Fault positioning method and device, electronic equipment and storage medium |
CN114064344A (en) * | 2022-01-18 | 2022-02-18 | 苏州浪潮智能科技有限公司 | Root cause positioning method, device and medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110134539A (en) | Method for diagnosing the root cause of faults in a distributed system | |
Wolf et al. | Mining task-based social networks to explore collaboration in software teams | |
US7194445B2 (en) | Adaptive problem determination and recovery in a computer system | |
US9413597B2 (en) | Method and system for providing aggregated network alarms | |
US20090271351A1 (en) | Rules engine test harness | |
CN108711030A (en) | The end-to-end project management platform integrated with artificial intelligence | |
Fox | The intelligent management system: an overview | |
US11422795B2 (en) | System and method for predicting the impact of source code modification based on historical source code modifications | |
US11853794B2 (en) | Pipeline task verification for a data processing platform | |
Gökalp et al. | A visual programming framework for distributed Internet of Things centric complex event processing | |
JP7442001B1 (en) | Comprehensive failure diagnosis method for hydroelectric power generation units | |
US20220291966A1 (en) | Systems and methods for process mining using unsupervised learning and for automating orchestration of workflows | |
WO2024031191A1 (en) | Systems and methods for project and program management using artificial intelligence | |
US11544055B2 (en) | System and method for identifying source code defect introduction during source code modification | |
US20210142233A1 (en) | Systems and methods for process mining using unsupervised learning | |
Kim et al. | Machine learning frameworks for automated software testing tools: a study | |
US11790249B1 (en) | Automatically evaluating application architecture through architecture-as-code | |
JP6820956B2 (en) | Systems and methods for identifying information relevant to a company | |
US7562061B2 (en) | Context-based failure reporting for a constraint satisfaction problem | |
Pinto et al. | Maturity models for business continuity–A systematic literature review | |
US20220399132A1 (en) | Machine learning models for automated selection of executable sequences | |
Yousef et al. | On the use of predictive analytics techniques for network elements failure prediction in telecom operators | |
Bashir et al. | Smart Cities Paradigm with AI-Enabled Effective Requirements Engineering | |
Fard et al. | Detection of implied scenarios in multiagent systems with clustering agents' communications | |
TWI536289B (en) | System and method for identifying relevant information for an enterprise |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | ||
Application publication date: 20190816 |