CN110134539A - Method for diagnosing the root cause of faults in a distributed system - Google Patents


Info

Publication number
CN110134539A
Authority
CN
China
Legal status
Withdrawn
Application number
CN201910398251.2A
Other languages
Chinese (zh)
Inventor
程名
蒋世勇
金先友
Current Assignee
Jizhi (shanghai) Enterprise Management Consulting Co Ltd
Original Assignee
Jizhi (shanghai) Enterprise Management Consulting Co Ltd
Priority date
2019-05-14
Filing date
2019-05-14
Publication date
2019-08-16
Application filed by Jizhi (shanghai) Enterprise Management Consulting Co Ltd
Priority to CN201910398251.2A
Publication of CN110134539A
Withdrawn (current legal status)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0709 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Test And Diagnosis Of Digital Computers (AREA)

Abstract

The invention discloses a method for diagnosing the root cause of faults in a distributed system, comprising the following steps: constructing a call graph of the distributed system; and, based on the call graph, locating the fault root cause. By automatically constructing an anomaly propagation graph and using a random walk algorithm to simulate the manual process of searching for the root of an anomaly, the invention reduces human error, shortens the time operations and maintenance (O&M) staff spend investigating faults, speeds up fault resolution, and improves the overall quality of online O&M.

Description

Method for diagnosing the root cause of faults in a distributed system
Technical field
The present invention belongs to the field of computer science and software information technology, and in particular relates to a method for diagnosing the root cause of faults in a distributed system.
Background art
A distributed system is a software system that supports distributed processing: it executes tasks on a multiprocessor architecture whose processors are interconnected by a communication network. It includes distributed operating systems, distributed programming languages and their compilation (interpretation) systems, distributed file systems, distributed database systems, and so on. Faults in a distributed system can occur in any of its modules.
When the root cause of a fault in a distributed system is diagnosed manually, the O&M engineer usually works through the system using a mental model of the module call graph. In many cases a fault is first noticed because many requests are failing at the front-end module furthest upstream, say A. The engineer then investigates downstream from A: because A calls module B, the engineer checks B's metrics, and if any metric is abnormal, suspects that B caused the fault. The engineer then checks C, the direct downstream module of B, and so on. Throughout this process the engineer's suspicion is passed downstream along the call relations between modules until it can go no further.
Manual diagnosis requires the O&M engineer to have business domain knowledge in order to recognize service anomalies, and it cannot satisfy O&M scenarios that demand fast fault localization and resolution.
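The manual procedure described above can be sketched as a depth-first traversal that follows suspicion downstream. This is an illustrative sketch only: the function name `manual_trace`, the dictionary representation of the call graph, and the `is_abnormal` predicate (standing in for the engineer checking a module's metrics) are assumptions, not part of the patent.

```python
def manual_trace(call_graph, is_abnormal, start):
    """Follow suspicion downstream from `start`, as an engineer would by hand.

    call_graph maps each module to the set of modules it calls.
    is_abnormal(module) stands in for the engineer checking that module's metrics.
    Returns the modules where suspicion can no longer be passed on: modules
    reached from `start` none of whose callees look abnormal.
    """
    suspects, stack, seen = [], [start], {start}
    while stack:
        module = stack.pop()
        abnormal_callees = [c for c in sorted(call_graph.get(module, ())) if is_abnormal(c)]
        if not abnormal_callees:
            suspects.append(module)  # suspicion stops here
            continue
        for callee in abnormal_callees:
            if callee not in seen:  # never revisit a module, so call cycles cannot loop forever
                seen.add(callee)
                stack.append(callee)
    return sorted(suspects)
```

Every step depends on someone knowing which metrics make a module "abnormal", which is the reliance on domain knowledge the background identifies.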
Summary of the invention
To address these shortcomings of the prior art, the present invention provides a method for diagnosing the root cause of faults in a distributed system that reduces O&M staff's dependence on business domain knowledge and overcomes the difficulty and slowness of locating problems manually.
To solve the above technical problem, the present invention adopts the following technical solution:
A method for diagnosing the root cause of faults in a distributed system, characterized by comprising the following steps:
constructing a call graph of the distributed system;
based on the call graph, locating the fault root cause.
Optionally, constructing the call graph of the distributed system comprises:
building the call relations between the modules of the distributed system into the call graph.
Optionally, the call graph is stored in a dedicated database.
Optionally, the database is a graph database.
Optionally, locating the fault root cause based on the call graph comprises:
labelling the anomalies of the distributed system on the nodes of the call graph to form an anomaly propagation graph.
Optionally, locating the fault root cause based on the call graph further comprises:
searching for the fault root cause in the anomaly propagation graph.
Optionally, the method further comprises:
displaying the fault root cause on an interface.
Optionally, the fault root cause is located using a random walk algorithm.
Optionally, displaying the fault root cause on the interface includes a local fault root cause display.
Optionally, displaying the fault root cause on the interface includes a global fault root cause display.
Optionally, the local fault root cause display specifically comprises: entering the module name of an abnormal module, then querying faults in the downstream modules of that abnormal module and displaying the result.
Optionally, the global fault root cause display specifically comprises: querying faults in the downstream modules of all abnormal modules and displaying the result.
In this way, the user can view the fault root causes of all abnormal modules in the front-end interface.
The beneficial effects of the present invention are as follows: by automatically constructing an anomaly propagation graph and using a random walk algorithm to simulate the manual process of searching for the root of an anomaly, the invention reduces human error, shortens the time O&M staff spend investigating faults, speeds up fault resolution, and improves the overall quality of online O&M.
Brief description of the drawings
To describe the specific embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for describing the specific embodiments or the prior art are briefly introduced below. Throughout the drawings, similar elements or parts are generally identified by similar reference signs. The elements or parts in the drawings are not necessarily drawn to scale.
Fig. 1 is a flowchart of the method for diagnosing the root cause of faults in a distributed system according to one embodiment of the present invention;
Fig. 2 is a flowchart of the method for diagnosing the root cause of faults in a distributed system according to another embodiment of the present invention;
Fig. 3 is an example of a call graph.
Detailed description of the embodiments
Embodiments of the technical solution of the present invention are described in detail below with reference to the drawings. The following embodiments serve only to illustrate the technical solution of the present invention more clearly; they are therefore given only as examples and do not limit the scope of protection of the present invention.
It should be noted that, unless otherwise indicated, the technical or scientific terms used in this application have the ordinary meanings understood by those of ordinary skill in the art to which the present invention belongs.
As shown in Fig. 1, a method for diagnosing the root cause of faults in a distributed system comprises the following steps:
constructing a call graph of the distributed system;
based on the call graph, locating the fault root cause.
As shown in Fig. 2, a method for diagnosing the root cause of faults in a distributed system comprises the following steps:
S1: build the call relations between the modules of the distributed system into a call graph and store it in a dedicated database.
In this embodiment, the database is a graph database. The distributed remote procedure call (RPC) framework is modified so that it uploads call relations to a message queue; by consuming these messages, the call relations between the system's modules are obtained. Taking the system modules as vertices and the call relations as directed edges yields a directed graph, which is stored as graph data. Here, a module corresponds to one system project.
Fig. 3 is an example of a call graph: each point is a module, and a directed arrow between two modules represents the call relation between them. The module at the arrowhead is the callee.
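As a rough illustration of step S1, the sketch below assumes each consumed message carries one (caller, callee) pair and folds the messages into an adjacency map with modules as vertices and calls as directed edges. The `CallRecord` type, its field names and `build_call_graph` are hypothetical; the patent does not specify the message format, the RPC framework or the graph database schema, and writing the result to a graph database is left out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    """One caller -> callee relation, as a hypothetical message-queue payload."""
    caller: str  # upstream module (system project) that issued the call
    callee: str  # downstream module that served the call


def build_call_graph(records):
    """Fold consumed call records into a directed graph: module -> set of callees.

    Duplicate messages collapse because callees are kept in a set.
    """
    graph = defaultdict(set)
    for rec in records:
        graph[rec.caller].add(rec.callee)
        graph.setdefault(rec.callee, set())  # leaf modules still become vertices
    return dict(graph)
```

Feeding it the records A→B, A→C, B→D and a repeated A→B gives four vertices, with A calling {B, C}, B calling {D}, and C and D calling nothing.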
S2: label the anomalies of the distributed system on the nodes of the call graph to form an anomaly propagation graph.
Then, depending on whether each module is raising an alarm, a "Yes/No" attribute is added to the module's vertex, forming an anomaly propagation graph: "Yes" means the module is abnormal and "No" means it is not.
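Step S2 can be sketched on an adjacency-map call graph (module to set of callees). Here `alarming_modules` stands in for whatever alarm source marks a module as abnormal, and `anomaly_propagation_graph` additionally keeps only the abnormal vertices and the edges between them, which is the part of the graph the downstream search walks on. Both function names, and the choice to drop "No" vertices rather than keep them with a label, are assumptions for illustration.

```python
def label_anomalies(call_graph, alarming_modules):
    """Attach the Yes/No 'abnormal' attribute to every vertex of the call graph."""
    alarming = set(alarming_modules)
    return {module: ("Yes" if module in alarming else "No") for module in call_graph}


def anomaly_propagation_graph(call_graph, labels):
    """Keep abnormal vertices and only the call edges between two abnormal modules."""
    return {
        module: {c for c in callees if labels.get(c) == "Yes"}
        for module, callees in call_graph.items()
        if labels.get(module) == "Yes"
    }
```

Modules that raise alarms but never appear in the call graph are simply ignored by this sketch.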
S3: trace the fault root cause in the anomaly propagation graph.
In this embodiment, the algorithm used to trace the fault root cause in the anomaly propagation graph is a random walk algorithm, which simulates the manual process of searching for the root cause. Specifically, N people are simulated. Each person starts on an arbitrary abnormal module vertex of the anomaly propagation graph and keeps moving downstream to abnormal modules until they can go no further. The vertices passed are counted: each time a simulated person passes a vertex, its count is incremented by one. Finally, a list ranked by the number of passes is produced, and the most frequently passed vertices are considered possible fault root causes.
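A minimal sketch of the random walk in step S3, assuming the anomaly propagation graph is given as a map from each abnormal module to the abnormal modules it calls. Each walker starts on a uniformly chosen abnormal vertex, steps to a uniformly chosen abnormal callee until none remains, and every vertex passed (including the start) is counted. The uniform choices, the `max_steps` cap (added so walkers cannot loop forever if the call graph has cycles) and the function name `random_walk_root_causes` are assumptions; the patent does not state them.

```python
import random
from collections import Counter


def random_walk_root_causes(apg, num_walkers=1000, max_steps=50, seed=None):
    """Rank candidate fault root causes by simulated walkers.

    apg maps each abnormal module to the set of abnormal modules it calls.
    Returns (module, visit_count) pairs, most visited first.
    """
    starts = sorted(apg)  # any abnormal module can be a starting vertex
    if not starts:
        return []
    rng = random.Random(seed)
    visits = Counter()
    for _ in range(num_walkers):
        node = rng.choice(starts)
        visits[node] += 1  # the starting vertex counts as passed
        for _ in range(max_steps):  # cap the walk in case the graph has cycles
            downstream = sorted(apg.get(node, ()))
            if not downstream:
                break  # nowhere further down to go: this walker stops
            node = rng.choice(downstream)
            visits[node] += 1
    return visits.most_common()
```

Candidates are sorted before each random choice so that runs with the same `seed` are reproducible, since set iteration order for strings can differ between Python processes. When every abnormal path ends at the same module, every walker passes that module exactly once, so it tops the ranking.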
For example, the process of searching downstream for abnormal modules, starting from an abnormal node, is as follows:
Let A be an upstream abnormal module and B, C and D its downstream abnormal modules. The error counts of B, C and D are defined as their anomaly intensities: the more errors a module reports, the greater its anomaly intensity.
S4: display the fault root cause on an interface.
In this embodiment, the display methods include a local fault root cause display and a global fault root cause display. Specifically:
The O&M engineer views fault root causes through the front-end interface, which offers a local view and a global view. For the local fault root cause, the engineer enters a specific module name and the anomaly of that module is localized; that is, the simulated people walk only on the abnormal paths related to that module. The global fault root cause localizes the current overall fault of the system; that is, the simulated people walk on all abnormal paths.
In other words, when the system fails, the fault is located through steps S1 to S3 and the fault root cause is displayed through step S4. Displaying the fault root cause is divided into a global display and a local display. The global display covers every application labelled as faulty: the fault sources of all labelled applications are searched with the random walk algorithm and displayed. The local display lets the user search for a particular application; once that application is selected, its fault root cause is searched by random walk.
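The local and global views differ only in where the walkers start. The sketch below assumes the anomaly propagation graph is a map from each abnormal module to the abnormal modules it calls, and uses uniform random steps; `diagnose` and its parameters are illustrative names, not an interface defined by the patent. Passing a module name makes every walker start there, so only that module's abnormal downstream paths are explored; passing no module lets walkers start on any abnormal module.

```python
import random
from collections import Counter


def diagnose(apg, module=None, num_walkers=500, max_steps=50, seed=None):
    """Global view when `module` is None; local view for one named module otherwise.

    Returns (module, visit_count) pairs, most visited first.
    """
    if module is None:
        starts = sorted(apg)  # global: walkers may start on any abnormal module
    else:
        starts = [module] if module in apg else []  # local: only the chosen module
    if not starts:
        return []  # nothing abnormal to trace
    rng = random.Random(seed)
    visits = Counter()
    for _ in range(num_walkers):
        node = rng.choice(starts)
        visits[node] += 1
        for _ in range(max_steps):  # cap the walk in case the graph has cycles
            downstream = sorted(apg.get(node, ()))
            if not downstream:
                break
            node = rng.choice(downstream)
            visits[node] += 1
    return visits.most_common()
```

For example, with two independent abnormal chains A→B→C and X→Y, the local view for X only ever reports X and Y, while the global view ranks modules from both chains together.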
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some or all of the technical features; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention, and shall all fall within the scope of the claims and the description of the present invention.

Claims (10)

1. A method for diagnosing the root cause of faults in a distributed system, characterized by comprising the following steps:
constructing a call graph of the distributed system;
based on the call graph, locating the fault root cause.
2. The method for diagnosing the root cause of faults in a distributed system according to claim 1, characterized in that constructing the call graph of the distributed system comprises:
building the call relations between the modules of the distributed system into the call graph.
3. The method for diagnosing the root cause of faults in a distributed system according to claim 1, characterized in that the call graph is stored in a dedicated database.
4. The method for diagnosing the root cause of faults in a distributed system according to claim 3, characterized in that the database is a graph database.
5. The method for diagnosing the root cause of faults in a distributed system according to claim 1, characterized in that locating the fault root cause based on the call graph comprises:
labelling the anomalies of the distributed system on the nodes of the call graph to form an anomaly propagation graph.
6. The method for diagnosing the root cause of faults in a distributed system according to claim 3, characterized in that locating the fault root cause based on the call graph further comprises:
searching for the fault root cause in the anomaly propagation graph.
7. The method for diagnosing the root cause of faults in a distributed system according to claim 1, characterized in that the method further comprises:
displaying the fault root cause on an interface.
8. The method for diagnosing the root cause of faults in a distributed system according to claim 1, characterized in that the fault root cause is located using a random walk algorithm.
9. The method for diagnosing the root cause of faults in a distributed system according to claim 7, characterized in that displaying the fault root cause on the interface includes a local fault root cause display.
10. The method for diagnosing the root cause of faults in a distributed system according to claim 7, characterized in that displaying the fault root cause on the interface includes a global fault root cause display.
CN201910398251.2A 2019-05-14 2019-05-14 A kind of diagnostic method of Faults in Distributed Systems root Withdrawn CN110134539A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910398251.2A CN110134539A (en) 2019-05-14 2019-05-14 A kind of diagnostic method of Faults in Distributed Systems root

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910398251.2A CN110134539A (en) 2019-05-14 2019-05-14 A kind of diagnostic method of Faults in Distributed Systems root

Publications (1)

Publication Number Publication Date
CN110134539A true CN110134539A (en) 2019-08-16

Family

ID=67573755

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910398251.2A Withdrawn CN110134539A (en) 2019-05-14 2019-05-14 A kind of diagnostic method of Faults in Distributed Systems root

Country Status (1)

Country Link
CN (1) CN110134539A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110609761A (en) * 2019-09-06 2019-12-24 北京三快在线科技有限公司 Method and device for determining fault source, storage medium and electronic equipment
CN111597070A (en) * 2020-07-27 2020-08-28 北京必示科技有限公司 Fault positioning method and device, electronic equipment and storage medium
CN111679953A (en) * 2020-06-09 2020-09-18 平安科技(深圳)有限公司 Fault node identification method, device, equipment and medium based on artificial intelligence
CN113162787A (en) * 2020-01-23 2021-07-23 华为技术有限公司 Method for fault location in a telecommunication network, node classification method and related device
CN114064344A (en) * 2022-01-18 2022-02-18 苏州浪潮智能科技有限公司 Root cause positioning method, device and medium

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110609761A (en) * 2019-09-06 2019-12-24 北京三快在线科技有限公司 Method and device for determining fault source, storage medium and electronic equipment
CN110609761B (en) * 2019-09-06 2020-10-16 北京三快在线科技有限公司 Method and device for determining fault source, storage medium and electronic equipment
CN113162787A (en) * 2020-01-23 2021-07-23 华为技术有限公司 Method for fault location in a telecommunication network, node classification method and related device
CN113162787B (en) * 2020-01-23 2023-09-29 华为技术有限公司 Method for fault location in a telecommunication network, node classification method and related devices
CN111679953A (en) * 2020-06-09 2020-09-18 平安科技(深圳)有限公司 Fault node identification method, device, equipment and medium based on artificial intelligence
WO2021114613A1 (en) * 2020-06-09 2021-06-17 平安科技(深圳)有限公司 Artificial intelligence-based fault node identification method, device, apparatus, and medium
CN111679953B (en) * 2020-06-09 2022-04-12 平安科技(深圳)有限公司 Fault node identification method, device, equipment and medium based on artificial intelligence
CN111597070A (en) * 2020-07-27 2020-08-28 北京必示科技有限公司 Fault positioning method and device, electronic equipment and storage medium
CN114064344A (en) * 2022-01-18 2022-02-18 苏州浪潮智能科技有限公司 Root cause positioning method, device and medium

Similar Documents

Publication Publication Date Title
CN110134539A (en) A kind of diagnostic method of Faults in Distributed Systems root
Wolf et al. Mining task-based social networks to explore collaboration in software teams
US7194445B2 (en) Adaptive problem determination and recovery in a computer system
US9413597B2 (en) Method and system for providing aggregated network alarms
US20090271351A1 (en) Rules engine test harness
CN108711030A (en) The end-to-end project management platform integrated with artificial intelligence
Fox The intelligent management system: an overview
US11422795B2 (en) System and method for predicting the impact of source code modification based on historical source code modifications
US11853794B2 (en) Pipeline task verification for a data processing platform
Gökalp et al. A visual programming framework for distributed Internet of Things centric complex event processing
JP7442001B1 (en) Comprehensive failure diagnosis method for hydroelectric power generation units
US20220291966A1 (en) Systems and methods for process mining using unsupervised learning and for automating orchestration of workflows
WO2024031191A1 (en) Systems and methods for project and program management using artificial intelligence
US11544055B2 (en) System and method for identifying source code defect introduction during source code modification
US20210142233A1 (en) Systems and methods for process mining using unsupervised learning
Kim et al. Machine learning frameworks for automated software testing tools: a study
US11790249B1 (en) Automatically evaluating application architecture through architecture-as-code
JP6820956B2 (en) Systems and methods for identifying information relevant to a company
US7562061B2 (en) Context-based failure reporting for a constraint satisfaction problem
Pinto et al. Maturity models for business continuity–A systematic literature review
US20220399132A1 (en) Machine learning models for automated selection of executable sequences
Yousef et al. On the use of predictive analytics techniques for network elements failure prediction in telecom operators
Bashir et al. Smart Cities Paradigm with AI-Enabled Effective Requirements Engineering
Fard et al. Detection of implied scenarios in multiagent systems with clustering agents' communications
TWI536289B (en) System and method for identifying relevant information for an enterprise

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20190816