CN116155688A - Link fault detection method, device, equipment and medium - Google Patents


Publication number
CN116155688A
Authority
CN
China
Prior art keywords
analysis
link
type
fault
micro
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211500900.3A
Other languages
Chinese (zh)
Inventor
朱珠
凌志钧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Volcano Engine Technology Co Ltd
Original Assignee
Beijing Volcano Engine Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06: Management of faults, events, alarms or notifications
    • H04L41/0631: Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L41/065: Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis involving logical or physical relationship, e.g. grouping and hierarchies
    • H04L43/00: Arrangements for monitoring or testing data switching networks
    • H04L43/08: Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805: Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability

Abstract

The present disclosure provides a link fault detection method, apparatus, device, and medium. The method includes: in response to a link detection request for a target fault exposure point, acquiring an analysis object corresponding to the target fault exposure point and an analysis type of the analysis object; screening a link set to be analyzed from a plurality of links based on the analysis object and the analysis type, where a link represents the calling relationships among micro-services generated while the micro-services process a task; and performing link fault detection on the link set based on the analysis object and the analysis type to obtain a link fault detection result of the target fault exposure point. In the embodiments of the disclosure, links are examined using the analysis object and the analysis type, so fault exposure points no longer need to be investigated manually one by one, which saves fault detection time and improves detection efficiency.

Description

Link fault detection method, device, equipment and medium
Technical Field
The disclosure relates to the technical field of micro services, and in particular relates to a link failure detection method, a device, electronic equipment and a storage medium.
Background
The micro-service architecture is an emerging software architecture in which the functions of a complex service are implemented by deploying multiple micro-services. Because the micro-services depend on and influence one another, a failure in any one of them can cause the micro-services associated with it to behave abnormally, so both the source of the failure and the other affected micro-services need to be detected. In the related art, when a micro-service fails, the fault information of each micro-service is usually checked by manual analysis. This consumes considerable manual effort and takes a long time, so troubleshooting efficiency is low.
Disclosure of Invention
Embodiments of the disclosure provide at least a link fault detection method, apparatus, electronic device and storage medium that automatically perform fault detection on multiple links based on an analysis object and the analysis type of that object, which helps improve the efficiency of link fault detection.
The embodiment of the disclosure provides a link failure detection method, which comprises the following steps:
in response to a link detection request for a target fault exposure point, acquiring an analysis object corresponding to the target fault exposure point and an analysis type of the analysis object, where the analysis object represents the target micro-service corresponding to the target fault exposure point, and the analysis type represents the call type of the target micro-service;
screening a link set to be analyzed from a plurality of links based on the analysis object and the analysis type, where a link represents the calling relationships among micro-services generated while the micro-services process a task; and
performing link fault detection on the link set based on the analysis object and the analysis type to obtain a link fault detection result of the target fault exposure point.
In the embodiment of the disclosure, based on the analysis object and the analysis type, the link failure detection is performed on the link set automatically, so that compared with the method for manually checking the failure information in the related technology, the detection time can be saved, and the detection efficiency and the detection precision are further improved. Furthermore, based on the analysis object and the analysis type, the link set to be analyzed is screened from the multiple links, so that the number of links to be detected can be reduced in the subsequent detection step, that is, the detection range is reduced, and the detection efficiency is improved.
In an optional implementation manner, the performing, based on the analysis object and the analysis type, link failure detection on the link set to obtain a link failure detection result of the target failure exposure point includes:
Determining an analysis starting point of the link set based on the analysis object and the analysis type; the analysis starting point is used for determining a sub-link representing the minimum fault propagation range from the link set;
deleting branches, which are irrelevant to the analysis starting point, in the link set according to the analysis starting point to obtain a sub-link of the minimum fault propagation range associated with the analysis starting point;
and carrying out link fault detection on the sub-link of the minimum fault propagation range associated with the analysis starting point to obtain a link fault detection result of the target fault exposure point.
In the embodiment of the disclosure, according to the analysis starting point, the sub-link of the minimum fault propagation range associated with the analysis starting point is determined from the link set, that is, the branch of each link in the link set is pruned, so that in the subsequent step of detecting the link fault, the detection of the branch unrelated to the analysis starting point can be avoided, thereby being beneficial to improving the efficiency of fault detection.
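The pruning step can be pictured with a small sketch. The text here does not spell out how a branch is judged "unrelated", so the sketch below assumes, purely for illustration, that a link is stored as a mapping from each caller to the services it calls and that only calls reachable downstream of the analysis starting point are kept. The function name `prune_to_sub_link` and the data layout are hypothetical.

```python
from collections import deque

def prune_to_sub_link(edges, start):
    """Keep only the calls reachable downstream of `start`.

    `edges` maps a caller to the list of services it calls
    (a hypothetical layout, not one defined by the disclosure).
    """
    kept = {}
    seen = {start}
    queue = deque([start])
    while queue:
        caller = queue.popleft()
        callees = edges.get(caller, [])
        if callees:
            kept[caller] = list(callees)
        for callee in callees:
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return kept

# One link: A calls B and E, B calls C and D, E calls F.
link = {"A": ["B", "E"], "B": ["C", "D"], "E": ["F"]}
# With B as the analysis starting point, the A->E->F branch is dropped.
sub_link = prune_to_sub_link(link, "B")
```

A different notion of relatedness (for example, also keeping the upstream callers of the starting point) would only change which edges are traversed.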
In an optional implementation manner, the performing link failure detection on the sub-link in the minimum fault propagation range associated with the analysis starting point in the link set to obtain a link failure detection result of the target fault exposure point includes:
Generating a directed link graph centered on the analysis starting point based on a sub-link of the minimum fault propagation range associated with the analysis starting point in the link set, wherein each node in the directed link graph corresponds to one micro service;
and carrying out fault analysis on each node in the directed link graph to obtain a link fault detection result of the target fault exposure point.
In the embodiment of the disclosure, based on the sub-link associated with the analysis starting point, a directed link graph centering on the analysis starting point is generated, and each node in the directed link graph is analyzed, so that the source of the fault and the influence surface of the fault can be intuitively displayed through the directed link graph, and the use experience of a user is improved.
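As a rough illustration of how sub-links might be merged into one directed link graph, the sketch below assumes each sub-link is a list of (caller, callee) pairs and counts how often each call edge occurs. The representation and the name `build_directed_graph` are assumptions made for this example only.

```python
from collections import Counter

def build_directed_graph(sub_links):
    """Merge sub-links (lists of (caller, callee) pairs) into one graph.

    Returns the node set (one node per micro-service) and a counter of
    how many sub-links-worth of calls each directed edge carries.
    """
    nodes = set()
    edge_counts = Counter()
    for calls in sub_links:
        for caller, callee in calls:
            nodes.update((caller, callee))
            edge_counts[(caller, callee)] += 1
    return nodes, edge_counts

# Two sub-links that both start at the analysis starting point S2.
sub_links = [
    [("S2", "S3"), ("S3", "S4")],
    [("S2", "S3"), ("S2", "S5")],
]
nodes, edge_counts = build_directed_graph(sub_links)
```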
In an alternative embodiment, the fault detection result includes at least one of an error number, an error rate and a contribution rate of each node. The error number is the number of sub-links in which the node has an error; the error rate is the ratio of the node's error number to the total number of sub-links to which the node belongs; and the contribution rate is the ratio of the node's error number to the total number of sub-links to which the analysis starting point belongs.
In the embodiment of the disclosure, by determining at least one of the error number, the error rate and the contribution rate of each node, fault detection on each node from multiple detection angles is realized, which is beneficial to improving the accuracy of fault detection.
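The three indicators can be made concrete with a short worked sketch. It assumes each sub-link is represented as a mapping from node name to a flag saying whether that node had an error in that sub-link; this layout and the function name `node_metrics` are hypothetical.

```python
def node_metrics(sub_links, start):
    """Return {node: (error_number, error_rate, contribution_rate)}.

    error_number      = sub-links in which the node has an error
    error_rate        = error_number / sub-links containing the node
    contribution_rate = error_number / sub-links containing `start`
    """
    total_with_start = sum(1 for link in sub_links if start in link)
    nodes = {node for link in sub_links for node in link}
    metrics = {}
    for node in sorted(nodes):
        containing = [link for link in sub_links if node in link]
        errors = sum(1 for link in containing if link[node])
        error_rate = errors / len(containing)
        contribution = errors / total_with_start if total_with_start else 0.0
        metrics[node] = (errors, error_rate, contribution)
    return metrics

# Four sub-links; True means the node had an error in that sub-link.
sub_links = [
    {"S2": True, "S3": True},
    {"S2": True, "S3": False, "S4": True},
    {"S2": False, "S3": False},
    {"S2": True, "S4": True},
]
metrics = node_metrics(sub_links, "S2")
```

In this made-up data, S4 appears in only two sub-links but fails in both, so its error rate is 1.0 while its contribution rate relative to the four sub-links of the starting point S2 is 0.5.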
In an alternative embodiment, the analysis type includes a self-call type; the determining an analysis starting point of the link set based on the analysis object and the analysis type includes:
determining the analysis object as the analysis starting point in the case that the analysis type is the self-call type.
In the embodiment of the disclosure, an analysis type of self-call indicates that the analysis object itself is faulty, so the analysis object can be determined as the analysis starting point, which improves the accuracy of the analysis starting point.
In an alternative embodiment, the link includes call information between micro services, and the analysis type includes an active call type; the determining an analysis start point of the link set based on the analysis object and the analysis type includes:
determining active call information from the link set according to the analysis object and the active call type under the condition that the analysis type is the active call type;
And taking the calling micro-service corresponding to the active calling information as an analysis starting point of the link set.
In the embodiment of the disclosure, under the condition that the analysis type is an active call type, the active call information is determined according to the analysis object and the analysis type, and the call micro-service corresponding to the active call information is determined as the analysis starting point, so that the accuracy of the analysis starting point is improved.
In an alternative embodiment, the link includes call information between micro services, and the analysis type includes a passive call type; the determining an analysis start point of the link set based on the analysis object and the analysis type includes:
under the condition that the analysis type is a passive call type, determining passive call information from the link set according to the analysis object and the passive call type;
and taking the called micro-service corresponding to the passive call information as an analysis starting point of the link set.
In the embodiment of the disclosure, under the condition that the analysis type is a passive call type, the passive call information is determined according to the analysis object and the analysis type, and the called microservice corresponding to the passive call information is determined as the analysis starting point, so that the accuracy of the analysis starting point is improved.
The embodiment of the disclosure also provides a link failure detection device, which comprises:
the acquisition module is used for responding to a link detection request aiming at a target fault exposure point and acquiring an analysis object corresponding to the target fault exposure point and an analysis type of the analysis object; the analysis object is used for representing a target micro-service corresponding to the target fault exposure point, and the analysis type is used for representing the calling type of the target micro-service;
the screening module is used for screening a link set to be analyzed from a plurality of links based on the analysis object and the analysis type; the link is used for representing the calling relation among the micro services generated during the micro service processing task;
and the detection module is used for carrying out link fault detection on the link set based on the analysis object and the analysis type to obtain a link fault detection result of the target fault exposure point.
In an alternative embodiment, the detection module is specifically configured to:
determining an analysis starting point of the link set based on the analysis object and the analysis type; the analysis starting point is used for determining a sub-link representing the minimum fault propagation range from the link set;
deleting branches, which are irrelevant to the analysis starting point, in the link set according to the analysis starting point to obtain a sub-link of the minimum fault propagation range associated with the analysis starting point;
and carrying out link fault detection on the sub-link of the minimum fault propagation range associated with the analysis starting point to obtain a link fault detection result of the target fault exposure point.
In an alternative embodiment, the detection module is specifically configured to:
generating a directed link graph centered on the analysis starting point based on a sub-link of the minimum fault propagation range associated with the analysis starting point in the link set, wherein each node in the directed link graph corresponds to one micro service;
and carrying out fault analysis on each node in the directed link graph to obtain a link fault detection result of the target fault exposure point.
In an alternative embodiment, the fault detection result includes at least one of an error number, an error rate and a contribution rate of each node. The error number is the number of sub-links in which the node has an error; the error rate is the ratio of the node's error number to the total number of sub-links to which the node belongs; and the contribution rate is the ratio of the node's error number to the total number of sub-links to which the analysis starting point belongs.
In an alternative embodiment, the analysis type includes a self-call type; the detection module is specifically used for:
determining the analysis object as the analysis starting point in the case that the analysis type is the self-call type.
In an alternative embodiment, the link includes call information between micro services, and the analysis type includes an active call type; the detection module is specifically used for:
determining active call information from the link set according to the analysis object and the active call type under the condition that the analysis type is the active call type;
and taking the calling micro-service corresponding to the active calling information as an analysis starting point of the link set.
In an alternative embodiment, the link includes call information between micro services, and the analysis type includes a passive call type; the detection module is specifically used for:
under the condition that the analysis type is a passive call type, determining passive call information from the link set according to the analysis object and the passive call type;
and taking the called micro-service corresponding to the passive call information as an analysis starting point of the link set.
The embodiment of the disclosure also provides an electronic device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory in communication over the bus when the electronic device is running, the machine-readable instructions when executed by the processor performing the steps of the link failure detection method described in any one of the possible embodiments above.
The disclosed embodiments also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the link failure detection method described in any one of the possible implementations above.
The foregoing objects, features and advantages of the disclosure will be more readily apparent from the following detailed description of the preferred embodiments taken in conjunction with the accompanying drawings.
Drawings
To illustrate the technical solutions of the embodiments of the present disclosure more clearly, the drawings required by the embodiments are briefly described below. The drawings are incorporated in and constitute a part of the specification; they show embodiments consistent with the present disclosure and, together with the description, serve to explain its technical solutions. It should be understood that the following drawings show only some embodiments of the present disclosure and should therefore not be regarded as limiting its scope; a person of ordinary skill in the art may derive other related drawings from them without inventive effort.
FIG. 1 shows a flowchart of a link failure detection method provided by an embodiment of the present disclosure;
FIG. 2 illustrates a schematic diagram of a link provided by an embodiment of the present disclosure;
FIG. 3 illustrates a flow chart of a method of failure detection of a link set provided by an embodiment of the present disclosure;
FIG. 4 illustrates a schematic diagram of a determination of an analysis start point under an active call type provided by an embodiment of the present disclosure;
FIG. 5 illustrates a schematic diagram of determining an analysis start point under a passive call type provided by an embodiment of the present disclosure;
FIG. 6 is a flow chart illustrating a method for detecting a failure of a sub-link in a link set according to an embodiment of the present disclosure;
FIG. 7 illustrates a schematic diagram of a directed link graph provided by an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of a link failure detection apparatus according to an embodiment of the present disclosure;
FIG. 9 shows a schematic diagram of an electronic device provided by an embodiment of the disclosure.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are only some embodiments of the present disclosure, but not all embodiments. The components of the embodiments of the present disclosure, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure provided in the accompanying drawings is not intended to limit the scope of the disclosure, as claimed, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be made by those skilled in the art based on the embodiments of this disclosure without making any inventive effort, are intended to be within the scope of this disclosure.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
The term "and/or" herein merely describes an association relationship and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, both A and B exist, or B exists alone. In addition, the term "at least one" herein means any one of a plurality of items or any combination of at least two of them; for example, including at least one of A, B and C may mean including any one or more elements selected from the set consisting of A, B and C.
The micro-service architecture is an emerging software architecture built on small functional blocks, each with a single responsibility and function. A monolithic application or service is split into multiple micro-services, and the corresponding functions are realized by calling these micro-services.
Because the micro-services depend on and influence one another, when any micro-service fails it is usually necessary to detect both the source of the failure and its impact on other micro-services. For example, a failure observed in one micro-service may have propagated from another micro-service several hops away, and the abnormal micro-service may in turn affect other micro-services.
In the related art, when a fault occurs, the log information of each micro-service is usually checked by manual analysis. For example, each fault exposure point is investigated manually, and, depending on what is found, the users responsible for the upstream and downstream micro-services are pulled in to assist with the investigation.
Based on the above research, the disclosure provides a link fault detection method, apparatus, device and storage medium. In response to a link detection request for a target fault exposure point, an analysis object corresponding to the target fault exposure point and an analysis type of the analysis object are acquired, where the analysis object represents the target micro-service corresponding to the target fault exposure point and the analysis type represents the call type of the target micro-service. A link set to be analyzed is then screened from a plurality of links based on the analysis object and the analysis type, where a link represents the calling relationships among micro-services generated while the micro-services process a task. Finally, link fault detection is performed on the link set based on the analysis object and the analysis type to obtain a link fault detection result of the target fault exposure point.
In the embodiment of the disclosure, based on the analysis object and the analysis type, the link failure detection is automatically performed on the link set, so that compared with the method for manually checking the failure information in the related art, the method can save the failure detection time, and further improve the failure detection efficiency and the failure detection precision.
Furthermore, in this embodiment, the link set to be analyzed is screened from the multiple links based on the analysis object and the analysis type, so that the number of links to be detected in the subsequent detection step can be reduced, that is, the detection range is reduced, and the detection efficiency is improved.
To facilitate understanding of this embodiment, the link fault detection method disclosed in the embodiments of the present disclosure is first described in detail. The method is generally executed by an electronic device with a certain computing capability, for example a server or another processing device. The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud storage, big data and artificial intelligence platforms. The other processing device may be any device that includes a processor and a memory, which is not limited here. In some possible implementations, the link fault detection method may be implemented by a processor invoking computer-readable instructions stored in a memory.
The link fault detection method is described below taking a server as the execution subject.
Referring to fig. 1, a flowchart of a link failure detection method according to an embodiment of the present disclosure is shown, where the method includes steps S101 to S103, where:
S101, in response to a link detection request for a target fault exposure point, acquiring an analysis object corresponding to the target fault exposure point and an analysis type of the analysis object; the analysis object represents the target micro-service corresponding to the target fault exposure point, and the analysis type represents the call type of the target micro-service.
A link (trace), also called a call chain, is the record that tracks the complete call path of one task processing request. Specifically, the first micro-service in the micro-service architecture to execute the task processing request generates a globally unique trace identifier that identifies the request. No matter how many micro-services the request passes through, the trace identifier remains unchanged and is passed along with each layer of micro-service calls, so that the entire path of the request through the micro-service architecture can finally be stitched together by the trace identifier. For example, if, while a request is executed, micro-service A calls micro-service B and micro-service B calls micro-service C and micro-service D, then the calls from A to B and from B to C and D form one call chain. That is, one task processing request corresponds to one link, and each link includes the micro-services involved and the call information between them.
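The propagation of the trace identifier in the A, B, C, D example can be sketched as follows. This is a minimal simulation under assumed names (`handle`, `call_graph`); real tracing systems carry the identifier in request metadata rather than in function arguments.

```python
import uuid

def handle(service, call_graph, trace_id=None, calls=None):
    """Simulate one service handling a request and calling its downstreams.

    The first service creates the trace identifier; every downstream call
    receives the same identifier unchanged.
    """
    if trace_id is None:
        trace_id = uuid.uuid4().hex  # globally unique id for this request
    if calls is None:
        calls = []
    for downstream in call_graph.get(service, []):
        calls.append((trace_id, service, downstream))
        handle(downstream, call_graph, trace_id, calls)
    return trace_id, calls

# A calls B; B calls C and D.
call_graph = {"A": ["B"], "B": ["C", "D"]}
trace_id, calls = handle("A", call_graph)
# Every recorded call carries the same trace_id, so the calls can be
# stitched back into one link.
```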
For example, refer to fig. 2, which is a schematic diagram of a link provided by an embodiment of the present disclosure. As shown in fig. 2, the link includes micro-service S1, micro-service S2 and micro-service S3, where M1 is an interface of S1, M2 is an interface of S2, and M3 is an interface of S3. Call information is generated when S1 calls interface M2 of S2 through interface M1, and again when S2 calls interface M3 of S3 through interface M2; InnerFunc in S2 refers to an internal function of S2. During micro-service calls, two kinds of span arise, the Server Span and the Client Span: for example, the span S2 records when it is called is a Server Span, and the span S2 records when it calls S3 is a Client Span (i.e., call information). Each span records information such as which upstream micro-service called S2, which downstream micro-service S2 called, and the response time S2 took to respond to the request.
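One possible way to represent the span information described for S2 is sketched below. It assumes the usual tracing convention in which a server span is recorded on the called side and a client span on the calling side; the `Span` class and all of its field names are illustrative and are not taken from the disclosure.

```python
from dataclasses import dataclass

@dataclass
class Span:
    trace_id: str
    service: str
    method: str
    kind: str           # "server" or "client" (assumed labels)
    peer: str           # upstream caller for a server span,
                        # downstream callee for a client span
    duration_ms: float  # response time recorded for this span
    error: bool = False

# Spans recorded by S2 within one link: S1 calls S2, then S2 calls S3.
spans_of_s2 = [
    Span("t1", "S2", "M2", "server", peer="S1", duration_ms=12.5),
    Span("t1", "S2", "M2", "client", peer="S3", duration_ms=8.0, error=True),
]
upstream = [s.peer for s in spans_of_s2 if s.kind == "server"]
downstream = [s.peer for s in spans_of_s2 if s.kind == "client"]
```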
The target fault exposure point is a node in a link at which a fault has been exposed. The link detection request may be a detection request generated by a link detection operation that the user inputs for the target fault exposure point.
The analysis object represents the target micro-service corresponding to the target fault exposure point, and the analysis type represents the call type of the target micro-service, which may be an active call type, a passive call type or a self-call type.
Specifically, based on the target fault exposure point, the user may input the corresponding analysis object and the analysis type of that object. For example, the user may enter a target micro-service (for example, micro-service S2) and the corresponding call type (for example, the active call type) in a preset fault detection interface. A link detection request for the target fault exposure point is then generated and sent to the server, and the server, in response to the request, acquires the analysis object corresponding to the target fault exposure point and its analysis type. That is, in this embodiment the analysis object and the analysis type are determined by the user. In other embodiments, they may instead be determined by the server according to the condition of the fault exposure point: the server may analyze each fault exposure point and determine the analysis object and the analysis type from the analysis result, for example by determining from the interface call information of each micro-service whether the interface itself failed or the failure occurred during an interface call.
S102, screening a link set to be analyzed from a plurality of links based on the analysis object and the analysis type; the link is used for representing the calling relation among the micro services generated during the micro service processing task.
It will be appreciated that each link is generated as the interfaces of the micro-services call one another in response to a task processing request, and that each link includes the micro-services required to execute the task and the call information between them.
Based on this, since each request corresponds to one link and the analysis object is the target micro-service corresponding to the target fault exposure point, links can be filtered by the analysis object; for example, the analysis object input by the user may be in the format <psm, method>, where psm is the target micro-service and method is an interface of the target micro-service. The analysis type input by the user then serves as a further screening condition, so that the link set to be analyzed that meets the screening conditions can be screened from the multiple links.
Taking the target micro service S2 as an example, the link set is acquired for the above three analysis types.
(1) If the analysis object is the target micro-service S2 and the analysis type is the self-tuning type, the call information corresponding to the interface of the target micro-service S2 itself is found, and the corresponding link set to be analyzed is obtained.
(2) If the analysis object is the target micro-service S2 and the analysis type is the active call type, the call information generated when the interface of the target micro-service S2 calls a downstream micro-service is found, and the corresponding link set to be analyzed is obtained.
(3) If the analysis object is the target micro-service S2 and the analysis type is the passive call type, the call information recorded by the upstream micro-service that calls the interface of the target micro-service S2 is found, and the corresponding link set to be analyzed is obtained.
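The three screening cases above can be illustrated with a minimal sketch. The record layout (`CallRecord`, including the `recorded_by` field used to tell a record logged by the target from one logged by its upstream caller) and the function names are assumptions made for illustration; the disclosure does not specify how call information is stored.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class AnalysisType(Enum):
    SELF = "self"        # the interface of the target micro-service itself
    ACTIVE = "active"    # the target calls a downstream micro-service
    PASSIVE = "passive"  # the target is called by an upstream micro-service


@dataclass(frozen=True)
class CallRecord:
    """One piece of call information inside a link (hypothetical layout)."""
    recorded_by: str    # micro-service that logged the record
    caller: str
    caller_method: str  # interface of the caller in which the call occurs
    callee: str
    callee_method: str  # interface invoked on the callee
    failed: bool = False


Link = List[CallRecord]


def matches(rec: CallRecord, psm: str, method: str, analysis_type: AnalysisType) -> bool:
    """Check one record against the analysis object <psm, method> and the analysis type."""
    if analysis_type is AnalysisType.SELF:
        # Record logged by the target about its own interface.
        return rec.recorded_by == psm and rec.callee == psm and rec.callee_method == method
    if analysis_type is AnalysisType.ACTIVE:
        # Record logged by the target while its interface calls downstream.
        return rec.recorded_by == psm and rec.caller == psm and rec.caller_method == method
    # PASSIVE: record logged by an upstream micro-service that calls the target.
    return rec.recorded_by != psm and rec.callee == psm and rec.callee_method == method


def screen_links(links: List[Link], psm: str, method: str,
                 analysis_type: AnalysisType) -> List[Link]:
    """Keep every link that contains at least one matching call record."""
    return [link for link in links
            if any(matches(rec, psm, method, analysis_type) for rec in link)]
```

In this sketch a link is kept as soon as one of its records matches; whether the record must also be marked as failed at this stage is left open, since the text does not state it.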
In some embodiments, the link set to be analyzed may be obtained through a real-time sampling detection mode in response to the link detection request for the target fault exposure point. The real-time sampling detection mode performs search sampling on a real-time log storage and supports flexible screening conditions; for example, it can detect the error propagation chain of "a cluster to which a certain analysis object belongs calls a certain downstream micro-service in a certain machine room". Multiple link-acquisition tasks may also be established and run concurrently, so as to improve the efficiency of link acquisition.
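As a rough sketch of the concurrent acquisition described above, several screening conditions can be submitted as separate tasks and their results merged. The in-memory `LOG_STORE`, its field names and both function names are hypothetical stand-ins for the real-time log storage, which the disclosure does not describe in detail.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the real-time log storage.
LOG_STORE = [
    {"link_id": "t1", "psm": "S2", "cluster": "c1", "room": "r1", "downstream": "S3"},
    {"link_id": "t2", "psm": "S2", "cluster": "c1", "room": "r2", "downstream": "S3"},
    {"link_id": "t3", "psm": "S2", "cluster": "c2", "room": "r1", "downstream": "S4"},
    {"link_id": "t4", "psm": "S5", "cluster": "c1", "room": "r1", "downstream": "S3"},
]


def fetch_links(condition):
    """Return ids of links whose log entry matches every key of `condition`."""
    return sorted({entry["link_id"] for entry in LOG_STORE
                   if all(entry.get(key) == value for key, value in condition.items())})


def fetch_concurrently(conditions, max_workers=4):
    """Run one acquisition task per screening condition and merge the link ids."""
    merged = set()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for link_ids in pool.map(fetch_links, conditions):
            merged.update(link_ids)
    return sorted(merged)
```

A condition such as `{"psm": "S2", "cluster": "c1", "room": "r1", "downstream": "S3"}` corresponds to the "cluster calls a downstream micro-service in a machine room" example in the text.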
In other embodiments, the link set to be analyzed may be obtained through an offline subscription detection mode in response to the link detection request for the target fault exposure point. In the offline subscription detection mode, a user subscribes to certain fixed analysis objects, and a timed offline task performs the subsequent fault detection on the links to which each analysis object belongs. For example, the link set related to the analysis object over the last five days is obtained, the subsequent fault detection is performed on that link set, and the detection result is sent to the user.
The two detection modes may be determined by a user, or any one of the two detection modes may be set in advance as a default detection mode, which is not limited herein.
In still other embodiments, the offline subscription detection mode may perform the subsequent fault detection on the links to which each analysis object belongs purely by means of a timed offline task. In this mode, no link detection request needs to be responded to and the user does not need to choose between the two modes; instead, all failed links within a preset time (for example, one week) are acquired, and each failed node in those links is detected separately to obtain the fault detection result.
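The offline subscription mode can be sketched as a time-window selection followed by a per-object detection call. The dictionary keys `psm` and `time`, the `detect` callback and the function names are assumptions for illustration only; scheduling of the timed task itself is omitted.

```python
from datetime import datetime, timedelta


def select_links(links, subscribed, now, days=5):
    """Keep links of subscribed analysis objects recorded within the last `days` days."""
    cutoff = now - timedelta(days=days)
    return [link for link in links
            if link["psm"] in subscribed and cutoff <= link["time"] <= now]


def run_offline_task(links, subscribed, now, detect, days=5):
    """Group the selected links by analysis object and run `detect` on each group."""
    groups = {}
    for link in select_links(links, subscribed, now, days):
        groups.setdefault(link["psm"], []).append(link)
    return {psm: detect(group) for psm, group in groups.items()}
```

Here `detect` stands for the subsequent fault detection; passing `len` simply counts the links that would be analyzed for each subscribed object.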
S103, performing link fault detection on the link set based on the analysis object and the analysis type to obtain a link fault detection result of the target fault exposure point.
In this step, after the link set is obtained through the screening process, automatic link fault detection can be performed on the link set according to the analysis object and the analysis type to obtain the link fault detection result of the target fault exposure point. Each fault exposure point therefore does not need to be checked manually one by one, which saves detection time and improves detection efficiency.
Specifically, referring to fig. 3, step S103 of performing link fault detection on the link set based on the analysis object and the analysis type to obtain the link fault detection result of the target fault exposure point includes the following S1031 to S1033:
S1031, determining an analysis starting point of the link set based on the analysis object and the analysis type, wherein the analysis starting point is used for determining, from the link set, a sub-link representing the minimum fault propagation range.
The manner in which the analysis start point is determined is different for different analysis types, and since the analysis types are used to characterize the call types of the target micro service, the determination manner of the analysis start point is described below based on the different call types, specifically as follows:
(a) In the case that the analysis type is the self-tuning type, the analysis object is determined as the analysis starting point.
(b) In the case that the analysis type is the active call type, active call information is determined from the link set according to the analysis object and the active call type, and the calling micro-service corresponding to the active call information is taken as the analysis starting point of the link set.
Exemplarily, please refer to fig. 4, which is a schematic diagram of determining an analysis starting point under the active call type according to an embodiment of the present disclosure. As shown in fig. 4, according to the analysis object S2 and the active call type, the active call information S2->S3::M3 may be found from the link set, which indicates that a fault exists when S2 calls the downstream S3. The impact surface of the fault therefore needs to be determined: based on S2->S3::M3, the corresponding calling micro-service (namely, the parent server span), that is, S2::M2, is determined from the link set, and S2::M2 is taken as the analysis starting point of the link set.
In some embodiments, if the calling micro-service corresponding to the active call information is not found in the link set, the subsequent detection steps may be continued by complementing the calling micro-service. For example, if the calling micro-service corresponding to S2->S3::M3 is not found, S2 may be added directly to the link set as the analysis starting point of the link set.
(c) In the case that the analysis type is the passive call type, passive call information is determined from the link set according to the analysis object and the passive call type, and the called micro-service corresponding to the passive call information is taken as the analysis starting point of the link set.
It will be appreciated that this step is similar to step (b) above. Exemplarily, please refer to fig. 5, which is a schematic diagram of determining an analysis starting point under the passive call type according to an embodiment of the present disclosure. As shown in fig. 5, according to the analysis object and the passive call type, the passive call information S1->S2::M2 may be found from the link set, which indicates that S2, as the called micro-service, has a fault when it is called by the upstream S1; that is, the source of the fault is determined here. The impact surface of the fault therefore needs to be further determined: based on S1->S2::M2, the corresponding called micro-service (namely, the child server span), that is, S2::M2, is determined from the link set, and S2::M2 is taken as the analysis starting point of the link set.
Optionally, when the called micro-service is found, whether its state is an error state may be determined. If its state is not an error state, it is marked as an error state, the reason being that the call from the upstream micro-service failed.
In some embodiments, if the called micro-service corresponding to the passive call information is not found in the link set, the subsequent detection steps may be continued by complementing the called micro-service. For example, if the called micro-service corresponding to S1->S2::M2 is not found, S2 may be added directly to the link set as the analysis starting point of the link set.
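Cases (a) to (c), together with the complementing fallback, can be sketched on a single link as follows. The `Span` layout (server and client spans linked through `parent_id`) is an assumption modelled on the "parent server span" and "child server span" wording above, and applying the fallback to the self-tuning case as well is a simplification of this sketch, not something the text states.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]
    kind: str                   # "server" or "client"
    service: str                # micro-service that recorded the span
    method: str                 # server: own interface; client: interface called on the peer
    peer: Optional[str] = None  # client spans only: the micro-service being called
    error: bool = False


def analysis_start(link: List[Span], psm: str, method: str, analysis_type: str) -> Span:
    """Return the analysis starting point of one link, adding it if it is missing."""
    by_id = {span.span_id: span for span in link}
    start = None
    if analysis_type == "self":
        # (a) the analysis object itself is the starting point
        start = next((s for s in link if s.kind == "server"
                      and s.service == psm and s.method == method), None)
    elif analysis_type == "active":
        # (b) find the outgoing call of psm, then its parent server span
        call = next((s for s in link if s.kind == "client" and s.service == psm), None)
        if call is not None:
            start = by_id.get(call.parent_id)
    elif analysis_type == "passive":
        # (c) find the upstream call into psm, then its child server span
        call = next((s for s in link if s.kind == "client"
                     and s.peer == psm and s.method == method), None)
        if call is not None:
            start = next((s for s in link if s.kind == "server"
                          and s.parent_id == call.span_id), None)
        if start is not None and not start.error:
            start.error = True  # the call from upstream failed, so mark the callee
    else:
        raise ValueError(f"unknown analysis type: {analysis_type}")
    if start is None:
        # Complement the link: add the analysis object directly as the starting point.
        start = Span(f"added-{psm}", None, "server", psm, method)
        link.append(start)
    return start
```

With the link S1::M1 -> S2::M2 -> S3::M3, both the active and the passive case resolve to the server span S2::M2, which matches the examples of fig. 4 and fig. 5.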
S1032, deleting, according to the analysis starting point, the branches in the link set that are irrelevant to the analysis starting point, to obtain the sub-link of the minimum fault propagation range associated with the analysis starting point.
In this step, the branches in the link set that are irrelevant to the analysis starting point are pruned to obtain the sub-link of the minimum fault propagation range associated with the analysis starting point. Specifically, since all links in the link set share the same analysis starting point, the branches irrelevant to the analysis starting point can be pruned from each link separately.
For example, suppose the analysis starting point is S2 and the analysis type is the active call type. When the link set is acquired, the relevant complete links are acquired; for example, an acquired link includes S1->S2, S2->S3 and S2->S4, that is, S2 calls the two micro-services S3 and S4. If S2->S3 has a fault and the branch S2->S4 has no fault, the branch S2->S4 may be pruned from the link and only the branch S2->S3 retained. In other words, starting from the analysis starting point, irrelevant sibling nodes are pruned, so that after pruning each link retains only the sub-link associated with the analysis starting point.
The number of micro services in the present embodiment is merely illustrative, and in other embodiments, the number of micro services may be any number, and is not limited thereto.
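The pruning example above can be sketched as a small graph walk. This is a minimal illustration under two assumptions that the text does not spell out: the calls leading upstream into the analysis starting point are kept, and downstream branches are kept only while they carry a fault. `Edge` and `prune_link` are hypothetical names.

```python
from typing import List, NamedTuple


class Edge(NamedTuple):
    caller: str
    callee: str
    failed: bool


def prune_link(edges: List[Edge], start: str) -> List[Edge]:
    """Keep the upstream chain into `start` and the failed branches below it."""
    keep = set()

    # Walk upstream: every call that leads into the starting point.
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for edge in edges:
            if edge.callee == node:
                keep.add(edge)
                if edge.caller not in seen:
                    seen.add(edge.caller)
                    stack.append(edge.caller)

    # Walk downstream: only follow calls that failed.
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for edge in edges:
            if edge.caller == node and edge.failed:
                keep.add(edge)
                if edge.callee not in seen:
                    seen.add(edge.callee)
                    stack.append(edge.callee)

    # Preserve the original order of the call records.
    return [edge for edge in edges if edge in keep]
```

For the link S1->S2, S2->S3 (failed) and S2->S4 (no fault), pruning with S2 as the starting point drops S2->S4 and keeps the other two calls.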
In some embodiments, the user may select, in the fault detection interface, a mode for "sub-error links that do not propagate to the analysis starting point". In this mode, if a fault occurs in a micro-service downstream of the analysis starting point but does not propagate to the analysis starting point, that is, it has no impact on the analysis starting point, the failed downstream micro-service may be pruned.
S1033, detecting link faults of the sub-links in the minimum fault propagation range associated with the analysis starting point in the link set, and obtaining a link fault detection result of the target fault exposure point.
It can be understood that, after the link set has been screened and the branches in each link have been pruned in the above steps, the sub-links finally obtained are relatively simplified links. Link fault detection can therefore be performed on the sub-links associated with the analysis starting point to obtain the link fault detection result of the target fault exposure point.
Optionally, referring to fig. 6, step S1033 of performing link fault detection on the sub-links associated with the analysis starting point in the link set to obtain the link fault detection result of the target fault exposure point may include the following S10331 to S10332:
S10331, based on the sub-links of the minimum fault propagation range associated with the analysis start point in the link set, generating a directed link graph centered on the analysis start point, where each node in the directed link graph corresponds to a micro service.
S10332, carrying out fault analysis on each node in the directed link graph to obtain a link fault detection result of the target fault exposure point.
Exemplarily, please refer to fig. 7, which is a schematic diagram of a directed link graph provided in an embodiment of the present disclosure. As shown in fig. 7, each sub-link is centered on and connected to the analysis starting point. Each circled node in the directed link graph represents one micro-service, and an arrow between two nodes represents the call relationship between the two micro-services; for example, S1, S3 and S5 each call S2, and S2 calls S4, S6 and S8 respectively.
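One simple way to merge sub-links into such a directed graph is to count each caller-to-callee pair, as in the sketch below. Representing a sub-link as a list of `(caller, callee)` pairs and the two function names are assumptions for illustration.

```python
from collections import Counter


def build_directed_graph(sub_links):
    """Merge sub-links into edge counts keyed by (caller, callee)."""
    edge_counts = Counter()
    for link in sub_links:
        for caller, callee in link:
            edge_counts[(caller, callee)] += 1
    return edge_counts


def neighbours(edge_counts, center):
    """Return the micro-services calling `center` and those called by it."""
    upstream = sorted({caller for caller, callee in edge_counts if callee == center})
    downstream = sorted({callee for caller, callee in edge_counts if caller == center})
    return upstream, downstream
```

With sub-links mirroring fig. 7, `neighbours(graph, "S2")` lists S1, S3 and S5 upstream of the analysis starting point and S4, S6 and S8 downstream of it.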
The link failure detection result comprises at least one of error number, error rate and contribution rate of each node.
The error number of a node refers to the number of sub-links in which the node has an error. The error rate refers to the ratio between the error number and the total number of sub-links to which the node belongs. The contribution rate refers to the ratio between the error number of the node and the total number of sub-links to which the analysis starting point belongs.
For example, since each request corresponds to one link, suppose the links corresponding to request 1, request 2 and request 3 each include the micro-service S1 calling the micro-service S2. If the call from S1 to S2 fails in the links corresponding to request 1 and request 2 but does not fail in the link corresponding to request 3, then for node S2 the error number is 2, the total number is 3, and the error rate is 2/3.
Taking S2 in fig. 7 as the analysis starting point, the method for determining the contribution rate is described as follows. The total number of sub-links to which the analysis starting point belongs is the total number of links in the link set determined in the foregoing embodiment. Suppose the link set determined based on the analysis object S2 contains 100 links, and S2 calls S4, S6 and S8, where S2 calls S4 95 times, S6 4 times and S8 once. Then the contribution rate is 95/100 for node S4, 4/100 for node S6 and 1/100 for node S8.
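The three indicators can be computed with a short sketch that reproduces the numbers in the two examples above. Each sub-link is represented, as an assumption, by a mapping from node name to whether that node had an error in the sub-link, and the call counts of the second example are read as calls that appear in the failing links; `node_metrics` is a hypothetical name.

```python
from collections import Counter
from fractions import Fraction


def node_metrics(sub_links):
    """Compute error number, error rate and contribution rate for every node."""
    total_links = len(sub_links)  # sub-links to which the analysis starting point belongs
    appearances = Counter()       # sub-links to which each node belongs
    errors = Counter()            # sub-links in which each node has an error
    for link in sub_links:
        for node, has_error in link.items():
            appearances[node] += 1
            if has_error:
                errors[node] += 1
    return {
        node: {
            "error_number": errors[node],
            "error_rate": Fraction(errors[node], appearances[node]),
            "contribution_rate": Fraction(errors[node], total_links),
        }
        for node in appearances
    }
```

`Fraction` keeps the ratios exact, so 2 errors in 3 sub-links compares equal to `Fraction(2, 3)` rather than to a rounded decimal.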
Optionally, after fault detection, the error number, the error rate, the contribution rate and the like of each node may be marked in the directed link graph, and the directed link graph may be displayed in response to a user's request for checking the directed link graph, so that the user may intuitively analyze the fault information of each node.
Optionally, after each request is executed, if a node fails, an error status code of the node and a key error log corresponding to the error status code may be generated. The error status code and the key error log of the node, together with the identifier of the link to which the node belongs, may then be marked in the directed link graph for the user to check.
In some embodiments, the directed link graph may include branches that have no faults, and showing all of these branches would make the graph redundant. Nodes may therefore be deleted according to their call proportion. For example, if S2 calls S3 99 times and calls S4 only once, S4 accounts for only 1% of the calls from S2, and the S4 node may be deleted.
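A minimal sketch of this call-proportion filter is shown below. The threshold `min_share`, its default value and the function name are assumptions; the text only gives the 99-to-1 example and does not fix a cutoff.

```python
def drop_rare_branches(call_counts, faulty, min_share=0.05):
    """Remove fault-free downstream nodes whose share of calls is below `min_share`."""
    total = sum(call_counts.values())
    if total == 0:
        return {}
    return {
        node: count
        for node, count in call_counts.items()
        # Nodes with faults are always kept; others need a large enough share.
        if node in faulty or count / total >= min_share
    }
```

With `{"S3": 99, "S4": 1}` and only S3 marked as faulty, S4 holds 1% of the calls and is removed from the displayed graph.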
It will be appreciated by those skilled in the art that in the above-described method of the specific embodiments, the written order of steps is not meant to imply a strict order of execution but rather should be construed according to the function and possibly inherent logic of the steps.
Based on the same inventive concept, the embodiments of the present disclosure further provide a link fault detection device corresponding to the link fault detection method. Since the principle by which the device solves the problem is similar to that of the link fault detection method in the embodiments of the present disclosure, the implementation of the device may refer to the implementation of the method, and repeated description is omitted.
Referring to fig. 8, a schematic diagram of a link failure detection apparatus 800 according to an embodiment of the disclosure is provided, where the apparatus includes:
an obtaining module 801, configured to obtain an analysis object corresponding to a target failure exposure point and an analysis type of the analysis object in response to a link detection request for the target failure exposure point; the analysis object is used for representing a target micro-service corresponding to the target fault exposure point, and the analysis type is used for representing the calling type of the target micro-service;
a screening module 802, configured to screen a link set to be analyzed from a plurality of links based on the analysis object and the analysis type; the link is used for representing the calling relation among the micro services generated during the micro service processing task;
and a detection module 803, configured to perform link failure detection on the link set based on the analysis object and the analysis type, to obtain a link failure detection result of the target failure exposure point.
In an alternative embodiment, the detection module 803 is specifically configured to:
determining an analysis starting point of the link set based on the analysis object and the analysis type; the analysis starting point is used for determining a sub-link representing the minimum fault propagation range from the link set;
deleting branches, which are irrelevant to the analysis starting point, in the link set according to the analysis starting point to obtain a sub-link of the minimum fault propagation range associated with the analysis starting point;
and carrying out link fault detection on the sub-link of the minimum fault propagation range associated with the analysis starting point to obtain a link fault detection result of the target fault exposure point.
In an alternative embodiment, the detection module 803 is specifically configured to:
generating a directed link graph centered on the analysis starting point based on a sub-link of the minimum fault propagation range associated with the analysis starting point in the link set, wherein each node in the directed link graph corresponds to one micro service;
and carrying out fault analysis on each node in the directed link graph to obtain a link fault detection result of the target fault exposure point.
In an alternative embodiment, the fault detection result includes at least one of an error number of each node, an error rate and a contribution rate, where the error number refers to the number of links where the node has an error in each sub-link, the error rate refers to a ratio between the error number and the total number of sub-links to which the node belongs, and the contribution rate refers to a ratio between the error number of the node and the total number of sub-links to which the analysis starting point belongs.
In an alternative embodiment, the analysis type includes a self-tuning type; the detection module 803 specifically is configured to:
and determining the analysis object as the analysis starting point in the case that the analysis type is the self-tuning type.
In an alternative embodiment, the link includes call information between micro services, and the analysis type includes an active call type; the detection module 803 specifically is configured to:
determining active call information from the link set according to the analysis object and the active call type under the condition that the analysis type is the active call type;
and taking the calling micro-service corresponding to the active calling information as an analysis starting point of the link set.
In an alternative embodiment, the link includes call information between micro services, and the analysis type includes a passive call type; the detection module 803 specifically is configured to:
under the condition that the analysis type is a passive call type, determining passive call information from the link set according to the analysis object and the passive call type;
and taking the called micro-service corresponding to the passive call information as an analysis starting point of the link set.
The process flow of each module in the apparatus and the interaction flow between the modules may be described with reference to the related descriptions in the above method embodiments, which are not described in detail herein.
Based on the same technical concept, the embodiments of the present disclosure also provide an electronic device. Referring to fig. 9, a schematic structural diagram of an electronic device 900 according to an embodiment of the disclosure, the electronic device includes a processor 901, a memory 902, and a bus 903. The memory 902 is configured to store execution instructions and includes an internal memory 9021 and an external memory 9022. The internal memory 9021 is used for temporarily storing operation data of the processor 901 and data exchanged with the external memory 9022, such as a hard disk, and the processor 901 exchanges data with the external memory 9022 via the internal memory 9021.
In the embodiment of the present application, the memory 902 is specifically configured to store application program codes for executing the solution of the present application, and the processor 901 controls the execution. That is, when the electronic device 900 is running, communication between the processor 901 and the memory 902 is via the bus 903, such that the processor 901 executes the application code stored in the memory 902, thereby performing the methods described in any of the foregoing embodiments.
The Memory 902 may be, but is not limited to, random access Memory (Random Access Memory, RAM), read Only Memory (ROM), programmable Read Only Memory (Programmable Read-Only Memory, PROM), erasable Read Only Memory (Erasable Programmable Read-Only Memory, EPROM), electrically erasable Read Only Memory (Electric Erasable Programmable Read-Only Memory, EEPROM), etc.
Processor 901 may be an integrated circuit chip with signal processing capabilities. The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
It should be understood that the structures illustrated in the embodiments of the present application do not constitute a specific limitation on the electronic device 900. In other embodiments of the present application, the electronic device 900 may include more or fewer components than illustrated, certain components may be combined or split, or the components may be arranged differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The disclosed embodiments also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of link failure detection in the method embodiments described above. Wherein the storage medium may be a volatile or nonvolatile computer readable storage medium.
The embodiments of the present disclosure further provide a computer program product that carries program code, where the instructions included in the program code may be used to perform the steps of link fault detection in the foregoing method embodiments. For details, reference may be made to the foregoing method embodiments, which are not described herein again.
Wherein the above-mentioned computer program product may be realized in particular by means of hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium, and in another alternative embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.
It will be clear to those skilled in the art that, for convenience and brevity of description, the specific working procedures of the above-described system and apparatus may refer to the corresponding procedures in the foregoing method embodiments, which are not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. The above-described apparatus embodiments are merely illustrative. For example, the division of the units is merely a division by logical function, and other manners of division are possible in actual implementation; as another example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some communication interfaces, devices or units, and may be electrical, mechanical or in other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present disclosure may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure, in essence, or the part of it that contributes over the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing an electronic device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method described in the embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk or an optical disk.
Finally, it should be noted that the foregoing examples are merely specific embodiments of the present disclosure and are not intended to limit its scope, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing examples, those skilled in the art will appreciate that anyone familiar with the art may, within the technical scope disclosed herein, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the disclosure, and are intended to be included within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (10)

1. A method for detecting a link failure, comprising:
responding to a link detection request aiming at a target fault exposure point, and acquiring an analysis object corresponding to the target fault exposure point and an analysis type of the analysis object; the analysis object is used for representing a target micro-service corresponding to the target fault exposure point, and the analysis type is used for representing the calling type of the target micro-service;
screening a link set to be analyzed from a plurality of links based on the analysis object and the analysis type; the link is used for representing the calling relation among the micro services generated during the micro service processing task;
and carrying out link fault detection on the link set based on the analysis object and the analysis type to obtain a link fault detection result of the target fault exposure point.
2. The method according to claim 1, wherein the performing link failure detection on the link set based on the analysis object and the analysis type to obtain a link failure detection result of the target failure exposure point includes:
determining an analysis starting point of the link set based on the analysis object and the analysis type; the analysis starting point is used for determining a sub-link representing the minimum fault propagation range from the link set;
deleting branches, which are irrelevant to the analysis starting point, in the link set according to the analysis starting point to obtain a sub-link of the minimum fault propagation range associated with the analysis starting point;
and carrying out link fault detection on the sub-link of the minimum fault propagation range associated with the analysis starting point to obtain a link fault detection result of the target fault exposure point.
3. The method according to claim 2, wherein the performing link failure detection on the sub-link of the minimum failure propagation range associated with the analysis start point in the link set to obtain a link failure detection result of the target failure exposure point includes:
generating a directed link graph centered on the analysis starting point based on a sub-link of the minimum fault propagation range associated with the analysis starting point in the link set, wherein each node in the directed link graph corresponds to one micro service;
and carrying out fault analysis on each node in the directed link graph to obtain a link fault detection result of the target fault exposure point.
4. The method of claim 3, wherein the fault detection result includes at least one of an error number of each node, an error rate, and a contribution rate, wherein the error number is a number of links in which the node has an error in each sub-link, the error rate is a ratio between the error number and a total number of sub-links to which the node belongs, and the contribution rate is a ratio between the error number of the node and the total number of sub-links to which the analysis start point belongs.
5. The method of claim 2, wherein the analysis type comprises a self-tuning type; the determining an analysis start point of the link set based on the analysis object and the analysis type includes:
and determining the analysis object as the analysis starting point in the case that the call type is the self-tuning type.
6. The method of claim 2, wherein the link includes call information between micro services, and wherein the analysis type includes an active call type; the determining an analysis start point of the link set based on the analysis object and the analysis type includes:
determining active call information corresponding to the analysis object from the link set according to the analysis object and the active call type under the condition that the analysis type is the active call type;
and taking the calling micro-service corresponding to the active calling information as an analysis starting point of the link set.
7. The method of claim 2, wherein the link includes call information between micro services, and wherein the call type includes a passive call type; the determining an analysis start point of the link set based on the analysis object and the analysis type includes:
under the condition that the call type is a passive call type, determining passive call information from the link set according to the analysis object and the passive call type;
and taking the called micro-service corresponding to the passive call information as an analysis starting point of the link set.
8. A link failure detection apparatus, comprising:
an acquisition module, configured to respond to a link detection request for a target fault exposure point by acquiring an analysis object corresponding to the target fault exposure point and an analysis type of the analysis object; the analysis object is used for representing a target micro-service corresponding to the target fault exposure point, and the analysis type is used for representing the call type of the target micro-service;
a screening module, configured to screen a link set to be analyzed from a plurality of links based on the analysis object and the analysis type; each link is used for representing the call relations among micro-services generated while the micro-services process a task;
and a detection module, configured to perform link fault detection on the link set based on the analysis object and the analysis type to obtain a link fault detection result for the target fault exposure point.
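The way the three modules of claim 8 hand data to one another can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the claimed apparatus: the function names, the `registry` lookup from exposure point to (analysis object, analysis type), the string labels for the analysis types and the screening predicate are all hypothetical. The claim does not specify the detection logic, so the detection step is left as a caller-supplied function.

```python
from typing import Callable, Dict, List, Tuple

Call = Tuple[str, str]   # (caller, callee), an assumed record shape
Link = List[Call]        # one link = calls produced while processing a task


def screen_links(links: List[Link], analysis_object: str,
                 analysis_type: str) -> List[Link]:
    """Screening module: keep links containing a call that involves the object."""
    def matches(call: Call) -> bool:
        caller, callee = call
        if analysis_type == "active_call":    # object acts as the caller
            return caller == analysis_object
        if analysis_type == "passive_call":   # object acts as the callee
            return callee == analysis_object
        if analysis_type == "self":           # assumption: object on either side
            return analysis_object in (caller, callee)
        raise ValueError(f"unknown analysis type: {analysis_type}")

    return [link for link in links if any(matches(c) for c in link)]


def handle_link_detection_request(
    exposure_point: str,
    registry: Dict[str, Tuple[str, str]],
    links: List[Link],
    detect: Callable[[List[Link], str, str], object],
) -> object:
    """Wire acquisition, screening and detection together."""
    # Acquisition module: look up the analysis object and analysis type.
    analysis_object, analysis_type = registry[exposure_point]
    # Screening module: narrow the links down to the link set to be analyzed.
    link_set = screen_links(links, analysis_object, analysis_type)
    # Detection module: supplied by the caller in this sketch.
    return detect(link_set, analysis_object, analysis_type)


all_links = [[("a", "b"), ("b", "c")], [("x", "y")]]
```

For example, calling `handle_link_detection_request("alarm-1", {"alarm-1": ("b", "passive_call")}, all_links, detect)` passes only the first link to `detect`, because only that link contains a call whose callee is `b`.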
9. An electronic device, comprising: a processor, a memory and a bus, said memory storing machine readable instructions executable by said processor, said processor and said memory communicating over the bus when the electronic device is running, said machine readable instructions when executed by said processor performing the steps of the link failure detection method according to any of claims 1 to 7.
10. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of the link failure detection method according to any of claims 1 to 7.
CN202211500900.3A 2022-11-28 2022-11-28 Link fault detection method, device, equipment and medium Pending CN116155688A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211500900.3A CN116155688A (en) 2022-11-28 2022-11-28 Link fault detection method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211500900.3A CN116155688A (en) 2022-11-28 2022-11-28 Link fault detection method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN116155688A true CN116155688A (en) 2023-05-23

Family

ID=86353337

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211500900.3A Pending CN116155688A (en) 2022-11-28 2022-11-28 Link fault detection method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN116155688A (en)

Similar Documents

Publication Title
US10909028B1 (en) Multi-version regression tester for source code
WO2018000607A1 (en) Method and electronic apparatus for identifying test case failure causes
US10095599B2 (en) Optimization for application runtime monitoring
CN108038039B (en) Method for recording log and micro-service system
CN110674034A (en) Health examination method and device, electronic equipment and storage medium
CN111679968A (en) Interface calling abnormity detection method and device, computer equipment and storage medium
CN112035344A (en) Multi-scenario test method, device, equipment and computer readable storage medium
CN114844768A (en) Information analysis method and device and electronic equipment
CN110688305B (en) Test environment synchronization method, device, medium and electronic equipment
CN110347572B (en) Method, device, system, equipment and medium for outputting performance log
CN114500249B (en) Root cause positioning method and device
CN116155688A (en) Link fault detection method, device, equipment and medium
CN113238901B (en) Multi-device automatic testing method and device, storage medium and computer device
US11036624B2 (en) Self healing software utilizing regression test fingerprints
CN110618943B (en) Security service test method and device, electronic equipment and readable storage medium
CN111367796B (en) Application program debugging method and device
CN114546799A (en) Point burying log checking method and device, electronic equipment, storage medium and product
CN105786865B (en) Fault analysis method and device for retrieval system
CN112966056A (en) Information processing method, device, equipment, system and readable storage medium
CN111475400A (en) Verification method of service platform and related equipment
CN117155772B (en) Alarm information enrichment method, device, equipment and storage medium
CN117130945B (en) Test method and device
CN115640236B (en) Script quality detection method and computing device
CN110990475B (en) Batch task inserting method and device, computer equipment and storage medium
CN116521557A (en) Verification method and device for test cases, electronic equipment and storage medium

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination