CN112737800A

CN112737800A - Service node fault positioning method, call chain generation method and server

Info

Publication number: CN112737800A
Application number: CN201911029150.4A
Authority: CN
Inventors: 张媛; 陈秋浩; 程庞钢; 匡磊; 李杨; 周辉; 巫勇明; 邹艳军
Original assignee: SF Technology Co Ltd
Current assignee: SF Technology Co Ltd
Priority date: 2019-10-28
Filing date: 2019-10-28
Publication date: 2021-04-30
Anticipated expiration: 2039-10-28
Also published as: CN112737800B

Abstract

The invention relates to the technical field of computers and provides a service node fault locating method, comprising the following steps: if an acquisition server detects that the running performance of a target application has failed, the acquisition server acquires the call chains of all service nodes corresponding to the target application; the acquisition server acquires first call data of the parent service node contained in each call chain and second call data of all child service nodes contained in each call chain according to the first identification information of each parent service node and the second identification information of all child service nodes; and the acquisition server determines the faulty service node according to the acquired first call data and the acquired second call data. Because the faulty service node is determined from the first call data of the parent service node and the second call data of all the child service nodes contained in each call chain, the efficiency of service node fault location can be improved.

Description

Service node fault positioning method, call chain generation method and server

Technical Field

The invention belongs to the technical field of computers, and particularly relates to a service node fault positioning method, a call chain generation method and a server.

Background

Because an Application Performance Management (APM) system can monitor an enterprise system in real time and provide a systematic solution for managing application performance and faults, more and more enterprise-level applications are being connected to such systems. However, fault location of a service node can currently be performed only by means of call logs, so fault location efficiency is low.

Disclosure of Invention

In view of this, embodiments of the present invention provide a service node fault location method, a call chain generation method, an application performance management system, an acquisition server, and a call chain server, so as to solve the prior-art problem that fault location of a service node can only be performed by means of call logs and is therefore inefficient.

A first aspect of an embodiment of the present invention provides a method for locating a fault of a service node, which is applied to an application performance management system, and is characterized in that the application performance management system includes an acquisition server and a call chain server, and the method includes:

if the acquisition server detects that the running performance of the target application has failed, acquiring call chains of all service nodes corresponding to the target application; wherein the service nodes comprise a parent service node and a child service node; the call chain is generated in advance by the call chain server according to the first identification information of the parent service node and the second identification information of all the child service nodes, and each call chain consists of a parent service node and all the child service nodes corresponding to that parent service node;

the acquisition server acquires first calling data of the parent service node contained in each calling chain and second calling data of all the child service nodes contained in each calling chain according to the first identification information of each parent service node and the second identification information of all the child service nodes;

and the acquisition server determines the service node with the fault according to the acquired first calling data and the acquired second calling data.

A second aspect of the embodiments of the present invention provides a call chain generation method, which is applied to an acquisition server, and includes:

acquiring call data corresponding to the service nodes of all predetermined applications, wherein the call data comprises identification information of the service nodes, and the service nodes comprise parent service nodes and child service nodes;

respectively determining first identification information of each parent service node and second identification information of all child service nodes corresponding to each parent service node, wherein the first identification information of each parent service node and the second identification information of all child service nodes corresponding to that parent service node share a preset number of identical identification characters;

storing the first identification information of each parent service node and the second identification information of all the child service nodes corresponding to each parent service node into a predetermined distributed subscription message system;

the distributed subscription message system stores the first identification information of each parent service node and the second identification information of all the child service nodes corresponding to each parent service node, and the first identification information and the second identification information are used for a call chain server to generate a service call chain of each parent service node and all the child service nodes corresponding to each parent service node.

A third aspect of the embodiments of the present invention provides a call chain generation method, which is applied to a call chain server, and includes:

acquiring first identification information of each parent service node and second identification information of all child service nodes corresponding to each parent service node from a distributed subscription message system;

wherein the first identification information of each parent service node and the second identification information of all the child service nodes corresponding to that parent service node share a preset number of identical identification characters, and are stored in the distributed subscription message system;

the first identification information of each parent service node and the second identification information of all the child service nodes corresponding to each parent service node are determined by an acquisition server according to acquired calling data corresponding to all the service nodes of a target application, the acquisition server stores the determined first identification information and the determined second identification information into the distributed subscription message system, the calling data comprises the identification information of the service nodes, and the service nodes comprise the parent service nodes and the child service nodes;

and generating a service call chain of each parent service node and all of its corresponding child service nodes according to the first identification information and the second identification information.

A fourth aspect of the embodiments of the present invention provides an application performance management system, including an acquisition server and a call chain server;

the acquisition server includes:

a detection module, configured to acquire call chains of all service nodes corresponding to the target application if a failure of the running performance of the target application is detected; wherein the service nodes comprise a parent service node and a child service node; the call chain is generated in advance by the call chain server according to the first identification information of the parent service node and the second identification information of all the child service nodes, and each call chain consists of a parent service node and all the child service nodes corresponding to that parent service node;

an obtaining module, configured to obtain, according to the first identification information of each parent service node and the second identification information of all child service nodes, first call data of the parent service node included in each call chain and second call data of all child service nodes included in each call chain;

the determining module is used for determining a service node with a fault according to the acquired first calling data and the acquired second calling data;

the call chain server comprises:

and the generating module is used for generating a calling chain of each parent service node and all the corresponding child service nodes according to the first identification information of the parent service node and the second identification information of all the child service nodes.

A fifth aspect of embodiments of the present invention provides a server, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the call chain generation method according to the second aspect when executing the computer program, or implements the steps of the call chain generation method according to the third aspect when executing the computer program.

Compared with the prior art, the service node fault location method provided by the first aspect of the application comprises the following steps: if the acquisition server detects that the running performance of the target application has failed, the acquisition server acquires the call chains of all service nodes corresponding to the target application; the acquisition server acquires first call data of the parent service node contained in each call chain and second call data of all child service nodes contained in each call chain according to the first identification information of each parent service node and the second identification information of all child service nodes; and the acquisition server determines the faulty service node according to the acquired first call data and the acquired second call data. Because the faulty service node is determined from the first call data of the parent service node and the second call data of all the child service nodes contained in each call chain, the efficiency of service node fault location can be improved.

Compared with the prior art, the advantageous effects of the embodiments provided in the second aspect to the fifth aspect of the present application are the same as the advantageous effects of the embodiments provided in the first aspect of the present application compared with the prior art, and are not described herein again.

Drawings

In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.

FIG. 1 is a schematic diagram of an application performance management system according to a first embodiment of the present application;

FIG. 1a is a diagram of a first application scenario of the application performance management system of FIG. 1;

FIG. 1b is a diagram of a second application scenario of the application performance management system of FIG. 1;

FIG. 1c is a diagram of a third application scenario of the application performance management system of FIG. 1;

FIG. 2 is a flowchart of an implementation of a service node fault location method according to a second embodiment of the present application;

FIG. 3 is a flowchart of an implementation of a service node fault location method according to a third embodiment of the present application;

FIG. 4 is a flowchart of a call chain generation method according to a fourth embodiment of the present application;

FIG. 5 is a flowchart of an implementation of S403 in FIG. 4;

FIG. 6 is a flowchart of an implementation of a call chain generation method according to a fifth embodiment of the present application;

FIG. 7 is a flowchart of an implementation of S601 in FIG. 6;

FIG. 8 is a schematic diagram of the apparatus of an application performance management system provided by the present invention;

FIG. 9 is a schematic diagram of the apparatus of the acquisition server of FIG. 8;

FIG. 10 is a schematic diagram of the apparatus of the call chain server of FIG. 8;

FIG. 11 is a schematic diagram of an acquisition server provided by the present invention;

FIG. 12 is a schematic diagram of a call chain server provided by the present invention.

Detailed Description

In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.

Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.

In general, an Application Performance Management (APM) system monitors and manages the performance and availability of application software, and ensures that software applications run well by monitoring and diagnosing performance problems in complex applications. Because a single service request may involve multiple service nodes of an application, when one of those service nodes fails (for example, a call returns an error or responds slowly), an existing application performance management system can only locate the failed service node by means of call logs. The whole fault query process consumes a large amount of labor and time and is inefficient.

In view of the above technical problem, an embodiment of the present invention first provides an application performance management system. Fig. 1 shows the application performance management system according to a first embodiment of the present application. The system comprises an acquisition server 01 and a call chain server 02, which are communicatively connected to each other.

If the acquisition server 01 detects that the running performance of the target application has failed, it acquires the call chains of all service nodes corresponding to the target application, wherein the service nodes comprise a parent service node and a child service node.

The call chain server 02 generates the call chain of each parent service node and all of its corresponding child service nodes according to the first identification information of the parent service node and the second identification information of all the child service nodes.

It should be noted that, in an alternative implementation, fig. 1a shows a first application scenario of the application performance management system in fig. 1. As shown in fig. 1a, after the call chain server 02 generates a call chain, it stores the call chain in a predetermined database 03, for example a hybrid non-relational database such as Cassandra, and the acquisition server 01 obtains the call chains of all service nodes corresponding to the target application from the predetermined database 03.

It is to be understood that, in order to ensure the working performance of the acquisition server 01 and the call chain server 02, as shown in fig. 1a, the predetermined database runs on a server other than the acquisition server 01 and the call chain server 02, and in an alternative implementation, the predetermined database is an open-source distributed database.

The target application is any one of all applications monitored by the application performance management system.

In an alternative implementation, fig. 1b shows a second application scenario of the application performance management system. As shown in fig. 1b, the acquisition server 01 is communicatively connected to a user agent server 04. The user agent server 04 monitors, in real time, the node data of all service nodes of each application previously connected to the system, acquires that node data, and sends the acquired node data of all service nodes to the acquisition server 01.

In an optional implementation, when the user agent server 04 detects a performance failure of the target application, it sends the first identification information of the parent service node of the target application to the acquisition server 01.

In an alternative implementation, in order to improve the data processing performance of the acquisition server 01 and the call chain server 02, the arrangement of fig. 1c is used; fig. 1c shows a third application scenario of the application performance management system in fig. 1. As shown in fig. 1c, the acquisition server 01 stores the acquired first identification information of each parent service node and the acquired second identification information of all the child service nodes corresponding to each parent service node into a predetermined distributed subscription message system 05; the call chain server 02 obtains the first identification information and the second identification information from the distributed subscription message system 05, and generates the call chain of each parent service node and all of its corresponding child service nodes according to the first identification information and the second identification information.

Fig. 2 is a flowchart of an implementation of a service node fault location method according to a second embodiment of the present application. The method can be implemented by software or hardware of the acquisition server in the application performance management system shown in fig. 1. As can be seen from fig. 2, in this embodiment, the service node fault location method includes steps S201 to S203, which are detailed as follows:

S201, if a failure of the running performance of the target application is detected, acquiring the call chains of all service nodes corresponding to the target application;

wherein the service nodes comprise a parent service node and a child service node; the call chain is generated in advance by the call chain server according to the first identification information of the parent service node and the second identification information of all the child service nodes, and each call chain consists of a parent service node and all of its corresponding child service nodes. The target application is any one of the applications that have previously been connected to the application performance management system.

A failure of the running performance of the target application may be, for example, that the running speed is less than a preset speed threshold or that a calling parameter exceeds a preset parameter threshold.
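The two failure conditions named above can be sketched as a simple threshold check. This is a minimal illustration only: the metric names `speed` and `call_param`, their units, and the threshold values are hypothetical and are not specified in the source.

```python
from dataclasses import dataclass


@dataclass
class PerformanceSample:
    """Hypothetical snapshot of a target application's running performance."""
    speed: float       # running speed (unit assumed, e.g. requests per second)
    call_param: float  # a monitored calling parameter (meaning assumed)


def performance_failed(sample: PerformanceSample,
                       speed_threshold: float,
                       param_threshold: float) -> bool:
    """Return True if the speed is below its preset threshold
    or the calling parameter exceeds its preset threshold."""
    return sample.speed < speed_threshold or sample.call_param > param_threshold
```

Either condition alone is enough to report a failure in this sketch; other combinations of metrics would be equally compatible with the text.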

S202, according to the first identification information of each parent service node and the second identification information of all the child service nodes, obtaining the first call data of the parent service node contained in each call chain and the second call data of all the child service nodes contained in each call chain.

The parent service node is the service node that is called first in a service request, and may also be referred to as the source service node or root service node. A service node is any node called during the service request. For example, when a service request is executed through application A, service node 10, service node 20, service node 30, service node 40, and so on are called directly or indirectly; if service node 10 is the first service node to be called, then service node 10 is the root service node of the service request, and the remaining service nodes are child service nodes corresponding to that root service node.

The first identification information of each parent service node and the second identification information of all the child service nodes corresponding to that parent service node share a preset number of identical identification characters. For example, in an optional implementation, the first identification information of the parent service node consists of an internet protocol address and port information, and the second identification information of each child service node corresponding to the parent service node includes the same internet protocol address and port information plus a preset sequence number corresponding to that child service node. The preset sequence number of each child service node can be determined according to the order in which the child service nodes are called in the service request.
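A minimal sketch of such an identification scheme is given below. The `ip:port` layout and the `#` separator before the sequence number are assumptions made for illustration; the source only states that the child identifier contains the parent's address and port plus a sequence number.

```python
def parent_id(ip: str, port: int) -> str:
    """First identification information: IP address plus port (layout assumed)."""
    return f"{ip}:{port}"


def child_id(parent: str, seq: int) -> str:
    """Second identification information: the parent's identifier followed by
    a sequence number. The '#' separator is a hypothetical choice."""
    return f"{parent}#{seq}"


def belongs_to(parent: str, candidate: str) -> bool:
    """A child shares the parent's identifier as a common prefix."""
    return candidate.startswith(parent + "#")
```

For instance, `child_id(parent_id("10.0.0.1", 8080), 2)` yields `"10.0.0.1:8080#2"`, and the shared prefix is what lets the child be matched back to its parent.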

S203, determining the service node with the fault according to the acquired first calling data and the acquired second calling data.

The first call data comprises the call chain data and running data of a parent service node, and the second call data comprises the call chain data and running data of a child service node. The call chain data is the call data between service nodes generated by an application development kit, and includes the call relationships and business data between the service nodes.

The running data is a running log, which records the events that occur during a service request, the date on which each event occurred, the timestamp of each event, and the like.

The service node may include a service logic unit and an application development kit, where the service logic unit is configured to generate operation data, and the application development kit is configured to generate call chain data.

In an optional implementation manner, the call chain includes call chain identification information (TraceID), a call relationship between service nodes, and event information corresponding to each service node, where the event information includes time information of event response and event attribute information, and the event attribute information is used to record whether an event is an associated event or a single event.
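The call chain contents listed above can be pictured as a small data structure. The following is a sketch under assumptions: the class and field names (`EventInfo`, `CallChain`, `response_time_ms`, `is_associated`) are hypothetical, and the source does not prescribe any particular storage layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class EventInfo:
    """Event information for one service node (field names are hypothetical)."""
    response_time_ms: float  # time information of the event response
    is_associated: bool      # True: associated event, False: single event


@dataclass
class CallChain:
    """Call chain: identifier, call relationships, and per-node event information."""
    trace_id: str
    calls: Dict[str, List[str]] = field(default_factory=dict)   # caller -> callees
    events: Dict[str, EventInfo] = field(default_factory=dict)  # node -> event info

    def add_call(self, caller: str, callee: str) -> None:
        """Record that `caller` calls `callee`, registering both nodes."""
        self.calls.setdefault(caller, []).append(callee)
        self.calls.setdefault(callee, [])
```

Storing the call relationships as an adjacency mapping keeps lookups by node cheap, which is convenient when the chain is later searched for a suspect node.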

In an alternative implementation, a fault is traced and located by combining the call relationships with the event information of each service node. For example, to determine whether service node A is faulty, the call chain containing service node A is first found according to the call chain identification information; the call relationships between the service nodes in that call chain and the event information of each service node are then queried. If the response time recorded in the event information of service node A is greater than a preset response time threshold and the event attribute information indicates a single event, service node A is determined to be the faulty node.

It can be understood that if the response time recorded in the event information of service node A is greater than the preset response time threshold and the event attribute information indicates an associated event, the event information and event attribute information of each service node that has a call relationship with service node A must also be queried. If the response time of another service node that has a call relationship with service node A is also greater than the preset response time threshold, both that service node and service node A are determined to be faulty service nodes.
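The decision rule for single and associated events can be sketched as follows. The data layout (a `calls` mapping and an `events` mapping of `(response_ms, attribute)` tuples) is hypothetical. The source does not say what to report when an associated event has no slow neighbor; this sketch falls back to reporting the queried node alone, which is an assumption.

```python
def related_nodes(node, calls):
    """Nodes that call `node` or are called by it (calls: caller -> list of callees)."""
    callees = set(calls.get(node, []))
    callers = {caller for caller, targets in calls.items() if node in targets}
    return (callees | callers) - {node}


def locate_faults(node, calls, events, threshold_ms):
    """Return the set of nodes judged faulty, starting from a suspect `node`.

    events: node -> (response_ms, attribute), attribute is 'single' or 'associated'.
    """
    response_ms, attribute = events[node]
    if response_ms <= threshold_ms:
        return set()          # within the threshold: not faulty
    if attribute == "single":
        return {node}         # slow single event: the node itself is the fault
    # Associated event: also flag every related node that exceeds the threshold.
    slow = {n for n in related_nodes(node, calls) if events[n][0] > threshold_ms}
    return slow | {node}      # assumption: node is still reported if no neighbor is slow
```

With a threshold of 500 ms, a node A at 900 ms with an associated event and a callee C at 800 ms would both be reported, while a callee B at 100 ms would not.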

The call relationship may be represented by a call relationship tree. It should be noted that a call relationship tree is only one possible representation; in other possible embodiments, the call relationship may also be represented in another form, such as a call relationship diagram, which is not specifically limited herein.
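As one illustration of a call relationship tree, the sketch below renders a caller-to-callees mapping as indented text. The mapping layout and the function name `render_tree` are hypothetical, and the sketch assumes the call relationships contain no cycles.

```python
def render_tree(calls, root, depth=0):
    """Return the call relationship tree under `root` as indented lines.

    calls: caller -> list of callees, in call order (assumed acyclic).
    """
    lines = ["  " * depth + root]
    for child in calls.get(root, []):
        lines.extend(render_tree(calls, child, depth + 1))
    return lines
```

Calling `"\n".join(render_tree(calls, "10"))` on a mapping in which node 10 calls nodes 20 and 30, and node 20 calls node 40, lists node 40 indented beneath node 20.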

As can be seen from the above analysis, in the service node fault location method provided by the application, if a failure of the running performance of the target application is detected, the call chains of all service nodes corresponding to the target application are acquired; the first call data of the parent service node contained in each call chain and the second call data of all the child service nodes contained in each call chain are acquired according to the first identification information of each parent service node and the second identification information of all the child service nodes; and the faulty service node is determined according to the acquired first call data and the acquired second call data. Because the faulty service node is determined from the first call data of the parent service node and the second call data of all the child service nodes contained in each call chain, the efficiency of service node fault location can be improved.

Fig. 3 is a flowchart of an implementation of a service node fault location method according to a third embodiment of the present application. Compared with the embodiment shown in fig. 2, S306 to S308 in fig. 3 are implemented in the same way as S201 to S203; the difference is that S301 to S305 are performed before S306. S301 to S303 may be implemented by software or hardware of the acquisition server shown in fig. 1, and S304 to S305 may be implemented by software or hardware of the call chain server shown in fig. 1. S301 to S305 are detailed as follows:

S301, obtaining call data corresponding to the service nodes of all predetermined applications, wherein the call data comprises identification information of the service nodes, and the service nodes comprise parent service nodes and child service nodes.

In a non-limiting example, the predetermined applications are all applications that have previously been connected to the application performance management system, such as the business-critical applications of an enterprise; the identification information of a service node is unique identification information that can be used to identify that service node.

S302, respectively determining first identification information of each parent service node and second identification information of all child service nodes corresponding to each parent service node, wherein the first identification information of each parent service node and the second identification information of all child service nodes corresponding to that parent service node share a preset number of identical identification characters.

In a non-limiting example, the first identification information of the parent service node consists of an internet protocol address and port information, and the second identification information of each child service node corresponding to the parent service node includes the same internet protocol address and port information plus a preset sequence number corresponding to that child service node. The preset sequence number of each child service node can be determined according to the order in which the child service nodes are called in the service request.

In another non-limiting example, the first identification information of the parent service node is globally unique identification information carried in the service request header, and the second identification information of each child service node corresponding to the parent service node includes the globally unique identification information and a preset sequence number corresponding to that child service node. The preset sequence number of each child service node can be determined according to the order in which the child service nodes are called in the service request.
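Deriving child identifiers from a globally unique identifier and the call order can be sketched as follows. The `-` separator, the 1-based numbering, and the function name are assumptions; the source only requires that the child identifier contain the global identifier and a sequence number reflecting call order.

```python
def assign_child_ids(global_id, children_in_call_order):
    """Map each child service node to '<global_id>-<sequence number>'.

    global_id: identifier taken from the service request header.
    children_in_call_order: child node names, in the order they were called.
    The '-' separator and 1-based numbering are hypothetical choices.
    """
    return {
        name: f"{global_id}-{seq}"
        for seq, name in enumerate(children_in_call_order, start=1)
    }
```

For a request whose header carries `req-123` and which calls `svc20` and then `svc30`, the result maps `svc20` to `req-123-1` and `svc30` to `req-123-2`.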

S303, storing the first identification information of each parent service node and the second identification information of all the child service nodes corresponding to the parent service node in a predetermined distributed subscription message system in an associated manner.

The distributed subscription message system is a high-throughput distributed publish-subscribe message system, such as Kafka, which can process all action stream data of a consumer-scale website, can unify online and offline message processing through Hadoop's parallel loading mechanism, and can also provide real-time consumption through a cluster.

In this embodiment, in order to ensure the performance of the acquisition server and the call chain server, the distributed subscription message system runs on other servers that are communicatively connected to the acquisition server and the call chain server; it may run on one or more server clusters.

S304, obtaining the first identification information of each parent service node and the second identification information of all the child service nodes corresponding to each parent service node from the distributed subscription message system.

Each message published to the Kafka server cluster has a message category, called a Topic. In this embodiment, a mapping relationship exists between the first identification information of each parent service node and a message category; according to this mapping relationship, the first identification information of the parent service node corresponding to each message category and the second identification information of all the child service nodes corresponding to that parent service node are obtained.
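The retrieval by message category can be sketched with an in-memory stand-in for the message system. Note that `InMemoryBroker` is a hypothetical class written for this illustration and is not the Kafka client API; the message layout (`parent` and `child` keys) is likewise assumed.

```python
from collections import defaultdict


class InMemoryBroker:
    """Toy publish-subscribe store keyed by topic. NOT the Kafka client API."""

    def __init__(self):
        self._topics = defaultdict(list)

    def publish(self, topic, message):
        self._topics[topic].append(message)

    def read_all(self, topic):
        return list(self._topics[topic])


def fetch_ids_by_topic(broker, topic_of_parent):
    """Use the parent-ID -> topic mapping to collect each parent's child IDs.

    Messages are assumed to be dicts with 'parent' and 'child' keys.
    """
    result = {}
    for parent, topic in topic_of_parent.items():
        result[parent] = [
            m["child"] for m in broker.read_all(topic) if m["parent"] == parent
        ]
    return result
```

In a real deployment the reads would go through a message system consumer subscribed to the relevant topics; the grouping logic would be unchanged.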

S305, generating a call chain of each parent service node and all of its corresponding child service nodes according to the first identification information and the second identification information.

In one non-limiting example, the call chain includes call chain identification information (TraceID) and the call relationships between service nodes; in another implementation, the call chain further includes event information, which includes the time information and attribute information of an event occurrence.
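Generating a call chain from the two kinds of identification information can be sketched as grouping by the shared prefix. The identifier layout (`<parent>#<sequence number>`), the dictionary shape of the result, and the way `trace_id` is supplied are all assumptions made for illustration.

```python
def build_call_chain(trace_id, parent_id, all_ids, sep="#"):
    """Collect the children of `parent_id` from `all_ids` and order them by call sequence.

    A child is any identifier of the form '<parent_id><sep><integer>';
    the separator and layout are hypothetical.
    """
    prefix = parent_id + sep
    children = [i for i in all_ids if i.startswith(prefix)]
    children.sort(key=lambda i: int(i[len(prefix):]))  # order by sequence number
    return {"trace_id": trace_id, "parent": parent_id, "children": children}
```

Sorting on the numeric suffix rather than on the raw string keeps child 10 after child 9, which a plain string sort would not.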

As can be seen from the above analysis, the acquisition server stores the first identification information of each parent service node and the second identification information of all the child service nodes corresponding to each parent service node in a distributed subscription message system, and the call chain server generates a call chain according to the first identification information and the second identification information. The call chain identification information, the call relationships among the service nodes, and the event information corresponding to each service node that are included in the call chain can be used to locate the fault of a service node; because fault location is performed on the basis of the call chain, fault location efficiency can be improved.

As shown in fig. 4, the call chain generation method provided in the fourth embodiment of the present application is applied to an acquisition server and can be implemented by software or hardware of the acquisition server. As can be seen from fig. 4, the call chain generation method provided in this embodiment includes steps S401 to S403, which are implemented as follows:

S401, obtaining call data corresponding to the service nodes of all predetermined applications, wherein the call data includes identification information of the service nodes, and the service nodes include parent service nodes and child service nodes.

S402, respectively determining first identification information of each parent service node and second identification information of all child service nodes corresponding to each parent service node, wherein the first identification information of each parent service node and the second identification information of all child service nodes corresponding to that parent service node share a preset number of identical identification characters.

S403, storing the first identification information of each parent service node and the second identification information of all the child service nodes corresponding to each parent service node into a predetermined distributed subscription message system.
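The association in S402 can be illustrated with a minimal Python sketch. It assumes that the shared identification characters are a leading prefix of fixed length; the text only states that a preset number of identification characters are the same. The names `PREFIX_LEN` and `group_children_by_parent` and the sample identifiers are hypothetical.

```python
from collections import defaultdict

PREFIX_LEN = 8  # hypothetical "preset number" of shared identification characters


def group_children_by_parent(parent_ids, child_ids, prefix_len=PREFIX_LEN):
    """Map each parent's first identification information to the second
    identification information of the children sharing its leading characters."""
    # Assumption: the shared characters are the first prefix_len characters.
    by_prefix = {pid[:prefix_len]: pid for pid in parent_ids}
    groups = defaultdict(list)
    for cid in child_ids:
        parent = by_prefix.get(cid[:prefix_len])
        if parent is not None:  # children with no matching parent are skipped
            groups[parent].append(cid)
    # Parents without children still get an (empty) entry.
    return {pid: groups.get(pid, []) for pid in parent_ids}


parents = ["a1b2c3d4-0", "ffee0011-0"]
children = ["a1b2c3d4-1", "a1b2c3d4-2", "ffee0011-1", "deadbeef-1"]
grouped = group_children_by_parent(parents, children)
```

In this sketch the child "deadbeef-1" is left out because no parent shares its prefix; how such unmatched identifiers are handled is not specified in the text.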

Specifically, as shown in fig. 5, it is a flowchart of the specific implementation of S403 in fig. 4. As can be seen from fig. 5, S403 includes:

S4031, traversing a mapping relationship stored in a predetermined database based on the first identification information of each parent service node, and querying a message category matched with the first identification information of each parent service node.

The message type is a message type of a message stored in a message queue of the predetermined distributed subscription message system, and the mapping relationship is a mapping relationship between the first identification information of the parent service node and the message type.

S4032, if the message category matched with the first identification information of the target parent service node is found, storing the first identification information of the target parent service node and the second identification information of all the child service nodes corresponding to the target parent service node as messages of the matched message category; the target parent service node is any one of the parent service nodes.

S4033, if the message category matching the first identification information of the target parent service node is not found, generating the message category corresponding to the first identification information of the target parent service node according to a preset message category generation method.

In an optional implementation manner, the preset message category generating method includes:

traversing node information of all the parent service nodes stored in a predetermined distributed coordination service system to obtain node information of the target parent service node, wherein the node information comprises Internet protocol address information and port information of a calling node;

and generating the message category corresponding to the first identification information of the target parent service node according to the Internet protocol address information and the port information of the calling node.

It should be noted that the distributed coordination service system is a message relay system, for example ZooKeeper, and may be used as a service header in the distributed system. Data storage in the distributed coordination service system is also node-based, but its nodes differ from the nodes of a tree: they are referenced by path, similar to a file path. In this embodiment, each node of the distributed coordination service system stores node information of one parent service node.

S4034, storing the first identification information of the target parent service node and the second identification information of all the child service nodes corresponding to the target parent service node as messages of the generated message category.
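Steps S4031 to S4034 can be sketched in Python as a look-up-or-generate routine. This is a simplified illustration under several assumptions: plain dictionaries stand in for the predetermined database, the distributed coordination service system, and the message queue; the naming rule in `generate_topic` is hypothetical, since the text only says the category is generated from the IP address and port information; and recording the newly generated category in the mapping is also an assumption.

```python
def generate_topic(ip, port):
    """Hypothetical naming rule: derive a message category from IP and port."""
    return f"trace-{ip.replace('.', '-')}-{port}"


def resolve_topic(parent_id, mapping, node_info):
    """Return the message category for parent_id, generating one if none is mapped.

    mapping   -- stand-in for the mapping stored in the predetermined database
    node_info -- stand-in for node information held by the coordination service
    """
    topic = mapping.get(parent_id)      # S4031: query the mapping
    if topic is None:                   # S4033: no matching category found
        ip, port = node_info[parent_id]
        topic = generate_topic(ip, port)
        mapping[parent_id] = topic      # assumption: remember the new category
    return topic


def store(parent_id, child_ids, mapping, node_info, queue):
    """S4032/S4034: store the identification information under the category."""
    topic = resolve_topic(parent_id, mapping, node_info)
    queue.setdefault(topic, []).append(
        {"parent": parent_id, "children": list(child_ids)}
    )
    return topic


mapping = {"p-1": "orders"}
node_info = {"p-2": ("10.0.0.7", 8080)}
queue = {}
t1 = store("p-1", ["p-1.c1"], mapping, node_info, queue)
t2 = store("p-2", ["p-2.c1", "p-2.c2"], mapping, node_info, queue)
```

Here "p-1" reuses its mapped category, while "p-2" has none and receives a category derived from its node information.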

In an optional implementation manner, before traversing the node information of all the parent service nodes stored in the predetermined distributed coordination service system to obtain the node information of the target parent service node, the method includes:

sending, to the distributed coordination service system, a first instruction for monitoring the heartbeat information of each parent service node of all the applications;

if heartbeat information of a parent service node, monitored and returned by the distributed coordination service system, is received, sending to the distributed coordination service system a second instruction for acquiring node information of the parent service node;

wherein the second instruction is used for instructing the distributed coordination service system to store the node information of the parent service node.
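The two-instruction exchange above can be sketched with an in-memory stand-in for the distributed coordination service system. The class `CoordinationService`, its methods, the path layout `/services/<id>`, and the idea that a heartbeat carries the IP address and port are all assumptions made for illustration; they are not the API of ZooKeeper or of any real client library.

```python
class CoordinationService:
    """In-memory stand-in for the distributed coordination service (hypothetical)."""

    def __init__(self):
        self.watched = set()
        self.nodes = {}  # path-like key -> node information

    def watch_heartbeats(self, parent_ids):
        # First instruction: start monitoring these parent service nodes.
        self.watched.update(parent_ids)

    def store_node_info(self, parent_id, ip, port):
        # Second instruction: keep the node information under a path-like key.
        self.nodes[f"/services/{parent_id}"] = {"ip": ip, "port": port}


def on_heartbeat(service, parent_id, ip, port):
    """Handle a heartbeat reported for a parent node; store info if it is watched."""
    if parent_id in service.watched:
        service.store_node_info(parent_id, ip, port)
        return True
    return False


svc = CoordinationService()
svc.watch_heartbeats(["p-1"])
stored = on_heartbeat(svc, "p-1", "10.0.0.7", 8080)
ignored = on_heartbeat(svc, "p-9", "10.0.0.9", 9090)
```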

Fig. 6 is a flowchart illustrating an implementation of a call chain generation method according to a fifth embodiment of the present application. The method is applied to a call chain server and can be implemented by software or hardware of the call chain server. The method includes S601 to S602, which are implemented as follows:

S601, acquiring first identification information of each parent service node and second identification information of all child service nodes corresponding to each parent service node from a distributed subscription message system.

Specifically, fig. 7 is a flowchart of a specific implementation of S601 in fig. 6. As can be seen from fig. 7, S601 includes:

S6011, obtaining the calling progress of each message type of the distributed subscription message system from a predetermined distributed coordination service system, where the calling progress of each message type of the distributed subscription message system is cached by the distributed coordination service system.

S6012, determining, according to the calling progress of each message type, the acquisition sequence of the first identification information of each parent service node and the second identification information of all the child service nodes corresponding to each parent service node.

S6013, sequentially obtaining, from the distributed subscription message system according to the acquisition sequence, the first identification information of each parent service node and the second identification information of all child service nodes corresponding to each parent service node.
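Steps S6011 to S6013 can be sketched as follows. The text does not state how the acquisition sequence is derived from the calling progress, so the rule used here (message types with the largest unread backlog first) is purely an assumption. The calling progress is modeled as the number of messages already read per message type, and a dictionary of lists stands in for the distributed subscription message system; all names are hypothetical.

```python
def acquisition_order(progress, latest):
    """Order message types by backlog (latest count minus progress), largest first.

    The ordering rule is an assumption made for illustration only.
    """
    backlog = {t: latest[t] - progress.get(t, 0) for t in latest}
    return sorted((t for t in backlog if backlog[t] > 0),
                  key=lambda t: (-backlog[t], t))


def fetch_in_order(queue, progress):
    """Read unread messages per message type in the computed order and
    advance the recorded progress."""
    latest = {t: len(msgs) for t, msgs in queue.items()}
    fetched = []
    for topic in acquisition_order(progress, latest):
        start = progress.get(topic, 0)
        fetched.extend(queue[topic][start:])
        progress[topic] = latest[topic]
    return fetched


queue = {"orders": ["m1", "m2", "m3"], "payments": ["m4"]}
progress = {"orders": 1}  # stand-in for progress cached by the coordination service
messages = fetch_in_order(queue, progress)
```

With these inputs, "orders" has two unread messages and "payments" has one, so "orders" is read first.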

S602, generating a call chain of each parent service node and all the corresponding child service nodes according to the first identification information and the second identification information.

In an alternative implementation, the call chain includes call chain identification information (TraceID) and a call relationship between service nodes; in another implementable manner, the call chain further includes event information including time information and attribute information of an event occurrence.
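The contents of a call chain described above (call chain identification information, call relationships, and optional event information) can be pictured with a small Python sketch. The class and field names are hypothetical, the relationship is modeled as simple parent-to-child pairs, and generating the TraceID with `uuid` is an assumption; the text does not specify how the TraceID is produced.

```python
from dataclasses import dataclass, field
import uuid


@dataclass
class Event:
    timestamp: float   # time information of the event occurrence
    attributes: dict   # attribute information of the event


@dataclass
class CallChain:
    trace_id: str                                # call chain identification (TraceID)
    relations: list                              # (caller_id, callee_id) pairs
    events: dict = field(default_factory=dict)   # node id -> list of Event


def build_call_chain(parent_id, child_ids, trace_id=None):
    """Build a call chain linking one parent service node to its child nodes."""
    trace_id = trace_id or uuid.uuid4().hex      # assumption: random TraceID
    relations = [(parent_id, cid) for cid in child_ids]
    return CallChain(trace_id=trace_id, relations=relations)


chain = build_call_chain("p-1", ["p-1.c1", "p-1.c2"], trace_id="t-001")
chain.events["p-1.c1"] = [
    Event(timestamp=1572220800.0, attributes={"status": "timeout"})
]
```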

As the above analysis shows, the call chain server acquires the first identification information of each parent service node and the second identification information of all child service nodes corresponding to each parent service node from the distributed subscription message system, and generates a call chain of each parent service node and all the corresponding child service nodes according to the first identification information and the second identification information. The call chain can be used to locate the fault of a service node, and locating service node faults based on the call chain can improve fault location efficiency.

Fig. 8 is a schematic device diagram of an application performance management system provided in the present invention. As shown in fig. 8, the application performance management system 8 of this embodiment includes an acquisition server 81 and a call chain server 82.

Fig. 9 is a schematic diagram of an apparatus of the acquisition server 81 in fig. 8. As can be seen from fig. 9, the acquisition server 81 includes: a detection module 810, an acquisition module 811, and a determination module 812;

the detection module 810 is configured to, if it is detected that the running performance of the target application fails, obtain call chains of all service nodes corresponding to the target application; wherein the service nodes comprise parent service nodes and child service nodes; the call chain is generated by the call chain server in advance according to the first identification information of each parent service node and the second identification information of all the child service nodes, and is the call chain of each parent service node and all the child service nodes corresponding to that parent service node;

an obtaining module 811, configured to obtain, according to the first identification information of each parent service node and the second identification information of all child service nodes, first call data of the parent service node included in each call chain and second call data of all child service nodes included in each call chain;

a determining module 812, configured to determine a failed service node according to the obtained first call data and the second call data;
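How the determining module might use the two sets of call data can be sketched in Python. This passage does not state the decision rule, so everything below is an assumption: each call data record is taken to carry a node identifier, an error flag, and a call duration, a node is flagged when it reports an error or exceeds a duration threshold, and flagged child nodes are reported in preference to the parent. The function name, record fields, and threshold are hypothetical.

```python
def find_failed_nodes(first_call_data, second_call_data, max_ms=1000):
    """Return the ids of nodes whose call data show an error or an over-long call.

    Assumption: if any child is flagged, only the children are reported, on the
    premise that the parent's symptoms stem from them.
    """
    def bad(record):
        return record["error"] or record["duration_ms"] > max_ms

    failed_children = [r["node_id"] for r in second_call_data if bad(r)]
    if failed_children:
        return failed_children
    return [first_call_data["node_id"]] if bad(first_call_data) else []


parent = {"node_id": "p-1", "error": False, "duration_ms": 1500}
children = [
    {"node_id": "p-1.c1", "error": False, "duration_ms": 40},
    {"node_id": "p-1.c2", "error": True, "duration_ms": 1400},
]
suspects = find_failed_nodes(parent, children)
```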

In an optional implementation manner, the acquisition server 81 further includes:

a call data acquisition module, configured to acquire call data corresponding to the service nodes of all predetermined applications, wherein the call data includes identification information of the service nodes, and the service nodes include parent service nodes and child service nodes;

an identification information determining module, configured to determine first identification information of each parent service node and second identification information of all child service nodes corresponding to the parent service node, where the first identification information of each parent service node and the second identification information of all child service nodes corresponding to the parent service node have the same identification characters in a preset number;

the storage module is used for storing the first identification information of each parent service node and the second identification information of all the child service nodes corresponding to the parent service nodes into a predetermined distributed subscription message system in an associated manner;

In an alternative implementation, the storage module includes:

the traversal unit is used for traversing the mapping relation stored in a predetermined database based on the first identification information of each father service node, and inquiring the message category matched with the first identification information of each father service node;

the message type is the message type of the message stored in the message queue of the predetermined distributed subscription message system, and the mapping relationship is the mapping relationship between the first identification information of the parent service node and the message type;

a first storage unit, configured to store, if a message category matching the first identification information of a target parent service node is found, the first identification information of the target parent service node and the second identification information of all the child service nodes corresponding to the target parent service node as the matching message category; the target parent service node is any one of the parent service nodes;

a generating unit, configured to generate, if a message category matching the first identification information of the target parent service node is not found, the message category corresponding to the first identification information of the target parent service node according to a preset message category generating method;

a second storage unit, configured to store the first identification information of the target parent service node and the second identification information of all the child service nodes corresponding to the target parent service node as the generated message type.

In an optional implementation manner, the storage module further includes:

a first sending unit, configured to send a first instruction for monitoring heartbeat information of each parent service node of all the applications to the distributed coordination service system;

a second sending unit, configured to send, to the distributed coordination service system, a second instruction for acquiring the node information of a parent service node if heartbeat information of the parent service node, monitored and returned by the distributed coordination service system, is received;

the second instruction is used for instructing the distributed coordination service system to store the node information of the parent service node.

As shown in fig. 10, it is a schematic diagram of the apparatus of the call chain server 82 in fig. 8, and as can be seen from fig. 10, the call chain server 82 includes:

a generating module 820, configured to generate, in advance, a call chain between each parent service node and each corresponding child service node according to the first identification information of the parent service node and the second identification information of all the child service nodes.

In an alternative implementation, the call chain server 82 further includes:

an identification information obtaining module, configured to obtain, from the distributed subscription message system, the first identification information of each parent service node and the second identification information of all the corresponding child service nodes;

the first identification information of each parent service node and the second identification information of all the child service nodes corresponding to each parent service node are determined by an acquisition server according to acquired call data corresponding to all the service nodes of a target application, the acquisition server stores the determined first identification information and the determined second identification information into the distributed subscription message system, the call data comprises the identification information of the service nodes, and the service nodes comprise the parent service nodes and the child service nodes.

In an optional implementation manner, the identification information obtaining module includes:

a first obtaining unit, configured to obtain the calling progress of each message type of the distributed subscription message system from a predetermined distributed coordination service system, wherein the calling progress of each message type of the distributed subscription message system is cached in the distributed coordination service system;

a calling unit, configured to determine the acquisition sequence of the first identification information of each parent service node and the second identification information of all the corresponding child service nodes according to the calling progress of each message type;

a second obtaining unit, configured to sequentially obtain the first identification information of each parent service node and the second identification information of all the child service nodes corresponding to each parent service node from the distributed subscription message system according to the acquisition sequence.

In an optional implementation manner, the call chain includes call chain identification information and a call relation between service nodes;

a generating module 820, configured to generate, according to the first identification information and the second identification information, the call chain identification information of each parent service node and all the corresponding child service nodes and the call relationship between service nodes.

Fig. 11 is a schematic diagram of an acquisition server provided by the present invention. As shown in fig. 11, the acquisition server 11 of this embodiment includes: a processor 110, a memory 111, and a computer program 112, such as a call chain generation program, stored in the memory 111 and operable on the processor 110. The processor 110, when executing the computer program 112, implements the steps in the above-described embodiments of the call chain generation method, such as steps S401 to S403 shown in fig. 4. Alternatively, the processor 110, when executing the computer program 112, implements the functions of the modules/units in the above-described acquisition server embodiment, for example, the functions of the modules 810 to 812 shown in fig. 9.

Illustratively, the computer program 112 may be partitioned into one or more modules/units that are stored in the memory 111 and executed by the processor 110 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program 112 in the acquisition server 11. For example, the computer program 112 may be divided into a detection module, an acquisition module, and a determination module (module in a virtual device), and each module specifically functions as follows:

and the determining module is used for determining the service node with the fault according to the acquired first calling data and the acquired second calling data.

Fig. 12 is a schematic diagram of a call chain server provided by the present invention. As shown in fig. 12, the call chain server 12 of this embodiment includes: a processor 120, a memory 121, and a computer program 122, such as a call chain generator, stored in the memory 121 and operable on the processor 120. The processor 120, when executing the computer program 122, implements the steps in the various call chain generation method embodiments described above, such as steps 601 to 602 shown in fig. 6. Alternatively, the processor 120, when executing the computer program 122, implements the functions of each module/unit in the call chain server embodiment described above, such as the functions of the module 820 shown in fig. 10.

Illustratively, the computer program 122 may be partitioned into one or more modules/units that are stored in the memory 121 and executed by the processor 120 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program 122 in the call chain server 12. For example, the computer program 122 may be divided into a generating module (a module in a virtual device), whose specific function is as follows:

and the generating module is used for generating a calling chain of each parent service node and all the corresponding child service nodes in advance according to the first identification information of the parent service node and the second identification information of all the child service nodes.

It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of communication units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunications signals.

The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims

1. A service node fault positioning method is applied to an application performance management system, and is characterized in that the application performance management system comprises an acquisition server and a call chain server, and the method comprises the following steps:

2. The method for locating a service node fault according to claim 1, wherein, if the acquisition server detects that the running performance of the target application is faulty, before acquiring the call chains of all service nodes corresponding to the target application, the method further includes:

the acquisition server acquires call data corresponding to the service nodes of all predetermined applications, wherein the call data comprises identification information of the service nodes, and the service nodes comprise parent service nodes and child service nodes;

the acquisition server respectively determines first identification information of each parent service node and second identification information of all child service nodes corresponding to each parent service node, wherein the first identification information of each parent service node and the second identification information of all child service nodes corresponding to that parent service node share a preset number of identical identification characters;

the acquisition server stores the first identification information of each father service node and the second identification information of all the child service nodes corresponding to each father service node in a predetermined distributed subscription message system in an associated manner;

the call chain server acquires the first identification information of each father service node and the second identification information of all the corresponding child service nodes from the distributed subscription message system;

and the calling chain server generates calling chains of each father service node and all the corresponding child service nodes according to the first identification information and the second identification information.

3. A call chain generation method is applied to an acquisition server, and is characterized by comprising the following steps:

4. The call chain generation method according to claim 3, wherein the storing the first identification information of each of the parent service nodes and the second identification information of all the child service nodes corresponding to each of the parent service nodes in a predetermined distributed subscription message system comprises:

based on the first identification information of each father service node, traversing a mapping relation stored in a predetermined database, and inquiring the message category matched with the first identification information of each father service node;

the message type is stored in a message queue of the predetermined distributed subscription message system, and the mapping relationship is a mapping relationship between the first identification information of the parent service node and the message type;

if the message category matched with the first identification information of the target father service node is found, storing the first identification information of the target father service node and the second identification information of all the child service nodes corresponding to the target father service node as the matched message category; the target parent service node is any one of the parent service nodes;

if the message category matched with the first identification information of the target father service node is not found, generating the message category corresponding to the first identification information of the target father service node according to a preset message category generation method;

and storing the first identification information of the target parent service node and the second identification information of all the child service nodes corresponding to the target parent service node as the generated message category.

5. The call chain generation method according to claim 4, wherein the preset message category generation method includes:

6. The call chain generation method according to claim 5, wherein before the traversing node information of all the parent service nodes stored in the predetermined distributed coordination service system to obtain the node information of the target parent service node, the method comprises:

sending a first instruction for monitoring the heartbeat information of each father service node of all the applications to the distributed coordination service system;

7. A call chain generation method is applied to a call chain server and is characterized by comprising the following steps:

8. The call chain generation method according to claim 7, wherein the obtaining the first identification information of each parent service node and the second identification information of all corresponding child service nodes from the distributed subscription message system includes:

acquiring the calling progress of each message type of the distributed subscription message system from a predetermined distributed coordination service system, wherein the calling progress of each message type of the distributed subscription message system is cached in the distributed coordination service system;

determining the acquisition sequence of the first identification information of each father service node and the second identification information of all the corresponding child service nodes according to the calling progress of each message type;

and sequentially acquiring the first identification information of each father service node and the second identification information of all child service nodes corresponding to each father service node from the distributed subscription message system according to the acquisition sequence.

9. The call chain generation method according to claim 7, wherein the call chain includes call chain identification information and a call relationship between service nodes;

generating a call chain of each parent service node and all the corresponding child service nodes according to the first identification information and the second identification information, including:

and generating calling chain identification information of each father service node and all the corresponding child service nodes and calling relations among the service nodes according to the first identification information and the second identification information.

10. A server comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the call chain generation method according to any one of claims 3 to 6 or the steps of the call chain generation method according to any one of claims 7 to 9 when executing the computer program.