CN114500248A - Monitoring and alarming method and system for service in Internet software system

Info

Publication number: CN114500248A
Application number: CN202210335792.2A
Authority: CN
Inventors: 孙宝岳; 闵刚; 姚占龙
Original assignee: Beijing Ruirong Tianxia Technology Co ltd
Current assignee: Beijing Ruirong Tianxia Technology Co ltd
Priority date: 2022-04-01
Filing date: 2022-04-01
Publication date: 2022-05-13
Anticipated expiration: 2042-04-01
Also published as: CN114500248B

Abstract

The invention provides a method and a system for monitoring and alarming on services in an Internet software system, relating to the technical field of Internet software. The method comprises: constructing a service tree according to a service flow; determining the information that each node in the service tree must contain; triggering the relevant nodes to report index information during service execution, the index information including execution state information; triggering an alarm if the execution state information is failure; and scanning the index information to find nodes that have not reported execution state information and nodes missing from the service flow, and triggering an alarm. The invention aggregates monitoring index information around the service tree and accurately obtains the context information of service execution, which greatly increases the observability of the service. When a problem occurs during service execution, a timely alarm allows it to be quickly investigated and located, which improves troubleshooting efficiency and speeds up resolution.

Description

Monitoring and alarming method and system for service in Internet software system

Technical Field

The invention relates to the technical field of internet software, in particular to a method and a system for monitoring and alarming services in an internet software system.

Background

Many enterprises provide their products as services deployed on the Internet, which places higher demands on the reliability of software services. Ideally, a service should run without interruption around the clock, all year round. In practice, however, software defects, excessive service load, server faults and other unavoidable factors are an objective reality, so a software product must have a mechanism for quickly detecting faults, locating problems and resolving them in order to restore service rapidly. A fully functional monitoring and alarm system is therefore the key to solving these problems.

At present, there are two monitoring and alarm mechanisms for business application services. One is link monitoring and alarming, which records the complete call flow from the application gateway through each service to the response returned to the user, and raises alarms on the error information collected. The other is code instrumentation (embedded-point) reporting, in which key node information and error information of the service flow are reported manually from within the service code, so that the running state of the system can be sensed and monitoring and alarming achieved.

Link monitoring and alarming can detect that the latency of a link is too high, or that a link is interrupted, while a service call is being passed along, but it cannot sense abnormal information inside the service. Code instrumentation reporting can sense the running state of the system, but each instrumentation point is relatively independent: the state information of each point across service processing is collected into a horizontal report, with nothing to connect the points along a single service line. As a result, when an anomaly occurs, the context state data at the scene of the anomaly cannot be obtained promptly, and an accurate alarm cannot be given.

Disclosure of Invention

In view of the above problems, the invention provides a method and a system for monitoring and alarming on services in an Internet software system, which give timely notification of problems during service execution and accurately obtain the context information of service execution, so that the cause of a problem can be quickly located and the problem solved, ensuring the reliability of software services.

In order to achieve the above object, the present invention provides a method for monitoring and alarming services in an internet software system, comprising:

splitting out the key nodes of a service flow to construct a service tree;

determining information required to be contained by each node in the service tree, including: exception information, execution state information;

triggering each node in the service tree to report index information based on the contained information in the service execution process;

if the execution state information in the index information is failure, triggering an alarm;

if any node has errors, interrupting service execution, reporting abnormal information and execution failure information, and triggering an alarm;

and scanning the reported index information, acquiring nodes which do not report execution state information and nodes which are missed in the service flow, and triggering an alarm.

As a further improvement of the present invention, splitting out the key nodes of the service flow to construct the service tree comprises:

determining a correct execution sequence of each node;

determining whether each of said nodes has to execute;

determining the maximum waiting time from the completion of the current node execution to the next node execution;

determining an execution mode of each node;

and configuring the alarm strategy of each node.

As a further improvement of the invention, the execution modes of each node comprise synchronous execution and asynchronous execution;

and the alarm strategy of each node comprises whether to alarm, error information needing to be alarmed and an alarm mode.

As a further improvement of the present invention, the information that each node in the service tree needs to contain includes:

key input parameters and key output parameters of the node, change snapshots of key data, exception information, execution state information, an execution timestamp, a service line ID, the unique ID of the current node, the ID of the previous node, a key service ID, and whether the node must be executed.

As a further improvement of the invention, the related nodes are triggered to report the index information in the process of executing the service, and the process comprises the following steps:

firstly, reporting execution starting information, including key input parameters, an execution timestamp, a service line ID, a current node ID and a previous node ID;

then reporting execution process information including change snapshots of key data;

and finally reporting execution result information, including an execution timestamp and execution state information.

As a further improvement of the invention, while a node is reporting index information, after an error at the node triggers an alarm, the person responsible for the service locates the alarming node, queries the index information reported by that node, obtains the exception information and fixes the problem.

As a further improvement of the invention, a missing node alarm rule is preset;

and acquiring the missing nodes in the business process through scanning, and triggering an alarm according to a preset missing node alarm rule.

As a further improvement of the present invention, when a node that does not report execution state information triggers an alarm, the problem troubleshooting process is as follows:

searching a previous executing node according to the previous node ID contained in the node, and so on to obtain all the nodes of the service;

and troubleshooting the problems according to the key input parameters, the key output parameters and the abnormal information reported by all the nodes.

As a further improvement of the present invention, when scanning a missing node in a business process to trigger an alarm, the problem troubleshooting process includes:

querying all nodes in the service tree to which the missing node belongs, and reconstructing the scene of the anomaly.

The invention also provides a monitoring and warning system for the service in the internet software system, which comprises: a service index reporting program and a service monitoring engine server;

the service index reporting program is configured to:

being integrated into the service system and triggering the reporting of index data during service execution;

the service monitoring engine server is used for:

receiving and storing index information reported by a client;

triggering an alarm for any report whose execution state is failure;

and starting multiple threads to scan the index data, acquiring nodes which have not reported execution state information and nodes missing from the service flow, and triggering an alarm.

Compared with the prior art, the invention has the beneficial effects that:

the invention constructs a service tree for each service and aggregates monitoring index information around that service tree, which greatly increases the observability of the service compared with existing monitoring and alarm methods, whose monitoring systems are designed from the perspective of system operation; when a problem occurs during service execution, an alarm is raised promptly and the context information of the service execution, namely the index information of the relevant nodes, can be obtained accurately, so that the root cause of the problem is quickly investigated and located, troubleshooting efficiency is improved, and the problem can be solved quickly.

The invention integrates the service index reporting program into the service system and triggers the reporting of index information according to the service flow at run time, so that the data available for troubleshooting is comprehensive; compared with traditional monitoring approaches that rely heavily on system operation logs, troubleshooting is easier and saves time and effort.

Drawings

Fig. 1 is a flowchart of a monitoring and warning method for services in an internet software system according to an embodiment of the present invention;

fig. 2 is a schematic diagram of a monitoring and warning system for services in an internet software system according to an embodiment of the present invention;

fig. 3 is a schematic diagram of a service tree constructed after splitting a service flow according to an embodiment of the present invention;

fig. 4 is a schematic diagram of a service tree constructed for the merchant network access (onboarding) process of a platform according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.

The invention is described in further detail below with reference to the attached drawing figures:

as shown in fig. 1, a monitoring and warning method for services in an internet software system provided by the present invention includes:

s1, constructing a service tree according to the service flow;

Wherein,

the service tree is constructed according to the actual services of the Internet software system; it is a tree structure composed of a plurality of core service nodes with a strict execution order, and it describes one specific service.

Constructing the service tree comprises:

splitting out the key nodes of the service flow;

determining the correct execution sequence of each node;

determining whether each node has to execute;

determining the maximum waiting time from the completion of the execution of the current node to the execution of the next node;

determining execution modes of each node, including synchronous execution and asynchronous execution;

and configuring the alarm strategy of each node, wherein the alarm strategy comprises whether to alarm, error information needing to be alarmed and an alarm mode.

As shown in fig. 3, splitting a service flow yields 5 key nodes, which form a service tree. A node pointed to by a solid line must be executed, and a node pointed to by a dotted line is optional (it may or may not be executed). The relationships among nodes 2, 3, 4 and 5 are as follows: after node 2 finishes executing, node 3 must execute and node 4 may or may not execute; if node 4 executes, node 5 must execute.
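The node attributes listed above (required or optional, maximum waiting time, execution mode, alarm strategy) can be sketched as a small tree data structure. The following Python is a minimal illustration only: the class names, field names, default values and the 60-second waiting time are assumptions made for this example, and only the relationships among nodes 2 to 5 stated for fig. 3 are modelled.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ExecMode(Enum):
    SYNC = "sync"    # synchronous execution
    ASYNC = "async"  # asynchronous execution


@dataclass
class AlarmPolicy:
    # Hypothetical shape: whether to alarm, which errors to alarm on, and how.
    enabled: bool = True
    error_keywords: List[str] = field(default_factory=list)
    channel: str = "email"


@dataclass
class TreeNode:
    name: str
    required: bool            # solid line (True) or dotted line (False)
    max_wait_seconds: float   # max wait after the previous node completes
    mode: ExecMode = ExecMode.SYNC
    alarm: AlarmPolicy = field(default_factory=AlarmPolicy)
    children: List["TreeNode"] = field(default_factory=list)

    def add(self, child: "TreeNode") -> "TreeNode":
        """Attach a successor node and return it so calls can be chained."""
        self.children.append(child)
        return child


def required_successors(node: TreeNode) -> List[str]:
    """Names of the successors that must execute once `node` has executed."""
    return [child.name for child in node.children if child.required]


# Relationships among nodes 2, 3, 4 and 5 as described for fig. 3.
node2 = TreeNode("node2", required=True, max_wait_seconds=60)
node3 = node2.add(TreeNode("node3", required=True, max_wait_seconds=60))
node4 = node2.add(TreeNode("node4", required=False, max_wait_seconds=60))
node5 = node4.add(TreeNode("node5", required=True, max_wait_seconds=60))
```

With this structure, `required_successors(node2)` yields only node 3, while node 5 becomes mandatory only on the branch where the optional node 4 has executed.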

S2, determining information required to be contained by each node in the service tree;

The information includes:

(1) key input parameters and key output parameters of the node (reported by the client);

(2) change snapshots of key data: the values before and after the change (reported by the client);

(3) exception information: the error details reported when an error occurs (reported by the client);

(4) execution state information: success or failure (reported by the client);

(5) execution timestamp (reported by the client);

(6) service line ID: uniquely identifies one execution of a service; it is usually generated by the first node of the service tree, and every subsequent node stores the value to identify the service execution it belongs to (reported by the client);

(7) the unique ID of the current node and the ID of the previous node (reported by the client);

(8) key service ID, used for service association (reported by the client);

(9) whether the node must be executed (set when the service node rules are defined).
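The client-reported items (1) to (8) can be pictured as one record type. The sketch below is illustrative only: the field names follow the keys used in the sample reports of the embodiment (nodeId, parentNodeId, input, status, time, busId, reqId, error), while `output` and `snapshot` are hypothetical names chosen here for the key output parameters and the change snapshot. Item (9) is omitted because it is set in the node rules rather than reported.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class IndexRecord:
    reqId: str                                 # (6) service line ID
    nodeId: str                                # (7) unique ID of the current node
    parentNodeId: Optional[str] = None         # (7) ID of the previous node
    busId: Optional[str] = None                # (8) key service ID
    input: Optional[Dict[str, Any]] = None     # (1) key input parameters
    output: Optional[Dict[str, Any]] = None    # (1) key output parameters (assumed name)
    snapshot: Optional[Dict[str, Any]] = None  # (2) before/after values (assumed name)
    error: Optional[str] = None                # (3) exception information
    status: Optional[str] = None               # (4) "success" or "fail"
    time: Optional[int] = None                 # (5) execution timestamp in ms

    def to_json(self) -> str:
        """Serialize the record, leaving out fields that were not set."""
        present = {k: v for k, v in asdict(self).items() if v is not None}
        return json.dumps(present, sort_keys=True)
```

Leaving unset fields out of the serialized form keeps a start report, a process report and a result report small, since each of them carries only part of the information.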

S3, triggering the relevant nodes to report the index information including the execution state information based on the contained information in the service execution process;

the process of reporting the index information comprises the following steps:

firstly, reporting execution start information, including the key input parameters, an execution timestamp, the service line ID (generated by the first node in the service flow and then stored in the context of the current execution thread), the current node ID (a generated random number) and the previous node ID (obtained from the thread execution context);

then reporting execution process information including change snapshots of key data, wherein the intermediate process can be reported for multiple times according to business requirements;

and finally reporting execution result information, including an execution timestamp and execution state information (success or failure).
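The three reporting phases can be sketched with a context manager that keeps the service line ID and the previous node ID in thread-local storage. This is a minimal sketch under stated assumptions: `monitored_node`, `REPORTS` and the field names are hypothetical, the in-memory list stands in for the real reporting channel, and `uuid4` is used in place of the generated random number.

```python
import threading
import time
import uuid
from contextlib import contextmanager

_ctx = threading.local()  # per-thread execution context
REPORTS = []              # stand-in sink; a real client would send these records


def _now_ms() -> int:
    return int(time.time() * 1000)


@contextmanager
def monitored_node(node: str, key_input: dict):
    # The first node of the flow generates the service line ID; later nodes reuse it.
    req_id = getattr(_ctx, "req_id", None)
    if req_id is None:
        req_id = _ctx.req_id = uuid.uuid4().hex
    node_id = uuid.uuid4().hex
    parent_id = getattr(_ctx, "last_node_id", None)
    _ctx.last_node_id = node_id
    base = {"reqId": req_id, "nodeId": node_id, "node": node}

    # Phase 1: execution start information.
    REPORTS.append({"phase": "start", "parentNodeId": parent_id,
                    "input": key_input, "time": _now_ms(), **base})

    def snapshot(before, after):
        # Phase 2: may be called several times during execution.
        REPORTS.append({"phase": "process",
                        "snapshot": {"before": before, "after": after}, **base})

    try:
        yield snapshot
    except Exception as exc:
        # Phase 3 on error: report the failure, then let the exception
        # propagate so that service execution is interrupted.
        REPORTS.append({"phase": "result", "status": "fail",
                        "error": repr(exc), "time": _now_ms(), **base})
        raise
    else:
        # Phase 3 on success.
        REPORTS.append({"phase": "result", "status": "success",
                        "time": _now_ms(), **base})


with monitored_node("register_user", {"phone": "180****1234"}) as snap:
    snap({"registered": False}, {"registered": True})
with monitored_node("submit_audit", {"phone": "180****1234"}):
    pass
```

Because the context is thread-local, the second node picks up the same service line ID and records the first node as its predecessor without the business code passing either value explicitly.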

Further,

while the nodes are reporting index information, if any node encounters an error, service execution is interrupted and the exception information and execution failure information are reported;

the person responsible for the service then locates the alarming node, queries the index information reported by that node, obtains the exception information and fixes the problem.

S4, if the execution state is failure, triggering an alarm;

Wherein,

the reported index information is stored, and the storage system can be a relational database or other types of storage engines;

and when receiving a node state report with failed execution, triggering an alarm according to an alarm rule defined by the failed node.
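The receive, store and alarm behaviour of step S4 might look like the following sketch. The class `MonitorServer`, the rule format and the `notify` callback are hypothetical; an in-memory list stands in for the relational database or other storage engine, and the default rule applied to unknown nodes is an assumption of this example.

```python
from typing import Callable, Dict, List


class MonitorServer:
    def __init__(self, alarm_rules: Dict[str, dict],
                 notify: Callable[[str, str], None]):
        self.alarm_rules = alarm_rules  # node name -> alarm rule for that node
        self.notify = notify            # e.g. an SMS or email sender
        self.store: List[dict] = []     # stand-in for the storage engine

    def receive(self, record: dict) -> bool:
        """Store a report; return True if it triggered an alarm."""
        self.store.append(record)
        if record.get("status") != "fail":
            return False
        # Assumed default: alarm by email when the node has no explicit rule.
        rule = self.alarm_rules.get(record["node"],
                                    {"enabled": True, "channel": "email"})
        if not rule.get("enabled", True):
            return False
        message = (f"node {record['node']} failed in flow {record['reqId']}: "
                   f"{record.get('error', 'no error detail')}")
        self.notify(rule.get("channel", "email"), message)
        return True
```

Every report is stored regardless of its state, so that the records of the successful nodes remain available as context when a failure is investigated.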

Further,

the alarm may be delivered by notifying the person responsible for the software system via SMS or email.

S5, scanning the index information, obtaining the nodes which do not report the execution state information and the nodes which are missing in the service process, and triggering the alarm.

Wherein,

multiple threads are started for scanning according to the volume of data in the service;

and presetting a missing node alarm rule, and triggering an alarm according to the preset missing node alarm rule when the missing node in the service flow is obtained through scanning.
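One possible shape of the scan in step S5 is sketched below. It assumes hypothetical record fields (`reqId`, `node`, `phase`, `status`, `time` in milliseconds), a hypothetical rule table `REQUIRED_NEXT` that maps a node to its mandatory successors and their maximum waiting times, and a hypothetical timeout for a node that started but never reported a result. Service executions are scanned in parallel by a thread pool.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical rules: node -> [(required successor, max wait in ms)].
REQUIRED_NEXT = {"submit_audit": [("open_account", 5000)]}
STATUS_TIMEOUT_MS = 3000  # assumed limit for a started node to report a result


def scan_flow(req_id, records, now_ms):
    """Check one service execution for unreported states and missing nodes."""
    started = {r["node"]: r["time"] for r in records if r["phase"] == "start"}
    succeeded = {r["node"]: r["time"] for r in records
                 if r["phase"] == "result" and r.get("status") == "success"}
    reported = {r["node"] for r in records if r["phase"] == "result"}
    alarms = []
    for node, t in started.items():
        if node not in reported and now_ms - t > STATUS_TIMEOUT_MS:
            alarms.append((req_id, node, "no_status"))
    for node, t in succeeded.items():
        for successor, max_wait in REQUIRED_NEXT.get(node, []):
            if successor not in started and now_ms - t > max_wait:
                alarms.append((req_id, successor, "missing_node"))
    return alarms


def scan(records, now_ms, workers=4):
    """Group records by service line ID and scan the groups in parallel."""
    by_req = defaultdict(list)
    for r in records:
        by_req[r["reqId"]].append(r)
    flows = sorted(by_req.items())
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda item: scan_flow(item[0], item[1], now_ms), flows)
        return [alarm for alarms in results for alarm in alarms]
```

A missing-node alarm is raised only after the configured maximum waiting time has elapsed since the predecessor succeeded, so that a successor that is merely slow to start is not reported prematurely.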

Further,

when the scan finds a node that has not reported execution state information and an alarm is triggered, the previously executed node is looked up via the previous node ID contained in that node, and so on, until all nodes of the service execution are obtained; the problem is then investigated using the key input parameters, key output parameters and exception information reported by all of those nodes.

When the scan finds a node missing from the service flow and an alarm is triggered, all nodes in the service tree to which the missing node belongs are queried and the scene of the anomaly is reconstructed.
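The backward lookup through previous node IDs amounts to following a chain of parent pointers. The helper below is a hedged sketch: `trace_back` and the shape of the record dictionary are assumptions made for illustration, with `parentNodeId` borrowed from the sample reports of the embodiment.

```python
from typing import Dict, List, Optional


def trace_back(records_by_node_id: Dict[str, dict], node_id: str) -> List[dict]:
    """Follow parentNodeId links from `node_id` back to the first node.

    Returns the records in execution order (first node first). The walk
    stops at a missing record or at a repeated ID, so malformed data
    cannot cause an infinite loop.
    """
    chain: List[dict] = []
    seen = set()
    current: Optional[str] = node_id
    while current is not None and current not in seen:
        seen.add(current)
        record = records_by_node_id.get(current)
        if record is None:
            break
        chain.append(record)
        current = record.get("parentNodeId")
    chain.reverse()
    return chain


# Hypothetical stored records, keyed by node ID.
stored = {
    "n1": {"nodeId": "n1", "parentNodeId": None, "input": {"phone": "180****1234"}},
    "n2": {"nodeId": "n2", "parentNodeId": "n1", "input": {"doc": "licence"}},
    "n3": {"nodeId": "n3", "parentNodeId": "n2", "input": {"account": "new"}},
}
context = trace_back(stored, "n3")
```

The resulting list gives the responsible person the inputs, outputs and exception information of every node on the path that led to the alarming node.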

As shown in fig. 2, the present invention further provides a system for monitoring and alarming services in an internet software system, including: a service index reporting program and a service monitoring engine server;

a service index reporting program, configured to:

being integrated into the service system and triggering the reporting of index data during service execution by means of instrumentation (embedded points) in the service system;

wherein,

when service execution reaches the instrumented logic, the relevant node is triggered to report index information;

the client and the service monitoring engine server communicate over a TCP connection, and data is reported to the service monitoring engine server asynchronously by a separate thread pool, so that reporting does not occupy the execution resources of the service threads.
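Asynchronous reporting through a separate thread pool can be sketched as follows. The names `AsyncReporter` and `tcp_sender` are hypothetical, and newline-delimited JSON is an assumed wire format, since the text specifies only that TCP is used. For brevity the sender opens one connection per report; a production client would more likely keep a connection open.

```python
import json
import socket
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable


class AsyncReporter:
    """Hands reports to a dedicated pool so service threads never block on I/O."""

    def __init__(self, send: Callable[[bytes], None], workers: int = 2):
        self._send = send
        self._pool = ThreadPoolExecutor(max_workers=workers,
                                        thread_name_prefix="index-report")

    def report(self, record: dict) -> Future:
        payload = (json.dumps(record) + "\n").encode("utf-8")
        return self._pool.submit(self._safe_send, payload)

    def _safe_send(self, payload: bytes) -> bool:
        # A reporting failure must not disturb the business flow, so network
        # errors are swallowed here and signalled through the return value.
        try:
            self._send(payload)
            return True
        except OSError:
            return False

    def close(self) -> None:
        self._pool.shutdown(wait=True)


def tcp_sender(host: str, port: int, timeout: float = 2.0) -> Callable[[bytes], None]:
    """Build a sender that writes each payload over a TCP connection."""
    def send(payload: bytes) -> None:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(payload)
    return send
```

A service thread would call `reporter.report(record)` and continue immediately; the returned future is needed only when the caller wants to know whether delivery succeeded.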

The service monitoring engine server is used for:

receiving and storing index information reported by a client;

and starting multiple threads to scan the index data, acquiring nodes which have not reported execution state information and nodes missing from the service flow, and triggering an alarm.

Embodiment:

instrumentation, monitoring and alarming are applied to the merchant network access (onboarding) process of a platform:

firstly, the service flow is broken down into its key nodes: registering a platform user, submitting authentication data for review, and opening a system account after authentication succeeds;

then, the execution order is determined as: registering a platform user, submitting authentication data for review, opening a system account after authentication succeeds. Whether each node must execute is then determined: in the merchant network access process, a merchant is not obliged to authenticate after registering as a platform user, so the node for submitting authentication data for review is optional; once the merchant has submitted authentication data for review, however, the node for opening a system account after successful authentication must execute. The resulting service tree is shown in fig. 4, in which a node pointed to by a solid line must be executed and a node pointed to by a dotted line may or may not be executed;

in the process of user network access operation, reporting index data by each related node in a JSON format, specifically comprising the following steps:

(1) registering a platform user:

{"nodeId": 1, "input": "180****1234", "status": "success", "time": 1642179452540, "busId": "180****1234", "reqId": "i9osdkng"}

(2) submitting authentication data for review:

{"nodeId": 2, "input": "180****1234", "status": "success", "time": 1642179452640, "busId": "180****1234", "reqId": "bh8isngns"}

(3) opening a system account:

{"nodeId": 3, "parentNodeId": 2, "input": "180****1234", "status": "fail", "time": 1642179452540, "busId": "180****1234", "reqId": "bh8isngns", "error": "null pointer exception"}

Wherein,

nodeId represents the current node ID; parentNodeId represents the ID of the previously executed node; input represents the input parameter; status represents the execution state; time represents the execution timestamp; busId represents the key service ID; reqId represents the ID of the current service flow; error represents the exception information.
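As a worked illustration, the three sample reports can be parsed and filtered for failures. In this sketch the reports are written as strict JSON (quoted keys and string values) with the key service field spelled `busId` as in the field descriptions above; the helper names `failed_reports` and `alarm_text` are hypothetical.

```python
import json

RAW_REPORTS = [
    '{"nodeId": 1, "input": "180****1234", "status": "success", '
    '"time": 1642179452540, "busId": "180****1234", "reqId": "i9osdkng"}',
    '{"nodeId": 2, "input": "180****1234", "status": "success", '
    '"time": 1642179452640, "busId": "180****1234", "reqId": "bh8isngns"}',
    '{"nodeId": 3, "parentNodeId": 2, "input": "180****1234", "status": "fail", '
    '"time": 1642179452540, "busId": "180****1234", "reqId": "bh8isngns", '
    '"error": "null pointer exception"}',
]


def failed_reports(raw_reports):
    """Parse the reports and keep those whose execution state is failure."""
    parsed = [json.loads(raw) for raw in raw_reports]
    return [report for report in parsed if report["status"] == "fail"]


def alarm_text(report):
    """Build a short alarm message from a failed report."""
    detail = report.get("error", "unknown error")
    return f"node {report['nodeId']} failed for busId {report['busId']}: {detail}"


failures = failed_reports(RAW_REPORTS)
message = alarm_text(failures[0])
```

Only the report from node 3 is selected, and its `error` field carries the exception information that the responsible person needs.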

Secondly, the service monitoring engine server receives the execution failure report for opening the system account and triggers an alarm; the person responsible for the software system queries the index data reported by the alarming node (the node for opening the system account) and obtains the exception information: null pointer exception;

and finally, the person responsible for the software system quickly fixes the software system based on the details of the null pointer exception.

The invention has the advantages that:

The invention uses instrumentation of the service system, so that system developers report indexes according to the instrumentation standard derived from the decomposed service flow and at precisely defined instrumentation points, which makes troubleshooting easier.

The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes will occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A monitoring and alarming method for service in an Internet software system is characterized by comprising the following steps:

splitting out the key nodes of a service flow to construct a service tree;

2. The monitoring and warning method according to claim 1, wherein splitting out the key nodes of the service flow to construct the service tree comprises:

determining a correct execution sequence of each node;

determining whether each of said nodes has to execute;

determining an execution mode of each node;

and configuring the alarm strategy of each node.

3. The monitoring and warning method according to claim 2, characterized in that:

the execution modes of each node comprise synchronous execution and asynchronous execution;

4. The monitoring and alarm method according to claim 3, wherein: the information that each node in the service tree needs to contain includes:

5. The monitoring and warning method according to claim 1, wherein: triggering related nodes to report index information in the process of executing the service, wherein the process comprises the following steps:

6. The monitoring and warning method according to claim 1 or 5, characterized in that: in the process of reporting the index information by the node, after the node generates an error triggering alarm, a service responsible person positions the alarm node, inquires the index information reported by the alarm node, acquires the abnormal information and repairs the problem.

7. The monitoring and warning method according to claim 1, wherein: presetting a missing node alarm rule;

8. The monitoring and warning method according to claim 1, wherein: when the node which does not report the execution state information is scanned to trigger an alarm, the problem troubleshooting process is as follows:

9. The monitoring and warning method according to claim 1, wherein: when scanning missing nodes in the business process to trigger an alarm, the problem troubleshooting process comprises the following steps:

10. A system for implementing the monitoring and warning method according to any one of claims 1 to 9, comprising: a service index reporting program and a service monitoring engine server;

the service index reporting program is configured to:

the service monitoring engine server is used for:

receiving and storing index information reported by a client;