CN115022213A - Method for identifying request abnormity and storage medium

Info

Publication number: CN115022213A
Application number: CN202210755690.6A
Authority: CN
Inventors: 邹建峰; 施维串
Original assignee: Fuzhou Changxin Information Technology Co ltd
Current assignee: Fuzhou Changxin Information Technology Co ltd
Priority date: 2022-06-30
Filing date: 2022-06-30
Publication date: 2022-09-06
Anticipated expiration: 2042-06-30
Also published as: CN115022213B

Abstract

The invention discloses a method and a storage medium for identifying request anomalies. A request initiator, acting as the initial caller, establishes a tracking link and stores it in a preset database. In each call made while the request executes, the caller creates a new tracking node in the tracking link and inserts into it a call timestamp, its own information, superior tracking node information, and callee information, where the superior tracking node information and the callee information may be empty. The tracking link information is then obtained from the preset database, whether the request is abnormal is judged from the data of each tracking node, and the abnormal service node is located. By establishing the tracking link, the invention tracks the complete call link of a request, and the timestamps recorded in the link give the response time of each called service, which reflects the execution status and service performance of the request and allows the abnormal service node to be located.

Description

Method for identifying request abnormity and storage medium

Technical Field

The present invention relates to the field of computer technologies, and in particular to a method and a storage medium for identifying request anomalies.

Background

In a distributed system, especially a microservice system, completing one external request often requires calls among multiple internal modules, multiple middleware components, and multiple machines. Some of these calls are serial and some are parallel. In this situation, how can one determine which applications, modules, and nodes the request invoked, in what order, how each part performed, and whether any anomaly occurred?

Disclosure of Invention

The technical problem to be solved by the invention is as follows: to provide a method and a storage medium for identifying request anomalies that can effectively track the data flow path of a request and record its execution status and performance.

In order to solve the technical problems, the invention adopts the technical scheme that:

A method for identifying request anomalies, comprising the steps of:

S1, a request initiator, acting as the initial caller, establishes a tracking link and stores it in a preset database;

S2, in each call made while the request executes, the caller creates a tracking node in the tracking link, inserts a call timestamp, its own information, superior tracking node information, and callee information into the tracking node, and adds a completion timestamp when the call completes;

S3, the tracking link information is obtained from the preset database, whether the request is abnormal is judged from the data of each tracking node in the tracking link information, and the abnormal service node is located.
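The records created in steps S1 and S2 can be pictured with a minimal Python sketch. The class names, field names, and the in-memory `DATABASE` dictionary are assumptions made for illustration only; the source does not prescribe a schema, and the preset database is simply replaced here by a dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional
import time
import uuid


@dataclass
class TraceNode:
    """One tracking node, created by the caller for a single call (step S2)."""
    node_id: str
    caller_info: dict                 # the caller's own information
    callee_info: Optional[dict]       # may be empty
    parent_node_id: Optional[str]     # empty for the request initiator
    call_ts: float                    # call timestamp
    done_ts: Optional[float] = None   # completion timestamp, added later


@dataclass
class TraceLink:
    """One tracking link covering a whole request (step S1)."""
    link_id: str
    nodes: Dict[str, TraceNode] = field(default_factory=dict)


# Hypothetical stand-in for the preset database.
DATABASE: Dict[str, TraceLink] = {}


def start_link() -> TraceLink:
    """S1: create a tracking link and store it."""
    link = TraceLink(link_id=uuid.uuid4().hex)
    DATABASE[link.link_id] = link
    return link


def add_node(link: TraceLink, caller_info: dict,
             callee_info: Optional[dict] = None,
             parent_node_id: Optional[str] = None) -> TraceNode:
    """S2: create a tracking node for one call and attach it to the link."""
    node = TraceNode(uuid.uuid4().hex, caller_info, callee_info,
                     parent_node_id, time.time())
    link.nodes[node.node_id] = node
    return node


def complete_node(node: TraceNode) -> None:
    """S2: add the completion timestamp once the call has returned."""
    node.done_ts = time.time()
```

Step S3 would then read the stored links back and inspect each node's two timestamps.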

In order to solve the technical problem, the invention adopts another technical scheme as follows:

A storage medium for identifying request anomalies, having stored therein a computer program that, when executed, performs the steps of:

The invention has the following beneficial effects: by establishing a tracking link and recording, in each node of the link, the caller, the callee, the timestamp, and the previous node information of each call, the method and the storage medium track the complete call link of a request. The recorded timestamps give the execution and response time of each called service, which reflects the execution status and service performance of the request, allows a judgment of whether the request is abnormal, and locates the abnormal service node so that developers can correct and improve it.

Drawings

FIG. 1 is a flow chart of a method for identifying request anomalies according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of the requests and responses in a method for identifying request anomalies according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of the request calls in a method for identifying request anomalies according to an embodiment of the present invention.

Detailed Description

In order to explain the technical contents, the objects and the effects of the present invention in detail, the following description is made with reference to the accompanying drawings in combination with the embodiments.

Referring to FIG. 1 and FIG. 2, a method for identifying request anomalies includes the steps of:

Further, the step S2 includes the steps of:

S21, a call request is created, a new tracking node is created in the tracking link, a timestamp, the caller's own information, superior tracking node information, and callee information are inserted into the tracking node, and the tracking link is stored or updated in the preset database;

S22, the caller stores the unique identifier of the tracking link and the unique identifier of the tracking node in the request header, and sends the call request to the callee;

S23, the callee receives the call request, obtains the unique identifier of the tracking link from the request header, and takes the unique identifier of the tracking node in the request header as its superior tracking node information;

S24, the callee executes the call request and, acting as a new caller, judges whether a next service node exists; if so, it takes the next service node as the new callee and executes steps S21 and S22; otherwise, it returns the execution result to the previous service node, and the previous service node adds a completion timestamp to the tracking node it created;

wherein the request initiator acts as the initial caller, the superior tracking node information in the tracking node created by the request initiator is empty, and each caller may have several callees at the same time, creating one tracking node for each.

As the above description shows, every service node that takes part in execution acts as a caller and creates a tracking node, and one caller may have several callees. The method therefore handles requests that contain parallel service calls and records the complete flow of service execution.
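Steps S21 to S24 can be sketched as a small recursive simulation. Everything here is hypothetical scaffolding: the call graph, the integer identifiers, and the fake clock are invented for illustration, and the callees of one caller are visited one after another, whereas the method also allows them to run in parallel.

```python
import itertools
from typing import Dict, List, Optional

_ids = itertools.count(1)      # hypothetical span id generator
_clock = itertools.count(100)  # fake, strictly increasing timestamps


def invoke(caller: str, callee: str, graph: Dict[str, List[str]],
           trace_id: str, parent_span_id: Optional[int],
           store: List[dict]) -> None:
    # S21: the caller creates a tracking node for this call and stores it.
    span = {"trace_id": trace_id, "span_id": next(_ids),
            "parent_span_id": parent_span_id, "caller": caller,
            "callee": callee, "begin": next(_clock), "end": None}
    store.append(span)
    # S22/S23: both identifiers travel to the callee in the request header.
    header = {"TraceId": trace_id, "SpanId": span["span_id"]}
    # S24: the callee becomes the new caller for each downstream service.
    for nxt in graph.get(callee, []):
        invoke(callee, nxt, graph, header["TraceId"], header["SpanId"], store)
    # The call has returned, so the caller adds the completion timestamp.
    span["end"] = next(_clock)


# Example: the client requests a; a calls b and c; b calls d.
graph = {"a": ["b", "c"], "b": ["d"]}
store: List[dict] = []
invoke("client", "a", graph, "123", None, store)
for s in store:
    print(s["caller"], "->", s["callee"], "parent:", s["parent_span_id"])
```

The node created by the request initiator has an empty parent, and each later node points at the node of the call that reached its caller.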

Further, when the call request is sent to the callee in step S22, if exception information occurs, the exception code in it is obtained, and whether the exception is a network communication problem or a code error is automatically identified from the exception code;

if it is a network communication problem, sending of the call request is automatically retried, and after the number of retries exceeds a preset threshold, the exception code and the caller information and callee information in the call request are recorded and associated with the tracking link;

if it is a code error, the exception code and the caller information in the call request are recorded directly and associated with the tracking link;

error information is returned, request interruption information is added to the tracking node, and the tracking link is marked as an abnormal link;

the step S3 further includes the step of:

S31, all abnormal links within a preset time period are obtained, and the abnormal service node is located from the information associated with the abnormal links and the interruption information in them.

As the above description shows, when exception information occurs during service execution, its cause can be identified from the exception information, so that different kinds of exceptions are handled in different ways. The abnormal service node is then determined from the interruption information in all abnormal links within the preset time period.

Further, if the caller is a client, the self information includes a device identifier, network information, a user Id, and an IP address, and if the caller is a service interface in a server, the self information includes a domain name, a method name, and parameter information;

if the callee is a client, the callee information includes a device identifier, network information, a user Id, and an IP address, and if the callee is a service interface in a server, the callee information includes a domain name, a method name, and parameter information.

As the above description shows, the information recorded depends on whether the calling end or the called end is a client or a service interface, so that the specific identities of the caller and the callee can be determined and the recorded information is meaningful.
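One way to picture the two kinds of party information is a pair of record types. This is only a sketch: the class and field names are assumptions derived from the items listed above, not a format defined by the source.

```python
from dataclasses import dataclass, asdict
from typing import Union


@dataclass
class ClientInfo:
    """Information recorded when the caller or callee is a client."""
    device_id: str
    network: str
    user_id: str
    ip_address: str


@dataclass
class ServiceInfo:
    """Information recorded when the party is a server-side service interface."""
    domain_name: str
    method_name: str
    parameters: dict


Party = Union[ClientInfo, ServiceInfo]


def describe(party: Party) -> dict:
    """Serialize a party record, tagging which kind of endpoint it is."""
    kind = "client" if isinstance(party, ClientInfo) else "service"
    return {"kind": kind, **asdict(party)}


# Hypothetical values, for illustration only.
print(describe(ClientInfo("dev-1", "wifi", "u42", "10.0.0.5")))
print(describe(ServiceInfo("orders.example.com", "getOrder", {"id": 7})))
```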

Further, the preset database is an Elasticsearch database;

the step S2 is followed by the step of:

S3, the server analyzes the data of each tracking node in the tracking link in the Elasticsearch database through an Elasticsearch data analysis engine to generate a complete call chain.

As the above description shows, the tracking link data is stored in the Elasticsearch database and can be analyzed by the Elasticsearch data analysis engine to generate a complete call chain, which makes the request calls, and therefore the problems that occur during request execution, easier to reproduce.

The method and the storage medium for identifying request anomalies are suitable for scenarios in which a service system needs to track the execution of a service request in order to find possible problems in the system.

Referring to FIG. 1 to FIG. 3, a first embodiment of the present invention is:

A method for identifying request anomalies, comprising the steps of:

S1, a tracking link is established with the request initiator as the initial caller and is stored in a preset database.

In this embodiment, taking the client as the request initiator as an example, the client calls StartTrace in the SDK to start a new tracking link (Trace).

S2, in each call made while the request executes, the caller creates a new tracking node in the tracking link and inserts a call timestamp, its own information, superior tracking node information, and callee information into the tracking node, where the superior tracking node information and the callee information may be empty;

the step S2 includes the steps of:

if the caller is a client, the self information includes a device identifier, network information, a user Id, and an IP address, and if the caller is a service interface in a server, the self information includes a domain name, a method name, and parameter information;

In this embodiment, after the client establishes the tracking link (Trace), it generates a Span structure, that is, a tracking node. It fills in the caller's information (device identifier, network information, user Id, IP address, and other data that marks the user's identity), the callee's information (domain name, method, parameters, and other data that helps mark the user's identity), and the current timestamp, and then collects the filled-in information into the Elasticsearch database by calling a method in the SDK provided by the collector.

Since the client is the initiator of the request and there is no superior call, there is no need to collect superior tracking node information.

S22, the caller stores the unique identifier of the tracking link and the unique identifier of the tracking node in the request header and sends the call request to the callee.

In this embodiment, the client attaches the unique identifier of the tracking node (SpanId) and the unique identifier of the tracking link (TraceId) to the HTTP request header, so that the downstream callee service can obtain the caller's TraceId and SpanId from the HTTP header.

When the call request is sent to the callee in step S22, if exception information occurs, the exception code in it is obtained, and whether the exception is a network communication problem or a code error is automatically identified from the exception code;

if it is a network communication problem, sending of the call request is automatically retried, and the exception is reported for manual handling after the number of retries exceeds a preset threshold;

if it is a code error, the exception is reported for manual handling directly.

In this embodiment, while a downstream service is being called, the exception information produced during the call is captured and used to judge automatically whether the exception is a network communication problem or a code error. For a network communication problem, the service call is retried automatically, and the exception is reported for manual handling through DingTalk once the number of retries exceeds a preset threshold. For a code error, the exception is reported for manual handling through DingTalk directly.

Network communication problems and code errors can be distinguished by the exception code contained in the exception information, for example, in part by the HTTP status code: 504 indicates a gateway timeout and is a network communication problem, while 414 indicates that the requested URL is too long and is a code error.
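The classify-then-retry behavior described above can be sketched as follows. The two code sets contain only the examples named in the text (504 and 414); how other status codes should be classified is not stated in the source, so they are labeled "unknown" here and reported immediately as an assumption. `send` and `report` are hypothetical callables standing in for the real call and for the manual-handling notification.

```python
from typing import Callable

# Illustrative mapping only; the source names just 504 and 414 as examples.
NETWORK_CODES = {504}
CODE_ERROR_CODES = {414}


def classify(status: int) -> str:
    """Map an exception code to a cause category."""
    if status in NETWORK_CODES:
        return "network"
    if status in CODE_ERROR_CODES:
        return "code"
    return "unknown"


def call_with_retry(send: Callable[[], int], max_retries: int,
                    report: Callable[[str], None]) -> int:
    """Send a call; retry network failures, report everything else at once."""
    retries = 0
    while True:
        status = send()
        if status < 400:
            return status
        kind = classify(status)
        if kind == "network" and retries < max_retries:
            retries += 1
            continue
        # Retries exhausted, or not a network problem: hand over to a human.
        report(f"{kind} error, status {status}, after {retries} retries")
        return status
```

A caller would pass a function that performs the real request as `send` and a notification hook as `report`; a production version would normally also wait between retries.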

In this embodiment, when the callee service receives the request, it parses the caller's TraceId and SpanId from the HTTP header (the HTTP header is a dictionary data structure, so the values can be looked up directly with TraceId and SpanId as keys) and collects this information and a timestamp into the Elasticsearch database through the collector. The callee also generates a Span and fills the parsed SpanId into the ParentSpanId of the current Span. The ParentSpanId is the SpanId of the parent node of the current Span, that is, the superior tracking node information.

If the service needs to call other services, it creates a Span in the same tracking link and attaches the SpanId of that Span and the TraceId to the HTTP header when calling them. When the callee's service completes, the Span's information and timestamp are collected into the Elasticsearch database by the collector.

In this embodiment, each caller can complete request execution only after it receives the return information of its lower level; as shown in FIG. 2, each lower-level service node must respond to the request of its upper-level service node. After request execution completes, the timestamp at that moment is also collected into the Elasticsearch database.

Taking the call relationship shown in FIG. 3 as an example, part of the data recorded in the Elasticsearch database is shown in Table 1 below:
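The header handling on both sides can be sketched with plain dictionaries. The header keys reuse the identifier names from the text; the function names and the span layout are assumptions for illustration. Note that real HTTP header names are case-insensitive, which this simple dictionary lookup does not model.

```python
from typing import Dict, Optional, Tuple

TRACE_KEY = "TraceId"  # header names follow the identifiers in the text
SPAN_KEY = "SpanId"


def inject(headers: Dict[str, str], trace_id: str,
           span_id: str) -> Dict[str, str]:
    """Caller side: return a copy of the headers with both identifiers."""
    out = dict(headers)
    out[TRACE_KEY] = trace_id
    out[SPAN_KEY] = span_id
    return out


def extract(headers: Dict[str, str]) -> Tuple[Optional[str], Optional[str]]:
    """Callee side: read the identifiers back by key."""
    return headers.get(TRACE_KEY), headers.get(SPAN_KEY)


def new_child_span(headers: Dict[str, str], span_id: str, name: str) -> dict:
    """Callee side: build a Span whose ParentSpanId is the caller's SpanId."""
    trace_id, parent = extract(headers)
    return {"TraceId": trace_id, "SpanId": span_id,
            "ParentSpanId": parent, "name": name}


# Hypothetical example: service a calls b, then b creates a span to call d.
outgoing = inject({"Accept": "application/json"}, "123", "1")
print(new_child_span(outgoing, "1.1", "b calls d"))
```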

TABLE 1

S25, steps S21 to S24 are repeated until no caller has a next service node;

the request initiator acts as the initial caller, the superior tracking node information in the tracking node created by the request initiator is empty, each caller may have several callees at the same time, and a tracking node may therefore hold information on several callees.

In this way, the path of all services called by one request can be found through the TraceId, the order of the calls can be determined from the ParentSpanId, and the response time of each called service can be determined from the recorded timestamps.
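A short sketch shows how these three pieces of information can be read back from stored spans: filter by TraceId, group by ParentSpanId, and subtract timestamps. The span records and their numeric timestamps below are invented for illustration and are not the data of Table 1.

```python
from collections import defaultdict
from typing import Dict, List


def build_tree(spans: List[dict], trace_id: str) -> Dict[object, List[dict]]:
    """Group the spans of one trace by parent span id (None marks the root)."""
    children = defaultdict(list)
    for s in spans:
        if s["trace_id"] == trace_id:
            children[s["parent_span_id"]].append(s)
    for group in children.values():
        group.sort(key=lambda s: s["begin"])  # call order among siblings
    return children


def walk(children, parent=None, depth=0) -> List[str]:
    """Depth-first listing: one indented line per call with its response time."""
    lines = []
    for s in children.get(parent, []):
        lines.append(f"{'  ' * depth}{s['name']}: {s['end'] - s['begin']}s")
        lines.extend(walk(children, s["span_id"], depth + 1))
    return lines


# Hypothetical records; begin and end are seconds on an arbitrary clock.
SPANS = [
    {"trace_id": "t1", "span_id": "1", "parent_span_id": "0",
     "name": "a calls b", "begin": 1, "end": 4},
    {"trace_id": "t1", "span_id": "0", "parent_span_id": None,
     "name": "request a", "begin": 0, "end": 6},
    {"trace_id": "t1", "span_id": "2", "parent_span_id": "0",
     "name": "a calls c", "begin": 2, "end": 6},
    {"trace_id": "t2", "span_id": "0", "parent_span_id": None,
     "name": "another request", "begin": 3, "end": 5},
]
lines = walk(build_tree(SPANS, "t1"))
print("\n".join(lines))
```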

S3, the tracking link information is obtained from the preset database, and whether the request is abnormal is judged from the data of each tracking node in the tracking link information.

S4, the server analyzes the data of each tracking node in the tracking link in the Elasticsearch database through an Elasticsearch data analysis engine to generate a complete call chain.

In this embodiment, the tracking link in the Elasticsearch database is analyzed by the Elasticsearch data analysis engine to generate a complete call chain. With the complete call chain of a request, a problem that occurred during request execution can be reproduced with high probability.

The invention achieves the following:

data is collected automatically;

the data is analyzed to generate a complete call chain: with the complete call chain of a request, a problem can be reproduced with high probability;

data visualization and per-component performance visualization help locate system bottlenecks and find problems in time.

In this way, each specific link of a request can be located, so that request link tracking is achieved and the performance bottleneck of each module can be further located and analyzed.

Referring to FIG. 3, a second embodiment of the invention is:

A method for identifying request anomalies, which differs from the first embodiment in that:

in this embodiment, step S3 specifically analyzes, through the Elasticsearch data analysis engine, the data of each tracking node in the tracking link in the Elasticsearch database, and determines possible weak points or bottlenecks of the system service from the completion time of the request.

The duration of request execution is computed from its start time and completion time, and abnormal executions (for example, those whose duration exceeds a preset threshold or exceeds the average request execution duration in the system) are listed according to the call relation within the request, which yields the weak points or bottlenecks that may exist in the system.

For example, the request call illustrated in fig. 3 has part of data in the Elasticsearch database as shown in table 2 below:

TABLE 2

trace_id	span_id	parent_span_id	span_name	gmt_begin	gmt_end
123	0	(empty)	request a	14:00:01	14:00:12
123	1	0	a calls b	14:00:02	14:00:10
123	1.1	1	b calls d	14:00:03	14:00:03
123	2	0	a calls c	14:00:03	14:00:12

In this embodiment, the average duration of request execution in the system is used as the criterion for anomaly identification, and that average is 5 s. As Table 2 shows, request a takes 11 seconds, the call from a to b takes 8 seconds, and the call from a to c takes 9 seconds, all of which exceed the average. According to the call relation of the request, request a exceeds the average because of its two sub-calls, a calls b and a calls c. Node c makes no lower-level call, and the only lower-level call of node b (b calls d) shows no anomaly, so it can be judged that service execution is abnormal at nodes b and c, which are the possible weak points or bottlenecks of system service execution. The output analysis results are:

The service interface names of services b and c and the names of the corresponding servers are output at the same time, so that developers can analyze the causes further and check and correct the problems.
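The reasoning applied to Table 2 can be reproduced with a few lines of Python. This is a sketch of one possible rule, under the assumption that a slow call is attributed to its own service when none of its lower-level calls is slow; the source describes the judgment in prose only.

```python
from datetime import datetime
from typing import Dict, List

# Rows of Table 2: (span_id, parent_span_id, span_name, gmt_begin, gmt_end)
ROWS = [
    ("0", None, "request a", "14:00:01", "14:00:12"),
    ("1", "0", "a calls b", "14:00:02", "14:00:10"),
    ("1.1", "1", "b calls d", "14:00:03", "14:00:03"),
    ("2", "0", "a calls c", "14:00:03", "14:00:12"),
]
AVERAGE_SECONDS = 5  # the system-wide average assumed in this embodiment


def seconds(begin: str, end: str) -> int:
    """Duration between two HH:MM:SS timestamps on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(begin, fmt)
    return int(delta.total_seconds())


durations: Dict[str, int] = {r[0]: seconds(r[3], r[4]) for r in ROWS}
slow = {sid for sid, d in durations.items() if d > AVERAGE_SECONDS}

# A slow span is a suspect when none of its own child calls is slow,
# so the delay cannot be attributed to a lower-level call.
suspects: List[str] = [
    r[2] for r in ROWS
    if r[0] in slow and not any(c[1] == r[0] and c[0] in slow for c in ROWS)
]

print(durations)  # {'0': 11, '1': 8, '1.1': 0, '2': 9}
print(suspects)   # ['a calls b', 'a calls c']
```

Request a is slow only because its two sub-calls are slow, so it is not itself listed as a suspect.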

The third embodiment of the invention is as follows:

A method for identifying request anomalies, which differs from the second embodiment in that:

error information is returned, request interruption information is added to the tracking node, and the tracking link is marked as an abnormal link.

In this embodiment, while a downstream service is being called, the exception information produced during the call is captured and used to judge automatically whether the exception is a network communication problem or a code error. For a network communication problem, the service call is retried automatically; once the number of retries exceeds a preset threshold, the exception code and the caller information and callee information in the call request are recorded and associated with the tracking link. For a code error, the exception code and the caller information in the call request are recorded directly and associated with the tracking link.

After the exception information is recorded, the current service node returns error information to the previous service node, request interruption information is added to the tracking node created when the previous node called the current node, and the current tracking link is marked as an abnormal link.
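The bookkeeping described in this embodiment can be sketched as one function that records the exception, adds the interruption information, and marks the link. The dictionary layout and the field names (`exceptions`, `interrupted`, `abnormal`) are assumptions for illustration.

```python
from typing import Dict, Optional


def record_failure(link: Dict, node_id: str, code: int, kind: str,
                   caller: str, callee: Optional[str]) -> None:
    """Attach an exception record to the link and flag the link as abnormal."""
    record = {"code": code, "kind": kind, "caller": caller}
    if kind == "network":
        # Either end may be at fault, so the callee is recorded as well.
        record["callee"] = callee
    link.setdefault("exceptions", []).append(record)
    # Request interruption information goes on the node of the failed call.
    link["nodes"][node_id]["interrupted"] = True
    link["abnormal"] = True


# Hypothetical link with a single tracking node for the call from a to b.
link = {"trace_id": "123", "nodes": {"1": {"name": "a calls b"}}}
record_failure(link, "1", 504, "network", "a", "b")
print(link["abnormal"], link["exceptions"])
```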

Step S3 further includes the steps of:

S31, all abnormal links are obtained, and the abnormal service node is located from the information associated with the abnormal links and the interruption information in them.

In this embodiment, the abnormal link information stored in the preset database within a preset time period is obtained, for example, all abnormal link information from the last 3 days. The interruption information in it shows which service node was interrupted, and the number of interruptions of each service node is counted. For example, within three days node a is interrupted 1 time, node b 10 times, and node c 2 times. The interruption count of each service node is compared with a preset value to judge whether the node is abnormal. In this embodiment the preset value is 3: a node interrupted more than 3 times within 3 days is considered abnormal; otherwise the interruptions are considered sporadic and no action is taken.

For each service node identified as abnormal, the information recorded at each interruption is obtained according to the node at which the request path stopped when the interruption occurred, and that information is analyzed. For example, when a network communication problem occurs, both the caller information (the current service node) and the callee information (the next service node) are recorded. Because a network communication problem may be caused by either the upstream or the downstream node, an exception code is counted for a node even when that node appears in the record only as the callee.

The number of network communication problems and the number of code errors are counted, and an exception analysis table listing the number of occurrences of each exception code is generated and output.
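The counting rule in this example can be sketched as follows. The function name and the event format (node name plus interruption time) are assumptions; the window of 3 days, the preset value of 3, and the per-node counts come from the example above.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


def abnormal_nodes(interruptions: Iterable[Tuple[str, datetime]],
                   now: datetime, window: timedelta, limit: int) -> List[str]:
    """Return the nodes interrupted more than `limit` times within the window."""
    counts = Counter(node for node, ts in interruptions if now - ts <= window)
    return sorted(node for node, n in counts.items() if n > limit)


# Reproduce the example: a is interrupted 1 time, b 10 times, c 2 times.
now = datetime(2022, 6, 30, 12, 0, 0)
events = (
    [("a", now - timedelta(hours=1))]
    + [("b", now - timedelta(hours=h)) for h in range(1, 11)]
    + [("c", now - timedelta(days=1)), ("c", now - timedelta(days=2))]
    + [("c", now - timedelta(days=5))] * 5  # outside the window, ignored
)
result = abnormal_nodes(events, now, timedelta(days=3), 3)
print(result)  # ['b']
```

Only node b exceeds the preset value, so nodes a and c are treated as sporadic interruptions.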

For example:

The fourth embodiment of the invention is as follows:

A method for identifying request anomalies, which differs from the second embodiment in that:

in step S3 of this embodiment, after some service nodes (for example, nodes b and c) are recognized as abnormal, the analysis result is not output directly. Instead, the number of anomalies is recorded with an anomaly counter, and when it reaches a preset threshold, all of the abnormal requests are retrieved and automatically analyzed for common properties.

In this embodiment, when the number of anomalies of b and c reaches 200, the specific contents of those 200 requests are retrieved for common-property analysis. Taking node b as an example, assume that node b reads the content of a document file specified by the user (doc, docx, pdf, txt, and so on). Comparing the data content of the call requests shows that their common property is that the file requested is a pdf file: whenever a pdf file is requested, processing takes longer. The system therefore outputs the common data parameter of the requests:

For example:

FileName
a.pdf
13.pdf
Chapter 33.pdf

Common property: pdf.

According to the content output by the system, developers can easily find that the slow processing is probably caused by reading the pdf file, so that problem troubleshooting and correction can be performed in a targeted manner, and the working efficiency is improved.
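For the file-reading example, the common-property analysis can be approximated by looking for a shared file extension. This is a deliberately narrow sketch: the source does not say how common properties are detected in general, and the function name and the `min_share` parameter are assumptions.

```python
import os
from collections import Counter
from typing import List, Optional


def common_extension(file_names: List[str],
                     min_share: float = 1.0) -> Optional[str]:
    """Return the extension shared by at least `min_share` of the requests."""
    if not file_names:
        return None
    exts = Counter(os.path.splitext(name)[1].lower() for name in file_names)
    ext, count = exts.most_common(1)[0]
    return ext if ext and count / len(file_names) >= min_share else None


# The first two names come from the example; the third is hypothetical.
print(common_extension(["a.pdf", "13.pdf", "c.PDF"]))  # .pdf
print(common_extension(["a.pdf", "b.docx"]))           # None
```

Lowering `min_share` would tolerate a few slow requests that do not share the property.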

The fifth embodiment of the invention is as follows:

A storage medium for identifying request anomalies, having stored therein a computer program that, when executed, implements the steps of the method for identifying request anomalies in any one of embodiments one to three above.

In summary, the method and the storage medium for identifying request anomalies provided by the present invention establish a tracking link and record, in each node of the link, the caller, the callee, the timestamp, and the previous node information of each call, and thereby track the complete call link of a request. The recorded timestamps give the execution and response time of each called service, which reflects the execution status and service performance of the request, allows a judgment of whether the request is abnormal, and locates the abnormal service node so that developers can correct and improve it. The exceptions that occur during request execution can also be analyzed and handled accordingly.

The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all equivalent changes made by using the contents of the present specification and the drawings, or applied directly or indirectly to the related technical fields, are included in the scope of the present invention.

Claims

1. A method of requesting anomaly identification, comprising the steps of:

2. The method of claim 1, wherein the step S2 includes the steps of:

S24, the callee executes the call request and, acting as a new caller, judges whether a next service node exists; if so, it takes the next service node as the new callee and executes steps S21 and S22; otherwise, it returns the execution result to the previous service node, and the previous service node adds the completion timestamp to the tracking node it created;

wherein the request initiator acts as the initial caller, the superior tracking node information in the tracking node created by the request initiator is empty, and each caller may have several callees at the same time, creating one tracking node for each.

3. The method according to claim 2, wherein when the call request is sent to the called party in the step S22, if an abnormal message occurs, the abnormal code is obtained, and whether the abnormal code is a network communication problem or a code error is automatically identified according to the abnormal code;

the step S3 further includes the steps of:

4. The method according to claim 2, wherein if the caller is a client, the self-information includes a device identifier, network information, a user Id, and an IP address, and if the caller is a service interface in a server, the self-information includes a domain name, a method name, and parameter information;

5. The method for requesting anomaly identification according to claim 1, wherein the preset database is an ElasticSearch database;

the step S3 is followed by the step of:

6. A storage medium requesting anomaly identification, having a computer program stored therein, the computer program when executed implementing the steps of:

7. The storage medium of claim 6, wherein the step S2 includes the steps of:

8. The storage medium of claim 7, wherein when the call request is sent to the called party in the step S22, if exception information occurs, the exception code is obtained, and whether the exception code is a network communication problem or a code error is automatically identified according to the exception code;

the step S3 further includes the steps of:

9. The storage medium of claim 7, wherein if the caller is a client, the self-information includes a device identifier, network information, a user Id, and an IP address, and if the caller is a service interface in a server, the self-information includes a domain name, a method name, and parameter information;

10. The storage medium of claim 6, wherein the predetermined database is an ElasticSearch database;

the step S2 is followed by the step of: