CN115022213B

CN115022213B - Method for identifying request abnormality and storage medium

Info

Publication number: CN115022213B
Application number: CN202210755690.6A
Authority: CN
Inventors: 邹建峰; 施维串
Original assignee: Fuzhou Changxin Information Technology Co ltd
Current assignee: Fuzhou Changxin Information Technology Co ltd
Priority date: 2022-06-30
Filing date: 2022-06-30
Publication date: 2024-04-05
Anticipated expiration: 2042-06-30
Also published as: CN115022213A; CN118138512A

Abstract

The invention discloses a method for identifying a request abnormality and a storage medium. A request initiator, acting as an initial calling party, establishes a tracking link and stores it in a preset database. In each call made during execution of the request, the calling party builds a tracking node in the tracking link and inserts a call timestamp, its own information, upper-level tracking node information and called party information into the tracking node, wherein the upper-level tracking node information and the called party information may be empty. Tracking link information is then acquired from the preset database, whether the request is abnormal is judged according to the data of each tracking node in the tracking link information, and the abnormal service node is located. By establishing the tracking link, the invention can track the complete call link of a request, and the timestamps recorded in the tracking link give the response time of each called service, thereby reflecting the execution condition and service performance of the request and locating the abnormal service node.

Description

Method for identifying request abnormality and storage medium

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a method and a storage medium for identifying a request abnormality.

Background

In a distributed system, especially a microservice system, serving one external request often requires multiple internal modules, middleware components and machines to call one another. Some of these calls are serial, while others are parallel. In this situation, how can it be determined which applications, modules and nodes the request called, in what order they were called, how well each part performed, and whether any abnormality occurred?

Disclosure of Invention

The technical problem to be solved by the invention is to provide a method and a storage medium for identifying a request abnormality, which can effectively track the data flow path of a request and record its execution conditions and performance.

In order to solve the technical problems, the invention adopts the following technical scheme:

A method for identifying a request abnormality, comprising the steps of:

S1, a request initiator, acting as an initial calling party, establishes a tracking link and stores the tracking link in a preset database;

S2, in each call made during execution of the request, the calling party builds a tracking node in the tracking link, inserts a call timestamp, its own information, upper-level tracking node information and called party information into the tracking node, and supplements a completion timestamp when the call is completed;

S3, tracking link information is acquired from the preset database, whether the request is abnormal is judged according to the data of each tracking node in the tracking link information, and the abnormal service node is located.

In order to solve the technical problems, the invention adopts another technical scheme that:

A storage medium having stored therein a computer program for identifying a request abnormality, the computer program, when executed, performing steps S1 to S3 of the method described above.

The beneficial effects of the invention are as follows. In the method and the storage medium for identifying a request abnormality, a tracking link is established, and each node in the tracking link records the calling party, the called party, the timestamp and the previous node of the corresponding call, so that the whole call link of a request can be tracked completely. The recorded timestamps give the execution and response time of each called service, which reflects the execution condition and service performance of the request. On this basis, whether the request is abnormal is judged and the abnormal service node is located, which helps developers correct and improve that service node.

Drawings

FIG. 1 is a flow chart of a method for identifying a request abnormality according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of requests and responses in the method for identifying a request abnormality according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a request call in the method for identifying a request abnormality according to an embodiment of the present invention.

Detailed Description

In order to describe the technical contents, the achieved objects and effects of the present invention in detail, the following description will be made with reference to the embodiments in conjunction with the accompanying drawings.

Referring to fig. 1 and 2, a method for identifying a request abnormality includes the steps of:

Further, the step S2 includes the steps of:

S21, the calling party establishes a call request, creates a tracking node in the tracking link, inserts a timestamp, its own information, upper-level tracking node information and called party information into the tracking node, and stores or updates the tracking link in the preset database;

S22, the calling party stores the unique identifier of the tracking link and the unique identifier of the tracking node in a request header, and sends the call request to the called party;

S23, the called party receives the call request, acquires the unique identifier of the tracking link from the request header, and takes the unique identifier of the tracking node in the request header as its upper-level tracking node information;

S24, the called party executes the call request and acts as a new calling party: it judges whether a next service node exists; if so, it takes the next service node as the new called party and executes step S21 and step S22; otherwise, it returns the execution result to the previous service node, and the previous service node supplements a completion timestamp in the tracking node it created;

wherein the request initiator serves as the initial calling party, the upper-level tracking node information in the tracking node created by the request initiator is empty, and each calling party may have multiple called parties at the same time, in which case multiple tracking nodes are created correspondingly.

As can be seen from the above description, each node involved in executing the service acts as a calling party and establishes a tracking node, and the same calling party can have multiple called parties. The method therefore adapts to scenarios in which one service request fans out to several parallel services, and the complete flow of the service execution is recorded.
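Steps S21 to S24 can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the patented implementation: the in-memory `DATABASE` dict stands in for the preset database, direct function calls stand in for network requests, and the names `Span`, `start_trace`, `traced_call`, `service_b` and `service_d` are hypothetical.

```python
import time
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Span:
    """One tracking node in a tracking link (field names are illustrative)."""
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]  # None for the request initiator
    caller: str
    callee: str
    begin: float
    end: Optional[float] = None    # completion timestamp, supplemented later


# Hypothetical stand-in for the preset database: trace_id -> tracking nodes.
DATABASE: Dict[str, List[Span]] = {}


def start_trace() -> str:
    """S1: the request initiator establishes a tracking link and stores it."""
    trace_id = uuid.uuid4().hex
    DATABASE[trace_id] = []
    return trace_id


def traced_call(trace_id: str, parent_span_id: Optional[str], caller: str,
                callee: str, handler: Callable[[Dict[str, str]], str]) -> str:
    # S21: create the tracking node and store it.
    span = Span(trace_id, uuid.uuid4().hex, parent_span_id, caller, callee,
                time.time())
    DATABASE[trace_id].append(span)
    # S22: pass both unique identifiers to the called party in the header.
    headers = {"TraceId": trace_id, "SpanId": span.span_id}
    # S23/S24: the called party runs and may itself become a calling party.
    result = handler(headers)
    # The calling party supplements the completion timestamp on return.
    span.end = time.time()
    return result


def service_d(headers: Dict[str, str]) -> str:
    return "d-ok"  # no next service node: return the result upstream


def service_b(headers: Dict[str, str]) -> str:
    # The received SpanId becomes the upper-level tracking node information.
    return traced_call(headers["TraceId"], headers["SpanId"], "b", "d",
                       service_d)


trace = start_trace()
outcome = traced_call(trace, None, "client", "b", service_b)
```

After this runs, `DATABASE[trace]` holds two tracking nodes, and the second one points at the first through `parent_span_id`, which is the information step S3 later relies on.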

Further, when the call request is sent to the called party in step S22 and abnormal information occurs, the abnormal code in the call request is obtained, and whether the abnormality is a network communication problem or a code error is automatically identified according to the abnormal code;

if it is a network communication problem, the sending of the call request is automatically retried, and after the number of retries exceeds a preset threshold, the abnormal code, the calling party information and the called party information in the call request are recorded and associated with the tracking link;

if it is a code error, the abnormal code and the calling party information in the call request are recorded directly and associated with the tracking link;

error information is then returned, request interrupt information is supplemented in the tracking node, and the tracking link is correspondingly marked as an abnormal link;

the step S3 further includes the steps of:

S31, all abnormal links within a preset time period are acquired, and abnormal service nodes are located according to the information associated with the abnormal links and the interrupt information in the abnormal links.

As can be seen from the above description, if abnormal information occurs during service execution, the cause of the abnormality can be identified from it, so that each kind of abnormality is handled in the appropriate way. Abnormal service nodes are finally determined from the interrupt information of all abnormal links within the preset time period.
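The retry-and-record behaviour described above might look like the following sketch. The `CallError` exception, the `send_with_retry` function, the layout of the `trace` dict and the predicate `is_network_problem` are all assumptions made for illustration; the patent does not specify these interfaces.

```python
from typing import Any, Callable, Dict, List, Optional


class CallError(Exception):
    """Hypothetical exception carrying the abnormal code of a failed call."""

    def __init__(self, code: int) -> None:
        super().__init__(f"call failed with code {code}")
        self.code = code


def send_with_retry(send: Callable[[], str],
                    is_network_problem: Callable[[int], bool],
                    caller: str, callee: str, max_retries: int,
                    trace: Dict[str, Any]) -> Optional[str]:
    """Send a call request; on failure, record the anomaly on the trace."""
    retries = 0
    while True:
        try:
            return send()
        except CallError as err:
            network = is_network_problem(err.code)
            if network and retries < max_retries:
                retries += 1  # network problem: retry automatically
                continue
            record = {"code": err.code, "caller": caller}
            if network:
                record["callee"] = callee  # either side may be at fault
            trace["records"].append(record)
            trace["interrupted_at"] = caller  # request interrupt information
            trace["abnormal"] = True          # mark as an abnormal link
            return None                       # error is returned upstream


attempts: List[int] = []


def always_timeout() -> str:
    attempts.append(504)
    raise CallError(504)


def url_too_long() -> str:
    raise CallError(414)


def only_504(code: int) -> bool:
    return code == 504


network_trace: Dict[str, Any] = {"records": [], "abnormal": False}
network_result = send_with_retry(always_timeout, only_504, "a", "b", 2,
                                 network_trace)

code_trace: Dict[str, Any] = {"records": [], "abnormal": False}
code_result = send_with_retry(url_too_long, only_504, "a", "b", 2, code_trace)
```

In this sketch a network problem is sent three times in total (one attempt plus two retries) before it is recorded together with the called party, whereas a code error is recorded after the first failure and without the called party.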

Further, if the calling party is a client, the self information comprises a device identifier, network information, a user Id and an IP address, and if the calling party is a service interface in a server, the self information comprises a domain name, a method name and parameter information;

if the called party is a client, the called party information comprises a device identifier, network information, a user Id and an IP address, and if the called party is a service interface in a server, the called party information comprises a domain name, a method name and parameter information.

As can be seen from the above description, the recorded information differs depending on whether the calling party or the called party is a client or a service interface in a server, so the specific identities of both parties can be located effectively and the validity of the recorded information is ensured.

Further, the preset database is an Elasticsearch database;

after the step S3, the method further includes the step of:

S4, the server analyzes the data of each tracking node in the tracking link in the Elasticsearch database through the Elasticsearch data analysis engine to generate a complete call chain.

As can be seen from the above description, storing the tracking link data in an Elasticsearch database allows it to be analyzed by the Elasticsearch data analysis engine to generate a complete call chain, which makes the request call, and therefore any problem that occurred while the request was executed, easier to reproduce.

The method and the storage medium for identifying a request abnormality are suitable for scenarios in which a service system tracks the execution of service requests in order to discover possible problems in the system.

Referring to fig. 1 to 3, a first embodiment of the present invention is as follows:

A method for identifying a request abnormality, comprising the steps of:

S1, a request initiator, acting as an initial calling party, establishes a tracking link and stores the tracking link in a preset database.

In this embodiment, the client is the request initiator, and the client calls the StartTrace method of the SDK to start a new tracking link (Trace).

S2, in each call made during execution of the request, the calling party builds a tracking node in the tracking link and inserts a call timestamp, its own information, upper-level tracking node information and called party information into the tracking node, wherein the upper-level tracking node information and the called party information may be empty;

the step S2 includes the steps of:

if the calling party is a client, the self information comprises a device identifier, network information, a user Id and an IP address, and if the calling party is a service interface in a server, the self information comprises a domain name, a method name and parameter information;

In this embodiment, after the client establishes the tracking link (Trace), it generates a Span structure, that is, a tracking node. The client fills in the calling party information (device identifier, network information, user Id, IP address and other information that marks the user's identity), the called party information (domain name, method, parameters and other information that helps mark the user's identity) and the current timestamp, and then sends the filled-in information to the Elasticsearch database by calling a method in the SDK provided by the collector.

Since the client acts as the initiator of the request and there is no upper level call, there is no need to collect upper level tracking node information.

S22, the caller stores the unique identification of the tracking link and the unique identification of the tracking node into a request header, and sends the calling request to the called party.

In this embodiment, the client appends the unique Id of the tracking node (SpanId) and the unique Id of the tracking link (TraceId) to the HTTP request header, so that the downstream called service can obtain the TraceId and the SpanId of the calling party from the HTTP header.

When the call request is sent to the called party in step S22 and abnormal information occurs, the abnormal code in the call request is obtained, and whether the abnormality is a network communication problem or a code error is automatically identified according to the abnormal code;

if it is a network communication problem, the sending of the call request is automatically retried, and the problem is reported for manual handling after the number of retries exceeds a preset threshold;

if it is a code error, the problem is reported directly for manual handling.

In this embodiment, abnormal information generated while calling a downstream service is captured and used to determine automatically whether the abnormality is a network communication problem or a code error. If it is a network communication problem, the service call is automatically retried, and once the number of retries exceeds the preset threshold the problem is reported through DingTalk for manual handling. If it is a code error, the problem is reported through DingTalk for manual handling directly.

Whether an abnormality is a network communication problem or a code error can be determined from the abnormal code contained in the abnormal information, for example an HTTP status code: 504 indicates a gateway timeout and is classified as a network communication problem, while 414 indicates that the requested URL is too long and is classified as a code error.
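A classification of this kind could be sketched as below. Only the treatment of 504 and 414 comes from the text; the inclusion of 502 and 503 as network problems, the rule that every other 4xx or 5xx status counts as a code error, and the name `classify` are assumptions made for illustration.

```python
NETWORK_PROBLEM = "network communication problem"
CODE_ERROR = "code error"

# 504 is named in the text; 502 and 503 are assumed to behave the same way.
NETWORK_STATUS_CODES = {502, 503, 504}


def classify(status_code: int) -> str:
    """Map an HTTP error status code to one of the two anomaly categories."""
    if status_code in NETWORK_STATUS_CODES:
        return NETWORK_PROBLEM
    if 400 <= status_code < 600:
        # Assumption: any other client or server error is a code error,
        # which covers the 414 example from the text.
        return CODE_ERROR
    raise ValueError(f"{status_code} is not an error status code")


gateway_timeout = classify(504)
uri_too_long = classify(414)
```

A real deployment would likely tune the set of retryable codes to its own gateway and service behaviour.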

In this embodiment, when the called service receives the request, it parses the TraceId and the SpanId of the calling party from the HTTP header (the HTTP header is a dictionary data structure, so the values can be read directly using TraceId and SpanId as keys), and the collector sends this information and the timestamp to the Elasticsearch database. The called party also generates a Span and fills the SpanId parsed above into the ParentSpanId of the current Span. The ParentSpanId represents the SpanId of the parent node of the current Span, that is, the upper-level tracking node information.
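The header handling on both sides can be illustrated with a short sketch. The header keys `TraceId` and `SpanId` follow the text; the function names `inject`, `extract` and `child_span` and the dict-based Span are hypothetical, and a plain `dict` is used here even though real HTTP header names are case-insensitive.

```python
import uuid
from typing import Dict, Optional, Tuple

TRACE_KEY = "TraceId"
SPAN_KEY = "SpanId"


def inject(headers: Dict[str, str], trace_id: str,
           span_id: str) -> Dict[str, str]:
    """Calling party: append both unique identifiers to the request header."""
    out = dict(headers)
    out[TRACE_KEY] = trace_id
    out[SPAN_KEY] = span_id
    return out


def extract(headers: Dict[str, str]) -> Tuple[Optional[str], Optional[str]]:
    """Called party: read the identifiers directly by key."""
    return headers.get(TRACE_KEY), headers.get(SPAN_KEY)


def child_span(headers: Dict[str, str], name: str) -> Dict[str, Optional[str]]:
    """Called party: create its own Span under the calling party's Span."""
    trace_id, parent_span_id = extract(headers)
    return {
        "TraceId": trace_id,
        "SpanId": uuid.uuid4().hex,
        "ParentSpanId": parent_span_id,  # upper-level tracking node info
        "name": name,
    }


outgoing = inject({"Accept": "application/json"}, "123", "1")
span = child_span(outgoing, "b call d")
```

The new Span keeps the TraceId of the request and records the calling party's SpanId as its ParentSpanId, which is what later allows the call order to be reconstructed.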

If the service goes on to call other services, it creates a Span in the same tracking link and appends the SpanId of that Span and the TraceId to the HTTP header of the outgoing call. When the called service completes, the collector sends the information and timestamps of the Span to the Elasticsearch database.

In this embodiment, each calling party can complete its own execution of the request only after receiving the return information from the lower level; referring to fig. 2, each lower-level service node must respond to the request of its upper-level service node. After the request execution completes, the timestamp at that moment is also sent to the Elasticsearch database.

Taking the call relationship shown in fig. 3 as an example, Table 1 below shows part of the data records in the Elasticsearch database:

TABLE 1

S25, steps S21 to S24 are executed repeatedly until no calling party has a next service node;

wherein the request initiator serves as the initial calling party, the upper-level tracking node information in the tracking node created by the request initiator is empty, each calling party may have multiple called parties at the same time, and the tracking node may accordingly hold information for multiple called parties.

In this way, the paths of all services called by one request can be found through the TraceId, the order of the calls can be derived from the ParentSpanId, and the response time of each called service can be derived from the recorded timestamps.
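As a rough illustration of how the call order and response times follow from these fields, the sketch below rebuilds a call chain from unordered Span records of one trace. The record layout, the sample values (times given as plain seconds) and the function name `call_chain` are assumptions chosen for the example.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (span_id, parent_span_id, name, begin_seconds, end_seconds); sample values.
Record = Tuple[str, Optional[str], str, int, int]

RECORDS: List[Record] = [
    ("2", "0", "a call c", 3, 12),
    ("0", None, "request a", 1, 12),
    ("1.1", "1", "b call d", 3, 3),
    ("1", "0", "a call b", 2, 10),
]


def call_chain(records: List[Record]) -> List[Tuple[int, str, int]]:
    """Return (depth, name, response_time) in call order for one trace."""
    children: Dict[Optional[str], List[Record]] = defaultdict(list)
    for rec in records:
        children[rec[1]].append(rec)  # group Spans by ParentSpanId

    chain: List[Tuple[int, str, int]] = []

    def visit(parent: Optional[str], depth: int) -> None:
        # Siblings are ordered by their begin timestamp.
        for span_id, _, name, begin, end in sorted(children[parent],
                                                   key=lambda r: r[3]):
            chain.append((depth, name, end - begin))
            visit(span_id, depth + 1)

    visit(None, 0)  # the initiator's Span has an empty parent
    return chain


CHAIN = call_chain(RECORDS)
for depth, name, seconds in CHAIN:
    print("  " * depth + f"{name}: {seconds}s")
```

The depth-first walk starts from the Span whose parent is empty and indents each level, giving a compact text view of the call chain.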

S3, tracking link information is acquired from the preset database, and whether the request is abnormal is judged according to the data of each tracking node in the tracking link information.

S4, the server analyzes the data of each tracking node in the tracking link in the Elasticsearch database through the Elasticsearch data analysis engine to generate a complete call chain.

In this embodiment, the tracking links in the Elasticsearch database are analyzed by the Elasticsearch data analysis engine to generate a complete call chain. With the complete call chain of a request available, a problem that occurred during execution of the request can very probably be reproduced.

The invention achieves the following:

automatic data collection;

data analysis that produces a complete call chain, so that a problem can be reproduced with high probability from the complete call chain of the request;

visualization of the data and of the performance of each component, which helps locate system bottlenecks and find problems in time.

Each specific link of a request can therefore be located, so that request link tracking is easy to implement and the performance bottleneck of each module can be located and analyzed.

Referring to fig. 3, a second embodiment of the present invention is as follows:

A method for identifying a request abnormality, which differs from the first embodiment in that:

in this embodiment, step S3 specifically analyzes, through the Elasticsearch data analysis engine, the data of each tracking node in the tracking link in the Elasticsearch database, and determines possible weak points or bottlenecks of the system services according to the completion time of the request.

The duration of a request execution is computed from its start time and completion time. Abnormal request executions (for example, those whose duration exceeds a preset threshold or exceeds the average duration of request execution in the system) are then listed together with their calling relationships, from which the possible weak points or bottlenecks of the system are obtained.

For example, taking the request call of fig. 3 as an example, part of the data in its Elasticsearch database is shown in Table 2 below:

TABLE 2

trace_id  span_id  parent_span_id  span_name  gmt_begin  gmt_end
123       0        (empty)         request a  14:00:01   14:00:12
123       1        0               a call b   14:00:02   14:00:10
123       1.1      1               b call d   14:00:03   14:00:03
123       2        0               a call c   14:00:03   14:00:12

In this embodiment, the average duration of request execution in the system is used as the criterion for abnormality identification. Assuming this average is 5 seconds, Table 2 shows that request a takes 11 seconds, a call b takes 8 seconds and a call c takes 9 seconds, all of which exceed the average, while b call d completes within the same second. According to the calling relationship of the request, the excess duration of request a is caused by its two sub-calls, a call b and a call c. Node c has no lower-level call, and the only lower-level call of node b (b call d) is not abnormal, so it can be judged that service execution at node b and node c is abnormal and that these nodes are possible weak points or bottlenecks in system service execution. The output analysis result is:

At the same time, the service interface names of the two services b and c and the corresponding server names are output, so that developers can further analyze the causes and troubleshoot and correct the problems.
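The reasoning of this embodiment can be reproduced with a small sketch over the rows of Table 2. The rule used here, that a slow call is reported only when none of its own sub-calls is slow, is one reading of the analysis described above; the function names and the tuple layout are hypothetical.

```python
from datetime import datetime
from typing import List, Optional, Tuple

# Rows of Table 2: (span_id, parent_span_id, span_name, gmt_begin, gmt_end).
Row = Tuple[str, Optional[str], str, str, str]

ROWS: List[Row] = [
    ("0", None, "request a", "14:00:01", "14:00:12"),
    ("1", "0", "a call b", "14:00:02", "14:00:10"),
    ("1.1", "1", "b call d", "14:00:03", "14:00:03"),
    ("2", "0", "a call c", "14:00:03", "14:00:12"),
]


def duration_seconds(begin: str, end: str) -> float:
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(begin, fmt)
    return delta.total_seconds()


def find_bottlenecks(rows: List[Row], threshold_seconds: float) -> List[str]:
    """Return slow calls whose slowness is not explained by a slow sub-call."""
    durations = {r[0]: duration_seconds(r[3], r[4]) for r in rows}
    slow = {sid for sid, d in durations.items() if d > threshold_seconds}
    # Parents that have at least one slow child are explained by that child.
    explained = {r[1] for r in rows if r[0] in slow and r[1] is not None}
    return [r[2] for r in rows if r[0] in slow and r[0] not in explained]


BOTTLENECKS = find_bottlenecks(ROWS, 5)
print(BOTTLENECKS)
```

With the 5-second average as the threshold, request a is slow but explained by its sub-calls, so only a call b and a call c are reported.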

The third embodiment of the invention is as follows:

A method for identifying a request abnormality, which differs from the second embodiment in that:

error information is returned, request interrupt information is supplemented in the tracking node, and the tracking link is correspondingly marked as an abnormal link.

In this embodiment, abnormal information generated while calling a downstream service is captured and used to determine automatically whether the abnormality is a network communication problem or a code error. If it is a network communication problem, the service call is automatically retried, and once the number of retries exceeds a preset threshold, the abnormal code, the calling party information and the called party information in the call request are recorded and associated with the tracking link. If it is a code error, the abnormal code and the calling party information in the call request are recorded directly and associated with the tracking link.

After the abnormal information is recorded, the current service node returns error information to the previous service node. The previous service node then supplements the request interrupt information in the tracking node it created for the call to the current node, and marks the current tracking link as an abnormal link.

Step S3 further comprises the steps of:

S31, all abnormal links are acquired, and abnormal service nodes are located according to the information associated with the abnormal links and the interrupt information in the abnormal links.

In this embodiment, the abnormal link information stored in the preset database within a preset time period is obtained, for example all abnormal link information within the last 3 days. The interrupt information is used to determine which service node was interrupted, and the number of interruptions of each service node is counted. For example, within three days node a is interrupted 1 time, node b 10 times and node c 2 times. The number of interruptions of each service node is then compared with a preset value to judge whether the node is abnormal. For example, in this embodiment the preset value is 3: a node whose number of interruptions within 3 days exceeds 3 is considered abnormal; otherwise the interruptions are considered sporadic and no processing is performed.
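The counting rule can be written down in a few lines. The sketch assumes each interruption has already been reduced to a `(node, timestamp)` pair, which is a simplification of the interrupt information stored in the abnormal links; the function name `abnormal_nodes` and the sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import List, Tuple

Interrupt = Tuple[str, datetime]  # (interrupted service node, when)


def abnormal_nodes(interrupts: List[Interrupt], now: datetime,
                   window: timedelta, preset_value: int) -> List[str]:
    """Nodes whose interruption count inside the window exceeds the preset."""
    since = now - window
    counts = Counter(node for node, when in interrupts
                     if since <= when <= now)
    return sorted(node for node, n in counts.items() if n > preset_value)


NOW = datetime(2022, 6, 30, 12, 0, 0)
INTERRUPTS: List[Interrupt] = (
    [("a", NOW - timedelta(hours=1))]                           # a: 1 time
    + [("b", NOW - timedelta(hours=h)) for h in range(1, 11)]   # b: 10 times
    + [("c", NOW - timedelta(hours=h)) for h in (2, 30)]        # c: 2 times
    + [("c", NOW - timedelta(days=5))]    # outside the window, ignored
)

RESULT = abnormal_nodes(INTERRUPTS, NOW, timedelta(days=3), 3)
print(RESULT)
```

With the example counts from the text and a preset value of 3, only node b is reported; the interruptions of nodes a and c are treated as sporadic.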

For each service node identified as abnormal, the information recorded at each interruption is obtained according to the node information of the path reached when the interruption occurred, and is then analyzed. For example, when a network communication problem occurs, both the calling party information (the current service node) and the called party information (the next service node) are recorded. Because a network communication problem may be caused by either the upstream node or the downstream node, the abnormal code is counted for a node even when that node appears in the record only as the called party.

The number of network communication problems, the number of code errors and the number of occurrences of each corresponding abnormal code are counted, and an abnormality analysis table is generated and output.

For example:

The fourth embodiment of the invention is as follows:

In step S3 of this embodiment, after some service nodes (such as nodes b and c) are identified as abnormal, the analysis result is not output directly. Instead, the number of abnormal occurrences is recorded through an abnormality count, and when that count reaches a preset threshold, all of the abnormal requests are retrieved and a commonality analysis is performed on them automatically.

In this embodiment, when the number of abnormalities of b and c reaches 200, the specific contents of those 200 requests are retrieved for commonality analysis. Taking node b as an example, assume that node b reads the content of a document file specified by the user (doc, docx, pdf, txt and so on). Comparing the data contents of the call requests shows what they have in common: every file requested is a pdf file, and processing takes longer only when a pdf file is read. The system therefore outputs the common data parameter of the requests:

for example:

FileName

a.pdf

13.pdf

chapter 33.pdf

Commonality: pdf.

From the content output by the system, a developer can readily see that reading pdf files is very likely the cause of the slow processing, so the problem can be investigated and corrected in a targeted way, which improves working efficiency.
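A very small version of such a commonality analysis, restricted to the file extension of a `FileName` parameter, might look like this. The function name `common_extension`, the `min_share` parameter and the choice to compare only extensions are assumptions for illustration; the text does not specify how the comparison is carried out.

```python
import os
from collections import Counter
from typing import List, Optional


def common_extension(file_names: List[str],
                     min_share: float = 1.0) -> Optional[str]:
    """Return the extension shared by at least min_share of the requests."""
    extensions = [os.path.splitext(name)[1].lstrip(".").lower()
                  for name in file_names]
    if not extensions:
        return None
    ext, count = Counter(extensions).most_common(1)[0]
    if ext and count / len(extensions) >= min_share:
        return ext
    return None  # no extension is common enough to report


ABNORMAL_REQUEST_FILES = ["a.pdf", "13.pdf", "chapter 33.pdf"]
COMMONALITY = common_extension(ABNORMAL_REQUEST_FILES)
print(f"Commonality: {COMMONALITY}")
```

Lowering `min_share` below 1.0 would let the analysis tolerate a few requests that do not share the dominant attribute.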

The fifth embodiment of the invention is as follows:

A storage medium having stored therein a computer program which, when executed, performs the steps of the method for identifying a request abnormality according to any one of embodiments one to three above.

In summary, in the method and the storage medium for identifying a request abnormality provided by the invention, a tracking link is established, and each node in the tracking link records the calling party, the called party, the timestamp and the previous node of the corresponding call, so that the call link of a request can be tracked completely. The timestamps recorded in the tracking link give the execution and response time of each called service, which reflects the execution condition and service performance of the request. On this basis, whether the request is abnormal is judged and the abnormal service node is located, which helps developers correct and improve that service node. Abnormal situations that occur during request execution can also be analyzed and handled accordingly.

The foregoing is merely illustrative of embodiments of the present invention and does not limit its scope. All equivalent changes made on the basis of the specification and drawings of the present invention, or direct or indirect applications in related technical fields, are likewise included in the scope of protection of the present invention.

Claims

1. A method for identifying a request anomaly, comprising the steps of:

the step S2 includes the steps of:

when the step S22 is executed and the call request is sent to the called party, if abnormal information appears, the abnormal code in the call request is obtained, and whether the call request belongs to a network communication problem or is a code error is automatically identified according to the abnormal code;

returning error information to the previous service node, supplementing request interruption information in the tracking node, and correspondingly marking the tracking link as an abnormal link;

S24, the called party executes the call request, takes the called party as a new calling party, judges whether a next service node exists, if so, takes the next service node as the new called party and executes the step S21 and the step S22, otherwise, returns an execution result to the previous service node, and the previous service node supplements a completion timestamp in the tracking node it created;

wherein the request initiator serves as the initial calling party, the upper-level tracking node information in the tracking node created by the request initiator is empty, and each calling party simultaneously has a plurality of called parties, for which a plurality of tracking nodes are correspondingly created;

S3, acquiring tracking link information from the preset database, judging whether the request is abnormal according to the data of each tracking node in the tracking link information, and locating the abnormal service node;

the step S3 further includes the steps of:

S31, acquiring all abnormal links, and locating abnormal service nodes according to the information associated with the abnormal links and the interrupt information in the abnormal links;

step S31 includes the steps of:

obtaining abnormal link information in a preset database in a preset time period, determining service nodes with interruption according to the interruption information, counting the interruption times of each service node, and judging whether the service nodes are sporadically interrupted or abnormal by comparing the interruption times of each service node with a preset value.

2. The method of claim 1, wherein if the caller is a client, the self information includes a device identifier, network information, a user Id, and an IP address, and if the caller is a service interface in a server, the self information includes a domain name, a method name, and parameter information;

3. The method for identifying a request abnormality of claim 1, wherein the preset database is an Elasticsearch database;

the step S3 further includes the steps of:

S4, the server analyzes the data of each tracking node in the tracking link in the Elasticsearch database through an Elasticsearch data analysis engine to generate a complete call chain.

4. A storage medium having stored therein a computer program for identifying a request abnormality, the computer program when executed performing the steps of:

the step S2 includes the steps of:

the step S3 further includes the steps of:

step S31 includes the steps of:

5. The storage medium of claim 4, wherein if the caller is a client, the self information includes a device identifier, network information, a user Id, and an IP address, and if the caller is a service interface in a server, the self information includes a domain name, a method name, and parameter information;

6. The storage medium of claim 4, wherein the preset database is an Elasticsearch database;

the step S3 further includes the steps of: