CN115017218B - Processing method and device of distributed call chain, storage medium and electronic equipment - Google Patents


Info

Publication number
CN115017218B
Authority
CN
China
Prior art keywords
call chain
data
distributed call
format
application
Prior art date
Legal status
Active
Application number
CN202210691815.3A
Other languages
Chinese (zh)
Other versions
CN115017218A
Inventor
郑永坤
陈康
陈翀
付华峥
韦登荣
Current Assignee
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date
Filing date
Publication date
Application filed by China Telecom Corp Ltd
Priority to CN202210691815.3A
Publication of CN115017218A
Application granted
Publication of CN115017218B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 - Integrating or interfacing systems involving database management systems
    • G06F16/258 - Data format conversion from or to a database
    • G06F16/24 - Querying
    • G06F16/245 - Query processing
    • G06F16/2458 - Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 - Distributed queries
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The disclosure relates to a method and a device for processing a distributed call chain, a storage medium and electronic equipment, and relates to the field of computer technology. The method includes: intercepting original log data and parsing it to obtain a distributed call chain in a first data format included in the original log data; performing format conversion on the distributed call chain in the first data format to obtain a distributed call chain in a second data format; compressing the distributed call chain in the second data format based on a preset compression algorithm to obtain a compressed distributed call chain; and sending the compressed distributed call chain to a Kafka cluster so that the Kafka cluster stores it. The present disclosure improves the integrity of data.

Description

Processing method and device of distributed call chain, storage medium and electronic equipment
Technical Field
The embodiment of the disclosure relates to the technical field of computers, in particular to a processing method of a distributed call chain, a processing device of the distributed call chain, a computer readable storage medium and electronic equipment.
Background
In the prior art, call chain data is stored in HBase, a distributed, column-oriented open-source database, and the format of the call chain data is not a standard hexadecimal format and is not easy to read.
It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the present disclosure, and thus may include information that does not constitute prior art already known to those of ordinary skill in the art.
Disclosure of Invention
The present disclosure aims to provide a processing method of a distributed call chain, a processing device of the distributed call chain, a computer-readable storage medium and an electronic device, so as to overcome, at least to a certain extent, the problem that call chain data is difficult to read owing to the limitations and defects of the related art.
According to one aspect of the present disclosure, there is provided a method for processing a distributed call chain, including:
intercepting original log data, and analyzing the original log data to obtain a distributed call chain with a first data format included in the original log data;
performing format conversion on the distributed call chain with the first data format to obtain a distributed call chain with a second data format;
carrying out data compression on the distributed call chain with the second data format based on a preset compression algorithm to obtain a compressed distributed call chain;
and sending the compressed distributed call chain to a Kafka cluster, so that the Kafka cluster stores the compressed distributed call chain.
In one exemplary embodiment of the present disclosure, the first data format is an object structure data format presented in the form of key-value pairs, and the second data format is a standard JSON string format;
the method for converting the format of the distributed call chain with the first data format to obtain the distributed call chain with the second data format comprises the following steps:
and converting the distributed call chain of the object structure data format presented in the form of key value pairs into a standard JSON string format.
In an exemplary embodiment of the present disclosure, the preset compression algorithm includes a GZIP compression algorithm;
the data compression is performed on the distributed call chain with the second data format based on a preset compression algorithm, so as to obtain a compressed distributed call chain, which comprises the following steps:
constructing a file head of a GZIP format file and a file tail of the GZIP format file, and analyzing the distributed call chain with the second data format to obtain an original text, a matching length and an offset distance which are included in the distributed call chain with the second data format;
and performing Huffman coding on the original text, the matching length and the offset distance to obtain a data block in the Deflate format, and packaging the file header, the file tail and the Deflate-format data block to obtain a compressed distributed call chain.
In an exemplary embodiment of the present disclosure, constructing a header of a GZIP format file includes:
setting a first byte of the GZIP format check code in the file header as a first preset value, and setting a second byte of the GZIP format check code as a second preset value;
setting a compression algorithm identifier in the file header to a third preset value, and setting each bit of the flag bit in the file header to zero;
and setting the source file timestamp in the file header as the current time, and setting the additional mark and the operating system mark in the file header as fourth preset values.
In an exemplary embodiment of the present disclosure, intercepting original log data and parsing the original log data to obtain a distributed call chain having a first data format included in the original log data, includes:
modifying the source code of the log data acquisition program in the application performance management tool to obtain a data interception program with a data interception function;
Compiling and deploying the application performance management tool after source code modification, and intercepting the original log data based on a data interception program in the application performance management tool after compiling and deploying;
and carrying out internal conversion on the intercepted original log data to obtain a distributed call chain with a first data format included in the original log data.
In one exemplary embodiment of the present disclosure, the distributed log storage cluster is a Kafka storage cluster;
the distributed call chain comprises a plurality of application names, transaction IDs, IDs of current span information, IDs of upstream span information, application starting time, application response time, interface names called by the application and time consumption of the application.
In an exemplary embodiment of the present disclosure, the processing method of the distributed call chain further includes:
acquiring a compressed distributed call chain from the Kafka storage cluster, and decompressing the compressed distributed call chain to obtain the application name, transaction ID, ID of the current span information, ID of the upstream span information, application start time, application response time, interface name called by the application, and time consumed by the application that are included in the distributed call chain;
Generating a software and hardware knowledge graph according to the application name, the transaction ID, the ID of the current span information, the ID of the upstream span information, the application starting time, the application response time, the interface name called by the application and the time consumption of the application;
acquiring original log data and original index data included in the original log data according to the transaction ID, and predicting abnormality of the original log data and the original index data based on a preset time sequence neural network model;
if the prediction result is that the original log data and the original index data are abnormal, positioning an abnormal root cause according to the software and hardware knowledge graph, and carrying out abnormal alarm based on the positioning result.
According to one aspect of the present disclosure, there is provided a processing apparatus of a distributed call chain, including:
the log data interception module is used for intercepting original log data and analyzing the original log data to obtain a distributed call chain with a first data format included in the original log data;
the format conversion module is used for carrying out format conversion on the distributed call chain with the first data format to obtain a distributed call chain with a second data format;
The data compression module is used for carrying out data compression on the distributed call chain with the second data format based on a preset compression algorithm to obtain a compressed distributed call chain;
and the distributed call chain storage module is used for sending the compressed distributed call chain to a Kafka cluster so that the Kafka cluster stores the compressed distributed call chain.
According to one aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of processing a distributed call chain of any of the above.
According to one aspect of the present disclosure, there is provided an electronic device including:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the method of processing a distributed call chain as described in any one of the preceding claims via execution of the executable instructions.
According to the processing method of the distributed call chain, on one hand, original log data can be directly intercepted and parsed to obtain the distributed call chain in the first data format included in the original log data; the distributed call chain in the first data format is then converted into the distributed call chain in the second data format; the distributed call chain in the second data format is compressed based on a preset compression algorithm to obtain a compressed distributed call chain; and the compressed distributed call chain is sent to the Kafka cluster so that the Kafka cluster stores it. Since the distributed call chain stored in the Kafka cluster is in the second data format, it can be read from the Kafka cluster directly, which solves the problem in the prior art that the distributed call chain is not easy to read. On the other hand, the distributed call chain in the second data format is also compressed before being stored, so the sampling rate of the log can be increased under the same storage space and data integrity is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure. It will be apparent to those of ordinary skill in the art that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived from them without undue effort.
Fig. 1 schematically illustrates a flow chart of a method of processing a distributed call chain according to an example embodiment of the present disclosure.
Fig. 2 schematically illustrates a block diagram of a distributed link tracking system according to an example embodiment of the present disclosure.
Fig. 3 schematically illustrates an example diagram of an application scenario of a distributed link tracking system according to an example embodiment of the present disclosure.
FIG. 4 schematically illustrates an example diagram of the processing of a distributed call chain in a standard JSON string format, according to an example embodiment of the present disclosure.
FIG. 5 schematically illustrates a compressed example diagram of a distributed call chain having a second data format resulting in a compressed distributed call chain, according to an example embodiment of the present disclosure.
FIG. 6 schematically illustrates a specific example diagram of an application scenario for invoking chain alarms according to an example embodiment of the present disclosure.
Fig. 7 schematically illustrates a flowchart of a method of handling an anomaly alarm according to an example embodiment of the present disclosure.
Fig. 8 schematically illustrates a flow diagram of a training and predicting operation framework of a time-series neural network model according to an example embodiment of the present disclosure.
Fig. 9 schematically illustrates a block diagram of a processing device for a distributed call chain according to an example embodiment of the present disclosure.
Fig. 10 schematically illustrates an electronic device for implementing the above-described processing method of a distributed call chain according to an exemplary embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the present disclosure. One skilled in the relevant art will recognize, however, that the aspects of the disclosure may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor devices and/or microcontroller devices.
The Pinpoint system (an open-source distributed link tracking system) is an APM (Application Performance Management) tool written in Java for large-scale distributed systems. Pinpoint may include a Pinpoint Agent (probe module) and a Pinpoint Collector (acquisition module). When the acquisition module acquires original log data through log instrumentation (embedded points), the following defects exist: on one hand, the log instrumentation method needs to support embedded points for both the Log4j and Logback logging frameworks, which is complex to manage; on another hand, the log instrumentation requires the parameters related to the Pinpoint call chain log to be added to each application's log template, and if the parameters are not added in advance, the application needs to be restarted for them to take effect, which reduces the reliability of the system; on yet another hand, after embedded points are added to the log, the log is collected through a collecting component and then written to Kafka, which enlarges the technology stack and increases the maintenance cost.
Moreover, the underlying data of the Pinpoint system is stored in an HBase table; since the current data volume is large and the HBase table uses the row key as its only index, the efficiency of querying target data is extremely low. Meanwhile, the underlying data processing of the Pinpoint system adopts bytecode enhancement. This approach has the advantages of being convenient for developers to modify code and of collecting more accurate data. Its disadvantages are that encoding and parsing steps are needed when data is written to storage and queried, which multiplies the system overhead; the technical risk is high; and the problem has not been solved by the open-source team. Further, the data format of the call chain data stored in HBase is not a conventional hexadecimal format and cannot be directly read; nor is it in JSON format, which is detrimental to integrated interaction with other systems.
Based on this, in this exemplary embodiment, a method for processing a distributed call chain is provided first, where the method may run on a server, a server cluster, or a cloud server, etc.; of course, those skilled in the art may also operate the methods of the present disclosure on other platforms as desired, which is not particularly limited in the present exemplary embodiment. Referring to fig. 1, the processing method of the distributed call chain may include the steps of:
S110, intercepting original log data, and analyzing the original log data to obtain a distributed call chain with a first data format, wherein the distributed call chain is included in the original log data;
s120, performing format conversion on the distributed call chain with the first data format to obtain a distributed call chain with a second data format;
s130, carrying out data compression on the distributed call chain with the second data format based on a preset compression algorithm to obtain a compressed distributed call chain;
and S140, sending the compressed distributed call chain to a Kafka cluster, so that the Kafka cluster stores the compressed distributed call chain.
In the processing method of the distributed call chain, on one hand, the original log data can be directly intercepted and parsed to obtain the distributed call chain in the first data format included in the original log data; the distributed call chain in the first data format is then converted into the distributed call chain in the second data format; the distributed call chain in the second data format is compressed based on a preset compression algorithm to obtain a compressed distributed call chain; and the compressed distributed call chain is sent to the Kafka cluster so that the Kafka cluster stores it. Since the distributed call chain stored in the Kafka cluster is in the second data format, it can be read from the Kafka cluster directly, which solves the problem in the prior art that the distributed call chain is not easy to read. On the other hand, the distributed call chain in the second data format is also compressed before being stored, so the sampling rate of the log can be increased under the same storage space and data integrity is improved.
Hereinafter, a method of processing a distributed call chain according to an exemplary embodiment of the present disclosure will be explained and illustrated in detail with reference to the accompanying drawings.
First, the object of the present disclosure of the exemplary embodiment is explained and explained.
The disclosed example embodiment provides a Pinpoint-based processing method of a distributed call chain. Through secondary development of the open-source distributed link tracking system Pinpoint, a data interception module is added; the Span data originally collected by Pinpoint Collector and sent to HBase is converted into a standard JSON format, compressed with the GZIP algorithm, and then sent to the message middleware Kafka. In addition, the processing method of the distributed call chain recorded in this example embodiment is non-invasive to the application, does not need to consider which log framework the application uses, does not require logs to be configured and collected, is simple and easy to maintain, and has strong data integrity; meanwhile, the method can be applied to fault early-warning and intelligent positioning scenarios of various large systems and services, and provides a convenient and reliable data source for subsequent application development, machine learning, and the like.
Next, the tracing principle of the distributed link tracking system is explained. Specifically, under a distributed service architecture, a Web request flows in from the gateway and may call multiple services to process the request before the final result is obtained. The communication between the services in this process is itself a separate network request, and whichever service the request passes through, a failure or overly slow processing will have an impact on the front end. Further, there are two important concepts in distributed link tracking: the trace and the span. The trace is the view of an entire request linking through the distributed system, a span represents the view inside one service on that link, and the spans combined together make up the entire trace.
In the call chain of the whole request, the request always carries the traceid as it is transmitted to downstream services, and each service internally generates its own span to build its own internal call view and transmits it to downstream services together with the traceid. The traceid remains unchanged throughout the call chain of the request, so all logs recorded by the system during the entire request can be queried through the traceid in the original log data. After the request reaches each service, the service generates a span for the request, and the span transmitted from the upstream service along with the request is recorded as the parent span, or pspanid; when the span generated by the current service is transmitted to a downstream service along with the request, it is recorded as the pspanid by that downstream service. The traceid, spanid and pspanid recorded in the access log and the service log can completely restore the call link view of the whole request, which greatly helps error checking.
Further, the distributed link tracking system described in the exemplary embodiments of the present disclosure is explained. Specifically, referring to FIG. 2, the distributed link tracking system may include a Client 210, a Gateway 220, a Resource Service (Resource) 230 and an authentication Service (Auth Service) 240. The Gateway is provided with a RESTful API and an RPC (Remote Procedure Call Protocol) interface; that is, the Gateway may be connected to the API in Resource and to the RPC in Auth Service through the RESTful API and the RPC, respectively. Specifically, in the application process, when the web request first reaches the gateway, the traceid of the request and the span within the gateway service are generated; the traceid and span are then placed in the HTTP request header or in the metadata of the RPC call and continue to be passed downward when downstream services are called. Further, the global routing middleware of the downstream RESTful API service and the interceptor of the RPC service receive the traceid carried by the request and generate the span of the current request inside the service, where the span received from upstream is converted into the pspanid. A specific application scenario may be shown in fig. 3.
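To make the propagation described above concrete, the following is a minimal, illustrative Java sketch of how a gateway-side component might create the trace context and forward it in HTTP headers. The header names, the SpanContext record and the TraceContextPropagation class are assumptions for illustration only and are not Pinpoint's actual wire format:
import java.net.http.HttpRequest;
import java.util.UUID;

public final class TraceContextPropagation {

    static final String TRACE_ID_HEADER = "X-Trace-Id";             // assumed header name
    static final String PARENT_SPAN_ID_HEADER = "X-Parent-Span-Id"; // assumed header name

    /** Carries the traceid, the current span id and the upstream (parent) span id. */
    record SpanContext(String traceId, String spanId, String parentSpanId) {}

    /** Called when a request first reaches the gateway: generate the traceid and the root span. */
    static SpanContext startTrace() {
        return new SpanContext(UUID.randomUUID().toString(), UUID.randomUUID().toString(), null);
    }

    /** Called before a downstream call: keep the traceid and pass the current span id as the parent. */
    static HttpRequest.Builder inject(SpanContext ctx, HttpRequest.Builder request) {
        return request.header(TRACE_ID_HEADER, ctx.traceId())
                      .header(PARENT_SPAN_ID_HEADER, ctx.spanId());
    }
}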
It should be noted that, the processing method of the distributed call chain described in the exemplary embodiment of the present disclosure processes the distributed call chain generated by the web request in the whole request process. The processing method of the distributed call chain shown in fig. 1 is further explained and explained below with reference to fig. 2 and 3.
In step S110, original log data is intercepted, and the original log data is parsed, so as to obtain a distributed call chain with a first data format included in the original log data.
Specifically, to intercept the original log data, modification of the Pinpoint source code is first required. The specific implementation process can be implemented by the following steps: modifying the source code of the log data acquisition program in the application performance management tool to obtain a data interception program with a data interception function; compiling and deploying the application performance management tool after source code modification, and intercepting the original log data based on a data interception program in the application performance management tool after compiling and deploying; and carrying out internal conversion on the intercepted original log data to obtain a distributed call chain with a first data format included in the original log data.
Specifically, in the process of modifying the source code of the log data acquisition program, the main modified part is under the com. package. Taking the Pinpoint Span as an example, the pseudo code of the data interception module is as follows, where the handleSimple method under the ThriftSpanHandler is the part mainly modified; for example, a kafkaProducerGroup.put(spanBo) call may be added to handleSimple to intercept the data and write it to Kafka.
The specific new program may be as follows:
private void handleSimple(TBase<?, ?> tbase) {
    // spanFactory, kafkaProducerManager and traceService are injected fields of the handler
    final TSpan tSpan = (TSpan) tbase;
    final SpanBo spanBo = spanFactory.buildSpanBo(tSpan);
    // Newly added: obtain the "span" producer group and write the intercepted span to Kafka
    final var kafkaProducerGroup = kafkaProducerManager.getGroup("span");
    kafkaProducerGroup.put(spanBo);
    // Original behaviour: persist the span as before
    traceService.insertSpan(spanBo);
}
Furthermore, after the new code is completed, it can be compiled and deployed, achieving the effect of modifying once and using long term: no log embedded point is needed, and no related configuration or collection is needed when a new application is added. In this way, the interception efficiency of the original log data can be improved, thereby improving the processing effect of the distributed call chain.
Furthermore, after compiling and deployment are completed, the original log data can be intercepted based on the data interception program; further, internal conversion is carried out on the intercepted original log data, and a distributed call chain with a first data format is obtained, wherein the distributed call chain is included in the original log data; that is, the intercepted Agent original data may be internally converted to form object structure data.
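For illustration only, the object structure data (the first data format) can be pictured as a key-value view of one span. The SpanRecord helper below and its field names are a sketch based on the fields listed later in this disclosure, not Pinpoint's actual SpanBo class:
import java.util.LinkedHashMap;
import java.util.Map;

public final class SpanRecord {

    /** Builds the key-value view of one intercepted span after internal conversion. */
    static Map<String, Object> toKeyValue(String applicationName, String transactionId,
                                          long spanId, long parentSpanId,
                                          long startTime, int elapsed, String rpc) {
        Map<String, Object> span = new LinkedHashMap<>();
        span.put("applicationName", applicationName); // application name
        span.put("transactionId", transactionId);     // transaction ID (traceid)
        span.put("spanId", spanId);                    // ID of the current span information
        span.put("parentSpanId", parentSpanId);        // ID of the upstream span information
        span.put("startTime", startTime);              // application start time
        span.put("elapsed", elapsed);                  // application response time
        span.put("rpc", rpc);                          // interface name called by the application
        return span;
    }
}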
In step S120, format conversion is performed on the distributed call chain with the first data format, so as to obtain a distributed call chain with the second data format.
Specifically, the first data format is an object structure data format presented in key-value pair form, and the second data format is the standard JSON string format; that is, the first data format is an object structure data format presented in the form of Key-Value pairs. The key may be the traceid (i.e., the transaction ID) in the original log data, and the value may be the application name, the ID of the current span information, the ID of the upstream span information, the application start time, the application response time, the interface name called by the application, the time consumed by the application, and so on.
Specifically, in the process of performing format conversion on the distributed call chain with the first data format to obtain the distributed call chain with the second data format, the distributed call chain with the object structure data format presented in the form of key value pairs can be directly converted into the distributed call chain with the standard JSON string format. The obtained standard JSON string format distributed call chain can be specifically shown in fig. 4.
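As a minimal sketch of this conversion, assuming a JSON library such as Jackson is available (the modified collector may serialize the span differently), the key-value view from the sketch above can be turned into a standard JSON string as follows:
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.Map;

public final class SpanJsonConverter {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    /** Converts the key-value object structure (first data format) into a standard JSON string (second data format). */
    static String toJson(Map<String, Object> span) throws Exception {
        return MAPPER.writeValueAsString(span);
    }
}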
In step S130, data compression is performed on the distributed call chain with the second data format based on a preset compression algorithm, so as to obtain a compressed distributed call chain; the preset compression algorithm comprises a GZIP compression algorithm.
In this example embodiment, first, a file header of a GZIP format file and a file tail of the GZIP format file are constructed, and the distributed call chain in the second data format is parsed to obtain the original text, the matching length and the offset distance included in the distributed call chain in the second data format; then Huffman coding is performed on the original text, the matching length and the offset distance to obtain a data block in the Deflate format, and the file header, the file tail and the Deflate-format data block are packaged to obtain a compressed distributed call chain.
The constructing the file header of the GZIP format file may include: setting a first byte of the GZIP format check code in the file header as a first preset value, and setting a second byte of the GZIP format check code as a second preset value; setting a compression algorithm identifier in the file header to a third preset value, and setting each bit of the flag bit in the file header to zero; and setting the source file timestamp in the file header as the current time, and setting the additional mark and the operating system mark in the file header as fourth preset values.
Hereinafter, a specific compression process will be explained. Specifically, a GZIP-compressed file may include a file header, a file body and a file tail; and the header may include a fixed-length portion and an extension portion. Further, in a specific application process, the header of the GZIP format file and the tail of the GZIP format file may first be constructed. The file header may include the GZIP format check code, the compression method, the flag bits and the source file timestamp; the first byte of the GZIP format check code (Identification 1) may be set to a first preset value (ID1 = 31 (0x1f, \037)), and the second byte (Identification 2) is set to a second preset value (ID2 = 139 (0x8b, \213)); the compression method (CM, Compression Method) identifier is set to a third preset value, which may be 8, for example; and the additional flag and the operating system flag may be set to 0.
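The following illustrative sketch writes the ten fixed GZIP header bytes with the values just described (ID1 = 31, ID2 = 139, CM = 8, flag bits zero, source timestamp set to the current time, additional flag and operating system flag set to 0); it is an example layout only, not the collector's actual implementation:
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public final class GzipHeaderExample {

    /** Writes the ten fixed header bytes of a GZIP member as described in this embodiment. */
    static byte[] build() {
        ByteBuffer header = ByteBuffer.allocate(10).order(ByteOrder.LITTLE_ENDIAN);
        header.put((byte) 0x1f);                                   // Identification1: first preset value, 31
        header.put((byte) 0x8b);                                   // Identification2: second preset value, 139
        header.put((byte) 8);                                      // CM: compression method, 8 = Deflate
        header.put((byte) 0);                                      // FLG: every flag bit set to zero
        header.putInt((int) (System.currentTimeMillis() / 1000L)); // MTIME: source file timestamp = current time
        header.put((byte) 0);                                      // XFL: additional flag set to 0
        header.put((byte) 0);                                      // OS: operating system flag set to 0
        return header.array();
    }
}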
Further, in the specific compression process, compression can be realized based on the Deflate algorithm. The Deflate algorithm is a lossless data compression algorithm that combines Huffman coding and LZ77 coding. Specifically, the LZ77 stage compresses the original data by exploiting the correlation of adjacent data; its input is the original data and its output is literal and distance-length data pairs. The Huffman stage then performs statistics on the LZ77 result to generate Huffman tables and Huffman-encodes the data: its input is the literal and distance-length data pairs, where the literals and lengths share one Huffman table and the distances use a separate Huffman table, and its output is the two Huffman tables and the Huffman-compressed data. Furthermore, the Huffman table data can be further compressed with Huffman coding after serialization and run-length coding; the input of this stage is the two Huffman tables, and the output is the Huffman table generated from the statistics of the tables together with the compressed data stream. A specific compression flow may be shown with reference to fig. 5.
It should be noted here that, since character strings contain many repeated substrings, the LZ77 compression algorithm compresses data by exploiting this feature. During compression, the already-processed data is searched to see whether the current characters have appeared before; if they have, only the distance to the earlier occurrence and the matched length are saved. The match has a maximum length, beyond which the sliding window needs to advance. When the next character sequence to be compressed can be found in the sliding window, the sequence is replaced by two numbers: one is the distance, representing how far back in the window the sequence starts, and the other is the length, i.e. the length of the matched string. The LZ77 algorithm looks for repeated character strings in the preceding historical data; repetition is a local phenomenon, and the basic assumption is that if a character string repeats, it repeats nearby rather than far away, so a sliding window is set for finding the data. The LZ77 sliding window is 32 KB, so the previous 32 KB is searched, and the window slides forward as encoding proceeds. Huffman coding is a prefix code in which each element has a corresponding codeword and no codeword is a prefix of another codeword; its basic principle is that characters with a high probability of occurrence are encoded with codewords that are as short as possible.
It should be further noted that, because of the large number of complex call relationships between services, a large number of messages are generated when Pinpoint collects these service data; by compressing the distributed call chain, the example embodiments of the present disclosure can increase the sampling rate of the log under the same storage space and improve the integrity of the data. Moreover, thanks to the lossless nature of the Deflate algorithm, the storage space of the Kafka cluster is saved while ensuring that the distributed call chain is lossless, so that data integrity is improved on top of the improved sampling rate, which further improves the user experience.
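As a compression sketch for step S130, the JDK's built-in GZIP implementation, which produces the header, Deflate data blocks and trailer layout described above, can stand in for the explicitly constructed file; this is an equivalent example rather than the collector's own code:
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public final class SpanCompressor {

    /** Compresses the JSON-format distributed call chain into a GZIP byte array. */
    static byte[] compress(String spanJson) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(spanJson.getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }
}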
In step S140, the compressed distributed call chain is sent to a Kafka cluster, so that the Kafka cluster stores the compressed distributed call chain.
At this point, the data format conversion and data compression of the distributed call chain have been completed. Further, the result may be sent to the Kafka cluster so that the Kafka cluster stores the compressed distributed call chain. In the stored distributed call links, the distributed call chain (i.e., the span) includes the application name (applicationName), the transaction ID (transactionId), the ID of the current span information (spanId), the ID of the upstream span information (parentSpanId), the application start time (startTime), the application response time (elapsed), the interface name called by the application (rpc), and the time consumed by the application, among others.
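A sketch of the sending step follows, reusing the SpanCompressor example above; the broker address and the choice of keying the record by transaction ID are assumptions, while the topic name "span" follows the producer group name used in the modified collector code:
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArraySerializer;
import org.apache.kafka.common.serialization.StringSerializer;

public final class SpanKafkaSender {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka-broker:9092");            // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", ByteArraySerializer.class.getName());

        // Compress one JSON span (see the SpanCompressor sketch above) and publish it.
        byte[] compressedSpan = SpanCompressor.compress("{\"transactionId\":\"demo-trace\"}");
        try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("span", "demo-trace", compressedSpan));
            producer.flush();
        }
    }
}
Keying the record by transaction ID would keep all spans of one trace in a single partition, which preserves their relative order for downstream consumers; this keying strategy is a design suggestion rather than part of the disclosed method.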
It should be noted that, with continued reference to fig. 3, the traceId described above may be used to identify a specific request; that is, when a user request enters the system, a globally unique traceId is generated at the first layer of the RPC call network and is continuously passed backward along with each layer of RPC calls, so that the paths taken by the user request in the system can be connected in series through the traceId. The spanId described above may be used to identify the position of an RPC call in the distributed request; that is, when the user's request enters the system, the spanId has an initial value of 0 at the first layer of the RPC call network, becomes 0.1 when entering the next layer of RPC calls and 0.1.1 when entering the layer after that, while an RPC call at the same layer as the 0.1 call has a spanId of 0.2. Therefore, the position of an RPC request in the system call, and who its upstream and downstream dependencies are, can be located through the spanId.
It can be seen so far that the method for processing a distributed call chain according to the exemplary embodiments of the present disclosure has at least the following advantages: on one hand, the processing method based on the Pinpoint distributed call chain provided by the example embodiment of the present disclosure is different from a general log embedded point method, does not need to configure and collect logs, can directly compress call chain data by modifying source codes and then store the compressed call chain data in Kafka, and is convenient and easy to maintain; on the other hand, the processing method based on the Pinpoint distributed call chain provided by the example embodiment of the disclosure is non-invasive to the application, does not need to consider which log framework is used by the program, and is simple to manage; further, through secondary development of the open source distributed link tracking system Pinpoint, the problem that calling link data are stored on HBase and are not easy to read is solved, span data are converted into a standard json format and are stored in a message middleware Kafka; meanwhile, GZIP algorithm compression is carried out on a large number of messages generated by the Pinpoint acquisition service, so that the sampling rate of data can be improved and the integrity of the data is improved under the same storage space.
Hereinafter, application scenarios of the distributed call chain obtained by processing original log data with the processing method described in the exemplary embodiment of the present disclosure will be explained with reference to the accompanying drawings. Specifically, the application scenarios of the resulting distributed call chain may include, but are not limited to, the following. Fault early warning: the collected log data is regularly learned and analyzed, and hidden anomaly risks are found in real time from massive data; that is, the time consumption and state of each request call of a service can be tracked based on the call chain list, so that hidden anomaly risks in the service execution process can be tracked. Fault localization: an alarm knowledge graph and a software and hardware knowledge graph are constructed from historical call chain data, and possible fault sources are analyzed and presented when a fault occurs. Intelligent decision: the running trend of equipment and components is predicted from the log data, and suggestions such as capacity expansion, capacity reduction and cleaning are issued in advance; furthermore, the intelligent fault positioning technique can be deployed so that early warnings and precautions are taken in advance, ensuring normal operation of applications on the telecom line. For example, call chain alarms: when multiple services alarm, only the alarm of the root-cause service is sent, and the other services are marked as affected services; meanwhile, alarm convergence is achieved through the call chain relations between services and a root cause analysis algorithm, helping operation and maintenance personnel quickly locate and solve problems; an exemplary application scenario of call chain alarms may be shown with reference to fig. 6. Automatic topology and alarms: the root-cause service of a fault and the middleware causing the fault are located through the service topology and alarms. Abnormality detection: after the call chain data between services is written into Kafka, a software and hardware knowledge graph is generated; combined with data such as logs and indices (already present in Kafka), a TFT model algorithm predicts whether the log and index data are abnormal, and abnormality alarms are then raised according to the knowledge graph.
The following explains and describes a specific implementation procedure of the abnormal root cause positioning with reference to fig. 7. Specifically, referring to fig. 7, a specific implementation process of the abnormal root cause positioning may include the following steps:
step S710, obtaining a compressed distributed call chain from the Kafka storage cluster, and decompressing the compressed distributed call chain to obtain an application name, a transaction ID, an ID of current span information, an ID of upstream span information, an application start time, an application response time, an interface name called by an application and time consumption of the application, which are included in the distributed call chain;
step S720, generating a software and hardware knowledge graph according to the application name, the transaction ID, the ID of the current span information, the ID of the upstream span information, the application starting time, the application response time, the interface name called by the application and the time consumption of the application;
step S730, acquiring original log data and original index data included in the original log data according to the transaction ID, and predicting anomalies of the original log data and the original index data based on a preset time sequence neural network model;
and step 740, if the prediction result is that the original log data and the original index data are abnormal, positioning an abnormal root cause according to the software and hardware knowledge graph, and carrying out abnormal alarm based on the positioning result.
Hereinafter, steps S710 to S740 will be explained. First, the above-mentioned time-series neural network model is explained. Specifically, the Temporal Fusion Transformer (TFT) is a time-series neural network model proposed by Google in 2019 on the basis of the Transformer model that has been highly successful in the text field. The TFT model can support multiple time series, is based on an attention model structure, offers interpretability and feature selection, and uses gating for feature compression, so it is fast. In the training process of the TFT model, the input and output data used by the TFT model may include the following items: target: the target value to be predicted; observed inputs: observation inputs, such as the value at the previous moment, which cannot be known in advance; known inputs: inputs known in advance, such as year, month, day and holidays; static inputs: static inputs that do not change, such as a store address; id: the time-series number, which is not a model input but only an index; time: the time index, which is likewise not a model input but only an index. Based on the above, the specific process of training the TFT model to be trained with the compressed historical distributed call chain can be implemented as follows:
Firstly, acquiring a compressed historical distributed call chain in a first preset time period (for example, one week) from a Kafka cluster, and decompressing the compressed historical distributed call chain to obtain an application name, a transaction ID, an ID of current span information, an ID of upstream span information, application starting time, application response time, an interface name called by an application and time consumption of the application, wherein the application name, the transaction ID, the ID of current span information, the ID of upstream span information, the application response time, the interface name called by the application and the time consumption of the application are included in the historical distributed call chain; secondly, a training data set is built according to the application name, the transaction ID, the ID of the current span information, the ID of the upstream span information, the application starting time, the application response time, the interface name called by the application and the application time consumption, and the training is carried out on the time sequence neural network model to be trained based on the training data set, so that the time sequence neural network model after training is completed is obtained. After obtaining the trained time-series neural network model, the trained time-series neural network model needs to be updated in a specific updating period, and specific updating modes can refer to the training process and are not repeated here.
Further, after the training time sequence neural network model is obtained, a compressed current distributed call chain in a second preset time period can be obtained from the Kafka cluster, and the compressed current distributed call chain is decompressed to obtain an application name, a transaction ID, an ID of current span information, an ID of upstream span information, application starting time, application response time, an interface name called by an application and application time consumption, which are included in the current distributed call chain;
Then, inputting the application name, the transaction ID, the ID of the current span information, the ID of the upstream span information, the application starting time, the application response time, the interface name called by the application and the time consumption of the application which are included in the current distributed call chain into a time sequence neural network model after training is completed, and obtaining prediction log data and prediction index data;
finally, judging whether the original log data and the original index data are abnormal or not according to the predicted log data and the predicted index data; if the abnormality exists, the abnormality root cause is positioned according to the positions of the abnormal log data and the abnormal index data in the software and hardware knowledge graph, corresponding abnormality alarm information is generated, and related personnel are informed to maintain.
A specific training and predictive framework flow chart is shown with reference to fig. 8.
The processing device of the distributed call chain according to the exemplary embodiment of the present disclosure will be explained and described below with reference to fig. 9. Specifically, referring to fig. 9, the processing device of the distributed call chain may include a log data interception module 910, a format conversion module 920, a data compression module 930, and a distributed call chain storage module 940. Wherein:
The log data interception module 910 may be configured to intercept original log data and parse the original log data to obtain a distributed call chain with a first data format included in the original log data;
the format conversion module 920 may be configured to perform format conversion on the distributed call chain with the first data format to obtain a distributed call chain with a second data format;
the data compression module 930 may be configured to perform data compression on the distributed call chain having the second data format based on a preset compression algorithm, to obtain a compressed distributed call chain;
the distributed call chain storage module 940 may be configured to send the compressed distributed call chain to a Kafka cluster, so that the Kafka cluster stores the compressed distributed call chain.
In one exemplary embodiment of the present disclosure, the first data format is an object structure data format presented in the form of key-value pairs, and the second data format is a standard JSON string format;
the method for converting the format of the distributed call chain with the first data format to obtain the distributed call chain with the second data format comprises the following steps:
And converting the distributed call chain of the object structure data format presented in the form of key value pairs into a standard JSON string format.
In an exemplary embodiment of the present disclosure, the preset compression algorithm includes a GZIP compression algorithm;
the data compression is performed on the distributed call chain with the second data format based on a preset compression algorithm, so as to obtain a compressed distributed call chain, which comprises the following steps:
constructing a file head of a GZIP format file and a file tail of the GZIP format file, and analyzing the distributed call chain with the second data format to obtain an original text, a matching length and an offset distance which are included in the distributed call chain with the second data format;
and performing Huffman coding on the original text, the matching length and the offset distance to obtain a data block in the Deflate format, and packaging the file header, the file tail and the Deflate-format data block to obtain a compressed distributed call chain.
In an exemplary embodiment of the present disclosure, constructing a header of a GZIP format file includes:
setting a first byte of the GZIP format check code in the file header as a first preset value, and setting a second byte of the GZIP format check code as a second preset value;
Setting a compression algorithm identifier in the file header to a third preset value, and setting each bit of the flag bit in the file header to zero;
and setting the source file timestamp in the file header as the current time, and setting the additional mark and the operating system mark in the file header as fourth preset values.
In an exemplary embodiment of the present disclosure, intercepting original log data and parsing the original log data to obtain a distributed call chain having a first data format included in the original log data, includes:
modifying the source code of the log data acquisition program in the application performance management tool to obtain a data interception program with a data interception function;
compiling and deploying the application performance management tool after source code modification, and intercepting the original log data based on a data interception program in the application performance management tool after compiling and deploying;
and carrying out internal conversion on the intercepted original log data to obtain a distributed call chain with a first data format included in the original log data.
In one exemplary embodiment of the present disclosure, the distributed log storage cluster is a Kafka storage cluster;
The distributed call chain comprises a plurality of application names, transaction IDs, IDs of current span information, IDs of upstream span information, application starting time, application response time, interface names called by the application and time consumption of the application.
In an exemplary embodiment of the present disclosure, the processing apparatus of the distributed call chain further includes:
the decompression module may be configured to obtain a compressed distributed call chain from the Kafka storage cluster and decompress it to obtain the application name, transaction ID, ID of the current span information, ID of the upstream span information, application start time, application response time, interface name called by the application, and time consumed by the application that are included in the distributed call chain;
the software and hardware knowledge graph generation module can be used for generating a software and hardware knowledge graph according to the application name, the transaction ID, the ID of the current span information, the ID of the upstream span information, the application starting time, the application response time, the interface name called by the application and the application time consumption;
the anomaly prediction module can be used for acquiring original log data and original index data included in the original log data according to the transaction ID, and predicting anomalies of the original log data and the original index data based on a preset time sequence neural network model;
And the abnormality alarming module can be used for positioning the abnormal root cause according to the software and hardware knowledge graph and alarming abnormality based on the positioning result if the prediction result is that the original log data and the original index data are abnormal.
The specific details of each module in the processing device of the distributed call chain have already been described in detail in the corresponding processing method of the distributed call chain, and are therefore not repeated here.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
Furthermore, although the steps of the methods in the present disclosure are depicted in a particular order in the drawings, this does not require or imply that the steps must be performed in that particular order or that all illustrated steps be performed in order to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform, etc.
In an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is also provided.
Those skilled in the art will appreciate that the various aspects of the present disclosure may be implemented as a system, method, or program product. Accordingly, various aspects of the disclosure may be embodied in the following forms: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, which may generally be referred to herein as a "circuit," "module," or "system."
An electronic device 1000 according to such an embodiment of the present disclosure is described below with reference to fig. 10. The electronic device 1000 shown in fig. 10 is merely an example and should not be construed as limiting the functionality and scope of use of the disclosed embodiments.
As shown in fig. 10, the electronic device 1000 is embodied in the form of a general purpose computing device. Components of the electronic device 1000 may include, but are not limited to: at least one processing unit 1010, at least one storage unit 1020, a bus 1030 connecting the various system components (including the storage unit 1020 and the processing unit 1010), and a display unit 1040.
The storage unit stores program code that is executable by the processing unit 1010, such that the processing unit 1010 performs the steps according to the various exemplary embodiments of the present disclosure described in the above sections of the present specification. For example, the processing unit 1010 may perform step S110 as shown in fig. 1: intercepting original log data, and parsing the original log data to obtain a distributed call chain with a first data format included in the original log data; step S120: performing format conversion on the distributed call chain with the first data format to obtain a distributed call chain with a second data format; step S130: carrying out data compression on the distributed call chain with the second data format based on a preset compression algorithm to obtain a compressed distributed call chain; and step S140: sending the compressed distributed call chain to a Kafka cluster, so that the Kafka cluster stores the compressed distributed call chain.
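The flow of steps S120 through S140 can be sketched as follows in Java; the use of a Jackson ObjectMapper for the key-value-to-JSON conversion, the standard Apache Kafka producer client, and the topic name "call-chain" are assumptions made for illustration, not details fixed by this embodiment.

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Properties;
import java.util.zip.GZIPOutputStream;

// Sketch of steps S120-S140: convert a call chain to JSON, GZIP-compress it,
// and publish the compressed bytes to a Kafka topic.
public final class CallChainPipelineSketch {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    public static void main(String[] args) throws Exception {
        // Stand-in for step S110: a call chain already intercepted as key-value pairs.
        Map<String, Object> span = Map.of(
                "applicationName", "order-service",
                "transactionId", "tx-1",
                "spanId", 1L,
                "parentSpanId", 0L);

        // Step S120: object-structure data in key-value form -> standard JSON string.
        String json = MAPPER.writeValueAsString(span);

        // Step S130: compress the JSON string with GZIP.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
            out.write(json.getBytes(StandardCharsets.UTF_8));
        }

        // Step S140: send the compressed call chain to the Kafka cluster for storage.
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("call-chain",
                    (String) span.get("transactionId"), buf.toByteArray()));
        }
    }
}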
The storage unit 1020 may include readable media in the form of volatile memory, such as a random access memory (RAM) 10201 and/or a cache memory 10202, and may further include a read-only memory (ROM) 10203.
The storage unit 1020 may also include a program/utility 10204 having a set (at least one) of program modules 10205, such program modules 10205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
Bus 1030 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures.
The electronic device 1000 can also communicate with one or more external devices 1100 (e.g., keyboard, pointing device, Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 1000, and/or with any device (e.g., router, modem, etc.) that enables the electronic device 1000 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 1050. Also, electronic device 1000 can communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through network adapter 1060. As shown, the network adapter 1060 communicates with other modules of the electronic device 1000 over the bus 1030. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with the electronic device 1000, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and includes several instructions to cause a computing device (which may be a personal computer, a server, a terminal device, a network device, etc.) to perform the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, a computer-readable storage medium having stored thereon a program product capable of implementing the method described above in the present specification is also provided. In some possible implementations, various aspects of the disclosure may also be implemented in the form of a program product comprising program code for causing a terminal device to carry out the steps according to the various exemplary embodiments of the disclosure as described in the "exemplary methods" section of this specification, when the program product is run on the terminal device.
A program product for implementing the above-described method according to an embodiment of the present disclosure may employ a portable compact disc read-only memory (CD-ROM), include program code, and run on a terminal device such as a personal computer. However, the program product of the present disclosure is not limited thereto; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).
Furthermore, the above-described figures are only schematic illustrations of processes included in the method according to the exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily appreciated that the processes shown in the above figures do not indicate or limit the temporal order of these processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, for example, among a plurality of modules.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (9)

1. A method for processing a distributed call chain, comprising:
intercepting original log data and parsing the original log data to obtain a distributed call chain with a first data format included in the original log data, which comprises: modifying the source code of the log data acquisition program in the application performance management tool to obtain a data interception program with a data interception function; compiling and deploying the modified application performance management tool, and intercepting the original log data based on the data interception program in the compiled and deployed application performance management tool; and performing internal conversion on the intercepted original log data to obtain the distributed call chain with the first data format included in the original log data;
performing format conversion on the distributed call chain with the first data format to obtain a distributed call chain with a second data format, wherein the first data format is an object structure data format presented in the form of key-value pairs, and the second data format is a standard JSON character string format;
carrying out data compression on the distributed call chain with the second data format based on a preset compression algorithm to obtain a compressed distributed call chain;
and sending the compressed distributed call chain to a Kafka cluster, so that the Kafka cluster stores the compressed distributed call chain.
2. The method for processing a distributed call chain according to claim 1, wherein performing format conversion on the distributed call chain having the first data format to obtain a distributed call chain having the second data format, includes:
and converting the distributed call chain of the object structure data format presented in the form of key value pairs into a standard JSON string format.
3. The method for processing the distributed call chain according to claim 1, wherein the preset compression algorithm comprises a GZIP compression algorithm;
The data compression is performed on the distributed call chain with the second data format based on a preset compression algorithm, so as to obtain a compressed distributed call chain, which comprises the following steps:
constructing a file header of a GZIP format file and a file trailer of the GZIP format file, and parsing the distributed call chain with the second data format to obtain the original text, matching lengths and offset distances included in the distributed call chain with the second data format;
and performing Huffman coding on the original text, the matching lengths and the offset distances to obtain a data block in the DEFLATE format, and packaging the file header, the file trailer and the data block in the DEFLATE format to obtain the compressed distributed call chain.
4. A method of processing a distributed call chain according to claim 3, wherein constructing a header of a GZIP format file comprises:
setting a first byte of the GZIP format check code in the file header to a first preset value, and setting a second byte of the GZIP format check code to a second preset value;
setting a compression algorithm identifier in the file header to a third preset value, and setting each flag bit in the file header to zero;
and setting the source file timestamp in the file header to the current time, and setting the extra flag and the operating system flag in the file header to fourth preset values.
5. The method according to any one of claims 1 to 4, wherein the distributed call chain includes a plurality of application names, transaction IDs, IDs of current span information, IDs of upstream span information, application start times, application response times, interface names called by the applications, and times consumed by the applications.
6. The method for processing a distributed call chain according to claim 1, wherein the method for processing a distributed call chain further comprises:
acquiring a compressed distributed call chain from a Kafka storage cluster and decompressing the compressed distributed call chain to obtain the application name, transaction ID, ID of the current span information, ID of the upstream span information, application start time, application response time, interface name called by the application, and application time consumption included in the distributed call chain;
generating a software and hardware knowledge graph according to the application name, the transaction ID, the ID of the current span information, the ID of the upstream span information, the application start time, the application response time, the interface name called by the application, and the application time consumption;
acquiring original log data and original index data included in the original log data according to the transaction ID, and predicting abnormality of the original log data and the original index data based on a preset time-series neural network model;
and if the prediction result is that the original log data and the original index data are abnormal, locating the root cause of the abnormality according to the software and hardware knowledge graph, and raising an abnormality alarm based on the locating result.
7. A processing apparatus for a distributed call chain, comprising:
the log data interception module is used for intercepting original log data and parsing the original log data to obtain a distributed call chain with a first data format included in the original log data, which comprises: modifying the source code of the log data acquisition program in the application performance management tool to obtain a data interception program with a data interception function; compiling and deploying the modified application performance management tool, and intercepting the original log data based on the data interception program in the compiled and deployed application performance management tool; and performing internal conversion on the intercepted original log data to obtain the distributed call chain with the first data format included in the original log data;
the format conversion module is used for performing format conversion on the distributed call chain with the first data format to obtain a distributed call chain with a second data format, wherein the first data format is an object structure data format presented in the form of key-value pairs, and the second data format is a standard JSON character string format;
the data compression module is used for carrying out data compression on the distributed call chain with the second data format based on a preset compression algorithm to obtain a compressed distributed call chain;
and the distributed call chain storage module is used for sending the compressed distributed call chain to a Kafka cluster so that the Kafka cluster stores the compressed distributed call chain.
8. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the method of processing a distributed call chain according to any one of claims 1-6.
9. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the method of processing the distributed call chain of any of claims 1-6 via execution of the executable instructions.
CN202210691815.3A 2022-06-17 2022-06-17 Processing method and device of distributed call chain, storage medium and electronic equipment Active CN115017218B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210691815.3A CN115017218B (en) 2022-06-17 2022-06-17 Processing method and device of distributed call chain, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210691815.3A CN115017218B (en) 2022-06-17 2022-06-17 Processing method and device of distributed call chain, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN115017218A CN115017218A (en) 2022-09-06
CN115017218B true CN115017218B (en) 2024-01-30

Family

ID=83075782

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210691815.3A Active CN115017218B (en) 2022-06-17 2022-06-17 Processing method and device of distributed call chain, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN115017218B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116132499B (en) * 2023-02-10 2023-09-26 北京优特捷信息技术有限公司 Compression method and device for call chain, computer equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107911254A (en) * 2017-12-26 2018-04-13 中国移动通信集团江苏有限公司 Log transmission method, apparatus, computing device and storage medium
CN108038207A (en) * 2017-12-15 2018-05-15 暴风集团股份有限公司 A kind of daily record data processing system, method and server
CN111708673A (en) * 2020-06-15 2020-09-25 北京优特捷信息技术有限公司 Log data compression method, device, equipment and storage medium
CN112087490A (en) * 2020-08-07 2020-12-15 上海绊糖信息科技有限公司 High-performance mobile terminal application software log collection system
CN112214453A (en) * 2020-09-14 2021-01-12 上海微亿智造科技有限公司 Large-scale industrial data compression storage method, system and medium
CN112506894A (en) * 2020-12-02 2021-03-16 平安医疗健康管理股份有限公司 Service chain log processing method and device based on link tracking and computer equipment
CN114090529A (en) * 2021-10-29 2022-02-25 青岛海尔科技有限公司 Log management method, device, system and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170214701A1 (en) * 2016-01-24 2017-07-27 Syed Kamran Hasan Computer security based on artificial intelligence

Also Published As

Publication number Publication date
CN115017218A (en) 2022-09-06

Similar Documents

Publication Publication Date Title
CN109034993B (en) Account checking method, account checking equipment, account checking system and computer readable storage medium
CN107506451B (en) Abnormal information monitoring method and device for data interaction
CN107370806B (en) HTTP status code monitoring method, device, storage medium and electronic equipment
CN112035191B (en) APM full-link monitoring system and method based on micro-service
CN111475324B (en) Log information analysis method, device, computer equipment and storage medium
KR102067032B1 (en) Method and system for data processing based on hybrid big data system
CN116166505B (en) Monitoring platform, method, storage medium and equipment for dual-state IT architecture in financial industry
CN115017218B (en) Processing method and device of distributed call chain, storage medium and electronic equipment
WO2020042504A1 (en) Method and apparatus for diagnosing exception in blockchain information processing, device, and storage medium
CN111552669A (en) Data processing method and device, computing equipment and storage medium
US20230376372A1 (en) Multi-modality root cause localization for cloud computing systems
US10346281B2 (en) Obtaining and analyzing a reduced metric data set
CN111708673A (en) Log data compression method, device, equipment and storage medium
CN113986643A (en) Method, electronic device and computer program product for analyzing log file
CN117271584A (en) Data processing method and device, computer readable storage medium and electronic equipment
CN115904369B (en) Method and system for efficiently aggregating and associated analysis of network security source data
CN111930385A (en) Data acquisition method, device, equipment and storage medium
CN111162938A (en) Data processing system and method
CN117014527A (en) Data processing method and device, storage medium and electronic equipment
CN115021810A (en) Fault reason determining method and system, alarm reporting method, medium and equipment
CN114546780A (en) Data monitoring method, device, equipment, system and storage medium
Sun et al. Design and Development of a Log Management System Based on Cloud Native Architecture
CN115437906A (en) Test method and device
CN115913912A (en) Message interception and service link diagram generation method and device
US10805150B2 (en) Regenerative telemetry method for resource reduction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant