CN111478806A - Link tracking sampling method and system - Google Patents


Info

Publication number
CN111478806A
Authority
CN
China
Legal status
Granted
Application number
CN202010254461.7A
Other languages
Chinese (zh)
Other versions
CN111478806B (en)
Inventor
李希伟
于晓峰
矫恒浩
Current Assignee
Qingdao Hisense Media Network Technology Co Ltd
Original Assignee
Qingdao Hisense Media Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Qingdao Hisense Media Network Technology Co Ltd
Priority to CN202010254461.7A
Publication of CN111478806A
Application granted
Publication of CN111478806B
Active (current legal status)
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0604 Management of faults, events, alarms or notifications using filtering, e.g. reduction of information by using priority, element types, position or time
    • H04L41/14 Network analysis or design
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network

Abstract

The application provides a link tracking method and system. After call chain information is initialized and generated, the request order is used to judge whether a sampling rate condition is met, and sampling is performed on that basis; the sampling features are then used to judge whether a feature judgment condition is met, and sampling is performed on that basis as well, so that the call chain information that needs to be sampled is reported. By adding feature sampling on top of sampling-rate control, the method eases the problem of excessive sampling data while still collecting requests with specified abnormal features for fault detection.

Description

Link tracking sampling method and system
Technical Field
The application relates to the technical field of cloud computing, and in particular to a link tracking sampling method and system.
Background
Cloud computing is an information processing model that runs on internet cloud platform technology and can provide powerful, complete network and storage capabilities, for example through cloud-native technologies such as Docker and Kubernetes. Private cloud platforms or public cloud services built on cloud-native technology improve resource utilization and speed up software version iteration. However, in a cloud-native microservice architecture, the growing number of physical server nodes and services brings complex networking and calling relationships among the services, and a single user request is usually processed by multiple services.
When an anomaly occurs, link tracking is needed to quickly find which link is at fault and to resolve similar problems. Link tracking draws the calling relationships among services and shows the time consumed by each link of a request, which facilitates fault location and performance analysis in a complex microservice system. Typical link tracking relies on a tracing system, such as AWS X-Ray, OpenZipkin, OpenTracing, or Datadog. Every call made on behalf of a request is tagged with the same unique ID, which expresses the upstream and downstream relationships among the successive microservice calls and ultimately forms a complete call chain.
However, under heavy traffic, collecting the call chain information of every request produces a huge volume of data with very high processing and storage costs. To reduce these costs, the volume of collected call chain information can be limited by sampling rate control, i.e., collecting a fixed proportion or a fixed number of requests, such as 1 in every 1,000 requests or 100 requests per second. However, call chain information collected this way is essentially random, so it often misses the requests that operation and maintenance personnel care about most, namely calls that return errors or respond slowly, which hinders fault detection.
Disclosure of Invention
The application provides a link tracking sampling method and system to solve the problem that traditional sampling rate control struggles to collect call chain information for calls that return errors or respond slowly.
In one aspect, the present application provides a link tracking sampling method, comprising:
initializing and generating call chain information, where the call chain information comprises a plurality of calling links and each calling link comprises a plurality of call fragments belonging to the same service;
determining, according to the order of the triggering request corresponding to the calling link, whether the calling link meets a sampling rate judgment condition;
if the calling link meets the sampling rate judgment condition, adding a sampling identifier to the request and reporting the calling link to which the sampling identifier has been added;
if the calling link does not meet the sampling rate judgment condition, extracting sampling features from the calling link; and
adding an abnormal identifier to each calling link whose sampling features meet a feature judgment condition, and reporting the calling link to which the abnormal identifier has been added.
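The five steps above form a two-stage decision. The following Python sketch shows that control flow only; `CallingLink`, `process_link`, the string tags, and the 1-in-100 and 100 ms conditions in the usage lines are hypothetical illustrations, not part of the application.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set

# Hypothetical tag values: the text only speaks of a "sampling identifier"
# and an "abnormal identifier".
SAMPLED = "sampled"
ABNORMAL = "abnormal"


@dataclass
class CallingLink:
    """A calling link (Segment) plus the one feature this sketch inspects."""
    trace_id: int
    service: str
    duration_ms: float = 0.0
    tags: Set[str] = field(default_factory=set)


def process_link(
    link: CallingLink,
    rate_condition: Callable[[CallingLink], bool],
    feature_condition: Callable[[CallingLink], bool],
    report: Callable[[CallingLink], None],
) -> Optional[str]:
    """Sampling-rate check first (S2/S3), feature check second (S4/S5)."""
    if rate_condition(link):
        link.tags.add(SAMPLED)
        report(link)
        return SAMPLED
    if feature_condition(link):
        link.tags.add(ABNORMAL)
        report(link)
        return ABNORMAL
    return None  # neither condition met: the link is not reported


# Usage with assumed conditions: 1-in-100 ratio sampling, 100 ms duration feature.
def one_in_100(link: CallingLink) -> bool:
    return link.trace_id % 100 == 1


def slow(link: CallingLink) -> bool:
    return link.duration_ms > 100


reported: List[CallingLink] = []
print(process_link(CallingLink(201, "A"), one_in_100, slow, reported.append))  # sampled
print(process_link(CallingLink(202, "A", duration_ms=500), one_in_100, slow, reported.append))  # abnormal
print(process_link(CallingLink(203, "A", duration_ms=20), one_in_100, slow, reported.append))  # None
```

Only the first two links reach `report`; the third is dropped because it neither hits the ratio nor looks slow.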
With the above technical solution, after the call chain information is initialized and generated, the request order is used to judge whether the sampling rate condition is met and sampling is performed accordingly; the sampling features are used to judge whether the feature judgment condition is met and sampling is performed accordingly, so that the call chain information that needs to be sampled is reported. By adding feature sampling on top of sampling-rate sampling, the method eases the problem of huge sampling volumes while collecting request data with specified abnormal features, providing a channel for fault detection.
In another aspect, the present application further provides a link tracking sampling system, comprising a data acquisition client and a data collection server connected to each other;
wherein the data acquisition client is configured to:
initialize and generate call chain information, where the call chain information comprises a plurality of calling links and each calling link comprises a plurality of call fragments belonging to the same service;
determine, according to the order of the triggering request corresponding to the calling link, whether the calling link meets a sampling rate judgment condition;
if the calling link meets the sampling rate judgment condition, add a sampling identifier to the request so as to report the calling link to which the sampling identifier has been added to the data collection server;
if the calling link does not meet the sampling rate judgment condition, extract sampling features from the calling link; and
add an abnormal identifier to each calling link whose sampling features meet the feature judgment condition so as to report the calling link to which the abnormal identifier has been added to the data collection server;
and the data collection server is configured to receive and store the calling links to which the sampling identifier has been added and the calling links to which the abnormal identifier has been added.
With the above technical solution, the link tracking sampling system includes a data acquisition client and a data collection server connected to each other. After call chain information is initially generated, the data acquisition client evaluates it against the sampling rate condition and the feature judgment condition and samples it when either condition is met, so that both data volume and call features are taken into account when collecting call chain information. The collected call chain information is stored on the data collection server, where users can retrieve and analyze it.
Drawings
To explain the technical solution of the present application more clearly, the drawings used in the embodiments are briefly described below. Obviously, a person skilled in the art can derive other drawings from these drawings without creative effort.
FIG. 1 is a schematic diagram of a call chain in an embodiment of the present application;
FIG. 2 is a schematic flowchart of a link tracking sampling method according to an embodiment of the present application;
FIG. 3 is a schematic flowchart of judging the sampling rate judgment condition by sampling ratio in an embodiment of the present application;
FIG. 4 is a schematic flowchart of judging the sampling rate judgment condition by sampling rate (samples per unit time) in an embodiment of the present application;
FIG. 5 is a schematic flowchart of judging the feature judgment condition by processing duration in an embodiment of the present application;
FIG. 6 is a schematic flowchart of judging the feature judgment condition by response code in an embodiment of the present application;
FIG. 7 is a schematic flowchart of the case in which the sampling features do not meet the feature judgment condition in an embodiment of the present application;
FIG. 8 is a schematic flowchart of deleting call fragment data according to survival time in an embodiment of the present application;
FIG. 9 is a schematic diagram of the processing flow of associated calling links in an embodiment of the present application;
FIG. 10 is a block diagram of a link tracking sampling system according to an embodiment of the present application;
FIG. 11 is an overall flowchart of a link tracking sampling method according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following embodiments do not represent all implementations consistent with the present application; they are merely examples of systems and methods consistent with certain aspects of the application as recited in the claims.
The link tracking sampling method provided by the application can be applied to a microservice cloud platform, i.e., a network platform composed of multiple nodes. Each node may host network devices with data processing capability, such as computers and servers, and different nodes are configured with different data processing programs to implement specific computing functions. Depending on the application, a node can be configured as a client and/or a server and take part in different data processing flows in that role. The network devices on different nodes can also run different applications to provide different services.
For example, node 1 is configured with service A and can provide the corresponding function to a data processing flow. Node 2 is configured as a client; when it needs service A, it sends an access request to node 1. After receiving the access request, node 1 verifies its connection with node 2, after which node 2 sends the data to be processed to node 1. Service A on node 1 processes the data to perform its function and returns the result to node 2, completing one service call. Node 1 may in turn act as a client and send access requests to other nodes in order to use their services.
Because nodes are spread across the cloud platform network, a single overall function usually requires calling multiple services on multiple nodes, and each node may act as a client and/or a server, the number of service calls and the volume of data on the platform are very large. A failure in any one service call can therefore cause the overall function to fail. To keep the overall function working, during software development for the cloud platform and subsequent operation and maintenance, the data processing of each service involved in the overall function needs to be sampled and analyzed to determine which link has a problem.
In this embodiment, the sampled data used for analysis is called a call chain (Trace). A call chain is a directed acyclic graph composed of multiple call fragments (Span). Each call fragment records the operation name (such as an interface identifier) and the start time of one cross-service call, reflecting information about that call. As shown in FIG. 1, under the OpenTracing standard, a call chain may include multiple call fragments with parent-child or sequential relationships among them. For example, call fragment Span C may be a child of call fragment Span A, while call fragments Span F, Span G, and Span H follow one another in sequence.
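A minimal Python sketch of the relationships in FIG. 1, assuming a simple in-memory representation in which each `Span` may name a parent (`child_of`) or a predecessor (`follows_from`). The class, its fields, the helper functions, and the operation names are illustrative assumptions, not the OpenTracing API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    """One call fragment; field names are illustrative, not from a specific SDK."""
    span_id: str
    operation_name: str                 # e.g. an interface identifier
    start_time_ms: int
    child_of: Optional[str] = None      # parent-child relationship
    follows_from: Optional[str] = None  # sequential relationship


def children(spans: List[Span]) -> Dict[str, List[str]]:
    """Map each span ID to the IDs of its direct children."""
    result: Dict[str, List[str]] = {s.span_id: [] for s in spans}
    for s in spans:
        if s.child_of is not None:
            result[s.child_of].append(s.span_id)
    return result


def sequence_from(spans: List[Span], start_id: str) -> List[str]:
    """Follow follows-from links from start_id (assumes at most one successor each)."""
    successor = {s.follows_from: s.span_id for s in spans if s.follows_from is not None}
    order = [start_id]
    while order[-1] in successor:  # terminates because the graph is acyclic
        order.append(successor[order[-1]])
    return order


spans = [
    Span("A", "gateway", 0),
    Span("C", "query", 5, child_of="A"),
    Span("F", "step1", 10),
    Span("G", "step2", 20, follows_from="F"),
    Span("H", "step3", 30, follows_from="G"),
]
print(children(spans)["A"])       # ['C']
print(sequence_from(spans, "F"))  # ['F', 'G', 'H']
```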
A service on a node may be called several times while implementing a function. For example, if a function needs the login service on node 1, it first sends an access request to node 1 to establish a connection, then sends login registration information to the login service on node 1 for registration, and finally sends login authentication information to the login service for authentication and receives the authentication result. One call fragment is recorded per call to the service, so a single service may produce several call fragments, and a call chain may contain call fragments belonging to several services. During sampling, the call fragments that make up a call chain are collected so that operation and maintenance personnel can analyze the complete call chain to locate the fault.
Because an entire call chain carries a large amount of data, the technical solution of the application abstracts a new concept: the calling link (Segment). A calling link represents all call fragments created by one service when a call chain passes through that service; a call chain can therefore include multiple calling links, and a calling link can include multiple call fragments. The application uses the calling link as the reporting unit: instead of reporting call fragments one by one, all call fragments created by a service can be reported at once.
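Grouping call fragments into calling links can be sketched as below. This assumes each call fragment of one call chain is tagged with the service that created it; `group_into_links` and the `order` service are hypothetical, while the three login steps follow the example above.

```python
from typing import Dict, List, Tuple


def group_into_links(fragments: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (service, fragment) pairs of one call chain into one calling link per service.

    Dicts keep insertion order (Python 3.7+), so fragments stay in call order.
    """
    links: Dict[str, List[str]] = {}
    for service, fragment in fragments:
        links.setdefault(service, []).append(fragment)
    return links


chain = [
    ("login", "connect"),
    ("login", "register"),
    ("login", "authenticate"),
    ("order", "create"),  # hypothetical second service
]
links = group_into_links(chain)
print(len(chain), "fragments ->", len(links), "reports")  # 4 fragments -> 2 reports
```

Reporting per calling link turns four per-fragment reports into two per-service reports in this example.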
To sample link tracking data, the application provides a link tracking sampling system that includes a data acquisition client, a data collection server, and other components. The data acquisition client is integrated with the application programs and deployed on the various nodes. It generates call fragment information, parses call chain information received from upstream, passes call chain information downstream, and reports sampled information to the data collection server. The data collection server receives the content reported by the clients, stores it persistently, and provides a query interface through which operation and maintenance personnel can retrieve and analyze the reported data.
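The server's receive, store, and query duties can be sketched with an in-memory store. `CollectionServer`, its methods, and the report dictionary layout are hypothetical; a real data collection server would persist reports to durable storage rather than keep them in a dict.

```python
from collections import defaultdict
from typing import DefaultDict, List


class CollectionServer:
    """In-memory stand-in for the data collection server (hypothetical API)."""

    def __init__(self) -> None:
        # Reports grouped by call chain ID.
        self._store: DefaultDict[int, List[dict]] = defaultdict(list)

    def receive(self, report: dict) -> None:
        # One report = one calling link (Segment) and its call fragments.
        self._store[report["trace_id"]].append(report)

    def query(self, trace_id: int) -> List[dict]:
        # Query interface used by operation and maintenance personnel;
        # .get() avoids creating empty entries for unknown IDs.
        return list(self._store.get(trace_id, []))


server = CollectionServer()
server.receive({"trace_id": 201, "service": "A", "tag": "sampled", "fragments": []})
server.receive({"trace_id": 201, "service": "B", "tag": "sampled", "fragments": []})
print([r["service"] for r in server.query(201)])  # ['A', 'B']
```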
Link tracking sampling is achieved through coordination between the data acquisition client and the data collection server. As shown in FIG. 2, the link tracking sampling method provided by the application includes the following steps:
S1: Initialize and generate call chain information.
In practice, when a request arrives at a service, call chain information can be initialized and generated for the request. The generated call chain information may contain a unique call chain ID, a numerical identifier assigned according to request order that distinguishes call chains from one another. The call chain information comprises a plurality of calling links, each of which includes a plurality of call fragments belonging to the same service.
To generate the call chain information, the data acquisition client adds call fragments to it step by step as the request triggers services. As call fragments are added, the data acquisition client also records the start time, end time, and access result identifier of each call fragment of the triggered service, which together form a calling link.
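Step S1 can be sketched as follows, assuming a single process in which a counter hands out call chain IDs in request order (a distributed deployment would need a globally unique ID scheme instead). `CallChainInfo`, `Fragment`, and the numeric result code are illustrative names, not the application's data model.

```python
import itertools
import time
from dataclasses import dataclass, field
from typing import List

# Hypothetical single-process counter: call chain IDs follow request order.
_next_trace_id = itertools.count(1)


@dataclass
class Fragment:
    """A call fragment with start/end time and an access result identifier."""
    operation_name: str
    start_time: float
    end_time: float = 0.0
    result_code: int = 0  # 0 = not finished yet (illustrative convention)


@dataclass
class CallChainInfo:
    trace_id: int = field(default_factory=lambda: next(_next_trace_id))
    fragments: List[Fragment] = field(default_factory=list)

    def start_fragment(self, operation_name: str) -> Fragment:
        # Called as each service triggered by the request begins work.
        fragment = Fragment(operation_name, start_time=time.monotonic())
        self.fragments.append(fragment)
        return fragment

    @staticmethod
    def finish_fragment(fragment: Fragment, result_code: int) -> None:
        # Record the end time and the access result of the call.
        fragment.end_time = time.monotonic()
        fragment.result_code = result_code


chain = CallChainInfo()
f = chain.start_fragment("login/authenticate")
CallChainInfo.finish_fragment(f, result_code=200)
print(chain.trace_id, len(chain.fragments), f.result_code)  # 1 1 200
```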
S2: Determine, according to the order of the triggering request corresponding to the calling link, whether the calling link meets the sampling rate judgment condition.
After the call chain information is initialized and generated, the configured sampling rate judgment condition can be obtained at the start of a calling link in the call chain and used to decide whether the request should be sampled. The sampling rate judgment condition may, for example, be based on a sampling ratio (a fixed proportion of requests) or a sampling rate (a fixed number of samples per unit time).
For example, if the configured sampling rate judgment condition is to sample one request out of every 100, then, based on the numerical value of the call chain ID, the condition is met when the request is the 1st in its group of 100 and is not met when it is the 2nd through the 100th.
S3: If the calling link meets the sampling rate judgment condition, add a sampling identifier to the request and report the calling link to which the sampling identifier has been added.
For example, when the call chain ID is ××00201, the corresponding request is the 201st, and the calling link is judged to meet the sampling rate judgment condition. A sampling identifier can then be added to the request of this calling link, indicating that the request must be sampled throughout its subsequent processing in the current service.
In practice, when a request carrying a sampling identifier goes on to call another service, it carries both the call chain ID and the sampling identifier. When the other service receives the request, its data acquisition client sees that the request carries a call chain ID and requires sampling, and collects it directly without re-evaluating the sampling rate judgment condition, which saves judgment time and system resources. A calling link with the sampling identifier is reported to the data collection server after all of its requests have finished.
S4: If the calling link does not meet the sampling rate judgment condition, extract the sampling features from the calling link.
For example, when the call chain ID is ××00202, the corresponding request is the 202nd, the calling link is judged not to meet the sampling rate judgment condition, and the current calling link does not need to be sampled proportionally.
The extracted sampling features may include various operating parameters that reflect the actual service process, such as processing duration and response code. The extracted features are analyzed one at a time, in descending order of their influence on the service processing result, to determine whether any of them meets the feature judgment condition. If none of the extracted features meets the condition, i.e., none shows an abnormality, the current calling link is considered normal and does not need to be reported to the data collection server.
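The ordered feature analysis in S4 can be sketched as a list of checks evaluated from highest to lowest influence, stopping at the first one that fires. The application does not fix the ranking or the thresholds; the response-code-first order, the `>= 500` and `> 100` ms thresholds, and `first_abnormal_feature` are assumptions for illustration.

```python
from typing import Callable, List, Mapping, Optional, Tuple

FeatureCheck = Tuple[str, Callable[[Mapping[str, float]], bool]]


def first_abnormal_feature(features: Mapping[str, float],
                           checks: List[FeatureCheck]) -> Optional[str]:
    """Evaluate checks in list order (highest influence first).

    Returns the first feature whose judgment condition is met, or None when
    every feature looks normal and the link need not be reported.
    """
    for name, is_abnormal in checks:
        if is_abnormal(features):
            return name
    return None


# Hypothetical ordering and thresholds.
CHECKS: List[FeatureCheck] = [
    ("response_code", lambda f: f["response_code"] >= 500),
    ("duration_ms", lambda f: f["duration_ms"] > 100),
]

print(first_abnormal_feature({"response_code": 503, "duration_ms": 500}, CHECKS))  # response_code
print(first_abnormal_feature({"response_code": 200, "duration_ms": 500}, CHECKS))  # duration_ms
print(first_abnormal_feature({"response_code": 200, "duration_ms": 20}, CHECKS))   # None
```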
S5: Add an abnormal identifier to the calling link whose sampling features meet the feature judgment condition, and report the calling link to which the abnormal identifier has been added.
If any sampling feature meets the preset feature judgment condition, the current calling link may be abnormal, so an abnormal identifier is added to the calling link and the calling link is reported to the data collection server.
In the above technical solution, after the call chain information is initialized and generated, the request order is used to judge whether the sampling rate condition is met and sampling is performed accordingly; the sampling features are then used to judge whether the feature judgment condition is met and sampling is performed accordingly, so that the call chain information that needs to be sampled is reported. By adding feature sampling on top of sampling-rate sampling, the method reduces the volume of sampled data while still collecting requests with specified abnormal features to facilitate fault detection.
Because call chain IDs grow larger as the cloud platform runs, one implementation uses a sampling ratio in the sampling rate judgment condition to determine which call chains meet it. That is, as shown in FIG. 3, the step of determining, according to the order of the triggering request corresponding to the calling link, whether the request meets the sampling rate judgment condition further includes:
S211: Obtain the call chain ID and the ratio value;
S212: Take the call chain ID modulo the ratio value to obtain a modulus result;
S213: If the modulus result equals 1, determine that the request meets the sampling rate judgment condition.
Since the cloud platform usually has no device dedicated to counting requests, the order of a request can be determined from its call chain ID. The ratio value corresponding to the sampling ratio is read from the configured sampling rate judgment condition, and the call chain ID is taken modulo this value. Whether the modulus result equals 1 indicates whether the current call chain ID falls at a position in the request sequence that the sampling ratio selects, and thus whether the request meets the sampling rate judgment condition.
For example, if the sampling ratio is set to 1/100, the ratio value is 100. When the obtained call chain ID is 000201, taking it modulo 100 gives mod(000201, 100) = 1. Because the modulus result equals 1, the current calling link meets the sampling rate judgment condition and needs to be sampled.
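Steps S211 to S213 reduce to a single modulo test. The sketch below, built around the hypothetical function `meets_ratio_condition`, parses a zero-padded numeric ID string and reproduces the 000201 example.

```python
def meets_ratio_condition(trace_id: str, ratio_value: int) -> bool:
    """S211-S213: sample when (call chain ID mod ratio value) == 1.

    trace_id is a numeric string such as "000201"; int() drops the leading zeros.
    """
    if ratio_value < 1:
        raise ValueError("ratio_value must be a positive integer")
    if ratio_value == 1:
        # Assumption: n mod 1 is always 0, so the literal rule would never fire;
        # a 1/1 ratio is treated here as "sample every request".
        return True
    return int(trace_id) % ratio_value == 1


for tid in ("000201", "000202", "000300", "000301"):
    print(tid, meets_ratio_condition(tid, 100))
```

With a ratio value of 100, only 000201 and 000301 pass. Taken literally, the rule never fires for a ratio value of 1, so the special case above is an assumption rather than something the text specifies.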
In one implementation, the sampling rate judgment condition includes a sampling rate. That is, as shown in FIG. 4, the step of determining, according to the order of the triggering request corresponding to the calling link, whether the request meets the sampling rate judgment condition includes:
S221: Obtain the sampling quota per unit sampling time;
S222: If the request's position in the sequence does not exceed the sampling quota within the unit sampling time, determine that the request meets the sampling rate judgment condition.
The sampling rate is the number of samples taken per unit time, so the sampling quota per unit sampling time can be derived from the configured sampling rate. In each sampling period of one unit of time, at most as many call chains as the sampling quota are reported. Therefore, if the request's position in the sequence does not exceed the sampling quota within the unit sampling time, the request needs to be sampled, i.e., it meets the sampling rate judgment condition.
For example, with 100 samples per second configured, the sampling quota is set to 100 at the start of each second and decremented by 1 for every sample taken. Once it reaches 0, no further requests are sampled. At the start of the next second (the next sampling period), the quota is reset to 100 and the process repeats.
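The per-second quota in S221 and S222 can be sketched as a counter that is reset at the start of each sampling period. `QuotaSampler` and its injectable `clock` parameter (used here so the example is deterministic) are hypothetical.

```python
import time
from typing import Callable


class QuotaSampler:
    """S221-S222: sample at most `quota` requests per sampling period."""

    def __init__(self, quota: int, period_s: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.quota = quota
        self.period_s = period_s
        self._clock = clock
        self._period_start = clock()
        self._remaining = quota

    def should_sample(self) -> bool:
        now = self._clock()
        elapsed = now - self._period_start
        if elapsed >= self.period_s:
            # A new sampling period has begun: align its start and reset the quota.
            self._period_start += (elapsed // self.period_s) * self.period_s
            self._remaining = self.quota
        if self._remaining > 0:
            self._remaining -= 1  # each sample consumes one unit of quota
            return True
        return False  # quota exhausted until the next period


fake_now = [0.0]
sampler = QuotaSampler(quota=3, clock=lambda: fake_now[0])
print([sampler.should_sample() for _ in range(5)])  # [True, True, True, False, False]
fake_now[0] = 1.2
print(sampler.should_sample())  # True
```

The sketch is not thread-safe; a client that handles requests concurrently would need a lock around `should_sample`.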
The above embodiments illustrate two sampling rate judgment conditions. In practice, the sampling rate judgment condition is not limited to these two implementations and may be implemented in other ways. For example, a sampling time can be set, and when the cloud platform reaches that time, all currently running requests are sampled. Other sampling rate judgment conditions that a person skilled in the art can derive from the above without creative effort fall within the scope of the present application.
In addition, in practice, a single sampling rate judgment condition or a combination of several can be configured according to an automatic sampling strategy to meet different analysis needs. For example, when operation and maintenance personnel analyze the load pressure on the cloud platform per unit time, the sampling rate can serve as the sampling rate judgment condition; for software reliability analysis, the sampling ratio can be used; for cloud platform stability analysis, the sampling time can be used; and for a comprehensive analysis of both the cloud platform and the software, several sampling rate judgment conditions can be deployed on the data acquisition client, and a call chain is sampled when it meets any one of them.
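Combining several sampling rate judgment conditions so that a call chain is sampled when any one is met can be sketched as below. `any_of`, `sampling_time_condition`, `ratio_condition`, and the 1-in-100 ratio are illustrative assumptions.

```python
from typing import Callable, Iterable

Condition = Callable[[int], bool]  # takes a call chain ID


def any_of(conditions: Iterable[Condition]) -> Condition:
    """Combine sampling rate judgment conditions: sample if any one is met."""
    conds = list(conditions)

    def check(trace_id: int) -> bool:
        # any() short-circuits, so later (possibly stateful) conditions are
        # not evaluated once an earlier one fires.
        return any(cond(trace_id) for cond in conds)

    return check


def sampling_time_condition(start: float, end: float,
                            clock: Callable[[], float]) -> Condition:
    """Sample every request while the clock is inside [start, end)."""
    return lambda _trace_id: start <= clock() < end


def ratio_condition(trace_id: int) -> bool:
    return trace_id % 100 == 1


now = [5.0]
combined = any_of([ratio_condition, sampling_time_condition(10.0, 20.0, lambda: now[0])])
print(combined(201), combined(202))  # True False
now[0] = 15.0
print(combined(202))  # True
```

Because `any()` short-circuits, condition order matters when a condition is stateful: a quota-based condition placed after the ratio check is not consumed by ratio hits.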
In one implementation, if the sampling identifier is added to any calling link in the call chain information, the entire call chain information is reported. Because one call chain may call several services in turn, it contains several corresponding calling links. For analysis purposes, under sampling rate control, once the first service hits the sampling rate judgment condition, all subsequent services involved in the call chain are generally sampled as well.
For example, when a request reaches service A, the data acquisition client finds that the request carries no call chain identification information and decides whether to sample according to the sampling rate judgment condition. If the decision is to sample, then when service A sends a request calling service B, the data acquisition client adds a message header to the request indicating that it must be sampled downstream. When service B receives the request, its data acquisition client detects that the request already hit the sampling rate at the previous service and samples it directly without a second judgment. Once services A and B have finished processing, their calling links are reported to the data collection server so that operation and maintenance personnel can later view the complete call chain.
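The header-based propagation of the sampling decision can be sketched as follows. The header names `X-Trace-Id` and `X-Trace-Sampled` are hypothetical (real tracing systems define their own, for example Zipkin's B3 headers), and so are the functions. The sketch also assumes that a downstream service receiving an unsampled call chain ID does not re-run the rate condition, which the text does not state explicitly.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical header names; the text only says a message header is added.
TRACE_HEADER = "X-Trace-Id"
SAMPLED_HEADER = "X-Trace-Sampled"


def on_request(headers: Dict[str, str],
               new_trace_id: Callable[[], int],
               rate_condition: Callable[[int], bool]) -> Tuple[int, bool]:
    """Decide (trace_id, sampled) when a request reaches a service."""
    if TRACE_HEADER in headers:
        # Downstream service: reuse the upstream decision without re-judging.
        return int(headers[TRACE_HEADER]), headers.get(SAMPLED_HEADER) == "1"
    # First service in the chain: create an ID and apply the rate condition.
    trace_id = new_trace_id()
    return trace_id, rate_condition(trace_id)


def outgoing_headers(trace_id: int, sampled: bool) -> Dict[str, str]:
    """Headers service A attaches when it calls service B."""
    return {TRACE_HEADER: str(trace_id), SAMPLED_HEADER: "1" if sampled else "0"}


judged: List[int] = []


def rate(trace_id: int) -> bool:
    judged.append(trace_id)  # record every evaluation of the condition
    return trace_id % 100 == 1


# Service A: no tracing headers yet, so the rate condition is evaluated.
tid, sampled = on_request({}, lambda: 201, rate)
# Service B: headers from A carry the decision, so rate() is not called again.
tid_b, sampled_b = on_request(outgoing_headers(tid, sampled), lambda: 0, rate)
print(tid_b, sampled_b, judged)  # 201 True [201]
```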
With the above scheme, the link tracking sampling method can pre-screen call chains with the sampling rate judgment condition, meeting part of the analysis needs without reporting all call chain data. At the same time, the sampling rate judgment condition yields a relatively random set of reported call chains, which suits tasks such as steady-state analysis by operation and maintenance personnel.
In practical application, if the calling link does not hit the sampling rate judgment condition, it does not need to be reported under the sampling rate mode. However, to capture the call chain information that operation and maintenance personnel care most about, the calling link also undergoes characteristic judgment to determine whether it belongs to such information. Characteristic judgment requires a specific characteristic judgment condition to be preset. As shown in fig. 5, in one implementation, the sampling characteristic includes a processing duration, and the characteristic judgment condition includes a preset duration threshold. If the calling link does not satisfy the sampling rate judgment condition, the method further comprises the following steps:
s411: acquiring the processing duration of the calling link to which no sampling identifier has been added;
s412: if the processing duration exceeds the preset duration threshold, determining that the sampling characteristic satisfies the characteristic judgment condition.
No sampling identifier is added to a calling link that does not hit the sampling rate judgment condition; instead, its processing duration can be recorded as a sampling characteristic. Generally, if the processing time of a service exceeds the design time of the microservice, an internal error is likely, such as a program runtime error or excessive network delay. Therefore, after the processing duration is recorded, the sampling characteristic of the calling link can be judged: if the processing duration exceeds the preset duration threshold, the sampling characteristic is determined to satisfy the characteristic judgment condition.
For example, if the preset duration threshold is 100 ms and the processing duration of the calling link is 500 ms, the threshold is exceeded, indicating that the current service took too long and may be in an abnormal state. The sampling characteristic is therefore determined to satisfy the characteristic judgment condition, and this slow calling link needs to be reported to the data collection server for analysis by operation and maintenance personnel.
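Steps s411 and s412 amount to a simple threshold comparison on the recorded start and end times of the calling link. A minimal sketch, assuming a hypothetical `CallLink` record with millisecond timestamps and the 100 ms threshold from the example:

```python
from dataclasses import dataclass


@dataclass
class CallLink:
    chain_id: int
    service: str
    start_ms: float  # hypothetical: start time of the call, in milliseconds
    end_ms: float    # hypothetical: end time of the call, in milliseconds

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def exceeds_duration(link: CallLink, threshold_ms: float = 100.0) -> bool:
    """s412: the characteristic condition is met when the duration exceeds the threshold.

    Intended to be applied only to links without the sampling identifier (s411).
    """
    return link.duration_ms > threshold_ms
```

With the values from the example, a 500 ms link exceeds the 100 ms threshold, while an 80 ms link does not.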
In one implementation, the sampling characteristic further includes a response code, and the characteristic judgment condition includes an abnormal response code interval. As shown in fig. 6, if the calling link does not satisfy the sampling rate judgment condition, the method further comprises:
s421: acquiring the response code of the calling link to which no sampling identifier has been added;
s422: if the response code falls within the abnormal response code interval, determining that the sampling characteristic satisfies the characteristic judgment condition.
A response code is a specific code reflecting the execution result of an application: every network service that is called returns a response code to the original client to represent its execution result. Because different response codes reflect different execution results, a response code within the abnormal response code interval means the current service encountered an execution error. The corresponding call chain information is then of particular concern to operation and maintenance personnel, so the sampling characteristic is determined to satisfy the characteristic judgment condition.
The execution result represented by a response code depends on the protocol in use. Taking HTTP as an example: response codes 100-101 are informational; 200-206 indicate success; 300-305 indicate redirection; 400-415 indicate client errors; and 500-505 indicate server errors. If the obtained response code is 404, it falls within the 400-415 interval and represents error information for the current calling link, so the sampling characteristic is determined to satisfy the characteristic judgment condition, and the call chain information corresponding to this calling link needs to be reported.
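Step s422 can be sketched as an inclusive interval check. The two intervals below are taken from the HTTP example above; treating them as the configured abnormal intervals is an assumption, and a deployment might instead use the full 400-599 range.

```python
from typing import List, Tuple

# Abnormal response code intervals from the HTTP example (inclusive bounds):
# client errors 400-415 and server errors 500-505.
ABNORMAL_INTERVALS: List[Tuple[int, int]] = [(400, 415), (500, 505)]


def is_abnormal(code: int, intervals: List[Tuple[int, int]] = ABNORMAL_INTERVALS) -> bool:
    """Return True if the response code falls within any abnormal interval."""
    return any(low <= code <= high for low, high in intervals)
```

A 404 or 503 therefore satisfies the characteristic judgment condition, while 200 and 302 do not.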
Because an error-type response code may reflect either a client error or a server error, both of which concern operation and maintenance personnel, and because each service invocation yields one response code, in practical application, if the response code of any calling segment is an abnormal response code (i.e., in the abnormal response code interval 400-.
If the sampling characteristic does not satisfy the characteristic judgment condition, the service process for the current calling link has no abnormality, but this does not mean the calling link will not be reported. In practical applications, a request may be normal in one calling link but abnormal in a subsequent one. For the analysis of operation and maintenance personnel, once any link is abnormal, the information of all links should be reported, so that they obtain the complete call chain information and can accurately determine the cause of the abnormality. Therefore, even if one calling link does not satisfy the characteristic judgment condition, the client must wait for the judgment results of the other calling links before deciding whether to report.
Therefore, in one implementation, as shown in fig. 7, if the sampling characteristic does not satisfy the characteristic judgment condition, the method further comprises:
s431: adding a waiting identifier for the calling link which does not meet the characteristic judgment condition;
s432: and storing the calling link added with the waiting identifier.
When the sampling characteristic does not satisfy the characteristic judgment condition, a waiting identifier can be added to the calling link. The waiting identifier instructs the data acquisition client to temporarily store the calling link data while it waits for the characteristic judgment results of the other calling links belonging to the same call chain.
For example, when the request reaches service A and the calling link hits neither the sampling rate judgment condition nor the characteristic judgment condition, the waiting identifier is added to the calling link and the link is temporarily stored in the data acquisition client. When the request reaches service B, the calling link for service B is judged. Because the sampling rate judgment for this call chain was already made (and missed) at service A, it can be skipped, and characteristic judgment is performed directly on the calling link of service B. If service B hits the characteristic judgment condition, the calling link data for service B is reported to the data collection server, together with the temporarily stored calling link data for service A.
In addition, once the calling link data for service B has been determined to require reporting, if the request then calls service C, the calling link data for service C can be reported directly when the request arrives, skipping characteristic judgment, so that together with the calling links of services A and B it forms complete call chain information.
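The A/B/C example can be sketched as a per-chain buffer on the data acquisition client: a miss parks the link (the waiting identifier), a hit flushes the parked links of that chain, and later links of an already-reported chain skip judgment. `PendingBuffer` and the plain list standing in for the data collection server are hypothetical; expiry of old entries is left out here.

```python
from collections import defaultdict
from typing import Dict, List, Set


class PendingBuffer:
    """Hypothetical client-side store for calling links with the waiting identifier."""

    def __init__(self) -> None:
        self.pending: Dict[int, List[str]] = defaultdict(list)  # chain ID -> waiting links
        self.reported_chains: Set[int] = set()  # chains in which some link already hit
        self.reported: List[str] = []           # stand-in for the data collection server

    def handle(self, chain_id: int, link: str, hits_feature: bool) -> None:
        if chain_id in self.reported_chains:
            # An earlier link of this chain hit: report directly, skip judgment.
            self.reported.append(link)
        elif hits_feature:
            # Hit: report the parked links of this chain, then the current link.
            self.reported.extend(self.pending.pop(chain_id, []))
            self.reported.append(link)
            self.reported_chains.add(chain_id)
        else:
            # Miss: keep the link locally until another link of the chain is judged.
            self.pending[chain_id].append(link)
```

Feeding it A (miss), B (hit), and C (not judged) for one chain reports all three in call order.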
With this technical scheme, characteristic judgment is performed separately as the request reaches each service, so that whenever the calling link of any service needs to be reported, the calling link data of all services in the whole call chain is reported, which makes comprehensive analysis easier for operation and maintenance personnel. Analyzing each calling link independently also pinpoints where the abnormality occurred, which facilitates subsequent analysis.
In practical application, the cloud platform handles a large number of requests and the corresponding calling link data volume is large, so the data acquisition client needs enough buffer space to temporarily store the calling links to which the waiting identifier has been added. Even so, the buffer may be unable to hold all of this data. Therefore, in one implementation, as shown in fig. 8, the method further comprises:
s441: acquiring the survival time of the stored calling link;
s442: and if the survival time exceeds a preset survival threshold, deleting the stored calling link.
The data acquisition client can automatically monitor how long each temporarily stored calling link has survived in the cache; once the survival time exceeds the preset survival threshold, the data is deleted to free cache space, so that the client can store the data of other calling links.
For example, the calling link of request Q at service A is judged to miss the characteristic judgment condition, so its data is stored in the cache of the data acquisition client with a survival time of 10 min. If none of the calling links of request Q satisfies the characteristic judgment condition within those 10 min, the calling link for service A is deleted from the cache, freeing cache space on the data acquisition client.
Note that the preset survival threshold must be long enough for all calling links of the whole call chain to finish being judged; that is, it must exceed the total processing duration of one complete call chain. The survival threshold can therefore be set according to the number of calling links in the call chain, i.e., the number of services called by one request: the more calling links the chain contains, the larger the threshold should be. In addition, processing durations differ between services. Microservices such as push and prompt take very little time, whereas traversal and complex model computation take relatively long, so the preset survival threshold also needs to account for the processing time of each service and reserve some allowance for network delay.
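Steps s441 and s442 can be sketched as a cache that records when the first link of each chain was stored and evicts chains older than the survival threshold. `TtlCache` is hypothetical; the injectable `clock` (defaulting to `time.monotonic`) exists only so the behaviour can be checked without waiting 10 minutes.

```python
import time
from typing import Callable, Dict, List, Tuple


class TtlCache:
    """Hypothetical waiting-link cache that drops chains older than a survival threshold."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        # chain ID -> (time the first link was stored, links of that chain)
        self.entries: Dict[int, Tuple[float, List[str]]] = {}

    def store(self, chain_id: int, link: str) -> None:
        stored_at, links = self.entries.get(chain_id, (self.clock(), []))
        links.append(link)
        self.entries[chain_id] = (stored_at, links)

    def evict_expired(self) -> List[int]:
        """s442: delete every chain whose survival time exceeds the threshold."""
        now = self.clock()
        expired = [cid for cid, (stored_at, _) in self.entries.items()
                   if now - stored_at > self.ttl]
        for cid in expired:
            del self.entries[cid]
        return expired
```

With a 600-second threshold, the link of request Q is kept at the 5-minute mark and evicted just after 10 minutes.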
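One way to turn this guidance into a number, assuming the services of a chain are called one after another, is to add up the expected processing time of every service and reserve a network delay allowance per hop. The function below is a hypothetical sketch of that arithmetic, not a formula given in the patent.

```python
from typing import Sequence


def survival_threshold_ms(service_times_ms: Sequence[int], per_hop_delay_ms: int) -> int:
    """Estimate a survival threshold for one call chain (assumed sequential calls).

    service_times_ms: expected processing time of each called service.
    per_hop_delay_ms: network delay allowance reserved for each call.
    """
    processing = sum(service_times_ms)                       # longer services raise the bound
    network = per_hop_delay_ms * len(service_times_ms)       # more calling links raise it too
    return processing + network
```

For instance, a push service (50 ms), a model computation (2000 ms), and a prompt service (100 ms) with a 200 ms allowance per hop give 2150 + 600 = 2750 ms.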
When the cloud platform contains many services, setting the preset survival threshold consumes a certain amount of computing resources. Therefore, as shown in fig. 9, in one implementation, whether to delete calling link data from the cache can instead be decided by determining whether judgment has been completed for all calling links of the same call chain; that is, the method further comprises:
s451: acquiring an associated calling link;
s452: if the sampling characteristic of any associated calling link satisfies the characteristic judgment condition, reporting the associated calling links and the stored calling link;
s453: if the sampling characteristics of none of the associated calling links satisfy the characteristic judgment condition, deleting the associated calling links and the stored calling link.
After a calling link is stored, its associated calling links can be determined; an associated calling link has the same call chain ID as the stored calling link. Calling links with the same call chain ID belong to the same call chain, so as soon as any one of them is judged to satisfy the characteristic judgment condition, all of them are reported to the data collection server.
When the sampling characteristics of none of the associated calling links satisfy the characteristic judgment condition, all calling links under that call chain ID are deleted directly, freeing the cache space of the data acquisition client. For example, in one call chain the request successively calls service A, service B, service C, and service D, generating calling links SA, SB, SC, and SD, which share the same call chain ID. After calling link SA of service A is generated, sampling judgment is performed on it; if it hits neither the sampling rate judgment condition nor the characteristic judgment condition, SA is stored in the cache of the data acquisition client. Calling links SB (service B) and SC (service C) are judged in the same way and, because they do not satisfy the characteristic judgment condition, are also stored in the cache.
When calling link SD of service D is judged, if SD hits the characteristic judgment condition, it is reported to the data collection server, and SA, SB, and SC stored in the cache of the data acquisition client are reported as well. If SD does not hit the characteristic judgment condition, it is not reported and does not need to be cached; at the same time, SA, SB, and SC are deleted from the cache of the data acquisition client to free its cache space.
As can be seen, this embodiment uses the whole call chain as the basis for judgment, so once judgment of the call chain is complete, its calling links can be deleted promptly and the cache space of the data acquisition client released in time. Moreover, among the calling links of the same call chain, only the first one generated stays in the cache for the whole judgment period; the others are cached for shorter periods, which effectively reduces cache occupation.
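The SA to SD example can be sketched as a settlement step that runs once the last calling link of a chain has been judged: the cached links are either reported together with the last link or deleted. `settle_chain` is hypothetical, and it assumes the client can tell which link is the last one of the chain, which the patent does not specify.

```python
from typing import Dict, List, Tuple


def settle_chain(cache: Dict[int, List[str]], chain_id: int,
                 last_link: str, last_hit: bool) -> Tuple[List[str], List[str]]:
    """Decide the fate of a chain once its last calling link has been judged.

    Returns (reported, deleted). The chain's cached links leave the cache either way.
    """
    cached = cache.pop(chain_id, [])
    if last_hit:
        # The last link hit the characteristic condition: report it with every cached link.
        return cached + [last_link], []
    # No link of the chain hit: the last link is never cached, earlier ones are dropped.
    return [], cached
```

In both outcomes the chain's entry is removed from the cache, which is what frees the space as soon as judgment is complete.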
With this technical scheme, after the call chain information is initialized, whether the sampling rate judgment condition is satisfied is determined from the request sequence, whether the characteristic judgment condition is satisfied is determined from the sampling characteristics, and the call chain information that needs to be sampled is reported accordingly. By adding characteristic sampling on top of sampling-rate sampling, the method mitigates the problem of excessive sampling data volume while still collecting requests with specified abnormal characteristics for fault detection.
Based on the above link tracking sampling method, the present application further provides a link tracking sampling system. As shown in fig. 10, the system comprises a data acquisition client and a data collection server, the data acquisition client being connected with the data collection server;
wherein, as shown in fig. 11, the data acquisition client is configured to:
s1: initializing and generating calling chain information; the calling chain information comprises a plurality of calling links; each calling link comprises a plurality of calling segments belonging to the same service;
s2: judging whether the calling link meets a sampling rate judgment condition or not according to a trigger request sequence corresponding to the calling link;
s3: if the calling link meets the sampling rate judgment condition, adding a sampling identifier for a request so as to report the calling link added with the sampling identifier to the data collection server;
s4: if the calling link does not meet the sampling rate judgment condition, extracting sampling features in the calling link;
s5: adding an abnormal identifier for the calling link of which the sampling characteristics meet the characteristic judgment condition so as to report the calling link added with the abnormal identifier to the data collection server;
the data collection server is configured to: receive and store the calling links to which the sampling identifier has been added and the calling links to which the abnormal identifier has been added.
For example, after the data acquisition client detects that the request has reached service A, it determines that the request misses both the sampling rate judgment condition and the characteristic judgment condition. When service A calls service B, the request carries its call chain ID. After receiving the request, service B evaluates the sampling rate judgment condition; if the characteristic judgment condition is still not hit either, the calling link is marked as awaiting confirmation after service B finishes processing the request. If service B takes longer than the 100 ms set in the characteristic judgment condition, then after service A receives the response from service B, service A determines that the characteristic judgment condition is hit and reports to the data collection server. Meanwhile, service B is synchronized with the latest call chain ID information and, on finding locally stored calling links of that call chain, reports them to the data collection server as well.
With this technical scheme, the link tracking sampling system comprises an interconnected data acquisition client and data collection server. After initially generating the call chain information, the data acquisition client judges it against the sampling rate judgment condition and the characteristic judgment condition and samples it when either condition is satisfied, so that both data volume and call characteristics are taken into account when collecting call chain information. The collected call chain information is stored in the data collection server for users to retrieve and analyze.
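Steps S2 to S5 and the server's role can be sketched end to end for a single calling link. This is a simplified illustration under stated assumptions: the flag strings, the `CollectionServer` stub, the 100 ms threshold, and the simplified abnormal interval (any code of 400 or above) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class CallLink:
    chain_id: int
    service: str
    duration_ms: float
    status: int
    flag: Optional[str] = None  # hypothetical: "sampled", "abnormal" or "waiting"


@dataclass
class CollectionServer:
    """Stand-in for the data collection server: receives and stores reported links."""
    stored: List[CallLink] = field(default_factory=list)

    def receive(self, link: CallLink) -> None:
        self.stored.append(link)


def acquire(link: CallLink, server: CollectionServer,
            rate_hit: Callable[[int], bool], duration_threshold_ms: float = 100.0) -> None:
    if rate_hit(link.chain_id):
        # S2/S3: sampling rate condition met, add the sampling identifier and report.
        link.flag = "sampled"
        server.receive(link)
    elif link.duration_ms > duration_threshold_ms or link.status >= 400:
        # S4/S5: characteristic condition met (simplified interval, an assumption).
        link.flag = "abnormal"
        server.receive(link)
    else:
        # Neither condition met: keep the link on the client with a waiting mark.
        link.flag = "waiting"
```

Only links flagged "sampled" or "abnormal" reach the server; "waiting" links stay on the client for the chain-level handling described earlier.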
In a specific implementation, the present application further provides a computer storage medium that may store a program; when executed, the program may perform some or all of the steps of the method embodiments provided in the present application. When the controller of the display device provided in the present application runs the computer program instructions, the controller executes the steps it is configured to perform. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
Those skilled in the art will clearly understand that the techniques in the embodiments of the present application may be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions in the embodiments of the present application, in essence or in the part contributing to the prior art, may be embodied as a software product. The software product may be stored in a storage medium such as a ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the method described in the embodiments, or in some parts of the embodiments, of the present application.
The same or similar parts of the embodiments in this specification may be referred to each other. In particular, the system embodiment is substantially similar to the method embodiment and is therefore described briefly; for relevant points, refer to the description of the method embodiment.
The above-described embodiments of the present application do not limit the scope of the present application.

Claims (11)

1. A method for link trace sampling, comprising:
initializing and generating calling chain information; the calling chain information comprises a plurality of calling links; each calling link comprises a plurality of calling segments belonging to the same service;
judging whether the calling link meets a sampling rate judgment condition or not according to a trigger request sequence corresponding to the calling link;
if the calling link meets the sampling rate judgment condition, adding a sampling identifier for a request, and reporting the calling link added with the sampling identifier;
if the calling link does not meet the sampling rate judgment condition, extracting sampling features in the calling link;
and adding an abnormal identifier for the calling link of which the sampling characteristics meet the characteristic judgment condition, and reporting the calling link added with the abnormal identifier.
2. The link trace sampling method according to claim 1, wherein the sampling rate judgment condition includes a sampling ratio, and judging whether the request meets the sampling rate judgment condition according to the trigger request sequence corresponding to the calling link comprises:
acquiring a calling chain ID and a ratio value, the calling chain ID being a numeric identifier assigned according to the request sequence;
performing a modulo operation on the calling chain ID by the ratio value to obtain a modulo result;
and if the modulo result is equal to 1, determining that the request meets the sampling rate judgment condition.
3. The link trace sampling method according to claim 1, wherein the sampling rate judgment condition includes a sampling rate, and judging whether the request meets the sampling rate judgment condition according to the trigger request sequence corresponding to the calling link comprises:
acquiring a sampling quota per unit sampling time;
and if the sequence number of the request within the unit sampling time does not exceed the sampling quota, determining that the request meets the sampling rate judgment condition.
4. The link trace sampling method according to claim 1, wherein the sampling characteristic includes a processing duration, and the characteristic determination condition includes a preset duration threshold; if the calling link does not meet the sampling rate judgment condition, the method further comprises the following steps:
acquiring the processing time of the calling link without the sampling identifier;
and if the processing time length exceeds a preset time length threshold value, determining that the sampling characteristics meet a characteristic judgment condition.
5. The link trace sampling method according to claim 1, wherein the sampling characteristic includes a response code, and the characteristic determination condition includes an abnormal response code interval; if the calling link does not meet the sampling rate judgment condition, the method further comprises the following steps:
acquiring a response code of the calling link to which no sampling identifier has been added;
and if the response code is in the abnormal response code interval, determining that the sampling characteristic meets a characteristic judgment condition.
6. The link trace sampling method according to claim 1, wherein if the sampling characteristic does not satisfy a characteristic judgment condition, the method further comprises:
adding a waiting identifier for the calling link which does not meet the characteristic judgment condition;
and storing the calling link added with the waiting identifier.
7. The link trace sampling method according to claim 6, further comprising:
acquiring the stored survival time of the calling link;
and if the survival time exceeds a preset survival threshold, deleting the stored calling link.
8. The link trace sampling method according to claim 6, further comprising:
acquiring an associated calling link, the associated calling link and the stored calling link having the same calling chain ID;
if the sampling characteristics of any one of the associated calling links meet the characteristic judgment condition, reporting the associated calling link and the stored calling link;
and if the sampling characteristics of all the associated calling links do not meet the characteristic judgment condition, deleting the associated calling links and the stored calling links.
9. The link trace sampling method according to claim 1, further comprising:
adding the calling segment to the calling chain information according to the service triggered by the request;
and recording the starting time, the ending time and the access result identification of each calling segment of the triggered service to form the calling link.
10. The link trace sampling method according to claim 1, wherein if the calling segment satisfies the sampling rate determination condition, the method further comprises:
and if the sampling identifier is added to any calling link in the calling chain information, reporting the whole calling chain information.
11. A link trace sampling system, comprising a data acquisition client and a data collection server, the data acquisition client being connected with the data collection server;
wherein the data acquisition client is configured to:
initializing and generating calling chain information; the calling chain information comprises a plurality of calling links; each calling link comprises a plurality of calling segments belonging to the same service;
judging whether the calling link meets a sampling rate judgment condition or not according to a trigger request sequence corresponding to the calling link;
if the calling link meets the sampling rate judgment condition, adding a sampling identifier for a request so as to report the calling link added with the sampling identifier to the data collection server;
if the calling link does not meet the sampling rate judgment condition, extracting sampling features in the calling link;
adding an abnormal identifier for the calling link of which the sampling characteristics meet the characteristic judgment condition so as to report the calling link added with the abnormal identifier to the data collection server;
the data collection server is configured to: receive and store the calling link to which the sampling identifier has been added and the calling link to which the abnormal identifier has been added.
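The sampling rate judgment conditions of claims 2 and 3 can be sketched as follows. `ratio_hit` follows claim 2 literally (calling chain ID modulo the ratio value equals 1), which selects roughly one chain in N for a ratio value N > 1; note that a ratio value of 1 would select nothing, since every ID is divisible by 1. `QuotaSampler` is one hypothetical reading of claim 3 as a fixed-window counter; the window length and the reset rule are assumptions.

```python
import time
from typing import Callable


def ratio_hit(chain_id: int, ratio: int) -> bool:
    """Claim 2: the request is sampled when chain ID modulo the ratio value equals 1."""
    return chain_id % ratio == 1


class QuotaSampler:
    """Claim 3 sketch: sample while the request's sequence number in the current
    unit sampling time does not exceed the quota (fixed window, an assumption)."""

    def __init__(self, quota: int, unit_seconds: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.quota = quota
        self.unit = unit_seconds
        self.clock = clock
        self.window_start = clock()
        self.count = 0

    def hit(self) -> bool:
        now = self.clock()
        if now - self.window_start >= self.unit:
            # A new unit sampling time begins: restart the sequence count.
            self.window_start = now
            self.count = 0
        self.count += 1
        return self.count <= self.quota
```

With a quota of 2 per second, the third request inside one second is not sampled, and the count starts over in the next window.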
CN202010254461.7A 2020-04-02 2020-04-02 Link tracking sampling method and system Active CN111478806B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010254461.7A CN111478806B (en) 2020-04-02 2020-04-02 Link tracking sampling method and system

Publications (2)

Publication Number Publication Date
CN111478806A true CN111478806A (en) 2020-07-31
CN111478806B CN111478806B (en) 2022-10-14

Family

ID=71749852

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010254461.7A Active CN111478806B (en) 2020-04-02 2020-04-02 Link tracking sampling method and system

Country Status (1)

Country Link
CN (1) CN111478806B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104219316A (en) * 2014-09-12 2014-12-17 微梦创科网络科技(中国)有限公司 Method and device for processing call request in distributed system
CN107423433A (en) * 2017-08-03 2017-12-01 聚好看科技股份有限公司 A kind of data sampling rate control method and device
CN108900640A (en) * 2018-08-13 2018-11-27 平安普惠企业管理有限公司 Node calls link generation method, device, computer equipment and storage medium
CN110401579A (en) * 2019-06-18 2019-11-01 平安科技(深圳)有限公司 The full link data method of sampling, device, equipment and storage medium based on hash table
CN110474812A (en) * 2019-08-22 2019-11-19 中国工商银行股份有限公司 Sample rate self-adapting regulation method and device
CN112866010A (en) * 2021-01-04 2021-05-28 聚好看科技股份有限公司 Fault positioning method and device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114302249A (en) * 2020-09-22 2022-04-08 华为云计算技术有限公司 Transmission chain information generation method and device
CN116016262A (en) * 2022-12-28 2023-04-25 天翼云科技有限公司 Method and device for detecting call chain consistency in real time based on union
CN116471213A (en) * 2023-06-09 2023-07-21 北京随信云链科技有限公司 Link tracking method, link tracking system and medium
CN116471213B (en) * 2023-06-09 2023-09-15 北京随信云链科技有限公司 Link tracking method, link tracking system and medium

Also Published As

Publication number Publication date
CN111478806B (en) 2022-10-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant