WO2021058104A1 - Span categorization - Google Patents

Span categorization

Info

Publication number
WO2021058104A1
Authority
WO
WIPO (PCT)
Prior art keywords
span
spans
features
nodes
trie
Prior art date
Application number
PCT/EP2019/076076
Other languages
English (en)
French (fr)
Inventor
Jorge Cardoso
Ilya SHAKHAT
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. filed Critical Huawei Technologies Co., Ltd.
Priority to CN201980099750.XA priority Critical patent/CN114730280A/zh
Priority to PCT/EP2019/076076 priority patent/WO2021058104A1/en
Publication of WO2021058104A1 publication Critical patent/WO2021058104A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079Root cause analysis, i.e. error or fault diagnosis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3452Performance evaluation by statistical analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466Performance evaluation by tracing or monitoring

Definitions

  • This invention relates to categorizing data, for example for analysis in microservice applications, using real-time distributed traces and spans, online model learning and machine learning.
  • Traces are composed of spans. They contain information that can be used to detect anomalies and performance bottlenecks at runtime.
  • a span $S_t$ is a vector of property-value pairs $(p_i, v_i)$ describing the state, response time, and/or other characteristics of a microservice at a given time $t$.
  • spans of the same type are grouped together to form a time series $[S_1, S_2, \ldots, S_T]$, where the subscript $t$ indicates time. Time series of spans may conveniently be used to analyse microservice applications.
  • pre-processing can be used to remove the scheme (e.g., “http”), the host (e.g., “192.168.0.12”), and the port number (e.g., “:5002”).
  • distance or string-based matching methods are used to group spans with the same pre-processed URL.
  • the immediate limitation is that when the path /v2/AB12345/instance/98CD765/delete contains so-called path parameters, existing approaches generate too many groups. As a negative consequence, many time series are also generated. This occurs because AB12345 is a customer ID and 98CD765 is a resource ID. Both are variables. Often, these path parameters have no effect on the response time of remote procedure calls and it is desirable to ignore them.
  • Distance metrics can also be used to classify data.
  • CLUE Clustering for Mining Web URLs
  • ITC 28 International Teletraffic Congress
  • a system for categorizing spans in microservice applications comprising a span categorization module configured to: receive a plurality of spans; extract a plurality of features from each span; and categorize the plurality of spans into a plurality of categories in dependence on the extracted features.
  • the system may further comprise a span buffer configured to select a set of k spans from the plurality of spans, the span buffer being configured to repeatedly perform the steps of: receiving an i-th span from the plurality of spans; randomly selecting that i-th span with a probability of k/i; and if that i-th span is selected, storing it as one of the k spans. This may allow a representative sample of spans to be analysed if there are insufficient computational resources available to analyse all incoming spans. This may allow the system to handle span bursts.
  • the system may further comprise a learning module configured to perform the following steps: extract a plurality of features from the plurality of spans; form a series of span abstractions corresponding to the subset of spans, each span abstraction comprising the features of a respective span; form a span trie from the series of span abstractions by mapping each span abstraction to a series of corresponding nodes in the trie; identify nodes in the trie whose incoming edge frequency is greater than a predetermined threshold; and select the features of the paths to each such node as a pattern and store that pattern for future detection.
  • This may allow identification of the features that should be extracted from the spans in order to categorize them. This may also allow for progressive learning, as span categories may be learned as new spans are observed by the system.
  • the plurality of spans may be the set of k spans selected by the span buffer. This may allow the system to learn from a representative sample of spans to be analysed if there are insufficient computational resources available to learn from all incoming spans. This may allow the system to handle span bursts.
  • the span categorization module may be configured to extract features from the plurality of spans corresponding to the features of the pattern.
  • the pattern may correspond to a category and the span categorization module may be configured to categorize spans into that category by matching the extracted features of spans to the features of the pattern. Therefore, the system may learn the features to extract in order to meaningfully categorize the plurality of spans.
  • the learning module may be configured to cease mapping span abstractions to the trie when the rate of change of the number of such nodes is approximately zero. The approach may therefore stop automatically when the change rate converges. This may provide an effective stop condition, preventing the system from analysing redundant spans which do not add information to the knowledge model.
  • the predetermined threshold may be the mean of the incoming edge frequencies of all of the nodes in the span trie. This may be a convenient criterion for determining schema nodes.
  • Each span may be a vector of property-value pairs describing the state of a microservice at a given time, and the spans in each category form a time series of spans.
  • the system may be further configured to assign a time series ID to the spans in each category. This allows each category to represent a time series of spans. The system may therefore be used, for example, for time series analysis.
  • the extracted features may comprise components of a URL.
  • the extracted features may comprise at least one of a method, URL endpoint, path or port. This may allow for improved processing efficiency and the system may achieve a low running time complexity. This makes the approach scalable and suitable to efficiently process spans originating from large-scale microservice applications.
  • the port number of the spans may be used as the root node of the span trie. This may be a convenient implementation.
  • a method for categorizing spans in microservice applications comprising: receiving a plurality of spans; extracting a plurality of features from each span; and categorizing the plurality of spans into a plurality of categories in dependence on the extracted features.
  • a method for selecting a set of k spans from a plurality of spans, each span being a vector of property-value pairs, the method comprising repeatedly performing the steps of: receiving an i-th span from the plurality of spans; randomly selecting that i-th span with a probability of k/i; and if that i-th span is selected, storing it as one of the k spans.
  • This may allow a representative sample of spans to be analysed if there are insufficient computational resources available to analyse all incoming spans. This may allow the system to handle span bursts.
  • a method for determining a pattern of features for extracting from a span in a microservice application comprising: extracting a plurality of features from a plurality of spans; forming a series of span abstractions corresponding to the spans, each span abstraction comprising the features of a respective span; forming a span trie from the series of span abstractions by mapping each span abstraction to a series of corresponding nodes in the trie; identifying nodes in the trie whose incoming edge frequency is greater than a predetermined threshold; and selecting the features of the paths to each such node as a pattern and storing that pattern for future detection.
  • This may allow identification of the features that should be extracted from the spans in order to categorize them. This may also allow for progressive learning, as span categories may be learned as new spans are observed by the system.
  • a computer program that, when executed by a computer, causes the computer to perform the methods described above.
  • the computer program may be provided on a non-transitory computer readable storage medium.
  • Figure 1 illustrates an example of the architecture of a span categorization system and its relationships to the context environment.
  • Figure 2 shows an example of property-value pairs in a span.
  • Figure 3 shows an example of a span categorization system.
  • Figure 4 illustrates the interaction between the span buffer, the tracing server and the span extractor.
  • Figure 5 illustrates a method for selecting a set of k spans from a plurality of spans in the span buffer.
  • Figure 6 illustrates the steps of a progressive learning process to extract patterns for a plurality of spans.
  • Figure 7 shows an example of the structure of an HTTP endpoint.
  • Figure 8 shows examples of span abstractions.
  • Figure 9 illustrates port indexing.
  • Figures 10(a)-(c) show a span trie constructed using three qualified paths (q. paths).
  • Figure 11 shows a span trie with terminal nodes.
  • Figure 12 illustrates identifying schema and parameter nodes in a span trie.
  • Figure 13 illustrates the relative change of node types as more spans are analysed.
  • Figures 14(a) and 14(b) illustrate reducing a span trie.
  • Figure 15 illustrates a method for determining a pattern of features for extracting from a span.
  • Figure 16 illustrates a method of categorizing spans.
  • DETAILED DESCRIPTION OF THE INVENTION
  • the present invention relates to a progressive learning technique for span categorization which assigns distinct, but related, spans to categories and which dynamically adapts as new categories appear in span streams. For example, spans with similar URLs are assigned to the same category since they inherently behave similarly.
  • Categorization is closely related to the concept of classification.
  • categorization refers to the process of dividing the open-world of spans into groups whose members are in some way similar.
  • classification refers to the process of assigning elements to classes in an existing classification system.
  • Figure 1 shows the context architecture of the proposed categorization system 100, particularly its relationship with a microservice application 101, a tracing server 102 and a data and processing tier 103.
  • the system 100 is particularly suitable for use with large-scale, complex distributed systems implemented by following a microservice architecture, which is an architectural style for developing software systems exposed as (micro)services interconnected by a network.
  • a microservice application is shown at 101 in Figure 1.
  • Services interact with other services using an inter-process communication protocol such as HTTP, AMQP, or a binary protocol such as TCP.
  • the system described herein exemplifies its use with the HTTP protocol and REST, but it is not limited to them.
  • Distributed tracing may be used to monitor the microservice application. For each request, detailed statistics, metrics, and logging data are generated so that operators can understand distributed traffic flow and debug problems as they occur. Tracing infrastructures generate so-called traces, which are collections of spans. Each span describes the state of a microservice application when an important milestone started or completed its execution. As an example, consider spans which capture the response time of communication between microservices. Spans may be collected, grouped, and analysed using statistical and machine learning algorithms with one of the following objectives in mind: transaction monitoring, root cause analysis, service dependency analysis, and performance optimization. The analysis may be achieved by ordering spans to form a time series using timestamps for the x-axis and a value of a property, such as intra-service call response time, for the y-axis.
  • the microservices of microservice application 101 communicate by calling other microservices via endpoints.
  • the tracing library client 104 generates traces, which are pathways showing how the handling of requests was conducted. Traces can be generated using distributed tracing systems (for example, Jaeger or Zipkin) or traditional logging (for example, Unix syslog).
  • the tracing server module 102 receives tracing data generated by the microservices via tracing library clients. The received data is in the form of a span. Each span contains several property-value $(p_i, v_i)$ pairs.
  • Figure 2 shows the following properties as examples: trace id, span id, timestamp, application name, method, protocol, endpoint, response time, req_msg_size, rsp_msg_size and result_code.
  • the spans associated with each category, which in this example are assigned a time series id, are stored in a database.
  • the database is a time-series database (TSDB).
  • artificial intelligence and statistical methods are applied to the time series for operations such as transaction monitoring, root cause analysis, service dependency analysis and performance optimization.
  • This module may conveniently implement various behaviour change detection and anomaly detection algorithms, using parallel data analysis frameworks such as Spark for efficient processing.
  • Each span generated by the tracing infrastructure has an invocation endpoint. If a data centre has 50,000 servers, each server has 10 services, and each service has 10 endpoints, each of which can be called with 10 different parameters, then there are 50,000 × 10 × 10 × 10 = 50 million different invocation endpoints which need to be monitored and analysed. If no specific method is used, the invocation endpoint can be used as a unique identifier for a time series that stores the spans of that endpoint. However, this has a major drawback: it generates too many time series, and many of those time series contain only a few observations because the spans are diluted across a large pool of time series.
  • the span categorization system described herein is an efficient implementation which addresses this problem by assigning spans to categories, which may optionally afterwards be assigned to time series.
  • Figure 3 shows the overall architecture of the categorization system 300.
  • the components of the system include span buffer 301, progressive learning module 302, span categorization module 303, span extractor 304, model constructor module 305, model adaptor module 306, pattern extractor module 307 and span collector 308. The operation of these components will now be described in the following sections.
  • the span buffer 301 is a component shared by the tracing server (102 in Figure 1) and by the span extractor 304 to mediate the different processing speeds so that these two modules can operate without mutual interference.
  • the number of spans generated per unit of time can be much larger than the number of spans which can be analysed.
  • the tracing server 102 can easily overload the span extractor 304 and progressive learning module 302. While adopting a buffer enables the achievement of a strong decoupling to handle different speeds of operation, it typically has a limited capacity. Thus, situations when spans cannot be processed to build a span model are unavoidable.
  • the span buffer may store k spans and drop all of the subsequent incoming spans. While simple, this approach does not guarantee that the spans stored in the buffer are a statistically representative sample of the population. This is important since, depending on the operation of the microservice application, a large set of related requests may be executed in close temporal proximity. In such a scenario, the buffer contains a set of spans which is not representative of the true span population. Hence, a mechanism is required which randomly chooses a sample of k spans, having seen n spans from a span domain of unknown size N, and which guarantees that each span has an equal probability k/n of being chosen.
  • a reservoir sampling mechanism which runs in O(n) time and uses O(k) space to sample spans.
  • the approach comprises making a pass to the stream of n spans while maintaining and updating a span buffer structure.
  • the buffer 301 stores the first k items of the span stream S. Then, it iterates through the rest of S. For the i-th item of S where i > k, it selects this span with probability k/i. If the buffer decides to store the new span, it randomly removes an old span from its structure and puts the new span into its place. It can be shown via induction that, when the whole stream is traversed, the buffer 301 contains k random samples.
  • the approach of the reservoir sampling has a complexity of O(n) making it an efficient buffering solution for span categorization.
  • Figure 5 illustrates a method for selecting a set of k spans from a plurality of spans in the span buffer, each span being a vector of property-value pairs.
  • the method comprises repeatedly performing the steps 501 to 503.
  • the method comprises receiving an i-th span from the plurality of spans.
  • the method comprises randomly selecting that i-th span with a probability of k/i.
  • the method comprises, if that i-th span is selected, storing it as one of the k spans.
  • This method may be conveniently applied where there are insufficient computational resources to process all incoming spans. This ensures that a representative sample of spans is input into the progressive learning module, the operation of which will now be described.
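  • The sampling steps above can be sketched as follows. This is a minimal illustration of reservoir sampling, not the patented implementation; the function name `reservoir_sample` and the optional `rng` argument are assumptions introduced here.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Keep a uniform random sample of at most k items from a stream."""
    rng = rng or random.Random()
    buffer: List[T] = []
    for i, span in enumerate(stream, start=1):
        if i <= k:
            # The first k spans fill the buffer directly.
            buffer.append(span)
        elif rng.random() < k / i:
            # Select the i-th span with probability k/i and let it
            # replace a stored span chosen uniformly at random.
            buffer[rng.randrange(k)] = span
    return buffer
```

  • The function makes a single pass over the stream and stores at most k items, which matches the O(n) time and O(k) space stated in the text.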
  • the span categorization system may categorize a large number of spans by applying a procedure implemented by the progressive learning module 302.
  • a value of T (true) assigned to $(s_i, c_j)$ indicates a decision to assign $s_i$ to $c_j$, while a value of F (false) indicates a decision not to assign $s_i$ to $c_j$.
  • the task is to approximate the unknown target function $f: S \times C \to \{T, F\}$ (that describes how spans should be categorized) by means of a function $f': S \times C \to \{T, F\}$ called the categorizer such that $f$ and $f'$ concur as much as possible.
  • the progressive learning module 302 is responsible for executing six main steps, as illustrated in Figure 6. These steps are:
  • Step 605 If the Model is to be Extended: Go to Step 601 else Go to Step 606
  • steps 601-604 are executed by the model constructor module 305
  • step 605 is executed by the model adaptor module 306
  • step 606 is executed by the pattern extractor module 307.
  • step 601 extracts information related to the calls made between microservices.
  • the following fields are extracted from the structure of a span (shown in Figure 2): method, endpoint.
  • a method is used together with the endpoint of a remote microservice.
  • Calls between microservices are made using a method, also known as HTTP method or HTTP verb. Examples of methods are GET, POST, PUT, and DELETE.
  • the method is extracted from spans, since in many cases it is correlated with the behaviour of microservice calls.
  • a microservice call also includes an HTTP endpoint.
  • An example of an HTTP endpoint is shown in Figure 7.
  • An endpoint is a URL structure which identifies the endpoint (remote procedure) called during inter-process communication.
  • An endpoint has a scheme, user, password, host, port, path, query, and fragment (several of these elements are optional).
  • the following elements are further extracted from an endpoint: port; path.
  • These two elements are considered since they are also highly correlated with, for example, the response time of invoking a remote microservice.
  • Different microservices are typically assigned to an internal, well-known, unique port. Thus, different ports refer to distinct microservices and different paths refer to distinct functions.
  • the path component is divided into path segments (keywords) using the separators {/, ?} to identify keyword boundaries.
  • Figure 8 shows additional examples of the results of the span extraction step.
  • the host can be included and concatenated with the port number to enable a more fine-grained categorization when the characteristics of the host are relevant.
  • the HTTP result code (see Figure 2) may also be extracted to enable a categorization which also accounts for successful and unsuccessful remote microservice calls.
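  • The extraction step can be sketched as follows, assuming the endpoint is available as a full URL string. The names `SpanAbstraction` and `extract_abstraction`, the fallback to ports 80/443 when no port is given, and the DELETE method in the usage note are assumptions made for illustration.

```python
from typing import NamedTuple, Tuple
from urllib.parse import urlsplit


class SpanAbstraction(NamedTuple):
    method: str
    port: int
    segments: Tuple[str, ...]


def extract_abstraction(method: str, endpoint: str) -> SpanAbstraction:
    """Reduce a span's method and endpoint to (method, port, path segments)."""
    parts = urlsplit(endpoint)
    # Scheme, user, password and host are dropped; only port and path are kept.
    # Assumption: fall back to the scheme's default port when none is given.
    port = parts.port
    if port is None:
        port = 443 if parts.scheme == "https" else 80
    # urlsplit already separates the query at '?', so splitting the path
    # on '/' covers both separators named in the text.
    segments = tuple(seg for seg in parts.path.split("/") if seg)
    return SpanAbstraction(method.upper(), port, segments)
```

  • For example, calling `extract_abstraction("DELETE", "http://192.168.0.12:5002/v2/AB12345/instance/98CD765/delete")` yields the port 5002 and the segments v2, AB12345, instance, 98CD765 and delete.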
  • In step 602, the port number feature of each span abstraction is used as a key to create an entry in a hash table, with its value referencing the root node of a trie structure, as shown in Figure 9.
  • the trie is used to model spans for a specific port. Since ports are distributed uniformly across the hash table, the table enables spans to be efficiently assigned to a trie until a complete model is constructed.
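  • Port indexing can be sketched with an ordinary dictionary acting as the hash table. The names `TrieNode`, `port_index` and `root_for_port` are hypothetical, and labelling the root with the port number is one possible choice.

```python
from typing import Dict


class TrieNode:
    def __init__(self, label: str) -> None:
        self.label = label
        self.children: Dict[str, "TrieNode"] = {}


# Hash table: port number -> root node of the span trie for that port.
port_index: Dict[int, TrieNode] = {}


def root_for_port(port: int) -> TrieNode:
    """Return the trie root for a port, creating it on first use."""
    if port not in port_index:
        port_index[port] = TrieNode(label=str(port))
    return port_index[port]
```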
  • a trie data structure (also known as a prefix tree) is used to create a model for the spans.
  • a trie is a tree-like data structure.
  • the trie comprises nodes and edges. It is similar to a binary tree, and it is a very efficient implementation of an ordered index for text-based keys. Retrieving a path stored in a trie, or determining whether it exists, takes time linear in the length of the path. Thus, the trie is suitable for modelling spans.
  • This structure is herein referred to as a span trie.
  • Let $T = (N, E)$ be a span trie, with $N$ being the set of nodes, $E$ the set of edges, and $r \in N$ the root node of $T$.
  • the alphabet $\Sigma$ of the span trie is the set of path segments and methods: {p
  • Let $S$ be the set of $s$ strings over the alphabet $\Sigma$.
  • a span trie $T$ that stores the strings in $S$ is a structure where each node of $T$ (except the root node) is labelled with a character $c \in \Sigma$. $T$ has $s$ terminal nodes, each associated with one string in $S$. The path from the root node to a terminal node spells exactly one string.
  • To construct a span trie, the method and the path of a span abstraction are retrieved from the span extractor 304. Both features are joined to create a string of $S$ called a qualified path, with the form of a sequence (method, path). For example, (GET, v2, instance, list). Qualified paths (or q. paths) are added to the span trie.
  • Figures 10(a)-(c) respectively show a trie with the following three qualified paths which were added over a period of time:
  • Path 1 (a, b, c, d, e)
  • Path 2 (a, b, c, d, f)
  • edges are annotated with the number of qualified paths which go through them. This number is called the edge frequency.
  • Terminal nodes mark the end of a string. These nodes either have no outgoing edges, or have an incoming edge frequency that is greater than the sum of their outgoing edge frequencies. These nodes enable the identification of the last node of q. paths. In the figures, these nodes are coloured light grey.
  • Figure 11 shows a span trie constructed using the following three qualified paths:
  • the terminal nodes are identified as list, RST22 and RST88, shown at 1101, 1102 and 1103 in Figure 11 respectively.
  • Inserting and searching for a qualified path is efficient and can be achieved in linear time complexity O(n), where n is the length of the path.
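  • A span trie with edge frequencies and terminal nodes can be sketched as below. The class and function names are hypothetical, and a node is treated here as terminal when at least one qualified path ends at it, which is an assumption about how terminal nodes are tracked.

```python
from typing import Dict, Optional, Sequence


class Node:
    def __init__(self, label: str) -> None:
        self.label = label
        self.freq = 0   # frequency of the edge entering this node
        self.ends = 0   # number of qualified paths that end at this node
        self.children: Dict[str, "Node"] = {}

    @property
    def is_terminal(self) -> bool:
        return self.ends > 0


def insert(root: Node, qpath: Sequence[str]) -> None:
    """Add a qualified path, incrementing the frequency of each edge used."""
    node = root
    for label in qpath:
        if label not in node.children:
            node.children[label] = Node(label)
        node = node.children[label]
        node.freq += 1
    node.ends += 1


def contains(root: Node, qpath: Sequence[str]) -> bool:
    """Return True if qpath was inserted as a complete qualified path."""
    node: Optional[Node] = root
    for label in qpath:
        node = node.children.get(label)
        if node is None:
            return False
    return node.is_terminal
```

  • Both functions visit one node per element of the qualified path, in line with the linear time complexity stated above.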
  • the nodes are labelled.
  • a span trie can be used to efficiently categorize spans to be assigned to a specific time series.
  • a simple categorization approach comprises concatenating the elements of qualified paths from the root to terminal nodes to create a unique category or time series identifier. Nonetheless, a closer observation of a span trie structure reveals that three disjoint types of nodes exist. These types of nodes are method nodes, schema nodes and parameter nodes. For example, in the qualified path (GET, v1, XYZ11, list, RST22), GET is a method node, v1 and list are schema nodes, and XYZ11 and RST22 are parameter nodes. XYZ11 and RST22 correspond to path parameters.
  • REST-based microservice applications support four types of parameters: 1) path parameters, 2) query string parameters, 3) header parameters, and 4) request body parameters. Described herein are examples of handling path parameters. Other types of parameters can be considered in other implementations by also adding them to a span trie.
  • Path parameters are found within the path of an endpoint, before the query string (identified with the symbol ?). Path parameters are represented herein using curly brackets.
  • the path /v1/XYZ11/list/RST22 has two path parameters: XYZ11 and RST22, and the path is represented with /v1/{P1}/list/{P2}, where P1 and P2 are the parameters.
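  • The curly-bracket representation can be illustrated with a small matcher that checks a concrete path against a template such as /v1/{P1}/list/{P2}. This is only a sketch using regular expressions; the function names are hypothetical and the text does not prescribe this mechanism.

```python
import re
from typing import Dict, Optional


def template_to_regex(template: str) -> "re.Pattern[str]":
    """Turn a path template with {name} placeholders into a compiled regex."""
    parts = []
    for seg in template.strip("/").split("/"):
        placeholder = re.fullmatch(r"\{(\w+)\}", seg)
        if placeholder:
            # A parameter matches any single, non-empty path segment.
            parts.append("(?P<%s>[^/]+)" % placeholder.group(1))
        else:
            # A schema segment must match literally.
            parts.append(re.escape(seg))
    return re.compile("/" + "/".join(parts))


def match_path(template: str, path: str) -> Optional[Dict[str, str]]:
    """Return the parameter values if path fits the template, else None."""
    m = template_to_regex(template).fullmatch(path)
    return m.groupdict() if m else None
```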
  • When a span trie is used to categorize spans, it is necessary to identify which nodes are parameter nodes, since they are not taken into account by the categorization procedure. Otherwise, the number of categories will be large and it is likely that only a few spans will be assigned to each category, which may make the application of analytics of little use.
  • step 604 is therefore to label each node as a schema node or as a parameter node.
  • the number of distinct parameter nodes is substantially larger than the number of distinct schema nodes.
  • schema nodes occur with a higher frequency than parameter nodes.
  • the number of distinct schema nodes has been observed to grow slowly when compared to the number of parameter nodes, since the domain of parameters is usually much larger than the schema domain. For example, if parameters {P1} and {P2} of path /v1/{P1}/list/{P2} are customer ID and product ID, respectively, the size of their domain should be much larger than the number of elements that belong to the schema nodes such as v1 and list.
  • each depth $d_i$, $i \in \{2, \ldots, n\}$, of a span trie starting at the root node has only schema nodes or only parameter nodes, but not both.
  • the level of the root node is 0 and level 1 corresponds to method nodes.
  • the level of all the other nodes is calculated using a breadth-first search (BFS) traversal.
  • two consecutive depth levels cannot contain parameter nodes. For microservice applications, this translates into having URL paths in which parameters always occur in the same position, delimited by the separator '/', and in which no two parameters occur in sequence.
  • a first approach is to use a threshold variable k.
  • When a node has a number of incoming edges greater than k, it is labelled as a schema node; otherwise it is labelled as a parameter node.
  • this approach may sometimes fail, since as more spans are analysed, the number of incoming edges of some parameter nodes may increase beyond threshold k, thus being incorrectly classified.
  • schema nodes which are infrequent (as they are not often part of traces) would be misclassified as parameter nodes.
  • a labelling method therefore should not use a global threshold but should use information that is local and contextual to each node.
  • An alternative approach is to partition all of the nodes in the trie into two sets, based on a threshold of the incoming edge frequency of nodes. Finding a threshold for the two sets can be performed by, for example, calculating the mean of the incoming edge frequency. A high edge frequency is therefore above the mean and a low edge frequency is below the mean.
  • An alternative approach, which is more robust to outliers, is to sort the edge frequencies, identify the two consecutive edge frequencies in the sorted sequence that are furthest apart, and use their mean as the boundary between low and high frequencies.
  • Figure 12 shows an example using the mean of the incoming edge frequency to separate the two sets.
  • the average edge frequency is 2.42. Therefore, the nodes with a high incoming edge frequency (>2.42) are {GET, v1, v2, list, info}, shown at 1201, 1202, 1203, 1204 and 1205 respectively.
  • the nodes with a high incoming frequency are the schema nodes and are coloured in black in Figure 12.
  • the procedure has a low running time complexity and it is executed in O(n + m), where n and m are the number of nodes and edges, by analysing all the nodes once and counting the outgoing edges.
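  • The two threshold choices described above (the mean of the incoming edge frequencies, and the midpoint of the largest gap between sorted frequencies) can be sketched as follows. The function names and the sample frequencies in the usage note are hypothetical and are not taken from Figure 12; nodes are keyed by label only to keep the example short.

```python
from statistics import mean
from typing import Dict, Iterable


def mean_threshold(freqs: Iterable[int]) -> float:
    """Boundary between low and high frequency: the mean frequency."""
    return mean(list(freqs))


def gap_threshold(freqs: Iterable[int]) -> float:
    """Midpoint of the largest gap between consecutive sorted frequencies.

    Requires at least two frequencies.
    """
    ordered = sorted(freqs)
    lo, hi = max(zip(ordered, ordered[1:]), key=lambda pair: pair[1] - pair[0])
    return (lo + hi) / 2


def label_nodes(incoming: Dict[str, int], threshold: float) -> Dict[str, str]:
    """Label each node 'schema' if its frequency exceeds the threshold."""
    return {
        name: ("schema" if freq > threshold else "parameter")
        for name, freq in incoming.items()
    }
```

  • With the hypothetical frequencies GET: 6, v1: 4, list: 3 and three identifiers seen once each, both thresholds label GET, v1 and list as schema nodes and the identifiers as parameter nodes.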
  • In step 605, the labelling of nodes is suspended.
  • the execution of steps 601-604 provides a method to add span abstractions to a span trie and to automatically colour terminal, schema and parameter nodes. For efficiency reasons, it is advantageous to stop the learning phase as soon as possible. Thus, a mechanism is required to, at any given time, estimate how many spans still need to be analysed to build a representative span trie. This presents a two-fold challenge, in that the system does not know how many distinct span abstractions exist and the system does not know when sufficient spans have been analysed to build a representative span trie.
  • each span observed is random, independent from other spans, with equal probabilities, and that the total number of parameter nodes presented by all the spans is far greater than the schema nodes, for example, |P| ≫
  • a microservice application generates
  • E(S ) denotes the expected number of spans which need to be analysed to see all the
  • $N(\ln N + \gamma)$ spans need to be collected to analyse at least one node of each type.
  • $N = 100$
  • 419 observations on average are required to collect all the different span types.
  • the relative change rate of schema node labels may be analysed and the derivative used as a stop condition when it reaches the value zero (or a value close to, or approximately, zero). If the change rate of parameter nodes were used to derive a stop condition, the training phase might never finish, as new parameter nodes can be seen over time. This scenario may occur, for example, if new customers or products are regularly added to an application and the endpoints use customer/product IDs to pass arguments to microservice calls.
  • 7. count_new ← |high_incoming_edge_frequency(G)|
  • 8. D ← (count_new − count_old) / count_old
  • each loop reads a span (5), updates a span trie T (6), and counts the number of nodes labelled as schema nodes (7) using the function |high_incoming_edge_frequency(G)|.
  • the delta variable Δ stores the change rate of the span trie T (8).
  • Variable acc is an accumulator, which records the number of consecutive iterations which had a zero or very small change rate (9-12). When β consecutive spans have not caused a change to the structure of trie T (i.e., once acc reaches β), the procedure stops the learning phase.
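  • The loop described above can be sketched as follows. This is an illustration under stated assumptions, not the application's own listing: `update_trie` and `count_schema_nodes` are hypothetical callbacks standing in for the trie update and the schema-node count, and `epsilon` and `stable_limit` are assumed names for the "very small change rate" tolerance and the required number of consecutive stable iterations.

```python
from typing import Callable, Iterable


def learn_until_stable(
    spans: Iterable,
    update_trie: Callable[[object], None],
    count_schema_nodes: Callable[[], int],
    epsilon: float = 0.0,
    stable_limit: int = 100,
) -> int:
    """Consume spans until the schema-node count has stopped changing
    for `stable_limit` consecutive spans. Returns the spans processed."""
    acc = 0          # consecutive iterations with a negligible change rate
    count_old = 0
    processed = 0
    for span in spans:
        processed += 1
        update_trie(span)
        count_new = count_schema_nodes()
        if count_old == 0:
            # Relative change is undefined for a zero baseline: treat any
            # growth from zero as an unbounded change.
            delta = 0.0 if count_new == 0 else float("inf")
        else:
            delta = (count_new - count_old) / count_old
        acc = acc + 1 if abs(delta) <= epsilon else 0
        count_old = count_new
        if acc >= stable_limit:
            break    # learning phase ends
    return processed
```

  • Note that in this sketch a long initial run of spans that produces no schema nodes at all would also count as stable, so a real implementation would probably want a minimum warm-up period.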
  • step 606 in Figure 6 extracts patterns corresponding to the features of the identified paths, which will be used to assign spans to categories (and, optionally, time series ids) in the span categorization module.
  • the first procedure to execute in step 606 is to compress the span tries by replacing parameter nodes.
  • the procedure is as follows. Firstly, for each level of nodes in the span trie, low edge frequency nodes (i.e. nodes having an incoming edge frequency that is below a predetermined threshold) having the same parent node are merged into a unique parameter node. Then, newly created parameter nodes are each labelled with a unique identifier (for example, <P_x>). Next, the edge frequency of new parameter nodes is updated. For example, the span trie of Figure 14(a) is reduced into the trie shown in Figure 14(b).
  • the reduction of the trie is efficient and can be achieved in O(n + m), where n and m are the number of nodes and edges, using topological sort or the Breadth-First Search (BFS) algorithm.
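  • A minimal sketch of the compression step, assuming a trie stored as nested dictionaries with an incoming edge frequency per node. The recursive top-down traversal, the `<P1>`, `<P2>` label format and all function names are assumptions made for illustration; the application itself mentions topological sort or BFS.

```python
def make_node(freq=0):
    """A trie node: incoming edge frequency plus children keyed by label."""
    return {"freq": freq, "children": {}}


def insert(root, path):
    """Add one span abstraction (a sequence of labels) to the trie."""
    node = root
    for label in path:
        node = node["children"].setdefault(label, make_node())
        node["freq"] += 1


def _absorb(target, source):
    """Fold the subtree below `source` into `target`, summing frequencies."""
    for label, child in source["children"].items():
        if label in target["children"]:
            existing = target["children"][label]
            existing["freq"] += child["freq"]
            _absorb(existing, child)
        else:
            target["children"][label] = child


def compress(node, threshold, counter=None):
    """Merge low-frequency siblings into a single parameter node."""
    counter = [0] if counter is None else counter
    kept, merged = {}, None
    for label, child in node["children"].items():
        if child["freq"] < threshold:
            if merged is None:
                merged = make_node()
            merged["freq"] += child["freq"]   # update the edge frequency
            _absorb(merged, child)
        else:
            kept[label] = child
    if merged is not None:
        counter[0] += 1
        kept[f"<P{counter[0]}>"] = merged     # unique parameter label
    node["children"] = kept
    for child in kept.values():
        compress(child, threshold, counter)
    return node
```

  • For instance, inserting the paths `users/17/info`, `users/42/info` and `users/99/info` and compressing with a threshold of 2 leaves a single parameter child under `users` whose frequency is the sum of the merged nodes.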
  • the reduced span trie can be further optimized by merging parameter nodes at the same level with different parent nodes, and by transforming the structure into a deterministic acyclic finite state automaton (DAFSA). Since a DAFSA allows the same vertex to be reached by multiple paths, this alternative data structure uses significantly fewer vertices than a trie.
  • the second procedure of step 606 is to extract patterns from the reduced span tries. Firstly, all of the paths from the root to the terminal nodes are determined. Then, for each path, the path support is calculated.
  • the support is equal to the incoming edge frequency.
  • DFS Depth-First Search
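  • The path extraction can be sketched with an iterative depth-first traversal. The sketch assumes the same nested-dictionary trie layout as a plain illustration, and it treats leaf nodes as the terminal nodes, which is a simplification; `extract_patterns` is a hypothetical name.

```python
def extract_patterns(root):
    """Return (path, support) pairs for every root-to-leaf path.

    The support of a path is taken to be the incoming edge frequency
    of its last node.
    """
    patterns = []
    stack = [(root, [])]              # explicit stack: depth-first order
    while stack:
        node, path = stack.pop()
        if not node["children"] and path:
            patterns.append((tuple(path), node["freq"]))
        for label, child in node["children"].items():
            stack.append((child, path + [label]))
    return patterns
```

  • Each returned pair can then be filtered by support before being converted into a pattern.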
  • Each pattern extracted from a span trie T is transformed into a 4-tuple (port, method, regex, ts_id), where regex is a regular expression for the endpoint and ts_id is a time series id.
  • the second pattern from the previous golden patterns is transformed into the 4-tuple:
  • the port and method fields are copied from the golden pattern.
  • the regular expression regex is created by joining path elements using the slash separator.
  • Parameter nodes labelled with <P_x> are transformed to match any character in the set [a-zA-Z0-9_], represented with \w in the previous example.
  • the time series id may be created by providing a unique string representation of the concatenation of the port, method and the path regular expression. For example, this may be done by replacing slashes by underscores and the \w character set by the star symbol *. If domain knowledge is available, other mechanisms can be used to create regular expressions and time series ids.
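  • A minimal sketch of this transformation, assuming that parameter nodes carry labels of the form `<P1>`, `<P2>` and that each parameter matches one or more word characters (the `+` quantifier is an assumption). The function name `to_tuple` is hypothetical.

```python
import re

PARAM_LABEL = re.compile(r"^<P\d+>$")   # assumed label format for parameter nodes


def to_tuple(port, method, path):
    """Turn a golden pattern into (port, method, regex, ts_id)."""
    is_param = [bool(PARAM_LABEL.match(elem)) for elem in path]
    # Join path elements with slashes; parameters become \w+.
    regex = "/" + "/".join(
        r"\w+" if param else re.escape(elem)
        for elem, param in zip(path, is_param)
    )
    # Same elements joined with underscores; parameters become '*'.
    ts_id = "_".join(
        [port, method] + ["*" if param else elem for elem, param in zip(path, is_param)]
    )
    return port, method, regex, ts_id
```

  • With the port `1343`, the method `GET` and the path elements `v2`, `<P1>`, `info`, `<P2>`, this yields a time series id in which every parameter position is replaced by a star.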
  • the regex field is included into the general regular expression which will be used to parse HTTP/URL endpoints.
  • ep_regex A ((http[s]?
  • the output of step 606 is a list of 3-tuples (method, ep_regex, ts_id). The patterns are therefore represented by regular expressions.
  • the extracted span patterns may therefore each correspond to a category.
  • the span categorization module can categorize spans into that category by matching the extracted features of spans to the features of the pattern.
  • Figure 15 summarises the above steps and illustrates a method for determining a pattern of features for extracting from a span in a microservice application, each span being a vector of property-value pairs.
  • the method comprises extracting a plurality of features from a plurality of spans.
  • the method comprises forming a series of span abstractions corresponding to the spans, each span abstraction comprising the features of a respective span.
  • the method comprises forming a span trie from the series of span abstractions by mapping each span abstraction to a series of corresponding nodes in the trie.
  • the method comprises identifying nodes in the trie whose incoming edge frequency is greater than a predetermined threshold.
  • the method comprises selecting the features of the paths to each such node as a pattern and storing that pattern for future detection.
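  • The steps of Figure 15 can be sketched end to end as below. The span property names (`port`, `method`, `endpoint`), the `Node` class and the helper names are assumptions chosen for illustration, and the threshold value is arbitrary.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    freq: int = 0                                  # incoming edge frequency
    children: dict = field(default_factory=dict)


def abstraction(span):
    """Extract features from a span (a mapping of property-value pairs)."""
    return [str(span["port"]), span["method"], *span["endpoint"].strip("/").split("/")]


def insert(root, features):
    """Map a span abstraction to a series of nodes in the trie."""
    node = root
    for label in features:
        node = node.children.setdefault(label, Node(label))
        node.freq += 1


def frequent_paths(root, threshold):
    """Paths to nodes whose incoming edge frequency exceeds the threshold."""
    found, stack = [], [(root, ())]
    while stack:
        node, path = stack.pop()
        for child in node.children.values():
            child_path = path + (child.label,)
            if child.freq > threshold:
                found.append(child_path)
            stack.append((child, child_path))
    return found
```

  • Feeding three spans that differ only in the last path element and calling `frequent_paths(root, 1)` returns the shared prefix paths, while the one-off identifiers fall below the threshold.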
  • the progressive learning module may only process and extract patterns from the subset of spans selected by the span buffer. However, if sufficient resources are available, the progressive learning module may learn from all incoming spans (i.e., in the span extractor, features are extracted from all incoming spans).
  • the span categorization module 303 processes new spans. It receives spans and extracts their method and endpoint using the same procedure described in step 601 (extracting features). These fields are matched against the method and the regular expression ep_regex of golden patterns.
  • When a match exists with a pattern, the span categorization module 303 emits the category to which the span is assigned, and may optionally assign a time series id to the span. Otherwise, the None keyword is returned and the span can be stored by the span collector module 308 for further processing.
  • the span categorization module iterates over the golden patterns to identify the category, and/or time series id, to emit. Since the following pattern matches the method GET and the endpoint of the span, the system emits the time series id 1343_GET_v2_*_info_*.
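  • A minimal sketch of the matching step, assuming golden patterns are held as (method, compiled regular expression, time series id) tuples. The regular expressions and the second pattern are hypothetical examples, since the full ep_regex is not reproduced here.

```python
import re
from typing import Optional

# Hypothetical golden patterns: (method, ep_regex, ts_id).
GOLDEN_PATTERNS = [
    ("GET", re.compile(r"^/v2/\w+/info/\w+$"), "1343_GET_v2_*_info_*"),
    ("POST", re.compile(r"^/v2/\w+/orders$"), "1343_POST_v2_*_orders"),
]


def categorize(method: str, endpoint: str, patterns=GOLDEN_PATTERNS) -> Optional[str]:
    """Return the time series id of the first matching pattern, else None."""
    for pattern_method, ep_regex, ts_id in patterns:
        if method == pattern_method and ep_regex.match(endpoint):
            return ts_id
    return None   # unmatched spans can be kept for further processing
```

  • The linear scan keeps the sketch short; a production system would more likely index the patterns by method or use a trie, in line with the running time discussed later in the description.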
  • Figure 16 illustrates a method for categorizing spans in microservice applications, each span being a vector of property-value pairs. This method is performed by the span categorization module.
  • the method comprises receiving a plurality of spans.
  • the method comprises extracting a plurality of features from each span.
  • the method comprises categorizing the plurality of spans into a plurality of categories in dependence on the extracted features.
  • Each of the modules described above may comprise a processor and a non-volatile memory.
  • Each module may comprise more than one processor and more than one memory.
  • the memory may store data that is executable by the processor.
  • the processor may be configured to operate in accordance with a computer program stored in non-transitory form on a machine readable storage medium.
  • the computer program may store instructions for causing the processor to perform its methods in the manner described herein.
  • the components may be implemented in physical hardware or in the cloud.
  • the categorization system is advantageously able to handle span bursts. Since large-scale microservice applications can generate millions of spans in short periods of time, the mechanism is able to decouple span producers from span consumers. Traditional solutions which rely only on message queuing systems for buffering do not guarantee the construction of a representative span model.
  • the present invention uses a sound statistical sampling technique to cope with span bursts. This makes the process scalable.
  • the system also allows for improved processing efficiency.
  • the system achieves a low running time complexity of O(nk), where n is the number of spans and k is the number of elements of a path (typically a small constant, around 10 or fewer), by using method calls and endpoints to categorize spans. This makes the approach scalable and suitable to efficiently process spans originating from large-scale microservice applications.
  • the method proposed is particularly suitable to analyse microservice applications for which the initial number of span categories is unknown and online learning from real time data is required. It draws its fundamental operation from the field of machine learning by using a progressive and open-world learning approach.
  • Many traditional classification methods make a closed world assumption, i.e., the classes to which an element can be assigned are identified during training and are the ones which will be used in testing. No new classes can appear during testing.
  • This assumption does not hold for large-scale, dynamic microservice applications, which are in permanent change and evolution.
  • Microservices start, reboot, and shut down as development teams introduce, split, improve, and deprecate services, which leads to constant change in span types.
  • the system therefore uses an open-world view, since new span categories can appear during testing which were not seen during training.
  • This approach is particularly suited for large-scale, dynamic microservices applications which are in permanent change and evolution and contrasts with the closed-world assumption taken by many existing classification methods.
  • the system also allows for progressive learning. Span categories are learned as new spans are observed by the system. The approach automatically stops when the change rate derivative converges. This characteristic provides an effective stop condition preventing the system from analysing redundant spans which do not add additional information to the knowledge model.
  • the processing and categorization of spans uses efficient data structures, such as hash tables and tries, to offer a solution with a linear running time, resulting in improved performance.
  • APM application performance management
  • although the categorization system and methods are described herein as being particularly advantageous for time series analysis, they may also be applied to other fields of data analysis. They may also be applied to applications other than microservice applications.

PCT/EP2019/076076 2019-09-26 2019-09-26 Span categorization WO2021058104A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201980099750.XA CN114730280A (zh) 2019-09-26 2019-09-26 跨度分类
PCT/EP2019/076076 WO2021058104A1 (en) 2019-09-26 2019-09-26 Span categorization

Publications (1)

Publication Number Publication Date
WO2021058104A1 true WO2021058104A1 (en) 2021-04-01

Family

ID=68136358

Country Status (2)

Country Link
CN (1) CN114730280A (zh)
WO (1) WO2021058104A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160269482A1 (en) * 2015-03-12 2016-09-15 International Business Machines Corporation Providing agentless application performance monitoring (apm) to tenant applications by leveraging software-defined networking (sdn)
US9531736B1 (en) * 2012-12-24 2016-12-27 Narus, Inc. Detecting malicious HTTP redirections using user browsing activity trees
US20180309637A1 (en) * 2017-04-25 2018-10-25 Nutanix, Inc. Systems and methods for networked microservice modeling and visualization
US20190057015A1 (en) * 2017-08-15 2019-02-21 Hybris Ag Predicting defects in software systems hosted in cloud infrastructures
WO2019099558A1 (en) * 2017-11-15 2019-05-23 Sumo Logic Cardinality of time series

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Fast webpage classification using URL features", CIKM '05 PROCEEDINGS OF THE 14TH ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, pages 325 - 326

Also Published As

Publication number Publication date
CN114730280A (zh) 2022-07-08

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19782520

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19782520

Country of ref document: EP

Kind code of ref document: A1