WO2023073859A1

WO2023073859A1 - Inference device, inference method and inference program

Info

Publication number: WO2023073859A1
Application number: PCT/JP2021/039788
Authority: WO
Inventors: 優酒井; 謙輔高橋
Original assignee: 日本電信電話株式会社
Priority date: 2021-10-28
Filing date: 2021-10-28
Publication date: 2023-05-04

Abstract

An inference device 14 that infers a dependency relationship between components, said inference device comprising: a conversion unit 141 that extracts a parent-child relationship between components among a plurality of components which constitute a program from observational data containing an operation history of the program, and that converts the observational data into an event log of a child component of each parent component on the basis of a tree representing the parent-child relationship; and an inference unit 143 that uses the event log of the child component of each parent component to infer and calculate the dependency relationship between the components among the plurality of components.

Description

Estimation device, estimation method, and estimation program

The present invention relates to an estimation device, an estimation method, and an estimation program.

Conventionally, microservices are known. Microservices have a form in which multiple services are linked by interfaces, and each service is developed and maintained by a different team. Each service will continue to be updated from time to time by its respective development team to fix bugs, improve performance, and adapt to customer needs.

Microservice developers generally do not fully understand services other than the ones they are in charge of. Therefore, it is difficult for both developers and maintainers of microservices to understand which services process a user request and in what order.

In addition, in microservices with a complex structure that links multiple services, detecting anomalies and narrowing down failure locations by relying only on error messages requires a great deal of labor and can delay the discovery of failure locations, which makes it inadequate for service maintenance.

Therefore, there is a technology called distributed tracing. Distributed tracing is a technology that traces how multiple services work together and visualizes them as a single flow. In Non-Patent Document 1, a service graph, which is an operation model of microservices, is generated based on monitoring data including the operation history of microservices.

For example, in Non-Patent Document 1, trace data obtained by the OpenTracing API is used as is to estimate dependencies between components, and a service graph is constructed using a Petri net, which is an extended form of a state machine using a directed graph. Trace data is a set of operation histories of each service for one request.

Non-Patent Document 1 expresses the components of each service that constitutes a microservice as a data structure called a span, in which figures representing waiting for processing, starting processing, being processed, finished processing, and processed are connected by arrow lines. Trace data is a set of spans. A service graph represents a state in which spans are connected by arrow lines based on the estimated dependencies between spans.

Then, Non-Patent Document 1 automatically detects anomalies in the monitoring data by matching the service graph that has already been generated against the monitoring data from the microservice that continues to be updated from time to time.

In order to generate a service graph, it is necessary to estimate the dependencies between components (between spans) that exist within a microservice from monitoring data such as trace data. If the estimation accuracy of the dependencies is low, or if the types of dependencies that can be estimated are insufficient, the deviation between the actual microservice behavior and the already generated service graph increases, and as a result false detections and missed detections of anomalies increase.

In order to solve this problem, it is conceivable that the developer or maintainer of the microservice will give the correct answer, wholly or partially, about the dependencies between the components. However, continuously providing such knowledge to microservices that have short development periods and are frequently updated imposes a heavy burden on developers and maintainers.

It is also conceivable to use the highly accurate and flexible process discovery of Non-Patent Document 2 in order to estimate dependencies between components. However, if process discovery is used without giving any prior knowledge, the computational cost tends to increase in order to ensure its flexibility. The larger the scale of the microservice, the larger the monitoring data, and the larger the monitoring data, the larger the calculation cost. Therefore, applying process discovery to the monitoring data as is, or to process logs converted directly from it, results in a very high computational cost and a large delay between running the microservice application program and generating the service graph.

The present invention has been made in view of the above circumstances, and an object of the present invention is to provide a technology that can estimate dependencies between components at high speed and with high accuracy.

An estimating device of one aspect of the present invention is an estimating device for estimating dependency relationships between components, comprising: a conversion unit that extracts parent-child relationships among a plurality of components constituting a program from monitoring data including an operation history of the program, and converts the monitoring data into an event log of child components for each parent component based on a tree of the parent-child relationships; and an estimating unit that estimates and calculates the dependency relationships among the plurality of components using the event log of child components for each parent component.

An estimation method according to one aspect of the present invention is an estimation method for estimating dependency relationships between components, in which an estimation device performs: a step of extracting parent-child relationships among a plurality of components constituting a program from monitoring data including an operation history of the program, and converting the monitoring data into an event log of child components for each parent component based on a tree of the parent-child relationships; and a step of estimating and calculating the dependency relationships among the plurality of components using the event log of child components for each parent component.

An estimation program according to one aspect of the present invention causes a computer to function as the estimation device.

According to the present invention, it is possible to provide a technology capable of estimating dependencies between components at high speed and with high accuracy.

FIG. 1 is a diagram showing the overall configuration of the system.
FIG. 2 is a diagram showing the functional block configuration of the estimation device.
FIG. 3 is a diagram showing the operation of the system.
FIG. 4 is a diagram showing an example of the basic format of trace data.
FIG. 5 is a diagram illustrating an example of extraction of parent-child relationships between spans.
FIG. 6 is a diagram showing an example of creating a parent-child relationship tree between spans.
FIG. 7 is a diagram illustrating an example of creating an event sequence of child spans for each parent span.
FIG. 8 is a diagram illustrating an example of creating an event log of child spans for each parent span.
FIG. 9 is a diagram illustrating an example of estimation of inter-span dependencies for each event log.
FIG. 10 is a diagram showing examples of functions used in recursive mining.
FIG. 11 is a diagram illustrating an example of order relationship estimation.
FIG. 12 is a diagram showing an example of a service graph.
FIG. 13 is a diagram illustrating an example of a hardware configuration of the estimation device.

Hereinafter, embodiments of the present invention will be described with reference to the drawings. In the description of the drawings, the same parts are denoted by the same reference numerals, and the description thereof is omitted.

[Summary of Invention]
The present invention creates a service model that faithfully simulates the operation of a microservice based on service dependencies at the component level within an application program, and compares the service model with the operation history of the microservice during replay. The aim is to support anomaly detection in microservices and bottleneck analysis through simulation.

To achieve this goal, the present invention enables highly accurate and flexible estimation of the dependencies among the multiple components that provide a microservice, and is intended to reduce the need for manual application analysis and investigation when creating service models.

Conventionally, as described above, it is possible to create a service model that simulates the behavior of microservices based on the dependencies between components within an application program. Also, by comparing the service model with the operation history of the microservice during replay, it is possible to detect the difference between the service model and the actual operation history as an anomaly.

However, as mentioned above, the dependencies between components that can be estimated are limited, and in order to create a service model that is less likely to deviate from the actual behavior of microservices, maintenance personnel or developers need to provide their knowledge. Moreover, the increase in the amount of calculation with the size of the monitoring data is a barrier to estimating the dependencies between components with high accuracy.

Therefore, the present invention converts monitoring data such as trace data obtained by distributed tracing into an event log, and performs process mining using the event log to estimate dependencies between components. In other words, it uses the service information that is generally included in monitoring data by default to estimate the dependencies in a divide-and-conquer manner: the monitoring data is first converted (divided) per parent component, and process mining is then performed on each part. This realizes fast and highly accurate estimation of dependencies between components.

[Overall system configuration]
FIG. 1 is a diagram showing the overall configuration of a system 1 according to this embodiment. System 1 is a system that provides microservices and manages and monitors their provision.

A microservice is an application program that has a form in which multiple services are linked by interfaces, and is developed and maintained by a different team for each service. Each service will continue to be updated from time to time by its respective development team to fix bugs, improve performance, and adapt to customer needs.

The system 1 includes a providing device 11, a monitoring device 12, a distribution device 13, an estimation device 14, a generation device 15, a storage device 16, and an analysis device 17, as shown in FIG. 1. In addition, a developer terminal 21, a user terminal 22, and a maintenance person terminal 23 exist. All devices and terminals are physically and electrically connected so that they can communicate with one another.

The providing device 11 is a device that executes a microservice application program released from the developer terminal 21. The providing device 11 provides the microservice to the user terminal 22 by executing it, and transmits to the monitoring device 12 the operation history of the microservice that is output during execution.

The monitoring device 12 is a device that treats the microservice provided by the providing device 11 to the user terminal 22 as a service to be monitored and, in response to an inquiry from the distribution device 13, transmits monitoring data including the operation history of the monitored service received from the providing device 11 to the distribution device 13.

The distribution device 13 is a device that inquires of the monitoring device 12 about transmission of monitoring data, and transmits the monitoring data sent from the monitoring device 12 in response to the inquiry to the estimation device 14 and the analysis device 17.

The estimating device 14 is a device that converts the monitoring data sent from the distribution device 13 into an event log, and performs process mining using the event log to estimate dependencies between components.

The generating device 15 is a device that refers to the inter-component dependencies estimated by the estimating device 14 and generates a service graph of the monitored service based on the dependencies.

The storage device 16 is a device that stores the service graph of the monitored service created by the generation device 15 so that it can be referenced and read.

The analysis device 17 is a device that refers to the service graph of the monitored service stored in the storage device 16, compares by simulation the service graph with monitoring data that is transmitted from the distribution device 13 and includes the replay operation history of the updated monitored service, analyzes anomalies in the monitoring data, and transmits the analysis result to the developer terminal 21 and the maintenance person terminal 23.

[Configuration of estimation device]
FIG. 2 is a diagram showing the functional block configuration of the estimation device 14. As described above, the estimation device 14 is a device that converts monitoring data such as trace data obtained by distributed tracing into an event log, and performs process mining using the event log to estimate dependencies between components.

Distributed tracing is a technology that tracks how multiple services work together and visualizes them as a single flow.

A component is a part or element that makes up an application program, and is a functional program with a predetermined function. In this embodiment, a component is represented by a data structure called a span, in which figures representing waiting for processing, starting processing, being processed, ending processing, and processed are connected by arrow lines. A component is hereinafter referred to as a span.

The estimating device 14 includes a conversion unit 141, a storage unit 142, and an estimating unit 143, as shown in FIG. 2.

The conversion unit 141 has a function of extracting parent-child relationships among the plurality of spans constituting a target program from monitoring data including the operation history of the application program of the target service, and converting the monitoring data into event logs of child spans for each parent span based on a tree of the parent-child relationships.

The conversion unit 141 has a function of converting a plurality of events related to a plurality of child spans into an event log composed of an event string sorted according to the time information of the events and the identification information of the spans included in the events.

Every time the conversion unit 141 receives monitoring data that is transmitted from the distribution device 13 and includes the replay operation history of the updated monitored service, it converts the monitoring data into an event string. It has a function of updating (storing) the event string if the converted event string differs from the past event strings, and not updating (storing) it if it is the same as a past event string.

The storage unit 142 has a function of storing the event log created (converted) by the conversion unit 141 in a readable and referable manner. The storage unit 142 has a function of storing the inter-span dependency relation estimated by the estimation unit 143 in a readable and referable manner.

The estimating unit 143 has a function of estimating and calculating inter-span dependencies of a plurality of spans using event logs of child spans for each parent span created (converted) by the conversion unit 141.

The estimating unit 143 also has a function of performing process mining using the event logs of child spans for each parent span created (converted) by the conversion unit 141, thereby estimating, calculating, and sequentially updating the inter-span dependencies of the plurality of spans.

Note that in this embodiment, a microservice application program is used as an example of the application program of the service to be monitored. However, it is also applicable to any program composed of a plurality of spans (components, that is, modular functional programs that provide predetermined functions).

[System operation]
FIG. 3 is a diagram showing the operation of the system 1. The operation of the estimating device 14 will be mainly described. The method of the present invention extracts endpoint information and parent-child relationships between spans from monitoring data, converts the monitoring data into event logs for each parent span based on the parent-child relationships, and estimates the dependencies between spans from those event logs. Details will be described below.

Step S1;
First, the distribution device 13 transmits monitoring data including the operation history of the monitored service to the estimation device 14. FIG. 4 is a diagram showing an example of the basic format of trace data relating to the operation of an application program of a service to be monitored, as an example of monitoring data. The trace data includes, for example, the component-level process name, process start time, process end time, reference type name, and related resources.
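As a minimal sketch, one record of trace data with the fields listed above could be modeled as follows. The class name `Span`, the field names, the `parent_span_id` link, and the `CHILD_OF` reference value are assumptions chosen for illustration; the actual format is the one defined in FIG. 4.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    """One component-level record in a piece of trace data (hypothetical layout)."""
    span_id: str                            # ID of this span
    operation_name: str                     # component-level process name
    start_time: int                         # process start time (unit is an assumption)
    end_time: int                           # process end time
    reference_type: Optional[str] = None    # reference type name, e.g. "CHILD_OF" (assumed)
    parent_span_id: Optional[str] = None    # span the reference points to; None for a root
    resources: tuple = ()                   # related resources


# One piece of trace data is the set of spans recorded for a single request.
trace = [
    Span("s1", "A", 0, 100),
    Span("s2", "A_1", 10, 40, "CHILD_OF", "s1"),
    Span("s3", "A_2", 50, 90, "CHILD_OF", "s1"),
]

# Spans without a parent reference are the roots of the trace.
roots = [s.operation_name for s in trace if s.parent_span_id is None]
```

Keeping the parent reference on each span is what later allows the parent-child relationships to be read directly from the trace data without extra knowledge from developers.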

Note that one piece of trace data is created each time the application program of the monitored service runs. Assume that four pieces of trace data are transmitted to the estimating device 14 thereafter.

Steps S2 to S5;
Next, the conversion unit 141 of the estimation device 14 refers to the “reference type name” in the “Reference” column shown in FIG. 4 for each of the four pieces of trace data, and extracts the parent-child relationships between spans based on the “reference type name”. For example, as shown in FIG. 5, the parent-child relationship between spans is extracted for each of the four pieces of trace data D1 to D4.

Subsequently, the conversion unit 141 of the estimating device 14 acquires the endpoint information of the application program by referring to the roots of all the extracted parent-child relationships, and creates and holds a parent-child relationship tree for each endpoint. An endpoint is a parent-child relationship tree between spans with a certain span as the root. Endpoint A points to a parent-child relationship tree rooted at span A. For example, a parent-child relationship tree for endpoint A (span A) and a parent-child relationship tree for endpoint B (span B) as shown in FIG. 6 are created.
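The extraction of parent-child relationships and the creation of one tree per endpoint might be sketched as follows. This is only an illustration under assumptions: spans are reduced to hypothetical `(span_id, operation_name, parent_span_id)` tuples, and the helper name `build_trees` and the span names other than A and B are invented for the example.

```python
from collections import defaultdict

# Each span is (span_id, operation_name, parent_span_id); parent is None for a root.
# These tuples are a hypothetical simplification of the trace data in FIG. 4.
traces = [
    [("1", "A", None), ("2", "A_1", "1"), ("3", "A_2", "1"), ("4", "A_1_1", "2")],
    [("5", "B", None), ("6", "B_1", "5")],
    [("7", "A", None), ("8", "A_1", "7"), ("9", "A_3", "7")],
]


def build_trees(traces):
    """Return {endpoint: {parent_name: set(child_names)}} merged over all traces."""
    trees = defaultdict(lambda: defaultdict(set))
    for trace in traces:
        name_of = {sid: name for sid, name, _ in trace}
        parent_of = {sid: parent for sid, _, parent in trace}

        def root_name(sid):
            # Walk up the parent links until the root (endpoint) span is reached.
            while parent_of[sid] is not None:
                sid = parent_of[sid]
            return name_of[sid]

        for sid, name, parent in trace:
            if parent is not None:
                trees[root_name(sid)][name_of[parent]].add(name)
    # Convert the nested defaultdicts into plain dictionaries.
    return {ep: {p: set(c) for p, c in tree.items()} for ep, tree in trees.items()}


trees = build_trees(traces)
```

Here the tree for endpoint A is updated by the third trace, which adds the child A_3, while endpoint B keeps its own separate tree.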

Here, based on the created parent-child relationship tree, the set of spans in the monitoring data, excluding the root (endpoint) span, can be divided according to which span is the parent. In other words, the hierarchized spans that form the parent-child relationship tree can be divided into sets of child spans for each parent span. For example, as shown in FIG. 7, a parent span A can be divided into a child span set #1 consisting of child spans A_1, ...
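The division into sets of child spans per parent span can be illustrated with a short sketch. The tuple layout and the helper name `split_by_parent` are assumptions, and the span names are hypothetical.

```python
from collections import defaultdict

# (span_id, operation_name, parent_span_id): hypothetical simplified spans.
trace = [
    ("1", "A", None),
    ("2", "A_1", "1"),
    ("3", "A_2", "1"),
    ("4", "A_1_1", "2"),
    ("5", "A_1_2", "2"),
]


def split_by_parent(trace):
    """Group child spans by the operation name of their direct parent span."""
    name_of = {sid: name for sid, name, _ in trace}
    groups = defaultdict(list)
    for sid, name, parent in trace:
        if parent is None:
            continue  # the root (endpoint) span has no parent and is skipped
        groups[name_of[parent]].append(name)
    return dict(groups)


child_sets = split_by_parent(trace)
```

Each resulting set is small compared with the whole trace, which is the point of the division: later steps only ever look at the children of one parent at a time.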

Therefore, the conversion unit 141 of the estimating device 14 extracts from the monitoring data only the spans included in each divided set, and extracts the "events" of those spans. As shown in the drawings, an "event" has four elements: "operationName", "eventType", "timestamp", and "spanID".

"operationName" is the processing name of the child span ("component level processing name" in FIG. 4). "eventType" indicates the start or end of child span processing. "timestamp" indicates the processing start time or processing end time of the child span. “spanID” is the ID of the child span (“span ID” in FIG. 4).

Then, the conversion unit 141 of the estimating device 14 sorts the extracted “events” according to the “timestamp” of each “event” and the “spanID” included in the event, creates the resulting “event sequence” for each parent span of the parent-child relationship tree, and stores it in the storage unit 142 as an “event log” (steps S2 and S5).
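A minimal sketch of the event extraction and sorting described above is shown below. The four field names follow the text; the literal values "start" and "end", the tuple layout of the input spans, and the helper name `to_event_sequence` are assumptions.

```python
from typing import NamedTuple


class Event(NamedTuple):
    operationName: str
    eventType: str   # "start" or "end" (the literal values are assumptions)
    timestamp: int
    spanID: str


def to_event_sequence(child_spans):
    """Turn (span_id, name, start, end) child spans into one sorted event sequence."""
    events = []
    for span_id, name, start, end in child_spans:
        events.append(Event(name, "start", start, span_id))
        events.append(Event(name, "end", end, span_id))
    # Sort by timestamp first, then by spanID to break ties deterministically.
    return sorted(events, key=lambda e: (e.timestamp, e.spanID))


# Child spans of one parent span; A_2 starts at the same time A_1 ends.
children = [("s3", "A_2", 40, 90), ("s2", "A_1", 10, 40)]
sequence = to_event_sequence(children)
labels = [f"{e.operationName}:{e.eventType}" for e in sequence]
```

Using the span ID as the secondary sort key keeps the order of simultaneous events stable from one trace to the next, so equal behavior produces equal event sequences.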

Thereafter, each time the conversion unit 141 of the estimating device 14 receives the fifth, sixth, ... piece of trace data, it extracts the parent-child relationships between spans, newly creates or updates the parent-child relationship tree, and stores the tree in the storage unit 142.

Furthermore, the conversion unit 141 of the estimation device 14 refers to the newly created or updated parent-child relationship tree, divides the trace data, creates an event sequence for each parent span that has children, and adds the created "event sequence" to the "event log" (steps S2 and S5).

An "event log" is a set of event sequences added in this way, and every parent span that has children holds an "event log". As a result, an event log for each endpoint and each parent span is created and stored in the storage unit 142, as shown in the drawings.

When adding an event sequence to the event log, the conversion unit 141 of the estimation device 14 refers to the existing event log stored in the storage unit 142. If the existing event log already contains an event sequence whose total number of events and order are exactly the same as those of the event sequence to be added, the event sequence is not added to the event log (steps S3 to S5). In this way, by checking in advance whether the monitoring data yields an event sequence equivalent to one already stored, the total number of event sequences can be reduced, and the event log size can be reduced.
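The duplicate check can be sketched as follows. The helper name `add_sequence` is hypothetical, and comparing sequences on (operationName, eventType) pairs only, ignoring timestamps and span IDs, is an assumption about what "exactly the same" means here.

```python
def add_sequence(event_log, sequence):
    """Append `sequence` unless an identical one is already stored.

    Sequences are given as (operationName, eventType) pairs, so two runs
    that differ only in timestamps or span IDs count as the same sequence.
    That comparison key is an assumption, not something the text specifies.
    """
    key = tuple(sequence)
    # Same total number of events and exactly the same order.
    if any(len(stored) == len(key) and stored == key for stored in event_log):
        return False
    event_log.append(key)
    return True


event_log = []
run1 = [("A_1", "start"), ("A_1", "end"), ("A_2", "start"), ("A_2", "end")]
run2 = [("A_1", "start"), ("A_2", "start"), ("A_1", "end"), ("A_2", "end")]
results = [add_sequence(event_log, s) for s in (run1, run2, run1)]
```

The third call repeats the first run and is rejected, so the event log keeps only the two distinct behaviors no matter how many identical requests are observed.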

Steps S6 to S8;
Next, the estimating unit 143 of the estimating device 14 refers to the event logs that are divided for each endpoint and each parent span and stored in the storage unit 142, executes process mining on each event log, and calculates, for each event log, an estimate of the dependencies between the spans included in that event log. FIG. 9 is a diagram illustrating an example of estimation of inter-span dependencies for each event log.

Process mining is a technology that visualizes event logs by connecting them in chronological order. It is also called process discovery.

Specific example 1 of process discovery is a method that applies the inductive miner. Inductive mining is a recursive method: it finds five kinds of dependencies, "sequence", "concurrent", "xor", "interleaved", and "loop", in the event log, divides the event log based on the dependency found, and recursively repeats the estimation of inter-span dependencies on the divided logs.

Specifically, the event log is divided recursively as follows. First, the BASECASE function F1 shown in FIG. 10 is used to check whether the event log (after division) is the minimum unit. Next, the FINDCUT function F2 is used to detect dependencies between spans. After that, the SPLITLOG function F3 is used to split the event log based on the detected inter-span dependencies. As a result, inter-span dependencies are estimated, and the result of dependency estimation is output as a process tree.

Even if a dependency between spans cannot be found exactly, one of the dependencies can be forcibly applied by using the FALLTHROUGH function F4. The FALLTHROUGH function F4 assigns one of the dependencies to an event log that is not the minimum unit but could not be divided. Therefore, specific example 1 can be said to be a highly flexible method.
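The recursion of base case, cut detection, log splitting, and fall-through might be sketched as below. This is a heavily simplified illustration, not the functions F1 to F4 of FIG. 10: it detects only the "sequence" and "xor" dependencies, treats each trace as a tuple of span names, and uses a "flower" node (any order allowed) as its only fall-through. All function names and the "tau" (skip) and "flower" labels are assumptions.

```python
def activities(log):
    return sorted({a for trace in log for a in trace})


def components(acts, edges):
    """Connected components of `acts`, treating `edges` as undirected."""
    comp = {a: {a} for a in acts}
    for a, b in edges:
        if comp[a] is not comp[b]:
            merged = comp[a] | comp[b]
            for x in merged:
                comp[x] = merged
    unique = []
    for a in acts:
        if comp[a] not in unique:
            unique.append(comp[a])
    return unique


def reachability(acts, edges):
    """Transitive closure of the directly-follows relation."""
    reach = {a: {b for x, b in edges if x == a} for a in acts}
    changed = True
    while changed:
        changed = False
        for a in acts:
            new = set().union(*(reach[b] for b in reach[a])) - reach[a]
            if new:
                reach[a] |= new
                changed = True
    return reach


def base_case(log):
    acts = activities(log)
    if len(acts) == 1 and all(trace == (acts[0],) for trace in log):
        return acts[0]
    return None


def find_cut(log):
    acts = activities(log)
    dfg = {(a, b) for trace in log for a, b in zip(trace, trace[1:])}
    parts = components(acts, dfg)
    if len(parts) > 1:  # groups that never occur in the same trace
        return "xor", parts
    reach = reachability(acts, dfg)
    # Merge activities that reach each other or are unrelated; the remaining
    # groups are strictly ordered and form a sequence.
    merge = [(a, b) for a in acts for b in acts if (b in reach[a]) == (a in reach[b])]
    parts = components(acts, merge)
    if len(parts) > 1:
        parts.sort(key=lambda p: len(reach[min(p)] - p), reverse=True)
        return "sequence", parts
    return None


def split_log(log, operator, parts):
    if operator == "xor":
        return [[t for t in log if set(t) <= p] for p in parts]
    return [[tuple(a for a in t if a in p) for t in log] for p in parts]


def fall_through(log):
    # Last resort: allow the remaining activities in any order.
    return ("flower", *activities(log))


def mine(log):
    if any(len(t) == 0 for t in log):  # some traces skip this part entirely
        rest = [t for t in log if t]
        return ("xor", "tau", mine(rest)) if rest else "tau"
    leaf = base_case(log)
    if leaf is not None:
        return leaf
    cut = find_cut(log)
    if cut is None:
        return fall_through(log)
    operator, parts = cut
    return (operator, *[mine(sub) for sub in split_log(log, operator, parts)])


tree = mine([("a", "b", "d"), ("a", "c", "d")])
```

For this small log the sketch yields a process tree in which `a` is followed by an exclusive choice between `b` and `c`, followed by `d`. A full inductive miner would also look for the "concurrent", "interleaved", and "loop" dependencies before falling through.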

Specific example 2 of process discovery is a method derived from a microservice demo application. Observation of monitoring data obtained from demo applications and the like shows that most of it can be explained by a combination of parallel processing, serial processing, and iterative processing. Focusing on this tendency, a method can be applied that first estimates the spans processed in series, arranges them in order, and then estimates the repeated portions from the sequence.

Compared to the inductive mining of specific example 1, there are fewer types of candidate dependencies to be estimated, but the computational cost is reduced accordingly. Hereinafter, a case will be described where the target trace data for the order relationship of the cart screen is "B→C→D→E→E→E→E→E→G→I→E→I→E→I".

Procedure 1;
First, the order relations between each pair of adjacent elements in the target trace data are listed from the beginning, and the number of times each order relation appears is counted. For example, as shown in FIG. 11, the order relation "E→E" appears four times, so the count of "E→E" at positions 4 to 7 is 4. The order relations "I→E" and "E→I" each appear twice, so the count of "I→E" at positions 10 and 12 and of "E→I" at positions 11 and 13 is 2.
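The listing and counting in procedure 1 can be reproduced with a few lines of code. This is a sketch for the example trace above; the variable names are arbitrary, and positions are numbered from 1 as in the text.

```python
from collections import Counter

trace = "B→C→D→E→E→E→E→E→G→I→E→I→E→I".split("→")

# The order relation at position i is the pair of the i-th and (i+1)-th elements.
pairs = list(zip(trace, trace[1:]))
counts = Counter(pairs)

# One row per position: (position, order relation, count of that relation).
table = [(pos, f"{a}→{b}", counts[(a, b)]) for pos, (a, b) in enumerate(pairs, start=1)]

for row in table:
    print(row)
```

The rows for positions 4 to 7 carry the count 4 for "E→E", and the rows for positions 10 to 13 carry the count 2 for "I→E" and "E→I", matching the values given for FIG. 11.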

Procedure 2;
Next, for each order relation whose count is 1, the check result is listed as completed. For an order relation whose count is 2 or more, the check result is listed as completed if the processing of the order relation is repeated and not all of the processes are the same. If the processing is repeated but all of the processes are the same, or if the processing is not repeated, the check result is put on hold. Each "E→E" at positions 4 to 7 repeats the processing, but since all of the processes are the same "E", it is put on hold. "I→E" at position 10, "E→I" at position 11, "I→E" at position 12, and "E→I" at position 13 are repeated processes, and not all of the processes are the same, so they are listed as completed.

Procedure 3;
After that, if any check result is on hold, the number of elements in the order relation at that position is increased, the count is recalculated, and procedure 2 is performed again.

Procedures 2 and 3 are repeated until the count at every position becomes 1 and every check result is completed.

After that, the estimating unit 143 of the estimating device 14 stores the inter-span dependency estimation result in the storage unit 142. The result of estimating inter-span dependencies is stored for each parent span in the same way as the event log.

Steps S9 and S10;
Finally, the generating device 15 refers to the inter-span dependency relationships stored in the storage unit 142 of the estimating device 14, and generates a service graph of the monitored service based on the dependency relationships. FIG. 12 is an example of a service graph.

[Effect]
As described above, in this embodiment, monitoring data such as trace data obtained by distributed tracing is converted into an event log, and process mining is performed using the event log to estimate the inter-component dependencies.

Specifically, the estimating device 14 according to the present embodiment includes a conversion unit 141 that extracts parent-child relationships among the plurality of spans constituting an application program from monitoring data including the operation history of the microservice application program, and converts the monitoring data into an event log of child spans for each parent span based on a tree of the parent-child relationships, and an estimating unit 143 that estimates and calculates the inter-span dependencies of the plurality of spans using the event log of child spans for each parent span.

Because process discovery makes heavy use of operations that retrieve pairs of events, the computational cost increases as the size of the event log increases. In the present embodiment, the event log is divided in advance based on information that the monitoring data holds by default, and process discovery is performed on each divided event log. Since the inter-span dependencies are estimated systematically in this way, they can be estimated faster and more accurately than when all trace data are converted into a single event log.
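The effect of dividing the event log can be illustrated with a rough count of event pairs. This is only an illustration under the assumption that the cost is proportional to the number of unordered event pairs within one sequence; the sizes 12 and 4 and the helper name `pair_count` are made up for the example.

```python
from math import comb


def pair_count(sequence_lengths):
    """Total number of unordered event pairs, counted within each sequence."""
    return sum(comb(n, 2) for n in sequence_lengths)


# One undivided sequence of 12 events versus three per-parent sequences of 4 events.
undivided = pair_count([12])
divided = pair_count([4, 4, 4])
print(undivided, divided)
```

Under this assumption, the same 12 events require far fewer pair lookups once they are split into per-parent sequences, because pairs that span two different parents are never examined.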

As a result, the divergence between the actual microservice operation and the already generated service graph is reduced, the problem of increased false detections and missed detections in anomaly detection can be solved, and the burden on microservice developers and maintenance personnel can be reduced.

[Others]
The invention is not limited to the above embodiments. The present invention can be modified in many ways within the scope of the gist of the present invention.

The estimation device 14 of this embodiment described above can be realized using a general-purpose computer system that includes, for example, a CPU 901, a memory 902, a storage 903, a communication device 904, an input device 905, and an output device 906, as shown in the drawings. The memory 902 and the storage 903 are storage devices. In the computer system, each function of the estimation device 14 is realized by the CPU 901 executing a predetermined program loaded into the memory 902.

The estimation device 14 may be implemented by one computer or by multiple computers. The estimation device 14 may be a virtual machine implemented on a computer. A program for the estimation device 14 can be stored in computer-readable recording media such as an HDD, SSD, USB memory, CD, or DVD. The program for the estimation device 14 can also be distributed via a communication network.

1: System
11: Providing device
12: Monitoring device
13: Distribution device
14: Estimation device
15: Generating device
16: Storage device
17: Analysis device
21: Developer terminal
22: User terminal
23: Maintenance person terminal
141: Conversion unit
142: Storage unit
143: Estimation unit
901: CPU
902: Memory
903: Storage
904: Communication device
905: Input device
906: Output device

Claims

1. An estimation device for estimating dependencies between components, comprising:
a conversion unit that extracts parent-child relationships among a plurality of components constituting a program from monitoring data including an operation history of the program, and converts the monitoring data into an event log of child components for each parent component based on a tree of the parent-child relationships; and
an estimation unit that estimates and calculates the dependencies among the plurality of components using the event log of child components for each parent component.

2. The estimation device according to claim 1, wherein the conversion unit converts a plurality of events related to a plurality of child components into an event log composed of an event sequence sorted according to time information of the events and identification information of the components included in the events.

3. The estimation device according to claim 2, wherein the conversion unit updates the event sequence when the event sequence converted each time the monitoring data is received differs from a past event sequence, and does not update it when the event sequence is the same as the past event sequence.

4. The estimation device according to claim 1, wherein the estimation unit sequentially updates the dependencies among the plurality of components by performing process mining using the event log of child components for each parent component.

5. The estimation device according to claim 1, wherein the program is a microservice application program in which a plurality of programs are linked by interfaces.

6. An estimation method for estimating dependencies between components, in which an estimation device performs:
a step of extracting parent-child relationships among a plurality of components constituting a program from monitoring data including an operation history of the program, and converting the monitoring data into an event log of child components for each parent component based on a tree of the parent-child relationships; and
a step of estimating and calculating the dependencies among the plurality of components using the event log of child components for each parent component.

7. An estimation program that causes a computer to function as the estimation device according to any one of claims 1 to 5.