WO2023073859A1 - Inference device, inference method and inference program - Google Patents

Inference device, inference method and inference program

Info

Publication number
WO2023073859A1
Authority
WO
WIPO (PCT)
Prior art keywords
components
parent
child
event
program
Prior art date
Application number
PCT/JP2021/039788
Other languages
French (fr)
Japanese (ja)
Inventor
優 酒井
謙輔 高橋
Original Assignee
日本電信電話株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電信電話株式会社
Priority to PCT/JP2021/039788
Publication of WO2023073859A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/30: Monitoring

Definitions

  • the present invention relates to an estimation device, an estimation method, and an estimation program.
  • Microservices have a form in which multiple services are linked by interfaces, and each service is developed and maintained by a different team. Each service will continue to be updated from time to time by its respective development team to fix bugs, improve performance, and adapt to customer needs.
  • Microservice developers generally do not fully understand services other than the services they are in charge of. Therefore, it is difficult for both developers and maintainers of microservices to understand in what order user requests are processed by which service.
  • Distributed tracing is a technology that traces how multiple services work together and visualizes them as a single flow.
  • In Non-Patent Document 1, a service graph, which is an operation model of microservices, is generated based on monitoring data including the operation history of the microservices.
  • For example, Non-Patent Document 1 uses trace data obtained through the OpenTracing API as is to estimate dependencies between components, and constructs a service graph using a Petri net, which is an extended form of a state machine based on a directed graph.
  • Trace data is a set of operation histories of each service for one request.
  • Non-Patent Document 1 expresses the components of each service that constitutes a microservice as a data structure called a span, in which figures representing waiting for processing, starting processing, being processed, finished processing, and processed are connected by arrow lines.
  • Trace data is a set of spans.
  • a service graph represents a state in which spans are connected by arrow lines based on the estimated dependencies between spans.
  • Non-Patent Document 1 automatically detects anomalies in the monitoring data by matching the service graph that has already been generated against the monitoring data from the microservice that continues to be updated from time to time.
  • It is also conceivable to use the highly accurate and flexible process discovery of Non-Patent Document 2 to estimate dependencies between components.
  • However, when process discovery is used without any prior knowledge, the computational cost tends to increase in exchange for that flexibility.
  • the present invention has been made in view of the above circumstances, and an object of the present invention is to provide a technology that can estimate dependencies between components at high speed and with high accuracy.
  • An estimation device of one aspect of the present invention is an estimation device for estimating dependencies between components, comprising: a conversion unit that extracts parent-child relationships between components of a plurality of components constituting a program from monitoring data including an operation history of the program, and converts the monitoring data into event logs of child components for each parent component based on a tree of the parent-child relationships; and an estimation unit that estimates the dependencies between the components of the plurality of components using the event logs of the child components for each parent component.
  • An estimation method of one aspect of the present invention is an estimation method for estimating dependencies between components, in which an estimation device performs: a step of extracting parent-child relationships between components of a plurality of components constituting a program from monitoring data including an operation history of the program, and converting the monitoring data into event logs of child components for each parent component based on a tree of the parent-child relationships; and a step of estimating the dependencies between the components of the plurality of components using the event logs of the child components for each parent component.
  • An estimation program causes a computer to function as the estimation device.
  • FIG. 1 is a diagram showing the overall configuration of the system.
  • FIG. 2 is a diagram showing the functional block configuration of the estimation device.
  • FIG. 3 is a diagram showing the operation of the system.
  • FIG. 4 is a diagram showing an example of the basic format of trace data.
  • FIG. 5 is a diagram illustrating an example of extraction of parent-child relationships between spans.
  • FIG. 6 is a diagram showing an example of creating a parent-child relationship tree between spans.
  • FIG. 7 is a diagram illustrating an example of creating an event sequence of child spans for each parent span.
  • FIG. 8 is a diagram illustrating an example of creating an event log for a child span for each parent span.
  • FIG. 9 is a diagram illustrating an example of estimation of inter-span dependencies for each event log.
  • FIG. 10 is a diagram showing examples of functions used in inductive mining.
  • FIG. 11 is a diagram illustrating an example of order relationship estimation.
  • FIG. 12 is a diagram showing an example of a service graph.
  • FIG. 13 is a diagram illustrating an example of a hardware configuration of an estimation device.
  • The present invention creates a service model that faithfully simulates the operation of a microservice based on service dependencies at the component level within an application program, and compares the service model with the operation history obtained when the microservice is replayed. The aim is to support anomaly detection in microservices and bottleneck analysis through simulation.
  • the present invention enables highly accurate and flexible estimation of the inter-component dependencies of multiple components that provide microservices, thereby reducing the need for manual application analysis and research when creating service models.
  • the present invention converts monitoring data such as trace data obtained by distributed tracing into an event log, and uses the event log to perform process mining to estimate dependencies between components.
  • FIG. 1 is a diagram showing the overall configuration of a system 1 according to this embodiment.
  • System 1 is a system that provides microservices and manages and monitors their provision.
  • a microservice is an application program that has a form in which multiple services are linked by interfaces, and is developed and maintained by a different team for each service. Each service will continue to be updated from time to time by its respective development team to fix bugs, improve performance, and adapt to customer needs.
  • the system 1 includes a provision device 11, a monitoring device 12, a distribution device 13, an estimation device 14, a generation device 15, a storage device 16, and an analysis device 17, as shown in FIG.
  • In addition, a developer terminal 21, a user terminal 22, and a maintenance person terminal 23 exist. All devices and terminals are physically and electrically connected so that they can communicate with one another.
  • The providing device 11 is a device that executes a microservice application program released from the developer terminal 21.
  • the providing device 11 is a device that provides a microservice to the user terminal 22 by executing the microservice, and transmits to the monitoring device 12 an operation history of the microservice that is output when the microservice is executed.
  • The monitoring device 12 treats the microservice provided by the providing device 11 to the user terminal 22 as a service to be monitored and, in response to an inquiry from the distribution device 13, transmits to the distribution device 13 monitoring data including the operation history of the monitored service received from the providing device 11.
  • the distribution device 13 is a device that inquires of the monitoring device 12 about transmission of monitoring data and transmits the monitoring data transmitted from the monitoring device 12 in response to the inquiry to the estimation device 14 and the analysis device 17 .
  • the estimating device 14 is a device that converts the monitoring data sent from the distribution device 13 into an event log, and performs process mining using the event log to estimate dependencies between components.
  • the generating device 15 is a device that refers to the inter-component dependencies estimated by the estimating device 14 and generates a service graph of the monitored service based on the dependencies.
  • the storage device 16 is a device that stores the service graph of the monitored service created by the generation device 15 so that it can be referenced and read.
  • The analysis device 17 refers to the service graph of the monitored service stored in the storage device 16, compares by simulation the service graph with monitoring data, transmitted from the distribution device 13, that includes the replay operation history of the updated monitored service, analyzes anomalies in the monitoring data, and transmits the analysis result to the developer terminal 21 and the maintenance person terminal 23.
  • FIG. 2 is a diagram showing the functional block configuration of the estimation device 14.
  • The estimation device 14 is a device that converts monitoring data such as trace data obtained by distributed tracing into an event log, and performs process mining using the event log to estimate dependencies between components.
  • Distributed tracing is a technology that tracks how multiple services work together and visualizes them as a single flow.
  • a component is a part or element that makes up an application program, and is a functional program with a predetermined function.
  • a component is represented by a data structure called a span, in which figures representing waiting for processing, starting processing, being processed, ending processing, and processed are connected by arrow lines.
  • a component is hereinafter referred to as a span.
  • the estimating device 14 includes a transforming unit 141, a storage unit 142, and an estimating unit 143, as shown in FIG.
  • The conversion unit 141 has a function of extracting parent-child relationships between spans of a plurality of spans forming the target program from monitoring data including the operation history of the application program of the target service, and converting the monitoring data into event logs of child spans for each parent span based on the tree of the parent-child relationships.
  • The conversion unit 141 has a function of converting a plurality of events related to a plurality of child spans into an event log composed of event sequences sorted according to the time information of the events and the identification information of the spans included in the events.
  • Each time the conversion unit 141 receives monitoring data, transmitted from the distribution device 13, that includes the replay operation history of the updated monitored service, it converts the monitoring data into an event sequence. It has a function of storing the converted event sequence if it differs from the past event sequences, and not storing it if it is the same as a past event sequence.
  • the storage unit 142 has a function of storing the event log created (converted) by the conversion unit 141 in a readable and referable manner.
  • the storage unit 142 has a function of storing the inter-span dependency relation estimated by the estimation unit 143 in a readable and referable manner.
  • the estimating unit 143 has a function of estimating and calculating inter-span dependencies of a plurality of spans using event logs of child spans for each parent span created (converted) by the converting unit 141 .
  • The estimation unit 143 has a function of estimating and updating the inter-span dependencies of a plurality of spans by performing process mining using the event logs of child spans for each parent span created (converted) by the conversion unit 141.
  • In this embodiment, a microservice application program is used as an example of the application program of the service to be monitored.
  • However, the invention is also applicable to any program composed of a plurality of spans (components, that is, modular functional programs that provide predetermined functions).
  • FIG. 3 is a diagram showing the operation of system 1. The operation of the estimation device 14 will be mainly described.
  • The method of the present invention extracts endpoint information and parent-child relationships between spans from monitoring data, and converts the monitoring data into event logs for each parent span based on the parent-child relationships. It then estimates the dependencies between spans by performing process mining on each event log. Details are described below.
  • FIG. 4 is a diagram showing an example of the basic format of trace data relating to the operation of an application program of a service to be monitored, as an example of monitoring data.
  • the trace data includes, for example, component-level process name, process start time, process end time, reference type name, and related resources.
  • The conversion unit 141 of the estimation device 14 refers to the “reference type name” in the “Reference” column shown in FIG. 4 for each of four trace data, and extracts the parent-child relationships between spans based on the “reference type name”. For example, as shown in FIG. 5, the parent-child relationships between spans are extracted for each of the four trace data D1 to D4.
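  • The extraction of parent-child relationships can be sketched as follows. This is a minimal illustration, not the implementation of the conversion unit 141: the `Span` class, its field names, and the reference type value `"CHILD_OF"` are assumptions modelled loosely on OpenTracing-style span references and are not taken from FIG. 4.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    # Field names are hypothetical; FIG. 4 only names the kinds of information held.
    span_id: str
    operation_name: str
    start_time: int
    end_time: int
    # List of (reference type name, referenced span ID) pairs.
    references: list = field(default_factory=list)


def extract_parent_child(trace):
    """Return {child span ID: parent span ID} for one trace (a list of spans)."""
    parents = {}
    for span in trace:
        for ref_type, target_id in span.references:
            if ref_type == "CHILD_OF":  # assumed reference type name
                parents[span.span_id] = target_id
    return parents


# Hypothetical trace: span A is the root and has two children.
trace_d1 = [
    Span("A", "GET /order", 0, 100),
    Span("A_1", "check stock", 10, 40, [("CHILD_OF", "A")]),
    Span("A_2", "charge card", 45, 90, [("CHILD_OF", "A")]),
]
relations = extract_parent_child(trace_d1)
```

  • A span that never appears as a key in the returned mapping has no parent and is therefore a candidate root.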
  • the conversion unit 141 of the estimating device 14 acquires the endpoint information of the application program by referring to the roots of all the extracted parent-child relationships, and creates and holds a parent-child relationship tree for each endpoint.
  • An endpoint is a parent-child relationship tree between spans with a certain span as the root.
  • Endpoint A points to a parent-child relationship tree rooted at span A. For example, a parent-child relationship tree for endpoint A (span A) and a parent-child relationship tree for endpoint B (span B) as shown in FIG. 6 are created.
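  • One way to build a parent-child relationship tree per endpoint is sketched below. It assumes the parent-child relationships are already available as a child-to-parent mapping and that span names are stable across traces; the function name `build_trees` and the nested-dictionary tree representation are illustrative choices, not part of the source.

```python
from collections import defaultdict


def build_trees(parent_of, all_span_ids):
    """Return {root span: nested {child: subtree}} with one tree per endpoint."""
    children = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)

    # A span without a parent is treated as a root, that is, an endpoint.
    roots = [s for s in all_span_ids if s not in parent_of]

    def subtree(node):
        return {c: subtree(c) for c in sorted(children[node])}

    return {root: subtree(root) for root in sorted(roots)}


# Hypothetical relationships for two endpoints, A and B.
parent_of = {"A_1": "A", "A_2": "A", "A_1_1": "A_1", "B_1": "B"}
span_ids = {"A", "A_1", "A_2", "A_1_1", "B", "B_1"}
trees = build_trees(parent_of, span_ids)
```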
  • the span set of monitoring data can be divided as to which span is the parent, except for the root (endpoint) span.
  • a plurality of hierarchized spans that form the parent-child relationship tree can be divided into sets of child spans for each parent span. For example, as shown in FIG. 7, a parent span A can be divided into a child span set #1 consisting of child spans A_1, .
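  • The division into child span sets can be sketched as a flattening of the tree. The nested-dictionary tree format and the name `child_sets` are assumptions made for this illustration; the span names are invented.

```python
def child_sets(tree):
    """Flatten a nested {span: subtree} dict into {parent: [direct children]}.

    Only spans that actually have children appear as keys.
    """
    sets = {}
    for parent, subtree in tree.items():
        if subtree:
            sets[parent] = sorted(subtree)
            sets.update(child_sets(subtree))
    return sets


# Hypothetical tree rooted at span A.
tree = {"A": {"A_1": {"A_1_1": {}, "A_1_2": {}}, "A_2": {}}}
sets_per_parent = child_sets(tree)
```

  • Each resulting set is later mined on its own, which is what keeps the individual event logs small.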
  • the conversion unit 141 of the estimating device 14 extracts only the spans included in each divided set from the monitoring data, and extracts the "events" of the spans from the monitoring data.
  • Event has four elements, “operationName”, “eventType”, “timestamp”, and “spanID”, as shown in FIG.
  • operationName is the processing name of the child span (“component-level process name” in FIG. 4).
  • eventType indicates the start or end of child span processing.
  • timestamp indicates the processing start time or processing end time of the child span.
  • spanID is the ID of the child span (“span ID” in FIG. 4).
  • The conversion unit 141 of the estimation device 14 sorts the extracted “events” by their “timestamp” and “spanID”, creates an “event sequence” for each parent span of the parent-child relationship tree, and stores it in the storage unit 142 as an “event log” (steps S2 and S5).
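  • The event structure and the sorting step can be sketched as follows. The four field names come from the text; the `eventType` values `"start"` and `"end"`, the tuple format of the input spans, and the function name `to_event_sequence` are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    operationName: str
    eventType: str  # "start" or "end" (values assumed)
    timestamp: int
    spanID: str


def to_event_sequence(child_spans):
    """child_spans: iterable of (span ID, operation name, start, end) tuples."""
    events = []
    for span_id, name, start, end in child_spans:
        events.append(Event(name, "start", start, span_id))
        events.append(Event(name, "end", end, span_id))
    # Sort by timestamp first, then by span ID to break ties.
    return sorted(events, key=lambda e: (e.timestamp, e.spanID))


# Hypothetical child spans of one parent, given out of order.
seq = to_event_sequence([
    ("s2", "charge card", 45, 90),
    ("s1", "check stock", 10, 40),
])
labels = [(e.operationName, e.eventType) for e in seq]
```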
  • the conversion unit 141 of the estimating device 14 receives the fifth, sixth, .
  • the parent-child relationship between spans is extracted and the parent-child relationship tree is newly created or updated, and stored in the storage unit 142 .
  • The conversion unit 141 of the estimation device 14 refers to the newly created or updated parent-child relationship tree, divides the trace data, creates an “event sequence” for each parent span having children, and adds the created “event sequence” to the “event log” (steps S2 and S5).
  • An “event log” is a set of event sequences entered in this way, and all parent spans that have children hold an “event log”. As a result, an event log for each endpoint and each parent span is created and stored in the storage unit 142 as shown in FIG.
  • The conversion unit 141 of the estimation device 14 refers to the existing event log stored in the storage unit 142. If the existing event log already contains an event sequence whose total number of events and order are exactly the same as those of the event sequence to be added, the event sequence is not added to the event log (steps S3 to S5). By checking in advance for monitoring data that yields an equivalent event sequence, the total number of event sequences, and thus the event log size, can be reduced.
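  • The duplicate check can be sketched as follows. As an assumption, event sequences are compared by their length and by the order of (operation name, event type) pairs, ignoring timestamps, since timestamps differ from request to request; the source does not state which fields are compared. The name `add_event_sequence` is hypothetical.

```python
def add_event_sequence(event_log, new_sequence):
    """Append new_sequence to event_log unless an identical one is stored.

    Returns True if the sequence was added, False if it was a duplicate.
    """
    key = tuple(new_sequence)
    for old in event_log:
        # Same total number of events and exactly the same order.
        if len(old) == len(key) and tuple(old) == key:
            return False
    event_log.append(list(new_sequence))
    return True


log = []
first = add_event_sequence(log, [("a", "start"), ("a", "end")])
second = add_event_sequence(log, [("a", "start"), ("a", "end")])
third = add_event_sequence(log, [("a", "start"), ("b", "start"), ("a", "end"), ("b", "end")])
```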
  • The estimation unit 143 of the estimation device 14 refers to the event logs divided for each endpoint and each parent span and stored in the storage unit 142, executes process mining on each event log, and estimates the dependencies between the spans included in that event log.
  • FIG. 9 is a diagram illustrating an example of estimation of inter-span dependencies for each event log.
  • Process mining is a technology that visualizes event logs by connecting them in chronological order. Also called process discovery.
  • Specific example 1 of process discovery is inductive mining, which divides the event log recursively. The BASECASE function F1 shown in FIG. 10 checks whether a divided event log is a minimum unit.
  • The FINDCUT function F2 is used to detect dependencies between spans.
  • The SPLITLOG function F3 splits the event log based on the detected inter-span dependencies. As a result, inter-span dependencies are estimated, and the result of dependency estimation is output as a process tree.
  • The FALLTHROUGH function F4 handles an event log that is not a minimum unit but could not be divided, by assigning one of the dependencies to it. Therefore, specific example 1 can be said to be a highly flexible method.
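  • The interplay of the four functions can be sketched as a heavily simplified recursive miner. This is an assumption-laden illustration rather than the algorithm of FIG. 10: traces are plain tuples of span names (no start and end events), only exclusive-choice and sequence cuts are detected (no parallel or loop cuts), and the fall-through simply returns a "flower" node that allows any order. All function names and the tuple-based process tree format are hypothetical.

```python
from itertools import product


def activities_of(log):
    return {a for trace in log for a in trace}


def reachability(acts, edges):
    # Transitive closure of the directly-follows graph (Floyd-Warshall style).
    reach = set(edges)
    for k, i, j in product(acts, repeat=3):
        if (i, k) in reach and (k, j) in reach:
            reach.add((i, j))
    return reach


def groups_by(acts, same):
    # Union-find: merge two activities whenever same(a, b) holds.
    parent = {a: a for a in acts}

    def find(a):
        while parent[a] != a:
            a = parent[a]
        return a

    for a, b in product(acts, repeat=2):
        if same(a, b):
            parent[find(a)] = find(b)
    out = {}
    for a in acts:
        out.setdefault(find(a), set()).add(a)
    return list(out.values())


def find_cut(log):  # FINDCUT (exclusive choice and sequence only)
    acts = activities_of(log)
    edges = {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}
    xor = groups_by(acts, lambda a, b: (a, b) in edges or (b, a) in edges)
    if len(xor) > 1:
        return "xor", sorted(xor, key=min)
    reach = reachability(acts, edges)
    seq = groups_by(acts, lambda a, b: ((a, b) in reach) == ((b, a) in reach))
    # Order groups by how many outside activities can reach them.
    seq.sort(key=lambda g: sum(any((a, b) in reach for b in g) for a in acts - g))
    ordered = all((a, b) in reach and (b, a) not in reach
                  for i, g in enumerate(seq) for h in seq[i + 1:]
                  for a in g for b in h)
    return ("seq", seq) if len(seq) > 1 and ordered else None


def split_log(log, kind, groups):  # SPLITLOG
    if kind == "xor":
        return [[t for t in log if t[0] in g] for g in groups]
    return [[tuple(a for a in t if a in g) for t in log] for g in groups]


def mine(log):
    log = [tuple(t) for t in log]
    if not any(log):
        return "tau"  # only empty traces: silent step
    if not all(log):
        # Some traces are empty: the rest of the behaviour is optional.
        return ("xor", ["tau", mine([t for t in log if t])])
    acts = activities_of(log)
    if len(acts) == 1 and all(len(t) == 1 for t in log):  # BASECASE
        return next(iter(acts))
    cut = find_cut(log)
    if cut is None:  # FALLTHROUGH
        return ("flower", sorted(acts))
    kind, groups = cut
    return (kind, [mine(sub) for sub in split_log(log, kind, groups)])


tree = mine([("A", "B", "C"), ("A", "C")])
```

  • For the two hypothetical traces above, the sketch yields a sequence of A, an optional B, and C. A production-grade miner would also need the parallel and loop cuts to cover the parallel and iterative processing mentioned in the text.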
  • Specific example 2 of process discovery is the method of using a microservice demo application. Observing monitoring data that can be obtained from demo applications, etc., most of them can be explained by a combination of parallel processing, serial processing, and iterative processing. Therefore, focusing on such a tendency, a method of first estimating the spans processed in series, arranging them in order, and estimating the portion of repeated processing from the sequence can be applied.
  • Procedure 1: First, the order relations between two adjacent elements of the target trace data are listed from the beginning, and the number of times each order relation appears is counted. For example, as shown in FIG. 11, the order relation “E → E” appears four times, so the count of “E → E” at positions 4 to 7 is 4. The order relations “I → E” and “E → I” each appear twice, so the count of “I → E” at position 10, “E → I” at position 11, “I → E” at position 12, and “E → I” at position 13 is 2.
  • Procedure 2: Next, for an order relation with a count of 1, the check result is listed as completed. For an order relation with a count of 2 or more, if the processing of the order relation is repeated and not all of the processes are the same, the check result is listed as completed. If the processing is repeated but all of the processes are the same, or if the processing is not repeated, the check result is held.
  • Each “E → E” at positions 4 to 7 is repeated processing, but since all of the processes are the same “E”, it is held. “I → E” at position 10, “E → I” at position 11, “I → E” at position 12, and “E → I” at position 13 are repeated processing in which not all of the processes are the same, so they are completed.
  • Procedure 3: After that, if any check result is held, the number of elements in the order relation at that position is increased, the count is recalculated, and procedure 2 is performed again. Procedures 2 and 3 are repeated until the count at every position becomes 1 and the check result is completed.
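  • The counting in procedure 1 can be sketched as follows. The element sequence is invented so that its counts agree with the numbers quoted from FIG. 11; it is not the actual content of the figure. The `width` parameter is an assumed way to express "increasing the number of elements" in procedure 3, and the completed/held decision of procedure 2 is deliberately left out.

```python
from collections import Counter


def count_order_relations(sequence, width=2):
    """List each order relation of `width` adjacent elements with its count.

    Returns (position, relation, count) rows; positions start at 1.
    """
    relations = [tuple(sequence[i:i + width])
                 for i in range(len(sequence) - width + 1)]
    counts = Counter(relations)
    return [(pos + 1, rel, counts[rel]) for pos, rel in enumerate(relations)]


# Hypothetical element sequence (14 elements, 13 order relations).
sequence = ["A", "B", "C", "E", "E", "E", "E", "E", "F", "I", "E", "I", "E", "I"]
rows = count_order_relations(sequence)
```

  • Calling the function again with `width=3` on the same sequence recounts the relations over three adjacent elements, which is one possible reading of the recalculation step.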
  • the estimating unit 143 of the estimating device 14 stores the inter-span dependency estimation result in the storage unit 142 .
  • the result of estimating inter-span dependencies is stored for each parent span in the same way as the event log.
  • (Steps S9 and S10)
  • the generating device 15 refers to the inter-span dependency relationships stored in the storage unit 142 of the estimating device 14, and generates a service graph of the monitored service based on the dependency relationships.
  • FIG. 12 is an example of a service graph.
  • monitoring data such as trace data obtained by distributed tracing is converted into an event log, and process mining is performed using the event log to estimate the inter-component dependencies.
  • The estimation device 14 includes: a conversion unit 141 that extracts parent-child relationships between spans of a plurality of spans constituting the application program from monitoring data including the operation history of the microservice application program, and converts the monitoring data into event logs of child spans for each parent span based on the tree of the parent-child relationships; and an estimation unit 143 that estimates the dependencies between the spans of the plurality of spans using the event logs of the child spans for each parent span.
  • The event log is divided in advance based on information that the monitoring data holds by default, and process discovery is performed on each divided event log. Because the inter-span dependencies are estimated systematically in this way, they can be estimated faster and more accurately than when all trace data are converted into event logs without such division.
  • The estimation device 14 of this embodiment described above can be realized using a general-purpose computer system that includes, for example, a CPU 901, a memory 902, a storage 903, a communication device 904, an input device 905, and an output device 906, as shown in the hardware configuration diagram.
  • Memory 902 and storage 903 are storage devices.
  • each function of the estimation device 14 is realized by executing a predetermined program loaded on the memory 902 by the CPU 901 .
  • the estimation device 14 may be implemented by one computer.
  • The estimation device 14 may be implemented with multiple computers.
  • the estimating device 14 may be a virtual machine implemented on a computer.
  • a program for the estimating device 14 can be stored in computer-readable recording media such as HDD, SSD, USB memory, CD, and DVD.
  • the program for estimating device 14 can also be distributed via a communication network.
  • 1: System; 11: Providing device; 12: Monitoring device; 13: Distribution device; 14: Estimation device; 15: Generating device; 16: Storage device; 17: Analysis device; 21: Developer terminal; 22: User terminal; 23: Maintenance person terminal; 141: Conversion unit; 142: Storage unit; 143: Estimation unit; 901: CPU; 902: Memory; 903: Storage; 904: Communication device; 905: Input device; 906: Output device

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

An inference device 14 that infers a dependency relationship between components, said inference device comprising: a conversion unit 141 that extracts a parent-child relationship between components among a plurality of components which constitute a program from observational data containing an operation history of the program, and that converts the observational data into an event log of a child component of each parent component on the basis of a tree representing the parent-child relationship; and an inference unit 143 that uses the event log of the child component of each parent component to infer and calculate the dependency relationship between the components among the plurality of components.

Description

推定装置、推定方法、及び、推定プログラムEstimation device, estimation method, and estimation program
 本発明は、推定装置、推定方法、及び、推定プログラムに関する。 The present invention relates to an estimation device, an estimation method, and an estimation program.
 従来、マイクロサービスが知られている。マイクロサービスは、複数のサービスがインタフェースによって連携した形態を持ち、サービス毎に異なるチームによって開発及びメンテナンスされる。各サービスは、バグ修正、性能向上、顧客のニーズへの適応のために、それぞれの開発チームにより随時更新され続ける。 Conventionally, microservices are known. Microservices have a form in which multiple services are linked by interfaces, and each service is developed and maintained by a different team. Each service will continue to be updated from time to time by its respective development team to fix bugs, improve performance, and adapt to customer needs.
 マイクロサービスの開発者は、一般に、自身が担当しているサービス以外のサービスを十分に把握していない。そのため、ユーザのリクエストがどのサービスによってどのような順序で処理されているかを把握することは、マイクロサービスの開発者及び保守者ともに困難である。  Microservice developers generally do not fully understand services other than the services they are in charge of. Therefore, it is difficult for both developers and maintainers of microservices to understand in what order user requests are processed by which service.
 また、複数のサービスを連携させた複雑な構造を有するマイクロサービスにおいて、エラーメッセージのみを頼りに異常を検知して障害箇所を絞り込むことは、多大な労力を必要とし、障害箇所発見の遅れに繋がる可能性があり、サービス保守を行う上で不十分である。 In addition, in microservices that have a complex structure that links multiple services, detecting anomalies and narrowing down the location of failures by relying only on error messages requires a great deal of labor and leads to delays in discovering failure locations. possible and inadequate for service maintenance.
 そこで、分散トレーシングという技術がある。分散トレーシングは、複数のサービス同士が連携して動作する様を追跡し、1つのフローとして可視化する技術である。非特許文献1では、マイクロサービスの動作履歴を含む監視データを基に、マイクロサービスの動作モデルであるサービスグラフを生成する。 Therefore, there is a technology called distributed tracing. Distributed tracing is a technology that traces how multiple services work together and visualizes them as a single flow. In Non-Patent Document 1, a service graph, which is an operation model of microservices, is generated based on monitoring data including the operation history of microservices.
 例えば、非特許文献1は、OpenTracingAPIによって得られたトレースデータをそのまま用いてコンポーネント間の依存関係を推定し、有向グラフを用いた状態機械を拡張した形式であるペトリネットを用いてサービスグラフを構築する。トレースデータとは、1つのリクエストに対する各サービスの動作履歴の集合である。 For example, in Non-Patent Document 1, trace data obtained by OpenTracingAPI is used as is to estimate dependencies between components, and a service graph is constructed using a Petri net, which is an extended form of a state machine using a directed graph. . Trace data is a set of operation histories of each service for one request.
 非特許文献1は、マイクロサービスを構成する各サービスのコンポーネントを、処理待ち、処理開始、処理中、処理終了、処理済みをそれぞれ表す各図形を矢印線で繋げたスパンというデータ構造で表現する。トレースデータは、スパンの集合となる。サービスグラフは、推定したスパン間の依存関係に基づきスパン同士を矢印線で繋げた様子を表す。  Non-Patent Document 1 expresses the components of each service that constitutes a microservice as a data structure called a span, in which figures representing waiting for processing, starting processing, being processed, finished processing, and processed are connected by arrow lines. Trace data is a set of spans. A service graph represents a state in which spans are connected by arrow lines based on the estimated dependencies between spans.
 そして、非特許文献1は、随時更新され続けるマイクロサービスからの監視データに対して既に生成していたサービスグラフをシミュレーションによって照合することで、その監視データ内での異常を自動で検出する。 Then, Non-Patent Document 1 automatically detects anomalies in the monitoring data by matching the service graph that has already been generated against the monitoring data from the microservice that continues to be updated from time to time.
 サービスグラフを生成するためには、トレースデータ等の監視データからマイクロサービス内に存在するコンポーネント間(スパン間)の依存関係を推定することが必要である。このとき、その依存関係の推定精度が低かったり、推定できた依存関係の種類が不十分であったりすると、実際のマイクロサービスの動作と既に生成していたサービスグラフとの乖離が大きくなり、異常検知の誤検知や見逃しが多くなるという課題があった。  In order to generate a service graph, it is necessary to estimate the dependencies between components (between spans) that exist within a microservice from monitoring data such as trace data. At this time, if the estimation accuracy of the dependencies is low or the types of dependencies that can be estimated are insufficient, the deviation between the actual microservice behavior and the service graph that has already been generated will increase, resulting in abnormalities. There was a problem that false detections and missed detections increased.
 この課題を解決するため、マイクロサービスの開発者又は保守者により、コンポーネント間の依存関係について全体的又は部分的な正解を与えることが考えられる。しかし、開発期間が短く頻繁にアップデートされるマイクロサービスに対してそのような知見を継続的に与えることは、開発者及び保守者にとって大きな負担となる。 In order to solve this problem, it is conceivable that the developer or maintainer of the microservice will give the correct answer, wholly or partially, about the dependencies between the components. However, continuously providing such knowledge to microservices that have short development periods and are frequently updated imposes a heavy burden on developers and maintainers.
 また、コンポーネント間の依存関係を推定するために、高精度かつ柔軟な非特許文献2のプロセスディスカバリーを利用することも考えられる。しかし、何らの事前知識を与えない状態でプロセスディスカバリーを利用すると、その柔軟性を担保するために計算コストが大きくなる傾向がある。マイクロサービスの監視データはマイクロサービスの規模が大きいほどその1個あたりの大きさが巨大になり、計算コストは監視データのサイズに応じて大きく増加する。それゆえ、監視データをそのまま用いて、又はプロセスログに直接変換して、プロセスディスカバリーを適用すると、計算コストが非常に大きくなり、マイクロサービスのアプリケーションプログラムの稼働からサービスグラフの生成までに大きな遅延が生じてしまう。 It is also conceivable to use the highly accurate and flexible process discovery of Non-Patent Document 2 in order to estimate dependencies between components. However, if process discovery is used without giving any prior knowledge, the computational cost tends to increase in order to ensure its flexibility. The larger the scale of the microservice, the larger the size of each microservice monitoring data, and the larger the size of the monitoring data, the larger the calculation cost. Therefore, applying process discovery using monitoring data as it is or converting it directly to process logs will result in a very high computational cost and a large delay from running the microservice application program to generating the service graph. occur.
 本発明は、上記事情に鑑みてなされたものであり、本発明の目的は、コンポーネント間の依存関係を高速かつ高精度に推定可能な技術を提供することである。 The present invention has been made in view of the above circumstances, and an object of the present invention is to provide a technology that can estimate dependencies between components at high speed and with high accuracy.
 本発明の一態様の推定装置は、コンポーネント間の依存関係を推定する推定装置において、プログラムの動作履歴を含む監視データから前記プログラムを構成する複数のコンポーネントのコンポーネント間の親子関係を抽出し、前記親子関係のツリーに基づき前記監視データを親のコンポーネント毎の子のコンポーネントのイベントログに変換する変換部と、前記親のコンポーネント毎の子のコンポーネントのイベントログを用いて前記複数のコンポーネントのコンポーネント間の依存関係を推定計算する推定部と、を備える。 An estimating device of one aspect of the present invention is an estimating device for estimating a dependency relationship between components, extracting a parent-child relationship between components of a plurality of components constituting the program from monitoring data including an operation history of the program, a converter that converts the monitoring data into an event log of a child component for each parent component based on a parent-child relationship tree; and an estimating unit that estimates and calculates the dependency of
An estimation method according to one aspect of the present invention is an estimation method for estimating dependencies between components, in which an estimation device performs: a step of extracting parent-child relationships between a plurality of components constituting a program from monitoring data including an operation history of the program, and converting the monitoring data, based on a tree of the parent-child relationships, into an event log of child components for each parent component; and a step of estimating the dependencies between the plurality of components using the event log of child components for each parent component.
An estimation program according to one aspect of the present invention causes a computer to function as the above estimation device.
According to the present invention, a technique that can estimate dependencies between components quickly and accurately can be provided.
FIG. 1 is a diagram showing the overall configuration of the system.
FIG. 2 is a diagram showing the functional block configuration of the estimation device.
FIG. 3 is a diagram showing the operation of the system.
FIG. 4 is a diagram showing an example of the basic format of trace data.
FIG. 5 is a diagram showing an example of extracting parent-child relationships between spans.
FIG. 6 is a diagram showing an example of creating a parent-child relationship tree of spans.
FIG. 7 is a diagram showing an example of creating an event sequence of child spans for each parent span.
FIG. 8 is a diagram showing an example of creating an event log of child spans for each parent span.
FIG. 9 is a diagram showing an example of estimating the dependencies between spans for each event log.
FIG. 10 is a diagram showing examples of the functions used in inductive mining.
FIG. 11 is a diagram showing an example of estimating order relationships.
FIG. 12 is a diagram showing an example of a service graph.
FIG. 13 is a diagram showing an example of the hardware configuration of the estimation device.
Embodiments of the present invention are described below with reference to the drawings. In the drawings, identical parts are given the same reference numerals and their description is omitted.
[Summary of Invention]
The present invention aims to create a service model that faithfully simulates the behavior of a microservice based on the service dependencies at the component level within an application program, and to compare that service model with the operation history of the microservice during replay, so that the result can be used for anomaly detection in the microservice and for simulation-based bottleneck analysis.
To achieve this goal, the present invention makes it possible to estimate the dependencies between the components that provide a microservice accurately and flexibly, with the aim of reducing the manual effort of application analysis and investigation required when creating a service model.
Conventionally, as described above, a service model that simulates the behavior of a microservice can be created from the dependencies between components within an application program. In addition, by comparing that service model with the operation history of the microservice during replay, differences between the service model and the actual operation history can be detected as anomalies.
However, as described above, the dependencies between components that can be estimated are limited, and creating a service model that deviates little from the actual behavior of the microservice requires supplying knowledge about the application program from maintainers or developers. Furthermore, when the dependencies between components are to be estimated with high accuracy, the growth in computation with the size of the monitoring data becomes an obstacle.
Therefore, the present invention converts monitoring data, such as trace data obtained by distributed tracing, into event logs and estimates the dependencies between components by performing process mining on those event logs. Specifically, it uses the service information that monitoring data generally contains by default and estimates the dependencies in a divide-and-conquer manner: the monitoring data is converted (partitioned) per parent component before process mining is performed, which achieves fast and accurate estimation of the dependencies between components.
[Overall system configuration]
FIG. 1 is a diagram showing the overall configuration of a system 1 according to this embodiment. The system 1 provides a microservice and manages and monitors its provision.
A microservice is an application program in which multiple services are linked through interfaces and each service is developed and maintained by a different team. Each service is updated continually by its development team to fix bugs, improve performance, and adapt to customer needs.
As shown in FIG. 1, the system 1 includes a providing device 11, a monitoring device 12, a distribution device 13, an estimation device 14, a generation device 15, a storage device 16, and an analysis device 17. A developer terminal 21, a user terminal 22, and a maintainer terminal 23 are also present. All devices and terminals are physically and electrically connected so that they can communicate.
The providing device 11 executes the microservice application program whose functions are released from the developer terminal 21. By executing it, the providing device 11 provides the microservice to the user terminal 22 and transmits the operation history of the microservice output during execution to the monitoring device 12.
The monitoring device 12 treats the microservice that the providing device 11 provides to the user terminal 22 as the monitored service and, in response to a query from the distribution device 13, transmits to the distribution device 13 monitoring data that includes the operation history of the monitored service received from the providing device 11.
The distribution device 13 queries the monitoring device 12 for monitoring data and forwards the monitoring data returned by the monitoring device 12 to the estimation device 14 and the analysis device 17.
The estimation device 14 converts the monitoring data received from the distribution device 13 into event logs and estimates the dependencies between components by performing process mining on those event logs.
The generation device 15 refers to the dependencies between components estimated by the estimation device 14 and generates a service graph of the monitored service based on them.
The storage device 16 stores the service graph of the monitored service generated by the generation device 15 so that it can be referenced and read.
The analysis device 17 refers to the service graph of the monitored service stored in the storage device 16 and compares it, by simulation, with monitoring data received from the distribution device 13 that includes the operation history of the updated monitored service during replay. It thereby analyzes anomalies in that monitoring data and transmits the analysis result to the developer terminal 21 and the maintainer terminal 23.
[Configuration of estimation device]
FIG. 2 is a diagram showing the functional block configuration of the estimation device 14. As described above, the estimation device 14 converts monitoring data, such as trace data obtained by distributed tracing, into event logs and estimates the dependencies between components by performing process mining on those event logs.
Distributed tracing is a technique that traces how multiple services operate in cooperation and visualizes the result as a single flow.
A component is a part or element that makes up an application program: a functional program with a predetermined function. In this embodiment, a component is represented by a data structure called a span, in which shapes representing waiting for processing, start of processing, processing in progress, end of processing, and processing completed are connected by arrows. Components are hereinafter referred to as spans.
As shown in FIG. 2, the estimation device 14 includes a conversion unit 141, a storage unit 142, and an estimation unit 143.
The conversion unit 141 extracts the parent-child relationships between the spans that constitute the monitored program from monitoring data including the operation history of the application program of the monitored service and, based on the tree of those parent-child relationships, converts the monitoring data into an event log of child spans for each parent span.
The conversion unit 141 converts the events of the child spans into an event log consisting of event sequences, each sorted by the time information of the events and the span identification information contained in the events.
Each time the conversion unit 141 receives, from the distribution device 13, monitoring data including the operation history of the updated monitored service during replay, it converts that monitoring data into event sequences. It stores a converted event sequence if it differs from the event sequences already held and does not store it if it is identical to one of them.
The storage unit 142 stores the event logs created by the conversion unit 141 so that they can be referenced and read. The storage unit 142 also stores the inter-span dependencies estimated by the estimation unit 143 so that they can be referenced and read.
The estimation unit 143 estimates the dependencies between spans using the per-parent-span event logs of child spans created by the conversion unit 141.
The estimation unit 143 performs process mining on the per-parent-span event logs of child spans created by the conversion unit 141 to estimate the dependencies between spans, and updates them successively.
In this embodiment, a microservice application program is used as an example of the application program of the monitored service. However, the invention is applicable to any program composed of a plurality of spans (components, that is, part-like functional programs each providing a predetermined function).
[System operation]
FIG. 3 is a diagram showing the operation of the system 1. The description focuses on the operation of the estimation device 14. The method of the present invention extracts endpoint information and the parent-child relationships between spans from the monitoring data, converts the monitoring data into an event log for each parent span based on those relationships, and estimates the dependencies between spans independently for each parent span. Details follow.
Step S1;
First, the distribution device 13 transmits monitoring data including the operation history of the monitored service to the estimation device 14. FIG. 4 shows, as an example of monitoring data, the basic format of trace data on the operation of the application program of the monitored service. The trace data includes, for example, the component-level processing name, processing start time, processing end time, reference type name, and related resources.
One piece of trace data is created each time the application program of the monitored service runs. In the following, it is assumed that four pieces of trace data have been transmitted to the estimation device 14.
Step S2 to Step S5;
Next, for each of the four pieces of trace data, the conversion unit 141 of the estimation device 14 refers to the "reference type name" in the "Reference" field shown in FIG. 4 and extracts the parent-child relationships between spans based on it. For example, as shown in FIG. 5, the parent-child relationships between spans are extracted for each of the four pieces of trace data D1 to D4.
The conversion unit 141 then obtains the endpoint information of the application program by referring to the roots of all extracted parent-child relationships, and creates and holds a parent-child relationship tree for each endpoint. An endpoint is a parent-child relationship tree of spans rooted at a particular span; endpoint A denotes the parent-child relationship tree rooted at span A. For example, the parent-child relationship tree of endpoint A (span A) and that of endpoint B (span B) are created as shown in FIG. 6.
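The extraction of parent-child relationships and the per-endpoint trees can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the span layout (a dict with `spanID`, `operationName`, and a `references` list whose entries carry `refType` and `spanID`) is modeled loosely on Jaeger-style trace JSON, `CHILD_OF` stands in for the "reference type name" of FIG. 4, and the function names are hypothetical.

```python
from collections import defaultdict


def extract_parent_child(trace):
    """Return ([(parent_name, child_name), ...], [root_name, ...]) for one trace.

    `trace` is a list of span dicts in the assumed layout described above.
    """
    name_of = {span["spanID"]: span["operationName"] for span in trace}
    edges, roots = [], []
    for span in trace:
        parent_ids = [ref["spanID"] for ref in span.get("references", [])
                      if ref.get("refType") == "CHILD_OF" and ref["spanID"] in name_of]
        if parent_ids:
            edges.extend((name_of[pid], span["operationName"]) for pid in parent_ids)
        else:
            roots.append(span["operationName"])  # no parent: this span is an endpoint
    return edges, roots


def build_trees(traces):
    """Merge traces into {endpoint: {parent_name: set(child_names)}}.

    Assumes each trace has a single root, so all its edges belong to that endpoint.
    """
    trees = defaultdict(lambda: defaultdict(set))
    for trace in traces:
        edges, roots = extract_parent_child(trace)
        for root in roots:
            for parent, child in edges:
                trees[root][parent].add(child)
    return trees
```

Keying the tree by operation name rather than by span ID lets trees from successive traces of the same endpoint be merged, which is one plausible reading of the per-endpoint tree that is "created and held".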
Based on the parent-child relationship trees, the set of spans in the monitoring data, excluding the root (endpoint) spans, can be partitioned according to which span is the parent. In other words, the hierarchy of spans forming a parent-child relationship tree can be partitioned into a set of child spans for each parent span. For example, as shown in FIG. 7, the spans can be partitioned into child span set #1, consisting of child spans A_1, ..., A_n of parent span A, and child span set #2 for parent span A_1.
The conversion unit 141 therefore picks out only the spans contained in each partitioned set and extracts the "events" of those spans from the monitoring data. As shown in FIG. 7, an "event" has four elements: "operationName", "eventType", "timestamp", and "spanID".
"operationName" is the processing name of the child span (the "component-level processing name" in FIG. 4). "eventType" indicates whether the event is the start or the end of processing of the child span. "timestamp" is the processing start time or processing end time of the child span. "spanID" is the ID of the child span (the "span ID" in FIG. 4).
The conversion unit 141 then sorts the extracted "events" by their "timestamp" and by the "spanID" they contain, creates an "event sequence" consisting of the sorted "events" for each parent span of the parent-child relationship tree, and stores it in the storage unit 142 as an "event log" (steps S2 and S5).
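As a rough illustration of this conversion, the sketch below turns one trace into an event sequence per parent span. The input field names `parentSpanID`, `startTime`, and `endTime` are hypothetical, and using `spanID` only as a tie-breaker after `timestamp` is an assumed reading of the sort order; the four event fields are the ones named in the text.

```python
from collections import defaultdict


def build_event_sequences(trace):
    """Return {parent operationName: [event, ...]} for one trace.

    Each child span contributes a "start" and an "end" event to the
    sequence of its parent span.
    """
    name_of = {span["spanID"]: span["operationName"] for span in trace}
    sequences = defaultdict(list)
    for span in trace:
        parent_id = span.get("parentSpanID")
        if parent_id is None:
            continue  # the root (endpoint) span is nobody's child
        for event_type, key in (("start", "startTime"), ("end", "endTime")):
            sequences[name_of[parent_id]].append({
                "operationName": span["operationName"],
                "eventType": event_type,
                "timestamp": span[key],
                "spanID": span["spanID"],
            })
    for events in sequences.values():
        # Stable sort: timestamp first, spanID as tie-breaker (assumption).
        events.sort(key=lambda e: (e["timestamp"], e["spanID"]))
    return dict(sequences)
```

With two overlapping child spans X (time 1 to 3) and Y (time 2 to 5) under parent A, the sequence for A interleaves their events as X start, Y start, X end, Y end, which is the kind of ordering a later mining step can read as concurrency.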
Thereafter, each time the conversion unit 141 receives monitoring data including the operation history of the updated monitored service during replay, that is, each time it receives the fifth, sixth, and subsequent pieces of trace data, it extracts the parent-child relationships between spans, creates or updates the parent-child relationship trees, and stores them in the storage unit 142.
Furthermore, the conversion unit 141 refers to the newly created or updated parent-child relationship trees, partitions the trace data and creates an event sequence for each parent span that has children, and adds each created "event sequence" to the "event log" of the corresponding parent span (steps S2 and S5).
An "event log" is the set of event sequences accumulated in this way, and every parent span that has children holds an "event log". As a result, an event log for each endpoint and each parent span, as shown in FIG. 8, is created and stored in the storage unit 142.
When adding an event sequence to an event log, the conversion unit 141 refers to the existing event log stored in the storage unit 142. If that log already contains an event sequence with the same number of events as the sequence to be added and exactly the same order, the sequence is not added (steps S3 to S5). Checking in advance for monitoring data that is equivalent as an event sequence reduces the total number of event sequences and hence the size of the event log.
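The duplicate check might look like the sketch below. The text only says that the number of events and their order must match exactly; comparing sequences by their `(operationName, eventType)` pairs while ignoring timestamps and span IDs is an assumption made here, and `add_event_sequence` is a hypothetical name.

```python
def sequence_signature(sequence):
    """Order-preserving key of a sequence: what happened, in which order."""
    return tuple((e["operationName"], e["eventType"]) for e in sequence)


def add_event_sequence(event_log, sequence):
    """Append `sequence` to `event_log` unless an equivalent one is stored.

    Returns True when the sequence was added, False when it was skipped.
    """
    signature = sequence_signature(sequence)
    if any(sequence_signature(stored) == signature for stored in event_log):
        return False  # same length and same order already present
    event_log.append(sequence)
    return True
```

A production version would likely keep the signatures in a set so that the check does not rescan the whole log on every insertion.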
Step S6 to Step S8;
Next, the estimation unit 143 of the estimation device 14 refers to the event logs stored in the storage unit 142, which are partitioned by endpoint and by parent span, runs process mining on each event log separately, and estimates, for each event log, the dependencies between the spans it contains. FIG. 9 shows an example of estimating the dependencies between spans for each event log.
Process mining is a technique that links event logs in chronological order and visualizes them. It is also called process discovery.
Specific example 1 of process discovery is to apply inductive mining (Inductive Miner). Inductive mining discovers five kinds of dependency in an event log, "sequence", "concurrent", "xor", "interleaved", and "loop", splits the event log based on the discovered dependency, and repeats the estimation of inter-span dependencies recursively.
Specifically, the BASECASE function F1 shown in FIG. 10 is first used to check whether the event log, as it is split recursively, has reached the minimum unit. Next, the FINDCUT function F2 is used to detect a dependency between spans. The SPLITLOG function F3 is then used to split the event log based on the detected dependency. In this way the dependencies between spans are estimated, and the estimation result is output as a process tree.
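To make the base-case, find-cut, and split-log steps more concrete, the sketch below implements only the "xor" case, using the standard Inductive Miner idea that an exclusive-choice cut corresponds to the connected components of the directly-follows graph. It is a simplified illustration, not the functions F1 to F3 of FIG. 10: the function names are hypothetical, a log is assumed to be a list of activity-name lists, and the other four cut types are omitted.

```python
def directly_follows(log):
    """Set of (a, b) pairs where b immediately follows a in some trace."""
    return {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}


def is_base_case(log):
    """BASECASE-like check: the log contains at most one distinct activity."""
    return len({a for t in log for a in t}) <= 1


def find_xor_cut(log):
    """Group activities by connected component of the undirected
    directly-follows graph; two or more groups indicate an "xor" split."""
    activities = {a for t in log for a in t}
    neighbours = {a: set() for a in activities}
    for a, b in directly_follows(log):
        neighbours[a].add(b)
        neighbours[b].add(a)
    groups, seen = [], set()
    for start in sorted(activities):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:  # depth-first walk of one component
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(group)
    return groups


def split_log(log, groups):
    """SPLITLOG-like step for an xor cut: one sub-log per activity group."""
    return [[t for t in log if t and t[0] in g] for g in groups]
```

Each sub-log returned by `split_log` would then be fed back into the same three steps, which is the recursion the paragraph above describes.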
Even when a dependency between spans cannot be found exactly, the FALLTHROUGH function F4 can be used to force it into one of the dependency types. The FALLTHROUGH function F4 assigns one of the dependency types to a log that is not a minimum unit but could not be split. Specific example 1 can therefore be said to be a highly flexible method.
Specific example 2 of process discovery is a method that makes use of microservice demo applications and the like. Observation of the monitoring data obtainable from such demo applications shows that most of it can be explained by combinations of parallel, serial, and repeated processing. Focusing on this tendency, a method can be applied that first estimates the spans processed serially, arranges them in order, and then estimates which parts of that sequence are repeated.
Compared with the inductive mining of specific example 1, fewer kinds of candidate dependency are estimated, but the computational cost is correspondingly lower. The following describes the case where the target trace data for the order relationships of a cart screen is "B→C→D→E→E→E→E→E→G→I→E→I→E→I".
Procedure 1;
First, the order relationships between each pair of adjacent elements in the target trace data are listed from the beginning, and the number of times each order relationship appears is counted. For example, as shown in FIG. 11, the order relationship "E→E" appears four times, so the count for "E→E" at positions 4 to 7 is 4. "I→E" and "E→I" each appear twice, so the count for "I→E" at position 10, "E→I" at position 11, "I→E" at position 12, and "E→I" at position 13 is 2.
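Procedure 1 amounts to counting adjacent pairs. The following minimal sketch reproduces the counts quoted above for the example trace; the function name and the tuple layout of the result are illustrative choices, not taken from FIG. 11.

```python
from collections import Counter


def count_order_relations(trace):
    """List (position, predecessor, successor, count) for every adjacent pair.

    Positions are 1-based, matching the numbering used in the text.
    """
    pairs = list(zip(trace, trace[1:]))
    counts = Counter(pairs)
    return [(pos, a, b, counts[(a, b)])
            for pos, (a, b) in enumerate(pairs, start=1)]


trace = "B C D E E E E E G I E I E I".split()
rows = count_order_relations(trace)
```

For the example trace this yields 13 rows; the rows at positions 4 to 7 are "E→E" with count 4, and the rows at positions 10 to 13 alternate between "I→E" and "E→I" with count 2.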
Procedure 2;
Next, an order relationship whose count is 1 is listed with the check result "done". For an order relationship whose count is 2 or more, the check result is "done" if its processing is repeated; the check result is "pending" if its processing is repeated but all of the processing is the same, or if its processing is not repeated. Each "E→E" at positions 4 to 7 is repeated, but all of the processing is the same "E", so it is marked pending. "I→E" at position 10, "E→I" at position 11, "I→E" at position 12, and "E→I" at position 13 are repeated and the processing is not all the same, so they are marked done.
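One way to read the check in procedure 2 is sketched below. The text does not define precisely when an order relationship counts as "repeated"; this sketch assumes it means the same pair recurs two positions away, as in the alternating pattern I→E→I→E, and that assumption, like the function name, is not taken from the source.

```python
from collections import Counter


def check_order_relations(trace):
    """Return (position, relation, status) for every adjacent pair.

    status is "done" or "pending", following the rules of procedure 2
    under the assumption stated in the lead-in.
    """
    pairs = list(zip(trace, trace[1:]))
    counts = Counter(pairs)
    results = []
    for i, (a, b) in enumerate(pairs):
        if counts[(a, b)] == 1:
            status = "done"
        elif a == b:
            status = "pending"  # repeated, but all processing is the same
        else:
            # Assumed meaning of "repeated": same pair two positions away.
            repeated = ((i >= 2 and pairs[i - 2] == (a, b))
                        or (i + 2 < len(pairs) and pairs[i + 2] == (a, b)))
            status = "done" if repeated else "pending"
        results.append((i + 1, f"{a}→{b}", status))
    return results
```

Applied to the example trace, positions 4 to 7 ("E→E") come out as pending and positions 10 to 13 as done, in line with the outcome described above.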
Procedure 3;
After that, if any check result is pending, the count is recomputed for the order relationship at that position with an increased number of elements, and procedure 2 is performed again. Procedures 2 and 3 are repeated until the count at every position is 1 and every check result is done.
The estimation unit 143 of the estimation device 14 then stores the estimated inter-span dependencies in the storage unit 142. Like the event logs, the estimation results are stored per parent span.
Step S9, Step S10;
Finally, the generation device 15 refers to the inter-span dependencies stored in the storage unit 142 of the estimation device 14 and generates a service graph of the monitored service based on them. FIG. 12 is an example of a service graph.
[Effect]
As described above, in this embodiment, monitoring data such as trace data obtained by distributed tracing is converted into event logs, and the dependencies between components are estimated by performing process mining on those event logs.
Specifically, the estimation device 14 according to this embodiment includes: a conversion unit 141 that extracts the parent-child relationships between the spans constituting a microservice application program from monitoring data including the operation history of that application program and, based on the tree of those parent-child relationships, converts the monitoring data into an event log of child spans for each parent span; and an estimation unit 143 that estimates the dependencies between the spans using the event log of child spans for each parent span.
Process discovery makes heavy use of operations that pick out pairs of events, so its computational cost balloons as the event log grows. In this embodiment, the event log is partitioned in advance based on information that the monitoring data holds by default, and process discovery is performed on the partitioned event logs; that is, the dependencies between spans are estimated in a divide-and-conquer manner. The dependencies can therefore be estimated faster and more accurately than when the trace data is converted into an event log in its entirety.
As a result, the gap between the actual behavior of the microservice and the previously generated service graph becomes smaller, which solves the problem of frequent false positives and missed detections in anomaly detection and reduces the burden on microservice developers and maintainers.
[Others]
The present invention is not limited to the above embodiment. Various modifications are possible within the scope of the gist of the present invention.
 上記説明した本実施形態の推定装置14は、例えば、図13に示すように、CPU901と、メモリ902と、ストレージ903と、通信装置904と、入力装置905と、出力装置906と、を備えた汎用的なコンピュータシステムを用いて実現できる。メモリ902及びストレージ903は、記憶装置である。当該コンピュータシステムにおいて、CPU901がメモリ902上にロードされた所定のプログラムを実行することにより、推定装置14の各機能が実現される。 The estimation device 14 of this embodiment described above includes, for example, a CPU 901, a memory 902, a storage 903, a communication device 904, an input device 905, and an output device 906, as shown in FIG. It can be realized using a general-purpose computer system. Memory 902 and storage 903 are storage devices. In the computer system, each function of the estimation device 14 is realized by executing a predetermined program loaded on the memory 902 by the CPU 901 .
 The estimation device 14 may be implemented on a single computer or on multiple computers. It may also be a virtual machine running on a computer. The program for the estimation device 14 can be stored on a computer-readable recording medium such as an HDD, SSD, USB memory, CD, or DVD, and can also be distributed via a communication network.
 1: System
 11: Providing device
 12: Monitoring device
 13: Distribution device
 14: Estimation device
 15: Generation device
 16: Storage device
 17: Analysis device
 21: Developer terminal
 22: User terminal
 23: Maintainer terminal
 141: Conversion unit
 142: Storage unit
 143: Estimation unit
 901: CPU
 902: Memory
 903: Storage
 904: Communication device
 905: Input device
 906: Output device

Claims (7)

  1.  An estimation device for estimating dependencies between components, comprising:
     a conversion unit that extracts parent-child relationships between a plurality of components constituting a program from monitoring data including an operation history of the program, and converts the monitoring data, based on a tree of the parent-child relationships, into event logs of child components for each parent component; and
     an estimation unit that estimates dependencies between the plurality of components using the event logs of child components for each parent component.
  2.  The estimation device according to claim 1, wherein the conversion unit converts a plurality of events relating to a plurality of child components into an event log composed of an event sequence sorted according to time information of the events and identification information of the components included in the events.
  3.  The estimation device according to claim 2, wherein, each time the monitoring data is received, the conversion unit updates the event sequence if the converted event sequence differs from the past event sequences, and does not update it if the converted event sequence is the same as a past event sequence.
  4.  The estimation device according to claim 1, wherein the estimation unit successively updates the dependencies between the plurality of components by performing process mining using the event logs of child components for each parent component.
  5.  The estimation device according to claim 1, wherein the program is an application program for microservices in which a plurality of programs are linked by interfaces.
  6.  An estimation method for estimating dependencies between components, wherein an estimation device performs:
     a step of extracting parent-child relationships between a plurality of components constituting a program from monitoring data including an operation history of the program, and converting the monitoring data, based on a tree of the parent-child relationships, into event logs of child components for each parent component; and
     a step of estimating dependencies between the plurality of components using the event logs of child components for each parent component.
  7.  An estimation program that causes a computer to function as the estimation device according to any one of claims 1 to 5.
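
The flow recited in claims 1 to 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the span dictionary keys (`span_id`, `parent_id`, `name`, `start`), all function names, and the sample traces are hypothetical. The dependency rule used here ("a directly precedes b in some sequence and b never directly precedes a") is a simplified stand-in for process mining, chosen only to keep the example short.

```python
from collections import defaultdict

def to_event_sequences(trace):
    """Group the child spans of one trace by parent component and sort each group."""
    by_id = {s["span_id"]: s for s in trace}
    children = defaultdict(list)
    for s in trace:
        parent = by_id.get(s["parent_id"])  # None for the root span
        if parent is not None:
            children[parent["name"]].append(s)
    # Sort by time first, then by component name as a tie-breaker (cf. claim 2).
    return {
        parent: tuple(s["name"] for s in sorted(group, key=lambda s: (s["start"], s["name"])))
        for parent, group in children.items()
    }

def update_event_logs(event_logs, trace):
    """Store only sequences not seen before (cf. claim 3); return True if anything changed."""
    changed = False
    for parent, seq in to_event_sequences(trace).items():
        if seq not in event_logs[parent]:
            event_logs[parent].add(seq)
            changed = True
    return changed

def estimate_dependencies(event_logs):
    """Per parent, keep pairs (a, b) where a directly precedes b and never the reverse."""
    result = {}
    for parent, sequences in event_logs.items():
        follows = {(a, b) for seq in sequences for a, b in zip(seq, seq[1:])}
        result[parent] = {(a, b) for (a, b) in follows if (b, a) not in follows}
    return result

def make_trace(order):
    """Build a hypothetical trace: a 'gateway' root span with children in the given order."""
    root = {"span_id": 0, "parent_id": None, "name": "gateway", "start": 0}
    kids = [{"span_id": i, "parent_id": 0, "name": n, "start": i} for i, n in enumerate(order, 1)]
    return [root] + kids

logs = defaultdict(set)
first = update_event_logs(logs, make_trace(["auth", "cart", "pay"]))   # new sequence
repeat = update_event_logs(logs, make_trace(["auth", "cart", "pay"]))  # already known
second = update_event_logs(logs, make_trace(["auth", "pay", "cart"]))  # new sequence
deps = estimate_dependencies(logs)
print(deps["gateway"])
```

In this sample, `cart` and `pay` appear in both orders, so no dependency is inferred between them, while `auth` precedes both in every sequence. Re-estimating only when `update_event_logs` returns `True` is one way to avoid redundant work when identical traces keep arriving.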

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/039788 WO2023073859A1 (en) 2021-10-28 2021-10-28 Inference device, inference method and inference program

Publications (1)

Publication Number Publication Date
WO2023073859A1 true WO2023073859A1 (en) 2023-05-04

Family

ID=86157549

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/039788 WO2023073859A1 (en) 2021-10-28 2021-10-28 Inference device, inference method and inference program

Country Status (1)

Country Link
WO (1) WO2023073859A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009054843A (en) * 2007-08-28 2009-03-12 Omron Corp Device, method and program for process abnormality detection
WO2021166118A1 (en) * 2020-02-19 2021-08-26 日本電信電話株式会社 Service graph generation device, service graph generation method, and program

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MARCO PEGORARO; MERIH SERAN UYSAL; WIL M.P. VAN DER AALST: "Efficient Construction of Behavior Graphs for Uncertain Event Data", ARXIV.ORG, 9 March 2020 (2020-03-09), XP091133906, DOI: 10.1007/978-3-030-53337-3_6 *
YAMAGUCHI, NAOFUMI ET AL.: "Anomaly detection with process-based method for cyber-physical system with MQTT protocol", IEICE TECHNICAL REPORT, vol. 119, no. 313, 21 November 2019 (2019-11-21), pages 111 - 116, XP009545219 *

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21962407

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2023555977

Country of ref document: JP