JP2022142456A

JP2022142456A - Abnormality handling program, abnormality handling system, and abnormality handling method

Info

Publication number: JP2022142456A
Application number: JP2021042634A
Authority: JP
Inventors: 正人伊藤; Masato Ito; 大希山越; Daiki Yamakoshi; 敦桑林; Atsushi Kuwabayashi; 要高落; Kaname Takaochi; 勉金子; Tsutomu Kaneko; 恭兵杉野; Kyohei Sugino
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2021-03-16
Filing date: 2021-03-16
Publication date: 2022-09-30

Abstract

To suppress a delay in handling an abnormality of a container. SOLUTION: An abnormality handling program causes a computer to execute processing in which a priority is set for an abnormality that has occurred in each of a plurality of containers, logic for resolving the abnormality is generated for each type of abnormality, and the logic is executed on the abnormalities in order of priority. SELECTED DRAWING: Figure 7

Description

The present invention relates to an abnormality handling program, an abnormality handling system, and an abnormality handling method.

Container virtualization is one technology for virtualizing computers. Because container virtualization performs virtualization by using part of the kernel of the OS (Operating System), it has the advantage of lower virtualization overhead and lighter weight than VM (Virtual Machine) virtualization. In container virtualization, a plurality of mutually independent user spaces are generated. These user spaces are called containers, and an application program is executed in each container.

When a business system is built using containers, an architecture called MSA (Micro Service Architecture), in which each container executes one application program, may be adopted. MSA is an architecture that divides a business system into a plurality of functions and executes, in containers, the application programs that implement the respective functions. Since containers are lightweight as described above, MSA using containers has the advantage that the business system can be scaled out easily.

In MSA, however, the number of containers may increase sharply, for example in order to reduce the load when the load on one container increases, and the dependencies between containers may become complicated. Therefore, when an abnormality occurs in a container, it becomes extremely difficult for the operator of the business system to deal with it, and the handling of the abnormality is consequently delayed.

JP 2013-161305 A; WO 2013/035243

According to one aspect, an object is to suppress delays in handling abnormalities of containers.

According to one aspect, there is provided an abnormality handling program for causing a computer to execute processing of setting a priority for an abnormality that has occurred in each of a plurality of containers, generating, for each type of abnormality, logic for resolving the abnormality, and executing the logic on the abnormalities in order of priority.

According to one aspect, delays in handling abnormalities of containers can be suppressed.

FIG. 1 is a schematic diagram showing a method of monitoring a business system.
FIG. 2 is a schematic diagram (part 1) showing a problem.
FIG. 3 is a schematic diagram (part 2) showing a problem.
FIG. 4 is a schematic diagram showing another method of monitoring a business system.
FIG. 5 is a schematic diagram (part 3) showing a problem.
FIG. 6 is a schematic diagram (part 4) showing a problem.
FIG. 7 is a functional configuration diagram of the abnormality handling system according to the present embodiment.
FIG. 8 is a schematic diagram of service information.
FIGS. 9(a) and 9(b) are schematic diagrams showing examples of methods of acquiring service information.
FIG. 10 is a schematic diagram of a method of acquiring the "Using_service" item included in the service information.
FIG. 11 is a schematic diagram of an abnormality information resource.
FIG. 12 is a schematic diagram of a logic database.
FIG. 13 is a schematic diagram of an operation policy.
FIG. 14 is a flowchart showing the flow of processing when operation of the business system is started.
FIG. 15 is a flowchart of the abnormality handling method according to the present embodiment.
FIG. 16 is a flowchart of the abnormality handling process.
FIG. 17 is a schematic diagram for explaining abnormality handling according to a first example of the present embodiment.
FIG. 18(a) is a schematic diagram of service information according to the first example of the present embodiment, and FIG. 18(b) is a schematic diagram of an abnormality information resource generated by the control unit in the first example of the present embodiment.
FIG. 19 is a schematic diagram of logic generated by the logic generation unit in the first example of the present embodiment.
FIG. 20 is a schematic diagram for explaining abnormality handling according to a second example of the present embodiment.
FIG. 21 is a schematic diagram showing service information according to the second example of the present embodiment.
FIG. 22 is a schematic diagram of a service topology generated by the control unit using the service information of FIG. 21.
FIG. 23 is a schematic diagram of an abnormality information resource generated by the control unit in the second example of the present embodiment.
FIG. 24 is a hardware configuration diagram of the abnormality handling device according to the present embodiment.

Prior to the description of the present embodiment, matters studied by the inventors of the present application will be described.

FIG. 1 is a schematic diagram showing a method of monitoring a business system.

FIG. 1 illustrates a business system 1 implemented by a plurality of services 4. The business system 1 is a system that adopts MSA, and the individual functions of the MSA are implemented by the services 4.

Each service 4 is implemented by the application program of one of a plurality of containers started on a container platform 2. The container platform 2 is a program that automatically deploys a plurality of containers; Kubernetes (registered trademark) and OpenShift (registered trademark) are examples.

In this example, container monitoring software 5 monitors each container in order to determine whether an abnormality has occurred in the containers implementing the services 4. When the container monitoring software 5 detects an abnormality in a container, it displays an alert on the operation terminal 6 of the operator of the business system 1. The operator of the business system 1 then deals with the abnormalities in the order in which the alerts were notified.

However, because the container monitoring software 5 displays alerts for minor and serious abnormalities alike, this method may delay the operator's handling of a serious abnormality.

FIG. 2 is a schematic diagram showing this problem. The example of FIG. 2 assumes that minor abnormalities "Minor A" to "Minor D" and a serious abnormality "Major A" have occurred in the containers.

In this case, because the operator deals with the abnormalities in the order in which they occurred, handling of the serious abnormality "Major A" is delayed, and a problem arises in the operation of the business system 1 itself. Furthermore, even if the operator decides the handling order in consideration of the urgency of each abnormality, the decision itself takes time, so the handling of the abnormalities may still be delayed.

It is also conceivable to automate the handling of abnormalities with a script instead of having the operator handle them manually. However, a simple script deals with the abnormalities in the order in which the alerts were notified, as above, so the handling of a serious abnormality is still delayed.

Moreover, with this method the operator needs to check, after an abnormality has been handled, whether it has actually been resolved, and if it has not, the same abnormality must be handled again.

FIG. 3 is a schematic diagram showing this problem. FIG. 3 assumes a case where the operator has dealt with the same abnormality many times. This places a heavy and tedious burden on the operator.

FIG. 4 is a schematic diagram showing another method of monitoring a business system. In FIG. 4, the same elements as those described with reference to FIG. 1 are denoted by the same reference numerals as in FIG. 1, and their description is omitted below.

In the example of FIG. 4, a health check mechanism 2a provided in the container platform 2 detects abnormalities in the containers executing the services 4, and when an abnormality is detected, the health check mechanism 2a resolves it.

However, the types of abnormality that the health check mechanism 2a can monitor are limited to abnormalities such as the container itself having stopped, and the health check mechanism 2a cannot detect abnormalities inside a container.

Furthermore, the health check mechanism 2a cannot detect advanced abnormalities that occur across a plurality of containers.

FIG. 5 is a schematic diagram showing that problem. As shown in FIG. 5, the health check mechanism 2a acquires the log of a container by executing a command that is valid only inside that container, and determines whether there is an abnormality on the basis of the log. Therefore, executing a command in one container does not make it possible to acquire the log of a different container, and an abnormality spanning a plurality of services 4 cannot be detected.

Moreover, when the health check mechanism 2a cannot resolve a certain abnormality, it repeats the handling of that abnormality, which may delay the handling of a serious abnormality.

FIG. 6 is a schematic diagram showing that problem. As shown in FIG. 6, while the health check mechanism 2a is repeatedly dealing with an abnormality in the service 4 of one container, abnormalities in the services 4 running in other containers are left unattended and their handling is delayed. The present embodiment will be described below.

(Present embodiment)
FIG. 7 is a functional configuration diagram of the abnormality handling system according to the present embodiment.

As shown in FIG. 7, the abnormality handling system 20 has a business system 21 and an abnormality handling device 22. Of these, the business system 21 is a system that adopts MSA and is implemented by a plurality of services 23. Each service 23 has a container 24 and a sidecar proxy 25. Each container 24 executes one application program for implementing one of the plurality of functions of the business system 21, and the functions of the business system 21 are implemented by these application programs.

Each container 24 and sidecar proxy 25 runs on a container platform 26. The container platform 26 is a container execution environment implemented by executing, on a computer, a container engine such as Docker (registered trademark) for starting the containers 24 and a container management program such as Kubernetes for managing the containers 24. The computer that executes the container engine and the container management program is not particularly limited, and these programs may be executed on a physical machine or a virtual machine.

The sidecar proxy 25 is a program that acquires the parameter of each item of service information 27 relating to the container 24 deployed in the same service 23 as itself, and notifies the abnormality handling device 22 of those parameters.

FIG. 8 is a schematic diagram of the service information 27. As shown in FIG. 8, the service information 27 has the items "Name", "Id", "Status", "Priority", "Stdout_path", "Logfile_path", "Internalcmd_path", "Using_db", "Using_service", "Operation_type", and "SLA".

Of these, "Name" is the name of the service 23, and "Id" is an identifier that uniquely identifies the service 23.

"Status" is information indicating the state of the service 23. Its values include "Running", indicating that the container 24 included in the service 23 is running; "Error", indicating that an abnormality has occurred in the container 24; and "ResolvingError", indicating that the abnormality is being handled. "Status" may also take the value "Stopped", indicating that the container 24 included in the service 23 is stopped.

"Priority" is a parameter for determining the order in which abnormalities are handled when an abnormality occurs in the container 24. "Stdout_path" indicates the standard output destination of the service 23, and "Logfile_path" indicates the output destination of the log file.

"Internalcmd_path" indicates the path of a command for acquiring information inside the container 24 included in the service 23.

"Using_db" is the name of the database used by the container 24 included in the service 23. "Using_service" is the name of a service 23 that includes another container 24 having a dependency relationship with the container 24 included in the service 23.

"Operation_type" is information indicating whether the service 23 is essential for implementing the business system 21.

"SLA" is the SLA (Service Level Agreement) of the business system 21. As an example, the availability of the business system 21 may be adopted as the SLA.
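As a rough illustration of the items listed above, the service information 27 could be held in a record such as the following Python sketch. The attribute types, the default values, and the sample values ("svc-001", the log path, and so on) are assumptions made for illustration and do not appear in the specification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ServiceInfo:
    """Hypothetical in-memory form of one record of service information 27."""
    name: str                 # "Name": name of the service
    id: str                   # "Id": unique identifier of the service
    status: str               # "Status": "Running", "Error", "ResolvingError" or "Stopped"
    priority: int             # "Priority": used to order abnormality handling
    stdout_path: str          # "Stdout_path": standard output destination
    logfile_path: str         # "Logfile_path": log file destination
    internalcmd_path: str     # "Internalcmd_path": command for in-container information
    using_db: List[str] = field(default_factory=list)       # "Using_db"
    using_service: List[str] = field(default_factory=list)  # "Using_service"
    operation_type: str = ""  # "Operation_type": whether the service is essential
    sla: str = ""             # "SLA": e.g. an availability target


# Sample values below are invented for the example.
info = ServiceInfo(
    name="ServiceA",
    id="svc-001",
    status="Running",
    priority=1,
    stdout_path="/dev/stdout",
    logfile_path="/var/log/service_a.log",
    internalcmd_path="/bin/internal_cmd",
)
```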

The method of acquiring the service information 27 is not particularly limited.

FIGS. 9(a) and 9(b) are schematic diagrams showing examples of methods of acquiring the service information 27.

Of these, FIG. 9(a) is a schematic diagram of a method that uses a storage area 23a allocated to the service 23. The storage area 23a is accessible from both the container 24 and the sidecar proxy 25 belonging to the same service 23. For example, the container 24 stores operation logs and abnormality logs in the storage area 23a. The sidecar proxy 25 then acquires, from these logs in the storage area 23a, the items included in the service information 27 and notifies the service information acquisition unit 41 of those items.

FIG. 9(b), on the other hand, is a schematic diagram of the case where the sidecar proxy 25 acquires each item included in the service information 27 directly from the container 24, without going through the storage area 23a, and notifies the service information acquisition unit 41 of the items.

In this case, the sidecar proxy 25 executes, inside the container 24, an internal command "/bin/internal_cmd" for collecting the items included in the service information 27, and thereby collects these items.

FIG. 10 is a schematic diagram of a method of acquiring the "Using_service" item included in the service information 27.

Here, it is assumed that the service 23 "ServiceA" depends on each of the services 23 "ServiceB" and "ServiceC". The container 24 of "ServiceA" is an example of a first container, and the container 24 of "ServiceB" is an example of a second container. In this case, the sidecar proxy 25 in the service 23 "ServiceA" monitors data such as communication packets transmitted from the container 24 of its own service 23. By analyzing the headers of the communication packets, the sidecar proxy 25 identifies the service 23 to which the source container 24 belongs and the service 23 to which the destination container 24 belongs. The sidecar proxy 25 then generates a communication table 23b that associates sources with destinations, and notifies the service information acquisition unit 41 of it. The service information acquisition unit 41 in turn notifies the control unit 42 of the communication table 23b.

Based on the communication table 23b, the control unit 42 identifies the container 24 included in the source "ServiceA" and the container 24 included in the destination "ServiceB". The control unit 42 then determines that the container 24 included in "ServiceA" depends on the container 24 included in "ServiceB", and sets "ServiceB" in the "Using_service" item of the service information 27.

Similarly, the control unit 42 also determines from the communication table 23b that the container 24 included in "ServiceA" depends on the container 24 included in "ServiceC". The control unit 42 therefore additionally sets "ServiceC" in the "Using_service" item of the service information 27.

Refer to FIG. 7 again. The abnormality handling device 22 is a device that, when an abnormality occurs in a container 24 of the business system 21, takes measures to resolve the abnormality. As an example, the abnormality handling device 22 includes an analysis unit 31, an abnormality handling unit 32, a storage unit 33, an operation policy generation unit 34, an operation policy application unit 35, and a reception unit 36.

Of these, the analysis unit 31 is a processing unit that analyzes the service information 27 described above, and has a service information acquisition unit 41 and a control unit 42.

The service information acquisition unit 41 is a processing unit that acquires the service information 27 from each sidecar proxy 25.

The control unit 42 determines whether there is an abnormality in a container 24 by analyzing the service information 27 collected by the service information acquisition unit 41.

For example, when "Status" of the service information 27 is "Error", the control unit 42 determines that an abnormality has occurred in the container 24 included in the service 23 to which that service information 27 relates.

The control unit 42 also uses "Using_db" and "Using_service" of the service information 27 to generate a service topology indicating the dependencies among the plurality of services 23. Furthermore, when "Using_db" or "Using_service" of the service information 27 differs from the value at the previous acquisition, the control unit 42 determines that the service topology has changed.
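One possible way to hold the service topology and to detect a change between two acquisitions is sketched below. Representing the service information 27 as dictionaries and the topology as a mapping from service name to its dependency sets is an assumption made for illustration, as are the function names and the database name "db1".

```python
def build_topology(service_infos):
    """Map each service name to (services it uses, databases it uses)."""
    return {
        info["Name"]: (
            frozenset(info.get("Using_service", ())),
            frozenset(info.get("Using_db", ())),
        )
        for info in service_infos
    }


def topology_changed(previous, current):
    """Report a change when any "Using_service" or "Using_db" entry differs."""
    return previous != current


# Two hypothetical acquisitions of the same services.
first = build_topology([
    {"Name": "ServiceA", "Using_service": ["ServiceB"], "Using_db": ["db1"]},
    {"Name": "ServiceB", "Using_service": [], "Using_db": []},
])
second = build_topology([
    {"Name": "ServiceA", "Using_service": ["ServiceB", "ServiceC"], "Using_db": ["db1"]},
    {"Name": "ServiceB", "Using_service": [], "Using_db": []},
])
changed = topology_changed(first, second)
```

Using sets makes the comparison independent of the order in which dependencies were observed.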

When the control unit 42 determines that an abnormality has occurred in a container 24 as described above, it generates an abnormality information resource 44 based on the service information 27 and stores it in the storage unit 33. The abnormality information resource 44 is a file indicating information about an abnormality that has occurred in a service 23, and the control unit 42 generates one for each abnormality that has occurred.

FIG. 11 is a schematic diagram of the abnormality information resource 44. As shown in FIG. 11, the abnormality information resource 44 has the items "Kind", "Id", "Status", "Priority", "Service", and "Retry_count".

Of these, "Kind" is information indicating the type of abnormality. The method of setting "Kind" is not particularly limited. For example, when "Status" of the service information 27 is "Error", the control unit 42 acquires the standard error output from the path indicated by "Stdout_path" of the service information 27 and acquires the log file from "Logfile_path" of the service information 27. The control unit 42 then sets the value of "Kind" on the basis of the acquired standard error output and log file.

"Id" is an identifier that uniquely identifies the abnormality information resource 44. "Status" is information indicating the result of handling the abnormality. Its values are "Sucess", indicating that the abnormality was resolved by the handling; "Failed", indicating that the abnormality was not resolved despite the handling; and "ResolvingError", indicating that the abnormality is being handled.

"Priority" is a numerical value indicating which of a plurality of abnormalities should be handled first; the smaller the value, the higher the priority. The method of setting the "Priority" value is not particularly limited, and the control unit 42 may set "Priority" of the abnormality information resource 44 on the basis of, for example, "Kind", which indicates the type of abnormality, and "Priority" included in the service information 27.

"Service" is the name of the service 23 in which the abnormality occurred. "Retry_count" indicates the number of times the abnormality has been handled. The initial value of "Retry_count" is "0", and the control unit 42 increments "Retry_count" by 1 each time the logic execution unit 49 executes the logic for resolving the abnormality.
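The items of the abnormality information resource 44 and the handling of "Retry_count" can be sketched as a small record type. The class name, the method name record_attempt, the initial "Status" value, and the sample identifier are assumptions made for illustration; only the item names and the initial value 0 of "Retry_count" come from the text.

```python
from dataclasses import dataclass


@dataclass
class AbnormalityInfo:
    """Hypothetical in-memory form of one abnormality information resource 44."""
    kind: str                       # "Kind": type of abnormality
    id: str                         # "Id": unique identifier of this resource
    priority: int                   # "Priority": smaller value means higher priority
    service: str                    # "Service": service in which the abnormality occurred
    status: str = "ResolvingError"  # "Status": initial value is an assumption
    retry_count: int = 0            # "Retry_count": starts at 0 per the description

    def record_attempt(self) -> None:
        """Increment the count once per execution of the resolving logic."""
        self.retry_count += 1


resource = AbnormalityInfo(
    kind="NW_timeout",
    id="abn-0001",  # sample identifier invented for the example
    priority=1,
    service="ServiceA",
)
resource.record_attempt()
resource.record_attempt()
```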

Refer to FIG. 7 again. After generating the abnormality information resource 44, the control unit 42 requests the abnormality handling unit 32 to handle the abnormality.

The abnormality handling unit 32 is a processing unit that, on receiving a request from the control unit 42, takes measures to resolve the abnormality that has occurred in the service 23. As an example, the abnormality handling unit 32 has an abnormality identification unit 46, a scheduling unit 47, a logic generation unit 48, and a logic execution unit 49.

Of these, the abnormality identification unit 46 is a processing unit that, on receiving a request from the control unit 42, reads the plurality of abnormality information resources 44 in the storage unit 33 and identifies the type of abnormality corresponding to each of them. For example, the abnormality identification unit 46 identifies the type of abnormality from "Kind" of the abnormality information resource 44. After identifying the type of abnormality, the abnormality identification unit 46 requests the scheduling unit 47 to schedule the handling of the abnormalities.

The scheduling unit 47 is a processing unit that schedules the handling of abnormalities on receiving a request from the abnormality identification unit 46. As an example, the scheduling unit 47 identifies the priority of each abnormality from "Priority" of the abnormality information resource 44 and schedules the abnormalities so that they are processed in descending order of priority.
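The scheduling by priority can be sketched with an ordinary sort, using the "Minor A" to "Minor D" and "Major A" abnormalities of FIG. 2 as sample data. The numeric "Priority" values and the tie-breaking by detection order are assumptions made for illustration.

```python
def schedule(abnormalities):
    """Order abnormalities so that the smallest "Priority" value comes first."""
    # sorted() is stable, so abnormalities with equal priority keep their
    # detection order (a tie-breaking rule assumed here, not stated in the text).
    return sorted(abnormalities, key=lambda item: item["Priority"])


# Abnormalities in the order they were detected; priority values are invented.
detected = [
    {"Kind": "Minor A", "Priority": 5},
    {"Kind": "Minor B", "Priority": 5},
    {"Kind": "Minor C", "Priority": 5},
    {"Kind": "Major A", "Priority": 1},
    {"Kind": "Minor D", "Priority": 5},
]
order = [item["Kind"] for item in schedule(detected)]
```

Although "Major A" was detected fourth, it is placed at the head of the schedule because its "Priority" value is the smallest.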

The logic generation unit 48 is a processing unit that generates, for each type of abnormality identified by the abnormality identification unit 46, logic for resolving that abnormality. For example, the logic generation unit 48 generates the logic by referring to a logic database 51 stored in the storage unit 33.

FIG. 12 is a schematic diagram of the logic database 51. As shown in FIG. 12, the logic database 51 is information that associates "type of abnormality", "abnormality name", and "logic" with one another.

"Type of abnormality" is the same information as "Kind" of the abnormality information resource 44 and indicates the type of abnormality. "Abnormality name" is information indicating the name of the abnormality. "Logic" is information indicating the processing for resolving the abnormality.

For example, consider the case where the "type of abnormality" is "network timeout between services". The "abnormality name" in this case is "NW_timeout". The "logic" is "[COMMAND: "<command to be executed>"]", and executing this "<command to be executed>" resolves the abnormality "network timeout between services".

The logic generation unit 48 then generates the logic "[COMMAND: "<command to be executed>"]".

Note that, as in the case of "DB connection error", a script for resolving the abnormality may be used as the logic.
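The lookup performed against the logic database 51 can be sketched as a dictionary keyed by the type of abnormality. The dictionary layout, the function name generate_logic, the abnormality name "DB_connection_error", and the "SCRIPT" entry format are assumptions made for illustration; the command and script bodies are left as placeholders, as in the description.

```python
# Hypothetical in-memory form of the logic database 51.
LOGIC_DATABASE = {
    "network timeout between services": {
        "name": "NW_timeout",
        "logic": {"COMMAND": "<command to be executed>"},
    },
    "DB connection error": {
        "name": "DB_connection_error",  # name invented for the example
        "logic": {"SCRIPT": "<script to be executed>"},
    },
}


def generate_logic(kind):
    """Return a copy of the logic registered for the given type of abnormality.

    Returns None when no logic is registered for that type.
    """
    entry = LOGIC_DATABASE.get(kind)
    if entry is None:
        return None
    # Copy so that callers cannot modify the database entry by accident.
    return dict(entry["logic"])


logic = generate_logic("network timeout between services")
```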

再び図７を参照する。ロジック実行部４９は、ロジック生成部４８が生成したロジックを実行することにより、異常の解消を試みる処理部である。なお、ロジック実行部４９は、ある異常の異常情報リソース４４の「Retry_count」が予め定めておいた閾値を超えている場合には、当該異常に対するロジックの実行を停止する。この場合、ロジック実行部４９は、残りの異常のうちで優先度が最も高い異常に対してロジックを実行することになる。 Refer to FIG. 7 again. The logic execution unit 49 is a processing unit that attempts to resolve an abnormality by executing the logic generated by the logic generation unit 48. Note that, when the "Retry_count" of the abnormality information resource 44 of a certain abnormality exceeds a predetermined threshold, the logic execution unit 49 stops executing the logic for that abnormality. In this case, the logic execution unit 49 executes the logic for the abnormality with the highest priority among the remaining abnormalities.

運用ポリシ生成部３４は、制御部４２がサービス情報２７から取得したSLAに基づいて業務システム２１の運用ポリシを生成する処理部である。 The operation policy generation unit 34 is a processing unit that generates an operation policy for the business system 21 based on the SLA that the control unit 42 has acquired from the service information 27 .

図１３は、運用ポリシの模式図である。運用ポリシは、コンテナ２４のリソース使用率の制御のためのパラメータとサービス２３間の通信を制御するためのサイドカープロキシ２５の設定パラメータである。そのような設定パラメータとしては、サービス情報２７における「Name」、「Id」、「Priority」、「Stdout_path」、「Logfile_path」、「Internalcmd_path」、及び「Operation_type」の各パラメータがある。これらの値の初期値は運用者によって設定されるが、業務システム２１の運用の開始と共に運用ポリシ生成部３４が自動で調節する。例えば、運用ポリシ生成部３４は、業務システム２１がSLAを満たすようにこれらの設定パラメータを更新する。また、運用ポリシ生成部３４は、更新した運用ポリシを運用ポリシデータベース５２に格納する。 FIG. 13 is a schematic diagram of an operational policy. The operational policy consists of parameters for controlling the resource usage rate of the containers 24 and setting parameters of the sidecar proxy 25 for controlling communication between the services 23. Such setting parameters include "Name", "Id", "Priority", "Stdout_path", "Logfile_path", "Internalcmd_path", and "Operation_type" in the service information 27. The initial values of these parameters are set by the operator, and the operational policy generation unit 34 adjusts them automatically once operation of the business system 21 starts. For example, the operational policy generation unit 34 updates these setting parameters so that the business system 21 satisfies the SLA. The operational policy generation unit 34 also stores the updated operational policy in the operational policy database 52.

再び図７を参照する。運用ポリシ適用部３５は、運用ポリシデータベース５２を参照することにより、各コンテナ２４に運用ポリシを適用する処理部である。 Refer to FIG. 7 again. The operational policy application unit 35 is a processing unit that applies the operational policy to each container 24 by referring to the operational policy database 52 .

受付部３６は、運用者からロジックデータベース５１に含まれる個々のパラメータの入力を受け付け、それを記憶部３３に格納する処理部である。また、受付部３６は、運用者から運用ポリシの初期値の入力を受け付け、それを記憶部３３の運用ポリシデータベース５２に格納する。 The accepting unit 36 is a processing unit that accepts input of individual parameters contained in the logic database 51 from the operator and stores them in the storage unit 33 . The accepting unit 36 also accepts input of an initial value of the operational policy from the operator, and stores it in the operational policy database 52 of the storage unit 33 .

次に、業務システム１２の運用を開始するときの処理の流れについて説明する。 Next, the flow of processing when starting the operation of the business system 12 will be described.

図１４は、業務システム１２の運用を開始するときの処理の流れを示すフローチャートである。 FIG. 14 is a flow chart showing the flow of processing when starting the operation of the business system 12 .

まず、受付部３６が、運用者から運用ポリシ（図１３参照）の個々のパラメータの初期値の入力を受け付け、それらを運用ポリシデータベース５２に格納する（ステップＳ１１）。 First, the reception unit 36 receives input of initial values of individual parameters of the operation policy (see FIG. 13) from the operator, and stores them in the operation policy database 52 (step S11).

次に、運用ポリシ生成部３４が運用ポリシデータベース５２を参照して運用ポリシを生成する（ステップＳ１２）。 Next, the operational policy generation unit 34 refers to the operational policy database 52 and generates an operational policy (step S12).

次いで、運用ポリシ適用部３５が、運用ポリシデータベース５２を参照して運用ポリシを取得する（ステップＳ１３）。 Next, the operational policy application unit 35 refers to the operational policy database 52 and acquires the operational policy (step S13).

続いて、運用ポリシ適用部３５が、取得した運用ポリシを各コンテナ２４に適用する（ステップＳ１４）。 Subsequently, the operational policy application unit 35 applies the acquired operational policy to each container 24 (step S14).

次に、受付部３６が、運用者からロジックデータベース５１の個々のパラメータの入力を受け付け、それを記憶部３３に格納する（ステップＳ１５）。 Next, the reception unit 36 receives inputs of individual parameters of the logic database 51 from the operator and stores them in the storage unit 33 (step S15).

次いで、ロジック生成部４８がこれらのパラメータからロジックデータベース５１を作成し、それを記憶部３３に格納する（ステップＳ１６）。 Next, the logic generation unit 48 creates the logic database 51 from these parameters and stores it in the storage unit 33 (step S16).

以上により、業務システム１２の運用を開始するときの基本的な処理を終える。 With the above, the basic processing for starting the operation of the business system 12 is completed.

次に、本実施形態に係る異常対処方法について説明する。 Next, an abnormality coping method according to this embodiment will be described.

図１５は、本実施形態に係る異常対処方法のフローチャートである。まず、運用ポリシ適用部３５が運用ポリシデータベース５２を参照し、運用ポリシに変更がある場合には変更後の運用ポリシを各コンテナ２４に適用する（ステップＳ２１）。 FIG. 15 is a flowchart of an abnormality coping method according to this embodiment. First, the operation policy application unit 35 refers to the operation policy database 52, and if there is a change in the operation policy, applies the changed operation policy to each container 24 (step S21).

次いで、サイドカープロキシ２５が、サービス情報２７に含まれる各項目の情報を、当該サイドカープロキシ２５と同じサービス２３内のコンテナ２４から収集し、それらをサービス情報取得部４１に通知する（ステップＳ２２）。 Next, the sidecar proxy 25 collects the information of each item included in the service information 27 from the container 24 in the same service 23 as the sidecar proxy 25, and notifies the service information acquisition unit 41 of them (step S22).

次に、サービス情報取得部４１が各サイドカープロキシ２５からサービス情報２７を取得する（ステップＳ２３）。 Next, the service information acquisition unit 41 acquires the service information 27 from each sidecar proxy 25 (step S23).

次いで、制御部４２がサービス情報２７を解析する（ステップＳ２４）。例えば、制御部４２は、通信テーブル２３ｂに基づいて、複数のサービス２３同士の依存関係を示すサービストポロジを生成する。また、制御部４２は、サービス情報２７に基づいて業務システム１２のSLAを特定する。 Next, the control unit 42 analyzes the service information 27 (step S24). For example, based on the communication table 23b, the control unit 42 generates a service topology that indicates the dependencies between the services 23. The control unit 42 also identifies the SLA of the business system 12 based on the service information 27.

次に、制御部４２が、サービス情報２７を解析した結果、サービストポロジが変更されたかを判定する（ステップＳ２５）。 Next, as a result of analyzing the service information 27, the control unit 42 determines whether the service topology has been changed (step S25).

そして、サービストポロジが変更されたと判定された場合（ステップＳ２５：肯定）はステップＳ２６に移る。ステップＳ２６では、運用ポリシ生成部３４が、変更後のサービストポロジにとって望ましい運用ポリシを生成し、それを運用ポリシデータベース５２に格納する。 Then, if it is determined that the service topology has been changed (step S25: affirmative), the process proceeds to step S26. In step S26, the operational policy generation unit 34 generates an operational policy suited to the changed service topology and stores it in the operational policy database 52.

一方、サービストポロジが変更されていないと判定された場合（ステップＳ２５：否定）はステップＳ２７に移る。 On the other hand, if it is determined that the service topology has not been changed (step S25: No), the process proceeds to step S27.

ステップＳ２７においては、制御部４２が、サービス情報２７を解析した結果、業務システム１２のSLAが基準を超えたかを判定する。ここで、SLAが基準を超えたと判定された場合（ステップＳ２７：肯定）は前述のステップＳ２６に移る。ステップＳ２６では、運用ポリシ生成部３４が、業務システム１２のSLAが基準を満たすように運用ポリシを変更する。 In step S27, the control unit 42 analyzes the service information 27 and determines whether the SLA of the business system 12 exceeds the standard. Here, if it is determined that the SLA exceeds the standard (step S27: affirmative), the process proceeds to step S26. In step S26, the operational policy generator 34 changes the operational policy so that the SLA of the business system 12 satisfies the criteria.

一方、SLAが基準を超えていないと判定された場合（ステップＳ２７：否定）はステップＳ２８に移る。 On the other hand, if it is determined that the SLA does not exceed the standard (step S27: No), the process proceeds to step S28.

ステップＳ２８においては、制御部４２が、コンテナ２４に異常が発生したかを判定する。例えば、制御部４２は、サービス情報２７の「Status」が「Error」となっている場合に、サービス情報２７の「Name」が示すサービス２３に含まれるコンテナ２４に異常が発生したと判定する。 In step S28, the control unit 42 determines whether the container 24 has an abnormality. For example, when the “Status” of the service information 27 is “Error”, the control unit 42 determines that an abnormality has occurred in the container 24 included in the service 23 indicated by the “Name” of the service information 27 .

ここで、異常は発生していないと判定された場合（ステップＳ２８：否定）はステップＳ２９に移る。 Here, if it is determined that no abnormality has occurred (step S28: No), the process proceeds to step S29.

ステップＳ２９においては、制御部４２が、業務システム１２が業務を終了したかを判定する。ここで、業務を終了していないと判定された場合（ステップＳ２９：否定）にはステップＳ２２に戻る。一方、業務を終了したと判定した場合（ステップＳ２９：肯定）は処理を終える。 In step S29, the control unit 42 determines whether the business system 12 has completed the business. Here, if it is determined that the work has not been completed (step S29: No), the process returns to step S22. On the other hand, if it is determined that the business has ended (step S29: affirmative), the process ends.

また、前述のステップＳ２８において異常が発生したと制御部４２が判定した場合にはステップＳ３０の異常対処処理を行い、その後ステップＳ２９に移る。以上により、図１５のフローチャートの基本的な処理を終える。 Further, when the control unit 42 determines that an abnormality has occurred in the above-described step S28, the abnormality handling process of step S30 is performed, and then the process proceeds to step S29. With the above, the basic processing of the flowchart of FIG. 15 is completed.
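The branching of steps S25, S27, and S28 in FIG. 15 can be summarized with a small sketch. The function names `has_error` and `next_step` and the use of plain dictionaries for the service information 27 are assumptions made for illustration; the step labels and the "Status" / "Error" values follow the description above.

```python
def has_error(service_info):
    """Step S28: an abnormality exists if any service reports Status == "Error"."""
    return any(info["Status"] == "Error" for info in service_info)


def next_step(topology_changed, sla_exceeded, error_found):
    """Return the label of the step that follows the checks S25, S27, and S28."""
    if topology_changed:   # step S25: affirmative
        return "S26"       # generate an operational policy for the new topology
    if sla_exceeded:       # step S27: affirmative
        return "S26"       # change the operational policy to satisfy the SLA
    if error_found:        # step S28: affirmative
        return "S30"       # abnormality handling process, then step S29
    return "S29"           # check whether the business has ended


service_info = [
    {"Name": "ServiceA", "Status": "Error"},
    {"Name": "ServiceB", "Status": "Running"},
]
print(next_step(False, False, has_error(service_info)))
```

The sketch only decides which step comes next; collecting the service information (steps S22 and S23) and the loop back from step S29 to step S22 are left out.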

図１６は、前述のステップＳ３０の異常対処処理のフローチャートである。 FIG. 16 is a flowchart of the abnormality handling process in step S30 described above.

まず、制御部４２が、サービス情報２７に基づいて異常情報リソース４４を生成し、それを記憶部３３に格納する（ステップＳ４１）。このとき、制御部４２は、サービス情報２７に基づいて異常の種類を特定し、その異常の種類に応じた「Kind」の値を異常情報リソース４４に設定する。また、制御部４２は、異常対処部３２に対して異常の対処を依頼する。 First, the control unit 42 generates the abnormality information resource 44 based on the service information 27 and stores it in the storage unit 33 (step S41). At this time, the control unit 42 identifies the type of abnormality based on the service information 27 and sets the value of “Kind” according to the type of abnormality in the abnormality information resource 44 . Further, the control unit 42 requests the abnormality handling unit 32 to handle the abnormality.

次に、異常対処部３２の異常特定部４６が、制御部４２からの依頼を受けて、異常の種類を特定する（ステップＳ４２）。例えば、異常特定部４６は、記憶部３３にある異常情報リソース４４を読み込み、その異常情報リソース４４の「Kind」から異常の種類を特定する。また、異常特定部４６は、異常への対処のスケジューリングを行うようにスケジューリング部４７に依頼する。 Next, the abnormality identification unit 46 of the abnormality handling unit 32 identifies the type of abnormality upon receiving a request from the control unit 42 (step S42). For example, the anomaly identification unit 46 reads the anomaly information resource 44 in the storage unit 33 and identifies the type of anomaly from the “Kind” of the anomaly information resource 44 . Further, the anomaly identifying unit 46 requests the scheduling unit 47 to schedule measures to deal with the anomaly.

次に、スケジューリング部４７が、異常特定部４６からの依頼を受けて、異常への対処のスケジューリングを行う（ステップＳ４３）。例えば、スケジューリング部４７は、異常情報リソース４４の「Priority」から異常の優先度を特定し、優先度が高い異常から順に処理をするようにスケジューリングを行う。 Next, the scheduling unit 47 receives a request from the abnormality identification unit 46 and schedules measures to deal with the abnormality (step S43). For example, the scheduling unit 47 specifies the priority of anomalies from "Priority" of the anomaly information resource 44, and performs scheduling so that the anomalies with the highest priority are processed first.

次いで、ロジック生成部４８が、異常特定部４６が特定した異常の種類ごとに、当該異常を解消させるロジックを生成する（ステップＳ４４）。一例として、ロジック生成部４８は、記憶部３３に格納されているロジックデータベース５１を参照することにより、ステップＳ４２で特定した異常の種類に対応するロジックを特定し、当該ロジックを生成する。 Next, the logic generation unit 48 generates logic for resolving the abnormality for each type of abnormality identified by the abnormality identification unit 46 (step S44). As an example, the logic generation unit 48 identifies logic corresponding to the type of abnormality identified in step S42 by referring to the logic database 51 stored in the storage unit 33, and generates the logic.

次に、ロジック実行部４９が、ロジック生成部４８が生成したロジックを実行する（ステップＳ４５）。 Next, the logic execution unit 49 executes the logic generated by the logic generation unit 48 (step S45).

次いで、制御部４２がサービス情報２７を新たに取得し、そのサービス情報２７に基づいて異常情報リソース４４を生成する（ステップＳ４６）。 Next, the control unit 42 newly acquires the service information 27 and generates the abnormality information resource 44 based on the service information 27 (step S46).

次に、ロジック実行部４９が、ロジックを実行したことにより異常が解消されたかを判定する（ステップＳ４７）。例えば、ロジック実行部４９は、ステップＳ４６で生成した異常情報リソース４４の「Status」が「Running」の場合に異常が解消されたと判定し、「Status」が「Error」のままの場合に異常は解消されていないと判定する。 Next, the logic execution unit 49 determines whether the abnormality has been resolved by executing the logic (step S47). For example, the logic execution unit 49 determines that the abnormality has been resolved when the "Status" of the abnormality information resource 44 generated in step S46 is "Running", and determines that the abnormality has not been resolved when the "Status" remains "Error".

ここで、異常が解消されたと判定された場合（ステップＳ４７：肯定）はステップＳ５０に移る。 Here, if it is determined that the abnormality has been resolved (step S47: affirmative), the process proceeds to step S50.

ステップＳ５０においては、制御部４２が、解消された異常に対応した異常情報リソース４４を記憶部３３から削除する。 In step S50, the control unit 42 deletes the abnormality information resource 44 corresponding to the resolved abnormality from the storage unit 33.

一方、異常が解消されていないと判定された場合（ステップＳ４７：否定）はステップＳ４８に移る。 On the other hand, if it is determined that the abnormality has not been resolved (step S47: No), the process proceeds to step S48.

ステップＳ４８においては、ロジック実行部４９が、異常対処のリトライ回数を示す異常情報リソース４４の「Retry_count」が閾値を超えたかを判定する。その閾値は、ある異常への対処を繰り返すことで他の異常への対処が遅れるのを防止する観点から設定される。 In step S48, the logic execution unit 49 determines whether the "Retry_count" of the abnormality information resource 44 indicating the number of retries for dealing with an abnormality exceeds a threshold. The threshold is set from the viewpoint of preventing delays in dealing with other anomalies due to repeated dealing with a certain anomaly.

ここで、閾値を超えたと判定した場合（ステップＳ４８：肯定）は、前述のステップＳ５０に移り、制御部４２が、ステップＳ４５で対処した異常に係る異常情報リソース４４を記憶部３３から削除する。これにより、特定の異常に対する対処を何度も繰り返すことで他の異常への対処が遅れるのを防止することができる。 Here, when it is determined that the threshold has been exceeded (step S48: affirmative), the process proceeds to step S50 described above, and the control unit 42 deletes from the storage unit 33 the abnormality information resource 44 related to the abnormality handled in step S45. This prevents repeated handling of one specific abnormality from delaying the handling of other abnormalities.

なお、この場合は異常が解消されていないことになるが、制御部４２が表示装置等にアラートを表示することで、運用者に異常が解消されていないことを通知してもよい。 In this case, the abnormality has not been resolved, but the control unit 42 may display an alert on the display device or the like to notify the operator that the abnormality has not been resolved.

一方、閾値を超えていないと判定した場合（ステップＳ４８：否定）はステップＳ４９に移る。 On the other hand, if it is determined that the threshold is not exceeded (step S48: No), the process proceeds to step S49.

ステップＳ４９においては、対処していない異常があるかを制御部４２が判定する。ここで、対処していない異常があると判定した場合（ステップＳ４９：肯定）はステップＳ４３に戻る。 In step S49, the control unit 42 determines whether there is an abnormality that has not been dealt with. Here, if it is determined that there is an abnormality that has not been dealt with (step S49: affirmative), the process returns to step S43.

一方、対処していない異常がないと判定した場合（ステップＳ４９：否定）は処理を終えて呼び出し元に戻る。 On the other hand, if it is determined that no unhandled abnormality remains (step S49: No), the process ends and returns to the caller.

以上により、本実施形態に係る異常対処方法の基本的な処理を終える。 With the above, the basic processing of the abnormality coping method according to the present embodiment is completed.
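The flow of FIG. 16 (steps S43 to S50) can be sketched as a loop over pending abnormalities. Several points here are assumptions and not statements from the description: the function name `handle_anomalies`, the `execute_logic` callback that stands in for steps S44 to S46 and returns the new "Status", the "Name" field of each record, the place where "Retry_count" is incremented, and the decision to keep an abnormality pending until it is either resolved or abandoned.

```python
def handle_anomalies(resources, execute_logic, retry_threshold):
    """Handle abnormalities in priority order; return (resolved, abandoned) names."""
    pending = [dict(r) for r in resources]  # work on copies of the records
    resolved, abandoned = [], []
    while pending:                                               # step S49
        # Step S43: a larger "Priority" value is handled first.
        pending.sort(key=lambda r: r["Priority"], reverse=True)
        target = pending[0]
        status = execute_logic(target)                           # steps S44 to S46
        target["Retry_count"] += 1                               # assumed bookkeeping
        if status == "Running":                                  # step S47: affirmative
            resolved.append(target["Name"])
            pending.pop(0)                                       # step S50
        elif target["Retry_count"] > retry_threshold:            # step S48: affirmative
            abandoned.append(target["Name"])
            pending.pop(0)                                       # step S50
    return resolved, abandoned


def fake_execute(resource):
    """Stand-in for the logic execution unit 49: only ServiceA recovers."""
    return "Running" if resource["Name"] == "ServiceA" else "Error"


records = [
    {"Name": "ServiceD", "Priority": 0, "Retry_count": 0},
    {"Name": "ServiceA", "Priority": 1, "Retry_count": 0},
]
print(handle_anomalies(records, fake_execute, retry_threshold=2))
# (['ServiceA'], ['ServiceD'])
```

Because every pass increments the retry count of the abnormality being handled, the loop always terminates: each abnormality is removed either when it is resolved or when its count exceeds the threshold.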

上記した本実施形態によれば、異常に対処すべき優先度を示す異常情報リソース４４の「Priority」を制御部４２が設定し（ステップＳ４１）、その優先度の順に異常に対処する（ステップＳ４３、Ｓ４５）。そのため、重篤な異常の対処が後回しにされるのを抑制でき、異常への対処が遅れるのを抑制することができる。 According to the present embodiment described above, the control unit 42 sets the "Priority" of the abnormality information resource 44, which indicates the priority with which an abnormality should be handled (step S41), and the abnormalities are handled in order of that priority (steps S43 and S45). This keeps the handling of a serious abnormality from being postponed and suppresses delays in handling abnormalities.

更に、ロジック生成部４８が異常の種類ごとにロジックを生成するため（ステップＳ４４）、当該異常を解消するのに相応しいロジックを自動で実行でき、業務システム１２の運用者が対処内容を判断する必要がない。 Furthermore, since the logic generation unit 48 generates logic for each type of abnormality (step S44), logic suitable for resolving the abnormality can be executed automatically, and the operator of the business system 12 does not need to decide how to handle it.

しかも、ステップＳ４８においてある異常についてのリトライ回数が閾値を超えたと判定された場合には、ロジック実行部４９がその異常への対処を停止する。そのため、同一の異常への対処が何度も行われることで他の異常への対処が遅れるのを抑制することができる。 Moreover, when it is determined in step S48 that the number of retries for a certain abnormality has exceeded the threshold, the logic execution unit 49 stops dealing with that abnormality. Therefore, it is possible to prevent delays in dealing with other anomalies caused by dealing with the same anomaly many times.

次に、異常対処の具体例について説明する。 Next, a specific example of handling an abnormality will be described.

・第１例 ・First Example
図１７は、第１例に係る異常対処について説明するための模式図である。 FIG. 17 is a schematic diagram for explaining the abnormality handling according to the first example.

図１７に示すように、第１例では、「ServiceA」～「ServiceD」の４つのサービス２３で業務システム１２が実現される場合を想定する。また、これらのサービス２３のうちで、「ServiceA」と「ServiceD」の２つのコンテナ２４に異常が発生したものとする。 As shown in FIG. 17, in the first example, it is assumed that the business system 12 is realized by four services 23 of "ServiceA" to "ServiceD". It is also assumed that, of these services 23, two containers 24, "ServiceA" and "ServiceD", have failed.

図１８（ａ）は、この場合のサービス情報２７の模式図である。図１８（ａ）に示すように、コンテナ２４に異常が発生していない「ServiceB」と「ServiceC」の「Status」は「Running」となる。一方、コンテナ２４に異常が発生した「ServiceA」と「ServiceD」の「Status」は「Error」となる。 FIG. 18(a) is a schematic diagram of the service information 27 in this case. As shown in FIG. 18A, the "Status" of "ServiceB" and "ServiceC" in which no abnormality has occurred in the container 24 is "Running". On the other hand, the "Status" of "ServiceA" and "ServiceD" in which an abnormality has occurred in the container 24 becomes "Error".

図１８（ｂ）は、第１例において制御部４２が生成した異常情報リソース４４の模式図である。異常情報リソース４４は、コンテナ２４に異常が発生したサービス２３ごとに制御部４２が生成するため、この例では制御部４２が「ServiceA」と「ServiceD」の異常情報リソース４４を生成する。 FIG. 18B is a schematic diagram of the abnormality information resource 44 generated by the control unit 42 in the first example. Since the abnormality information resource 44 is generated by the control unit 42 for each service 23 in which an abnormality has occurred in the container 24, the control unit 42 generates the abnormality information resource 44 for "ServiceA" and "ServiceD" in this example.

ここでは、制御部４２が、「ServiceD」に係る異常情報リソース４４の「Priority」を「0」に設定し、「ServiceA」に係る異常情報リソース４４の「Priority」をそれよりも高い「1」に設定したものとする。 Here, it is assumed that the control unit 42 has set the "Priority" of the abnormality information resource 44 related to "ServiceD" to "0" and the "Priority" of the abnormality information resource 44 related to "ServiceA" to "1", which is higher.

図１９は、この場合にロジック生成部４８が生成したロジックの模式図である。 FIG. 19 is a schematic diagram of the logic generated by the logic generator 48 in this case.

そのロジックには、「ServiceA」のコンテナ２４の異常を解消させるスクリプトと、「ServiceD」のコンテナ２４の異常を解消させるスクリプトが記述される。前述のように「ServiceA」の異常の優先度は「ServiceD」のそれよりも高い。そのため、スケジューリング部４７は、「ServiceA」への対処が「ServiceD」への対処よりも先になるようにスケジューリングを行う。このスケジューリングの結果、ロジック生成部４８は、「ServiceA」のコンテナ２４の異常を解消させるためのスクリプトを、「ServiceD」のコンテナ２４の異常を解消させるためのスクリプトよりも先に記述する。 In the logic, a script for resolving the abnormality of the container 24 of "ServiceA" and a script for resolving the abnormality of the container 24 of "ServiceD" are described. As mentioned above, the priority of abnormality of "ServiceA" is higher than that of "ServiceD". Therefore, the scheduling unit 47 performs scheduling so that "ServiceA" is handled before "ServiceD" is handled. As a result of this scheduling, the logic generation unit 48 writes the script for resolving the abnormality of the container 24 of "ServiceA" before the script for resolving the abnormality of the container 24 of "ServiceD".

これにより、ロジック実行部４９は、優先度が高い「ServiceA」のコンテナ２４の異常から先に対処し、その対処を終えた後に「ServiceD」のコンテナ２４の異常に対処する。 As a result, the logic execution unit 49 first handles the abnormality of the container 24 of "ServiceA", which has the higher priority, and handles the abnormality of the container 24 of "ServiceD" after that handling is finished.
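The ordering in this first example amounts to a descending sort on "Priority". The sketch below uses plain dictionaries as a stand-in for the abnormality information resources 44; the "Name" field and the variable names are assumptions made for illustration.

```python
# Abnormality information resources of the first example (FIG. 18(b)),
# reduced to the two fields that matter for scheduling.
anomalies = [
    {"Name": "ServiceD", "Priority": 0},
    {"Name": "ServiceA", "Priority": 1},
]

# A larger "Priority" value means the abnormality is handled earlier.
order = [a["Name"] for a in sorted(anomalies, key=lambda a: a["Priority"], reverse=True)]
print(order)  # ['ServiceA', 'ServiceD']
```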

その結果、優先度が高く重篤な異常への対処が遅れるのを抑制することができ、業務システム１２を安定的に稼働させることができる。しかも、制御部４２が自動的に異常の優先度を設定し、その優先度に従ってスケジューリング部４７が自動的にスケジューリングを行うため、業務システム１２の運用者が異常への対応順を決める必要もない。 As a result, it is possible to suppress delays in dealing with high-priority and serious abnormalities, and the business system 12 can be operated stably. Moreover, since the control unit 42 automatically sets the priority of abnormalities, and the scheduling unit 47 automatically schedules according to the priority, the operator of the business system 12 does not need to decide the order of responding to abnormalities. .

・第２例 ・Second Example
図２０は、第２例に係る異常対処について説明するための模式図である。図２０に示すように、第２例では、「ServiceA1」、「ServiceA2」、「ServiceA3」、及び「ServiceB」の４つのサービス２３と、「DatabaseA」というデータベース２９とによって業務システム２１が実現されている場合を想定する。また、図２０では、サービス２３同士の依存関係を矢印で示している。その矢印の根元のサービス２３は、通信パケットの送信元のコンテナ２４が起動しているサービスを示す。また、矢印の先端のサービス２３は、通信パケットの送信先のコンテナ２４を示す。 FIG. 20 is a schematic diagram for explaining the abnormality handling according to the second example. As shown in FIG. 20, the second example assumes that the business system 21 is implemented by four services 23, "ServiceA1", "ServiceA2", "ServiceA3", and "ServiceB", and a database 29 named "DatabaseA". In FIG. 20, arrows indicate the dependencies between the services 23. The service 23 at the root of an arrow is the service in which the container 24 that sends communication packets is running. The service 23 at the tip of an arrow indicates the container 24 that is the destination of the communication packets.

なお、データベース２９に向かう矢印は、矢印の根元のサービス２３内のコンテナ２４がデータベース２９にアクセスすることを示す。 An arrow directed to the database 29 indicates that the container 24 in the service 23 at the root of the arrow accesses the database 29 .

以下では、「ServiceA1」と「ServiceA2」の各々のコンテナ２４に異常があった場合について説明する。 Below, a case where there is an abnormality in each of the containers 24 of "ServiceA1" and "ServiceA2" will be described.

図２１は、この場合のサービス情報２７を示す模式図である。前述のように「ServiceA1」と「ServiceA2」の各々のコンテナ２４に異常があるため、これらのサービス２３における「Status」は「Error」となる。 FIG. 21 is a schematic diagram showing service information 27 in this case. Since the containers 24 of "ServiceA1" and "ServiceA2" each have an abnormality as described above, the "Status" of these services 23 is "Error".

また、「Using_db」と「Using_service」の各項目には、図２０の依存関係を反映した値が格納される。例えば、「ServiceA1」の「Using_service」には「ServiceA2」と「ServiceA3」が格納される。なお、「ServiceA1」は「DatabaseA」にアクセスしないため、アクセスしないことを示す「NULL」が「Using_db」に格納される。 Also, the items "Using_db" and "Using_service" store values that reflect the dependencies shown in FIG. For example, "ServiceA2" and "ServiceA3" are stored in "Using_service" of "ServiceA1". Since "ServiceA1" does not access "DatabaseA", "NULL" indicating no access is stored in "Using_db".

一方、「ServiceA2」は「DatabaseA」にアクセスするため、「ServiceA2」の「Using_db」には「DatabaseA」が格納される。また、「ServiceA2」は他のサービス２３に依存しないため、「ServiceA2」の「Using_service」は「NULL」となる。 On the other hand, since "ServiceA2" accesses "DatabaseA", "DatabaseA" is stored in "Using_db" of "ServiceA2". Also, since "ServiceA2" does not depend on other services 23, "Using_service" of "ServiceA2" is "NULL".

なお、「ServiceA3」の「Operation_type」における「CircuitBreakOK」は、「ServiceA3」のサービス２３を実行しているコンテナ２４に異常が発生した場合、業務システム２１から「ServiceA3」を削除してもよいことを示す。 "CircuitBreakOK" in "Operation_type" of "ServiceA3" indicates that "ServiceA3" may be deleted from the business system 21 when an abnormality occurs in the container 24 executing the service 23 of "ServiceA3". show.

図２２は、図２１のサービス情報２７を利用して制御部４２が生成したサービストポロジの模式図である。 FIG. 22 is a schematic diagram of a service topology generated by the control unit 42 using the service information 27 of FIG.

ここでは、制御部４２は、サービストポロジを表現する隣接リストを生成する。この隣接リストの１行目の「ServiceA1-ServiceA2-ServiceA3」は、「ServiceA1」の通信先のサービス２３が「ServiceA2」と「ServiceA3」であることを示す。２行目の「ServiceA2-DatabaseA」は、「ServiceA2」が「DatabaseA」にアクセスすることを示す。また、最後の行の「DatabaseA」は、「DatabaseA」のアクセス先がないことを示す。 Here, the control unit 42 generates an adjacency list representing the service topology. "ServiceA1-ServiceA2-ServiceA3" on the first line of this adjacency list indicates that the services 23 with which "ServiceA1" communicates are "ServiceA2" and "ServiceA3". "ServiceA2-DatabaseA" on the second line indicates that "ServiceA2" accesses "DatabaseA". "DatabaseA" on the last line indicates that "DatabaseA" has no access destination.
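One way to derive such an adjacency list from the "Using_service" and "Using_db" items of the service information 27 is sketched below. The function name `build_adjacency_list`, the dictionary representation, and the use of `None` for "NULL" are assumptions. Only "ServiceA1" and "ServiceA2" are included because the description gives the dependency values for these two services alone.

```python
def build_adjacency_list(service_info, databases):
    """Return adjacency-list lines of the form "Source-Dest1-Dest2"."""
    lines = []
    for info in service_info:
        targets = list(info["Using_service"] or [])   # None stands for "NULL"
        if info["Using_db"]:
            targets.append(info["Using_db"])
        lines.append("-".join([info["Name"]] + targets))
    # A database has no access destination, so its line holds only its name.
    lines.extend(databases)
    return lines


service_info = [
    {"Name": "ServiceA1", "Using_service": ["ServiceA2", "ServiceA3"], "Using_db": None},
    {"Name": "ServiceA2", "Using_service": None, "Using_db": "DatabaseA"},
]
for line in build_adjacency_list(service_info, ["DatabaseA"]):
    print(line)
```

With these two records the sketch prints the first, second, and last lines quoted above; the lines for "ServiceA3" and "ServiceB" are omitted because their dependencies are not stated in the text.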

図２３は、本例において制御部４２が生成した異常情報リソース４４の模式図である。 FIG. 23 is a schematic diagram of the abnormality information resource 44 generated by the control unit 42 in this example.

図２３の例では、「ServiceA1」と「ServiceA2」の各異常に対してロジック実行部４９が既に１回ロジックを実行しており、それでも異常が解消されなかった場合を示す。この場合、「ServiceA1」と「ServiceA2」のそれぞれの「Status」は「Failed」となる。 The example of FIG. 23 shows a case where the logic execution unit 49 has already executed the logic once for each abnormality of "ServiceA1" and "ServiceA2", but the abnormality has not been resolved. In this case, the "Status" of each of "ServiceA1" and "ServiceA2" is "Failed".

また、「ServiceA1」と「ServiceA2」のそれぞれの「Priority」はいずれも「1」であるとする。 Also, it is assumed that the "Priority" of "ServiceA1" and "ServiceA2" are both "1".

このように二つのサービス２３の「Priority」が同一の場合は、スケジューリング部４７は、通信パケットの送信先のサービス２３に含まれるコンテナ２４から先にロジックを実行するようにスケジューリングする。この例では通信パケットの送信先は「ServiceA2」であるため、スケジューリング部４７は、「ServiceA1」よりも先に「ServiceA2」のロジックが実行されるようにスケジューリングする。 When the two services 23 have the same "Priority" as described above, the scheduling unit 47 schedules the container 24 included in the service 23 to which the communication packet is sent to execute the logic first. In this example, the destination of the communication packet is "ServiceA2", so the scheduling unit 47 schedules the logic of "ServiceA2" to be executed before "ServiceA1".

そのスケジューリングを受けて、ロジック実行部４９は、「ServiceA1」よりも先に「ServiceA2」のロジックを実行し、「ServiceA2」のコンテナ２４の異常の解消を試みる。 In response to the scheduling, the logic execution unit 49 executes the logic of "ServiceA2" before "ServiceA1" and attempts to resolve the abnormality of the container 24 of "ServiceA2".

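これとは逆に「ServiceA1」から先に対処することも考えられるが、この場合は仮に「ServiceA1」の異常が解消しても、送信先の「ServiceA2」で異常が発生しているため、「ServiceA1」にすぐに異常が発生することがある。 Conversely, handling "ServiceA1" first is also conceivable. In that case, however, even if the abnormality of "ServiceA1" is resolved, an abnormality may soon occur in "ServiceA1" again because the destination "ServiceA2" still has an abnormality.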

これに対し、本例では送信先の「ServiceA2」から先に対処し、その次に「ServiceA1」の対処を行う。そのため、「ServiceA2」の異常が解消した後に「ServiceA1」の対処を行っているときに「ServiceA2」で再び異常が発生する可能性が少なく、異常への無駄な対処を抑制できる。 On the other hand, in this example, the transmission destination "ServiceA2" is dealt with first, and then "ServiceA1" is dealt with. Therefore, there is little possibility that an error will occur again in "ServiceA2" while "ServiceA1" is being dealt with after the error in "ServiceA2" has been resolved, and wasteful handling of the error can be suppressed.
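The tie-breaking rule of this second example (same "Priority", destination handled before source) can be sketched as a depth-first ordering within each priority level. The function name `schedule`, the `uses` mapping from a service to the services it sends packets to, and the "Name" field are assumptions made for illustration and are not how the scheduling unit 47 is stated to be implemented.

```python
def schedule(anomalies, uses):
    """Order names: higher Priority first; on ties, destinations before sources."""
    by_name = {a["Name"]: a for a in anomalies}
    ordered, seen = [], set()

    def visit(name, priority):
        if name in seen:
            return
        seen.add(name)  # marking before recursion also guards against cycles
        for dest in uses.get(name, []):
            # Pull a destination forward only if it is also abnormal
            # and has the same priority as the source.
            if dest in by_name and by_name[dest]["Priority"] == priority:
                visit(dest, priority)
        ordered.append(name)

    for a in sorted(anomalies, key=lambda a: a["Priority"], reverse=True):
        visit(a["Name"], a["Priority"])
    return ordered


anomalies = [
    {"Name": "ServiceA1", "Priority": 1},
    {"Name": "ServiceA2", "Priority": 1},
]
uses = {"ServiceA1": ["ServiceA2", "ServiceA3"], "ServiceA2": []}
print(schedule(anomalies, uses))  # ['ServiceA2', 'ServiceA1']
```

Services that are not abnormal, such as "ServiceA3" here, are ignored when ordering, so only abnormalities that actually need handling appear in the result.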

（ハードウェア構成） (Hardware configuration)
次に、本実施形態に係る異常対処装置２２のハードウェア構成について説明する。 Next, the hardware configuration of the abnormality handling device 22 according to this embodiment will be described.

図２４は、本実施形態に係る異常対処装置２２のハードウェア構成図である。 FIG. 24 is a hardware configuration diagram of the abnormality handling device 22 according to this embodiment.

図２４に示すように、異常対処装置２２は、記憶装置２２ａ、メモリ２２ｂ、プロセッサ２２ｃ、通信インターフェース２２ｄ、入力装置２２ｆ、表示装置２２ｇ、及び媒体読取装置２２ｈを有する。これらの各部は、バス２２ｊにより相互に接続される。 As shown in FIG. 24, the abnormality handling device 22 has a storage device 22a, a memory 22b, a processor 22c, a communication interface 22d, an input device 22f, a display device 22g, and a medium reading device 22h. These units are interconnected by a bus 22j.

このうち、記憶装置２２ａは、HDD(Hard Disk Drive)やSSD(Solid State Drive)等の不揮発性のストレージであって、本実施形態に係る異常対処プログラム１００を記憶する。 Among these, the storage device 22a is a non-volatile storage such as a HDD (Hard Disk Drive) or an SSD (Solid State Drive), and stores the anomaly handling program 100 according to the present embodiment.

なお、異常対処プログラム１００をコンピュータが読み取り可能な記録媒体２２ｉに記録し、媒体読取装置２２ｈを介してプロセッサ２２ｃにその異常対処プログラム１００を読み取らせるようにしてもよい。 It should be noted that the abnormality handling program 100 may be recorded in a computer-readable recording medium 22i, and the abnormality handling program 100 may be read by the processor 22c via the medium reading device 22h.

そのような記録媒体２２ｉとしては、例えばCD-ROM (Compact Disc - Read Only Memory)、DVD (Digital Versatile Disc)、及びUSB (Universal Serial Bus)メモリ等の物理的な可搬型記録媒体がある。また、フラッシュメモリ等の半導体メモリやハードディスクドライブを記録媒体２２ｉとして使用してもよい。これらの記録媒体２２ｉは、物理的な形態を持たない搬送波のような一時的な媒体ではない。 Examples of such a recording medium 22i include physical portable recording media such as CD-ROM (Compact Disc-Read Only Memory), DVD (Digital Versatile Disc), and USB (Universal Serial Bus) memory. A semiconductor memory such as a flash memory or a hard disk drive may also be used as the recording medium 22i. These recording media 22i are not temporary media like carrier waves that have no physical form.

更に、公衆回線、インターネット、及びLAN等に接続された装置に異常対処プログラム１００を記憶させてもよい。その場合は、プロセッサ２２ｃがその異常対処プログラム１００を読み出して実行すればよい。 Furthermore, the abnormality handling program 100 may be stored in a device connected to a public line, the Internet, a LAN, or the like. In that case, the processor 22c may read and execute the abnormality handling program 100.

一方、メモリ２２ｂは、DRAM(Dynamic Random Access Memory)等のようにデータを一時的に記憶するハードウェアである。 On the other hand, the memory 22b is hardware such as a DRAM (Dynamic Random Access Memory) that temporarily stores data.

プロセッサ２２ｃは、異常対処装置２２の各部を制御するCPUやGPU(Graphics Processing Unit)等のハードウェアである。また、プロセッサ２２ｃは、メモリ２２ｂと協働して異常対処プログラム１００を実行する。 The processor 22c is hardware such as a CPU or a GPU (Graphics Processing Unit) that controls each part of the abnormality handling device 22. The processor 22c also executes the abnormality handling program 100 in cooperation with the memory 22b.

これにより、図７に示した異常対処装置２２の解析部３１、異常対処部３２、運用ポリシ生成部３４、運用ポリシ適用部３５、及び受付部３６が実現される。 As a result, the analysis unit 31, the abnormality handling unit 32, the operation policy generation unit 34, the operation policy application unit 35, and the reception unit 36 of the abnormality handling device 22 shown in FIG. 7 are realized.

また、記憶部３３（図７参照）は、記憶装置２２ａとメモリ２２ｂによって実現される。 Also, the storage unit 33 (see FIG. 7) is realized by the storage device 22a and the memory 22b.

更に、通信インターフェース２２ｄは、異常対処装置２２をインターネットやLAN(Local Area Network)等のネットワークに接続するためのNIC(Network Interface Card)等のハードウェアである。この通信インターフェース２２ｄを介して、コンテナ２４やコンテナ基盤２６の各々と異常対処装置２２が通信することができる。 Furthermore, the communication interface 22d is hardware such as a NIC (Network Interface Card) for connecting the abnormality handling device 22 to a network such as the Internet or a LAN (Local Area Network). Through this communication interface 22d, the abnormality handling device 22 can communicate with each of the containers 24 and the container base 26.

また、入力装置２２ｆは、ロジックデータベース５１に含まれる各パラメータや運用ポリシの初期値を運用者が入力するためのキーボードやマウス等のハードウェアである。 The input device 22f is hardware such as a keyboard and a mouse with which the operator inputs the parameters included in the logic database 51 and the initial values of the operational policy.

表示装置２２ｇは、入力装置２２ｆを介して運用者が異常対処装置２２に入力した各種のデータを表示するための液晶ディスプレイ等の表示デバイスである。 The display device 22g is a display device such as a liquid crystal display for displaying the various data that the operator has input to the abnormality handling device 22 via the input device 22f.

媒体読取装置２２ｈは、記録媒体２２ｉを読み取るためのCDドライブ、DVDドライブ、及びUSBインターフェース等のハードウェアである。 The medium reading device 22h is hardware such as a CD drive, a DVD drive, and a USB interface for reading the recording medium 22i.

1: business system, 2: container base, 2a: health check mechanism, 4: service, 5: container monitoring software, 6: operation terminal, 12: business system, 20: abnormality handling system, 21: business system, 22: abnormality handling device, 23: service, 23a: storage area, 23b: communication table, 24: container, 25: sidecar proxy, 26: container base, 27: service information, 29: database, 31: analysis unit, 32: abnormality handling unit, 33: storage unit, 34: operation policy generation unit, 35: operation policy application unit, 36: reception unit, 41: service information acquisition unit, 42: control unit, 44: abnormality information resource, 46: abnormality identification unit, 47: scheduling unit, 48: logic generation unit, 49: logic execution unit, 51: logic database, 52: operation policy database, 100: abnormality handling program.

Claims

An abnormality handling program that causes a computer to execute processing comprising:
setting a priority for each of the abnormalities that have occurred in a plurality of containers;
generating, for each type of abnormality, logic for resolving the abnormality; and
executing the logic for the abnormalities in order of the priority.

The abnormality handling program according to claim 1, further causing the computer to execute processing comprising:
identifying, from among the plurality of containers, a first container that is a source of data and a second container that is a destination of the data; and
when the abnormality of the first container and the abnormality of the second container have the same priority, executing the logic for the abnormality of the second container first.

The abnormality handling program according to claim 1, further causing the computer to execute processing comprising:
stopping execution of the logic for the same abnormality when the number of times the logic has been executed for that abnormality exceeds a threshold.

An abnormality handling system comprising:
a control unit that performs control to set a priority for each of the abnormalities that have occurred in a plurality of containers;
a generation unit that generates, for each type of abnormality, logic for resolving the abnormality; and
an execution unit that executes the logic for the abnormalities in order of the priority.

An abnormality handling method in which a computer executes processing comprising:
setting a priority for each of the abnormalities that have occurred in a plurality of containers;
generating, for each type of abnormality, logic for resolving the abnormality; and
executing the logic for the abnormalities in order of the priority.
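
As a non-limiting illustration, the processing recited in the claims above might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: the names `Abnormality`, `downstream_depth`, `handle_all`, and `generate_logic` are hypothetical, a smaller priority value is assumed to mean "handle earlier", the data flows between containers are assumed to contain no cycles, and the logic is assumed to return `True` when the abnormality has been resolved.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Abnormality:
    container: str  # container in which the abnormality occurred
    kind: str       # type of abnormality
    priority: int   # assumption: smaller value is handled earlier


def downstream_depth(data_flows):
    """Return, per container, the longest hop count from a pure data source.

    data_flows is a set of (source, destination) pairs. Assumes no cycles.
    A destination always gets a larger depth than each of its sources.
    """
    preds = defaultdict(set)
    for src, dst in data_flows:
        preds[dst].add(src)
    depth = {}

    def visit(container):
        if container not in depth:
            depth[container] = 1 + max(
                (visit(p) for p in preds[container]), default=-1
            )
        return depth[container]

    for src, dst in data_flows:
        visit(src)
        visit(dst)
    return depth


def handle_all(abnormalities, data_flows, generate_logic, threshold=2):
    """Run resolution logic in priority order and return a log of outcomes."""
    depth = downstream_depth(data_flows)
    # Same priority: the data destination (larger depth) is handled first.
    ordered = sorted(
        abnormalities,
        key=lambda a: (a.priority, -depth.get(a.container, 0)),
    )
    logic_by_kind = {}  # one piece of logic per type of abnormality
    log = []
    for ab in ordered:
        if ab.kind not in logic_by_kind:
            logic_by_kind[ab.kind] = generate_logic(ab.kind)
        logic = logic_by_kind[ab.kind]
        executions = 0
        while True:
            resolved = logic(ab)
            executions += 1
            if resolved:
                outcome = "resolved"
                break
            if executions > threshold:
                # Stop once the execution count exceeds the threshold.
                outcome = "stopped"
                break
        log.append((ab.container, ab.kind, outcome, executions))
    return log
```

In this sketch, the sort key corresponds to the priority ordering and to the rule that the data destination is handled first when priorities are equal, `logic_by_kind` corresponds to generating logic per type of abnormality, and the `threshold` check corresponds to stopping repeated execution for the same abnormality.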