JP7311335B2

JP7311335B2 - DISTRIBUTED CONTAINER MONITORING SYSTEM AND DISTRIBUTED CONTAINER MONITORING METHOD

Info

Publication number: JP7311335B2
Application number: JP2019125791A
Authority: JP
Inventors: 明彦伊藤
Original assignee: Nomura Research Institute Ltd
Current assignee: Nomura Research Institute Ltd
Priority date: 2019-07-05
Filing date: 2019-07-05
Publication date: 2023-07-19
Anticipated expiration: 2039-07-05
Also published as: JP2021012498A

Description

The present invention relates to a distributed container monitoring system and a distributed container monitoring method.

A known technique has each server resource in a container environment hold the management status and similar information as a blockchain, so that no central management system is needed. When a failure occurs in a server resource, the technique determines a relocation destination for the containers that were running on that server resource and relocates them, thereby restoring the container environment (see, for example, Patent Literature 1).

Patent Literature 1: JP 2018-156465 A

In the technique described in Patent Literature 1, an agent monitors the server resources to determine whether a failure has occurred in them. However, if the device hosting that agent itself fails, failure monitoring may not be performed properly.

Accordingly, an object of the present invention is to perform failure monitoring more appropriately.

These and other objects and novel features of the present invention will become apparent from the description of this specification and the accompanying drawings.

A brief outline of representative inventions disclosed in the present application is as follows.

A distributed container monitoring system according to a representative embodiment of the present invention monitors the operating status of a container to be monitored. It includes a first monitoring container that monitors the container to be monitored, and a second monitoring container that monitors the first monitoring container. The first monitoring container has a first signal transmission unit that transmits a status-check signal to the container to be monitored, and a first detection unit that detects a failure of the container to be monitored based on how that container responds to the signal transmitted by the first signal transmission unit. The second monitoring container has a second signal transmission unit that transmits a status-check signal to the first monitoring container, and a second detection unit that detects a failure of the first monitoring container based on how the first monitoring container responds to the signal transmitted by the second signal transmission unit.

The effects obtained by representative inventions disclosed in the present application are briefly described below.

That is, according to a representative embodiment of the present invention, failure monitoring can be performed more appropriately.

FIG. 1 is a diagram showing an overview of the configuration of a distributed container monitoring system according to the present embodiment.
FIG. 2 is a functional block diagram of a monitoring container according to the present embodiment.
FIG. 3 is a sequence diagram showing the flow of processing when, in the present embodiment, the monitoring containers monitor the business container and a monitoring container and detect a failure in the business container.
FIG. 4 is a sequence diagram showing the flow of processing when, in the present embodiment, the monitoring containers monitor the business container and a monitoring container and detect a failure in the monitoring container.
FIG. 5 is a diagram illustrating an example of a monitoring status screen.

The present embodiment will now be described in detail with reference to the drawings. In principle, the same parts are denoted by the same reference numerals throughout the drawings used to describe the embodiment, and repeated descriptions of them are omitted. Conversely, a part that was described with a reference numeral in one drawing may be referred to by the same numeral when another drawing is described, even though it is not shown again there.

<Overview>
FIG. 1 is a diagram showing an overview of the configuration of a distributed container monitoring system 1 according to the present embodiment. As shown in FIG. 1, the distributed container monitoring system 1 includes servers 10 (servers 10a to 10c) and a control server 20.

The distributed container monitoring system 1 monitors the operating status of containers to be monitored (for example, containers that run business applications). Here, a container is realized by virtualization technology.

The servers 10a to 10c and the control server 20 are so-called server devices and can exchange information and signals with one another via a network 30.

The servers 10 and the control server 20 each use a CPU (Central Processing Unit, not shown) to execute an OS (Operating System), middleware such as a DBMS (DataBase Management System) and a Web server program, and the software running on them, all loaded into memory from a recording device such as an HDD (Hard Disk Drive). This realizes the various functions described later.

The control server 20 manages containers and instructs the servers 10 to create containers. It also acquires, from the servers 10, information on the operating status of the containers running on them, and outputs the acquired information.

The control server 20 realizes these functions by running existing distributed container operation management software such as Kubernetes.

Each server 10 creates containers based on instructions from the control server 20. Specifically, it creates a business container 11 (a container to be monitored), which runs a business application, and a monitoring container 12, which can monitor that business container.

A monitoring container 12 on a server 10 monitors the business container 11. In addition, the monitoring containers 12 other than the one monitoring the business container 11 monitor that monitoring container 12.

In the distributed container monitoring system 1 shown in FIG. 1, the server 10a (identifier: server S1) has a monitoring container 12a (identifier: monitoring container M1). The server 10b (identifier: server S2) has the business container 11 and a monitoring container 12b (identifier: monitoring container M2). The server 10c (identifier: server S3) has a monitoring container 12c (identifier: monitoring container M3).

In this way, the monitoring containers 12 of the distributed container monitoring system 1 are distributed across the server resources (servers 10a to 10c).

Next, the functions of the monitoring container 12 will be described with reference to FIG. 2, which is a functional block diagram of the monitoring container 12.

As shown in FIG. 2, the monitoring container 12 includes a role determination unit 121, a first signal transmission unit 122, a first detection unit 123, a second signal transmission unit 124, a second detection unit 125, a detection result output unit 126, and a recovery unit 127.

The role determination unit 121 determines whether each monitoring container 12 functions as the monitoring container 12 that monitors the business container 11 (the first monitoring container) or as a container that monitors that monitoring container 12 (a second monitoring container).

For example, upon receiving a leader-candidacy inquiry from the control server 20, the role determination unit 121 sends a signal indicating leader candidacy to the control server 20 or to the other monitoring containers 12. Here, the leader is the container that monitors the business container 11.

The role determination unit 121 also receives leader-candidacy signals from the other monitoring containers 12. It designates the monitoring container 12 that sent its leader-candidacy signal earliest as the monitoring container 12 that monitors the business container 11, and designates the remaining monitoring containers 12 as monitoring containers that monitor that leader.
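The "earliest candidacy wins" rule above can be sketched as follows. This is a minimal illustration, assuming that each role determination unit has collected every candidacy signal together with its send time; the function name, the timestamp representation, and the tie-breaking rule are hypothetical and not taken from the patent.

```python
from typing import Dict, List, Tuple


def elect_leader(candidacies: Dict[str, float]) -> Tuple[str, List[str]]:
    """Return (leader, followers) from leader-candidacy send times.

    candidacies maps a monitoring-container ID (e.g. "M1") to the time at
    which it sent its leader-candidacy signal (hypothetical representation).
    """
    if not candidacies:
        raise ValueError("no leader-candidacy signals received")
    # Earliest send time wins; ties are broken by ID so that every monitoring
    # container computes the same result (the tie-break is an assumption).
    leader = min(candidacies, key=lambda cid: (candidacies[cid], cid))
    # Every other candidate becomes a follower that monitors the leader.
    followers = sorted(cid for cid in candidacies if cid != leader)
    return leader, followers


print(elect_leader({"M1": 10.0, "M2": 10.4, "M3": 10.2}))  # ('M1', ['M2', 'M3'])
```

Comparing send times across servers presumes reasonably synchronized clocks; the patent does not say how the "earliest" signal is established, so a real deployment might instead rely on arrival order at a single arbiter such as the control server 20.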

When its own monitoring container 12 is the one monitoring the business container 11, the role determination unit 121 periodically sends a leader-continuation signal, asserting that it remains the leader, to the other monitoring containers 12.

When a failure is detected in the monitoring container 12 that monitors the business container 11, the role determination unit 121 sends a leader-candidacy signal to the control server 20 and the other monitoring containers 12. In other words, a failure of the leader monitoring container 12 triggers the role determination unit 121 to redetermine the roles, so that a monitoring container 12 other than the failed one becomes the leader.
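Redetermining roles after a leader failure amounts to re-running the same earliest-candidacy election among the surviving monitoring containers. The sketch below is hypothetical: it assumes the failed leader's ID is known from the second detection units and simply excludes it, together with any stale candidacy entry it may have left behind.

```python
from typing import Dict, List, Set, Tuple


def reelect_leader(candidacies: Dict[str, float],
                   failed: Set[str]) -> Tuple[str, List[str]]:
    """Re-run the earliest-candidacy election among surviving monitors only."""
    # Drop any container known to have failed, including the old leader.
    survivors = {cid: t for cid, t in candidacies.items() if cid not in failed}
    if not survivors:
        raise RuntimeError("no surviving monitoring container can become leader")
    leader = min(survivors, key=lambda cid: (survivors[cid], cid))
    return leader, sorted(cid for cid in survivors if cid != leader)


# M1 (the old leader) has failed; M2 and M3 send new candidacy signals.
print(reelect_leader({"M1": 5.0, "M2": 20.1, "M3": 20.3}, {"M1"}))  # ('M2', ['M3'])
```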

The first signal transmission unit 122 transmits a status-check signal to the business container 11, which is the container to be monitored.

When the role determination unit 121 has determined that its own container is to function as the monitoring container 12 that monitors the business container 11, the first signal transmission unit 122 transmits, at predetermined timings, a survival confirmation signal to the business container 11, that is, a signal (health check signal) asking whether the business container 11 is alive.

After transmitting the survival confirmation signal, the first signal transmission unit 122 notifies the first detection unit 123 that it has done so.

The first detection unit 123 detects a failure of the business container 11 based on how the business container 11 responds to the signal transmitted by the first signal transmission unit 122.

Upon being notified by the first signal transmission unit 122 that a survival confirmation signal has been transmitted, the first detection unit 123 waits to receive the business container 11's response signal to it.

The first detection unit 123 detects a failure of the business container 11 based on the content of the response signal or on how the response signal is received.

For example, if no response signal arrives within a predetermined period after the first signal transmission unit 122 transmits the survival confirmation signal, the first detection unit 123 detects, based on this result, that the business container 11 has failed.
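The timeout rule above could be sketched with Python's asyncio as below. This is a minimal illustration, not the patent's implementation: `send_health_check` is a hypothetical coroutine standing in for sending the survival confirmation signal and awaiting its response, and treating a connection error as a failure is an added assumption. The same probe would serve the second detection unit 125 unchanged, with the leader monitoring container as the target.

```python
import asyncio
from typing import Awaitable, Callable


async def probe(send_health_check: Callable[[], Awaitable[object]],
                timeout_s: float) -> bool:
    """Send one survival confirmation and wait up to timeout_s for the reply.

    Returns True if a response arrives in time and False otherwise; the
    caller treats False as a detected failure.
    """
    try:
        await asyncio.wait_for(send_health_check(), timeout=timeout_s)
        return True
    except asyncio.TimeoutError:
        return False  # no response within the predetermined period
    except ConnectionError:
        return False  # assumption: an unreachable target also counts as failed


# Hypothetical stand-ins for a healthy and a hung business container.
async def healthy() -> str:
    await asyncio.sleep(0.01)
    return "ok"


async def hung() -> str:
    await asyncio.sleep(1.0)
    return "ok"


if __name__ == "__main__":
    print(asyncio.run(probe(healthy, 0.1)))  # True
    print(asyncio.run(probe(hung, 0.1)))     # False
```

`asyncio.wait_for` cancels the pending health check when the timeout expires, so a hung target does not leave a stray task behind.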

The first detection unit 123 may also detect a failure of the business container 11 when the response signals are arriving progressively later, because this indicates that a failure of the business container 11 is highly likely.
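The patent does not say how "progressively later" is measured. One possible heuristic, sketched below under that caveat, flags a target when its recent response times rise monotonically and already use a sizable share of the timeout; the window length, the 50% threshold, and the function name are all hypothetical.

```python
from typing import Sequence


def is_degrading(latencies_ms: Sequence[float], window: int = 4,
                 timeout_ms: float = 1000.0, warn_ratio: float = 0.5) -> bool:
    """Heuristic for "responses are arriving progressively later".

    True when the last `window` response times are strictly increasing and
    the latest one already exceeds warn_ratio * timeout_ms.
    """
    if len(latencies_ms) < window:
        return False  # not enough history to judge a trend
    recent = list(latencies_ms[-window:])
    rising = all(a < b for a, b in zip(recent, recent[1:]))
    return rising and recent[-1] > warn_ratio * timeout_ms


print(is_degrading([100, 120, 300, 450, 700]))  # True
print(is_degrading([100, 150, 200, 250]))       # False: rising but still fast
```

Requiring both a rising trend and a threshold avoids flagging a target whose latency creeps up by a few milliseconds while remaining far below the timeout.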

When the first detection unit 123 receives a response signal, it passes the response signal to the detection result output unit 126. When it detects a failure of the business container 11, it notifies the detection result output unit 126 that a failure has been detected.

The first detection unit 123 may also send the response signal itself, or information about the response signal (such as the time taken until it was received), to the detection result output unit 126.

The second signal transmission unit 124 transmits a status-check signal to the monitoring container 12 that monitors the business container 11.

When the role determination unit 121 has determined that its own container is to function as a monitoring container 12 that monitors the monitoring container 12 monitoring the business container 11, the second signal transmission unit 124 transmits a survival confirmation signal to the monitored monitoring container 12 at predetermined timings.

After transmitting the survival confirmation signal, the second signal transmission unit 124 notifies the second detection unit 125 that it has done so.

The second detection unit 125 detects a failure of the monitored monitoring container 12 based on how that container responds to the signal transmitted by the second signal transmission unit 124.

Upon being notified by the second signal transmission unit 124 that a survival confirmation signal has been transmitted, the second detection unit 125 waits to receive the monitored monitoring container 12's response signal to it. A monitored monitoring container 12 that is operating normally sends a response signal back to the sender of a survival confirmation signal whenever it receives one.

The second detection unit 125 detects a failure of the monitored monitoring container 12 based on the content of the response signal or on how the response signal is received.

For example, if no response signal arrives within a predetermined period after the second signal transmission unit 124 transmits the survival confirmation signal, the second detection unit 125 detects, based on this result, that the monitored monitoring container 12 has failed.

The second detection unit 125 may also detect a failure of the monitored monitoring container 12 when the response signals are arriving progressively later, because this indicates that a failure of that container is highly likely.

When the second detection unit 125 receives a response signal, it passes the response signal to the detection result output unit 126. When it detects a failure of the monitored monitoring container 12, it notifies the detection result output unit 126 that a failure has been detected. The second detection unit 125 may also send the response signal itself, or information about the response signal (such as the time taken until it was received), to the detection result output unit 126.

The detection result output unit 126 outputs the detection results of the first detection unit 123 or the second detection unit 125.

For example, the detection result output unit 126 transmits the detection results of the first detection unit 123 or the second detection unit 125 to the control server 20.

The detection result output unit 126 may also take the response signal itself, or the information about it, obtained from the first detection unit 123 or the second detection unit 125, and transmit that information to the control server 20.
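The patent does not define the format in which results reach the control server 20. As one hypothetical example, each result could be serialized as a small JSON record like the one below; the class and field names are assumptions chosen to carry the information mentioned above (the detection outcome and the response time).

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class DetectionResult:
    """One detection result as it might be reported to the control server."""
    monitor_id: str                    # reporting monitoring container, e.g. "M1"
    target_id: str                     # monitored container, e.g. "C1" or "M1"
    failed: bool                       # True when a failure was detected
    response_time_ms: Optional[float]  # None when no response arrived

    def to_json(self) -> str:
        # Sorted keys keep the payload stable for comparison and logging.
        return json.dumps(asdict(self), sort_keys=True)


print(DetectionResult("M1", "C1", False, 12.5).to_json())
# {"failed": false, "monitor_id": "M1", "response_time_ms": 12.5, "target_id": "C1"}
```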

The recovery unit 127 performs recovery processing of the container environment when the first detection unit 123 detects a failure in the business container 11, which is the container to be monitored.

Upon receiving notification from the first detection unit 123 that a failure of the business container 11 has been detected, the recovery unit 127 sends a container creation request, asking for a business container 11 to be created, to a server 10 other than the one on which the failed business container 11 was running. The requested server 10 then creates the business container 11 and runs it. In this way, the recovery unit 127 restores the container environment when a failure of the business container 11 is detected.
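The patent leaves open how the recovery unit chooses among the remaining servers. The sketch below assumes a least-loaded policy (fewest running containers, ties broken by ID) and uses a hypothetical `create_container` callback in place of the actual container creation request.

```python
from typing import Callable, Dict


def recover_business_container(
    container_id: str,
    failed_server: str,
    server_loads: Dict[str, int],
    create_container: Callable[[str, str], None],
) -> str:
    """Ask a server other than failed_server to recreate container_id.

    server_loads maps server IDs to their current number of containers;
    picking the least-loaded candidate is an assumed policy.
    create_container(server_id, container_id) stands in for the container
    creation request (hypothetical).
    """
    candidates = {s: n for s, n in server_loads.items() if s != failed_server}
    if not candidates:
        raise RuntimeError("no healthy server available for recovery")
    target = min(candidates, key=lambda s: (candidates[s], s))
    create_container(target, container_id)
    return target


requests = []
chosen = recover_business_container(
    "C1", "S2", {"S1": 2, "S2": 2, "S3": 1},
    lambda server, cid: requests.append((server, cid)))
print(chosen, requests)  # S3 [('S3', 'C1')]
```

With these illustrative load values the sketch picks S3; any other placement policy could be substituted without changing the surrounding flow.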

<Processing procedure>
Next, with reference to FIG. 3, the processing performed when the monitoring containers 12 of the present embodiment monitor the business container 11 and the monitoring container 12 and detect a failure in the business container 11 will be described.

FIG. 3 is a sequence diagram showing the flow of processing when, in the present embodiment, the monitoring containers 12 monitor the business container 11 and the monitoring container 12 and detect a failure in the business container 11.

First, assume that the business container 11 is running on the server 10b (step S1). Upon receiving a leader-candidacy inquiry from the control server 20, the role determination unit 121 sends a leader-candidacy signal to the other monitoring containers 12. For example, the role determination unit 121 of the monitoring container 12a sends leader-candidacy signals to the monitoring containers 12b and 12c (steps S2 and S3).

If the monitoring container 12a itself was the earliest to send a leader-candidacy signal, its role determination unit 121 designates the monitoring container 12a as the monitoring container that monitors the business container 11, and designates the monitoring containers 12b and 12c as the monitoring containers that monitor the monitoring container 12a.

The first signal transmission unit 122 of the monitoring container 12a transmits a survival confirmation signal to the business container 11, and when the first detection unit 123 receives a response signal from the business container 11, it determines that the business container 11 has not failed (step S4).

The role determination unit 121 of the monitoring container 12a also sends a leader-continuation signal to the monitoring containers 12b and 12c at every predetermined interval (steps S5 and S6).

The second signal transmission unit 124 of the monitoring container 12b sends a survival confirmation signal to the monitoring container 12a, and the second detection unit 125 of the monitoring container 12b confirms that the monitoring container 12a is alive by receiving a response signal from it (step S7).

Likewise, the second signal transmission unit 124 of the monitoring container 12c sends a survival confirmation signal to the monitoring container 12a, and the second detection unit 125 of the monitoring container 12c confirms that the monitoring container 12a is alive by receiving a response signal from it (step S8).

The first signal transmission unit 122 of the monitoring container 12a then transmits a survival confirmation signal to the business container 11 (step S9). Note that a failure has occurred in the server 10b at some point between step S4 and step S9.

The role determination unit 121 of the monitoring container 12a again sends leader-continuation signals to the monitoring containers 12b and 12c at the predetermined interval (steps S10 and S11).

The second signal transmission unit 124 of the monitoring container 12c sends a survival confirmation signal to the monitoring container 12a, and the second detection unit 125 of the monitoring container 12c confirms that the monitoring container 12a is alive by receiving a response signal from it (step S12).

Because no response signal has arrived within the predetermined period after the survival confirmation signal was sent in step S9, the first detection unit 123 of the monitoring container 12a detects that a failure has occurred in the business container 11. In response, the recovery unit 127 sends a container recovery instruction signal to the server 10c (step S13), and the server 10c creates (recovers) the business container accordingly (step S14).

In step S15, leader continuation is asserted as in step S6, and in step S16, leader survival confirmation is performed as in step S8.

Likewise, in step S17, leader continuation is asserted as in step S6, and in step S18, leader survival confirmation is performed as in step S8.

In step S19, the first signal transmission unit 122 of the monitoring container 12a sends a survival confirmation signal to the business container 11 now running on the server 10c.

Next, with reference to FIG. 4, the processing performed when the monitoring containers 12 of the present embodiment monitor the business container 11 and the monitoring container 12 and detect a failure in a monitoring container 12 will be described. FIG. 4 is a sequence diagram showing the flow of this processing.

Steps S31 to S38 are the same as steps S1 to S8 in the sequence diagram of FIG. 3, so their description is omitted.

In step S39, as in step S34, the first signal transmission unit 122 of the monitoring container 12a sends a survival confirmation signal to the business container 11, and when the first detection unit 123 receives a response signal from the business container 11, it determines that the business container 11 has not failed.

In steps S40 to S43, leader continuation is asserted and leader survival confirmation is performed, as in steps S35 to S38.

After step S43, a failure occurs in the server 10a. Subsequently, the second signal transmission units 124 on the servers 10b and 10c (that is, those of the monitoring containers 12b and 12c) send survival confirmation signals to the monitoring container 12a (steps S44 and S45).

When a waiting period (step S46) has elapsed since their second signal transmission units 124 sent the survival confirmation signals, the second detection units 125 on the servers 10b and 10c detect that a failure has occurred in the monitoring container 12a.

In response, the role determination unit 121 of the monitoring container 12b sends a leader-candidacy signal to the monitoring container 12c and designates the monitoring container 12b as the monitoring container that monitors the business container 11 (step S47).
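Steps S44 to S47 can be condensed into one simulated round, as below. The sketch is a hypothetical, in-memory stand-in: `responds` replaces the real survival confirmation exchange, `down` lists failed containers, and `candidacy_times` holds the send time of each follower's leader-candidacy signal. It computes only the resulting leader, not the messaging itself.

```python
from typing import Dict, List, Set


def responds(container: str, down: Set[str]) -> bool:
    """Stand-in for a survival confirmation probe (hypothetical)."""
    return container not in down


def failover_round(monitors: List[str], leader: str, down: Set[str],
                   candidacy_times: Dict[str, float]) -> str:
    """Run one round of the FIG. 4 flow and return the leader afterwards."""
    # Only followers that are themselves alive can probe the leader.
    followers = [m for m in monitors if m != leader and m not in down]
    # Each live follower's second detection unit checks the leader.
    detected_by = [f for f in followers if not responds(leader, down)]
    if not detected_by:
        # Leader answered (or no follower is alive to notice): no change.
        return leader
    # Followers that detected the failure send leader candidacy; the earliest
    # signal wins, ties broken by ID (an assumption).
    return min(detected_by, key=lambda m: (candidacy_times[m], m))


print(failover_round(["M1", "M2", "M3"], "M1", {"M1"}, {"M2": 1.0, "M3": 1.2}))  # M2
```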

The first signal transmission unit 122 of the monitoring container 12b sends a survival confirmation signal to the business container 11, and when the first detection unit 123 receives a response signal from the business container 11, it determines that the business container 11 has not failed (step S48).

The role determination unit 121 of the monitoring container 12b also sends a leader-continuation signal to the monitoring container 12c at every predetermined interval (step S49). The second signal transmission unit 124 of the monitoring container 12c sends a survival confirmation signal to the monitoring container 12b, and the second detection unit 125 of the monitoring container 12c confirms that the monitoring container 12b is alive by receiving a response signal from it (step S50).

The first signal transmission unit 122 of the monitoring container 12b then sends another survival confirmation signal to the business container 11, and when the first detection unit 123 receives a response signal from the business container 11, it determines that the business container 11 has not failed (step S51).

Next, an example of a monitoring status screen will be described with reference to FIG. 5, which illustrates such a screen.

The screen in FIG. 5 is, for example, generated by the control server 20 based on information acquired from the detection result output unit 126.

The example of FIG. 5 shows three managed servers. Specifically, the server with node name "Worker#1" has a monitoring container with identifier M1. The server with node name "Worker#2" has a business container with identifier C1 and a monitoring container with identifier M2. The server with node name "Worker#3" has a business container with identifier C2 and a monitoring container with identifier M3.

FIG. 5 also shows that the monitoring container M1 monitors the business containers C1 and C2, and that the monitoring containers M2 and M3 monitor the monitoring container M1.
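
The topology of FIG. 5 can be expressed as data, with the monitoring relationships derived from which monitoring container currently holds the first-tier role. The encoding below (the `PLACEMENT` mapping, the kind tags, and `monitoring_edges`) is an illustrative assumption, not a format defined in the description.

```python
from typing import Dict, List, Tuple

# Placement shown in FIG. 5: node name -> (container id, kind) pairs.
# The "business"/"monitor" tags are an illustrative encoding.
PLACEMENT: Dict[str, List[Tuple[str, str]]] = {
    "Worker#1": [("M1", "monitor")],
    "Worker#2": [("C1", "business"), ("M2", "monitor")],
    "Worker#3": [("C2", "business"), ("M3", "monitor")],
}


def monitoring_edges(
    placement: Dict[str, List[Tuple[str, str]]], leader: str
) -> List[Tuple[str, str]]:
    """Return (watcher, watched) pairs for a two-tier monitoring layout."""
    containers = [c for pairs in placement.values() for c in pairs]
    business = [cid for cid, kind in containers if kind == "business"]
    monitors = [cid for cid, kind in containers if kind == "monitor"]
    if leader not in monitors:
        raise ValueError(f"{leader!r} is not a monitoring container")
    # Tier 1: the leader watches every business container.
    edges = [(leader, b) for b in business]
    # Tier 2: every other monitoring container watches the leader.
    edges += [(m, leader) for m in monitors if m != leader]
    return edges
```

With `leader="M1"` this yields the relationships shown in FIG. 5; passing another monitor as `leader` gives the layout after a failover.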

"HealthCheckResponseTime" shows a graph of the response times to the liveness-check signals sent to the business containers C1 and C2.

"MessageCount" shows the change over time in the number of messages at each message level (Info, Warn, Error).

The detail field D1 shows, for each message, the business container that sent it, the transmission date, the transmission time, the message level, the node to which the business container belongs, and the specific message content.
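
The detail field D1 and the "MessageCount" graph suggest a simple record type and an aggregation by time bucket and level. The sketch below assumes one-minute buckets by default; the field and function names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Dict, Iterable, Tuple


class Level(Enum):
    INFO = "Info"
    WARN = "Warn"
    ERROR = "Error"


@dataclass(frozen=True)
class MessageRecord:
    """One row of the detail field D1 (field names are illustrative)."""
    container: str      # business container that sent the message
    sent_at: datetime   # transmission date and time
    level: Level
    node: str           # node the business container belongs to
    text: str           # specific message content


def message_counts(
    records: Iterable[MessageRecord], bucket_minutes: int = 1
) -> Dict[Tuple[datetime, Level], int]:
    """Count messages per (time bucket, level), as a "MessageCount" graph would need."""
    counts: Counter = Counter()
    for rec in records:
        minute = rec.sent_at.replace(second=0, microsecond=0)
        # Floor to the start of the bucket (bucket sizes assumed to divide an hour evenly).
        bucket = minute - timedelta(minutes=minute.minute % bucket_minutes)
        counts[(bucket, rec.level)] += 1
    return dict(counts)
```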

In the embodiment described above, the monitoring container 12 has the role determination unit 121, but the control server 20 may have the role determination unit 121 instead.

In the embodiment described above, the role determination unit 121 determines the leader dynamically, but the leader may instead be fixed in advance.

The monitoring container 12 has been described as having the first signal transmission unit 122, the first detection unit 123, the second signal transmission unit 124, and the second detection unit 125, but this is not limiting: if it is determined in advance whether a monitoring container is the leader, it need not have all of these units.
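
The fixed-role variation can be summarized as a mapping from role to the units a monitoring container must contain. This is a hypothetical sketch: the identifiers are derived from the reference signs, and it assumes that a container whose role is decided at run time keeps the role determination unit 121 itself rather than delegating it to the control server 20.

```python
from enum import Enum
from typing import FrozenSet


class Role(Enum):
    FIRST = "first"      # fixed: monitors the business containers
    SECOND = "second"    # fixed: monitors the first-tier monitoring container
    DYNAMIC = "dynamic"  # role decided at run time by unit 121


# Unit names follow the reference signs in the description.
_UNITS = {
    Role.FIRST: frozenset({"first_signal_transmission_122", "first_detection_123"}),
    Role.SECOND: frozenset({"second_signal_transmission_124", "second_detection_125"}),
}


def required_units(role: Role) -> FrozenSet[str]:
    """Units a monitoring container needs for the given role."""
    if role is Role.DYNAMIC:
        # A container whose role can change must be able to act in either tier.
        return _UNITS[Role.FIRST] | _UNITS[Role.SECOND] | {"role_determination_121"}
    return _UNITS[role]
```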

<Operation and Effects>
In the distributed container monitoring system 1 described above, in the monitoring container 12 that monitors the business container (the monitoring container 12a in the above example), the first signal transmission unit 122 transmits a status-check signal to the business container 11, which is the container to be monitored, and the first detection unit 123 detects a failure of the business container 11 based on how the business container 11 responds to the signal transmitted by the first signal transmission unit 122.

In the monitoring container 12 that monitors the monitoring container 12 monitoring the business container (the monitoring container 12c in the above example), the second signal transmission unit 124 transmits a status-check signal to the monitoring container 12a, and the second detection unit 125 detects a failure of the monitoring container 12a based on how the monitoring container 12a responds to the signal transmitted by the second signal transmission unit 124.

Because the distributed container monitoring system 1 additionally monitors the monitoring container 12 that monitors the business container 11, a failure in that monitoring container 12 can be handled promptly. In other words, failures can be monitored more appropriately. Furthermore, the distributed container monitoring system 1 of the embodiment makes maximum use of server resources and monitors failures with a simpler implementation, without using a blockchain.

The role determination unit 121 determines whether each monitoring container 12 functions as the monitoring container 12 that monitors the business container 11 or as the monitoring container 12 that monitors that monitoring container 12.

Because the role determination unit 121 allocates roles among the plurality of monitoring containers 12, the distributed container monitoring system 1 can divide the roles dynamically.

When the second detection unit 125 detects a failure of the monitoring container 12 that monitors the business container 11, the role determination unit 121 causes another monitoring container 12 to function as the monitoring container 12 that monitors the business container 11.

By detecting a failure of the monitoring container 12 that monitors the business container 11 and assigning that role to another monitoring container 12, the distributed container monitoring system 1 can continue to monitor the business container 11 appropriately.
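
Role determination with failover, as performed by the role determination unit 121, can be reduced to a pure function over the set of live monitoring containers. The sketch is hypothetical: the tier labels, the `failed` collection, and the lowest-identifier tie-break are assumptions, because the description does not specify how the replacement container is selected. The same function could run inside each monitoring container or, in the variation where the control server 20 holds the role determination unit 121, on that server.

```python
from typing import Collection, Dict, Optional, Sequence


def assign_roles(
    monitors: Sequence[str],
    current_leader: Optional[str],
    failed: Collection[str] = (),
) -> Dict[str, str]:
    """Map each live monitoring container to the "first" or "second" tier.

    The current leader keeps the first-tier role while it is alive; if it
    has failed (or none exists yet), another live container takes over.
    """
    live = [m for m in monitors if m not in failed]
    if not live:
        raise RuntimeError("no live monitoring container available")
    # Assumption: the lowest identifier is chosen as the replacement leader.
    leader = current_leader if current_leader in live else min(live)
    return {m: "first" if m == leader else "second" for m in live}
```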

When the first detection unit 123 detects a failure of the business container 11, which is the container to be monitored, the recovery unit 127 performs recovery processing of the container environment. This allows the distributed container monitoring system 1 to recover the business container 11 from failures appropriately. Depending on the application running in the business container 11, the application's own functions may be able to perform recovery, but the recovery unit 127 can recover from failures without relying on them.
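
The recovery unit 127 could be sketched as a bounded retry loop with exponential backoff. The description does not say what the recovery processing consists of, so `restart` and `is_alive` are hypothetical callables; `restart` might, for example, wrap a `docker restart <name>` call made through `subprocess.run`, and `is_alive` could reuse the first-tier liveness check.

```python
import time
from typing import Callable


def recover_container(
    name: str,
    restart: Callable[[str], bool],
    is_alive: Callable[[str], bool],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try to bring a failed container back, with exponential backoff.

    Returns True once a restart succeeds and the container answers a
    liveness check; returns False after `attempts` failed tries.
    """
    for attempt in range(attempts):
        if restart(name) and is_alive(name):
            return True
        if attempt < attempts - 1:
            # Wait 1x, 2x, 4x ... base_delay between tries.
            sleep(base_delay * 2 ** attempt)
    return False
```

Backoff avoids hammering a container runtime that is itself struggling, and the final liveness check keeps a nominally successful restart from being reported as a recovery when the container still does not respond.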

The invention made by the present inventor has been specifically described above based on the embodiment, but the present invention is not limited to that embodiment and can, needless to say, be modified in various ways without departing from its gist. For example, the above embodiment has been described in detail to explain the present invention clearly, and the invention is not necessarily limited to one having all of the described configurations. Part of the configuration of one embodiment can be replaced with the configuration of another embodiment, and the configuration of another embodiment can be added to the configuration of one embodiment. Other configurations can also be added to, deleted from, or substituted for part of the configuration of each embodiment.

Each of the above configurations, functions, processing units, processing means, and the like may be realized partly or entirely in hardware, for example by designing them as an integrated circuit. They may also be realized in software, with a processor interpreting and executing a program that implements each function. Information such as the programs, tables, and files that implement each function can be stored in a recording device such as a memory, a hard disk, or an SSD (Solid State Drive), or on a recording medium such as an IC card, an SD card, or a DVD.

In each of the drawings, only the control lines and information lines considered necessary for the explanation are shown; not all control lines and information lines of an actual implementation are necessarily shown. In practice, almost all components may be considered to be interconnected.

INDUSTRIAL APPLICABILITY The present invention can be used in a distributed container monitoring system that monitors the operating status of containers to be monitored.

DESCRIPTION OF SYMBOLS
1... Distributed container monitoring system, 10... Server, 11... Business container, 12... Monitoring container, 121... Role determination unit, 122... First signal transmission unit, 123... First detection unit, 124... Second signal transmission unit, 125... Second detection unit, 126... Detection result output unit, 127... Recovery unit.

Claims

1. A distributed container monitoring system that monitors the operating status of a container to be monitored, the system comprising:
a first monitoring container that monitors the container to be monitored; and
a second monitoring container that monitors the first monitoring container, wherein
the first monitoring container includes:
a first signal transmission unit that transmits a signal indicating status confirmation to the container to be monitored; and
a first detection unit that detects a failure of the container to be monitored based on the response status of the container to be monitored to the signal transmitted by the first signal transmission unit, and
the second monitoring container includes:
a second signal transmission unit that transmits a signal indicating status confirmation to the first monitoring container; and
a second detection unit that detects a failure of the first monitoring container based on the response status of the first monitoring container to the signal transmitted by the second signal transmission unit.

2. The distributed container monitoring system according to claim 1,
comprising a plurality of monitoring containers, which are containers capable of monitoring the container to be monitored,
wherein each of the monitoring containers further includes a role determination unit that determines whether each of the monitoring containers functions as the first monitoring container or as the second monitoring container.

3. The distributed container monitoring system according to claim 2,
wherein, when the second detection unit detects a failure of the first monitoring container, the role determination unit causes another monitoring container to function as the first monitoring container.

4. The distributed container monitoring system according to claim 1 or 2,
wherein the first monitoring container further includes a recovery unit that performs recovery processing of the container environment when the first detection unit detects a failure of the container to be monitored.

5. A distributed container monitoring method executed by a distributed container monitoring system that monitors the operating status of a container to be monitored, the system comprising a first monitoring container that monitors the container to be monitored and a second monitoring container that monitors the first monitoring container, the method comprising:
in the first monitoring container,
a first signal transmission step of transmitting a signal indicating status confirmation to the container to be monitored; and
a first detection step of detecting a failure of the container to be monitored based on the response status of the container to be monitored to the signal transmitted in the first signal transmission step; and
in the second monitoring container,
a second signal transmission step of transmitting a signal indicating status confirmation to the first monitoring container; and
a second detection step of detecting a failure of the first monitoring container based on the response status of the first monitoring container to the signal transmitted in the second signal transmission step.