JP2009271858A - Computing system and program

Info

Publication number: JP2009271858A
Application number: JP2008123877A
Authority: JP
Inventors: Shinya Ando (安藤 真也); Koji Muramatsu (村松 孝治)
Original assignee: Toshiba Corp; Toshiba Solutions Corp
Current assignee: Toshiba Corp; Toshiba Digital Solutions Corp
Priority date: 2008-05-09
Filing date: 2008-05-09
Publication date: 2009-11-19

Abstract

PROBLEM TO BE SOLVED: To shorten the time from fault occurrence to fault detection and to expand the range of detectable faults by detecting the occurrence of a deadlock in a node in an event-driven manner.

SOLUTION: A cluster system includes a node 10 that executes a plurality of threads sharing resources. The node 10 includes an API hook part 131, a lock state table 133 and a deadlock detection part 134. The API hook part 131 hooks a lock acquisition request, issued by each of the plurality of threads, that requests acquisition of a lock for occupying the resources. The lock state table 133 registers the acquisition state of the lock by the plurality of threads on the basis of the lock acquisition request hooked by the API hook part 131. The deadlock detection part 134 detects the occurrence of a deadlock on the basis of the lock acquisition request hooked by the API hook part 131 and the acquisition state of the lock registered in the lock state table 133.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a computer system and a program for detecting a deadlock that occurs in a computer.

In recent years, computer systems called high availability cluster systems (hereinafter, HA cluster systems) have become known. Such a system is composed of, for example, a plurality of nodes (computers) and at least one storage device (shared disk). The plurality of nodes constituting the HA cluster system include, for example, an active node and a standby node.

In an HA cluster system, when a failure occurs in, for example, the active node, the standby node takes over the application, shared disk, IP (Internet Protocol) address, and the like that were operating on the active node (failover). This recovers from the failure and allows operation of the cluster system to continue. Hereinafter, the set of takeover units (for example, an application, a shared disk, and an IP address) that must be taken over from the active node to the standby node in order to continue the functions provided by the HA cluster system is collectively referred to as a service.

Failures that occur in, for example, the active node can be broadly classified into two types (a first failure and a second failure). The first failure is a server failure caused by the node itself stopping, for example through a shutdown. The second failure is an application failure caused by, for example, a malfunction of an application running on the node.

Each node constituting the HA cluster system described above is provided with middleware (an HA cluster daemon) for managing the HA cluster system. Because the HA cluster daemon automatically hands a service over to the standby node when a failure occurs, it needs a mechanism for detecting failures that occur in, for example, the active node. Of the first and second failures described above, the second failure is discussed below.

One technique for detecting the second failure (application failure) prepares an external command (state monitoring command) that monitors the state of the application to be monitored (hereinafter, the monitoring target application), and has the HA cluster daemon execute that command periodically in order to detect failures in the monitoring target application (hereinafter, Prior Art 1). According to Prior Art 1, a failure in the monitoring target application is detected when the return value of the state monitoring command indicates an abnormality. Note that the monitoring target application is composed of, for example, at least one process, and each process is composed of one or more threads.

A typical example of the application failure described above is a deadlock. For example, when one of the plurality of threads (of the monitoring target application) described above runs using a resource shared by those threads, that thread acquires a lock in order to exclusively control use of (access to) the shared resource by the other threads. While the lock is held, the other threads cannot use the shared resource. When the thread finishes executing, the lock it acquired is released, and the other threads can then use the shared resource.

A deadlock is a situation in which, under these conditions, a plurality of threads each acquire a lock and, while continuing to hold it, each wait for the release of a shared resource locked by another of the threads, so that processing stops.
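As an illustration of this definition (not part of the original disclosure), the following minimal Python sketch has two threads take two locks in opposite order. The names `lock_a`, `lock_b`, and `worker` are hypothetical, and the timeout exists only so that the example terminates instead of hanging as a real deadlock would.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
barrier = threading.Barrier(2)   # keeps the two threads in step
results = {}

def worker(name, first, second):
    with first:                  # hold the first lock
        barrier.wait()           # both threads now hold one lock each
        # Try to take the lock held by the other thread. A real deadlock
        # would block here forever; this sketch gives up after a timeout.
        got = second.acquire(timeout=0.2)
        results[name] = got
        if got:
            second.release()
        barrier.wait()           # keep holding `first` until both attempts end

t1 = threading.Thread(target=worker, args=("t1", lock_a, lock_b))
t2 = threading.Thread(target=worker, args=("t2", lock_b, lock_a))
t1.start()
t2.start()
t1.join()
t2.join()
print(results)
```

Both entries in `results` are `False`: each thread holds one lock while waiting for the other, which is the circular wait the paragraph above describes.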

No technique is known for directly detecting a deadlock of this kind, which is an application failure (an application deadlock), and failing over the service described above.

As a technique related to detecting the application deadlock described above, a technique that can detect a deadlock state easily and accurately without affecting the application program (hereinafter, Prior Art 2) has been disclosed (see, for example, Patent Document 1). According to Prior Art 2, a deadlock is detected by managing the lock acquisition state within the OS (Operating System).

In addition, a technique has been disclosed (hereinafter, Prior Art 3; see, for example, Patent Document 2) that detects the possibility of a deadlock based on access information for common resources, acquired during tests in which each of the processes making up a combination of processes that may cause a deadlock is run separately. According to Prior Art 3, the order in which every process of the application accesses the shared resources is recorded in advance, and the possibility of a deadlock is detected by comparing those records.

As a technique related to deadlock in a cluster system, a technique that shortens the overhead time for information transfer in the cluster system and thereby shortens the time until deadlock detection (hereinafter, Prior Art 4) has been disclosed (see, for example, Patent Document 3). Prior Art 4 is a method for detecting a deadlock on resources shared between the nodes of a cluster system. According to Prior Art 4, one lock management node is provided, and when access to the shared resource is not permitted, the occurrence of a deadlock is detected by requesting a check from the management node.
JP 2004-341878 A JP 2001-229043 A JP 2000-172659 A

In Prior Art 1 described above, specific failures of an application are detected by, for example, periodically executing an external state monitoring command.

Prior Art 1 therefore has no means of directly detecting, from outside, a deadlock, which is a typical application failure. Moreover, because the internal implementation of the monitoring target application is often unknown, it is difficult for an external command to monitor failures of every necessary function. Consequently, with Prior Art 1, even if an application failure occurs in which a particular function cannot proceed because of a deadlock, the deadlock may go undetected.

In addition, with Prior Art 1, even after a deadlock has occurred it may take time before the application reaches a state that the state monitoring command can detect. Even once it has reached such a state, there is a further time lag until the next periodic monitoring run. Prior Art 1 therefore cannot detect a deadlock immediately when it occurs.

As a result, Prior Art 1 suffers from reduced availability due to a longer mean time to repair and from a limited range of detectable failures.

Prior Art 2 described above requires a special OS and is therefore not suited to running the arbitrary applications that a cluster system manages. Furthermore, even when a deadlock is detected, it cannot take measures such as failover.

Prior Art 3 described above records in advance the order in which all processes of the application access the shared resources, and so does not detect deadlocks dynamically. Prior Art 3 is therefore not suited to failure detection in a cluster system.

Prior Art 4 described above provides one lock management node and detects a deadlock by requesting a check from that management node. It therefore cannot be applied as a means of detecting a deadlock of an application running on a node.

Accordingly, an object of the present invention is to provide a computer system and a program that detect the occurrence of a deadlock in a node in an event-driven manner, and can thereby shorten the time from failure occurrence to failure detection and expand the range of detectable failures.

According to one aspect of the present invention, there is provided a computer system comprising a computer that executes a plurality of threads sharing a resource, wherein the computer includes: acquisition means for acquiring, from each of the plurality of threads, a lock acquisition request that requests acquisition of a lock for occupying the resource; a lock state table in which the lock acquisition states of the plurality of threads are registered based on the acquired lock acquisition requests; and detection means for detecting the occurrence of a deadlock based on the acquired lock acquisition request and the lock acquisition states registered in the lock state table.

According to the present invention, detecting the occurrence of a deadlock in a node makes it possible to shorten the time from failure occurrence to failure detection and to expand the range of detectable failures.

Embodiments of the present invention are described below with reference to the drawings.

[First Embodiment]
FIG. 1 is a block diagram showing the functional configuration of a cluster system (computer system) according to the first embodiment of the present invention.

As shown in FIG. 1, the cluster system according to the present embodiment includes a plurality of nodes (computers) 10 and a shared disk 30. The plurality of nodes 10 include an active node and a plurality of standby nodes.

The plurality of nodes 10 are connected so that they can communicate with each other via, for example, a communication path.

The plurality of nodes 10 and the shared disk 30 are connected by, for example, FC (Fibre Channel) cables.

The cluster system described above is a high availability cluster system (HA cluster system): when a failure occurs in, for example, the node 10 operating as the active node (hereinafter, the active node 10) among the plurality of nodes 10 constituting the cluster system, the processing of the active node 10 is taken over by a node 10 operating as a standby node (hereinafter, simply the standby node 10). That is, when a failure occurs in the active node 10, processing (failover) is executed in which, for example, the application that was running on the active node 10, the shared disk 30, the IP (Internet Protocol) address of the active node 10, and the like (hereinafter, the service) are taken over by the standby node 10. As a result, the cluster system can continue operation on the standby node 10 even when a failure occurs in the active node 10.

FIG. 2 is a block diagram mainly showing the functional configuration of each of the plurality of nodes 10 shown in FIG. 1 (hereinafter, simply the node 10).

An application 11, an OS (Operating System) 12, a jacket library 13, and an HA (High Availability) cluster daemon 14 run on the node 10.

The application 11 is the application monitored by the jacket library 13 (hereinafter, the monitoring target application 11). The monitoring target application 11 is composed of, for example, at least one process; here it is described as being composed of one process. The process constituting the monitoring target application 11 is composed of one or more threads; here it is described as being composed of a plurality of threads.

That is, in the node 10 operating as the active node (the active node 10), for example, the plurality of threads constituting (the process constituting) the monitoring target application 11 are executed.

The plurality of threads constituting the monitoring target application 11 share, for example, the resources of the node 10.

When the plurality of threads are executed on the node 10, each thread issues to the OS 12 a lock acquisition request, which requests acquisition of a lock for occupying a resource of the node 10. Once the lock has been acquired, exclusive control over the resource for which the lock was acquired becomes possible.

Each thread also issues to the OS 12 a lock release request for releasing its occupation of a resource for which it holds the lock. Once the lock has been released, threads other than the one that held the lock can use the resource.

The OS 12 manages each of the plurality of threads constituting the monitoring target application 11. To do so, the OS 12 manages running threads using a thread ID that identifies each thread.

The OS 12 includes a lock acquisition unit 121 and a lock release unit 122. The lock acquisition unit 121 executes processing to acquire a lock in response to a lock acquisition request from each of the threads. If the lock to be acquired in response to a lock acquisition request has already been acquired by another thread, the lock acquisition unit 121 waits until that thread releases the lock and then executes the lock acquisition processing.

The lock release unit 122 executes processing to release a lock in response to a lock release request from each of the threads.

The jacket library 13 hooks (intercepts) the lock acquisition requests and lock release requests that each of the threads constituting the monitoring target application 11 issues to the OS 12, and thereby manages the lock acquisition state of each thread. The jacket library 13 also detects the occurrence of a deadlock based on a hooked lock acquisition request and the lock acquisition states of the threads.

As shown in FIG. 2, the jacket library 13 includes an API (Application Program Interface) hook unit 131, a lock state management unit 132, a lock state table 133, a deadlock detection unit 134, and a deadlock notification unit 135.

In the present embodiment, the API hook unit 131, the lock state management unit 132, the deadlock detection unit 134, and the deadlock notification unit 135 are assumed to be implemented by a computer (not shown) of the node 10 executing a program stored in, for example, a memory (not shown) provided in the node 10. This program can be stored in advance on a computer-readable storage medium and distributed. The program may also be downloaded to the computer via a network.

Also in the present embodiment, the area for the lock state table 133 is allocated, for example, in the memory provided in the node 10, within the memory space given to (the process constituting) the monitoring target application 11.

In other words, the jacket library 13 shown in FIG. 2 consists of four modules (the API hook unit 131, the lock state management unit 132, the deadlock detection unit 134, and the deadlock notification unit 135) and one piece of data (the lock state table 133).

The API hook unit 131 hooks the lock acquisition requests and lock release requests that the threads constituting the monitoring target application 11 issue to the OS 12. Each lock acquisition request and lock release request includes the address of the lock to be acquired or released in response to the request. The lock address is an address in memory managed by the OS 12.

When a lock acquisition request is hooked, the API hook unit 131 notifies the lock state management unit 132 of the lock address included in the request and of the fact that lock acquisition will be executed in response to the request (hereinafter, “execution of lock acquisition”).

Also, when a lock acquisition request is hooked, the API hook unit 131 invokes the lock acquisition unit (lock acquisition API) 121 included in the OS 12 in response to the request. The lock acquisition unit 121 then executes the lock acquisition processing in response to the lock acquisition request.

When a lock release request is hooked, the API hook unit 131 notifies the lock state management unit 132 of the lock address included in the request and of the fact that lock release will be executed in response to the request (hereinafter, “execution of lock release”).

Also, when a lock release request is hooked, the API hook unit 131 invokes the lock release unit (lock release API) 122 included in the OS 12 in response to the request. The lock release unit 122 then executes the lock release processing in response to the lock release request.

The API hook unit 131 hooks the responses from the OS 12 (from the lock acquisition unit 121 and the lock release unit 122 included in it): the response to a lock acquisition request (hereinafter, the lock acquisition response) and the response to a lock release request (hereinafter, the lock release response). The lock acquisition response and the lock release response include the address of the lock that was acquired or released. The API hook unit 131 also returns the lock acquisition response and the lock release response to (the thread of) the monitoring target application 11.

When a lock acquisition response is hooked, the API hook unit 131 notifies the lock state management unit 132 of the lock address included in the response and of the fact that the lock acquisition response was returned (hereinafter, “response to lock acquisition”). Similarly, when a lock release response is hooked, the API hook unit 131 notifies the lock state management unit 132 of the lock address included in the response and of the fact that the lock release response was returned (hereinafter, “response to lock release”).

By notifying the lock state management unit 132 in this way, the API hook unit 131 requests the lock state management unit 132 to update the lock state table 133.

In addition, when the lock acquisition unit 121 (lock acquisition API) is executed in response to a lock acquisition request from a thread of the monitoring target application 11, the API hook unit 131 requests the deadlock detection unit 134 to perform deadlock detection processing. At this time, the API hook unit 131 outputs the lock address to the deadlock detection unit 134.
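The hooking flow described above can be sketched in Python as a thin wrapper around a lock. This is an illustrative sketch only, not the patent's implementation: `HookedLock`, the notification names, and the two callbacks are hypothetical, `threading.Lock` stands in for the OS lock API, `id()` stands in for the lock address, and the exact ordering of notification and deadlock check is an assumption.

```python
import threading

class HookedLock:
    """Wraps a real lock and reports each request and response (hypothetical)."""

    def __init__(self, notify, check_deadlock):
        self._real = threading.Lock()     # stand-in for the OS lock API
        self._notify = notify             # stand-in for lock state management
        self._check = check_deadlock      # stand-in for deadlock detection
        self.addr = id(self._real)        # stand-in for the lock address

    def acquire(self):
        # Request hooked: report "execution of lock acquisition".
        self._notify("acquire_called", self.addr)
        # Ask for a deadlock check before calling the real acquisition API.
        self._check(self.addr)
        self._real.acquire()
        # Response hooked: report "response to lock acquisition".
        self._notify("acquire_returned", self.addr)

    def release(self):
        self._notify("release_called", self.addr)
        self._real.release()
        self._notify("release_returned", self.addr)

# Usage: record the order in which the hook reports events.
events = []
lock = HookedLock(
    notify=lambda kind, addr: events.append(kind),
    check_deadlock=lambda addr: events.append("deadlock_check"),
)
lock.acquire()
lock.release()
print(events)
```

The wrapper passes only the lock address to its callbacks, mirroring the text, in which the thread ID is looked up separately by the receiving unit.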

On receiving a notification from the API hook unit 131, the lock state management unit 132 queries the OS 12, which manages the threads constituting the monitoring target application 11, for the thread ID identifying the thread that issued the lock acquisition request or lock release request hooked by the API hook unit 131. The lock state management unit 132 thereby obtains the thread ID.

The lock state management unit 132 updates the lock state table 133 based on the notification from the API hook unit 131 and the obtained thread ID. The lock acquisition states of the plurality of threads constituting the monitoring target application 11 are registered in the lock state table 133. The table is updated by changing the state (lock acquisition state) of the combination of the lock address notified by the API hook unit 131 and the thread ID obtained by the lock state management unit 132 to the state corresponding to the content of the notification (for example, “execution of lock acquisition”).

If the thread ID or lock address to be updated is not in (not registered in) the lock state table 133, the lock state management unit 132 extends the lock state table 133.
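A minimal Python sketch of such a table update follows. The class name, the notification names, and in particular the mapping in `NEXT_STATE` from notification to resulting state are assumptions made for illustration; the text above only says that the state is changed according to the notification. `threading.get_ident` stands in for the query to the OS for the thread ID.

```python
import threading
from enum import Enum

class LockState(Enum):
    REQUESTED = "lock requested"
    HELD = "locked"

# Assumed mapping from notification to resulting state (None means "no lock").
NEXT_STATE = {
    "acquire_called": LockState.REQUESTED,
    "acquire_returned": LockState.HELD,
    "release_called": LockState.HELD,     # assumed: held until the OS responds
    "release_returned": None,
}

class LockStateManager:
    def __init__(self, current_thread_id=threading.get_ident):
        self._current_thread_id = current_thread_id
        self._guard = threading.Lock()    # the table is shared by all threads
        self.table = {}                   # (lock address, thread ID) -> state

    def on_notification(self, kind, lock_addr):
        thread_id = self._current_thread_id()
        new_state = NEXT_STATE[kind]
        with self._guard:
            if new_state is None:
                self.table.pop((lock_addr, thread_id), None)
            else:
                # Assigning to a missing key adds the entry, which plays
                # the role of extending the table.
                self.table[(lock_addr, thread_id)] = new_state

# Usage with a fixed thread ID so the result is easy to read.
manager = LockStateManager(current_thread_id=lambda: "T1")
manager.on_notification("acquire_called", "L1")
after_request = dict(manager.table)
manager.on_notification("acquire_returned", "L1")
after_acquire = dict(manager.table)
manager.on_notification("release_called", "L1")
manager.on_notification("release_returned", "L1")
```

A sparse dictionary is used here so that unseen lock addresses and thread IDs need no explicit resize step; a fixed two-dimensional array, as the figure suggests, would work equally well.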

Based on the request from the API hook unit 131, the deadlock detection unit 134 checks whether a deadlock would occur if the lock were acquired in response to the lock acquisition request hooked by the API hook unit 131. In this way, the deadlock detection unit 134 detects the occurrence of a deadlock.

At this time, the deadlock detection unit 134 obtains the thread ID identifying the thread that issued the lock acquisition request by querying the OS 12. The deadlock detection unit 134 detects the occurrence of a deadlock based on the obtained thread ID, the lock address output by the API hook unit 131, and the lock acquisition states registered in the lock state table 133. When the occurrence of a deadlock is detected, the deadlock detection unit 134 outputs to the deadlock notification unit 135 a request to notify the occurrence of the deadlock.

The details of the deadlock detection processing performed by the deadlock detection unit 134 are described later.
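Pending that detailed description, the kind of check that these three inputs allow can be sketched as a generic wait-for cycle search. This is a textbook technique offered as an assumption, not necessarily the procedure the patent specifies; the function names and the string state values are hypothetical, and each thread is assumed to wait for at most one lock at a time.

```python
HELD, REQUESTED = "locked", "lock requested"

def holder_of(table, lock_addr):
    """Return the thread that currently holds lock_addr, or None."""
    for (addr, tid), state in table.items():
        if addr == lock_addr and state == HELD:
            return tid
    return None

def lock_awaited_by(table, thread_id):
    """Return the lock that thread_id is waiting for, or None."""
    for (addr, tid), state in table.items():
        if tid == thread_id and state == REQUESTED:
            return addr
    return None

def would_deadlock(table, thread_id, lock_addr):
    """Follow holder -> awaited lock -> holder ... back to thread_id."""
    visited = set()
    current = lock_addr
    while current is not None and current not in visited:
        visited.add(current)
        holder = holder_of(table, current)
        if holder is None:
            return False          # the lock is free somewhere along the chain
        if holder == thread_id:
            return True           # the chain leads back to the requester
        current = lock_awaited_by(table, holder)
    return False

# T1 holds L1 and waits for L2; T2 holds L2.
table = {("L1", "T1"): HELD, ("L2", "T2"): HELD, ("L2", "T1"): REQUESTED}
print(would_deadlock(table, "T2", "L1"))   # True
print(would_deadlock(table, "T3", "L1"))   # False
```

In the first call T2 asks for L1, which T1 holds while waiting for T2's own lock, so the wait is circular. In the second call T3 would merely wait behind T1, which is not a deadlock.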

The deadlock notification unit 135 notifies the HA cluster daemon 14 of the occurrence of a deadlock based on the notification request output by the deadlock detection unit 134.

The HA cluster daemon 14 has functions for managing and operating the cluster system. When a failure occurs in, for example, the monitoring target application 11 on the node 10 operating as the active node, the HA cluster daemon 14 executes processing (failover) to hand over the service of that node 10 to another node 10 operating as a standby node.

When the deadlock notification unit 135 included in the jacket library 13 notifies it of the occurrence of a deadlock, the HA cluster daemon 14 treats this as a failure (deadlock) in the monitoring target application 11 and takes measures such as failing over the service.

Note that the HA cluster daemon 14 may instead be configured to restart the monitoring target application 11 as its response to a failure in the monitoring target application 11.
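The two responses described above (failover or restart) can be sketched as a policy choice in the daemon. Everything in this Python sketch is hypothetical: the class names, the `policy` parameter, and the action strings. A direct method call stands in for the notification, which in a real system would cross a process boundary between the monitored application and the daemon.

```python
class HAClusterDaemon:
    """Records the action it would take when told of a deadlock (sketch)."""

    def __init__(self, policy="failover"):
        if policy not in ("failover", "restart"):
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.actions = []

    def on_deadlock(self, app_name):
        if self.policy == "failover":
            action = f"fail over service of {app_name} to standby node"
        else:
            action = f"restart {app_name}"
        self.actions.append(action)
        return action

class DeadlockNotifier:
    """Forwards a detected deadlock to the daemon (sketch)."""

    def __init__(self, daemon):
        self._daemon = daemon

    def notify(self, app_name):
        return self._daemon.on_deadlock(app_name)

failover_daemon = HAClusterDaemon(policy="failover")
restart_daemon = HAClusterDaemon(policy="restart")
print(DeadlockNotifier(failover_daemon).notify("application 11"))
print(DeadlockNotifier(restart_daemon).notify("application 11"))
```

Because the notification is event-driven, the daemon acts as soon as the deadlock is reported rather than waiting for the next periodic monitoring run.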

図３は、図２に示すロック状態テーブル１３３のデータ構造の一例を示す。図３に示すように、ロック状態テーブル１３３には、ロックのアドレス及びスレッドＩＤに対応付けてロックの取得状態が登録される。ここでは、ロックのアドレスはＬ（Ｌ１、Ｌ２、Ｌ３、…）、スレッドＩＤはＴ（Ｔ１、Ｔ２、Ｔ３、…）と表記されている。 FIG. 3 shows an example of the data structure of the lock state table 133 shown in FIG. As shown in FIG. 3, in the lock state table 133, the lock acquisition state is registered in association with the lock address and the thread ID. Here, the lock address is represented as L (L1, L2, L3,...), And the thread ID is represented as T (T1, T2, T3,...).

ここで、ロックの取得状態には、「ロック中」、「ロック要求中」及び「ロックなし」が含まれる。図３に示す例では、「ロック中」は「○」、「ロック要求中」は「△」、「ロックなし」は「ブランク（なし）」で表されている。 Here, the lock acquisition states include “locked”, “lock requested”, and “no lock”. In the example shown in FIG. 3, “locked” is represented by “○”, “lock requested” by “△”, and “no lock” by a blank (empty) cell.

図３に示す例では、ロック状態テーブル１３３には、ロックのアドレス「Ｌ１」及びスレッドＩＤ「Ｔ１」に対応付けてロックの取得状態「○」が登録されている。これによれば、スレッドＩＤ「Ｔ１」によって識別されるスレッドが、ロックのアドレスが「Ｌ１」であるロックを「ロック中」であることが示されている。 In the example illustrated in FIG. 3, the lock acquisition state “○” is registered in the lock state table 133 in association with the lock address “L1” and the thread ID “T1”. This indicates that the thread identified by the thread ID “T1” holds the lock at address “L1” (state “locked”).

ロック状態テーブル１３３には、ロックのアドレス「Ｌ２」及びスレッドＩＤ「Ｔ３」に対応付けてロックの取得状態「○」が登録されている。これによれば、スレッドＩＤ「Ｔ３」によって識別されるスレッドが、ロックのアドレスが「Ｌ２」であるロックを「ロック中」であることが示されている。 In the lock state table 133, the lock acquisition state “○” is registered in association with the lock address “L2” and the thread ID “T3”. This indicates that the thread identified by the thread ID “T3” holds the lock at address “L2” (state “locked”).

また、ロック状態テーブル１３３には、ロックのアドレス「Ｌ３」及びスレッドＩＤ「Ｔ１」に対応付けてロックの取得状態「△」が登録されている。これによれば、スレッドＩＤ「Ｔ１」によって識別されるスレッドが、ロックのアドレスが「Ｌ３」であるロックを「ロック要求中」であることが示されている。 In the lock state table 133, the lock acquisition state “△” is also registered in association with the lock address “L3” and the thread ID “T1”. This indicates that the thread identified by the thread ID “T1” is requesting the lock at address “L3” (state “lock requested”).

なお、ロック状態テーブル１３３において、上記した以外については「ロック中」及び「ロック要求中」でない、つまり、「ロックなし」であることが示されている。 In the lock state table 133, every other combination of lock address and thread ID is neither “locked” nor “lock requested”, that is, it is in the “no lock” state.
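
The table of FIG. 3 can be pictured as a sparse two-dimensional map. The following is a minimal Python sketch, not the implementation described in this publication: the names `LockState` and `LockStateTable` are hypothetical, and lock addresses and thread IDs are written as plain strings for readability.

```python
from enum import Enum


class LockState(Enum):
    LOCKED = "○"      # "locked": the thread holds the lock
    REQUESTED = "△"   # "lock requested": the thread is waiting for the lock
    NONE = ""         # "no lock": blank cell


class LockStateTable:
    """Sparse table keyed by (lock address, thread ID).

    Cells that were never set, or were reset to NONE, read as "no lock".
    """

    def __init__(self):
        self._cells = {}

    def set(self, lock_addr, thread_id, state):
        if state is LockState.NONE:
            self._cells.pop((lock_addr, thread_id), None)
        else:
            self._cells[(lock_addr, thread_id)] = state

    def get(self, lock_addr, thread_id):
        return self._cells.get((lock_addr, thread_id), LockState.NONE)


# The three entries described for FIG. 3.
table = LockStateTable()
table.set("L1", "T1", LockState.LOCKED)
table.set("L2", "T3", LockState.LOCKED)
table.set("L3", "T1", LockState.REQUESTED)
```

Storing only the non-blank cells keeps the table small when most lock and thread pairs are unrelated, which is an assumption of this sketch rather than something the text specifies.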

次に、図４を参照して、上記したロック状態管理部１３２によるロック状態テーブル１３３の更新について説明する。 Next, update of the lock state table 133 by the lock state management unit 132 will be described with reference to FIG.

ロック状態管理部１３２は、ＡＰＩフック部１３１からの通知及び取得されたスレッドＩＤ（要求元であるスレッドを識別するためのスレッドＩＤ）に基づいてロック状態テーブル１３３を更新する。ＡＰＩフック部１３１からの通知（の内容）には、上記したように「ロック取得の実行」、「ロック取得の応答」、「ロック解放の実行」及び「ロック解放の応答」が含まれる。 The lock state management unit 132 updates the lock state table 133 based on the notification from the API hook unit 131 and the acquired thread ID (the thread ID identifying the requesting thread). As described above, the content of a notification from the API hook unit 131 is one of “execution of lock acquisition”, “lock acquisition response”, “execution of lock release”, and “lock release response”.

ここで、図４に示すように、ＡＰＩフック部１３１からの通知の内容が「ロック取得の実行」である場合には、ロック状態管理部１３２は、ロック状態テーブル１３３におけるロックの取得状態を「ロック要求中」に更新する。例えばＡＰＩフック部１３１から通知されたロックのアドレスが「Ｌ１」、ロック状態管理部１３２によって取得されたスレッドＩＤが「Ｔ３」であり、ＡＰＩフック部１３１からの通知の内容が「ロック取得の実行」である場合を想定する。この場合には、ロック状態管理部１３２は、ロックのアドレス「Ｌ１」及びスレッドＩＤ「Ｔ３」に対応付けられているロック状態テーブル１３３におけるロックの取得状態を「ロック要求中」に更新する。 Here, as shown in FIG. 4, when the content of the notification from the API hook unit 131 is “execution of lock acquisition”, the lock state management unit 132 updates the lock acquisition state in the lock state table 133 to “lock requested”. For example, assume that the lock address notified from the API hook unit 131 is “L1”, the thread ID acquired by the lock state management unit 132 is “T3”, and the content of the notification from the API hook unit 131 is “execution of lock acquisition”. In this case, the lock state management unit 132 updates the lock acquisition state associated with the lock address “L1” and the thread ID “T3” in the lock state table 133 to “lock requested”.

図４に示すように、ＡＰＩフック部１３１からの通知の内容が「ロック取得の応答」である場合には、ロック状態管理部１３２は、ロック状態テーブル１３３におけるロックの取得状態を「ロック中」に更新する。 As illustrated in FIG. 4, when the content of the notification from the API hook unit 131 is “lock acquisition response”, the lock state management unit 132 updates the lock acquisition state in the lock state table 133 to “locked”.

図４に示すように、ＡＰＩフック部１３１からの通知の内容が「ロック解放の実行」である場合には、ロック状態テーブル１３３におけるロックの取得状態は変更していないため、ロック状態管理部１３２は、ロック状態テーブル１３３の更新は行わない。 As shown in FIG. 4, when the content of the notification from the API hook unit 131 is “execution of lock release”, the lock acquisition state in the lock state table 133 has not yet changed, so the lock state management unit 132 does not update the lock state table 133.

また、図４に示すように、ＡＰＩフック部１３１からの通知の内容が「ロック解放の応答」である場合には、ロック状態管理部１３２は、ロック状態テーブル１３３におけるロックの取得状態を「ロックなし」に更新する。 As shown in FIG. 4, when the content of the notification from the API hook unit 131 is “lock release response”, the lock state management unit 132 updates the lock acquisition state in the lock state table 133 to “no lock”.
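
The four update rules of FIG. 4 amount to a small lookup from notification content to resulting state. The sketch below is illustrative only: the function name `update_lock_state`, the string constants, and the use of a plain dictionary keyed by (lock address, thread ID) are assumptions, not part of the publication.

```python
LOCKED = "locked"
REQUESTED = "lock requested"
NO_LOCK = "no lock"

# Notification content -> resulting state; None means "leave the table as is".
TRANSITIONS = {
    "execution of lock acquisition": REQUESTED,
    "lock acquisition response": LOCKED,
    "execution of lock release": None,
    "lock release response": NO_LOCK,
}


def update_lock_state(table, lock_addr, thread_id, notification):
    """Apply one API-hook notification to a dict keyed by (lock, thread)."""
    new_state = TRANSITIONS[notification]  # KeyError on an unknown notification
    if new_state is None:
        return
    if new_state == NO_LOCK:
        table.pop((lock_addr, thread_id), None)  # absent entry == "no lock"
    else:
        table[(lock_addr, thread_id)] = new_state


# The example from the text: lock L1, thread T3, "execution of lock acquisition".
table = {}
update_lock_state(table, "L1", "T3", "execution of lock acquisition")
```

After the call, the entry for (L1, T3) is in the “lock requested” state; a later “lock acquisition response” for the same pair would move it to “locked”.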

次に、図５のフローチャートを参照して、前述したデッドロック検出部１３４によるデッドロックの発生の検出処理の処理手順について説明する。 Next, with reference to the flowchart of FIG. 5, the processing procedure of the deadlock occurrence detection process by the deadlock detection unit 134 described above will be described.

デッドロックの発生の検出処理は、デッドロック検出部１３４がＡＰＩフック部１３１からの要求（デッドロックの発生の確認の要求）を受けると実行される。このとき、デッドロック検出部１３４は、ＡＰＩフック部１３１からロックのアドレスを受け取る。また、デッドロック検出部１３４は、ＯＳ１２に問い合わせることにより、スレッドＩＤを取得する。ここで取得されたスレッドＩＤは、ＡＰＩフック部１３１から受け取られたロックのアドレスを含むロック取得要求（ＡＰＩフック部１３１によってフックされたロック取得要求）の要求元であるスレッドを識別するためのＩＤである。以下、デッドロック検出部１３４がＡＰＩフック部１３１から受け取ったロックのアドレスをＬ、当該デッドロック検出部１３４によって取得されたスレッドＩＤをＴとして説明する。 The deadlock detection process is executed when the deadlock detection unit 134 receives a request from the API hook unit 131 (a request to check whether a deadlock has occurred). At this time, the deadlock detection unit 134 receives the lock address from the API hook unit 131. The deadlock detection unit 134 also acquires a thread ID by querying the OS 12. The thread ID acquired here identifies the thread that issued the lock acquisition request containing the lock address received from the API hook unit 131 (that is, the lock acquisition request hooked by the API hook unit 131). In the following description, L denotes the lock address that the deadlock detection unit 134 received from the API hook unit 131, and T denotes the thread ID acquired by the deadlock detection unit 134.

まず、デッドロック検出部１３４は、初期化処理として、スタックをクリアする（ステップＳ１）。このスタックは、例えばデータを後入れ先出しの構造で保持するものであり、ハードウェアとソフトウェアの機能の組み合わせによって実装される。スタックは、例えばノード１０に備えられているメモリにその領域が確保される。 First, the deadlock detection unit 134 clears the stack as an initialization process (step S1). This stack holds, for example, data in a last-in first-out structure, and is implemented by a combination of hardware and software functions. For example, an area of the stack is secured in a memory provided in the node 10.

次に、デッドロック検出部１３４は、Ｔをスタックに追加する（ステップＳ２）。ここでは、Ｔは、デッドロック検出部１３４によって取得されたスレッドＩＤである。 Next, the deadlock detector 134 adds T to the stack (step S2). Here, T is a thread ID acquired by the deadlock detection unit 134.

デッドロック検出部１３４は、ＡＰＩフック部１３１から受け取ったＬのロックがＴによって識別されるスレッド以外のスレッド（以下、単に他のスレッドと表記）に取得されているか否かを判定する（ステップＳ３）。このとき、デッドロック検出部１３４は、ロック状態テーブル１３３を参照して判定処理を実行する。デッドロック検出部１３４は、ロック状態テーブル１３３においてＬのロックを「ロック中」であるスレッドが存在するか否かを判定する。 The deadlock detection unit 134 determines whether or not the L lock received from the API hook unit 131 has been acquired by a thread other than the thread identified by T (hereinafter simply referred to as another thread) (step S3). ). At this time, the deadlock detection unit 134 performs a determination process with reference to the lock state table 133. The deadlock detection unit 134 determines whether or not there is a thread that is “locking” the L lock in the lock state table 133.

Ｌのロックが取得されていると判定された場合（ステップＳ３のＹＥＳ）、デッドロック検出部１３４は、Ｔを、当該Ｌのロックを「ロック中」である他のスレッドを識別するためのスレッドＩＤ（つまり、Ｔ＝Ｌのロックを取得しているスレッドを識別するためのスレッドＩＤ）とする（ステップＳ４）。これにより、ステップＳ４以降の処理においては、上記した他のスレッドを識別するためのスレッドＩＤをＴとして処理が実行される。 When it is determined that the lock at L is held (YES in step S3), the deadlock detection unit 134 sets T to the thread ID identifying the other thread that holds the lock at L (that is, T = the thread ID of the thread holding the lock at L) (step S4). In the processing from step S4 onward, T therefore denotes the thread ID of that other thread.

次に、デッドロック検出部１３４は、Ｔ（上記した他のスレッドを識別するためのスレッドＩＤ）がスタックに存在するか否かを判定する（ステップＳ５）。 Next, the deadlock detection unit 134 determines whether or not T (thread ID for identifying the other thread described above) exists in the stack (step S5).

スタックに存在しないと判定された場合（ステップＳ５のＮＯ）、デッドロック検出部１３４は、ロック状態テーブル１３３を参照して、Ｔによって識別されるスレッドが「ロック要求中」であるロックがあるか否かを判定する（ステップＳ６）。 If it is determined that T does not exist in the stack (NO in step S5), the deadlock detection unit 134 refers to the lock state table 133 and determines whether there is a lock for which the thread identified by T is in the “lock requested” state (step S6).

「ロック要求中」であるロックがあると判定された場合（ステップＳ６のＹＥＳ）、デッドロック検出部１３４は、Ｌを、当該ロックのアドレス（つまり、Ｌ＝Ｔによって識別されるスレッドが「ロック要求中」であるロックのアドレス）とする（ステップＳ７）。これにより、ステップＳ７以降の処理においては、上記したＴによって識別されるスレッドが「ロック要求中」であるロックのアドレスをＬとして処理が実行される。 If it is determined that there is a lock in the “lock requested” state (YES in step S6), the deadlock detection unit 134 sets L to the address of that lock (that is, L = the address of the lock that the thread identified by T is requesting) (step S7). In the processing from step S7 onward, L therefore denotes the address of the lock requested by the thread identified by T.

上記したステップＳ７の処理が実行された場合、ステップＳ２に戻って処理が繰り返される。 When the process of step S7 described above is executed, the process returns to step S2 and is repeated.

一方、ステップＳ３においてＬのロックが取得されていないと判定された場合、デッドロック検出部１３４によって取得されたスレッドＩＤによって識別されるスレッドがロックを取得した場合であってもデッドロックは発生しないとして、デッドロック検出部１３４は、デッドロックの発生を検出することなく処理を終了する。 On the other hand, if it is determined in step S3 that the lock at L is not held, no deadlock will occur even if the thread identified by the thread ID acquired by the deadlock detection unit 134 acquires the lock, so the deadlock detection unit 134 ends the process without detecting a deadlock.

また、ステップＳ６において「ロック要求中」であるロックがないと判定された場合、デッドロック検出部１３４は、デッドロックの発生を検出することなく処理を終了する。 If it is determined in step S6 that there is no lock that is “lock requested”, the deadlock detector 134 ends the process without detecting the occurrence of a deadlock.

一方、ステップＳ５においてＴがスタックに存在すると判定された場合、デッドロック検出部１３４は、デッドロックの発生を検出する（ステップＳ８）。このように、ロック状態テーブル１３３に登録されているロックの取得状態において、ロック中及びロック要求中の（ロックの）ループが発生する場合に、デッドロック検出部１３４は、デッドロックの発生を検出する。 On the other hand, when it is determined in step S5 that T exists in the stack, the deadlock detection unit 134 detects the occurrence of a deadlock (step S8). In this way, the deadlock detection unit 134 detects a deadlock when the lock acquisition states registered in the lock state table 133 form a loop of “locked” and “lock requested” relations.
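
Steps S1 to S8 of FIG. 5 can be sketched as a loop that alternates between "who holds this lock?" and "which lock is that holder waiting for?". The Python below is a minimal illustration under stated assumptions: the table is a plain dictionary keyed by (lock address, thread ID), the helper names `holder_of` and `requested_by` are hypothetical, and each thread is assumed to be waiting for at most one lock at a time.

```python
LOCKED = "locked"
REQUESTED = "lock requested"


def holder_of(table, lock_addr, exclude):
    """Return a thread other than `exclude` that holds lock_addr, or None."""
    for (lock, thread), state in table.items():
        if lock == lock_addr and state == LOCKED and thread != exclude:
            return thread
    return None


def requested_by(table, thread_id):
    """Return the lock that thread_id is waiting for, or None."""
    for (lock, thread), state in table.items():
        if thread == thread_id and state == REQUESTED:
            return lock
    return None


def detect_deadlock(table, lock_addr, thread_id):
    stack = []                                   # S1: clear the stack
    while True:
        stack.append(thread_id)                  # S2: push T
        holder = holder_of(table, lock_addr, thread_id)
        if holder is None:                       # S3 NO: L is free, no deadlock
            return False
        thread_id = holder                       # S4: T = holder of L
        if thread_id in stack:                   # S5 YES -> S8: loop found
            return True
        lock_addr = requested_by(table, thread_id)
        if lock_addr is None:                    # S6 NO: holder is not waiting
            return False
        # S7: L = lock requested by T; continue from S2
```

The loop terminates because every pass pushes a thread that was not yet on the stack, and the number of threads in the table is finite.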

ここで、図６を参照して、上記した検出部１３４のデッドロックの発生の検出処理について具体的に説明する。図６は、デッドロックが発生する場合のロックの取得状態を表すロック状態テーブル１３３の一例を示す。 Here, the deadlock detection process performed by the deadlock detection unit 134 described above will be explained concretely with reference to FIG. 6. FIG. 6 shows an example of the lock state table 133 representing the lock acquisition states when a deadlock occurs.

ここでは、デッドロック検出部１３４がＡＰＩフック部１３１から受け取ったロックのアドレスはＬ３であり、当該デッドロック検出部１３４がＯＳ１２に問い合わせることによって取得されたスレッドＩＤはＴ１であるものとして説明する。つまり、Ｔ１によって識別されるスレッドが、Ｌ３のロックを取得しようとしている場合を想定している。 Here, it is assumed that the address of the lock received by the deadlock detection unit 134 from the API hook unit 131 is L3, and the thread ID acquired when the deadlock detection unit 134 inquires of the OS 12 is T1. That is, it is assumed that the thread identified by T1 is trying to acquire the lock of L3.

なお、図６に示す例では、上記したＴ１及びＬ３に対応付けて「ロック要求中（△）」が、ロック状態テーブル１３３に登録されている。 In the example shown in FIG. 6, “lock requested (Δ)” is registered in the lock state table 133 in association with T1 and L3 described above.

この場合、デッドロック検出部１３４は、取得されたスレッドＩＤであるＴ１をスタックに追加する。 In this case, the deadlock detection unit 134 adds T1 that is the acquired thread ID to the stack.

次に、デッドロック検出部１３４は、ＡＰＩフック部１３１から受け取ったＬ３のロックがＴ１によって識別されるスレッド以外のスレッド（他のスレッド）によって取得されているか否か（つまり、「ロック中」であるか否か）を、ロック状態テーブル１３３を参照して判定する。図６に示すロック状態テーブル１３３の例では、ＡＰＩフック部１３１から受け取ったＬ３のロックはＴ２によって識別されるスレッドによって取得されている。したがって、デッドロック検出部１３４は、Ｌ３のロックが他のスレッド（Ｔ２によって識別されるスレッド）によって取得されていると判定する。 Next, the deadlock detection unit 134 refers to the lock state table 133 to determine whether the lock at L3 received from the API hook unit 131 is held by a thread other than the thread identified by T1 (that is, whether it is in the “locked” state for another thread). In the example of the lock state table 133 shown in FIG. 6, the lock at L3 is held by the thread identified by T2. Therefore, the deadlock detection unit 134 determines that the lock at L3 is held by another thread (the thread identified by T2).

デッドロック検出部１３４は、Ｔ２がスタックに存在するか否かを判定する。この段階ではスタックにはＴ１のみが存在するため、デッドロック検出部１３４は、Ｔ２がスタックに存在しないと判定する。 The deadlock detector 134 determines whether or not T2 exists in the stack. At this stage, since only T1 exists in the stack, the deadlock detector 134 determines that T2 does not exist in the stack.

デッドロック検出部１３４は、ロック状態テーブル１３３を参照して、Ｔ２によって識別されるスレッドが「ロック要求中」であるロックがあるか否かを判定する。図６に示すロック状態テーブル１３３の例では、Ｔ２によって識別されるスレッドは、Ｌ４のロックを「ロック要求中」である。したがって、デッドロック検出部１３４は、Ｔ２によって識別されるスレッドが「ロック要求中」であるロック（Ｌ４のロック）があると判定する。 The deadlock detection unit 134 refers to the lock state table 133 to determine whether there is a lock in which the thread identified by T2 is “lock requested”. In the example of the lock state table 133 illustrated in FIG. 6, the thread identified by T2 is “lock requested” for the lock of L4. Therefore, the deadlock detection unit 134 determines that there is a lock (L4 lock) in which the thread identified by T2 is “lock requested”.

この場合、デッドロック検出部１３４は、Ｔ２をスタックに追加する。Ｔ２がスタックに追加されると、デッドロック検出部１３４は、Ｌ４のロックがＴ２によって識別されるスレッド以外のスレッド（他のスレッド）によって取得されているか否かを、ロック状態テーブル１３３を参照して判定する。図６に示すロック状態テーブル１３３の例では、Ｌ４のロックはＴ４によって識別されるスレッドによって取得されている。したがって、デッドロック検出部１３４は、Ｌ４のロックが他のスレッド（Ｔ４によって識別されるスレッド）によって取得されていると判定する。 In this case, the deadlock detection unit 134 adds T2 to the stack. Once T2 has been added, the deadlock detection unit 134 refers to the lock state table 133 to determine whether the lock at L4 is held by a thread other than the thread identified by T2 (another thread). In the example of the lock state table 133 shown in FIG. 6, the lock at L4 is held by the thread identified by T4. Therefore, the deadlock detection unit 134 determines that the lock at L4 is held by another thread (the thread identified by T4).

デッドロック検出部１３４は、Ｔ４がスタックに存在するか否かを判定する。この段階ではスタックにはＴ１及びＴ２が存在するため、デッドロック検出部１３４は、Ｔ４がスタックに存在しないと判定する。 The deadlock detection unit 134 determines whether T4 exists in the stack. Since T1 and T2 exist in the stack at this stage, the deadlock detector 134 determines that T4 does not exist in the stack.

デッドロック検出部１３４は、ロック状態テーブル１３３を参照して、Ｔ４によって識別されるスレッドが「ロック要求中」であるロックがあるか否かを判定する。図６に示すロック状態テーブル１３３の例では、Ｔ４によって識別されるスレッドは、Ｌ１のロックを「ロック要求中」である。したがって、デッドロック検出部１３４は、Ｔ４によって識別されるスレッドが「ロック要求中」であるロック（Ｌ１のロック）があると判定する。 The deadlock detection unit 134 refers to the lock state table 133 to determine whether there is a lock whose thread identified by T4 is “lock requested”. In the example of the lock state table 133 illustrated in FIG. 6, the thread identified by T4 is “lock requested” for the lock of L1. Therefore, the deadlock detection unit 134 determines that there is a lock (L1 lock) in which the thread identified by T4 is “lock requested”.

この場合、デッドロック検出部１３４は、Ｔ４をスタックに追加する。Ｔ４がスタックに追加されると、デッドロック検出部１３４は、Ｌ１のロックがＴ４によって識別されるスレッド以外のスレッド（他のスレッド）によって取得されているか否かを、ロック状態テーブル１３３を参照して判定する。図６に示すロック状態テーブル１３３の例では、Ｌ１のロックはＴ１によって識別されるスレッドによって取得されている。したがって、デッドロック検出部１３４は、Ｌ１のロックが他のスレッド（Ｔ１によって識別されるスレッド）によって取得されていると判定する。 In this case, the deadlock detection unit 134 adds T4 to the stack. Once T4 has been added, the deadlock detection unit 134 refers to the lock state table 133 to determine whether the lock at L1 is held by a thread other than the thread identified by T4 (another thread). In the example of the lock state table 133 shown in FIG. 6, the lock at L1 is held by the thread identified by T1. Therefore, the deadlock detection unit 134 determines that the lock at L1 is held by another thread (the thread identified by T1).

デッドロック検出部１３４は、Ｔ１がスタックに存在するか否かを判定する。この場合、スタックにはＴ１、Ｔ２及びＴ４が存在するため、デッドロック検出部１３４は、Ｔ１がスタックに存在すると判定する。 The deadlock detection unit 134 determines whether T1 exists in the stack. In this case, since T1, T2, and T4 exist in the stack, the deadlock detector 134 determines that T1 exists in the stack.

このようにＴ１がスタックに存在すると判定された場合、図６に示すように複数のスレッド間（ここでは、Ｔ１、Ｔ２及びＴ４）においてロック中及びロック要求中のループが発生することになる。複数のスレッドの各々は、いずれかのロックに対して「ロック要求中」である場合には当該スレッドが取得しているロックの解放を実行できない。つまり、例えばＴ１（によって識別されるスレッド）はＴ２のロックの解放を、Ｔ２はＴ４のロックの解放を、Ｔ４はＴ１のロックの解放を待っている状態となるため、複数のスレッドが互いに占有している資源の解放を待ち合うことによりデッドロックが発生する。 When T1 is determined to exist in the stack in this way, a loop of “locked” and “lock requested” relations exists among a plurality of threads (here T1, T2, and T4), as shown in FIG. 6. A thread that is in the “lock requested” state for any lock cannot release the locks it already holds. That is, T1 (the thread identified by T1) is waiting for T2 to release its lock, T2 is waiting for T4, and T4 is waiting for T1, so the threads wait on one another to release the resources they occupy and a deadlock occurs.

したがって、デッドロック検出部１３４は、上記したようにロック中及びロック要求中のループが発生する場合にはデッドロックの発生を検出する。 Therefore, the deadlock detection unit 134 detects the occurrence of a deadlock when such a loop of “locked” and “lock requested” relations arises.
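
The walk through FIG. 6 can be replayed in a few lines. This is a sketch only: the function name `wait_chain` is hypothetical, and the dictionary below contains just the FIG. 6 entries that the text mentions (the figure itself may hold more).

```python
def wait_chain(table, lock_addr, thread_id):
    """Follow "requested -> held by" links; return (visited threads, deadlock?)."""
    stack = []
    while True:
        stack.append(thread_id)
        holders = [t for (l, t), s in table.items()
                   if l == lock_addr and s == "locked" and t != thread_id]
        if not holders:
            return stack, False          # the lock is free: no deadlock
        thread_id = holders[0]
        if thread_id in stack:
            return stack, True           # the chain loops back: deadlock
        wanted = [l for (l, t), s in table.items()
                  if t == thread_id and s == "lock requested"]
        if not wanted:
            return stack, False          # the holder is not waiting on anything
        lock_addr = wanted[0]


# Entries of FIG. 6 as far as the text describes them.
fig6 = {
    ("L1", "T1"): "locked", ("L3", "T1"): "lock requested",
    ("L3", "T2"): "locked", ("L4", "T2"): "lock requested",
    ("L4", "T4"): "locked", ("L1", "T4"): "lock requested",
}
stack, deadlock = wait_chain(fig6, "L3", "T1")
print(stack, deadlock)  # ['T1', 'T2', 'T4'] True
```

The stack ends as T1, T2, T4, matching the order in which the text pushes the thread IDs before T1 is found a second time.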

なお、デッドロック検出部１３４は、デッドロックの発生を検出した場合、当該デッドロックの発生をＨＡクラスタデーモン１４に通知することを要求する通知要求をデッドロック通知部１３５に出力する。デッドロック通知部１３５は、デッドロック検出部１３４からの通知要求に基づいてデッドロックの発生をＨＡクラスタデーモン１４に通知する。ＨＡクラスタデーモン１４では、当該デッドロック通知部１３５からの通知に応じて例えばフェイルオーバ等の処理が実行される。 When the deadlock detection unit 134 detects the occurrence of a deadlock, the deadlock detection unit 134 outputs a notification request for notifying the HA cluster daemon 14 of the occurrence of the deadlock to the deadlock notification unit 135. The deadlock notification unit 135 notifies the HA cluster daemon 14 of the occurrence of deadlock based on the notification request from the deadlock detection unit 134. In the HA cluster daemon 14, processing such as failover is executed in response to the notification from the deadlock notification unit 135.

次に、図７を参照して、監視対象アプリケーション１１を構成するスレッドによってロックが取得される際の処理の流れについて説明する。 Next, the flow of processing when a lock is acquired by a thread constituting the monitoring target application 11 will be described with reference to FIG.

まず、監視対象アプリケーション１１を構成するスレッドは、資源を占有するためのロック取得要求をＯＳ１２に対して出力する。このスレッドは、例えば運用系ノードとして動作しているノード１０において実行されているスレッドである。スレッドから出力されるロック取得要求には、当該ロック取得要求に応じて取得されるロックのアドレスが含まれる。 First, the thread constituting the monitoring target application 11 outputs a lock acquisition request for occupying resources to the OS 12. This thread is, for example, a thread that is executed in the node 10 operating as an active node. The lock acquisition request output from the thread includes the address of the lock acquired in response to the lock acquisition request.

ジャケットライブラリ１３のＡＰＩフック部１３１は、監視対象アプリケーション１１を構成するスレッドからのロック取得要求をフックする。ＡＰＩフック部１３１は、ロック取得要求がフックされると、ロック状態管理部１３２を実行する（ステップＳ１１）。このとき、ＡＰＩフック部１３１は、フックされたロック取得要求に含まれるロックのアドレス及び当該ロック取得要求に応じてロックの取得が実行される旨（ロック取得の実行）をロック状態管理部１３２に通知する。 The API hook unit 131 of the jacket library 13 hooks the lock acquisition request from the thread constituting the monitoring target application 11. When the lock acquisition request is hooked, the API hook unit 131 invokes the lock state management unit 132 (step S11). At this time, the API hook unit 131 notifies the lock state management unit 132 of the lock address included in the hooked lock acquisition request and of the fact that lock acquisition is about to be executed in response to that request (“execution of lock acquisition”).

ＡＰＩフック部１３１によってロック状態管理部１３２が実行されると、ロック状態管理部１３２は、ＡＰＩフック部１３１によってフックされたロック取得要求の要求元であるスレッド（つまり、実行中のスレッド）を識別するためのスレッドＩＤをＯＳ１２に問い合わせる。これにより、ロック状態管理部１３２は、ロック取得要求の要求元であるスレッドのスレッドＩＤを取得する（ステップＳ１２）。 When the lock state management unit 132 is executed by the API hook unit 131, the lock state management unit 132 identifies the thread that is the request source of the lock acquisition request hooked by the API hook unit 131 (that is, the thread that is being executed). The OS 12 is inquired about the thread ID to be used. Thereby, the lock state management unit 132 acquires the thread ID of the thread that is the request source of the lock acquisition request (step S12).

ロック状態管理部１３２は、ＡＰＩフック部１３１から通知されたロックのアドレス及びＯＳ１２に問い合わせることにより取得されたスレッドＩＤにロック状態テーブル１３３において対応付けられているロックの取得状態を更新する。このとき、ロック状態管理部１３２は、ＡＰＩフック部１３１からの通知の内容が「ロック取得の実行」であるため、ロックの取得状態を「ロック要求中」に更新する（ステップＳ１３）。 The lock state management unit 132 updates the lock acquisition state that is associated, in the lock state table 133, with the lock address notified from the API hook unit 131 and the thread ID acquired by querying the OS 12. Because the content of the notification from the API hook unit 131 is “execution of lock acquisition”, the lock state management unit 132 updates the lock acquisition state to “lock requested” (step S13).

次に、ＡＰＩフック部１３１は、デッドロック検出部１３４を実行する（ステップＳ１４）。このとき、ＡＰＩフック部１３１は、フックされたロック取得要求に含まれるロックのアドレスをデッドロック検出部１３４に出力する。 Next, the API hook unit 131 executes the deadlock detection unit 134 (step S14). At this time, the API hook unit 131 outputs the address of the lock included in the hooked lock acquisition request to the deadlock detection unit 134.

ＡＰＩフック部１３１によってデッドロック検出部１３４が実行されると、デッドロック検出部１３４は、ＡＰＩフック部１３１によってフックされたロック取得要求の要求元となるスレッドのスレッドＩＤを取得し、上記した図５に示すデッドロックの発生の検出処理を実行する（ステップＳ１５）。 When the deadlock detection unit 134 is executed by the API hook unit 131, the deadlock detection unit 134 acquires the thread ID of the thread that is the request source of the lock acquisition request hooked by the API hook unit 131, and the above-described diagram. The deadlock occurrence detection process shown in FIG. 5 is executed (step S15).

デッドロック検出部１３４によってデッドロックの発生が検出されない場合（ステップＳ１５のＮＯ）、ＡＰＩフック部１３１は、ＯＳ１２のロック取得ＡＰＩ（ロック取得部１２１）を実行する（ステップＳ１６）。このとき、ＡＰＩフック部１３１は、ＯＳ１２にロックのアドレスを出力する。 If the occurrence of a deadlock is not detected by the deadlock detection unit 134 (NO in step S15), the API hook unit 131 executes the lock acquisition API (lock acquisition unit 121) of the OS 12 (step S16). At this time, the API hook unit 131 outputs a lock address to the OS 12.

ＯＳ１２のロック取得部１２１は、ＡＰＩフック部１３１によって出力されたロックのアドレスに応じて、ロック取得処理を実行する（ステップＳ１７）。このとき、ロック取得部１２１は、例えばロックが既に他のスレッドによって取得されている場合には、当該ロックが解放されるのを待ってからロック取得処理を実行する。ロック取得部１２１は、ロック取得処理が実行されると、ロック取得応答（返り値）を監視対象アプリケーション１１に対して出力する。このロック取得応答には、取得処理が完了されたロックのアドレスが含まれる。 The lock acquisition unit 121 of the OS 12 executes a lock acquisition process according to the lock address output by the API hook unit 131 (step S17). At this time, for example, if the lock has already been acquired by another thread, the lock acquisition unit 121 executes the lock acquisition process after waiting for the lock to be released. When the lock acquisition process is executed, the lock acquisition unit 121 outputs a lock acquisition response (return value) to the monitoring target application 11. This lock acquisition response includes the address of the lock for which acquisition processing has been completed.

ジャケットライブラリ１３のＡＰＩフック部１３１は、ＯＳ１２のロック取得部１２１によって出力されたロック取得応答をフックする。ＡＰＩフック部１３１は、ロック取得応答をフックすると、ロック状態管理部１３２を実行する（ステップＳ１８）。このとき、ＡＰＩフック部１３１は、フックされたロック取得応答に含まれるロックのアドレス及び当該ロック取得応答が返された旨（ロック取得の応答）をロック状態管理部１３２に通知する。 The API hook unit 131 of the jacket library 13 hooks the lock acquisition response output by the lock acquisition unit 121 of the OS 12. When the API hook unit 131 hooks the lock acquisition response, the API hook unit 131 executes the lock state management unit 132 (step S18). At this time, the API hook unit 131 notifies the lock state management unit 132 of the address of the lock included in the hooked lock acquisition response and that the lock acquisition response has been returned (lock acquisition response).

ＡＰＩフック部１３１によってロック状態管理部１３２が実行されると、ロック状態管理部１３２は、ＡＰＩフック部１３１によってフックされたロック取得応答の元となるロック取得要求の要求元であるスレッドを識別するためのスレッドＩＤをＯＳ１２に問い合わせる。これにより、ロック状態管理部１３２は、スレッドＩＤを取得する（ステップＳ１９）。 When the lock state management unit 132 is executed by the API hook unit 131, the lock state management unit 132 identifies the thread that is the request source of the lock acquisition request that is the source of the lock acquisition response hooked by the API hook unit 131. The OS 12 is inquired about the thread ID for this purpose. Thereby, the lock state management unit 132 acquires the thread ID (step S19).

ロック状態管理部１３２は、ＡＰＩフック部１３１から通知されたロックのアドレス及びＯＳ１２に問い合わせることにより取得されたスレッドＩＤにロック状態テーブル１３３において対応付けられているロックの取得状態を更新する。このとき、ロック状態管理部１３２は、ＡＰＩフック部１３１からの通知の内容が「ロック取得の応答」であるため、ロックの取得状態を「ロック中」に更新する（ステップＳ２０）。 The lock state management unit 132 updates the lock acquisition state that is associated, in the lock state table 133, with the lock address notified from the API hook unit 131 and the thread ID acquired by querying the OS 12. Because the content of the notification from the API hook unit 131 is “lock acquisition response”, the lock state management unit 132 updates the lock acquisition state to “locked” (step S20).

上記したような処理が実行された後、ＡＰＩフック部１３１によりロック取得応答が監視対象アプリケーション１１に返されると処理が終了される。 After the processing as described above is executed, when the lock acquisition response is returned to the monitoring target application 11 by the API hook unit 131, the processing is terminated.

一方、ステップＳ１５においてデッドロック検出部１３４によってデッドロックの発生が検出された場合、デッドロック検出部１３４は、デッドロック通知部１３５を実行する（ステップＳ２１）。 On the other hand, when the occurrence of deadlock is detected by the deadlock detection unit 134 in step S15, the deadlock detection unit 134 executes the deadlock notification unit 135 (step S21).

デッドロック検出部１３４によってデッドロック通知部１３５が実行されると、デッドロック通知部１３５は、デッドロックの発生をＨＡクラスタデーモン１４に通知する（ステップＳ２２）。 When the deadlock notifying unit 135 is executed by the deadlock detecting unit 134, the deadlock notifying unit 135 notifies the HA cluster daemon 14 of the occurrence of the deadlock (step S22).

デッドロック通知部１３５によってデッドロックの発生が通知されると、ＨＡクラスタデーモン１４は、監視対象アプリケーション１１で障害が発生したものとしてサービスのフェイルオーバ等の対処を行う（ステップＳ２３）。 When the deadlock notification unit 135 notifies the occurrence of a deadlock, the HA cluster daemon 14 takes measures such as service failover, assuming that a failure has occurred in the monitored application 11 (step S23).

なお、上記したステップＳ２３の処理が実行されると、ＨＡクラスタデーモン１４により監視対象アプリケーション１１等の動作が停止される。 Note that when the processing in step S23 described above is executed, the operation of the monitoring target application 11 and the like is stopped by the HA cluster daemon 14.
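
The acquisition flow of FIG. 7 (steps S11 to S23) can be approximated by a wrapper around an ordinary lock. The sketch below is an assumption-laden illustration, not the jacket library itself: `HookedLock`, its constructor parameters, and the `notify` callback are hypothetical names, Python's `threading.Lock` stands in for the OS lock acquisition API, and returning `False` on detection is a simplification of handing control to the HA cluster daemon.

```python
import threading


def detect_deadlock(table, lock_addr, thread_id):
    """Walk "requested -> held by" links; True if they loop back."""
    stack = []
    while True:
        stack.append(thread_id)
        holder = next((t for (l, t), s in table.items()
                       if l == lock_addr and s == "locked" and t != thread_id), None)
        if holder is None:
            return False
        if holder in stack:
            return True
        thread_id = holder
        lock_addr = next((l for (l, t), s in table.items()
                          if t == thread_id and s == "lock requested"), None)
        if lock_addr is None:
            return False


class HookedLock:
    """Hypothetical stand-in for the API hook unit wrapping one lock."""

    def __init__(self, addr, table, table_guard, notify):
        self.addr = addr
        self._lock = threading.Lock()   # plays the role of the OS lock
        self._table = table             # shared lock state table (a dict)
        self._guard = table_guard       # protects the table itself
        self._notify = notify           # plays the role of the notification unit

    def acquire(self):
        tid = threading.get_ident()                           # S12: caller's thread ID
        with self._guard:
            self._table[(self.addr, tid)] = "lock requested"  # S13
            deadlock = detect_deadlock(self._table, self.addr, tid)  # S14, S15
        if deadlock:
            self._notify(self.addr, tid)                      # S21, S22
            return False
        self._lock.acquire()                                  # S16, S17: may block
        with self._guard:
            self._table[(self.addr, tid)] = "locked"          # S18 to S20
        return True
```

Running the check before the blocking call is what makes the detection event-driven: the requesting thread learns about the loop at the moment its own request would close it.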

次に、図８を参照して、監視対象アプリケーション１１を構成するスレッドによってロックが解放される際の処理の流れについて説明する。 Next, the flow of processing when a lock is released by a thread constituting the monitoring target application 11 will be described with reference to FIG.

まず、監視対象アプリケーション１１を構成するスレッドは、当該スレッドがロックを取得している資源の占有を解放するためのロック解放要求をＯＳ１２に対して出力する。このロック解放要求には、当該ロック解放要求に応じて解放されるロックのアドレスが含まれる。 First, the thread constituting the monitoring target application 11 outputs a lock release request for releasing the occupation of the resource for which the thread has acquired the lock to the OS 12. This lock release request includes the address of the lock released in response to the lock release request.

ジャケットライブラリ１３のＡＰＩフック部１３１は、監視対象アプリケーション１１を構成するスレッドからのロック解放要求をフックする。ＡＰＩフック部１３１は、ロック解放要求をフックすると、ロック状態管理部１３２を実行する（ステップＳ３１）。このとき、ＡＰＩフック部１３１は、フックされたロック解放要求に含まれるロックのアドレス及び当該ロック解放要求に応じてロックの解放が実行される旨（ロック解放の実行）をロック状態管理部１３２に通知する。 The API hook unit 131 of the jacket library 13 hooks the lock release request from the thread constituting the monitoring target application 11. When the lock release request is hooked, the API hook unit 131 invokes the lock state management unit 132 (step S31). At this time, the API hook unit 131 notifies the lock state management unit 132 of the lock address included in the hooked lock release request and of the fact that lock release is about to be executed in response to that request (“execution of lock release”).

ここで、前述したように、ＡＰＩフック部１３１からの通知の内容が「ロック解放の実行」である場合には、ロックの取得状態は変更しないためロック状態テーブル１３３の更新は実行されない。 Here, as described above, when the content of the notification from the API hook unit 131 is “execution of lock release”, the lock acquisition state is not changed, and the lock state table 133 is not updated.

したがって、ＡＰＩフック部１３１は、ＯＳ１２のロック解放ＡＰＩ（ロック解放部１２２）を実行する（ステップＳ３２）。このとき、ＡＰＩフック部１３１は、ＯＳ１２にロックのアドレスを出力する。 Accordingly, the API hook unit 131 executes the lock release API (lock release unit 122) of the OS 12 (step S32). At this time, the API hook unit 131 outputs a lock address to the OS 12.

ＯＳ１２のロック解放部１２２は、ＡＰＩフック部１３１によって出力されたロックのアドレスに応じて、ロック解放処理を実行する（ステップＳ３３）。ロック解放部１２２は、ロック解放処理が実行されると、ロック解放応答（返り値）を監視対象アプリケーション１１に対して出力する。このロック解放応答には、解放処理が完了されたロックのアドレスが含まれる。 The lock release unit 122 of the OS 12 executes a lock release process according to the lock address output by the API hook unit 131 (step S33). When the lock release process has been executed, the lock release unit 122 outputs a lock release response (return value) to the monitoring target application 11. This lock release response includes the address of the lock for which the release process has been completed.

ジャケットライブラリ１３のＡＰＩフック部１３１は、ＯＳ１２のロック解放部１２２によって出力されたロック解放応答をフックする。ＡＰＩフック部１３１は、ロック解放応答をフックすると、ロック状態管理部１３２を実行する（ステップＳ３４）。このとき、ＡＰＩフック部１３１は、フックされたロック解放応答に含まれるロックのアドレス及び当該ロック解放応答が返された旨（ロック解放の応答）をロック状態管理部１３２に通知する。 The API hook unit 131 of the jacket library 13 hooks the lock release response output by the lock release unit 122 of the OS 12. When hooking the lock release response, the API hook unit 131 executes the lock state management unit 132 (step S34). At this time, the API hook unit 131 notifies the lock state management unit 132 of the address of the lock included in the hooked lock release response and the fact that the lock release response is returned (lock release response).

ＡＰＩフック部１３１によってロック状態管理部１３２が実行されると、ロック状態管理部１３２は、ＡＰＩフック部１３１によってフックされたロック解放応答の元となるロック解放要求の要求元であるスレッドを識別するためのスレッドＩＤをＯＳ１２に問い合わせる。これにより、ロック状態管理部１３２は、ロック解放応答の元となるロック解放要求の要求元であるスレッドを識別するためのスレッドＩＤを取得する（ステップＳ３５）。 When the lock state management unit 132 is executed by the API hook unit 131, the lock state management unit 132 identifies the thread that is the request source of the lock release request that is the source of the lock release response hooked by the API hook unit 131. The OS 12 is inquired about the thread ID for this purpose. Thereby, the lock state management unit 132 acquires a thread ID for identifying the thread that is the request source of the lock release request that is the source of the lock release response (step S35).

ロック状態管理部１３２は、ＡＰＩフック部１３１から通知されたロックのアドレス及びＯＳ１２に問い合わせることにより取得されたスレッドＩＤにロック状態テーブル１３３において対応付けられているロックの取得状態を更新する。このとき、ロック状態管理部１３２は、ＡＰＩフック部１３１からの通知の内容が「ロック解放の応答」であるため、ロックの取得状態を「ロックなし」に更新する（ステップＳ３６）。 The lock state management unit 132 updates the lock acquisition state that is associated, in the lock state table 133, with the lock address notified from the API hook unit 131 and the thread ID acquired by querying the OS 12. Because the content of the notification from the API hook unit 131 is “lock release response”, the lock state management unit 132 updates the lock acquisition state to “no lock” (step S36).

上記したような処理が実行された後、ＡＰＩフック部１３１によりロック解放応答が監視対象アプリケーション１１に返されると処理が終了される。 After the processing as described above is executed, when the lock release response is returned to the monitoring target application 11 by the API hook unit 131, the processing is terminated.
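
The release flow of FIG. 8 (steps S31 to S36) is simpler than acquisition, because the table changes only after the release has completed. A minimal sketch follows, assuming a dictionary-based table keyed by (lock address, thread ID) and Python's `threading.Lock` as a stand-in for the OS lock; the function name `hooked_release` and its parameters are hypothetical.

```python
import threading


def hooked_release(os_lock, lock_addr, table, table_guard):
    """Hypothetical hook around lock release, following steps S31 to S36."""
    # S31: "execution of lock release" is notified, but the table is not updated.
    os_lock.release()                        # S32, S33: the real release
    tid = threading.get_ident()              # S35: ID of the releasing thread
    with table_guard:                        # S34, S36: "lock release response"
        table.pop((lock_addr, tid), None)    # absent entry stands for "no lock"
```

Deferring the table update until the response means the table never reports “no lock” for a lock that is still held.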

上記したように本実施形態においては、監視対象アプリケーション１１を構成する複数のスレッドからのロック取得要求及びロック解放要求と、ＯＳ１２からのロック取得応答及びロック解放応答をフックし、当該複数のスレッドのロックの取得状態をロック状態テーブル１３３において管理することによって、例えばスレッドからロック取得要求があった場合にデッドロックの発生を検出することが可能となる。 As described above, in the present embodiment, the lock acquisition request and the lock release request from the plurality of threads constituting the monitoring target application 11 and the lock acquisition response and the lock release response from the OS 12 are hooked, By managing the lock acquisition state in the lock state table 133, for example, when a lock acquisition request is issued from a thread, it is possible to detect the occurrence of a deadlock.

Consequently, the present embodiment can be used together with periodic monitoring, for example monitoring that detects a failure in the monitoring target application 11 by periodically executing an external command that checks the state of the monitoring target application 11, and the combination expands the range of detectable failures. This lowers the likelihood that the service continues while the monitoring target application 11 has failed, and so improves the reliability of the service.

Furthermore, because the present embodiment detects a failure at the time the deadlock occurs (event-driven), it shortens the time needed to detect a failure in the monitoring target application, even for a failure that the periodic monitoring described above could detect on its own. Service downtime is shortened accordingly, which improves the availability of the service.

[Second Embodiment]
Next, a second embodiment of the present invention will be described with reference to FIG. 9. FIG. 9 is a block diagram mainly showing the functional configuration of the node 100 provided in the cluster system according to the present embodiment. Parts that are the same as in FIG. 2 described above are given the same reference numerals, and their detailed description is omitted. The description here focuses on the parts that differ from FIG. 2. Duplicated description is likewise omitted in the embodiments that follow.

図９に示すように、本実施形態に係るクラスタシステムに備えられるノード１００上では、監視対象アプリケーション、ＯＳ１２０、ジャケットライブラリ１３０ａ及びジャケットライブラリ１３０ｂが動作する。 As shown in FIG. 9, the monitoring target application, the OS 120, the jacket library 130a, and the jacket library 130b operate on the node 100 provided in the cluster system according to the present embodiment.

前述した第１の実施形態においては、監視対象アプリケーションが１つのプロセスから構成されるものとして説明したが、本実施形態においては、監視対象アプリケーションが複数のプロセスから構成される場合を想定している。つまり、本実施形態においては、ノード１００で監視対象のアプリケーションのプロセス１１０ａ及び１１０ｂを含む複数のプロセスが実行される。以下、監視対象のアプリケーションのプロセス１１０ａを、単にプロセス１１０ａと表記する。同様に、監視対象のアプリケーションのプロセス１１０ｂを、単にプロセス１１０ｂと表記する。 In the first embodiment described above, the monitoring target application has been described as being configured by one process. However, in the present embodiment, it is assumed that the monitoring target application is configured by a plurality of processes. . That is, in this embodiment, a plurality of processes including the processes 110a and 110b of the monitoring target application are executed on the node 100. Hereinafter, the process 110a of the monitoring target application is simply referred to as a process 110a. Similarly, the process 110b of the monitoring target application is simply expressed as a process 110b.

以下、監視対象アプリケーションを構成する複数のプロセスのうち、プロセス１１０ａ及び１１０ｂについて説明する。プロセス１１０ａ及び１１０ｂ以外のプロセスについては、当該プロセス１１０ａ及び１１０ｂと同様である。 Hereinafter, the processes 110a and 110b among the plurality of processes constituting the monitoring target application will be described. The processes other than the processes 110a and 110b are the same as the processes 110a and 110b.

なお、監視対象アプリケーションを構成するプロセス１１０ａ及び１１０ｂは、複数のスレッドから構成される。 Note that the processes 110a and 110b constituting the monitoring target application are constituted by a plurality of threads.

ＯＳ１２０は、プロセス１１０ａとプロセス１１０ｂと当該プロセス１１０ａ及び１１０ｂを構成する複数のスレッドの各々とを管理する。この場合、ＯＳ１２０は、プロセス１１０ａ及び１１０ｂを識別するためのプロセスＩＤと当該プロセス１１０ａ及び１１０ｂを構成する複数のスレッドの各々を識別するためのスレッドＩＤを用いて実行中のスレッドを管理する。 The OS 120 manages the process 110a, the process 110b, and each of a plurality of threads constituting the processes 110a and 110b. In this case, the OS 120 manages the thread being executed by using the process ID for identifying the processes 110a and 110b and the thread ID for identifying each of the plurality of threads constituting the processes 110a and 110b.

The OS 120 includes a lock state table 123. The lock state table 123 is allocated, for example, in a memory (not shown) provided in the node 100, in a region of the memory space assigned to the OS 120 that is shared among a plurality of processes.

ロック状態テーブル１２３には、プロセス１１０ａ及び１１０ｂを構成する複数のスレッドによるロックの取得状態が登録される。 In the lock state table 123, lock acquisition states by a plurality of threads constituting the processes 110a and 110b are registered.

ジャケットライブラリ１３０ａは、プロセス１１０ａを構成する複数のスレッドの各々からのＯＳ１２０に対するロック取得要求及びロック解放要求をフック（取得）することで、当該複数のスレッドの各々によるロックの取得状態を管理する。ジャケットライブラリ１３０ａは、フックされたロック取得要求及び複数のスレッドの各々によるロックの取得状態に基づいてデッドロックの発生を検出する。ジャケットライブラリ１３０ａは、デッドロックの発生が検出された場合、その旨をＨＡクラスタデーモン１４に通知する。 The jacket library 130a manages a lock acquisition state by each of the plurality of threads by hooking (acquiring) a lock acquisition request and a lock release request to the OS 120 from each of the plurality of threads constituting the process 110a. The jacket library 130a detects the occurrence of a deadlock based on a hooked lock acquisition request and a lock acquisition state by each of a plurality of threads. When the occurrence of deadlock is detected, the jacket library 130a notifies the HA cluster daemon 14 to that effect.

なお、ジャケットライブラリは、プロセス毎に対応している。つまり、例えばプロセス１１０ａにはジャケットライブラリ１３０ａが対応しており、プロセス１１０ｂにはジャケットライブラリ１３０ｂが対応している。この場合、ジャケットライブラリ１３０ａはプロセス１１０ａからのロック取得要求等に応じた処理を実行し、ジャケットライブラリ１３０ｂはプロセス１１０ｂからのロック取得要求等に応じた処理を実行する。 Note that the jacket library corresponds to each process. That is, for example, the jacket library 130a corresponds to the process 110a, and the jacket library 130b corresponds to the process 110b. In this case, the jacket library 130a executes processing corresponding to the lock acquisition request from the process 110a, and the jacket library 130b executes processing according to the lock acquisition request from the process 110b.

以下、主にジャケットライブラリ１３０ａの構成等について説明するが、ジャケットライブラリ１３０ｂについても当該ジャケットライブラリ１３０ａと同様である。 Hereinafter, the configuration and the like of the jacket library 130a will be mainly described, but the jacket library 130b is the same as the jacket library 130a.

ジャケットライブラリ１３０ａは、ロック状態管理部１３６及びデッドロック検出部１３７を含む。 The jacket library 130a includes a lock state management unit 136 and a deadlock detection unit 137.

本実施形態において、ロック状態管理部１３６及びデッドロック検出部１３７は、ノード１００のコンピュータ（図示せず）が例えばノード１００に備えられるメモリに格納されているプログラムを実行することにより実現されるものとする。このプログラムは、コンピュータ読取可能な記憶媒体に予め格納して頒布可能である。また、このプログラムが、ネットワークを介してコンピュータにダウンロードされても構わない。 In the present embodiment, the lock state management unit 136 and the deadlock detection unit 137 are realized by a computer (not shown) of the node 100 executing a program stored in a memory provided in the node 100, for example. And This program can be stored in advance in a computer-readable storage medium and distributed. In addition, this program may be downloaded to a computer via a network.

The lock state management unit 136 queries the OS 120 for the process ID that identifies the process 110a, that is, the process made up of a plurality of threads including the thread that issued the lock acquisition request or lock release request hooked by the API hook unit 131. The lock state management unit 136 thereby acquires the process ID.

また、ロック状態管理部１３６は、ＡＰＩフック部１３１からの通知を受けると、ロック取得要求またはロック解放要求の要求元であるスレッドを識別するためのスレッドＩＤを、ＯＳ１２０に問い合わせる。これにより、ロック状態管理部１３６は、スレッドＩＤを取得する。 In addition, when the notification from the API hook unit 131 is received, the lock state management unit 136 inquires of the OS 120 about a thread ID for identifying the thread that is the request source of the lock acquisition request or the lock release request. Thereby, the lock state management unit 136 acquires the thread ID.

ロック状態管理部１３６は、ＡＰＩフック部１３１からの通知、取得されたプロセスＩＤ及びスレッドＩＤに基づいてロック状態テーブル１２３を更新する。つまり、本実施形態においては、前述した第１の実施形態とは異なり、プロセスＩＤ、スレッドＩＤ及びロックのアドレスの３次元の組み合わせに対してロックの取得状態が管理（登録）される。 The lock state management unit 136 updates the lock state table 123 based on the notification from the API hook unit 131 and the acquired process ID and thread ID. That is, in this embodiment, unlike the first embodiment described above, the lock acquisition state is managed (registered) for a three-dimensional combination of a process ID, a thread ID, and a lock address.

なお、ロック状態テーブル１２３には、上記したようにプロセス１１０ａ及び１１０ｂを構成する複数のスレッドによるロックの取得状態が登録されるが、ジャケットライブラリ１３０ａのロック状態管理部１３６は、プロセス１１０ａを構成する複数のスレッドによるロックの取得状態をロック状態テーブル１２３に登録（更新）する。つまり、プロセス１１０ｂを構成する複数のスレッドによるロックの取得状態は、ジャケットライブラリ１３０ｂのロック状態管理部（図示せず）によって登録される。 In the lock state table 123, as described above, the lock acquisition state by a plurality of threads constituting the processes 110a and 110b is registered, but the lock state management unit 136 of the jacket library 130a constitutes the process 110a. The lock acquisition status by a plurality of threads is registered (updated) in the lock status table 123. That is, the lock acquisition state by a plurality of threads constituting the process 110b is registered by a lock state management unit (not shown) of the jacket library 130b.

ロック状態管理部１３６は、更新するプロセスＩＤ、スレッドＩＤまたはロックのアドレスがロック状態テーブル１２３に登録されていないときは当該ロック状態テーブル１２３を拡張する。 When the process ID, thread ID, or lock address to be updated is not registered in the lock state table 123, the lock state management unit 136 extends the lock state table 123.
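
The keying scheme described above can be illustrated with a minimal Python sketch. It is not the implementation of the lock state table 123; the class name `LockStateTable`, its methods, and the three state labels are hypothetical names chosen for illustration, and an ordinary in-process dictionary stands in for the memory region shared among processes.

```python
from enum import Enum


class LockState(Enum):
    """Illustrative labels for the three lock acquisition states."""
    NONE = "no lock"
    REQUESTING = "lock requested"
    HELD = "locked"


class LockStateTable:
    """Hypothetical stand-in for the lock state table 123."""

    def __init__(self):
        # (process ID, thread ID, lock address) -> LockState
        self._states = {}

    def update(self, process_id, thread_id, lock_address, state):
        # Writing to a key that has not been seen before plays the role of
        # "extending" the table for an unregistered ID or lock address.
        self._states[(process_id, thread_id, lock_address)] = state

    def get(self, process_id, thread_id, lock_address):
        # Combinations that were never registered are treated as "no lock".
        key = (process_id, thread_id, lock_address)
        return self._states.get(key, LockState.NONE)


table = LockStateTable()
table.update(110, 1, 0x1000, LockState.REQUESTING)  # lock acquisition request hooked
table.update(110, 1, 0x1000, LockState.HELD)        # lock acquisition response hooked
```

Using the full triple as the key means that two threads with the same thread ID in different processes are kept apart, which is the point of adding the process ID in this embodiment.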

デッドロック検出部１３７は、ＡＰＩフック部１３１からの要求に基づいて、ロックの取得によってデッドロックが発生するか否かを確認する。これにより、デッドロック検出部１３７は、デッドロックの発生を検出する。 The deadlock detection unit 137 confirms whether or not a deadlock occurs due to acquisition of a lock based on a request from the API hook unit 131. As a result, the deadlock detector 137 detects the occurrence of a deadlock.

このとき、デッドロック検出部１３７は、ＡＰＩフック部１３１によってフックされたロック取得要求の要求元であるスレッドから構成されるプロセス１１０ａを識別するためのプロセスＩＤをＯＳ１２０に問い合わせる。これにより、デッドロック検出部１３７は、プロセス１１０ａのプロセスＩＤを取得する。また、デッドロック検出部１３７は、ＡＰＩフック部１３１によってフックされたロック取得要求の要求元であるスレッドを識別するためのスレッドＩＤをＯＳ１２０に問い合わせることにより、当該スレッドＩＤを取得する。 At this time, the deadlock detection unit 137 inquires of the OS 120 for a process ID for identifying the process 110a configured by the thread that is the request source of the lock acquisition request hooked by the API hook unit 131. Thereby, the deadlock detector 137 acquires the process ID of the process 110a. In addition, the deadlock detection unit 137 acquires the thread ID by inquiring the OS 120 about the thread ID for identifying the thread that is the request source of the lock acquisition request hooked by the API hook unit 131.

デッドロック検出部１３７は、取得されたプロセスＩＤ、スレッドＩＤ及びＡＰＩフック部１３１によって出力されたロックのアドレスと、ロック状態テーブル１２３に登録されているロック取得状態とに基づいて、デッドロックの発生を検出する。デッドロックの発生が検出された場合、デッドロック検出部１３７は、デッドロック通知部１３５に当該デッドロックの発生の通知要求を出力する。 The deadlock detection unit 137 generates a deadlock based on the acquired process ID, thread ID, and the lock address output by the API hook unit 131 and the lock acquisition state registered in the lock state table 123. Is detected. When occurrence of a deadlock is detected, the deadlock detection unit 137 outputs a notification request for occurrence of the deadlock to the deadlock notification unit 135.

なお、ジャケットライブラリ１３０ｂの構成については、上記したようにジャケットライブラリ１３０ａと同様の構成であるため、図９において省略されている。また、ジャケットライブラリ１３０ｂの構成の詳しい説明は省略する。 The configuration of the jacket library 130b is the same as that of the jacket library 130a as described above, and is omitted in FIG. A detailed description of the configuration of the jacket library 130b is omitted.

次に、図１０を参照して、プロセス１１０ａを構成するスレッドによってロックが取得される際の処理の流れについて説明する。 Next, the flow of processing when a lock is acquired by a thread constituting the process 110a will be described with reference to FIG.

まず、プロセス１１０ａを構成するスレッドは、資源を占有するためのロック取得要求をＯＳ１２０に対して出力する。このロック取得要求には、当該ロック取得要求に応じて取得されるロックのアドレスが含まれる。 First, the thread constituting the process 110a outputs a lock acquisition request for occupying resources to the OS 120. This lock acquisition request includes the address of the lock acquired in response to the lock acquisition request.

次に、前述した図７に示すステップＳ１１の処理に相当するステップＳ４１の処理が実行される。 Next, the process of step S41 corresponding to the process of step S11 shown in FIG. 7 is executed.

The lock state management unit 136 queries the OS 120 for the process ID that identifies the process 110a containing the thread that issued the lock acquisition request hooked by the API hook unit 131. The lock state management unit 136 thereby acquires the process ID of the process 110a (step S42).

ステップＳ４２の処理が実行されると、前述した図７に示すステップＳ１２の処理に相当するステップＳ４３の処理が実行される。 When the process of step S42 is executed, the process of step S43 corresponding to the process of step S12 shown in FIG. 7 described above is executed.

The lock state management unit 136 updates the lock acquisition state associated with the lock address notified by the API hook unit 131, the process ID acquired in step S42, and the thread ID acquired in step S43. Because the notification from the API hook unit 131 is "execute lock acquisition", the lock state management unit 136 updates the lock acquisition state in the lock state table 123 to "lock requested" (step S44).

Next, the deadlock detection unit 137 executes the deadlock detection process (step S46). In doing so, the deadlock detection unit 137 acquires the process ID by querying the OS 120 for the process ID that identifies the process 110a containing the thread that issued the lock acquisition request hooked by the API hook unit 131. The deadlock detection unit 137 likewise acquires the thread ID by querying the OS 120 for the thread ID that identifies that thread. The deadlock detection unit 137 then detects the occurrence of a deadlock based on the acquired process ID and thread ID, the lock address output by the API hook unit 131, and the lock acquisition states registered in the lock state table 123. The lock state table 123 holds the lock acquisition states of the threads constituting a plurality of processes including the processes 110a and 110b.

なお、デッドロック検出部１３７によるデッドロックの発生の検出処理は、前述した図５に示す処理において例えばプロセスＩＤ及びスレッドＩＤの組をＴ、ロックのアドレスをＬとした場合と同様であるため、その詳しい説明は省略する。 The deadlock generation detection process by the deadlock detection unit 137 is the same as the process shown in FIG. 5 described above, for example, when the set of process ID and thread ID is T and the lock address is L. Detailed description thereof is omitted.
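
As an illustration of treating the pair of process ID and thread ID as T and the lock address as L, the following Python sketch walks a wait-for chain through a table of lock acquisition states. It is a generic cycle check written under stated assumptions, not the procedure of FIG. 5: each lock is assumed to have at most one holder, each thread is assumed to wait for at most one lock, and the names `would_deadlock`, `find_holder`, and `find_requested` are hypothetical.

```python
HELD = "held"
REQUESTING = "requesting"


def find_holder(table, lock):
    """Return the thread T currently holding `lock`, or None."""
    for (thread, address), state in table.items():
        if address == lock and state == HELD:
            return thread
    return None


def find_requested(table, thread):
    """Return the lock L that `thread` is waiting for, or None."""
    for (owner, address), state in table.items():
        if owner == thread and state == REQUESTING:
            return address
    return None


def would_deadlock(table, requester, lock):
    """Follow holder -> requested lock -> holder ... starting from `lock`.

    A deadlock is reported when the chain leads back to `requester`.
    """
    visited = set()
    current_lock = lock
    while current_lock is not None:
        holder = find_holder(table, current_lock)
        if holder is None:
            return False          # the lock is free, so the chain ends
        if holder == requester:
            return True           # the chain closed on the requester
        if holder in visited:
            return False          # a cycle that does not involve the requester
        visited.add(holder)
        current_lock = find_requested(table, holder)
    return False


# T is a (process ID, thread ID) pair; L is a lock address.
states = {
    ((1, 10), 0xA): HELD,
    ((2, 20), 0xB): HELD,
    ((2, 20), 0xA): REQUESTING,
    ((1, 10), 0xB): REQUESTING,   # the request that has just been hooked
}
deadlock = would_deadlock(states, (1, 10), 0xB)
```

Because T includes the process ID, the same walk covers threads that belong to different processes without any change to the logic.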

デッドロック検出部１３７によってデッドロックの発生が検出されない場合（ステップＳ４６のＮＯ）、前述した図７に示すステップＳ１６〜ステップＳ１８の処理に相当するステップＳ４７〜ステップＳ４９の処理が実行される。 When the deadlock detection unit 137 does not detect the occurrence of a deadlock (NO in step S46), the processes in steps S47 to S49 corresponding to the processes in steps S16 to S18 shown in FIG. 7 described above are executed.

次に、ロック状態管理部１３６は、ＡＰＩフック部１３１によってフックされたロック取得応答の元となるロック取得要求の要求元であるスレッドから構成されるプロセス１１０ａを識別するためのプロセスＩＤをＯＳ１２０に問い合わせる。これにより、ロック状態管理部１３６は、プロセス１１０ａのプロセスＩＤを取得する（ステップＳ５０）。 Next, the lock state management unit 136 sets a process ID for identifying the process 110a including the thread that is the request source of the lock acquisition request that is the source of the lock acquisition response hooked by the API hook unit 131, to the OS 120. Inquire. Thereby, the lock state management unit 136 acquires the process ID of the process 110a (step S50).

ステップＳ５０の処理が実行されると、前述した図７のステップＳ１９の処理に相当するステップＳ５１の処理が実行される。 When the process of step S50 is executed, the process of step S51 corresponding to the process of step S19 of FIG. 7 described above is executed.

The lock state management unit 136 updates the lock acquisition state that the lock state table 123 associates with the lock address notified by the API hook unit 131 and with the acquired process ID and thread ID. Because the notification from the API hook unit 131 is a "lock acquisition response", the lock state management unit 136 updates the lock acquisition state in the lock state table 123 to "locked" (step S52).

一方、ステップＳ４６においてデッドロック検出部１３７によってデッドロックの発生が検出された場合、前述した図７に示すステップＳ２１〜ステップＳ２３の処理に相当するステップＳ５３〜ステップＳ５５の処理が実行される。 On the other hand, when occurrence of a deadlock is detected by the deadlock detection unit 137 in step S46, the processes of steps S53 to S55 corresponding to the processes of steps S21 to S23 shown in FIG. 7 described above are executed.

なお、プロセス１１０ｂを構成するスレッドによってロックが取得される場合には、上記した図１０に示す処理と同様の処理がジャケットライブラリ１３０ｂにおいて実行されるため、その詳しい説明は省略する。 Note that when the lock is acquired by the thread constituting the process 110b, the same processing as the processing shown in FIG. 10 described above is executed in the jacket library 130b, and thus detailed description thereof is omitted.

次に、図１１を参照して、プロセス１１０ａを構成するスレッドによってロックが解放される際の処理の流れについて説明する。 Next, with reference to FIG. 11, the flow of processing when a lock is released by a thread constituting the process 110a will be described.

まず、プロセス１１０ａを構成するスレッドは、当該スレッドがロックを取得している資源の占有を解放するためのロック解放要求をＯＳ１２０に対して出力する。このロック解放要求には、当該ロック解放要求に応じて解放されるロックのアドレスが含まれる。 First, the thread constituting the process 110a outputs a lock release request for releasing the occupation of the resource for which the thread has acquired the lock to the OS 120. This lock release request includes the address of the lock released in response to the lock release request.

次に、前述した図８に示すステップＳ３１〜ステップＳ３４の処理に相当するステップＳ６１〜ステップＳ６４の処理が実行される。 Next, the process of step S61-step S64 equivalent to the process of step S31-step S34 shown in FIG. 8 mentioned above is performed.

ロック状態管理部１３６は、ＡＰＩフック部１３１によってフックされたロック解放応答の元となるロック解放要求の要求元であるスレッドから構成されるプロセス１１０ａを識別するためのプロセスＩＤをＯＳ１２０に問い合わせる。これにより、ロック状態管理部１３６は、プロセス１１０ａのプロセスＩＤを取得する（ステップＳ６５）。 The lock state management unit 136 inquires of the OS 120 about a process ID for identifying the process 110a configured by the thread that is the request source of the lock release request that is the source of the lock release response hooked by the API hook unit 131. Thereby, the lock state management unit 136 acquires the process ID of the process 110a (step S65).

ステップＳ６５の処理が実行されると、前述した図８に示すステップＳ３５の処理に相当するステップＳ６６の処理が実行される。 When the process of step S65 is executed, the process of step S66 corresponding to the process of step S35 shown in FIG. 8 described above is executed.

次に、ロック状態管理部１３６は、ＡＰＩフック部１３１から通知されたロックのアドレス、ステップＳ６５において取得されたプロセスＩＤ及びステップＳ６６において取得されたスレッドＩＤにロック状態テーブル１２３において対応付けられているロックの取得状態を更新する。このとき、ロック状態管理部１３６は、ＡＰＩフック部１３１からの通知の内容が「ロック解放の応答」であるため、ロック状態テーブル１２３におけるロックの取得状態を「ロックなし」に更新する（ステップＳ６７）。 Next, the lock state management unit 136 associates the lock address notified from the API hook unit 131, the process ID acquired in step S65, and the thread ID acquired in step S66 in the lock state table 123. Update the lock acquisition status. At this time, the lock state management unit 136 updates the lock acquisition state in the lock state table 123 to “no lock” because the content of the notification from the API hook unit 131 is “lock release response” (step S67). ).

なお、プロセス１１０ｂを構成するスレッドによってロックが解放される場合には、上記した図１１に示す処理と同様の処理がジャケットライブラリ１３０ｂにおいて実行されるため、その詳しい説明は省略する。 Note that when the lock is released by the thread constituting the process 110b, the same processing as the processing shown in FIG. 11 described above is executed in the jacket library 130b, and therefore detailed description thereof is omitted.
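
The acquisition flow of FIG. 10 and the release flow of FIG. 11 can be sketched together in Python as a wrapper around a lock. This is a simplified illustration under several assumptions: `HookedLock` is a hypothetical class, a `threading.Lock` stands in for the lock managed by the OS, `id()` of that object stands in for the lock address, an in-process dictionary stands in for the lock state table 123, and returning `False` without forwarding the request after a detected deadlock is an illustrative choice only.

```python
import os
import threading


class HookedLock:
    """Hypothetical wrapper playing the role of the jacket library for one lock."""

    def __init__(self, table, detect=lambda table, owner, address: False,
                 notify=print):
        self._lock = threading.Lock()   # stands in for the lock managed by the OS
        self._table = table             # (pid, tid, address) -> state string
        self._detect = detect           # pluggable deadlock check
        self._notify = notify           # stands in for the deadlock notification
        self.address = id(self._lock)   # stand-in for the lock address

    def _key(self):
        # Query the process ID and thread ID of the calling thread.
        return (os.getpid(), threading.get_ident(), self.address)

    def acquire(self):
        key = self._key()
        self._table[key] = "requesting"            # request hooked
        if self._detect(self._table, key[:2], self.address):
            self._notify(f"deadlock detected for {key}")
            return False                           # assumption: request not forwarded
        self._lock.acquire()                       # forward the request
        self._table[key] = "held"                  # response hooked
        return True

    def release(self):
        key = self._key()
        self._lock.release()                       # forward the release request
        self._table[key] = "none"                  # release response hooked


state_table = {}
lock = HookedLock(state_table)
lock.acquire()
lock.release()
```

The state is written once before the request is forwarded and once after the response returns, which mirrors the two hook points in each flow.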

As described above, the present embodiment hooks the lock acquisition requests and lock release requests from the plurality of threads that make up the plurality of processes (process 110a and process 110b) constituting the monitoring target application, together with the lock acquisition responses and lock release responses from the OS 120, and manages the lock acquisition states of those threads in the lock state table 123. This makes it possible to detect the occurrence of a deadlock, for example, when a thread issues a lock acquisition request. In other words, unlike the first embodiment described above, the present embodiment can also detect, in an event-driven manner, a deadlock between the plurality of processes constituting the monitoring target application. The present embodiment can therefore also be applied to, for example, multi-process applications.

[Third Embodiment]
Next, a third embodiment of the present invention will be described with reference to FIG. 12. FIG. 12 is a block diagram showing a functional configuration of the node 200 provided in the cluster system according to the present embodiment.

The OS 210 and the jacket library 220 operate on the node 200. The OS 210 includes a thread creation unit 211 and a lock creation unit 212.

スレッド作成部２１１は、例えば監視対象アプリケーション１１からのスレッド作成要求に応じて、当該監視対象アプリケーション１１を構成するスレッドを作成する。このスレッド作成要求は、スレッドを開始する関数のアドレス（以下、関数アドレスと表記）を指定することにより行われる。また、スレッド作成要求は、例えば監視対象アプリケーション１１が起動された際に適宜出力される。スレッド作成部２１１は、スレッド作成要求に応じてスレッドが作成された場合、当該スレッド作成要求に対する応答として当該スレッドを識別するためのスレッドＩＤを監視対象アプリケーション１１に対して返す。 For example, in response to a thread creation request from the monitoring target application 11, the thread creation unit 211 creates a thread that configures the monitoring target application 11. This thread creation request is made by designating an address of a function that starts a thread (hereinafter referred to as a function address). The thread creation request is appropriately output when the monitoring target application 11 is activated, for example. When a thread is created in response to a thread creation request, the thread creation unit 211 returns a thread ID for identifying the thread to the monitoring target application 11 as a response to the thread creation request.

ロック作成部２１２は、例えば監視対象アプリケーション１１を構成する複数のスレッドからのロック作成要求に応じて、資源を占有するためのロックを新たに作成する。このロック作成要求は、例えば上記したスレッド作成部２１１によってスレッドが作成された後に、当該スレッドから必要に応じて出力される。ロック作成部２１２は、ロック作成要求に応じてロックが作成された場合、当該ロック作成要求に対する応答として当該ロックのアドレスを監視対象アプリケーション１１に対して返す。 The lock creation unit 212 newly creates a lock for occupying resources, for example, in response to lock creation requests from a plurality of threads constituting the monitoring target application 11. The lock creation request is output from the thread as necessary after the thread creation unit 211 creates the thread, for example. When a lock is created in response to the lock creation request, the lock creation unit 212 returns the address of the lock to the monitoring target application 11 as a response to the lock creation request.

図１２に示すように、ジャケットライブラリ２２０は、ＡＰＩフック部２２１、ロック／スレッド対応付け部２２２、スレッド対応テーブル２２３、ロック対応テーブル２２４、デッドロック検出部２２５、デッドロック情報作成部２２６、デッドロック情報読込部２２７、サスペンド判定部２２８を含む。 As illustrated in FIG. 12, the jacket library 220 includes an API hook unit 221, a lock / thread association unit 222, a thread correspondence table 223, a lock correspondence table 224, a deadlock detection unit 225, a deadlock information creation unit 226, a deadlock An information reading unit 227 and a suspend determination unit 228 are included.

In the present embodiment, the API hook unit 221, the lock / thread association unit 222, the deadlock detection unit 225, the deadlock information creation unit 226, the deadlock information reading unit 227, and the suspend determination unit 228 are realized by a computer (not shown) of the node 200 executing a program stored in, for example, a memory (not shown) provided in the node 200. This program can be stored in advance in a computer-readable storage medium and distributed. The program may also be downloaded to the computer via a network.

Also in the present embodiment, the thread correspondence table 223 and the lock correspondence table 224 are allocated, for example, in a memory provided in the node 200, in the memory space given to the monitoring target application 11 (the process constituting it).

ＡＰＩフック部２２１は、監視対象アプリケーション１１からのＯＳ２１０に対するスレッド作成要求及びロック作成要求をフックする。ＡＰＩフック部２２１は、フックされたスレッド作成要求及びロック作成要求をＯＳ２１０に出力する。 The API hook unit 221 hooks a thread creation request and a lock creation request to the OS 210 from the monitoring target application 11. The API hook unit 221 outputs the hooked thread creation request and lock creation request to the OS 210.

ＡＰＩフック部２２１は、スレッド作成要求に応じてスレッドが作成された後に、当該スレッド作成要求に対する応答をフックする。このフックされた応答には、スレッド作成要求に応じて作成されたスレッドを識別するためのスレッドＩＤが含まれる。ＡＰＩフック部２２１は、フックされた応答に含まれるスレッドＩＤをロック／スレッド対応付け部２２２に対して出力することで、当該ロック／スレッド対応付け部２２２にスレッド対応テーブル２２３の更新を要求する。 The API hook unit 221 hooks a response to the thread creation request after the thread is created in response to the thread creation request. The hooked response includes a thread ID for identifying the thread created in response to the thread creation request. The API hook unit 221 requests the lock / thread association unit 222 to update the thread correspondence table 223 by outputting the thread ID included in the hooked response to the lock / thread association unit 222.

また、ＡＰＩフック部２２１は、ロック作成要求に応じてロックが作成された後に、当該ロック作成要求に対する応答をフックする。このフックされた応答には、ロック作成要求に応じて作成されたロックのアドレスが含まれる。ＡＰＩフック部２２１は、フックされた応答に含まれるロックのアドレスをロック／スレッド対応付け部２２２に対して出力することで、当該ロック／スレッド対応付け部２２２にロック対応テーブル２２４の更新を要求する。 The API hook unit 221 hooks a response to the lock creation request after the lock is created in response to the lock creation request. This hooked response includes the address of the lock created in response to the lock creation request. The API hook unit 221 requests the lock / thread association unit 222 to update the lock correspondence table 224 by outputting the lock address included in the hooked response to the lock / thread association unit 222. .

ＡＰＩフック部２２１は、ロック取得要求がフックされた場合、当該ロック取得要求に含まれるロックのアドレスをサスペンド判定部２２８に出力することにより、当該ロック取得要求に応じて実行されるロック取得処理をサスペンドすべきか否かの判定を要求する。 When the lock acquisition request is hooked, the API hook unit 221 outputs a lock address included in the lock acquisition request to the suspend determination unit 228, thereby performing a lock acquisition process executed in response to the lock acquisition request. Requests a determination of whether to suspend.

ロック／スレッド対応付け部２２２は、ＡＰＩフック部２２１からの要求に応じてスレッド対応テーブル２２３を更新する。また、ロック／スレッド対応付け部２２２は、ＡＰＩフック部２２１からの要求に応じてロック対応テーブル２２４を更新する。 The lock / thread association unit 222 updates the thread association table 223 in response to a request from the API hook unit 221. Further, the lock / thread association unit 222 updates the lock correspondence table 224 in response to a request from the API hook unit 221.

スレッド対応テーブル２２３には、ＯＳ２１０のスレッド作成部２１１によって作成されたスレッドを識別するためのスレッドＩＤと当該スレッドを開始するときの関数アドレスとの組み合わせの情報が保持される。この関数アドレスは、上記したスレッド作成要求において指定される。 The thread correspondence table 223 holds information on a combination of a thread ID for identifying a thread created by the thread creation unit 211 of the OS 210 and a function address when starting the thread. This function address is specified in the above thread creation request.

The lock correspondence table 224 holds information on the combination of the address of a lock created by the lock creation unit 212 of the OS 210 and the thread ID identifying the thread that created the lock (the thread that issued the lock creation request).
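
How hooked creation responses might populate the two tables can be sketched in Python as follows. This is only an illustration under assumptions: `hooked_create_thread` and `hooked_create_lock` are hypothetical wrappers, `threading.Thread` and `threading.Lock` stand in for the thread creation unit 211 and the lock creation unit 212, and `id()` of the start function and of the lock object stand in for the function address and the lock address.

```python
import threading

thread_table = {}   # function address -> thread IDs in creation order
lock_table = {}     # lock address -> thread ID of the creating thread
_table_guard = threading.Lock()   # protects both tables in this sketch


def hooked_create_thread(target):
    """Forward a thread creation request, then record the response."""
    thread = threading.Thread(target=target)
    thread.start()                       # the thread ID is known once started
    with _table_guard:
        thread_table.setdefault(id(target), []).append(thread.ident)
    return thread


def hooked_create_lock():
    """Forward a lock creation request, then record the response."""
    lock = threading.Lock()
    with _table_guard:
        lock_table[id(lock)] = threading.get_ident()
    return lock


created_locks = []   # keeps the locks alive so their id() values stay unique


def worker():
    created_locks.append(hooked_create_lock())


first = hooked_create_thread(worker)
second = hooked_create_thread(worker)
first.join()
second.join()
```

In this sketch both tables are updated only after the underlying call has returned, matching the description that the response, not the request, carries the thread ID or lock address.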

デッドロック検出部２２５は、デッドロックの発生が検出された場合、デッドロック情報作成部２２６に対して当該デッドロックを示すデッドロック状況情報の作成を要求する。 When the occurrence of a deadlock is detected, the deadlock detection unit 225 requests the deadlock information creation unit 226 to create deadlock status information indicating the deadlock.

デッドロック情報作成部２２６は、デッドロック検出部２２５からの要求に応じて、デッドロック検出部２２５によって検出されたデッドロックを示すデッドロック状況情報を作成する。このとき、デッドロック情報作成部２２６は、スレッド対応テーブル２２３、ロック対応テーブル２２４及びロック状態テーブル１３３を参照して作成処理を実行する。デッドロック情報作成部２２６は、作成されたデッドロック状況情報を共有ディスク３０に格納する。 The deadlock information creation unit 226 creates deadlock status information indicating a deadlock detected by the deadlock detection unit 225 in response to a request from the deadlock detection unit 225. At this time, the deadlock information creation unit 226 performs creation processing with reference to the thread correspondence table 223, the lock correspondence table 224, and the lock state table 133. The deadlock information creation unit 226 stores the created deadlock status information in the shared disk 30.

デッドロック情報読込部２２７は、サスペンド判定部２２８の要求に応じて、共有ディスク３０に格納されているデッドロック状況情報を読み込む。デッドロック情報読込部２２７は、共有ディスク３０から読み込まれたデッドロック状況情報をサスペンド判定部２２８に出力する。 The deadlock information reading unit 227 reads deadlock status information stored in the shared disk 30 in response to a request from the suspend determination unit 228. The deadlock information reading unit 227 outputs the deadlock status information read from the shared disk 30 to the suspend determination unit 228.

サスペンド判定部２２８は、ＡＰＩフック部２２１からの要求を受けると、ＡＰＩフック部２２１によってフックされたロック取得要求に応じて実行されるロック取得処理をサスペンドすべきか否かを判定する。このとき、サスペンド判定部２２８は、スレッド対応テーブル２２３、ロック対応テーブル２２４及びロック状態テーブル１３３を参照して判定処理を実行する。また、サスペンド判定部２２８は、デッドロック情報読込部２２７によって出力されたデッドロック状況情報に基づいて判定処理を実行する。サスペンド判定部２２８の処理の詳細については後述する。 When receiving the request from the API hook unit 221, the suspend determination unit 228 determines whether or not to suspend the lock acquisition process executed in response to the lock acquisition request hooked by the API hook unit 221. At this time, the suspend determination unit 228 executes determination processing with reference to the thread correspondence table 223, the lock correspondence table 224, and the lock state table 133. Also, the suspend determination unit 228 executes determination processing based on the deadlock status information output by the deadlock information reading unit 227. Details of the processing of the suspend determination unit 228 will be described later.

FIG. 13 shows an example of the data structure of the thread correspondence table 223 shown in FIG. 12. As shown in FIG. 13, the thread correspondence table 223 holds, in association with each other, the function address at which a thread starts (the thread-start function address) and the thread ID that identifies that thread.

That is, the thread correspondence table 223 holds the function address specified in a thread creation request in association with the thread ID of the thread created in response to that request.

When several threads start from the same function address, their thread IDs are held in the thread correspondence table 223 in the order in which the threads were created.

As described above, the thread correspondence table 223 is updated by the lock/thread association unit 222 when the monitoring target application 11 outputs a thread creation request and a thread is created in response to it.

In the example shown in FIG. 13, the thread correspondence table 223 holds the thread IDs “T1 → T2” in association with the thread-start function address “F1”. This indicates that the threads whose start function address is “F1” are identified by the thread IDs “T1” and “T2”, and that the thread identified by T1 was created first, followed by the thread identified by T2.

The thread correspondence table 223 also holds the thread IDs “T3 → T4 → T5” in association with the thread-start function address “F2”. This indicates that the threads whose start function address is “F2” are identified by the thread IDs “T3”, “T4”, and “T5”, and that they were created in the order T3, T4, T5.

The thread correspondence table 223 is used to match threads with those of the previous run, for example when the monitoring target application 11 is restarted, or when a failover is executed and the monitoring target application 11 is started in order to take over the processing of the active node. The matching therefore fails if threads that start at the same function address are created in a different order. In typical applications, however, threads for the same function are usually created in the same order. In an HA cluster system in particular, the application is in most cases configured to behave after the takeover exactly as it did before, so that service continues, and the thread creation order is therefore the same. Accordingly, this embodiment assumes that threads starting at the same function address are always created in the same order.
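The thread correspondence table can be pictured as a mapping from a start-function address to a list of thread IDs kept in creation order. The following is a minimal sketch under that assumption; the class name `ThreadCorrespondenceTable` and its methods are hypothetical and do not appear in the specification, and the values are those of the FIG. 13 example.

```python
from collections import defaultdict


class ThreadCorrespondenceTable:
    """Hypothetical model: start-function address -> thread IDs in creation order."""

    def __init__(self):
        self._by_function = defaultdict(list)

    def register(self, function_address, thread_id):
        # Appending preserves the creation order that the table relies on.
        self._by_function[function_address].append(thread_id)

    def position_of(self, thread_id):
        """Return (function_address, 1-based creation index) for a thread ID."""
        for function_address, thread_ids in self._by_function.items():
            if thread_id in thread_ids:
                return function_address, thread_ids.index(thread_id) + 1
        raise KeyError(thread_id)

    def thread_at(self, function_address, index):
        """Return the ID of the index-th (1-based) thread started at function_address."""
        return self._by_function[function_address][index - 1]


# Values taken from the FIG. 13 example.
table = ThreadCorrespondenceTable()
for function_address, thread_id in [
    ("F1", "T1"), ("F1", "T2"), ("F2", "T3"), ("F2", "T4"), ("F2", "T5"),
]:
    table.register(function_address, thread_id)
```

With this model, a thread is named by its position rather than its ID, which is what makes the matching after a restart or failover possible as long as the creation order is unchanged.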

FIG. 14 shows an example of the data structure of the lock correspondence table 224 shown in FIG. 12. As shown in FIG. 14, the lock correspondence table 224 holds, in association with each other, the thread ID that identifies a thread of the monitoring target application 11 and the address of each lock created by the lock creation unit 212 in response to a lock creation request from that thread.

When several locks have been created by the lock creation unit 212 in response to lock creation requests from the same thread, their addresses are held in the lock correspondence table 224 in the order in which the locks were created.

As described above, the lock correspondence table 224 is updated by the lock/thread association unit 222 when a thread of the monitoring target application 11 outputs a lock creation request and a lock is created in response to it.

In the example shown in FIG. 14, the lock correspondence table 224 holds the lock addresses “L1 → L2” in association with the thread ID “T1”. This indicates that the locks at addresses “L1” and “L2” were created in response to lock creation requests from the thread identified by the thread ID “T1”, and that the lock at address “L1” was created first, followed by the lock at address “L2”.

The lock correspondence table 224 also holds the lock addresses “L3 → L4 → L5” in association with the thread ID “T2”. This indicates that the locks at addresses “L3”, “L4”, and “L5” were created in response to lock creation requests from the thread identified by the thread ID “T2”, and that they were created in the order L3, L4, L5.

As with the thread correspondence table 223, the lock correspondence table 224 assumes that a given thread always creates its locks in the same order.
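The lock correspondence table can be sketched in the same spirit as a mapping from a thread ID to the lock addresses that thread created, in creation order. The function names `register_lock` and `lock_position` below are hypothetical, and the addresses are the symbolic ones from the FIG. 14 example.

```python
def register_lock(table, thread_id, lock_address):
    """Record that thread_id created the lock at lock_address (order is preserved)."""
    table.setdefault(thread_id, []).append(lock_address)


def lock_position(table, lock_address):
    """Return (creating thread ID, 1-based creation index) for a lock address."""
    for thread_id, addresses in table.items():
        if lock_address in addresses:
            return thread_id, addresses.index(lock_address) + 1
    raise KeyError(lock_address)


# Values taken from the FIG. 14 example.
lock_table = {}
for thread_id, lock_address in [
    ("T1", "L1"), ("T1", "L2"), ("T2", "L3"), ("T2", "L4"), ("T2", "L5"),
]:
    register_lock(lock_table, thread_id, lock_address)
```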

FIG. 15 shows an example of the data structure of the deadlock status information stored on the shared disk 30. The deadlock status information associates a thread, the lock with which that thread is stopping another thread (a lock it currently holds), and the lock it is requesting. The threads and locks in the deadlock status information are expressed not by thread IDs and lock addresses but by function addresses and the creation order of threads and locks, as illustrated in FIGS. 13 and 14. The description below refers to FIGS. 13 and 14 as needed.

Here, the deadlock status information shown in FIG. 15 is assumed to represent the deadlock given by the lock acquisition state shown in FIG. 6 described earlier (the deadlock of FIG. 6).

As shown in FIG. 15, the deadlock status information for the deadlock of FIG. 6 associates “first of F1” as the thread, “first of the first thread of F1” as the lock stopping other threads, and “first of the second thread of F1” as the lock being requested.

Here, “first of F1” denotes the thread created first among the threads identified by the thread IDs associated with the function address “F1” in the thread correspondence table 223. In the example of the thread correspondence table 223 shown in FIG. 13, this is the thread identified by T1.

“First of the first thread of F1” denotes the lock (address) created first among the locks associated with the “first thread of F1” (here, the thread identified by T1) in the lock correspondence table 224 shown in FIG. 14. Here, this is the lock whose address is “L1”.

Likewise, “first of the second thread of F1” denotes the lock created first among the locks associated with the “second thread of F1” (here, the thread identified by T2) in the lock correspondence table 224 shown in FIG. 14. Here, this is the lock whose address is “L3”.

The same applies to the other threads, locks stopping other threads, and locks being requested in FIG. 15.

Because the deadlock status information describes a deadlock in this way, threads can be matched across different nodes by their creation order even when, for example, the same thread is given different thread IDs on different nodes 200. The same applies to locks (lock addresses).
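To illustrate why position-based naming survives a failover, the sketch below encodes one row of deadlock status information on one node and decodes it on another node whose thread IDs and lock addresses differ. The function names, the tuple layout, and the second node's IDs (“T7”, “T9”, “0xA0”, “0xB0”) are assumptions made for illustration only.

```python
def thread_key(threads, thread_id):
    """(function address, 1-based creation index) of a thread."""
    for function_address, thread_ids in threads.items():
        if thread_id in thread_ids:
            return (function_address, thread_ids.index(thread_id) + 1)
    raise KeyError(thread_id)


def lock_key(threads, locks, lock_address):
    """(function address, thread index, 1-based lock index) of a lock."""
    for thread_id, addresses in locks.items():
        if lock_address in addresses:
            return thread_key(threads, thread_id) + (addresses.index(lock_address) + 1,)
    raise KeyError(lock_address)


def encode(threads, locks, thread_id, held, wanted):
    """Turn node-local IDs into a position-based record."""
    return (
        thread_key(threads, thread_id),
        lock_key(threads, locks, held),
        lock_key(threads, locks, wanted),
    )


def decode(threads, locks, record):
    """Turn a position-based record back into this node's local IDs."""
    (function_address, thread_index), held_key, wanted_key = record

    def lock_at(key):
        func, t_index, l_index = key
        return locks[threads[func][t_index - 1]][l_index - 1]

    return threads[function_address][thread_index - 1], lock_at(held_key), lock_at(wanted_key)


# Node A uses the IDs of FIGS. 13 and 14; node B's IDs are made up.
node_a_threads = {"F1": ["T1", "T2"]}
node_a_locks = {"T1": ["L1", "L2"], "T2": ["L3", "L4", "L5"]}
node_b_threads = {"F1": ["T7", "T9"]}
node_b_locks = {"T7": ["0xA0"], "T9": ["0xB0"]}

record = encode(node_a_threads, node_a_locks, "T1", held="L1", wanted="L3")
local = decode(node_b_threads, node_b_locks, record)
```

Here `record` corresponds to the first row described for FIG. 15: “first of F1”, “first of the first thread of F1”, and “first of the second thread of F1”.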

Next, the flow of the process in which a new thread is created (the thread creation process) is described with reference to FIG. 16. This thread creation process is executed, for example, when the monitoring target application 11 is restarted on the node 200, or when a failover is executed and the monitoring target application 11 is started as the node 200 that had been operating as the standby node begins operating as the active node.

First, when the monitoring target application 11 is started, for example, it outputs a thread creation request to the OS 210 to request creation of a thread. The monitoring target application 11 makes this request by specifying the function address at which the thread is to start.

The API hook unit 221 of the jacket library 220 hooks the thread creation request from the monitoring target application 11. When the thread creation request has been hooked, the API hook unit 221 executes the thread creation API of the OS 210 (the thread creation unit 211) (step S71). At this point, the API hook unit 221 outputs the thread creation request to the thread creation unit 211.

The thread creation unit 211 of the OS 210 executes the thread creation process in response to the thread creation request output by the API hook unit 221 (step S72). The thread creation unit 211 creates the thread based on the function address specified in the thread creation request.

Once the thread has been created, the thread creation unit 211 outputs the thread ID identifying the created thread to the monitoring target application 11 as the response to the thread creation request.

The API hook unit 221 of the jacket library 220 hooks the thread ID (the response to the thread creation request) from the OS 210 (its thread creation unit 211). When the thread ID has been hooked, the API hook unit 221 executes the lock/thread association unit 222 (step S73), passing it the hooked thread ID.

When the lock/thread association unit 222 is executed by the API hook unit 221, it updates the thread correspondence table 223 (step S74). Specifically, it registers the function address specified in the thread creation request and the thread ID output by the API hook unit 221, in association with each other, in the thread correspondence table 223.

After the above processing, the process ends when the API hook unit 221 returns the thread ID to the monitoring target application 11 as the response to the thread creation request.
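The flow of steps S71 to S74 amounts to wrapping the thread creation call and recording the result before handing it back to the caller. The sketch below imitates this with Python's `threading` module rather than an OS-level API hook. The names `hooked_create_thread` and `thread_table` are hypothetical, and the start function's name stands in for the function address.

```python
import threading

thread_table = {}                  # start-function name -> thread idents in creation order
_table_guard = threading.Lock()    # protects thread_table against concurrent updates


def hooked_create_thread(start_function, *args):
    """Create a thread through the wrapper and record it before returning."""
    # S71-S72: forward the request to the real thread creation facility.
    thread = threading.Thread(target=start_function, args=args)
    thread.start()                 # thread.ident is available once start() returns
    # S73-S74: associate the start function with the new thread's ID.
    with _table_guard:
        thread_table.setdefault(start_function.__name__, []).append(thread.ident)
    # The response goes back to the application unchanged.
    return thread
```

A caller would use `hooked_create_thread(worker)` in place of creating the thread directly; the application sees the same thread object it would have received without the wrapper.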

Next, the flow of the process in which a new lock is created (the lock creation process) is described with reference to FIG. 17. This lock creation process is executed, for example, after a thread has been created by the thread creation process described above.

First, when a thread has been created as shown in FIG. 16, for example, the thread outputs a lock creation request as needed to request creation of a new lock.

The API hook unit 221 of the jacket library 220 hooks the lock creation request from the monitoring target application 11. When the lock creation request has been hooked, the API hook unit 221 executes the lock creation API of the OS 210 (the lock creation unit 212) (step S81). At this point, the API hook unit 221 outputs the lock creation request to the lock creation unit 212.

The lock creation unit 212 of the OS 210 executes the lock creation process in response to the lock creation request output by the API hook unit 221 (step S82).

Once the lock has been created, the lock creation unit 212 outputs the address of the created lock to the monitoring target application 11 as the response to the lock creation request.

The API hook unit 221 hooks the lock address (the response to the lock creation request) from the OS 210 (the lock creation unit 212). When the lock address has been hooked, the API hook unit 221 executes the lock/thread association unit 222 (step S83), passing it the hooked lock address.

When the lock/thread association unit 222 is executed by the API hook unit 221, it queries the OS 210 for the thread ID identifying the thread that issued the lock creation request. The lock/thread association unit 222 thereby acquires the thread ID of the requesting thread (step S84).

Next, the lock/thread association unit 222 updates the lock correspondence table 224 (step S85). Specifically, it registers the lock address output by the API hook unit 221 and the acquired thread ID, in association with each other, in the lock correspondence table 224.

After the above processing, the process ends when the API hook unit 221 returns the lock address to the monitoring target application 11 as the response to the lock creation request.
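Steps S81 to S85 can be imitated in the same way: create the lock, ask which thread is calling, and record the pair. In this sketch `hooked_create_lock` and `lock_table` are hypothetical names, `threading.get_ident()` plays the role of the thread-ID query to the OS, and `id(lock)` stands in for the lock address (an assumption that only holds while the lock object stays alive).

```python
import threading

lock_table = {}                    # creating thread ident -> lock "addresses" in creation order
_table_guard = threading.Lock()    # protects lock_table against concurrent updates


def hooked_create_lock():
    """Create a lock through the wrapper and record which thread created it."""
    # S81-S82: forward the request to the real lock creation facility.
    lock = threading.Lock()
    # S84: find out which thread issued the request.
    creator = threading.get_ident()
    # S85: associate the lock with its creating thread, preserving creation order.
    with _table_guard:
        lock_table.setdefault(creator, []).append(id(lock))
    # The response goes back to the application unchanged.
    return lock
```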

Next, the flow of processing when a thread of the monitoring target application 11 acquires a lock is described with reference to FIG. 18.

First, a thread of the monitoring target application 11 outputs to the OS 210 a lock acquisition request for exclusive use of a resource. The lock acquisition request includes the address of the lock to be acquired.

The API hook unit 221 of the jacket library 220 hooks the lock acquisition request from the thread of the monitoring target application 11. When the lock acquisition request has been hooked, the API hook unit 221 executes the suspend determination unit 228 (step S91), passing it the lock address included in the hooked lock acquisition request.

When the suspend determination unit 228 is executed by the API hook unit 221, it queries the OS 210 for the thread ID identifying the thread that issued the lock acquisition request hooked by the API hook unit 221. The suspend determination unit 228 thereby acquires the thread ID of the requesting thread (step S92).

Based on the lock address output by the API hook unit 221, the acquired thread ID, and the deadlock status information stored on the shared disk 30, the suspend determination unit 228 executes a process that determines whether the lock acquisition process for the lock acquisition request should be suspended (hereinafter, the suspend determination process) (step S93). In doing so, it refers to the thread correspondence table 223, the lock correspondence table 224, and the lock state table 133. The suspend determination process is described in detail later.

Next, steps S94 to S103, which correspond to steps S11 to S20 shown in FIG. 7 described earlier, are executed, and the process ends.

If, on the other hand, it is determined in step S98 that the deadlock detection unit 225 has detected a deadlock, the deadlock detection unit 225 executes the deadlock information creation unit 226 (step S104), requesting it to create deadlock status information.

At this point, because the deadlock detection unit 225 has detected a deadlock, the lock acquisition state registered in the lock state table 133 represents that deadlock. The deadlock information creation unit 226 therefore creates deadlock status information describing the deadlock based on the lock acquisition state registered in the lock state table 133. In doing so, it refers to the thread correspondence table 223 and the lock correspondence table 224 and expresses the deadlock status information using function addresses and the creation order of threads and locks, as shown in FIG. 15.

The deadlock information creation unit 226 stores the created deadlock status information on the shared disk 30.

Next, steps S106 to S108, which correspond to steps S21 to S23 shown in FIG. 7 described earlier, are executed, and the process ends.

Next, the procedure of the suspend determination process performed by the suspend determination unit 228 is described with reference to the flowchart of FIG. 19.

As described above, the suspend determination process is executed when the suspend determination unit 228 receives a request from the API hook unit 221. The API hook unit 221 outputs this request when it has hooked a lock acquisition request from the monitoring target application 11. The suspend determination unit 228 obtains from the API hook unit 221 the lock address included in the lock acquisition request.

The suspend determination unit 228 determines whether the deadlock status information has already been read from the shared disk 30 by the deadlock information reading unit 227 (step S111).

If it is determined that the deadlock status information has not been read (NO in step S111), the suspend determination unit 228 executes the deadlock information reading unit 227 (step S112), requesting it to read the deadlock status information. In response, the deadlock information reading unit 227 reads the deadlock status information stored on the shared disk 30.

The suspend determination unit 228 acquires the deadlock status information read by the deadlock information reading unit 227 (step S113).
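Storing the deadlock status information and reading it once on first use (steps S111 to S113) can be sketched as a small cache in front of a file. The JSON file format, the names `store_deadlock_info` and `DeadlockInfoReader`, and the treatment of a missing file as "no recorded deadlock" are all assumptions of this sketch; the specification does not state how the information is laid out on the shared disk 30.

```python
import json
import os


def store_deadlock_info(path, records):
    """Write position-based deadlock records to the (assumed) shared-disk file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f)


class DeadlockInfoReader:
    """Reads the stored records at most once and then serves them from memory."""

    def __init__(self, path):
        self._path = path
        self._cached = None            # None means "not read yet"

    def get(self):
        if self._cached is None:       # S111: has the information been read?
            if os.path.exists(self._path):
                with open(self._path, encoding="utf-8") as f:   # S112: read it
                    self._cached = json.load(f)
            else:
                # Assumption: no file means no deadlock has been recorded so far.
                self._cached = []
        return self._cached            # S113: hand the information to the caller
```

Because the result is cached, later lock acquisition requests skip the disk access and go straight to the determination in step S114.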

Next, the suspend determination unit 228 determines whether the lock acquisition process for the lock acquisition request hooked by the API hook unit 221 should be suspended (step S114).

The suspend determination unit 228 determines that the lock acquisition process should be suspended when, for example, predetermined first and second conditions are satisfied, on the grounds that a deadlock may occur later. It makes this determination based on the acquired lock address, the thread ID, and the deadlock status information, and it refers to the thread correspondence table 223, the lock correspondence table 224, and the lock state table 133. Although two conditions (the first and second conditions) are described here, the number of conditions is not limited to two, and more conditions may be used. An example of the first and second conditions is described below.

The determination by the suspend determination unit 228 presupposes that the thread identified by the acquired thread ID is a thread included in the deadlock status information, that is, that the thread was running at the time of the deadlock indicated by the deadlock status information. It also presupposes that the lock to be acquired in response to the lock acquisition request hooked by the API hook unit 221 (the lock acquisition state given by the combination of the acquired thread ID and lock address) is a lock that, in the deadlock status information, stops (or will come to stop) another thread.

The first condition is that, in the acquired deadlock status information, the lock that would block the “lock being requested” associated with the thread identified by the acquired thread ID (that is, the corresponding “lock stopping other threads” in the deadlock status information) has been acquired by the “thread” associated with that lock.

The second condition is that the thread associated, in the deadlock status information, with the “lock being requested” that would be blocked by the lock to be acquired in response to the hooked lock acquisition request has acquired the “lock stopping other threads” that the deadlock status information associates with that thread.

When the first and second conditions are both satisfied, the suspend determination unit 228 determines that the same situation as the deadlock indicated by the acquired deadlock status information may arise. In this case, it is determined that the lock acquisition process for the lock acquisition request hooked by the API hook unit 221 should be suspended.

The processing of step S114 shown in FIG. 19 is now described concretely with reference to FIG. 20. FIG. 20 shows an example of the relationship, in the lock state table 133, between the lock to be acquired in response to the lock acquisition request hooked by the API hook unit 221 and the locks that have already been acquired.

The deadlock status information acquired by the suspend determination unit 228 is assumed to represent the deadlock of FIG. 6 described earlier. The thread IDs “T1” to “T4” and the lock addresses “L1” to “L4” in FIG. 20 are assumed to correspond to those in FIG. 6.

The address of the lock to be acquired in response to the lock acquisition request hooked by the API hook unit 221 (the lock address obtained by the suspend determination unit 228) is assumed to be “L3”, and the thread ID identifying the thread that issued the request (the thread ID obtained by the suspend determination unit 228) is assumed to be “T2”. In other words, the case considered is the one in which the lock acquisition state of the hatched cell in FIG. 20 becomes “locked”. Hereinafter, the lock to be acquired in response to the hooked lock acquisition request (the hatched cell in FIG. 20) is called the target lock.

First, in the example shown in FIG. 20, the thread identified by the thread ID “T2” obtained by the suspend determination unit 228 (hereinafter simply “T2”) was running at the time of the deadlock indicated by the deadlock status information (hereinafter simply “the time of the deadlock”). In addition, the target lock is a lock that stopped another thread (here, “T1”) at the time of the deadlock. The target lock therefore satisfies the premises described above.

Here, as shown in FIG. 20 and FIG. 6, the lock that would block the lock T2 was requesting at the time of the deadlock (here, the lock whose address is “L4”) has been acquired by “T4”. The target lock therefore satisfies the first condition.

Also, as shown in FIG. 20 and FIG. 6, the thread that would be requesting the lock blocked by the target lock (here, the lock whose address is “L3”), namely “T1”, has acquired the lock with which it stopped another thread (here, “T4”) at the time of the deadlock (here, the lock whose address is “L1”). The target lock therefore satisfies the second condition.

Consequently, once the target lock is acquired, the same situation as the deadlock shown in FIG. 6 could arise if “T1” tried to acquire (output a lock acquisition request for) the “L3” lock before releasing the lock whose address is “L1” (hereinafter, the “L1” lock), “T2” tried to acquire the “L4” lock before releasing the “L3” lock, and “T4” tried to acquire the “L1” lock before releasing the “L4” lock. The lock acquisition process for the target lock is therefore suspended.
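One possible reading of the premises and the two example conditions is sketched below, using the FIG. 6 deadlock as described in the walk-through (T1 holds L1 and requests L3, T2 holds L3 and requests L4, T4 holds L4 and requests L1). The record type `DeadlockRecord`, the function `should_suspend`, and the representation of the lock state table as a set of currently held (thread, lock) pairs are assumptions; for brevity the records use thread IDs and lock addresses directly rather than the position-based form.

```python
from typing import NamedTuple


class DeadlockRecord(NamedTuple):
    thread: str   # thread that took part in the recorded deadlock
    held: str     # lock it held, which stopped another thread
    wanted: str   # lock it was requesting


def should_suspend(thread, lock, records, held_now):
    """Return True if acquiring `lock` in `thread` matches the recorded pattern.

    held_now is a set of (thread, lock) pairs that are currently "locked".
    """
    # Premise: the requesting thread held exactly this lock in the recorded deadlock.
    own = next((r for r in records if r.thread == thread and r.held == lock), None)
    if own is None:
        return False
    # First condition: the lock this thread went on to request in the recorded
    # deadlock is already held by the thread that held it then.
    blocker = next((r for r in records if r.held == own.wanted), None)
    first = blocker is not None and (blocker.thread, blocker.held) in held_now
    # Second condition: the thread that was requesting the target lock in the
    # recorded deadlock already holds the lock it held then.
    waiter = next((r for r in records if r.wanted == lock), None)
    second = waiter is not None and (waiter.thread, waiter.held) in held_now
    return first and second


fig6 = [
    DeadlockRecord("T1", "L1", "L3"),
    DeadlockRecord("T2", "L3", "L4"),
    DeadlockRecord("T4", "L4", "L1"),
]
# FIG. 20 situation: T1 already holds L1, T4 already holds L4, T2 asks for L3.
suspend = should_suspend("T2", "L3", fig6, {("T1", "L1"), ("T4", "L4")})
```

In this situation both conditions hold, so `suspend` is `True`; if T4 had not yet acquired L4, the first condition would fail and the acquisition would proceed.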

The first and second conditions described above are only examples. The suspend determination process may instead be executed using conditions other than the first and second conditions as the conditions for suspending the lock acquisition process.
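The determination described for FIG. 20 can be illustrated with a minimal sketch. It is not the implementation of the suspend determination unit 228; the names `Edge`, `should_suspend`, and `holders` are hypothetical, and the check that every other recorded thread still holds its recorded lock is a simplified reading that coincides with the premise plus the first and second conditions for the three-thread deadlock of FIG. 6.

```python
from typing import Dict, List, NamedTuple


class Edge(NamedTuple):
    """One participant of a recorded deadlock (hypothetical record layout)."""
    thread: str  # thread ID at the time of deadlock
    held: str    # address of the lock the thread held
    wanted: str  # address of the lock the thread was requesting


def should_suspend(record: List[Edge], holders: Dict[str, str],
                   requester: str, target: str) -> bool:
    """Return True if granting `target` to `requester` could recreate `record`.

    `holders` maps a lock address to the thread currently holding it.
    """
    # Premise: the requester took part in the recorded deadlock, and the
    # target lock is the lock it held there (the lock that blocked another thread).
    if not any(e.thread == requester and e.held == target for e in record):
        return False
    # Simplified stand-in for the first and second conditions: every other
    # participant already holds the lock it held at the time of deadlock.
    return all(holders.get(e.held) == e.thread
               for e in record if e.thread != requester)


# Deadlock of FIG. 6: T1 holds L1 and wants L3, T2 holds L3 and wants L4,
# T4 holds L4 and wants L1.
FIG6 = [Edge("T1", "L1", "L3"), Edge("T2", "L3", "L4"), Edge("T4", "L4", "L1")]
```

With `holders = {"L1": "T1", "L4": "T4"}`, a request by "T2" for "L3" is judged as one to suspend; if "T4" has not yet acquired "L4", it is not.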

Returning to FIG. 19, if it is determined that the lock acquisition should be suspended (YES in step S114), the lock acquisition process corresponding to the lock acquisition request hooked by the API hook unit 221 is suspended for a predetermined (fixed) time (step S115).

Next, after the suspension for the fixed time in step S115, it is determined whether the process of step S115 has been executed n times (step S116). Here, n is a predetermined value.

If it is determined that the process of step S115 has been executed n times (YES in step S116), the processing ends. That is, the lock acquisition process corresponding to the lock acquisition request hooked by the API hook unit 221 is executed. If, on the other hand, it is determined that the process of step S115 has not yet been executed n times (NO in step S116), the processing returns to step S114 and is repeated.

If it is determined in step S111 that the deadlock status information has already been read, the process of step S114 is executed.

If it is determined in step S114 that the lock acquisition should not be suspended, the processing ends without any suspension. That is, the lock acquisition process corresponding to the lock acquisition request hooked by the API hook unit 221 is executed.

In the description above, the suspend process ends once the process of step S115 (the suspend process) has been executed n times. Alternatively, the configuration may be such that the lock acquisition process is not executed, that is, the suspend process continues, until it is determined in step S114 that the lock acquisition should not be suspended.
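The control flow of steps S114 to S116 can be sketched as follows. This is only an illustration under stated assumptions: the function name `acquire_with_suspend`, its parameters, and the default values for the fixed time and for n are hypothetical and do not appear in the embodiment.

```python
import time
from typing import Callable


def acquire_with_suspend(should_suspend: Callable[[], bool],
                         acquire: Callable[[], None],
                         interval_s: float = 0.01,
                         n: int = 3,
                         sleep: Callable[[float], None] = time.sleep) -> int:
    """Delay a hooked lock acquisition at most n times, then perform it.

    Returns the number of suspensions that were performed.
    """
    suspends = 0
    while should_suspend():      # step S114: should the acquisition be suspended?
        sleep(interval_s)        # step S115: suspend for the fixed time
        suspends += 1
        if suspends >= n:        # step S116: step S115 executed n times?
            break                # YES: stop suspending
    acquire()                    # the hooked lock acquisition process is executed
    return suspends
```

Bounding the loop with n corresponds to the variant in which the acquisition proceeds after n suspensions. Dropping the `suspends >= n` check gives the alternative in which the suspend process continues until the determination of step S114 becomes negative.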

When a plurality of pieces of deadlock status information are stored in the shared disk 30, the processing shown in FIG. 19 is executed for every one of them.

As described above, in this embodiment, when the occurrence of a deadlock is detected in, for example, a node 200, deadlock status information indicating that deadlock is stored in the shared disk 400. When the API hook unit 221 of a node 200 hooks a lock acquisition request, it is determined on the basis of the deadlock status information whether executing the lock acquisition process corresponding to that request could lead to a deadlock, and the lock acquisition process can be suspended accordingly. This reduces the probability that a deadlock will occur under the same conditions as a deadlock that has already occurred in any of the plurality of nodes 200. In this way, the embodiment improves the availability of the cluster system by reducing the probability of an application failure.

Furthermore, in this embodiment, when it is determined that a deadlock may occur, the lock acquisition process is suspended only for a fixed time, which minimizes the effect on the operation of the monitored application 11.

The present invention is not limited to the embodiments exactly as described above; at the implementation stage, the constituent elements can be modified and embodied without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the constituent elements disclosed in the embodiments. For example, some constituent elements may be removed from the full set shown in an embodiment. Constituent elements from different embodiments may also be combined as appropriate.

FIG. 1 is a block diagram showing the functional configuration of a cluster system according to a first embodiment of the present invention.
FIG. 2 is a block diagram mainly showing the functional configuration of the node 10 shown in FIG. 1.
FIG. 3 is a diagram showing an example of the data structure of the lock state table 133 shown in FIG. 2.
FIG. 4 is a diagram for explaining how the lock state management unit 132 updates the lock state table 133.
FIG. 5 is a flowchart showing the procedure by which the deadlock detection unit 134 detects the occurrence of a deadlock.
FIG. 6 is a diagram showing an example of the lock state table 133 representing the lock acquisition states when a deadlock occurs.
FIG. 7 is a diagram for explaining the flow of processing when a lock is acquired by a thread of the monitored application 11.
FIG. 8 is a diagram for explaining the flow of processing when a lock is released by a thread of the monitored application 11.
FIG. 9 is a block diagram mainly showing the functional configuration of a node 100 provided in a cluster system according to a second embodiment of the present invention.
FIG. 10 is a diagram for explaining the flow of processing when a lock is acquired by a thread of the process 110a.
FIG. 11 is a diagram for explaining the flow of processing when a lock is released by a thread of the process 110a.
FIG. 12 is a block diagram mainly showing the functional configuration of a node 200 of a cluster system according to a third embodiment of the present invention.
FIG. 13 is a diagram showing an example of the data structure of the thread correspondence table 223 shown in FIG. 12.
FIG. 14 is a diagram showing an example of the data structure of the lock correspondence table 224 shown in FIG. 12.
FIG. 15 is a diagram showing an example of the data structure of the deadlock status information stored in the shared disk 30.
FIG. 16 is a diagram for explaining the flow of the thread creation process.
FIG. 17 is a diagram for explaining the flow of the lock creation process.
FIG. 18 is a diagram for explaining the flow of processing when a lock is acquired by a thread of the monitored application 11.
FIG. 19 is a flowchart showing the procedure of the suspend determination process performed by the suspend determination unit 228.
FIG. 20 is a diagram showing an example of the relationship, in the lock state table 133, between the lock to be acquired in response to the lock acquisition request hooked by the API hook unit 221 and the locks that have already been acquired.

Explanation of symbols

10, 100, 200 ... node (computer); 11 ... monitored application; 12, 210 ... OS; 13, 130a, 130b, 220 ... jacket library; 14 ... HA cluster daemon (failover means); 30 ... shared disk (storage means); 121 ... lock acquisition unit; 122 ... lock release unit; 131, 221 ... API hook unit (acquisition means); 132, 136 ... lock state management unit (lock management means); 123, 133 ... lock state table; 134, 137, 225 ... deadlock detection unit; 135 ... deadlock notification unit; 211 ... thread creation unit; 212 ... lock creation unit; 222 ... lock/thread association unit; 223 ... thread correspondence table; 224 ... lock correspondence table; 226 ... deadlock information creation unit; 227 ... deadlock information reading unit; 228 ... suspend determination unit.

Claims

1. A computer system comprising a computer that executes a plurality of threads sharing a resource,
the computer comprising:
acquisition means for acquiring, from each of the plurality of threads, a lock acquisition request requesting acquisition of a lock for exclusive use of the resource;
a lock state table in which the lock acquisition states of the plurality of threads are registered on the basis of the acquired lock acquisition request; and
detection means for detecting the occurrence of a deadlock on the basis of the acquired lock acquisition request and the lock acquisition states registered in the lock state table.

2. The computer system according to claim 1, further comprising failover means for causing a computer different from said computer to take over the processing of said computer when the detection means detects the occurrence of a deadlock.

3. The computer system according to claim 2, further comprising storage means accessed by the computer,
wherein the computer further comprises:
creation means for creating, when the occurrence of the deadlock is detected, deadlock status information indicating the deadlock represented by the lock acquisition states registered in the lock state table, and for storing the created deadlock status information in the storage means;
reading means for reading the deadlock status information stored in the storage means when the acquisition means acquires a lock acquisition request from each of the plurality of threads;
determination means for determining, on the basis of the lock acquisition states registered in the lock state table, whether acquiring a lock in response to the acquired lock acquisition request could lead to the deadlock indicated by the read deadlock status information; and
suspend means for suspending the lock acquisition process corresponding to the acquired lock acquisition request when it is determined that the deadlock could occur.

4. The computer system according to claim 3, wherein the suspend means suspends the lock acquisition process for a predetermined time.

5. The computer system according to claim 1, further comprising lock management means for registering in the lock state table, as the acquisition state of a lock, that the lock has been acquired when the lock is acquired in response to the acquired lock acquisition request, and for registering in the lock state table, as the acquisition state of the lock, that acquisition is being requested when the lock requested by the lock acquisition request is held by a thread other than the thread that issued the lock acquisition request,
wherein the detection means refers to the lock state table and detects the occurrence of the deadlock when the acquired states and acquisition-requested states registered in the lock state table form a loop among the plurality of threads.

6. A program executed by a computer that executes a plurality of threads sharing a resource and that has a lock state table,
the program causing the computer to execute:
a step of acquiring, from each of the plurality of threads, a lock acquisition request requesting acquisition of a lock for exclusive use of the resource;
a step of registering the lock acquisition states of the plurality of threads in the lock state table on the basis of the acquired lock acquisition request; and
a step of detecting the occurrence of a deadlock on the basis of the acquired lock acquisition request and the lock acquisition states registered in the lock state table.

7. The program according to claim 6, further causing the computer to execute:
a step of causing, when the occurrence of the deadlock is detected, a computer different from said computer to take over the processing of said computer;
a step of creating, when the occurrence of the deadlock is detected, deadlock status information indicating the deadlock represented by the lock acquisition states registered in the lock state table, and storing the created deadlock status information in storage means accessed by the computer;
a step of reading the deadlock status information stored in the storage means when a lock acquisition request is acquired from each of the plurality of threads;
a step of determining, on the basis of the lock acquisition states registered in the lock state table, whether acquiring a lock in response to the acquired lock acquisition request could lead to the deadlock indicated by the read deadlock status information; and
a step of suspending the lock acquisition process corresponding to the acquired lock acquisition request when it is determined that the deadlock could occur.