JP2014119964A - Computer system and program

Info

Publication number: JP2014119964A
Application number: JP2012274665A
Authority: JP
Inventors: Isao Konno; 功今野; Josuke Matsuki; 譲介松木
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2012-12-17
Filing date: 2012-12-17
Publication date: 2014-06-30

Abstract

PROBLEM TO BE SOLVED: To provide a method that automatically detects deadlocks in software running in a multithreaded environment, without modifying the kernel, and resolves the deadlock state without a failover. SOLUTION: Exclusive control information of each thread is collected and stored, and deadlocks are detected from that information. A potential deadlock is avoided by adding exclusive control; if a deadlock has already occurred, it is resolved by restoring the state that existed before the deadlock.

Description

The present invention relates to a computer system and a program, and more particularly to a computer system and a program that, in a computer system running multiple threads, detect deadlocks, avoid them before they occur, and resolve them after they occur.

In recent years, software that runs in a multithreaded environment has become widespread in many kinds of systems. In a multithreaded environment, the OS (Operating System) manages the units of execution within software as threads. The OS schedules the allocation of CPU (Central Processing Unit) time to the threads. A thread can run only for the CPU time that the OS allocates to it.

Therefore, in a multithreaded environment with multiple CPUs, threads can be executed in parallel to increase speed. However, executing processing in parallel gives rise to a problem called deadlock.

Here, a deadlock is a state in which, when multiple processing units such as threads or processes share resources, the units cannot resume processing and remain stopped, either because a resource was never released or because units that each hold (Lock) a resource keep waiting for one another to release (Unlock) it. A process is a unit of program execution; it holds the variables and state used within the program and consists of one or more threads.

A process in which a deadlock has occurred does not terminate abnormally. As a result, users and developers of the software do not notice the problem immediately after it occurs. Even when a developer does notice it, checking and analyzing the behavior of the threads takes time. Techniques for preventing and resolving deadlocks are therefore important.

Patent Document 1 focuses on the fact that deadlocks occur between threads that each acquire two or more locks, and adds a global lock. That is, a thread that acquires two or more locks is required to acquire the global lock. This restricts the threads that can acquire two or more locks and prevents deadlock. In addition, the global lock is acquired only under certain conditions rather than uniformly, which limits the performance penalty.

Patent Document 2 proposes a computer system and a program that hook the API functions for lock acquisition and release, obtain "locked" and "acquiring lock" information on shared resources for each thread, and detect deadlocks from combinations of that information. Here, an API function is an interface for application and software development provided by an OS or middleware, supplied in the form of a common library. An API function hook intercepts the processing of an API function and performs processing defined by the user. "Locked" is the state in which a thread holds a resource so that other threads cannot access it. "Acquiring lock" is the state in which a thread is waiting for a resource held by another thread to be released so that it can lock the resource.

Patent Document 2 combines these pieces of information. Specifically, if thread 1, which has resource A "locked" and is "acquiring lock" on resource B, and thread 2, which is "acquiring lock" on resource A and has resource B "locked", exist at the same time, the situation can be detected as a deadlock in which each thread keeps waiting for the other to release its resource. After a deadlock is detected, the path on which it occurred is stored and the system fails over to the redundant system, which prevents the deadlock from recurring at the failover destination. Here, failover is an operation in which data is kept synchronized among redundantly configured computers during normal operation and, when a failure occurs, the remaining healthy computers take over the data from the failed computer and continue processing.

Patent Document 1: Japanese Patent Application Laid-Open No. 07-191944
Patent Document 2: JP 2009-271858 A

The technique described in Patent Document 1 requires locks to be acquired consecutively, and presupposes that all required locks are known in advance at the point where multiple locks are acquired. In a real program the required locks differ from path to path, so acquiring all of them in advance is difficult, and a deadlock occurs if they cannot all be acquired.

The technique described in Patent Document 2 can prevent a deadlock from recurring. However, a failover is needed every time a deadlock occurs on a different path. While a failover is in progress, processing that should be running cannot run, which affects the service being provided. In addition, a failover is impossible when no computer is available to switch to, so failovers should be avoided wherever possible.

Even if Patent Document 1 and Patent Document 2 are combined, the problem cannot be solved easily, because there is no means of recovery after a deadlock has occurred.

According to a representative embodiment of the present invention, a computer system executes a plurality of threads and includes at least one processor and a memory. Within software running on the computer system, it executes a first thread and a monitoring thread that monitors the first thread. The memory holds a monitoring information area for the exclusive control information of the first thread, a deadlock avoidance area and information for the first thread, and a save area for restoring memory contents. The first thread and the monitoring thread store exclusive control information in the monitoring information area, and the first thread stores memory contents in the save area. The monitoring thread detects deadlocks from the monitoring information area and, for a first thread in which a deadlock may occur, adds exclusive control to the deadlock avoidance area. If a deadlock occurs in the first thread, the deadlock is resolved by returning the processing of the first thread to the save position and restoring the memory contents from the save area.

The problem described above can be solved by a computer system that executes a plurality of processing threads and a monitoring thread in parallel and includes a processor and a memory. Each processing thread collects and accumulates exclusive control information and sends multiple exclusive control information to the monitoring thread. The monitoring thread accumulates the multiple exclusive control information and, when it detects a combination of multiple exclusive control information that causes a deadlock, adds exclusive control to the return location of the multiple exclusive control information in that combination.

The problem can also be solved by a program that causes a computer to function as a processing thread that collects and accumulates exclusive control information and sends multiple exclusive control information to the monitoring thread, and as a monitoring thread that accumulates the multiple exclusive control information and, when it detects a combination of multiple exclusive control information that causes a deadlock, adds exclusive control to the return location of the multiple exclusive control information in that combination.

According to the present invention, deadlocks can be detected automatically, and the system can return from a deadlock state to a normal state without performing a failover.

A hardware block diagram of the communication device.
A diagram explaining the thread configuration and processing of the device that performs deadlock monitoring.
A diagram explaining the lock record and the multiple lock record.
A flowchart of the Lock instruction.
A flowchart of the hooked Lock instruction.
A flowchart of the Unlock instruction.
A flowchart of the hooked Unlock instruction.
A diagram explaining the lock record table.
A diagram explaining the multiple lock record table.
A flowchart of multiple lock record processing.
A flowchart showing the overall flow of deadlock detection, avoidance, and resolution.
A flowchart of deadlock detection.
A flowchart of deadlock avoidance.
A flowchart of deadlock resolution.
A diagram explaining the processing that restores memory contents to the state before the deadlock occurred.

Embodiments of the present invention are described in detail below using examples and with reference to the drawings. Substantially identical parts are given the same reference numerals, and their description is not repeated.

Embodiment 1 describes a computer that repeats a specific process, such as a communication device.

With reference to FIG. 1, a communication device with a minimal hardware configuration is described. In FIG. 1, the communication device 100 includes a processor 101, a memory 102, and an NIF (Network InterFace) 103.

The processor 101 is an arithmetic unit such as a CPU and executes software such as the OS and application programs. The memory 102 is the main storage device; while the processor 101 executes software, it stores the program's executable binary and the data the program uses. The NIF 103 is an interface for sending and receiving packets to and from devices other than the communication device 100. Because the hardware blocks are connected to one another by the buses 110-1 and 110-2, they can send command messages and data to each other.

To apply this embodiment, the processor 101 must support multitasking. Here, multitasking is a scheme in which an arithmetic unit executes multiple processes by switching among them; most processors in general-purpose computers support it.

The communication device 100 may be implemented physically on a single computer, or on a virtual computer provided by at least one computer.

With reference to FIG. 2, the thread configuration of the software and the hook processing that records locks are described. The subroutines in FIG. 2 are described in detail separately, using later figures. In FIG. 2, the threads consist of a plurality of processing threads 200, which perform tasks such as call control for the communication device, and a monitoring thread 210, which determines whether the processing threads 200 are operating normally.

After starting, the processing thread 200 waits for an event such as packet reception (S202). It then repeats the following: receive a packet (S203), process the packet to be transmitted (S204), and transmit it (S205). The software binary is not modified anywhere from thread start to packet transmission.

The processing thread 200 is provided with hook processing 220 and a lock record table 700. The hook processing 220 is a set of hook functions that intercept every Lock instruction 500 and Unlock instruction 600 called in packet reception (step 203), packet processing (step 204), and packet transmission (step 205). When the Lock instruction 500 is called, the hook processing 220 calls a separately defined subroutine, Lock instruction 550. When the Unlock instruction 600 is called, it calls a separately defined subroutine, Unlock instruction 650. The subroutine Lock instruction 550 is described later with FIG. 4, and the subroutine Unlock instruction 650 with FIG. 5. The hook processing 220 temporarily stores lock records in the lock record table 700. When it determines that a record is a multiple lock record, it adds the multiple lock record 450 to the queue 230 of the monitoring thread 210.

When its program counter reaches the subroutine multiple lock record processing 900, the monitoring thread 210 checks whether the queue 230 contains a multiple lock record 450 and, if it does, adds that multiple lock record 450 to the multiple lock record table 800. The subroutine multiple lock record processing 900 is described in detail with FIG. 8.

With the procedure described above, lock information can be recorded in the lock record table 700 without changing thread start and event waiting (step 202), packet reception (step 203), packet processing (step 204), or packet transmission (step 205).

After starting, the monitoring thread 210 repeatedly determines whether it is time for a check cycle. If it is, the thread performs deadlock check processing (S300); if not, it performs multiple lock record processing (S900). The check cycle may be a time or count specified by the developer or user, or it may be calculated automatically from exclusive control statistics. Deadlock check processing is described later with FIG. 9.
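The monitoring loop just described can be sketched roughly as follows. This is a minimal illustration in Python, not the patent's implementation: the function name `run_monitor`, the callback parameters, and the "every fourth pass" check cycle are all assumptions made for the example, and the loop is bounded so that it terminates.

```python
def run_monitor(is_check_cycle, deadlock_check, process_records, iterations):
    """Bounded sketch of the monitoring thread's main loop (hypothetical API)."""
    for tick in range(iterations):
        if is_check_cycle(tick):
            deadlock_check()      # corresponds to deadlock check processing (S300)
        else:
            process_records()     # corresponds to multiple lock record processing (S900)


# Record which branch ran on each pass.
trace = []
run_monitor(
    is_check_cycle=lambda tick: tick % 4 == 3,   # assumed policy: every 4th pass
    deadlock_check=lambda: trace.append("check"),
    process_records=lambda: trace.append("records"),
    iterations=8,
)
```

Passing the check-cycle test as a callback leaves room for either a fixed interval or a value derived from lock statistics, both of which the text allows.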

In this embodiment, the monitoring thread 210 executes deadlock check processing repeatedly. It thereby detects deadlocks and resolves them automatically: if a deadlock is detected before it occurs, it is avoided, and if a deadlock is detected at the moment it occurs, the thread performs processing to recover from the deadlock state. In this embodiment the monitoring thread is dedicated to monitoring, but the monitoring thread can also be given another role, or the monitoring operation can be added to the processing threads. Doing so removes the need to increase the number of threads and has the advantage of saving memory and CPU resources.

With reference to FIG. 3, the lock record 400 and the multiple lock record 450 used in the detection step of the deadlock check operation are described.

In FIG. 3(a), the lock record 400 consists of a nest count 401, a lock handle 402, and a return location 403. The lock record 400 is a record of exclusive control that each processing thread 200 manages individually. The nest count 401 is the depth of exclusive control. The lock handle 402 is the identifier needed for exclusive control. The return location 403 is the area used for saving state against a deadlock.

In FIG. 3(b), the multiple lock record 450 consists of an item number 451, a lock order 452, a return location 403, and an additional lock handle 453. The multiple lock record 450 is a record of a combination pattern of exclusive control, managed by the monitoring thread 210. The item number 451 is an identifier for managing multiple lock records. The lock order 452 is the order in which the lock handles 402 were acquired. The additional lock handle 453 is a lock handle used for deadlock avoidance.
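As a rough illustration of the two record types, the fields above could be modeled as follows. This is only a sketch in Python: the class and field names are invented for the example (the reference numerals in the comments tie them back to the text), and the sample values "A", "B", "FuncA", and "FuncAB" are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class LockRecord:
    """One entry per held lock, kept by each processing thread (400)."""
    nest_count: int        # depth of exclusive control (401)
    lock_handle: str       # identifier of the lock (402)
    return_location: str   # where to return if a deadlock must be undone (403)


@dataclass
class MultiLockRecord:
    """One lock-ordering pattern, kept by the monitoring thread (450)."""
    lock_order: Tuple[str, ...]               # handles in acquisition order (452)
    return_location: str                      # (403)
    item_number: Optional[int] = None         # assigned later by the monitor (451)
    additional_lock: Optional[str] = None     # avoidance lock, set later (453)


held = [LockRecord(1, "A", "FuncA"), LockRecord(2, "B", "FuncAB")]
pattern = MultiLockRecord(
    lock_order=tuple(r.lock_handle for r in held),
    return_location="FuncAB",
)
```

Leaving `item_number` and `additional_lock` empty at creation mirrors the text, in which the monitoring thread fills them in after the processing thread has queued the record.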

Lock records 400 are managed in the lock record table 700 shown in FIG. 6. Multiple lock records 450 are managed in the multiple lock record table 800 shown in FIG. 7. The lock record table 700 is a table that each processing thread 200 manages individually; it shows the exclusive control that the processing thread 200 holds at that moment. The contents of the lock record table 700 are obtained by hooking the lock acquisition instruction (hereinafter the Lock instruction; pthread_mutex_lock in this embodiment) and the lock release instruction (hereinafter the Unlock instruction; pthread_mutex_unlock in this embodiment). Here, a hook intercepts processing and causes the processor 101 to perform processing defined by the user.

Specifically, this can be implemented as an API function hook using the LD_PRELOAD environment variable of Linux (registered trademark) operating systems together with a shared library. When LD_PRELOAD is not set, calls to pthread_mutex_lock and pthread_mutex_unlock execute the functions in the shared library libpthread.so. When a shared library file name is specified in LD_PRELOAD, the order in which function names are resolved to function addresses changes, and the shared library named in LD_PRELOAD is searched first. By defining pthread_mutex_lock and pthread_mutex_unlock in that shared library, its functions run before the pthread_mutex_lock and pthread_mutex_unlock of libpthread.so. Having the shared library's pthread_mutex_lock and pthread_mutex_unlock call the corresponding functions in libpthread.so then implements the hooks on the Lock and Unlock instructions.
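From the launcher's side, enabling such a hook amounts to setting LD_PRELOAD in the environment of the target process. The sketch below, in Python, only builds that environment; the helper name `preload_env` and the library path are hypothetical, and the hook library itself (which would be written in C) is assumed to exist already.

```python
import os


def preload_env(hook_library, base_env=None):
    """Return a copy of the environment with hook_library first in LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD")
    # LD_PRELOAD accepts a colon-separated list; earlier entries are searched first.
    env["LD_PRELOAD"] = hook_library if not existing else hook_library + ":" + existing
    return env


# Hypothetical library path, used only for illustration.
env = preload_env("/opt/hooks/libmutexhook.so", {"PATH": "/usr/bin"})
chained = preload_env("/opt/hooks/libmutexhook.so", {"LD_PRELOAD": "/opt/other.so"})
```

The resulting dictionary could then be passed as the `env` argument of `subprocess.run` when starting the unmodified program, which matches the text's point that the software binary itself does not change.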

With reference to FIG. 4, the flowchart of the Lock instruction and the flowchart of the hooked Lock instruction are described. With reference to FIG. 5, the flowchart of the Unlock instruction and the flowchart of the hooked Unlock instruction are described.

In FIG. 4A, when the Lock instruction starts, the processing thread 200 determines whether the lock handle can be acquired (S501). If it can, the processing thread 200 acquires the lock handle (S504) and ends the processing. If the lock handle cannot be acquired in step 501, the processing thread 200 waits until the lock handle is released (S503) and then acquires it (S504).

In FIG. 4B, when the hooked Lock instruction starts, the processing thread 200 adds a lock record (S551). This step registers the lock handle about to be acquired in a lock record and adds the record to the lock record table 700. After the addition, the processing thread 200 determines whether the nest count 401 in the lock record table 700 is 2 or more (S552). If it is, the processing thread 200 creates a multiple lock record 450 from all the lock records 400 currently registered in the lock record table 700 and enqueues it in the queue 230 of the monitoring thread 210 (S553). After the enqueue completes, or when the nest count 401 is less than 2, the processing thread 200 executes the Lock instruction (S500) and completes the hooked Lock instruction.

In FIG. 5A, when the Unlock instruction starts, the processing thread 200 releases the lock handle (S601) and ends the Unlock instruction.

In FIG. 5B, when the hooked Unlock instruction 650 starts, the processing thread 200 executes the Unlock instruction (S600). After that completes, the processing thread 200 deletes the lock record (S651) and completes the hooked Unlock instruction.

Through step 551 (adding the lock record in the hooked Lock instruction) and step 651 (deleting the lock record in the hooked Unlock instruction), the processing thread 200 keeps its record of currently held lock handles up to date.
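The hooked Lock and Unlock flows can be sketched as a thin wrapper around an ordinary mutex. The Python below is an illustration under several assumptions, not the patent's code: the class name `HookedLocks` is invented, `threading.Lock` stands in for the pthread mutex, handle names and return locations are passed in as plain strings, and the multiple lock record is simplified to a `(lock order, return location)` tuple that takes its return location from the most recent lock. One instance would belong to one processing thread.

```python
import queue
import threading


class HookedLocks:
    """Per-thread lock record table plus a queue to the monitoring thread."""

    def __init__(self, monitor_queue):
        self.monitor_queue = monitor_queue
        self.table = []   # lock record table: (handle name, return location)

    def lock(self, name, mutex, return_location):
        self.table.append((name, return_location))            # S551: add record
        if len(self.table) >= 2:                              # S552: nest count >= 2?
            order = tuple(handle for handle, _ in self.table)
            self.monitor_queue.put((order, return_location))  # S553: enqueue
        mutex.acquire()                                       # S500: original Lock

    def unlock(self, name, mutex):
        mutex.release()                                       # S600: original Unlock
        for index in range(len(self.table) - 1, -1, -1):      # S651: delete record
            if self.table[index][0] == name:
                del self.table[index]
                break


monitor_queue = queue.Queue()
locks = HookedLocks(monitor_queue)
lock_a, lock_b = threading.Lock(), threading.Lock()

locks.lock("A", lock_a, "FuncA")      # one lock held, nothing queued
locks.lock("B", lock_b, "FuncAB")     # nest count 2, a record is queued
queued = monitor_queue.get_nowait()
locks.unlock("B", lock_b)
remaining = [handle for handle, _ in locks.table]
locks.unlock("A", lock_a)
```

Note that the record is added before the real lock is taken, as in the flowchart, so the table also reflects a lock the thread is still waiting for.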

With reference to FIG. 6, the state transitions of the lock record table 700 are described. In FIG. 6, starting from state 701, in which lock handle A has been acquired through the hook processing 220, acquiring lock handle B with the hooked Lock(B) leads to state 702, and then releasing lock handle B with the hooked Unlock(B) leads to state 703.

In state 702 the nest count 401 is 2 or more, so the processing thread 200 creates a multiple lock record 450 in the hooked Lock instruction. When the lock record table 700 is in state 702, the multiple lock record 450 that is created has the lock order 452 A→B and the return location 403 FuncAB, and it is enqueued in the queue 230 of the monitoring thread 210. The item number 451 and the additional lock handle 453 of the multiple lock record 450 are registered later by the monitoring thread 210. Because deadlocks occur between processing threads 200 that have each acquired two or more lock handles 402, a multiple lock record 450 needs to be created only when the nest count 401 is 2 or more.

After the processing thread 200 enqueues the multiple lock record 450 in the queue 230, the monitoring thread 210 registers it in the multiple lock record table 800 during multiple lock record processing 900. The multiple lock record table 800, shown in FIG. 7, accumulates multiple lock records 450 and starts in state 801, in which it holds none. FIG. 7 is explained along with the flowcharts.

With reference to FIG. 8, the multiple lock record processing that adds unregistered multiple lock records is described. In FIG. 8, when multiple lock record processing starts, the monitoring thread 210 determines whether the queue 230 contains a multiple lock record (S901). If it does, the monitoring thread 210 determines whether that record is not yet registered in the multiple lock record table 800 (S902). If it is unregistered, the monitoring thread 210 registers it in the multiple lock record table (S903) and ends multiple lock record processing. Whether a multiple lock record is already registered is decided by whether both its lock order and its return location match an existing entry. If the queue 230 contains no multiple lock record, or if the record is already registered, the monitoring thread 210 ends multiple lock record processing. Through this operation, in FIG. 7, the table moves from state 801, in which it holds no multiple lock record 450, to state 802 by receiving the unregistered multiple lock record (A→B) and assigning item numbers 451 in the order received.
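Steps S901 to S903 can be sketched as a single function that takes at most one record from the queue per call. This Python sketch makes assumptions beyond the text: queued records are `(lock order, return location)` tuples, table rows are dictionaries with invented key names, and the sample orderings and return locations are hypothetical.

```python
import queue


def process_multi_lock_records(pending, table):
    """Register one queued record if it is new; return True when a row was added."""
    try:
        order, return_location = pending.get_nowait()      # S901: anything queued?
    except queue.Empty:
        return False
    for row in table:                                       # S902: already registered?
        if row["lock_order"] == order and row["return_location"] == return_location:
            return False
    table.append({                                          # S903: register
        "item": len(table) + 1,       # item numbers follow arrival order
        "lock_order": order,
        "return_location": return_location,
        "additional_lock": None,      # filled in later, during avoidance
    })
    return True


pending = queue.Queue()
pending.put((("A", "B"), "FuncAB"))
pending.put((("A", "B"), "FuncAB"))   # duplicate: same order and return location
pending.put((("B", "A"), "FuncBA"))
table = []
results = [process_multi_lock_records(pending, table) for _ in range(4)]
```

Deduplicating on both fields means the same lock order reached from a different return location is stored as a separate row, which is what the matching rule in the text implies.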

As shown in FIG. 2, the monitoring thread 210 performs multiple lock record processing 900 on the multiple lock record table 800 outside the check cycle, and performs the deadlock check processing 300 shown in FIG. 9 during the check cycle.

In FIG. 9, when deadlock check processing starts, the monitoring thread 210 executes deadlock detection processing (S310). The monitoring thread 210 then determines whether a deadlock was detected (S320). If YES, the monitoring thread 210 executes deadlock avoidance processing (S330). It then determines whether the detected deadlock has actually occurred (S340). If YES, the monitoring thread 210 executes deadlock resolution processing (S350) and ends. If the result of step 320 or step 340 is NO, the monitoring thread 210 ends deadlock check processing.
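The control flow of FIG. 9 can be summarized as a small driver function. The Python below is a sketch only: the four callbacks stand in for the detection, avoidance, occurrence test, and resolution steps, and their names, signatures, and the string return values are assumptions made for illustration.

```python
def deadlock_check(detect, avoid, has_occurred, resolve):
    """Sketch of the deadlock check flow; returns which exit path was taken."""
    patterns = detect()                 # S310: deadlock detection
    if not patterns:                    # S320: NO
        return "none"
    avoid(patterns)                     # S330: deadlock avoidance
    if not has_occurred(patterns):      # S340: NO
        return "avoided"
    resolve(patterns)                   # S350: deadlock resolution
    return "resolved"


log = []
outcome = deadlock_check(
    detect=lambda: [(1, 3)],                           # hypothetical detected pair
    avoid=lambda p: log.append(("avoid", p)),
    has_occurred=lambda p: True,
    resolve=lambda p: log.append(("resolve", p)),
)
```

Avoidance runs before the occurrence test, so the extra exclusive control is in place even when the deadlock has already happened and must also be resolved.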

The deadlock detection process will be described with reference to FIG. 10. In FIG. 10, when the deadlock detection process starts, the monitoring thread 210 determines whether there are multiple lock records that have not yet been compared (S311). Here, uncompared multiple lock records are those combinations of multiple lock records in the multiple lock record table 800 that have not yet been compared for deadlock detection. At the start of the deadlock detection process, all multiple lock records are uncompared. If there are uncompared multiple lock records, the monitoring thread 210 selects candidates X and Y from among them (S312). The monitoring thread 210 determines whether candidates X and Y can deadlock (S313). If they can, the monitoring thread 210 registers them as a deadlock pattern (S314) and returns to step 311. Once all comparisons are complete (S311: NO), the monitoring thread 210 ends the process.

A simple way to determine in step 313 whether there is a possibility of deadlock is to extract the lock handles common to the lock orders 452 of candidates X and Y and check whether their order is reversed.

Returning to FIG. 7, if the multiple lock record table is in state 803, only lock handle A matches and there is no order reversal, so it is determined that no deadlock occurs. If the multiple lock record table is in state 804, all three pairwise combinations of the three multiple lock records are compared; in the multiple lock records of item numbers 1 and 3, lock handles A and B match and their order is reversed, so a deadlock can be detected.
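The order-reversal test and the pairwise comparison can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, each handle is assumed to appear at most once in a lock order, and the lock order of item number 2 (A→C) is invented for the example because the text specifies only that handle A alone is shared in state 803.

```python
from itertools import combinations


def has_order_reversal(order_x, order_y) -> bool:
    """True if two handles shared by both lock orders appear in opposite order."""
    shared = [h for h in order_x if h in order_y]        # shared handles, in X's order
    pos_y = {h: order_y.index(h) for h in shared}        # their positions in Y
    return any(pos_y[a] > pos_y[b] for a, b in combinations(shared, 2))


def detect_deadlock_patterns(table: dict) -> list:
    """table maps item number -> lock order; returns item-number pairs that can deadlock."""
    return [
        (i, j)
        for (i, x), (j, y) in combinations(sorted(table.items()), 2)
        if has_order_reversal(x, y)
    ]


state_803 = {1: ("A", "B"), 2: ("A", "C")}                  # item 2 is an assumed example
state_804 = {1: ("A", "B"), 2: ("A", "C"), 3: ("B", "A")}

print(detect_deadlock_patterns(state_803))   # []
print(detect_deadlock_patterns(state_804))   # [(1, 3)]
```

The same test flags longer lock orders such as A→B→C against D→C→B, because the shared handles B and C appear in opposite order.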

Because the multiple lock record table 800 accumulates multiple lock records, even a deadlock with a low probability of occurrence can be detected from past multiple lock records. However, accumulating multiple lock records indefinitely puts pressure on the storage area, so old information may be deleted as necessary. Also, uncompared multiple lock records 450 arise only when a record is newly registered in the multiple lock record table 800, so the deadlock detection process may instead be executed after step 903 of the multiple lock record process 900.

The deadlock avoidance process will be described with reference to FIG. 11. In FIG. 11, the monitoring thread 210 receives the combination of multiple lock records (at least two) in which a deadlock was detected, and checks whether an additional lock handle 453 is registered in any of those records (S331). If none of the records has an additional lock handle 453 registered (NO), the monitoring thread 210 newly allocates an additional lock handle 453 (call it GlobalAB), registers GlobalAB as the additional lock handle 453 of those multiple lock records (S332), and ends the deadlock avoidance process.

If an additional lock handle 453 is already registered in step 331 (YES), the monitoring thread 210 checks whether only one kind of additional lock handle 453 is registered (S333). If only one kind (call it GlobalXX) is registered (YES), the monitoring thread 210 registers GlobalXX in the records whose additional lock handle 453 is still unregistered (S334) and ends the deadlock avoidance process.

If more than one kind of additional lock handle 453 is registered in step 333 (call them GlobalXX and GlobalYY) (NO), the monitoring thread 210 newly allocates a lock handle (call it GlobalXY), registers GlobalXY in the received records whose additional lock handle 453 is unregistered, and overwrites GlobalXX and GlobalYY in the received records with GlobalXY (S335). Furthermore, the monitoring thread 210 overwrites with GlobalXY every record in the multiple lock record table 800 whose additional lock handle 453 is GlobalXX or GlobalYY (S336). The monitoring thread 210 frees the memory areas of GlobalXX and GlobalYY, which the overwrite has made unnecessary (S337). After registration of the additional lock handle 453 is complete, the monitoring thread 210 ends the deadlock avoidance process.
Note that more than one kind of additional lock handle 453 arises only when the nesting depth is three or more.

Specifically, in FIG. 7, if the multiple lock record table 800 is in state 804, the deadlock avoidance process picks up the multiple lock records of item numbers 1 and 3; since neither has an additional lock handle 453 registered, a new additional lock handle (GlobalAB) is allocated and registered, yielding state 805.
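The three branches of FIG. 11 (S331 to S337) can be sketched as follows. This is an illustration only: records are modeled as plain dictionaries with a hypothetical `"additional"` field, `assign_additional_handle` and the `allocate` callback are invented names, and step S337 (freeing the obsolete handles) is left to Python's garbage collector.

```python
def assign_additional_handle(pattern, table, allocate):
    """Give every record of one detected combination the same additional lock handle.

    pattern:  the records in which a deadlock was detected (members of table)
    table:    all records of the multiple lock record table
    allocate: callable returning a newly allocated handle name
    """
    registered = {r["additional"] for r in pattern if r["additional"] is not None}
    if not registered:                          # S331: NO, nothing registered yet
        handle = allocate()                     # S332: allocate a new handle
    elif len(registered) == 1:                  # S333: YES, exactly one kind
        handle = next(iter(registered))         # S334: reuse it
    else:                                       # S333: NO, several kinds
        handle = allocate()                     # S335: allocate a merged handle
        for record in table:                    # S336: rewrite other users too
            if record["additional"] in registered:
                record["additional"] = handle
        # S337: the old handles are no longer referenced and can be freed.
    for record in pattern:
        record["additional"] = handle
    return handle


# State 804 -> state 805: item numbers 1 and 3 receive GlobalAB.
item1 = {"order": ("A", "B"), "additional": None}
item2 = {"order": ("A", "C"), "additional": None}   # assumed lock order for item 2
item3 = {"order": ("B", "A"), "additional": None}
state = [item1, item2, item3]
assign_additional_handle([item1, item3], state, lambda: "GlobalAB")
```

Rewriting the whole table in the merge branch keeps every previously detected pattern under one handle, so no two patterns that share a lock end up guarded by different additional locks.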

The additional lock handle 453 registered by the deadlock avoidance process is acquired as a scoped lock at the return location registered in the multiple lock record, so that the region where a deadlock may occur is enclosed by the added exclusive control. In state 804, lock handles were acquired in the orders (A)→(B) and (B)→(A); in state 805, with the additional lock handle, the orders become (GlobalAB)→(A)→(B) and (GlobalAB)→(B)→(A), so GlobalAB prevents the deadlock. A scoped lock is a mechanism that, once the Lock instruction has been executed, automatically executes the Unlock instruction when the enclosing scope, such as a function, ends; this prevents resources from being left unreleased.
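The effect of the additional lock can be illustrated with the sketch below. It is an analogy rather than the embodiment: Python's `with` statement plays the role of a scoped lock (comparable to `std::lock_guard` in C++), the function names are hypothetical, and the additional lock is written directly into the functions instead of being injected at the return location by the monitoring thread.

```python
import threading


def run_with_additional_lock(rounds: int = 50) -> list:
    """Run A->B and B->A lockers concurrently under a shared outer lock."""
    global_ab = threading.Lock()      # stands in for the additional lock handle GlobalAB
    lock_a = threading.Lock()
    lock_b = threading.Lock()
    completed = []

    def func_ab():
        with global_ab:               # scoped: released when the block exits
            with lock_a:
                with lock_b:
                    completed.append("A->B")

    def func_ba():
        with global_ab:               # same outer lock, so the inner orders cannot interleave
            with lock_b:
                with lock_a:
                    completed.append("B->A")

    threads = [threading.Thread(target=f)
               for f in (func_ab, func_ba) for _ in range(rounds)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return completed
```

Because only one thread at a time can be inside the region guarded by `global_ab`, no thread can hold A while another holds B, which is the condition the reversed orders would need in order to deadlock. The cost is that the two code paths no longer run in parallel.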

In addition, by overwriting the additional lock handle 453, previously detected deadlock patterns (for example, A→B→C and D→C→B) and a newly detected deadlock pattern (E→C→B) designate the same additional lock handle 453, so all of these deadlocks can be prevented. It is also possible to use only one kind of additional lock handle 453 and share it across all deadlock patterns.

The return location 403 of the lock record table 700 and the multiple lock record table 800 must specify a location (an address in the program) from which deadlock can be prevented. There are two ways to specify it: the checkpoint method and the function hook method. In the checkpoint method, the software developer explicitly designates the return location as a checkpoint. In the function hook method, software written in C or C++ is built from source with the "-finstrument-functions" option, which allows hooking at function entry and exit; the entry point of a function containing a Lock instruction that causes a deadlock is thereby designated automatically.

If the deadlock has already occurred at the time it is detected, it cannot be avoided, so the deadlock elimination process is performed. To determine whether a deadlock has occurred, the monitoring thread checks at the check cycle whether the processing thread 200 responds; if there is no response and the lock record table 700 of the processing thread 200 contains a multiple lock record for which the possibility of deadlock was detected, the state is judged to be a deadlock.

The deadlock elimination process will be described with reference to FIG. 12. When the deadlock elimination process starts, the monitoring thread 210 sets the program counter to the return location (S351). The monitoring thread 210 restores the memory contents to their state before the deadlock occurred (S352). The monitoring thread 210 releases the lock handles that caused the deadlock (S353) and ends the deadlock elimination process.

The program counter holds the address of the instruction that the processor 101 will execute next in the program. Each thread has its own program counter, so by rewriting it, the address of the next instruction executed by any thread can be changed to an arbitrary address. The program counter is therefore set to the return location 403 registered in the multiple lock record table 800. The processing thread 200 then resumes processing from the acquisition of the additional lock handle (GlobalAB), which prevents the deadlock from recurring.

Next, by the time the deadlock occurs, the memory contents may already have been rewritten, and the memory may hold partially updated data. For this reason, data consistency is maintained by returning the memory contents to their state before the deadlock occurred.

Memory restoration will be described with reference to FIG. 13. First, a snapshot of the memory contents 1300 before the deadlock is taken in the memory 102 by copying the memory contents (state 1301). Using a named memory map for the snapshot makes it possible to locate the required memory after a deadlock occurs. Then, even if the memory contents have been rewritten as in state 1302 by the time the deadlock is detected, they can be restored by overwriting memory state 1302 with the snapshot 1301. If the snapshot is taken inside the hooked Lock instruction, the memory contents from before the deadlock can be preserved.
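The snapshot and restore steps can be sketched as follows. This is a simplified model under stated assumptions: the hypothetical `NamedMemory` class uses a dictionary keyed by region name in place of a named memory map, and a deep copy stands in for copying raw memory.

```python
import copy


class NamedMemory:
    """Memory regions addressed by name, with one saved snapshot per region."""

    def __init__(self):
        self.regions = {}
        self._snapshots = {}

    def snapshot(self, name: str) -> None:
        # Copy the region so later writes do not alter the saved state (state 1301).
        self._snapshots[name] = copy.deepcopy(self.regions[name])

    def restore(self, name: str) -> None:
        # Overwrite the current contents (state 1302) with the saved copy.
        self.regions[name] = copy.deepcopy(self._snapshots[name])


memory = NamedMemory()
memory.regions["session"] = {"count": 1, "items": ["x"]}
memory.snapshot("session")                      # e.g. taken inside the hooked Lock
memory.regions["session"]["count"] = 2          # partial update before the deadlock
memory.regions["session"]["items"].append("y")
memory.restore("session")                       # back to the pre-deadlock contents
```

Copying again on restore keeps the saved snapshot intact, so the same region can be restored more than once if the deadlock is detected again before a new snapshot is taken.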

Returning to FIG. 12, in step 353, among the lock handles acquired by the processing thread 200, the lock handles (A) and (B) that caused the deadlock, together with any lock handles acquired between them, are released. This allows the processing thread 200, which was stopped waiting for an Unlock, to resume processing from the return location FuncAB.

As described above, deadlock detection, avoidance, and elimination are made possible.
According to this embodiment, the precondition that all required locks be known in advance when multiple locks are acquired becomes unnecessary, so the technique is generally applicable to any software. In addition, even if a deadlock occurs, processing can be resumed without performing a failover, so software reliability and availability can be improved.

DESCRIPTION OF SYMBOLS 100: communication apparatus, 101: processor, 102: memory, 103: NIF, 110: bus, 200: processing thread, 210: monitoring thread, 220: hook, 230: queue, 400: lock record, 450: multiple lock record, 500: Lock instruction, 550: hooked Lock instruction, 600: Unlock instruction, 650: hooked Unlock instruction, 700: lock record table, 800: multiple lock record table, 1301: snapshot

Claims

A computer system that comprises a processor and a memory and executes a plurality of processing threads and a monitoring thread in parallel, wherein
each processing thread collects and accumulates exclusive control information and sends multiple exclusive control information to the monitoring thread, and
the monitoring thread accumulates the multiple exclusive control information and, upon detecting a combination of multiple exclusive control information that causes a deadlock, adds exclusive control at the return location of the multiple exclusive control information of that combination.

The computer system according to claim 1, wherein the processing thread collects the exclusive control information by using hooks on a lock acquisition instruction and a lock release instruction.

The computer system according to claim 1, wherein the monitoring thread hooks the start of a function containing a lock acquisition instruction that causes a deadlock and the end of a function containing a lock release instruction, and thereby designates the return location.

The computer system according to claim 1, wherein the computer system
takes a snapshot of the memory contents before a deadlock occurs, and,
after the deadlock occurs, recovers from the deadlock state by restoring the memory contents from the snapshot, changing the program counter to the return location, and releasing the exclusive control that caused the deadlock.

A program that causes a computer to function as:
a processing thread that collects and accumulates exclusive control information and sends multiple exclusive control information to a monitoring thread; and
the monitoring thread, which accumulates the multiple exclusive control information and, upon detecting a combination of multiple exclusive control information that causes a deadlock, adds exclusive control at the return location of the multiple exclusive control information of that combination.