JP5648187B2

JP5648187B2 - Computer system and monitoring method

Info

Publication number: JP5648187B2
Application number: JP2011258318A
Authority: JP
Inventors: 今野 功; 松木 譲介
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2011-11-25
Filing date: 2011-11-25
Publication date: 2015-01-07
Anticipated expiration: 2031-11-25
Also published as: JP2013114359A

Description

The present invention relates to a computer system, and more particularly to a computer system that monitors threads.

In recent years, software that runs in a multithreaded environment has become widespread across many kinds of systems. In a multithreaded environment, the OS (Operating System) manages units of processing execution as threads and executes multiple threads. Specifically, the OS schedules the CPU (Central Processing Unit) time allocated to each thread, and each scheduled thread runs for the CPU time allocated to it.

Consequently, a multithreaded environment with multiple CPUs has the advantage that processing can be sped up by executing threads in parallel. However, executing threads in parallel introduces a new problem that did not arise in a single-threaded environment: abnormal thread behavior. Known forms of abnormal thread behavior in a multithreaded environment are infinite loops and deadlocks.

Here, an infinite loop is a state in which loop processing cannot terminate, so processing not intended by the software developer is repeated. A deadlock is a state in which, when processing units such as multiple threads or processes share resources, processing halts and never continues, either because processing units that hold (lock) resources each wait for the others to release (unlock) them, or because a processing unit occupies a resource and never unlocks it.
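The first deadlock pattern above, where each unit holds one resource and waits for the other's, can be reproduced with a minimal Python sketch. It is an illustration and is not taken from the patent. Two threads lock resources A and B in opposite order, and a barrier forces the interleaving in which each holds one lock while requesting the other. The timeout is there only so the demonstration finishes. A genuine deadlock would wait forever.

```python
import threading


def lock_order_deadlock_demo(timeout: float = 0.2) -> list:
    """Return, per thread, whether it obtained its second lock."""
    lock_a, lock_b = threading.Lock(), threading.Lock()
    barrier = threading.Barrier(2)  # reusable rendezvous for both threads
    results = [True, True]

    def worker(idx: int, first: threading.Lock, second: threading.Lock) -> None:
        with first:              # this thread now holds ("locks") one resource
            barrier.wait()       # both threads hold their first lock
            got = second.acquire(timeout=timeout)  # wait for the other's lock
            if got:
                second.release()
            results[idx] = got
            barrier.wait()       # keep the first lock until both have tried

    t1 = threading.Thread(target=worker, args=(0, lock_a, lock_b))
    t2 = threading.Thread(target=worker, args=(1, lock_b, lock_a))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return results


print(lock_order_deadlock_demo())  # [False, False]: each waited on the other
```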

Here, a process is a unit of program execution. A process holds the variables and state used by a program and consists of one or more threads. Unlocking means releasing a shared resource that is currently occupied so that other threads can access it.

Because these abnormal behaviors do not terminate the process abnormally, software developers and users may not notice that an abnormality has occurred. Even when an abnormality is noticed, the threads run in parallel, so it is difficult to follow the state of all threads and cause analysis takes time. Techniques for "thread liveness monitoring", which detect abnormal thread behavior, are therefore important.

The simplest thread liveness monitoring method has each monitored thread hold a survival flag, which is checked by a thread that monitors it (the monitoring thread). This method has two problems. First, the source code of the program that creates the monitored process must be modified directly and then recompiled. Second, a WAIT state such as waiting for user input is falsely detected as an abnormality.
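The false-detection problem can be seen in a minimal Python sketch of the survival-flag method. The names `AliveFlag` and `monitored_worker` are hypothetical. The monitored thread is healthy but is blocked waiting for input, so it has not set its flag when the monitoring thread checks.

```python
import threading


class AliveFlag:
    """Survival flag shared by one monitored thread and the monitoring thread."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._alive = False

    def report(self) -> None:
        # Must be called from the monitored thread's own code, which is why
        # this method requires editing and recompiling that code.
        with self._lock:
            self._alive = True

    def check_and_clear(self) -> bool:
        # Called periodically by the monitoring thread.
        with self._lock:
            alive, self._alive = self._alive, False
            return alive


flag = AliveFlag()
user_input = threading.Event()


def monitored_worker() -> None:
    user_input.wait()  # legitimate WAIT state, e.g. waiting for user input
    flag.report()


worker = threading.Thread(target=monitored_worker)
worker.start()
first = flag.check_and_clear()   # False: a healthy but waiting thread looks dead
user_input.set()
worker.join()
second = flag.check_and_clear()  # True once the thread has resumed and reported
print(first, second)
```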

Compiling means generating, from source code, a binary executable file that a computer can execute.

To address this, a method and program for detecting abnormal thread operation have been proposed that detect an infinite loop by counting the instructions executed between the time a CPU is allocated to a thread and the time a thread-switching instruction is issued (see, for example, Patent Document 1). A thread-switching instruction stops the processing of a thread and allocates the CPU to another thread. Thread-switching instructions include sleep functions and lock-acquisition wait functions. The method in Patent Document 1 detects an infinite loop that keeps executing instructions by determining whether the number of instructions executed before a thread switch exceeds a threshold.
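The counting rule can be modeled in a few lines of Python. This is a simplified model, not Patent Document 1's kernel-level implementation. A trace of hypothetical events stands in for real execution: `"insn"` is one executed instruction, and `"switch"` is a thread-switching instruction that resets the count. The last two cases anticipate the limitations discussed below.

```python
def detect_by_instruction_count(trace, threshold):
    """Return True if more than `threshold` instructions run without a switch.

    `trace` is a sequence of events: "insn" for one executed instruction,
    "switch" for a thread-switching instruction (e.g. sleep or a lock wait),
    which hands the CPU to another thread and resets the counter.
    """
    count = 0
    for event in trace:
        if event == "switch":
            count = 0
        elif event == "insn":
            count += 1
            if count > threshold:
                return True
        else:
            raise ValueError(f"unknown event: {event!r}")
    return False


busy_loop = ["insn"] * 1000                          # loop that never yields
loop_with_sleep = (["insn"] * 5 + ["switch"]) * 200  # endless loop calling sleep()
deadlocked = ["insn"] * 3 + ["switch"]               # then blocks forever

print(detect_by_instruction_count(busy_loop, 100))        # True
print(detect_by_instruction_count(loop_with_sleep, 100))  # False
print(detect_by_instruction_count(deadlocked, 100))       # False
```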

Furthermore, a computer system and program have been proposed that hook the API functions for lock acquisition and release, obtain "locked" and "acquiring lock" information on shared resources for each thread, and detect deadlocks from combinations of this information (see, for example, Patent Document 2).

Here, an API function is an interface for application and software development that is provided by the OS or middleware through a common library. API function hooking means intercepting the processing of an API function and having the CPU perform user-defined processing instead of the intercepted processing.

"Locked" is a state in which threads other than the one using a resource cannot access that resource. "Acquiring lock" is a state in which a thread is waiting for a locked resource to be released so that it can lock the resource itself.

By combining this information, the computer system in Patent Document 2 can detect deadlocks. Suppose thread 1 has resource A "locked" and is "acquiring lock" on resource B, while thread 2 is "acquiring lock" on resource A and has resource B "locked". If both exist at the same time, the situation can be detected as a deadlock in which each thread waits indefinitely for the other to release its resource.
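The thread 1 / thread 2 example above reduces to finding a cycle in a wait-for relation. The following Python sketch shows that idea under stated assumptions. Patent Document 2 is not reproduced here, and the dictionaries `holding` and `waiting` are a hypothetical representation of the collected "locked" and "acquiring lock" information. Note that a busy loop or a `cond_wait` stall never appears in this data, which matches the limitations discussed below.

```python
def find_deadlock(holding, waiting):
    """Detect a wait cycle from per-thread lock information.

    holding: thread -> set of resources the thread has "locked".
    waiting: thread -> resource the thread is "acquiring", or None.
    Returns the threads in a cycle (in wait order), or [] if there is none.
    """
    owner = {res: t for t, held in holding.items() for res in held}
    for start in waiting:
        path, t = [], start
        while t is not None and t not in path:
            path.append(t)
            t = owner.get(waiting.get(t))  # who holds what t is waiting for?
        if t is not None:                 # we came back to a thread on the path
            return path[path.index(t):]
    return []


holding = {"thread1": {"A"}, "thread2": {"B"}}
waiting = {"thread1": "B", "thread2": "A"}
print(find_deadlock(holding, waiting))  # ['thread1', 'thread2']
print(find_deadlock(holding, {"thread1": "B", "thread2": None}))  # []
```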

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2008-204013
Patent Document 2: Japanese Unexamined Patent Application Publication No. 2009-271858

First, Patent Document 1 described above has the following problems.

The first problem with Patent Document 1 is that when a deadlock occurs, the thread remains dormant (with processing stopped) and the count of executed instructions stops increasing, so the deadlock is not detected.

The second problem with Patent Document 1 is that it requires changes to the kernel, the complex software at the core of the OS (or to software at an even lower layer). These changes also affect processes that are not subject to thread liveness monitoring.

The third problem with Patent Document 1 is that when an infinitely looping program contains a thread-switching instruction, the count of executed instructions is reset, so the infinite loop is not detected.

Patent Document 2 described above also has the following problems.

The first problem with Patent Document 2 is that a method that only obtains "locked" and "acquiring lock" information, as in Patent Document 2, cannot detect an infinite loop that performs no lock processing at all.

The second problem with Patent Document 2 is that the API functions that perform lock processing are hooked. The time a shared resource is occupied therefore grows by the time spent in the hook, so other threads are locked out of the shared resource for longer and execution speed suffers.

The third problem with Patent Document 2 is that the system it describes cannot detect a state in which a function that does not involve lock processing has stopped. For example, it cannot detect a thread stopped in the cond_wait API function, which stays stopped until a signal arrives from another thread.

The problems of Patent Documents 1 and 2 cannot easily be solved by combining the two, because the combination still cannot detect an infinite loop that contains a thread-switching instruction or a function that has stopped without lock processing.

Accordingly, an object of the present invention is to provide a method that solves the six problems described above.

Specifically, an object of the present invention is to provide a thread liveness monitoring method that can detect that a thread is not operating normally because of an infinite loop, a deadlock, or a stopping process that does not involve lock processing. A further object is to achieve this without changes to the kernel, changes to and recompilation of source code, or changes to compiled executables, and without delaying thread operation.

According to a representative aspect of the present invention, there is provided a computer system that executes a plurality of threads. The computer system includes at least one processor and a memory, and it executes three kinds of thread:
- each of the threads, to execute API functions whose processing is assigned by an application programming interface;
- a first thread, to execute first processing assigned to a first API function;
- a monitoring thread, to execute processing that monitors the state of the first thread.

The memory holds a monitoring information area, which holds survival information indicating whether the first thread is normal. It also holds first processing content, which includes information indicating the first processing and information indicating second processing that updates the survival information with a value indicating that the first thread is normal. The first thread performs API function hook processing, which executes the first processing and the second processing by reading the first processing content. The monitoring thread determines whether the survival information indicates that the first thread is normal. If it does, the monitoring thread determines that the first thread is normal and updates the survival information with a value that does not indicate that the first thread is normal. If it does not, the monitoring thread determines that the first thread is not normal.
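The claimed mechanism can be sketched in Python. This is an illustrative analogue and not the patent's implementation: wrapping a Python callable stands in for API function hooking, and the names `MonitoringInfoArea` and `hook` are hypothetical. The claim does not fix the order of the first and second processing, so reporting before the original call is an assumption.

```python
import threading


class MonitoringInfoArea:
    """Holds the survival information for one monitored (first) thread."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._normal = False

    def mark_normal(self) -> None:
        # "Second processing": store a value meaning "thread is normal".
        with self._lock:
            self._normal = True

    def check(self) -> bool:
        # Monitoring thread: read the information, then reset it so that the
        # next check succeeds only if the thread reports again in between.
        with self._lock:
            normal, self._normal = self._normal, False
            return normal


def hook(api_function, area):
    """Wrap an API function so each call also performs the survival update."""
    def hooked(*args, **kwargs):
        area.mark_normal()                    # second processing (assumed first)
        return api_function(*args, **kwargs)  # first processing (original)
    return hooked


area = MonitoringInfoArea()
double = hook(lambda x: 2 * x, area)  # hypothetical stand-in for an API function
print(double(21))    # 42
print(area.check())  # True: a hooked call happened, so the thread is normal
print(area.check())  # False: no hooked call since the last check
```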

According to an embodiment of the present invention, it can be determined that a thread is in an abnormal state caused by an infinite loop, a deadlock, or a stopping process that does not involve lock processing.

FIG. 1 is an explanatory diagram showing the thread liveness monitoring method of the first embodiment of the present invention.
FIG. 2 is a block diagram showing the hardware configuration of the device of the first embodiment of the present invention.
FIG. 3 is an explanatory diagram showing the API function hook group of the first embodiment of the present invention.
FIG. 4 is an explanatory diagram showing the data held by the monitoring thread and the data held by a thread in the first embodiment of the present invention.
FIG. 5 is a sequence diagram showing survival reporting and abnormal-operation detection in the first embodiment of the present invention.
FIG. 6 is a flowchart showing abnormal-operation detection by the monitoring thread in the first embodiment of the present invention.
FIG. 7 is a sequence diagram showing abnormal-operation detection in the second embodiment of the present invention.
FIG. 8 is a flowchart showing survival confirmation in the second embodiment of the present invention.
FIG. 9 is an explanatory diagram showing the processing that determines the API function hook group in the third embodiment of the present invention.
FIG. 10 is a flowchart showing the processing of the API function selection program in the third embodiment of the present invention.
FIG. 11 is an explanatory diagram showing a prior-art thread liveness monitoring method.
FIG. 12 is a flowchart showing prior-art survival confirmation processing.

Embodiments of the present invention are described below with reference to the drawings.

(First Embodiment)
In this embodiment, a computer that functions as a gateway in a communication system serves as the example of a device that monitors whether threads are operating normally (hereinafter, "thread monitoring").

In this embodiment, API function hook processing means that a user adds custom processing to an API function or changes what it does. A library is a collection of data, including reusable code (programs), for performing specific processing and functions. A shared library is a library shared by multiple programs, and API functions are also provided as shared libraries. A thread is a unit into which a computer divides its processing. By creating threads, the computer can execute the divided processing in parallel.

FIG. 11 is an explanatory diagram showing a prior-art thread monitoring method.

As shown in FIG. 11, thread monitoring in a conventional gateway device is performed by a plurality of threads 12, a monitoring thread 11, and a kernel 100. The conventional gateway device includes a processor and a memory (main storage). The processor runs the OS or application programs using the memory, and in doing so executes the kernel 100, the monitoring thread 11, and the threads 12.

The kernel 100 is the processing unit that executes the OS of the gateway device. The monitoring thread 11 is a processing unit for executing a program, and it performs thread monitoring of the threads 12. Each thread 12 is a processing unit for executing a program.

The thread 12 shown in FIG. 11 receives packets sent to the gateway device, processes them, and transmits the processed packets. The thread 12 holds an instruction message queue 130 and a survival flag 140 in memory. The instruction message queue 130 holds instruction messages sent from the kernel 100, other threads 12, or the monitoring thread 11. The survival flag 140 is a storage area that is updated by the thread 12 and is referenced and updated by the monitoring thread 11.

After the thread 12 starts (121), it waits until an instruction message is added to the instruction message queue 130 (122). After step 122, the thread 12 determines whether the added instruction message is a monitoring instruction message 131 (123).

The instruction message queue 130 includes a data structure that stores the instruction messages sent to the thread 12. The thread 12 processes instruction messages in the order in which they were added to the instruction message queue 130.

If step 123 determines that the added instruction message is the monitoring instruction message 131, the thread 12 executes processing that reports that the thread 12 is alive (a survival report) (124).

The survival report in step 124 stores 1 in the 1-bit survival flag 140 held in advance (setting the flag). The survival flag 140 is created when the thread 12 is created.

If, in step 123, the added instruction message is not the monitoring instruction message 131 but a packet reception instruction message, the thread 12 receives the packet (125), processes it (126), and transmits the processed packet (127). After step 124 or step 127 finishes, the thread 12 returns to step 122 and waits until the next instruction message is added to the instruction message queue 130.
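The loop of the thread 12 (steps 121 to 127) can be sketched in Python with a FIFO queue. This is only a sketch. The message tuples, the `STOP` sentinel, and the placeholder "processing" (`upper()`) are hypothetical. For brevity the packet travels inside the message here, whereas in the document the thread 12 reads it from the socket buffer.

```python
import queue
import threading

MONITOR = "monitor"  # monitoring instruction message 131
PACKET = "packet"    # packet reception instruction message 132
STOP = "stop"        # hypothetical: only lets this demo terminate


def thread_12(messages, alive_flag, transmitted):
    while True:
        kind, payload = messages.get()  # step 122: wait for the next message
        if kind == MONITOR:             # step 123
            alive_flag.set()            # step 124: survival report (flag = 1)
        elif kind == PACKET:
            packet = payload               # step 125: receive the packet
            processed = packet.upper()     # step 126: placeholder processing
            transmitted.append(processed)  # step 127: placeholder transmission
        elif kind == STOP:
            return


messages = queue.Queue()   # instruction message queue 130 (FIFO)
alive = threading.Event()  # survival flag 140
transmitted = []
worker = threading.Thread(target=thread_12, args=(messages, alive, transmitted))
worker.start()
messages.put((PACKET, b"hello"))
messages.put((MONITOR, None))
messages.put((STOP, None))
worker.join()
print(transmitted, alive.is_set())  # [b'HELLO'] True
```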

Meanwhile, after the monitoring thread 11 starts, it determines whether a predetermined monitoring period has elapsed (112) and waits until it has. When the monitoring period has elapsed, the monitoring thread 11 generates a monitoring instruction message 131 (113) and appends it to the end of the instruction message queue 130 of the thread 12. After step 113, the monitoring thread 11 refers to the survival flag 140 and performs survival confirmation 300 for the thread 12.

In survival confirmation 300, the monitoring thread 11 determines whether 1 is stored in the survival flag 140. If so, it determines that the thread 12 is alive and stores 0 in the survival flag 140 (clearing the flag).

After survival confirmation 300, the monitoring thread 11 returns to step 112.

FIG. 12 is a flowchart showing prior-art survival confirmation 300.

The monitoring thread 11 performs survival confirmation 300 as shown in the flowchart of FIG. 12. First, it determines whether 1 is stored in the survival flag 140 (310). If so, the monitoring thread 11 stores 0 in the survival flag 140 (320) and determines that the thread 12 is normal (330).

If step 310 determines that 1 is not stored in the survival flag 140, the monitoring thread 11 determines that the thread 12 is abnormal (340). After step 330 or step 340, the monitoring thread 11 ends survival confirmation 300 (350).
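Survival confirmation 300 (steps 310 to 350) can be sketched in Python. The class name `SurvivalFlag` is hypothetical. The lock is an addition not shown in FIG. 12. It makes the test in step 310 and the clear in step 320 atomic, so a report arriving between them cannot be lost.

```python
import threading


class SurvivalFlag:
    """1-bit survival flag 140 shared by thread 12 and monitoring thread 11."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._value = 0

    def report(self) -> None:
        with self._lock:
            self._value = 1  # step 124: survival report

    def survival_confirmation(self) -> bool:
        """Steps 310-350; returning ends the confirmation (step 350).

        Returns True if thread 12 is judged normal, False if abnormal.
        """
        with self._lock:
            if self._value == 1:  # step 310: is 1 stored?
                self._value = 0   # step 320: store 0
                return True       # step 330: normal
            return False          # step 340: abnormal


flag = SurvivalFlag()
flag.report()
print(flag.survival_confirmation())  # True
print(flag.survival_confirmation())  # False: no report since the last check
```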

In addition, when an OS function raises a software interrupt 150, the kernel 100 starts interrupt processing (101). The software interrupt 150 shown in FIG. 11 is executed when the gateway device receives a packet.

The kernel 100 receives, from the NIF (Network InterFace), a packet that another device sent to the gateway device (102). After step 102, the kernel 100 copies the received packet data into a socket buffer (103).

Here, the socket buffer is memory that temporarily stores the contents of packets received by the kernel 100. The thread 12 uses the socket buffer to receive packets.

After step 103, the kernel 100 generates a packet reception instruction message 132 (104) and appends it to the end of the instruction message queue 130. After step 104, the kernel 100 ends the interrupt processing (105).
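The kernel side of packet reception (steps 101 to 105) can be modeled in Python. This is a user-space model only. A `deque` stands in for the socket buffer, `queue.Queue` stands in for the instruction message queue 130, and the function name and message tuple are hypothetical.

```python
import queue
from collections import deque


def software_interrupt_150(frame, socket_buffer, instruction_queue):
    """Model of steps 101-105 for one packet delivered by the NIF (step 102).

    Returning from the function corresponds to step 105.
    """
    socket_buffer.append(bytes(frame))       # step 103: copy into socket buffer
    instruction_queue.put(("packet", None))  # step 104: append message 132 to tail


socket_buffer = deque()
instruction_queue = queue.Queue()
software_interrupt_150(bytearray(b"abc"), socket_buffer, instruction_queue)
print(instruction_queue.get_nowait())  # ('packet', None)
print(socket_buffer.popleft())         # b'abc' (what step 125 would read)
```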

Through the processing shown in FIGS. 11 and 12, the thread 12 performs both packet reception and the survival reporting required for thread monitoring. The monitoring thread 11 adds the monitoring instruction message 131 to the instruction message queue 130. This lets it bring the thread 12 out of the WAIT state and have it execute the survival report of step 124 before the monitoring thread 11 performs survival confirmation 300.

However, applying the processing shown in FIGS. 11 and 12 to a system that does not already use this thread monitoring method requires the developer to modify the source code of the monitored thread 12 so that it makes survival reports, and then to compile the modified source code. CPU processing time on the gateway device also increases, because the monitoring thread 11 sends, and the thread 12 receives, the monitoring instruction message 131.

Furthermore, in the processing of FIG. 11 the thread 12 makes a survival report only in response to the monitoring instruction message 131. If packet reception processing (steps 125 to 127) takes a long time, the survival report may therefore not be made in time, and the monitoring thread 11 may wrongly determine that the thread 12 is abnormal.

For this reason, the first embodiment of the present invention uses a thread monitoring method based on API function hooking, which does not use the monitoring instruction message 131.

FIG. 1 is an explanatory diagram showing the thread monitoring method of the first embodiment of the present invention.

The thread monitoring processing shown in FIG. 1 is executed by a thread 120, a monitoring thread 110, and the kernel 100.
Step 122 is the processing in which the thread waits for an instruction message to be added to the instruction message queue 130. In the thread monitoring method of the first embodiment, the thread 120 intercepts this processing by hooking an API function that puts the thread 120 into the WAIT state (for example, the sleep API function). The API function hook method is described later.

The thread monitoring method of the first embodiment then adds survival report 200 to the processing inherent in the intercepted API function, that is, the function that puts the thread 120 into the WAIT state. For example, if the intercepted API function is the sleep API function, the thread 120 performs survival report 200 in addition to stopping itself for the specified number of seconds.
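Hooking sleep can be approximated in Python by temporarily replacing `time.sleep`. This is an in-process analogue, not the LD_PRELOAD mechanism the embodiment uses, and the names `sleep_hook` and `unmodified_worker` are hypothetical. The monitored function contains no monitoring code. The replacement applies only because it looks up `time.sleep` at call time. Code that did `from time import sleep` before the hook was installed would bypass it, in the same way that calls not resolved through the dynamic linker bypass a preloaded library.

```python
import contextlib
import threading
import time


@contextlib.contextmanager
def sleep_hook(alive):
    """Temporarily replace time.sleep with a version that reports survival."""
    original_sleep = time.sleep

    def hooked_sleep(seconds):
        alive.set()              # added processing: survival report 200
        original_sleep(seconds)  # inherent processing: stop for `seconds`

    time.sleep = hooked_sleep
    try:
        yield
    finally:
        time.sleep = original_sleep  # uninstall the hook


def unmodified_worker():
    # Monitored code with no monitoring logic of its own.
    time.sleep(0.01)


alive = threading.Event()  # stands in for the survival flag
with sleep_hook(alive):
    unmodified_worker()
print(alive.is_set())  # True
```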

As a result, the thread 120 can make in step 122 the survival report that FIG. 11 executed in step 124, without modifying or recompiling the source code of the monitored thread 120. In addition, step 113 (generating the monitoring instruction message 131) and step 123 (determining whether it was received) become unnecessary, which reduces program size and processor time.

Because API function hooking does not modify the source code of the monitored thread 120, the thread monitoring method can be introduced without modifying the existing system.

Furthermore, the thread monitoring method of the first embodiment can choose which API functions to hook. The packet reception processing of step 125, the packet processing of step 126, or the packet transmission processing of step 127 can therefore be selected as hook targets, which lets the thread 120 make survival report 200 during any of steps 125 to 127.

Consequently, the gateway device may receive large packets, so that the total time for steps 125 to 127 exceeds the predetermined monitoring period held by the monitoring thread 110. Even then, the processing described later prevents the monitoring thread 110 from wrongly determining that the thread 120 is in an abnormal state.
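Why hooking the packet-handling steps prevents this false detection can be shown with a small discrete-time model in Python. The timeline, the monitoring period of 10 units, and the stage boundaries are invented for illustration. The model treats a check as passing when at least one survival report fell since the previous check, which is equivalent to a flag that any report sets and every check clears.

```python
def monitor_verdicts(report_times, period, horizon):
    """Verdict at each check time period, 2*period, ... up to horizon.

    A check finds the thread normal if at least one survival report fell
    in (previous check, this check].
    """
    verdicts = []
    for check in range(period, horizon + 1, period):
        verdicts.append(any(check - period < t <= check for t in report_times))
    return verdicts


# Hypothetical timeline: the thread waits at t=1, then spends t=2..25 on one
# large packet (receive from 2, process from 9, send from 19), then waits again.
reports_on_wait_only = [1, 25]
reports_on_each_stage = [1, 2, 9, 19, 25]
print(monitor_verdicts(reports_on_wait_only, 10, 30))   # [True, False, True]
print(monitor_verdicts(reports_on_each_stage, 10, 30))  # [True, True, True]
```

When only the wait is hooked, the check at t=20 falsely flags a busy but healthy thread. Hooking each stage closes that gap.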

In this embodiment (described in detail later), API function hooking is implemented on a Unix (registered trademark) or Linux (registered trademark) OS as follows. A developer or a predetermined program builds, in advance, a shared library that defines the hook processing, and the path to that shared library is added to an environment variable such as LD_PRELOAD. When a process containing the thread 120 starts, LD_PRELOAD is read and the API functions in the specified shared library are loaded ahead of those in other libraries. When the thread 120 runs, it therefore executes the API functions defined in that shared library. In other words, the API functions called by the thread 120 are hooked.
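The precedence that LD_PRELOAD gives a hook library can be pictured with a deliberately simplified Python model of symbol lookup. It ignores symbol versioning, dependency ordering, and everything else the real dynamic loader does. The library names and export tables are hypothetical. The `after` parameter mimics how a C hook typically reaches the real function through `dlsym(RTLD_NEXT, ...)`.

```python
import re


def search_order(ld_preload, needed):
    """Preloaded libraries are searched before the program's own dependencies."""
    preload = [lib for lib in re.split(r"[ :]+", ld_preload) if lib]
    return preload + list(needed)


def resolve(symbol, order, exports, after=None):
    """Return the first library in `order` that exports `symbol`.

    With `after`, the search starts just past that library, which is how a
    hook reaches the real function (compare dlsym(RTLD_NEXT, ...) in C).
    """
    start = order.index(after) + 1 if after is not None else 0
    for lib in order[start:]:
        if symbol in exports.get(lib, set()):
            return lib
    raise LookupError(symbol)


exports = {"libhook.so": {"sleep"}, "libc.so.6": {"sleep", "printf"}}
order = search_order("libhook.so", ["libc.so.6"])
print(resolve("sleep", order, exports))                      # libhook.so
print(resolve("sleep", order, exports, after="libhook.so"))  # libc.so.6
print(resolve("printf", order, exports))                     # libc.so.6
```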

However, the way API function hooking is implemented in this embodiment does not restrict the OS or the hooking method. Any OS or function-hooking method may be used, as long as new processing can be added to the functions called when the thread 120 is executed.

In the embodiment described below, the hooked API functions are those of the standard C library (GLIBC). However, the programming language used to implement API function hooking is not limited to C and C++.

また、前述および後述のスレッド１２０は、ゲートウェイ機器において実行されるが、本実施形態のスレッド１２０はゲートウェイ機器のみに限らず、いかなる計算機においても実行される。そして、いかなる計算機においても、本実施形態のスレッド監視方法が実行されうる。 The thread 120 described above and below runs on the gateway device, but the thread 120 of this embodiment is not limited to a gateway device and may run on any computer. The thread monitoring method of this embodiment can be executed on any such computer.

図２は、本発明の第１の実施形態の機器４０のハードウェア構成を示すブロック図である。 FIG. 2 is a block diagram illustrating a hardware configuration of the device 40 according to the first embodiment of this invention.

本実施形態を実現する機器４０は、少なくとも一つのプロセッサ４００、メモリ４２０、および、ＮＩＦ４４０を備える。また、プロセッサ４００およびメモリ４２０は、バス４１０によって接続され、メモリ４２０およびＮＩＦ４４０はバス４３０によって接続される。 The device 40 that implements the present embodiment includes at least one processor 400, a memory 420, and an NIF 440. The processor 400 and the memory 420 are connected by a bus 410, and the memory 420 and the NIF 440 are connected by a bus 430.

プロセッサ４００は、ＣＰＵ等の演算装置であり、ＯＳおよびアプリケーションプログラム等のソフトウェアを実行する。メモリ４２０は、プロセッサ４００がソフトウェア実行時に、プログラム実行バイナリおよびプログラムが使用するデータを格納するための記憶領域である。 The processor 400 is an arithmetic device such as a CPU, and executes software such as an OS and application programs. The memory 420 is a storage area for storing program execution binaries and data used by programs when the processor 400 executes software.

ＮＩＦ４４０は、機器４０とは別の機器とパケットを送受信するための装置である。プロセッサ４００、メモリ４２０、およびＮＩＦ４４０は、バス４１０およびバス４３０によって接続されるため、互いに命令メッセージおよびデータを送信することが可能である。 The NIF 440 is a device for sending packets to, and receiving packets from, devices other than the device 40. Because the processor 400, the memory 420, and the NIF 440 are connected by the bus 410 and the bus 430, they can exchange instruction messages and data with one another.

なお、プロセッサ４００は、マルチタスク可能なマルチコアまたはマルチプロセッサであることが望ましいが、マルチタスクに対応したシングルコアまたはシングルプロセッサでも本発明を適用可能である。ここでマルチタスクとは、プロセッサ４００が複数の処理を切り替えながら複数の処理を実行する方法である。従って、マルチタスクが可能なハードウェアで稼働するソフトウェアであれば、本発明を適用することができる。 The processor 400 is preferably a multicore or multiprocessor capable of multitasking, but the present invention can also be applied to a single core or single processor that supports multitasking. Here, multitasking is a method in which the processor 400 executes a plurality of processes while switching between the plurality of processes. Therefore, the present invention can be applied to any software that runs on hardware capable of multitasking.

また、機器４０は、物理的に一つの計算機によって実装されてもよいし、少なくとも一つの計算機が提供する仮想的な計算機によって実装されてもよい。 The device 40 may be physically implemented by one computer, or may be implemented by a virtual computer provided by at least one computer.

図３は、本発明の第１の実施形態のＡＰＩ関数フック群５１０を示す説明図である。 FIG. 3 is an explanatory diagram illustrating the API function hook group 510 according to the first embodiment of this invention.

本実施形態におけるＡＰＩ関数フック群５１０とは、ＡＰＩ関数フック対象として選択されたＡＰＩ関数の集合である。本実施形態において、監視対象となるプログラムの開発者または使用者が、あらかじめＡＰＩ関数フック群５１０に含まれるＡＰＩ関数を決定する。 The API function hook group 510 in the present embodiment is a set of API functions selected as API function hook targets. In this embodiment, the developer or user of the program to be monitored determines API functions included in the API function hook group 510 in advance.

本実施形態のスレッド監視の対象のプログラムが実行された場合、プロセッサ４００は、少なくとも一つのスレッドまたは少なくとも一つのプロセスによって、プログラムの処理を実行する。 When the thread monitoring target program of this embodiment is executed, the processor 400 executes the processing of the program by at least one thread or at least one process.

スレッドプロシージャ５２０は、各スレッド１２０が生成されてから終了するまでの処理内容が、ソースコードにおいて定義されている箇所である。よって、スレッド１２０はスレッドプロシージャに従って動作する。 The thread procedure 520 is the part of the source code that defines what each thread 120 does from its creation until its termination. The thread 120 therefore operates according to its thread procedure.

スレッドプロシージャ５２０には、複数のＡＰＩ関数が含まれる。第１の実施形態のスレッドプロシージャ５２０に含まれるＡＰＩ関数がメモリ４２０にロードされる際、ＬＤ＿ＰＲＥＬＯＡＤ環境変数５３０に共有ライブラリ５００を指定していると、共有ライブラリ５００に該当するＡＰＩ関数があればメモリ４２０にロードする。 The thread procedure 520 includes a plurality of API functions. In the first embodiment, the API functions of the thread procedure 520 are loaded into the memory 420. If the shared library 500 is specified in the LD_PRELOAD environment variable 530 at that time, any of those API functions that the shared library 500 also defines is loaded into the memory 420 from the shared library 500.

共有ライブラリ５００は、ＡＰＩ関数フック群５１０を含む。このため、メモリ４２０にロードされるＡＰＩ関数が書き換わることによって、スレッドプロシージャ５２０に含まれる各ＡＰＩ関数は、ＡＰＩ関数フック群５１０が示す処理内容を実行する。 The shared library 500 includes the API function hook group 510. Because the API functions loaded into the memory 420 are replaced in this way, each hooked API function in the thread procedure 520 executes the processing defined by the API function hook group 510.

ただし、ＡＰＩ関数フックされるべきＡＰＩ関数は、スレッド監視の対象のプログラムを含むソフトウェアによって異なる。例えば、一般的に、ＡＰＩ関数フック群５１０として選択されるＡＰＩ関数は、定期的に実行されるＡＰＩ関数が望ましい。本実施形態のＡＰＩ関数フック群５１０は、開発者または使用者がスレッドプロシージャ５２０からフックされるＡＰＩ関数を選択することによって生成される。 However, which API functions should be hooked depends on the software that contains the monitored program. In general, API functions that are executed periodically are the preferred members of the API function hook group 510. In this embodiment, the developer or user generates the API function hook group 510 by choosing, from the thread procedure 520, the API functions to be hooked.

開発者または使用者は、ＡＰＩ関数を選択するために、スレッドプロシージャ５２０からＡＰＩ関数を抜き出すためのスクリプトを用いてもよい。ＡＰＩ関数を抜き出すためのスクリプトは、簡易プログラミング言語によってコーディングされたプログラムである。 To select the API functions, the developer or user may use a script that extracts them from the thread procedure 520. Such a script is a program written in a simple programming language.

また、開発者または使用者は、定期的に実行されるＡＰＩ関数が既に取得されている場合、取得されたＡＰＩ関数をＡＰＩ関数フック群５１０に含めてもよい。 If the periodically executed API functions are already known, the developer or user may simply include them in the API function hook group 510.

例えば、図１に示すゲートウェイ機器において、開発者または使用者は、スレッド１２０をＷＡＩＴ状態にするＡＰＩ関数をＡＰＩ関数フック群５１０として選択する。また例えば、スレッド１２０が正常時にｐｒｉｎｔｆＡＰＩ関数によって文字列を表示し続ける場合、開発者または使用者は、ｐｒｉｎｔｆＡＰＩ関数をＡＰＩ関数フック群５１０として選択する。 For example, in the gateway device shown in FIG. 1, the developer or user adds to the API function hook group 510 the API function that puts the thread 120 into the WAIT state. As another example, suppose the thread 120 keeps displaying character strings with the printf API function while it operates normally. In that case the developer or user adds the printf API function to the API function hook group 510.

ただし、スレッド１２０が無限ループを含み、かつ、無限ループ内にフック対象のＡＰＩ関数がある場合、スレッド１２０は無限ループ内で生存報告２００をするため、スレッド１２０の異常動作として無限ループが検出されなくなる。従って、前述のプログラムは、無限ループを引き起こす可能性があるループ内のＡＰＩ関数を選択対象から除外し、ＡＰＩ関数フック群５１０を生成する。より具体的には、前述のプログラムは、ループ文の終了条件が他のスレッドによって満たされるループ文または条件を満たした時のみ、無限ループを抜け出すループ文を選択対象から除外する。 However, a problem arises if the thread 120 contains an infinite loop and a hooked API function is called inside that loop. The thread 120 then keeps performing the survival report 200 from within the loop, so the infinite loop is never detected as an abnormal operation of the thread 120. The above-mentioned program therefore excludes API functions inside loops that may become infinite loops when it generates the API function hook group 510. More specifically, it excludes two kinds of loop statements: those whose termination condition is satisfied by another thread, and infinite loops that are exited only when a particular condition is met.

なお、ＡＰＩ関数フック群５１０に含まれるＡＰＩ関数の数は制限されない。 Note that the number of API functions included in the API function hook group 510 is not limited.

次に、共有ライブラリ５００の生成について説明する。共有ライブラリ５００は、ＯＳによって提供されるほか、開発者がソースコードをコンパイルすることによって生成される。従って、前述のプログラムが、ＡＰＩ関数フック群５１０に含まれるＡＰＩ関数を選択し、各ＡＰＩ関数がフックされた際の処理内容をＡＰＩ関数フック群５１０に定義した後、前述のプログラムは、ＡＰＩ関数フック群５１０をコンパイルすることによって共有ライブラリ５００を生成する。なお、生成される共有ライブラリ５００は、ＡＰＩ関数ごとに別々に生成されてもよい。 Next, generation of the shared library 500 is described. Besides being provided by the OS, a shared library can be generated by a developer compiling source code. First, the above-mentioned program selects the API functions for the API function hook group 510 and defines in it the processing to run when each function is hooked. The program then generates the shared library 500 by compiling the API function hook group 510. A separate shared library 500 may also be generated for each API function.

図３が示す共有ライブラリ５００において定義されるＡＰＩ関数の処理内容は、各ＡＰＩ関数が有する各ＡＰＩ関数固有の処理内容と、生存報告２００の処理内容とを含む。 Each API function defined in the shared library 500 shown in FIG. 3 performs two kinds of processing: the processing specific to that API function and the processing of the survival report 200.

例えば、スレッドプロシージャ５２０がｓｌｅｅｐＡＰＩ関数を含み、ｓｌｅｅｐＡＰＩ関数がＡＰＩ関数フック群５１０として選択された場合のｓｌｅｅｐＡＰＩ関数の処理内容を図３に示す。スレッド１２０が実行され、ｓｌｅｅｐＡＰＩ関数が実行された際、スレッド１２０は、ｓｌｅｅｐＡＰＩ関数においてＡＰＩ関数フック５４０を行うことによって生存報告５５０を行う。これは、ＬＤ＿ＰＲＥＬＯＡＤ環境変数５３０によって、共有ライブラリ５００で定義したＡＰＩ関数がメモリ４２０にロードされているためである。そして、生存報告５５０が終了した後、スレッド１２０は、ｓｌｅｅｐＡＰＩ関数固有の処理５６０を実行する。 For example, FIG. 3 shows the processing of the sleep API function when the thread procedure 520 calls sleep and sleep belongs to the API function hook group 510. When the running thread 120 calls sleep, the API function hook 540 intercepts the call and the thread 120 performs the survival report 550. This happens because the LD_PRELOAD environment variable 530 has caused the API functions defined in the shared library 500 to be loaded into the memory 420. After the survival report 550 finishes, the thread 120 executes the processing 560 specific to the sleep API function.

一方、スレッドプロシージャ５２０がｐｒｉｎｔｆＡＰＩ関数を含み、ｐｒｉｎｔｆＡＰＩ関数はＡＰＩ関数フック群５１０として選択されていない場合のｐｒｉｎｔｆＡＰＩ関数の処理内容を図３に示す。スレッド１２０が実行され、ｐｒｉｎｔｆＡＰＩ関数が実行された際、スレッド１２０は、ｐｒｉｎｔｆＡＰＩ関数固有の処理５７０を実行する。これは、共有ライブラリ５００が、ｐｒｉｎｔｆＡＰＩ関数をフックし、ｐｒｉｎｔｆＡＰＩ関数固有の処理以外の処理を行うことを定義されていないためである。 FIG. 3 also shows the processing of the printf API function when the thread procedure 520 calls printf but printf does not belong to the API function hook group 510. When the running thread 120 calls printf, it simply executes the processing 570 specific to printf. This is because the shared library 500 defines no hook that adds processing beyond printf's own.

図３に示す生存報告５５０は、図１に示す生存報告２００の処理である。すなわち、図３に示す生存報告５５０は、生存フラグ１４０に１を格納する処理である。 The survival report 550 shown in FIG. 3 is the processing of the survival report 200 shown in FIG. 1; that is, it stores 1 in the survival flag 140.

本実施形態の生存フラグ１４０が１である場合、生存フラグ１４０は、スレッド１２０が生存していることを報告したことを示す。生存フラグ１４０が１以外である場合、生存フラグ１４０は、スレッド１２０が生存していることを報告していないことを示す。すなわち、生存フラグ１４０が１以外である場合、スレッド１２０が異常である可能性が高い。 In this embodiment, a survival flag 140 of 1 means that the thread 120 has reported that it is alive. Any other value means that the thread 120 has not made this report. In other words, when the survival flag 140 is not 1, the thread 120 is likely to be abnormal.

なお、図３におけるスレッド１２０は、生存フラグ１４０に１を格納する処理をＡＰＩ関数固有の処理を実行する前に行うが、本実施形態におけるスレッド１２０は、ＡＰＩ関数固有の処理を実行した後に生存フラグ１４０に１を格納してもよい。また、前述のプログラムは、ＡＰＩ関数フック時に追加される処理を少なくすることによって、ＡＰＩ関数フックによって生じる遅延を抑えることができる。 In FIG. 3, the thread 120 stores 1 in the survival flag 140 before executing the API-specific processing. In this embodiment, the thread 120 may instead store 1 in the survival flag 140 after executing that processing. The above-mentioned program can also limit the delay caused by the API function hooks by keeping the processing added at hook time small.

スレッド１２０がＡＰＩ関数がフックされた関数内からＡＰＩ関数固有の処理を実行するためには、スレッド１２０は、ｄｌｓｙｍＡＰＩ関数によってＡＰＩ関数固有の処理を行う関数のアドレスを取得し、生存報告５５０を実行した後、取得されたアドレスを指定して呼びだすことによって、スレッド１２０は、ＡＰＩ関数固有の処理を実行する。ただし、ＡＰＩ関数固有の処理を実行する度にＡＰＩ関数のアドレスを取得すると効率が悪いため、スレッド１２０は、プロセスが終了するまで値を保持する静的な変数に保存しておくことによって効率を改善してもよい。 To execute the API-specific processing from inside the hooked function, the thread 120 first uses the dlsym API function to obtain the address of the function that performs that processing. It then performs the survival report 550 and calls the function at the obtained address. Looking up the address every time the API-specific processing runs is inefficient. The thread 120 may therefore store the address in a static variable, which keeps its value until the process ends.
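A sketch of the cached lookup follows, assuming Linux with glibc and using `nanosleep` as a hypothetical hooked function. The original function's address is resolved with `dlsym` only once and kept in a static variable for the rest of the process lifetime. The lookup is guarded by `pthread_once` so that concurrent first calls are safe; that guard is a choice made for this sketch, since the text only requires a static variable. The global `survival_flag` is likewise hypothetical.

```c
#define _GNU_SOURCE /* RTLD_NEXT and nanosleep declarations */
#include <assert.h>
#include <dlfcn.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

typedef int (*nanosleep_fn)(const struct timespec *, struct timespec *);

/* Address of libc's nanosleep(): resolved once, kept until process exit. */
static nanosleep_fn real_nanosleep;
static pthread_once_t resolve_once = PTHREAD_ONCE_INIT;

/* Hypothetical survival flag (one global for brevity). */
atomic_int survival_flag = 0;

static void resolve_real_nanosleep(void)
{
    real_nanosleep = (nanosleep_fn)dlsym(RTLD_NEXT, "nanosleep");
}

/* Hook with the same signature as libc's nanosleep(). */
int nanosleep(const struct timespec *req, struct timespec *rem)
{
    /* Performs the dlsym lookup exactly once, even if threads race here. */
    pthread_once(&resolve_once, resolve_real_nanosleep);

    atomic_store(&survival_flag, 1); /* survival report */

    if (real_nanosleep == NULL) {
        errno = ENOSYS; /* original function not found */
        return -1;
    }
    return real_nanosleep(req, rem); /* API-specific processing */
}
```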

図３に示す共有ライブラリ５００は、ＡＰＩ関数名、ＡＰＩ関数の引数、追加される処理内容（生存報告５５０に対応）、および、ＡＰＩ関数固有の処理の４項目を含む。共有ライブラリ５００は、前述のＡＰＩ関数フック群５１０を生成するプログラムによって、定期的または開発者の指示によって、生成されてもよい。 The shared library 500 shown in FIG. 3 contains four items:

- the API function name;
- the API function arguments;
- the added processing (corresponding to the survival report 550);
- the API-specific processing.

The shared library 500 may be generated by the above-mentioned program that generates the API function hook group 510, either periodically or when the developer instructs it to.

ＡＰＩ関数フックによる生存報告５５０について説明する。前述のプログラムによって生成された共有ライブラリ５００を用いてＡＰＩ関数フックを有効にする場合、開発者は、ＬＤ＿ＰＲＥＬＯＡＤ環境変数５３０に共有ライブラリ５００へのパス（配置場所）を格納する。そして、パスを格納されたＬＤ＿ＰＲＥＬＯＡＤ環境変数５３０を用いてソフトウェアが起動した場合、起動されたソフトウェアに含まれるプロセスにおいてＡＰＩ関数フック５４０が有効になり、生存報告５５０が行われる。 Next, the survival report 550 performed through API function hooks is described. To enable API function hooks with the shared library 500 generated by the above-mentioned program, the developer stores the path (location) of the shared library 500 in the LD_PRELOAD environment variable 530. When software is started with this path set in the LD_PRELOAD environment variable 530, the API function hook 540 is enabled in the processes of that software and the survival report 550 is performed.

なお、ＡＰＩ関数フックを有効にしたプロセスから子プロセスが起動された場合も、子プロセスにおいてＡＰＩ関数フックが有効になる。子プロセスへのＡＰＩ関数フックを無効にする場合、親プロセスが実行される関数は、環境変数を削除するｕｎｓｅｔｅｎｖＡＰＩ関数を含み、親プロセスが実行された後、親プロセスは、ＬＤ＿ＰＲＥＬＯＡＤ環境変数５３０を無効にした状態において子プロセスを起動する。 When a process with API function hooks enabled starts a child process, the hooks are enabled in the child as well. To disable the hooks in a child process, the parent first calls the unsetenv API function, which deletes an environment variable. It then starts the child with the LD_PRELOAD environment variable 530 removed.
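This sequence can be sketched as follows, assuming a POSIX system; the helper name `run_child_without_hooks` is hypothetical. As the text describes, the parent removes `LD_PRELOAD` with `unsetenv` before starting the child. Hooks already loaded into the parent keep working, because the dynamic loader consults `LD_PRELOAD` only when a program starts.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Starts `path` as a child process without the API function hooks.
 * Returns the child's exit status, or -1 on error. */
static int run_child_without_hooks(const char *path, char *const argv[])
{
    /* Delete LD_PRELOAD from this process's environment so that the
     * child's dynamic loader does not preload the hook library. */
    if (unsetenv("LD_PRELOAD") != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execv(path, argv); /* child inherits the environment without LD_PRELOAD */
        _exit(127);        /* reached only if execv fails */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because only `execv` and `_exit` run in the child between `fork` and exec, the helper stays safe to call from a multithreaded parent.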

このとき、本実施形態の機器４０は、予めＡＰＩ関数フックを有効にするプロセス数をメモリ４２０が保持する環境変数などに格納させ、ＡＰＩ関数フックの回数が環境変数に格納されたプロセス数を超えた場合、ＡＰＩ関数フックを無効にするプログラムを有することによって、ＡＰＩ関数フックの回数およびＡＰＩ関数フックの範囲を制御してもよい。 The device 40 of this embodiment may also control how many times, and how far, API function hooks propagate. The number of processes in which hooks are to be enabled is stored in advance in an environment variable or similar, held in the memory 420. A program then disables the API function hooks once the number of hooks exceeds that stored number.
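One possible reading of this scheme is sketched below. The variable names `HOOK_MAX_PROCS` and `HOOK_DEPTH` and the helper `update_hook_depth` are hypothetical. Each hooked process increments a counter that its children inherit through the environment. Once the limit is reached, it removes `LD_PRELOAD` so that its descendants start without hooks. Because every child inherits its parent's counter, the limit applies along each parent-to-child chain.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Meant to be called once at startup of each hooked process, e.g. from a
 * function marked __attribute__((constructor)) in the hook library.
 * Returns 1 if children should stay hooked, 0 if hooking stops here. */
static int update_hook_depth(void)
{
    const char *max_s = getenv("HOOK_MAX_PROCS"); /* hypothetical limit */
    const char *depth_s = getenv("HOOK_DEPTH");   /* hypothetical counter */
    long max = max_s ? strtol(max_s, NULL, 10) : 1;
    long depth = depth_s ? strtol(depth_s, NULL, 10) : 0;

    depth += 1; /* this process is one more hooked process */

    char buf[32];
    snprintf(buf, sizeof buf, "%ld", depth);
    setenv("HOOK_DEPTH", buf, 1); /* inherited by child processes */

    if (depth >= max) {
        /* A child would exceed the limit: let children start unhooked. */
        unsetenv("LD_PRELOAD");
        return 0;
    }
    return 1;
}
```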

図４は、本発明の第１の実施形態の監視スレッド１１０が保持するデータおよびスレッド１２０が保持するデータを示す説明図である。 FIG. 4 is an explanatory diagram illustrating data held by the monitoring thread 110 and data held by the thread 120 according to the first embodiment of this invention.

監視スレッド１１０は、メモリ４２０に監視対象リスト６１０を保持する。また、各スレッド１２０（１２０−１、１２０−２）は、メモリ４２０に監視情報６２０（６２０−１、６２０−２）を保持する。 The monitoring thread 110 holds a monitoring target list 610 in the memory 420. Each thread 120 (120-1, 120-2) holds monitoring information 620 (620-1, 620-2) in the memory 420.

監視対象リスト６１０は、各スレッドを一意に識別するための識別子と、各監視情報６２０がメモリ４２０のいずれの領域に格納されるかを示すポインタとを含む。監視スレッド１１０は、各スレッド１２０を監視する場合、監視対象リスト６１０を参照することによって、監視情報６２０から情報を取得する。 The monitoring target list 610 includes an identifier for uniquely identifying each thread, and a pointer indicating in which area of the memory 420 each monitoring information 620 is stored. When monitoring each thread 120, the monitoring thread 110 obtains information from the monitoring information 620 by referring to the monitoring target list 610.

監視情報６２０は、各スレッド１２０を監視するために必要な情報を、スレッド１２０毎に保持する構造体である。監視情報６２０は、生存フラグ１４０および連続ＮＧ回数６２１を含む。 The monitoring information 620 is a per-thread structure that holds the information needed to monitor the corresponding thread 120. It includes the survival flag 140 and the consecutive NG count 621.

生存フラグ１４０は、図１に示す生存フラグ１４０と同じであり、スレッド１２０が正常状態であることを示す１ビットのフラグである。各スレッド１２０は、生存フラグ１４０を一つ保持する。生存フラグ１４０は、スレッド１２０および監視スレッド１１０によって更新される。 The survival flag 140 is the same as the survival flag 140 shown in FIG. 1, and is a 1-bit flag indicating that the thread 120 is in a normal state. Each thread 120 holds one survival flag 140. The survival flag 140 is updated by the thread 120 and the monitoring thread 110.

連続ＮＧ回数６２１は、後述する異常動作検出８００の処理によって、スレッド１２０−１が無効であると連続して判定された場合に、無効であると判定された回数を含む。連続ＮＧ回数６２１は、監視スレッド１１０によって更新される。 The consecutive NG count 621 holds how many consecutive times the thread 120-1 has been judged invalid by the abnormal operation detection 800 described later. The consecutive NG count 621 is updated by the monitoring thread 110.
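The data described for FIG. 4 might be laid out in C roughly as follows. The field types, the fixed capacity `MAX_TARGETS`, the mutex, and the helper names are assumptions of this sketch, and `pthread_t` is used as the thread identifier.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Monitoring information 620: one per monitored thread. */
struct monitor_info {
    atomic_int survival_flag; /* survival flag 140: 1 = reported alive */
    int consecutive_ng;       /* consecutive NG count 621 */
};

/* One entry of the monitoring target list 610. */
struct target_entry {
    pthread_t thread;          /* identifier of the thread */
    struct monitor_info *info; /* where its monitoring information lives */
};

#define MAX_TARGETS 64 /* hypothetical capacity */

static struct target_entry targets[MAX_TARGETS];
static size_t n_targets;
static pthread_mutex_t targets_lock = PTHREAD_MUTEX_INITIALIZER;

/* Register a thread; returns 0 on success, -1 if the list is full. */
static int target_add(pthread_t thread, struct monitor_info *info)
{
    int rc = -1;
    pthread_mutex_lock(&targets_lock);
    if (n_targets < MAX_TARGETS) {
        targets[n_targets++] = (struct target_entry){thread, info};
        rc = 0;
    }
    pthread_mutex_unlock(&targets_lock);
    return rc;
}

/* Unregister a thread by moving the last entry into its slot. */
static void target_remove(pthread_t thread)
{
    pthread_mutex_lock(&targets_lock);
    for (size_t i = 0; i < n_targets; i++) {
        if (pthread_equal(targets[i].thread, thread)) {
            targets[i] = targets[--n_targets];
            break;
        }
    }
    pthread_mutex_unlock(&targets_lock);
}

/* Look up a thread's monitoring information (NULL if not registered). */
static struct monitor_info *target_find(pthread_t thread)
{
    struct monitor_info *found = NULL;
    pthread_mutex_lock(&targets_lock);
    for (size_t i = 0; i < n_targets; i++) {
        if (pthread_equal(targets[i].thread, thread)) {
            found = targets[i].info;
            break;
        }
    }
    pthread_mutex_unlock(&targets_lock);
    return found;
}
```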

メモリ４２０が監視情報６２０を保持する方法としては、各スレッド１２０固有の記憶領域に各スレッド１２０に対応した監視情報６２０を保持する方法がある。また、メモリ４２０における各スレッド１２０間の共有メモリ領域に、監視情報６２０の構造体を含む一つのテーブルを保持し、このテーブルによって各監視情報６２０を一括管理する方法がある。 As a method of holding the monitoring information 620 in the memory 420, there is a method of holding the monitoring information 620 corresponding to each thread 120 in a storage area unique to each thread 120. In addition, there is a method in which one table including the structure of the monitoring information 620 is held in a shared memory area between the threads 120 in the memory 420, and the monitoring information 620 is collectively managed by this table.

ここで、スレッド１２０固有の記憶領域とは、アドレスが指定されなければ他のスレッドからアクセスされないため、各スレッド１２０が固有の記憶領域に監視情報６２０を保持した場合、他のスレッド１２０が誤って自らが保持する監視情報６２０にアクセスすることを防ぐことができる。 Other threads cannot access a storage area unique to a thread 120 unless its address is explicitly specified. When each thread 120 keeps its monitoring information 620 in its own storage area, other threads 120 therefore cannot access that monitoring information 620 by mistake.

また、各スレッド１２０間の共有メモリ領域とは、プロセス間で共有できる領域であり、プロセスに含まれる各スレッド１２０が共有メモリ領域に監視情報６２０を保持した場合、プロセスごとに監視することが可能になる。 The shared memory area among the threads 120 is an area that can also be shared between processes. When the threads 120 of a process keep their monitoring information 620 in the shared memory area, monitoring can be performed on a per-process basis.

さらに、共有メモリ領域に監視情報６２０が格納された場合、監視情報６２０は、スレッド名などの情報を含むことによって、デバッグ時の情報として用いられてもよい。また、共有メモリ領域に監視情報６２０が格納された場合、監視情報６２０は、ロックハンドルを含むことによって、共有リソースのロック処理とアンロック処理とに用いられてもよい。ここで、ロックハンドルとはロック処理とアンロック処理とに必要なキーである。 When the monitoring information 620 is stored in the shared memory area, it may also include information such as the thread name for use during debugging. It may also include a lock handle so that it can be used to lock and unlock shared resources. A lock handle is the key required for lock and unlock processing.

監視情報６２０は、スレッド１２０が生成される際に、スレッド１２０毎にメモリ４２０に生成される。プロセッサ４００は、スレッド１２０が生成された際に、スレッド１２０の処理によって監視情報６２０をメモリ４２０に生成する。 The monitoring information 620 is generated in the memory 420 for each thread 120 when the thread 120 is generated. When the thread 120 is generated, the processor 400 generates monitoring information 620 in the memory 420 by processing of the thread 120.

本実施形態のスレッド１２０の動作には、メモリ４２０に監視情報６２０を生成する処理と、生成された監視情報６２０のメモリ４２０における位置を示すポインタを監視対象リスト６１０に登録する処理とが含まれる。これは、ｐｔｈｒｅａｄ＿ｃｒｅａｔｅＡＰＩ関数（スレッド生成命令）にＡＰＩ関数フックを行い、生成および登録処理をあらかじめ設定することによって可能である。 The operation of the thread 120 in this embodiment includes two steps: generating the monitoring information 620 in the memory 420, and registering a pointer to its location in the monitoring target list 610. This is made possible by hooking the pthread_create API function (the thread creation call) and defining the generation and registration processing in that hook in advance.

このため、スレッド１２０は、スレッド１２０が新たに生成された場合、スレッド１２０の監視情報６２０の位置を示すポインタと、スレッド１２０の識別子とを監視対象リスト６１０に登録する。 Therefore, when a thread 120 is newly created, it registers in the monitoring target list 610 its own identifier and a pointer to the location of its monitoring information 620.

そして、監視スレッド１１０は、監視対象リスト６１０に従って異常動作検出を行う。生成された監視情報６２０は、監視情報６２０に対応するスレッド１２０がプロセッサ４００によって実行される間、メモリ４２０に保持される。そして、各スレッド１２０の処理の終了に従って、スレッド１２０は、監視情報６２０をメモリ４２０から削除し、監視対象リスト６１０からスレッド１２０のエントリを削除する。これは、ｐｔｈｒｅａｄ＿ｅｘｉｔＡＰＩ関数（スレッド終了命令）にＡＰＩ関数フックを行い、削除処理を追加することで実現できる。 The monitoring thread 110 then performs abnormal operation detection according to the monitoring target list 610. The generated monitoring information 620 stays in the memory 420 while the processor 400 executes the corresponding thread 120. When a thread 120 finishes its processing, it deletes its monitoring information 620 from the memory 420 and removes its entry from the monitoring target list 610. This can be achieved by hooking the pthread_exit API function (the thread termination call) and adding the deletion processing to that hook.
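The create and exit handling can be sketched as a wrapper around `pthread_create`. For brevity it is an ordinary wrapper function, `monitored_thread_create` (a hypothetical name), rather than an LD_PRELOAD interposer. A real hook would forward to the original `pthread_create` found with `dlsym`.

The sketch also does not hook `pthread_exit` separately. Instead it registers a cleanup handler with `pthread_cleanup_push`, which POSIX runs when the thread calls `pthread_exit`; `pthread_cleanup_pop(1)` runs the same handler on a normal return. This is a substituted technique, not the one named in the text. In this sketch the new thread registers itself when it starts running.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

struct monitor_info {
    atomic_int survival_flag;  /* survival flag 140 */
    int consecutive_ng;        /* consecutive NG count 621 */
    struct monitor_info *next; /* link in the monitoring target list */
};

static struct monitor_info *target_list; /* monitoring target list (sketch) */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static _Thread_local struct monitor_info *self_info; /* for hooked functions */

static void list_add(struct monitor_info *mi)
{
    pthread_mutex_lock(&list_lock);
    mi->next = target_list;
    target_list = mi;
    pthread_mutex_unlock(&list_lock);
}

static void list_remove(struct monitor_info *mi)
{
    pthread_mutex_lock(&list_lock);
    for (struct monitor_info **p = &target_list; *p != NULL; p = &(*p)->next)
        if (*p == mi) { *p = mi->next; break; }
    pthread_mutex_unlock(&list_lock);
}

static size_t list_length(void)
{
    size_t n = 0;
    pthread_mutex_lock(&list_lock);
    for (struct monitor_info *p = target_list; p != NULL; p = p->next)
        n++;
    pthread_mutex_unlock(&list_lock);
    return n;
}

struct start_args { void *(*fn)(void *); void *arg; };

/* Deletion processing: runs on return from fn and on pthread_exit(). */
static void unregister_self(void *p)
{
    list_remove(p);
    free(p);
    self_info = NULL;
}

/* Generation and registration processing, then the thread procedure. */
static void *trampoline(void *p)
{
    struct start_args a = *(struct start_args *)p;
    free(p);
    struct monitor_info *mi = calloc(1, sizeof *mi);
    if (mi == NULL)
        return a.fn(a.arg); /* out of memory: run unmonitored */
    self_info = mi;
    list_add(mi);
    void *ret;
    pthread_cleanup_push(unregister_self, mi);
    ret = a.fn(a.arg);
    pthread_cleanup_pop(1); /* 1 = also run the handler on normal return */
    return ret;
}

/* What a hooked pthread_create could do before calling the real one. */
static int monitored_thread_create(pthread_t *t, const pthread_attr_t *attr,
                                   void *(*fn)(void *), void *arg)
{
    struct start_args *a = malloc(sizeof *a);
    if (a == NULL)
        return EAGAIN;
    a->fn = fn;
    a->arg = arg;
    int rc = pthread_create(t, attr, trampoline, a);
    if (rc != 0)
        free(a);
    return rc;
}

/* Example thread procedure that ends via pthread_exit(). */
static void *demo_worker(void *arg)
{
    (void)arg;
    pthread_exit((void *)(uintptr_t)list_length());
}
```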

また、本実施形態の監視スレッド１１０は、生成されたスレッド１２０を示す識別子と生成されたスレッド１２０を含むプロセスを示す識別子とを、メモリ４２０に格納する処理を含んでもよい。これによって、後述の処理によって、監視スレッド１１０がスレッド１２０を異常であると判定した場合、監視スレッド１１０は、スレッド１２０の識別子に基づいてスレッド１２０を含むプロセスの識別子をメモリ４２０から取得し、取得された識別子を用いて別プロセスからプロセスを再起動してもよい。 The monitoring thread 110 of this embodiment may also store in the memory 420 the identifier of each created thread 120 together with the identifier of the process that contains it. Suppose the monitoring thread 110 later judges a thread 120 to be abnormal through the processing described below. It can then look up, by the thread's identifier, the identifier of the containing process in the memory 420. That process may then be restarted from another process using the obtained identifier.

また、本実施形態においてスレッド１２０は、監視情報６２０が削除された旨を監視スレッド１１０に通知してもよく、また、監視情報６２０が削除された旨を監視スレッド１１０に通知しなくてもよい。 In the present embodiment, the thread 120 may notify the monitoring thread 110 that the monitoring information 620 has been deleted, and may not notify the monitoring thread 110 that the monitoring information 620 has been deleted. .

監視情報６２０が削除されたことが通知されない場合においても、監視対象リスト６１０に含まれる各エントリは、監視情報６２０が削除された時点でスレッド１２０によって削除されるため、監視スレッド１１０はスレッド１２０への異常検出を行わない。 Even without such a notification, the monitoring thread 110 does not perform abnormality detection on that thread 120. The reason is that the thread 120 removes its entry from the monitoring target list 610 at the moment its monitoring information 620 is deleted.

図５は、本発明の第１の実施形態の生存報告２００および異常動作検出８００を示すシーケンス図である。 FIG. 5 is a sequence diagram illustrating the survival report 200 and the abnormal operation detection 800 according to the first embodiment of this invention.

図５は、スレッド１２０による生存報告２００と監視スレッド１１０による異常動作検出８００とによって行われるスレッド監視の処理を示す。 FIG. 5 shows thread monitoring processing performed by the survival report 200 by the thread 120 and the abnormal operation detection 800 by the monitoring thread 110.

スレッド１２０が実行される間、スレッド１２０が実行するＡＰＩ関数のうち、ＡＰＩ関数フックされるＡＰＩ関数が、監視情報６２０に生存報告２００を行う。具体的には、ＡＰＩ関数フックされるＡＰＩ関数が、生存フラグ１４０に１を格納する。 While the thread 120 is running, each hooked API function it calls performs the survival report 200 on the monitoring information 620. Specifically, the hooked API function stores 1 in the survival flag 140.

一方で、監視スレッド１１０は、監視対象リスト６１０を用いて監視情報６２０を参照することによって、異常動作検出８００を実行する。監視スレッド１１０は、監視対象リスト６１０のポインタが示す監視情報６２０をすべて参照することによって、監視対象リスト６１０に識別子が格納されるすべてのスレッド１２０に異常動作検出８００を行う。 On the other hand, the monitoring thread 110 executes the abnormal operation detection 800 by referring to the monitoring information 620 using the monitoring target list 610. The monitoring thread 110 refers to all the monitoring information 620 indicated by the pointer of the monitoring target list 610, and performs the abnormal operation detection 800 for all the threads 120 whose identifiers are stored in the monitoring target list 610.

監視スレッド１１０は、監視対象リスト６１０に識別子が格納されていないスレッド１２０に、異常動作検出８００を行えない。このため、実行中のスレッド１２０を示す識別子は、監視対象リスト６１０に常に格納される必要がある。本実施形態において、監視対象リスト６１０に実行中のスレッド１２０を示す識別子を格納する方法は、スレッド１２０が生成された際に監視対象リスト６１０に識別子が格納され、スレッド１２０が終了した際に監視対象リスト６１０から削除することによって実現される。 The monitoring thread 110 cannot perform the abnormal operation detection 800 on a thread 120 whose identifier is not in the monitoring target list 610. The identifier of every running thread 120 must therefore always be in the monitoring target list 610. In this embodiment, the identifier is added to the list when the thread 120 is created and removed when the thread 120 terminates.

なお、監視対象リスト６１０のデータ構造はリスト構造に限定するものではなく、テーブル構造でもよい。また、図５に示す異常動作検出８００は、一定の監視周期によって実行されるが、本実施形態の異常動作検出８００は、開発者または使用者等が監視スレッド１１０に対応するコマンドを実行することによって、不定期または任意のタイミングによって実行されてもよい。 The monitoring target list 610 need not be a list; a table structure may also be used. In FIG. 5 the abnormal operation detection 800 runs at a fixed monitoring period. In this embodiment it may also run irregularly or at any chosen time, for example when a developer or user executes a command corresponding to the monitoring thread 110.

図６は、本発明の第１の実施形態の監視スレッド１１０による異常動作検出８００を示すフローチャートである。 FIG. 6 is a flowchart showing the abnormal operation detection 800 by the monitoring thread 110 according to the first embodiment of this invention.

所定の監視周期において監視スレッド１１０が異常動作検出８００を開始する（８０１）。監視スレッド１１０は、開発者または使用者からの指示によって、異常動作検出８００を実行してもよい。 The monitoring thread 110 starts the abnormal operation detection 800 in a predetermined monitoring cycle (801). The monitoring thread 110 may execute the abnormal operation detection 800 according to an instruction from a developer or a user.

なお、異常動作検出８００は、監視対象リスト６１０に格納されるすべての監視情報６２０に行われるが、図６に示す処理は、一つの監視情報６２０に行われる処理を示す。複数の監視情報６２０に異常動作検出８００が行われる場合、監視スレッド１１０は、図６に示す処理を複数の監視情報６２０に対して並列に行ってもよいし、複数の監視情報６２０に順次行ってもよい。 Although the abnormal operation detection 800 is performed on all the monitoring information 620 stored in the monitoring target list 610, the processing illustrated in FIG. 6 indicates processing performed on one monitoring information 620. When the abnormal operation detection 800 is performed on a plurality of pieces of monitoring information 620, the monitoring thread 110 may perform the processing illustrated in FIG. 6 on the plurality of pieces of monitoring information 620 in parallel or sequentially on the pieces of monitoring information 620. May be.

ステップ８０１の後、監視スレッド１１０は、監視対象リスト６１０を用いてスレッド１２０の監視情報６２０を参照する。そして、各スレッド１２０に本実施形態の生存確認３００を実行し、スレッド１２０の生存確認３００の結果が有効か否かを判定する（８１０）。 After step 801, the monitoring thread 110 refers to the monitoring information 620 of the thread 120 through the monitoring target list 610. It then executes the survival confirmation 300 of this embodiment for each thread 120 and determines whether the result of the survival confirmation 300 is valid (810).

ステップ８１０において行われる生存確認３００の処理の流れは、図１２に示す生存確認３００の処理の流れと同様である。しかし、本実施形態の生存確認３００のステップ３３０において、監視スレッド１１０は、生存フラグ１４０が有効であると判定する。また、本実施形態の生存確認３００のステップ３４０において、監視スレッド１１０は、生存フラグ１４０が無効であると判定する。これは、異常動作検出８００における生存確認３００は、スレッド１２０が正常であるか否かを判定する処理ではないためである。 The survival confirmation 300 in step 810 follows the same flow as the survival confirmation 300 in FIG. 12, with one difference. In step 330 the monitoring thread 110 judges the survival flag 140 to be valid, and in step 340 it judges the flag to be invalid. This is because, within the abnormal operation detection 800, the survival confirmation 300 does not itself decide whether the thread 120 is normal.
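FIG. 12 is not part of this excerpt, so the following sketch rests on an assumption: that the survival confirmation 300 reads the survival flag 140 and resets it to 0. This fits the statement that both the thread and the monitoring thread update the flag. Doing the read and the reset as one `atomic_exchange` means a report that arrives during the check is never lost; it is counted either in this period or in the next. The function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Called from hooked API functions: survival report 200 (store 1). */
static void survival_report(atomic_int *flag)
{
    atomic_store_explicit(flag, 1, memory_order_release);
}

/* Survival confirmation 300 (assumed read-and-clear): returns true
 * ("valid") if a report arrived since the previous confirmation. */
static bool survival_confirm(atomic_int *flag)
{
    return atomic_exchange_explicit(flag, 0, memory_order_acq_rel) == 1;
}
```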

ステップ８１０において、生存確認３００の結果が有効であると判定された場合、生存フラグ１４０はスレッド１２０が生存報告２００を正常に行っていることを示す。このため、監視スレッド１１０は、連続ＮＧ回数６２１に０を格納する（８２０）。 When it is determined in step 810 that the result of the survival confirmation 300 is valid, the survival flag 140 indicates that the thread 120 is normally performing the survival report 200. Therefore, the monitoring thread 110 stores 0 in the continuous NG count 621 (820).

そして、ステップ８２０の後、監視スレッド１１０は、スレッド１２０が正常であると判定する（８６０）。連続ＮＧ回数６２１は、監視周期ごとに実行されるステップ８１０において、生存フラグ１４０が無効であると連続して判定された場合に、無効と判定された回数を示す。 After step 820, the monitoring thread 110 judges the thread 120 to be normal (860). The consecutive NG count 621 records how many consecutive times the survival flag 140 has been judged invalid in step 810, which runs every monitoring period.

ステップ８１０において、生存確認３００の結果が無効であると判定された場合、監視スレッド１１０は、スレッド１２０がＮＧであると判定する（８３０）。ステップ８３０の後、連続ＮＧ回数６２１に１を加算する（８４０）。例えば、二つ前の監視周期におけるステップ８１０において有効と判定され、一つ前の監視周期におけるステップ８１０において無効と判定された生存フラグ１４０を、監視スレッド１１０がステップ８１０において無効と判定する場合、ステップ８４０の後の連続ＮＧ回数６２１には、２が格納される。 If the result of the survival confirmation 300 is judged invalid in step 810, the monitoring thread 110 judges the thread 120 to be NG (830). After step 830, the monitoring thread 110 adds 1 to the consecutive NG count 621 (840). For example, suppose the survival flag 140 was judged valid in step 810 two monitoring periods ago and invalid in the previous period. If it is judged invalid again in the current step 810, the consecutive NG count 621 holds 2 after step 840.

ステップ８４０の後、監視スレッド１１０は、連続ＮＧ回数６２１が示す値と所定の保護回数とを比較し、連続ＮＧ回数６２１が示す値が所定の保護回数以下であるか否かを判定する（８５０）。 After step 840, the monitoring thread 110 compares the value indicated by the continuous NG count 621 with a predetermined protection count, and determines whether or not the value indicated by the continuous NG count 621 is less than or equal to the predetermined protection count (850). ).

ここで、保護回数とは、各監視周期におけるステップ８３０においてＮＧであると連続して判定された場合、すなわち、各監視周期におけるステップ８１０において無効であると連続して判定された場合、監視スレッド１１０がスレッド１２０を異常と判定しない連続ＮＧ回数６２１の上限値である。監視スレッド１１０は、連続ＮＧ回数６２１の値が保護回数の値を超えるまで、スレッド１２０を異常と判定しない。 The protection count is the largest consecutive NG count 621 at which the monitoring thread 110 still does not judge the thread 120 abnormal. The count grows each period the thread is judged NG in step 830, that is, each period its flag is judged invalid in step 810. The monitoring thread 110 does not judge the thread 120 abnormal until the consecutive NG count 621 exceeds the protection count.

例えば、保護回数が１である場合、ステップ８３０における判定が２回連続した後、監視スレッド１１０は、スレッド１２０が異常な動作を行っていると判定（異常動作検出）する。 For example, with a protection count of 1, the monitoring thread 110 judges that the thread 120 is operating abnormally (abnormal operation detection) once the thread has been judged NG in step 830 twice in a row.

ステップ８５０において、連続ＮＧ回数６２１が所定の保護回数以下であると判定された場合、監視スレッド１１０は、ステップ８６０を実行することによって、スレッド１２０が正常であると判定する。 If it is determined in step 850 that the continuous NG count 621 is less than or equal to the predetermined protection count, the monitoring thread 110 determines that the thread 120 is normal by executing step 860.

If step 850 determines that the consecutive NG count 621 exceeds the protection count, the monitoring thread 110 judges the thread 120 to be abnormal (870). After step 860 or step 870, the monitoring thread 110 ends abnormal operation detection 800 for the thread 120 (880).
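The decision in steps 810 to 880 can be sketched as a small single-threaded Python simulation. This is not the patented implementation. The class and function names are hypothetical, and survival confirmation is modeled as "valid if the flag is 1, then reset it to 0". Resetting the consecutive NG count after a valid result is also an assumption: the word "consecutive" implies it, but this section does not describe the valid branch.

```python
from dataclasses import dataclass


@dataclass
class MonitorInfo:
    """Hypothetical per-thread monitoring information."""
    survival_flag: int = 0    # survival flag 140: set to 1 by each survival report 200
    consecutive_ng: int = 0   # consecutive NG count 621


def survival_check(info: MonitorInfo) -> bool:
    """Step 810 (assumed form): valid if the thread reported since the last check."""
    if info.survival_flag == 1:
        info.survival_flag = 0    # re-arm: the next cycle needs a fresh report
        return True
    return False


def detect_abnormal(info: MonitorInfo, protection_count: int) -> str:
    """One run of abnormal operation detection 800 (steps 810 to 880)."""
    if survival_check(info):
        info.consecutive_ng = 0   # assumption: a valid result ends the NG run
        return "normal"
    info.consecutive_ng += 1      # steps 830 and 840: judged NG, count it
    if info.consecutive_ng <= protection_count:   # step 850
        return "normal"           # step 860 (an NG log entry could be written here)
    return "abnormal"             # step 870


if __name__ == "__main__":
    # The thread reported once, then entered an infinite loop or deadlock.
    info = MonitorInfo(survival_flag=1)
    print([detect_abnormal(info, protection_count=1) for _ in range(3)])
    # ['normal', 'normal', 'abnormal']
```

With a protection count of 1, a thread that reported once and then hung is judged normal, normal, and then abnormal over three cycles. This matches the example of two consecutive NG judgments. The first cycle still sees the report made just before the hang.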

The protection count is a parameter of the thread monitoring process of this embodiment. The developer or user stores it in advance in the memory 420 of the device 40. Setting the protection count to 1 or more prevents the thread 120 from being wrongly judged abnormal when the survival report 200 happens not to be executed even once within a single monitoring cycle.

When the consecutive NG count 621 is less than or equal to the protection count, the monitoring thread 110 may write to a log that the thread 120 was judged NG. The developer or user can then consult the log to debug the thread 120 or improve its performance.

When an infinite loop or deadlock occurs in the thread 120, the thread 120 no longer performs the survival report 200. Consequently, once abnormal operation detection 800 has been executed two or more times after the infinite loop or deadlock began, the survival flag 140 is always 0 in step 810. The first detection after the hang may still find a report made just before it began. The monitoring thread 110 therefore judges the affected thread 120 to be NG. When the consecutive NG count 621 then exceeds the protection count, the monitoring thread 110 detects the abnormality of the thread 120.

Through the process shown in FIG. 6, the monitoring thread 110 can thus determine whether the thread 120 is in a normal state.

In this embodiment, the monitoring thread 110 may also be created by having the developer or user place a definition for creating and running the monitoring thread 110 in the shared library 500. In that case, the monitoring thread 110 can be created simply by changing the shared library 500, even if the programs on the device 40 do not provide it. This makes the thread monitoring method of this embodiment easy to deploy on the device 40.

The monitoring method of this embodiment is not limited to monitoring by the monitoring thread 110. Any process or thread may be used, as long as it executes the processes of FIGS. 6 and 12.

The following describes the processing performed when the monitoring thread 110 judges the thread 120 to be abnormal. When an infinite loop or deadlock occurs in the thread 120, the process containing the thread 120 generally keeps running with the thread 120 still in the abnormal state. However, continuing that process does not make the thread 120 recover from the infinite loop or deadlock on its own.

Therefore, when the monitoring thread 110 detects an abnormality of the thread 120, it may take appropriate action on the thread 120 or on the process containing it. Here, appropriate action means three steps: logging information about the thread 120 in which the abnormality was detected, terminating the process containing the thread 120, and then restarting that process. Terminating the process prevents the memory 420 and locks held by the thread 120 from being leaked.

Alternatively, the monitoring thread 110 may terminate only the thread 120 in which the abnormality was detected and then create the thread 120 again. Because the process does not have to be terminated, this limits the impact on the service provided by the monitored program.

The monitoring thread 110 starts from the identifier of the thread 120 judged abnormal by the process shown in FIG. 6. Using it, the monitoring thread 110 can read from the memory 420 the identifier of the process containing that thread. It then restarts the process by notifying another process of the identifier that was read.
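The "log, terminate, restart" policy can be sketched with Python's `subprocess` module. The sketch assumes that the monitored program runs as a child process with a known command line. The thread-to-process table and the function name are hypothetical, and so is the choice of `terminate()` followed by `kill()`. The source's own mechanism differs: it notifies a separate process of the process identifier.

```python
import logging
import subprocess
import sys

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("thread-monitor")

# Hypothetical stand-in for the thread-to-process mapping kept in memory 420.
thread_to_process: dict[int, subprocess.Popen] = {}


def restart_process_of(thread_id: int, command: list[str],
                       timeout: float = 5.0) -> subprocess.Popen:
    """Log the abnormal thread, terminate its process, then start it again."""
    proc = thread_to_process[thread_id]
    log.warning("thread %d in process %d judged abnormal; restarting",
                thread_id, proc.pid)
    proc.terminate()                  # ask first (SIGTERM on POSIX)
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()                   # escalate if the request is ignored
        proc.wait()
    new_proc = subprocess.Popen(command)
    thread_to_process[thread_id] = new_proc
    return new_proc


if __name__ == "__main__":
    cmd = [sys.executable, "-c", "import time; time.sleep(60)"]
    thread_to_process[1234] = subprocess.Popen(cmd)  # hypothetical thread ID
    replacement = restart_process_of(1234, cmd)
    print("restarted as PID", replacement.pid)
    replacement.terminate()
    replacement.wait()
```

Ending the whole process lets the operating system reclaim the memory and locks the hung thread held. That is the leak-prevention argument made above. Restarting only the thread would need cooperation from inside the process.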

According to the first embodiment, abnormal operation of the thread 120 can be detected when the thread is not operating normally because of an infinite loop, a deadlock, or a stall that does not involve lock processing.

In addition, the first embodiment updates the survival flag 140 used to monitor the thread 120 through API function hooks. It therefore achieves thread monitoring without modifying the kernel 100, without modifying and recompiling the source code of the monitored program, and without modifying the compiled executable of the monitored program.

Furthermore, the first embodiment only adds an update of the survival flag 140 to the processing of existing API functions, so it does not significantly delay the monitored program.

(Second Embodiment)

In the first embodiment, a normal WAIT state can be misjudged. For example, the wait for an instruction message in step 122 of FIG. 1 may last longer than the monitoring cycle, and the monitoring thread 110 may then wrongly judge this normal WAIT state to be abnormal. The second embodiment provides a method for monitoring API functions whose processing takes longer than the monitoring cycle, such as API functions that put the thread 120 into a WAIT state.

FIG. 7 is a sequence diagram showing the abnormal operation detection process according to the second embodiment of this invention.

FIG. 7 shows update processes 900 and 902, in which the thread 120 updates the monitoring information 620. It also shows abnormal operation detections 910 to 912, performed by the monitoring thread 110. WAIT state 901 denotes the period during which an API function hooked in this embodiment is in the WAIT state.

The monitoring information 620 of the second embodiment includes the survival flag 140, the consecutive NG count 621, and a processing wait count 622. The survival flag 140 and the consecutive NG count 621 are the same as in the first embodiment. The processing wait count 622 is an area that stores a value used to determine whether the thread 120 is in the WAIT state 901.

Update process 900, WAIT state 901, and update process 902 together make up the processing of one API function hooked in this embodiment. Before the hooked API function enters the WAIT state 901, the thread 120 stores 1 in the survival flag 140 in update process 900. Update process 900 therefore includes the survival report 200 processing of the first embodiment.

Meanwhile, the monitoring thread 110 refers to the survival flag 140 in abnormal operation detection 910, which follows update process 900, and judges the thread 120 to be normal. After update process 900, the thread 120 enters the WAIT state 901. In the second embodiment, this WAIT state lasts longer than the monitoring cycle of the monitoring thread 110.

Suppose abnormal operation detections 910 to 912 performed the same processing as abnormal operation detection 800 of the first embodiment. Detection 910 would then reset the survival flag 140 to 0, and the waiting thread 120 could not report again. At detection 911, the monitoring thread 110 would find 0 in the flag and judge the thread 120 in WAIT state 901 to be abnormal. The WAIT state 901 is not an abnormal state of the thread 120, so that judgment would be wrong.

The monitoring thread 110 of the second embodiment therefore refers to the processing wait count 622, held in the monitoring information 620, during abnormal operation detection. This keeps it from misjudging a normal WAIT state 901 as abnormal. The thread 120 of the second embodiment updates the processing wait count 622 immediately before and immediately after the WAIT state 901.

The hardware of the device 40 in the second embodiment is the same as in the first embodiment shown in FIG. 2. It comprises the processor 400, the memory 420, the NIF 440, the bus 410, and the bus 430.

The following describes how the API function hook group 510 is determined in the second embodiment. As in the first embodiment, the API functions in the API function hook group 510 are selected by the developer or user of the monitored program, that is, the program that creates the thread 120. Alternatively, as in the first embodiment, the developer or user may run a script that extracts API functions from the thread procedure 520. The API functions that enter the WAIT state 901 are then taken from the script's output and included in the API function hook group 510.

In the second embodiment, the API functions that enter the WAIT state 901 include three kinds:

- timed state-synchronization functions, which stop the thread 120 until a signal is received from another process or thread;
- input-wait functions, which wait for input from the user;
- sleep functions, which stop the thread for a specified time.

The thread 120 in the second embodiment has hooks on the API functions that enter the WAIT state 901. Lock acquisition functions also enter the WAIT state 901, but they are also API functions that can cause deadlock.

The developer or user may therefore store lock acquisition functions in advance in a list of functions that must not be included in the API function hook group 510 (an API function hook prohibition list). The developer or user then selects for the API function hook group 510 only API functions that are not on this list. This prevents the monitoring thread 110 from misjudging a deadlocked thread 120. If a lock acquisition function were hooked, the processing wait count 622 would stay above 0 throughout the deadlock, and the thread would appear to be merely waiting.

As in the first embodiment, the API function hook group 510 of the second embodiment may also include API functions that do not enter the WAIT state 901. When the group contains both kinds, a list indicating whether each API function enters the WAIT state 901 may be kept in advance. This list is consulted to decide which processing to add to each API function. In this way, the device 40 can implement both the first and the second embodiments.
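Building the hook group from a prohibition list and a WAIT classification might look like the following Python sketch. The function names in the lists are real POSIX/C library functions, used here only as hypothetical examples. The functions a given program actually calls, and which of them block, would come from the thread procedure 520. Lock acquisition is excluded for the reason given above: a hooked lock call would hold the processing wait count above 0 for the whole deadlock.

```python
# Hypothetical example data; real names would come from thread procedure 520.
CANDIDATES = ["pthread_cond_timedwait", "read", "sleep",
              "pthread_mutex_lock", "gettimeofday"]
ENTERS_WAIT = {"pthread_cond_timedwait", "read", "sleep", "pthread_mutex_lock"}
PROHIBITED = {"pthread_mutex_lock"}   # API function hook prohibition list


def build_hook_group(candidates, enters_wait, prohibited):
    """Return {api_name: processing to add} for every hookable function."""
    group = {}
    for name in candidates:
        if name in prohibited:
            continue   # never hook lock acquisition
        if name in enters_wait:
            group[name] = "survival report + wait count"   # second embodiment
        else:
            group[name] = "survival report"                # first embodiment
    return group


if __name__ == "__main__":
    for name, extra in build_hook_group(CANDIDATES, ENTERS_WAIT, PROHIBITED).items():
        print(f"{name}: {extra}")
```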

Next, the method of generating the shared library 500 is described. As in the first embodiment, the developer or user stores processing content for each API function in the API function hook group 510. This content consists of the processing specific to that API function, the processing of the survival report 200, and the processing that updates the processing wait count 622. The processing content of the API function hook group 510 also specifies the order in which these steps are executed. The shared library 500 is generated in this way.

The processing that updates the processing wait count 622 is part of the processing content of the API function hook group 510. It raises the processing wait count 622 for as long as the thread 120 is in the WAIT state 901.

Specifically, the developer or user generates the API function hook group 510 so that the thread 120 runs the following steps in order. First, it initializes the processing wait count 622 to 0 and adds 1 to it. Next, it stores 1 in the survival flag 140. Only then does it execute the API-function-specific processing that enters the WAIT state 901. The developer or user also generates the API function hook group 510 so that the thread 120 subtracts 1 from the processing wait count 622 immediately after that API-function-specific processing finishes.

As a result, immediately before the WAIT state 901 in FIG. 7 begins, the thread 120 executes update process 900 and adds 1 to the processing wait count 622. Immediately after the WAIT state 901 ends, it executes update process 902 and returns the processing wait count 622 to 0. In abnormal operation detection 911, the monitoring thread 110 can therefore conclude that the thread 120 is in the WAIT state 901 whenever the processing wait count 622 is greater than 0.
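The hook wrapper described above can be illustrated with a Python decorator standing in for the LD_PRELOAD wrapper. This is an analogy only: a real implementation would be C code in the shared library 500, and the names `MonitorInfo` and `hook_wait_api` are hypothetical. The `try`/`finally` is a design choice, added so that the count is decremented even if the wrapped call raises an exception.

```python
import functools
import threading
import time
from dataclasses import dataclass


@dataclass
class MonitorInfo:
    """Hypothetical monitoring information 620 for one thread."""
    survival_flag: int = 0   # survival flag 140
    wait_count: int = 0      # processing wait count 622


def hook_wait_api(info: MonitorInfo):
    """Wrap a blocking call the way update processes 900 and 902 surround it."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            info.wait_count = 0      # update 900: initialize...
            info.wait_count += 1     # ...then mark the thread as waiting
            info.survival_flag = 1   # survival report 200
            try:
                return func(*args, **kwargs)   # API-specific processing (WAIT 901)
            finally:
                info.wait_count -= 1           # update 902: the wait is over
        return wrapper
    return decorator


if __name__ == "__main__":
    info = MonitorInfo()
    hooked_sleep = hook_wait_api(info)(time.sleep)
    worker = threading.Thread(target=hooked_sleep, args=(0.2,))
    worker.start()
    time.sleep(0.05)
    print("during wait:", info.wait_count)   # typically 1
    worker.join()
    print("after wait:", info.wait_count, info.survival_flag)   # 0 1
```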

Next, the update of the monitoring information 620 through API function hooks in the second embodiment is described. As in the first embodiment, the path of the shared library 500 generated by the developer or user is stored in the LD_PRELOAD environment variable 530. The hooks on the API functions in the API function hook group 510 then take effect once the executable of the program that creates the thread 120 is started.

While the thread 120 runs, the API function hooks execute update processes 900 and 902. The thread 120 adds 1 to the processing wait count 622 immediately before the API-function-specific processing and subtracts 1 immediately after it.

Next, abnormal operation detections 910 to 912 are described. They are the same as abnormal operation detection 800 of the first embodiment shown in FIG. 6, with one difference: survival confirmation 300 in step 810 of FIG. 6 works differently in the two embodiments.

FIG. 8 is a flowchart showing the survival confirmation process according to the second embodiment of this invention.

In the second embodiment, the process shown in FIG. 8 is part of step 810 of the process shown in FIG. 6. It corresponds to survival confirmation 300 in step 810 of the first embodiment.

The monitoring thread 110 of the second embodiment starts abnormal operation detection and reaches step 810. It then starts the survival confirmation process (1000) and determines whether the value stored in the processing wait count 622 for the thread 120 is greater than 0 (1010).

If step 1010 finds a value greater than 0, the thread 120 is in the WAIT state 901. The thread 120 can therefore be judged normal without further checks, so the monitoring thread 110 judges the result of the survival confirmation to be valid (1020).

If step 1010 finds the value to be 0, the thread 120 is not in the WAIT state 901. The monitoring thread 110 therefore checks the thread 120 by determining whether the value stored in the survival flag 140 is 1 (1030).

If step 1030 finds the survival flag 140 to be 1, the thread 120 is running normally. The monitoring thread 110 therefore stores 0 in the survival flag 140 (1040) and proceeds to step 1020.

If step 1030 finds that the survival flag 140 is not 1, the thread 120 may be abnormal. The monitoring thread 110 therefore judges the result of the survival confirmation to be invalid (1050).

After step 1020 or step 1050, the monitoring thread 110 ends the survival confirmation process in step 810 (1060).
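The survival confirmation of FIG. 8 reduces to a few lines of code. The sketch below is a hypothetical Python rendering, and its names are assumptions. Its demo replays the FIG. 7 timeline in order: update process 900, detections 910 and 911 during the wait, update process 902, and then detection 912.

```python
from dataclasses import dataclass


@dataclass
class MonitorInfo:
    """Hypothetical monitoring information 620 for one thread."""
    survival_flag: int = 0   # survival flag 140
    wait_count: int = 0      # processing wait count 622


def survival_check_v2(info: MonitorInfo) -> bool:
    """Survival confirmation of FIG. 8 (steps 1000 to 1060); True means valid."""
    if info.wait_count > 0:       # step 1010: thread is in WAIT state 901
        return True               # step 1020; the flag is left untouched
    if info.survival_flag == 1:   # step 1030
        info.survival_flag = 0    # step 1040
        return True               # step 1020
    return False                  # step 1050


if __name__ == "__main__":
    info = MonitorInfo()
    info.wait_count, info.survival_flag = 1, 1   # update process 900
    print(survival_check_v2(info), survival_check_v2(info))  # detections 910, 911
    info.wait_count = 0                          # update process 902
    print(survival_check_v2(info))               # detection 912
    print(survival_check_v2(info))               # no report since: invalid
```

Step 1010 is checked before the flag, so the flag survives the whole wait. Detection 912 therefore still finds 1. The next cycle without a new report is the first to return invalid.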

When abnormal operation detection 911 in FIG. 7 is performed by the processes of FIGS. 6 and 8, the processing wait count 622 holds 1. The monitoring thread 110 therefore judges the thread 120 to be normal.

When abnormal operation detection 912 in FIG. 7 is performed by the same processes, the processing wait count 622 holds 0 and the survival flag 140 holds 1. The monitoring thread 110 therefore again judges the thread 120 to be normal. The survival flag 140 still holds 1 because the process shown in FIG. 8 never resets it during the WAIT state 901.

In the second embodiment, the monitoring information 620 holds the processing wait count 622. The monitoring thread 110 can therefore judge the thread 120 to be normal while an API function is still processing, even if the processing takes longer than the monitoring cycle. This applies, for example, to an API function that puts the thread 120 into the WAIT state 901. Moreover, the processing wait count 622 is checked before the survival flag 140, so the survival flag 140 is not reset to 0 during the WAIT state 901. Consequently, the monitoring thread 110 may perform abnormal operation detection 912 immediately after the WAIT state 901 ends, and the survival flag 140 still holds 1. The monitoring thread 110 can therefore judge the thread 120 to be normal.

When the monitoring thread 110 of the second embodiment judges the thread 120 to be abnormal, it proceeds as in the first embodiment and restarts the thread 120 or the process containing it.

In the second embodiment, the API functions that enter the WAIT state are hooked. Thread monitoring is therefore achieved without modifying the kernel 100, without modifying and recompiling the source code of the monitored program, and without modifying the compiled executable of the monitored program.

In the second embodiment, the monitoring thread 110 also does not misjudge a normal WAIT state as abnormal, even when the WAIT state lasts longer than the monitoring cycle.

Furthermore, the second embodiment only adds updates of the processing wait count 622 to the processing of existing API functions, so it does not significantly delay the monitored program.

The developer or user may carry out the setup for the first and second embodiments through an input device of the device 40, such as a keyboard. The setup includes generating the API function hook group 510, setting the identifier of the shared library 500 in the LD_PRELOAD environment variable 530, and running the monitoring thread 110.

Alternatively, the developer or user may create a program that performs this setup and send it from another computer to the device 40 via the NIF 440. The device 40 then runs the transmitted program to configure itself for the first and second embodiments.

The developer or user may also supply the setup program for the first and second embodiments to the device 40 on a non-transitory computer-readable storage medium.

In the first and second embodiments, the monitoring thread 110 is a thread that executes a monitoring program held in the memory 420. However, its functions may instead be incorporated into a program that has other functions and executed as part of that program.

(Third Embodiment)

In the first and second embodiments, the developer or user of the monitored program had to select the API function hook group 510 in advance. In the third embodiment, the API function hook group 510 is instead determined automatically from the usage of API functions, either periodically or at a designated time.

The hardware configuration of the device 40 in the third embodiment is the same as that of the device 40 of the first embodiment shown in FIG. 2. As in the first and second embodiments, the processor 400 of the third embodiment executes the monitoring thread 110 and the thread 120 shown in FIG. 4.

FIG. 9 is an explanatory diagram showing the process of determining the API function hook group 510 according to the third embodiment of this invention.

To determine the API function hook group 510, the thread 120 updates statistical information 1140, which indicates how API functions are used. In the third embodiment, the thread 120 collects the statistical information 1140 for each API function by using two things: the shared library 1100 shown in FIG. 9, which serves to collect the statistical information 1140, and the LD_AUDIT function.

In the third embodiment, the LD_AUDIT function is the function provided through the LD_AUDIT environment variable 1120. This variable stores an identifier used to load the shared library 1100. The shared library 1100 contains the processing to be performed when an API function is hooked. The LD_AUDIT environment variable 1120 is read when the API functions used by the monitored program are loaded into the memory 420.

After the developer or user stores the identifier for loading the shared library 1100 in the LD_AUDIT environment variable 1120, every API function executed by the monitored program is loaded into the memory 420 with the processing specified in the shared library 1100 added to it. As in the first embodiment, the thread procedure 520 contains the API functions used by the monitored program.

The shared library 1100 holds processing content 1110, which outputs the API function name to the statistical information 1140. This processing is added to each API function when it is loaded into the memory 420. An API function executed by a thread 120 therefore performs both its own specific processing and the output of its name to the statistical information 1140.

For example, suppose the thread procedure 520 includes the sleep API function. Before the processor 400 executes the sleep-specific processing 560, the thread 120 executes the additional processing loaded into the memory 420, which applies API function hook 1121 to the sleep processing 560. After API function hook 1121, the thread 120 executes processing content 1110 and outputs the API function name (that is, "sleep") to the statistical information 1140.

Therefore, suppose the shared library 1100 has been created and an identifier indicating it has been stored in the LD_AUDIT environment variable 1120. If the program subject to thread monitoring is started after that, the usage status of the executed API functions is stored in the statistical information 1140.

The statistical information 1140 is a set of data held in the memory 420 and holds an API function name 1141 and a use count 1142. The API function name 1141 contains the API function name output by the processing of the processing content 1110. The use count 1142 contains the number of times the API function indicated by the API function name 1141 has been executed.

In the processing described by the processing content 1110, the thread 120 determines whether the API function name to be output is already contained in the statistical information 1140. If it is, the thread 120 adds 1 to the use count 1142 of the entry whose API function name 1141 matches that name.

If the API function name to be output is not yet contained in the statistical information 1140, the thread 120 does three things in the processing described by the processing content 1110. It creates a new entry in the statistical information 1140, stores the name in that entry as a new API function, and stores 1 in the entry's use count 1142. In this way, the statistical information 1140 holds, for each API function, the number of times it has been executed.
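The update rule for the statistical information 1140 described above is short: increment an existing entry, otherwise create one with a count of 1. In this sketch, a Python dict keyed by API function name stands in for the entries of 1140. The helper name `record_api_call` is hypothetical.

```python
def record_api_call(statistics, api_name):
    """Update the statistical information for one call of api_name.

    statistics maps API function name (cf. 1141) -> use count (cf. 1142).
    """
    if api_name in statistics:
        statistics[api_name] += 1  # existing entry: add 1 to its use count
    else:
        statistics[api_name] = 1   # new entry: store the name with a count of 1
    return statistics


stats = {}
for name in ["sleep", "printf", "sleep"]:
    record_api_call(stats, name)
print(stats)  # {'sleep': 2, 'printf': 1}
```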

Next, the API function determination program 1170 uses the updated statistical information 1140 to determine the API function hook group 510 for the thread monitoring process. The API function determination program 1170 is held by the device 40 in the memory 420 and executed by the processor 400. It may be implemented as part of a program that also provides other functions, or as several programs, provided its processing can still be performed.

The API function determination program 1170 could simply place every API function name 1141 contained in the statistical information 1140 in the API function hook group 510. However, if it selects an API function that causes a deadlock, the monitoring thread 110 may be unable to determine the state of the thread 120 accurately.

Therefore, in the third embodiment, the device 40 holds an API function hook prohibition list 1130 in the memory 420. The developer or user stores in this list values indicating API functions that must not be placed in the API function hook group 510.

The API function determination program 1170 refers to the API function hook prohibition list 1130 and the statistical information 1140. It places in the API function hook group 510 those API functions indicated by the API function names 1141 that are not contained in the API function hook prohibition list 1130. For example, if the cond_wait API function is stored in the API function hook prohibition list 1130, the API function determination program 1170 does not place cond_wait in the API function hook group 510, even if the API function names 1141 include it.
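Selecting candidates while honoring the API function hook prohibition list 1130 amounts to a filtered set difference. The sketch below assumes plain lists of API names. `exclude_prohibited` is a hypothetical helper, not part of the source.

```python
def exclude_prohibited(candidate_names, prohibited):
    """Return the names in candidate_names (e.g. the API function names 1141)
    that are not on the API function hook prohibition list 1130, keeping
    their original order."""
    blocked = set(prohibited)
    return [name for name in candidate_names if name not in blocked]


print(exclude_prohibited(["sleep", "cond_wait", "printf"], ["cond_wait"]))
# ['sleep', 'printf']
```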

The shared library 500 and API function hook group 510 of the third embodiment are the same as those of the first and second embodiments.

The API function hook prohibition list 1130 is created by the developer or user. The developer or user can know in advance which API functions may cause a deadlock. They can therefore create the list even when they cannot view the source code of the program subject to thread monitoring in this embodiment.

FIG. 10 is a flowchart showing the processing of the API function determination program 1170 according to the third embodiment of this invention.

FIG. 10 shows the processing that determines the API function hook group 510 from the statistical information 1140.

First, the API function determination program 1170 starts the processing that determines the API function hook group 510 (1200). It does so either periodically or in response to an instruction from the developer, user, or the like.

After step 1200, the API function determination program 1170 extracts an API function X from one entry of the statistical information 1140 (1210). In step 1210, it may extract any API function in the statistical information 1140, provided that the function is not yet included in the API function hook group 510.

After step 1210, the API function determination program 1170 determines whether an API function X was extracted in step 1210 (1220). If no API function X was extracted, no API function remains to be placed in the API function hook group, so the API function determination program 1170 ends the processing shown in FIG. 10.

If it is determined in step 1220 that an API function X was extracted, the API function determination program 1170 retrieves the use count 1142 of the entry whose API function name 1141 is X. It then determines whether that use count 1142 is greater than a predetermined threshold T (1230). The threshold T decides whether an API function contained in the statistical information 1140 is placed in the API function hook group 510. Here, T is the average of the use counts 1142 over all entries of the statistical information 1140.

If it is determined in step 1230 that the retrieved use count 1142 is less than or equal to the threshold T, the API function X is not executed often enough to warrant monitoring by the monitoring thread 110. The API function determination program 1170 therefore does not place X in the API function hook group 510. It returns to step 1210 to extract the next API function X.

If it is determined in step 1230 that the retrieved use count 1142 is greater than the threshold T, the API function X is likely to be one that the monitoring thread 110 should monitor. The API function determination program 1170 therefore determines whether X is contained in the API function hook prohibition list 1130 (1240).

If it is determined in step 1240 that the API function X is contained in the API function hook prohibition list 1130, X must not be placed in the API function hook group 510. The API function determination program 1170 therefore returns to step 1210 to extract the next API function X.

If it is determined in step 1240 that the API function X is not contained in the API function hook prohibition list 1130, the API function determination program 1170 adds X to the API function hook group 510 (1250). It then returns to step 1210 to extract the next API function X.
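The FIG. 10 flow (steps 1200 to 1250) can be sketched as a single pass over the statistical information, with the threshold T taken as the mean use count, as stated above. The sketch makes these assumptions:

- the statistics are a dict from API name to use count;
- the prohibition list and the hook group are lists of names;
- each entry is examined once, which is how this sketch reads "extract the next API function X" so that rejected entries are not revisited.

`determine_hook_group` is a hypothetical name.

```python
from statistics import mean


def determine_hook_group(usage_counts, prohibited, hook_group=None):
    """Sketch of the FIG. 10 flow.

    usage_counts: dict of API function name (cf. 1141) -> use count (cf. 1142).
    prohibited:   API function hook prohibition list (cf. 1130).
    hook_group:   current API function hook group (cf. 510), extended in place.
    """
    if hook_group is None:
        hook_group = []
    if not usage_counts:
        return hook_group                     # step 1220: nothing to extract
    threshold = mean(usage_counts.values())   # threshold T: mean use count
    blocked = set(prohibited)
    for api, count in usage_counts.items():   # step 1210: extract API function X
        if api in hook_group:
            continue                          # only functions not yet in the group
        if count <= threshold:
            continue                          # step 1230: not used often enough
        if api in blocked:
            continue                          # step 1240: on the prohibition list
        hook_group.append(api)                # step 1250: add to the hook group
    return hook_group


print(determine_hook_group({"sleep": 100, "printf": 10}, ["cond_wait"]))
# ['sleep']
```

The threshold is computed over all entries, including prohibited ones, because the text defines T as the average over every entry of the statistical information.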

Specifically, in the statistical information 1140 shown in FIG. 9, the use count 1142 of the entry for the sleep API function is 100 and that for the printf API function is 10. The average use count 1142 is therefore 55, so the threshold T in step 1230 is 55. Consequently, only the sleep API function passes step 1230, and the API function determination program 1170 adds it to the API function hook group 510 in step 1250.
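The arithmetic of this example is written out below, with $c_i$ denoting the use count 1142 of entry $i$ and $N$ the number of entries:

```latex
T = \frac{1}{N}\sum_{i=1}^{N} c_i
  = \frac{c_{\mathrm{sleep}} + c_{\mathrm{printf}}}{2}
  = \frac{100 + 10}{2} = 55,
\qquad c_{\mathrm{sleep}} = 100 > T,
\qquad c_{\mathrm{printf}} = 10 \le T.
```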

The processing content 1110 may also include processing that adds further values to the statistical information 1140, such as the time at which an API function was used or the ID of the thread that called it. The API function determination program 1170 may then use the stored values to calculate two parameters: the monitoring period held by the monitoring thread 110, and the protection count used by the monitoring thread 110 in step 850 shown in FIG. 6.

For example, the API use times in the statistical information 1140 may reveal the time periods in which API functions are executed frequently. In that case, the API function determination program 1170 may set the monitoring period of the monitoring thread 110 so that it refers to the monitoring information 620 more often during those periods.
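One possible way to derive time-dependent monitoring periods from API use times is sketched below. The source does not specify a formula. The hour-of-day bucketing, the "above the average of active hours" rule, and the period values `busy_period_s` and `idle_period_s` are all illustrative assumptions.

```python
from collections import Counter


def monitoring_periods(call_hours, busy_period_s=1.0, idle_period_s=10.0):
    """Return a monitoring period (seconds) for each hour of the day.

    call_hours: hour of day (0-23) of each recorded API call, assumed to be
    derived from the API use times in the statistical information 1140.
    Hours whose call count exceeds the average over hours that saw any calls
    get the shorter busy_period_s; all other hours get idle_period_s.
    """
    per_hour = Counter(call_hours)
    average = sum(per_hour.values()) / len(per_hour) if per_hour else 0.0
    return {
        hour: busy_period_s if per_hour[hour] > average else idle_period_s
        for hour in range(24)
    }


periods = monitoring_periods([9, 9, 9, 10, 10, 22])
print(periods[9], periods[10], periods[3])  # 1.0 10.0 10.0
```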

If the statistical information 1140 contains the IDs of the threads that called each API function, the API function determination program 1170 may take each thread's API function usage into account when determining the API function hook group 510.
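If each record in the statistical information 1140 also carries the calling thread's ID, the mean-threshold rule can be applied per thread. The record layout `(thread_id, api_name)` and the helper `per_thread_hook_groups` are hypothetical. The source only states that per-thread usage may be taken into account.

```python
from collections import defaultdict


def per_thread_hook_groups(call_records, prohibited=()):
    """Decide a separate hook group for each thread.

    call_records: iterable of (thread_id, api_name) pairs.
    For each thread, APIs used more often than that thread's own mean use
    count, and not on the prohibition list, are selected (sorted by name).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for tid, api in call_records:
        counts[tid][api] += 1
    blocked = set(prohibited)
    groups = {}
    for tid, usage in counts.items():
        threshold = sum(usage.values()) / len(usage)
        groups[tid] = sorted(api for api, n in usage.items()
                             if n > threshold and api not in blocked)
    return groups


records = [(1, "sleep")] * 4 + [(1, "printf")] + [(2, "read")] * 3 + [(2, "write")]
print(per_thread_hook_groups(records))
# {1: ['sleep'], 2: ['read']}
```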

The third embodiment may also be combined with the first and second embodiments, with the developer or user selecting from the statistical information 1140 the API functions to be stored in the API function hook group 510. Furthermore, this embodiment need not obtain the statistical information 1140 through API function hooks set up via the LD_AUDIT environment variable 1120. Any variable may be used, provided the API function hook processing can be loaded via a variable read by the process or the thread 120.

In the third embodiment, the following processing is the same as in the first or second embodiment: generation of the shared library 500, the survival report made through API function hooks, abnormal-operation detection, and the handling performed when an abnormality is detected. Which embodiment applies depends on the hook group:

- If the API function determination program 1170 places an API function that enters the WAIT state in the API function hook group 510, the monitoring thread 110 and the thread 120 of the third embodiment perform the same processing as in the second embodiment.
- Otherwise, they perform the same processing as in the first embodiment.
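The second-embodiment handling referred to above, as recited in claims 2 and 9, can be sketched as follows:

- Before the API runs, the hook sets the survival information and marks the waiting information as started. After the API returns, it marks the waiting information as ended.
- The monitoring thread treats a thread that is inside a hooked call as normal. Otherwise it checks the survival flag and resets it.

The class and function names are hypothetical. A `threading.Lock` stands in for whatever synchronization the real monitoring information area 620 uses.

```python
import threading


class MonitoringInfo:
    """Per-thread monitoring information area (cf. 620) holding survival
    information and waiting information. Field names are hypothetical."""

    def __init__(self):
        self.alive = False    # survival information
        self.waiting = False  # waiting information: True while the API runs
        self.lock = threading.Lock()


def hooked_call(info, api, *args):
    """Hook order from the claims: update survival info (second processing),
    mark the start (third), run the API (first), mark the end (fourth)."""
    with info.lock:
        info.alive = True
        info.waiting = True
    try:
        return api(*args)
    finally:
        with info.lock:
            info.waiting = False


def check_once(info):
    """One check by the monitoring thread per monitoring period.

    Returns True if the thread is judged normal, False otherwise."""
    with info.lock:
        if info.waiting:
            return True         # inside a (possibly blocking) API call
        if info.alive:
            info.alive = False  # normal; reset so a new hooked call must set it
            return True
        return False            # no hooked call since the last check


info = MonitoringInfo()
print(hooked_call(info, abs, -3), check_once(info), check_once(info))  # 3 True False
```

The `finally` clause is a design choice of this sketch. It marks the end of the processing even if the API raises an exception.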

According to the third embodiment, the API function hook group 510 can be determined automatically, and thread monitoring can then be performed automatically. This requires no modification of the kernel 100, no modification and recompilation of the source code of the program subject to thread monitoring, and no modification of that program's compiled executable.

Also, according to the third embodiment, the API function determination program 1170 determines the API function hook group 510 automatically even when the developer or user cannot view the source code. Moreover, because values indicating the usage status of API functions are stored in the statistical information 1140, the API function hook group 510 is determined to suit the developer's or user's environment.

In addition, according to the third embodiment, the API function hook prohibition list 1130 can keep certain exclusive-control functions out of the API function hook group 510. These are functions, such as lock acquisition or unlock processing, that are called many times and therefore affect execution speed. Consequently, the thread monitoring method of the third embodiment does not delay the processing of the program subject to thread monitoring.

In the third embodiment, the developer or user must start the software containing the program subject to thread monitoring twice. The first start determines the API function hook group 510. The second start uses the shared library 500 so that the determined API function hook group 510 is loaded. For this reason, the device 40 of the third embodiment may hold in the memory 420 a program that starts the software containing the program subject to thread monitoring in these two runs in succession.
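A launcher of the kind mentioned above could run the software twice:

1. once with `LD_AUDIT` set, to collect the statistics 1140;
2. then, after the hook group is decided, once with `LD_PRELOAD` set, to monitor.

`LD_AUDIT` and `LD_PRELOAD` are real glibc dynamic-linker variables. The library paths, the `decide_hooks` callback, and the helper names are hypothetical. This sketch does not model how the decided hook group reaches the shared library 500.

```python
import os
import subprocess


def build_run_envs(audit_lib, preload_lib, base_env=None):
    """Return the environments for the two successive launches.

    First launch: LD_AUDIT names the statistics-collecting shared library
    (cf. 1100). Second launch: LD_PRELOAD names the monitoring shared
    library (cf. 500). Both paths are supplied by the caller.
    """
    base = dict(os.environ if base_env is None else base_env)
    audit_env = dict(base, LD_AUDIT=audit_lib)
    preload_env = dict(base, LD_PRELOAD=preload_lib)
    preload_env.pop("LD_AUDIT", None)  # keep the auditor out of the monitored run
    return audit_env, preload_env


def run_twice(cmd, audit_lib, preload_lib, decide_hooks):
    """Run cmd to collect statistics, decide the hook group, then run it monitored."""
    audit_env, preload_env = build_run_envs(audit_lib, preload_lib)
    subprocess.run(cmd, env=audit_env, check=True)           # statistics run
    decide_hooks()                                           # e.g. program 1170
    return subprocess.run(cmd, env=preload_env, check=True)  # monitored run
```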

The developer or user may also carry out the setup for the third embodiment through an input device, such as a keyboard, provided on the device 40. This setup includes creating the shared library 1100 and the processing content 1110, setting the identifier of the shared library 1100 in the LD_AUDIT environment variable 1120, and running the API function determination program 1170.

In the third embodiment, the developer or user may also create a program that performs the above setup, send it from another computer to the device 40 via the NIF 440, and have the device 40 execute it.

In the third embodiment, the developer or user may also input the program that performs the above setup to the device 40 on a computer-readable non-transitory storage medium.

According to the present embodiments, the thread 120 may fail to operate normally because it is in an infinite loop, in a deadlock, or stopped without any lock operation. In any of these cases, the abnormal operation of the thread 120 can be detected.

Also, according to the present embodiments, the survival flag 140 used to monitor the thread 120 is updated by API function hooks. This yields a thread monitoring method that requires no modification of the kernel 100, no modification and recompilation of the source code of the monitored program, and no modification of its compiled executable.

Also, according to the present embodiments, the only addition to the existing API function processing is the processing that updates the monitoring information 620. The processing of the monitored program is therefore not significantly delayed. Furthermore, the monitoring instruction message in step 123 shown in FIG. 11 is not required, so both the program size and the processor's processing time can be reduced.

100 Kernel
110 Monitoring thread
120 Thread
130 Instruction message queue
140 Survival flag
400 CPU
420 Memory
440 NIF
500 Shared library
510 API function hook group
520 Thread procedure
530 LD_PRELOAD environment variable
610 Monitoring target list
620 Monitoring information
1100 Shared library
1120 LD_AUDIT environment variable
1130 API function hook prohibition list
1140 Statistical information

Claims

1. A computer system that executes a plurality of threads, wherein
the computer system comprises at least one processor and a memory;
the computer system executes each of the threads in order to execute an API function to which processing is assigned by an application programming interface;
the computer system executes a first thread in order to execute first processing assigned to a first API function;
the computer system executes a monitoring thread in order to execute processing that monitors the state of the first thread;
the computer system holds, in the memory, a monitoring information area that holds survival information indicating whether the first thread is normal, and first processing content including information indicating the first processing and information indicating second processing that updates the survival information with a value indicating that the first thread is normal;
the first thread performs API function hook processing that executes the first processing and the second processing by reading the first processing content; and
the monitoring thread:
determines whether the survival information indicates that the first thread is normal;
if the survival information is determined to indicate that the first thread is normal, determines that the first thread is normal and updates the survival information with a value that does not indicate that the first thread is normal; and
if the survival information is determined not to indicate that the first thread is normal, determines that the first thread is not normal.

2. The computer system according to claim 1, wherein
the computer system further holds, in the memory, a monitoring period at which the monitoring thread monitors the state of the first thread;
the monitoring information area further holds waiting information indicating whether the first thread is in a waiting state;
the first processing content further includes information indicating third processing that updates the waiting information with a value indicating that the first processing has started, information indicating fourth processing that updates the waiting information with a value indicating that the first processing has ended, and the order in which the first, second, third, and fourth processing are executed;
when the first API function is executed, the first thread, following the order included in the first processing content, executes the second processing and the third processing, then executes the first processing, and after executing the first processing executes the fourth processing; and
the monitoring thread:
determines, at the monitoring period, whether the waiting information indicates that the first processing has started or that it has ended;
if the waiting information is determined to indicate that the first processing has started, determines that the first thread is normal; and
if the waiting information is determined to indicate that the first processing has ended, determines whether the survival information indicates that the first thread is normal.

3. The computer system according to claim 1, wherein
the first thread is included in a process that executes a program including the first API function; and
when the monitoring thread determines that the first thread is not normal, the computer system restarts the first thread or the process.

4. The computer system according to claim 1, wherein
the computer system further holds, in the memory, a shared library that is read when the API function is executed;
the shared library includes an identifier indicating the first processing content; and
the first thread reads the first processing content by reading the shared library.

5. The computer system according to claim 1, wherein
the computer system further holds, in the memory, statistical information indicating the usage status of the API functions executed by the running threads, and second processing content indicating processing that updates the statistical information indicating the usage status of the API functions;
the computer system executes a second thread in order to execute fifth processing assigned to a second API function;
the computer system executes a determination thread in order to determine the first API function according to the statistical information;
the second thread, by reading the second processing content, executes the fifth processing and processing that updates the statistical information indicating the usage status of the second API function; and
the determination thread determines, according to the statistical information, whether to determine the second API function as the first API function.

6. The computer system according to claim 5, wherein
the computer system further holds, in the memory, prohibition information indicating API functions that are not to be selected as the first API function; and
when it is determined according to the statistical information that the second API function is to be determined as the first API function, and the prohibition information does not indicate the second API function, the determination thread determines the second API function as the first API function.

7. The computer system according to claim 1, wherein
the computer system, when executing the first thread, executes processing that creates the monitoring information area in the memory and notifies the monitoring thread of an identifier indicating the first thread and the location of the monitoring information area in the memory; and
the monitoring thread determines, based on the notified identifier indicating the first thread and the location of the monitoring information area in the memory, whether the survival information indicates that the first thread is alive.

8. A monitoring method performed by a computer system that executes a plurality of threads, wherein
the computer system comprises at least one processor and a memory;
the monitoring method includes:
a step in which the processor executes each of the threads in order to execute an API function to which processing is assigned by an application programming interface;
a step in which the processor executes a first thread in order to execute first processing assigned to a first API function; and
a step in which the processor executes a monitoring thread in order to execute processing that monitors the state of the first thread;
the computer system holds, in the memory:
a monitoring information area that holds survival information indicating whether the first thread is normal; and
first processing content including information indicating the first processing and information indicating second processing that updates the survival information with a value indicating that the first thread is normal;
the step of executing the first thread includes a step in which the processor performs API function hook processing that executes the first processing and the second processing by reading the first processing content; and
the step of executing the monitoring thread includes:
a step in which the processor determines whether the survival information indicates that the first thread is normal;
a step in which, if the survival information is determined to indicate that the first thread is normal, the processor determines that the first thread is normal and updates the survival information with a value that does not indicate that the first thread is normal; and
a step in which, if the survival information is determined not to indicate that the first thread is normal, the processor determines that the first thread is not normal.

9. The monitoring method according to claim 8, wherein
the computer system further holds, in the memory, a monitoring period at which the monitoring thread monitors the state of the first thread;
the monitoring information area further holds waiting information indicating whether the first thread is in a waiting state;
the first processing content further includes information indicating third processing that updates the waiting information with a value indicating that the first processing has started, information indicating fourth processing that updates the waiting information with a value indicating that the first processing has ended, and the order in which the first, second, third, and fourth processing are executed;
the step of executing the first thread includes a step in which the processor, following the order included in the first processing content, executes the second processing and the third processing, then executes the first processing, and after executing the first processing executes the fourth processing; and
the step of executing the monitoring thread includes:
a step in which the processor determines, at the monitoring period, whether the waiting information indicates that the first processing has started or that it has ended;
a step in which, if the waiting information is determined to indicate that the first processing has started, the processor determines that the first thread is normal; and
a step in which, if the waiting information is determined to indicate that the first processing has ended, the processor determines whether the survival information indicates that the first thread is normal.

10. The monitoring method according to claim 8, wherein
the first thread is included in a process that executes a program including the first API function; and
the monitoring method further includes a step in which, when the processor determines in the step of executing the monitoring thread that the first thread is not normal, the processor restarts the first thread or the process.

11. The monitoring method according to claim 8, wherein
the computer system further holds, in the memory, a shared library that is read when the API function is executed;
the shared library includes an identifier indicating the first processing content; and
the step of executing the first thread includes a step in which the processor reads the first processing content by reading the shared library.

12. The monitoring method according to claim 8, wherein
the computer system further holds, in the memory, statistical information indicating the usage status of the API functions executed by processing the threads, and second processing content indicating processing that updates the statistical information indicating the usage status of the API functions;
the monitoring method further includes:
a step in which the processor executes a second thread in order to execute fifth processing assigned to a second API function; and
a step in which the processor determines the first API function according to the statistical information;
the step of executing the second thread includes a step in which the processor, by reading the second processing content, executes the fifth processing and API function hook processing that updates the statistical information indicating the usage status of the second API function; and
the step of determining the first API function includes a step in which the processor determines, according to the statistical information, whether to determine the second API function as the first API function.

13. The monitoring method according to claim 12, wherein
the computer system further holds, in the memory, prohibition information indicating API functions that are not to be selected as the first API function; and
the step of determining the first API function includes a step in which the processor determines the second API function as the first API function when two conditions hold: it is determined according to the statistical information that the second API function is to be determined as the first API function, and the prohibition information does not indicate the second API function.

14. The monitoring method according to claim 8, wherein
the monitoring method further includes a step of, when the first thread is executed, creating the monitoring information area in the memory and notifying the monitoring thread of an identifier indicating the first thread and the location of the monitoring information area in the memory; and
the step of determining whether the survival information indicates that the first thread is normal includes a step in which the processor determines, based on the notified identifier indicating the first thread and the location of the monitoring information area in the memory, whether the survival information indicates that the first thread is alive.