JP2007072958A - Method and device for detecting delay of event synchronization

Info

Publication number: JP2007072958A
Application number: JP2005262006A
Authority: JP
Inventors: Tomonori Sekiguchi; 知紀関口; Koji Amano; 光司天野; Takahiro Ohira; 崇博大平
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2005-09-09
Filing date: 2005-09-09
Publication date: 2007-03-22

Abstract

PROBLEM TO BE SOLVED: To detect a delay in resuming execution of a waiting thread when a plurality of programs cooperate through an event.

SOLUTION: When a wait request for a monitored event is received from a thread, the OS records information indicating the wait in memory. When a notification for the event is received, the OS records the notification time in memory. When execution of the waiting thread is resumed, the OS erases the corresponding wait record and notification time. If a wait record and its notification time remain and the current time exceeds the remaining notification time by a predetermined allowable time or more, the OS determines that resumption of the corresponding thread is delayed.

COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to a technique for monitoring the operation of programs executed on a computer, and more particularly to a technique for detecting operation delays when programs cooperate through events.

In computer systems that require high responsiveness, and in systems where the time allowed to complete processing is strictly defined, it is important that each program finish its processing within a predetermined time. When building such a system, it is important either to design it so that processing times stay within that limit, or to monitor for cases in which processing does not finish even after the allotted time has passed.

In a system built on what is generally called a real-time OS, the processing time of every operation executed on the computer can be calculated, so the running time of user programs can be designed in advance. This makes it possible to keep the processing executed in the system within a fixed time.

In a general-purpose OS, by contrast, the processing time of a program cannot be determined in advance, because processing that the user cannot control is executed inside the OS. In such an OS it is important to monitor whether programs are executing as expected. To this end, Patent Document 1 provides a method for managing and displaying the state of a process that executes a specific program.

A general-purpose OS may also incorporate functions that keep the cooperative operation of multiple processes from stalling. For example, Patent Document 2 discloses a method of detecting a program abnormality by using the mutual exclusion function employed for process cooperation. Non-Patent Document 1 discloses a method of temporarily raising the priority of a process that is runnable but is not given the CPU because higher-priority processes are executing, so that the starved process can run.

Another common method is the watchdog timer, in which a program reports at a fixed interval that it is executing normally, and that report is monitored.

Patent Document 1: US Patent 5,636,376 (June 1997), "System and method for selectively and contemporaneously monitoring processes in a multiprocessing server"
Patent Document 2: JP 2001-5694 A
Non-Patent Document 1: D. Solomon et al., "Windows® Internal 4th Edition" (Priority Boost for CPU Starvation, pp. 354-355), Microsoft Press, 2005

The conventional techniques for general-purpose OSs cannot monitor cooperation between programs that is mediated by an event. In particular, they cannot detect the situation in which a process waiting for an event notification has been notified but still cannot run because of other high-priority processes in the system. The same applies to threads when a process is executed in units of threads.

Conventional monitoring of individual programs also has the problem that it is difficult to define what it means for each program to be executing normally. With a watchdog timer, for example, the program accesses the timer within a fixed interval to signal that it is operating normally, but this access does not necessarily reflect whether the program is actually executing correctly, so it may fail to serve as operation monitoring.

An object of the present invention is to detect operation delays when a plurality of programs cooperate through events.

To monitor program cooperation through events, the present invention is characterized by designating the events to be monitored, recording the wait and notification operations performed on those events, recording the resumption of execution of waiting threads, and inspecting these records to detect the existence of a thread that has not resumed running even though the event was notified.

According to the present invention, in a system using a general-purpose OS, it is possible to monitor whether program cooperation through a specific event is proceeding without delay. This allows reliable and prompt detection of cases in which a given thread cannot resume execution within a given time, whether because a rogue process is monopolizing the CPU or because the system is overloaded.

If the present invention is applied to a cluster system made up of a plurality of computers, a delay in event cooperation can be detected as an abnormality and used as a trigger to start changing the cluster configuration.

If the present invention is applied to completion notifications for asynchronous I/O operations, it can detect a program that cannot resume execution even though its I/O operation has completed.

Embodiments of the present invention are described below with reference to the drawings.

Embodiment 1 is described below. FIG. 5 shows a configuration example of the computer 500 of Embodiment 1. The configuration shown is only an example and does not limit the computer configurations on which the present invention can be implemented.

The computer 500 comprises a CPU 501, a main storage device 502, an external storage device 503, a communication control unit 504, input/output devices, and a control unit that connects them. The CPU 501 executes programs loaded into the main storage device (memory) 502. Hereinafter, the CPU executing a program is described simply as the program executing. The OS schedules program execution in units of threads.

The external storage device 503 stores the files needed to run the computer 500, including the OS, programs, a registration program 120 that registers the events to be monitored, and a setting file 520. The main storage device 502 in FIG. 5 is shown with the OS, program 121, and program 122 loaded.

Inside the OS are event monitoring management data 100, used for monitoring event cooperation, and an event processing unit 511. The OS provides event objects (hereinafter sometimes simply called events) for coordination between different pieces of processing. A thread can perform a wait operation on an event allocated in the OS. A thread that waits on an event suspends execution until some other processing performs a notification operation on that event. When that happens, the OS instructs the scheduler to resume execution of the threads waiting on the event.

Embodiment 1 is described on the assumption that program 121 and program 122 synchronize through an event. The external storage device 503 holds the setting file 520 for event monitoring, and the registration program 120 registers the events to be monitored with the OS according to the contents of the setting file 520.

FIG. 1 shows the relationship between the data structures and the processing modules of Embodiment 1. The data structures are described first.

The OS running on the computer 500 holds the event monitoring management data 100, which consists of a monitoring event table 101 and a notification list 102. The notification list 102 consists of at least one notification block (103 to 105). The monitoring event table 101 holds information on the events that the registration program 120 has registered for monitoring.

For each monitored event object, the monitoring event table 101 holds the address of the event object, the allowable time from event notification until the waiting threads resume running, and the head address of the notification list 102, which holds the resumption status of waiting threads for each event notification operation.

The notification list 102 records the resumption status of the threads that were waiting when a notification operation was performed on the event. It consists of at least one notification block. One notification block is created by the initialization process, and thereafter one more is created and added to the notification list 102 each time an event notification is performed. Each notification block holds the number of threads that waited on the event and have not yet resumed execution, the time at which the event notification was accepted, and the address of the next notification block, which holds an older event notification record.

The notification block 103 at the head of the notification list 102 shows the current wait status of threads for the event. The same applies to the notification block 105. The blocks after the head, such as block 104, show the status of threads that were waiting on the event and should resume execution as a result of an event notification. In the example of FIG. 1, the notification list 102 consists of notification blocks 103 and 104. Notification block 104 shows that two threads were notified earlier but have not yet resumed execution, and notification block 103 shows that three threads are currently waiting on the event.
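The data structures described above can be sketched as follows. This is only an illustrative model, not the layout used by the embodiment: the class and field names (`NotificationBlock`, `MonitoredEvent`, `waiting_threads`, `notified_at`, `allowed_delay`) and the concrete values for the address, time, and allowable delay are assumptions, and Python object references stand in for the addresses held in the table and the blocks.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class NotificationBlock:
    """One entry of the notification list 102 (blocks 103 to 105 in FIG. 1)."""
    waiting_threads: int = 0                    # threads that waited and have not yet resumed
    notified_at: Optional[float] = None         # time the notification was accepted (None: not yet notified)
    next: Optional["NotificationBlock"] = None  # next block, holding an older notification record


@dataclass
class MonitoredEvent:
    """One row of the monitoring event table 101 (field names are hypothetical)."""
    event_address: int      # address of the event object
    allowed_delay: float    # allowable time from notification to resumption, in seconds
    head: NotificationBlock = field(default_factory=NotificationBlock)


# State drawn in FIG. 1: block 103 (head) counts three current waiters,
# block 104 counts two threads notified earlier that have not yet resumed.
# The address, time, and delay values are made up for illustration.
block_104 = NotificationBlock(waiting_threads=2, notified_at=100.0)
block_103 = NotificationBlock(waiting_threads=3, next=block_104)
entry = MonitoredEvent(event_address=0x1000, allowed_delay=0.5, head=block_103)
```

Keeping the not-yet-notified waiters in the head block and the already-notified ones in older blocks is what lets the monitor later tell "still legitimately waiting" apart from "notified but not running".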

Next, the event operation procedures of Embodiment 1 are described. The event processing unit 511 consists of an event registration processing unit 111, an event standby processing unit 112, an event notification processing unit 113, and an event monitoring processing unit 114.

The event registration processing unit 111 registers event objects in, and deletes them from, the monitoring event table 101. The registration process receives the name of an event object and the allowable time until waiting threads resume execution. It assigns the specified event object to the thread if necessary, allocates an entry for the event in the monitoring event table 101, and records the address of the event object and the allowable time in that entry. It then initializes the notification list 102: it allocates the notification block 103 as the head of the list and records its address in the notification-list head address field of the table entry.

The registration process also covers registering the action to take when an event cooperation delay is detected. An identifier indicating the specified abnormal-time action is recorded in the abnormal-time processing mode 110.
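The registration steps can be sketched roughly as below. The `EventRegistry` class and its method names are hypothetical, and the table is keyed by event name purely for readability, whereas the embodiment stores the event object's address in each entry.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class NotificationBlock:
    waiting_threads: int = 0
    notified_at: Optional[float] = None
    next: Optional["NotificationBlock"] = None


@dataclass
class TableEntry:
    event_address: int
    allowed_delay: float
    head: NotificationBlock   # head of this event's notification list


class EventRegistry:
    """Hypothetical stand-in for the event registration processing unit 111."""

    def __init__(self) -> None:
        self.table: Dict[str, TableEntry] = {}     # monitoring event table 101
        self.abnormal_mode: Optional[str] = None   # abnormal-time processing mode 110

    def register(self, name: str, event_address: int, allowed_delay: float) -> TableEntry:
        # Allocate a table entry, then initialize the notification list
        # with a single empty head block.
        entry = TableEntry(event_address, allowed_delay, NotificationBlock())
        self.table[name] = entry
        return entry

    def unregister(self, name: str) -> None:
        self.table.pop(name, None)

    def set_abnormal_mode(self, mode: str) -> None:
        # Record the identifier of the action to run when a delay is detected.
        self.abnormal_mode = mode
```

A registration program would call `register` once per monitored event and `set_abnormal_mode` once, for example with the identifier "panic".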

The registration program 120 runs at system startup and, according to the contents of the setting file 520, registers the events to be monitored and the identifier of the action to take on an event cooperation abnormality. FIG. 6 shows an example of the setting file 520. For each monitored event, the setting file 520 specifies the event name and the allowable delay for event cooperation. The setting file 520 also specifies the identifier of the action to take when an event cooperation abnormality is detected ("panic" in this example).

The event standby processing unit 112 and the event notification processing unit 113 implement the event cooperation operations used by user programs. In this embodiment, the OS provides the basic event cooperation functions, and processing for monitoring event cooperation is added to them. These processing units refer to the monitoring event table 101 and record event cooperation operations. Their procedures are described later.

The event monitoring processing unit 114 refers to the monitoring event table 101 and checks whether there is any thread for which an event notification operation was performed but which has not resumed execution after the allowable time has elapsed. The OS itself sets the monitoring period during OS initialization so that the event monitoring processing unit 114 runs inside the timer interrupt processing that the OS accepts at a fixed period. For example, the OS registers the event monitoring processing unit 114 to run every second. This period may also be settable from outside. The procedure of the event monitoring processing unit 114 is described later.

When the event monitoring processing unit 114 finds a thread whose resumption is delayed, it executes the processing set in the abnormal-time processing mode 110. Possible actions include abnormal termination of the OS, notifying another computer of the abnormality through the communication control unit 504, and recording the occurrence of the abnormality in the main storage device 502. The possible actions are not limited to those listed here.
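One way to picture the abnormal-time processing mode is as a lookup from the registered identifier to a handler. In the sketch below only the identifier "panic" appears in the source; the identifiers "notify" and "record", the handler names, and the use of `SystemExit` as a stand-in for abnormal OS termination are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

log: List[str] = []   # stands in for a record kept in main storage


def panic(event_name: str) -> None:
    # Stand-in for abnormal termination of the OS.
    raise SystemExit(f"event cooperation delay on {event_name}")


def notify_peer(event_name: str) -> None:
    # Stand-in for notifying another computer via the communication control unit.
    log.append(f"notify-peer:{event_name}")


def record(event_name: str) -> None:
    # Stand-in for recording the occurrence of the abnormality in memory.
    log.append(f"record:{event_name}")


# Hypothetical mapping from mode identifier to action.
HANDLERS: Dict[str, Callable[[str], None]] = {
    "panic": panic,
    "notify": notify_peer,
    "record": record,
}


def handle_delay(mode: str, event_name: str) -> None:
    """Run the action selected by the abnormal-time processing mode."""
    HANDLERS[mode](event_name)
```

A table of handlers keeps the set of actions open-ended, which matches the statement that the possible actions are not limited to the ones listed.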

The OS provides an OS interface 115 that user programs can use. The interface 115 includes interfaces for event wait and notification processing and for registering monitoring settings. The registration program 120, program 121, and program 122 invoke the OS's event wait, notification, and registration processing through the interface 115.

Next, the event processing procedures of the OS are described. Here, program 121 waits for notification of a certain event, and program 122 issues the notification for that event.

FIG. 2 shows the event wait procedure of Embodiment 1. When program 121 calls the OS's event wait processing, the event standby processing unit 112 executes the processing from step 201. First, it refers to the monitoring event table 101 and determines whether the target event is monitored (step 201). If it is not, the thread that requested the wait is placed in the event wait state (step 211): the thread's identifier is registered in the target event object, the thread state is set to waiting for the event, and thread scheduling is executed. This is equivalent to the event wait processing of an ordinary OS. When a notification operation is performed on the event, the thread resumes execution and returns to the processing of program 121.

If the target event is monitored, the event standby processing unit 112 prohibits operations on the event (step 202), for example by acquiring a lock on the event. It then adds 1 to the waiting-thread count of the head notification block in the event's notification list 102, registering that there is a waiting thread (step 203).

Next, it records the address of the notification block in which the wait was recorded in the thread's memory area (step 204) and places the thread in the event wait state (step 205). The address of the notification block may be recorded in the memory area that holds, for example, the stack of the executing thread. Steps 203 and 204 may be replaced by any other method that records information indicating the wait in association with the thread. Step 205 is the same processing as step 211. In this embodiment, the thread that requested the wait is assumed to remain in the wait state between step 205 and step 206.

The processing from step 206 onward is performed when some other processing has executed event notification processing on the event and the waiting thread resumes execution.

As described in detail under event notification processing, when program 122 performs notification processing on the event, a new notification block is inserted at the head of the notification list 102. In the following description, the notification block whose address was recorded in step 204 is taken to be notification block 104 in FIG. 1.

When the thread resumes execution, the event standby processing unit 112 first prohibits operations on the event (step 206) and subtracts 1 from the waiting-thread count of notification block 104, whose address was recorded in the thread's memory area (step 207). If the waiting-thread count reaches 0 (step 208, yes), all threads waiting on that notification block have resumed execution. In this case, the event standby processing unit 112 deletes notification block 104 from the notification list 102 (step 209). Steps 208 and 209 may be any processing that clears the corresponding thread's wait record made in steps 203 and 204.

Finally, it lifts the prohibition on operations on the event (step 210) and returns to the processing of program 121.
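The bookkeeping of steps 202 to 210 can be sketched as two functions around the point where the thread actually blocks. This is a simplified model under several assumptions: the function names `begin_wait` and `end_wait` are hypothetical, a `threading.Lock` stands in for prohibiting operations on the event, the lock is assumed to be released before the thread blocks, and the blocking itself (step 205) is left out.

```python
import threading
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class NotificationBlock:
    waiting_threads: int = 0
    notified_at: Optional[float] = None
    next: Optional["NotificationBlock"] = None


@dataclass
class MonitoredEvent:
    head: NotificationBlock = field(default_factory=NotificationBlock)
    lock: threading.Lock = field(default_factory=threading.Lock)


def begin_wait(event: MonitoredEvent) -> NotificationBlock:
    """Steps 202 to 204: count the caller as a waiter on the head block."""
    with event.lock:                       # step 202: prohibit operations on the event
        event.head.waiting_threads += 1    # step 203
        return event.head                  # step 204: the caller keeps this reference


def end_wait(event: MonitoredEvent, block: NotificationBlock) -> None:
    """Steps 206 to 210: clear the caller's wait record after it resumes."""
    with event.lock:                       # step 206 (released on exit: step 210)
        block.waiting_threads -= 1         # step 207
        if block.waiting_threads == 0:     # step 208
            prev = event.head              # step 209: unlink the block from the list
            while prev is not None and prev.next is not block:
                prev = prev.next
            if prev is not None:
                prev.next = block.next
```

A waiting thread would call `begin_wait`, block on the underlying event, and call `end_wait` with the returned block once it is running again. If it never gets to run, the block stays in the list with a nonzero count, which is exactly what the monitor looks for.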

Next, the event notification procedure is described. FIG. 3 shows the event notification procedure of Embodiment 1. When program 122 calls the event notification processing, the event notification processing unit 113 executes the processing from step 301.

First, the event notification processing unit 113 refers to the monitoring event table 101 and determines whether the target event is monitored (step 301). If it is not, the unit performs the ordinary event notification operation: it sets the state of the threads waiting on the event to runnable, deletes the waiting threads' identifiers from the event object, registers a scheduling request (step 308), and ends. While returning to program 122, the event notification processing unit 113 checks the scheduling request and executes thread scheduling, and the waiting threads resume running.

If the event is monitored, the event notification processing unit 113 first prohibits operations on the target event (step 302). It then refers to the waiting-thread count of the head notification block in the notification list 102 to determine whether any thread is waiting (step 303). If none is waiting, it proceeds to step 307 and completes the event notification processing.

If there are waiting threads, the event notification processing unit 113 records the current time as the notification time of the head notification block in the notification list 102 (step 304). Next, in preparation for subsequent event notifications, it inserts a newly allocated notification block at the head of the notification list 102 (step 305). The waiting-thread count of this new head block is set to 0. When step 209 is executed, the head notification block of the notification list 102 is thus an empty block for recording the next event wait and notification. The notification list 102 in FIG. 1 shows the data structure after notification block 103 has been inserted; the block whose notification time was recorded in step 304 is now notification block 104.

The event notification processing unit 113 then makes the waiting threads runnable and requests scheduling (step 306), lifts the prohibition on event operations (step 307), and returns to the processing of program 122.

As a result of the above processing, the notification list 102 consists of notification block 103, which will record the next event notification operation, and notification block 104, which records the thread wait status for a past notification operation. If the threads waiting on the event are scheduled normally and resume execution, the processing from step 206 deletes notification block 104, which records the wait status, from the notification list 102. If the waiting threads do not resume running, however, notification block 104 remains in the list.
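The monitored branch of the notification procedure (steps 302 to 307) can be sketched as below. The function name `notify`, the optional `now` parameter, and the use of `time.monotonic()` as the clock are assumptions, and the step that makes waiting threads runnable (step 306) is only marked by a comment because it depends on the OS scheduler.

```python
import threading
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class NotificationBlock:
    waiting_threads: int = 0
    notified_at: Optional[float] = None
    next: Optional["NotificationBlock"] = None


@dataclass
class MonitoredEvent:
    head: NotificationBlock = field(default_factory=NotificationBlock)
    lock: threading.Lock = field(default_factory=threading.Lock)


def notify(event: MonitoredEvent, now: Optional[float] = None) -> bool:
    """Record a notification; return True if any thread was waiting."""
    with event.lock:                                   # step 302 (released on exit: step 307)
        head = event.head
        if head.waiting_threads == 0:                  # step 303: nobody is waiting
            return False
        # Step 304: stamp the current head with the notification time.
        head.notified_at = time.monotonic() if now is None else now
        # Step 305: push a fresh, empty block for future waiters.
        event.head = NotificationBlock(next=head)
        # Step 306 would make the waiting threads runnable and request scheduling.
        return True
```

After a successful call, the stamped block sits second in the list and remains there until every thread counted in it has resumed.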

FIG. 4 shows the procedure of the event monitoring processing unit 114. This procedure inspects the monitoring event table 101 and the notification list 102 to detect threads whose resumption of execution is delayed. The procedure is described below.

The processing shown in FIG. 4 is executed for every event registered in the monitoring event table 101. In step 401, the event monitoring processing unit 114 selects the event to process. When all events in the monitoring event table 101 have been processed, the processing ends.

Next, the event monitoring processing unit 114 obtains the notification list 102 of the event being processed (step 402) and processes each notification block in the list. In step 403, it determines whether all notification blocks in the list have been processed and, if any remain, selects one of them. When all notification blocks have been processed, it returns to step 401 and repeats the processing.

The event monitoring processing unit 114 performs the following processing on the selected notification block. First, if the notification block is the head of the notification list, it returns to step 403 and selects the next notification block, because the head of the notification list is an empty block for recording the next event notification.

If the notification block has been notified and its waiting-thread count is not 0, the event monitoring processing unit 114 checks whether the current time is later than the notification time recorded in the block plus the allowable time recorded in the monitoring event table 101 (step 405). If not, that is, if the waiting-thread count is 0 or the allowable time has not yet elapsed since the event notification, it returns to step 403 and selects the next notification block.

If the allowable time has elapsed since the event notification, the OS executes the processing specified by the abnormal-time processing mode 110 (step 406). If that processing finishes and returns, the procedure goes back to step 401 to inspect other events.
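The monitoring loop of steps 401 to 406 can be sketched as a single pass over the table. The function name `monitor_tick`, the callback `on_delay`, and the dictionary keyed by event name are assumptions. In the embodiment this pass runs inside timer interrupt processing, whereas here it is an ordinary function that is given the current time.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class NotificationBlock:
    waiting_threads: int = 0
    notified_at: Optional[float] = None
    next: Optional["NotificationBlock"] = None


@dataclass
class TableEntry:
    allowed_delay: float
    head: NotificationBlock = field(default_factory=NotificationBlock)


def monitor_tick(table: Dict[str, TableEntry], now: float,
                 on_delay: Callable[[str, NotificationBlock], None]) -> int:
    """One monitoring pass; returns the number of events found delayed."""
    detected = 0
    for name, entry in table.items():              # step 401: pick each event
        block = entry.head.next                    # step 402; the head block is skipped
        while block is not None:                   # step 403: walk the older blocks
            overdue = (block.waiting_threads != 0
                       and block.notified_at is not None
                       and now > block.notified_at + entry.allowed_delay)  # step 405
            if overdue:
                on_delay(name, block)              # step 406: abnormal-time processing
                detected += 1
                break                              # then move on to the next event
            block = block.next
    return detected
```

Because the check only compares timestamps and counters, it stays cheap enough to run on every tick, and it does not depend on the delayed thread itself getting CPU time.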

As an alternative to the scheme of Embodiment 1, each time a wait request is received from a thread, the thread's identifier may be recorded in a table in time order; each time an event notification occurs, the notification time may be recorded in association with the thread identifiers recorded so far; and each time a thread resumes execution, its thread identifier and the corresponding notification time may be deleted from the record. The event monitoring processing unit 114 then determines whether the current time exceeds any remaining notification time by the predetermined allowable time or more.
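This alternative can be sketched with a single table keyed by thread identifier. The class `ThreadWaitTable` and its method names are hypothetical, and a Python dictionary, which preserves insertion order, stands in for the time-ordered table.

```python
from typing import Dict, Hashable, List, Optional


class ThreadWaitTable:
    """Hypothetical per-thread record: thread id -> notification time (or None)."""

    def __init__(self, allowed_delay: float) -> None:
        self.allowed_delay = allowed_delay
        self.waits: Dict[Hashable, Optional[float]] = {}  # insertion order = time order

    def on_wait(self, thread_id: Hashable) -> None:
        # A wait request arrives: record the thread, not yet notified.
        self.waits[thread_id] = None

    def on_notify(self, now: float) -> None:
        # Stamp every recorded waiter that has not been notified yet.
        for thread_id in list(self.waits):
            if self.waits[thread_id] is None:
                self.waits[thread_id] = now

    def on_resume(self, thread_id: Hashable) -> None:
        # The thread is running again: delete its identifier and notification time.
        self.waits.pop(thread_id, None)

    def delayed(self, now: float) -> List[Hashable]:
        # Threads whose notification is older than the allowable time.
        return [tid for tid, t in self.waits.items()
                if t is not None and now - t >= self.allowed_delay]
```

Compared with the counted notification blocks, this variant uses more memory per waiter but can name the specific threads that are delayed.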

With the event notification processing, wait processing, and event monitoring processing described above, a delay in the cooperative operation of programs that cooperate through event operations can be detected.

Because Embodiment 1 runs the event monitoring processing from step 401 inside timer interrupt processing, a delay in event cooperation can be detected quickly even when high-priority processing in the computer 500 improperly monopolizes the CPU. In addition, because abnormality processing can be executed as soon as the abnormality is detected, an abnormality in event cooperation can be handled promptly.

Embodiment 1 is particularly effective when a system is built on a general-purpose OS. A general-purpose OS runs processing that the user cannot control, so the possibility that such processing delays event cooperation cannot be eliminated. According to the present invention, this delay can be detected and processing to deal with it can be executed.

In Embodiment 1, the data structures of the monitoring event table 101 and the notification list 102 are provided separately from the OS event object, but they may instead be integrated into the event object. For example, equivalent processing can be realized by including in the event object data such as a flag indicating whether the event is monitored, the allowable delay time, and the address of the notification list.
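The integrated layout can be pictured with the following Python sketch. The class and field names (`EventObject`, `monitored`, `allowed_delay`, `notifications`) are hypothetical; they only mirror the three items named above, with a list reference standing in for the address of the notification list.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NotificationBlock:
    notify_time: float    # time at which the notification was recorded
    waiting_threads: int  # waiters that have not yet resumed


@dataclass
class EventObject:
    name: str
    signaled: bool = False                 # ordinary event state
    monitored: bool = False                # flag: is this event monitored?
    allowed_delay: Optional[float] = None  # allowable delay time in seconds
    notifications: List[NotificationBlock] = field(default_factory=list)

    def overdue_blocks(self, now: float) -> List[NotificationBlock]:
        """Return blocks whose waiters have exceeded the allowable delay."""
        if not self.monitored or self.allowed_delay is None:
            return []  # unmonitored events are never reported
        return [
            block
            for block in self.notifications
            if block.waiting_threads > 0
            and now > block.notify_time + self.allowed_delay
        ]
```

With this layout the monitor needs no separate table lookup: it walks the event objects and checks only those whose `monitored` flag is set.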

Also, although Embodiment 1 runs the event monitoring processing unit 114 inside timer interrupt processing, it may instead be run by an ordinary program. In that case, the event monitoring management data 100 must be set up so that the program can reference it. The program may also be given a high priority, or its execution may be monitored by a watchdog timer.

Next, Embodiment 2 is described. Embodiment 1 uses the general-purpose OS interface, and the event processing unit 511 distinguishes internally between operations on monitored events and operations on other events. A separate OS interface may instead be provided for event monitoring. Embodiment 2 shows a method in which the OS interface 115 for monitored events is provided as a dedicated interface, separate from the general-purpose event operation interface.

A configuration in which a device driver introduces the new event operation interface is shown here. FIG. 7 shows the configuration of Embodiment 2.

In Embodiment 2, for an event subject to cooperation delay monitoring, the wait processing is the processing from step 202 of FIG. 2 and the notification processing is the processing from step 302 of FIG. 3. These processing units are implemented in the device driver. The processing of steps 211 and 308, however, is standard OS event processing and is assumed to be callable from the device driver.

The programs 121 and 122 perform event wait and notification processing by issuing instructions to the newly introduced device driver. The OS interface 115 normally includes an interface for issuing instructions to device drivers, and the programs can use it to perform event operations.

As described above, Embodiment 2 realizes event cooperation processing capable of detecting cooperation delays while using the event cooperation method of an existing general-purpose OS.

Next, Embodiment 3 is described. Embodiment 3 is characterized by a cluster-structured computer system to which Embodiment 1 or Embodiment 2 is applied.

A cluster-structured computer system consists of two computers, a main system and a standby system. The main-system computer performs the actual processing, and the standby-system computer waits in case a failure occurs in the main system. When the standby system detects a failure in the main system, it switches over and runs as the main system. This is called system switching. With a cluster configuration, the system as a whole can continue processing even if a computer fails.

One of the challenges in a cluster-structured computer system is detecting a main-system failure quickly. The present invention can be used as one such failure detection method. This is described below with reference to the drawings.

FIG. 8 shows a configuration example of the cluster-structured computer system of Embodiment 3 and its cluster control procedure. Two computers, the computer 800 and the computer 810, are connected through the communication control units 802 and 812 and form a cluster. The cluster control unit 811 communicates periodically with the cluster control unit 801 through the communication control units and continuously checks whether the computer 800 is operating normally. The OS running on each computer holds the event processing unit 511 and the event monitoring management data 100 of this embodiment. The programs running on each computer cooperate by means of the event wait and notification processing of Embodiments 1 and 2. Although not shown in the figure, the registration program 120 registers the specified events with the OS as monitoring targets and sets abnormal termination of the OS as the action to take when a cooperation delay is detected.

The processing flow is described taking as an example the case where the computer 800 is running as the main system. Reference numeral 850 denotes the diagram of this processing flow.

The programs 803 and 804 on the computer 800 execute their processing in cooperation through events (step 851). Any events may be used for the cooperation. Suppose that, for some reason, a delay occurs in the event-based cooperation (step 852). The event monitoring processing unit 114 detects the event cooperation delay through the processing from step 401 and abnormally terminates the OS (step 853). Here, abnormal termination of the OS is synonymous with abnormal termination of the computer 800 as a system.

The cluster control unit 811 of the standby computer 810 detects that communication from the computer 800 has ceased and determines that the computer 800, which had been the main system, has stopped (step 854). It then sets the computer 810 to run as the main system (step 855) and executes the corresponding programs in event-based cooperation (step 856).
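The flow of steps 851 to 856 can be simulated with the following Python sketch. The classes `PrimaryNode` and `StandbyController`, the heartbeat representation, and the timeout value are assumptions made for illustration; the specification states only that the cluster control units communicate periodically and that the standby side detects the loss of that communication.

```python
from typing import Optional


class PrimaryNode:
    """Stand-in for the computer 800 running as the main system."""

    def __init__(self) -> None:
        self.alive = True

    def on_event_delay(self) -> None:
        # Step 853: the monitor detected a cooperation delay; the OS is
        # abnormally terminated, so no further heartbeats are sent.
        self.alive = False

    def heartbeat(self, now: float) -> Optional[float]:
        return now if self.alive else None


class StandbyController:
    """Stand-in for the cluster control unit 811 on the computer 810."""

    def __init__(self, heartbeat_timeout: float, start_time: float) -> None:
        self.heartbeat_timeout = heartbeat_timeout
        self.last_heartbeat = start_time
        self.role = "standby"

    def observe(self, heartbeat: Optional[float], now: float) -> str:
        if heartbeat is not None:
            self.last_heartbeat = heartbeat
        elif (
            self.role == "standby"
            and now - self.last_heartbeat > self.heartbeat_timeout
        ):
            # Steps 854-855: communication ceased, take over as main system.
            self.role = "primary"
        return self.role
```

Calling `observe` once per polling period with the latest heartbeat (or `None` when nothing arrived) keeps the standby role until the silence exceeds `heartbeat_timeout`, after which the controller reports `"primary"`.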

As described above, detection of event cooperation delays according to the present invention can be tied to cluster system switching control. This adds a further kind of abnormality that can trigger system switching, and it allows the abnormality of an event cooperation delay to be reflected promptly in a change of system configuration.

In the above description of Embodiment 3, the OS is abnormally terminated when a cooperation delay is detected, but the abnormality may instead be reported to the partner system through the communication control unit. In that case the abnormality can be detected even more quickly.

Embodiment 4 shows a control procedure for detecting a delay in returning from an input/output (I/O) operation initiated by a program. Embodiment 4 is built on Embodiment 1 or Embodiment 2.

Some OSs can execute I/O operations that are expected to take time, such as network reception or disk reads, asynchronously with program execution. Some of these OSs can inform the program that an asynchronously executed I/O operation has completed by performing a notification operation on an event allocated by the program.

Applying the present invention to such an OS makes it possible to detect a failure in which the program waiting for I/O completion is late in resuming even though the I/O operation has completed. FIG. 9 shows the processing procedure of the program and the OS in Embodiment 4.

First, the program allocates an event through which the OS will notify it of I/O completion (step 901) and registers the event as a monitoring target (step 902). The program then specifies this event as the notification event for I/O completion and requests the OS to execute the I/O operation asynchronously (step 903).

On receiving the I/O operation request, the OS starts the requested I/O operation (step 904). Because it has been instructed to execute the I/O operation asynchronously, the OS starts the operation and returns control to the program. The program waits on the allocated event for the I/O operation to complete (step 905). This event wait processing is the event wait processing of Embodiment 1 or 2.

When the asynchronously executed I/O operation completes, the OS performs a notification operation on the event specified when the I/O operation was started, thereby notifying the program of I/O completion (step 906). This event notification processing is the event notification processing of Embodiment 1 or 2.

The OS scheduler then resumes execution of the program waiting on the event (step 907), and the program continues its processing.

If the I/O completes and the OS performs the event notification operation but execution of the program is not resumed, the event monitoring processing unit 114 detects the delay and can execute abnormality processing.

As described above, the present invention makes it possible to detect a program whose resumption is delayed for some reason even though its I/O operation has completed and it has become runnable.
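The procedure of steps 901 to 907 can be sketched in Python with the standard `threading` module. `MonitoredIoEvent`, `read_async`, and the byte string used as the I/O result are hypothetical; a worker thread stands in for the OS-side asynchronous I/O, and `is_delayed` stands in for the check made by the event monitoring processing unit 114.

```python
import threading
import time
from typing import List, Optional


class MonitoredIoEvent:
    """An event that records the wait and the notification time for monitoring."""

    def __init__(self, allowed_time: float) -> None:
        self.allowed_time = allowed_time
        self._event = threading.Event()
        self._lock = threading.Lock()
        self.waiting = False
        self.notify_time: Optional[float] = None

    def wait(self) -> None:
        with self._lock:
            self.waiting = True              # step 905: record the wait
        self._event.wait()
        with self._lock:                     # step 907: resumed, erase records
            self.waiting = False
            self.notify_time = None

    def notify(self) -> None:
        with self._lock:
            self.notify_time = time.monotonic()  # step 906: record the time
        self._event.set()

    def is_delayed(self, now: float) -> bool:
        with self._lock:
            return (
                self.waiting
                and self.notify_time is not None
                and now - self.notify_time >= self.allowed_time
            )


def read_async(event: MonitoredIoEvent, result: List[bytes]) -> threading.Thread:
    """Steps 903-904: start the I/O in the background and return at once."""

    def io_job() -> None:
        result.append(b"data")  # placeholder for the real I/O operation
        event.notify()

    worker = threading.Thread(target=io_job)
    worker.start()
    return worker
```

A program would create the event, call `read_async`, and then call `wait`. While the wait record and the notification time both remain, a periodic monitor calling `is_delayed(time.monotonic())` reports the delay once the allowable time has passed.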

FIG. 1 shows the data structures and the configuration of the processing modules of Embodiment 1.
FIG. 2 shows the event wait processing procedure of Embodiment 1.
FIG. 3 shows the event notification processing procedure of Embodiment 1.
FIG. 4 shows the procedure of Embodiment 1 for detecting a thread whose execution resumption is delayed.
FIG. 5 shows the configuration of the computer of Embodiment 1.
FIG. 6 shows an example of the contents of a setting file.
FIG. 7 shows the configuration of Embodiment 2.
FIG. 8 shows the configuration and the control procedure of the cluster-structured computer system of Embodiment 3.
FIG. 9 shows the processing procedure of Embodiment 4.

Explanation of symbols

101: monitoring event table; 102: notification list; 103 to 105: notification blocks; 111: event registration processing unit; 112: event wait processing unit; 113: event notification processing unit; 114: event monitoring processing unit; 115: OS interface; 120: registration program; 121, 122: programs

Claims

1. An event synchronization delay detection method for detecting a delay in resuming execution of a thread in a computer in which at least one thread executing a program synchronizes, by means of an event, with processing of another program, the method comprising:
recording in a memory, when a wait request relating to the event is received from the thread, information indicating the wait in association with the thread;
recording in the memory, each time a notification relating to the event is received from the processing of the other program, a notification time for that notification;
erasing, when execution of the waiting thread is resumed, the corresponding wait record and the corresponding notification time;
determining, while a wait record and a corresponding notification time remain, whether the current time has passed the remaining notification time by a predetermined allowable time or more; and
determining, when the current time has passed it by the allowable time or more, that resumption of execution of the corresponding thread is delayed.

2. The event synchronization delay detection method according to claim 1, wherein the step of determining whether or not the current time has exceeded the allowable time is executed in a timer interrupt process of an operating system.

3. The event synchronization delay detection method according to claim 1, wherein each step of the delay detection method is realized by execution of an operating system (OS), and the interface for the wait request from the thread and the interface for the notification from the processing of the other program are dedicated interfaces that differ from the interfaces of a general-purpose OS.

4. The event synchronization delay detection method according to claim 1, wherein, in a cluster-structured computer system having a first computer that operates as a main system and a second computer that operates as a standby system, the first computer performs the delay detection and, on detecting the execution resumption delay, asserts an abnormal state of the first computer, and the second computer detects the abnormal state and switches from standby-system operation to main-system operation.

5. The event synchronization delay detection method according to claim 1, wherein the event is an event for notifying completion of an input/output operation, the wait request issued by the thread is a request to wait for completion of the input/output operation, and the notification from the processing of the other program is a notification of completion of the input/output operation.

6. A computer comprising a CPU and a memory, in which at least one thread executing a program synchronizes, by means of an event, with processing of another program, and which detects a delay in resuming execution of the thread, the computer comprising:
means for recording in the memory, when a wait request relating to the event is received from the thread, information indicating the wait in association with the thread;
means for recording in the memory, each time a notification relating to the event is received from the processing of the other program, a notification time for that notification;
means for erasing, when execution of the waiting thread is resumed, the corresponding wait record and the corresponding notification time;
means for determining, while a wait record and a corresponding notification time remain, whether the current time has passed the remaining notification time by a predetermined allowable time or more; and
means for determining, when the current time has passed it by the allowable time or more, that resumption of execution of the corresponding thread is delayed.

7. The computer according to claim 6, wherein the means for determining whether or not the current time has exceeded the allowable time is executed in a timer interrupt process of an operating system.

8. The computer according to claim 6, wherein each means of the computer is realized by execution of an operating system (OS), and the interface for the wait request from the thread and the interface for the notification from the processing of the other program are dedicated interfaces that differ from the interfaces of a general-purpose OS.

9. The computer according to claim 6, wherein the computer constitutes, as a first computer that operates as a main system, a cluster-structured computer system together with a second computer that operates as a standby system, the computer performs the delay detection as the first computer and, on detecting the execution resumption delay, asserts an abnormal state of the first computer, and the second computer detects the abnormal state and switches from standby-system operation to main-system operation.

10. The computer according to claim 6, wherein the event is an event for notifying completion of an input/output operation, the wait request issued by the thread is a request to wait for completion of the input/output operation, and the notification from the processing of the other program is a notification of completion of the input/output operation.