JP2007257395A

JP2007257395A - Fault monitoring method for application

Info

Publication number: JP2007257395A
Application number: JP2006082053A
Authority: JP
Inventors: Eiji Mikawa; 英治三川
Original assignee: Meidensha Corp; Meidensha Electric Manufacturing Co Ltd
Current assignee: Meidensha Corp; Meidensha Electric Manufacturing Co Ltd
Priority date: 2006-03-24
Filing date: 2006-03-24
Publication date: 2007-10-04

Abstract

PROBLEM TO BE SOLVED: To make fault monitoring of applications reliable and easy in a computer system without a WDT function.

SOLUTION: The computer system is provided with a plurality of counters 2_1 to 2_N corresponding to a plurality of applications 1_1 to 1_N to be monitored, and a monitoring task 3 that can read and write the count value of each counter. Each application increments its corresponding counter at a constant cycle during normal processing or at the end of a preset process. The monitoring task reads the count value of each counter at a constant cycle, sets all counters to "0" when all of the count values are within a defined range, and raises an application abnormality alarm or performs a system reset when even one of the count values is out of range.

COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to a method of monitoring application abnormalities in a computer system that does not have a watchdog timer (WDT) function.

In a computer system in which a computer and its applications are installed to provide various information processing and control functions (FIG. 3 shows a typical hardware configuration), unexpected data corruption or a programming error in hardware or software can cause the system or an application to run away.

As a countermeasure against such runaway, a hardware WDT function is generally provided, and an alarm is generated or a system reset is applied depending on whether the CPU outputs a reset to the WDT within a predetermined time (see, for example, Patent Document 1).

Monitoring by this WDT function raises an alarm or applies a system reset when the system or a high-priority application runs away.

Patent Document 1: JP 2005-250524 A

In a computer system that does not have a WDT function, a system reset cannot be applied when an application runs away.

Moreover, even in a computer system that has a WDT function, if the runaway application has a low priority, it does not cause a CPU load problem, so no system reset is applied.

In addition, the WDT function cannot determine which application has run away, which makes subsequent system repair difficult.

An object of the present invention is to provide an application abnormality monitoring method that makes abnormality monitoring of applications reliable and easy in a computer system that has no WDT function.

To solve the above problems, the present invention assigns a measurement memory (hereinafter, a counter) to each application, gives each application a function of incrementing its associated counter at a constant period, has a monitoring task read the count value of each counter at a constant period, and determines whether each application is normal or abnormal according to whether the count values are within a defined range. The invention is characterized by the following method.

(1) An application abnormality monitoring method in a computer system that does not have a watchdog timer function, wherein
the computer system is provided with a plurality of counters associated with a plurality of applications to be monitored for abnormality among the applications, and a monitoring task capable of reading and writing the count value of each counter,
the plurality of applications are provided with a processing step of incrementing the associated counter, during normal processing, at a constant period or at the end of a preset process, and
the monitoring task is provided with a processing step of reading the count value of each counter at a constant period, setting all of the counters to “0” when all of the count values are within a defined range, and issuing an application abnormality alarm or performing a system reset when even one of the count values is out of the range.

As described above, according to the present invention, a counter is provided for each application, each application is given a function of incrementing its associated counter at a constant period, the monitoring task reads the count value of each counter at a constant period, and the normal or abnormal state of each application is determined according to whether the count values are within a defined range. As a result, abnormality monitoring of applications becomes reliable and easy in a computer system whose hardware has no WDT function.

That is, the abnormality monitoring function can be realized easily, simply by adding a plurality of counters and one monitoring task and by adding a counter increment function to the applications. Moreover, because the execution of application processing is monitored directly as a count value, the monitoring is reliable.

Furthermore, the application in which the abnormality occurred can be identified, which facilitates repair work.

In addition, even in a computer system that has a WDT function, a system reset becomes possible when a low-priority application runs away.

FIG. 1 is a functional configuration diagram of an abnormality monitoring method according to an embodiment of the present invention. The computer system comprises a plurality of counters 2_1 to 2_N associated with a plurality of applications (APL) 1_1 to 1_N that are to be monitored for abnormality among the applications installed on it, and a monitoring task 3 capable of reading and writing the count value of each counter.

Each of the applications 1_1 to 1_N to be monitored is provided with a function that issues a command to increment its counter, during normal processing, at a constant period or at the end of a preset process. This increment function may be provided in all applications installed on the computer system, or it may be limited to applications whose complex software structure makes runaway likely and to high-priority applications.

The counters 2_1 to 2_N are provided, for example, in the shared memory of the computer system, one for each application that has the increment function. Each counter can be incremented by its corresponding application 1_1 to 1_N, and its count value can also be read by other tasks.
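
The counter arrangement described above can be illustrated with a minimal Python sketch. The class name `CounterBank`, its method names, and the use of a lock are assumptions made for illustration; the source only states that the counters reside in shared memory, are incremented by their applications, and are readable by other tasks.

```python
import threading


class CounterBank:
    """N counters in memory shared by the applications and the monitoring task."""

    def __init__(self, n):
        self._counts = [0] * n
        # Assumption: the source does not discuss locking; a lock is used here
        # only so that the sketch is safe with Python threads.
        self._lock = threading.Lock()

    def increment(self, index):
        """Called by application `index` (0-based here; the source numbers from 1)."""
        with self._lock:
            self._counts[index] += 1

    def read_all(self):
        """Return a snapshot of all count values, readable by any task."""
        with self._lock:
            return list(self._counts)

    def reset_all(self):
        """Set every counter to 0, as the monitoring task does."""
        with self._lock:
            self._counts = [0] * len(self._counts)


# Usage: application 0 increments twice, application 2 once.
bank = CounterBank(3)
bank.increment(0)
bank.increment(0)
bank.increment(2)
snapshot = bank.read_all()
bank.reset_all()
```

On a real embedded target the counters would more likely be plain words in shared memory; the lock here is a convenience of the sketch rather than something the method requires.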

In the local memory of the monitoring task, the computer system holds, for each of counter 1 to counter N, a pair of an upper limit and a lower limit that defines the normal range of that counter's count value, as exemplified in Table 1.

The monitoring task 3 reads the count values of the counters 2_1 to 2_N at a constant period that is sufficiently longer than the period at which each application issues its increment command. If all of the count values are within the upper and lower limits defined in Table 1, it issues a counter reset command that sets all of the counters 2_1 to 2_N to “0”. If even one of the counters 2_1 to 2_N is outside its defined upper and lower limits, the monitoring task 3 raises an alarm or performs a system reset (software reset).
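
As a hedged illustration of the range check, the following Python sketch compares a set of count values against per-counter limits. The names `Limits`, `out_of_range`, and `TABLE_1` are hypothetical, the limit values are invented placeholders (the contents of Table 1 are not reproduced in this text), and treating the limits as inclusive is an assumption.

```python
from typing import NamedTuple


class Limits(NamedTuple):
    lower: int
    upper: int


def out_of_range(counts, limits):
    """Return the 1-based numbers of counters whose value is outside [lower, upper]."""
    return [
        i + 1
        for i, (count, lim) in enumerate(zip(counts, limits))
        if not (lim.lower <= count <= lim.upper)
    ]


# Hypothetical stand-in for Table 1; these values are placeholders only.
TABLE_1 = [Limits(8, 12), Limits(8, 12), Limits(3, 5)]

healthy = out_of_range([10, 9, 4], TABLE_1)   # every counter within its limits
stalled = out_of_range([10, 0, 4], TABLE_1)   # counter 2 was never incremented
runaway = out_of_range([10, 9, 40], TABLE_1)  # counter 3 incremented far too often
```

An empty result corresponds to the case where all counters are reset to “0”; a non-empty result corresponds to the alarm or system reset case and also names the counters involved.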

FIG. 2 shows the processing flows provided in the applications and the monitoring task. Each of the applications 1_1 to 1_N executes its own inherent processing (general processing) (S1) and, if necessary, waits for a certain time (S2). As described above, this wait lasts for a constant period or until the end of a preset process. After the wait, the application increments the value of its associated counter (S3) and, unless the application processing has ended, returns to S1 and proceeds to the next processing (S4).
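
The application-side flow S1 to S4 can be sketched in Python as follows. The function `run_application` and its four callable parameters are hypothetical names introduced for this illustration; the stand-in callables replace real processing and real waiting so that the example terminates immediately.

```python
def run_application(general_process, increment_counter, wait, is_finished):
    """Application-side loop following steps S1 to S4 of FIG. 2."""
    while True:
        general_process()      # S1: the application's own processing
        wait()                 # S2: wait, if needed, until the increment point
        increment_counter()    # S3: increment the associated counter
        if is_finished():      # S4: leave the loop when the application ends
            break


# Usage with stand-in callables: three iterations, no real waiting.
state = {"done": 0, "count": 0}


def work():
    state["done"] += 1


def bump():
    state["count"] += 1


run_application(
    general_process=work,
    increment_counter=bump,
    wait=lambda: None,
    is_finished=lambda: state["done"] >= 3,
)
```

Because the increment sits after the general processing, a hang inside S1 stops the counter from advancing, which is presumably what the lower limit is meant to detect.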

The monitoring task 3 first issues a counter reset command to set all count values of the counters 2_1 to 2_N to “0” (S11) and then waits for a certain time (S12). As described above, this time is sufficiently longer than the period at which each application issues its increment command. After this time has elapsed, the task checks the count value of each of the counters 2_1 to 2_N (S13) and determines whether these count values are within the upper and lower limits defined in Table 1 (S14). If all of the count values are within those limits, the task returns to S11 and sets all count values of the counters 2_1 to 2_N to “0”. If even one count value is outside the limits defined in Table 1, the task treats this as an application abnormality and raises an alarm or performs a system reset (S15).
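
The monitoring flow S11 to S15 might look like the following Python sketch. The function `monitoring_task`, the `on_fault` callback, the `max_cycles` bound, and the simulated wait are all assumptions made so that the example is self-contained and terminates; a real monitoring task would loop indefinitely and wait on a timer.

```python
def monitoring_task(counters, limits, wait, on_fault, max_cycles):
    """Monitoring loop following steps S11 to S15 of FIG. 2.

    counters: mutable list of count values (stand-in for shared memory)
    limits:   list of (lower, upper) pairs, one per counter
    Returns the 0-based indices of the faulty counters, or [] if none.
    """
    for _ in range(max_cycles):          # assumption: bounded so the sketch ends
        for i in range(len(counters)):   # S11: set all counters to 0
            counters[i] = 0
        wait()                           # S12: wait one monitoring period
        snapshot = list(counters)        # S13: read each count value
        faulty = [                       # S14: compare against the limits
            i for i, (c, (lo, hi)) in enumerate(zip(snapshot, limits))
            if not (lo <= c <= hi)
        ]
        if faulty:
            on_fault(faulty)             # S15: alarm or system reset
            return faulty
    return []


# Usage: the wait function simulates the applications running meanwhile.
counters = [0, 0]
limits = [(4, 6), (4, 6)]
cycle = {"n": 0}


def simulated_wait():
    # Application 0 always increments 5 times per period;
    # application 1 stalls from the second period onward.
    cycle["n"] += 1
    counters[0] += 5
    counters[1] += 5 if cycle["n"] == 1 else 0


alarms = []
result = monitoring_task(counters, limits, simulated_wait, alarms.append, max_cycles=5)
```

In this run the first period passes the check and the second period reports counter index 1, so the fault handler is called exactly once.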

The upper and lower limits defined in Table 1 need not be the same for every counter; they can be set to different values for each application. This makes it possible to accommodate differences in increment period between applications.
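
One possible way to choose per-application limits is to start from the expected count, which is the monitoring period divided by the application's increment period, and allow a tolerance band around it. This derivation, the function name `derive_limits`, and the 20 % tolerance are assumptions of this sketch; the source does not say how the limits are chosen.

```python
def derive_limits(monitor_period_ms, increment_period_ms, tolerance_pct=20):
    """Suggest (lower, upper) count limits for one application.

    Expected count = monitor_period_ms / increment_period_ms.
    The lower limit is rounded down and the upper limit rounded up,
    using integer arithmetic to avoid floating-point surprises.
    """
    denominator = increment_period_ms * 100
    lower = (monitor_period_ms * (100 - tolerance_pct)) // denominator
    upper = -(-(monitor_period_ms * (100 + tolerance_pct)) // denominator)  # ceiling
    return lower, upper


# Usage: a 10 s monitoring period with two different increment periods.
fast_app = derive_limits(10_000, 1_000)   # expected count 10
slow_app = derive_limits(10_000, 2_500)   # expected count 4
```

With these inputs the faster application gets a wider absolute range than the slower one, which is the kind of per-application difference the paragraph above allows for.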

Furthermore, when raising an alarm, the monitoring task can write to non-volatile memory the identification information of the application associated with any counter that is outside the upper and lower limits defined in Table 1, which facilitates subsequent analysis of the cause of the abnormality.
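
Recording the identification information could be sketched as below, with an ordinary file standing in for the non-volatile memory. The functions `record_fault` and `read_faults`, the JSON-lines format, the timestamp field, and the identifier `"APL2"` are hypothetical choices for illustration; the source only says that identification information is written to non-volatile memory.

```python
import json
import os
import tempfile
import time


def record_fault(log_path, app_ids):
    """Append one fault record and force it to storage before any reset."""
    record = {"time": time.time(), "apps": app_ids}  # timestamp is an added assumption
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()
        os.fsync(f.fileno())  # so the record survives a system reset that follows


def read_faults(log_path):
    """Read back all fault records, for example after the system restarts."""
    with open(log_path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Usage: a temporary file stands in for the non-volatile memory.
log_path = os.path.join(tempfile.mkdtemp(), "fault_log.jsonl")
record_fault(log_path, app_ids=["APL2"])
faults = read_faults(log_path)
```

Writing and syncing the record before the reset is what makes the later cause analysis possible, since the in-memory counters are lost once the system restarts.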

FIG. 1: Functional configuration diagram of the abnormality monitoring method according to an embodiment of the present invention.
FIG. 2: Processing flows of the application and the monitoring task in the embodiment.
FIG. 3: Configuration diagram of a computer system.

Explanation of symbols

1_1 to 1_N: Applications
2_1 to 2_N: Counters
3: Monitoring task

Claims

An application abnormality monitoring method in a computer system that does not have a watchdog timer function, wherein
the computer system is provided with a plurality of counters associated with a plurality of applications to be monitored for abnormality among the applications, and a monitoring task capable of reading and writing the count value of each counter,
the plurality of applications are provided with a processing step of incrementing the associated counter, during normal processing, at a constant period or at the end of a preset process, and
the monitoring task is provided with a processing step of reading the count value of each counter at a constant period, setting all of the counters to “0” when all of the count values are within a defined range, and issuing an application abnormality alarm or performing a system reset when even one of the count values is out of the range.