JPS60171544A - Self-diagnosis device for abnormality of computer system - Google Patents

Self-diagnosis device for abnormality of computer system

Info

Publication number
JPS60171544A
JPS60171544A JP2716384A JP2716384A JPS60171544A JP S60171544 A JPS60171544 A JP S60171544A JP 2716384 A JP2716384 A JP 2716384A JP 2716384 A JP2716384 A JP 2716384A JP S60171544 A JPS60171544 A JP S60171544A
Authority
JP
Japan
Prior art keywords
abnormality
level
task
computer system
timer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP2716384A
Other languages
Japanese (ja)
Inventor
Haruki Inoue
春樹 井上
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hitachi Engineering Co Ltd
Hitachi Ltd
Original Assignee
Hitachi Engineering Co Ltd
Hitachi Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi Engineering Co Ltd, Hitachi Ltd
Priority to JP2716384A
Publication of JPS60171544A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/22Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2205Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using arrangements specific to the hardware being tested
    • G06F11/2236Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using arrangements specific to the hardware being tested to test CPU or processors

Abstract

PURPOSE: To minimize loss of function in a computer system by having the system itself detect a software fault factor and remove it. CONSTITUTION: A level operation diagnosis module 2 and an operation monitoring timer 3 are provided for each operation level of all tasks under an operating system OS 1. The module 2 is started periodically by the OS 1 and sets a fixed time value in the timer 3. When the timer 3 times out, it sets 1 in a level operation monitoring register 4 to record the abnormality. The OS 1 monitors the running time of each task and stores this value in a task operation time memory 5. A system diagnosis device 6 runs periodically, analyzes the fault, and identifies the direct cause of the abnormality. The device 6 then issues a stop request for the corresponding task to the OS 1 and displays the abnormality on an external abnormality display device 7.

Description

[Detailed Description of the Invention]

[Field of Application of the Invention]

The present invention relates to an abnormality diagnosis device for a computer system, and in particular to a self-diagnosis device that confines a total system outage caused by a software defect to a minimal local outage.

[Background of the invention]

In a computer system with an operating system (hereinafter, OS) that controls multiple independent, concurrently running programs according to a fixed procedure, the overriding goal is to execute every processing function without delay at all times.

Conventionally, the OS has detected many program defects by itself and has had some success in keeping their spread to the rest of the system to a minimum.

Examples are the protection check on data write areas and the check for execution of invalid instructions: when a defect is found, the OS immediately prohibits the task concerned from running (hereinafter, the smallest unit of an independently operating program is called a task).

However, some defects have remained difficult to contain: endless loops in programs, and deadlocks between tasks caused by the occupy and release commands that protect shared resources (data files) from corruption. A computer system assigns each task one of several operation levels and grants priority according to the importance and urgency of its processing. If a task at some level falls into an endless loop, every task at a lower level waits from then on, which amounts to a loss of function. Likewise, if an occupy or release command is issued incorrectly, every function that performs input or output on that resource stops. Defects of this kind are therefore likely to halt the entire system.

Various methods have been devised and put into practical use to deal with this. One example is detecting endless loops with a number of timers. A maximum processing time is calculated and set in advance for each task, and a task that exceeds this time is prohibited from executing. This fully addresses the task in question, but if that task is not the direct cause, the wrong action has been taken and the situation becomes worse. In addition, setting the maximum processing time is left to the designer of each task, so in many cases the value has not been examined carefully or has not been set at all, and the method cannot be fully relied on. As a system-wide method, a task is placed at the lowest level and periodically sets a fixed value in a hardware timer (called a watchdog timer); when the task cannot run, an external contact is turned ON. This method can only detect the abnormality and does nothing to prevent the system from stopping.
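
The weakness of the per-task timer method can be illustrated with a minimal Python sketch. Everything here is hypothetical and not taken from the patent: the `TaskBudget` record, its field names, and the use of `None` to model a limit that the designer never set. It only shows how a task that is merely waiting can be flagged while the real culprit goes unnoticed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TaskBudget:
    """Hypothetical record for the conventional per-task timer method."""
    name: str
    max_seconds: Optional[float]  # designer-supplied limit; None = never set
    elapsed_seconds: float        # time since the task was started


def tasks_over_budget(tasks: List[TaskBudget]) -> List[str]:
    """Return the names of tasks that exceeded their own configured limit."""
    return [
        t.name
        for t in tasks
        if t.max_seconds is not None and t.elapsed_seconds > t.max_seconds
    ]


# Task A is the one actually looping, but its designer set no limit.
# Task B is only waiting behind A, yet it is the one that gets flagged.
example = [
    TaskBudget("A", None, 120.0),
    TaskBudget("B", 5.0, 30.0),
    TaskBudget("C", 5.0, 1.0),
]
print(tasks_over_budget(example))
```

In this example only B is reported, so stopping B would remove a victim rather than the cause.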

Thus, the conventional methods could not achieve a sufficient effect.

[Purpose of the invention]

An object of the present invention is to provide a device that keeps loss of system function to a minimum by identifying the cause of a software defect quickly and accurately at the system level and having the system itself remove the direct cause.

[Summary of the invention]

The present invention stops leaving the setup of abnormality detection to the designer of each program; its distinguishing feature is that the computer system itself performs both the detection and the removal of the cause. Specifically, the OS periodically diagnoses whether each level can run, continuously measures the processing time of every task, determines the cause of the abnormality from these two pieces of information, and removes that cause.

[Embodiments of the invention]

An embodiment of the present invention is described below with reference to FIGS. 1 to 4.

FIG. 1 shows how the effect spreads through the system when one task falls into an endless loop.

The vertical axis shows the task operation level (seven levels in this example) and the horizontal axis shows elapsed time. From time t0 to t10 the system operates normally, and tasks at higher levels run with priority (lower-numbered levels are called higher: L1, L2, ..., L7). Between t8 and t9, a task ■ occupies a resource (referred to as a reserve, [F]). From t10, another task ■ starts running but, for some reason, enters an endless loop. After that, the higher-level tasks ■ still run normally, but the lower-level tasks ■ to ■ wait and cease to function. A higher-level task that tries to occupy the same resource also waits for its release and ceases to function. Thus, at t[illegible]2 the whole system stops functioning.
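
The starvation described for FIG. 1 can be reproduced with a small tick-based simulation. This is only a sketch under simplifying assumptions that are not in the patent: all tasks are ready at tick 0, the scheduler is strictly priority-based, and the task names, levels, and work amounts are made up for illustration. An endless loop is modelled as an infinite amount of remaining work.

```python
import math
from typing import Dict, List, Tuple


def simulate(tasks: Dict[str, Tuple[int, float]], ticks: int) -> List[str]:
    """Run a strict-priority scheduler for `ticks` ticks.

    `tasks` maps a task name to (level, work), where a lower level number
    means higher priority and `work` is the number of ticks needed
    (math.inf models an endless loop). Returns the tasks that finished.
    """
    remaining = {name: work for name, (_, work) in tasks.items()}
    finished: List[str] = []
    for _ in range(ticks):
        ready = [name for name, left in remaining.items() if left > 0]
        if not ready:
            break
        # The highest-priority ready task always gets the CPU.
        current = min(ready, key=lambda name: tasks[name][0])
        remaining[current] -= 1
        if remaining[current] == 0:
            finished.append(current)
    return finished


demo = {
    "high": (1, 2),            # level 1, needs 2 ticks
    "looping": (4, math.inf),  # level 4, never finishes
    "low_a": (5, 1),           # level 5, starved by the loop
    "low_b": (7, 1),           # level 7, starved by the loop
}
print(simulate(demo, ticks=100))
```

Only the task above the looping level completes; the two lower-level tasks never receive the CPU, however long the simulation runs.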

FIG. 2 shows a configuration example of the computer system abnormality self-diagnosis device.

Under OS 1, this device provides a level operation diagnosis module 2 and an operation monitoring timer 3 for each task operation level. Module 2 is started periodically by OS 1 and sets a fixed time value in timer 3. When timer 3 times out, it sets 1 in the level operation monitoring register 4, recording the abnormality. OS 1 monitors the running time of each task and stores this value in the task operation time memory 5. The system diagnosis device 6 runs periodically, analyzes the abnormality, finds its direct cause, issues a stop request for the corresponding task to OS 1, and displays the abnormality on the external abnormality display device 7.
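
The interplay of module 2, timer 3, and register 4 might be sketched as follows. The class and method names (`LevelMonitor`, `kick`, `tick`) and the use of a logical clock are assumptions made for illustration. The patent also does not say whether a register bit is ever cleared, so this sketch simply latches it.

```python
from typing import Dict, List


class LevelMonitor:
    """Sketch of per-level monitoring timers feeding a flag register."""

    def __init__(self, levels: int, timeout: int) -> None:
        self.timeout = timeout
        # One deadline per level, standing in for operation monitoring timer 3.
        self.deadline: Dict[int, int] = {
            lv: timeout for lv in range(1, levels + 1)
        }
        # One flag per level, standing in for register 4 (index 0 = level 1).
        self.register: List[int] = [0] * levels

    def kick(self, level: int, now: int) -> None:
        """Called when the diagnosis module of `level` gets to run."""
        self.deadline[level] = now + self.timeout

    def tick(self, now: int) -> None:
        """Latch a flag for every level whose timer has run out."""
        for level, deadline in self.deadline.items():
            if now >= deadline:
                self.register[level - 1] = 1


monitor = LevelMonitor(levels=7, timeout=10)
for level in (1, 2, 3):      # only levels 1 to 3 still get CPU time
    monitor.kick(level, now=5)
monitor.tick(now=12)
print(monitor.register)      # [0, 0, 0, 1, 1, 1, 1]
```

Because the diagnosis module of a level is scheduled like any task of that level, a level that cannot run never re-arms its timer, and its flag ends up set.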

Here, module 2 is configured to behave in the same way as a task and is connected to the run queue. Whether each level can run is therefore detected by means of timer 3.

FIG. 3 shows the detailed configuration of the system diagnosis device 6. Device 6 takes register 4 and memory 5 as inputs. First, unit 67 checks whether register 4 is all zero, that is, whether the system has an abnormality. Next, the leftmost-ON-flag number determining device 68 identifies the abnormal level in register 4, and then the abnormality cause task determining device 69 analyzes the direct cause.
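
The all-zero check (67) and the leftmost-ON-flag search (68) can be expressed in a few lines. This sketch assumes, without confirmation from the patent, that the register is laid out with level 1 at the left; the function name is hypothetical.

```python
from typing import List, Optional


def first_abnormal_level(register: List[int]) -> Optional[int]:
    """Return the 1-based level of the leftmost set flag.

    Returns None when every flag is zero, i.e. no abnormality is recorded.
    Assumes index 0 of `register` corresponds to level 1.
    """
    for index, flag in enumerate(register):
        if flag:
            return index + 1
    return None


print(first_abnormal_level([0, 0, 0, 1, 1, 1, 1]))  # 4
print(first_abnormal_level([0, 0, 0, 0, 0, 0, 0]))  # None
```

Taking the leftmost flag matters because every level below a stuck level also times out; the highest flagged level is the one where the blockage originates.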

Device 69 reads from memory 5 the running times of the tasks that belong to the abnormal level determined by device 68, and takes the task with the longest running time as the direct cause. The task operation stop request device 610 then issues a stop request 611 to OS 1, and the abnormality display device 612 outputs the display externally.
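
The selection rule used by device 69 amounts to a maximum over the tasks of one level. In the sketch below, the two dictionaries stand in for the task-to-level assignment and for memory 5; their shapes, the task names, and the function name are assumptions for illustration.

```python
from typing import Dict, Optional


def culprit_task(
    abnormal_level: int,
    task_levels: Dict[str, int],
    run_times: Dict[str, float],
) -> Optional[str]:
    """Pick the task at `abnormal_level` with the longest running time.

    Returns None if no task is registered at that level.
    """
    candidates = [
        name for name, level in task_levels.items()
        if level == abnormal_level
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda name: run_times.get(name, 0.0))


levels = {"t1": 1, "t4a": 4, "t4b": 4, "t6": 6}
times = {"t1": 0.2, "t4a": 0.5, "t4b": 97.0, "t6": 0.0}
print(culprit_task(4, levels, times))  # t4b
```

The result would then be handed to the stop request step; a waiting task at a lower level is never a candidate, which is what avoids the misdiagnosis of the per-task timer method.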

FIG. 4 shows an example in which this device contains the situation of FIG. 1 with minimal spread.

At t[illegible]1 the whole system is temporarily halted. When C1 to C7 are started at this point, the higher-level ones run and set the timer value in 3, but C4 to C7 cannot set a value in 3 because of the endless loop in task ■. When timer 3 times out at t[illegible]2, register 4 turns ON, device 6 identifies task ■, and OS 1 prohibits ■ from running. The resource-occupying task ■ then runs and releases the resource.

As a result, the waiting tasks are released from the wait state one after another, and at t24 every function of the system other than ■ is back to normal, so the spread of the abnormality has been kept to a minimum.

[Effect of the invention]

According to the present invention, the computer itself can detect a software abnormality and remove its cause, so the reliability and operating rate of the computer system can be improved.

[Brief explanation of the drawing]

FIG. 1 is an explanatory diagram of how an endless task loop spreads to the system; FIG. 2 is a block diagram of the computer system abnormality self-diagnosis device of the present invention; FIG. 3 is a detailed configuration diagram of the present invention; and FIG. 4 is a diagram of an example of the abnormality suppression operation of the present invention.

Claims (1)

[Claims]

1. A computer system abnormality self-diagnosis device for a computer system having an operating system that controls, according to a fixed procedure, a plurality of independent programs operating concurrently, the device comprising: a level operation diagnosis device that is provided for each operation level of the programs and monitors operation; an operation monitoring timer; a level operation monitoring register in which data is set when the operation monitoring timer times out; and a system diagnosis device that takes the level operation monitoring register and a program operation time memory as inputs, analyzes the cause of a system abnormality, removes the cause, and displays it externally.
JP2716384A 1984-02-17 1984-02-17 Self-diagnosis device for abnormality of computer system Pending JPS60171544A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2716384A JPS60171544A (en) 1984-02-17 1984-02-17 Self-diagnosis device for abnormality of computer system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2716384A JPS60171544A (en) 1984-02-17 1984-02-17 Self-diagnosis device for abnormality of computer system

Publications (1)

Publication Number Publication Date
JPS60171544A true JPS60171544A (en) 1985-09-05

Family

ID=12213385

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2716384A Pending JPS60171544A (en) 1984-02-17 1984-02-17 Self-diagnosis device for abnormality of computer system

Country Status (1)

Country Link
JP (1) JPS60171544A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0997197A (en) * 1995-09-28 1997-04-08 Nec Corp Self-diagnosis method for information processor

Similar Documents

Publication Publication Date Title
US6662204B2 (en) Thread control system and method in a computer system
JPH05108391A (en) Method for continuing program execution
US4096564A (en) Data processing system with interrupt functions
JP2000187600A (en) Watchdog timer system
JP5212357B2 (en) Multi-CPU abnormality detection and recovery system, method and program
EP0125797A1 (en) Interrupt signal handling apparatus
JPH02294739A (en) Fault detecting system
JPS60171544A (en) Self-diagnosis device for abnormality of computer system
JPH03259349A (en) Fault recovering system
JPH03179538A (en) Data processing system
JPS6115239A (en) Processor diagnosis system
JP2870250B2 (en) Microprocessor runaway monitor
JPH04369046A (en) Test system for active check circuit
JP2836084B2 (en) Computer inspection equipment
JP2922981B2 (en) Task execution continuation method
CN108415788B (en) Data processing apparatus and method for responding to non-responsive processing circuitry
JP2730209B2 (en) I / O control method
JPH01134637A (en) Supervising system for information in stall processing system
JP4387863B2 (en) Disturbance occurrence detection program and disturbance occurrence detection method
JP2924732B2 (en) Self-diagnosis method for information processing device
JPS6155748A (en) Electronic computer system
JPH0642207B2 (en) Multi-level programming method
JPH01183701A (en) Plant supervisory unit
JPS60195649A (en) Error reporting system of microprogram-controlled type data processor
JPH0469744A (en) Runaway detector for microcomputer