JPH04213123A - Preventive maintenance system for fault of electronic computer - Google Patents

Preventive maintenance system for fault of electronic computer

Info

Publication number
JPH04213123A
Authority
JP
Japan
Prior art keywords
storage device
error
preventive maintenance
fault
function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP2405320A
Other languages
Japanese (ja)
Inventor
Tatsuya Sugano
辰也 菅野
Shinichiro Sakuraba
桜庭 伸一郎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mitsubishi Electric Corp
Original Assignee
Mitsubishi Electric Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mitsubishi Electric Corp filed Critical Mitsubishi Electric Corp
Priority to JP2405320A priority Critical patent/JPH04213123A/en
Publication of JPH04213123A publication Critical patent/JPH04213123A/en
Pending legal-status Critical Current

Landscapes

  • Test And Diagnosis Of Digital Computers (AREA)
  • Debugging And Monitoring (AREA)

Abstract

PURPOSE: To give a computer system self-diagnostic, history-management, and alarm functions so that faults in an external storage device are discovered early during online operation and preventive maintenance can be carried out. CONSTITUTION: To support preventive maintenance against faults in the external storage device of an electronic computer, self-diagnosis of the external storage device is performed automatically at a fixed cycle, in parallel with the normal operation of the computer. The diagnostic results are held and updated to generate an error history, and the number of errors is monitored. When errors have occurred a fixed number of times or more, an alarm is given to the operator so that preventive maintenance can be performed before a serious fault develops.

Description

[Detailed description of the invention]

[0001]

[Field of Industrial Application] This invention relates to electronic computer systems, and in particular to a failure preventive maintenance method for an electronic computer that performs periodic online self-diagnosis of an external storage device before a failure occurs in that device and recovery becomes impossible.

[0002]

[Prior Art] Because of the way the storage device of an electronic computer is used, a failure in it often leads directly to a system crash, so a low failure rate alone is no reassurance. In most failures, a program previously held in the storage device cannot be read correctly when it is needed, or the content read differs from what was stored; in either case the computer does not operate normally. Preventing such failures before they occur is therefore a key factor in raising system reliability, but conventional methods have not been adequate.

[0003] In the first method, which is already common, maintenance personnel run a diagnostic program for the device during periodic inspections, check whether any errors are present, and perform maintenance.

[0004] In the second method, which is less common than the first, when an error occurs in an area of the storage device that the running computer system accesses, a history of that error is stored, and maintenance personnel review the history and perform maintenance during periodic inspections.

[0005] The errors referred to here are errors that can be retried by both software and hardware. A retry count is set for each, and if the error is recovered within that count, generally no error message or similar output is produced.

[0006] In the second method, even an access that succeeds on retry is generally treated as an error: the occurrence is counted, and the number of retries is counted as well.

[0007] It is also generally the case that the progression of a storage device fault shows up as an increase in the number of accesses recovered by retry and an increase in the number of retries.
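
The retry accounting described in paragraphs [0005] to [0007] can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patent's implementation: the names `RetryStats`, `read_with_retry`, `max_retries`, and the simulated device `make_flaky_read` are all hypothetical and do not appear in the source.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RetryStats:
    """Counters kept by the second conventional method (hypothetical names)."""
    recovered_errors: int = 0  # accesses that succeeded only after a retry
    total_retries: int = 0     # retries spent on those recovered accesses


def read_with_retry(read: Callable[[], bytes], stats: RetryStats,
                    max_retries: int = 3) -> bytes:
    """Call `read` until it succeeds or the retry limit is exhausted.

    An access recovered by retry is still counted as an error occurrence,
    and the retries it needed are counted as well.
    """
    for attempt in range(max_retries + 1):
        try:
            data = read()
        except OSError:
            continue  # retryable error: try again
        if attempt > 0:
            stats.recovered_errors += 1
            stats.total_retries += attempt
        return data
    raise OSError("unrecoverable: retry limit exceeded")


def make_flaky_read(failures: int) -> Callable[[], bytes]:
    """Simulated device (assumption): fails `failures` times, then succeeds."""
    remaining = [failures]

    def read() -> bytes:
        if remaining[0] > 0:
            remaining[0] -= 1
            raise OSError("simulated read error")
        return b"ok"

    return read
```

A device that fails twice and then succeeds leaves `recovered_errors` at 1 and `total_retries` at 2, even though the caller sees no error. A rising trend in either counter is the symptom that paragraph [0007] describes.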

[0008]

[Problems to be Solved by the Invention] In conventional failure preventive maintenance methods for electronic computers, maintenance personnel must take the system offline to check the external storage device by periodically running a diagnostic program, and must analyze errors by collecting the error information stored in the computer's storage device. Diagnosis therefore takes labor and time, and if a failure occurs before the next periodic inspection, its discovery may be delayed and the fault may worsen.

[0009] This invention was made to solve the problems described above. Its object is to obtain a failure preventive maintenance method for an electronic computer that runs the computer's self-diagnosis function in parallel with normal operation, using the intervals in which the computer is idle, so that faults can be monitored and discovered online.

[0010]

[Means for Solving the Problems] The failure preventive maintenance method for an electronic computer according to this invention has a function that periodically performs self-diagnosis of an external storage device in parallel with the access operations that the computer regularly performs on the external storage device, holds and updates the self-diagnosis results to generate an error history, monitors the number of errors, and outputs an alarm when errors have occurred a fixed number of times or more.

[0011]

[Operation] According to this invention, self-diagnosis for faults in the external storage device of an electronic computer is performed automatically in parallel with the normal operation of the computer, an error history is generated from the self-diagnosis results, and the operator is notified by an alarm when errors have occurred a fixed number of times or more. Diagnosis of the external storage device, storage of the diagnostic results, and error alarm management are thus integrated to perform preventive maintenance so that the computer does not reach a serious failure.

[0012]

[Embodiment] An embodiment of this invention is described below with reference to the drawings. In FIG. 1, 1 denotes the hardware (H/W) configuration of the computer system: 11 is a main storage device, 12 is a central processing unit, 13 is an input/output control device, and 14 is the target external storage device.

[0013] Further, 2 represents the software (S/W) image within the external storage device of the computer system: 21 is the OS (operating system), 22 is a utility program that implements the preventive maintenance method and has three functions, and 23 is application software. The three functions of the utility program 22 are as follows.

[0014]
(a) Online inspection function: diagnoses all areas of the storage device at an arbitrarily specified cycle without affecting the operation of the computer system. When an error occurs, it is passed in sequence to the error history function for storage.
(b) Error history function: accumulates the error occurrences detected by function (a), converts them to a specified format, and holds them as a history that is updated in sequence. The error information covers everything recovered by H/W retry or S/W retry.
(c) Inspection management function: at a specified cycle, and without affecting the computer system, refers to the error information stored by the error history function and generates an alarm according to the preventive maintenance criteria and the scale of the computer system. For example, it outputs an alarm message when errors recovered by H/W retry, which the computer system itself does not treat as errors, exceed a certain count.
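
One way to picture how the three functions of the utility program fit together is the Python sketch below. It is an illustration under stated assumptions, not the patented implementation: the class and function names, the `check_block` callback, and the error-kind labels `"hw_retry"` and `"sw_retry"` are hypothetical. The alarm condition uses "count reaches the limit", following the claim's wording of "a fixed number of times or more"; paragraph [0014] itself says "exceeds", so the exact boundary is an assumption.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple

# All names below are hypothetical; the patent names only the three functions.
ErrorRecord = Tuple[int, str]  # (block number, error kind)


class ErrorHistory:
    """(b) Error history: holds error records and keeps per-kind counts."""

    def __init__(self) -> None:
        self.records: List[ErrorRecord] = []
        self.counts: Counter = Counter()

    def store(self, block: int, kind: str) -> None:
        self.records.append((block, kind))
        self.counts[kind] += 1


def online_inspection(blocks: Iterable[int],
                      check_block: Callable[[int], str],
                      history: ErrorHistory) -> None:
    """(a) Online inspection: diagnose every block, store any error found.

    `check_block` returns "" for a clean block, otherwise an error kind
    such as "hw_retry" or "sw_retry" (assumed labels).
    """
    for block in blocks:
        kind = check_block(block)
        if kind:
            history.store(block, kind)


def inspection_management(history: ErrorHistory, kind: str,
                          limit: int) -> bool:
    """(c) Inspection management: True when an alarm should be raised.

    Assumption: the alarm fires once the count reaches `limit`.
    """
    return history.counts[kind] >= limit
```

In this sketch, errors recovered by hardware retry reach the history even though the application never sees them, which is what lets function (c) warn before the device fails outright.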

[0015] Next, the actual preventive maintenance method is described, taking the sequence of FIG. 2 as an example. Sequence 1 is an ordinary access to the external storage device by application software in the computer system. If an access to the external storage device ends normally, the computer system continues with the next sequence. If an error occurs, as in sequence 2, the error history function begins collecting error information. Between application software operations, the online inspection function (sequences 3 to 5, dotted lines) accesses the external storage device in turn at a cycle that does not affect the computer system. When an inspection access ends normally (sequences 3 and 4), operation continues; when an error occurs (sequence 5), the error information is stacked by the error history function in the same way as for application software. At that point, the inspection management function runs its determination processing as soon as the error history function has collected the error.
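
The flow of FIG. 2 as described in paragraph [0015] can be replayed with a small Python sketch. FIG. 2 itself is not reproduced in this text, so the event list below is an interpretation of the paragraph, and `run_sequence`, `Access`, and the `judge` callback are hypothetical names introduced only for illustration.

```python
from typing import Callable, List, Tuple

Access = Tuple[str, bool]  # (source: "app" or "inspection", succeeded?)


def run_sequence(accesses: List[Access],
                 judge: Callable[[int], bool]) -> List[int]:
    """Replay accesses in order; return the indices at which `judge` fired.

    Errors from application accesses and from inspection accesses go into
    the same history, and the judgement runs as soon as an error is stored.
    """
    history: List[Access] = []
    alarms: List[int] = []
    for index, access in enumerate(accesses):
        _source, succeeded = access
        if succeeded:
            continue  # normal end: operation simply continues
        history.append(access)       # error history function
        if judge(len(history)):      # inspection management function
            alarms.append(index)
    return alarms


# Sequences 1 to 5 as read from the paragraph (an interpretation):
# 1 app access ends normally, 2 app access errors,
# 3 and 4 inspection accesses end normally, 5 inspection access errors.
FIG2_SEQUENCE: List[Access] = [
    ("app", True),
    ("app", False),
    ("inspection", True),
    ("inspection", True),
    ("inspection", False),
]
```

With a toy criterion of two accumulated errors, `run_sequence(FIG2_SEQUENCE, lambda n: n >= 2)` fires only at the last access (index 4), because the application error and the inspection error are counted in the same history.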

[0016] FIG. 3 shows one example of the determination processing of the inspection management function, using a disk device as the example. When a correctable data check error (one recovered by retry) occurs, the function determines whether the error has occurred 20 times; if the count is within the criterion, the determination processing ends, as shown in the figure. If the count exceeds the reference value shown in the figure, an alarm program is started and processing agreed in advance with the user of the computer system, such as warning the user, is carried out.
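
The determination step of FIG. 3 reduces to a threshold comparison, sketched below in Python. The value 20 comes from the paragraph; the function name, the constant name, and the string results are hypothetical. The paragraph asks both whether the error "has occurred 20 times" and whether the count "exceeds" the reference value, so whether the 20th occurrence itself triggers the alarm is an assumption here; the sketch follows the claim's "a fixed number of times or more".

```python
CORRECTABLE_DATA_CHECK_LIMIT = 20  # reference value from the FIG. 3 example


def judge_correctable_errors(count: int,
                             limit: int = CORRECTABLE_DATA_CHECK_LIMIT) -> str:
    """Return "alarm" once `count` reaches `limit`, otherwise "ok".

    Assumption: reaching the limit (count >= limit) triggers the alarm.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    return "alarm" if count >= limit else "ok"
```

Because the paragraph says the response follows an agreement with the computer system's user, the limit is a parameter rather than a fixed constant, so a site could pass a different reference value.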

[0017] This preventive maintenance method is particularly effective for fixed-head disk devices, but it can also be applied to other external storage devices such as optical (magneto-optical) disks, floppy disks, and magnetic tape devices.

[0018]

[Effects of the Invention] As described above, according to this invention, the electronic computer is provided with functions for self-diagnosis of the external storage device, error history, and failure alarm that run in parallel with its normal access to the external storage device. This makes online self-diagnosis possible, which saves labor and allows preventive maintenance to be carried out early and reliably.

[Brief explanation of the drawings]

[FIG. 1] A diagram showing the hardware configuration of the computer system and the software image of the external storage device in a failure preventive maintenance method for an electronic computer according to an embodiment of this invention.

[FIG. 2] A sequence diagram explaining the operation of the method of this embodiment.

[FIG. 3] A flowchart showing one example of the determination processing of the inspection management function in the method of this embodiment.

[Explanation of symbols]

1 Computer system
2 Software image of external storage device
14 External storage device
21 Operating system
22 Utility program

Claims (1)

[Claims]

[Claim 1] A failure preventive maintenance method for an electronic computer, characterized by having a function that periodically performs self-diagnosis of an external storage device in parallel with the access operations that the electronic computer regularly performs on the external storage device, holds and updates the self-diagnosis results to generate an error history, monitors the number of errors, and outputs an alarm when errors have occurred a fixed number of times or more.
JP2405320A 1990-12-06 1990-12-06 Preventive maintenance system for fault of electronic computer Pending JPH04213123A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2405320A JPH04213123A (en) 1990-12-06 1990-12-06 Preventive maintenance system for fault of electronic computer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2405320A JPH04213123A (en) 1990-12-06 1990-12-06 Preventive maintenance system for fault of electronic computer

Publications (1)

Publication Number Publication Date
JPH04213123A true JPH04213123A (en) 1992-08-04

Family

ID=18514934

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2405320A Pending JPH04213123A (en) 1990-12-06 1990-12-06 Preventive maintenance system for fault of electronic computer

Country Status (1)

Country Link
JP (1) JPH04213123A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6622264B1 (en) * 1999-10-28 2003-09-16 General Electric Company Process and system for analyzing fault log data from a machine so as to identify faults predictive of machine failures
CN103729274A (en) * 2013-12-31 2014-04-16 广州华多网络科技有限公司 Method and system for detecting audio faults

Similar Documents

Publication Publication Date Title
US4922491A (en) Input/output device service alert function
JP3481737B2 (en) Dump collection device and dump collection method
JP2017091077A (en) Pseudo-fault generation program, generation method, and generator
US7171586B1 (en) Method and apparatus for identifying mechanisms responsible for “no-trouble-found” (NTF) events in computer systems
JPH10312327A (en) Mirroring monitor system
KR100319852B1 (en) Computer system having failover function and method thereof
JPH04213123A (en) Preventive maintenance system for fault of electronic computer
CN113625957B (en) Method, device and equipment for detecting hard disk faults
JP3040186B2 (en) Digital controller for control
JP3342039B2 (en) Processing unit that manages files
JP3357777B2 (en) Program control system
JP2679575B2 (en) I / O channel fault handling system
JPH0424838A (en) Fault control system for multiprocessor
JPH05274093A (en) Volume fault prevention control system
JP4593301B2 (en) Elevator failure analysis system
JPH06265445A (en) Monitor
JP2833928B2 (en) Diagnostic initialization method
CN111966514A (en) Exception handling method, exception handling system, electronic equipment and storage medium
JPH0434626A (en) Error logging method
JPS61120248A (en) Diagnosis system
JPH03127233A (en) Patrol diagnosing device for computer system
JP2002229923A (en) Disk state acquisition method and recording medium
JPH0683667A (en) Diagnostic system for information processor
GAWHON SAFEGUARD Data-Processing System
JPH0580838A (en) H/d duplexing system for operator's console