JPH0355640A

JPH0355640A - Collection system for fault analysis information on peripheral controller

Info

Publication number: JPH0355640A
Application number: JP1191763A
Authority: JP
Inventors: Chisato Komiyama; 小宮山　千里
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1989-07-25
Filing date: 1989-07-25
Publication date: 1991-03-11

Abstract

PURPOSE: To collect fault analysis information quickly and easily without affecting the system, by adopting a constitution in which a peripheral controller saves the fault analysis information in a saving area of its own device and generates an asynchronous interruption when a designated fault phenomenon occurs. CONSTITUTION: Prior to the start of operation, an operator inputs a save-trigger setting instruction for a peripheral controller 3 from a system console 6. When the fault phenomenon designated via a CPU 2 occurs, the controller 3 saves the fault analysis information in a saving area 31 and at the same time generates an asynchronous interruption. On detecting the asynchronous interruption, the operating system reads the fault analysis information in the area 31 into a software area 12 of a main memory 1, has it transferred to a hardware-only area 11 and stored in an error log file 51 via a service processor 4, and restarts the processing of the controller 3. Thus the fault analysis information can be collected quickly and easily without any large influence on the system.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Industrial Application]

The present invention relates to a failure analysis information collection method for a peripheral control device, which collects failure analysis information from a peripheral control device used in an information processing system.

[Conventional technology]

When a failure occurs in a peripheral control device, the failure analysis information held inside the peripheral control device must be collected so that the failure can be analyzed. Conventionally, a maintenance device is connected to the peripheral control device in which the failure occurred, the maintenance device instructs the peripheral control device to stop processing when a certain failure event occurs, and the contents of the internal registers and the like at the time the peripheral control device stops processing are collected as failure analysis information.
[Problem to be solved by the invention]

As described above, a maintenance device is conventionally connected to the peripheral control device, and the failure analysis information is collected when the failure event designated by the maintenance device occurs in the peripheral control device. It therefore takes time to collect the failure analysis information, and the maintenance personnel are burdened. Furthermore, the peripheral control device cannot be incorporated into the system during that time, so the impact on the system is large. An object of the present invention is to enable failure analysis information to be collected quickly and easily without significantly affecting the system.

[Means for Solving the Problems]

To achieve the above object, the present invention applies to a system including a main storage device having a hardware-only area and a software area, a central processing unit having a function of transferring information in the software area to the hardware-only area, a service processor having a function of reading information in the hardware-only area, and a peripheral control device. The peripheral control device is provided with a function of saving failure analysis information in a save area within the device itself and generating an asynchronous interrupt when a failure event designated by a host device occurs. Failure processing means is provided which, when the peripheral control device generates the asynchronous interrupt, reads the failure analysis information saved in the save area of the peripheral control device into the software area of the main storage device, then instructs the peripheral control device to resume processing and outputs a log instruction for logging the failure analysis information stored in the main storage device.
The system is further provided with an error log file into which the service processor writes information. When the failure processing means outputs the log instruction, the central processing unit transfers the failure analysis information stored in the software area to the hardware-only area and outputs a log instruction to the service processor. When the central processing unit outputs the log instruction, the service processor stores the failure analysis information held in the hardware-only area in the error log file.

[Operation]

When a failure event designated by the host device occurs, the peripheral control device saves the failure analysis information in the save area within the device itself and generates an asynchronous interrupt. When the peripheral control device generates the asynchronous interrupt, the failure processing means reads the failure analysis information saved in the save area of the peripheral control device into the software area of the main storage device, then instructs the peripheral control device to resume processing and outputs a log instruction for logging the failure analysis information stored in the main storage device. When the failure processing means outputs the log instruction, the central processing unit transfers the failure analysis information stored in the software area to the hardware-only area and outputs a log instruction to the service processor. When the central processing unit outputs the log instruction, the service processor stores the failure analysis information held in the hardware-only area in the error log file.

[Example]

Next, an embodiment of the present invention will be described in detail with reference to the drawings.

FIG. 1 is a block diagram of an embodiment of the present invention. The system consists of: a main memory 1 containing a hardware-only area 11 that can be accessed only by hardware and a software area 12 that can also be accessed by software; a central processing unit (CPU) 2 having a function of transferring information in the software area 12 of the main memory 1 to the hardware-only area 11; a peripheral control unit (PCU) 3 which, when the failure event designated by a save-trigger setting instruction from the host device occurs, saves failure analysis information in a save area 31 provided in the device itself and generates an asynchronous interrupt; a service processor (SVP) 4 having a function of reading the hardware-only area 11 of the main memory 1; a magnetic disk device 5 in which an error log file 51 is provided; and a system console 6.

FIGS. 2 to 4 are flowcharts showing the processing of the system of FIG. 1 when the failure event designated by the save-trigger setting instruction from the host device occurs in the peripheral control device 3. FIG. 2 shows an example of processing by the operating system (OS), FIG. 3 shows an example of processing by the central processing unit 2, and FIG. 4 shows an example of processing by the service processor 4.

Next, the operation of this embodiment will be explained with reference to the figures. Prior to the start of system operation, the operator inputs a save-trigger setting instruction for the peripheral control device 3 from the system console 6, and the central processing unit 2 applies this save-trigger setting instruction to the peripheral control device 3. When the failure event designated by the save-trigger setting instruction occurs, the peripheral control device 3 saves the failure analysis information in the save area 31 within the device itself and generates an asynchronous interrupt. When the operating system running on the central processing unit 2 detects the asynchronous interrupt generated by the peripheral control device 3, it starts the processing shown in the flowchart of FIG. 2.
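
The peripheral-side behavior described above (set a save trigger, then save and interrupt only when the designated failure event occurs) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class name, the method names, the event strings, and the use of a callback to stand in for the asynchronous interrupt are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class PeripheralControlUnit:
    """Toy model of peripheral control device 3 (all names are hypothetical)."""

    # Callback standing in for the asynchronous interrupt line to the host.
    raise_interrupt: Callable[["PeripheralControlUnit"], None]
    # Internal registers whose contents serve as failure analysis information.
    registers: dict = field(default_factory=dict)
    # Failure event designated by the save-trigger setting instruction.
    save_trigger: Optional[str] = None
    # Stands in for save area 31; empty until the trigger fires.
    save_area: Optional[dict] = None

    def set_save_trigger(self, event: str) -> None:
        """Accept the save-trigger setting instruction from the host."""
        self.save_trigger = event

    def on_failure_event(self, event: str) -> bool:
        """Save a snapshot and interrupt only for the designated event."""
        if event != self.save_trigger:
            return False
        self.save_area = {"event": event, "registers": dict(self.registers)}
        self.raise_interrupt(self)
        return True


# Usage: the host records interrupts in a list instead of a real handler.
interrupts = []
pcu = PeripheralControlUnit(raise_interrupt=interrupts.append)
pcu.registers["status"] = 0x80
pcu.set_save_trigger("parity_error")
pcu.on_failure_event("timeout")       # not the designated event: ignored
pcu.on_failure_event("parity_error")  # snapshot saved, interrupt raised
```

Copying the registers at the moment of the event keeps the snapshot stable even if the device keeps changing state before the host reads the save area, which is the property the text relies on when it avoids stopping the device for a maintenance tool.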

First, in step 200, the operating system issues a read instruction directing that the failure analysis information stored in the save area 31 of the peripheral control device 3 that generated the asynchronous interrupt be read, whereby the failure analysis information is read into the software area 12. Then, in step 201, the operating system issues a CPU instruction directing that the failure analysis information read into the software area 12 be stored in the error log file 51.

When the CPU instruction is issued, the central processing unit 2 proceeds as shown in the flowchart of FIG. 3. First, in step 300, it transfers the information designated by the CPU instruction, namely the failure analysis information stored in the software area 12, to the hardware-only area 11. In the next step 301, it instructs the service processor 4 to log the failure analysis information stored in the hardware-only area 11 to the error log file 51, and then ends the processing for the CPU instruction. When the log instruction is applied from the central processing unit 2, the service processor 4 performs the processing shown in the flowchart of FIG. 4: it reads the failure analysis information stored in the hardware-only area 11 of the main memory 1 and logs it to the error log file 51 (step 400).

After issuing the CPU instruction in step 201 of FIG. 2, the operating system resets the peripheral control device 3 in the next step 202, causes the peripheral control device 3 to resume processing, and then returns to normal processing.
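
The host-side sequence of steps 200 to 202, 300 to 301, and 400 can be sketched as below. This is a simplified model under stated assumptions: every class, method, and field name is hypothetical, the `running` flag is an invented stand-in for "processing resumed", and the service processor is called synchronously for brevity, whereas the text only says that it acts when it receives the log instruction.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MainMemory:                      # main memory 1
    hardware_only: Optional[dict] = None   # hardware-only area 11
    software: Optional[dict] = None        # software area 12


@dataclass
class Pcu:                             # peripheral control device 3
    save_area: Optional[dict] = None       # save area 31
    running: bool = False                  # assumed flag: processing resumed

    def reset_and_resume(self) -> None:
        self.running = True


@dataclass
class ServiceProcessor:                # service processor 4
    memory: MainMemory
    error_log: list = field(default_factory=list)   # error log file 51

    def log(self) -> None:
        # Step 400: read area 11 and append its contents to the error log.
        self.error_log.append(dict(self.memory.hardware_only))


@dataclass
class Cpu:                             # central processing unit 2
    memory: MainMemory
    svp: ServiceProcessor

    def log_instruction(self) -> None:
        # Step 300: move the data from software area 12 to area 11.
        self.memory.hardware_only = dict(self.memory.software)
        # Step 301: tell the service processor to log area 11.
        self.svp.log()


def handle_async_interrupt(pcu: Pcu, cpu: Cpu) -> None:
    """Operating-system handler corresponding to FIG. 2."""
    cpu.memory.software = dict(pcu.save_area)   # step 200: read save area 31
    cpu.log_instruction()                        # step 201: issue CPU instruction
    pcu.reset_and_resume()                       # step 202: reset and resume


# Usage with an already-saved snapshot in the peripheral's save area.
memory = MainMemory()
svp = ServiceProcessor(memory)
cpu = Cpu(memory, svp)
pcu = Pcu(save_area={"event": "parity_error", "status": 0x80})
handle_async_interrupt(pcu, cpu)
```

The two-stage copy mirrors the split in the text: software can reach only area 12, the service processor reads only area 11, and the CPU instruction is the bridge between them.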

[Effect of the invention]

As explained above, in the present invention a save trigger for failure analysis information is set in the peripheral control device from a host device such as a central processing unit. When the designated failure event occurs, the peripheral control device saves the failure analysis information in the save area provided within the device itself and generates an asynchronous interrupt to the host device. In response to the asynchronous interrupt, the failure analysis information saved in the save area is logged to the error log file and the processing of the peripheral control device is resumed. The invention therefore has the effect that failure analysis information can be collected easily and quickly without significantly affecting the system.

[Brief explanation of drawings]

FIG. 1 is a block diagram of an embodiment of the present invention, FIG. 2 is a flowchart showing an example of processing by the operating system, FIG. 3 is a flowchart showing an example of processing by the central processing unit, and FIG. 4 is a flowchart showing an example of processing by the service processor.

Claims

[Scope of Claims]

A failure analysis information collection method for a peripheral control device, in a system including a main storage device having a hardware-only area and a software area, a central processing unit having a function of transferring information in the software area to the hardware-only area, a service processor having a function of reading information in the hardware-only area, and a peripheral control device, characterized in that: the peripheral control device is provided with a function of saving failure analysis information in a save area within the device itself and generating an asynchronous interrupt when a failure event designated by a host device occurs; there are provided failure processing means which, when the peripheral control device generates the asynchronous interrupt, reads the failure analysis information saved in the save area of the peripheral control device into the software area of the main storage device, then instructs the peripheral control device to resume processing and outputs a log instruction for logging the failure analysis information stored in the main storage device, and an error log file into which information is written by the service processor; the central processing unit, when the log instruction is output from the failure processing means, transfers the failure analysis information stored in the software area to the hardware-only area and outputs a log instruction to the service processor; and the service processor, when the log instruction is output from the central processing unit, stores the failure analysis information stored in the hardware-only area in the error log file.