JP2723073B2

JP2723073B2 - Computer system failure log information acquisition method

Info

Publication number: JP2723073B2
Application number: JP7086529A
Authority: JP
Inventors: 辰也高田
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1995-03-17
Filing date: 1995-03-17
Publication date: 1998-03-09
Anticipated expiration: 2013-03-09
Also published as: JPH08263329A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a failure log information output method that outputs failure log information when a failure occurs in a computer system, and more particularly to a failure log information output method for a computer system in which the CPU and other components are multiplexed to strengthen fault tolerance.

[0002]

[Prior Art] When a failure occurs during operation and a computer system can no longer operate normally, the system creates failure log information about that failure. By examining this failure log information, the user can analyze the cause of the failure and take corrective action.

[0003] In systems that demand highly reliable processing results, such as control systems for spacecraft and artificial satellites or financial systems, computer systems have been realized in which a central processing unit comprising a CPU and other components is multiplexed three or more ways, each unit executes the same processing, the processing results of the central processing units are compared, and an appropriate result is adopted by majority decision. Because such a computer system adopts a processing result only when two or more CPUs (or a majority of them) output the same result, the reliability of the processing result is very high. In addition, even if one CPU fails, the other two or more CPUs continue processing, which improves tolerance to failures. In such a computer system as well, when a given CPU can no longer operate normally because of a failure, failure log information is created for that CPU.

[0004] In this type of computer system, when a failure occurs in a given CPU, that CPU is disconnected from the multiplexed central processing units. As a result, the collected failure log information was not sent to the log file and remained stored in the memory of the failed central processing unit. To obtain the failure log information, it was therefore conventionally necessary either to connect a debugging terminal or similar device and read out the failure log information stored in the memory of the central processing unit, or to wait until the other multiplexed central processing units had finished their normal operation and then access the failed central processing unit, obtain the failure log information, and store it in the log file.

[0005] Computer systems have also been realized in which the central processing units are multiplexed by providing a main central processing unit and a spare central processing unit, so that fault tolerance is improved by having the spare unit continue processing even if the main unit fails. In such a computer system as well, when the main central processing unit is disconnected from the system because of a failure, the failure log information remains stored in the memory of the failed central processing unit, just as in the system described above. Obtaining the failure log information therefore required a debugging terminal or required waiting for the spare central processing unit to finish processing.

[0006]

[Problems to Be Solved by the Invention] In the conventional failure log information acquisition method described above, when a failure occurs in the CPU of one of the multiplexed central processing units, the central processing unit containing that CPU is disconnected from the system, so the collected failure log information could not be stored in the log file. This had the drawback that failure analysis and maintenance could not be carried out promptly.

[0007] Furthermore, in a computer system that compares the processing results of a plurality of central processing units and adopts an appropriate result by majority decision, when failures occur in several central processing units and only one central processing unit is still operating normally, a majority decision is no longer possible, so the operation of the system itself is stopped immediately. In such a case there was the drawback that the failure log information could not be obtained.

[0008] An object of the present invention is to eliminate the above drawbacks and provide a failure log information acquisition method for a computer system that promptly obtains failure log information from a central processing unit that has failed and been disconnected from the system, thereby contributing to failure analysis and system maintenance. A further object is to provide a failure log information acquisition method for a computer system that, in a computer system which compares the processing results of a plurality of central processing units and adopts an appropriate result by majority decision, obtains the failure log information even when only one central processing unit is operating normally and the operation of the system itself has been stopped, thereby contributing to prompt recovery of the system.

[0009]

[Means for Solving the Problems] To achieve the above objects, the present invention provides, in a computer system comprising central processing units that are multiplexed three or more ways and execute the same processing, and a majority comparison unit that compares the processing results of the central processing units and adopts an appropriate result by majority decision: a diagnostic processor provided in each central processing unit that, when a failure occurs in its central processing unit, transfers failure log information about the failure to the other central processing units, and that receives failure log information sent from a failed central processing unit and outputs it as output data; and a communication bus that interconnects the diagnostic processors provided in the central processing units.

[0010] In another aspect, the majority comparison unit comprises means for executing a logout mode in which, when failures have occurred in two or more central processing units and only one central processing unit is operating normally, it accepts only the failure log information output from that normally operating central processing unit.

[0011] In another aspect, the diagnostic processor comprises failure detection means for detecting a failure that has occurred in the central processing unit, log collection means for acquiring failure log information about the failure, inter-diagnostic-processor transfer means for transferring the acquired failure log information to the diagnostic processors of the other central processing units via the communication bus, and IOP transfer means for outputting failure log information received from the diagnostic processor of another central processing unit to the majority comparison unit.

[0012] Another failure log information acquisition method that achieves the above objects provides, in a computer system comprising multiplexed central processing units and means for disconnecting a failed central processing unit from the system: a diagnostic processor provided in each central processing unit that, when a failure occurs in its central processing unit, transfers failure log information about the failure to another central processing unit, and that receives failure log information sent from a failed central processing unit and outputs it as output data; and a communication bus that interconnects the diagnostic processors provided in the central processing units.

[0013]

[Operation] According to the present invention, the diagnostic processors provided in the CPU sets and interconnected by the communication bus collect the failure log information of a failed CPU set and output it from a normal CPU set, so the failure log information can be obtained promptly. Furthermore, in a computer system that compares the processing results of a plurality of central processing units and adopts an appropriate result by majority decision, when only one normal central processing unit remains, the system shifts to a logout mode that accepts only the failure log information output from that normal central processing unit and outputs the failure log information. The failure log information can therefore be obtained promptly even when the operation of the system itself is stopped because only one normal central processing unit remains.

[0014]

[Embodiments] Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of a computer system that implements a failure log information acquisition method according to one embodiment of the present invention. The computer system of this embodiment compares the processing results of a plurality of central processing units and adopts an appropriate result by majority decision; the description covers the case in which the central processing unit is triplicated.

[0015] As shown in the figure, the computer system of this embodiment comprises CPU sets 10, 20, and 30, which are the central processing units; a majority comparison unit 50, which receives and compares the execution results of the CPU sets 10, 20, and 30; and a file device 60, which includes a log file 61 for storing log information.

[0016] As is clear from the figure, the CPU sets 10, 20, and 30 have the same configuration and perform the same processing. The central processing unit of the computer system of this embodiment is thereby triplicated, which improves fault tolerance. The CPU sets 10, 20, and 30 respectively comprise CPUs 11, 21, and 31; memories 12, 22, and 32; IOPs (Input/Output Processors) 13, 23, and 33 for data input and output; and diagnostic processors (Diagnosis Processors, hereinafter referred to as DGPs where appropriate) 14, 24, and 34, which diagnose the state of the CPU sets 10, 20, and 30.

[0017] As shown in FIG. 2, each of the diagnostic processors 14, 24, and 34 comprises a failure detection unit 41, which detects a failure that has occurred in its CPU set 10, 20, or 30; a log collection unit 42, which acquires failure log information about the failure; an inter-diagnostic-processor transfer unit 43 (hereinafter, inter-DGP transfer unit 43), which transfers the acquired failure log information to the other diagnostic processors via a communication bus 80; and an IOP transfer unit 44, which outputs received failure log information to the majority comparison unit 50 via the IOP 13, 23, or 33 of its CPU set 10, 20, or 30. As shown in FIG. 1, the diagnostic processors 14, 24, and 34 are interconnected via the communication bus 80.

[0018] The majority comparison unit 50 comprises a data comparison unit 51, which compares the contents of the data 70 output from the CPU sets 10, 20, and 30, and a file storage unit 52, which stores failure log information in the log file 61 of the file device 60 when the input data 70 is failure log information. As shown in FIG. 1, the data 70 consists of a tag 71, which indicates the type of the data, and a data body 72.

[0019] In normal operation, the data comparison unit 51 compares the data 70 input from the CPU sets 10, 20, and 30. When data 70 with the same contents is input from two or more of the CPU sets 10, 20, and 30, it adopts data 70 with those contents as the processing result of the CPU sets 10, 20, and 30. When the contents of the data 70 input from the CPU sets 10, 20, and 30 all differ, or when failures have occurred in any two of the CPU sets 10, 20, and 30 so that only one normal processing result is obtained, a decision by majority is not possible, and the system of this embodiment is itself stopped. As described later, however, this does not apply when the system operates in logout mode and outputs failure log information.
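The voting rule in paragraph [0019] can be illustrated with a minimal Python sketch. It is only an illustration for the triplicated case: the function name `majority_vote`, the use of `bytes` for the data contents, and the use of `None` for a CPU set that produced no output are assumptions made here and do not come from the patent.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(results: Sequence[Optional[bytes]]) -> Optional[bytes]:
    """Return the value output by two or more CPU sets, or None if there is none.

    A None entry stands for a CPU set that produced no output (assumption).
    """
    counts = Counter(r for r in results if r is not None)
    if not counts:
        return None
    value, count = counts.most_common(1)[0]
    # Paragraph [0019]: a result is adopted only when two or more inputs agree.
    return value if count >= 2 else None
```

A return value of `None` corresponds to the situations in which the paragraph says no majority decision is possible: all three inputs differ, or only one normal result is available.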

[0020] When the comparison by the data comparison unit 51 shows that the contents of the data 70 are identical, and reference to the tag 71 confirms that the data 70 is not failure log information, the file storage unit 52 stores the data 70 in a predetermined area of the file device 60.
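Paragraphs [0018] and [0020] together describe data made of a tag and a body, and a file storage unit that routes the data according to the tag. The following Python sketch shows one way this could look. The class names `Tag`, `Data`, and `FileDevice`, the function `store`, and the two tag values are hypothetical; the patent does not specify how the tag is encoded or how the file device is organized.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Tag(Enum):
    RESULT = "result"            # ordinary execution result (assumed tag value)
    FAILURE_LOG = "failure_log"  # failure log information (assumed tag value)


@dataclass(frozen=True)
class Data:
    tag: Tag     # plays the role of tag 71
    body: bytes  # plays the role of data body 72


@dataclass
class FileDevice:
    log_file: List[bytes] = field(default_factory=list)   # stands in for log file 61
    data_area: List[bytes] = field(default_factory=list)  # the "predetermined area"


def store(device: FileDevice, data: Data) -> None:
    """Route data by its tag, in the way file storage unit 52 is described."""
    if data.tag is Tag.FAILURE_LOG:
        device.log_file.append(data.body)
    else:
        device.data_area.append(data.body)
```

Only the data body is stored in this sketch, which matches the statement in paragraph [0027] that the data body 72 is what goes into the log file; storing only the body for ordinary results is an assumption.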

[0021] The operation of the computer system configured as described above will now be described. Normally, the CPU sets 10, 20, and 30 execute the same processing while remaining synchronized at all times. The execution result output by each of the CPU sets 10, 20, and 30 is output to the majority comparison unit 50 via the IOPs 13, 23, and 33. At this point, a tag 71 indicating the type of the data is added in front of the data body 72, which is the execution result, and the two are output to the majority comparison unit 50 as data 70.

[0022] The data 70 output from each of the CPU sets 10, 20, and 30 is input to the majority comparison unit 50. The data comparison unit 51 then compares the data 70 and determines whether their contents are identical. When the data comparison unit 51 confirms that the contents of the data 70 are identical and that the data body 72 is data other than failure log information, the file storage unit 52 stores the data in a predetermined area of the file device 60.

[0023] Next, the failure log acquisition operation of this embodiment will be described. First, the acquisition of failure log information when a failure occurs in one of the triplicated CPU sets 10, 20, and 30 is described with reference to the block diagram of FIG. 3 and the flowchart of FIG. 4.

[0024] Suppose that a failure occurs in the CPU set 10 for some reason. The failure detection unit 41 of the diagnostic processor 14 of the CPU set 10 detects the failure, and the log collection unit 42 collects the failure log information (steps 401 and 402). Next, the inter-DGP transfer unit 43 of the diagnostic processor 14 transfers the collected failure log information, as shown by the arrows in FIG. 3, via the communication bus 80 to the diagnostic processors 24 and 34 of the CPU sets 20 and 30 (step 403).

[0025] The failure log information transferred from the CPU set 10 to the CPU sets 20 and 30 is received by the inter-DGP transfer units 43 of the diagnostic processors 24 and 34. When the failure log information has been received, the IOP transfer units 44 of the diagnostic processors 24 and 34 add a tag 71 indicating that the data body 72 is failure log information and output the data to the majority comparison unit 50 via the IOPs 23 and 33 (step 404).
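Steps 401 to 404 can be pictured with a small Python simulation. This is a sketch under stated assumptions, not the patented implementation: the classes `CommunicationBus` and `DiagnosticProcessor`, the log text format, the string tag `"failure_log"`, and the choice to skip delivery to diagnostic processors whose own CPU set has already failed are all hypothetical.

```python
from typing import Dict, List, Tuple


class CommunicationBus:
    """Stand-in for communication bus 80: delivers a log to the other attached DGPs."""

    def __init__(self) -> None:
        self.processors: Dict[int, "DiagnosticProcessor"] = {}

    def attach(self, dgp: "DiagnosticProcessor") -> None:
        self.processors[dgp.cpu_set_id] = dgp

    def broadcast(self, sender_id: int, log: bytes) -> None:
        for cpu_set_id, dgp in self.processors.items():
            # Assumption: a DGP whose CPU set has failed does not relay logs.
            if cpu_set_id != sender_id and not dgp.failed:
                dgp.receive(log)


class DiagnosticProcessor:
    def __init__(self, cpu_set_id: int, bus: CommunicationBus) -> None:
        self.cpu_set_id = cpu_set_id
        self.failed = False
        self.bus = bus
        # Data handed to the IOP for output to the majority comparison unit.
        self.iop_output: List[Tuple[str, bytes]] = []
        bus.attach(self)

    def on_failure(self, detail: str) -> None:
        # Steps 401 and 402: detect the failure and collect the failure log.
        self.failed = True
        log = f"cpu_set={self.cpu_set_id} failure={detail}".encode()
        # Step 403: transfer the log to the other diagnostic processors.
        self.bus.broadcast(self.cpu_set_id, log)

    def receive(self, log: bytes) -> None:
        # Step 404: tag the body as failure log information and output it.
        self.iop_output.append(("failure_log", log))
```

With three processors attached, calling `on_failure` on the processor for CPU set 10 leaves the same tagged log in the output queues of the processors for CPU sets 20 and 30, which is what lets the majority comparison unit see two matching inputs.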

[0026] When the data 70 output from the CPU sets 20 and 30 is received by the data comparison unit 51 of the majority comparison unit 50, the majority comparison unit 50 waits to receive the corresponding data 70 from the CPU set 10. Because of the failure in the CPU set 10, however, no data 70 is output from the CPU set 10. The data comparison unit 51 therefore disconnects the CPU set 10 from the computer system of this embodiment at a predetermined timing, for example when a predetermined period of time has elapsed (steps 405 and 406). The data comparison unit 51 then transfers the data 70 received from the CPU sets 20 and 30 to the file storage unit 52 (step 407).
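The wait, disconnect, and forward sequence of steps 405 to 407 can be sketched as a pure function in Python. The function name `settle_inputs`, its parameters, and the use of an elapsed-time value compared against a timeout are assumptions; the patent says only that disconnection happens at a predetermined timing, such as after a predetermined time has elapsed.

```python
from typing import Dict, Optional, Set, Tuple


def settle_inputs(
    received: Dict[int, bytes],
    active: Set[int],
    elapsed: float,
    timeout: float,
) -> Tuple[Set[int], Optional[Dict[int, bytes]]]:
    """Decide what to do with one round of inputs.

    Returns the set of active CPU sets after the check, plus the inputs to
    forward to the file storage unit, or None if the unit should keep waiting.
    """
    missing = active - set(received)
    if not missing:
        return active, dict(received)  # every active CPU set has answered
    if elapsed < timeout:
        return active, None            # keep waiting (step 405)
    remaining = active - missing       # disconnect the silent CPU sets (step 406)
    forwarded = {k: v for k, v in received.items() if k in remaining}
    return remaining, forwarded        # forward the data (step 407)
```

Keeping the decision free of real clocks makes the behavior easy to check: the caller supplies the elapsed time and applies the returned active set before the next round.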

[0027] When the file storage unit 52 confirms from the tag 71 added to the received data 70 that the data is failure log information, it stores the data body 72, which is the failure log information, in the log file 61 of the file device 60 (steps 408 and 409). In this way, the failure log information is obtained when a failure occurs in one of the triplicated CPU sets 10, 20, and 30.

[0028] In the flowchart of FIG. 4, processing ends when there is input from the CPU set 10 after the majority comparison unit 50 has received data from the CPU sets 20 and 30 (step 405), and when the data is judged not to be failure log information after the CPU set 10 has been disconnected and the data has been transferred to the file storage unit 52. This is because the flowchart focuses on the operation of acquiring failure log information. In practice, the system naturally moves on to the appropriate processing in each of these cases.

[0029] Next, the acquisition of failure log information when failures occur in two of the triplicated CPU sets 10, 20, and 30 is described with reference to the block diagram of FIG. 5 and the flowchart of FIG. 6.

[0030] Suppose that, while the CPU set 10 is in the failed state described above, a failure also occurs in the CPU set 20. As with the diagnostic processor 14 of the CPU set 10, the failure detection unit 41 of the diagnostic processor 24 of the CPU set 20 detects the failure in the CPU set 20, and the log collection unit 42 collects the failure log information (steps 601 and 602). Next, the inter-DGP transfer unit 43 of the diagnostic processor 24 transfers the collected failure log information to the diagnostic processor 34 of the CPU set 30 via the communication bus 80 (step 603).

[0031] The failure log information transferred from the CPU set 20 is received by the inter-DGP transfer unit 43 of the diagnostic processor 34 of the CPU set 30. When the failure log information has been received, the IOP transfer unit 44 of the diagnostic processor 34 adds a tag 71 indicating that the data body 72 is failure log information and outputs the data to the majority comparison unit 50 via the IOP 33 (step 604).

[0032] When the data 70 output from the CPU set 30 is received by the data comparison unit 51 of the majority comparison unit 50, the majority comparison unit 50 waits to receive the corresponding data 70 from the CPU sets 10 and 20. Because of the failures in the CPU sets 10 and 20, however, no data 70 is output from them. The data comparison unit 51 therefore disconnects the CPU sets 10 and 20 from the computer system of this embodiment at a predetermined timing (steps 605 and 606). From then on, the majority comparison unit 50 shifts to a logout mode in which it accepts only the failure log information output from the CPU set 30 (step 606). On confirming that it is in logout mode, the data comparison unit 51 transfers the data 70 received from the CPU set 30 to the file storage unit 52 without performing majority processing (step 607).
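The logout mode of paragraph [0032] can be sketched in Python as a small state machine. Everything named here is hypothetical: the class `MajorityComparisonUnit`, its methods, the `(tag, body)` tuple layout, and the rule that logout mode is entered as soon as exactly one CPU set remains active are illustrative assumptions rather than details given in the patent.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Data = Tuple[str, bytes]  # (tag, body); tag is "failure_log" or "result"


class MajorityComparisonUnit:
    def __init__(self, cpu_sets: List[int]) -> None:
        self.active = set(cpu_sets)
        self.logout_mode = False
        self.log_file: List[bytes] = []

    def disconnect(self, cpu_set_id: int) -> None:
        self.active.discard(cpu_set_id)
        # With a single CPU set left, voting is impossible: enter logout mode.
        if len(self.active) == 1:
            self.logout_mode = True

    def accept(self, inputs: Dict[int, Data]) -> Optional[Data]:
        if self.logout_mode:
            (survivor,) = self.active
            data = inputs.get(survivor)
            # Only failure log information is accepted, and no vote is taken.
            if data is not None and data[0] == "failure_log":
                self.log_file.append(data[1])
                return data
            return None
        # Normal mode: adopt data that two or more active CPU sets agree on.
        votes = Counter(d for k, d in inputs.items() if k in self.active)
        if not votes:
            return None
        data, count = votes.most_common(1)[0]
        if count < 2:
            return None
        if data[0] == "failure_log":
            self.log_file.append(data[1])
        return data
```

In this sketch an ordinary result arriving in logout mode is rejected, which reflects the statement that only failure log information is accepted once a single normal CPU set remains.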

[0033] When the file storage unit 52 confirms from the tag 71 added to the received data 70 that the data is failure log information, it stores the data body 72, which is the failure log information, in the log file 61 of the file device 60 (steps 608 and 609). In this way, the failure log information is obtained when failures occur in two of the triplicated CPU sets 10, 20, and 30.

[0034] In the flowchart of FIG. 6, processing ends when there is input from the CPU set 10 or 20 after the majority comparison unit 50 has received data from the CPU set 30 (step 605), and when the data is judged not to be failure log information after the CPU set 10 has been disconnected and the data has been transferred to the file storage unit 52. This is because the flowchart focuses on the operation of acquiring failure log information. In practice, the system naturally moves on to the appropriate processing in each of these cases.

[0035] The present invention has been described above with reference to a preferred embodiment, but it is not necessarily limited to that embodiment. For example, although this embodiment describes a computer system in which the central processing unit is triplicated, the invention can be used in the same way in a computer system multiplexed four or more ways.

[0036] This embodiment has been described for a computer system that compares the processing results of a plurality of central processing units and adopts an appropriate result by majority decision, but the invention can be used in the same way in a computer system whose central processing units are multiplexed by providing a main central processing unit and a spare central processing unit. In that case the central processing unit may be duplicated rather than triplicated, but even then the only consequence is that the majority comparison means and related components become unnecessary; the failure log information acquisition method that characterizes the invention is not affected in any way.

[0037]

[Effects of the Invention] As described above, the present invention eliminates the above drawbacks and makes it possible to obtain failure log information promptly from a central processing unit that has failed and been disconnected from the system, which improves the efficiency of failure analysis and system maintenance. Furthermore, in a computer system that compares the processing results of a plurality of central processing units and adopts an appropriate result by majority decision, the failure log information can be obtained even when only one central processing unit is operating normally and the operation of the system itself has been stopped, so the system can be restored promptly.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a computer system that implements a failure log information acquisition method according to one embodiment of the present invention.

FIG. 2 is a block diagram showing the configuration of a diagnostic processor.

FIG. 3 is a block diagram showing the state of the computer system when a failure has occurred in one CPU set.

FIG. 4 is a flowchart showing the operation of acquiring failure log information when a failure has occurred in one CPU set.

FIG. 5 is a block diagram showing the state of the computer system when failures have occurred in two CPU sets.

FIG. 6 is a flowchart showing the operation of acquiring failure log information when failures have occurred in two CPU sets.

[Explanation of symbols]

10, 20, 30: CPU set
11, 21, 31: CPU
12, 22, 32: Memory
13, 23, 33: IOP
14, 24, 34: Diagnostic processor
41: Failure detection unit
42: Log collection unit
43: Inter-DGP transfer unit
44: IOP transfer unit
50: Majority comparison unit
51: Data comparison unit
52: File storage unit
60: File device
61: Log file
70: Data
71: Tag
72: Data body
80: Communication bus

Claims

(57) [Claims]

1. A failure log information acquisition method for a computer system comprising central processing units that are multiplexed three or more ways and execute the same processing, and a majority comparison unit that compares the processing results of the central processing units and adopts an appropriate processing result by majority decision, the method comprising: a diagnostic processor provided in each central processing unit that, when a failure occurs in that central processing unit, transfers failure log information relating to the failure to all of the other central processing units, and that receives failure log information sent from another central processing unit in which a failure has occurred and outputs it to the majority comparison unit; and a dedicated communication bus that interconnects the diagnostic processors provided in the central processing units and carries the failure log information, wherein the majority comparison unit comprises means for storing the failure log information output from the diagnostic processor.

2. The failure log information acquisition method for a computer system according to claim 1, wherein the majority comparison unit comprises means that, when failures have occurred in two or more central processing units and only one central processing unit is operating normally, and failure log information of another failed central processing unit is received from the normally operating central processing unit, confirms that there is no data input from the failed central processing units, disconnects those central processing units, and shifts to a logout mode that, without performing majority decision processing, accepts only the failure log information of the other failed central processing units output from the diagnostic processor of the normally operating central processing unit.

3. The failure log information acquisition method for a computer system according to claim 1 or 2, wherein the diagnostic processor comprises: failure detection means for detecting a failure that has occurred in its own central processing unit; log collection means for acquiring failure log information relating to the failure; inter-diagnostic-processor communication means for transferring the acquired failure log information to the diagnostic processor of another central processing unit via the communication bus; and transfer means for outputting failure log information received from the diagnostic processor of another central processing unit to the majority comparison unit.

4. A failure log information acquisition method for a computer system comprising multiplexed central processing units and means for disconnecting a failed central processing unit from the system, the method comprising: a diagnostic processor provided in each central processing unit that, when a failure occurs in that central processing unit, transfers failure log information relating to the failure to another central processing unit, and that receives failure log information sent from another central processing unit in which a failure has occurred and outputs it to a log file; and a communication bus that interconnects the diagnostic processors provided in the central processing units and carries the failure log information.

5. The failure log information acquisition method for a computer system according to claim 4, wherein the diagnostic processor comprises: failure detection means for detecting a failure that has occurred in its own central processing unit; log collection means for acquiring failure log information relating to the failure; inter-diagnostic-processor communication means for transferring the acquired failure log information to the diagnostic processor of another central processing unit via the communication bus; and transfer means for outputting failure log information received from the diagnostic processor of another central processing unit to the log file.