JPS58168163A

JPS58168163A - Machine check processing system

Info

Publication number: JPS58168163A
Application number: JP57051605A
Authority: JP
Inventors: Hisashi Ibe; 井辺　寿; Hideaki Ando; 秀明安藤
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1982-03-30
Filing date: 1982-03-30
Publication date: 1983-10-04

Abstract

PURPOSE: To process emergency machine-check interruptions efficiently by performing a logical operation between machine-check interruption code correction information and a machine-check interruption code, thereby forming a new machine-check interruption code. CONSTITUTION: When a service processor 3 patrolling a central processing unit 1 detects that the unit 1 has stopped, it collects a scan log for the unit 1 and analyzes the error. After the error analysis is complete, the service processor 3 produces a machine-check interruption code. The unit 1 performs a logical operation between the machine-check interruption code correction information from the service processor 3 and the machine-check interruption code, and stores the result in a main storage device 2 as a new machine-check interruption code.

Description

[Detailed Description of the Invention]

[Technical Field of the Invention]

The present invention relates to a machine check processing method for a computer system that, when an emergency machine check occurs, dumps environment state information into main memory and stores in main memory a machine check interrupt code consisting of error location information and valid bits indicating the validity of the environment state information. In this method, the central processing unit itself stores the environment state information and the machine check interrupt code in main memory.

[Prior art and problems]

Fig. 1 is a diagram showing an overview of a computer system to which the present invention is applied, Fig. 2 is a diagram showing the structure of the machine check interrupt code (hereinafter MCIC), and Fig. 3 is a diagram explaining the conventional emergency machine check processing method.

In Fig. 1, 1 denotes the central processing unit, 2 the main storage device, and 3 the service processor. When an urgent-report error or a severe error occurs in the central processing unit 1, the central processing unit 1 stops and waits for processing by the service processor 3. The service processor 3 logs the state of the central processing unit 1 by scan-out and analyzes the error. The analysis result is encoded as the MCIC and reported at the machine check interrupt. Environment state information, such as the contents of the general-purpose registers, floating-point registers, and control registers that were in use up to that point, is also stored in main memory 2 at the machine check interrupt, and its validity is indicated by the valid bits of the MCIC. At a machine check interrupt, information such as the old PSW and new PSW, the contents of the various registers, the failing storage address, the region code, the extended control code, and the MCIC is stored in main memory 2, and the PSW is swapped.

Fig. 2 shows the structure of the MCIC. Each bit has the following meaning.

SD: Indicates a parity error in the microprogram address register, or that an error was detected in a system control routine.

PD: Indicates that an error occurred in a non-retryable instruction region, or that a retry failed.

SR: Indicates an instruction buffer error, a TLB PE, a segment table origin error, successful recovery of KS8, or a successful reload after a service processor error.

TD: Indicates a 2-bit error at address 80, that is, an error in the interval timer.

CD: Indicates that an error occurred in the TOD clock, the CPU timer, or the clock comparator.

ED: Indicates that an error occurred in 0M mode, or that a service processor error occurred.

DG: Turned on when the TLB is disconnected.

W: Indicates a power supply abnormality.

B: Unused.

D: Indicates that the interrupt was held pending because the interrupt mask was off.

SE: Indicates that a 2-bit error occurred in main memory.

SC: Unused.

KE: Indicates that a 2-bit error occurred in key storage.

WP: Indicates that the EMWP of the old PSW was stored correctly.

MS: Indicates that the mask and key of the old PSW were stored correctly.

PM: Indicates that the program mask and condition code of the old PSW were stored correctly.

IA: Indicates that the instruction address of the old PSW was stored correctly.

FA: Indicates that the failing storage address was stored correctly.

RC: Stored when a service processor error occurs (model dependent).

EC: Indicates that the ED code was stored correctly.

FP: Indicates that the floating-point registers were logged out correctly.

GR: Indicates that the general-purpose registers were logged out correctly.

CR: Indicates that the control registers were logged out correctly.

LG: Unused.

ST: Turned on at recovery; indicates that the contents of main memory are guaranteed.

CT: Indicates that the CPU timer was logged out correctly.

CC: Indicates that the clock comparator was logged out correctly.

MCEL: Contains all zeros.
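The validity bits in the list above can be modelled as named flags. The following is a minimal sketch in Python, not the layout of Fig. 2: the bit positions are assigned arbitrarily, and the names `Mcic` and `valid_items` are hypothetical and introduced only for illustration.

```python
from enum import IntFlag, auto


class Mcic(IntFlag):
    """Validity bits named in the text. Bit positions are illustrative only;
    the actual layout is defined by Fig. 2, which is not reproduced here."""
    WP = auto()  # EMWP of the old PSW stored correctly
    MS = auto()  # mask and key of the old PSW stored correctly
    PM = auto()  # program mask and condition code stored correctly
    IA = auto()  # instruction address stored correctly
    FA = auto()  # failing storage address stored correctly
    EC = auto()  # ED code stored correctly
    FP = auto()  # floating-point registers logged out correctly
    GR = auto()  # general-purpose registers logged out correctly
    CR = auto()  # control registers logged out correctly
    CT = auto()  # CPU timer logged out correctly
    CC = auto()  # clock comparator logged out correctly


def valid_items(code: Mcic) -> list[str]:
    """Return the names of the validity bits that are on, in definition order."""
    return [flag.name for flag in Mcic if flag in code]
```

For instance, `valid_items(Mcic.GR | Mcic.CR)` lists the two register classes whose logged-out contents a machine check handler could trust.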

Note that the success or failure of a reload is reported by the region code together with an SR or ED machine check interrupt. A region code of X'00000001' with ED on indicates that the service processor cannot continue processing, and a region code of X'00010000' with SR on indicates the interrupt issued when a reload completes.
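The two region code cases described here can be expressed as a small decoder. This is a sketch under stated assumptions: the two hexadecimal values come from the text, while the function name `describe_reload`, the constant names, and the boolean `sr`/`ed` parameters are hypothetical.

```python
# Region code values given in the text.
SP_CANNOT_CONTINUE = 0x00000001  # reported together with ED on
RELOAD_COMPLETE = 0x00010000     # reported together with SR on


def describe_reload(region_code: int, sr: bool, ed: bool) -> str:
    """Interpret a region code in combination with the SR and ED bits."""
    if ed and region_code == SP_CANNOT_CONTINUE:
        return "service processor cannot continue"
    if sr and region_code == RELOAD_COMPLETE:
        return "reload complete"
    # Any other combination is not described in the text.
    return "unknown"
```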

In the prior art, the MCIC is generated by the service processor 3, which has analysis capabilities such as scan-out, and the various registers are also dumped to main memory 2 by the service processor 3. The valid bits of the MCIC are likewise set by the service processor 3 when the dump of the corresponding registers succeeds, and are stored in main memory 2.

Fig. 3 outlines the conventional machine check processing method. The service processor 3 patrols the central processing unit 1 and, on detecting that the central processing unit 1 has stopped, collects the scan log, performs error analysis, and generates the MCIC. It then turns on the valid bit for the central processing unit 1. Next, it reads the general-purpose registers of the central processing unit 1 and logs their contents out to main memory 2. If the logout succeeds, similar processing is carried out for the other registers, after which the MCIC is stored in main memory 2. After storing the MCIC in main memory 2, the service processor 3 starts the central processing unit. Once started, the central processing unit 1 swaps the PSW and executes the next instruction. As shown in Fig. 3, unless an abnormality occurs in the central processing unit 1, the service processor 3 only monitors the central processing unit, an activity called patrol.

Actual job processing is performed by the central processing unit 1. The service processor 3 is therefore generally implemented with low-speed, low-cost hardware. As a result, the conventional method has the drawback that register dumps and similar operations take a long time.

[Object of the invention]

The present invention eliminates the above drawback. Its object is to provide a machine check processing method in which, when an emergency machine check interrupt occurs, the dump of the various registers to main memory and the storing of the MCIC in main memory can be performed quickly and efficiently.

[Structure of the invention]

To that end, the machine check processing method of the present invention applies to a computer system having a central processing unit, a main storage device, and a monitoring device that patrols the central processing unit, and is characterized as follows. The monitoring device is configured so that, on detecting that the central processing unit has stopped, it scans out each part of the central processing unit, performs error analysis based on the scan-out results, creates a machine check interrupt code according to the error analysis results, creates machine check interrupt code correction information indicating which of the various registers to be logged out to main memory have contents that cannot be guaranteed, and then starts the central processing unit. The central processing unit is configured so that, when started, it logs out the registers to main memory, turns on the valid bits of the machine check interrupt code corresponding to the successfully logged-out data, performs a logical operation between the machine check interrupt code correction information and the machine check interrupt code to create a new machine check interrupt code, and then stores the new machine check interrupt code in a predetermined storage area of main memory.
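The monitoring device's share of this arrangement, building the correction information, might be sketched as follows. This is an illustration only: it assumes one bit per register class, and the bit positions in `VALID_BIT` and the function name `build_correction_info` are hypothetical rather than taken from the patent.

```python
# Hypothetical bit positions for three validity bits; the real layout is in Fig. 2.
VALID_BIT = {"FP": 0b100, "GR": 0b010, "CR": 0b001}


def build_correction_info(unguaranteed: set[str]) -> int:
    """Set a 1 at the validity-bit position of every register class
    whose contents the error analysis could not guarantee."""
    info = 0
    for name in unguaranteed:
        info |= VALID_BIT[name]
    return info
```

Because the correction information only marks what cannot be trusted, the slow monitoring device never has to read or write the registers themselves; that work is left to the central processing unit.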

[Embodiments of the invention]

The present invention is described below with reference to the drawings.

Fig. 4 shows an embodiment of the present invention.

The service processor 3 patrols the central processing unit 1 and, on detecting that the central processing unit 1 has stopped, collects the scan log for the central processing unit 1 and performs error analysis. After the error analysis, the service processor 3 generates the MCIC in advance. After generating the MCIC, the service processor 3 creates MCIC correction information. The MCIC correction information has bits corresponding to WP, MS, PM, IA, FA, RC, EC, FP, GR, CR, LG, and so on in Fig. 2, and the valid bit positions of registers whose contents cannot be guaranteed are set to logical "1". After generating the MCIC correction information, the service processor 3 enables the central processing unit 1 and starts it.

When started, the central processing unit 1 logs out the general-purpose registers, floating-point registers, control registers, and so on to main memory 2, and turns on the valid bits in the MCIC for the data that was logged out successfully. It then takes the AND-INVERSE of the MCIC and the MCIC correction information, treats the result as the new MCIC, and stores it in the designated storage area of main memory 2. Taking the AND-INVERSE resets the valid bits of log data whose contents cannot be guaranteed. After storing the MCIC in the designated storage area of main memory 2, the central processing unit 1 swaps the PSW and executes the next instruction. The AND-INVERSE logic (ANDI) is A ∧ ¬B → A.
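The AND-INVERSE step can be checked with a small worked example. The sketch below assumes a 64-bit code word and three hypothetical validity bits (FP, GR, CR, from high to low); neither the width nor the bit positions are stated in the text, and the name `and_inverse` is introduced here for illustration.

```python
def and_inverse(mcic: int, correction: int, width: int = 64) -> int:
    """Compute A AND (NOT B): clear every MCIC bit that is 1 in the
    correction information. The 64-bit default width is an assumption."""
    mask = (1 << width) - 1
    return mcic & ~correction & mask


# Worked example with three hypothetical validity bits: FP, GR, CR.
mcic = 0b111        # the CPU logged out FP, GR and CR successfully
correction = 0b010  # the service processor could not guarantee GR
new_mcic = and_inverse(mcic, correction)  # 0b101: FP and CR remain valid
```

A register class is therefore reported as valid only if the central processing unit logged it out successfully and the service processor did not flag its contents as unguaranteed.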

[Effect of the invention]

As is clear from the above description, the present invention allows the processing required when an emergency machine check interrupt occurs to be performed efficiently and in a short time.

[Brief Description of the Drawings]

Fig. 1 is a diagram showing an overview of a computer system to which the present invention is applied, Fig. 2 is a diagram showing the structure of the machine check interrupt code (MCIC), Fig. 3 is a diagram explaining the conventional emergency machine check processing method, and Fig. 4 is a diagram showing an embodiment of the present invention.

1: central processing unit, 2: main storage device, 3: service processor.

Patent applicant: Fujitsu Limited. Agent: Patent Attorney Yotsube Kyotani (京谷 四部).

Claims

[Claims]

A machine check processing method for a computer system having a central processing unit, a main memory, and a monitoring device that patrols the central processing unit, characterized in that: the monitoring device is configured so that, on detecting that the central processing unit has stopped, it scans out each part of the central processing unit, performs error analysis based on the scan-out results, creates a machine check interrupt code according to the error analysis results, creates machine check interrupt code correction information indicating which of the various registers to be logged out to the main memory have contents that cannot be guaranteed, and then starts the central processing unit; and the central processing unit is configured so that, when started, it logs out the registers to the main memory, turns on the valid bits of the machine check interrupt code corresponding to the successfully logged-out data, performs a logical operation between the machine check interrupt code correction information and the machine check interrupt code to create a new machine check interrupt code, and then stores the new machine check interrupt code in a predetermined storage area of the main memory.