JP2003022222A

JP2003022222A - Information processor and its maintenance method

Info

Publication number: JP2003022222A
Application number: JP2001205571A
Authority: JP
Inventors: Yuji Fujiwara; 勇治藤原
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2001-07-06
Filing date: 2001-07-06
Publication date: 2003-01-24

Abstract

PROBLEM TO BE SOLVED: To provide an information processor capable of identifying the device in which an error is detected on a PCI bus, and of improving the maintainability associated with error processing. SOLUTION: In this information processor having a plurality of devices interconnected through buses 60 and 70, each device is provided with a detecting means 230 for detecting an error during execution of a bus transaction, and recording means 170 and 200 for recording, according to the detection result of the detecting means, the kind of the error and information on the device that detected the error as an error event in a recording medium 240.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to an information processing apparatus, and more particularly to error logging processing that strengthens the error logging function used when an error occurs within the information processing apparatus and thereby improves maintainability.

[0002]

[Description of the Related Art] Recent PC servers and workstations implement, as the system bus on the motherboard, a high-speed synchronous bus called PCI (Peripheral Component Interconnect) with a 32-bit data bus width. Devices on the PCI bus, including devices connected through expansion slots on the motherboard, are required to handle events that occur on the PCI bus. Parity lines are provided on the PCI bus to detect whether an error has occurred in the information transmitted on its address/data signals and control signals. In particular, a PCI device is required to issue a PERR (Parity Error) signal on the PCI bus when it detects an address parity error or a data parity error on the bus. A PCI device also outputs a SERR (System Error) signal on the PCI bus when it detects a fatal system error on the bus. When a PERR or SERR signal was detected on the PCI bus, the device managing the PCI bus generated a non-maskable interrupt (hereinafter, NMI) to the CPU and stored only the fact that an error had occurred in a nonvolatile memory called the SEL. That is, when a PERR signal was detected on the PCI bus, only data indicating that a PERR signal had occurred was stored in the nonvolatile memory. Likewise, when a SERR signal was detected on the PCI bus, only data indicating that a SERR signal had occurred was stored in the nonvolatile memory. Maintenance personnel later read out the values stored in the nonvolatile memory and analyzed in detail which error had occurred and how.

[0003] Japanese Laid-Open Patent Publication No. 11-16420 discloses a technique for logging memory parity errors and unrecoverable errors that occur on the system bus. In this prior art, when an error is detected, a system management interrupt (hereinafter, SMI) signal is sent to the CPU. The error processing routine started by the SMI signal reads data from a latch circuit that holds the CPU address value at the time the error occurred, and stores the error address in a nonvolatile memory. Maintenance personnel later read out the values stored in the nonvolatile memory and analyzed in detail which error had occurred and how.

[0004] However, in the prior art, when an error occurs on the PCI bus, only the fact that an error occurred is stored in the nonvolatile memory. The error logging process of Japanese Laid-Open Patent Publication No. 11-16420, for its part, stores a CPU address value in the nonvolatile memory. It was therefore difficult for maintenance personnel to identify which device had caused or detected the error. When the location of the error could not be identified, the entire motherboard, or every expansion board installed in the expansion slots, had to be replaced.

[0005]

[Problem to Be Solved by the Invention] In the prior art described above, little more than the fact that an error had occurred on the system bus could be stored in the nonvolatile memory. It was therefore not possible to tell which device had caused the error, and the cause of the error could not be tracked down.

[0006] The present invention has been made to solve the above problem. An object of the information processing apparatus of the present invention is to identify the device that detected an error and thereby improve the maintainability associated with error processing.

[0007]

[Means for Solving the Problem] The present invention is an information processing apparatus having a plurality of devices connected via a bus, characterized in that each device comprises detecting means for detecting an error during a bus transaction, and in that the apparatus further comprises recording means for recording on a recording medium, as an error event and in accordance with the detection result of the detecting means, the type of the error together with information on the device that detected the error.

[0008] With this configuration, the device that detected the error can be identified, and the maintainability associated with error processing can be improved.

[0009] The present invention is also characterized in that, in an information processing apparatus having a plurality of devices connected via a bus and a recording medium on which the type of an error and information on the device that detected the error are recorded as an error event, it comprises a step of reading the error event recorded on the recording medium, and a step of identifying the device that detected the error in accordance with the result read in the reading step.

[0010] With this configuration, the device that detected the error can be identified, and the maintainability associated with error processing can be improved.

[0011] Furthermore, the present invention is an information processing apparatus having a plurality of devices connected via a bus, characterized by comprising a recording medium on which the type of an error and information on the device that detected the error are recorded as an error event, means for reading the error event recorded on the recording medium, and means for identifying the device that detected the error in accordance with the result read by the reading means.

[0012] With this configuration, the device that detected the error can be identified, and the maintainability associated with error processing can be improved.

[0013]

[Embodiment of the Invention] An embodiment of the present invention is described below with reference to the drawings.

[0014] FIG. 1 shows the configuration of an information processing apparatus according to an embodiment of the present invention. This information processing apparatus is a workstation-type or server-type computer system. On its motherboard are wired a host bus 30, PCI buses 60 and 70, a memory bus 80, a system management bus (hereinafter, SM bus) 140, an Industry Standard Architecture bus (hereinafter, ISA bus) 150, and an inter-integrated circuit bus (hereinafter, I2C bus) 250.

[0015] The computer main body contains CPUs 10 and 20, a CPU-PCI bridge device (hereinafter, north bridge) 40, a main memory 50, a display device 90, a RAID device 100, a LAN device 110, a CardBus device 120, a PCI-ISA bridge device (hereinafter, south bridge) 130, a GA 160 that integrates various input/output peripheral devices, a BIOS-ROM 170, a keyboard device (hereinafter, KB) 180, a floppy disk drive (hereinafter, FDD) 190, a motherboard management controller (hereinafter, BMC) 200, a field replaceable unit (hereinafter, FRU) 210, a sensor data storage device (hereinafter, SDR) 220, a sensor 230, a system event log (hereinafter, SEL) 240, and so on.

[0016] Next, the function and configuration of each component provided in the computer main body of FIG. 1 are described.

[0017] The CPUs 10 and 20 are implemented by, for example, microprocessors manufactured and sold by Intel Corporation of the United States. The host bus 30, connected directly to the input/output pins of the CPUs 10 and 20, has a 64-bit-wide data bus with a bandwidth of 133 MHz. The CPUs 10 and 20 also each have a pin for receiving the SMI signal.

[0018] The main memory 50 is a memory device that stores the operating system, device drivers, application programs to be executed, processing data, and the like, and is composed of a plurality of dual in-line memory modules (hereinafter, DIMMs). The main memory 50 consists of system memory mounted on the motherboard in advance and expansion memory installed by the user as needed. The DIMMs making up the system memory and the expansion memory are high-speed memories, such as synchronous DRAM or Rambus, that require a memory clock to be supplied to each bank.

[0019] The main memory 50 is connected to the north bridge 40 via a dedicated memory bus 80 that has a 64-bit-wide data bus with a bandwidth of 133 MHz. The data bus of the host bus 30 can also be used as the data bus of the memory bus 80. In that case, the memory bus 80 consists of an address bus and various memory control signal lines.

[0020] The north bridge 40 is a bridge LSI that connects the host bus 30 to the two PCI buses 60 and 70, and functions as one of the bus masters of the PCI buses 60 and 70. The north bridge 40 includes a bus arbitration circuit for the devices connected to the PCI buses 60 and 70, a function for bidirectionally converting bus cycles, including data and addresses, between the host bus 30 and the PCI buses 60 and 70, and a function for controlling access to the main memory 50 via the memory bus 80.

[0021] The PCI buses 60 and 70 are clock-synchronous input/output buses, and every bus cycle on them is performed in synchronization with the bus clock. The maximum PCI bus clock frequency is 33 MHz. The PCI buses 60 and 70 each have an address/data bus that is used in a time-division manner. This address/data bus is 32 bits wide.

[0022] According to the PCI specification Rev. 2.2, issued December 18, 1998, a data transfer cycle between PCI devices (an initiator and a target) on the PCI buses 60 and 70 consists of an address phase followed by one or more data phases. In the address phase, the address and the transfer type are output. In the data phase, 8, 16, 24, or 32 bits of data are output.

[0023] The display device 90, connected to the PCI bus 60, has built-in video memory (hereinafter, VRAM). It stores image data expanded in the main memory 50 into the VRAM and displays the image data on an LCD (not shown) or an external CRT display.

[0024] The RAID device 100, connected to the PCI bus 60, contains a RAID controller that controls a plurality of hard disk drives (hereinafter, HDDs) configured as an array. As a countermeasure against HDD failure, the RAID controller stores redundant information for restoring the original data on the HDDs that make up the array. The RAID device 100 also stores the computer system's operating system, application programs, and data.

[0025] The LAN device 110, connected to the PCI bus 70, controls asynchronous transfer of packet data at 10 Mbps/100 Mbps. The LAN device 110 is connected to an external LAN line through an RJ45 connector (not shown).

[0026] The CardBus device 120, connected to the PCI bus 70, has a plurality of slots for inserting and removing PC Cards (not shown). It functions as an interface for configuring PC Cards and for transferring data between the PCI bus 70 and a PC Card.

[0027] The south bridge 130 is a bridge LSI that connects the PCI bus 60 with the SM bus 140 and the ISA bus 150. The SM bus 140 and the ISA bus 150 are connected to the south bridge 130, which functions as the interface between these buses. The south bridge 130 also contains a circuit that issues the SMI signal to the CPUs 10 and 20 when a PERR or SERR signal is detected on the PCI buses 60 and 70.

[0028] Connected to the ISA bus 150 are the GA 160, which integrates various input/output circuits such as a KBC and an FDC, and the BIOS-ROM 170, which stores the error logging processing program of this embodiment and a program for setting the computer's configuration. The KBC controls the KB 180 used for data input. An FD storing the maintenance tool program of this embodiment can be loaded into the FDD 190.

[0029] The SM bus 140 is a serial bus having a clock signal line and a data/address line, and is connected to the BMC 200. Under the control of the error processing program stored in the BIOS-ROM 170, the south bridge 130 communicates with the BMC 200 via the SM bus 140 and notifies the BMC 200 of error information generated on the PCI buses 60 and 70. The method (protocol) for accessing the BMC 200 is disclosed in the Intelligent Platform Management Interface specification (hereinafter, IPMI) Rev. 1.1, issued November 15, 1999. The execution of the error logging process of this embodiment is described in detail later. The BMC 200 is also connected to the I2C bus 250 and functions as the interface between the SM bus 140 and the I2C bus 250.

[0030] The I2C bus 250 is a bidirectional bus consisting of one clock signal line and one data line, and is connected to the FRU 210, the SDR 220, the sensor 230, and the SEL 240.

[0031] The FRU 210 is a serial-bus-attached EEPROM. To describe each module that makes up the computer (the types of motherboard and of the various devices), information representing vendor IDs and device IDs, such as manufacturer numbers and serial numbers, is stored in it at the time of manufacture. When the maintenance tool program of this embodiment is executed, it refers to the serial numbers stored in the FRU 210 to identify the failed device on the PCI buses 60 and 70.

[0032] The SDR 220 is a serial-bus-attached EEPROM in which the types of sensors managed by the BMC 200 (temperature, voltage, and so on) and the threshold values used to identify abnormalities are stored at the time of manufacture.

[0033] The sensor 230 monitors the output voltages of the voltage circuits on the motherboard (±12 V, ±5 V, +3.3 V, and so on), the issuing of SERR/PERR signals on the PCI buses 60 and 70, and the temperature of the CPUs and other components. The sensor 230 is polled by the BMC 200 at predetermined intervals.

[0034] The SEL 240 is a serial-bus-type EEPROM. The BMC 200 stores error information in it when an abnormality is detected on the motherboard or when the sensor 230 detects an error exceeding a threshold.

[0035] In this embodiment, the devices on the PCI buses are the north bridge 40, the display device 90, the RAID device 100, the LAN device 110, the CardBus device 120, and the south bridge 130. The functions common to all of these PCI devices are described below.

[0036] Each PCI device contains configuration registers, such as device identification registers and a status register, and is given a configuration space of 256 bytes through which these registers are accessed. The device identification registers are located at addresses 00h and 02h of the configuration space and hold the vendor ID and the device ID, respectively. The vendor ID identifies the manufacturer of the PCI device. The device ID distinguishes among the PCI devices made by the manufacturer given by the vendor ID. The status register is located at address 06h of the configuration space and is used to record events that occur on the PCI bus. When an unrecoverable system error is detected on the PCI bus and the PCI device outputs the SERR signal, the device sets bit 14 of the status register to 1. When the PCI device detects a data parity error or an address parity error, it sets bit 15 of the status register to 1.
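The register layout described above can be illustrated with a small decoding sketch. It is only an illustration, not part of the disclosed embodiment: the function and constant names are hypothetical, and the only facts taken from the text are the offsets (00h, 02h, 06h) and the meaning of status bits 14 and 15.

```python
# Bit positions of the status register (offset 06h) as described in the text.
STATUS_SERR_SIGNALED = 1 << 14    # device output the SERR signal
STATUS_PARITY_DETECTED = 1 << 15  # device detected a data/address parity error


def decode_status(status: int) -> dict:
    """Return the two error flags held in a 16-bit PCI status register value."""
    return {
        "serr_signaled": bool(status & STATUS_SERR_SIGNALED),
        "parity_error_detected": bool(status & STATUS_PARITY_DETECTED),
    }


def split_id_dword(dword: int) -> tuple:
    """Split the doubleword at offset 00h into (vendor ID, device ID).

    The vendor ID sits at 00h (low 16 bits) and the device ID at 02h
    (high 16 bits) of the little-endian doubleword.
    """
    return dword & 0xFFFF, (dword >> 16) & 0xFFFF
```

For example, a status value of 8000h would decode as a detected parity error with no SERR signaled.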

[0037] In addition, each PCI device has a command register at 04h of the configuration space. Bit 6 of the command register, called Parity Error Response, controls how the device behaves when a parity error occurs. When bit 6 is 1, the PCI device issues the PERR signal on the PCI bus in response to a data or address parity error. Bit 8 of the command register, called SERR Enable, controls whether the PCI device drives the SERR signal. When bit 8 is 1, the device is permitted to drive the SERR signal.

[0038] PCI devices (initiator and target) check the parity line during the data phase of a transaction. When the target detects a data parity error in a write cycle of a transaction on the PCI bus, it sets the parity error detected bit of its status register to 1. If the Parity Error Response bit of its command register is set to 1, the target issues the PERR signal in that cycle. When the initiator of the cycle detects that the target is issuing the PERR signal, it sets the data parity error detected bit of its status register. When the initiator detects a data parity error in a read cycle of a transaction on the PCI bus, the initiator issues the PERR signal and sets the data parity error detected bit of its status register.
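As background for the parity check mentioned above, the following sketch shows how a parity bit over the bus lines could be computed and verified. The patent text does not spell out the parity rule; the sketch assumes the even-parity rule of the PCI specification, in which the PAR line makes the number of 1s across AD[31:0], C/BE[3:0], and PAR even. The function names are hypothetical.

```python
def pci_par(ad: int, cbe: int) -> int:
    """Return the PAR bit that gives even parity over AD[31:0] and C/BE[3:0]."""
    bits = (ad & 0xFFFFFFFF) | ((cbe & 0xF) << 32)
    # PAR is 1 exactly when the 36 covered lines carry an odd number of 1s.
    return bin(bits).count("1") & 1


def parity_error(ad: int, cbe: int, par: int) -> bool:
    """Return True if the sampled PAR line disagrees with the computed parity."""
    return pci_par(ad, cbe) != (par & 1)
```

A receiver that sees `parity_error(...)` return True in a data phase is in the situation the paragraph describes: it records the event in its status register and, if permitted by its command register, issues PERR.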

[0039] PCI devices (initiator and target) also check the parity line during the address phase of a transaction. A PCI device that detects an address parity error always sets the parity error detected bit to 1, just as for a data parity error. If both the SERR Enable bit and the Parity Error Response bit of the command register are set, the PCI device issues the SERR signal on the PCI bus.

[0040] Furthermore, when a PCI device detects some error and judges it to be fatal to the computer system, or judges that the error cannot be reported to the system or recovered from by any other means, the PCI device issues the SERR signal on the PCI bus. Whenever it outputs the SERR signal, the PCI device always sets the system error notification bit of its status register to 1.
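The reporting rules of the preceding paragraphs (command register bits 6 and 8, status register bits 14 and 15) can be summarized in a small model. This is a sketch under stated assumptions, not the device logic itself: the function name and the `kind` strings are hypothetical, and gating the fatal-error case on the SERR Enable bit is an assumption drawn from the description of bit 8.

```python
CMD_PARITY_ERROR_RESPONSE = 1 << 6  # command register bit 6
CMD_SERR_ENABLE = 1 << 8            # command register bit 8
STATUS_SERR_SIGNALED = 1 << 14      # status register bit 14
STATUS_PARITY_DETECTED = 1 << 15    # status register bit 15


def report_error(kind: str, command: int, status: int = 0) -> tuple:
    """Return (new_status, asserted_signal) for an error a device detects.

    kind is 'data_parity', 'address_parity', or 'fatal' (hypothetical labels).
    """
    signal = None
    if kind == "data_parity":
        status |= STATUS_PARITY_DETECTED
        if command & CMD_PARITY_ERROR_RESPONSE:
            signal = "PERR"
    elif kind == "address_parity":
        status |= STATUS_PARITY_DETECTED
        both = CMD_PARITY_ERROR_RESPONSE | CMD_SERR_ENABLE
        if (command & both) == both:
            signal = "SERR"
    elif kind == "fatal":
        # Assumption: SERR Enable also gates this case.
        if command & CMD_SERR_ENABLE:
            signal = "SERR"
    else:
        raise ValueError(f"unknown error kind: {kind}")
    if signal == "SERR":
        status |= STATUS_SERR_SIGNALED  # always set when SERR is output
    return status, signal
```

The model makes one point of the text explicit: the parity error detected bit is set whether or not a signal is driven, so the status register remains useful for locating the detecting device even when PERR or SERR is masked.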

[0041] Next, the north bridge 40 contains a configuration cycle generation circuit that lets the CPUs 10 and 20 access the configuration registers of each PCI device. When the CPU 10 or 20 accesses a configuration register, it sets a four-level address, consisting of a bus number, a device number, a function number, and a register number, in the configuration address register. The bus number identifies the PCI bus. The device number identifies a device on the bus given by the bus number. The function number identifies an individual function within a single device, that is, within a multifunction device. The register number specifies the address within the configuration space assigned to one function.

[0042] In this embodiment, as shown in FIG. 2, the bus number, device number, and function number of the north bridge 40 are "0, 0, 0", respectively. Those of the south bridge 130 are "0, 1, 0". Those of the display device 90 are "0, 2, 0". Those of the RAID device 100 are "0, 3, 0". Those of the CardBus device 120 are "1, 1, 0". Those of the LAN device 110 are "1, 2, 0".
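The assignment of FIG. 2 lends itself to a lookup table, which is roughly what a maintenance tool would need in order to turn a logged bus/device/function triple back into a device. The sketch below only restates the numbers given in the text; the dictionary and function names are hypothetical.

```python
# (bus number, device number, function number) -> device, as listed for FIG. 2.
PCI_DEVICES = {
    (0, 0, 0): "north bridge 40",
    (0, 1, 0): "south bridge 130",
    (0, 2, 0): "display device 90",
    (0, 3, 0): "RAID device 100",
    (1, 1, 0): "CardBus device 120",
    (1, 2, 0): "LAN device 110",
}


def identify(bus: int, device: int, function: int) -> str:
    """Return the device name for a bus/device/function triple."""
    return PCI_DEVICES.get((bus, device, function), "unknown device")
```

For instance, `identify(1, 2, 0)` returns the LAN device 110.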

[0043] The configuration address register of the north bridge 40 is a doubleword register. When the CPU 10 or 20 performs a doubleword I/O read or I/O write to 0CF8h in the I/O address space, the access is handled as a read or write of this register.
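The four-level address can be packed into the doubleword written to 0CF8h as sketched below. The text does not give the bit layout; the sketch assumes the layout of PCI configuration mechanism #1 (enable bit 31, bus in bits 23:16, device in bits 15:11, function in bits 10:8, doubleword-aligned register offset in bits 7:2). The function name is hypothetical.

```python
CONFIG_ADDRESS_PORT = 0x0CF8  # I/O address of the configuration address register


def config_address(bus: int, device: int, function: int, register: int) -> int:
    """Build the doubleword value for the configuration address register."""
    if not (0 <= bus < 256 and 0 <= device < 32
            and 0 <= function < 8 and 0 <= register < 256):
        raise ValueError("bus/device/function/register out of range")
    return ((1 << 31)            # enable bit: generate a configuration cycle
            | (bus << 16)
            | (device << 11)
            | (function << 8)
            | (register & 0xFC))  # accesses are doubleword aligned
```

With the numbering of this embodiment, addressing the command/status doubleword (offset 04h) of the LAN device 110 at bus 1, device 2, function 0 gives 80011004h.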

[0044] The north bridge 40 also contains a configuration data register. The configuration data register is a doubleword register located at 0CFCh in the I/O address space. Through it, a read or write access from the CPU is performed on the register at the address specified by the configuration address register. In this embodiment, the register accessed in this way is the status register, which is read by the CPUs 10 and 20.
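The address/data register pair can be exercised without hardware by a toy model, shown here as a hedged sketch. `FakeNorthBridge` and `read_status` are hypothetical names, the address layout is the PCI configuration mechanism #1 layout (an assumption, since the text does not give it), and the status register is taken from the upper 16 bits of the doubleword at offset 04h because the text places the command register at 04h and the status register at 06h.

```python
class FakeNorthBridge:
    """Toy stand-in for the configuration address/data register pair."""

    def __init__(self, config_space):
        # config_space maps (bus, device, function) -> {dword offset: value}
        self.config_space = config_space
        self.address = 0

    def write_address(self, value):
        """Model a doubleword I/O write to 0CF8h."""
        self.address = value & 0xFFFFFFFF

    def read_data(self):
        """Model a doubleword I/O read from 0CFCh."""
        bus = (self.address >> 16) & 0xFF
        device = (self.address >> 11) & 0x1F
        function = (self.address >> 8) & 0x07
        offset = self.address & 0xFC
        regs = self.config_space.get((bus, device, function))
        if regs is None:
            return 0xFFFFFFFF  # no device responds: all ones
        return regs.get(offset, 0)


def read_status(bridge, bus, device, function):
    """Read the 16-bit status register (offset 06h) of one PCI function."""
    bridge.write_address((1 << 31) | (bus << 16) | (device << 11)
                         | (function << 8) | 0x04)
    return (bridge.read_data() >> 16) & 0xFFFF
```

On real hardware the two methods would be replaced by port I/O instructions executed by the BIOS; the model only illustrates the two-step "write address, read data" sequence.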

[0045] FIG. 3 shows the record format of the SEL 240 in this embodiment. For each event, the BMC 200 writes predetermined information at predetermined addresses in the SEL 240. In the SEL 240 record format, an entry (serial) number is written at predetermined addresses together with the device that recorded the information, the time, and, as event data 1, information indicating that a SERR or PERR signal occurred. In this embodiment, event data 2 is additionally provided in a reserved area of the SEL 240 format to hold information indicating the bus number, function number, and device number.
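A record of this kind might be modeled as follows. The field list follows the description of FIG. 3, but everything about the encoding is an assumption made for illustration: the error type codes, the bit packing of bus/device/function into event data 2, and all names are hypothetical, since the text does not specify field widths.

```python
from dataclasses import dataclass

# Hypothetical codes for event data 1 (the text only says it indicates SERR/PERR).
ERROR_SERR = 0x01
ERROR_PERR = 0x02


@dataclass
class SelRecord:
    entry_number: int  # entry (serial) number
    recorder: str      # device that recorded the information
    timestamp: int     # time of the event
    event_data1: int   # error type (SERR or PERR)
    event_data2: int   # packed bus/device/function of the detecting device


def pack_location(bus: int, device: int, function: int) -> int:
    """Pack bus/device/function into one value (assumed layout: 8/5/3 bits)."""
    return ((bus & 0xFF) << 8) | ((device & 0x1F) << 3) | (function & 0x07)


def unpack_location(value: int) -> tuple:
    """Recover (bus, device, function) from a packed event data 2 value."""
    return (value >> 8) & 0xFF, (value >> 3) & 0x1F, value & 0x07
```

A maintenance tool reading such a record would call `unpack_location(record.event_data2)` to recover the triple that identifies the detecting device.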

[0046] FIG. 4 shows the error logging process according to this embodiment.

[0047] Next, with reference to FIG. 4, the processing that is stored in the BIOS-ROM 170 and that writes an error event to the SEL 240 when a SERR or PERR signal occurs is described.

[0048] The sensor 230 detects a SERR or PERR signal issued by a PCI device on the PCI buses 60 and 70. On detecting one, the sensor 230 instructs the south bridge 130 to output the SMI signal to the CPUs 10 and 20. Following that instruction, the south bridge 130 outputs the SMI signal to the CPUs 10 and 20. On receiving the SMI signal, the CPUs 10 and 20 switch their operating mode to system management mode (hereinafter, SMM). After the switch to SMM, the SMI handler located in SMM starts the error logging processing program stored in the BIOS-ROM 170.

[0049] The error logging processing program, started in response to the SMI signal, checks the values of bits 14 and 15 of the status register of each device in turn to determine which device on the PCI buses 60 and 70 output the SERR or PERR signal.

【００５０】As described above, the status register of each PCI device is located in configuration space. The bus number, device number, function number, and register number are specified in the configuration address register of the north bridge 40, and the value read from the configuration data register is checked for the bits indicating that a PERR signal / SERR signal was issued or detected.
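The addressing step above can be sketched as follows. The patent does not spell out the bit layout of the configuration address register; this sketch assumes the conventional PCI configuration mechanism layout (enable bit 31, bus in bits 23 to 16, device in bits 15 to 11, function in bits 10 to 8, dword-aligned register offset in bits 7 to 2). The function name `config_address` is hypothetical.

```python
def config_address(bus: int, device: int, function: int, register: int) -> int:
    """Encode bus/device/function/register into a 32-bit configuration address.

    Assumes the conventional PCI layout; the north bridge 40 in the text
    may differ in detail.
    """
    if not (0 <= bus < 256 and 0 <= device < 32
            and 0 <= function < 8 and 0 <= register < 256):
        raise ValueError("field out of range")
    return (0x80000000            # enable bit
            | (bus << 16)
            | (device << 11)
            | (function << 8)
            | (register & 0xFC))  # register offset, dword aligned
```

On real hardware the encoded value would be written to the address register and the result read back from the data register; that port I/O is omitted here because it cannot run outside firmware or a privileged context.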

【００５１】The error logging program sets the bus number, device number, function number, and register number in the configuration address register to "0, 0, 0, 0", respectively, and executes an I/O read access to the PCI device under inspection (S100). The error logging program then checks the values of bits 14 and 15 of the status register read via the configuration data register (S110).

【００５２】When it is determined that bit 14 or 15 of the status register is set to 1, that is, when this PCI device has issued or detected a SERR signal / PERR signal, the error logging program instructs the BMC 200 to record, in event data 1 and 2 of the SEL 240 shown in FIG. 3, the type of error (SERR signal / PERR signal) and the bus number, function number, and device number of the device that issued or detected the error. Based on this instruction, the BMC 200 records the error event in the SEL 240 together with additional information such as the event serial number, the sensor type, and the time (Yes in S110 → S190).
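The bit test in S110 can be sketched as below. The text only says that bits 14 and 15 indicate issue or detection of the SERR / PERR signals; which bit maps to which signal is an assumption here, following the conventional PCI status register definition (bit 14: signaled system error, bit 15: detected parity error). The names `SERR_BIT`, `PERR_BIT`, and `error_types` are hypothetical.

```python
SERR_BIT = 1 << 14  # assumed: signaled system error
PERR_BIT = 1 << 15  # assumed: detected parity error


def error_types(status: int) -> list:
    """Return the error kinds flagged in a 16-bit status register value."""
    found = []
    if status & SERR_BIT:
        found.append("SERR")
    if status & PERR_BIT:
        found.append("PERR")
    return found
```

An empty list corresponds to the "No" branch of S110; a non-empty list corresponds to the "Yes" branch that leads to S190.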

【００５３】On the other hand, if it is determined in S110 that this device has not issued or detected a SERR signal / PERR signal, the error logging program increments the function number by 1 (No in S110 → S120). The error logging program then executes an I/O read access to the address specified by the configuration address register.

【００５４】If the I/O read access shows that the target device does not exist on the PCI bus, the error logging program sets the function number to 0 and increments the device number by 1 (No in S130 → S140). The error logging program then executes an I/O read access to the address specified by the configuration address register.

【００５５】If the I/O read access shows that the target device does not exist on the PCI bus, the error logging program sets the device number and the function number to 0 and increments the bus number by 1 (No in S150 → S160). The error logging program then executes an I/O read access to the address specified by the configuration address register.

【００５６】If the I/O read access shows that the target device does not exist on the PCI bus, the error logging program determines whether all PCI devices have been checked. If it determines that not all PCI devices have been checked, the error logging program returns to S120 (No in S170 → No in S180 → S120). If it determines that all PCI devices have been checked, the error logging program ends the process (Yes in S180).

【００５７】On the other hand, if the I/O read access shows that the target device exists on the PCI bus (Yes in S130, Yes in S150, Yes in S170), the error logging program checks the values of bits 14 and 15 of the status register of the PCI device in the same way as in S110. If the PCI device has issued or detected a SERR signal / PERR signal, the error logging program instructs the BMC 200 to write, as an error event, the type of error (SERR signal / PERR signal) and the bus number, function number, and device number of the device that issued or detected the error into event data 1 and 2 of the SEL 240 (S190).
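The scan described in S100 to S190 can be approximated by the sketch below. It is a simplification under stated assumptions: the flowchart steps the function, device, and bus numbers one at a time and skips ahead when a target is absent, whereas this sketch simply visits every combination with nested loops and collects every hit. The callback `read_status`, the loop limits, and the name `scan_for_errors` are hypothetical; `read_status` stands in for the configuration read and returns `None` when no device responds.

```python
ERROR_MASK = (1 << 14) | (1 << 15)  # status register bits 14 and 15


def scan_for_errors(read_status, max_bus=2, max_device=32, max_function=8):
    """Visit every bus/device/function and collect those with error bits set.

    read_status(bus, device, function) returns the 16-bit status value,
    or None if no device exists at that address (hypothetical callback).
    """
    hits = []
    for bus in range(max_bus):
        for device in range(max_device):
            for function in range(max_function):
                status = read_status(bus, device, function)
                if status is None:
                    continue  # no device here; move on to the next address
                if status & ERROR_MASK:
                    hits.append((bus, device, function, status & ERROR_MASK))
    return hits
```

For example, with a simulated configuration space held in a dictionary, `scan_for_errors(lambda b, d, f: space.get((b, d, f)))` returns one tuple per device whose status register has bit 14 or 15 set; each tuple holds what would be passed to the BMC 200 for logging.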

【００５８】FIG. 5 shows an example of an error event recorded in the SEL 240 by the error logging program according to an embodiment of the present invention. Specifically, it is an error event in which the RAID device 100 issued a SERR signal.

【００５９】FIG. 6 shows the processing of the maintenance tool program according to an embodiment of the present invention.

【００６０】Next, with reference to FIGS. 5 and 6, the procedure for identifying the device that issued the SERR signal is described below.

【００６１】When a SERR signal is issued, an unrecoverable error has occurred in the system, so the OS of the computer system normally halts. The user of the computer system asks maintenance personnel to recover or repair the system. The maintenance personnel load the FD on which the maintenance tool program they have brought is written into the FDD 190 of the computer system and turn on the system power.

【００６２】The maintenance tool program is started from the FD as soon as the computer system is powered on, and initializes the computer system. The started maintenance tool program instructs the BMC 200 to read the error events in the SEL 240. Following the instruction from the maintenance tool program, the BMC 200 sequentially reads the event data recorded in the SEL 240 and obtains only the SERR signal / PERR signal events recorded in the SEL 240 by the error logging process. That is, the BMC 200 checks whether event data 1 of the entry (serial number) that was read contains the data 04hex or 005hex, which indicates detection of a SERR signal / PERR signal (S200).

【００６３】If it is determined that event data 1 does not contain data indicating detection of a SERR signal / PERR signal, the maintenance tool program reads the next entry from the SEL 240 (No in S200 → S250). The maintenance tool program then checks whether event data 1 of the entry that was read is data indicating detection of a SERR signal / PERR signal.

【００６４】On the other hand, if it is determined in S200 that event data 1 contains data indicating detection of a SERR signal / PERR signal, the BMC 200, as instructed by the maintenance tool program, obtains the value of event data 2 of the entry and thereby obtains the data indicating the bus number, device number, and function number (S210). In this case, a SERR signal error occurred on the PCI bus, and the bus number, device number, and function number are "0, 0, 0Bhex", respectively.

【００６５】Next, the maintenance tool program sets the obtained bus number, device number, and function number in the configuration address register of the north bridge 40 and obtains the vendor ID and device ID of the PCI device from the configuration data register (S220). To identify the name of the PCI device that corresponds to the obtained vendor ID and device ID, the maintenance tool program obtains the device name from the device configuration list table stored in the FRU 210. The name of the device that issued the SERR signal is displayed on the display device 90 (S230).
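The lookup in S220 and S230 can be sketched as follows. The sketch assumes the conventional PCI configuration header, in which the first 32-bit word holds the vendor ID in the low 16 bits and the device ID in the high 16 bits. The table contents, the ID values, and the names `split_ids`, `device_name`, and `EXAMPLE_TABLE` are hypothetical stand-ins for the device configuration list table held in the FRU 210.

```python
# Hypothetical stand-in for the device configuration list table in the FRU 210.
# The IDs below are made-up example values, not real vendor or device IDs.
EXAMPLE_TABLE = {
    (0x1234, 0x5678): "RAID device (example)",
    (0x1234, 0x9ABC): "LAN controller device (example)",
}


def split_ids(dword0: int):
    """Split configuration dword 0 into (vendor_id, device_id)."""
    vendor_id = dword0 & 0xFFFF
    device_id = (dword0 >> 16) & 0xFFFF
    return vendor_id, device_id


def device_name(dword0: int, table=EXAMPLE_TABLE) -> str:
    """Map the IDs read from the device to a display name."""
    return table.get(split_ids(dword0), "unknown device")
```

Returning a fallback string for IDs missing from the table is a design choice of this sketch; the patent does not say what the tool displays in that case.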

【００６６】The maintenance tool program repeats the processing of S200 to S230 until all entries recorded in the SEL 240 have been checked (No in S240). When the last entry has been checked, the maintenance tool program ends the process (Yes in S240).
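The loop over SEL entries (S200 to S250) can be sketched as below. It reads the codes given in the text, 04hex and 005hex, as the numeric values 0x04 and 0x05, which is an assumption about the intended values. Entries are modeled as simple `(event_data1, event_data2)` pairs, and the names `BUS_ERROR_CODES` and `find_bus_errors` are hypothetical.

```python
BUS_ERROR_CODES = {0x04, 0x05}  # assumed reading of "04hex or 005hex"


def find_bus_errors(entries):
    """Return event data 2 of every entry whose event data 1 marks SERR / PERR.

    entries: iterable of (event_data1, event_data2) pairs in log order.
    """
    results = []
    for event_data1, event_data2 in entries:
        if event_data1 not in BUS_ERROR_CODES:
            continue  # S200 "No": skip to the next entry
        results.append(event_data2)  # S210: keep the location data
    return results
```

Each returned value would then be used to address the device and look up its name, as in S220 and S230.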

【００６７】With the configuration of the present invention, maintenance personnel can easily identify the failed PCI device from the name, shown on the display screen, of the PCI device that issued or detected the error.

【００６８】In the embodiment of the present invention, the sensor 230 is configured to detect the SERR signal / PERR signal issued by a device on the PCI buses 60 and 70. Alternatively, the SERR signal / PERR signal can be connected to an interrupt input (IRQ) terminal of the south bridge 130, so that the south bridge 130 detects the SERR signal / PERR signal and issues the SMI signal to the CPUs 10 and 20.

【００６９】Also, in the embodiment of the present invention, the SEL 240 is used to record the results of the error logging process. Alternatively, the error events of the error logging process can be stored in file format on the HDD of the RAID device 100 or on the FD in the FDD 190.

【００７０】[0070]

[Effects of the Invention] As described above, according to the present invention, the device that caused an error or the device that detected an error can be identified, which improves maintainability in error handling.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of a computer system according to an embodiment of the present invention.

FIG. 2 is a diagram showing the address numbers of each PCI device of the computer system according to the same embodiment.

FIG. 3 is a diagram showing the recording format of the SEL according to the same embodiment.

FIG. 4 is a flowchart showing the error logging process according to the same embodiment.

FIG. 5 is a diagram showing error information recorded in the SEL by the error logging process according to the same embodiment.

FIG. 6 is a flowchart showing the processing of the maintenance tool program according to the same embodiment.

[Explanation of symbols]

10, 20 ... CPU, 30 ... processor bus, 40 ... north bridge, 50 ... DIMM, 60 ... PCI bus 0, 70 ... PCI bus 1, 80 ... memory bus, 90 ... display device, 100 ... RAID device, 110 ... LAN controller device, 120 ... card bus controller device, 130 ... south bridge, 140 ... SM bus, 150 ... ISA bus, 160 ... system I/O, 170 ... BIOS-ROM, 180 ... keyboard (KB), 190 ... FDD, 200 ... BMC, 210 ... FRU, 220 ... SDR, 230 ... sensor, 240 ... SEL, 250 ... I2C bus

Claims

[Claims]

1. An information processing apparatus having a plurality of devices connected via a bus, wherein the devices comprise detection means for detecting an error during execution of a transaction on the bus, the apparatus further comprising recording means for recording on a recording medium, as an error event and according to the detection result of the detection means, the type of the error together with information on the device in which the error was detected.

2. The information processing apparatus according to claim 1, wherein the device information includes at least a number for identifying a bus type, a number for identifying a device type, and a number for identifying a function.

3. A maintenance method for an information processing apparatus that has a plurality of devices connected via a bus and a recording medium on which information on a device in which an error was detected is recorded as an error event together with the type of the error, the method comprising: a step of reading the error event recorded in the storage medium; and a step of identifying the device in which the error was detected according to the reading result of the reading step.

4. The maintenance method for an information processing apparatus according to claim 3, wherein in the step of reading the error event, information of a device in which an error is detected recorded in the storage medium is read.

5. The maintenance method for an information processing apparatus according to claim 4, wherein the identifying step displays the device on a display device based on the information of the device in which the error is detected in the reading step.

6. An information processing apparatus having a plurality of devices connected via a bus, comprising: a recording medium on which information on a device in which an error was detected is recorded as an error event together with the type of the error; a unit for reading the error event recorded in the storage medium; and a unit for identifying the device in which the error was detected according to the reading result of the reading unit.

7. The information processing apparatus according to claim 6, wherein the reading unit reads information of a device in which an error is detected, which is recorded in the storage medium.

8. The information processing apparatus according to claim 7, wherein the specifying unit displays the device on a display device based on information of the device in which the error is detected.

9. The information processing apparatus according to claim 8, wherein the device information includes at least a number for identifying a bus type, a number for identifying a device type, and a number for identifying a function.