JP3644399B2 - PCI bus defective part isolation method and program thereof

Info

Publication number: JP3644399B2
Application number: JP2001089687A
Authority: JP
Inventors: 昭彦谷
Original assignee: ティーエム・ティーアンドディー株式会社
Priority date: 2001-03-27
Filing date: 2001-03-27
Publication date: 2005-04-27
Anticipated expiration: 2021-03-27
Also published as: JP2002288049A

Description

【０００１】
【発明の属する技術分野】
本発明は、ＰＣＩバス仕様に準拠したＰＣＩバス不良箇所切り離し方法に関するもので、特にＰＣＩバス上にＨ／Ｗの追加をする事なく、ＰＣＩバス上の不良発生デバイス／ファンクションを論理的に切り離してシステムを立ち上げ、システムの連続稼動性を向上するものである。
【０００２】
【従来の技術】
図１２は例えば特開平１１−１９１０７３号公報に示されたＰＣＩバス処理装置を示す構成図である。図において、２０はＰＣＩバス処理装置であり、２１はＰＣＩバス上のトランザクションの開始を検出するトランザクション開始検出回路であり、ＰＣＩバス信号のＦＲＡＭＥ＃を入力する。２２はＰＣＩバス上のアドレス／データ線及びコマンド／バイトイネーブル線上の情報を保持する保持レジスタであり、２３はＰＣＩバス上の異常を検出する異常検出回路であり、２４は有効か否かを示すｖａｌｉｄｂｉｔを有し、異常が発生した時のアドレス及びコマンド情報を格納する格納レジスタ、２５はＰＣＩバスインタフェースであり、すべてのＰＣＩバス信号を入力する。
【０００３】
次に動作について説明する。
（１）ＰＣＩバス上のバスマスタがＰＣＩバス上にトランザクションを開始すると、ＰＣＩバス処理装置２０内のトランザクション開始検出回路はトランザクションの開始を検出し、保持レジスタ２２に通知し、保持レジスタ２２はアドレス／コマンド情報を格納する。
（２）次に異常検出回路２３はＰＣＩバス上のトランザクションを監視し、異常を検出すると保持レジスタ２２の内容を格納レジスタ２４に格納すると同時にｖａｌｉｄｂｉｔを有効にする。
【０００４】
（３）そしてＰＣＩバス上で異常の報告を受けたホストＣＰＵ（不図示）はＰＣＩバス処理装置２０内の格納レジスタ２４をＰＣＩバスを介してリードする。
（４）ＰＣＩバス処理装置２０内のＰＣＩバスインタフェース２５はホストＣＰＵからのＰＣＩリードトランザクションを受けて、ｖａｌｉｄｂｉｔが有効な場合のみ格納レジスタ２４の値をホストＣＰＵに返す。
一方、ｖａｌｉｄｂｉｔが無効な場合は”ＦＦＦＦＦＦＦＦｈ”を返す。
【０００５】
従って、異常発生時のアドレス及びコマンド情報をＰＣＩバスインターフェースによりＰＣＩバス上に出力するように構成したので、ホストＣＰＵは異常が発生した時に異常発生アドレスにより異常ＰＣＩファンクション、異常ＰＣＩデバイスを特定し、異常発生箇所の切り離しを行う。
【０００６】
【発明が解決しようとする課題】
従来のＰＣＩバス処理装置による異常個所の特定は以上のように行われているので、ＰＣＩのトランザクションを解析するという特殊なＨ／Ｗ回路の追加が必要であった。
【０００７】
また、ＰＣＩバス上に回路を接続する為、ＰＣＩバス上の電気的負荷となり、ＰＣＩバスの拡張スロットを１つ占有してしまうという課題があった。
【０００８】
また、最初から異常であるＰＣＩデバイス／ファンクションに対してＰＣＩコンフィギュレーションサイクルを実行した場合、ＰＣＩデバイス／ファンクションからは初期化未完了を示すリトライが無限に繰り返されることとなるが、これ自体はＰＣＩのトランザクションとしては正常であるため従来のＰＣＩバス処理装置では検出不可能であり、ホストＣＰＵとしても異常個所の特定が行えないという課題があった。
【０００９】
【課題を解決するための手段】
（１）請求項１記載の発明に係わるＰＣＩバス不良箇所切り離し方法は、ホストＣＰＵカードと、ＰＣＩファンクションを内蔵したＰＣＩデバイスとがＰＣＩバスを介して接続されたシステムに対し、上記ＰＣＩファンクションまたはＰＣＩデバイスをコンフィギュレーションした場合に異常があると、異常対象のＰＣＩファンクションまたはＰＣＩデバイスを上記システムから切り離すＰＣＩバス不良個所切り離し方法において、上記システム立ち上げ時に上記各ＰＣＩファンクションまたは各ＰＣＩデバイス毎に順次コンフィギュレーションを実行する第１のステップと、上記コンフィギュレーション中に異常があると、上記ホストＣＰＵカード及び全てのＰＣＩファンクションまたはＰＣＩデバイスをリセットする第２のステップと、上記リセット後に異常があったＰＣＩファンクションまたはＰＣＩデバイスを切り離して、残りの各ＰＣＩファンクションまたは各ＰＣＩデバイスに対し順次コンフィギュレーションを実行する第３のステップと、第３のステップでコンフィギュレーションを実行しても異常が解消しない場合は、第２のステップへ戻って上記ホストＣＰＵカード及び全てのＰＣＩファンクションまたはＰＣＩデバイスに対してリセットを実行するよう、第２と第３のステップを所定回数または異常が解消するまで繰り返し行う第５のステップとを上記ホストＣＰＵカードにて行うことにより、複数の異常ＰＣＩファンクションまたはＰＣＩデバイスをシステムから切り離し可能としたものである。
【００１０】
（２）請求項２記載の発明に係わるＰＣＩバス不良箇所切り離し方法は、請求項１のＰＣＩバス不良個所切り離し方法において、第２のステップでリセットした後に、第１のステップに戻り再度、全てのＰＣＩファンクションまたはＰＣＩデバイスに対してコンフィギュレーションするよう第１と第２のステップを少なくとも１回繰り返し、異常が解消しないと第３のステップへ移行する第４のステップを含めたものである。
【００１１】
（３）請求項３記載の発明に係わるＰＣＩバス不良箇所切り離し方法は、請求項１または請求項２のＰＣＩバス不良個所切り離し方法において、第３のステップを実行しても異常が解消しない場合、あるいは、第３のステップを実行してから所望時間後に異常が解消しない場合、または、第１〜第３ステップのいずれか１つのステップが実行できない場合は、ホストＣＰＵカードの異常と判定する第６のステップを含めたものである。
【００１２】
（４）請求項４記載の発明に係わるＰＣＩバス不良箇所切り離し方法を実行するプログラムは、請求項１〜３のいずれか１項に記載のＰＣＩバス不良個所切り離し方法を実行させるためのプログラムとしたものである。
【００１３】
【発明の実施の形態】
以下、この発明の実施の一形態を説明する。
実施の形態１．
図１はＰＣＩバスを有するシステムの構成図で、１０はＰＣＩバスで、ホストＣＰＵカード１１とアドインカード１２，１３，１４とが接続されている。ホストＣＰＵカード１１にはホストＣＰＵ１１ａ及び図示しないがメモリ、インターフェース等の各種の計算機機能が内蔵されていてホストＣＰＵ機能を形成している。また、各種の必要なＳ／Ｗ１１ｂも内蔵されていて、本発明のＰＣＩバス不良箇所切り離し方法のソフトも内蔵されている。アドインカード１２，１３，１４には、例えばＬＳＩで構成されたＰＣＩデバイス１２ａ，１３ａ，１４ａが設けられ、それらのＰＣＩデバイス内には各種の機能を有するＰＣＩファンクション１２ｂ，１２ｃ，１３ｂ，１４ｂ，１４ｃを内蔵している。
【００１４】
図２はこの発明の実施の形態１によるＰＣＩバス不良箇所切り離し方法を実現するブロック図であり、図において、１はＰＣＩバスに接続されるホストＣＰＵカード上で動作し、ＰＣＩコンフィギュレーションサイクルを実行するＰＣＩコンフィギュレーションサイクル実行部、２はこれからＰＣＩコンフィギュレーションサイクルを実行するアクセスアドレス（ＰＣＩバス番号／デバイス番号／ファンクション番号の組み合わせ）を格納しておくアクセスアドレス格納領域、３は全ファンクションの論理的切り離し状況を管理するファンクション管理テーブル（図６参照）である。
【００１５】
４はホストＣＰＵカード上のＣＰＵ及びその上で動作するソフトウェアの正常動作を監視するＷＤＴ回路、５はホストＣＰＵカード上のＣＰＵ及びその上で動作するソフトウェアが正常に動作している場合に、ソフトウェアにより定期的に書込みが行われるＷＤＴクリアレジスタであり、書き込みが行われる事によりＷＤＴのカウントアップ前にＷＤＴカウンタ値のクリアを行う。なお、アドインカード上のＣＰＵやその上で動作するソフトウェアの動作が異常の場合にはホストＣＰＵカード上のＣＰＵ及びその上で動作するソフトウェアの異常として現れ、結果としてＷＤＴクリアレジスタ５への書込みが停止する。従ってＷＤＴ回路４によってアドインカード側も監視している事となる。
【００１６】
６はＷＤＴ回路４がカウントアップした場合に起動され、システムにウェイクアップリセット（Ｗａｋｅｕｐｒｅｓｅｔ）を発行するリセット生成回路、７はパワーＯＮによるリセット解除か、ウェイクアップリセットによるリセット解除かを示すリセット要因レジスタであり、ＰＣＩコンフィギュレーションサイクル実行部１より読み込みが可能である。
なお、アクセスアドレス格納領域２とファンクション管理テーブル３のメモリ上に内容が格納され、ウェイクアップリセットをしてもその内容が保持される。
【００１７】
次にＷＤＴ回路４の動作について説明する。
図３、図４はＷＤＴ回路４のカウンタ値の動きの一例を示す図である。
（１）まず図３（ａ）において、パワーＯＮリセット解除（時刻：Ｔ１）にてＷＤＴ回路４がカウントし始める。
（２）ホストＣＰＵとホストＣＰＵ上で動作しているソフトウェアが正常に動作しておりＷＤＴカウントアップ以前にソフトウェアによりＷＤＴクリアレジスタ５に書き込みを行うことによりＷＤＴカウンタをクリアしている。この間にコンフィギュレーションが完了し、システムが立ち上がる。
【００１８】
（３）図３（ｂ）のように、あるＰＣＩファンクションに異常があり、その為、そのＰＣＩファンクションへのＰＣＩコンフィギュレーション処理が時刻Ｔ２で停止したとすると、ソフトウェアによるＷＤＴクリアレジスタ５への書込みが停止し、期間Ｐ１後にはＷＤＴカウンタ値はカウントアップし、１回目のＷＤＴカウントアップであるのでウェイクアップリセットの実施をリセット生成回路６に要求し、リセット生成回路６によりウェイクアップリセットが実施される（時刻：Ｔ３）。
（４）ウェイクアップリセット解除後に再度ＷＤＴ回路４がカウントし始めるが、今回は異常ＰＣＩファンクションへのコンフィギュレーションをパスすることにより、ＷＤＴカウントアップ前にコンフィギュレーションが完了し、ソフトウェアによりＷＤＴクリアレジスタ５をクリアするので、システムが正常に立ち上がる。
【００１９】
（５）ＰＣＩファンクションに異常が発生し異常が復旧していない場合、図４（ａ）のように、ソフトウェアによるＷＤＴクリアレジスタ５への書き込みが停止したままであるので、期間Ｐ２（＝Ｐ１）後には再度ＷＤＴカウンタ値はカウントアップし、今回が連続した２回目のＷＤＴカウントアップであるので、ホストＣＰＵあるいは共通部であるＰＣＩバスそのものの機能停止と判断し、例えばアラーム信号を送出するなどの異常判定処理を行う（時刻Ｔ５）。
（６）なお、図４（ｂ）のように期間Ｐ２の間に一度でもソフトウェアによるＷＤＴクリアレジスタ５への書き込みが行われた場合は、次のＷＤＴカウントアップは１回目とみなし、ウェイクアップリセットとなる。（期間Ｐ２の間にＷＤＴクリア処理が入ったために、連続２回目とならない。）
【００２０】
次に全体の処理フローについて説明する。
図５はＰＣＩコンフィギュレーションサイクル実行部１がＰＣＩバス上の各ファンクションをコンフィギュレーションする一例を示すフローチャート図である。まず全ＰＣＩバス／デバイス／ファンクションが正常かつ、リセット要因がパワーＯＮの場合を説明する。
【００２１】
（１）ＰＣＩコンフィギュレーションサイクル実行部１はリセット要因レジスタ７の内容を読み出し（ステップＳＴ１−１）、
（２）今回のリセット要因を調べ（ステップＳＴ１−２）、
（３）リセット要因がＷＤＴカウントアップによるウェイクアップリセット以外でなので、ファンクション管理テーブル３中の全ファンクションステータスを“０：正常”に初期化し（ステップＳＴ１−３）、
（４）全ＰＣＩバス／デバイス／ファンクションのコンフィギュレーション繰り返し処理に移り（ステップＳＴ１−４）、
（５）今回の繰り返し処理中にコンフィギュレーションする対象ファンクションステータスを調べ（ステップＳＴ１−５）、
【００２２】
（６）異常なしなので今回コンフィギュレーションする対象のアクセスアドレスをアクセスアドレス格納領域２に格納し（ステップＳＴ１−６）、
（７）対象ファンクションのＰＣＩコンフィギュレーション実行（ステップＳＴ１−７）、
（８）完了後にアクセスアドレス格納領域２をクリア（ステップＳＴ１−８）、全ＰＣＩバス／デバイス／ファンクション分をステップＳＴ１−４より繰り返し実行（ステップＳＴ１−９）により全ＰＣＩコンフィギュレーションを終了し、
（９）ＷＤＴカウントアップ前にＷＤＴクリアレジスタ５に書き込みを行う（ステップＳＴ１−１０）。（正常終了時でもＷＤＴカウントアップしないようにカウンタの値を決定している。）
【００２３】
次にＰＣＩバス番号＝０／デバイス番号＝１／ファンクション番号＝１で異常が発生している状態でパワーＯＮリセットからの流れを説明する。
（１）ステップＳＴ１−１〜ＳＴ１−６までは前述と全く同じである。
（２）ＰＣＩバス番号＝０／デバイス番号＝１／ファンクション番号＝１に対してＰＣＩコンフィギュレーションを実施すると（ステップＳＴ１−７）、
【００２４】
（３）ＰＣＩコンフィギュレーションサイクルが無限にリトライ処理を繰り返し、終了しないケースに陥る場合がある。（汎用のＰＣＩチップセットとＣＰＵを搭載し、ＣＰＵ上で動作するＳ／Ｗによりこの汎用ＰＣＩチップを初期化するようなＰＣＩターゲットカード（アドインカード）の場合、Ｓ／Ｗが正常に動作しないような異常が発生するとＰＣＩチップの初期化が完了しない。この場合にはホストＣＰＵカードからのＰＣＩコンフィギュレーションサイクルに対して無限にリトライ処理を繰り返す結果となる場合がある。）結果としてＷＤＴカウントアップ以前にＷＤＴクリアレジスタ５への書き込みを行う事ができないため、ウェイクアップリセットが発生する。
【００２５】
次に上記ウェイクアップリセット発生以降の流れを説明する。
（１）ＰＣＩコンフィギュレーションサイクル実行部１はリセット要因レジスタの内容を読み出し（ステップＳＴ１−１）、
（２）今回のリセット要因を調べ（ステップＳＴ１−２）、
（３）リセット要因がＷＤＴカウントアップによるウェイクアップリセットなのでアクセスアドレス格納領域２の内容を読み出し（ステップＳＴ１−１１）、
（４）格納内容を調べ（ステップＳＴ１−１２）、
（５）異常を生じたアクセスアドレスが格納されているのでそのアクセスアドレス格納領域２をクリア（ステップＳＴ１−１３）後、
【００２６】
（６）ウェイクアップリセット発生以前にアクセスしていた情報であるＰＣＩバス番号＝０／デバイス番号＝１／ファンクション番号＝１を元に、ファンクション管理テーブル３中の該当するファンクションステータスを“１：異常”にセットし（ステップＳＴ１−１４）、
（７）全ＰＣＩバス／デバイス／ファンクションのコンフィギュレーション繰り返し処理に移り（ステップＳＴ１−４）、
（８）今回の繰り返し処理中にコンフィギュレーションする対象ファンクションステータスを調べ（ステップＳＴ１−５）、
【００２７】
（９）ＰＣＩバス番号＝０／デバイス番号＝１／ファンクション番号＝１のファンクションステータスが”１：異常”にセットされているので、ＰＣＩバス番号＝０／デバイス番号＝１／ファンクション番号＝１に対するＰＣＩコンフィギュレーション処理を省略し、残りの全ＰＣＩバス／デバイス／ファンクション分をステップＳＴ１−４より繰り返し実行（ステップＳＴ１−９）により全ＰＣＩコンフィギュレーションを終了し、
（１０）ＷＤＴカウントアップ前にＷＤＴクリアレジスタ５に書き込みを行う（ステップＳＴ１−１０）。
【００２８】
（１１）残りの全ＰＣＩバス／デバイス／ファンクションのＰＣＩコンフィギュレーション中に他のファンクションで同様の異常となった場合には、ＰＣＩコンフィギュレーション処理で停止し、連続２回目のＷＤＴカウントアップ後にホストＣＰＵ機能が停止していると判定する。
つまり、最初に異常を検出した１つのＰＣＩファンクションの切り離し処理のみを行い、複数のＰＣＩファンクションが異常の場合には、ホストＣＰＵあるいは共通部であるＰＣＩバスそのものの機能停止と判断し、アラーム等の送出処理を行う。
【００２９】
図６にＰＣＩバス番号＝０／デバイス番号＝１／ファンクション番号＝１で異常が発生してＰＣＩコンフィギュレーション処理を省略した後のファンクション管理テーブル３の例を示す。
【００３０】
このように、ＷＤＴカウントアップによるウェイクアップリセットを設け、異常ファンクションに対するＰＣＩコンフィギュレーションサイクル処理の停止からのリセット復帰をできる様にし、さらに、アクセスアドレス格納領域２とファンクション管理テーブル３により、異常となったＰＣＩコンフィギュレーションサイクルのアクセス先（ＰＣＩバス番号／デバイス番号／ファンクション番号）を格納・保持する手段を設け、全ＰＣＩコンフィギュレーション完了後に始めてＷＤＴクリア処理を行う為、１つの異常ＰＣＩファンクションの検出と、その１つのＰＣＩファンクションの論理的に切り離しが可能となり、正常部分でのシステムの連続稼動性の向上が可能となる。
【００３１】
また、ＰＣＩのトランザクションを解析するという特殊なＨ／Ｗ回路の追加も必要ない。
【００３２】
また、ＰＣＩバス上の電気的負荷とならないので、ＰＣＩバスの拡張スロットを１つ占有する事も無いため、ＰＣＩバス拡張スロットを有効に使用できる。
【００３３】
実施の形態２．
実施の形態１では図４（ａ）において、Ｔ３時点で１回目のＷＤＴカウントアップによりウェイクアップリセットをした後、異常のＰＣＩファンクションを除いて、残りのＰＣＩファンクションに対してＰＣＩコンフィギュレーションを実行するようにしたが、たまたまノイズの影響などで過渡的に異常が発生し、正常のＰＣＩファンクションが異常とみなされることがある。
この発明の実施の形態２では、異常のあるアドレスのＰＣＩファンクションを除かず全アドレスのＰＣＩファンクションに対してＰＣＩコンフィギュレーションを再度実行する。
つまり、図４（ａ）のＰ１の期間に相当する動作を少なくとも一回は繰り返すようにし、その後にＰ２の期間の処理に移行する。
【００３４】
実施の形態３．
次に、この発明の実施の形態３について説明する。実施の形態３ではブロック図は実施の形態１での図２と同様であり、実施の形態１と異なるのはＰＣＩコンフィギュレーションサイクル実行部１の処理を示す図７のフローチャートと、図８の使用するファンクション管理テーブルの内容である。
【００３５】
次に動作について説明する。
図７はこの発明の実施の形態３によるＰＣＩコンフィギュレーションサイクル実行部１がファンクション管理テーブル３を使用して処理を行う一例を示すフローチャート図である。図７において、実施の形態１と同等の処理ステップには図５と同一のステップ番号を付けて重複説明を省略する。また、図７において、図５と異なる部分についてのみ新たなステップ番号ＳＴ２−１５を付けて説明する。
【００３６】
全ＰＣＩバス／デバイス／ファンクションが正常の場合には実施の形態１と同様の処理になる為に説明を省略する。
また、ＰＣＩバス番号＝０／デバイス番号＝１／ファンクション番号＝１で異常が発生している状態でのパワーＯＮリセットからウェイクアップリセット１回目が発生するまでの流れも実施の形態１と同様の処理になるために説明を省略する。
【００３７】
次に上記ウェイクアップリセット１回目発生以降の流れを説明する。
（１）ステップＳＴ１−１〜ＳＴ１−１４までは実施の形態１と同様であり、この時点ではＰＣＩバス番号＝０／デバイス番号＝１／ファンクション番号＝１に該当するファンクション管理テーブル３中のファンクションステータスは“１：異常”にセットされている。
（２）この後、実施の形態３ではＷＤＴクリアレジスタ５への書き込み処理を行い（ステップＳＴ２−１５）、
（３）全ＰＣＩバス／デバイス／ファンクションのコンフィギュレーション繰り返し処理に移り（ステップＳＴ１−４）、
（４）今回の繰り返し処理中にコンフィギュレーションする対象ファンクションステータスを調べ（ステップＳＴ１−５）、
（５）ＰＣＩバス番号＝０／デバイス番号＝１／ファンクション番号＝１のファンクションステータスが”１：異常”にセットされているので、ＰＣＩバス番号＝０／デバイス番号＝１／ファンクション番号＝１に対するＰＣＩコンフィギュレーション処理を省略し、残りの全ＰＣＩバス／デバイス／ファンクション分をステップＳＴ１−４より繰り返し実行する（ステップＳＴ１−９）。
【００３８】
（６）ここで、残りの全ＰＣＩバス／デバイス／ファンクションのＰＣＩコンフィギュレーション中に他のＰＣＩファンクションで同様の異常となった場合として、ＰＣＩバス番号＝０／デバイス番号＝１／ファンクション番号＝３の異常を想定する（図８参照）。このＰＣＩバス番号＝０／デバイス番号＝１／ファンクション番号＝３に対するＰＣＩコンフィギュレーション処理が無限にリトライを繰り返し、終了しない。結果として再度ＷＤＴカウンタがカウントアップする。
（７）ここで実施の形態１と異なる点はステップＳＴ２−１５にて一度ＷＤＴクリアレジスタ５への書き込みを実施しているため、今回のカウントアップでもウェイクアップリセットが発生する点である。
【００３９】
（８）ウェイクアップリセット後は再度ステップＳＴ１−１より開始し、
（９）最終的にはＰＣＩバス番号＝０／デバイス番号＝１／ファンクション番号＝１とＰＣＩバス番号＝０／デバイス番号＝１／ファンクション番号＝３の両ファンクションの異常を検出し、両ファンクションに対するＰＣＩコンフィギュレーション処理を省略することにより、両ファンクションをＰＣＩバスより論理的に切り離す。
【００４０】
図８にＰＣＩバス番号＝０／デバイス番号＝１／ファンクション番号＝１と、ＰＣＩバス番号＝０／デバイス番号＝１／ファンクション番号＝３で異常が発生してＰＣＩコンフィギュレーション処理を省略した後のファンクション管理テーブル３の例を示す。
【００４１】
このように、ＷＤＴカウントアップによるウェイクアップリセットを設け、異常ファンクションに対するＰＣＩコンフィギュレーションサイクル処理の停止からのリセット復帰をできる様にし、さらに、アクセスアドレス格納領域２とファンクション管理テーブル３により、異常となったＰＣＩコンフィギュレーションサイクルのアクセス先（ＰＣＩバス番号／デバイス番号／ファンクション番号）を格納・保持する手段を設け、異常ファンクションを検出する度にＷＤＴクリア処理を行う為、複数または全ての異常ＰＣＩファンクションの検出と、それら複数または全ての異常ＰＣＩファンクションを論理的に切り離し可能となり、正常部分でのシステムの連続稼動性の向上が可能となる。
【００４２】
以上のように複数の異常のＰＣＩファンクションがあり、これらの異常のある全てのＰＣＩファンクションが切り離されるまで、コンフィギュレーションを繰り返してコンフィギュレーションを完了することができる。
しかし、ホストＣＰＵカードの故障があると、コンフィギュレーション動作を繰り返すので、これを防止するため所定回数コンフィギュレーションするとホストＣＰＵあるいは共通部であるＰＣＩバスそのものの機能停止と判定し、アラーム等を送出してコンフィギュレーションを中止するようにしてもよい。
【００４３】
実施の形態４．
システム構成としてＰＣＩバス上に実装されるＨ／Ｗとして１カード＝１デバイス＝１ファンクションあるいは、１カード＝１デバイス＝複数ファンクションの構成が多く、また、異常となる単位もカード単位、つまり、デバイス単位となる場合が多い。この実施の形態４では、ＰＣＩデバイス単位での論理的切り離しを行うようにし、システム立ち上がり時間を更に短縮するものである。
【００４４】
図９はこの発明の実施の形態４によるＰＣＩバス不良箇所切り離し方法を実現するブロック図であり、図において、実施の形態３と同等のブロックには図２と同一の番号を付けて重複説明を省略する。また、図９において、図２と異なる部分についてのみ３０番台の新たなブロック番号を付けて説明する。
実施の形態４ではＰＣＩバスからの論理的切り離しの単位をＰＣＩデバイスとするので、実施の形態３でのファンクション管理テーブル３は、デバイス管理テーブル３３となる。その他のブロックは実施の形態３と同様である。
【００４５】
次に動作について説明する。
図１０はこの発明の実施の形態４によるＰＣＩコンフィギュレーションサイクル実行部１がデバイス管理テーブル３３を使用して処理を行う一例を示すフローチャート図である。図において、実施の形態３と同等の処理ステップには図７と同一のステップ番号を付けて重複説明を省略する。また、図１０において、図７と異なる部分についてのみ新たなステップ番号ＳＴ３−３，ＳＴ３−５，ＳＴ３−１４を付けて説明する。
【００４６】
全ＰＣＩバス／デバイス／ファンクションが正常の場合には実施の形態３と同様の処理になる為に説明を省略する。
ＰＣＩバス番号＝０／デバイス番号＝１／ファンクション番号＝１で異常が発生している状態でパワーＯＮリセットからの流れを説明する。
（１）ステップＳＴ１−１〜ＳＴ１−２までは実施の形態３と全く同じである。
（２）リセット要因がＷＤＴカウントアップによるウェイクアップリセット以外なのでデバイス管理テーブル３３中の全デバイスステータスを“０：正常”に初期化し（ステップＳＴ３−３）、
【００４７】
（３）ステップＳＴ１−４〜ステップＳＴ１−６は実施の形態３と同様の処理を行い、ＰＣＩバス番号＝０／デバイス番号＝１／ファンクション番号＝１に対してＰＣＩコンフィギュレーションを実施すると（ステップＳＴ１−７）、ＰＣＩコンフィギュレーションサイクルが無限にリトライ処理を繰り返し、終了しないケースに陥る場合がある。結果としてＷＤＴカウントアップ以前にＷＤＴクリアレジスタ５への書き込みを行う事ができないため、ウェイクアップリセットが発生する。
【００４８】
次に上記ウェイクアップリセット発生以降の流れを説明する。
（１）ステップＳＴ１−１〜ＳＴ１−１３までは実施の形態３と同様に、ウェイクアップリセット発生以前にアクセスしていた情報であるＰＣＩバス番号＝０／デバイス番号＝１／ファンクション番号＝１を元に、デバイス管理テーブル３３中の該当するデバイスステータスを“１：異常”にセットし（ステップＳＴ３−１４）、
（２）全ＰＣＩバス／デバイス／ファンクションのコンフィギュレーション繰り返し処理に移り（ステップＳＴ１−４）、
（３）今回の繰り返し処理中にコンフィギュレーションする対象デバイスステータスを調べ（ステップＳＴ３−５）、
【００４９】
（４）ＰＣＩバス番号＝０／デバイス番号＝１のデバイスステータスが”１：異常”にセットされているので、ＰＣＩバス番号＝０／デバイス番号＝１の全ファンクションに対するＰＣＩコンフィギュレーション処理を省略し、残りの全ＰＣＩバス／デバイス／ファンクション分をステップＳＴ１−４より繰り返し実行（ステップＳＴ１−９）により全ＰＣＩコンフィギュレーションを終了し、
（５）ＷＤＴカウントアップ前にＷＤＴクリアレジスタ５に書き込みを行う（ステップＳＴ１−１０）。
【００５０】
（６）残りの全ＰＣＩバス／デバイス／ファンクションのＰＣＩコンフィギュレーション中に他のデバイスで同様の異常となった場合には、デバイス管理テーブル３３中の該当するデバイスステータスを“１：異常”にセットし、再度ステップＳＴ１−１より再開する。
この場合でもデバイス管理テーブル３３はクリアされずに保持されている為、前回、既に異常のためにＰＣＩコンフィギュレーション処理を省略したデバイスは、今回も再度省略される。
【００５１】
図１１にＰＣＩバス番号＝０／デバイス番号＝１／ファンクション番号＝１と、ＰＣＩバス番号＝０／デバイス番号＝３／ファンクション番号＝１で異常が発生してＰＣＩコンフィギュレーション処理を省略した後のデバイス管理テーブル３３の例を示す。なお、コンフィギュレーションする場合や異常を検出するのはファンクション単位であるが、切り離しはデバイス単位であるので、図１１のデバイス管理テーブル３３にはファンクション番号が表示されない。
【００５２】
このように、ＷＤＴカウントアップによるウェイクアップリセットを設け、異常デバイスに対するＰＣＩコンフィギュレーションサイクル処理の停止からのリセット復帰をできる様にし、さらに、アクセスアドレス格納領域２とデバイス管理テーブル３３により、異常となったＰＣＩコンフィギュレーションサイクルのアクセス先（ＰＣＩバス番号／デバイス番号）を格納・保持する手段を設け、異常デバイスを検出する度にＷＤＴクリア処理を行う為、複数または全ての異常ＰＣＩデバイスの検出と、それら複数または全ての異常ＰＣＩデバイスを論理的に切り離し可能となり、正常部分でのシステムの連続稼動性の向上が可能となる。
【００５３】
また、ＰＣＩデバイス単位での論理的切り離しによって、ＰＣＩファンクション単位での論理的切り離しよりもシステム立ち上がり時間が短縮される。
【００５４】
上記各実施の形態ではＰＣＩバスについて説明したが、最近よく使用されているＣｏｍｐａｃｔＰＣＩバスについてもこの発明が適用できる。
【００５５】
【発明の効果】
（１）以上のように、請求項１記載の発明によれば、１つの異常ＰＣＩファンクションの検出と、その１つのＰＣＩファンクションの論理的に切り離しが可能となり、正常部分でのシステムの連続稼動性を向上させる効果がある。
【００５６】
（２）請求項２記載の発明によれば、異常があっても全てのＰＣＩファンクションに対しコンフィギュレーションを少なくとも１回繰り返すことにより、ノイズの影響などで過渡的に異常が発生しても、正常のＰＣＩファンクションが異常とみなされることを防止する効果がある。
【００５７】
（３）また、請求項１及び２記載の発明によれば、異常があれば順次残りのＰＣＩファンクションのコンフィギュレーションを行うことにより、複数のまたは全ての異常ＰＣＩファンクションの検出と、それら複数のまたは全ての異常ＰＣＩファンクションの論理的に切り離しが可能となり、正常なＰＣＩファンクションだけで立ち上げることができ、正常部分でのシステムの連続稼動性を向上させる効果がある。
【００５８】
（４）請求項３記載の発明によれば、ＰＣＩデバイス単位で異常を検出し、そのＰＣＩデバイスを切り離すようにしたので、正常部分でのシステムの連続稼動性を向上させる効果がある。
【００５９】
（５）請求項４記載の発明によれば、正常部分でのシステムの連続稼動性を向上するプログラムとした効果がある。
【００６０】
【図面の簡単な説明】
【図１】本発明の実施の形態１によるＰＣＩバスシステムの構成図である。
【図２】本発明の実施の形態１によるＰＣＩ不良箇所切り離し方法の一例を示すブロック図である。
【図３】本発明の実施の形態１によるＰＣＩ不良箇所切り離し方法で使用するＷＤＴ回路の動作例である。
【図４】本発明の実施の形態１によるＰＣＩ不良箇所切り離し方法で使用するＷＤＴ回路の動作例である。
【図５】本発明の実施の形態１によるＰＣＩ不良箇所切り離し方法の一例を示すフローチャート図である。
【図６】本発明の実施の形態１によるＰＣＩ不良箇所切り離し方法で使用するファンクション管理テーブルの一例である。
【図７】本発明の実施の形態３によるＰＣＩ不良箇所切り離し方法の一例を示すフローチャート図である。
【図８】本発明の実施の形態３によるＰＣＩ不良箇所切り離し方法で使用するファンクション管理テーブルの一例である。
【図９】本発明の実施の形態４によるＰＣＩ不良箇所切り離し方法の一例を示すブロック図である。
【図１０】本発明の実施の形態４によるＰＣＩ不良箇所切り離し方法の一例を示すフローチャート図である。
【図１１】本発明の実施の形態４によるＰＣＩ不良箇所切り離し方法で使用するデバイス管理テーブルの一例である。
【図１２】従来例のＰＣＩバス処理装置の構成例である。
【符号の説明】
１ＰＣＩコンフィギュレーションサイクル実行部
２アクセスアドレス格納領域３ファンクション管理テーブル
４ＷＤＴ回路５ＷＤＴクリアレジスタ
６リセット生成回路７リセット要因レジスタ
１２，１３，１４アドインカード
１２ａ，１３ａ，１４ａＰＣＩデバイス
１２ｂ，１２ｃ，１３ｂ，１４ｂ，１４ｃＰＣＩファンクション

[0001]
[Technical Field of the Invention]
The present invention relates to a method for isolating a defective part of a PCI bus conforming to the PCI bus specification. In particular, without adding any hardware (H/W) to the PCI bus, it logically isolates a defective device/function on the PCI bus, starts up the system, and thereby improves the continuous operability of the system.
[0002]
[Prior art]
FIG. 12 is a block diagram showing a PCI bus processing apparatus disclosed in, for example, Japanese Patent Application Laid-Open No. 11-191073. In the figure, 20 is a PCI bus processing device; 21 is a transaction start detection circuit that receives the FRAME# signal of the PCI bus and detects the start of a transaction on the PCI bus; 22 is a holding register that holds the information on the address/data lines and the command/byte enable lines of the PCI bus; 23 is an abnormality detection circuit that detects an abnormality on the PCI bus; 24 is a storage register that has a valid bit indicating whether its contents are valid and that stores the address and command information at the time an abnormality occurs; and 25 is a PCI bus interface that receives all PCI bus signals.
[0003]
Next, the operation will be described.
(1) When a bus master on the PCI bus starts a transaction on the PCI bus, the transaction start detection circuit in the PCI bus processing device 20 detects the start of the transaction and notifies the holding register 22, and the holding register 22 stores the address/command information.
(2) Next, the abnormality detection circuit 23 monitors the transaction on the PCI bus. When it detects an abnormality, it stores the contents of the holding register 22 in the storage register 24 and at the same time sets the valid bit.
[0004]
(3) The host CPU (not shown), having received the abnormality report on the PCI bus, then reads the storage register 24 in the PCI bus processing device 20 via the PCI bus.
(4) The PCI bus interface 25 in the PCI bus processing device 20 receives the PCI read transaction from the host CPU and returns the value of the storage register 24 to the host CPU only when the valid bit is set.
When the valid bit is not set, it returns “FFFFFFFFh” instead.
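The read behavior of the prior-art storage register in steps (2) to (4) can be sketched as follows. This is only an illustrative model: the class name `StorageRegister`, its methods, and the 32-bit masking are assumptions made for the example, not part of the cited publication.

```python
INVALID_READ = 0xFFFFFFFF  # value returned while the valid bit is not set


class StorageRegister:
    """Hypothetical model of storage register 24 and its valid bit."""

    def __init__(self):
        self.valid = False
        self.value = 0

    def latch(self, address_command):
        # Step (2): on an abnormality, copy the holding register contents
        # and set the valid bit at the same time.
        self.value = address_command & 0xFFFFFFFF
        self.valid = True

    def read(self):
        # Step (4): return the stored value only when the valid bit is set.
        return self.value if self.valid else INVALID_READ
```

A host-side reader would therefore treat FFFFFFFFh as "nothing latched" rather than as an address.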
[0005]
Because the address and command information at the time of an abnormality are thus output onto the PCI bus through the PCI bus interface, the host CPU can, when an abnormality occurs, identify the abnormal PCI function or PCI device from the address at which the abnormality occurred and isolate the location of the abnormality.
[0006]
[Problems to be solved by the invention]
Because the conventional PCI bus processing apparatus identifies the abnormal part as described above, it required the addition of a special hardware (H/W) circuit that analyzes PCI transactions.
[0007]
In addition, because that circuit is connected to the PCI bus, it becomes an electrical load on the PCI bus and occupies one PCI bus expansion slot.
[0008]
Furthermore, when a PCI configuration cycle is executed for a PCI device/function that has been abnormal from the start, the PCI device/function repeats, without end, a retry indicating that initialization is not complete. Because this is itself a normal PCI transaction, the conventional PCI bus processing apparatus cannot detect it, and the host CPU likewise cannot identify the abnormal part.
[0009]
[Means for Solving the Problems]
(1) The PCI bus defective part isolation method according to the invention of claim 1 applies to a system in which a host CPU card and a PCI device containing PCI functions are connected via a PCI bus, and isolates from the system a PCI function or PCI device for which an abnormality occurs when that PCI function or PCI device is configured. In this method the host CPU card performs: a first step of sequentially executing configuration for each PCI function or each PCI device when the system starts up; a second step of resetting the host CPU card and all PCI functions or PCI devices if an abnormality occurs during the configuration; a third step of, after the reset, isolating the PCI function or PCI device in which the abnormality occurred and sequentially executing configuration for each of the remaining PCI functions or PCI devices; and a fifth step of repeating the second and third steps a predetermined number of times or until the abnormality is resolved, returning to the second step to reset the host CPU card and all PCI functions or PCI devices whenever the configuration executed in the third step does not resolve the abnormality. A plurality of abnormal PCI functions or PCI devices can thereby be isolated from the system.
[0010]
(2) The PCI bus defective part isolation method according to the invention of claim 2 is the method of claim 1 further including a fourth step in which, after the reset in the second step, processing returns to the first step and configuration is executed again for all PCI functions or PCI devices, so that the first and second steps are repeated at least once, and processing proceeds to the third step if the abnormality is still not resolved.
[0011]
(3) The PCI bus defective part isolation method according to the invention of claim 3 is the method of claim 1 or claim 2 further including a sixth step of determining that the host CPU card is abnormal if the abnormality is not resolved even after the third step is executed, if the abnormality is not resolved a desired time after the third step is executed, or if any one of the first to third steps cannot be executed.
[0012]
(4) The program according to the invention of claim 4 is a program for causing the PCI bus defective part isolation method according to any one of claims 1 to 3 to be executed.
[0013]
DETAILED DESCRIPTION OF THE INVENTION
An embodiment of the present invention will be described below.
Embodiment 1.
FIG. 1 is a configuration diagram of a system having a PCI bus. Reference numeral 10 denotes the PCI bus, to which a host CPU card 11 and add-in cards 12, 13, and 14 are connected. The host CPU card 11 contains a host CPU 11a and, although not shown, various computer functions such as memory and interfaces, which together form the host CPU function. It also contains the various necessary software (S/W) 11b, including the software for the PCI bus defective part isolation method of the present invention. The add-in cards 12, 13, and 14 are provided with PCI devices 12a, 13a, and 14a, each made of, for example, an LSI, and these PCI devices contain PCI functions 12b, 12c, 13b, 14b, and 14c having various functions.
[0014]
FIG. 2 is a block diagram for realizing the PCI bus defective part isolation method according to the first embodiment of the present invention. In the figure, 1 is a PCI configuration cycle execution unit that operates on the host CPU card connected to the PCI bus and executes PCI configuration cycles; 2 is an access address storage area that stores the access address (a combination of PCI bus number, device number, and function number) for which a PCI configuration cycle is about to be executed; and 3 is a function management table (see FIG. 6) that manages the logical isolation status of all functions.
[0015]
4 is a WDT circuit that monitors the normal operation of the CPU on the host CPU card and the software running on it. 5 is a WDT clear register to which the software writes periodically while the CPU on the host CPU card and the software running on it are operating normally; each write clears the WDT counter value before the WDT counts up (that is, before the counter reaches its limit). If the CPU on an add-in card or the software running on it operates abnormally, this appears as an abnormality of the CPU on the host CPU card and the software running on it, and as a result writing to the WDT clear register 5 stops. The WDT circuit 4 therefore monitors the add-in card side as well.
[0016]
6 is a reset generation circuit that is activated when the WDT circuit 4 counts up and issues a wake-up reset to the system. 7 is a reset factor register that indicates whether the reset was released by power-on or by a wake-up reset; it can be read by the PCI configuration cycle execution unit 1.
The contents of the access address storage area 2 and the function management table 3 are stored in memory and are retained even across a wake-up reset.
[0017]
Next, the operation of the WDT circuit 4 will be described.
FIGS. 3 and 4 are diagrams showing examples of how the counter value of the WDT circuit 4 changes.
(1) First, in FIG. 3A, the WDT circuit 4 starts counting when the power-on reset is released (time: T1).
(2) While the host CPU and the software running on it are operating normally, the software writes to the WDT clear register 5 before the WDT counts up, which clears the WDT counter. During this time configuration completes and the system starts up.
[0018]
(3) As shown in FIG. 3B, suppose that a PCI function is abnormal and that, as a result, the PCI configuration processing for that PCI function stalls at time T2. Writing to the WDT clear register 5 by software then stops, and after the period P1 the WDT counter counts up. Because this is the first WDT count-up, the WDT circuit requests the reset generation circuit 6 to perform a wake-up reset, and the reset generation circuit 6 performs it (time: T3).
(4) After the wake-up reset is released, the WDT circuit 4 starts counting again. This time, however, configuration of the abnormal PCI function is skipped, so configuration completes before the WDT counts up and the software writes to the WDT clear register 5; the system therefore starts up normally.
[0019]
(5) If an abnormality has occurred in a PCI function and has not been recovered, writing to the WDT clear register 5 by software remains stopped, as shown in FIG. 4A, so the WDT counter counts up again after the period P2 (= P1). Because this is the second consecutive WDT count-up, it is determined that the host CPU, or the PCI bus itself as the common part, has stopped functioning, and abnormality determination processing, such as sending an alarm signal, is performed (time T5).
(6) As shown in FIG. 4B, if the software writes to the WDT clear register 5 even once during the period P2, the next WDT count-up is regarded as a first count-up and results in a wake-up reset. (Because WDT clear processing occurred during the period P2, the count-up is not a second consecutive one.)
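The watchdog behavior in items (1) to (6) can be summarized as a small state machine. The following is a minimal sketch under stated assumptions: the class `Watchdog`, the tick-based timing, and the event names `"wakeup_reset"` and `"alarm"` are hypothetical and chosen only for illustration; the text itself describes a hardware circuit.

```python
class Watchdog:
    """Hypothetical model of WDT circuit 4 and WDT clear register 5."""

    def __init__(self, limit):
        self.limit = limit                  # ticks per period (P1 = P2)
        self.count = 0
        self.expired_since_clear = False    # a count-up with no clear since

    def clear(self):
        # Software write to the WDT clear register: the counter restarts and
        # the next count-up is again treated as a first one (item 6).
        self.count = 0
        self.expired_since_clear = False

    def tick(self):
        """Advance one time unit; return None, 'wakeup_reset' or 'alarm'."""
        self.count += 1
        if self.count < self.limit:
            return None
        self.count = 0                      # counting restarts after the reset
        if self.expired_since_clear:
            return "alarm"                  # second consecutive count-up (item 5)
        self.expired_since_clear = True
        return "wakeup_reset"               # first count-up (item 3)
```

With a limit of 3 ticks and no clear writes, the third tick yields a wake-up reset and the sixth an alarm; a single `clear()` in between turns the next count-up back into a wake-up reset.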
[0020]
Next, the entire processing flow will be described.
FIG. 5 is a flowchart showing an example in which the PCI configuration cycle execution unit 1 configures each function on the PCI bus. First, the case where all PCI buses / devices / functions are normal and the reset factor is power ON will be described.
[0021]
(1) The PCI configuration cycle execution unit 1 reads the contents of the reset factor register 7 (step ST1-1),
(2) Check the cause of reset this time (step ST1-2),
(3) Since the reset factor is other than wake-up reset by WDT count-up, all function statuses in the function management table 3 are initialized to “0: normal” (step ST1-3),
(4) Move to configuration repetition processing for all PCI buses / devices / functions (step ST1-4),
(5) Check the target function status to be configured during the current iterative process (step ST1-5),
[0022]
(6) Since there is no abnormality, the access address to be configured this time is stored in the access address storage area 2 (step ST1-6).
(7) PCI configuration execution of target function (step ST1-7),
(8) After completion, the access address storage area 2 is cleared (step ST1-8), and all PCI buses / devices / functions are repeatedly executed from step ST1-4 (step ST1-9) to complete all PCI configurations.
(9) Write to the WDT clear register 5 before counting up the WDT (step ST1-10). (The counter value is determined so that the WDT does not count up even at the normal end.)
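The fault-free power-on pass of steps ST1-1 to ST1-10 can be sketched as a single loop. The function name `configure_all_power_on`, the `configure` callback, and the returned dictionary are assumptions for illustration; the flowchart of FIG. 5 is the authoritative description.

```python
def configure_all_power_on(functions, configure):
    """Sketch of steps ST1-3 to ST1-10 after a power-on reset.

    `functions` is an ordered list of (bus, device, function) tuples and
    `configure` is a caller-supplied callable standing in for the PCI
    configuration of one function (both are hypothetical names).
    """
    status = {f: 0 for f in functions}   # ST1-3: every status is "0: normal"
    access_address = None                # access address storage area 2
    for f in functions:                  # ST1-4: loop over bus/device/function
        if status[f] != 0:               # ST1-5: skip functions marked abnormal
            continue
        access_address = f               # ST1-6: record the address before access
        configure(f)                     # ST1-7: execute the PCI configuration
        access_address = None            # ST1-8: clear it once the access completed
    # ST1-9 is the end of the loop; ST1-10 is the single WDT clear write.
    return {"status": status, "access_address": access_address, "wdt_cleared": True}
```

Recording the address before the access and clearing it only afterwards is what lets a later pass find out which access stalled: if `configure` never returns, the address is still stored.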
[0023]
Next, the flow from the power-on reset in the state where an abnormality has occurred with PCI bus number = 0 / device number = 1 / function number = 1 will be described.
(1) Steps ST1-1 to ST1-6 are exactly the same as described above.
(2) When PCI configuration is performed for PCI bus number = 0 / device number = 1 / function number = 1 (step ST1-7),
[0024]
(3) The PCI configuration cycle may repeat retry processing indefinitely and never finish. (Consider a PCI target card (add-in card) that carries a general-purpose PCI chipset and a CPU, where software (S/W) running on that CPU initializes the general-purpose PCI chip. If an abnormality prevents the software from running normally, initialization of the PCI chip never completes, and the card may then answer PCI configuration cycles from the host CPU card with retries indefinitely.) As a result, the WDT clear register 5 cannot be written before the WDT counts up, so a wake-up reset occurs.
[0025]
Next, the flow after the occurrence of the wake-up reset will be described.
(1) The PCI configuration cycle execution unit 1 reads the contents of the reset factor register (step ST1-1),
(2) Check the cause of reset this time (step ST1-2),
(3) Since the reset factor is a wake-up reset by WDT count-up, the contents of the access address storage area 2 are read (step ST1-11),
(4) Check the stored contents (step ST1-12),
(5) Since the access address where the abnormality has occurred is stored, the access address storage area 2 is cleared (step ST1-13),
[0026]
(6) Based on PCI bus number = 0 / device number = 1 / function number = 1, which is the address that was being accessed before the wake-up reset occurred, the corresponding function status in the function management table 3 is set to “1: abnormal” (step ST1-14),
(7) Move to configuration repetition processing for all PCI buses / devices / functions (step ST1-4),
(8) Check the target function status to be configured during the current iterative process (step ST1-5),
[0027]
(9) Since the function status of PCI bus number = 0 / device number = 1 / function number = 1 is set to “1: abnormal”, the PCI configuration processing for PCI bus number = 0 / device number = 1 / function number = 1 is skipped, and all the remaining PCI buses / devices / functions are processed by repeating from step ST1-4 (step ST1-9), which completes the entire PCI configuration.
(10) Write to the WDT clear register 5 before counting up the WDT (step ST1-10).
[0028]
(11) If another function exhibits the same abnormality during the PCI configuration of the remaining PCI buses / devices / functions, processing stalls in the PCI configuration, and after the second consecutive WDT count-up it is determined that the host CPU function has stopped.
In other words, only the one PCI function in which an abnormality was first detected is isolated. If a plurality of PCI functions are abnormal, it is determined that the host CPU, or the PCI bus itself as the common part, has stopped functioning, and processing such as sending an alarm is performed.
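The complete behavior of the first embodiment, including the wake-up branch (steps ST1-11 to ST1-14), can be simulated in a few lines. This is a sketch only: a stalled configuration cycle is modeled as a Python exception standing in for the WDT count-up, and the names `Hang`, `boot_pass`, and `boot_embodiment1` are hypothetical.

```python
class Hang(Exception):
    """Stands in for a configuration cycle that retries forever (WDT count-up)."""


def boot_pass(reset_cause, functions, state, faulty):
    if reset_cause == "wakeup":
        addr = state["access_address"]            # ST1-11, ST1-12
        if addr is not None:
            state["access_address"] = None        # ST1-13
            state["status"][addr] = 1             # ST1-14: "1: abnormal"
    else:
        state["status"] = {f: 0 for f in functions}   # ST1-3
    for f in functions:                           # ST1-4
        if state["status"][f] == 1:               # ST1-5: skip isolated function
            continue
        state["access_address"] = f               # ST1-6
        if f in faulty:                           # ST1-7 never completes
            raise Hang(f)
        state["access_address"] = None            # ST1-8
    # ST1-10: the WDT clear register is written only here, after the loop.


def boot_embodiment1(functions, faulty):
    # `state` models the memory that survives a wake-up reset.
    state = {"access_address": None, "status": {}}
    cause, count_ups = "power_on", 0
    while True:
        try:
            boot_pass(cause, functions, state, faulty)
        except Hang:
            count_ups += 1
            if count_ups == 2:                    # second consecutive count-up
                return "alarm", state["status"]
            cause = "wakeup"
            continue
        return "up", state["status"]
```

With one faulty function the system comes up with that function marked abnormal; with two faulty functions the second stall is a second consecutive count-up and the result is the alarm case described above.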
[0029]
FIG. 6 shows an example of the function management table 3 after an abnormality has occurred at PCI bus number = 0 / device number = 1 / function number = 1 and the PCI configuration processing for it has been skipped.
[0030]
In this way, a wake-up reset triggered by WDT count-up is provided so that the system can recover, by reset, from a stall in the PCI configuration cycle processing for an abnormal function. In addition, the access address storage area 2 and the function management table 3 provide a means of storing and retaining the access destination (PCI bus number / device number / function number) of the PCI configuration cycle that became abnormal, and the WDT clear processing is performed only after all PCI configuration is complete. This makes it possible to detect one abnormal PCI function and logically isolate that one PCI function, improving the continuous operability of the system in its normal part.
[0031]
Further, it is not necessary to add a special H / W circuit for analyzing a PCI transaction.
[0032]
Further, since the method adds no electrical load to the PCI bus and does not occupy a PCI bus expansion slot, the PCI bus expansion slots can be used effectively.
[0033]
Embodiment 2.
In the first embodiment, in FIG. 4A, after the wake-up reset is performed by the first WDT count-up at time T3, the PCI configuration is executed for the remaining PCI functions except for the abnormal PCI function. However, a transient abnormality may occur due to the influence of noise or the like, and a normal PCI function may be regarded as abnormal.
In the second embodiment of the present invention, the PCI configuration is re-executed for the PCI functions of all addresses without removing the PCI functions of the abnormal addresses.
That is, the operation corresponding to the period P1 in FIG. 4A is repeated at least once, and then the process proceeds to the period P2.
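The idea of the second embodiment, reconfiguring everything at least once before isolating anything, can be sketched as below. The sketch abstracts away the watchdog and reset hardware entirely; the function `boot_embodiment2`, the `hangs(function, attempt)` predicate, and the `retries` parameter are assumptions made for illustration, and the handling after the first isolation follows the single-isolation behavior of the first embodiment.

```python
def boot_embodiment2(functions, hangs, retries=1):
    """Return ("up" | "alarm", isolated_function_or_None).

    `hangs(f, attempt)` is a hypothetical predicate saying whether the
    configuration of function `f` stalls on the given boot attempt.
    """
    isolated = None
    attempt = 0
    while True:
        attempt += 1
        # Configure every function that is not isolated; stop at the first stall.
        hung = next((f for f in functions
                     if f != isolated and hangs(f, attempt)), None)
        if hung is None:
            return "up", isolated
        if attempt <= retries:
            continue            # wake-up reset, then reconfigure ALL functions
        if isolated is not None:
            return "alarm", isolated
        isolated = hung         # the stall persisted: now isolate the function
```

A function that stalls only on the first attempt (for example because of noise) is therefore not isolated, while one that stalls on every attempt is isolated on the pass after the retry.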
[0034]
Embodiment 3.
Next, a third embodiment of the present invention will be described. In the third embodiment, the block diagram is the same as FIG. 2 of the first embodiment. What differs from the first embodiment is the flowchart of FIG. 7, which shows the processing of the PCI configuration cycle execution unit 1, and the contents of the function management table used, shown in FIG. 8.
[0035]
Next, the operation will be described.
FIG. 7 is a flowchart showing an example in which the PCI configuration cycle execution unit 1 according to the third embodiment of the present invention performs processing using the function management table 3. In FIG. 7, processing steps identical to those of the first embodiment carry the same step numbers, and their description is omitted. Only the part that differs from FIG. 5 is described, using the new step number ST2-15.
[0036]
When all the PCI buses / devices / functions are normal, the processing is the same as that of the first embodiment, and the description thereof is omitted.
Further, the flow from the power-on reset to the first occurrence of the wake-up reset, in the state where an abnormality has occurred at PCI bus number = 0 / device number = 1 / function number = 1, is the same process as in the first embodiment, so its description is omitted.
[0037]
Next, the flow after the first occurrence of the wakeup reset will be described.
(1) Steps ST1-1 to ST1-14 are the same as in the first embodiment. At this point, the function status in the function management table 3 corresponding to PCI bus number = 0 / device number = 1 / function number = 1 is set to “1: abnormal”.
(2) Thereafter, in the third embodiment, a write to the WDT clear register 5 is performed (step ST2-15).
(3) Processing moves to the configuration loop over all PCI buses / devices / functions (step ST1-4).
(4) The status of the function to be configured in the current iteration is checked (step ST1-5).
(5) Since the function status of PCI bus number = 0 / device number = 1 / function number = 1 is set to “1: abnormal”, the PCI configuration process for that function is omitted, and processing is repeated from step ST1-4 for the remaining PCI buses / devices / functions (step ST1-9).
[0038]
(6) Suppose that, during PCI configuration of the remaining PCI buses / devices / functions, the same abnormality occurs in another PCI function, here PCI bus number = 0 / device number = 1 / function number = 3 (see FIG. 8). The PCI configuration processing for PCI bus number = 0 / device number = 1 / function number = 3 then repeats retries indefinitely and does not end. As a result, the WDT counts up again.
(7) The difference from the first embodiment is that the WDT clear register 5 has already been written once in step ST2-15, so this count-up also results in a wake-up reset.
[0039]
(8) After the wake-up reset, processing starts again from step ST1-1.
(9) Eventually, the abnormalities of both PCI bus number = 0 / device number = 1 / function number = 1 and PCI bus number = 0 / device number = 1 / function number = 3 are detected, and by omitting the PCI configuration process for both functions, both are logically separated from the PCI bus.
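Steps (1) to (9) above can be condensed into a short simulation. It is a sketch under assumptions: `boot_until_configured` is a hypothetical name, a stalled configuration is detected directly instead of through a real WDT, and the table is an in-memory dictionary rather than storage that survives a hardware reset.

```python
def boot_until_configured(functions, faulty):
    """Repeat reset and configuration until no remaining function stalls."""
    status = {f: 0 for f in functions}   # function management table, 0: normal
    resets = 0
    while True:
        hung = None
        for addr in functions:
            if status[addr] == 1:
                continue                 # already separated, configuration omitted
            if addr in faulty:           # retries forever, so the WDT counts up
                hung = addr
                break
        if hung is None:
            return status, resets        # all remaining functions configured
        resets += 1                      # wake-up reset
        status[hung] = 1                 # 1: abnormal; the WDT is then cleared
                                         # (step ST2-15) so it can fire again


functions = [(0, 1, fn) for fn in range(4)]
status, resets = boot_until_configured(functions, {(0, 1, 1), (0, 1, 3)})
```

With function numbers 1 and 3 abnormal, the sketch needs two wake-up resets and ends with both entries marked, mirroring the table of FIG. 8 as described in the text.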
[0040]
FIG. 8 shows an example of the function management table 3 after abnormalities have occurred at PCI bus number = 0 / device number = 1 / function number = 1 and PCI bus number = 0 / device number = 1 / function number = 3 and the PCI configuration processing for them has been omitted.
[0041]
In this way, a wake-up reset triggered by WDT count-up is provided so that the system can recover from a stall in PCI configuration cycle processing for an abnormal function. In addition, the access address storage area 2 and the function management table 3 provide a means of storing and retaining the access destination (PCI bus number / device number / function number) of the PCI configuration cycle that caused the abnormality, and the WDT clear process is performed each time an abnormal function is detected. As a result, a plurality or all of the abnormal PCI functions can be detected and logically separated, and the continuous operability of the normal part of the system is improved.
[0042]
As described above, even when there are multiple abnormal PCI functions, the configuration can be repeated until all of the abnormal PCI functions have been disconnected, so that the configuration completes.
However, if the host CPU card itself has failed, the configuration operation would be repeated endlessly. To prevent this, when the configuration has been performed a predetermined number of times, it may be determined that the host CPU or the PCI bus itself, which are common parts, has stopped functioning, and an alarm may be issued and the configuration aborted.
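The safeguard described above amounts to a bounded retry loop. The sketch below assumes a hypothetical callback `attempt_configuration` that returns True when one full configuration pass completes, and a hypothetical exception `HostOrBusFailure` standing in for the alarm; neither name comes from the source.

```python
class HostOrBusFailure(Exception):
    """Raised in place of the alarm when configuration never completes."""


def boot_with_limit(attempt_configuration, max_attempts):
    """Return the attempt number on which configuration completed."""
    for attempt in range(1, max_attempts + 1):
        if attempt_configuration():
            return attempt
    # The predetermined number of attempts is used up: suspect a common part
    # (host CPU or the PCI bus itself) and stop retrying.
    raise HostOrBusFailure(
        f"configuration did not complete in {max_attempts} attempts")


results = iter([False, False, True])
completed_on = boot_with_limit(lambda: next(results), max_attempts=5)

try:
    boot_with_limit(lambda: False, max_attempts=4)
    alarm_raised = False
except HostOrBusFailure:
    alarm_raised = True
```

How the limit is chosen is not specified in the text; a value at least as large as the number of functions that could plausibly fail is one option.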
[0043]
Embodiment 4.
In many system configurations, the hardware mounted on the PCI bus is arranged as 1 card = 1 device = 1 function or as 1 card = 1 device = multiple functions, and abnormalities also often occur in units of cards, that is, in units of devices. In the fourth embodiment, logical separation is performed in units of PCI devices, which further shortens the system startup time.
[0044]
FIG. 9 is a block diagram for realizing a PCI bus fault location isolation method according to the fourth embodiment of the present invention. In the figure, blocks identical to those of the third embodiment carry the same numbers, and their description is omitted. Only the parts that differ from FIG. 2 are described, using new block numbers in the 30s.
In the fourth embodiment, since the unit of logical separation from the PCI bus is a PCI device, the function management table 3 of the third embodiment is replaced by a device management table 33. The other blocks are the same as those in the third embodiment.
[0045]
Next, the operation will be described.
FIG. 10 is a flowchart showing an example in which the PCI configuration cycle execution unit 1 according to the fourth embodiment of the present invention performs processing using the device management table 33. In the figure, steps identical to those of the third embodiment carry the same step numbers, and their description is omitted. Only the steps that differ from FIG. 7 are described, using the new step numbers ST3-3, ST3-5, and ST3-14.
[0046]
When all the PCI buses / devices / functions are normal, the processing is the same as that of the third embodiment, so that the description is omitted.
A flow from a power-on reset in the state where an abnormality has occurred with PCI bus number = 0 / device number = 1 / function number = 1 will be described.
(1) Steps ST1-1 to ST1-2 are exactly the same as in the third embodiment.
(2) Since the reset factor is other than wake-up reset by WDT count-up, all device statuses in the device management table 33 are initialized to “0: normal” (step ST3-3).
[0047]
(3) Steps ST1-4 to ST1-6 perform the same processing as in the third embodiment. When PCI configuration is performed for PCI bus number = 0 / device number = 1 / function number = 1 (step ST1-7), the PCI configuration cycle repeats retry processing indefinitely and does not end. As a result, the WDT clear register 5 cannot be written before the WDT counts up, and a wake-up reset occurs.
[0048]
Next, the flow after the occurrence of the wake-up reset will be described.
(1) Steps ST1-1 to ST1-13 are the same as in the third embodiment. Based on PCI bus number = 0 / device number = 1 / function number = 1, the information accessed before the wake-up reset occurred, the corresponding device status in the device management table 33 is set to “1: abnormal” (step ST3-14).
(2) Processing moves to the configuration loop over all PCI buses / devices / functions (step ST1-4).
(3) The status of the device to be configured in the current iteration is checked (step ST3-5).
[0049]
(4) Since the device status of PCI bus number = 0 / device number = 1 is set to “1: abnormal”, the PCI configuration processing for all functions of PCI bus number = 0 / device number = 1 is omitted. Processing is repeated from step ST1-4 for the remaining PCI buses / devices / functions (step ST1-9) until the entire PCI configuration is complete.
(5) The WDT clear register 5 is written before the WDT counts up (step ST1-10).
[0050]
(6) If the same abnormality occurs in another device during the PCI configuration of the remaining PCI buses / devices / functions, the corresponding device status in the device management table 33 is set to “1: abnormal”, and processing restarts from step ST1-1.
Even in this case, the device management table 33 is retained without being cleared, so a device whose PCI configuration processing was previously omitted because of an abnormality is omitted again this time.
[0051]
FIG. 11 shows an example of the device management table 33 after abnormalities have occurred at PCI bus number = 0 / device number = 1 / function number = 1 and PCI bus number = 0 / device number = 3 / function number = 1 and the PCI configuration processing has been omitted. Note that although configuration and abnormality detection are performed in units of functions, disconnection is performed in units of devices, so no function number appears in the device management table 33 of FIG. 11.
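A device management table of this kind can be sketched as a mapping keyed by (bus, device) only. The names `device_status`, `mark_abnormal`, and `should_skip` are hypothetical, and the actual layout of the table in FIG. 11 is not reproduced here.

```python
# (bus, device) -> 0: normal, 1: abnormal. No function number is stored.
device_status = {}


def mark_abnormal(last_access):
    """Record the device that owns the last accessed function as abnormal."""
    bus, dev, _fn = last_access          # the function number is dropped
    device_status[(bus, dev)] = 1


def should_skip(addr):
    """True if configuration of this function must be omitted."""
    bus, dev, _fn = addr
    return device_status.get((bus, dev), 0) == 1


# The two abnormal accesses named in the text.
mark_abnormal((0, 1, 1))
mark_abnormal((0, 3, 1))
```

Because the key has no function number, every function of a marked device is skipped, including functions that never stalled themselves.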
[0052]
In this way, a wake-up reset triggered by WDT count-up is provided so that the system can recover from a stall in PCI configuration cycle processing for an abnormal device. In addition, the access address storage area 2 and the device management table 33 provide a means of storing and retaining the access destination (PCI bus number / device number) of the PCI configuration cycle that caused the abnormality, and the WDT clear process is performed every time an abnormal device is detected. As a result, a plurality or all of the abnormal PCI devices can be detected and logically separated, and the continuous operability of the normal part of the system is improved.
[0053]
In addition, the logical detachment in units of PCI devices shortens the system startup time compared with the logical detachment in units of PCI functions.
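One way to see where the time saving could come from is to count wake-up resets. The sketch below rests on assumptions not stated in the text: each separation costs one WDT period, and a single device contains more than one abnormal function. `count_resets` is a hypothetical helper.

```python
def count_resets(functions, faulty, per_device):
    """Count the wake-up resets needed before configuration completes."""
    # Separation unit: (bus, device) for devices, the full address for functions.
    key = (lambda a: a[:2]) if per_device else (lambda a: a)
    isolated = set()
    resets = 0
    while True:
        hung = next((a for a in functions
                     if key(a) not in isolated and a in faulty), None)
        if hung is None:
            return resets
        isolated.add(key(hung))          # separate the function or whole device
        resets += 1


functions = [(0, 1, 0), (0, 1, 1), (0, 1, 2), (0, 1, 3), (0, 2, 0)]
faulty = {(0, 1, 1), (0, 1, 3)}
by_function = count_resets(functions, faulty, per_device=False)
by_device = count_resets(functions, faulty, per_device=True)
```

Under these assumptions, separation by function waits for two WDT periods while separation by device waits for one, at the cost of also skipping the healthy functions of that device.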
[0054]
In each of the above embodiments, the PCI bus has been described. However, the present invention can also be applied to the CompactPCI bus, which has come into frequent use recently.
[0055]
[Effect of the invention]
(1) As described above, according to the invention described in claim 1, one abnormal PCI function can be detected and logically separated, which has the effect of improving the continuous operability of the normal part of the system.
[0056]
(2) According to the invention described in claim 2, even if there is an abnormality, the configuration is repeated at least once for all PCI functions, so that a normal PCI function is prevented from being regarded as abnormal even when a transient abnormality occurs due to the influence of noise or the like.
[0057]
(3) According to the invention described in claims 1 and 2, if there is an abnormality, the remaining PCI functions are sequentially configured, so that a plurality or all of the abnormal PCI functions can be detected and logically separated. The system can then start up with only the normal PCI functions, which has the effect of improving the continuous operability of the normal part of the system.
[0058]
(4) According to the invention described in claim 3, an abnormality is detected in units of PCI devices and the PCI devices are disconnected, which has the effect of improving the continuous operability of the normal part of the system.
[0059]
(5) According to the invention described in claim 4, the program has the effect of improving the continuous operability of the normal part of the system.
[0060]
[Brief description of the drawings]
FIG. 1 is a configuration diagram of a PCI bus system according to a first embodiment of the present invention.
FIG. 2 is a block diagram showing an example of a PCI defective portion isolation method according to the first embodiment of the present invention.
FIG. 3 is an operation example of a WDT circuit used in the PCI defective portion isolation method according to the first embodiment of the present invention.
FIG. 4 is an operation example of a WDT circuit used in the PCI defective portion isolation method according to the first embodiment of the present invention.
FIG. 5 is a flowchart showing an example of a PCI defective portion isolation method according to the first embodiment of the present invention.
FIG. 6 is an example of a function management table used in the PCI defective portion isolation method according to the first embodiment of the present invention.
FIG. 7 is a flowchart showing an example of a PCI defective portion isolation method according to Embodiment 3 of the present invention.
FIG. 8 is an example of a function management table used in the PCI defective portion isolation method according to the third embodiment of the present invention.
FIG. 9 is a block diagram showing an example of a PCI defective portion isolation method according to Embodiment 4 of the present invention.
FIG. 10 is a flowchart showing an example of a PCI defective portion isolation method according to Embodiment 4 of the present invention.
FIG. 11 is an example of a device management table used in the PCI defective portion isolation method according to the fourth embodiment of the present invention.
FIG. 12 is a configuration example of a conventional PCI bus processing device.
[Explanation of symbols]
1 PCI configuration cycle execution unit
2 Access address storage area
3 Function management table
4 WDT circuit
5 WDT clear register
6 Reset generation circuit
7 Reset factor register
12, 13, 14 Add-in cards
12a, 13a, 14a PCI devices
12b, 12c, 13b, 14b, 14c PCI functions

Claims

1. A PCI bus defective part isolation method in which, in a system in which a host CPU card and a PCI device incorporating a PCI function are connected via a PCI bus, when there is an abnormality while the PCI function or PCI device is being configured, the abnormal PCI function or PCI device is isolated from the system, the method comprising:
a first step of sequentially executing configuration for each PCI function or each PCI device when the system starts up;
a second step of resetting the host CPU card and all PCI functions or PCI devices if there is an abnormality during the configuration;
a third step of, after the reset, disconnecting the PCI function or PCI device that had the abnormality and sequentially executing configuration for each remaining PCI function or PCI device; and
a fifth step of repeating the second and third steps a predetermined number of times or until the abnormality is resolved, such that, if the abnormality is not resolved even after the configuration is executed in the third step, processing returns to the second step and the host CPU card and all PCI functions or PCI devices are reset, wherein these steps are performed by the host CPU card so that a plurality of abnormal PCI functions or PCI devices can be isolated from the system.

2. The PCI bus defective part isolation method according to claim 1, further including a fourth step of, after the reset in the second step, returning to the first step and repeating the configuration of all PCI functions or PCI devices at least once, and proceeding to the third step if the abnormality persists.

3. The PCI bus defective part isolation method according to claim 1 or 2, further including a sixth step of determining that the host CPU card is abnormal when the abnormality is not resolved even after the third step is executed, when the abnormality is not resolved within a desired time after the third step is executed, or when any one of the first to third steps cannot be executed.

4. A program for executing the PCI bus defective part isolation method according to any one of claims 1 to 3.