JP4672224B2

JP4672224B2 - Peer-to-peer interconnect diagnostics

Info

Publication number: JP4672224B2
Application number: JP2001540468A
Authority: JP
Inventors: Miller, Michael Howard; Coomes, James Allen
Original assignee: Seagate Technology LLC
Current assignee: Seagate Technology LLC
Priority date: 1999-11-22
Filing date: 2000-11-22
Publication date: 2011-04-20
Anticipated expiration: 2020-11-22
Also published as: CN1391673A; JP2003515967A; GB0212193D0; GB2372606B; WO2001038982A1; GB2372606A; KR20020050300A; KR100824109B1; DE10085218T1

Description

[0001]
(Related application)
This application claims priority under 35 U.S.C. §119(e) of U.S. Provisional Application Serial No. 60/166,805, filed Nov. 22, 1999.
[0002]
(Technical field of the invention)
The present invention relates to the field of loop diagnosis. In particular, the present invention relates to peer-to-peer interface diagnostics.
[0003]
(Background of the Invention)
One important part of a computer system is the device that stores data. Computer systems have many different places where data can be stored, and a common place for storing large amounts of data is a disk drive. The most basic components of a disk drive are a rotating disk, an actuator that moves a transducer to various locations on the disk, and the electronic circuitry used to read data from and write data to the disk. The disk drive also includes circuitry that encodes data so that it can be successfully written to and retrieved from the disk surface. A microprocessor controls most operations of the disk drive, including delivering data to a requesting computer and taking data from a requesting computer for storage on the disk.
[0004]
Information representing data is stored on the surface of the storage disk. The disk drive system reads and writes information stored on the tracks of the storage disk.
[0005]
Fibre Channel (FC) is a serial data transfer architecture standardized by ANSI. A well-known FC standard is Fibre Channel Arbitrated Loop (FC-AL), which defines a distributed daisy chain loop. FC provides peer-to-peer communication on this loop.
[0006]
FC-AL was designed for new mass storage devices and other peripherals that require very high bandwidth. In addition to the Small Computer System Interface (SCSI) command set, FC-AL supports other upper-level protocols. The mapping of these upper-level protocols onto FC is called the FC-4 layer.
[0007]
In FC-AL, information from an originating device may pass through several other devices, and the links between them, before it arrives at the receiving device. Passing information over multiple links makes isolating marginal and failed links more complex than it is with point-to-point connections. Three prior-art techniques exist for isolating marginal links. The first uses link status to isolate problem links. The second uses the error-reporting function of the FC-4 mapping. The third is a combination of the first two.
[0008]
The main requirement of all three techniques is knowledge of the topology (i.e., the connection order). The topology is learned either from the loop position map during FC-AL-defined loop initialization or by implicit means. An example of an implicit means is a disk drive enclosure that uses hard addresses.
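Because each of these schemes depends on the connection order, the basic topology query is which port sits upstream of a given port. As a minimal sketch, assuming the loop position map is available as a list of AL_PAs in the order frames travel around the loop, the upstream neighbor is simply the preceding entry, wrapping from the first entry to the last. The helper name `upstream_neighbor` and the AL_PA values in the usage note are hypothetical, not taken from the patent.

```python
from typing import Sequence


def upstream_neighbor(position_map: Sequence[int], my_al_pa: int) -> int:
    """Return the AL_PA of the port immediately upstream of my_al_pa.

    Assumes position_map lists AL_PAs in the order frames travel around the
    loop, so the upstream port is the previous entry (index -1 wraps to the
    last entry when my_al_pa is first).
    """
    if len(position_map) < 2:
        raise ValueError("need at least two ports to have an upstream neighbor")
    try:
        idx = list(position_map).index(my_al_pa)
    except ValueError:
        raise ValueError(f"AL_PA {my_al_pa:#04x} not in position map") from None
    return position_map[idx - 1]
```

For the FIG. 1 loop with hypothetical AL_PAs 0x01, 0x02, 0x04 and 0x08 assigned to devices 110 to 140, `upstream_neighbor([0x01, 0x02, 0x04, 0x08], 0x04)` returns `0x02`, and the first port's upstream neighbor is the last one.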
[0009]
The first scheme, which uses link status to isolate marginal links, requires a management application (MA) on at least one device on the loop. Several MAs may be implemented so that the failure of any one of them is covered. The MA either polls the loop periodically during normal operation or requests that devices detecting a link error report the incident. In polling mode, the status accumulated from all devices is used to locate the marginal link. In error-reporting mode, the status accumulated from all devices that reported the error is used to locate the marginal link.
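To make the dependence on topology concrete, here is a minimal sketch of how a central MA in polling mode might search the accumulated status. It assumes the MA already knows the loop order and the per-port increase in link-error counts since the last poll, and it flags each link whose receiving port saw new errors while the transmitting port did not. The patent does not specify the search rule; this rule and the name `suspect_links` are illustrative assumptions.

```python
from typing import Dict, List, Sequence, Tuple


def suspect_links(loop_order: Sequence[int],
                  deltas: Dict[int, int]) -> List[Tuple[int, int]]:
    """Central-MA sketch: return (transmitter, receiver) pairs for links whose
    receiver reported new link errors while its transmitter reported none.

    loop_order: port IDs in the direction frames travel (the topology that
                the prior-art schemes must already know).
    deltas:     increase in each port's link-error count since the last poll;
                ports missing from the dict are treated as unchanged.
    """
    suspects = []
    for i, rx in enumerate(loop_order):
        tx = loop_order[i - 1]  # upstream port; i - 1 == -1 wraps to the last port
        if deltas.get(rx, 0) > 0 and deltas.get(tx, 0) == 0:
            suspects.append((tx, rx))
    return suspects
```

Without `loop_order` the MA cannot pair a receiver with its transmitter at all, which is the topology requirement described above.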
[0010]
This scheme makes it possible to isolate a single error source, but does not guarantee it.
[0011]
Using link status makes this scheme independent of the FC-4, which is advantageous on a multiple-protocol loop. The drawback, however, is that the overhead of polling or error-reporting mode reduces loop efficiency.
[0012]
The second scheme uses the error-reporting function of the FC-4 mapping. Using FC-4-reported errors to isolate error sources on the loop requires keeping a log of errors. The error source is located by analyzing the log to determine which devices reported errors and which did not.
[0013]
Using FC-4-reported errors to isolate error sources on the loop eliminates the need for an MA to keep a link-error history and to poll the loop. Not polling the loop reduces loop overhead. In addition, errors are reported only when they occur.
[0014]
Using FC-4-reported errors to isolate error sources on the loop works best in implementations where a single master device receives all reported errors, such as a single-initiator SCSI storage subsystem.
[0015]
Relying only on FC-4 error status has at least three drawbacks. First, a single error occurrence does not provide enough information to isolate the error source. Second, status must be accumulated into a history before the error source can be isolated. Finally, on loops that support multiple protocols, or multiple devices that receive FC-4 status, the approach is difficult to implement because errors are not reported to a common destination device.
[0016]
A third technique for isolating marginal FC links uses both link status and FC-4 error reporting to isolate problem links. It does not use polling, and it can isolate a single error source.
[0017]
As with the link-status scheme, an MA is required to maintain error counts for all devices. When an FC-4 error is reported or the MA detects a link error, the MA reads the accumulated link status from all devices to determine the possible error sources.
[0018]
A drawback of this technique on a loop with multiple FC-4s is that the MA must support every FC-4.
[0019]
Referring to FIG. 1, a diagram of a loop 105 containing SCSI Fibre Channel Protocol (FCP) devices is shown. The loop includes a SCSI initiator device 110, acting as loop master, that communicates with SCSI target devices 120, 130, and 140. The link, or interconnect, 150 between device 120 and device 130 is marginal and/or failed.
[0020]
When available, the error detection and reporting provided by the FC-4 may be used to isolate marginal links.
[0021]
Because of marginal link 150, loop master 110 experiences command timeouts and data errors. A command timeout results from an error in a command, transfer-ready, or response frame; such frames are discarded when they are received with errors. Because a timeout may be caused either by a discarded frame sent to the target (a command) or by one sent from the target (a transfer ready or a response), the location of the bad link cannot be determined.
[0022]
In a write operation, device 120 experiences no errors on data from loop master 110. Devices 130 and 140, however, detect errors introduced by the marginal link. An error in the write data is reported in the FCP response.
[0023]
In read data operations, the loop master 110 does not detect errors in the read data from the devices 130 and 140.
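The FIG. 1 scenario can be reproduced with a toy model. This sketch assumes frames travel 110 → 120 → 130 → 140 → 110, and that a frame corrupted on link 150 is repeated downstream so that every later receiver on its path also sees the error. The function names are hypothetical; the model only shows which ports observe errors, not how FC-AL arbitration or framing works.

```python
from typing import List, Sequence, Tuple

Link = Tuple[int, int]

LOOP: Sequence[int] = (110, 120, 130, 140)  # assumed frame direction 110 -> 120 -> 130 -> 140 -> 110
MARGINAL_LINK: Link = (120, 130)            # interconnect 150 in FIG. 1


def path_links(src: int, dst: int, loop: Sequence[int] = LOOP) -> List[Link]:
    """Links a frame crosses, in order, travelling from src to dst."""
    if src not in loop or dst not in loop:
        raise ValueError("src and dst must both be ports on the loop")
    links = []
    i = loop.index(src)
    while loop[i] != dst:
        nxt = (i + 1) % len(loop)
        links.append((loop[i], loop[nxt]))
        i = nxt
    return links


def error_observers(src: int, dst: int, bad: Link = MARGINAL_LINK,
                    loop: Sequence[int] = LOOP) -> List[int]:
    """Ports on the path that receive the frame after it crosses the bad link."""
    observers = []
    crossed = False
    for tx, rx in path_links(src, dst, loop):
        crossed = crossed or (tx, rx) == bad
        if crossed:
            observers.append(rx)
    return observers
```

Under these assumptions a write from 110 to 140 is seen as corrupted by 130 and 140, reads from 130 and 140 reach 110 cleanly, and only a read from 120 brings errors back to the loop master, which on its own cannot tell which link caused them.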
[0024]
What is needed is loop error diagnosis that does not require knowledge of the loop topology, reduces loop overhead traffic, and increases the effectiveness of diagnosis.
[0025]
(Summary of the Invention)
In the peer-to-peer scheme for isolating error sources, the management application (MA) function is distributed across all devices on the loop, and link status is used to isolate the error source. Specifically, each device retains the identity and the link error status of the device connected upstream of its input. When a device detects a link error on its input, it requests the link error count from the upstream device.
[0026]
If the upstream device's link status indicates that it has also detected a link error, the error source is a different link on the loop. If the upstream device's link status does not indicate a detected error, the error source is likely the interconnect between the upstream device and the device itself. The device then initiates a diagnostic transfer between itself and the upstream device to verify that the interconnect is marginal.
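The decision at the heart of this scheme is a single comparison made by the port that saw the error. A minimal sketch, assuming the upstream device's link error status is reduced to one cumulative count that was saved earlier and is fetched again now; the enum and function names are hypothetical:

```python
from enum import Enum


class Verdict(Enum):
    ERROR_ELSEWHERE = "upstream port also saw errors; source is a different link"
    SUSPECT_INPUT_LINK = "upstream port saw no new errors; test the link from it"


def classify_input_error(saved_upstream_count: int,
                         current_upstream_count: int) -> Verdict:
    """Called after this port detects a link error on its input.

    A change in the upstream port's cumulative count means that port also
    received bad data, so the fault lies further upstream; no change points
    at the interconnect between the upstream port and this one.
    """
    if current_upstream_count != saved_upstream_count:
        return Verdict.ERROR_ELSEWHERE
    return Verdict.SUSPECT_INPUT_LINK
```

A port whose upstream neighbor's count went from 5 to 7 would leave the fault to another port, while one whose neighbor still reads 5 would start the diagnostic transfer. Only the neighbor's status is consulted, so no loop-wide topology is needed.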
[0027]
Advantageously, the loop error diagnosis of the present invention does not require knowledge of the complete loop topology. Because error isolation is distributed among the devices in the loop, the invention also reduces loop overhead traffic. Furthermore, the effectiveness of loop diagnosis increases because the device closest to the source of the problem performs the diagnosis. In addition, the invention minimizes performance degradation of each device on the loop, because each device's diagnostic function is enabled to run when the device is idle, which prevents diagnostics from affecting device performance during high-priority tasks.
[0028]
(Detailed description of preferred embodiments)
In the following detailed description of the preferred embodiments, reference is made to the accompanying drawings that form a part hereof, and in which are shown by way of illustration specific embodiments for implementing the invention. It should be understood that other embodiments may be utilized and structural changes may be made without departing from the scope of the present invention.
[0029]
The invention described herein is useful in all mechanical configurations of disk drives having either rotary or linear actuation. In addition, the invention is useful in all types of disk drives, including hard disk drives, zip drives, and floppy disk drives, in which unloading the transducer from the surface and parking the transducer may be desirable. FIG. 2 is an exploded view of one type of disk drive 200 having a rotary actuator. The disk drive 200 includes a housing or base 212 and a cover 214, which together form a disk enclosure. An actuator assembly 220 is rotatably mounted to base 212 on an actuator shaft 218. The actuator assembly 220 includes a comb-like structure 222 having a plurality of arms 223. Load beams or load springs 224 are attached to the separate arms 223 on comb 222; a load beam or load spring is also called a suspension. A slider 226 carrying a magnetic transducer 250 is attached to the end of each load spring 224. The slider 226 with its transducer 250 forms what is called a head. Note that many sliders have a single transducer 250, which is what is shown in the figures. The invention applies equally to sliders having more than one transducer, such as so-called MR (magnetoresistive) heads, in which one transducer is generally used for reading and another for writing. A voice coil 228 is located at the end of the actuator arm assembly 220 opposite the load springs 224 and sliders 226.
[0030]
A first magnet 230 and a second magnet 231 are mounted within base 212. As shown in FIG. 2, the second magnet 231 is associated with cover 214. The first and second magnets 230, 231 and the voice coil 228 are key components of a voice coil motor, which applies a force to the actuator assembly 220 to rotate it about actuator shaft 218. A spindle motor is also mounted to base 212. The spindle motor includes a rotating portion called the spindle hub 233. In this particular disk drive, the spindle motor is within the hub. In FIG. 2, a number of disks 234 are attached to the spindle hub 233; other disk drives have a single disk or a different number of disks attached to the hub. The invention described herein is equally applicable to disk drives having a single disk and to those having multiple disks, and to disk drives whose spindle motor is within or below hub 233.
[0031]
Referring now to FIG. 3, a process diagram of a method 300 of loop error diagnosis is shown. Method 300 includes step 310 of determining the identity of the upstream device in the loop, followed by step 320 of storing that identity. In one embodiment, determining step 310 and storing step 320 are performed at device initialization. In another embodiment, the identity of the upstream device is retrieved from the loop map. Method 300 next includes step 330 of requesting a link error count from the upstream device, and step 340 of storing that link error count locally. Method 300 then includes step 350 of monitoring the loop for errors, and step 360 of determining whether an error is present on the device's input. If not, the method continues with step 350. If an error is present, the method requests the current link error count from the upstream device (370) and then determines whether the loop configuration has changed. If it has, the method returns to step 310. Otherwise, in step 385 the method determines whether the current link error count has changed relative to the stored count. If it has changed, this indicates an error elsewhere on the loop, and the method returns to step 340 to store the new link error count locally. If it has not changed, the error occurred between the upstream device and the device that detected it; the method then tests the link (390) and reports the error (395).
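The FIG. 3 flow can be sketched as a small per-port state object. This is a minimal sketch under stated assumptions: the three callables are hypothetical hooks standing in for the loop map lookup, the link-status request (for example an RLS), and the diagnostic transfer, and a change in the upstream port's identity is used as the test for a loop configuration change (a real port might instead react to loop initialization). Step numbers from FIG. 3 appear in the comments.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Method300:
    """One port's view of FIG. 3 (hypothetical hooks, not a real driver API).

    find_upstream()           -> ID of the current upstream port (310)
    read_upstream_count(port) -> that port's cumulative link-error count (330, 370)
    test_link(port)           -> True if a diagnostic transfer over the input link passes (390)
    """
    find_upstream: Callable[[], int]
    read_upstream_count: Callable[[int], int]
    test_link: Callable[[int], bool]
    upstream_id: Optional[int] = None
    saved_count: int = 0
    reports: List[Tuple[int, bool]] = field(default_factory=list)  # 395 output

    def initialize(self) -> None:
        self.upstream_id = self.find_upstream()                        # 310, 320
        self.saved_count = self.read_upstream_count(self.upstream_id)  # 330, 340

    def on_input_error(self) -> None:
        """Run when step 360 finds an error on this port's input."""
        if self.upstream_id is None:           # never initialized: start at 310
            self.initialize()
            return
        current = self.read_upstream_count(self.upstream_id)           # 370
        if self.find_upstream() != self.upstream_id:                   # configuration changed
            self.initialize()                                          # back to 310
            return
        if current != self.saved_count:        # 385: upstream port saw errors too
            self.saved_count = current                                 # back to 340
            return
        passed = self.test_link(self.upstream_id)                      # 390
        self.reports.append((self.upstream_id, passed))                # 395
```

Monitoring (350, 360) is left to the caller, which invokes `on_input_error` whenever the port's own input counters rise. Keeping the transport behind callables lets the same logic sit on top of RLS, Log Sense, or an enclosure interface.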
[0032]
The loop error diagnosis of the present invention does not require knowledge of the complete loop topology, and it reduces loop overhead because error isolation is distributed among the devices in the loop. Furthermore, the effectiveness of loop diagnosis increases because the device closest to the source of the problem performs the diagnosis. In addition, because each device's diagnostic function is enabled to run when the device is idle, diagnostics do not affect device performance during high-priority tasks, so the invention minimizes performance degradation of each device on the loop.
[0033]
Referring now to FIG. 4, a process diagram of a method 400 of loop error diagnosis is shown. Method 400 includes step 410 of identifying a link error condition recorded locally on a device in a distributed daisy chain peer-to-peer loop; this identification is described in more detail below in connection with FIG. 5. In one embodiment, the distributed daisy chain peer-to-peer loop is a Fibre Channel Arbitrated Loop (FC-AL). In other embodiments, the device is a disk drive, such as disk drive 200 of FIG. 2.
[0034]
A Fibre Channel (FC) device detects and counts the errors it receives. The counts are kept in a link error status block (LESB). Errors the device may encounter include link failure (e.g., loss of word synchronization for longer than a specified time), loss of synchronization (e.g., loss of word synchronization for less than the specified time, and more than a specified number of invalid transmission words), invalid transmission words (a running-disparity error or invalid character detected), and/or invalid cyclic redundancy checks (CRC).
[0035]
If any field in the LESB has increased, the device has detected an error.
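The rule that an increase in any LESB field means the device detected an error is easy to express over two snapshots. This sketch assumes the six counters defined for the LESB in the FC-PH/FC-FS standards (link failure, loss of synchronization, loss of signal, primitive sequence protocol error, invalid transmission word, invalid CRC); the class and function names are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class LESB:
    """Link Error Status Block snapshot: six cumulative counters,
    named after the FC-PH/FC-FS LESB fields."""
    link_failure: int = 0
    loss_of_sync: int = 0
    loss_of_signal: int = 0
    primitive_seq_protocol_error: int = 0
    invalid_transmission_word: int = 0
    invalid_crc: int = 0


def increased_fields(prev: LESB, cur: LESB) -> List[str]:
    """Names of counters that are higher in cur than in prev.
    Counter wraparound is ignored in this sketch."""
    return [f.name for f in fields(LESB)
            if getattr(cur, f.name) > getattr(prev, f.name)]


def detected_error(prev: LESB, cur: LESB) -> bool:
    """True if any LESB field increased, i.e. the port detected an error."""
    return bool(increased_fields(prev, cur))
```

Returning the names of the changed fields, rather than only a boolean, keeps the error type available for the report generated later.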
[0036]
Several techniques known to those skilled in the art exist for obtaining link status from devices on a loop. One uses the Read Link Status (RLS) Extended Link Service (ELS), which returns the LESB of the addressed device; in one embodiment, the device supports an RLS implementation that returns the LESB of the device receiving the RLS. Another uses the Small Computer System Interface (SCSI) Log Sense command, in which the disk drive returns the LESB in a log page; this technique is for systems whose device drivers do not pass FC ELS information to applications. Yet another uses the Enclosure Services Interface (ESI), in which the disk drive supports the enclosure-initiated ESI defined by the SFF Committee industry group specification SFF-8067. One of its functions provides the enclosure processor with the LESB, loop initialization information, and current status of both devices; the enclosure processor may use this information for loop management or pass it on to another management entity. Still another uses the report device status (RPS) ELS, which returns to the requesting device the LESB, the loop initialization count, and the current status of the device.
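Whichever transport is used, what comes back is the LESB. As an illustration for the RLS case, the sketch below decodes a reply payload, assuming the LS_ACC layout is one 4-byte command word followed by the six LESB counters as big-endian 32-bit words; verify that layout against the FC-LS specification before relying on it. Sending the RLS request is outside the sketch, and the function and constant names are hypothetical.

```python
import struct
from typing import Dict

LS_ACC_CODE = 0x02  # ELS accept: assumed first byte of the reply payload
LESB_FIELDS = (
    "link_failure", "loss_of_sync", "loss_of_signal",
    "primitive_seq_protocol_error", "invalid_transmission_word", "invalid_crc",
)


def parse_rls_accept(payload: bytes) -> Dict[str, int]:
    """Decode an RLS LS_ACC payload, assumed to be one 4-byte command word
    followed by the six LESB counters as big-endian 32-bit integers."""
    if len(payload) < 28:
        raise ValueError(f"payload is {len(payload)} bytes; expected at least 28")
    if payload[0] != LS_ACC_CODE:
        raise ValueError(f"not an LS_ACC reply (command byte {payload[0]:#04x})")
    counters = struct.unpack(">6I", payload[4:28])
    return dict(zip(LESB_FIELDS, counters))
```

The same dictionary shape could be filled from a Log Sense page or an ESI response, so the comparison logic downstream does not need to know which transport was used.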
[0037]
The LESB is the element common to all of these methods of obtaining link status from devices on the loop.
[0038]
Method 400 also includes step 420 of diagnosing the error. Method 400 does not require knowledge of the complete loop topology, and it reduces loop overhead traffic because error isolation is distributed among the devices in the loop. Furthermore, the effectiveness of loop diagnosis increases because the device closest to the source of the problem performs the diagnosis. In addition, because each device's diagnostic function is enabled to run when the device is idle, and therefore does not affect device performance during high-priority tasks, method 400 minimizes performance degradation of each device on the loop.
[0039]
Referring now to FIG. 5, a process diagram is illustrated of a method 500 for identifying error conditions recorded locally on a device in a distributed daisy chain peer-to-peer loop, such as step 410 of FIG. 4.
[0040]
Method 500 includes a step 510 of receiving a current error status count from a local source of the device immediately upstream in the distributed daisy chain peer-to-peer loop. Method 500 also includes a step 520 of receiving a previous error status count from the local source of that upstream device. In one embodiment, receiving step 520 is performed at device initialization. In various embodiments, receiving step 520 is performed before, concurrently with, or after receiving step 510. Method 500 then includes a step 530 of comparing the current error status count with the previous error status count, followed by a step 540 of determining that the comparison indicates an error.
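Comparing step 530 can be sketched as a per-counter difference between two snapshots of the upstream device's counts. The dictionary representation, the counter names, and treating a missing baseline entry as zero are assumptions made for this illustration; how step 540 interprets the result is left to the caller.

```python
from typing import Dict, Mapping


def compare_error_counts(previous: Mapping[str, int],
                         current: Mapping[str, int]) -> Dict[str, int]:
    """Step 530 sketch: return {counter: current - previous} for every counter
    whose value changed. An empty result means the upstream device's counts
    are unchanged since the previous snapshot."""
    changed = {}
    for name, value_now in current.items():
        before = previous.get(name, 0)  # missing baseline treated as 0 (assumption)
        if value_now != before:
            changed[name] = value_now - before
    return changed


# Hypothetical snapshots: one taken at initialization (step 520), one now (step 510).
at_init = {"invalid_transmission_word": 5, "invalid_crc": 2}
current_snapshot = {"invalid_transmission_word": 9, "invalid_crc": 2}
print(compare_error_counts(at_init, current_snapshot))  # {'invalid_transmission_word': 4}
```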
[0041]
Referring now to FIG. 6, a process diagram of a method 600 for determining, diagnosing, and resolving errors is illustrated. In method 600, determining step 540 of FIG. 5 includes a step 610 of determining that the current error status count differs from the previous error status count. The diagnosing step 420 of FIG. 4 then includes a step 620 of testing the link between the upstream device and the device in the distributed daisy chain peer-to-peer loop. In one embodiment, testing step 620 includes transmitting data from the upstream device to the device over the loop and determining whether the device received the data as transmitted.
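Testing step 620 can be sketched as a known-data comparison. The `send_from_upstream` callable is a hypothetical stand-in for whatever mechanism makes the upstream device transmit a frame and returns what this device actually received; the frame length and number of rounds are arbitrary illustrative values.

```python
import os
from typing import Callable


def check_upstream_link(send_from_upstream: Callable[[bytes], bytes],
                        frame_len: int = 64, rounds: int = 4) -> bool:
    """Step 620 sketch: have the upstream device send known data and verify
    that each frame arrives at this device exactly as transmitted."""
    for _ in range(rounds):
        pattern = os.urandom(frame_len)  # fresh random payload each round
        received = send_from_upstream(pattern)
        if received != pattern:
            return False  # corrupted or lost: upstream link is suspect
    return True


# Simulated links, for illustration only.
def healthy(frame: bytes) -> bytes:
    return frame


def flaky(frame: bytes) -> bytes:
    return bytes([frame[0] ^ 0x01]) + frame[1:]  # flips one bit


print(check_upstream_link(healthy), check_upstream_link(flaky))  # True False
```

A `False` result corresponds to the case in which an error report naming the upstream link would be generated.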
[0042]
In a step 630, if an error is determined for the upstream link, an error report is generated indicating that the link between the device and the immediately upstream device of the distributed daisy chain peer-to-peer loop is suspected of being in error. In various embodiments, generating step 630 is performed before, concurrently with, or after testing step 620.
[0043]
FIG. 7 is a block diagram of the peer device 700 in a loop.
[0044]
Device 700 includes a communication input/output component 710 operably coupled to a loop 720, and it determines upstream device and/or upstream link errors. In one embodiment, loop 720 is an FC-AL. The remainder of loop 720 includes at least one other device (not shown) upstream of peer device 700. In one embodiment, that other device is itself a peer device like device 700. The communication input/output 710 is operatively coupled to a loop error isolation management application 730. In various embodiments, the loop error isolation management application 730 performs the steps of methods 300, 400, 500, and/or 600.
[0045]
The peer device 700 does not require knowledge of the complete loop topology. Peer device 700 reduces loop overhead traffic because error isolation is distributed to each of the devices in the loop. Further, diagnosis is more effective because the peer device 700 closest to the source of the problem performs it. In addition, each peer device 700 is enabled to perform its diagnostic function only when idle, which prevents diagnostics from affecting its performance during high-priority tasks and thus minimizes performance degradation of each device on the loop.
[0046]
In one embodiment, peer device 700 includes a disk drive, such as disk drive 200 of FIG. 2.
[0047]
FIG. 8 is a block diagram of a loop error isolation management application (MA) 800 of a peer device such as peer device 700. MA 800 includes a determiner 810 of the identity (not shown) of the upstream device in the loop. The determiner 810 receives the identity through the communication input/output 710. The identity is stored locally in the peer device 700 by the local storage unit 820, which is operatively coupled to the determiner 810. In one embodiment, the determiner 810 includes a retriever that obtains the upstream device identity from a loop map.
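For the embodiment in which determiner 810 retrieves the upstream identity from a loop map, a minimal sketch follows. It assumes the map is a list of AL_PAs in the order in which frames travel around the loop, so each port's upstream neighbor is the preceding entry, with the first entry wrapping to the last; the function name and example AL_PA values are hypothetical.

```python
from typing import Sequence


def upstream_neighbor(loop_map: Sequence[int], own_al_pa: int) -> int:
    """Return the AL_PA immediately upstream of own_al_pa.

    Assumes loop_map lists AL_PAs in transmission order around the loop.
    Raises ValueError if own_al_pa is absent or the loop has fewer than 2 ports.
    """
    if len(loop_map) < 2:
        raise ValueError("loop map must contain at least two ports")
    idx = list(loop_map).index(own_al_pa)  # ValueError if not present
    return loop_map[idx - 1]  # idx 0 wraps to the last entry via index -1


print(hex(upstream_neighbor([0x01, 0xE8, 0xE4, 0xE2], 0x01)))  # 0xe2
```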
[0048]
The MA 800 also includes a requester 830 of link error counts from the upstream device in the loop. Requester 830 is operatively coupled to a local storage unit 840 for link error counts, which stores a link error count for later historical comparison with the current link error count.
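Storage unit 840 can be sketched as a small dictionary of baselines. Keying each baseline by the upstream device's identity is a design assumption made here so that, after a loop reconfiguration changes the upstream neighbor, the device never compares a new neighbor's counts against an old neighbor's history; the class and method names are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple

Counts = Tuple[int, ...]


class LinkErrorHistory:
    """Local store of link error count baselines, keyed by upstream identity."""

    def __init__(self) -> None:
        self._baselines: Dict[int, Counts] = {}

    def record(self, upstream_id: int, counts: Sequence[int]) -> None:
        """Save (or replace) the baseline for this upstream device."""
        self._baselines[upstream_id] = tuple(counts)

    def baseline(self, upstream_id: int) -> Optional[Counts]:
        """Return the saved baseline, or None if none exists for this upstream
        device (for example, because the neighbor changed)."""
        return self._baselines.get(upstream_id)
```

A `None` baseline signals that requester 830 should first capture fresh counts before any comparison is meaningful.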
[0049]
The MA 800 also includes a requester 850 of the current link error count from the upstream device in the loop. The requester 850 is coupled to the communication input/output 710 and receives the current count of link errors.
[0050]
The MA 800 also includes a loop configuration change determiner 860, which is operatively coupled to the communication input/output 710 of the device.
[0051]
The comparator 870 compares the current link error count received from requester 850 with the stored error count received from storage unit 840, takes into account any loop configuration change reported by determiner 860, and accordingly activates either the link error resolver 880 or the device error diagnostic request generator and transmitter 890. In one embodiment, resolver 880 includes a link checker.
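One plausible reading of the comparator's dispatch is sketched below: unchanged counts with an unchanged loop configuration point at the upstream link, so resolver 880 tests it; otherwise the errors were also seen upstream, so transmitter 890 asks the upstream device to continue the diagnosis. The enum and function names are hypothetical, and the counts are treated as opaque tuples.

```python
from enum import Enum
from typing import Sequence


class Action(Enum):
    TEST_UPSTREAM_LINK = "link error resolver 880"
    REQUEST_UPSTREAM_DIAGNOSIS = "diagnostic request transmitter 890"


def choose_action(stored: Sequence[int], current: Sequence[int],
                  loop_config_changed: bool) -> Action:
    """Comparator 870 sketch: decide which component to activate."""
    upstream_saw_errors = tuple(current) != tuple(stored)
    if upstream_saw_errors or loop_config_changed:
        # Error source lies farther upstream (or counts are not comparable).
        return Action.REQUEST_UPSTREAM_DIAGNOSIS
    # Upstream device saw nothing: the link into this device is the suspect.
    return Action.TEST_UPSTREAM_LINK
```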
[0052]
In one embodiment of MA 800, an initializer is operably coupled to the upstream device identity determiner 810 and to the local identity store 820.
[0053]
Another embodiment of MA 800 includes a loop error monitor operatively coupled to the local store of link error counts, and a detector of errors on the communication input of the peer device that is coupled to the monitor.
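The loop error monitor and input error detector can be sketched as a sampler over the device's own receive-side counters that fires the diagnostic sequence whenever any counter increases. The sampling model, the callback, and the class name are assumptions for illustration only.

```python
from typing import Callable, Sequence, Tuple


class LoopErrorMonitor:
    """Watches this device's own receive-side error counters and fires a
    callback when any of them increases between samples."""

    def __init__(self, initial: Sequence[int],
                 on_error: Callable[[Tuple[int, ...]], None]) -> None:
        self._last = tuple(initial)
        self._on_error = on_error

    def sample(self, counts: Sequence[int]) -> bool:
        """Record a new sample; return True (and fire the callback) on increase."""
        current = tuple(counts)
        increased = any(now > before for now, before in zip(current, self._last))
        self._last = current
        if increased:
            self._on_error(current)
        return increased
```

In this sketch the callback would start the identification step (reading the upstream device's counts), typically through an idle-time queue.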
[0054]
The components of systems 700 and 800 can be implemented as computer hardware circuits, as computer-readable programs, or as a combination of both.
[0055]
In particular, in the computer-readable program embodiments of devices 700 and 800, the program can be structured in an object-oriented manner using an object-oriented language such as Java, Smalltalk, or C++, or in a procedure-oriented manner using a procedural language such as COBOL or C. The software components communicate by any of a number of means known to those skilled in the art, such as application program interfaces (APIs) or interprocess communication techniques such as remote procedure call (RPC), Common Object Request Broker Architecture (CORBA), Component Object Model (COM), Distributed Component Object Model (DCOM), Distributed System Object Model (DSOM), and Remote Method Invocation (RMI). The components execute on as few as one computer or on multiple computers.
[0056]
FIG. 9 is a schematic diagram of a computer system. The present invention is advantageously suited for use in a computer system 2000 that includes a communication device operably coupled to an upstream device in a distributed daisy chain peer-to-peer loop, and a device that identifies error conditions recorded locally on the device in the loop.
Computer system 2000, also referred to as an electronic system or information processing system, includes a central processing unit, memory, and a system bus. The information processing system 2002 includes a central processing unit 2004, a random access memory 2032, and a system bus 2030 that communicatively couples the central processing unit 2004 and the random access memory 2032. The information processing system 2002 also includes a disk drive device that includes the ramp described above. The information processing system 2002 may also include an input/output bus 2010 and a number of peripheral devices, such as 2012, 2014, 2016, 2018, 2020, and 2022, attached to the input/output bus 2010. Peripheral devices include hard disk drives, magneto-optical devices, floppy disk drives, monitors, keyboards, and other such peripheral devices. Any type of disk drive may use the method of loading and unloading the slider onto the disk surface as described above.
[0057]
The loop error diagnosis of the present invention does not require knowledge of the loop topology, and it reduces loop overhead traffic because error isolation is distributed to each device in the loop. Furthermore, diagnosis is more effective because the device closest to the source of the problem performs it. In addition, the diagnostic function of each device is enabled to run only when the device is idle, which prevents diagnostics from affecting device performance during high-priority tasks; the present invention therefore minimizes performance degradation of each device.
[0058]
(Conclusion)
In conclusion, a method for managing interconnect errors includes identifying 410 an error condition recorded locally on a device of the distributed daisy chain peer-to-peer loop 100 and diagnosing 420 the error. In one embodiment, the method is performed by a device such as 110, 120, 130, and/or 140. In another embodiment, the distributed daisy chain peer-to-peer loop includes FC-AL 150. In yet another embodiment, the device is a disk drive 200.
[0059]
In yet another embodiment, the identifying step 310 includes receiving 370 a current error status count from the local source of the upstream device 120 or 130 in the distributed daisy chain peer-to-peer loop 100; receiving a previous error status count from the local source of the immediately upstream device 120 or 130 in the peer-to-peer loop 150; comparing 375 the current error status count with the previous error status count; and determining 385 that the comparison indicates an error. In yet another embodiment, receiving step 370 is performed after receiving step 520.
[0060]
In another embodiment, the receiving stage 330 is performed at the initialization of the devices 110, 120, 130 and / or 140.
[0061]
In yet another embodiment, the determining step 540 includes determining 610 that the current error status count differs from the previous error status count. When the error status counts differ, the upstream device has also detected errors, so the link between the upstream device and the device is not the error source.
[0062]
In another embodiment, the diagnosing step 420 includes testing 630 the link between the device and the upstream device in the distributed daisy chain peer-to-peer loop. Testing step 630 includes transmitting data from the upstream device to the device via the distributed daisy chain peer-to-peer loop and determining whether the data was received as transmitted.
[0063]
In another embodiment, the diagnosing step 420 includes a step 620 of generating an error report indicating that the link between the device and the upstream device in the distributed daisy chain peer-to-peer loop is suspected of being in error.
[0064]
The present invention also includes an information processing system 900 that includes a communication device 710 operably coupled to an upstream device in loop 720, and a device 730 that identifies error conditions recorded locally on the device in a distributed daisy chain peer-to-peer loop.
[0065]
The present invention also includes a peer device 700 in the loop 150, which includes a communication input 710 and a loop error isolation management application 730 in operative communication with the communication input. One embodiment of the loop error isolation management application 730 includes: an upstream device identity determiner 810 for the loop; a local identity store 820 in communication with the determiner; a requester 830 of link error counts from the upstream device, in communication with the store; a local link error count store 840 in communication with the requester 830; a requester 850 of the current link error count from the upstream device in the loop; a loop configuration change determiner 860; a current link error count comparator 870 in communication with the determiner 860, the link error count store 840, and the current link error count requester 850; a link error resolver 880 in communication with the comparator; and an error diagnosis request transmitter 890 in communication with the comparator 870 and the identity store 820. In one embodiment, the peer device 700 includes a disk drive 200 having a base and a disk rotatably mounted on the base. In other embodiments, the resolver 880 includes a link checker. In yet another embodiment, the upstream device identity determiner 810 includes a retriever of the upstream device identity from the loop map. In yet another embodiment, the device includes an initializer that communicates with the upstream device identity determiner 810 and with the local identity store.
[0066]
Information processing systems such as disk drives include controllers that communicate with other devices in the loop and perform distributed, or peer-to-peer, loop error diagnosis. An example of such a loop is a Fibre Channel arbitrated loop. Distributed loop error diagnosis identifies and diagnoses errors in the upstream device and the upstream link by monitoring the upstream device's error count to determine whether it is increasing. An increased error count or a changed loop configuration indicates that the error source is not the upstream link, while an unchanged error count together with an unchanged loop configuration indicates that the error source is the upstream link.
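The distributed reasoning can be illustrated loop-wide with a simplified model. It assumes that errors introduced on one link are also counted by every device farther downstream, and it ignores loop configuration changes; under those assumptions, a device suspects its upstream link exactly when its own count rose but its upstream neighbor's did not. The function name and input format are hypothetical.

```python
from typing import List, Sequence


def suspect_links(deltas: Sequence[int]) -> List[int]:
    """deltas[i] is the increase in device i's receive error count over an
    interval; device i-1 (wrapping around) is device i's upstream neighbor.

    Returns the indices i whose upstream link (i-1 -> i) is suspected.
    Each device only needs its own delta and its upstream neighbor's delta.
    """
    suspects = []
    for i, delta in enumerate(deltas):
        upstream_delta = deltas[i - 1]  # i == 0 wraps to the last device
        if delta > 0 and upstream_delta == 0:
            suspects.append(i)
    return suspects


# Errors first appear at device 2 and are seen by devices downstream of it.
print(suspect_links([0, 0, 3, 3, 3]))  # [2]
```

When every device reports an increase, no single link can be singled out in this model and the result is empty.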
[0067]
It should be understood that the above description is illustrative and not restrictive. Many other embodiments will be apparent to those of skill in the art upon reviewing the above description. The scope of the invention should, therefore, be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.
[Brief description of the drawings]
FIG. 1 is a block diagram of a conventional loop including SCSI Fibre Channel Protocol devices.
FIG. 2 is an exploded view of a disk drive having a stack of multiple disks and a ramp assembly for loading and unloading transducers onto and off the disk surfaces.
FIG. 3 is a process diagram of a loop error diagnosis method.
FIG. 4 is a process diagram of a loop error diagnosis method.
FIG. 5 is a process diagram of a method for identifying locally logged error conditions in a distributed daisy chain peer-to-peer loop.
FIG. 6 is a process diagram of a method for determining, diagnosing and resolving errors.
FIG. 7 is a block diagram of a peer device in a loop that determines an upstream device and / or upstream link error.
FIG. 8 is a block diagram of a loop error isolation management application for a peer device.
FIG. 9 is a schematic diagram of a computer system.

Claims

1. A method for diagnosing loop errors in a distributed daisy chain peer-to-peer loop, the method being performed by a device in the distributed daisy chain peer-to-peer loop and comprising:
(a) identifying a local error condition of the device, the identifying step comprising:
(a)(1) receiving a current error status count from a local source of an upstream device in the distributed daisy chain peer-to-peer loop;
(a)(2) receiving a previous error status count from the local source of the upstream device in the distributed daisy chain peer-to-peer loop; and
(a)(3) comparing the current error status count with the previous error status count; and
(b) diagnosing the error.

2. The method of claim 1, wherein the distributed daisy chain peer-to-peer loop comprises a Fibre Channel arbitrated loop.

3. The method of claim 1, wherein the method is performed by a device that detects an error.

4. The method of claim 1, wherein the receiving step (a)(1) is performed after the receiving step (a)(2).

5. The method of claim 4, wherein the receiving step (a)(2) is performed during initialization of the device.

6. The method of claim 1, further comprising:
(a)(4) determining that the comparison indicates an error.

7. The method of claim 6, wherein the determining step (a)(4) comprises:
(a)(4)(i) determining whether the current error status count is equal to the previous error status count.

8. The method of claim 6, wherein the diagnosing step (b) comprises:
(b)(1) generating an error report indicating that there is an error in the link between the device and the upstream device in the distributed daisy chain peer-to-peer loop.

9. The method of claim 6, wherein the determining step (a)(4) comprises:
(a)(4)(i) determining that the current error status count is not equal to the previous error status count.

10. The method of claim 9, wherein the diagnosing step (b) comprises:
(b)(1) determining that the link between the device and the upstream device in the distributed daisy chain peer-to-peer loop is not suspected of being an error source.

11. The method of claim 9, wherein the diagnosing step (b) comprises:
(b)(1) testing the link between the device and the upstream device in the distributed daisy chain peer-to-peer loop.

12. The method of claim 11, wherein the testing step (b)(1) comprises:
(b)(1)(i) transmitting data from the upstream device to the device through the distributed daisy chain peer-to-peer loop; and
(b)(1)(ii) determining whether the data is received by the device as transmitted.

13. The method of claim 1, wherein the device further comprises a disk drive.

14. A peer device in a loop, comprising:
a communication input; and
a loop error isolation management application operatively coupled to the communication input,
the loop error isolation management application comprising:
an upstream device identity determiner for the loop;
a local store of identities operatively coupled to the determiner;
a requester of link error counts from the upstream device in the loop, operatively coupled to the store;
a local store of link error counts operatively coupled to the requester;
a requester of a current link error count from the upstream device in the loop;
a loop configuration change determiner;
a comparator of the current link error count and the saved link error count, operatively coupled to the loop configuration change determiner, the link error count store, and the store of the current link error count;
a link error resolver operatively coupled to the comparator; and
a device error diagnosis request transmitter operatively coupled to the comparator and to the identity store.

15. The peer device of claim 14, wherein the peer device comprises a disk drive having a base and a disk rotatably attached to the base.

16. The peer device of claim 14, wherein the resolver includes a link tester;
the upstream device identity determiner of the loop includes an upstream device identity retriever from a loop map; and
the device further comprises an initializer operatively coupled to the upstream device identity determiner in the loop and to the local store of identities.

17. The peer device of claim 14, wherein the peer device further comprises a disk drive.