JPH05216699A - Path fault processing system of external storage device - Google Patents

Path fault processing system of external storage device

Info

Publication number
JPH05216699A
Authority
JP
Japan
Prior art keywords
path
fault
external storage
input
route
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP4054362A
Other languages
Japanese (ja)
Other versions
JP2830592B2 (en)
Inventor
Mitsujirou Uchida
密次郎 内田
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corp
Priority to JP4054362A
Publication of JPH05216699A
Application granted
Publication of JP2830592B2
Anticipated expiration
Expired - Fee Related

Links

Landscapes

  • Retry When Errors Occur (AREA)
  • Hardware Redundancy (AREA)

Abstract

PURPOSE: To prevent degradation of system performance by promoting retries on an alternate path, and by blocking the failed path early, even when a path failure occurs intermittently but repeatedly. CONSTITUTION: Because of its importance, an external storage device 33 is connected to an OS 12 by a plurality of connection paths, so that input/output operation can continue on an alternate path even when one path becomes unusable due to a failure. An external storage control device 31 classifies each failure that occurs as either a failure of the common part of the external storage subsystem 30 or a path failure, and increments a path failure cumulative counter 32 installed in the external storage control device 31 when the failure is a path failure. The detailed information generated by the external storage control device 31 includes a field that presents a failure processing request type to an input/output control unit 16. On receiving this report, the input/output control unit 16 selects the failure processing according to the failure processing request type and a resource state table 17.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to an information processing system and, more specifically, to a path failure processing method by which an OS running on a host processor manages the state of an external storage device connected to the host processor by a plurality of paths.

[0002]

[Prior Art] In an information processing system, when an external storage device detects a failure and abnormally terminates an input/output operation, it generates detailed information describing the failure and reports it to the OS. Conventionally, this detailed information has generally indicated whether the failure is a path failure or a failure of a common part that does not depend on the path. The OS examines this indication and retries the abnormally terminated input/output operation as follows.

[0003] First, in the case of a common-part failure, the input/output operation is retried a prescribed number of times on the failed path. In the case of a path failure, the operation is first retried a prescribed number of times on the failed path and, if that does not recover it, retried a prescribed number of times on an alternate path. If no usable alternate path exists, the failure is handled in the same way as a common-part failure. If a retry on the alternate path succeeds, the alternate path is known to be working normally, while the failed path has already failed several attempts and is likely to produce the same kind of failure if used again; the failed path is therefore blocked so that it is not used in subsequent input/output operations. The purpose of this path blocking is to stop using a path with a high probability of failure and thereby avoid the increase in system overhead caused by failure processing.
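The conventional procedure described in [0003] can be sketched in Python as follows. This is only an illustrative sketch: the function names, the `blocked` set, and the retry limit of 3 are assumptions, since the text says only "a prescribed number of times".

```python
from typing import Callable, Optional, Set

MAX_RETRIES = 3  # assumed value; the text only says "a prescribed number of times"


def retry(io: Callable[[str], bool], path: str, limit: int = MAX_RETRIES) -> bool:
    """Retry the I/O on `path` up to `limit` times; True if any attempt succeeds."""
    return any(io(path) for _ in range(limit))


def conventional_recovery(
    io: Callable[[str], bool],
    failed_path: str,
    alternate_path: Optional[str],
    is_path_failure: bool,
    blocked: Set[str],
) -> bool:
    """Conventional handling: returns True if the I/O was recovered.

    `blocked` (hypothetical) collects paths that later I/O must not use.
    """
    # Common-part failure, or no usable alternate path: failed path only.
    if not is_path_failure or alternate_path is None:
        return retry(io, failed_path)
    # Path failure: retry on the failed path first ...
    if retry(io, failed_path):
        return True  # recovered here, so the failed path is NOT blocked
    # ... and only then on the alternate path.
    if retry(io, alternate_path):
        blocked.add(failed_path)  # alternate works, failed path is suspect
        return True
    return False
```

Note that the failed path is blocked only when every retry on it fails. A path that fails intermittently and recovers on retry is never blocked under this scheme.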

[0004]

[Problems to Be Solved by the Invention] When a path failure is intermittent but recurs, however, a higher failure frequency means correspondingly more retries on the failed path, which weighs on system performance. Moreover, because the operation is recovered by the retry on the failed path, the conventional method never reaches the point of blocking the failed path, and subsequent input/output operations continue to use it. An object of the present invention is to solve this problem by providing a path failure processing method for an external storage device that reduces the opportunities to use an unstable path and prevents the degradation of system performance caused by wasted retry processing.

[0005]

[Means for Solving the Problems] To achieve the above object, the path failure processing method for an external storage device according to the present invention is a method in which an OS running on a host processor manages the state of an external storage device connected to the host processor by a plurality of paths. The external storage device counts the cumulative number of path failures. When a path failure occurs after that cumulative number has exceeded a threshold, the external storage device reports the abnormal termination of the input/output operation to the OS and also reports that the cumulative number of path failures has exceeded the threshold. On receiving this report, the OS retries the input/output operation on an alternate path if one exists, or on the current failed path if none exists. In addition to the above configuration, the present invention is configured so that, when the alternate path exists and the input/output operation retried on it succeeds, the original failed path is blocked.

[0006]

[Embodiment] The present invention will now be described in more detail with reference to the drawings. FIG. 1 is a block diagram showing an embodiment of an information processing system that adopts the path failure processing method for an external storage device according to the present invention. Physically, the information processing system of this embodiment comprises a host processor 10, an input/output channel group 20, and an external storage subsystem 30. The OS 12 is a program that runs on the host processor 10 and manages the entire information processing system through a job management unit 13, a resource management unit 14, and other components. A resource state table 17, provided in part of the resource management unit 14 of the OS 12, keeps track of the state of each resource in the information processing system that the OS 12 can use.

[0007] The input/output channel group 20 connects the host processor 10 to the external storage subsystem 30 and, from the viewpoint of the OS 12, forms part of the access path to the external storage subsystem 30. The external storage subsystem 30 comprises a plurality of external storage devices 33 and a plurality of external storage control devices 31 that exchange data between the external storage devices 33 and the input/output channels 20. Because of its importance, each external storage device 33 is connected to the OS 12 by a plurality of connection paths, so that input/output operation can continue on an alternate path even if one path becomes unusable due to a failure. An external storage control device 31 is placed on each path.

[0008] An input/output request issued by the application program 11 passes through the file management unit 15 in the OS 12 and is executed under the control of the input/output control unit 16, also in the OS 12. The input/output control unit 16 refers to the resource state table 17, selects one usable path from the plurality of paths leading to the external storage device 33 to be accessed, and issues an input/output instruction to the input/output channel 20 on the selected path. The issued instruction is executed by the combined operation of the input/output channel 20 and the external storage control device 31, and detailed information indicating whether a failure occurred and, if so, its status is returned to the input/output control unit 16 as the completion result of the instruction. A failure in an input/output operation can occur in the input/output channel 20, the external storage control device 31, or the external storage device 33; for failures that occur in the external storage control device 31 or the external storage device 33, the detailed information is generated by the external storage control device 31.
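The path selection performed by the input/output control unit 16 might look like the following Python sketch. The layout of the resource state table (a mapping from device name to per-path usability flags) and the names `usable_paths` and `select_path` are assumptions made for illustration; the text does not specify the table format or the selection policy.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical layout: device name -> {path name -> currently usable?}
ResourceStateTable = Dict[str, Dict[str, bool]]


def usable_paths(table: ResourceStateTable, device: str) -> List[str]:
    """Return the paths to `device` that are marked usable, in table order."""
    return [path for path, ok in table.get(device, {}).items() if ok]


def select_path(
    table: ResourceStateTable, device: str, exclude: Iterable[str] = ()
) -> Optional[str]:
    """Pick the first usable path to `device` that is not in `exclude`.

    Passing the failed path in `exclude` yields an alternate path, or
    None when no alternate path is available.
    """
    excluded = set(exclude)
    for path in usable_paths(table, device):
        if path not in excluded:
            return path
    return None
```

Choosing the first usable path is the simplest possible policy; a real system could equally balance load across paths.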

[0009] The external storage control device 31 classifies each failure that occurs as either a failure of the common part of the external storage subsystem 30 or a path failure and, for a path failure, increments the path failure cumulative counter 32 installed in the external storage control device 31. The detailed information generated by the external storage control device 31 includes a field that presents a failure processing request type to the input/output control unit 16. This field holds one of "path failure with the path failure cumulative counter below the threshold", "path failure with the path failure cumulative counter at or above the threshold", or "not a path failure". On receiving this report, the input/output control unit 16 selects the failure processing according to the failure processing request type and the resource state table 17.
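The controller-side behaviour in [0009] can be illustrated with a short Python sketch. The class and member names are hypothetical, and two details are assumptions not fixed by the text: the threshold is taken to be a configurable integer, and the counter is compared against it after being incremented.

```python
from dataclasses import dataclass
from enum import Enum


class RequestType(Enum):
    """The three failure processing request types named in [0009]."""
    NOT_PATH_FAILURE = "not a path failure"
    PATH_FAILURE_BELOW_THRESHOLD = "path failure, counter below threshold"
    PATH_FAILURE_AT_OR_ABOVE_THRESHOLD = "path failure, counter at or above threshold"


@dataclass
class StorageController:
    """Stand-in for external storage control device 31 (hypothetical model)."""
    threshold: int                # assumed to be a configurable integer
    path_failure_count: int = 0   # path failure cumulative counter 32

    def report_failure(self, is_path_failure: bool) -> RequestType:
        """Classify a failure and return the request type for the OS."""
        if not is_path_failure:
            # Common-part failures do not touch the counter.
            return RequestType.NOT_PATH_FAILURE
        self.path_failure_count += 1
        # Assumption: the comparison uses the already-incremented count.
        if self.path_failure_count >= self.threshold:
            return RequestType.PATH_FAILURE_AT_OR_ABOVE_THRESHOLD
        return RequestType.PATH_FAILURE_BELOW_THRESHOLD
```

With `threshold=3`, for example, the first two path failures are reported as below the threshold and the third as at or above it.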

[0010] FIG. 2 is a flowchart explaining the operating procedure of the input/output control unit. First, the unit determines whether an alternate path is available (step 1). If none is available, it retries on the failed path regardless of the failure processing request type (step 7). If an alternate path is available, it examines the failure processing request type (step 2) and proceeds as follows. For "not a path failure", it retries on the failed path (step 7). For a path failure, it determines whether the type is "path failure with the path failure cumulative counter below the threshold" (step 3). If so, it retries the prescribed number of times on the failed path (step 4) and determines whether the operation was recovered (step 5). If it was recovered, processing ends with the retry successful. If not, processing moves on to the retry on the alternate path (step 9). For "path failure with the path failure cumulative counter at or above the threshold", processing goes directly to step 9 and retries on the alternate path.
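The decision procedure of FIG. 2 can be sketched in Python as below. The step numbers in the comments follow the text; the names `RequestType`, `Outcome`, `handle_failure`, and the default retry limit of 3 are illustrative assumptions, and `io` stands for any callable that attempts the operation on a given path and returns True on success.

```python
from enum import Enum, auto
from typing import Callable, Optional


class RequestType(Enum):
    NOT_PATH_FAILURE = auto()
    BELOW_THRESHOLD = auto()        # path failure, counter below threshold
    AT_OR_ABOVE_THRESHOLD = auto()  # path failure, counter at or above threshold


class Outcome(Enum):
    RECOVERED_ON_FAILED_PATH = auto()
    RECOVERED_ON_ALTERNATE_PATH = auto()
    UNRECOVERED = auto()


def retry(io: Callable[[str], bool], path: str, limit: int) -> bool:
    """Retry up to `limit` times, stopping at the first success."""
    return any(io(path) for _ in range(limit))


def handle_failure(
    io: Callable[[str], bool],
    failed_path: str,
    alternate_path: Optional[str],
    request: RequestType,
    limit: int = 3,  # assumed "prescribed number of times"
) -> Outcome:
    # Steps 1 and 2: with no alternate path, or for a failure that is not
    # a path failure, retry on the failed path only (step 7).
    if alternate_path is None or request is RequestType.NOT_PATH_FAILURE:
        ok = retry(io, failed_path, limit)
        return Outcome.RECOVERED_ON_FAILED_PATH if ok else Outcome.UNRECOVERED
    # Step 3: below the threshold, try the failed path first (steps 4-5).
    if request is RequestType.BELOW_THRESHOLD:
        if retry(io, failed_path, limit):
            return Outcome.RECOVERED_ON_FAILED_PATH
    # Step 9: retry on the alternate path. When the counter is at or above
    # the threshold, this is reached without touching the failed path.
    if retry(io, alternate_path, limit):
        return Outcome.RECOVERED_ON_ALTERNATE_PATH
    return Outcome.UNRECOVERED
```

The key difference from the conventional procedure is the last branch: once the counter has reached the threshold, no retries are spent on the unstable path when an alternate path exists.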

[0011] When the failure is recovered by either the failed-path retry or the alternate-path retry, the input/output control unit 16 notifies the application program 11 that issued the request that the input/output operation completed normally. In addition, when the operation is recovered by a retry on the alternate path during the above failure processing, the input/output control unit 16 marks the original failed path as unusable in the resource state table 17 (step 11). The external storage control device 31, for its part, simply executes individual input/output instructions and does not distinguish whether a given instruction is an initial instruction, a retry on the failed path, or a retry on an alternate path. The failure processing request type is therefore only a request; the failure processing actually performed is decided by the input/output control unit 16.
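The bookkeeping after a retry, as described in [0011], might be sketched as follows. The table is modelled as a hypothetical mapping from path name to a usability flag, and the returned status strings are placeholders. The text does not describe what is reported when no retry succeeds, so the "abnormal completion" branch is an assumption.

```python
from typing import Dict


def complete_recovery(
    path_usable: Dict[str, bool],
    failed_path: str,
    recovered: bool,
    used_alternate: bool,
) -> str:
    """Update the resource state and return the status for the application.

    `path_usable` is a hypothetical slice of resource state table 17,
    mapping each path name to whether later I/O may use it.
    """
    if recovered and used_alternate:
        # Step 11: the alternate path worked, so block the original path.
        path_usable[failed_path] = False
    # Recovery on either path is reported as a normal completion.
    # The unrecovered status is an assumption, not taken from the text.
    return "normal completion" if recovered else "abnormal completion"
```

A path is blocked only on recovery via the alternate path; recovery on the failed path itself leaves the table unchanged.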

[0012]

[Effects of the Invention] As described above, the present invention is configured so that, even when a path failure occurs intermittently but repeatedly, retries on the alternate path are promoted and the failed path can be blocked early. This prevents the degradation of system performance caused by wasted retry processing and also prevents the more serious failures that tend to result from continuing to use an unstable path.

[Brief Description of the Drawings]

[FIG. 1] A block diagram showing an embodiment of the path failure processing method for an external storage device according to the present invention.

[FIG. 2] A flowchart explaining the failure processing of the input/output control unit of FIG. 1.

[Explanation of Reference Numerals]

10: Host processor
11: Application program
12: OS (operating system)
13: Job management unit
14: Resource management unit
15: File management unit
16: Input/output control unit
17: Resource state table
20: Input/output channel
30: External storage subsystem
31: Storage control device
32: Path failure cumulative counter
33: External storage device

Claims (2)

[Claims]

[Claim 1] A path failure processing method for an external storage device, in which an OS running on a host processor manages the state of an external storage device connected to the host processor by a plurality of paths, wherein the external storage device counts the cumulative number of path failures; when a path failure occurs after the cumulative number of path failures has exceeded a threshold, the external storage device, in reporting the abnormal termination of the input/output operation to the OS, also reports that the cumulative number of path failures has exceeded the threshold; and the OS, on receiving the report, retries the input/output operation on an alternate path if one exists, or on the current failed path if no alternate path exists.

[Claim 2] The path failure processing method for an external storage device according to claim 1, wherein, when the alternate path exists and the input/output operation retried on the alternate path succeeds, the original failed path is blocked.
JP4054362A 1992-02-05 1992-02-05 Route failure processing method for external storage device in information processing system Expired - Fee Related JP2830592B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP4054362A JP2830592B2 (en) 1992-02-05 1992-02-05 Route failure processing method for external storage device in information processing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP4054362A JP2830592B2 (en) 1992-02-05 1992-02-05 Route failure processing method for external storage device in information processing system

Publications (2)

Publication Number Publication Date
JPH05216699A (en) 1993-08-27
JP2830592B2 (en) 1998-12-02

Family

ID=12968540

Family Applications (1)

Application Number Title Priority Date Filing Date
JP4054362A Expired - Fee Related JP2830592B2 (en) 1992-02-05 1992-02-05 Route failure processing method for external storage device in information processing system

Country Status (1)

Country Link
JP (1) JP2830592B2 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH02165357A (en) * 1988-12-20 1990-06-26 Nec Corp Information transfer device

Also Published As

Publication number Publication date
JP2830592B2 (en) 1998-12-02

Similar Documents

Publication Publication Date Title
US5652833A (en) Method and apparatus for performing change-over control to processor groups by using rate of failed processors in a parallel computer
JP4012498B2 (en) Information processing system, information processing apparatus, information processing apparatus control method, and program
US7328367B2 (en) Logically partitioned computer system and method for controlling configuration of the same
US6526521B1 (en) Methods and apparatus for providing data storage access
EP2510439B1 (en) Managing errors in a data processing system
US20110138219A1 (en) Handling errors in a data processing system
US20090271541A1 (en) Information processing system and access method
CN107870832B (en) Multi-path storage device based on multi-dimensional health diagnosis method
US7236454B2 (en) Loop diagnosis system and method for disk array apparatuses
JPH07104947A (en) Disk controller and control method for the same
US7702823B2 (en) Disk subsystem monitoring fault
CN113595836A (en) Heartbeat detection method of high-availability cluster, storage medium and computing node
US6338151B1 (en) Input/output recovery which is based an error rate and a current state of the computer environment
CN115793963A (en) Hard disk fault processing method, device, equipment and storage medium
JP3139548B2 (en) Error retry method, error retry system, and recording medium therefor
US7117397B1 (en) Apparatus and method for preventing an erroneous operation at the time of detection of a system failure
US6336193B1 (en) Input/output recovery method which is based upon an error rate and a current state of the computer environment
US11704180B2 (en) Method, electronic device, and computer product for storage management
CN114968129B (en) Disk array redundancy method, system, computer equipment and storage medium
US6338145B1 (en) Input/output recovery system which is based upon an error rate and a current state of the computer environment
JPH05216699A (en) Path fault processing system of external storage device
CN111309504A (en) Control method for embedded module serial port redundant transmission and related components
JPH07141308A (en) Back-up method in information processing system
US11669399B2 (en) System and method for fault identification and fault handling in a multiport power sourcing device
JP3596744B2 (en) Resource use status monitoring control method and recording medium recording the program

Legal Events

Date Code Title Description
FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20080925

Year of fee payment: 10

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20090925

Year of fee payment: 11

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20100925

Year of fee payment: 12

LAPS Cancellation because of no payment of annual fees