JP2830592B2 - Route failure processing method for external storage device in information processing system - Google Patents

Route failure processing method for external storage device in information processing system

Info

Publication number
JP2830592B2
Authority
JP
Japan
Prior art keywords
path
failure
external storage
storage device
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
JP4054362A
Other languages
Japanese (ja)
Other versions
JPH05216699A (en)
Inventor
密次郎 内田
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
Nippon Electric Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Electric Co Ltd
Priority to JP4054362A
Publication of JPH05216699A
Application granted
Publication of JP2830592B2
Anticipated expiration
Expired - Fee Related

Landscapes

  • Hardware Redundancy (AREA)
  • Retry When Errors Occur (AREA)

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to an information processing system and, more specifically, to a path failure processing method in which an OS running on a host processor manages the state of an external storage device connected to the host processor via a plurality of paths.

[0002]

[Prior Art] In an information processing system, when an external storage device detects a failure and abnormally terminates an input/output operation, it generates detailed information describing the failure and reports it to the OS. Conventionally, this detailed information has generally indicated whether the failure is a path failure or a failure of a common part that does not depend on the path. The OS examines this indication and retries the abnormally terminated input/output operation as follows.

[0003] First, in the case of a common-part failure, the input/output operation is retried a specified number of times on the failed path. Next, in the case of a path failure, the operation is retried a specified number of times on the failed path, and if this does not recover it, it is retried a specified number of times on an alternate path. If no usable alternate path exists, the same failure processing as for a common-part failure is performed. When a retry on the alternate path succeeds, the alternate path is known to be operating normally, whereas several attempts have already failed on the failed path, so the same kind of failure is likely to recur if that path is used for subsequent input/output operations. The failed path is therefore blocked so that it is not used for subsequent input/output operations. The purpose of this path blocking is to stop using a path with a high probability of failure and thereby avoid the increase in system overhead caused by failure processing.
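The conventional procedure described above can be sketched as follows. This is a minimal illustration, not code from the patent: the function name `conventional_recovery`, the callback `try_io`, and the default of three retries are all assumptions introduced here.

```python
from typing import Callable, Optional, Tuple


def conventional_recovery(
    try_io: Callable[[str], bool],   # hypothetical: performs one I/O attempt on a path
    failed_path: str,
    alternate_path: Optional[str],
    is_path_failure: bool,
    max_retries: int = 3,            # the "specified number of times" (value assumed)
) -> Tuple[bool, bool]:
    """Return (recovered, block_failed_path) for the conventional scheme."""
    # Both failure kinds begin with retries on the failed path.
    for _ in range(max_retries):
        if try_io(failed_path):
            return True, False
    # A common-part failure, or no alternate path, leaves nothing more to try.
    if not is_path_failure or alternate_path is None:
        return False, False
    # Path failure with an alternate path available: retry there.
    for _ in range(max_retries):
        if try_io(alternate_path):
            # The alternate path works and the failed path failed repeatedly,
            # so the failed path is blocked.
            return True, True
    return False, False
```

In this sketch the failed path is blocked only when every retry on it fails. A path that fails intermittently but recovers on a retry is never blocked.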

[0004]

[Problems to Be Solved by the Invention] In an environment where path failures are intermittent but recur, however, the number of retries on the failed path rises as the failure frequency rises, which degrades system performance. Moreover, because the operation is recovered by the retry on the failed path, the conventional method never reaches the point of blocking the failed path, and the failed path continues to be used for subsequent input/output operations. An object of the present invention is to solve this problem by providing a path failure processing method for an external storage device that reduces the opportunities to use an unstable path and prevents the degradation of system performance caused by wasteful retry processing.

[0005]

[Means for Solving the Problems] To achieve the above object, the path failure processing method for an external storage device in an information processing system according to the present invention is used in an information processing system in which an OS running on a host processor manages the state of an external storage device connected to the host processor via a plurality of paths. It comprises path failure cumulative counter means for counting the cumulative number of path failure occurrences of the external storage device, and control means that, when a path failure occurs after the cumulative number of path failure occurrences has exceeded a threshold, reports that fact to the OS together with the abnormal termination of the input/output operation. On receiving this report, the OS retries the input/output operation on an alternate path if one exists, or on the current failed path if no alternate path exists. If the retry on the alternate path succeeds, the initial failed path is blocked.

[0006]

[Embodiment] The present invention will now be described in more detail with reference to the drawings. FIG. 1 is a block diagram showing an embodiment of an information processing system that employs the path failure processing method for an external storage device according to the present invention. Physically, the information processing system of this embodiment comprises a host processor 10, an input/output channel group 20, and an external storage subsystem 30. The OS 12 is a program that runs on the host processor 10 and, through a job management unit 13, a resource management unit 14, and other units, manages the entire information processing system. A resource status table 17, provided in part of the resource management unit 14 of the OS 12, tracks the state of each resource in the information processing system that the OS 12 can use.

[0007] The input/output channel group 20 connects the host processor 10 and the external storage subsystem 30, and the OS 12 uses it as part of the access path to the external storage subsystem 30. The external storage subsystem 30 comprises a plurality of external storage devices 33 and a plurality of external storage control devices 31 that exchange data between the external storage devices 33 and the input/output channels 20. Because of its importance, each external storage device 33 is connected to the OS 12 by a plurality of connection paths, so that input/output operations can continue on an alternate path even if one path becomes unusable because of a failure. An external storage control device 31 is placed on each path.

[0008] An input/output request issued by the application program 11 passes through the file management unit 15 in the OS 12 and is executed under the control of the input/output control unit 16, also in the OS 12. The input/output control unit 16 refers to the resource status table 17, selects one usable path from the plurality of paths leading to the external storage device 33 to be accessed, and issues an input/output instruction to the input/output channel 20 on the selected path. The issued instruction is executed by the combined operation of the input/output channel 20 and the external storage control device 31, and detailed information indicating whether a failure occurred and the nature of the failure is returned to the input/output control unit 16 as the completion result of the instruction. A failure in an input/output operation occurs in the input/output channel 20, the external storage control device 31, or the external storage device 33. For failures that occur in the external storage control device 31 or the external storage device 33, the detailed information is generated by the external storage control device 31.

[0009] The external storage control device 31 classifies each failure as either a failure of a common part of the external storage subsystem 30 or a path failure and, in the case of a path failure, increments the path failure cumulative counter 32 installed in the external storage control device 31. The detailed information generated by the external storage control device 31 includes a field that presents a failure processing request type to the input/output control unit 16. This field holds one of "path failure with the path failure cumulative counter below the threshold", "path failure with the path failure cumulative counter at or above the threshold", or "not a path failure". On receiving this report, the input/output control unit 16 selects the failure processing according to the failure processing request type and the resource status table 17.
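The classification performed by the external storage control device 31 can be sketched as follows. The names `RequestType` and `StorageController`, and the choice to increment the counter before comparing it with the threshold, are assumptions made for illustration. The text says only that the counter is incremented on a path failure and that the report distinguishes "below" from "at or above" the threshold.

```python
from enum import Enum


class RequestType(Enum):
    """Failure processing request types carried in the detailed information."""
    PATH_FAILURE_BELOW_THRESHOLD = "path failure, counter below threshold"
    PATH_FAILURE_AT_OR_ABOVE_THRESHOLD = "path failure, counter at or above threshold"
    NOT_PATH_FAILURE = "not a path failure"


class StorageController:
    """Hypothetical stand-in for the external storage control device 31."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold       # threshold value is an assumption
        self.path_failure_count = 0      # path failure cumulative counter 32

    def report_failure(self, is_path_failure: bool) -> RequestType:
        # Common-part failures do not touch the counter.
        if not is_path_failure:
            return RequestType.NOT_PATH_FAILURE
        # Assumption: count the current failure first, then compare.
        self.path_failure_count += 1
        if self.path_failure_count >= self.threshold:
            return RequestType.PATH_FAILURE_AT_OR_ABOVE_THRESHOLD
        return RequestType.PATH_FAILURE_BELOW_THRESHOLD
```

With a threshold of 3 in this sketch, the first two path failures are reported as below the threshold and the third as at or above it.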

[0010] FIG. 2 is a flowchart explaining the operating procedure of the input/output control unit. First, it determines whether an alternate path exists (step 1). If none is available, it retries on the failed path regardless of the failure processing request type (step 7). If an alternate path is available, it examines the failure processing request type (step 2) and performs failure processing as follows. If the type is "not a path failure", it retries on the failed path (step 7). If the failure is a path failure, it determines whether the type is "path failure with the path failure cumulative counter below the threshold" (step 3). If so, it retries on the failed path the specified number of times (step 4) and then determines whether the operation was recovered (step 5). If it was recovered, processing ends with the retry successful. If not, it proceeds to retry on the alternate path (step 9). If the type is "path failure with the path failure cumulative counter at or above the threshold", it proceeds immediately to step 9 and retries on the alternate path.
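The decision procedure of FIG. 2 can be sketched as follows. This is an illustrative reading of the flowchart rather than the patent's implementation: `handle_failure`, `try_io`, the shortened enum member names, and the retry count are hypothetical, and the step numbers in the comments are the ones quoted in the text.

```python
from enum import Enum
from typing import Callable, Optional, Tuple


class RequestType(Enum):
    BELOW = 1             # path failure, counter below threshold
    AT_OR_ABOVE = 2       # path failure, counter at or above threshold
    NOT_PATH_FAILURE = 3


def handle_failure(
    request_type: RequestType,
    failed_path: str,
    alternate_path: Optional[str],
    try_io: Callable[[str], bool],   # hypothetical: one I/O attempt on a path
    max_retries: int = 3,            # the "specified number of times" (value assumed)
) -> Tuple[bool, Optional[str]]:
    """Return (recovered, path_to_block)."""

    def retry(path: str) -> bool:
        # any() stops at the first successful attempt.
        return any(try_io(path) for _ in range(max_retries))

    # Step 1: no alternate path, or step 2: not a path failure.
    # Either way, retry on the failed path (step 7).
    if alternate_path is None or request_type is RequestType.NOT_PATH_FAILURE:
        return retry(failed_path), None

    # Steps 3 to 5: below the threshold, try the failed path first.
    if request_type is RequestType.BELOW and retry(failed_path):
        return True, None

    # Step 9: retry on the alternate path. At or above the threshold,
    # control arrives here without any retry on the failed path.
    if retry(alternate_path):
        return True, failed_path     # step 11: block the initial failed path
    return False, None
```

Compared with the conventional scheme, the only behavioural change in this sketch is that a path whose counter has reached the threshold skips the failed-path retries, so a single success on the alternate path is enough to block it.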

[0011] When the failure is recovered by either the failed-path retry or the alternate-path retry, the input/output control unit 16 notifies the application program 11 that requested the input/output that the operation completed normally. In addition, when the input/output control unit 16 performed an alternate-path retry in the course of the above failure processing and recovered on the alternate path, it marks the initial failed path as unusable in the resource status table 17 (step 11). The external storage control device 31, on the other hand, simply executes individual input/output instructions and does not distinguish whether each instruction is an initial instruction, a failed-path retry, or an alternate-path retry. The failure processing request type is therefore strictly a request. The failure processing actually performed is determined by the input/output control unit 16.
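Blocking a path amounts to marking it unusable in the resource status table 17 so that later path selection skips it. A minimal sketch follows, assuming a per-device mapping from path name to a usable flag. The class name `ResourceStatusTable`, its methods, and the first-usable selection policy are hypothetical, since the patent does not describe the table layout or how a path is chosen among several usable ones.

```python
from typing import Dict, List, Optional


class ResourceStatusTable:
    """Hypothetical model of the path state held in resource status table 17."""

    def __init__(self, paths_by_device: Dict[str, List[str]]) -> None:
        # device -> {path: usable?}; every path starts out usable.
        self._usable = {
            device: {path: True for path in paths}
            for device, paths in paths_by_device.items()
        }

    def select_path(self, device: str) -> Optional[str]:
        # Assumption: pick the first usable path in registration order.
        for path, usable in self._usable[device].items():
            if usable:
                return path
        return None

    def block(self, device: str, path: str) -> None:
        # Step 11: make the initial failed path unusable for later I/O.
        self._usable[device][path] = False
```

For example, with paths `p0` and `p1` registered for one device, `select_path` returns `p0` until `block` is called on it, after which it returns `p1`.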

[0012]

[Effects of the Invention] As described above, according to the present invention, even when path failures are intermittent but recur, alternate-path retry is promoted and the failed path can be blocked early. This prevents the degradation of system performance caused by wasteful retry processing and also prevents the more serious failures that tend to arise from continued use of an unstable path.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing an embodiment of the path failure processing method for an external storage device according to the present invention.

FIG. 2 is a flowchart explaining the failure processing of the input/output control unit in FIG. 1.

[Explanation of Symbols]

10: host processor
11: application program
12: OS (operating system)
13: job management unit
14: resource management unit
15: file management unit
16: input/output control unit
17: resource status table
20: input/output channel
30: external storage subsystem
31: storage control device
32: path failure cumulative counter
33: external storage device

Claims (1)

(57) [Claims]
1. A path failure processing method for an external storage device in an information processing system in which an OS running on a host processor manages the state of an external storage device connected to the host processor via a plurality of paths, characterized by comprising: path failure cumulative counter means for counting the cumulative number of path failure occurrences of the external storage device; and control means for reporting, when a path failure occurs after the cumulative number of path failure occurrences has exceeded a threshold and the abnormal termination of the input/output operation is reported to the OS, that the cumulative number of path failure occurrences has exceeded the threshold; wherein the OS, on receiving the report, retries the input/output operation on an alternate path if one exists, or on the current failed path if no alternate path exists, and, when the retry on the alternate path succeeds, blocks the initial failed path.

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP4054362A JP2830592B2 (en) 1992-02-05 1992-02-05 Route failure processing method for external storage device in information processing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP4054362A JP2830592B2 (en) 1992-02-05 1992-02-05 Route failure processing method for external storage device in information processing system

Publications (2)

Publication Number Publication Date
JPH05216699A JPH05216699A (en) 1993-08-27
JP2830592B2 true JP2830592B2 (en) 1998-12-02

Family

ID=12968540

Family Applications (1)

Application Number Title Priority Date Filing Date
JP4054362A Expired - Fee Related JP2830592B2 (en) 1992-02-05 1992-02-05 Route failure processing method for external storage device in information processing system

Country Status (1)

Country Link
JP (1) JP2830592B2 (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH02165357A (en) * 1988-12-20 1990-06-26 Nec Corp Information transfer device

Also Published As

Publication number Publication date
JPH05216699A (en) 1993-08-27

Legal Events

Date Code Title Description
FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20080925

Year of fee payment: 10

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20090925

Year of fee payment: 11

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20100925

Year of fee payment: 12

LAPS Cancellation because of no payment of annual fees