JPS629929B2

Info

Publication number: JPS629929B2
Application number: JP55185395A
Authority: JP
Inventors: Kyoichiro Goto
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1980-12-29
Filing date: 1980-12-29
Publication date: 1987-03-03
Also published as: JPS57111736A

Description

[Detailed description of the invention]

The present invention relates to a recovery method that, particularly for operators of online systems, responds to a failure in the data used by business jobs by performing recovery processing in parallel with operation and then dynamically re-integrating the recovered data into the system.

Figures 1 to 3 show conventional recovery methods. Figure 1 illustrates the recovery method for a failure of an entire volume, Figure 2 the recovery method for a device failure, and Figure 3 the recovery method for a partial failure.

In Figure 1, VOO1 denotes a volume, A and B denote data sets, and J₁ to J₃ denote users. When the entire volume fails, the jobs of all users J₁ to J₃ are closed so that no user remains. A new spare medium is then mounted on the device, given the same volume name, and initialized as a volume. Next, the data sets are restored from the backup data and the log data. Once the data sets have been restored, the users' jobs are started again.

Figure 2 illustrates the recovery method for a device failure. In the figure, DV denotes a device and UCB denotes a unit control block. Suppose, for example, that the device with unit number 110 fails. The medium on device 110 is moved to another device, for example the device with unit number 111, the volume name VOO1 is assigned to the medium, and the unit number and related fields of the unit control block UCB are rewritten. This rewriting of the contents of the unit control block UCB is called swap processing.
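
The conventional swap processing for a device failure can be pictured with a small Python sketch. It is an illustration only: the `UCB` class and the `swap_unit` function are hypothetical names, and a real unit control block holds far more than the two fields modeled here.

```python
from dataclasses import dataclass


@dataclass
class UCB:
    """Unit control block: binds a volume name to a physical unit number."""
    volume_name: str
    unit_number: int


def swap_unit(ucb: UCB, new_unit_number: int) -> int:
    """Rewrite the UCB after the medium is moved; return the old unit number."""
    old_unit_number = ucb.unit_number
    ucb.unit_number = new_unit_number
    return old_unit_number
```

Calling `swap_unit(UCB("VOO1", 110), 111)` corresponds to moving the medium from unit 110 to unit 111 while the volume name stays the same.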

Figure 3 illustrates the recovery method for a partial volume failure. TR denotes a track, REC a record, ATRI the alternate track information area, and ATR an alternate track. As shown in Figure 3, when a track fails, the operator assigns an alternate track to the failed track and starts a rewriting utility. If no free alternate track exists, or if a failure is detected in the alternate track itself, the data must ultimately be moved to a new volume, and recovery as for a whole-volume failure becomes necessary.
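
The alternate track rule just described, including its escalation to whole-volume recovery, might be sketched as follows. The names `Volume`, `assign_alternate` and `VolumeRecoveryRequired` are assumptions made for this illustration and do not come from the patent.

```python
from dataclasses import dataclass, field


class VolumeRecoveryRequired(Exception):
    """Raised when a partial failure must be handled as a whole-volume failure."""


@dataclass
class Volume:
    # Hypothetical model: alternate tracks that are still unassigned.
    free_alternates: list[int]
    # Alternate tracks that are themselves known to be defective.
    bad_alternates: set[int] = field(default_factory=set)
    # Alternate track information: failed track -> assigned alternate track.
    alternate_map: dict[int, int] = field(default_factory=dict)


def assign_alternate(volume: Volume, failed_track: int) -> int:
    """Assign an alternate track to a failed track, or escalate."""
    if not volume.free_alternates:
        raise VolumeRecoveryRequired("no free alternate track")
    candidate = volume.free_alternates.pop(0)
    if candidate in volume.bad_alternates:
        raise VolumeRecoveryRequired("alternate track is itself defective")
    volume.alternate_map[failed_track] = candidate
    return candidate
```

The two `raise` paths are the cases in which the conventional method falls back to recovery of the entire volume.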

The conventional method of Figure 1 cannot perform recovery unless the business jobs using the failed data set are stopped, so it has the drawback of lowering the MTBF. In addition, the medium used for recovery is required to carry the same volume information, so a spare medium must be prepared and volume initialization must be performed.

The conventional method of Figure 2 cannot be applied to large storage devices whose packs cannot be moved, so a device failure in such a device effectively becomes a whole-volume failure.

The present invention is based on the above considerations. Its object is to provide an extent swap recovery method that eliminates the need for a spare volume by allowing the extents contained in a storage volume that has suffered a whole-volume failure to be restored onto any volumes currently in use, and that allows recovery processing and resumption of operation to be carried out in the open state by dynamically swapping to the extents restored in this way. To that end, the extent swap recovery method of the present invention is a method in which, when a volume failure occurs during access to a data set, an extent on the failed volume is restored on another volume and then swapped with the extent on the failed volume. The method is characterized by comprising: recovery means that, when a recovery command containing information on the recovery source extent and on the recovery destination volume and space is input, restores the recovery source extent into the space of the specified recovery destination volume using total dump data and log data, and updates control information indicating the relationship between the extent and the volume; and extent swap means that, when the extent is requested after it has been restored, switches the access route of the extent by referring to the updated control information so that the extent stored in the space of the specified recovery destination volume is accessed. The present invention is described below with reference to the drawings.

The present invention aims to re-incorporate a recovered data set while users of the failed data set still exist, that is, while it remains open, and to provide flexibility in the form that recovery of a data set can take. Its characteristic features are listed below.

Large storage devices developed in the future will have large capacities and will often take the form of fixed packs. When a device failure occurs in such a device, recovery by the conventional dynamic device reconfiguration function (swap recovery that is possible in the open state) cannot be applied; the failure is effectively treated as a whole-volume failure, and recovery during operation becomes impossible. An alternative recovery method is therefore required. As a standard open-state recovery function, not limited to fixed large storage devices, the present method restores the failed extent in parallel with operation and then dynamically swaps the failed extent with the restored extent, which makes re-incorporation into the system possible.

The items to be swapped in the open state are the extent information, which manages the range from the first track address to the last track address of the data on the recovery destination volume, and the unit information, which determines the access route to the recovery destination volume. The unit information on input/output device resources that corresponds to the job control statements and is managed at the initiation stage of a user job is also subject to swapping.

The information to be swapped is held separately for each user's job, so the swap must be carried out in the capacity of each individual user. For this reason, the swap is triggered by the user's first resumed access after recovery.

Large storage devices are trending toward ever larger capacities. The aim is therefore a form of data set recovery that uses free space on volumes currently in use, in addition to the conventional form of recovery that uses a spare volume or a spare spindle. Because such recovery changes the correspondence between an extent and the volume on which it resided, the catalog information managed by AIM (the information that manages the correspondence between data sets and volumes) can be changed automatically in conjunction with recovery. AIM is an OS subsystem positioned between the operating system and user programs. AIM is outlined, for example, in FACOM OS IV/F4.

Furthermore, the method can be applied to various kinds of failure, such as whole-volume failures, device failures of fixed-pack devices, and partial failures, and it is a recovery method that covers a wide range of target system scales.

Figures 4 to 10 show one embodiment of the present invention. Figure 4 shows the forms of data set recovery according to the present invention, Figure 5 outlines the processing of the extent swap recovery method of the present invention, Figure 6 illustrates the management of access to extents, Figure 7 shows the process from the occurrence of a failure to the prohibition of access, Figure 8 shows the data recovery process for a failed extent, Figure 9 shows the extent swap process for an open data set, and Figure 10 shows the logic for allocating volumes to new user jobs after a failed extent has been restored.

Figure 4(a) shows the case in which, after a whole-volume failure on volume VOO1, extent A of VOO1 is restored into free space on the current volume VOO2 and extent B is restored into free space on the current volume VOO3. Figure 4(b) shows the case in which, after a whole-volume failure on volume VOO1, extents A and B of VOO1 are restored onto a spare volume.

Figure 5 outlines the processing of the extent swap recovery method of the present invention. The numerals in the figure indicate the order in which the events occur. Suppose, for example, that an online user job is using extent A or B on volume VOO1 when a whole-volume failure occurs on VOO1. The user job is notified of the failure, and the operator is notified at the same time. On being notified, the operator specifies the free space to be used as the recovery destination in job control statements and starts the AIM recovery job. The AIM recovery job uses the total dump data and the log data to restore extents A and B into free space on the current volumes VOO2 and VOO3. When the user job subsequently resumes access, extent swap processing is performed, and the user job continues to run using extent A on volume VOO2 and extent B on volume VOO3.

Figure 6 illustrates the correspondence between extents and volumes and the management of access to extents. In the figure, UCB denotes a unit control block, DCB a data set control block, DEB a data set extent block, EACB an extent access control block, EBLT an extent list, TCB a task control block, and TiOT a task input/output table. The unit control block UCB corresponds one-to-one with a volume and manages the name of that volume (identification information such as VOO1) and the physical unit number on which the volume is mounted.

The data set control block DCB is a control block used by data management when an application program accesses a data set, and it chains to the data set extent block DEB. The data set extent block DEB is a control block that manages the extent information of a data set: it manages the physical location on the volume (from the first track address to the last track address at which the data is stored) and, by chaining to the UCB address, the device on which that volume is placed.

The extent access control block EACB is a control table managed by AIM that holds, for each extent, a status indicating whether access is allowed. The extent list EBLT manages all application programs (all tasks) that use the same extent. It is associated with the extent access control block EACB, and the correspondence is managed by the extent number, which serves as the access key. The task control block TCB is a block for managing application program tasks and is used here to maintain the correspondence with tasks.

Although the task input/output table TiOT carries the word "task" in its name, it is in fact a table that manages allocation information on the input/output device resources used by a job. When an application program opens a data set (the declaration that use is starting, at which point the chain data set control block DCB → data set extent block DEB → unit control block UCB is established), the association with input/output resources is determined from this table. The data set control block DCB and the task input/output table TiOT are associated by DD name.

The job control statements JCL are the control statements used to start a user's (application program's) job; to allocate the input/output device resources used by the job, the name of the data set is specified in a DD statement. The AIM catalog information manages the correspondence between extents and volumes that is used when data sets are allocated at the start of an application program job, and it is stored in a management data set called the AIM directory. When specifying input/output resources, the user names only the data set and does not need to be aware of the volume.
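
The relationships among these control blocks can be modeled roughly in Python. This is a simplified sketch under stated assumptions, not the actual block layouts: the field names, the `Access` enumeration and the `open_data_set` helper are all hypothetical, and the TCB and EBLT are omitted.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Access(Enum):
    ALLOWED = "allowed"
    PROHIBITED = "prohibited"


@dataclass
class UCB:
    """Unit control block: one per volume."""
    volume_name: str
    unit_number: int  # physical unit on which the volume is mounted


@dataclass
class DEB:
    """Data set extent block: physical range of the extent plus its UCB chain."""
    first_track: int
    last_track: int
    ucb: UCB


@dataclass
class DCB:
    """Data set control block: chains to a DEB once the data set is opened."""
    dd_name: str
    deb: Optional[DEB] = None


@dataclass
class EACB:
    """Extent access control block (AIM-managed): per-extent access status."""
    extent_no: int
    status: Access = Access.ALLOWED


@dataclass
class TIOT:
    """Task input/output table: DD name -> UCB, one table per job."""
    entries: dict[str, UCB] = field(default_factory=dict)


def open_data_set(dcb: DCB, tiot: TIOT, first_track: int, last_track: int) -> DEB:
    """Build the DCB -> DEB -> UCB chain, resolving the unit by DD name."""
    ucb = tiot.entries[dcb.dd_name]
    dcb.deb = DEB(first_track, last_track, ucb)
    return dcb.deb
```

In this model the DD name is the only link between a DCB and the TiOT, which mirrors the statement that the two are associated by DD name.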

The flow of processing in Figure 6 is as follows. Before accessing a data set, the application program requests open processing, which establishes the chain data set control block DCB → data set extent block DEB → unit control block UCB; it then issues read/write requests to access the data.

Figure 6 shows the flow of processing for a data access after the open has completed.

The application program issues a read/write data access request.

AIM access management decides, extent by extent, whether the data set may be accessed by referring to the status shown in the extent access control block EACB. This determines whether the access is passed on to data management.

If the extent is accessible, an access request is issued to data management.

If, on the other hand, the extent is not accessible (for example, because a failure has occurred), control returns abnormally to the application program at this point.

Data management converts the key information of the requested access (logical data position information) into physical medium information by referring to the data set extent block DEB, and determines the access route from the corresponding unit control block UCB.

The data is accessed on the physical medium.
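
The access flow above can be sketched as a single function. The sketch assumes that a logical position is simply a track offset within the extent, which the patent does not state; `Extent`, `resolve_access` and `AccessDenied` are hypothetical names.

```python
from dataclasses import dataclass


class AccessDenied(Exception):
    """Abnormal return to the application program."""


@dataclass
class Extent:
    # Hypothetical flattening of the EACB, DEB and UCB for one extent.
    accessible: bool   # EACB status
    first_track: int   # DEB: first physical track of the extent
    last_track: int    # DEB: last physical track of the extent
    volume_name: str   # UCB: volume that holds the extent
    unit_number: int   # UCB: physical unit on which the volume is mounted


def resolve_access(extent: Extent, relative_track: int) -> tuple[int, int]:
    """Return (unit_number, physical_track) for a logical position in the extent."""
    # AIM access management: gate the request on the EACB status.
    if not extent.accessible:
        raise AccessDenied("extent access is prohibited")
    # Data management: logical position -> physical track via the DEB.
    physical_track = extent.first_track + relative_track
    if not extent.first_track <= physical_track <= extent.last_track:
        raise ValueError("position lies outside the extent")
    # The UCB determines the access route.
    return extent.unit_number, physical_track
```

Because the gate is checked before any address translation, a prohibited extent is rejected without touching the DEB or UCB, as in the abnormal-return branch of the flow.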

Figure 7 shows the process from the occurrence of a failure to the prohibition of access.

A whole-volume failure occurs on volume VOO1 while application program APL1 is accessing extent A.

Data management notifies AIM of the error information.

AIM outputs a failure message on the console to inform the operator of the cause of the failure and of the failed extent.

To prohibit subsequent access to the failed extent A, AIM sets an access-prohibited status in the extent access control block EACB. The scope of the prohibition depends on the type of failure: for a whole-volume failure or a device failure the entire extent is prohibited, whereas for a partial failure only the failed track is prohibited.

AIM notifies the application program that the access ended abnormally because of the failure.

From the failure message displayed on the console, the operator recognizes that extent recovery processing is required. From this point on, the system runs in fallback operation with service for extent A suspended.
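
How the scope of the access prohibition depends on the failure type can be illustrated as follows. The representation of the EACB as a whole-extent flag plus a set of prohibited tracks is an assumption made for this sketch, as are the names `FailureKind` and `prohibit_access`.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class FailureKind(Enum):
    WHOLE_VOLUME = "whole volume"
    DEVICE = "device"
    PARTIAL = "partial"


@dataclass
class EACB:
    # Hypothetical layout: a whole-extent flag plus individually prohibited tracks.
    whole_extent_prohibited: bool = False
    prohibited_tracks: set[int] = field(default_factory=set)

    def is_accessible(self, track: int) -> bool:
        return not self.whole_extent_prohibited and track not in self.prohibited_tracks


def prohibit_access(eacb: EACB, kind: FailureKind,
                    failed_track: Optional[int] = None) -> None:
    """Record an access prohibition whose scope depends on the failure type."""
    if kind is FailureKind.PARTIAL:
        if failed_track is None:
            raise ValueError("a partial failure needs the failed track")
        # Partial failure: only the failed track is prohibited.
        eacb.prohibited_tracks.add(failed_track)
    else:
        # Whole-volume or device failure: the entire extent is prohibited.
        eacb.whole_extent_prohibited = True
```

After a partial failure, the other tracks of the extent remain accessible, which is what allows fallback operation to continue for the rest of the data.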

Figure 8(a) shows the data recovery process for a failed extent.

Data recovery for a failed extent is carried out when the operator starts the data set recovery utility job provided by AIM; while the system is running, this job executes in parallel with user jobs. The recovery control statement is the input to the recovery utility and specifies the extent to be recovered and the name of the recovery destination volume in which it is to be stored. The user selects a volume that has free space equivalent in size to the failed extent and specifies it in this control statement as the volume to swap to. The recovery job is described below.

The recovery utility allocates space for the target extent dynamically. If the normal space allocation performed at job initiation were used instead, the allocation would be refused whenever another job was using the same data set name; dynamic allocation avoids this.

Access to the whole of the extent to be recovered is prohibited. This covers cases such as one in which the failure of the target extent was only a partial failure.

The data is restored into the allocated space using the total dump data and the log data. Figure 8(b) illustrates the input data for recovery. The total dump data is a copy of the extent's data taken at the initial time X, before operation began; it is, so to speak, backup data kept in case of failure. The log data is the set of data updated between that dump and the time Y at which the failure occurred, accumulated together with the corresponding record positions on the volume.

The volume name corresponding to extent 1 in the AIM catalog information is updated from VOO1 to VOO2.

A swap-required status is set in the extent list EBLT.

When recovery is complete, the access-prohibited indication in the extent access control block EACB is cleared.
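
The recovery steps above can be condensed into a short Python sketch. It is a simplified model under stated assumptions: the AIM tables are plain dictionaries and sets, the swap-required status is kept per extent rather than per task, and the log is assumed to be an ordered list of (record position, data) pairs. `AimState` and `recover_extent` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class AimState:
    # Hypothetical in-memory stand-ins for the AIM-managed tables.
    catalog: dict[str, str]                                   # extent -> volume name
    access_prohibited: set[str] = field(default_factory=set)  # EACB status
    swap_required: set[str] = field(default_factory=set)      # EBLT status
    # volume name -> extent -> record position -> data
    volumes: dict[str, dict[str, dict[int, bytes]]] = field(default_factory=dict)


def recover_extent(aim: AimState, extent: str, target_volume: str,
                   total_dump: dict[int, bytes],
                   log: list[tuple[int, bytes]]) -> None:
    """Rebuild `extent` on `target_volume` from the total dump plus the log."""
    # Prohibit access to the whole extent while it is being rebuilt.
    aim.access_prohibited.add(extent)
    # Start from the dump taken at time X, then replay updates up to time Y in order.
    records = dict(total_dump)
    for position, data in log:
        records[position] = data
    aim.volumes.setdefault(target_volume, {})[extent] = records
    # Point the catalog at the recovery destination volume.
    aim.catalog[extent] = target_volume
    # Ask tasks that still have the extent open to swap on their next access.
    aim.swap_required.add(extent)
    # Recovery is complete: lift the access prohibition.
    aim.access_prohibited.discard(extent)
```

Replaying the log in order means that the last update to a record position wins, so the rebuilt extent reflects the state at the time of the failure rather than the state at the dump.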

Figure 9 shows the process of swapping the failed extent of an open data set with the recovery destination extent. The processing at each step is described below.

After the failed extent has been restored, swap processing for an extent that was open is performed by the application program (task) that had the extent open, triggered by that program's first resumed access. This is because that task holds the data set control block DCB and the data set extent block DEB needed for data set access.

The application program resumes access after recovery.

If the status in the extent access control block is normal (access allowed) and swap-required is indicated in the extent list EBLT for the task, the swap is executed.

In extent swap processing, the physical extent information and the UCB chain in the data set extent block DEB are switched (swapped) to the recovery destination. In addition, the UCB chain of the task input/output table TiOT is switched to the recovery destination, but only if the TiOT has not already been swapped. When several application programs in one job use the same extent, the TiOT may already have been swapped, because only one TiOT is maintained per job. The TiOT is switched as well so that no inconsistency arises if the data set is closed and then opened again without the job being terminated.

Thereafter, data management operates on the recovered medium, and as a result the data in the recovery destination extent on volume VOO2 is accessed.
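
The swap performed on the first resumed access might look like the following sketch. It assumes a per-task swap-required flag and a per-job TiOT reduced to a single volume name; `Location`, `Job`, `Task` and `access` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Location:
    volume_name: str
    first_track: int
    last_track: int


@dataclass
class Job:
    # One TiOT per job; several tasks of the job may share it.
    tiot_volume: str
    tiot_swapped: bool = False


@dataclass
class Task:
    job: Job
    deb: Location                # where this task believes the extent lives
    swap_required: bool = False  # per-task EBLT status


def access(task: Task, extent_accessible: bool, recovered: Location) -> Location:
    """Return the location used for this access, swapping first if required."""
    if not extent_accessible:
        raise PermissionError("extent access is prohibited")
    if task.swap_required:
        # Switch the task's DEB (physical extent info and UCB chain).
        task.deb = recovered
        # Switch the job's TiOT only once, however many tasks share it.
        if not task.job.tiot_swapped:
            task.job.tiot_volume = recovered.volume_name
            task.job.tiot_swapped = True
        task.swap_required = False
    return task.deb
```

Each task swaps its own DEB lazily, so a task that never touches the extent again pays no cost, while the shared TiOT is updated by whichever task resumes first.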

Figure 10 shows the logic for allocating volumes to new user jobs after a failed extent has been restored. As the figure shows, the volume name of an AIM data set is assigned at job initiation from the catalog information in the AIM directory data set, so users do not need to change their job control statements when the volume changes.
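
The reason the job control statements can stay unchanged is that the volume is looked up indirectly at job initiation. A minimal sketch, assuming the DD statements and the catalog can each be treated as a simple mapping (`allocate_volumes` is a hypothetical name):

```python
def allocate_volumes(dd_statements: dict[str, str],
                     catalog: dict[str, str]) -> dict[str, str]:
    """Resolve each DD name's data set to its current volume via the catalog."""
    return {dd_name: catalog[data_set]
            for dd_name, data_set in dd_statements.items()}


# The job names only data sets; the volume comes from the catalog.
jcl = {"DDA": "A", "DDB": "B"}
catalog = {"A": "VOO1", "B": "VOO1"}
before = allocate_volumes(jcl, catalog)

# Recovery updates the catalog; the same job control statements now
# resolve to the recovery destination volumes.
catalog.update({"A": "VOO2", "B": "VOO3"})
after = allocate_volumes(jcl, catalog)
```

Here `jcl` is identical for both calls, and only the catalog contents differ between `before` and `after`.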

As is clear from the above description, the present invention provides the following effects.

(1) A recovery means that replaces the dynamic device reconfiguration function can be provided for device failures of fixed-pack large storage devices.

(2) When assignment of an alternate track fails during a partial failure, only the affected extent needs to be recovered by swap recovery.

(3) No dedicated spare volume is needed as long as spare space is available on volumes currently in use; moreover, if the spare space is spread over several volumes in use, this is effectively equivalent to having several spare volumes.

(4) Volume initialization becomes unnecessary, so recovery time is shortened.

[Brief explanation of the drawing]

Figures 1 to 3 show conventional recovery methods: Figure 1 illustrates the recovery method for a failure of an entire volume, Figure 2 the recovery method for a device failure, and Figure 3 the recovery method for a partial failure. Figures 4 to 10 show one embodiment of the present invention: Figure 4 shows the forms of data set recovery according to the present invention, Figure 5 outlines the processing of the extent swap recovery method of the present invention, Figure 6 illustrates the correspondence between extents and volumes and the management of access to extents, Figure 7 shows the process from the occurrence of a failure to the prohibition of access, Figure 8 shows the data recovery process for a failed extent, Figure 9 shows the extent swap process for an open data set, and Figure 10 shows the logic for allocating volumes to new user jobs after a failed extent has been restored.

UCB: unit control block, DCB: data set control block, DEB: data set extent block, EACB: extent access control block, EBLT: extent list, TCB: task control block, TiOT: task input/output table.

Claims

[Scope of Claims] 1. An extent swap recovery method in which, when a volume failure occurs during access to a data set, an extent on the failed volume is restored on another volume and then swapped with the extent on the failed volume, the method being characterized by comprising: recovery means that, when a recovery command containing information on the recovery source extent and on the recovery destination volume and space is input, restores the recovery source extent into the space of the specified recovery destination volume using total dump data and log data, and updates control information indicating the relationship between the extent and the volume; and extent swap means that, when the extent is requested after it has been restored, switches the access route of the extent by referring to the updated control information so that the extent stored in the space of the specified recovery destination volume is accessed.