JPH07141232A

JPH07141232A - File storage management device

Info

Publication number: JPH07141232A
Application number: JP5285190A
Authority: JP
Inventors: Naoto Matsunami; 直人松並; Taisuke Kaneda; 泰典兼田; Takashi Oeda; 高大枝; Hiroaki Takahashi; 宏明高橋; Hitoshi Akiyama; 仁秋山
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1993-11-15
Filing date: 1993-11-15
Publication date: 1995-06-02

Abstract

PURPOSE:To decrease the frequency of disk access processing accompanying parity update processing. CONSTITUTION:A file is decomposed into several logical blocks LB0 to LB1, which are arranged on four data disk devices 201 to 203. A logical parity group is composed of four logical parity groups (e.g. LB0, LB1, LB2, and LB3) in total which are stored on the different disk devices 201 to 203 and those four logical blocks are exclusively ORed to generate a logical parity block (LP0 in the above case). The logical parity block is stored on a parity disk device 204. The logical blocks and logical parity block are related in a management table and the logical parity block is updated on a host memory when the logical blocks are updated. Therefore, the logical block groups which are divided into different stripes in the case of conventional technology are included in the same logical parity group.

Description

Detailed Description of the Invention

【０００１】[0001]

【産業上の利用分野】本発明は、ファイルを複数のディ
スク装置に分散配置するファイル格納管理装置に関す
る。BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a file storage management device for distributing files in a plurality of disk devices.

【０００２】[0002]

【従来の技術】ファイルを管理する従来技術としては、
「ＵＮＩＸ４．３ＢＳＤの設計と実装」（丸善、１９９
１年出版）に記載された技術が知られている。以下この
技術について説明する。図２１はこの技術の説明図であ
る。１はホストコンピュータ（以下ホストと略記す
る）、２はディスクアレイサブシステムである。３はホ
スト１とディスクアレイサブシステム２を接続する、予
め定められたディスクインタフェース（以下ディスクＩ
／Ｆと略記する）、たとえば、ＳＣＳＩ(small compute
r system interface)に従った信号が流れる信号線であ
り、ディスクＩ／Ｆと以下では呼ぶ。ホスト１において
はディスク管理等を行うオペレーティングシステム１０
３（以下単にＯＳと略記する。ＯＳは、ホスト１のＣＰ
Ｕと主記憶により実行される）が動作しており、この管
理の下、アプリケーション１０１（以下単にＡＰと略記
する。ＡＰは、ホスト１のＣＰＵと主記憶により実行さ
れる）が動作しており、ユーザが所望の計算処理を実現
できるよう構成されている。ＡＰ１０１の発行したディ
スク装置へのリード／ライト要求は、ＯＳ１０３内部
の、ファイル管理を行うファイルシステム１０５により
処理され、デバイスドライバ１０６がディスクＩ／Ｆの
コマンド体系に従いコマンドをディスク装置に発行す
る。ここで、ファイルシステム１０５は、ＡＰ１０１の
ディスクアクセス要求を解析する要求処理部１０５３、
ファイルを論理的なブロック（このデータセグメントを
以下では論理ブロックと呼ぶ）に分割し、ブロックごと
にインデックスとよぶ論理的なブロック番号を付し、こ
のインデックスでファイルを管理するファイルインデッ
クス管理部１０５４、ファイルのインデックス管理用テ
ーブル１０５６、インデックスと実メモリ上に構成する
バッファ１０８ｂとの対応づけおよびバッファ管理を行
うバッファ管理部１０５５、バッファ管理用のテーブル
１０５７から構成されている。ファイルシステム１０５
のファイルインデックス管理部１０５４及びバッファ管
理部１０５７は、ユーザデータ領域１０８ａにあるＡＰ
１０１が管理するデータをディスク装置に格納する際
に、ディスク装置上のどの領域に保存すれば良いかを決
定する。具体的には、ディスク装置番号、パーティショ
ン番号、ディスク上のブロック（セクタ）番号、バッフ
ァ領域のメモリアドレス等を決定する。リード時にはこ
れら決定した諸情報を基にリードアクセスを行う。これ
ら諸情報は上記管理テーブル１０５６、１０５７に登録
する。ディスク装置はホスト１からディスクＩ／Ｆ３経
由でリード／ライトコマンドを受信し、指定されたディ
スクアドレスから読みだし、書き込みする。一方、ディ
スク装置の高性能化、高信頼化技術としては、カリフォ
ルニア大学バークレイ校で開発された、複数のディスク
装置からなるＲＡＩＤ（ＲｅｄｕｎｄａｎｔＡｒｒａ
ｙｓｏｆＩｎｅｘｐｅｎｓｉｖｅＤｉｓｋｓ）が
知られており、この技術は同校発行の論文”ＡＣａｓ
ｅｆｏｒＲｅｄｕｎｄａｎｔＡｒｒａｙｓｏｆ
ＩｎｅｘｐｅｎｓｉｖｅＤｉｓｋｓ（ＲＡＩＤ）”
に詳しく述べられている。以下、本技術を簡単に説明す
る。図１４（ａ）は本従来技術のデータ配置方式を示し
た図である。ＲＡＩＤでは複数台のディスク装置にデー
タを分散配置し、これらのディスク装置を並列に動作さ
せることで高速転送処理を実現したものである。また、
同図のように、異なるディスク装置にわたるディスクア
ドレスをストライプと呼ぶブロックにまとめ、同一スト
ライプに含まれる全データディスク装置のデータの排他
的論理和を計算し、冗長データ（誤り訂正符号とも呼ば
れる。なお、冗長データには、誤り検出符号も含む）の
一種であるパリティを生成し、これをパリティディスク
装置に保存する。例えば、ファイルＡが論理ブロックＬ
Ｂ０，ＬＢ１，ＬＢ２，ＬＢ３から構成され、ファイル
Ｂが論理ブロックＬＢ１０，ＬＢ１１から構成されてい
るとする。そのとき、ストライプ０は、ファイルＡに含
まれる論理ブロックＬＢ０，ＬＢ２とファイルＢに含ま
れるＬＢ１０，ＬＢ１１から構成される。論理ブロック
ＬＢ０，ＬＢ２，ＬＢ１０，ＬＢ１１で一つのパリティ
グループが構成される。これにより、もし、任意の１台
のディスク装置が故障し、データの読みだしが不能とな
っても、パリティと、読みだし不能なデータと同一パリ
ティグループに含まれる他のデータディスク装置中のデ
ータとから復元可能としており、高信頼性を実現したも
のである。図２１のディスクアレイサブシステムも、Ｒ
ＡＩＤの一種である。２はディスクアレイサブシステ
ム、２１０はホストとの接続に用いるディスクＩ／Ｆを
制御する部位２１０、２１１はデータの分配収集、パリ
ティ計算等の処理を実施するディスクアレイ制御部、２
００〜２０４はディスク装置である。図２１の従来例の
ディスクアレイサブシステム２はホスト１からみて１台
のディスク装置に見えるように設計されている。ホスト
は１台のディスク装置を想定してリード／ライト要求を
発行するので、ディスクアレイ制御部２１１はこれを複
数台のディスク装置へのコマンドに変換し、データのリ
ード／ライトを実行する。また、ライト時には上記パリ
ティも更新する必要があり、この処理も併せて実行す
る。2. Description of the Related Art As a conventional technique for managing files,
"Design and implementation of UNIX 4.3 BSD" (Maruzen, 199)
The technology described in (1 year publication) is known. This technique will be described below. FIG. 21 is an explanatory diagram of this technique. Reference numeral 1 is a host computer (hereinafter abbreviated as host), and 2 is a disk array subsystem. Reference numeral 3 denotes a predetermined disk interface (hereinafter referred to as disk I) that connects the host 1 and the disk array subsystem 2.
/ F), for example, SCSI (small compute
This is a signal line through which a signal according to the r system interface) flows, and will be referred to as a disk I / F below. An operating system 10 that manages disks in the host 1
3 (hereinafter simply referred to as OS. OS is CP of host 1)
U is executed by the main memory), and under this management, the application 101 (hereinafter simply referred to as AP. AP is executed by the CPU of the host 1 and the main memory) is operating. , So that the user can realize the desired calculation processing. The read / write request issued by the AP 101 to the disk device is processed by the file system 105 in the OS 103 for file management, and the device driver 106 issues a command to the disk device according to the command system of the disk I / F. Here, the file system 105 is a request processing unit 1053 that analyzes the disk access request of the AP 101,
A file index management unit 1054 that divides a file into logical blocks (this data segment is hereinafter referred to as a logical block), assigns a logical block number called an index to each block, and manages a file with this index, It is composed of a file index management table 1056, a buffer management unit 1055 for associating an index with the buffer 108b configured on the real memory and buffer management, and a buffer management table 1057. File system 105
The file index management unit 1054 and the buffer management unit 1057 of the AP are located in the user data area 108a.
When the data managed by 101 is stored in the disk device, it is determined which area on the disk device should be stored. Specifically, the disk device number, partition number, block (sector) number on the disk, memory address of the buffer area, etc. are determined. At the time of reading, read access is performed based on the various information thus determined. These pieces of information are registered in the management tables 1056 and 1057. The disk device receives a read / write command from the host 1 via the disk I / F 3, reads from the specified disk address, and writes. On the other hand, as a technique for improving the performance and reliability of a disk device, a RAID (Redundant Arra) composed of a plurality of disk devices developed at the University of California, Berkeley.
ys of Inexpensive Disks) is known, and this technology is based on the paper “A Cas published by the school.
e for Redundant Arrays of
Inexpensive Disks (RAID) "
In detail. The present technology will be briefly described below. FIG. 14 (a) is a diagram showing a data arrangement method of this conventional technique. In RAID, data is distributed in a plurality of disk devices and these disk devices are operated in parallel to realize high-speed transfer processing. Also,
As shown in the figure, disk addresses across different disk devices are grouped into blocks called stripes, exclusive OR of data of all data disk devices included in the same stripe is calculated, and redundant data (also called error correction code). , Redundant data also includes an error detection code), which is a kind of parity and is stored in the parity disk device. For example, file A is logical block L
It is assumed that the file B is composed of B0, LB1, LB2, and LB3, and the file B is composed of logical blocks LB10 and LB11. At that time, stripe 0 is composed of logical blocks LB0 and LB2 included in file A and LB10 and LB11 included in file B. The logical blocks LB0, LB2, LB10, and LB11 form one parity group. As a result, even if one of the disk devices fails and data cannot be read, the parity and the data in the other data disk devices included in the same parity group as the unreadable data are not read. It is possible to restore from, and realizes high reliability. The disk array subsystem of FIG.
It is a type of AID. Reference numeral 2 is a disk array subsystem, 210 is a part 210 for controlling a disk I / F used for connection with a host, 211 is a disk array control unit for performing processing such as data distribution and collection, parity calculation, etc.
Reference numerals 00 to 204 denote disk devices. The disk array subsystem 2 of the conventional example shown in FIG. 21 is designed so that it can be seen from the host 1 as one disk device. Since the host issues a read / write request assuming one disk device, the disk array control unit 211 converts this into a command for a plurality of disk devices and executes data read / write. Further, at the time of writing, it is necessary to update the above parity, and this processing is also executed.

【０００３】[0003]

【発明が解決しようとする課題】上記従来の技術によれ
ば、ホストのファイルシステムは、ファイルを論理的な
ブロックに分割し、ブロック単位に管理しているので、
ファイル新規作成時にはブロックを１つずつ順次格納す
る。そこで、１ブロック以上の大きさのファイルを保存
する場合、これらのブロックは各々独立に管理されて、
ディスクアレイサブシステム上に格納されるため、同一
ファイルの連続したブロックであってもディスクアレイ
サブシステム上のアドレス空間の非連続的な空間に配置
される。そのため、連続したブロックが異なるパリティ
グループに含まれることが少なくない。また、ディスク
アレイサブシステムにおいても、ファイル単位でのデー
タ管理は行っておらず、単に、ホストが送出したブロッ
クライト要求に従い指定されたアドレス空間にブロック
を格納する。以上の結果、１つのパリティグループは、
互いに無関係な複数のファイルに属するブロックにより
構成されることが多くなる。この場合、１つのファイル
をライトする際にも、１つのファイルに属する複数のブ
ロックが異なるパリティグループに格納することにな
る。その結果、たとえば、最悪は、図１４（ａ）に示す
論理ブロックＬＢ１，ＬＢ３のライトを行う場合のよう
に、異なるパリティグループに属する複数のブロックの
ライトを行う際に、１つのブロックをライトする度にパ
リティブロックＰ５，Ｐ８を１つずつリード／ライトす
るパリティ更新処理が発生し、ファイルライト処理の速
度を著しくて低下させるという問題点があった。本発明
の目的は、従来技術に比べて、パリティ更新処理回数を
減らしたファイル格納管理装置を提供することにある。
また、本発明の他の目的は、上記ファイル格納管理装置
を有する、高性能でかつ高信頼なディスクアレイサブシ
ステムを提供することにある。According to the above conventional technique, the file system of the host divides the file into logical blocks and manages them in block units.
When creating a new file, blocks are sequentially stored one by one. Therefore, when saving a file with a size of 1 block or more, these blocks are managed independently,
Since it is stored on the disk array subsystem, even consecutive blocks of the same file are arranged in a non-contiguous space of the address space on the disk array subsystem. Therefore, consecutive blocks are often included in different parity groups. Also, the disk array subsystem does not manage data in units of files, and simply stores blocks in an address space designated according to a block write request sent by the host. As a result, one parity group is
It is often composed of blocks belonging to a plurality of unrelated files. In this case, when writing one file, a plurality of blocks belonging to one file are stored in different parity groups. As a result, for example, in the worst case, when writing a plurality of blocks belonging to different parity groups, as in the case of writing the logical blocks LB1 and LB3 shown in FIG. 14A, one block is written. There is a problem that the parity update process of reading / writing the parity blocks P5 and P8 one by one occurs each time, and the speed of the file write process is significantly reduced. An object of the present invention is to provide a file storage management device in which the number of parity update processes is reduced as compared with the prior art.
Another object of the present invention is to provide a high-performance and highly reliable disk array subsystem having the file storage management device.

【０００４】[0004]

【課題を解決するための手段】上記目的を実現するため
に、本発明は、同一のファイルを構成するデータを、複
数のデータセグメントに分割し、１つのデータセグメン
トで１つのデータブロックを構成し、上記複数のデータ
ブロックを外部の複数のディスク装置に分配するファイ
ル格納管理装置において、上記複数のディスク装置に格
納されたデータブロックのうち互いに異なるディスク装
置に格納される複数のデータブロックで構成される冗長
データグループを決定するとともに、上記複数のデータ
ブロックの各々を格納するディスク装置を決定する第１
の決定手段と、同一冗長データグループ内のデータブロ
ックに含まれるデータにより冗長データを求め、該冗長
データから構成される冗長データブロックを冗長データ
グループごとに生成する手段と、上記ファイルと、上記
ファイルを構成する上記データブロックとの対応関係に
関する情報を記憶する第１の記憶手段と、上記データブ
ロックと上記冗長データブロックとの対応関係に関する
情報を記憶する第２の記憶手段とを有することとしたも
のである。In order to achieve the above object, the present invention divides data constituting the same file into a plurality of data segments, and one data segment constitutes one data block. In a file storage management device that distributes the plurality of data blocks to a plurality of external disk devices, the file storage management device includes a plurality of data blocks stored in different disk devices among the data blocks stored in the plurality of disk devices. Determining a redundant data group to be stored and a disk device that stores each of the plurality of data blocks.
Determining means, means for obtaining redundant data from data included in data blocks in the same redundant data group, and generating redundant data blocks composed of the redundant data for each redundant data group, the file, and the file. And a second storage unit for storing information on the correspondence relation between the data block and the redundant data block, and a second storage unit for storing information on the correspondence relation between the data block and the redundant data block. It is a thing.

【０００５】また、ディスクアレイサブシステムにおい
て、上記のファイル格納管理装置と、複数のディスク装
置とを有することとしたものである。Further, the disk array subsystem has the above file storage management device and a plurality of disk devices.

【０００６】[0006]

【作用】たとえば、ｎ＋１台のディスク装置をホストに
接続した場合、ｎ台はデータ用ディスク装置、１台は冗
長データ用ディスク装置として使用する。上記データブ
ロックを格納するディスク装置を決定する手段は、ファ
イルのデータブロックを配置する格納ディスク装置を決
定する。同一ファイルのｎ個のデータブロックにより冗
長データグループと呼ぶグループを構成し、同一冗長デ
ータグループ内のすべてのデータブロックの、たとえば
排他的論理和を計算することで冗長データブロックを生
成する。このように、ファイル毎に冗長データブロック
が一意に決定する。データブロックと対応する冗長デー
タの関係付けを管理するテーブルに、ファイル毎に、デ
ータブロックと対応する冗長データブロックとを対応づ
けて登録する。同一ファイルのデータブロックを新規作
成、追加、変更している間は、１つの冗長データを繰り
返し更新する機会が一般に多いため、以上の管理装置を
採用すると、ファイル単位で、冗長データをデータブロ
ックに対応づけて管理していることより、以下の効果が
ある。すなわち、本管理装置によれば、あるファイルを
オープンしている間は、同一冗長データグループに属す
るデータの更新に伴って、冗長データ生成の素になる差
分データを更新していく必要があるが、データが１つの
ファイルに属するため、バッファ上に、対応する差分デ
ータも存在することが多くなる。その結果、データの更
新に伴う差分データの更新はほとんどが冗長データ用バ
ッファ上で行われることになり、従来技術の様に、デー
タの更新のたびに、冗長データをディスク装置からリー
ド／ライトすることがなくなり、結果として冗長データ
更新に伴うディスクアクセス回数を大幅に減少できる。For example, when n + 1 disk devices are connected to the host, n disks are used as data disk devices and one disk is used as a redundant data disk device. The means for determining the disk device for storing the data block determines the storage disk device for arranging the data block of the file. A group called a redundant data group is formed by n data blocks of the same file, and a redundant data block is generated by calculating, for example, exclusive OR of all the data blocks in the same redundant data group. In this way, the redundant data block is uniquely determined for each file. For each file, the data block and the corresponding redundant data block are registered in association with each other in a table that manages the relationship between the data block and the corresponding redundant data. Since there is often an opportunity to repeatedly update one redundant data while creating, adding, or changing a data block of the same file, if the above management device is adopted, the redundant data is converted into a data block in a file unit. The following effects can be obtained by managing them in association with each other. That is, according to this management device, while a file is open, it is necessary to update the difference data, which is the basis of redundant data generation, with the update of data belonging to the same redundant data group. Since the data belongs to one file, the corresponding difference data often exists in the buffer. As a result, most of the difference data is updated with the data update on the redundant data buffer, and the redundant data is read / written from the disk device each time the data is updated as in the conventional technique. As a result, the number of disk accesses required for updating redundant data can be greatly reduced.

【０００７】[0007]

【実施例】第１の実施例を説明する。図１は本発明のフ
ァイル管理方式を示した模式図である。ファイルｆｉｌ
ｅ０はデータ部及びパリティ部により構成される。デー
タ部はさらに論理ブロックと呼ぶ小ブロックに分割され
る。この例では論理ブロックは８ＫＢであり、ファイル
のデータ部は９６ＫＢであるとする。ファイルｆｉｌｅ
０は先頭のブロックから順にＬＢ０、ＬＢ１、・・・、
ＬＢ１１の１２個の論理ブロックに分割できる。また、
ディスクアレイサブシステム２はディスク０（２０
０）、ディスク１（２０１）、・・・、ディスク４（２
０４）の計ディスク５台を内蔵している。特にディスク
４（２０４）はパリティディスクと呼び、パリティ格納
専用のディスクである。上記、論理ブロックはディスク
アレイサブシステム２のディスク０からディスク３の計
４台のデータディスクと呼ぶディスク群に分配されて、
格納される。この例ではＬＢ０はディスク０に、ＬＢ１
はディスク１に、ＬＢ２はディスク２に、と以下順に繰
り返し配置され、格納される。ここで、ファイル先頭論
理ブロックからデータディスク台数個、すなわちＬＢ０
からＬＢ３までの４個の論理ブロックにより論理パリテ
ィグループＬＰＧ０、ＬＢ４からＬＢ７までの４個の論
理ブロックで論理パリティグループＬＰＧ１、ＬＢ８か
らＬＢ１１までの４個の論理ブロックで論理パリティグ
ループＬＰＧ２を形成し、各該グループ内でバイト単位
に読みだし、各ビットごとに排他的論理和を計算し、論
理パリティブロックを生成する。すなわち、ＬＢ０＋ＬＢ１＋ＬＢ２＋ＬＢ３＝ＬＰ０（１）ＬＢ４＋ＬＢ５＋ＬＢ６＋ＬＢ７＝ＬＰ１（２）ＬＢ８＋ＬＢ９＋ＬＢ１０＋ＬＢ１１＝ＬＰ２（３）となる。ただし、ここで記号＋は排他的論理和を表す。
生成した論理パリティブロックＬＰ０、ＬＰ１、ＬＰ２
はパリティディスクすなわちディスク４に格納される。
上記のファイルｆｉｌｅ０のデータ部及びパリティ部を
構成する各論理ブロック論理パリティブロックはそれぞ
れ互いに関連づけて管理する。もし、ある論理ブロック
ＬＢｎのデータを書き換えようとするとき、関連する論
理パリティブロックＬＰｋも同時に書き換える。上記の
ファイル管理方法を実現する具体的な実施例を以下に示
す。図２は本実施例のシステム構成を表すブロック図で
ある。１はホスト、２はディスクアレイサブシステム、
３はディスクインタフェース（以下Ｉ／Ｆと略記す
る。）である。ホスト１ではオペレーティングシステム
（以下ＯＳと略記する。）１０３が動作し、その管理の
下で、アプリケーション（１０１および１０２）（以下
ＡＰと略記する）が動作している。ＯＳ１０３は以下の
各構成部を有する。１０４はアプリケーションからのデ
ィスクアクセス要求等を受信するシステムコールＩ／
Ｆ、１０５はファイルのディスクへの格納管理を行うフ
ァイルシステム、１０６はディスクＩ／Ｆ制御部１０７
の制御を行うデバイスドライバ部である。ファイルシス
テム１０５はディスクアレイ管理部１０５８を内蔵し、
該部はディスクアレイの制御を司る。１０７はディスク
Ｉ／Ｆ３を制御するディスクＩ／Ｆ制御部である。ディ
スクアレイサブシステム２は５台のディスク（２００〜
２０４）から構成されており、これらはディスクＩ／Ｆ
３を介しそれぞれホスト１に接続している。このディス
クアレイサブシステム２は上記の通りファイルシステム
１０５中のディスクアレイ管理部１０５８により制御、
管理されている。ホスト１のＯＳ１０３内部にあるファ
イル管理部（ファイルシステム）１０５は、以下の動作
をもってホストに接続した複数台のディスク装置２００
〜２０４のどこにデータブロックを格納するか決定す
る。すなわち、上記ファイル管理部１０５がファイルに
データを格納する要求を受信すると、該部１０５は該部
１０５内の上記ファイルとデータブロックを対応づける
テーブルを参照し、上記データがどのデータブロックで
あるかを判定する。もし、すでにあるデータブロックで
あれば、旧データとの差分データを計算し、バッファに
格納しておく。もし、新規のデータブロックであれば、
複数台のディスク装置のうちどのディスク装置にデータ
ブロックを格納するかを決定する手段が格納ディスク装
置を決定する。そして上記ディスク装置のどのアドレス
空間にデータブロックを格納するかを決定する手段は上
記各ディスク装置の使用状況を管理するテーブルを参照
し、格納アドレスを決定する。また、上記冗長データを
生成する手段は、上記データブロックと対応する冗長デ
ータブロックとの関係付けを管理するテーブルを参照
し、上記データブロックに対応する冗長データブロック
の有無を判断し、もし、対応する冗長データブロックが
存在しない場合、上記冗長データブロックをどのディス
ク装置に格納するかを決定する手段は冗長データ格納の
ためのディスク装置を決定し、上記ディスク装置のどの
アドレス空間に冗長データブロックを格納するかを決定
する手段は上記ディスク装置の使用状況を管理するテー
ブルを参照し、格納アドレスを決定し、上記データブロ
ックと対応する冗長データブロックの関係付けを管理す
るテーブルに登録する。もし対応する冗長データブロッ
クがバッファ上に存在する場合には、上記冗長データブ
ロックのデータとの排他的論理和を計算しバッファ上に
保存する。もし対応する冗長データブロックがバッファ
上に存在しない場合には、このデータをバッファ上に保
存する冗長データを計算する手段は、上記の処理の後非同期的
にバッファ上の差分データとディスク装置上の旧冗長デ
ータとの排他的論理和を計算し、新冗長データを生成
し、ディスク装置に書き戻す。５台のディスク装置がホ
スト１に接続されているが、４台はデータ用ディスク装
置、１台は冗長データ用ディスク装置として使用する。
上記データブロックを格納するディスク装置を決定する
手段は、ファイルのデータブロックをデータ用ディスク
装置に略均等に配置するように格納ディスク装置を決定
する。同一ファイルの４個のデータブロックにより冗長
データグループと呼ぶグループを構成し、同一冗長デー
タグループ内のすべてのデータブロックの排他的論理和
を計算することで冗長データブロックを生成する。この
ように、ファイル毎に冗長データブロックが一意に決定
する。冗長データの更新は上記の通り処理する。データ
ブロックと対応する冗長データの関係付けを管理するテ
ーブルに、ファイル毎に、データブロックと対応する冗
長データブロックとを対応づけて登録する。同一ファイ
ルのデータブロックを新規作成、追加、変更している間
は、１つの冗長データを繰り返し更新する機会が一般に
多いため、以上の管理方法を採用すると、ファイル単位
で、冗長データをデータブロックに対応づけて管理して
いることより、以下の効果がある。すなわち、本管理方
法によれば、あるファイルをオープンしている間は、同
一冗長データグループに属するデータの更新に伴って、
冗長データ生成の素になる差分データを更新していく必
要があるが、データが１つのファイルに属するため、バ
ッファ上に、対応する差分データも存在することが多く
なる。その結果、データの更新に伴う差分データの更新
はほとんどが冗長データ用バッファ上で行われることに
なり、従来技術の様に、データの更新のたびに、冗長デ
ータをディスク装置からリード／ライトすることがなく
なり、結果として冗長データ更新に伴うディスクアクセ
ス回数を大幅に減少できる。以下、図３にディスクアレ
イサブシステム２の管理を司るファイルシステム１０３
を中心とする関係各部の構成を示す。ファイルシステム
１０５において、１０５１はアプリケーションがファイ
ルを使用する際に行うファイルオープン処理を行うファ
イルオープン処理部、１０５２はオープンされたファイ
ルのファイル名と対応するファイルインデックス管理テ
ーブル１０５６との関係を管理するファイル名管理テー
ブル、１０５３はアプリケーションからのファイルアク
セス要求を受信し解釈処理を行う要求処理部、１０５６
はファイルをいくつかのブロックに分割し、そのブロッ
ク番号をインデックスとし、あるブロックがディスク上
のどこに格納しているかを管理するファイルインデック
ス管理テーブル、１０５４はファイルインデックス管理
テーブル１０５６の新規登録、変更、削除等の管理を行
うファイルインデックス管理部、１０５７はファイルの
ブロックを物理メモリ上に構築されているバッファ領域
にマッピングするための管理情報であるバッファ管理テ
ーブルであり、特に１０５７ａはデータ用のバッファ領
域を管理するためのデータ用バッファ管理テーブル、１
０５７ｂはパリティ用のバッファ領域を管理するための
パリティ用バッファ管理テーブルである。パリティにつ
いては後述する。１０５５は上記バッファ管理テーブル
１０５７の新規登録、変更、削除、バッファ割当て、等
のバッファ管理を行うバッファ管理部、１０５８は上記
ディスクアレイサブシステム中、ディスク、パーティシ
ョン、ディスク上の格納アドレス、等の決定を行うこと
でディスクアレイサブシステムを管理するディスクアレ
イ管理部である。同図中１０８はホスト１の主記憶中に
割り当てられたバッファ領域である。１０８ａはユーザ
がファイルアクセスの為に獲得したユーザデータ領域、
１０８ｂは、ファイルシステム１０５がディスクをアク
セスする際のデータ管理用のデータ用バッファ領域、１
０８ｃはファイルシステム１０５がパリティを管理する
ためのパリティ用バッファ領域、１０８ｄはファイルシ
ステム１０５がパリティの更新時等に一時的に使用する
作業用バッファ領域である。図５はディスクアレイ管理
部１０５８のブロック図を示している。１０５８１は新
規ブロック登録管理部、１０５８２は新規ブロックを格
納する機器（ディスク、パーティション）を選択する機
器選択部、１０５８３は選択した機器の物理的なディス
クブロックアドレスを決定する物理ブロックマッピング
決定部、１０５８４は論理パリティグループを決定し、
論理データブロックと、論理パリティブロックを関係づ
ける処理を行う論理パリティ管理部、１０５８５はディ
スクアレイの使用状況を管理するディスクアレイ使用状
況管理テーブルである。ディスクアレイ使用状況管理テ
ーブル１０５８５は、ディスクアレイを構成するディス
ク毎に、ディスクの物理ブロックの使用状況を管理する
物理ブロック管理テーブル１０５８５ａおよび物理ブロ
ックを構成するセクタの使用状況を管理するセクタ管理
テーブル１０５８５ｂから構成される。１０５８６は排
他的論理和計算を実施するパリティ演算部、１０５８７
は１台のディスク故障時にパリティデータを利用し、当
該故障ディスクに格納しているデータを再現する縮退処
理を実施する縮退処理部である。図６はディスクアレイ
使用状況管理テーブル１０５８５の具体例を示した図で
ある。ディスク毎に物理ブロック及びセクタの使用状況
及び使用用途が判断できるよう構成されている。以下、
本実施例の動作を説明する。ＡＰはファイルの操作に先
立ちファイルのオープン処理を行う必要がある。ＡＰ１
０１はファイル名、オープンモード（リードモード、ラ
イトモード、リードライトモード、追加モード等）とと
もにオープンシステムコールを発行する。追加モードと
は、すでにあるファイルにファイル領域を追加してライ
トする場合であることを示すモードをいう。ＯＳの種類
によってはライトモードに含めるものもある。ＯＳ１０
３のシステムコールＩ／Ｆ１０４はこれを受信し、オー
プンシステムコールであることを判断するとファイルシ
ステムのファイルオープン処理部１０５１を起動する。
該部はこれを受信し、ファイル名からファイル名管理テ
ーブル１０５２を参照し、ファイル番号ｋをかえす。も
し、ライトモード時に該管理テーブル１０５２上にこの
ファイルが登録されていないならば新規のファイルであ
るので新規に登録する。この処理により、ＡＰは対象フ
ァイルの操作番号であるファイル番号を獲得する。以後
本ファイルのアクセスにはこのファイル番号を使用す
る。図４はファイル管理用テーブルの構成例を示してい
る。１０５２はファイル名管理テーブル、１０５６はフ
ァイルインデックス管理テーブル、１０５７はバッファ
管理テーブル、１０８はバッファメモリである。上記オ
ープン処理により、ファイル番号を獲得すると、このフ
ァイル番号をキーとして対応するファイルインデックス
管理テーブル１０５６が参照できる。ファイルインデッ
クス管理テーブルは、同図の通りの内容を保管したファ
イル毎のテーブルである。その内容の概要は以下の通り
である。モードはファイルのアクセスモードを示す。所
有者はファイルを作成したユーザ名、アクセス識別子は
当該ファイルのアクセス許可範囲を示す。参照カウント
はファイルに対する参照の数を示す。タイムスタンプは
ファイルが最後に読み書きされた時間や、当該ファイル
インデックス管理テーブルが最後に更新された時間を示
す。大きさはファイルのバイト単位の大きさを示す。ブ
ロック数は使用している論理ブロックの数を表す。また
論理ブロック番号に対応しバッファ管理テーブル１０５
７へのポインタが格納されている。バッファ管理テーブ
ル１０５７は各論理ブロック毎に１つ対応し、以下の内
容が記載されている。ハッシュリンクはバッファが有効
かどうかすばやく判定するためのハッシュ表へのリンク
ポインタ、待ち行列リンクは待ち行列を形成するための
リンクポインタ、フラグはバッファの状態すなわち、有
効なデータが格納されているかどうか、バッファは使用
されているかどうか、バッファの内容がディスクに未反
映となっているかどうか、等を示す。機器番号はディス
クの番号及びパーティションの番号である。ブロック番
号は機器番号で示されたディスク上のディスクアドレス
番号である。バイト数は本ブロックに格納された有効な
データのバイト数である。バッファサイズは本バッファ
のバイト単位の大きさである。バッファポインタは物理
バッファメモリへのポインタである。論理パリティグル
ープ番号は上記のファイル毎にパリティを生成する際に
構成した論理パリティグループの番号である。論理パリ
ティポインタはデータ用バッファ管理テーブル１０５７
ａの時有効であり、当該バッファに格納するデータ論理
ブロックに対応した論理パリティブロックを格納するパ
リティ用バッファ管理テーブル１０５７ｂへのポインタ
である。データ用バッファ管理テーブル１０５７ａおよ
びパリティ用バッファ管理テーブル１０５７ｂは構造的
には同一である。データ用バッファ管理テーブル１０５
７ａはファイルインデックス管理テーブル１０５６から
参照し、パリティ用バッファ管理テーブル１０５７ｂは
対応するデータ用バッファ管理テーブル１０５７ａから
参照する点が異なるのみである。バッファを確保する物
理メモリはデータ用バッファとパリティ用バッファとで
異なる領域にマッピングしても良いし、同じ領域を使用
してもかまわない。図３では論理的なイメージを記載し
ている。次にファイルのリード／ライト動作を図３、図
４、および図７から図１２のフローチャート、図５、６
に示すディスクアレイ管理部の詳細構成を示す図、図１
３に示すデータおよびパリティ更新動作の模式図を用い
説明する。初めにリード動作を図７を用い説明する。上
記の通りファイルをリードモードでオープンする。その
後、ＡＰはリードシステムコールを発行し（１１００
１）、ＯＳ１０３のシステムコールＩ／Ｆ１０４がこれ
を受信し、リードシステムコールであることを認識し、
ファイルシステム１０５をコールする（１１００２）。
ファイルシステム１０５の要求処理部１０５３は、バイ
ト単位にＡＰが発行した要求（オフセット）を上記論理
ブロック単位に変換し、該当する論理ブロックを順次ア
クセスする（１１００３）。ここでオフセットとは、転
送するデータの量（＝転送終了アドレス−転送開始アド
レス）、すなわち転送長をいう。第１番目の対象論理ブ
ロックのリード要求を要求処理部１０５３はファイルイ
ンデックス管理部１０５４に送信する。該部はこれを受
信し、ファイルインデックス管理テーブル１０５６を参
照し、対象論理ブロックのデータを格納するデータ用バ
ッファ管理テーブル１０５７ａへのポインタを得る。つ
いで、バッファ管理部１０５５はこのポインタを受け、
対象論理ブロックが格納されているディスク装置の機器
番号すなわちディスク番号及びパーティション番号と、
格納ディスクブロック番号を獲得する（１１００５）。
ここで、ディスクブロック番号とはディスク上のセクタ
に対応し線形的に付ける論理アドレスのことである。さ
らに、該部はバッファポインタを参照し、バッファすな
わち物理メモリ空間が対象論理ブロックに割り当てられ
ているかどうか判定する。もし、バッファが割り当てら
れているならば対象論理ブロックのデータが登録されて
いるかどうかバッファ管理テーブル上のフラグを判定
し、もし”データ有効”フラグが”ＯＮ”であればディ
スクへリードアクセスは不要である（キャッシュヒッ
ト）（１１００６）。もし、バッファが割り当てられて
いないならばバッファ管理部は新たにバッファをわりあ
てる（１１００７）。ついで、この場合、および前記”
データ有効”フラグが”無効”である時には、デバイス
ドライバ部１０６はディスクＩ／Ｆ（例えばＳＣＳＩ）
のリードコマンドを生成し、ディスクＩ／Ｆ制御部１０
７に発行する（１１００８）。ここで、ディスク装置が
指定したデータ用バッファ１０８ｂにデータを転送終了
するまでの間処理は停止し、待ち状態となる（１１００
９）。データ転送が完了すると、また、上記、キャッシ
ュがヒットした場合にも、データ用バッファ１０８ｂか
らＡＰ（ユーザ）が指定したユーザデータ領域１０８ａ
へ指定したバイト単位のデータ（オフセット）を転送す
る（１１０１０）。処理は再びファイルシステム１０５
の要求処理部１０５３に移り、ＡＰが指定したすべての
オフセットデータが転送完了したかどうか判断し、も
し、未完了であれば以上の論理ブロック単位のリードア
クセスを繰り返し完了するまで行う（１１０１１）。す
べての要求オフセットを転送完了したならばファイルシ
ステム１０５はシステムコールＩ／Ｆ１０４を介し、Ａ
Ｐにシステムコール終了通知を発行し、処理はＡＰに戻
る。次にライト時の動作を説明する。図７において、１
１００３までは同様である。ライト時には１１００４に
おいて図８のフローチャート（Ａ）に分岐する。ファイ
ルインデックス管理部１０５４はこれを受信し、ファイ
ルインデックス管理テーブル１０５６を参照し、対象論
理ブロックのデータを格納するデータ用バッファ管理テ
ーブル１０５７ａへのポインタを得る。もし、該当する
データ用バッファ管理テーブル１０５７ａが存在しない
ときは対象論理ブロックがディスク上に格納されていな
い、すなわち新規論理ブロックであることを意味するの
で（１１１０２）の分岐で図１０（Ｄ）のフローチャー
トに分岐する。この場合の詳細に付いては後述する。も
し、該当論理ブロックがすでにディスク上に存在してい
る場合には対応するデータ用バッファ管理テーブルへの
ポインタが見つかる。この場合、ついで、バッファ管理
部１０５５がこのポインタを受け、対象論理ブロックが
格納されているディスク装置の機器番号すなわちディス
ク番号及びパーティション番号と、格納ディスクブロッ
ク番号を獲得する。さらに、該部はバッファポインタを
参照し、バッファすなわち物理メモリ空間が対象論理ブ
ロックに割り当てられているかどうか判定する。ＡＰが
要求したデータライト要求は上記の通りバイト単位のオ
フセットによる指定であるので、該当ブロックのすべて
を書き換えるのではない可能性がある。しかし、この場
合にも論理ブロック単位にライトする必要があるので、
一旦対象論理ブロックをデータ用バッファ上にリード
し、バッファ上で論理ブロックの必要な箇所を更新し、
ディスクに書き戻すリードモディファイライト処理を行
う必要がある。以下リードモディファイライト処理につ
いて述べるが、もし、ＡＰの要求が完全に１論理ブロッ
ク分のデータを書き換える場合には以下のリード処理は
不要である。当該論理ブロックにバッファが割り当てら
れているならば対象論理ブロックのデータが登録されて
いるかどうかバッファ管理テーブル上のフラグを判定
し、もし”データ有効”フラグが”ＯＮ”であればすで
に対象論理ブロックの旧データがデータバッファ上に存
在するので、ディスクへのリードアクセスは不要である
（キャッシュヒット）（１１１０３）。もし、バッファ
が割り当てられていないならばバッファ管理部は新たに
バッファをわりあてる（１１１０４）。ついで、この場
合、および前記”データ有効”フラグが”無効”である
時には、デバイスドライバ部１０６はディスクＩ／Ｆ
（例えばＳＣＳＩ）のリードコマンドを生成し、ディス
クＩ／Ｆ制御部１０７に発行する（１１１０５）。ここ
で、ディスク装置が指定したデータ用バッファ１０８ｂ
にデータを転送終了するまでの間処理は停止し、待ち状
態となる（１１１０６）。ここで旧データのリードデー
タ転送が完了する。次に、当該論理ブロックに対応する
論理パリティブロックのバッファ管理テーブルを参照
し、パリティ用バッファメモリがマッピングされている
かどうか判定する（１１１０７）。もし、バッファがマ
ッピングされている場合でかつバッファ内容が”有効”
である場合には、図９（Ｅ）に進む。これ以外の場合
（バッファメモリマッピングが未完了である）ならばマ
ッピングを実施し（１１１０８）、（Ｆ）に進む。つい
でパリティの更新処理にはいる。パリティ更新処理の詳
細を図１３を併用して説明する。９図（Ｅ）の場合、作
業用バッファ１０８ｄ０を作業用バッファ１０８ｄ０領
域に確保し（１１１０９）、ユーザ領域１０８ａ上の当
該データ（新データ）と当該データバッファ１０８ｂ０
上のデータ（旧データ）との排他的論理和（ＥＯＲ）を
計算し、両データの差分データを生成する（１１１１
０）。ここで、ユーザ領域上の当該新データをデータバ
ッファに転送し、バッファ内容を更新する（１１１１
１）。そして、作業用バッファ１０８ｄ０上の差分デー
タと当該パリティバッファ上の排他的論理和（ＥＯＲ）
を計算し、その結果をパリティバッファ１０８ｃ０に格
納する（１１１１２）。なお、図中パリティバッファを
冗長データバッファとも命名しているのは、上記の計算
により、新パリティが生成できていない場合があるから
である。ここで、生成されたデータはあくまでも新デー
タ、旧データの新差分データと、前回当該論理ブロック
をライトしたときに生成した旧差分データとの排他的論
理和であり、パリティの差分情報を算出したにすぎな
い。よって、いずれ、ディスク上に格納されている旧パ
リティと本パリティの差分情報の排他的論理和を計算す
る必要がある。このような場合、パリティ用バッファ管
理テーブル１０５７ｂの”旧パリティリード済み”フラ
グは”ＯＦＦ”になっている。もし、本フラグが”Ｏ
Ｎ”の場合にはこのパリティ用バッファ上には上記処理
終了時に新パリティが生成されたことになる。次ぎに、
当該データ用バッファ管理テーブル１０５７ａ及びパリ
ティ用バッファ管理テーブル１０５７ｂ上の”バッファ
内容”フラグを”ｄｉｒｔｙ”に設定する（１１１１３
〜１１１１４）。この”ｄｉｒｔｙ”フラグはバッファ
上のデータがディスク上に未反映であることを示す。い
ずれディスク上にライトバックする。この処理は図１２
のフローチャートに示しており、後述する。全ＡＰ要求
データをブロックライトし終わったかどうか判断し（１
１１１５）、もしまだであるなら図７（Ｃ）に戻る。次
に図９（Ｆ）の場合を説明する。（Ｅ）との違いはパリ
ティ用バッファ１０８ｃ０にパリティの差分情報もしく
は旧パリティが格納されていないことである。そこで、
新旧データの差分情報を上記同様生成し、当該パリティ
用バッファ１０８ｃ０に格納する（１１１１６）。パリ
ティバッファが空だから、作業用バッファは不要であ
る。この場合、旧パリティリードはまだ行っていないの
で、上記”旧パリティリード済”フラグを”ＯＦＦ”に
設定しておく（１１１１７）。そして、ユーザ領域上の
当該データ（新データ）を当該データ用バッファ領域１
０８ｂ上に転送し、バッファ内容を更新する（１１１１
８）。以下上記（Ｅ）の場合と同様である。次に新規ブ
ロックをライトする場合（Ｄ）を図１０を用い説明す
る。当該論理ブロックに対応する論理パリティグループ
が存在するかどうかを判定する（１１１２０）。この判
定はファイルインデックス管理テーブル１０５６におい
て当該論理ブロックと同一論理パリティグループの論理
ブロックがすでに登録済みかどうかを判定すれば良い。
判定の方法は、ファイルインデックス管理テーブル１０
５６にバッファ管理テーブル１０５７へのポインタがあ
るかどうかで判断できる。もし、該当する論理パリティ
ブロックが無い場合、データ用バッファ管理テーブル１
０５７ａ及びパリティ用バッファ管理テーブル１０５７
ｂの両者を新規に作成し登録する（１１１２１〜１１１
２２）。ここで、１１１２１と１１１２２で行うバッフ
ァ管理テーブル１０５７の新規割当と、ディスク物理ブ
ロックのマッピングに付いて図１１のフローチャートを
用い説明する。初めに、ディスクアレイ管理部１０５８
の新規論理ブロック登録管理部１０５８１（図５）は、
バッファ管理部１０５５から新規ブロック割当要求を受
信し、パリティ用ブロックかデータ用ブロックかの判定
を行い（１１３０１）、新規にバッファ管理テーブル１
０５７を作成する（１１３０２）。次に機器選択部１０
５８２はデータブロックか、パリティブロックか、ま
た、論理ブロック番号がいくつか等の情報を基に、ブロ
ックを格納するディスク及びパーティションを決定し、
バッファ管理テーブル１０５７に登録する（１０３０
３、１０３０４）つぎに、物理ブロックマッピング決定
部１０５８３はディスク上のどの物理ブロックに当該論
理ブロックを格納するかをディスクアレイ使用状況管理
テーブル１０５８５中の物理ブロック管理テーブルを参
照し、同一ファイルの他の論理ブロックの格納位置等の
情報を基に、シーク、回転待ちが最小となるような最適
なブロックを選択する。また、該管理テーブル１０５８
５を更新する。この物理ブロックの選択結果からディス
クブロック番号すなわちセクタ番号をバッファ管理テー
ブル１０５７中のブロック番号フィールドに登録する
（１１３０５）。ついで、論理パリティグループを然る
べきアルゴリズムで決定し、その論理パリティグループ
を登録する（１１３０６）。もしデータ用ブロックであ
るならば当該データに対応するパリティ用バッファ管理
テーブル１０５７ｂへのポインタをデータ用バッファ管
理テーブル１０５７ａに登録し、もし、パリティ用ブロ
ックであるならば、パリティ用バッファ管理テーブル１
０５７ｂのこのフィールドにＮＵＬＬポインタを登録す
る（１０３０８）。最後に当該バッファ管理テーブル１
０５７を対応するファイルインデックス管理テーブル１
０５６に登録し、新規バッファ管理テーブルの割当及び
論理ブロックの物理ブロックマッピング処理は完了す
る。図１０に戻り、新規に登録したデータ用バッファ管
理テーブル１０５７ａにバッファメモリをマッピングす
る（１１１２３）。もし、パリティ用バッファ管理テー
ブル１０５７ｂも新規に作成したならば同じくバッファ
メモリをマッピングする。以降は上記更新ライト時の処
理同様新旧データの差分データを計算し、パリティ用バ
ッファ領域の旧差分データとの排他的論理和を計算し、
パリティバッファ領域に格納する。もし、旧差分データ
がなければ新差分データをそのままパリティバッファ領
域に格納し、”旧パリティリード済”フラグを”ＯＦ
Ｆ”に設定する。最後にユーザ領域上の当該新データを
当該データ用バッファ領域に転送し、図９（Ｈ）に戻
る。以上ライト処理時には、データ用バッファおよびパ
リティ用バッファに新情報を格納した段階で、バッファ
管理テーブルの”バッファ内容”フラグを”ｄｉｒｔ
ｙ”に設定し、処理を終了していた。これは再び当該バ
ッファへの書き込み要求が発生した場合に古いデータの
ライト処理を削減できる可能性があるからである。これ
を遅延書き込みと呼び、このときには適当なタイミング
でディスク上にライトバックする必要がある。バッファ
内容とディスク内容を同期化させることからこの動作を
シンク動作と呼ぶ。このシンク動作のフローチャートを
図１２に示す。シンクデーモンと呼ぶＡＰプロセスが定
期的にシンクシステムコールを発行する（１１２０
１）。ＯＳはこれを受信し、ファイルシステムのバッフ
ァ管理部１０５５はリストをサーチし、”バッファ内
容”フラグが”ｄｉｒｔｙ”なバッファを探し（１１２
０３）、探したバッファがデータならばこのバッファ内
容を当該データ用バッファ管理テーブルの情報にしたが
いディスクにライトし（１１２０４〜１１２０６）、最
後に”バッファ内容”フラグを”ｃｌｅａｎ”に設定す
る（１１２０７）。もし、パリティ用バッファであるな
らば、”旧パリティリード済”フラグを参照し、もし”
ＯＦＦ”であるならば（１１２０８）、作業用バッファ
を確保し、旧パリティをバッファ管理テーブルが示すデ
ィスクから作業領域にリードする。この旧パリティとパ
リティ用バッファ領域上の差分パリティデータの排他的
論理和を計算し、これをパリティ用バッファに格納す
る。この生成した完全な新パリティをディスクにライト
し、”旧パリティリード済”フラグを”ＯＮ”に設定
し、また、”バッファ内容”フラグを”ｃｌｅａｎ”に
設定する。もし、”旧パリティリード済”フラグがあら
かじめ”ＯＮ”であった場合には、直ちに当該ディスク
にライトし、同じく、”バッファ内容”フラグを”ｃｌ
ｅａｎ”に設定する。以上の処理をすべての”ｄｉｒｔ
ｙ”なバッファに付いて行う。以上のように本実施例に
よれば図１のファイル単位にパリティを管理する方式が
実現できる。次に本実施例の効果を説明する。従来例に
よれば、ファイルシステム１０５はブロックの格納毎に
その時点で最適と思われるディスクアドレスを決定する
ため、ディスクアレイサブシステム上では図１４ａに示
すように断続的なブロック配置になることが多い。最適
な決定とは、例えば、１つのディスク装置内に複数のヘ
ッドがある場合にその時点で空いているヘッドを選ぶ、
回転待ち時間の少ないセクタを選ぶ等がある。ＬＢ０か
らＬＢ３を順次格納していく場合、従来技術の場合、対
応する３つのパリティブロックＰ１、Ｐ５、Ｐ８を更新
する必要があり、データブロックとパリティブロック、
計７個について、計７回のリードモディファイライト処
理を実施する必要がある。一方本実施例の方式では、図
１４ｂに示すとおり連続した論理ブロックＬＢ０〜ＬＢ
３の対応論理パリティは唯一ＬＰ０に決定する。遅延書
き込みを上記実施例の通り実施している場合、ＬＰ０の
更新はバッファ上のみで実行できるため、パリティ更新
処理のためのリードモディファイライト処理は１回で済
み、データの更新も含めると、データブロックとパリテ
ィブロック、計５個について、計５回のリードモディフ
ァイライト処理を行うことになる。この例の場合処理効
率が１．４倍向上したことになる。以上の効果を一般化
すると、（ｎ＋１）台のディスクで構成されるディスク
アレイサブシステムでは、ｋ個の論理ブロックをライト
した場合のパリティ更新最大回数は、（１）従来ｋ回（２）本実施例ｋがｎの倍数の時（ｋ／ｎ）回ｋがｎの倍数でない時（（ｋ／ｎ）＋１）回となる。データ更新まで含めた処理効率は約（２／（１
＋（１／ｎ）））倍に向上する。第２実施例を説明す
る。上記第１実施例では、ファイルの論理ブロック数は
データディスク台数の整数倍であるとして説明してきた
が、実際にはファイルの論理ブロック数は任意整数個で
ある。例えば図１５に示すファイル０（ｆｉｌｅ０）は
合計１０個の論理ブロックから構成されている。データ
ディスクが４台の場合、ＬＢ０〜ＬＢ７の８個の論理ブ
ロックは完全な２つの論理パリティグループＬＰＧ０、
ＬＰＧ１を構成できるが、ＬＢ８、ＬＢ９の２つの論理
ブロックは完全な論理パリティグループを構成できな
い。このような論理ブロックをフラグメント論理ブロッ
クと称することにする。ファイル１（ｆｉｌｅ１）のＬ
Ｂ４、およびファイル２（ｆｉｌｅ２）のＬＢ０も同様
にフラグメント論理ブロックである。このようなフラグ
メント論理ブロックの扱い方の例を示す。この方法は、
完全なる第１実施例の拡張であり、各ファイル内のフラ
グメント論理ブロックのみで論理パリティグループを構
成してしまう方法である。この方法は第１実施例と同様
に実現できる。この方法は追加を頻繁に行うようなファ
イルの場合すでにフラグメント論理ブロックに対応した
論理パリティブロックが割り当てられているので、追加
したブロックにより構成する論理パリティグループがフ
ラグメント論理ブロックによるものか完全なものである
かを認識する必要がなく、容易に構成できるというメリ
ットがある。また、本方法では、ただ１つの論理ブロッ
クにより構成された小さなファイルに対しては論理ブロ
ックと論理パリティブロックの２つのブロックを使用す
ることになるが、これら両者の内容を同一とする。こう
すると、本ファイルリードの際にはどちらのブロックを
リードしても良い。すなわち、現在使用されていないデ
ィスクを選択する、もしくは両ディスクともに使用され
ていないときにはシーク距離、回転待ち時間の小さいデ
ィスクを選択しリードすれば良い。このような小さなフ
ァイルがたくさん存在するようなシステムにおいては以
上の制御を行うことで高速化の効果が大きい。第３実施
例を示す。第２実施例は小ファイルの高速化に効果が大
きく、また管理も容易であるという効果がある一方、フ
ァイル２のようにただ１つの論理ブロックのみで構成さ
れるような小さなファイルに対しても１つの論理パリテ
ィブロックを割り当てる必要があり、ディスク容量の点
で不利な面があった。そこでこの欠点を改善した方法を
図１５に示す。ファイル０（ｆｉｌｅ０）のフラグメン
ト論理ブロックＬＢ８、ＬＢ９と、ファイル１（ｆｉｌ
ｅ１）のフラグメント論理ブロックＬＢ４と、ファイル
２（ｆｉｌｅ２）のフラグメント論理ブロックＬＢ０と
の合計４個のフラグメント論理ブロックにより仮の論理
パリティグループＶＬＰＧ０を構成し、これら論理ブロ
ックの排他的論理和を計算し、フラグメント論理ブロッ
クによるパリティブロック（フラグメントパリティブロ
ック）ＦＰ０を生成する。一つの仮の論理パリティグル
ープに含まれるフラグメント論理ブロックはすべて異な
るデータディスクに格納されるよう配置し、フラグメン
トパリティブロックは以上のデータディスクとは異なる
パリティディスク上に配置する。ファイルの論理ブロッ
クをディスク上に格納するときにはファイル中にフラグ
メント論理ブロックが存在するかどうか検査する必要が
ある。このため、第１実施例で示した、図４のファイル
管理テーブルの一部であるバッファ管理テーブル１０５
７のフラグ領域に”フラグメント論理ブロック”フラグ
を設け、当該論理ブロックがフラグメント論理ブロック
であるかどうかを認識可能としておく。また、論理パリ
ティブロックを管理するパリティ用バッファ管理テーブ
ル１０５７ｂにも同様に”フラグメント論理ブロック”
フラグを設け、当該論理パリティブロックがフラグメン
トパリティブロックであるかどうかを認識可能としてお
く。このフラグメントパリティブロックは複数のファイ
ルのフラグメント論理ブロックのデータ用バッファ管理
テーブル１０５７ａから参照されることになる。既存フ
ァイルに新規論理ブロックを追加するような場合、ま
ず、図４のファイルインデックス管理テーブル１０５６
の論理ブロックに対応するバッファ管理テーブル１０５
１１へのポインタを参照し、新規論理ブロックの番号が
（データディスク数ｎの整数倍−１）でないときはその
論理ブロックはフラグメント論理ブロックとなる。よっ
て、”フラグメント論理ブロック”フラグを”ＯＮ”と
し、もし、既に同一ファイルにフラグメント論理ブロッ
クがあるならば、その仮の論理パリティグループに当該
新規フラグメント論理ブロックを編入する。当該仮の論
理パリティグループに空きがない、もしくは同一ファイ
ルにフラグメント論理ブロックがないときは新規に仮の
論理パリティグループを構成する。もし、新規論理ブロ
ック番号が（データディスク数ｎの整数倍−１）に一致
するときは、当該ファイル内部で完全な論理パリティグ
ループを構成できるので、ｎ個の当該ファイル内部のフ
ラグメント論理ブロックの排他的論理和を計算し、新規
に論理パリティブロックを構成し、パリティディスクに
格納する。この際、すべてのフラグメント論理ブロック
が異なるデータディスク上に格納されるように配置する
ことで、データ論理ブロックを移動すること無く、論理
パリティブロックのみ新規に生成すれば良い。このよう
に、ファイルが大きくなる過程においてはフラグメント
論理ブロックが発生するが、上記実施例１に示したよう
に、パリティの更新処理はホスト主記憶メモリ上にて行
い、直ちにはディスク上に書き出すことはしないため、
論理パリティブロック更新のためのオーバヘッド時間は
通常非常に小さい。以上のように、本実施例によればフ
ァイルの論理ブロック数がデータディスク台数の整数倍
でないときにもパリティディスク容量を節約したパリテ
ィ管理を実現でき、その性能も上記第１実施例とほぼ同
等で実現できる。次に第４実施例を説明する。上記第１
〜３実施例では、図４に示すファイル管理テーブル（バ
ッファ管理テーブル）により論理ブロックと論理パリテ
ィブロックの対応関係を保持するよう構成していた。本
実施例では上記方式と異なる第２の管理方法を図１６に
示す。２１０５２はファイル名管理テーブル、２１０５
２ａはデータファイルを管理するデータファイルインデ
ックス管理テーブルへのポインタ、２１０５２ｂはデー
タファイルの論理ブロックに対応した論理パリティブロ
ックを１つのパリティファイルとして管理するためのパ
リティファイルインデックス管理テーブルへのポイン
タ、２１０５６ａはデータファイルの論理ブロックを管
理するデータファイルインデックス管理テーブル、２１
０５６ｂは論理パリティブロックを管理するパリティフ
ァイルインデックス管理テーブル、２１０５７ａおよび
２１０５７ｂは各論理ブロック及び論理パリティブロッ
クに対応したバッファを管理するためのバッファ管理テ
ーブルである。データファイルインデックス管理テーブ
ル２１０５６ａ及び、パリティファイルインデックス管
理テーブル２１０５６ｂは図４のファイルインデックス
管理テーブル１０５６と同様である。また、バッファ管
理テーブル２１０５７は図４のバッファ管理テーブル１
０５７から論理パリティポインタを除いたものと同様で
ある。本実施例の方式は、１つのファイルをデータファ
イルとパリティファイルの２つのファイルに分割して管
理している。図１７を用いて説明する。ファイル名管理
テーブル２１０５２は２つのポインタフィールド、デー
タファイルインデックスポインタ２１０５２ａおよびパ
リティファイルインデックスポインタ２１０５２ｂを有
する。ｆｉｌｅ０はデータファイルｆｉｌｅ０ｄおよび
パリティファイルｆｉｌｅ０ｐの２部に分けられ、各々
独立したファイルとして管理する。但し、これら両者の
ファイルは上記ファイル名管理テーブル２１０５２で関
係づける。論理ブロックと論理パリティブロック、論理
パリティグループの関係および管理方法は上記第１実施
例と同様である。以上の方式によれば上記第１実施例同
様の効果を得ることができる。また、もし、ユーザの要
求、作業領域である等の理由により、ファイルによって
は信頼性が不要な場合に選択によりパリティを付加しな
い事も可能であり、本ディスクアレイサブシステム利用
者の要求に合致した信頼性を提供できる。第５実施例を
説明する。以上の実施例はすべてホスト１上のＯＳ１０
３によりディスクアレイサブシステムを制御する方式で
あったが、データベース管理システムのようなアプリケ
ーションにおいては、使用方法に、より合致したディス
クシステムの制御の最適化を図る必要があり、このため
にはアプリケーション１０１内部でディスクシステムの
制御を行うことが必要となる。図１８にこの例を示す。
１０１はアプリケーション、１０５８はディスクアレイ
サブシステム２を制御・管理するディスクアレイ管理部
である。１０５８内部の構成は上記第１〜４実施例と同
様である。本アプリケーションは、ＯＳの有するファイ
ル管理機構、すなわちファイルシステムを経由せず、ロ
ーデバイスＩ／Ｏとよぶ、ダイレクトにディスクアレイ
サブシステムを制御する方法を用いる。アプリケーショ
ンがローデバイスＩ／Ｏシステムコールを発行するとＯ
ＳのシステムコールＩ／Ｆはこれを受けデバイスドライ
バに対し、アプリケーションの要求するディスクＩ／Ｆ
制御部にディスクコマンドを発行する。このようにＯＳ
は単にアプリケーションの要求を受けディスクにコマン
ドを発行するだけの簡単な処理のみ行う。どのディスク
に論理ブロックや論理パリティブロックを配置するかを
決定するのはアプリケーション内部のディスクアレイ管
理部の役割となる。該部の構成及び動作は上記第１実施
例同様である。以上の実施例によれば、ディスクアレイ
サブシステムをアプリケーションの使用方法に合致した
形で制御することが可能となり、高性能化、高信頼化の
効果がより大きくなる。次に第６実施例を説明する。本
実施例は図１９に示すとおり、ディスクアレイサブシス
テム２内部にディスクアレイ管理部１０５８を有したも
のである。ディスクアレイ制御部２１１はホスト１と通
信することによりファイルの管理情報をディスクアレイ
制御部２１１内部で管理しているものである。この場
合、第１実施例同様にファイルを論理ブロックとして管
理し、論理ブロックと対応する論理パリティブロックを
管理する。ディスクアレイ管理部１０５８の構成及び動
作は第１実施例同様である。本実施例によれば、ディス
クアレイサブシステム内部において論理ブロック及び論
理パリティブロックの最適配置を実現できるため、従来
技術に比べて高性能化の効果が大きい。次に第７実施例
を説明する。上記第１〜６実施例は容易にネットワーク
型ディスクアレイシステムおよび分散ファイルシステム
に拡張することが可能である。図２０（Ａ）はその一例
を示したものである。５はネットワーク、６０〜６４は
ディスクを保持している計算機であるホスト０〜４、７
０はディスクをユーザがプログラムを実行し、ホスト０
〜４にディスクアクセス要求を発行する計算機であるク
ライアント、８０〜８３はデータ用ディスク、８４はパ
リティ用ディスクである。ホスト０〜４は各々１台のデ
ィスク８０〜８４を保持している。例えばクライアント
７０は上記第１実施例のようなファイル管理方法を行っ
ているとする。ただし、クライアント７０は自分でデー
タ用ディスク装置を保持していないので、論理ブロック
単位でホスト０〜４にネットワーク経由でディスクアク
セス要求を発行する。しかし、クライアント７０のファ
イル管理方式は上記第１実施例と同様に管理することが
できる。ただしこの場合、図４に示すファイル管理テー
ブル中のバッファ管理テーブル１０５７の機器番号情報
にホストのアドレス等の機器認識番号を付加する必要が
ある。たくさんのクライアントがあるような場合、ホス
ト０〜４にはたくさんのディスクアクセス要求が発行さ
れることになるが、ホスト０〜４はその実行順序及び、
論理ブロックの格納アドレスを自由に最適化して決定す
ることができる。また、別の例として図２０（Ｂ）に示
した構成が考えられる。ホスト０はデータディスクのみ
を管理している。ホスト１はパリティディスクのみを管
理している。クライアント７０はホスト０に対してデー
タディスクアクセス要求を発行し、ホスト１に対しパリ
ティディスクアクセス要求を発行する。上記第（Ａ）の
例との相違はホスト０はすべてのデータディスクの管理
を行い、ホスト１はパリティディスクの管理を行う点で
ある。データとパリティは非同期で扱うことが可能なの
で、おのおの各ホストが実行順、および論理ブロックの
格納アドレスを自由に最適化して決定することができ
る。以上いくつかの実施例を説明したが、この実施例中
で（ｎ＋１）台のディスク中の１台をパリティディスク
と固定して扱ったが、論理パリティブロックを（ｎ＋
１）台のディスクすべてに分散し配置することも可能で
ある。この場合にも、並列処理により高速化を計るため
に、論理パリティグループ中の全論理ブロックと論理パ
リティブロックはすべて異なるディスク装置上に格納す
る必要があるのは上記実施例と同様である。各論理ブロ
ック及び論理パリティブロックは各ディスク上の任意の
アドレスに配置することが可能である。また、上記の実
施例はすべて磁気ディスク装置によるアレイシステムに
ついて説明したが、磁気ディスク装置のかわりに光ディ
スク装置や磁気テープ装置や半導体記憶装置を用いて同
様のファイル管理を実現したアレイシステムを構築する
ことも可能である。さらに、磁気ディスクを用いたディ
スクアレイシステム中のパリティディスクのみ光ディス
クや磁気テープ装置や半導体記憶装置を用いたり、これ
らを組み合わせた記憶装置によりおきかえることも可能
である。このように様々なシステム構成に本発明のファ
イル格納管理方式を用いることができる。EXAMPLE A first example will be described. FIG. 1 shows the present invention.
It is a schematic diagram showing a file management system. File fil
e0 is composed of a data section and a parity section. Day
The data part is further divided into small blocks called logical blocks.
It In this example, the logical block is 8 KB and the file
The data part of is assumed to be 96 KB. File file
0 is LB0, LB1, ...
It can be divided into 12 logical blocks of LB11. Also,
Disk array subsystem 2 uses disk 0 (20
0), disk 1 (201), ..., disk 4 (2
It has a total of 5 discs of 04). Especially disk
4 (204) is called parity disk and stores parity
It is a dedicated disc. Above, the logical block is a disk
Array subsystem 2 disk 0 to disk 3 total
It is distributed to a group of disks called four data disks,
Is stored. In this example, LB0 is on disk 0 and LB1
To disk 1, LB2 to disk 2, and so on.
It is placed back and stored again. Where file top theory
Processing block to the number of data disks, that is, LB0
Logical parity with 4 logical blocks from LB to LB3
Group 4 LPG0, LB4 to LB7 four theories
Logical parity group LPG1, LB8 in the logical block?
To LB11 with 4 logical blocks.
Form a loop LPG2, and use byte units within each group
Read out, calculate exclusive OR for each bit, and
Generate a physical parity block. That is, LB0 + LB1 + LB2 + LB3 = LP0 (1) LB4 + LB5 + LB6 + LB7 = LP1 (2) LB8 + LB9 + LB10 + LB11 = LP2 (3) However, the symbol + here represents an exclusive OR.
Generated logical parity blocks LP0, LP1, LP2
Are stored on the parity disk or disk 4.
The data part and the parity part of the file file0 are
Each logical block that composes each logical parity block
And manage them in association with each other. If some logical block
When trying to rewrite the data of LBn, the related theory
The physical parity block LPk is also rewritten at the same time. above
A concrete example of implementing the file management method is shown below.
You FIG. 2 is a block diagram showing the system configuration of this embodiment.
is there. 1 is a host, 2 is a disk array subsystem,
3 is a disk interface (hereinafter abbreviated as I / F)
It ). Operating system on host 1
(Hereinafter abbreviated as OS) 103 operates and manages its
Below, the applications (101 and 102) (below
(Abbreviated as AP) is operating. OS103 is the following
It has each component. 104 is the data from the application
System call I / to receive disk access request etc.
F and 105 are files for managing storage of files on the disk.
File system, 106 is a disk I / F control unit 107
It is a device driver unit that controls the. File system
The system 105 has a built-in disk array management unit 1058,
This unit controls the disk array. 107 is a disk
A disk I / F control unit that controls the I / F 3. Di
The squaray subsystem 2 has five disks (200-
204), and these are disk I / F
3 are connected to the host 1 respectively. This disc
The array subsystem 2 is a file system as described above
Controlled by the disk array management unit 1058 in 105,
It is managed. The file inside the OS 103 of the host 1
The file management unit (file system) 105 operates as follows.
Disk devices 200 connected to the host with
~ 204 to decide where to store the data block
It That is, the file management unit 105 converts the file
Upon receiving the request to store the data, the unit 105
Associate the above file in 105 with the data block
Refer to the table, and in which data block the above data is
Determine if there is. If you have an existing data block
If there is, the difference data with the old data is calculated and stored in the buffer.
Store it. If it is a new data block,
Data to which disk device among multiple disk devices
The means for deciding whether to store the block is the storage disk device.
Position. And which address of the above disk device
The means for deciding whether to store a data block in space is above.
Refer to the table that manages the usage status of each disk unit
Then, the storage address is determined. In addition, the redundant data
The means for generating is the redundant data corresponding to the above data block.
Refer to the table that manages the relationship with the data block
And redundant data block corresponding to the above data block
If there is a corresponding redundant data block,
If not, which redundant data block
The method of deciding whether to store in the storage device is redundant data storage.
Determine the disk device for which of the above disk devices
Decide whether to store redundant data blocks in the address space
Means for managing the usage status of the disk device.
Table, determine the storage address, and
Management of the relationship between the
Register in the table. If the corresponding redundant data block
If the buffer exists in the buffer, the redundant data block
Calculate the exclusive OR with the lock data and put it on the buffer
save. If the corresponding redundant data block is a buffer
If it does not exist above, keep this data in the buffer.
The means for calculating existing redundant data is asynchronous after the above process.
The differential data on the buffer and the old redundant data on the disk unit.
Data with exclusive data and generate new redundant data
And write back to the disk device. 5 disk units
Connected to the store 1, but 4 units are for data disk
One is used as a redundant data disk device.
Determine the disk device to store the data block
Means file data block data disk
Decide the storage disk device so that it will be placed almost evenly on the device
To do. Redundancy due to 4 data blocks in the same file
A group called a data group is formed, and the same redundant data is
Exclusive-or of all data blocks in the data group
A redundant data block is generated by calculating. this
The redundant data block is uniquely determined for each file
To do. The update of redundant data is processed as described above. data
A table that manages the relationship between blocks and corresponding redundant data.
Table, each file has a
Register the long data block in association with it. Same file
While creating, adding, or changing a data block of
Generally has the opportunity to repeatedly update one piece of redundant data
Since there are many, if you use the above management method,
And manage redundant data by associating it with data blocks
There are the following effects. That is, this management method
According to the law, while a file is
With the update of data belonging to one redundant data group,
It is necessary to update the difference data that is the basis of redundant data generation.
However, since the data belongs to one file,
The corresponding difference data often exists on the buffer.
Become. As a result, the difference data is updated as the data is updated.
Is mostly done on the buffer for redundant data
As in the conventional technology, each time the data is updated, a redundant data
Without reading / writing data from the disk device
As a result, disk access due to redundant data update
The number of scans can be greatly reduced. The disc array is shown in Figure 3 below.
File system 103 that manages subsystem 2
The structure of each related part centering on is shown. File system
In 105, 1051 is an application file
File that performs file open processing when using
File open processing unit, 1052 is an open file
File index management table
File name management table that manages the relationship with the cable 1056.
Bull, 1053 is the file access from the application
A request processing unit for receiving a process request and performing an interpretation process, 1056
Splits the file into blocks and
Block number on the disk
File index to manage where it is stored
Management table, 1054 is file index management
Manages new registration, change, deletion, etc. of table 1056
File index manager, 1057
Buffer area where blocks are constructed in physical memory
Management information that is the management information for mapping to
Table, especially 1057a is a buffer area for data.
Data buffer management table for managing data area, 1
057b is for managing the buffer area for parity.
It is a parity buffer management table. Parity
This will be described later. 1055 is the buffer management table
1057 new registration, change, deletion, buffer allocation, etc.
The buffer management unit 1058 that manages the buffer
Disk array subsystem, disk, partition
And storage address on disk, etc.
The disk array that manages the disk array subsystem with
It is the management department. In the figure, 108 is in the main memory of the host 1.
This is the allocated buffer area. 108a is a user
User data area acquired for file access,
108b indicates that the file system 105 has activated the disk.
Data buffer area for data management during access, 1
In 08c, the file system 105 manages the parity.
Buffer area for parity, and 108d is a file system
Used temporarily by the system 105 when updating the parity
This is a work buffer area. Figure 5 shows disk array management
10 shows a block diagram of section 1058. 10581 is new
Rule block registration management unit, 10582 stores new blocks.
Machine to select the equipment (disk, partition) to be delivered
Device selection unit, 10583 is the physical device for the selected device.
Physical block mapping that determines the block address
The determination unit, 10584 determines the logical parity group,
The logical data block and logical parity block are related.
A logical parity management unit that performs the processing
Disk array usage status to manage the usage status of the disk array
It is a situation management table. Disk array usage management
The table 10585 is a disk that constitutes a disk array.
Management of disk physical block usage for each
Physical block management table 10585a and physical block
Management that manages the usage status of the sectors that make up the
It is composed of a table 10585b. 10586 is eliminated
A parity operation unit for performing other logical sum calculation, 10587
Uses parity data when one disk fails,
Degeneration processing that reproduces the data stored in the failed disk
It is a degeneration processing unit that executes the processing. Figure 6 shows a disk array
In the figure which shows the concrete example of the use situation management table 10585
is there. Physical block and sector usage for each disk
It is also configured so that the intended use can be determined. Less than,
The operation of this embodiment will be described. AP operates the file first
It is necessary to open the standing file. AP1
01 is the file name, open mode (read mode,
Read mode, read / write mode, additional mode, etc.)
Issue an open system call. Add mode and
Adds a file area to an existing file
It means a mode that indicates that it is a case of OS type
Some include in light mode. OS10
The system call I / F 104 of 3 receives this, and
If it is judged that it is a system call
The file open processing unit 1051 of the system is activated.
The section receives this, and from the file name, the file name management
The table 1052 is referred to and the file number k is changed. Also
However, in the write mode, this management table 1052
If the file is not registered, it is a new file.
I will newly register. By this process, AP can
Get the file number which is the operation number of the file. After that
Use this file number to access this file.
It FIG. 4 shows a configuration example of the file management table.
It 1052 is a file name management table and 1056 is a file name management table.
File index management table, 1057 is a buffer
The management table 108 is a buffer memory. Above
When the file number is obtained by open processing, this file is
File index corresponding to the file number as a key
The management table 1056 can be referred to. File index
The fax management table stores the contents as shown in the figure.
It is a table for each file. The outline of the contents is as follows
Is. The mode indicates the access mode of the file. Place
The owner is the user name who created the file, and the access identifier is
Indicates the access permission range of the file. Reference count
Indicates the number of references to the file. The time stamp is
The time when the file was last read or written, and the file
Shows the time when the index management table was last updated.
You The size indicates the size of the file in bytes. Bu
The number of locks represents the number of logical blocks used. Also
The buffer management table 105 corresponding to the logical block number
The pointer to 7 is stored. Buffer management table
There is one rule 1057 for each logical block.
The contents are listed. Buffer is valid for hash links
Link to a hash table to quickly determine if
Pointers, queue links for forming queues
The link pointer and flag are the buffer status
Buffer is used to determine if valid data is stored
The contents of the buffer have not been rewritten to the disk.
Indicates whether or not it is a projection. The device number is
And the partition number. Block number
No. is the disk address on the disk indicated by the device number
It is a number. The number of bytes is valid stored in this block.
The number of bytes of data. Buffer size is this buffer
Is the size in bytes of. Buffer pointer is physical
This is a pointer to the buffer memory. Logical parity glue
The group number is used when generating parity for each of the above files.
This is the number of the configured logical parity group. Logical paris
The pointer is the data buffer management table 1057.
Data logic that is valid when a and is stored in the buffer
The partition that stores the logical parity block corresponding to the block.
Pointer to the priority buffer management table 1057b
Is. Data buffer management table 1057a and
And the parity buffer management table 1057b are structural
Are the same. Data buffer management table 105
7a is from the file index management table 1056
Referring to the parity buffer management table 1057b,
From the corresponding data buffer management table 1057a
Only the points of reference are different. Things to reserve buffer
The physical memory consists of a data buffer and a parity buffer.
You can map to different areas or use the same area
It doesn't matter. Figure 3 shows a logical image
ing. Next, the file read / write operation is shown in Fig. 3 and Fig.
4 and the flow charts of FIGS. 7 to 12, FIGS.
1 showing the detailed configuration of the disk array management unit shown in FIG.
Using the schematic diagram of the data and parity update operation shown in 3.
explain. First, the read operation will be described with reference to FIG. Up
Open the file in read mode as described. That
After that, the AP issues a read system call (1100
1), the OS 103 system call I / F 104
And recognize that it is a lead system call,
The file system 105 is called (11002).
The request processing unit 1053 of the file system 105
Request (offset) issued by AP in units of
Convert to block units and sequentially apply the corresponding logical blocks.
Access (11003). Here, offset means
Amount of data to send (= transfer end address-transfer start address
Response), that is, the transfer length. First target logical block
The request processing unit 1053 sends a lock read request to the file
It is transmitted to the index management unit 1054. The department receives this
See the file index management table 1056.
Data block that stores the data of the target logical block.
A pointer to the buffer management table 1057a is obtained. One
Then, the buffer management unit 1055 receives this pointer,
Device of the disk device that stores the target logical block
Number, that is, disk number and partition number,
The storage disk block number is acquired (11005).
Here, the disk block number is the sector on the disk.
Is a logical address that corresponds to and is assigned linearly. It
In addition, this section refers to the buffer pointer and
Physical memory space is allocated to the target logical block
Determine whether or not. If the buffer is allocated
If the data of the target logical block is registered,
Flag on the buffer management table
However, if the "data valid" flag is "ON",
Read access to the disk is not required (cache hit
G) (11006). If the buffer is allocated
If not, the buffer management unit will newly
(11007). Then, in this case, and the above
When the "data valid" flag is "invalid", the device
The driver unit 106 is a disk I / F (for example, SCSI)
Read command is generated, and the disk I / F control unit 10
7 (11008). Here, the disk device
Transferred the data to the specified data buffer 108b
Until it does, the process stops, and the system enters a waiting state (1100
9). When the data transfer is completed,
Data buffer 108b
User data area 108a designated by AP (user)
Transfers the specified byte unit data (offset) to
(11010). The processing is performed again on the file system 105.
Request processing unit 1053 of all the AP specified
Judge whether the offset data has been transferred, and
However, if it is not completed, the above read
The process is repeated until the access is completed (11011). You
When all requested offsets have been transferred, the file system
The stem 105 receives A via the system call I / F 104.
Issue a system call end notification to P and return processing to AP
It Next, the operation during writing will be described. In FIG. 7, 1
The same is true up to 1003. 11004 when writing
Then, the process branches to the flowchart (A) of FIG. Phi
The index management unit 1054 receives this, and
Le index management table 1056
Data buffer management table for storing data
A pointer to the table 1057a is obtained. If applicable
Data buffer management table 1057a does not exist
When the target logical block is not stored on the disk
Meaning that it is a new logical block
Then, at the branch of (11102), the flowchart of FIG.
Branch to Details of this case will be described later. Also
The logical block already exists on the disk.
The corresponding data buffer management table,
The pointer is found. In this case, the buffer management
The unit 1055 receives this pointer and the target logical block
The device number of the stored disk device, that is, the disk
Disk number and partition number and storage disk block
Get a number. In addition, the part
Refer to the buffer or physical memory space
Determines if it is assigned to a lock. AP
The requested data write request is in byte units as described above.
Since it is specified by fusing, all of the applicable blocks
May not be rewritten. But this place
In this case, it is necessary to write in logical block units, so
Once the target logical block is read onto the data buffer
Then, update the necessary part of the logical block on the buffer,
Performs read-modify-write processing to write back to disk
It is necessary. The read modify write processing is as follows.
However, if the AP request is completely 1 logical block,
If you want to rewrite the data for
It is unnecessary. If a buffer is assigned to the logical block
If the data of the target logical block is registered,
Flag on the buffer management table
If the "data valid" flag is "ON", then
Old data of the target logical block exists in the data buffer.
Existing, no read access to the disk is required
(Cache hit) (11103). If the buffer
If is not assigned, the buffer management unit
Allocate the buffer (11104). Then, this place
And the "data valid" flag is "invalid"
Sometimes, the device driver unit 106 uses the disk I / F.
Generate a read command (eg SCSI) and
It is issued to the I / F control unit 107 (11105). here
Then, the data buffer 108b designated by the disk device
Processing is stopped until the end of data transfer to
(11106). Here is the old data lead date
Data transfer is completed. Then, it corresponds to the logical block
Refer to the buffer management table of the logical parity block
And the buffer memory for parity is mapped
It is determined whether or not (11107). If the buffer is
And the buffer contents are "valid"
If it is, proceed to FIG. Otherwise
If (buffer memory mapping is not completed),
Topping is performed (11108), and the process proceeds to (F). Just
Now we are in the process of updating the parity. Details of parity update processing
The details will be described with reference to FIG. In the case of Figure 9 (E),
Work buffer 108d0 is replaced with work buffer 108d0
Secured in the area (11109), and the area on the user area 108a
The data (new data) and the data buffer 108b0
Exclusive OR (EOR) with the above data (old data)
Calculate and generate difference data of both data (1111)
0). Here, the new data in the user area is
Buffer and update the buffer contents (1111
1). Then, the difference data on the work buffer 108d0
Data and exclusive OR (EOR) on the parity buffer
Is calculated, and the result is stored in the parity buffer 108c0.
Pay (11112). In addition, the parity buffer in the figure
The redundant data buffer is also named as the above calculation.
Due to this, new parity may not be generated in some cases.
Is. Here, the generated data is
Data, the new difference data of the old data, and the previous logical block
Exclusive theory with old difference data generated when writing
It is Riwa, and only the difference information of parity is calculated.
Yes. Therefore, eventually, the old files stored on the disk will
Calculates the exclusive OR of the difference information between the parity and this parity
Need to In this case, the parity buffer tube
"Old parity read completed" flag in the processing table 1057b.
Is off. If this flag is "O"
In the case of N ", the above processing is performed on the parity buffer.
At the end, new parity has been generated. Next,
The data buffer management table 1057a and Paris
"Buffer on Tee buffer management table 1057b
Set content "flag" to "dirty" (11113)
~ 11114). This "dirty" flag is a buffer
Indicates that the above data has not been reflected on the disc. I
Write back on the disc. This process is shown in FIG.
Is shown in the flowchart and will be described later. All AP request
Judge whether or not the block write of data is completed (1
1115), if not yet, return to FIG. Next
The case of FIG. 9F will be described. The difference from (E) is Paris
The parity buffer 108c0 has parity difference information
Means that the old parity is not stored. Therefore,
The difference information between the old and new data is generated in the same way as above, and the parity
The data is stored in the data buffer 108c0 (11116). Paris
The work buffer is unnecessary because the tea buffer is empty.
It In this case, the old parity read has not been performed yet.
Then, set the "old parity read completed" flag to "OFF".
It is set (11117). And on the user area
The data (new data) is stored in the data buffer area 1
08b and updates the buffer contents (1111
8). The same applies to the case of the above (E). Then a new
The case of writing the lock (D) will be described with reference to FIG.
It Logical parity group corresponding to the logical block
It is determined whether or not exists (11120). This format
The default is the file index management table 1056.
The same logical block as the logical block
You just need to determine if the block is already registered.
The determination method is the file index management table 10
56 has a pointer to the buffer management table 1057.
It can be judged by whether or not. If applicable logical parity
If there is no block, data buffer management table 1
057a and parity buffer management table 1057
Both of b are newly created and registered (111121-111)
22). Here, the buffs performed at 11121 and 11122
New allocation of disk management table 1057 and disk physical block
Please refer to the flowchart in Figure 11 for the lock mapping
It will be explained. First, the disk array management unit 1058
The new logical block registration management unit 10581 (FIG. 5) of
Receives a new block allocation request from the buffer management unit 1055.
And whether it is a block for parity or a block for data
(11301) to newly create the buffer management table 1
057 is created (11302). Next, the device selection unit 10
582 is a data block, a parity block, or
Also, based on information such as the logical block number,
Disk and partition to store the
Register in the buffer management table 1057 (1030
3, 10304) Next, physical block mapping decision
The part 10583 determines which physical block on the disk is concerned.
Disk array usage management
See the physical block management table in table 10585.
The storage location of other logical blocks in the same file.
Optimum to minimize seek and rotation waiting based on information
Select the appropriate block. Also, the management table 1058
Update 5. From the selection result of this physical block,
Block management block or sector number
Register in the block number field in Bull 1057
(11305). Then follow the logical parity group
Logical parity group
Is registered (11306). If it is a data block
If this is the case, manage the parity buffer corresponding to the data.
Pointer to table 1057b is a data buffer tube
Registered in the processing table 1057a, and if the parity block
If it is a check, the parity buffer management table 1
Register a NULL pointer in this field of 057b
(10308). Finally, the buffer management table 1
057 corresponding file index management table 1
Register in 056, allocate new buffer management table and
The logical block physical block mapping process is complete.
It Returning to FIG. 10, the newly registered data buffer tube
The buffer memory to the physical table 1057a.
(11123). If the parity buffer management table
Bull 1057b is also a buffer if newly created
Map memory. After that, the process at the time of the above update write
The difference data between the old and new data is calculated, and the parity
Calculate exclusive OR with the old difference data of the buffer area,
Store in the parity buffer area. If the old difference data
If there is not, the new difference data is used as is in the parity buffer area.
Stored in the area and set the "old parity read completed" flag to "OF".
Set to F ". Finally, set the new data in the user area.
Transfer to the data buffer area and return to FIG.
It During the above write processing, the data buffer and
When new information is stored in the
Set the "buffer contents" flag of the management table to "dirt"
y ”was set and the process was completed.
Old data when a write request to the buffer occurs.
This is because there is a possibility that the write processing can be reduced. this
Is called delayed writing, and at this time, appropriate timing
It is necessary to write back on the disk. buffer
This behavior is achieved by synchronizing the contents with the disc contents.
This is called sync operation. The flow chart of this sync operation
It shows in FIG. AP process called Sync Daemon is fixed
Issue sync system call periodically (1120
1). The OS receives this, and the file system buffer
The server management unit 1055 searches the list and
Look for a buffer whose "flag is" dirty "(112
03), if the searched buffer is data,
The contents of the data buffer management table
Write to a new disk (11204-11206), and
Later set the "buffer content" flag to "clean"
(11207). If it's a parity buffer
If you see "Old parity read completed" flag,
If it is "OFF" (11208), work buffer
Of the old parity is set in the buffer management table.
Lead from the disk to the work area. This old parity and
Exclusive of differential parity data on the parity buffer area
Calculates the logical sum and stores it in the buffer for parity
It Write this complete new parity to disk
And set the "old parity read completed" flag to "ON".
Also, set the "buffer contents" flag to "clean".
Set. If the "old parity read completed" flag
If it is "ON", the disc is immediately
Write to, and likewise, set the "buffer contents" flag to "cl".
Set to "ean". The above processing is performed for all "dirt".
This is done for the y "buffer.
According to this, the method of managing parity for each file in Fig. 1 is
realizable. Next, the effect of this embodiment will be described. In the conventional example
According to this, the file system 105
Determining the best disk address at that time
Therefore, as shown in FIG. 14a on the disk array subsystem.
In many cases, the block arrangement is intermittent. Optimal
Such decisions include, for example,
If there is a head, select an empty head at that time,
For example, select a sector with a low rotation waiting time. LB0
When LB3 is sequentially stored from the
Update the corresponding three parity blocks P1, P5, P8
Must be a data block and a parity block,
A total of 7 read-modify-write processes for 7 in total
It is necessary to carry out the reason. On the other hand, in the method of this embodiment,
14b, consecutive logical blocks LB0 to LB
The corresponding logical parity of 3 is uniquely determined to be LP0. Delayed letter
When the imprinting is performed as in the above embodiment, the LP0
Since the update can be executed only on the buffer, the parity update
Only one read-modify-write process required for processing
Including data update, data block and parity
5 blocks, total 5 times lead mod.
A write process will be performed. In this case, the processing effect
The rate is 1.4 times higher. Generalize the above effects
Then, a disk composed of (n + 1) disks
Write k logical blocks in array subsystem
The maximum number of parity updates in this case is (1) conventional k times (2) this embodiment k is a multiple of n (k / n) times k is not a multiple of n ((k / n) +1) times Becomes The processing efficiency including data update is about (2 / (1
+ (1 / n))) times. A second embodiment will be described.
It In the first embodiment, the number of logical blocks in the file is
I explained that it is an integer multiple of the number of data disks
However, the number of logical blocks in the file is actually an integer.
is there. For example, the file 0 (file0) shown in FIG.
It is composed of 10 logical blocks in total. data
If there are 4 disks, 8 logical blocks LB0 to LB7
The lock has two complete logical parity groups LPG0,
LPG1 can be configured, but two logics of LB8 and LB9
Block cannot form a complete logical parity group
Yes. Such a logical block is called a fragment logical block.
I will call it Ku. L of file 1 (file1)
The same applies to B4 and LB0 of file 2 (file2).
Is a fragment logic block. Flags like this
The following is an example of how to handle the ment logic block. This method
This is a complete extension of the first embodiment, and the flag in each file is
Segmented logical block only to form a logical parity group.
It is a way to make it. This method is similar to the first embodiment
Can be realized. This method is a file that is frequently added.
File already corresponds to fragment logical block
Added because logical parity block is assigned
The logical parity group composed of
Lagment logic block or complete
It is easy to configure without the need to recognize
There is In addition, this method requires only one logical block.
Logical blocks for small files made up of
Two blocks, a logical block and a logical parity block.
However, the contents of both are the same. like this
Then, when reading this file, which block should be
You may lead. That is, the data not currently in use
Disk, or both disks will be used.
When not in use, seek distance
Select a disc and lead. Such a small
In a system with many files,
By performing the above control, the speed-up effect is great. Third implementation
Here is an example: The second embodiment is very effective in speeding up small files.
While it has the effect of being easy and easy to manage,
File 2 consists of only one logical block
One logical parite for small files
Blocks must be allocated, and the disk capacity
There was a downside. Therefore, the method which improved this shortcoming
It shows in FIG. Fragment of file 0 (file0)
Logical blocks LB8 and LB9, and file 1 (file
e1) fragment logical block LB4 and file
2 (file2) fragment logical block LB0
Of the temporary logic by a total of four fragment logic blocks
Parity group VLPG0 is configured and these logical blocks
Block exclusive OR, and fragment logic block
Parity block (fragment parity block)
C) Generate FP0. A tentative logical parity group
All the fragment logical blocks contained in the
The data disk to be stored on the
Toparity block is different from the above data disks
Place it on the parity disk. Logical block of file
Flag in the file when storing
Need to check for the existence of
is there. Therefore, the file of FIG. 4 shown in the first embodiment.
Buffer management table 105 that is part of the management table
"Fragment logical block" flag in the flag area of 7
And the relevant logical block is a fragment logical block
Whether or not is recognizable. Also logical Paris
Buffer management table for parity that manages the T-block
Similarly, the "fragment logical block" is also applied to the rule 1057b.
A flag is set so that the logical parity block
It is possible to recognize whether it is a top parity block.
Ku. This fragment parity block contains multiple files.
Buffer management for fragment fragment logical block data
It will be referred to from the table 1057a. Existing
When adding a new logical block to the file,
First, the file index management table 1056 of FIG.
Management table 105 corresponding to each logical block
Referring to the pointer to 11, the new logical block number
If it is not (an integer multiple of the number of data disks-1),
The logical block becomes a fragment logical block. Yo
The "fragment logical block" flag to "ON"
If the fragment logical block is already in the same file,
If there is a link in the temporary logical parity group,
Incorporate a new fragment logical block. The provisional theory
Physical parity group is full or the same file
If there is no fragment logical block in the
Configure a logical parity group. If a new logic block
Check number matches (an integer multiple of the number of data disks n-1)
, The complete logical parity group inside the file.
Since loops can be configured, the n internal files
Calculate the exclusive OR of the Lagment logic block and
Configure a logical parity block on the
Store. At this time, all fragment logical blocks
To be stored on different data disks
Therefore, without moving the data logic block, the logic
Only the parity block needs to be newly generated. like this
In the process of growing the file, fragments
Although a logical block occurs, as shown in the first embodiment above.
The parity update processing is performed on the host main memory.
No, it doesn't write to the disc immediately,
The overhead time for updating the logical parity block is
Usually very small. As described above, according to this embodiment,
The number of file logical blocks is an integer multiple of the number of data disks
Parite saves parity disk space even when not
Management can be realized and its performance is almost the same as that of the first embodiment.
Etc. can be realized. Next, a fourth embodiment will be described. First above
In the third to third embodiments, the file management table (version
Logical block and logical parity
It was configured to maintain the correspondence of the two blocks. Book
In the embodiment, a second management method different from the above method is shown in FIG.
Show. 21052 is a file name management table, 2105
2a is a data file index for managing data files.
Pointer to the management table, 21052b is the data
Logical parity block corresponding to the logical block of the
Pack to manage the disk as one parity file.
Point to index file index management table
21056a manages the logical blocks of the data file.
Data file index management table for management, 21
056b is a parity flag for managing the logical parity block.
File index management table, 21057a and
21057b is for each logical block and logical parity block.
Buffer management table to manage the buffer corresponding to
Table. Data file index management table
21056a and parity file index tube
Process table 21056b is the file index of FIG.
It is similar to the management table 1056. Also a buffer tube
Process table 21057 is the buffer management table 1 of FIG.
Same as 057 except logical parity pointer
is there. The method of this embodiment uses one file as a data file.
File and parity file divided into two files
It makes sense. This will be described with reference to FIG. File name management
Table 21052 has two pointer fields, data
Data file index pointer 21052a and
Has a parity file index pointer 21052b
To do. file0 is a data file file0d and
Parity file file0p is divided into two parts,
Manage as an independent file. However, both of these
For the file, see the file name management table 21052 above.
Relate. Logical block and logical parity block, logical
The relation and management method of the parity group is the first embodiment described above.
Similar to the example. According to the above method, the same as the first embodiment
Similar effects can be obtained. Also, if the user needs
Depending on the file, for reasons such as work area
Does not add parity by selection when reliability is not required.
It is also possible to use this disk array subsystem
It is possible to provide reliability that matches the demands of other persons. Example 5
explain. The above embodiments are all OS 10 on the host 1.
With the method of controlling the disk array subsystem by 3
There was an application like a database management system.
In the solution, the
It is necessary to optimize the control of the
Of the disk system inside the application 101
It is necessary to control. FIG. 18 shows this example.
101 is an application, 1058 is a disk array
Disk array management unit that controls and manages subsystem 2
Is. The internal structure of 1058 is the same as that of the first to fourth embodiments.
It is like. This application is a file that the OS has
File management system, that is, without going through the file system,
-Directly disk array called device I / O
A method of controlling the subsystem is used. Application
O when a device issues a raw device I / O system call
S's system call I / F receives this and device dry
Disk I / F required by the application
Issue a disk command to the controller. OS like this
Will simply command the disk to receive the request of the application
Only a simple process of issuing a command is performed. Which disc
Whether to allocate a logical block or logical parity block to
It is the disk array pipe inside the application that determines
It becomes the role of the science department. The configuration and operation of this section are the same as those of the first embodiment
Similar to the example. According to the above embodiments, the disk array
Subsystem matched to application usage
It is possible to control in the form of high performance and high reliability.
The effect is greater. Next, a sixth embodiment will be described. Book
In the embodiment, as shown in FIG. 19, the disk array subsystem
The system 2 has a disk array management unit 1058.
Of. The disk array controller 211 communicates with the host 1.
File management information can be transferred to the disk array
It is managed inside the control unit 211. This place
In this case, the file is managed as a logical block as in the first embodiment.
Logical parity block corresponding to the logical block
to manage. Configuration and operation of disk array management unit 1058
The work is the same as in the first embodiment. According to this embodiment,
Logical blocks and theory within the Clay subsystem
Since it is possible to realize the optimal placement of the physical parity block,
The effect of higher performance is greater than that of technology. Next, the seventh embodiment
Will be explained. The first to sixth embodiments are easily networked.
Disk array system and distributed file system
Can be extended to. Figure 20 (A) shows an example.
Is shown. 5 is a network, 60-64 are
Hosts 0-4, 7 which are computers holding disks
0 is a disk, the user executes the program, host 0
A computer that issues disk access requests to
Clients, 80-83 are data disks, 84 is a disk
It is a disk for ritivity. Each host 0-4 has one device
It holds the disks 80-84. For example client
Reference numeral 70 denotes the file management method as in the first embodiment.
Suppose However, the client 70 does not
Since it does not have a disk unit for data
Disk access to hosts 0 to 4 via the network
Issue a process request. However, the client 70
The file management method can be managed in the same manner as in the first embodiment.
it can. However, in this case, the file management table shown in FIG.
Device number information of the buffer management table 1057 in the buffer
It is necessary to add the device identification number such as the host address to
is there. If you have a lot of clients,
Many disk access requests are issued to 0-4.
However, the hosts 0 to 4 are
Freely optimize and decide the storage address of the logical block
You can In addition, as another example, it is shown in FIG.
It is conceivable that the configuration is made. Host 0 is a data disk only
Is managing. Host 1 manages parity disk only
It makes sense. Client 70 sends data to host 0
Issue a disk access request to Paris to host 1.
Issue a disk access request. Of the above (A)
The difference from the example is that host 0 manages all data disks.
And host 1 manages the parity disk.
is there. Data and parity can be handled asynchronously
Each host has its own execution order and logical blocks.
Storage address can be freely optimized and determined
It Some examples have been described above.
One of the (n + 1) disks is a parity disk
However, the logical parity block is (n +
1) It is possible to disperse and arrange all the disks.
is there. Even in this case, in order to speed up by parallel processing
All logical blocks and logical partitions in the logical parity group are
Store all utility blocks on different disk units.
It is the same as in the above embodiment that it is necessary to do so. Each logic block
And the logical parity block is arbitrary on each disk.
It can be located at the address. Also, the above
All of the examples are for an array system using a magnetic disk device.
I explained about it, but instead of the magnetic disk unit, an optical disk
Disk device, magnetic tape device, or semiconductor memory device.
An array system that realizes file management like
It is also possible. In addition, a disk using a magnetic disk
Only the parity disk in the square array system
Or using a magnetic tape device or a semiconductor memory device
Can be replaced by a storage device that combines these
Is. As described above, the system of the present invention can be applied to various system configurations.
A file storage management method can be used.

【０００８】[0008]

【発明の効果】以上のように、本発明によれば、ファイ
ル毎にパリティを生成するので、パリティを局所化して
扱うことが可能となり、同一ファイルの更新、追加等を
繰り返し行う場合、パリティの更新処理のほとんどをホ
ストのメモリ上のバッファ領域で行うことが可能とな
り、パリティ更新にともなうディスクアクセス回数を大
幅に削減できファイル処理の高速化を実現できるという
効果がある。また、データ、パリティ両者ともにホスト
側で格納位置を決定できるのでディスク上への論理ブロ
ック配置最適化を実現でき、アクセス時間の短い高速リ
ード／ライト処理を実現できるという効果がある。ま
た、ディスクアレイの制御はホスト側ですべて行うの
で、ディスクアレイ制御のための特別な回路が不要であ
る。具体的には、図２１では、２個のディスクＩ／Ｆ制
御部１０７，２１０があるが、図２では、１個のディス
ク制御部１０７のみで済む。これにより、低価格なディ
スクアレイサブシステムを構築できという効果があり、
さらに、特別な回路を付加した際に発生する処理時間の
増加を削減できるという効果もある。また、アプリケー
ション内部に本発明のファイル格納管理方式を搭載でき
るので、アプリケーションのディスクアクセス特性に合
致した最適なファイル格納管理を実現でき、高性能化、
高信頼化を実現できる。また、ネットワーク上のいかな
る場所にでも論理ブロックならびに論理パリティブロッ
クを配置できるので高信頼、高性能な分散ファイルシス
テムを容易に構築できるという効果がある。また、パリ
ティのみをディスク装置以外のデバイスにおくことも可
能となり、データには高性能なディスクを、パリティに
は低性能であるが安価なデバイスを選択することで高コ
ストパフォーマンスなファイルシステムを構築可能であ
る。また、ファイル毎にパリティを付加するか付加しな
いかを利用者が決定できるため、利用者の要望に合致し
た信頼性を提供できる。As described above, according to the present invention, since the parity is generated for each file, the parity can be localized and handled, and when the same file is repeatedly updated or added, the parity can be changed. Since most of the update processing can be performed in the buffer area on the memory of the host, there is an effect that the number of disk accesses associated with parity update can be significantly reduced and file processing can be speeded up. Further, since the storage positions of both the data and the parity can be determined on the host side, there is an effect that optimization of logical block arrangement on the disk can be realized and high-speed read / write processing with a short access time can be realized. Further, since the host side controls all of the disk array, no special circuit for controlling the disk array is required. Specifically, in FIG. 21, there are two disk I / F control units 107 and 210, but in FIG. 2, only one disk control unit 107 is required. This has the effect of building a low-cost disk array subsystem.
Further, there is an effect that it is possible to reduce an increase in processing time that occurs when a special circuit is added. In addition, since the file storage management method of the present invention can be installed inside the application, it is possible to realize optimum file storage management that matches the disk access characteristics of the application, and improve performance.
High reliability can be realized. Further, since the logical block and the logical parity block can be arranged at any place on the network, there is an effect that a highly reliable and high performance distributed file system can be easily constructed. It is also possible to put only parity in a device other than a disk device, and build a high cost performance file system by selecting a high performance disk for data and a low performance but inexpensive device for parity. It is possible. Further, since the user can decide whether or not to add the parity for each file, it is possible to provide the reliability that matches the user's request.

[Brief description of drawings]

【図１】論理ブロックとパリティブロックの対応付けを
示す説明図。FIG. 1 is an explanatory diagram showing correspondence between logical blocks and parity blocks.

【図２】第１実施例のシステム構成図。FIG. 2 is a system configuration diagram of the first embodiment.

【図３】本発明のファイル管理を実現する部分のブロッ
ク図。FIG. 3 is a block diagram of a part that realizes file management of the present invention.

【図４】ファイル管理用のテーブルの関係の説明図。FIG. 4 is an explanatory diagram of a relationship between file management tables.

【図５】ディスクアレイ管理部のブロック図。FIG. 5 is a block diagram of a disk array management unit.

【図６】ディスクアレイ使用状況管理テーブルの構成例
を示す説明図。FIG. 6 is an explanatory diagram showing a configuration example of a disk array usage status management table.

【図７】ディスクリード処理のフローチャート。FIG. 7 is a flowchart of disk read processing.

【図８】ディスクライト処理のフローチャート（１）。FIG. 8 is a flowchart (1) of disk write processing.

【図９】ディスクライト処理のフローチャート（２）。FIG. 9 is a flowchart (2) of disk write processing.

【図１０】ディスクライト処理のフローチャート
（３）。FIG. 10 is a flowchart (3) of disk write processing.

【図１１】バッファ管理テーブル割当処理及び物理ブロ
ックマッピング処理のフローチャート。FIG. 11 is a flowchart of a buffer management table allocation process and a physical block mapping process.

【図１２】ディスクシンク処理のフローチャート。FIG. 12 is a flowchart of disk sync processing.

【図１３】データ論理ブロック、論理パリティブロック
の更新処理の模式図。FIG. 13 is a schematic diagram of update processing of a data logical block and a logical parity block.

【図１４】ファイルの論理ブロックとパリティとの関係
について、従来技術と本発明とを対比して示す説明図。FIG. 14 is an explanatory diagram showing a relationship between a logical block of a file and a parity by comparing the related art with the present invention.

【図１５】第３実施例のファイルの論理ブロックと論理
パリティの関係を示す説明図。FIG. 15 is an explanatory diagram showing a relationship between a logical block and a logical parity of a file according to the third embodiment.

【図１６】第４実施例のファイル管理テーブルの実施例
を示す説明図。FIG. 16 is an explanatory diagram showing an example of a file management table of the fourth example.

【図１７】第４実施例のファイル名管理テーブルとデー
タファイル、パリティファイルの関係を示す説明図。FIG. 17 is an explanatory diagram showing a relationship between a file name management table, a data file, and a parity file according to the fourth embodiment.

【図１８】第５実施例のシステム構成を示すブロック
図。FIG. 18 is a block diagram showing the system configuration of a fifth embodiment.

【図１９】第６実施例のシステム構成を示すブロック
図。FIG. 19 is a block diagram showing the system configuration of a sixth embodiment.

【図２０】第７実施例のシステム構成を示すブロック
図。FIG. 20 is a block diagram showing the system configuration of a seventh embodiment.

【図２１】従来例のファイル管理部を中心とするシステ
ム構成を示すブロック図。FIG. 21 is a block diagram showing a system configuration centering on a file management unit of a conventional example.

[Explanation of symbols]

１０５４ファイルインデックス管理部１０５５バッファ管理部１０５６ファイルインデックス管理テーブル１０５７バッファ管理テーブル 1054 File index management unit 1055 Buffer management unit 1056 File index management table 1057 Buffer management table

───────────────────────────────────────────────────── フロントページの続き (72)発明者大枝高神奈川県横浜市戸塚区吉田町292番地株式会社日立製作所マイクロエレクトロニクス機器開発研究所内 (72)発明者高橋宏明神奈川県横浜市戸塚区吉田町292番地株式会社日立製作所マイクロエレクトロニクス機器開発研究所内 (72)発明者秋山仁神奈川県横浜市戸塚区吉田町292番地株式会社日立製作所マイクロエレクトロニクス機器開発研究所内 ─────────────────────────────────────────────────── ─── Continuation of the front page (72) Takashi Oeda Takashi Oeda 292 Yoshida-cho, Totsuka-ku, Yokohama-shi, Kanagawa Prefecture Hitachi Ltd. Microelectronics Device Development Laboratory (72) Inventor Hiroaki Takahashi Totsuka-ku, Yokohama-shi, Kanagawa 292 Yoshida-cho, Hitachi, Ltd. Microelectronics Device Development Laboratory (72) Inventor Hitoshi Akiyama 292, Yoshida-cho, Totsuka-ku, Yokohama, Kanagawa Prefecture Hitachi, Ltd. Microelectronics Device Development Laboratory

Claims

[Claims]

1. Data constituting the same file is divided into a plurality of data segments, one data block constitutes one data block, and the plurality of data blocks are distributed to a plurality of external disk devices. A file storage management device for determining a redundant data group composed of a plurality of data blocks stored in different disk devices among the data blocks stored in the plurality of disk devices, Determining means for determining the disk device storing each of the data and redundant data is obtained from the data contained in the data blocks in the same redundant data group, and a redundant data block composed of the redundant data is generated for each redundant data group. Means, the above files, and the above which constitutes the above files Data storage management comprising: first storage means for storing information on the correspondence relationship with the data block; and second storage means for storing information on the correspondence relationship between the data block and the redundant data block. apparatus.

2. The file storage management device according to claim 1, wherein information about the data block and the redundant data block regarding the same file is stored for each file. Management device.

3. The file storage management device according to claim 1, wherein the redundant data group has n data blocks, and when the number m of data blocks forming the file is not a multiple of n, By the surplus data block obtained by dividing m by n, and the data blocks forming other files,
A file storage management device comprising a redundant data group composed of n data blocks.

4. The file storage management device according to claim 1, wherein the redundant data group has n data blocks, and when the number m of data blocks forming the file is not a multiple of n, A file storage management device characterized in that one redundant data group is constituted by a surplus data block obtained by dividing m by n.

5. The file storage management device according to claim 4, wherein when the number of the surplus data blocks is one, the one data block constitutes one redundant data group, The file storage management device, wherein the redundant data block corresponding to the redundant data group is composed of the same data as the data block.

6. The file storage management device according to claim 1, wherein the second storage unit has information regarding the data block for each of the data blocks, and stores the information regarding the data block. A file storage management device, characterized in that it has, as a part, information regarding a correspondence relationship between the data block and the redundant data block.

7. The file storage management device according to claim 1, wherein one independent file is configured as a redundant data file in the entire plurality of redundant data blocks, and the second storage means is provided. A file storage management device having information for accessing the redundant data file.

8. The file storage management device according to claim 1, further comprising means for storing information on whether to generate redundant data for each file. apparatus.

9. The file storage management device according to claim 1, wherein the file storage management device is a file management system included in an operating system of a computer.

10. The file storage management device according to any one of claims 1 to 9, wherein the file storage management device is included in a redundant data group from which the redundant data is generated, out of the plurality of disk devices. There is a determining means for determining a disk device in which no data block is stored as a redundant data disk device for storing the generated redundant data block, and the determining means is one disk for each redundant data block. A file storage management device characterized in that there are a plurality of disk devices for which a device is determined and which is subject to the above determination.

11. The file storage management device according to any one of claims 1 to 9, wherein the file storage management device is included in a redundant data group from which the redundant data is generated, out of the plurality of disk devices. There is a determining means for determining a disk device in which no data block is stored as a redundant data disk device for storing the generated redundant data block, and the determining means is one disk for each redundant data block. A file storage management device, characterized in that a disk device for which a device is determined and which is a target of the above determination is a disk device for storing redundant data blocks exclusively.

12. A disk array type file system, comprising: a computer having the file storage management device according to claim 1; and a disk array device having a plurality of disk devices.

13. A disk array subsystem comprising the file storage management device according to claim 1 and a plurality of disk devices.

14. A computer system comprising a computer which is connected to a network and which has the file storage management device according to any one of claims 1 to 11, and a plurality of disk devices which are connected to the network.

15. The file storage management device according to claim 1, wherein the redundant data disk device to be determined is an optical disk device.

16. The file storage management device according to claim 1, wherein the redundant data disk device to be determined is a semiconductor storage device.

17. The file storage management device according to any one of claims 1 to 11, wherein the redundant data disk device to be determined is a magnetic tape device.