WO1998041915A1 - Disk array subsystem - Google Patents

Disk array subsystem

Info

Publication number
WO1998041915A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
disk
read
disk drive
control device
Prior art date
Application number
PCT/JP1997/000901
Other languages
English (en)
Japanese (ja)
Inventor
Yoshihito Usui
Toshio Nakano
Original Assignee
Hitachi, Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi, Ltd.
Priority to PCT/JP1997/000901
Publication of WO1998041915A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08: Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10: Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1076: Parity data used in redundant arrays of independent storages, e.g. in RAID systems

Definitions

  • The present invention relates to a disk array subsystem, and in particular to a disk array subsystem that guarantees a maximum response time to a higher-level device even if a failure occurs in a storage device.
  • In a conventional disk array subsystem, when a read request is issued from a higher-level device and the requested data cannot be read normally from the disk drive, the subsystem repeatedly reissues the read request to that disk drive and retries. When recovery within the single disk drive is not possible, the data is rebuilt from the remaining disk drives, including the one holding redundant data, which increases the reliability of the subsystem as a whole.
  • Japanese Unexamined Patent Publication (Kokai) No. 5-165855 (81) discloses that, when a failure occurs in one of the disk drives, the amount of data restored is changed according to the frequency of access from the higher-level device, so that access from the host device and the data restoration process are both executed efficiently.
  • JP-A-Hei JP, A 5-3-146060 describes that, when a failed data block is accessed, the data is restored using the data blocks and redundant data necessary for recovery and transferred to the higher-level device, which speeds up data transfer.
  • An object of the present invention is to provide a disk array subsystem that guarantees a maximum access time even when a read error occurs.
  • In the disk array subsystem of the present invention, when the read data requested by the host device cannot be read normally from the disk drive, the subsystem does not reissue the read request to the disk drive or retry the read. Instead, it stores the disk address of the disk drive that could not be read normally, immediately restores the normal data from the redundant data of the parity group to which the unreadable data belongs and the data of the other disk drives, and transfers the restored data to the host device. After completing the processing for the host device, it asynchronously performs maintenance and diagnostic processing on the failed drive and failed storage location recorded in the control unit, such as reissuing the read request and executing diagnostic commands. If recovery is impossible, it performs the closing (blocking) process for the drive.
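The restoration step described above can be illustrated with a minimal sketch. It assumes the redundant data is a byte-wise XOR parity of the data blocks in the parity group, which is the usual RAID 5 scheme but is not spelled out in the text; the names `xor_blocks`, `make_parity`, and `reconstruct` are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def make_parity(data_blocks):
    """Redundant data for one parity group (assumed to be XOR parity)."""
    return xor_blocks(data_blocks)

def reconstruct(surviving_blocks, parity):
    """Restore the single unreadable block from the survivors and the parity."""
    return xor_blocks(list(surviving_blocks) + [parity])

# One parity group spread over four data drives plus one parity drive.
stripe = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = make_parity(stripe)

lost = 2  # index of the block that could not be read
survivors = [blk for i, blk in enumerate(stripe) if i != lost]
restored = reconstruct(survivors, parity)
```

Because XOR is its own inverse, the block that failed to read is recovered from the other members of the group without touching the failed drive again.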
  • The control device of the disk array device of the present invention accumulates the usage rates of resources such as the control device and the disk drives and stores them in the control device. It also has a mechanism that monitors resource usage at regular intervals, and it performs the above maintenance and diagnostic processing asynchronously when the usage falls below a preset threshold.
  • The disk array device of the present invention stores the addresses of the data blocks requested by the higher-level device in the control device and determines the data access mode. If it determines that the read requests have no continuity, then, when the read data requested by the higher-level device cannot be read normally from the disk drive, it immediately starts data recovery from the remaining disk drives, including the redundant disk drive, without any retry attempt.
  • If the read request from the higher-level device is a sequential read of continuous blocks, then when data is read from the multiple disk drives, the data requested by the higher-level device, the other data of the parity group to which it belongs, and the redundant data are all stored in the cache memory. If a data block is then found not to have been read normally, the unreadable data is restored from the data stored in the cache memory and transferred to the higher-level device without a read retry.
  • The disk array subsystem of the present invention stores the unreadable disk drives and data blocks in the control device and performs diagnosis and maintenance asynchronously at a later time. Access to these data blocks is not performed until the diagnosis and maintenance are completed.
  • The disk array subsystem of the present invention has a mechanism for monitoring a read error rate for each disk drive.
  • The disk array subsystem of the present invention treats a read error rate of a specific drive exceeding a specified threshold as a sign of disk drive failure, and shifts to a mode in which the redundant data portion is also read during sequential reads. If a read then fails, the data is restored using the redundant data and the remaining data already stored in the cache, which speeds up data transfer to the host device.
  • FIG. 1 is a configuration diagram of a disk subsystem of the present invention.
  • FIG. 2 is a functional block diagram of a program operated by a microprocessor in the control device of the present invention.
  • FIG. 3 is a flowchart of a processing procedure of the disk array subsystem of the present invention.
  • FIG. 4 is a flowchart of a data transfer processing procedure at the time of random read.
  • FIG. 5 is a flowchart of a process when a read request for a disk drive for which diagnosis and maintenance have not been performed is received after a read error is detected.
  • FIG. 6 is a flowchart of the processing procedure in the case of RAID 3.
  • BEST MODE FOR CARRYING OUT THE INVENTION
  • FIG. 1 is an example of a configuration of a disk subsystem of the present invention.
  • In FIG. 1, the host device 10 is a central processing unit that performs data processing.
  • Reference numeral 20 denotes the control device of the disk array subsystem, and reference numerals 30-1 to 30-5 denote individual disk drives that store the data of the host device and are controlled by the control device 20.
  • In this example the number of disk drives is five, but it need not be five.
  • The control device 20 controls data transfer between the host device 10 and the disk drives. It has an upper data transfer control unit 101, which is a protocol processor including a DMA for controlling data transfer to and from the host device 10; a disk data transfer control unit 102, which is a protocol processor including a disk-side DMA; a disk cache 104 for temporarily storing data read from and written to the disk drives; and a cache control unit 103 that controls the reading and writing of data to the disk cache 104.
  • The control device 20 also has a microprocessor 105 for controlling the flow of data in the control device 20 and a control memory 106 for storing control data.
  • When the upper data transfer control unit 101 recognizes an access request from the host device 10, it issues an interrupt request to the microprocessor 105, and the microprocessor 105 stores the command from the host device 10 in the control memory 106. If the command is a write, the microprocessor 105 instructs the upper data transfer control unit 101 to transfer the data and stores the data in the cache 104. After the data is stored in the cache 104, the upper data transfer control unit 101 issues a write completion report to the host device 10.
  • RAID 5 disk arrays require the old data and the old parity when updating data. If these are not in the cache 104, the microprocessor 105 instructs the disk data transfer control unit 102 to read them from the disk drives 30-1 to 30-5, and the data read out is stored in the cache 104. When the old data and the old parity are in the cache 104, the microprocessor 105 uses the old data, the new data, and the old parity in the cache 104 to generate the new parity and stores it in the cache 104. Thereafter, the microprocessor 105 instructs the disk data transfer control unit 102 to write the new data and the new parity in the cache 104 to the disk drives 30-1 to 30-5.
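As a rough illustration of why the old data and old parity are needed on a write, the sketch below assumes XOR parity, so that the new parity equals the old parity XOR the old data XOR the new data. The function names `xor_bytes`, `full_parity`, and `updated_parity` are hypothetical and the block contents are made up.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def full_parity(blocks):
    """Parity recomputed from every data block in the parity group."""
    parity = bytes(len(blocks[0]))  # all-zero block of the same size
    for blk in blocks:
        parity = xor_bytes(parity, blk)
    return parity

def updated_parity(old_data, new_data, old_parity):
    """New parity from old data, new data, and old parity only."""
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)

stripe = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0", b"\xaa\x55"]
old_parity = full_parity(stripe)

new_block = b"\x7e\x81"
new_parity = updated_parity(stripe[1], new_block, old_parity)
stripe[1] = new_block  # the new data replaces the old block
```

The update touches only the drive holding the changed block and the drive holding the parity; the other drives in the group are not read.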
  • If the command is a read, the microprocessor 105 instructs the disk data transfer control unit 102 to access whichever of the disk drives 30-1 to 30-5 stores the requested data block, and the data read is stored in the cache 104. The upper data transfer control unit 101 then transfers the data stored in the cache 104 to the host device 10 in accordance with the instruction of the microprocessor 105 and issues a completion report.
  • When the data cannot be read, the disk data transfer control unit 102 reads the other data and the redundant data of the parity group from the disk drives 30-1 to 30-5 and stores them in the cache 104.
  • The microprocessor 105 regenerates the data from that data and the redundant data, and the upper data transfer control unit 101 transfers it to the host device 10.
  • FIG. 2 is a functional block diagram of the microprogram that operates on the microprocessor 105 of the disk array device of the present invention.
  • To realize the functions of the present invention, the microprogram gives the microprocessor 105 the following three functions in addition to the basic RAID function 201.
  • The first function is an access mode monitoring function 202. This function stores in the control memory 106 the addresses of the data requested by past read requests from the host device 10. If the addresses are for continuous blocks, it recognizes the access as a sequential read and instructs the disk data transfer control unit 102 to pre-read data blocks.
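A minimal sketch of such an access mode monitor follows. The window of three accesses is taken from the step 303 check in the FIG. 3 description; treating "continuous" as block addresses that each increase by exactly one is an assumption, and the class name `AccessModeMonitor` is hypothetical.

```python
from collections import deque

class AccessModeMonitor:
    """Keeps the most recent read addresses and classifies the access mode."""

    def __init__(self, window=3):
        self.window = window
        self.history = deque(maxlen=window)  # oldest entries drop out automatically

    def record(self, block_address):
        self.history.append(block_address)

    def is_sequential(self):
        # Not enough history yet: treat as random.
        if len(self.history) < self.window:
            return False
        addrs = list(self.history)
        # Assumed meaning of "continuous": each address follows the previous one.
        return all(b - a == 1 for a, b in zip(addrs, addrs[1:]))

monitor = AccessModeMonitor()
for address in (10, 11, 12):
    monitor.record(address)
after_run = monitor.is_sequential()

monitor.record(40)  # a jump breaks the run
after_jump = monitor.is_sequential()
```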
  • The second function is a resource usage monitoring function 203. It monitors how long resources such as the controller 20 and the disk drives 30-1 to 30-5 are in use within a fixed interval and calculates the usage rate by dividing the time in use by the interval.
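The usage rate calculation can be sketched as follows. The class `ResourceUsageMonitor`, the resource labels, and the rule that the subsystem counts as idle only when every monitored resource is below the threshold are assumptions made for illustration.

```python
class ResourceUsageMonitor:
    """Accumulates busy time per resource over one monitoring interval."""

    def __init__(self, interval_seconds):
        self.interval = interval_seconds
        self.busy = {}

    def add_busy_time(self, resource, seconds):
        self.busy[resource] = self.busy.get(resource, 0.0) + seconds

    def usage_rate(self, resource):
        # Usage rate = time in use / length of the monitoring interval.
        return self.busy.get(resource, 0.0) / self.interval

    def is_idle(self, threshold):
        # Assumed rule: every monitored resource must be below the threshold.
        return all(self.usage_rate(r) < threshold for r in self.busy)

usage = ResourceUsageMonitor(interval_seconds=10.0)
usage.add_busy_time("controller-20", 2.5)
usage.add_busy_time("drive-30-1", 1.0)
```

With these figures, `usage.usage_rate("controller-20")` is 0.25, so deferred diagnosis would be allowed under a threshold of 0.3 but not under 0.2.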
  • The third function is the read error rate monitoring function 204.
  • For each of the disk drives 30-1 to 30-5, the number of read errors that occurred when the disk data transfer control unit 102 read data is accumulated, the read error rate is calculated by dividing that count by the total number of accesses, and the result is stored in the control memory 106.
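A minimal sketch of the per-drive bookkeeping, assuming one counter pair per drive and a strict "greater than" comparison against the threshold. The class `ReadErrorRateMonitor` and the method `suspect_drives` are hypothetical names.

```python
class ReadErrorRateMonitor:
    """Tracks read errors and total reads for each disk drive."""

    def __init__(self, drive_ids):
        self.accesses = {d: 0 for d in drive_ids}
        self.errors = {d: 0 for d in drive_ids}

    def record_read(self, drive_id, ok):
        self.accesses[drive_id] += 1
        if not ok:
            self.errors[drive_id] += 1

    def error_rate(self, drive_id):
        total = self.accesses[drive_id]
        # Error rate = read errors / total accesses (0 if never accessed).
        return self.errors[drive_id] / total if total else 0.0

    def suspect_drives(self, threshold):
        # Drives whose error rate exceeds the threshold (sign of failure).
        return [d for d in self.accesses if self.error_rate(d) > threshold]

rates = ReadErrorRateMonitor(["30-1", "30-2", "30-3", "30-4", "30-5"])
for ok in [True] * 8 + [False] * 2:
    rates.record_read("30-2", ok)
for _ in range(10):
    rates.record_read("30-1", True)
```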
  • FIG. 3 is a flowchart of processing by the disk array subsystem of the present invention.
  • When the upper data transfer control unit 101 in the control device 20 receives an access request from the host device 10 (step 301), it identifies whether the command is a read request or a write request (step 302). If it is a write request, the write processing described above (step 700) is performed. If it is a read request, the microprocessor 105 uses the access mode monitoring function 202 to determine whether the last three accesses were to continuous block addresses (step 303).
  • If they were not, random read processing (step 400) is performed. The random read processing is described later (see FIG. 4).
  • If they were, the access from the host device 10 is recognized as a sequential read (step 304).
  • The read error rate monitoring function 204 then checks the read error rate of each of the disk drives 30-1 to 30-5 (step 300). If the read error rate of every disk drive is at or below the threshold, normal sequential read processing (step 800) is performed. That is, the disk data transfer control unit 102 reads into the cache 104 the requested data block and the data blocks at neighboring addresses, but not the redundant data, and reads the redundant data only when a read error occurs.
  • When the microprocessor 105, using the read error rate monitoring function 204, recognizes that the read error rate of one of the disk drives 30-1 to 30-5 exceeds the threshold (step 305), processing shifts to the following method.
  • The read error rate monitoring function 204 is not indispensable. When a sequential read is recognized, processing may shift immediately to the method of step 306 and thereafter.
  • The microprocessor 105 instructs the disk data transfer control unit 102 to read from the disk drives 30-1 to 30-5 not only the block requested by the host device 10 but also, at the same time, the remaining data blocks and the redundant data of the parity group containing the requested block, and to store them in the cache 104 (step 306).
  • If the disk data transfer control unit 102 can read the data requested by the host device 10 normally, the upper data transfer control unit 101 reconnects to the host device 10 and transfers the data (step 307). If the requested data cannot be read normally, the microprocessor 105 causes the disk data transfer control unit 102 to suppress retries of the read request for the block address of that data, and the disk and block address of the data that could not be read normally are stored in the control memory 106 (step 308). The microprocessor 105 then checks whether all the data and redundant data necessary for restoring the unreadable data are stored in the cache 104 (step 309).
  • If they are not, the disk data transfer control unit 102 reads the missing data from the disk drives 30-1 to 30-5 according to the instruction of the microprocessor 105 and stores it in the cache 104 (step 320). The microprocessor 105 then restores the data using the data and the redundant data in the cache 104 (step 310) and stores the result in the cache 104. The upper data transfer control unit 101 reconnects to the host device 10 (step 311) and transfers the restored data (step 312).
  • After transferring the data to the host device 10, the upper data transfer control unit 101 issues a completion report (step 313), and the cache control unit 103 discards from the cache 104 only the redundant data that was read and held for data restoration (step 314).
  • The microprocessor 105 uses the resource usage monitoring function 203 to check the resource usage of the controller 20 and the disk drives 30-1 to 30-5. If it recognizes that the usage is higher than the preset threshold (step 315), the error rate of each of the disk drives 30-1 to 30-5 is updated by the read error rate monitoring function 204 (step 319) and stored in the control memory, and the processing ends. Processing for the next access request from the host device 10 then starts.
  • If the usage is not higher than the threshold, the microprocessor 105 checks whether the address of a data block that has not yet undergone diagnosis and maintenance is stored in the control memory 106 (step 316). If not, it updates the error rate in the same manner as described above (step 319) and ends the processing. If there is, it executes diagnosis and maintenance for that address (step 317). If a disk failure is recognized as a result of the diagnosis and maintenance (step 318), the flow proceeds to the blocking processing (step 900). If no disk failure is recognized, the microprocessor 105 updates the error rate and the processing ends, and processing for the next access request from the host device 10 starts.
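The deferred diagnosis decision in steps 315 to 318 and 900 can be sketched as below. This is an illustration under stated assumptions, not the patented control program: the function `maintenance_step`, the `diagnose` callback (returning True when the drive is healthy), the queue of `(drive, block)` pairs, and handling one pending address per call are all hypothetical. The error rate update of step 319 is left out.

```python
from collections import deque

def maintenance_step(usage_rate, threshold, pending, diagnose, blocked_drives):
    """Run after a host transfer completes; handles at most one pending address."""
    if usage_rate > threshold:           # step 315: resources busy, defer
        return "deferred"
    if not pending:                      # step 316: nothing awaiting diagnosis
        return "nothing pending"
    drive, block = pending.popleft()
    if diagnose(drive, block):           # steps 317 and 318
        return "recovered"
    blocked_drives.add(drive)            # step 900: close (block) the drive
    return "blocked"

def healthy(drive, block):
    """Stand-in diagnostic: pretend only drive 30-4 has really failed."""
    return drive != "30-4"

pending = deque([("30-3", 1024), ("30-4", 77)])
blocked = set()

outcomes = [
    maintenance_step(0.9, 0.5, pending, healthy, blocked),  # busy
    maintenance_step(0.1, 0.5, pending, healthy, blocked),  # diagnoses 30-3
    maintenance_step(0.1, 0.5, pending, healthy, blocked),  # diagnoses 30-4
    maintenance_step(0.1, 0.5, pending, healthy, blocked),  # queue empty
]
```

Keeping the diagnosis out of the host read path is what lets the read itself finish in bounded time; the queue only drains while the usage rate is low.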
  • FIG. 4 is a flowchart of a data transfer process when an access request from a host device is a random read.
  • The disk data transfer control unit 102 reads the data block requested to be accessed from the disk drives 30-1 to 30-5 (step 402) and stores it in the cache 104.
  • If the disk data transfer control unit 102 can read the data requested by the host device 10 normally (step 403), the upper data transfer control unit 101 reconnects to the host device 10 (step 408) and transfers the data (step 409).
  • If the data cannot be read normally, the microprocessor 105 instructs the disk data transfer control unit 102 to suppress retries of the read request for the block address of the unreadable data, and the disk and block address are stored in the control memory 106 (step 404).
  • The microprocessor 105 has the data and redundant data necessary for restoring the data read from the disk drives 30-1 to 30-5 (step 405) and checks whether these data are stored in the cache 104 (step 406). If they are not, the disk data transfer control unit 102 reads the missing data from the disk drives 30-1 to 30-5 according to the instruction of the microprocessor 105 (step 411) and stores it in the cache 104. Next, the microprocessor 105 restores the data using the data and the redundant data in the cache 104 (step 407) and stores the result in the cache. After that, the upper data transfer control unit 101 reconnects to the host device 10 (step 408), transfers the restored data (step 409), and issues a completion report (step 410).
  • The processing after step 411 is the same as the processing after step 314 in FIG. 3, and therefore the description is omitted.
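The random read flow of FIG. 4 can be sketched end to end with a simulated set of drives. The assumptions here are XOR parity, one block per drive at each address forming a parity group, and a read failure surfacing as an exception; `Drive`, `ReadError`, `xor_blocks`, and `random_read` are hypothetical names.

```python
class ReadError(Exception):
    """Raised by the simulated drive when a block cannot be read."""

class Drive:
    def __init__(self, blocks, bad=()):
        self.blocks = dict(blocks)   # address -> block contents
        self.bad = set(bad)          # addresses that fail to read
        self.read_count = 0

    def read(self, addr):
        self.read_count += 1
        if addr in self.bad:
            raise ReadError(addr)
        return self.blocks[addr]

def xor_blocks(blocks):
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, byte in enumerate(blk):
            out[i] ^= byte
    return bytes(out)

def random_read(drives, target, addr, failed_log):
    try:
        return drives[target].read(addr)        # step 402
    except ReadError:
        failed_log.append((target, addr))       # step 404: record, no retry
        peers = [d.read(addr)                   # other data plus parity
                 for i, d in enumerate(drives) if i != target]
        return xor_blocks(peers)                # step 407: restore

data = [b"disk", b"read", b"fail", b"safe"]
drives = [Drive({0: blk}) for blk in data] + [Drive({0: xor_blocks(data)})]
drives[2].bad.add(0)                            # drive index 2 cannot read address 0

failed_log = []
result = random_read(drives, target=2, addr=0, failed_log=failed_log)
```

The failing drive is read exactly once, so the time spent on the failed request is bounded by one read attempt plus one read on each of the other drives.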
  • When a read request is received for a block address on a disk drive 30 (30-1 to 30-5) that is stored in the control memory 106 as a block address where a read error has occurred and for which diagnosis and maintenance have not yet been performed, the following control method may be adopted. The disk data transfer control unit 102 does not access the requested block (step 503); the other data in the parity group necessary for restoring this data and the redundant data are read from the disk drives 30-1 to 30-5 (step 504); the microprocessor 105 restores the data using these data (step 506); and the upper data transfer control unit 101 transfers the data to the host device 10 (steps 507 to 509).
  • The processing after step 510 is the same as the processing after step 314 in FIG. 3, and a description thereof will be omitted.
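A minimal sketch of the FIG. 5 behaviour: a block still awaiting diagnosis is never accessed, and the data is rebuilt from the rest of the parity group instead. XOR parity is assumed, and `read_block`, `read_drive`, and the `pending` set of `(drive, address)` pairs are hypothetical.

```python
def xor_blocks(blocks):
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, byte in enumerate(blk):
            out[i] ^= byte
    return bytes(out)

def read_block(read_drive, n_drives, target, addr, pending):
    if (target, addr) in pending:
        # Step 503: skip the suspect block entirely.
        peers = [read_drive(d, addr)            # step 504: rest of the group
                 for d in range(n_drives) if d != target]
        return xor_blocks(peers)                # step 506: restore
    return read_drive(target, addr)

data = [b"aaaa", b"bbbb", b"cccc", b"dddd"]
blocks = data + [xor_blocks(data)]              # last entry plays the parity drive
touched = []                                    # which drives were accessed

def read_drive(drive, addr):
    touched.append(drive)
    return blocks[drive]

pending = {(1, 0)}                              # drive 1, address 0 awaits diagnosis
result = read_block(read_drive, len(blocks), target=1, addr=0, pending=pending)
```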
  • FIG. 6 is a flowchart of a process when the present invention is applied to a RAID 3 disk array.
  • In this case, the same control method as in the sequential read described above is adopted regardless of whether the access from the host device 10 is a random read or a sequential read.
  • The disk control unit 103 accesses the disk drives 30-1 to 30-5 to read the data bits in order and also reads the redundant data portion (step 604), storing both in the disk cache 104. If no error is detected (step 605), the upper data transfer control unit 101 transfers the data to the host device 10 as it is (step 609). If an error is detected (step 605), the microprocessor 105 restores the data using the normal data bits and the redundant data already stored in the cache 104 (step 605). After that, the upper data transfer control unit 101 transfers the restored data to the host device 10.
  • Alternatively, a control method may be adopted in which the disk control unit 103 accesses only the data disks among the disk drives 30-1 to 30-5, excluding the parity disk (step 619), and reads the redundant data portion necessary for data recovery from the parity disk only when an error is detected.
  • In the disk array device of the present invention, when a read error occurs, the data is immediately restored from the redundant data without reissuing the read request and is transferred to the higher-level device, so there is no delay in access time due to rotational waiting.
  • The block address of the relevant disk drive at the time of the read error is stored, and, based on monitoring of the utilization of resources such as the controller and the disk drives, diagnosis and maintenance are performed independently when resources are free after the processing completes. There is therefore no delay in access time during reading.
  • Sequential data such as moving images can be transferred to the host device without interruption.
  • The data transfer speed is reduced by about 30% at the maximum.
  • Even when a drive failure occurs, performance equivalent to that at normal times can be maintained.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Techniques For Improving Reliability Of Storages (AREA)

Abstract

The invention relates to a disk array device that guarantees a maximum access time even when an error occurs while reading data from a disk drive. When a data block for which an access request has been made by a higher-level device (10) cannot be read normally from the disk drives (30-1 to 30-5), no new read request is issued to the same disk drives and no retry is performed; instead, the normal data are restored immediately from redundant data and from data read from the disk drives other than the faulty one, and the restored data are transferred to the higher-level device (10). A control device (20) performs diagnosis and maintenance independently while the data in the disk array device are not being used. Thus, during data transfer, the higher-level device (10) does not have to wait even when a read error occurs.
PCT/JP1997/000901 1997-03-19 1997-03-19 Disk array subsystem WO1998041915A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP1997/000901 WO1998041915A1 (fr) 1997-03-19 1997-03-19 Disk array subsystem

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP1997/000901 WO1998041915A1 (fr) 1997-03-19 1997-03-19 Disk array subsystem

Publications (1)

Publication Number Publication Date
WO1998041915A1 (fr) 1998-09-24

Family

ID=14180258

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP1997/000901 WO1998041915A1 (fr) 1997-03-19 1997-03-19 Disk array subsystem

Country Status (1)

Country Link
WO (1) WO1998041915A1 (fr)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS63244237A (ja) * 1987-03-31 1988-10-11 Toshiba Corp Information processing device
JPH02146638A (ja) * 1988-11-29 1990-06-05 Fujitsu Ltd Device diagnosis system
JPH03225416A (ja) * 1990-01-31 1991-10-04 Hitachi Ltd Parallel data transfer system
JPH05289818A (ja) * 1992-04-08 1993-11-05 Hitachi Ltd Disk array control system
JPH0651915A (ja) * 1992-08-03 1994-02-25 Hitachi Ltd Disk device and disk array management system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7415636B2 (en) 2004-09-17 2008-08-19 Fujitsu Limited Method and apparatus for replacement processing
JP2009282848A (ja) * 2008-05-23 2009-12-03 Fujitsu Ltd Abnormality determining apparatus
JP4627327B2 (ja) * 2008-05-23 2011-02-09 Fujitsu Limited Abnormality determining apparatus
US8089712B2 (en) 2008-05-23 2012-01-03 Fujitsu Limited Abnormality determining apparatus

Similar Documents

Publication Publication Date Title
EP0502207B1 (fr) Input/output controller
US6442711B1 (en) System and method for avoiding storage failures in a storage array system
JP5887757B2 (ja) Storage system, storage control device, and storage control method
JP2905373B2 (ja) Disk control device and control method thereof
JP3732869B2 (ja) External storage device
US7383380B2 (en) Array-type disk apparatus preventing lost data and providing improved failure tolerance
JP2501752B2 (ja) Storage device of a computer system and method of storing data
US6321346B1 (en) External storage
EP0945801B1 (fr) External storage device and data backup method
JP2548480B2 (ja) Disk device diagnosis method for an array disk device
CN1300696C (zh) Storage controller and data storage method
US20020038436A1 (en) Disk array apparatus, error control method for the same apparatus, and control program for the same method
JP4939180B2 (ja) Execution of initialization code for configuring connected devices
JP3681766B2 (ja) Disk array device
JP3284963B2 (ja) Disk array control device and control method
JPH1195933A (ja) Disk array device
WO1998041915A1 (fr) Disk array subsystem
US5878202A (en) I/O device having check recovery function
JP2000276305A (ja) Disk array device
JP2548475B2 (ja) Data restoration amount control method for an array disk device
JP2735801B2 (ja) Input/output control device
JPH06242888A (ja) Disk array device, computer system, and data storage device
JPH08147112A (ja) Error recovery device for a disk array device
JP2002123372A (ja) Disk array device with cache memory, error control method therefor, and recording medium recording the control program therefor
JPH11154058A (ja) Disk array device and data maintenance method

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): CN JP KR US

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase