CA2127380A1 - Computer memory array control - Google Patents

Computer memory array control

Info

Publication number
CA2127380A1
Authority
CA
Canada
Prior art keywords
data
memory
buffer
memory units
host computer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
CA002127380A
Other languages
French (fr)
Inventor
Andrew James William Hill
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ARRAY DATA Ltd
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Publication of CA2127380A1
Legal status: Abandoned

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1076Parity data used in redundant arrays of independent storages, e.g. in RAID systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1008Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0866Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B20/00Signal processing not specific to the method of recording or reproducing; Circuits therefor
    • G11B20/10Digital recording or reproducing
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B20/00Signal processing not specific to the method of recording or reproducing; Circuits therefor
    • G11B20/10Digital recording or reproducing
    • G11B20/18Error detection or correction; Testing, e.g. of drop-outs
    • G11B20/1833Error detection or correction; Testing, e.g. of drop-outs by adding special lists or symbols to the coded information
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0673Single storage device
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B20/00Signal processing not specific to the method of recording or reproducing; Circuits therefor
    • G11B20/10Digital recording or reproducing
    • G11B2020/10916Seeking data on the record carrier for preparing an access to a specific address

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Techniques For Improving Reliability Of Storages (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Debugging And Monitoring (AREA)

Abstract

A computer memory controller for interfacing to a host computer comprises a buffer memory (26) for interfacing to a plurality of memory units (42) and for holding data transferred thereto and therefrom, and a central controller (22) operative to control the transfer of data to and from the host computer and the memory units (42). The buffer memory (26) is controlled to form a plurality of buffer segments for addressably storing data read from or written to the memory units (42). The central controller (22) is operative to allocate a buffer segment, of a size sufficient for the data, for a read or write request from the host computer. The central controller (22) is also operative, in response to data requests from the host computer, to control the memory units (42) to seek data stored in different memory units (42) simultaneously.

Description

COMPUTER MEMORY ARRAY CONTROL

This invention relates to computer memories, and in particular to a controller for controlling, and a method of controlling, an array of memory units in a computer.
For high performance Operating Systems and Fileservers, an idealistic computer memory would be a memory having no requirement to "seek" the data. Such a memory would have instantaneous access to all data areas. Such a memory could be provided by a RAM disk. This would provide for access to data regardless of whether it was sequential or random in its distribution in the memory. However, the use of RAM is disadvantageous compared to the use of conventional magnetic disk drive storage media in view of the high cost of RAM, and especially due to the additional high cost of providing "redundancy" to compensate for failure of memory units.
Thus the most commonly used non-volatile computer memories are magnetic disk drives. However, these disk drives suffer from the disadvantage that they require a period of time to position the head or heads over the correct part of the disk corresponding to the location of the data. This is termed the seek and rotation delay. This delay becomes a significant portion of the data access time when only a small amount of data is to be read from or written to the disk.
For disk drives, the seek and rotational latency times can considerably limit the operating speed of a computer. The input/output (I/O) speed of disk drives has not kept pace with the development of microprocessors and therefore memory access time can severely restrain the performance of modern computers.

In order to reduce the data access time for a large memory, a number of industry standard, relatively inexpensive disk drives have been used. Since a large array of these is used, some redundancy must be incorporated in the array to compensate for disk drive failure.
It is known to provide disk drives in an array of drives in such a way that the contents of any one drive can, should that drive fail, be reconstructed in a replacement drive from the information stored in the other drives.
Various classifications of arrangements that can perform this are described in more detail in a paper by D.A. Patterson, G. Gibson and R.H. Katz under the title "A Case for Redundant Arrays of Inexpensive Disks (RAID)", Report No. UCB/CSD 87/391, 12/1987, Computer Science Division, University of California, U.S.A.
This document describes two types of arrangements.
The first of these arrangements is particularly adapted for large scale data transfer and is termed "RAID-3". In this arrangement at least three disk drives are provided in which sequential bytes of information are stored in the same logical block positions on the drives, one drive having a check byte created by a controller written thereto, which enables any one of the other bytes on the disk drives to be determined from the check byte and the other bytes. The term "RAID-3" as used hereinafter is as defined by the foregoing passage.
In the RAID-3 arrangement there are preferably at least five disk drives, with four bytes being written to the first four drives and the check byte being written to the fifth drive, in the same logical block position as the data bytes on the other drives. Thus, if any drive fails, each byte stored on it can be reconstructed by reading the other drives. Not only can the computer be arranged to continue to operate despite failure of a disk drive, but also the failed disk drive can be replaced and rebuilt without the need to restore its contents from probably out-of-date backup copies. Moreover, even if one drive should fail, there is no loss of performance of the computer while the failed disk drive remains inactive and while it is replaced. A disk drive storage system having the RAID-3 arrangement is described in EP-A-0320107, the content of which is incorporated herein by reference.
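The byte-level redundancy just described can be illustrated with a short sketch (a minimal illustration; the function names and byte values are not taken from the specification): the four data bytes occupying the same logical block position on drives A to D are XORed to form the check byte on the fifth drive, and any single lost byte can be recovered by XORing the survivors with the check byte.

from functools import reduce
from operator import xor

def make_check_byte(data_bytes):
    """RAID-3 style check byte: bitwise XOR of the four data bytes."""
    return reduce(xor, data_bytes, 0)

def rebuild_byte(surviving_bytes, check_byte):
    """Recover the byte of a failed drive from the three survivors and the
    check byte (XOR is its own inverse)."""
    return reduce(xor, surviving_bytes, check_byte)

data = [0x41, 0x6E, 0x64, 0x79]                       # bytes striped across drives A-D
check = make_check_byte(data)                         # stored on drive E
recovered = rebuild_byte(data[:2] + data[3:], check)  # suppose drive C has failed
assert recovered == data[2]
print(f"check byte {check:#04x}, rebuilt byte {recovered:#04x}")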
The second type of storage system, which is particularly adapted for multi-user applications, is termed "RAID-5". In the RAID-5 arrangement there are preferably at least five disk drives in which four sectors of each disk drive are arranged to store data and one sector stores check information. The check information is derived not from the data in the four sectors on the disk, but from designated sectors on each of the other four disks. Consequently each disk can be rebuilt from the data and check information on the remaining disks.
RAID-5 is seen to be advantageous, at least in theory, because it allows multi-user access, albeit with the equivalent transfer performance of a single disk drive.
However, a write of one sector of information involves writing to two disks, that is to say writing the information to one sector on one disk drive and writing check information to a check sector on a second disk drive. Moreover, writing the check sector is a read-modify-write operation, that is, a read of the existing data and check sectors first, because the old contents of those sectors must be known before the correct check information, based on the new data to be written, can be generated and written to disk. Nevertheless, RAID-5 does allow simultaneous reads by multiple users from all disks in the system, which RAID-3 cannot support.
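The read-modify-write penalty described above can be expressed compactly: for a small write, the new check information is derived from the old data, the old check information and the new data, without reading the other data drives. A minimal sketch (variable names are illustrative, not from the specification); a sector is modelled here as a single integer for brevity.

from functools import reduce
from operator import xor

def rmw_parity_update(old_data, old_parity, new_data):
    # RAID-5 small write: read the old data and old parity sectors first, then
    # new_parity = old_parity XOR old_data XOR new_data.
    return old_parity ^ old_data ^ new_data

old_data, other_data = 0b1010, [0b0110, 0b0001, 0b1111]
old_parity = reduce(xor, other_data, old_data)          # parity over the four data sectors
new_data = 0b0011
new_parity = rmw_parity_update(old_data, old_parity, new_data)
assert new_parity == reduce(xor, other_data, new_data)  # still covers the whole group
print(bin(new_parity))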
On the other hand, RAID-5 cannot match the rate of data transfer achievable with RAID-3, because with RAID-3, both read and write operations involve a transfer to each of the five disks (in five disk systems) of only a quarter of the total amount of information transferred. Since each transfer can be accomplished simultaneously, the process is much faster than reading or writing to a single disk, particularly where large scale transfers are involved.
This is because most of the time taken to effect a read or write in respect of a given disk drive is the time taken for the read/write heads to be positioned with respect to the disk, and for the disk to rotate to the correct angular position. Clearly, this is as long for one disk as it is for all four. But once in the correct position, transfers of large amounts of sequential information can be effected relatively quickly.
Moreover, with the current trend for sequential information to be requested by the user, RAID-5 only offers multiple user access in theory, rather than in practice, because requests for sequential information by the same user may involve reading several disks in turn, thereby occupying those disks so that they are not available to other users.
Furthermore, when a drive fails in RAID-5 format, the performance of the computer is severely retarded. When reading, if the required information is on a sector in the failed drive, it must be derived by reading all four of the other disks. Similarly, when writing either check or information data to a working drive, the four working disks must first be read before the appropriate information sector is written and before the appropriate check information is determined and written.

A further problem with RAID-3 is that disk drives are presently made to read or write minimum amounts of information on each given occasion. This is the formatted sector size of the disk drive and there is usually a minimum of 256 Bytes. In RAID-3 format this means that the minimum block length on any read or write is 1,024 Bytes. With growing disk drive capacities the tendency is towards even larger minimum block sizes such as 512 Bytes, so that RAID-3 effectively quadruples that minimum to 2,048 Bytes. However, many applications for computers, for example those employing UNIX version 5.3, require a minimum block size of only 512 Bytes, and in this event the known RAID-3 technique is not easily available to such systems. RAID-5 on the other hand does not increase the minimum data block size.
Nevertheless, it is the multi-user capability of RAID-5 which makes it theoretically more advantageous than RAID-3; but, in fact, it is the data transfer rate and continued performance in the event of drive failure in RAID-3 format which gives the latter much greater potential. So it is an object of the present invention to provide a system which exhibits the same multi-user capability as a RAID-5 disk array, or indeed better capability in that respect. The inventor has previously developed a system which has been termed RAID-35 and which is disclosed in the specification of PCT/GB90/01557. This system offers the same if not better performance than RAID-3 and RAID-5. This system recognises that with modern operating systems, data files tend to be sequential in the nature of their storage on the disk drive surface, and read and write operations tend to be sequential or at least partially sequential in nature. Thus even with multi-user access to a disk storage medium, each user may require some sequential data in sequential requests.

The RAID-35 system vastly reduces the delay in a host computer receiving data requested from the disk array, since sequential data is read ahead and stored in buffer segments. Thus if the requested data is sequential to a previous request then there is no seek delay, since the data is present in the buffer segment.
The RAID-35 system is thus highly efficient for applications where users are likely to request sequential data. On the other hand, if the data requests are random, the advantages of the RAID-35 system cannot be realised.
It is an object of the present invention to provide a computer memory controller capable of providing a host computer with random data in a fast and efficient manner.
It is also an object of the present invention to provide a computer memory controller capable of operating the RAID-35 arrangement and capable of being interfaced to a three dimensional array of memory units.
It is also an object of the present invention to provide a computer memory controller capable of operating the RAID-35 arrangement as well as providing a host computer with random data in a fast and efficient manner.
The present invention provides a computer memory controller for interfacing to a host computer comprising: buffer means for interfacing to a plurality of memory units and for holding data transferred thereto and therefrom; and control means operative to control the transfer of data to and from said host computer and said memory units; said buffer means being controlled to form a plurality of buffer segments for addressably storing data read from or written to said memory units; said control means being operative to allocate a buffer segment for a read or write request from the host computer, of a size sufficient for the data; said control means being further operative in response to data requests from said host computer to control said memory units to seek data stored in different memory units simultaneously.
The present invention also provides a method of controlling a plurality of memory units for use with a host computer comprising the steps of repeatedly receiving from said host computer a read request for data stored in said memory units and allocating a buffer segment of sufficient size for the data to be read; and seeking data in said plurality of memory units simultaneously.
The present invention further provides a computer memory controller for a host computer comprising: buffer means for interfacing to at least three memory channels arranged in parallel, each memory channel comprising a plurality of memory units connected by a bus such that each memory unit of said memory channel is independently accessible, respective memory units of said memory channels forming a memory bank; a logic circuit connected to said buffer means to split data input from said host computer into a plurality of portions such that said portions are temporarily stored in a buffer segment before being applied to ones of a group of said memory channels for storage in a memory bank, said logic circuit being further operative to recombine portions of data successively read from successive ones of a group of said memory units of a memory bank into said buffer means, said logic circuit including parity means operative to generate a check byte or group of bits from said data for temporary storage in said buffer means before being stored in at least one said memory unit of said memory bank, and operative to use said check byte to regenerate said data read from said group of memory units of a memory bank if one of said group of memory units fails; said buffer means being divided into a number of channels corresponding to the number of memory channels, each said channel being divided into associated portions of buffer segments; and control means operative to control the transfer of data and check bytes or groups of bits to and from said memory banks, including allocating a buffer segment for a read or write request from the host computer of a sufficient size for the data, and controlling said memory banks to seek data stored in different memory banks simultaneously.
The present invention still further provides a computer storage system comprising a plurality of memory units arranged into a two dimensional array having at least three memory channels arranged in parallel, each said memory channel comprising a plurality of memory units connected by a bus such that each memory unit is independently accessible, respective memory units of said memory channels forming a memory bank; and a controller comprising: buffer means interfaced to said memory units and for holding information read from said memory channels, said buffer means being controlled to form a plurality of buffer segments for addressably storing data read from or written to said memory units; a logic circuit connected to said buffer means to recombine bytes or groups of bits read from ones of a group of said memory units in a memory bank; parity means operative to use a check byte or group of bits read from one of said memory units in said memory bank to regenerate information read from said group of memory units if one of said group of memory units fails; and control means for controlling the transfer of data to and from said host computer and said memory units, including allocating a buffer segment for a read or write request from the host computer of a sufficient size for the data, and controlling said memory banks to seek data stored in different memory banks simultaneously.
Conveniently the system of the present invention can be termed RAID-53 since it utilises a combination of RAID-3 and RAID-5 to provide for fast random access. RAID-53, like RAID-5, allows for simultaneous reads by multiple users from all the disk banks in the system, whilst also reducing the read time since the data is split between a number of disks which are read simultaneously.
In order to increase the speed of access to data stored in the disk array using RAID-53, the disk banks can be addressably segmented such that respective segments on sequential banks have a sequential address. This allows sequential data to be written to segments on sequential banks and thus distribute or "stripe" the data across the memory banks. This technique is termed hereinafter "overlay bank striping".
This organisation of data on the disk array is controlled by the controller and not the host computer. The controller assigns addresses to segments of the disk banks in such a way that when data is written to the disk array it is striped across the banks.
This striping of the data is also applicable to RAID-35 and will allow data to be read or stored on different banks simultaneously.
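As an illustration of the address assignment described above (a sketch only; the specification does not give the exact mapping, and the constants and names are assumptions), a controller could assign consecutive logical segment addresses round-robin across the banks, so that sequential data lands on sequential banks.

NUM_BANKS = 7     # e.g. seven banks of five drives each on SCSI-1 buses

def segment_location(logical_segment, num_banks=NUM_BANKS):
    # Map a logical segment address to (bank, segment-within-bank) so that
    # consecutive logical segments fall on consecutive banks.
    return logical_segment % num_banks, logical_segment // num_banks

for seg in range(10):
    bank, offset = segment_location(seg)
    print(f"logical segment {seg:2d} -> bank {bank}, offset {offset}")

With such a mapping, a long sequential transfer touches every bank in turn, so several banks can be seeking or transferring at once, which is the effect the overlay bank striping technique aims for.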
Preferably the memory units are disk drives and there are five per memory bank, i.e. five memory channels, one disk containing the check information, four disks containing the data. If the currently standard disk drive interface SCSI-1 (Small Computer Systems Interface) is used, then since this has an eight address limit, one of which will be used by a controller, seven memory banks can be used. Alternatively, if SCSI-2 is used then 15 banks can be used. The present invention is not however limited to the use of such an interface and any number of memory banks could be used. In fact the more memory banks that are present, the more that can be simultaneously undertaking a seek operation, thus reducing data access time for the host computer.

Preferably for optimum performance, the disk drives of a memory bank have their spindles synchronised.
This combination of RAID-3 and RAID-5 provides a simultaneous random access facility with a performance in excess of the theoretical maximum performance of RAID-5 systems with five slave bus drives. In addition the performance penalties of the Read-Modify-Write characteristics of RAID-5 systems are avoided. What is provided is a fast and simple RAID-3 type Read/Write facility.
The RAID-53 system also sustains maximum transfer rate under a "single" disk drive failure condition per "bank" of disk drives.
During busy I/O requests the control means can queue host data requests for memory banks and carry out the data seek and transfer when the memory bank containing the requested data is not busy. Preferably the order in which these seeks take place is optimised to provide optimised seek ordering.
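A minimal sketch of this queuing behaviour follows, with the caveat that the actual seek-ordering policy is not given in this passage (the nearest-block heuristic, the class and all names are assumptions): requests are queued per bank, and a busy bank's queue is drained in an order chosen to shorten head movement rather than strictly first-come first-served.

from collections import defaultdict

class BankScheduler:
    # Queue host requests per memory bank; when a bank becomes idle, issue the
    # queued request whose block address is nearest the last one serviced.
    def __init__(self):
        self.queues = defaultdict(list)    # bank -> [(block, request), ...]
        self.last_block = defaultdict(int)

    def submit(self, bank, block, request):
        self.queues[bank].append((block, request))

    def next_for(self, bank):
        if not self.queues[bank]:
            return None
        pick = min(self.queues[bank],
                   key=lambda r: abs(r[0] - self.last_block[bank]))
        self.queues[bank].remove(pick)
        self.last_block[bank] = pick[0]
        return pick[1]

sched = BankScheduler()
for bank, block, name in [(0, 900, "r1"), (0, 120, "r2"), (0, 150, "r3")]:
    sched.submit(bank, block, name)
print([sched.next_for(0) for _ in range(3)])   # -> ['r2', 'r3', 'r1']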
Preferably, when a write request is received by the controller, it can effect the immediate writing of the data to a memory bank to the detriment of any pending read or write requests. This prevents any important data being lost, due to power failure for instance, when the data normally would be held in a buffer segment.
In a preferred embodiment which increases the number of memory banks considerably, a number of buffer means, logic circuits and parity means are provided together with a number of associated two dimensional arrays of memory units. In this arrangement the control means is operative to control the transfer of data to and from the host computer and the three dimensional array of memory units formed of layers of the two dimensional arrays.
The hardware utilised for the RAID-35 system of PCT/GB90/01557 can be the same as that used for the RAID-53. Thus it is possible to provide RAID-35 and RAID-53 as options for the same hardware, or they can be provided together and will share the hardware. In one shared system, a first portion of the buffer means is allocated for RAID-53. The remaining buffer memory is allocated for RAID-35 use. The memory banks can be shared, or a number of them can be allocated for RAID-35 and the rest for RAID-53.
The RAID-35 operation is as follows. The transfer of sequential data to the host computer in response to requests therefrom is controlled by first addressing the buffer segments in the allocated part of the buffer means to establish whether the requested data is contained therein and, if so, supplying said data to said host computer. If the requested sequential data is not contained in the buffer segments of the allocated portion of the buffer means, data is read from the memory units and supplied to the host computer. Further data is read from the memory units which is logically sequential to the data requested by the host computer, and the further data is stored in a buffer segment in the allocated portion of the buffer means. The control means also controls the size and number of buffer segments in the portion of the buffer means allocated for RAID-35 usage.
The array of disk drives provided by the RAID-35 and RAID-53 systems provides redundancy in the event of disk drive failure. In one embodiment of the invention there can also be provided redundancy in controllers. If a second controller is provided at a different address on the buses of the array, then in the event of a failure of the main controller, the auxiliary controller can be activated with little or no down time of the system. The controller can then be repaired or replaced whilst the system is still running.

The present invention also provides a plurality of buffer means each for interfacing a plurality of memory units arranged into a two dimensional array having at least three memory channels, each memory channel comprising a plurality of memory units connected by a bus such that each memory unit is independently accessible, respective memory units of said memory channels forming a memory bank; a plurality of logic circuits connected to respective said buffer means to recombine bytes or groups of bits read from ones of a group of said memory units of a memory bank and stored in said buffer segments to generate the requested data, said logic circuits each including parity means operative to use a check byte or group of bits read from one of said memory units of said memory bank to regenerate data read from said group of memory units if one of said group of memory units fails; said buffer means being divided into a number of channels corresponding to the number of memory channels, each channel being divided into associated portions of buffer segments; and control means operative to control the transfer of data from a three dimensional array of memory units formed from a plurality of said two dimensional arrays to said host computer in response to requests therefrom by first addressing said buffer segments to establish whether the requested data is contained therein and if so supplying said data to said host computer, and if the requested data is not contained in the buffer segments, reading said data from the memory units, supplying said data to said host computer, reading from said memory units further data which is logically sequential to the data requested by said host computer and storing said further data in a buffer segment; said control means further controlling said buffer means to control the number and size of said buffer segments.

In this RAID-35 arrangement a three dimensional array of disk drives is provided to increase storage capacity.
Although at present the most common form of redundant array of inexpensive disks utilises magnetic disk drives, the present invention is not limited to the use of such disk drives. The present invention is equally applicable to the use of any memory device which has a long seek time for data compared to the data transfer rate once the data is located. Such media could, for instance, be an optical compact disk.
Thus such an array, according to the present invention, provides large scale storage of information together with faster data transfer rates and better performance with regard to multi-user applications, and security in the event of any one drive failure (per bank). Indeed, the mean time between failures (MTBF) of such an array (meaning the mean time between two simultaneous drive failures (per bank), which is required in order to result in information being lost beyond recall) is measured in many thousands of years with presently available disk drives each having individual MTBFs of many thousands of hours.
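The order of magnitude of that claim can be sanity-checked with a commonly used approximation for the mean time to a double failure in a parity-protected group; neither the formula nor the drive MTBF and repair-time figures below appear in the specification, they are assumptions for illustration only.

def group_mttdl_hours(disk_mtbf_h, drives_per_bank, mttr_h):
    # Approximate mean time until a second drive in the same bank fails
    # while the first failed drive is still being replaced and rebuilt.
    n = drives_per_bank
    return disk_mtbf_h ** 2 / (n * (n - 1) * mttr_h)

disk_mtbf_h = 150_000          # illustrative per-drive MTBF ("thousands of hours")
repair_h = 8                   # illustrative replace-and-rebuild time
banks = 7
per_bank_h = group_mttdl_hours(disk_mtbf_h, drives_per_bank=5, mttr_h=repair_h)
array_years = per_bank_h / banks / (24 * 365)
print(f"about {array_years:,.0f} years between unrecoverable failures")

With these example figures the result is on the order of a couple of thousand years, consistent with the statement above.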
Examples of the present invention will now be described with reference to the accompanying drawings, in which:

Figure 1 is a block diagram of the controller architecture of a disk array system according to one embodiment of the present invention.
Figure 2 illustrates the operation of the data splitting hardware.
Figure 3 illustrates the read/write data cell matrix.
Figure 4 illustrates a write data cell.
Figure 5 illustrates a read data cell.
Figure 6 is a flow diagram illustrating the software steps in write operations for RAID-35 operation.
Figure 7 is a flow diagram illustrating the software steps in read operations for RAID-35 operation.
Figures 8 and 9 are flow diagrams illustrating the software steps for read ahead and write behind for RAID-35 operation.
Figure 10 is a flow diagram illustrating the software steps involved to restart suspended transfers for RAID-35 operation.
Figure 11 is a flow diagram illustrating the software steps involved in cleaning up segments for RAID-35 operation.
Figures 12 and 13 are flow diagrams illustrating the steps involved in input/output control for RAID-35 operation.
Figures 14 and 15 are flow diagrams illustrating the software steps performed by the 80376 central controller of Figure 1 during RAID-53 operation.
Figures 16 to 19 are flow diagrams illustrating the software steps performed by the slave bus controllers of Figure 1 during RAID-53 operation.
Figure 20 is a block diagram of an embodiment of the present invention illustrating the access points for RAID-53 operation.
Figure 21 illustrates a block diagram of a three dimensional memory array according to one embodiment of the present invention.
Figure 22 illustrates the use of a redundant controller according to one embodiment of the present invention.
Figure 23 illustrates the distribution of data in segments within the array using the technique of overlay bank striping.
Figure 1 illustrates the architecture of the RAID-35 and RAID-53 disk array controller, and initially both systems will be considered together.
In Figure 1 of the drawings, the internal interface of the computer memory controller 10 is termed the ESP data bus interface and the interface to the host computer is termed the SCSI interface. These are provided in interface 12. The SCSI bus interface communicates with the host computer (not shown) and the ESP interface communicates with a high performance direct memory access (DMA) unit 14 in a host interface section 11 of the computer memory controller 10. The ESP interface is 16 bits (one word) wide.
The host interface section communicates with a central buffer management (CBM) section 20 which comprises a central controller 22, in the form of a suitable microprocessor such as the Intel 80376 microprocessor, and a data splitting and parity control (DSPC) logic circuit 24. These perform the function of splitting information received from the host computer into four channels, and generating parity information for the fifth channel. The DSPC 24 also combines the information on the first four channels and, after checking against the parity channel, transmits the combined information to the host computer. Furthermore, the DSPC 24 is able to reconstruct the information from any one channel, should that be necessary, on the basis of the information from the other four channels.
The DSPC 24 is connected to a central buffer 26 which is divided into five channels A to E, each of which is divisible into buffer segments 28. Each central buffer channel 26,A through 26,E has the capacity to store up to half a megabyte of data for example, depending on the application required. For RAID-35, each segment may be as small as 128 kilobytes for example, so that up to 16 segments can be formed in the buffer. For RAID-53 each segment will be as small as the minimum data request from the host computer.
The central buffer 26 communicates with five slave bus controllers 32 in a slave bus interface (SBI) section 30 of the memory controller 10.
Each slave bus controller 32,A through 32,E communicates with up to seven disk drives 42,0 to 42,6 along SCSI-1 buses 44,A through 44,E, so that the drives 42,0,A through 42,0,E form a bank 0 of five disk drives, and so also do drives 42,1,A through 42,1,E etc. to 42,6,A through 42,6,E. The seven banks of five drives effectively each constitute a single disk drive, each individually and independently accessible. This is made possible by the use of SCSI-1 buses, which allow for eight device addresses. One address is taken up by the slave bus controller 32 whilst the seven remaining addresses are available for seven disk drives. Thus for the RAID-35 system the storage capacity of each channel can therefore be increased sevenfold and the slave bus controller 32 is able to access any one of the disk drives 42 in the channel independently. The use of more than one bank of disk drives is essential for the realisation of the advantage of RAID-53 operation.
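The addressing implied here can be sketched as follows; the specification states only that one of the eight SCSI-1 addresses on each bus is taken by the slave bus controller, so the particular ID assignments below are assumptions for illustration.

CHANNELS = ["A", "B", "C", "D", "E"]   # the five slave SCSI-1 buses 44,A .. 44,E
NUM_BANKS = 7                           # SCSI IDs 0..6 assumed for drives
CONTROLLER_ID = 7                       # assumed ID reserved for controller 32

def drive_address(bank, channel):
    # A bank is the set of drives sharing one SCSI ID across all five buses,
    # so drive 42,<bank>,<channel> sits at ID <bank> on bus 44,<channel>.
    if not 0 <= bank < NUM_BANKS:
        raise ValueError("only seven banks fit on a SCSI-1 bus")
    return (f"bus 44,{channel}", bank)

print([drive_address(3, ch) for ch in CHANNELS])   # the five drives of bank 3

Because every drive has its own bus address, the controller can command a seek on one bank while reading or writing another bank on the same set of buses.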
This arrangement of banks of disk drives is not only applicable to the arrangement shown in Figure 1, but is also applicable to the RAID-3 arrangement. Information stored in the disk drives of one bank can be accessed virtually simultaneously with information being accessed from the disk drives of another bank. This arrangement therefore gives an enhancement in access speed to data stored in an array of disk drives.
In so far as the host computer is concerned, its memory consists of a number of sectors each identified by a unique address number. Where or how these sectors are stored on the various disk drives of the memory 40 is a matter of no concern to the host computer; it must merely remember the address of the data sectors it requires. Of course, addresses themselves may form part of the data stored in the memory.
On the other hand, one of the functions of the central controller 22 is to store data on the various disk drives efficiently. Moreover, each sector, in so far as the host is concerned, is split between four disk drives in the known RAID-3 format. Under RAID-35 operation, the central controller 22 arranges to store sectors of information passed to it by the host computer in an ordered fashion, so that a sector on any given disk drive is likely to contain information which logically follows from a previous adjacent sector.
To optimise performance, the disk drives of a bank should have their spindles synchronised.
Operation under RAID-35
When the host computer requires data, the read request is received by the central controller 22 which passes the request to the slave bus interface (SBI) controller 32. The slave bus controller 32 reads the disk banks 40 and selects the appropriate data from the appropriate banks of disks. The DSPC circuit 24 receives the requested data and checks it is accurate against the check data in channel E.
If there is any error detected by the parity check, the controller may automatically try to re-read the data; if a parity error is still detected, the controller may return an error to the host computer. If there is a faulty drive this can be isolated and the system arranged to continue working employing the four good channels, in the same way and with no loss of performance, until the faulty drive is replaced and rebuilt with the appropriate information.
Assuming however that the data is good, the central controller 22 first responds to the data read request by transferring the information to the SCSI-1 interface 12. However, it also instructs further information logically sequential to the requested information to be read. This is termed "read ahead information". Read ahead information, up to the capacity presently allocated by the central controller 22 to any one of the data buffer segments 28, is then stored in one buffer segment 28.
When the host computer makes a further request for information, it is likely that the information requested will follow on from the information previously requested.
Consequently, when the central controller 22 receives a read request, it first interrogates those buffer segments 28 to determine if the required information is already in the buffer. If the information is there, then the central controller 22 can respond to the user request immediately, without having to read the disk drives. This is obviously a much faster procedure and avoids the seek delay.
On those occasions when the required information is not already in the buffer, then a new read of the disk drives is required. Again, the requested information is passed on and sequential read ahead information is fed to another buffer segment. This process continues until all the buffer segments are filled and the system is maintained with its segments permanently filled. Of course, there comes a point when all the segments are filled, but still the disk drives must be read. It is only at this point that a buffer segment is finally de-allocated by the central controller 22, by keeping note of which buffer segments 28 are or have been used most frequently, and dumping the most infrequently used one.
During the normal busy operation of the host computer, the central controller 22 will have allocated at least as many buffer segments 28 as there are application programs, up to the maximum number of segments available.
Each buffer segment will be kept full by the central controller 22 ordering the disk drive seek commands in the most efficient manner, only over-riding that ordering when a buffer segment has been, say, 50% emptied by host requests, or when a host request cannot be satisfied from existing buffer segments 28. Thus all buffer segments are kept as full as possible with read ahead data.
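The read path just described, in which buffer segments behave as a read-ahead cache and the least useful segment is reclaimed when a new one is needed, might look like the following sketch; the segment size, the reclaim policy details and all names are assumptions rather than the controller's actual algorithm.

import time

class ReadAheadCache:
    # RAID-35 style read handling: satisfy sequential requests from buffer
    # segments; on a miss, read the requested blocks plus read-ahead blocks
    # into a segment, reclaiming the stalest segment when the buffer is full.
    def __init__(self, max_segments=16, read_ahead=32):
        self.max_segments = max_segments
        self.read_ahead = read_ahead
        self.segments = {}            # start_block -> (blocks, last_used)

    def read(self, start, count, read_from_disks):
        for seg_start, (blocks, _) in self.segments.items():
            if seg_start <= start and start + count <= seg_start + len(blocks):
                self.segments[seg_start] = (blocks, time.monotonic())   # hit
                return blocks[start - seg_start:start - seg_start + count]
        if len(self.segments) >= self.max_segments:                     # miss
            stalest = min(self.segments, key=lambda k: self.segments[k][1])
            del self.segments[stalest]          # de-allocate an idle segment
        blocks = read_from_disks(start, count + self.read_ahead)
        self.segments[start] = (blocks, time.monotonic())
        return blocks[:count]

cache = ReadAheadCache()
fake_disks = lambda start, n: list(range(start, start + n))
cache.read(0, 8, fake_disks)          # miss: seeks the array and reads ahead
print(cache.read(8, 8, fake_disks))   # hit: served from the buffer segment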
To write information to the disk drives, a similar procedure is followed. When a write instruction is received by the central controller 22, the information is split by the DSPC circuits 24 and the appropriate check information created. The five resulting components are placed in allocated write buffer segments. The number of write buffer segments may be preselected, or may be dynamically allocated as and when required. In any event, a write buffer segment is protected against de-allocation until its information has been written to disk. Actual writing to disk is only effected under instruction from the host computer, if and when a segment becomes full and the system cannot wait any longer, or, more likely, when the system is idle and not performing any read operations.
In any event, simultaneous writes appear to be happening in so far as the host computer is concerned, because the central controller 22 is capable of handling commands very rapidly and storing writes in buffers while waiting for an opportunity for the more time consuming actual writing to disk drives.
This does mean however, that in the event of power failure, some writes, which the user will think have been recorded on disk, may in fact have been lost by virtue of their temporary location in the random access buffer at the time of power failure. In that event a restored disk drive system from back-up copies is required.
Alternatively, a hardware switch can be provided to ensure that all write instructions are effected immediately, with write information only being stored in the buffer segments transiently before being written to disk. This removes the fear that a power loss might result in data being lost which was thought to have been written to disk although not actually effected by the memory system. There is still however the unlikely exception that information may be lost when a power loss occurs very shortly after a user has sent a write command, but in that event, the user is likely to be conscious of the problem.
If this alternative is utilised, it does of course affect the performance of the computer.
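A sketch of the two write policies contrasted above; the hardware switch is modelled as a simple flag, and all names are illustrative rather than taken from the controller design. In the normal mode writes accumulate in buffer segments and are flushed when the array is idle, while in the forced mode each write goes straight to the disk bank.

class WriteBuffer:
    # Write-behind buffering with an optional "force immediate write" switch.
    def __init__(self, write_to_bank, force_immediate=False):
        self.write_to_bank = write_to_bank      # callable(address, data)
        self.force_immediate = force_immediate
        self.pending = []                       # queued (address, data) writes

    def write(self, address, data):
        if self.force_immediate:
            self.write_to_bank(address, data)   # safe even if power is lost now
        else:
            self.pending.append((address, data))   # fast, but lost on power loss

    def flush_when_idle(self):
        while self.pending:
            self.write_to_bank(*self.pending.pop(0))

disk = {}
buf = WriteBuffer(write_to_bank=disk.__setitem__, force_immediate=False)
buf.write(100, b"record")
buf.flush_when_idle()            # the actual write to the disk bank happens here
print(disk)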
Operation under RAID-53
When the host computer requires data, a request is received and a buffer segment allocated for that data. The read request is received by the central controller 22 which passes the request to the slave bus controller 32. The slave bus controller 32 reads the disk banks 40 and selects the appropriate data from the appropriate banks of disks. The DSPC circuit 24 receives the requested data and checks it is accurate against the check data in channel E.
If there is any error detected by the parity check the controller may automatically retry to read the data.
If a parity error is still detected the controller may return an error to the host computer. If there is a faulty drive this can be isolated and the system arranged to continue working employing the four good channels, in the same way and with no loss of performance, until the faulty drive is replaced and rebuilt with the appropriate information.
Assuming that the data is good, the central controller 22 responds to the data read request by transferring the data to the SCSI-1 interface 12, and then de-allocating the buffer segment. The disk bank is then free to accept another read request and can commence a seek operation under the command of the central controller 22.
The size of the buffer segments is determined by the size of the data requested by the host computer. No data is read ahead from the disk drives.
The central controller 22 is thus able to receive the read requests and determine in which disk bank the data lies. If the disk bank is idle then the disk bank can be instructed to seek the data. Simultaneously the other disk banks may be seeking data requested by the host computer at an earlier date, and once this has been located the central controller 22 can read the disk bank and pass the data to the buffer segments for reconstruction, from where it is passed to the SCSI-1 interface 12.
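The RAID-53 read flow described above can be summarised in a short sketch, under the assumptions that a buffer segment is simply a byte buffer sized to the request and that each bank exposes plain busy/read operations; the block-layout constant and every name here are illustrative.

from collections import namedtuple, deque

Request = namedtuple("Request", "block length")
BLOCKS_PER_BANK = 1000            # illustrative; the real layout is controller-defined

class Bank:
    def __init__(self, data):
        self.data, self.busy = data, False
    def read(self, block, length):
        return self.data[block:block + length]

def handle_read(request, banks, queues, transfer_to_host):
    # Allocate a segment exactly the size of the request; read immediately if
    # the owning bank is idle, otherwise queue it until that bank frees up.
    bank_no = request.block // BLOCKS_PER_BANK
    bank = banks[bank_no]
    if bank.busy:
        queues[bank_no].append(request)
        return
    segment = bytearray(request.length)            # sized to the requested data
    segment[:] = bank.read(request.block % BLOCKS_PER_BANK, request.length)
    transfer_to_host(bytes(segment))               # segment de-allocated afterwards

banks = [Bank(bytes(range(256)) * 4) for _ in range(7)]
queues = [deque() for _ in range(7)]
handle_read(Request(block=2050, length=16), banks, queues, print)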
Figure 14 illustrates the seven access points to the seven disk banks. Each disk drive of each bank has a unique bus (SCSI) address and can thus be accessed independently by the computer memory controller 100. Thus up to seven disk banks can be operating simultaneously to seek data requested by the host computer. While a disk bank is seeking, it is disconnected from the SCSI-1 interface. When the data is located, this is indicated to the central controller 22 which can then read the data.
If a disk bank is busy when a new read request is received then the central controller 22 can queue these requests. To provide an optimised seek ordering, the queued read requests may not necessarily be performed in the order in which the host computer issued the commands.
Such queuing of read requests could also be performed on the slave bus controllers 32.
For write operations very much the same thing happens. However, the central controller 22 is provided with the capability of "forcing" the incoming data to be "immediately" written to the required bank of disk drives, rather than being queued with pending Read/Write commands. This ensures that data thought by the host computer to be written to disk is so written, in case of, for instance, power failure, where any data to be written to the disks that is stored in the buffer memory 26 would be lost.
Detailed Operation of Hardware for both RAID-35 and RAID-53
The detailed operation of the hardware data splitting, parity generation and checking logic, and buffer interface logic will now be described with reference to Figures 2 to 5 for both RAID-35 and RAID-53.
Referring to Figure 2, the controller's internal interface to the host system hardware interface is 16 bits (one word) wide. This is the ESP data bus. For every four words of sequential host data, one 64 bit wide slice of internal buffer data is formed. At the same time, an additional word or 16 bits of parity data is formed by the controller; one parity bit for four host data bits. Thus the internal width of the controller's central data bus is 80 bits. This is made up of 64 bits of host data and 16 bits of parity data.
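The 80-bit internal format can be illustrated directly: each group of four sequential 16-bit ESP words contributes 64 bits of channel data plus one 16-bit parity word, where parity bit n is the XOR of bit n of the four host words. This is a sketch of the arithmetic only; the variable names are not from the specification.

def split_host_words(w0, w1, w2, w3):
    # Four 16-bit ESP host words -> channels A-D data plus the channel E
    # parity word; parity bit n is the XOR of bit n of the four host words.
    for w in (w0, w1, w2, w3):
        assert 0 <= w < 1 << 16
    parity = w0 ^ w1 ^ w2 ^ w3
    return (w0, w1, w2, w3, parity)

a, b, c, d, e = split_host_words(0x1234, 0xABCD, 0x0F0F, 0xFFFF)
print([f"{w:04X}" for w in (a, b, c, d, e)])
# a lost channel word is recoverable from the other three and the parity word
assert b == a ^ c ^ d ^ e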
The data splitting and parity logic 24 is split up into 16 identical read/write data cells within the customised ASICs (application specific integrated circuits) design of the controller. The matrix of these data cells is shown in Figure 3. Each of these data cells handles the same data bit from the ESP bus for the complete sequence of four ESP 16 bit data words. That is, with reference to Figure 2, each data cell handles the same bit from each ESP bus word 0, 1, 2 and 3. At the same time, each data cell generates/reads the associated parity bit for these four 16 bit ESP bus data words.
For explanation purposes, only the first data bit 0 (DB0) will be described. Data bits DB1 through DB15 will be identical in operation and description.
Four basic operations are performed, namely:
1. Writing of host data.
2. Reading of data to the host.
3. Regeneration of "single failed channel" data during host read operations.
4. Rebuilding of data on a failed disk drive unit.

Writing of host data to the disk drive array
Referring now to Figure 4, as the corresponding data bit from each host 16 bit word is received on the ESP data bus, each of these four bits is temporarily stored/latched in devices G38 through G41. As each bit appears on the ESP bus, it is steered through the multiplexor under the control of the two select lines to the relevant D-type latches G33 through G36, commencing with G33. At the end of this initial operation, the four host 16 bit words (64 data bits) will have been stored in the relevant gates G38 through G41 within all 16 data cells. The four DB0 data bits are now called DB0-A through DB0-D.
During the write operations, the RMW (buffer read modify write) control signal is set to select input A from all devices G38 through G42. Under these situations, the rebuild line is not used (don't care).
As each bit is clocked into the data cell, the corresponding parity data bit is generated via G31, G32, and G37. At the end of the sequence of the four bit 0's from each of the four incoming ESP bus host data words, the resultant parity bit will have been generated and stored on device G42. This is accomplished as follows. As the first bit 0 (DB0-A) appears on the signal DB0, the INIT line is driven high/true and the output from the gate G31 is driven low/off. Whatever value is present on DB0 will appear on the output of gate G32, and at the correct time will be clocked into the D-type G37. The value of DB0 will now appear on the Q output of G37. The INIT signal will now be driven low/off, and will now aid the flow of data through G31 for the next incoming three data bits on DB0. Whatever value was stored as DB0-A on the output of gate G37 will now appear on the output of gate G31, and as the second DB0 bit (DB0-B) appears on the signal DB0, an Exclusive OR value of these two bits will appear on the output of gate G32. At the appropriate time, this new value will be clocked into the device G37. At the end of the clock cycle, the resultant Q output of G37 will now be the Exclusive OR function of DB0-A and DB0-B. This value will now be stored on device G42. The above operation will continue as the remaining two DB0 bits (DB0-C and DB0-D) appear on the signal DB0. At the end of this operation, the accumulative Exclusive OR function of all bits DB0-A through DB0-D will be stored on device G42, and at the same time, bits DB0-A through DB0-D will be stored on devices G38 through G41 respectively.
The accumulative Exclusive OR (XOR) value of DB0-A through DB0-D is generated in this manner so as to preserve buffer timing and synchronisation procedures.
The five outputs DB0-A through DB0-E are present for all data bits 0 through 15 of the four host data words. The total of 80 bits is now stored in the central buffer memory (DRAM). The whole procedure is repeated for each sequence of four host data words (8 host data bytes).
As each "sector" of slave disk drive data is assembled in the central buffer, it is written to the slave disk drives (to channel A through channel E) within the same bank of disk drives.
If a failed slave channel or disk drive exists, then the controller will mask out that drive's data and no data will be written to that channel/disk drive. However, the data will be assembled in the central buffer in the normal manner.
Reading of array disk drive data to the host system
Referring now to Figure 5, in response to a host request, data is read from the disk array and placed in the central buffer memory 26. Also, in the reverse procedure to that for write operations, the 80 bits of central buffer data are loaded into devices G10 through G14 for each bit (4 data bits and 1 parity bit). Again we will only consider DB0. The resulting five bits are DB0-A through DB0-E. All read operations are checked for correct parity by regenerating a new parity bit and comparing this bit with the bit read from the slave disk drives.
Initially, the case of a fully functioning array will be considered with no faulty slave disk drives. In this case all mask bits (mask-A through mask-E) will be low/false, and all bits from the central buffer 26 will appear on the outputs of devices G10 through G14 via their "A" inputs. Also, all data bits will appear on the outputs of devices G6 through G9 via their "A" inputs. After the central buffer read operation, the four data bits will simultaneously appear on the outputs of devices G6 through G9. In the reverse procedure to that for write operations, all data bits DB0-A through DB0-D will be reassembled on the ESP data bus through the multiplexor under the control of the two select lines. As the data bits are read from the central buffer 26, the parity data bit is regenerated by the Exclusive OR gate G4 and compared at gate G2 with the parity data read from the slave disk drives at device G14. If a difference is detected, an NMI "non-maskable interrupt" is generated to the master processor device via gate G3. All read operations will terminate immediately, or the controller may automatically perform read re-try procedures.
Gate G5 suppresses the effect of the parity bit DB0-E from the generation of the new parity bit. Gate G1 will suppress NMI operations if any slave disk drive has failed and the resultant mask bit has been set high/true. Also, gate G1, in conjunction with gate G5, will allow the read parity bit DB0-E to be utilised in the regeneration process at gate G4, should any channel have failed.

Regeneration of "single failed channel" data during host read operations
Referring to Figure 5, the single failed disk drive/channel will have its mask bit set high/true under the direction of the controller software. The relevant gates within G6 through G9 and G10 through G14 for the failed channel/drives will have their outputs determined by their "B" inputs, not their "A" inputs. Also, G1 will suppress all NMI generation, and together with gate G5, will allow parity bit DB0-E to be utilised at gate G4. In this situation, the four valid bits from gates G10 through G14 will "regenerate" the "missing" data at gate G4, and the output of gate G4 will be fed to the correct ESP bus data bit DB0 via a "B" input at the relevant gate G6 through G9.
For example, consider the channel C disk drive to be faulty; mask bit mask-C will be driven high/true. The output of gate G12 will be driven low and will not contribute to the output of gate G4. Also, the output of gate G1 will be driven low/false and will both suppress NMIs and allow signal DB0-E to be fed by gate G5 to gate G4. Gate G4 will have all correct inputs from which to regenerate the missing data and feed the data to the output of device G8 via its "B" input. At the correct time, this bit will be fed through the multiplexor to DB0.
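Per data cell, the regeneration described above amounts to XORing the bits of the four surviving channels: the masked channel's contribution is suppressed and the parity bit is admitted into the XOR in its place. A bit-level sketch of that logic only (the gate-level detail is omitted and the names are illustrative):

def regenerate_bit(bits, masked):
    # bits: channel -> bit for channels A-E (E is the parity bit);
    # masked: the single failed channel. Returns the regenerated bit for it.
    value = 0
    for channel, bit in bits.items():
        if channel != masked:      # the masked channel contributes nothing
            value ^= bit           # parity bit E joins the XOR when a data channel fails
    return value

cell = {"A": 1, "B": 0, "C": 1, "D": 1}
cell["E"] = cell["A"] ^ cell["B"] ^ cell["C"] ^ cell["D"]   # stored parity bit
assert regenerate_bit(cell, masked="C") == cell["C"]
print("regenerated DB0 for failed channel C:", regenerate_bit(cell, masked="C"))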

Rebuilding of data on a failed disk drive unit
Referring now to Figures 4 and 5, to rebuild data, the memory controller must first read the data from the four functioning disk drives, regenerate the missing drive's data, and finally write the data to the failed disk drive after it has been replaced with a new disk drive.
With reference to Figure 5 and the example given above for "regeneration of single failed channel data during host read operations", under rebuild conditions the outputs from gates G6 through G9 will not be fed to the ESP data bus. However, the regenerated data at the output of gate G4 will be fed to the "B" inputs of gates G38 through G42 of the write data cell in Figure 4. Under rebuild conditions, the RMW signal will be set high/true and the outputs of devices G38 through G42 will be determined by the value of the rebuild data on signal rebuild.
All channels of the central buffer memory 26 will have their data set to the regenerated data, but only the single replaced channel data will be written to the new disk drive under software control.

Detection of faulty channel/disk drive
The detection of a faulty channel/slave disk drive is as per the following three main criteria:
1. The master 80376 processor detects an 80186 channel (array controller electronics) failure due to an "interprocessor" command protocol failure.
2. An 80186 processor detects a disk drive problem, i.e. a SCSI bus protocol violation.
3. An 80186 processor detects a SCSI bus hardware error. This is a complete channel failure situation, not just a single disk drive on that SCSI bus.
After detection of the fault condition, the channel/drive "masking" function is performed by the master 80376 microprocessor.
Under fault conditions, the masked-out channel/drive is not written to or read from by the associated 80186 channel processor.

Operation of Software for RAID-35 Operation
Figures 6 through to 13 are diagrams illustrating the operation of the software run by the central controller 22.
Figure 6 illustrates the steps undertaken during the writing of data to the banks of disk drives. Initially the software is operating in "background" mode and is awaiting instructions. Once an instruction from the host is received indicating that data is to be sent, it is determined whether this is sequential within an existing
segment. If data is sequential then this data is stored in the segment to form sequential data. If no sequential data exists in a buffer segment then either a new segment is opened (the write behind procedure illustrated in Figure 8) and data is accepted from the host, or the data is accepted into a transit buffer and queued ready to write into a segment. If there is no room for a new segment then the segment is found which has been idle for the longest time. If there are no such segments then the host write request is entered into a suspended request list. If a segment is available it is determined whether this is a read or write segment. If it is a write segment and it is empty, it is de-allocated. If it is not empty then the segment is removed from consideration for de-allocation. If the segment is a read segment then the segment is de-allocated and opened ready to accept the host data.
The write behind procedure is illustrated in Figure 8: if there are any write segments open which need to be emptied, then a write request is queued for the I/O handler for each open segment with data in it.
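The write path of Figures 6 and 8 can thus be summarised as: append data that is sequential with an open segment, otherwise open a new segment, otherwise reuse the segment that has been idle longest, with a background "write behind" pass queueing a disk write for every open segment holding data. The following greatly simplified Python sketch illustrates those decisions; the class and method names (WriteCache, host_write, write_behind) are assumptions, and the suspended-request and read/write-segment distinctions of Figure 6 are omitted.

```python
# Greatly simplified sketch of the Figure 6/8 write-path decisions; class and
# method names are assumptions and several branches of Figure 6 are omitted.
import time

class WriteCache:
    def __init__(self, max_segments):
        self.max_segments = max_segments
        self.segments = {}            # start_block -> {"blocks": [...], "last_used": float}

    def host_write(self, start_block, blocks):
        # 1. Sequential with an existing open segment? Append to it.
        for base, seg in self.segments.items():
            if base + len(seg["blocks"]) == start_block:
                seg["blocks"].extend(blocks)
                seg["last_used"] = time.monotonic()
                return
        # 2. Room for a new segment? Open one and accept the host data.
        if len(self.segments) < self.max_segments:
            self.segments[start_block] = {"blocks": list(blocks),
                                          "last_used": time.monotonic()}
            return
        # 3. Otherwise reuse the segment that has been idle for the longest time
        #    (the real controller flushes it, or suspends the request, first).
        idle = min(self.segments, key=lambda b: self.segments[b]["last_used"])
        del self.segments[idle]
        self.segments[start_block] = {"blocks": list(blocks),
                                      "last_used": time.monotonic()}

    def write_behind(self, queue_io):
        # Figure 8: queue a disk write for every open segment that holds data.
        for base, seg in self.segments.items():
            if seg["blocks"]:
                queue_io(base, seg["blocks"])
```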
Figure 7 illustrates the steps undertaken during read operations. Initially, the controller is in a "background" mode. When a request for data is received from the host computer, if the start of the data requested is already in a read segment then data can be transferred from the central buffer 26 to the host computer. If the data is not already in the central buffer 26, then it is ascertained whether it is acceptable to read ahead information. If it is not acceptable then a read request is queued. If data is to be read ahead then it is determined whether there is room for a new segment. If there is, then a new segment is opened and data is read from the drives to the buffer segment and is then transferred to the host computer. If there is no room for a new segment then the segment is found for which the largest time has elapsed since it was last accessed, and this segment is de-allocated and opened to accept the data read from the disk drives.
In order to keep the buffer segments 28 full, the read ahead procedure illustrated in Figure 9 is performed. It is determined whether there are any read segments open which require a data refresh. If there is such a segment then a read request for the I/O handler for the segment is queued.
Figure 10 illustrates the software steps undertaken to restart suspended transfers. It is first determined whether there are suspended host write requests in the list. If there are, it is determined whether there is room for allocation of a segment for suspended host write requests. A new segment for the host transfer is opened, the host request which has been suspended longest is determined, and data is accepted from the host computer into the buffer segment.
Figure 11 illustrates a form of "housekeeping" undertaken by the software in order to clean up the segments in the central buffer 26. It is determined at a point that it is time to clean up the buffer segments. All the read segments for which the time since the last access is larger than a predetermined limit, termed the "geriatric limit", are found and reallocated. Also it is determined whether there are any such write segments and if so write operations are tidied up.
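The read side of Figures 7, 9 and 11 behaves as a read-ahead cache: a request is served from a buffer segment when the data is already there, otherwise a segment is opened (reclaiming the least recently used one if necessary) and filled by a read-ahead from the drives, while housekeeping reallocates read segments idle beyond the geriatric limit. A hedged sketch with assumed names (ReadCache, host_read, housekeeping) follows; it stands in for, and simplifies, the controller firmware.

```python
# Simplified sketch of the Figure 7/9/11 read-ahead and housekeeping logic;
# ReadCache, read_from_disks and geriatric_limit are assumed names.
import time

class ReadCache:
    def __init__(self, max_segments, segment_blocks, geriatric_limit):
        self.max_segments = max_segments
        self.segment_blocks = segment_blocks       # read-ahead length per segment
        self.geriatric_limit = geriatric_limit     # seconds a segment may stay idle
        self.segments = {}                         # start_block -> {"blocks", "last_used"}

    def host_read(self, block, count, read_from_disks):
        for base, seg in self.segments.items():
            if base <= block and block + count <= base + len(seg["blocks"]):
                seg["last_used"] = time.monotonic()
                offset = block - base
                return seg["blocks"][offset:offset + count]    # served from the buffer
        # Not buffered: reclaim the least recently used segment if the buffer is full.
        if len(self.segments) >= self.max_segments:
            oldest = min(self.segments, key=lambda b: self.segments[b]["last_used"])
            del self.segments[oldest]
        # Read the requested blocks plus read-ahead into a fresh segment.
        blocks = read_from_disks(block, self.segment_blocks)
        self.segments[block] = {"blocks": blocks, "last_used": time.monotonic()}
        return blocks[:count]

    def housekeeping(self):
        # Figure 11: reallocate read segments idle for longer than the geriatric limit.
        now = time.monotonic()
        for base in list(self.segments):
            if now - self.segments[base]["last_used"] > self.geriatric_limit:
                del self.segments[base]
```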
Figure 12 illustrates the operation of the input/output handler, whilst Figure 13 illustrates the operation of the input/output sub-system.
All these procedures are performed by software which may be run on the central (80376) controller 22 in order to control and efficiently manage the transfer of data in the buffer segments 28, in order that the buffer 26 is kept as full as possible with data sequential to data requested by the host computer.

Operation of Software for RAID-53 Operation
Figures 14 through to 19 are diagrams illustrating the operation of the software run by the central controller 22 and the slave controllers 32 during RAID-53 operation.
Figure 14 illustrates the steps undertaken by the central controller 22 when selected as the SCSI target.
Once selected, a command from the initiator (or host computer) is decoded and syntax checked. If a fault is detected the command is terminated by a check condition status and the controller returns to background processing. If the syntax check indicates no errors then it is determined whether a queue tag message has been received to assign a queue position. If not, and a command is already running, a busy status is generated and the controller returns to background processing. If a command is not already running, or if a queue tag message has been received, it is determined whether data is required with the command. If data is required then a buffer segment is allocated for the data and, if the command is to write data, then data is received from the initiator into the allocated buffer segment. If there is no space available then a queue full status is generated and the controller returns to background processing. If the command is to read data, or the command is to write data and data is received from the initiator into the allocated buffer, then a command control block is allocated. If there is no space for this a queue full status is generated and the controller returns to background processing. If a command control block can be successfully allocated the appropriate command is issued to the slave bus controller 32 (an 80186 processor) and the command control tag pointer is passed as a tag. A disconnect message is then sent to the initiator and the controller returns to background processing.
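The sequence of Figure 14 therefore reduces to: syntax-check the command, honour queue tag messages, allocate a buffer segment and a command control block (reporting busy or queue full when resources are exhausted), then issue the command to the slave bus controllers with a tag identifying the command control block, and disconnect. The compressed sketch below is illustrative only; the object and method names are assumptions, although the status values shown are standard SCSI codes.

```python
# Compressed sketch of the Figure 14 target-selection path; object and method
# names (cmd, state, issue_to_slaves, ...) are assumptions for illustration.
CHECK_CONDITION, BUSY, QUEUE_FULL = 0x02, 0x08, 0x28   # standard SCSI status codes

def handle_host_command(cmd, state, issue_to_slaves):
    if not cmd.syntax_ok():
        return CHECK_CONDITION                 # terminate the faulty command
    if cmd.queue_tag is None and state.command_running:
        return BUSY                            # untagged command while one is active
    if cmd.needs_data():
        segment = state.allocate_buffer_segment(cmd.length)
        if segment is None:
            return QUEUE_FULL                  # no buffer space for the data
        if cmd.is_write():
            segment.fill_from_initiator(cmd)   # accept the write data from the host now
    ccb = state.allocate_command_control_block(cmd)
    if ccb is None:
        return QUEUE_FULL                      # no room for a command control block
    issue_to_slaves(cmd, tag=ccb.tag)          # tag identifies the command control block
    return None                                # disconnect; status is reported on completion
```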
Referring now to Figure lS, this diagram illustrates the operation of the soft~are in the central controller when the slave bus controller responds to commands. Data can be read from the slave bus controller when the response available :interrupt is generated. The response information is read from the dual port RAMs (DPRAM) and the tag from this respons~ is used to look up the correct command control blcck. The receipt of a response from; the ~articular slave bus controller is recorded in the command control b].ock completion flags. It is then determined whether all of the slave bus controllers in the channels have responded and if not whether the command overall time~
out has elapsed. If the command overall tL~e-out has not elapsed then the central controller r~turns to background processing to read the channels which have not responded when they are available. If the command overall time-out has elapsed then a channel fault is recorded. It is then determined whether the command can be completed. If the command cannot be completed then a fatal error is reported and the processor returns to background processing. If the command can be completed or if all the channels have responded then it is d~termined whether the completion of the command requires a data transfer. If not, then the ~ini~iator that gave the com~and is reselected and passed the logical ~unit number (LUN) identity and queue tag messaqe. The centraI controller then returns to background processing awaiting an interrupt whereupon it returns a good status and then returns to background processing.
If the completion of the command does require a data transfer then it is determined whether there is a faulty disk in the bank of disks being accessed. If so, then the appropriate channel is masked to cause a reconstruction of the missing data. The initiator that gave the command is reselected and passed the LUN identity and queue tag message. The central processor then goes into background processing until an interrupt is received, whereupon a data-in bus phase is asserted and data is transferred. The central processor then returns to background processing awaiting an interrupt, whereupon a good status is returned.
Figures 16a and 16b illustrate the operation of the software by the slave bus controllers upon receipt of commands from the central controller. When the slave bus controller receives a command from the central controller, the command is read from the DPRAM. The command is decoded and syntax checked and, if faulty, is rejected. Otherwise, it is determined whether the command is a data read or write request. If it is not, then the command is analysed to determine if a memory buffer is required and, if so, it is allocated. If there is no buffer space then the process is suspended to allow the reading of data to continue. The process is resumed when space is available. Then an input/output queue element is constructed and set up according to command requirements. The queue element is then put into the input/output queue and linked onto the destination target's list.
If the command is a data read or write request then it is determined which targets are to be used. The array block address is then converted to the target block address. It is then determined if the data received is to be diverted (or dumped) or a read modify write is required. If the command is a read data request then it is determined whether the transfer crosses bank boundaries. If not, then the input/output queue element is constructed

WO93/1~55 PCT/GB92/0229~-, ~,~3a~

and set up for the single read. If the transfer crosses bank boundaries then an input/output link block is allocated and it is recorded that two reads are to be performed for this command. If it is determined that there is no space then the process is suspended to allow the background to continue, and resumes when space is available. Otherwise the input/output queue element is constructed and set up to read the target and the request is queued. A further input/output queue element is also constructed and set up to read the target plus one and that request is queued. The slave bus controller then returns to background processing.
If the command is a data write request then, as shown in Figure 16b, it is determined whether the transfer crosses bank boundaries. If not, it is determined whether any read modify writes are required. If so, an I/O link block is allocated or the operation suspended until space is available. I/O queue elements for each of the reads of one or two read modify write sequences are constructed as required. An I/O queue element for the aligned part of the write is then constructed if required and the request is queued. The slave bus controller then enters background processing.
If the transfer of data does cross bank boundaries then it is determined whether the write to the lower target requires a front read modify write. If so, the I/O queue element for the read part of the read modify write is constructed (lower target) and a request is queued. The I/O queue element for the aligned write part of the transfer is then constructed (lower target) and the request is queued. It is then determined whether the write to the higher target requires a back read modify write and, if so, an I/O queue element for the read part of the read modify write is constructed (higher target) and the request is queued. The I/O queue element for the aligned part of the
write is then constructed (higher target) and the request is queued. The slave bus controller then enters background processing.
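In other words, a write that straddles a bank boundary is decomposed into up to four queued I/O elements: an optional front read modify write and an aligned write on the lower target, and an optional back read modify write and an aligned write on the higher target. The following sketch illustrates one possible form of that decomposition; the function name, element tuples and block/bank arithmetic are assumptions, not the firmware's data structures.

```python
# Illustrative decomposition of a bank-crossing write (Figure 16b); the tuple
# format, target numbering and byte arithmetic are assumptions for clarity.

def split_bank_crossing_write(start, length, bank_blocks, block_size):
    """Return queued I/O elements for a write of `length` bytes at byte offset
    `start`, where each bank (target) holds `bank_blocks` blocks."""
    elements = []
    bank_bytes = bank_blocks * block_size
    lower_target = start // bank_bytes
    boundary = (lower_target + 1) * bank_bytes     # first byte of the higher target
    end = start + length

    # Lower target: a front read modify write if the write starts mid-block.
    if start % block_size:
        elements.append(("rmw-front", lower_target, start))
    aligned_start = start + (-start % block_size)
    if aligned_start < boundary:
        elements.append(("aligned-write", lower_target, aligned_start))

    # Higher target: a back read modify write if the write ends mid-block.
    if end > boundary:
        if end % block_size:
            elements.append(("rmw-back", lower_target + 1, end))
        if end - (end % block_size) > boundary:
            elements.append(("aligned-write", lower_target + 1, boundary))
    return elements

# Example: 512 byte blocks, 1000-block banks, a write crossing the first boundary.
print(split_bank_crossing_write(start=511_000, length=2_000,
                                bank_blocks=1000, block_size=512))
```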
Figure 17 illustrates the operation of the input/output handling by the slave bus controllers. The SCSI bus phases are handled to perform a required I/O for the specified target. If a target was disconnected it is determined whether the command complete message has been received. If not, a warning is generated and a target fault is logged. The SCSI I/O queue element of the command just completed is examined to determine if the command completion function can be executed at this current interrupt level. If so, then the last SCSI I/O command completion function is executed as specified in the I/O queue element. Also the I/O queue element is unlinked from the SCSI I/O queue and is marked as being free for other uses.
If it is determined that the command completion function cannot be executed at this current interrupt level then the last SCSI I/O command completion function and a pointer to the I/O queue element are entered onto the background process queue. Also the I/O queue element from the SCSI I/O queue is unlinked and the element is not marked as free. It remains in use until it is freed by the command completion function, which will be executed from the background queue.
The next I/O request from the SCSI I/O queue is extracted using the I/O request from the target with the lowest average throughput. If several have a low figure, the lowest target is used. A select target command is then issued to the SCSI and an I/O is queued before the processor returns to background processing. If the I/O queue is empty a flag is set to show that the SCSI I/O has stopped.
Figure 18 illustrates a simple input/output completion function by a slave bus controller. This is executed by the SCSI I/O handler from the SCSI interrupt level. The SCSI I/O queue element is examined and the queue tag is extracted. The queue tag is given by the central controller when the command was issued to the slave bus controller. If the SCSI I/O was unsuccessfully executed then the queue tag and a fault response are sent to the central controller. If the SCSI I/O is executed successfully then the queue tag and an "acknowledge" response are sent to the central controller to inform of command completion.
Figure 19 illustrates the operation of a complex I/O
completion function by a slave bus controller. This is executed in the background from the background queue.
The I/O queue element is accessed with the pointer queued along with the completion function. The I/O link block associated with this I/O is then accessed and in the I/O link block it is recorded that the I/O has completed.
If the I/O was unsuccessfully completed then the fault details from the SCSI I/O queue element are stored in the I/O link block error information area.
It is then determined whether the I/O linked through the current I/O link block has been completed. If so, it is determined whether there are any faults recorded in the I/O link block error information area. If not, a "tidy-up" routine is executed which is particular to the original command from the central controller. A queue tag and acknowledged response is then sent to the central controller.
If there are faults recorded in the I/O link block error information area then the queue tag, fault response and the fault information are sent to the central controller.


The I/O link block and all attached buffers are then freed, as well as the SCSI I/O queue element.
The "tidy-up" referred to hereinabove forms the final operation of the slave bus controllers when all associated SCSI I/O has completed successfully.
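Both completion paths amount to recording each result against the I/O link block and, once every linked I/O has finished, either running the command-specific tidy-up and acknowledging with the queue tag, or returning the queue tag together with the recorded fault information. A small illustrative sketch follows; the IoLinkBlock class and the response strings are assumptions standing in for the firmware's own structures.

```python
# Illustrative completion handling (Figures 18 and 19); IoLinkBlock, the queue
# tag field and the response strings are assumptions standing in for the
# firmware's own structures.

class IoLinkBlock:
    def __init__(self, expected_ios, queue_tag):
        self.expected_ios = expected_ios
        self.completed = 0
        self.faults = []              # "error information area" of the link block
        self.queue_tag = queue_tag

def complete_io(link, io_ok, fault_info, send_response, tidy_up):
    link.completed += 1
    if not io_ok:
        link.faults.append(fault_info)             # record the fault in the link block
    if link.completed < link.expected_ios:
        return                                      # wait for the remaining linked I/Os
    if link.faults:
        send_response(link.queue_tag, "fault", link.faults)
    else:
        tidy_up()                                   # command-specific final operation
        send_response(link.queue_tag, "acknowledge", None)
    # the link block, attached buffers and SCSI I/O queue elements are then freed
```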

Sector Translation
A problem has been experienced with the disk drives available to form the slave disk drive banks 40. As mentioned above, host data arriving in "sectors" is split into four. This arrangement relies upon the slave disk drives of the array being able to be formatted with sector sizes exactly one quarter of that used by the host. A current standard sector size is 512 bytes, with a resultant slave disk sector size requirement of 128 bytes.
Until recently this has not been a problem, but due to the speed and complexity of electronics, disk drives above the 500 megabyte level can typically only be formatted to a minimum of 256 bytes per sector. Further, new disk drives above the 1 gigabyte capacity can typically only support a minimum of 512 byte sectors. This would mean that the controller would only be able to support host sector sizes of two kilobytes.
This problem has been overcome by applying a technique termed "sector translation". In this technique each slave disk sector contains four host sectors in what are termed "virtual" slave sectors of 128 bytes. In this technique, if the host requires a single sector of 512 bytes, then the controller has to extract an individual sector of 128 bytes from within the larger actual 512 byte slave disk drive sector. When writing data, for individual writes of a single sector, or less than four correctly grouped sectors, the controller has first to read the required overall sector, then modify the data for the
actual part of the sector that is necessary, and then finally write the overall slave disk sector back to the disk drive. This is a form of read modify write operation and can slow down the transfer of data to the disk drives, but this is not normally a problem. Also, for large transfers of data to or from the disk drives, the effect of this problem is minimal and is not noticed by the host computer.
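Sector translation therefore maps each 128 byte "virtual" slave sector onto an offset within a 512 byte physical slave sector, reading the whole physical sector and, for writes, merging the new data before writing the sector back. The sketch below illustrates this mapping; the function names and the read_physical/write_physical hooks are assumptions.

```python
# Illustrative sector translation: 128 byte "virtual" slave sectors packed four
# to a 512 byte physical slave sector; read_physical/write_physical are assumed
# hooks onto the slave disk drive.

VIRTUAL, PHYSICAL = 128, 512
PER_PHYSICAL = PHYSICAL // VIRTUAL                 # four virtual sectors per physical sector

def read_virtual(read_physical, virtual_lba):
    phys_lba, slot = divmod(virtual_lba, PER_PHYSICAL)
    sector = read_physical(phys_lba)                        # read the whole 512 byte sector
    return sector[slot * VIRTUAL:(slot + 1) * VIRTUAL]      # extract the 128 byte piece

def write_virtual(read_physical, write_physical, virtual_lba, data):
    assert len(data) == VIRTUAL
    phys_lba, slot = divmod(virtual_lba, PER_PHYSICAL)
    sector = bytearray(read_physical(phys_lba))             # read ...
    sector[slot * VIRTUAL:(slot + 1) * VIRTUAL] = data      # ... modify only the needed part ...
    write_physical(phys_lba, bytes(sector))                 # ... write the whole sector back
```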

Three Dimensional Memory Array
The hardware shown in Figure 1 can be expanded so that the host computer has access to a three dimensional array of disk drives. This is applicable to both RAID-35 and RAID-53 systems.
Figure 21 illustrates an arrangement of the disk drives in three dimensions with respect to the computer memory controller 100. Each plane of disk drives corresponds to the two dimensional array illustrated in Figure 1 (42,0,A ... 42,6,E). In this arrangement the number of buffer memories 26 and data splitting and parity logic 24 is increased in number to five, one for each two dimensional array (or plane) of disk drives. The central controller 22 then controls each buffer memory 26 and its associated slave controllers 32 independently. Each data splitting and parity logic 24 is connected to its associated buffer memory 26 and to the SCSI-1 interface 12. For RAID-35 operation this vastly increases the memory capacity and increases the number of read ahead segments by five, whilst for RAID-53 operation a vast increase in access speed for data is encountered, since five times the number of seek operations can be carried out simultaneously compared to the two dimensional arrangement of Figure 1.

What is described hereinabove is a schematic arrangement. In a practical arrangement five separate array controllers may be used, one per plane of disk drives.
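One way to picture the three dimensional arrangement is as an extra "plane" coordinate placed in front of the bank and channel addressing of Figure 1, each plane owning its own buffer memory and data splitting and parity logic. The mapping below is only an illustration of one possible addressing scheme (the description does not prescribe how data is distributed across planes); the counts follow the five plane, seven bank example.

```python
# Illustrative address decomposition for the three dimensional array
# (five planes of seven banks, as in the example above); locate() and the
# ordering across planes are assumptions, not part of the description.

PLANES, BANKS = 5, 7

def locate(stripe_index):
    """Map a global stripe index to (plane, bank); the split of a bank across
    its five channels is then handled by that plane's splitting/parity logic."""
    plane, bank = divmod(stripe_index, BANKS)
    return plane % PLANES, bank

print(locate(0))    # (0, 0): first bank of the first plane
print(locate(9))    # (1, 2): third bank of the second plane
```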

Controller Redundancy
Figure 22 illustrates the use of a second computer memory controller 100B. The second computer memory controller 100B is provided in case of failure of the main computer memory controller 100A. The second computer memory controller 100B is connected to each of the SCSI-1 buses at a different address to the main computer memory controller 100A. This reduces the number of banks of disk drives which can be provided to six, since two of the SCSI-1 addresses are taken up by the controllers 100A and 100B.
This arrangement provides for controller redundancy where it is not acceptable to have to shut down to repair a fault.

Combined RAID-35 and RAID-53 Operation
The hardware shown in Figures 1, 21 and 22 can operate both RAID-35 and RAID-53. In addition the hardware can operate both systems by sharing the hardware. For instance, at start-up a portion of the buffer memory 26 could be allocated to RAID-53, the remainder being allocated for RAID-35. When the system detects non-sequential data requests then a buffer segment is opened in the portion of the buffer memory allocated for RAID-53 and data read thereto. If sequential data is detected by the central controller 22 then a buffer segment in the appropriate buffer portion is allocated and data read from the disk banks, together with read ahead information, in the normal RAID-35 operation.
The disk banks can either be shared, or a number of disk banks could be allocated for use by RAID-53 and the remainder for use by RAID-35.
This apportionment of the hardware can take place selectably by a user, or it could take place automatically dependent on the sequential and non-sequential data ratios. Thus, for instance, the system could initially be set up in RAID-53 mode upon start-up, and the size of the portion of the buffer memory 26 and the number of disk banks allocated for RAID-35 will depend on the number of sequential data requests.
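The apportionment described here amounts to steering each request by whether it is sequential: non-sequential requests use the RAID-53 portion of the buffer (and banks), sequential requests use the RAID-35 portion, and the split can be set by the user or adapted automatically. The sketch below illustrates one such policy; the class name and the rebalancing rule are assumptions, not the algorithm stated in this description.

```python
# Illustrative apportionment of the buffer between RAID-53 (random) traffic and
# RAID-35 (sequential) traffic; the rebalancing rule is an assumption.

class CombinedModeBuffer:
    def __init__(self, total_segments, raid35_fraction=0.5):
        self.total_segments = total_segments
        self.raid35_fraction = raid35_fraction
        self.sequential_hits = 0
        self.random_hits = 0

    def route(self, is_sequential):
        """Choose the buffer portion in which to open a segment for a request."""
        if is_sequential:
            self.sequential_hits += 1
            return "raid35-portion"
        self.random_hits += 1
        return "raid53-portion"

    def rebalance(self):
        """Adapt the RAID-35 share of the buffer to the observed sequential ratio."""
        total = self.sequential_hits + self.random_hits
        if total:
            self.raid35_fraction = self.sequential_hits / total
        return int(self.total_segments * self.raid35_fraction)
```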

Overlay Bank Stripping
Overlay bank stripping is the term used hereinafter for the distribution of data amongst the memory banks and is applicable to both RAID-35 and RAID-53.
In the embodiments described hereinabove the data is stored in the banks sequentially. That is, the logical blocks of user data are arranged sequentially across the disk surface and then sequentially across each additional bank. This does not fully utilise the ability of the system to read and write to banks simultaneously. If the data is better distributed over the banks of disk drives it is possible to simultaneously read and write to banks even using the RAID-35 arrangement.
The overlay bank stripping technique operates by writing data received from the host computer onto a predefined segment of the first bank. Once this segment is full, the data is then written onto a segment having the same logical position in the next bank. This is repeated until the same logical segment in each bank is full, whereupon data is written to the next logical segment in the first bank. This is repeated until the array is full.
This process has the advantage of evenly distributing the data over the banks of the array, therefore increasing the likelihood that data required by the host computer is located on different banks which can be read simultaneously to increase the speed of data retrieval. Further, since the controller allocates addresses for each segment, data can be written to different banks simultaneously to increase the speed of data storage.
Figure 23 illustrates the distribution of data in segments within the array. A segment can be defined as a data area that contains at least one block of disk data, e.g. 512 bytes, but more likely many multiples of disk data blocks, e.g. 64K bytes as shown in Figure 23.
If a host data block is 512 bytes this is segmented using the RAID-35 or RAID-53 technique to apply 128 bytes to each channel. Thus a 64K byte segment on each disk drive of each bank can contain 512 of these host data block segments.
Although the size of the segment described hereinabove is 64K bytes, the segment size can be user selectable to allow tailoring to suit the performance optimisation required for different applications.
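Overlay bank stripping is then a simple address permutation: successive logical segments are placed at the same segment position on successive banks, and only when every bank holds that position does writing move to the next position on the first bank. The sketch below illustrates the mapping alongside the plain sequential layout; the function names follow the seven bank example of Figure 23 and are otherwise assumptions.

```python
# Illustrative overlay bank stripping address mapping (seven banks, as in
# Figure 23); function and variable names are assumptions.

BANKS = 7

def overlay_location(logical_segment):
    """Map a logical segment number to (bank, segment slot within that bank)."""
    slot, bank = divmod(logical_segment, BANKS)
    return bank, slot

def sequential_location(logical_segment, segments_per_bank):
    """The non-striped layout described earlier: fill a whole bank, then move on."""
    bank, slot = divmod(logical_segment, segments_per_bank)
    return bank, slot

# Eight consecutive segments visit all seven banks before reusing bank 0.
print([overlay_location(i) for i in range(8)])
# -> [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (6, 0), (0, 1)]
```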
When overlay bank stripping is used with the RAID-53 arrangement and host data requests are truly random, there is no advantage in using overlay bank stripping. However, where host data requests (read or write) appear simultaneously for data which would have previously been on the same bank (but not within the same segment), then a considerable performance improvement will be achieved since the requests are distributed across a number of banks, thus allowing simultaneous read/write operations. In the arrangement shown in Figure 23 the performance improvement is a factor of seven.
For the RAID-35 arrangement, on the face of it the ability to read sequential data may be penalised using overlay bank stripping. However, the use of overlay bank stripping enhances the performance since it allows the data on different banks to be simultaneously read. Thus, for sequential data greater than a segment, whereas without overlay bank stripping the full length of the data is read or written to a bank, with overlay bank stripping the data can be simultaneously read from or written to one or more banks. This technique can increase the rate of data transfer to and from the array and can overcome a limitation caused by the limited access speed provided by each individual disk. If the data is distributed in a segment on each bank in the arrangement shown in Figure 23 then the transfer rate is increased by a factor of seven.
However, in order to optimise the data transfer rate provided by the SCSI interface, the segment size may need to be of sufficient size, e.g. 64K bytes.
The technique of overlay bank stripping can be used with either the RAID-35 or RAID-53 techniques. Where the computer memory controller is arranged to operate both, by appropriately assigning the banks for the two techniques, overlay bank stripping can be used by both techniques if the disk banks are shared, or by only one of RAID-35 or RAID-53 if the disk banks are appropriately allocated.
From the embodiments hereinabove described it can be seen that the controller of the present invention provides for large scale sequential data transfers from memory units for multi-users of a host computer and/or random requests for small amounts of data from a multitude of users.
While the invention has been described with reference to specific elements and combinations of elements, it is envisaged that each element may be combined with any other or any combination of other elements. It is not intended to limit the invention to the particular combinations of elements suggested. Furthermore, the foregoing description is not intended to suggest that any element mentioned is indispensable to the invention, or that alternatives may not be employed. What is defined as the invention should not be construed as limiting the extent of the disclosure of this specification.


Claims (41)

1. A computer memory controller for interfacing to a host computer comprising a buffer means for interfacing to a plurality of memory units and for holding data read thereto and therefrom; and control means operative to control the transfer of data to and from said host computer and said memory units; said buffer means being controlled to form a plurality of buffer segments for addressably storing data read from or written to said memory units;
said control means being operative to allocate a buffer segment for a read or write request from the host computer, of a size sufficient for the data; said control means being further operative in response to data requests from said host computer to control said memory units to seek data stored in different memory units simultaneously.
2. A computer memory controller as claimed in Claim 1 for interfacing to a plurality of memory units arranged into a two dimensional array having at least three memory channels, each memory channel comprising a plurality of memory units connected by a bus such that each memory unit of said memory channel is independently accessible;
respective memory units of said memory channels forming a memory bank; wherein said control means is operative in response to data requests from said host computer to store in said buffer segments bytes or groups of bits read from a memory bank; said controller comprising a logic circuit connected to said buffer means to recombine bytes or groups or bits read from a group of said memory units of a memory bank and stored in said buffer segments to generate the requested data; said logic circuit including parity means operative to use a check byte or group of bits read from one of said memory units of said memory bank to regenerate data read from said group of memory units if one of said group of memory units fails; said buffer means being divided into a number of channels corresponding to the number of memory channels, each channel being divided into associated portions of buffer segments; said control means being further operative to control said memory units to seek data stored in different memory banks simultaneously.
3. A computer memory controller as claimed in Claim 2, wherein said control means is adapted to queue host data requests which it is unable to carry out at the time of the request, until the memory bank containing the requested data is not busy.
4. A computer memory controller as claimed in Claim 3, wherein if a memory bank has more than one data request said control means is adapted to control the order in which the data is sought in order to optimise the time taken to accomplish the read operation.
5. A computer memory controller as claimed in any of Claims 2 to 4, wherein said logic circuit is operative to split data input from said host computer into a plurality of portions such that said portions are temporarily stored in a buffer segment before being applied to ones of a group of said memory channels for storage in a memory bank; and said parity means is operative to generate a check byte or group of bits from said data for temporary storage in a buffer segment before being stored in at least one memory unit of a memory bank.
6. A computer memory controller as claimed in Claim 5, wherein said control means is operative to effect the writing of data to a memory bank substantially immediately, to the detriment of any pending read or write requests.
7. A computer memory controller as claimed in any of Claims 2 to 6, wherein said controller is adapted for interfacing to an array of memory units having five memory channels, one said memory channel holding said check byte or groups of bits.
8. A computer memory controller as claimed in any of Claims 2 to 7, comprising a plurality of said buffer means, and said logic circuits, each said buffer means being adapted for interface to one said two dimensional array of memory units; said control means being operative to control the transfer of data to and from said host computer and a three dimensional array of memory units formed of a plurality of said two dimensional arrays of memory units.
9. A computer memory controller as claimed in any preceding claim, wherein said controller is adapted for interfacing to magnetic disk drives.
10. A computer memory controller as claimed in any preceding claim, wherein said buffer means is adapted to hold data requested by said host computer and further data logically sequential thereto; said control means being further operative to control the transfer of data to said host computer in response to requests therefrom by first addressing said buffer segments to establish whether the requested data is contained therein and if: so supplying said data to said host computer, and if the requested data is not contained in the buffer segments reading said data from said memory units, supplying said data to said host computer, reading from said memory units further data which is logically sequential to the data requested by said host computer and storing said further data in a buffer segment;
said control means being further operative to control the buffer means to control the number and size of said buffer segments.
11. A computer memory controller as claimed in Claim 10, wherein said control means is operative to reduce the size of existing buffer segments on each occasion that a request for data from said host computer cannot be complied with from the further data stored in existing ones of said buffer segments, to dynamically allocate a new segment of said buffer means for further data to the data requested, and to continue this process until the size of each buffer segment is some predetermined minimum, whereupon, at the next request for data not available in a buffer. segment, the buffer segment least frequently utilised is employed.
12. A computer memory controller as claimed in any preceding claim, wherein said control means is further operative to addressably segment a plurality of said memory units such that respective segments on sequential memory units have a sequential address, and to write sequential data to sequentially addressed segments on sequential segmented memory units.
13. A computer memory controller as claimed in any of Claims 2 to 11, wherein said control means is further operative to addressably segment a plurality of said memory banks into sequential bank segments on sequential memory banks such that respective segments on sequential banks have sequential address, and to write sequential data to sequentially addressed bank segments on sequential segmented memory banks.
14. A method of controlling a plurality of memory units for use with a host computer comprising the steps of repeatedly receiving from said host computer a read request for data stored in said memory units and allocating a buffer segment of sufficient size for the data to be read;
and seeking data in said plurality of memory units simultaneously.
15. A method as claimed in Claim 14, wherein said memory units are arranged into a two dimensional array having at least three memory channels, each memory channel comprising a plurality of respective memory units connected by a bus such that each memory unit of said memory channel is independently accessible; respective memory units of said memory channels forming a memory bank; said method including the steps of storing bytes or groups of bits read from a memory bank in said buffer segments in response to data requests from said host computer; recombining bytes or groups of bits read from a group of said memory units of a memory bank and stored in said buffer segments to generate the requested data; reading a check byte or group of bits from one of said memory units of said memory bank;
regenerating data read from said group of memory units using said check byte if one of said group of said memory units fails; and seeking data stored in different memory banks simultaneously.
16. A method as claimed in Claim 15, wherein host data requests which cannot be carried out at the time of request, are queued to be carried out at a time when the memory bank containing the requested data is not busy.
17. A method as claimed in Claim 16, wherein if a memory bank has more than one data request, the order in which the data is sought is controlled in order to optimise the time taken to accomplish the read operation.
18. A method as claimed in any one of Claims 15 to 17 including the steps of splitting data output from said host computer into a plurality of portions; storing said portions in a buffer segment; applying said split data to ones of a group of said memory units of a memory bank;
generating a check byte or group of bits from said data;
storing said check byte or group of bits in a buffer segment; applying said check byte or group of bits to at least one memory unit of a memory bank.
19. A method as claimed in Claim 18, wherein data is written to a memory bank substantially immediately to the detriment of any pending read or write request.
20. A method as claimed in any of Claims 15 to 19 including the step of controlling the transfer of data to and from said host computer and a three dimensional array of memory units formed of a plurality of said two dimensional arrays of memory units.
21. A method as claimed in any of Claims 14 to 20, including the steps of checking a plurality of buffer segments to establish whether the requested data is in said buffer segments, either complying with said request by transferring the data in said buffer segments to said host computer, or first reading said data from said memory units into one buffer segment and then complying with said request, reading from said memory units further data logically sequential to the data requested and storing said data in said buffer segment.
22. A method as claimed in Claim 21, further including the steps of reducing the size of existing buffer segments on each occasion that a request for data from said host computer cannot be complied with from the further data stored in existing ones of said buffer segments, dynamically allocating a new segment of said buffer for further data to the data requested, and continuing this process until the size of each buffer segment is some predetermined minimum, whereupon at the next request for data not available in a buffer segment the buffer segment least frequently utilised is employed.
23. A method as claimed in any of Claims 14 to 22, including the steps of addressably segmenting a plurality of said memory units such that respective segments on sequential memory units have a sequential address, and writing sequential data to sequentially addressed segments of sequential segmented memory units.
24. A method as claimed in any of Claims 15 to 22, including the steps of addressably segmenting a plurality of said memory banks such that respective bank segments on sequential memory banks have a sequential address, and writing sequential data to sequentially addressed bank segments of sequential segmented memory banks.
25. A computer memory controller for a host computer comprising buffer means for interfacing to at least three memory channels arranged in parallel, each memory channel comprising a plurality of memory units connected by a bus such that each memory unit o f said memory channel is independently accessible; respective memory units of said memory channels forming a memory bank; a logic circuit connected to said buffer means to split data input from said host computer into a plurality of portions such that said portions are temporarily stored in a buffer segment before being applied to ones of a group of said memory channels for storage in a memory bank; said logic circuit being further operative to recombine portions of data successively read from successive ones of a group of said memory units of a memory bank and into said buffer means;
said logic circuit including parity means operative to generate a check byte or group of bits from said data for temporary storage in said buffer means before being stored in at least one said memory unit of said memory bank, and operative to use said check byte to regenerate said data read from said group of memory units of a memory bank if one of said group of memory units fails; said buffer means being divided into a number of channels corresponding to the number of memory channels, each said channel being divided into associated portion of buffer segments; and a control means operative to control the transfer of data and check bytes or groups of bits to and from said memory banks, including allocating a buffer segment for a read or write request from the host computer of a sufficient size for the data, and controlling said memory banks to seek requested data stored in different memory banks simultaneously.
26. A computer memory controller as claimed in Claim 25, wherein said control means is adapted to queue host data requests which it is unable to carry out at the time of the request, until the memory bank containing the requested data is not busy.
27. A computer memory controller as claimed in Claim 26, wherein if a memory bank has more than one data request, said control means is adapted to control the order in which the data is sought in order to optimise the time taken to accomplish the read operation.
28. A computer memory controller as claimed in any of Claims 25 to 27, wherein said control means is operative to effect the writing of data to a memory bank substantially immediately, to the detriment of any pending read or write requests.
29. A computer memory controller as claimed in any of Claims 25 to 28, wherein said controller is adapted for interfacing to an array of memory units having five memory channels, one said memory channel holding said check byte or groups of bits.
30. A computer memory controller as claimed in any of Claims 25 to 29, comprising a plurality of said buffer means, and said logic circuits, each said buffer means being adapted for interface to one said two dimensional array of memory units; said control means being operative to control the transfer of data to and from said host computer and a three dimensional array of memory units formed of a plurality of said two dimensional arrays of memory units.
31. A computer memory controller as claimed in any of Claims 25 to 30, wherein said controller is adapted for interfacing to magnetic disk drives.
32. A computer memory controller as claimed in any of Claims 25 to 31, wherein said control means is operative to allocate a first portion of said buffer means for non sequential data, said control means being operative to allocate a buffer segment which is of sufficient size for the data, in said first portion for a read or write request from the host computer which is not sequential, to control said memory banks to seek a plurality of requested data stored in different ones of said memory banks simultaneously, to allocate a second portion of said buffer means for sequential data, to control the transfer of sequential data to said host computer in response to requests therefrom by first addressing said buffer segments of the second portion to establish whether the requested data is contained therein and if so supplying said data to said host computer, and if the requested sequential data is not contained in the buffer segments of said second portion, reading said data from the memory units of said memory banks, supplying said data to said host computer, reading from said memory units further data which is logically sequential to the data requested by the host computer and storing said further data in a buffer segment in said second portion; said control means being further operative to control the second portion of said buffer means to control the number and size of said buffer segments.
33. A computer memory controller as claimed in any of Claims 25 to 31, wherein said control means is operative to allocate a first portion of said buffer means and a number of memory banks for non sequential data, said control means being operative to allocate a buffer segment which is of sufficient size for the data, in said first portion for a read or write request from the host computer which is not sequential, to control said memory banks to seek requested data stored in different ones of said number of memory banks simultaneously, to allocate a second portion of said buffer means and the remaining memory banks for sequential data, to control the transfer of sequential data to said host computer in response to requests therefrom by first addressing said buffer segments of the second portion to establish whether the requested data is contained therein and if so supplying said data to said host computer, and if the requested sequential data is not contained in the buffer segments of said second portion, reading said data from the memory units of the remaining memory banks, supplying said data to said host computer, reading from the memory units further data which is logically sequential to the data requested by the host computer and storing said further data in a buffer segment in said second portion;
said control means being further operative to control the second portion of said buffer means to control the number and size of said buffer segments.
34. A computer memory controller as claimed in Claim 33, wherein said control means is further operative to addressably segment a plurality of said memory banks such that respective bank segments on sequential memory banks have a sequential address, to write sequential data to sequentially addressed bank segments of sequential segmented memory banks, and to seek and read requested data as well as any data sequential thereto stored in sequential bank segments in sequential ones of said plurality of memory banks simultaneously.
35. A computer storage system comprising a plurality of memory units arranged into a two dimensional array having at least three memory channels arranged in parallel, each said memory channel comprising a plurality o f memory units connected by a bus such that each memory unit is independently accessible; respective memory units of said memory channels forming a memory bank; and a controller comprising buffer means interfaced to said memory units and for holding information read from said memory channels, said buffer means being controlled to form a plurality of buffer segments for addressably storing data read from or written to said memory units; a logic circuit connected to said buffer means to recombine bytes or groups of bits read from ones of a group of said memory units in a memory bank, parity means operative to use a check byte or group of bits read from one of said memory units in said memory bank to regenerate information read from said group of memory units if one of said group of memory units fails; and control means for controlling the transfer of data to and from said host computer and said memory units, including allocating a buffer segment for a read or write request from the host computer of a sufficient size for the data, and controlling said memory banks to seek data stored in different memory banks simultaneously.
36. A computer storage system as claimed in Claim 35 comprising five memory channels arranged in parallel, one said memory channel holding said check byte or groups of bits.
37. A computer storage system as claimed in Claim 35 or Claim 36, wherein each memory channel comprises seven memory units thus forming seven memory banks.
38. A computer storage system as claimed in any of Claims 35 to 37, wherein said memory units are magnetic disk drives and the rotation of magnetic disk drives of a memory bank is synchronised.
39. A computer storage system as claimed in Claim 35, wherein said controller includes a plurality of said buffer means, said logic circuits and said parity means, each said buffer means being adapted for interface to one said two dimensional array of memory units; said control means being operative to control the transfer of data to and from said host computer and a three dimensional array of memory units formed of a plurality of said two dimensional arrays of memory units.
40. A computer storage system as claimed in any of Claims 35 to 39, further including a second controller interfaced to said memory channels in a like manner to the first controller and having a different address on said bus.
41. A computer memory controller for a host computer comprising a plurality of buffer means each for interfacing a plurality of memory units arranged into a two dimensional array having at least three memory channels, each memory channel comprising a plurality of memory units connected by a bus such that each memory unit is independently accessible; respective memory units of said memory channels forming a memory bank; a plurality of logic circuits connected to respective said buffer means to recombine bytes or groups of bits read from ones of a group of said memory units of a memory bank and stored in said buffer segments to generate the requested data; said logic circuits each including parity means operative to use a check byte or group of bits read from one of said memory units of said memory bank to regenerate data read from said group of memory units if one of said group of memory units fails; said buffer means being divided into a number of channels corresponding to the number of memory channels, each channel being divided into associated portion of buffer segments; and control means operative to control the transfer of data from a three dimensional array of memory units formed from a plurality of said two dimensional arrays to said host computer in response to requests therefrom by first addressing said buffer segments to establish whether the requested data is contained therein and if so supplying said data to said host computer, and if the requested data is not contained in the buffer segments, reading said data from the memory units, supplying said data to said host computer, reading from said memory units further data which is logically sequential to the data requested by said host computer and storing said further data in a buffer segment; said control means being further operative in response to data requests from said host computer to control said memory units to seek data stored in different memory banks simultaneously; said control means further controlling said buffer means to control the number and size of said buffer segments.
CA002127380A 1992-01-06 1992-12-10 Computer memory array control Abandoned CA2127380A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB929200207A GB9200207D0 (en) 1992-01-06 1992-01-06 Computer memory array control
GB9200207.0 1992-01-06

Publications (1)

Publication Number Publication Date
CA2127380A1 true CA2127380A1 (en) 1993-07-22

Family

ID=10708187

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002127380A Abandoned CA2127380A1 (en) 1992-01-06 1992-12-10 Computer memory array control

Country Status (6)

Country Link
EP (1) EP0620934A1 (en)
JP (1) JPH08501643A (en)
AU (1) AU662376B2 (en)
CA (1) CA2127380A1 (en)
GB (1) GB9200207D0 (en)
WO (1) WO1993014455A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5375084A (en) * 1993-11-08 1994-12-20 International Business Machines Corporation Selectable interface between memory controller and memory simms
EP0727750B1 (en) * 1995-02-17 2004-05-12 Kabushiki Kaisha Toshiba Continuous data server apparatus and data transfer scheme enabling multiple simultaneous data accesses
US5937174A (en) * 1996-06-28 1999-08-10 Lsi Logic Corporation Scalable hierarchial memory structure for high data bandwidth raid applications
US5881254A (en) * 1996-06-28 1999-03-09 Lsi Logic Corporation Inter-bus bridge circuit with integrated memory port

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4958351A (en) * 1986-02-03 1990-09-18 Unisys Corp. High capacity multiple-disk storage method and apparatus having unusually high fault tolerance level and high bandpass
US4993030A (en) * 1988-04-22 1991-02-12 Amdahl Corporation File system for a plurality of storage classes
AU630635B2 (en) * 1988-11-14 1992-11-05 Emc Corporation Arrayed disk drive system and method

Also Published As

Publication number Publication date
GB9200207D0 (en) 1992-02-26
JPH08501643A (en) 1996-02-20
AU662376B2 (en) 1995-08-31
AU3091592A (en) 1993-08-03
WO1993014455A1 (en) 1993-07-22
EP0620934A1 (en) 1994-10-26

Similar Documents

Publication Publication Date Title
US5526507A (en) Computer memory array control for accessing different memory banks simullaneously
EP0426185B1 (en) Data redundancy and recovery protection
US5191584A (en) Mass storage array with efficient parity calculation
US6009481A (en) Mass storage system using internal system-level mirroring
US6058489A (en) On-line disk array reconfiguration
US5502836A (en) Method for disk restriping during system operation
KR100211788B1 (en) Failure prediction for disk arrays
US9378093B2 (en) Controlling data storage in an array of storage devices
US5210860A (en) Intelligent disk array controller
US5893919A (en) Apparatus and method for storing data with selectable data protection using mirroring and selectable parity inhibition
US6182198B1 (en) Method and apparatus for providing a disc drive snapshot backup while allowing normal drive read, write, and buffering operations
US7971013B2 (en) Compensating for write speed differences between mirroring storage devices by striping
US7730257B2 (en) Method and computer program product to increase I/O write performance in a redundant array
US7506187B2 (en) Methods, apparatus and controllers for a raid storage system
US5881311A (en) Data storage subsystem with block based data management
US10229022B1 (en) Providing Raid-10 with a configurable Raid width using a mapped raid group
US8041891B2 (en) Method and system for performing RAID level migration
WO1997044733A1 (en) Data storage system with parity reads and writes only on operations requiring parity information
EP0850448A4 (en) Method and apparatus for improving performance in a redundant array of independent disks
GB2271462A (en) Disk array recording system
US20060059306A1 (en) Apparatus, system, and method for integrity-assured online raid set expansion
EP0657801A1 (en) System and method for supporting reproduction of full motion video on a plurality of playback platforms
US6425053B1 (en) System and method for zeroing data storage blocks in a raid storage implementation
EP0662660A1 (en) An improved data storage device and method of operation
JPH1063576A (en) Hierarchical disk drive and its control method

Legal Events

Date Code Title Description
FZDE Dead