WO2001031456A1 - Caching techniques for improving system performance in RAID (redundant array of independent disks) applications - Google Patents

Caching techniques for improving system performance in RAID (redundant array of independent disks) applications

Info

Publication number
WO2001031456A1
Authority
WO
WIPO (PCT)
Prior art keywords
bus
cache memory
disk drive
interfacing
xor
Prior art date
Application number
PCT/US2000/029881
Other languages
English (en)
Other versions
WO2001031456A9 (fr)
Inventor
William Lam
Original Assignee
Connectcom Solutions, Inc.
Priority date
Filing date
Publication date
Application filed by Connectcom Solutions, Inc. filed Critical Connectcom Solutions, Inc.
Priority to AU14446/01A priority Critical patent/AU1444601A/en
Publication of WO2001031456A1 publication Critical patent/WO2001031456A1/fr
Publication of WO2001031456A9 publication Critical patent/WO2001031456A9/fr

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1076Parity data used in redundant arrays of independent storages, e.g. in RAID systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0866Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2211/00Indexing scheme relating to details of data-processing equipment not covered by groups G06F3/00 - G06F13/00
    • G06F2211/10Indexing scheme relating to G06F11/10
    • G06F2211/1002Indexing scheme relating to G06F11/1076
    • G06F2211/1009Cache, i.e. caches used in RAID system with parity
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2211/00Indexing scheme relating to details of data-processing equipment not covered by groups G06F3/00 - G06F13/00
    • G06F2211/10Indexing scheme relating to G06F11/10
    • G06F2211/1002Indexing scheme relating to G06F11/1076
    • G06F2211/1054Parity-fast hardware, i.e. dedicated fast hardware for RAID systems with parity
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/31Providing disk cache in a specific location of a storage system
    • G06F2212/312In storage controller

Definitions

  • the present invention relates to a caching technique for improving system performance in RAID applications.
  • the memory caching architecture technique according to the present invention creates a substantial improvement on the system host bus, especially for a RAID application system.
  • the present invention is related to U. S. Patent Nos. 5,734,924 entitled System For Host Accessing Local Memory, issued March 31, 1998; 5,586,268 entitled Multiple Peripheral Adapter Device Driver Architecture, issued December 17, 1996; and 5,561,813 entitled Circuit For Resolving I/O Port Address Conflicts, issued October 1, 1996, all of which are assigned to the same assignee as the present invention, and the details of which are hereby incorporated by reference.
  • the traditional RAID (Redundant Arrays of Inexpensive Disks) architecture fetches or stores data between the SCSI bus (a popular disk drive bus) and the PCI bus (a popular system host bus).
  • a typical data transfer size from the host to the disk drive is on the order of 4K bytes to 64K bytes.
  • when the application software needs to perform the XOR operation (required by the RAID algorithm), huge amounts of data must be transferred over the host bus.
  • each access to the hard disk drive is very slow relative to silicon memory access time. This in turn creates a bottleneck on the host bus which in effect slows down the overall system performance.
  • the memory caching architecture technique according to the present invention creates a substantial improvement on the system host bus, especially for the RAID application system, implementing an Exclusive-OR (XOR) function in hardware.
  • the present invention provides a caching system comprising a host bus; a disk drive bus; an internal cache memory; interfacing means for interfacing between the host bus, the disk drive bus and the cache memory.
  • the interfacing means includes a host interface for interfacing with the host bus; a disk drive interface for interfacing with the disk drive bus; XOR function means for interfacing with the cache memory and for providing and loading one or more XOR functions into the cache memory; and a micro-controller means for controlling data flow between the host bus, the disk drive bus and the cache memory, including the XOR functions from the cache memory.
  • the present invention includes DMA means for buffering the data transfer between the host bus, the disk drive bus and the cache memory; an internal side bus for interfacing with the XOR function means for providing direct data transfers between the host bus and the disk drive bus; and means for providing simultaneous data transfers between the host bus and the disk drive bus and the cache memory.
  • Figure 1 shows a caching architecture block diagram for the present invention.
  • Figure 2 shows a caching architecture block diagram with an XOR datapath and with an embedded CPU for the present invention.
  • Figure 3 shows an XOR function with local memory for the present invention.
  • Figure 4 shows a more detailed XOR function with local memory of Figure 3.
  • the memory caching architecture technique creates a substantial improvement on the system host bus, especially for the RAID application system, implementing an exclusive-OR (XOR) function in hardware.
  • the present invention introduces a caching technique which resides between the disk drive bus and the host bus.
  • the data access time from the host can be significantly reduced when there is a cache hit, i.e. when the data is already in the cache memory.
  • data is pulled out from the disk to the cache memory and to the host bus.
  • a read-ahead technique can be used in this case as well: more data is read out than required, which translates into more chances for the host to have a cache hit the next time around.
  • the minimum size of the cache is equal to the maximum amount of each data transfer from the host multiplied by the total number of disk drives (for example, 64 Kbyte transfers across five drives imply a 320 Kbyte minimum cache).
  • since the XOR operation is extracted from the software application, the host no longer needs to move the data from the slow disk drive bus onto the host bus.
  • the XOR operation can be performed behind the scenes, and the RAID software simply dispatches the XOR operation to the bus controller.
  • the heavy data accesses between the disk drive and the cache memory are totally isolated from the host bus. This boosts the overall host bus performance and reduces the software time spent performing the XOR operation.
  • Figure 1 shows the architecture described above.
  • Figure 1 is one example of the RAID application architecture for an embedded system.
  • the host bus interface block 102 provides the central communication between the host bus 100 and the host adapter circuitry. In most systems today, the host bus 100 would be a PCI bus.
  • through this architecture, the host bus 100 communicates with the hard disk drive bus 120 via the disk drive interface 124.
  • This architecture provides direct data transfer from the host bus 100 to the disk drive via the disk drive bus 120, or simultaneous transfer from the disk drive bus 120 to both the host bus 100 and the cache memory 130.
  • the DMA (Direct Memory Access) datapath 140 is a medium for buffering the incoming data and pumping the data out at the appropriate time, especially since the speeds of the three buses may not be the same.
  • the data flow and the traffic between the buses are managed by the micro-controller 110.
  • the micro-controller 110 accepts the appropriate command blocks from the RAID software, and the micro-controller 110 ucode decodes and executes the commands such that the controls enable the proper datapath for a specific task.
  • the DMA datapath 140 provides enough data buffering on each side of the three buses so that data transfers can operate effectively.
  • the proper buffering size is determined by the typical transfer size, the incoming data rate and the outgoing data rate. With this architecture, the cache data can be transferred to/from the host bus and the disk drive bus concurrently.
  • This effect can be achieved by the side bus interface 144 providing a dual datapath from the side bus 146 to the host bus 100 and to the hard disk bus 120. By alternating the transfer between these two datapaths in a reasonable transfer size (such as 512 bytes each time), a concurrent transfer effect, from the software point of view, can be accomplished.
  • One bus that DMA 140 interfaces with is the side bus 146 which communicates with the cache controller and the XOR functional block 150.
  • This block 150 provides the capability of doing an XOR function and interfacing with the cache memory 130 via memory bus 154. This capability enables the RAID application to substantially improve the system performance. By performing the XOR function on the side while freeing up the host bus 100 for other tasks, the host bus bandwidth is improved significantly and at the same time, the time for performing the XOR task is reduced.
  • the side bus 146 accepts multiple XOR blocks. In this way, the cache memory can be expanded and multiple XOR functions can execute simultaneously. For a high-end server-class system, a bigger cache size and multiple XOR functions actively operating in parallel would be desirable.
  • a 64 Mbyte transfer is activated from the disk drive bus 120 to the host bus 100 via the host interface 102.
  • the data transfers from the disk drive bus 120 to the host bus 100 and, at the same time, the same data is transferred into the cache memory 130 as well.
  • if the host requires fetching the same set of data again, the data can be fed to the host in as little as several hundred nanoseconds. Since the data resides in the cache memory, it can be accessed quickly without going through the hard disk, which normally would take up to a few milliseconds.
  • the RAID software loads the proper source data into the cache memory 130 and executes the command by properly programming the required internal registers.
  • the hardware puts the XOR result into the cache memory 130, and the RAID software gets the data from the cache memory 130 and puts this result onto the disk drive.
  • the top-level command is received from the host, and the microprocessor runs the RAID software to decode the command and form a list of lower-level commands, which are usually sent over the host bus 100. These lower-level commands are then sent to the micro-controller for the next level of execution through the internal bus 146.
  • the XOR function plus datapath 157 for direct cache memory 130 access includes XOR 161 and memory controller 159. This method further reduces the traffic on the host interface 102.
  • While the XOR hardware is executing, the data in the cache memory is locked up exclusively for this operation.
  • the XOR operation can be very lengthy depending on the number of input sources and the size of the data. If there are 8 input sources and each source is 64 Kbytes, the XOR operation would occupy the cache data for a few milliseconds. This is a very long period of time. If the host bus or the disk drive bus needs to access the cache data, there is a long wait before the cache memory is free for access.
  • the technique to improve this situation is to implement a dual datapath function 157, as shown in Figure 2.
  • One datapath is used for the XOR operation and the other is used for direct cache memory access. While the XOR datapath 161 is performing the operation, the host bus 100 or the disk drive bus 120 can access the cache data through the second datapath. This technique delays the XOR operation from finishing, but the overall performance improves. Since neither the host bus 100 nor the disk drive bus 120 has to wait for the XOR operation 161 to complete, the RAID application software can continue to process while the XOR operation 161 runs in the background.
  • Figure 2 shows the overall caching technique of Figure 1 and where the dual datapaths can fit in to further improve the caching technique.
  • the XOR logic 161 continues for another 512 bytes.
  • the disk drive request is granted, and the transfer of the next 512 bytes from the cache memory 130 to the disk drive through the disk drive bus 120 begins. Once again the XOR 161 is put on hold. When the data has been pumped out from the cache to the disk drive, the XOR 161 is activated again for another 512 bytes. Once that 512-byte operation is completed, the side bus interface 144 swings back to the host bus transfer. The same process continues until the whole 64 Kbyte transfer is completed on both the host bus 100 and the disk drive bus 120. One may notice that the cache bus is being multiplexed between the XOR function 161, the host bus 100 and the disk drive bus 120 (an illustrative sketch of this multiplexing appears after this list).
  • This capability allows the software to transfer data from/to the cache memory 130 while the lengthy XOR 161 is performing in the background.
  • the data access from the cache memory 130 can be performed without waiting for the completion of the XOR operation 161. Therefore, the overall system performance can be greatly improved.
  • the cache memory 130 is partitioned into five logical units, assigned as the A unit, B unit, C unit, D unit and X unit. Each unit A, B, C, D and X is assigned 64 Kbytes, and the host is storing data onto the disk. Since there is a cache memory 130, the data will be stored into the cache memory 130 instead. The following is the sequence of operations to perform this task without any local memory (an illustrative parity sketch for this example appears after this list).
  • Each unit in the above example is 64 Kbytes, and this translates into many thousands of cycles depending on the implementation of the memory controller and the memory bus width.
  • memory transfer cycles are reduced by a significant amount, which also reduces power consumption due to fewer transactions on the bus.
  • the technique is to add a local memory to the XOR hardware, shown as block 160 in Figure 3.
  • the incoming data from the cache memory 130 or from the external sources can be XORed with the local memory data, and the result is stored back into the local memory 160.
  • the optimum size of the local memory is equal to the maximum host bus transfer size per transaction, commonly called the strip/segment size. This size may vary from one application to another; the example above uses 64 Kbytes per host transfer as a unit.
  • the intermediate XOR 166 result can be stored in the local memory 170, and the datapath can be arranged so that, as the data comes in from the local side bus 146, it can go to the cache memory 130 via the cache memory controller 159 and to the XOR 166 logic path as well.
  • this architecture not only reduces the cache bandwidth by a substantial amount but also completes the XOR transaction in less than half the time.
  • the XOR 166 function is performed as the data comes in from the local side bus 146. Since one source is already in the local memory 170 and the other is coming in, the XOR 166 function is executed on the fly as the data flows through the datapath. The result is written back into the local memory 170, where it is ready to be used again with the next input source (an illustrative sketch of this running accumulation appears after this list).
  • the local memory 170 embedded in this architecture clearly demonstrates the effectiveness and efficiency of this enhanced implementation of the XOR feature in the RAID application system.
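
The cache bus multiplexing described in the 512-byte interleaving example above can be illustrated with a short sketch. The following C fragment is a minimal, hypothetical model of that scheduling loop; the 512-byte chunk size, the structure and function names, and the callback interface are assumptions for illustration, not the patent's actual register-level hardware.

```c
#include <stddef.h>
#include <stdint.h>

#define CHUNK 512                      /* assumed multiplexing granularity (bytes) */

/* Hypothetical model of the side-bus arbitration: the XOR engine owns the
 * cache bus by default but yields after every CHUNK whenever the host bus
 * or the disk drive bus has a 512-byte transfer outstanding.              */
struct cache_bus {
    int host_pending;                  /* host bus requests a CHUNK transfer       */
    int disk_pending;                  /* disk drive bus requests a CHUNK transfer */
};

static void xor_chunk(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] ^= src[i];              /* one CHUNK of the background XOR */
}

/* XOR 'len' bytes of 'src' into 'dst', releasing the cache bus to the host
 * or the disk drive between chunks so neither waits for the whole XOR.     */
void xor_with_multiplexed_bus(struct cache_bus *bus,
                              uint8_t *dst, const uint8_t *src, size_t len,
                              void (*serve_host)(void),
                              void (*serve_disk)(void))
{
    for (size_t off = 0; off < len; off += CHUNK) {
        size_t n = (len - off < CHUNK) ? len - off : CHUNK;
        xor_chunk(dst + off, src + off, n);

        if (bus->host_pending) { serve_host(); bus->host_pending = 0; }
        if (bus->disk_pending) { serve_disk(); bus->disk_pending = 0; }
    }
}
```

In the hardware described above, the same effect is achieved by the side bus interface 144 arbitrating the cache bus between the XOR engine, the host bus 100 and the disk drive bus 120, rather than by software.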
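
The five-unit partitioning example (A, B, C, D and X units of 64 Kbytes each) amounts to computing the parity unit X = A XOR B XOR C XOR D over the cached write data. The sketch below is a minimal software rendering of that parity computation and of the corresponding rebuild; the unit size and function names are assumptions for illustration, and in the patent the work is performed by the XOR hardware block rather than by host software.

```c
#include <stddef.h>
#include <stdint.h>

#define UNIT_SIZE (64 * 1024)          /* assumed 64 Kbyte strip/segment size */

/* Compute the parity unit X = A ^ B ^ C ^ D over the four cached data units,
 * as the XOR hardware block would do against the cache memory.              */
void raid_parity(const uint8_t *a, const uint8_t *b,
                 const uint8_t *c, const uint8_t *d, uint8_t *x)
{
    for (size_t i = 0; i < UNIT_SIZE; i++)
        x[i] = a[i] ^ b[i] ^ c[i] ^ d[i];
}

/* The same XOR rebuilds a lost unit from the survivors: A = X ^ B ^ C ^ D. */
void raid_rebuild(const uint8_t *x, const uint8_t *b,
                  const uint8_t *c, const uint8_t *d, uint8_t *a)
{
    for (size_t i = 0; i < UNIT_SIZE; i++)
        a[i] = x[i] ^ b[i] ^ c[i] ^ d[i];
}
```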
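
The local-memory technique of Figures 3 and 4 is essentially a running XOR accumulator: each incoming source is folded into the local buffer as it streams past, so every source crosses the cache bus only once and the intermediate results never do. The following sketch assumes a 64 Kbyte segment and hypothetical function names; it models the behaviour of the local memory 170 and the XOR 166 datapath, not their actual implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SEGMENT_SIZE (64 * 1024)            /* assumed strip/segment size     */

static uint8_t local_mem[SEGMENT_SIZE];     /* stands in for local memory 170 */

/* The first source initializes the accumulator (equivalent to XOR with zeroes). */
void xor_accumulate_first(const uint8_t *src)
{
    memcpy(local_mem, src, SEGMENT_SIZE);
}

/* Each further source is folded in on the fly as it streams past; the same
 * data can be written into the cache memory in parallel by the controller.  */
void xor_accumulate(const uint8_t *src)
{
    for (size_t i = 0; i < SEGMENT_SIZE; i++)
        local_mem[i] ^= src[i];
}

/* After the last source has streamed past, local_mem holds the final parity
 * and is written back to the cache memory (or the disk drive) once.         */
const uint8_t *xor_result(void)
{
    return local_mem;
}
```

With this accumulation, N source units require only a single pass over each source instead of the repeated read-modify-write cycles against the cache memory described in the example without local memory.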

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

This invention relates to a caching technique for improving system performance in RAID (redundant arrays of independent disks) applications. The caching architecture technique according to this invention significantly improves the system host bus (100), especially for RAID applications, by implementing an exclusive-OR (XOR) function in hardware. Removing the XOR operation from the software, implementing the XOR function in hardware (150), and eliminating the unnecessary data transfer in the XOR operation bring a significant improvement in host bus bandwidth.
PCT/US2000/029881 1999-10-28 2000-10-27 Caching techniques for improving system performance in RAID (redundant array of independent disks) applications WO2001031456A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU14446/01A AU1444601A (en) 1999-10-28 2000-10-27 Caching techniques for improving system performance in raid applications

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US42914299A 1999-10-28 1999-10-28
US09/429,142 1999-10-28

Publications (2)

Publication Number Publication Date
WO2001031456A1 (fr) 2001-05-03
WO2001031456A9 WO2001031456A9 (fr) 2002-05-10

Family

ID=23701974

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2000/029881 WO2001031456A1 (fr) Caching techniques for improving system performance in RAID (redundant array of independent disks) applications

Country Status (2)

Country Link
AU (1) AU1444601A (fr)
WO (1) WO2001031456A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005066761A2 (fr) * 2003-12-29 2005-07-21 Intel Corporation Procede, systeme et programme de gestion d'organisation des donnees
US7188303B2 (en) 2003-12-29 2007-03-06 Intel Corporation Method, system, and program for generating parity data

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4571674A (en) * 1982-09-27 1986-02-18 International Business Machines Corporation Peripheral storage system having multiple data transfer rates
US5206943A (en) * 1989-11-03 1993-04-27 Compaq Computer Corporation Disk array controller with parity capabilities
US5522065A (en) * 1991-08-30 1996-05-28 Compaq Computer Corporation Method for performing write operations in a parity fault tolerant disk array
US5937174A (en) * 1996-06-28 1999-08-10 Lsi Logic Corporation Scalable hierarchial memory structure for high data bandwidth raid applications

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4571674A (en) * 1982-09-27 1986-02-18 International Business Machines Corporation Peripheral storage system having multiple data transfer rates
US5206943A (en) * 1989-11-03 1993-04-27 Compaq Computer Corporation Disk array controller with parity capabilities
US5522065A (en) * 1991-08-30 1996-05-28 Compaq Computer Corporation Method for performing write operations in a parity fault tolerant disk array
US5937174A (en) * 1996-06-28 1999-08-10 Lsi Logic Corporation Scalable hierarchial memory structure for high data bandwidth raid applications

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"Understanding HP AutoRAID part II or II: The AutoRAID storage hierarchy performance considerations", ENTERPRISE STORAGE SOLUTIONS DIVISION, HEWLETT-PACKARD COMPANY, March 1999 (1999-03-01), XP002937542, Retrieved from the Internet <URL:www.enterprisestorage.hp.com/products/disk_array/autoraid/sse_II_disk_array_12h_fa.html> [retrieved on 20001215] *
MENON JAI ET AL.: "The architecture of a fault-tolerant cached RAID controller", PROCEEDINGS OF THE 20TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE, May 1993 (1993-05-01), pages 76 - 86, XP002937543 *
WILKES JOHN ET AL.: "The HP AutoRAID hierarchical storage system", PROCEEDINGS OF THE FIFTEENTH ACM SYMPOSIUM ON OPERATING SYSTEMS PRINCIPLES, December 1995 (1995-12-01), pages 96 - 108, XP002937544 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005066761A2 (fr) * 2003-12-29 2005-07-21 Intel Corporation Procede, systeme et programme de gestion d'organisation des donnees
WO2005066761A3 (fr) * 2003-12-29 2006-08-24 Intel Corp Procede, systeme et programme de gestion d'organisation des donnees
US7188303B2 (en) 2003-12-29 2007-03-06 Intel Corporation Method, system, and program for generating parity data
US7206899B2 (en) 2003-12-29 2007-04-17 Intel Corporation Method, system, and program for managing data transfer and construction

Also Published As

Publication number Publication date
WO2001031456A9 (fr) 2002-05-10
AU1444601A (en) 2001-05-08

Similar Documents

Publication Publication Date Title
US6636927B1 (en) Bridge device for transferring data using master-specific prefetch sizes
EP1019835B1 (fr) Acces direct en memoire (dma) segmente comportant un tampon xor pour sous-systemes de mise en memoire
US5649230A System for transferring data using value in hardware FIFO's unused data start pointer to update virtual FIFO's start address pointer for fast context switching
US5594877A (en) System for transferring data onto buses having different widths
US6134630A (en) High-performance bus architecture for disk array system
US7225326B2 (en) Hardware assisted ATA command queuing
US5694581A (en) Concurrent disk array management system implemented with CPU executable extension
US5978856A (en) System and method for reducing latency in layered device driver architectures
US20040139286A1 (en) Method and related apparatus for reordering access requests used to access main memory of a data processing system
JPH08504524A (ja) 順次読取りキャッシュに有利な読み込み/書込みバッファ分割の割当て要求
JP3247075B2 (ja) パリティブロックの生成装置
US8386727B2 (en) Supporting interleaved read/write operations from/to multiple target devices
US6425053B1 (en) System and method for zeroing data storage blocks in a raid storage implementation
JP2001523860A (ja) 高性能構造ディスク・アレイ・コントローラ
JP3266470B2 (ja) 強制順序で行う要求毎ライト・スルー・キャッシュを有するデータ処理システム
US20030236943A1 (en) Method and systems for flyby raid parity generation
US5459838A (en) I/O access method for using flags to selectively control data operation between control unit and I/O channel to allow them proceed independently and concurrently
EP0618537B1 System and method for interleaving status information with data transfers in a communications adapter
US6008823A (en) Method and apparatus for enhancing access to a shared memory
JPH03189843A (ja) データ処理システムおよび方法
WO2001031456A1 Caching techniques for improving system performance in RAID (redundant array of independent disks) applications
US6513142B1 (en) System and method for detecting of unchanged parity data
US7136972B2 (en) Apparatus, system, and method for distributed management in a storage system
US5946707A (en) Interleaved burst XOR using a single memory pointer
JPH076093A (ja) 記憶制御装置

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

COP Corrected version of pamphlet

Free format text: PAGES 1/4-4/4, DRAWINGS, REPLACED BY NEW PAGES 1/4-4/4; DUE TO LATE TRANSMITTAL BY THE RECEIVING OFFICE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: COMMUNICATION PURSUANT TO RULE 69(1) EPC (EPO FORM 1205A DATED 09-12-2003)

NENP Non-entry into the national phase

Ref country code: JP

122 Ep: pct application non-entry in european phase