US20180341564A1 - Method and system for handling bad blocks in a hardware accelerated caching solution - Google Patents

Method and system for handling bad blocks in a hardware accelerated caching solution

Info

Publication number
US20180341564A1
Authority
US
United States
Prior art keywords
bad block
instructions
command
hash
request
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US15/605,348
Other versions
US10528438B2
Inventor
Horia Simionescu
Gowrisankar Radhakrishnan
Timothy Hoglund
Sridhar Rao Veerla
Panthini Pandit
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Avago Technologies International Sales Pte Ltd
Original Assignee
Avago Technologies International Sales Pte Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Avago Technologies International Sales Pte Ltd (Critical)
Priority to US15/605,348 (Critical), granted as US10528438B2
Assigned to AVAGO TECHNOLOGIES GENERAL IP (SINGAPORE) PTE. LTD. reassignment AVAGO TECHNOLOGIES GENERAL IP (SINGAPORE) PTE. LTD. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: SIMIONESCU, HORIA, RADHAKRISHNAN, GOWRISANKAR, HOGLUND, TIMOTHY, PANDIT, PANTHINI, VEERLA, SRIDHAR RAO
Assigned to AVAGO TECHNOLOGIES INTERNATIONAL SALES PTE. LIMITED reassignment AVAGO TECHNOLOGIES INTERNATIONAL SALES PTE. LIMITED MERGER (SEE DOCUMENT FOR DETAILS). Assignors: AVAGO TECHNOLOGIES GENERAL IP (SINGAPORE) PTE. LTD.
Publication of US20180341564A1 (Critical)
Assigned to AVAGO TECHNOLOGIES INTERNATIONAL SALES PTE. LIMITED reassignment AVAGO TECHNOLOGIES INTERNATIONAL SALES PTE. LIMITED CORRECTIVE ASSIGNMENT TO CORRECT THE EXECUTION DATE OF THE MERGER AND APPLICATION NOS. 13/237,550 AND 16/103,107 FROM THE MERGER PREVIOUSLY RECORDED ON REEL 047231 FRAME 0369. ASSIGNOR(S) HEREBY CONFIRMS THE MERGER. Assignors: AVAGO TECHNOLOGIES GENERAL IP (SINGAPORE) PTE. LTD.
Application granted granted Critical
Publication of US10528438B2 (Critical)
Active legal-status Critical Current
Adjusted expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2017Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where memory access, memory control or I/O control functionality is redundant
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1076Parity data used in redundant arrays of independent storages, e.g. in RAID systems
    • G06F11/108Parity data distribution in semiconductor storages, e.g. in SSD
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0808Multiuser, multiprocessor or multiprocessing cache systems with cache invalidating means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10Address translation
    • G06F12/1009Address translation using page tables, e.g. page table structures
    • G06F12/1018Address translation using page tables, e.g. page table structures involving hashing techniques, e.g. inverted page tables
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/14Details of searching files based on file metadata
    • G06F16/148File search processing
    • G06F16/152File search processing using file content signatures, e.g. hash values
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/17Details of further file system functions
    • G06F16/172Caching, prefetching or hoarding of files
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • G06F16/9014Indexing; Data structures therefor; Storage structures hash tables
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • G06F16/9017Indexing; Data structures therefor; Storage structures using directory or table look-up
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/903Querying
    • G06F16/90335Query processing
    • G06F17/30949
    • G06F17/30952
    • G06F17/30979
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614Improving the reliability of storage systems
    • G06F3/0619Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0659Command handling arrangements, e.g. command buffers, queues, command scheduling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/805Real-time
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/62Details of cache specific to multiprocessor cache arrangements
    • G06F2212/621Coherency control relating to peripheral accessing, e.g. from DMA or I/O device

Definitions

  • the present disclosure is generally directed toward computer memory.
  • BBM: Bad Block Management
  • FIG. 1 is a block diagram depicting a computing system in accordance with at least some embodiments of the present disclosure
  • FIG. 2 is a block diagram depicting details of an illustrative controller in accordance with at least some embodiments of the present disclosure
  • FIG. 3 is a block diagram depicting details of a data structure used in accordance with at least some embodiments of the present disclosure
  • FIG. 4 is a flow diagram depicting a method of processing a read command in accordance with at least some embodiments of the present disclosure.
  • FIG. 5 is a flow diagram depicting a method of processing a write command in accordance with at least some embodiments of the present disclosure.
  • a method and system are provided that enable bad block processing in a hardware accelerated caching solution.
  • the proposed method identifies the presence of a bad block using a hash value, which is ultimately used in aiding the hardware of the controller to perform automated caching even though a logical disk contains a bad block.
  • the proposed system and method effectively reduce the search time in hardware during a read or write operation, which improves system performance.
  • the cache segment ID (CSID) that represents the row/strip is added to a hash slot along with the hash tag (e.g., row/strip number or a combination of row/strip along with a logical drive number).
  • a bad block entry is added to a software bad block table and at the same time a unique signature (e.g., a string of 0's) is written to those LBAs.
  • embodiments of the present disclosure will be described. While many of the examples depicted and described herein will relate to a RAID architecture, it should be appreciated that embodiments of the present disclosure are not so limited. Indeed, aspects of the present disclosure can be used in any type of computing system and/or memory environment. In particular, embodiments of the present disclosure can be used in any type of memory scheme (whether employed by a RAID controller or some other type of device used in a communication system). In particular, hard drives, hard drive controllers (e.g., SCSI controllers, SAS controllers, or RAID controllers), flash drives, flash drive controllers, etc. may be configured to implement embodiments of the present disclosure. As another example, network cards or the like having cache memory may also be configured to implement embodiments of the present disclosure.
  • the computing system 100 is shown to include a host system 104 , a controller 108 (e.g., a SCSI controller, a SAS controller, a RAID controller, etc.), and a storage array 112 having a plurality of storage devices 136 a -N therein.
  • the system 100 may utilize any type of data storage architecture.
  • the particular architecture depicted and described herein (e.g., a RAID architecture) should not be construed as limiting embodiments of the present disclosure.
  • in a RAID-0 (also referred to as a RAID level 0) scheme, data blocks are stored in order across one or more of the storage devices 136 a -N without redundancy. This effectively means that none of the data blocks are copies of another data block and there is no parity block to recover from failure of a storage device 136 .
  • a RAID-1 (also referred to as a RAID level 1) scheme, on the other hand, uses one or more of the storage devices 136 a -N to store a data block and an equal number of additional mirror devices for storing copies of a stored data block.
  • Higher level RAID schemes can further segment the data into bits, bytes, or blocks for storage across multiple storage devices 136 a -N.
  • One or more of the storage devices 136 a -N may also be used to store error correction or parity information.
  • a single unit of storage can be spread across multiple devices 136 a -N and such a unit of storage may be referred to as a stripe.
  • a stripe may include the related data written to multiple devices 136 a -N as well as the parity information written to a parity storage device 136 a -N.
  • in a RAID-5 (also referred to as a RAID level 5) scheme, the data being stored is segmented into blocks for storage across multiple devices 136 a -N with a single parity block for each stripe distributed in a particular configuration across the multiple devices 136 a -N.
  • This scheme can be compared to a RAID-6 (also referred to as a RAID level 6) scheme in which dual parity blocks are determined for a stripe and are distributed across each of the multiple devices 136 a -N in the array 112 .
  • One of the functions of the controller 108 is to make the multiple storage devices 136 a -N in the array 112 appear to a host system 104 as a single high capacity disk drive (e.g., as a storage volume).
  • the controller 108 may be configured to automatically distribute data supplied from the host system 104 across the multiple storage devices 136 a -N (potentially with parity information) without ever exposing the manner in which the data is actually distributed to the host system 104 .
  • the host system 104 is shown to include a processor 116 , an interface 120 , and memory 124 . It should be appreciated that the host system 104 may include additional components without departing from the scope of the present disclosure.
  • the host system 104 in some embodiments, corresponds to a user computer, laptop, workstation, server, collection of servers, or the like. Thus, the host system 104 may or may not be designed to receive input directly from a human user.
  • the processor 116 of the host system 104 may include a microprocessor, central processing unit (CPU), collection of microprocessors, or the like.
  • the memory 124 may be designed to store instructions that enable functionality of the host system 104 when executed by the processor 116 .
  • the memory 124 may also store data that is eventually written by the host system 104 to the storage array 112 . Further still, the memory 124 may be used to store data that is retrieved from the storage array 112 .
  • Illustrative memory 124 devices may include, without limitation, volatile or non-volatile computer memory (e.g., flash memory, RAM, DRAM, ROM, EEPROM, etc.).
  • the interface 120 of the host system 104 enables the host system 104 to communicate with the controller 108 via a host interface 128 of the controller 108 .
  • the interface 120 and host interface(s) 128 may be of a same or similar type (e.g., utilize a common protocol, a common communication medium, etc.) such that commands issued by the host system 104 are receivable at the controller 108 and data retrieved by the controller 108 is transmittable back to the host system 104 .
  • the interfaces 120 , 128 may correspond to parallel or serial computer interfaces that utilize wired or wireless communication channels.
  • the interfaces 120 , 128 may include hardware that enables such wired or wireless communications.
  • the communication protocol used between the host system 104 and the controller 108 may correspond to any type of known host/memory control protocol.
  • Non-limiting examples of protocols that may be used between interfaces 120 , 128 include SAS, SATA, SCSI, FibreChannel (FC), iSCSI, ATA over Ethernet, InfiniBand, or the like.
  • the controller 108 may provide the ability to represent the entire storage array 112 to the host system 104 as a single high volume data storage device. Any known mechanism can be used to accomplish this task.
  • the controller 108 may help to manage the storage devices 136 a -N (which can be hard disk drives, solid-state drives, or combinations thereof) so as to operate as a logical unit.
  • the controller 108 may be physically incorporated into the host device 104 as a Peripheral Component Interconnect (PCI) expansion (e.g., PCI Express (PCIe)) card or the like. In such situations, the controller 108 may be referred to as a RAID adapter.
  • the storage devices 136 a -N in the storage array 112 may be of similar types or may be of different types without departing from the scope of the present disclosure.
  • the storage devices 136 a -N may be co-located with one another or may be physically located in different geographical locations.
  • the nature of the storage interface 132 may depend upon the types of storage devices 136 a -N used in the storage array 112 and the desired capabilities of the array 112 .
  • the storage interface 132 may correspond to a virtual interface or an actual interface. As with the other interfaces described herein, the storage interface 132 may include serial or parallel interface technologies. Examples of the storage interface 132 include, without limitation, SAS, SATA, SCSI, FC, iSCSI, ATA over Ethernet, InfiniBand, or the like.
  • the controller 108 is shown to have communication capabilities with a controller cache 140 . While depicted as being separate from the controller 108 , it should be appreciated that the controller cache 140 may be integral to the controller 108 , meaning that components of the controller 108 and the controller cache 140 may be contained within a single physical housing or computing unit (e.g., server blade).
  • the controller cache 140 is provided to enable the controller 108 to perform caching operations.
  • the controller 108 may employ caching operations during execution of I/O commands received from the host system 104 . Depending upon the nature of the I/O command and the amount of information being processed during the command, the controller 108 may require a large number of cache memory modules 148 (also referred to as cache memory) or a smaller number of cache memory modules 148 .
  • the memory modules 148 may correspond to flash memory, RAM, DRAM, DDR memory, or some other type of computer memory that is quickly accessible and can be rewritten multiple times.
  • the number of separate memory modules 148 in the controller cache 140 is typically larger than one, although a controller cache 140 may be configured to operate with a single memory module 148 if desired.
  • the cache interface 144 may correspond to any interconnect that enables the controller 108 to access the memory modules 148 , temporarily store data thereon, and/or retrieve data stored thereon in connection with performing an I/O command or some other executable command.
  • the controller cache 140 may be integrated with the controller 108 and may be executed on a CPU chip or placed on a separate chip within the controller 108 .
  • the interface 144 may correspond to a separate bus interconnect within the CPU or traces connecting a chip of the controller cache 140 with a chip executing the processor of the controller 108 .
  • the controller cache 140 may be external to the controller 108 in which case the interface 144 may correspond to a serial or parallel data port.
  • the controller 108 is shown to include the host interface(s) 128 and storage interface(s) 132 .
  • the controller 108 is also shown to include a processor 204 , memory 208 (e.g., a main controller memory), one or more drivers 212 , and a power source 216 .
  • the processor 204 may include an Integrated Circuit (IC) chip or multiple IC chips, a CPU, a microprocessor, or the like.
  • the processor 204 may be configured to execute instructions in memory 208 that are shown to include a host I/O manager 232 , a buffer manager 248 , a cache manager 252 , a RAID manager 256 , and a SAS manager 260 .
  • the processor 204 may utilize buffer memory 220 , one or more Internal Scatter Gather Lists (ISGLs) 224 , and a bad block table 228 .
  • the host I/O manager 232 is shown to include a plurality of sub-routines that include, without limitation, a host message unit 236 , a command extraction unit 240 , and a completion engine 244 .
  • Each of the components may correspond to different functional blocks that operate in their own local memory loading the global memory (e.g. a global buffer memory 220 or memory 208 ) on an as-needed basis. Each of these different functional blocks can be accelerated by different hardware threads without departing from the scope of the present disclosure.
  • the controller 108 may be considered to have hardware and firmware components.
  • the various manager components e.g., host I/O manager 232 , buffer manager 248 , cache manager 252 , RAID manager 256 , and SAS manager 260 ) may be considered firmware components even though they can be accelerated by different hardware threads.
  • the hardware components of the controller 108 may include drivers 212 , the processor 204 , the interfaces 128 , 132 , the controller cache 140 , etc. As will be discussed in further detail herein, the hardware components and software components of the controller 108 may be enabled to communicate with one another using specialized messages (e.g., LMIDs). These messages may contain information describing operations or routines to be executed by the various components of the controller 108 as well as results of operations already performed by the controller 108 .
  • the memory 208 may be volatile and/or non-volatile in nature. As indicated above, the memory 208 may include any hardware component or collection of hardware components that are capable of storing instructions and communicating those instructions to the processor 204 for execution. Non-limiting examples of memory 208 include RAM, ROM, flash memory, EEPROM, variants thereof, combinations thereof, and the like. Similarly, the buffer memory 220 may be volatile or non-volatile in nature. The buffer memory may be configured for multiple read/writes and may be adapted for quick access by the processor 204 .
  • the instructions stored in memory 208 are shown to be different instruction sets, but it should be appreciated that the instructions can be combined into a smaller number of instruction sets without departing from the scope of the present disclosure.
  • the host I/O manager 232 , when executed, enables the processor 204 to manage I/O commands received from the host system 104 and facilitate higher-level communications with the host system 104 .
  • the host I/O manager 232 may utilize the host message unit 236 to process incoming messages received from the host system 104 .
  • the controller 108 may receive messages from the host system 104 in an MPI protocol.
  • the host message unit 236 may bring down the messages received from the host system 104 and pass the content of the messages to the command dispatcher unit 240 .
  • the command extraction unit 240 may be configured to determine if a particular command in a message is acceleratable (e.g., capable of being passed to a particular functional block to facilitate hardware acceleration). If a command is determined to be acceleratable, then the command dispatcher unit 240 may implement a hardware acceleration process and generate an appropriate Local Message ID (LMID) that represents all of the information received from the host system 104 (in the command). The LMID effectively represents the command received from the host system 104 , but is in a different format that is understandable by the managers 248 , 252 , 256 , 260 .
  • the command dispatcher unit 240 may, in some embodiments, route the various commands (e.g., LMIDs) to one or more of the buffer manager 248 , cache manager 252 , RAID manager 256 , and SAS manager 260 .
  • the routing of the commands may depend upon a type of the command and the function to be executed.
  • the completion engine of the host I/O manager 232 may be responsible for reporting to the host system 104 that an I/O command has been completed by the controller 108 .
  • the buffer manager 248 may include instructions that, when executed, enable the processor 204 to perform various buffer functions. As an example, the buffer manager 248 may enable the processor 204 to recognize a write command and utilize the buffer memory 220 in connection with executing the write command. In some embodiments, any command or function that leverages the buffer memory 220 may utilize the buffer manager 248 .
  • the cache manager 252 may include instructions that, when executed, enable the processor 204 to perform various caching functions. As an example, the cache manager 252 may enable the processor 204 to respond to read commands or read-ahead commands. The cache manager 252 may also enable the processor 204 to communicate with the controller cache 140 and leverage the memory modules 148 of the controller cache 140 . The cache manager 252 may also manage the creation and lifecycle of cache frame anchors and/or ISGLs 224 . As an example, as caching functions are executed, one or more cache frame anchors may be created or utilized to facilitate the caching function. As used herein, an ISGL may represent the snapshot of data at a given point in time it is used. In some embodiments, the ISGL is capable of encapsulating all the metadata that is required for an I/O read/write and/or read-ahead request, thereby providing an efficient communication mechanism between various modules for processing the read/write and/or read-ahead operations.
  • the RAID manager 256 and/or SAS manager 260 may include instructions that, when executed, enable the processor 204 to communicate with the storage array 112 or storage devices 136 therein.
  • the RAID manager 256 and/or SAS manager 260 may receive commands either directly from the host I/O manager 232 (if no caching was needed) or they may receive commands from the cache manager 252 after an appropriate caching process has been performed.
  • the RAID manager 256 and/or SAS manager 260 may enable the processor 204 to finalize read or write commands and exchange data with the storage array 112 .
  • the driver(s) 212 may comprise firmware, hardware, software, or combinations thereof that enable the processor 204 to make use of other hardware components in the controller 108 .
  • different drivers 212 may be provided to support functions of the interfaces 128 , 132 .
  • separate drivers 212 may be provided to support functions of the buffer memory 220 .
  • the drivers 212 may perform the low-level routines that allow the processor 204 to communicate with the other hardware components and respond to commands received from the processor 204 .
  • the power source 216 may correspond to hardware components that provide the controller 108 with the power necessary to run the processor 204 and other components.
  • the power source 216 may correspond to a power converter that receives AC power from an external source (e.g., a power outlet) and converts the AC power into DC power that is useable by the other hardware components of the controller 108 .
  • the power source 216 may correspond to an internal power source (e.g., a battery pack, bank of capacitors, etc.) that provides power to the hardware components of the controller 108 .
  • the data structure 300 is shown to include a number of fields that can facilitate management of bad blocks in logical drives, virtual drives, physical drives, etc.
  • the data structure 300 may correspond to a hash slot data structure used to store one or a plurality of hash slot IDs, associated flags, and the like.
  • the depicted data structure 300 which should not be construed as limiting embodiments of the present disclosure, is shown to include a first hash slot tag field 304 , a second hash slot tag field 316 , a cache segment ID (CSID) field 308 , and a flag(s) field 312 .
  • the hash slot tag fields 304 , 316 may be used to store hash values used in connection with managing bad blocks of data that have been reported by devices 136 .
  • the data structure 300 contains one or more flags 312 to indicate the presence of a bad block and, if a bad block is present, whether the corresponding CSID 308 is valid or invalid. If the flags value is a first predetermined value (e.g., “100”), this may provide an indication that the cache segment/row in the CSID field 308 is valid. If the flags value is a second predetermined value (e.g., “110”), this may provide an indication of the presence of bad block(s) in the strip/row; it may also represent that the CSID field 308 is valid and represents an existing cache segment/row. If the flags value is a third predetermined value (e.g., “111”), this may provide an indication of the presence of a bad block in the strip/row and that the CSID field 308 is not valid (see the illustrative sketch below).
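  • As a purely illustrative aid, the C sketch below shows one possible in-memory picture of a hash slot element such as data structure 300. The field names, widths, and helper function are assumptions and not the controller's actual layout; only the tag/CSID/flags fields and the three flag encodings (“100”, “110”, “111”) come from the description above.

```c
#include <stdint.h>
#include <stdbool.h>

/* Flag encodings described above, written here as binary values. */
#define HASH_FLAGS_VALID_CSID          0x4  /* '100': CSID holds a valid cache segment/row        */
#define HASH_FLAGS_BAD_BLOCK_VALID_CS  0x6  /* '110': bad block in strip/row, CSID is still valid */
#define HASH_FLAGS_BAD_BLOCK_NO_CS     0x7  /* '111': bad block in strip/row, CSID is not valid   */

/* Hypothetical layout of one hash slot element (cf. fields 304, 316, 308, 312). */
struct hash_slot_element {
    uint32_t hash_slot_tag0;  /* first hash slot tag field, e.g., row/strip number      */
    uint32_t hash_slot_tag1;  /* second hash slot tag field, e.g., logical drive number */
    uint32_t csid;            /* cache segment ID representing the cached row/strip     */
    uint8_t  flags;           /* one of the encodings above                             */
};

/* A bad block is indicated when the flags carry either '110' or '111'. */
static inline bool slot_indicates_bad_block(const struct hash_slot_element *e)
{
    return e->flags == HASH_FLAGS_BAD_BLOCK_VALID_CS ||
           e->flags == HASH_FLAGS_BAD_BLOCK_NO_CS;
}
```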
  • because a row/strip contains more than one LBA, an indication of a bad block in the data structure 300 does not guarantee the presence of a bad block entry in the firmware's bad block table 228 for the specific LBAs being accessed. Accordingly, the controller's 108 firmware would need to resolve these false positives.
  • when a bad block is detected on an LBA, an entry is added in the bad block table 228 , and the hash slot 304 is updated with flags ‘111’ if a hash entry is not already present and with ‘110’ if a hash entry is already present with a valid CSID, as illustrated in the sketch below.
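  • A minimal sketch of that marking step, assuming the struct and flag values from the sketch above, might look as follows. The bad block table API and hash helpers are invented placeholders, not a real controller interface; only the ordering (add the table entry, then set flags ‘111’ or ‘110’) follows the text.

```c
#include <stdint.h>
#include <stddef.h>

#define HASH_FLAGS_BAD_BLOCK_VALID_CS  0x6  /* '110' */
#define HASH_FLAGS_BAD_BLOCK_NO_CS     0x7  /* '111' */

struct hash_slot_element { uint32_t hash_slot_tag0; uint32_t csid; uint8_t flags; };

/* Assumed firmware helpers; not part of any real controller API. */
void bbt_add_entry(uint16_t ld_number, uint64_t lba);            /* software bad block table 228 */
struct hash_slot_element *hash_lookup(uint32_t hash_slot_tag);   /* NULL when no entry exists    */
struct hash_slot_element *hash_insert(uint32_t hash_slot_tag);   /* creates an empty slot        */

/* Record a newly detected bad block for a given LBA and its row/strip. */
void mark_bad_block(uint16_t ld_number, uint64_t lba, uint32_t hash_slot_tag)
{
    bbt_add_entry(ld_number, lba);

    struct hash_slot_element *slot = hash_lookup(hash_slot_tag);
    if (slot == NULL) {
        slot = hash_insert(hash_slot_tag);
        slot->flags = HASH_FLAGS_BAD_BLOCK_NO_CS;     /* '111': no valid CSID cached yet     */
    } else {
        slot->flags = HASH_FLAGS_BAD_BLOCK_VALID_CS;  /* '110': existing CSID remains valid  */
    }
}
```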
  • a new write request to the same row may be diverted to firmware after performing write buffering.
  • the read command is diverted to firmware.
  • the firmware checks if there is an entry in the bad block table 228 . If such an entry is present, then the I/O (e.g., read request) is failed back to the host 104 indicating a bad block. If not, then the I/O is resubmitted to hardware for read command processing.
  • the CSID that represents the row/strip is added to the hash slot along with the hash tag (e.g., a row/strip number or a combination of row/strip along with a logical device number).
  • a bad block entry is added to the bad block table 228 and at the same time a unique signature (e.g., 0's) is written to those LBAs.
  • the indication of a bad block is provided to the hardware with a simple method utilizing any type of known or yet-to-be-developed hashing algorithm.
  • the false positives are resolved by the firmware of the controller 108 , which can search the bad block table 228 to check if the LBA is present in the bad block table. If not, the command can be re-processed in an automated hardware path.
  • the method begins when a read request is received at the controller 108 (step 404 ). When such a command is received, the method continues with the hardware of the controller 108 performing a hash search (step 408 ). In this step, the hardware computes a hash value from information contained in the read request (e.g., an LBA number, an LD number value, etc.), which serves as an index into the hash table. The hardware then extracts the information from the hash table for this index and checks whether specific details match.
  • a hash hit is detected when the hash tag derived from the row number for this request matches the hash tag present in the hash slot (e.g., Hash Slot Tag = Strip_or_Row_number >> num_of_hash_slots_in_bits; see the sketch below). It should be appreciated that there could be other ways of identifying whether there is a hash hit, and the current method does not assume or require a specific implementation.
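  • To make the hash hit test concrete, the sketch below splits the row/strip number into an index (low-order bits) and a stored tag (high-order bits), following the example formula above. The table size and all names are assumptions; real hardware may use any hashing scheme.

```c
#include <stdint.h>
#include <stdbool.h>

#define NUM_HASH_SLOTS_IN_BITS 20u                   /* assumed: 2^20 hash slots */
#define NUM_HASH_SLOTS         (1u << NUM_HASH_SLOTS_IN_BITS)

struct hash_slot_element { uint32_t hash_slot_tag; uint32_t csid; uint8_t flags; bool in_use; };

/* Index into the hash table: the low-order bits of the row/strip number. */
static inline uint32_t hash_index(uint64_t strip_or_row_number)
{
    return (uint32_t)(strip_or_row_number & (NUM_HASH_SLOTS - 1u));
}

/* Tag stored in the slot: the remaining high-order bits, per
 * Hash Slot Tag = Strip_or_Row_number >> num_of_hash_slots_in_bits. */
static inline uint32_t hash_tag(uint64_t strip_or_row_number)
{
    return (uint32_t)(strip_or_row_number >> NUM_HASH_SLOTS_IN_BITS);
}

/* A hash hit occurs when the indexed slot is populated and its stored tag
 * matches the tag derived from the request's row/strip number. */
static inline bool is_hash_hit(const struct hash_slot_element table[],
                               uint64_t strip_or_row_number)
{
    const struct hash_slot_element *slot = &table[hash_index(strip_or_row_number)];
    return slot->in_use && slot->hash_slot_tag == hash_tag(strip_or_row_number);
}
```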
  • the hardware will check the data structure 300 to determine if a bad block is present (step 416 ). Specifically, the hardware can check the flag(s) 312 and/or CSID 308 to determine if a predetermined value is maintained therein (e.g., a value of ‘110’ or ‘111’) indicating the existence of a bad block.
  • If a bad block is present (step 420 ), then the read request is diverted to firmware of the controller 108 (step 424 ). The firmware will then reference the bad block table 228 to see if any entries exist in the bad block table 228 that match that of the read request (step 428 ). Specifically, the firmware of the controller 108 may analyze the LBA or LBA range associated with the read request and, if any address is listed in the bad block table 228 that falls within the LBA range associated with the read request, then the firmware will confirm the existence of a bad block associated with the read request (step 432 ). If this query is answered affirmatively, then the command is completed (step 436 ). The manner in which this step is performed may depend upon the nature of the RAID system being used.
  • the firmware understands that the drive data for this LBA was already cleared with 0's because of the presence of bad blocks and that this I/O command should not be issued to the drive.
  • the firmware will complete the read request with a status as “read failed” because of the existence of the bad block containing data needed to complete the read request.
  • the firmware recovers the data from the other arms to satisfy the host read request (step 436 ). Any type of known recovery process can be used without departing from the scope of the present disclosure.
  • Embodiments of the present disclosure indicate a bad block to the hardware so that the hardware does not process the read requests through the accelerated method if such a bad block is present. The method then ends (step 440 ).
  • If there is no hash hit detected in step 412 , there is no indication of a bad block in the flag(s) 312 , or there is no bad block present in the bad block table 228 , then the method proceeds with the hardware performing the read operation with or without diverting the read request to the firmware (step 444 ). If either of the following conditions is true, then the hardware performs the read request directly: (i) no hash hit is detected or (ii) no flag 312 indicated a presence of a bad block. Conversely, if the method resulted in the read request being diverted to firmware, then the read request will only be performed by the hardware if a false positive condition was detected at step 432 .
  • if the method proceeds to step 432 and there is no matching address in the bad block table 228 that corresponds to an address associated with the read request, then the initial match made in steps 412 and 420 is treated as a false positive and the read command is re-issued through a hardware automated path.
  • the method then proceeds with the hardware completing the command (step 448 ).
  • the manner in which the command is completed in step 448 may depend upon the nature of the RAID array. In a RAID 0 volume, the firmware will complete the read request with a status being reported back as a read success. In a RAID 1, RAID 5, or RAID 6 volume the firmware will recover the data from the other arms to satisfy the host read request. Thereafter, the method ends (step 440 ).
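  • The read flow of FIG. 4 can be summarized with the control skeleton below. It is a hedged sketch only: the helper routines, return codes, and the split between “hardware” and “firmware” functions are assumptions used to mirror steps 404-448, not an actual driver or firmware interface.

```c
#include <stdint.h>
#include <stdbool.h>

enum io_status { IO_SUCCESS, IO_READ_FAILED };

struct read_request { uint16_t ld_number; uint64_t start_lba; uint32_t num_blocks; };

/* Assumed helpers standing in for hardware/firmware services. */
bool hw_hash_hit(const struct read_request *req);                          /* steps 408/412              */
bool hw_flags_indicate_bad_block(const struct read_request *req);          /* steps 416/420: '110'/'111' */
bool fw_bad_block_table_overlaps(const struct read_request *req);          /* steps 428/432              */
enum io_status hw_perform_read(const struct read_request *req);            /* steps 444/448              */
enum io_status fw_complete_with_bad_block(const struct read_request *req); /* step 436: fail (RAID-0) or
                                                                              rebuild from other arms    */

enum io_status process_read(const struct read_request *req)
{
    if (hw_hash_hit(req) && hw_flags_indicate_bad_block(req)) {
        /* Divert to firmware (step 424) and consult the bad block table 228. */
        if (fw_bad_block_table_overlaps(req))
            return fw_complete_with_bad_block(req);   /* genuine bad block in the requested range */

        /* False positive: re-issue through the automated hardware path. */
        return hw_perform_read(req);
    }

    /* No hash hit, or no bad-block flag: stay on the accelerated path. */
    return hw_perform_read(req);
}
```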
  • the method begins when a write request is received at the controller 108 from the host 104 (step 504 ).
  • the controller hardware initially responds by initiating a buffering process (step 508 ).
  • some or all data contained in the write request is temporarily stored in buffer memory 220 until the data can be committed to the storage device(s) 136 .
  • write buffering may include allocating buffer memory 220 , allocating cache segments/rows, and then stitching the buffers into cache and adding the cache segment/rows to a hash value.
  • the method continues with the hardware performing a hash search and/or analysis of flag(s) 312 in a hash slot element to determine if a bad block is present (step 512 ). If the hash slot element indicates the presence of a bad block, then the write command is diverted from hardware to firmware of the controller 108 (step 516 ). The firmware then checks the I/O range (e.g., the LBA spanned by the write request) against bad blocks identified in the bad block table 228 (step 520 ). If the range covered by the write request contains a bad block as determined with reference to the bad block table 228 , then the entry in the bad block table 228 is cleared and the firmware is allowed to complete the write request (step 524 ).
  • the write request is completed through the automated hardware path and the write request does not need to be processed by the firmware of the controller 108 (step 528 ).
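  • The write flow of FIG. 5 can be sketched in the same style. The helper names are again assumptions; the ordering (buffer first, then the hash/flag check, then the firmware-side table check and entry clearing) follows steps 504-528 above, and the false-positive branch is inferred by analogy with the read flow.

```c
#include <stdint.h>
#include <stdbool.h>

struct write_request { uint16_t ld_number; uint64_t start_lba; uint32_t num_blocks; };

/* Assumed helpers; not a real controller API. */
void hw_write_buffer(const struct write_request *req);              /* step 508: allocate buffers/rows, stitch, hash */
bool hw_flags_indicate_bad_block(const struct write_request *req);  /* step 512                                      */
bool fw_bad_block_table_overlaps(const struct write_request *req);  /* step 520                                      */
void fw_clear_bad_block_entries(const struct write_request *req);   /* step 524: range is being overwritten          */
void fw_complete_write(const struct write_request *req);
void hw_complete_write(const struct write_request *req);            /* step 528: automated hardware path             */

void process_write(const struct write_request *req)
{
    hw_write_buffer(req);                                /* step 508 */

    if (hw_flags_indicate_bad_block(req)) {              /* step 512 */
        /* Diverted to firmware (step 516). */
        if (fw_bad_block_table_overlaps(req)) {          /* step 520 */
            fw_clear_bad_block_entries(req);             /* step 524 */
            fw_complete_write(req);
        } else {
            /* False positive: assumed to be re-processed on the automated path,
             * by analogy with the read flow (not stated explicitly for writes). */
            hw_complete_write(req);
        }
        return;
    }

    hw_complete_write(req);                              /* step 528 */
}
```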

Abstract

A system and method for managing bad blocks in a hardware accelerated caching solution are provided. The disclosed method includes receiving an Input/Output (I/O) request, performing a hash search for the I/O request against a hash slot data structure, and based on the results of the hash search, either performing the I/O request with a data block identified in the I/O request or diverting the I/O request to a new data block not identified in the I/O request. The diversion may also include diverting the I/O request from hardware to firmware of a memory controller.

Description

    FIELD OF THE DISCLOSURE
  • The present disclosure is generally directed toward computer memory.
  • BACKGROUND
  • Memory solutions such as flash memory and certain RAID systems provide a Bad Block Management (BBM) feature where the bad blocks on a physical drive are cleared by writing a value of ‘0’ and the logical drive-level bad block information is maintained in a software table. When caching is accelerated in hardware, existing BBM solutions cannot simply be used to manage the bad blocks on a physical drive.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The present disclosure is described in conjunction with the appended figures, which are not necessarily drawn to scale:
  • FIG. 1 is a block diagram depicting a computing system in accordance with at least some embodiments of the present disclosure;
  • FIG. 2 is a block diagram depicting details of an illustrative controller in accordance with at least some embodiments of the present disclosure;
  • FIG. 3 is a block diagram depicting details of a data structure used in accordance with at least some embodiments of the present disclosure;
  • FIG. 4 is a flow diagram depicting a method of processing a read command in accordance with at least some embodiments of the present disclosure; and
  • FIG. 5 is a flow diagram depicting a method of processing a write command in accordance with at least some embodiments of the present disclosure.
  • DETAILED DESCRIPTION
  • The ensuing description provides embodiments only, and is not intended to limit the scope, applicability, or configuration of the claims. Rather, the ensuing description will provide those skilled in the art with an enabling description for implementing the described embodiments. It should be understood that various changes may be made in the function and arrangement of elements without departing from the spirit and scope of the appended claims.
  • Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and this disclosure.
  • As used herein, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprise,” “comprises,” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. The term “and/or” includes any and all combinations of one or more of the associated listed items.
  • As will be discussed in further detail herein, a method and system are provided that enable bad block processing in a hardware accelerated caching solution. The proposed method identifies the presence of a bad block using a hash value, which is ultimately used in aiding the hardware of the controller to perform automated caching even though a logical disk contains a bad block. The proposed system and method effectively reduce the search time in hardware during a read or write operation, which improves system performance.
  • In some embodiments, when a row/strip needs to be cached, the cache segment ID (CSID) that represents the row/strip is added to a hash slot along with the hash tag (e.g., a row/strip number or a combination of the row/strip number along with a logical drive number).
  • When an unrecoverable medium error is detected on an LBA, a bad block entry is added to a software bad block table and at the same time a unique signature (e.g., a string of 0's) is written to those LBAs. In a hardware accelerated caching solution, when the bad block is present in the table, the I/O commands from the host cannot completely pass through an automated path.
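  • As a rough illustration of the pairing just described (a software table entry plus a written signature of 0's), consider the sketch below. The block size, device-write primitive, and table API are placeholders chosen for illustration only.

```c
#include <stdint.h>

#define BLOCK_SIZE 4096u   /* assumed logical block size */

/* Assumed low-level primitives; not a real drive or firmware interface. */
int  dev_write_block(uint16_t ld_number, uint64_t lba, const void *buf);
void bbt_add_entry(uint16_t ld_number, uint64_t lba);   /* software bad block table */

/* On an unrecoverable medium error, record each affected LBA in the software
 * bad block table and overwrite it with the unique all-zero signature. */
int handle_medium_error(uint16_t ld_number, uint64_t first_lba, uint32_t count)
{
    static const uint8_t zero_signature[BLOCK_SIZE];    /* zero-initialized by definition */

    for (uint32_t i = 0; i < count; i++) {
        uint64_t lba = first_lba + i;
        bbt_add_entry(ld_number, lba);
        int rc = dev_write_block(ld_number, lba, zero_signature);
        if (rc != 0)
            return rc;                                   /* propagate the device error */
    }
    return 0;
}
```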
  • Searching a bad block table by the caching hardware results in significant performance degradation. Hence, the indication of a bad block is provided to the hardware with a simple method utilizing the hashing functions. The false positive matches are resolved by the controller firmware, which can search the bad block table to check whether the LBA is actually present in the bad block table. If not, then the I/O command can be processed in an automated path.
  • Although embodiments of the present disclosure will be described in connection with managing a RAID architecture (e.g., a RAID-5 or RAID-6 type of architecture), it should be appreciated that embodiments of the present disclosure are not so limited. In particular, any controller that implements hardware accelerated caching and manages bad blocks in memory (regardless of the memory architecture used) can benefit from embodiments of the present disclosure.
  • With reference to FIGS. 1-5, various embodiments of the present disclosure will be described. While many of the examples depicted and described herein will relate to a RAID architecture, it should be appreciated that embodiments of the present disclosure are not so limited. Indeed, aspects of the present disclosure can be used in any type of computing system and/or memory environment. In particular, embodiments of the present disclosure can be used in any type of memory scheme (whether employed by a RAID controller or some other type of device used in a communication system). In particular, hard drives, hard drive controllers (e.g., SCSI controllers, SAS controllers, or RAID controllers), flash drives, flash drive controllers, etc. may be configured to implement embodiments of the present disclosure. As another example, network cards or the like having cache memory may also be configured to implement embodiments of the present disclosure.
  • With reference now to FIG. 1, additional details of a computing system 100 capable of implementing memory management techniques will be described in accordance with at least some embodiments of the present disclosure. The computing system 100 is shown to include a host system 104, a controller 108 (e.g., a SCSI controller, a SAS controller, a RAID controller, etc.), and a storage array 112 having a plurality of storage devices 136 a-N therein. The system 100 may utilize any type of data storage architecture. The particular architecture depicted and described herein (e.g., a RAID architecture) should not be construed as limiting embodiments of the present disclosure. If implemented as a RAID architecture, however, it should be appreciated that any type of RAID scheme may be employed (e.g., RAID-0, RAID-1, RAID-2, . . . , RAID-5, RAID-6, etc.).
  • In a RAID-0 (also referred to as a RAID level 0) scheme, data blocks are stored in order across one or more of the storage devices 136 a-N without redundancy. This effectively means that none of the data blocks are copies of another data block and there is no parity block to recover from failure of a storage device 136. A RAID-1 (also referred to as a RAID level 1) scheme, on the other hand, uses one or more of the storage devices 136 a-N to store a data block and an equal number of additional mirror devices for storing copies of a stored data block. Higher level RAID schemes can further segment the data into bits, bytes, or blocks for storage across multiple storage devices 136 a-N. One or more of the storage devices 136 a-N may also be used to store error correction or parity information.
  • A single unit of storage can be spread across multiple devices 136 a-N and such a unit of storage may be referred to as a stripe. A stripe, as used herein and as is well known in the data storage arts, may include the related data written to multiple devices 136 a-N as well as the parity information written to a parity storage device 136 a-N. In a RAID-5 (also referred to as a RAID level 5) scheme, the data being stored is segmented into blocks for storage across multiple devices 136 a-N with a single parity block for each stripe distributed in a particular configuration across the multiple devices 136 a-N. This scheme can be compared to a RAID-6 (also referred to as a RAID level 6) scheme in which dual parity blocks are determined for a stripe and are distributed across each of the multiple devices 136 a-N in the array 112.
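  • For readers unfamiliar with the parity mentioned above, a single-parity (RAID-5 style) stripe is commonly built by XOR-ing the data blocks together, as in the minimal sketch below. The function name and block-size handling are illustrative assumptions; the disclosure does not prescribe a particular parity computation.

```c
#include <stddef.h>
#include <stdint.h>

/* Compute a single parity block as the byte-wise XOR of the data blocks in a
 * stripe, the usual construction for RAID-5 style single parity. */
void compute_xor_parity(const uint8_t *const data_blocks[], size_t num_blocks,
                        size_t block_size, uint8_t *parity_out)
{
    for (size_t b = 0; b < block_size; b++) {
        uint8_t p = 0;
        for (size_t i = 0; i < num_blocks; i++)
            p ^= data_blocks[i][b];
        parity_out[b] = p;
    }
}
```

  • With this construction, any single missing block in a stripe can be rebuilt by XOR-ing the surviving blocks with the parity block, which is the kind of recovery “from the other arms” referred to elsewhere in this description.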
  • One of the functions of the controller 108 is to make the multiple storage devices 136 a-N in the array 112 appear to a host system 104 as a single high capacity disk drive (e.g., as a storage volume). Thus, the controller 108 may be configured to automatically distribute data supplied from the host system 104 across the multiple storage devices 136 a-N (potentially with parity information) without ever exposing the manner in which the data is actually distributed to the host system 104.
  • In the depicted embodiment, the host system 104 is shown to include a processor 116, an interface 120, and memory 124. It should be appreciated that the host system 104 may include additional components without departing from the scope of the present disclosure. The host system 104, in some embodiments, corresponds to a user computer, laptop, workstation, server, collection of servers, or the like. Thus, the host system 104 may or may not be designed to receive input directly from a human user.
  • The processor 116 of the host system 104 may include a microprocessor, central processing unit (CPU), collection of microprocessors, or the like. The memory 124 may be designed to store instructions that enable functionality of the host system 104 when executed by the processor 116. The memory 124 may also store data that is eventually written by the host system 104 to the storage array 112. Further still, the memory 124 may be used to store data that is retrieved from the storage array 112. Illustrative memory 124 devices may include, without limitation, volatile or non-volatile computer memory (e.g., flash memory, RAM, DRAM, ROM, EEPROM, etc.).
  • The interface 120 of the host system 104 enables the host system 104 to communicate with the controller 108 via a host interface 128 of the controller 108. In some embodiments, the interface 120 and host interface(s) 128 may be of a same or similar type (e.g., utilize a common protocol, a common communication medium, etc.) such that commands issued by the host system 104 are receivable at the controller 108 and data retrieved by the controller 108 is transmittable back to the host system 104. The interfaces 120, 128 may correspond to parallel or serial computer interfaces that utilize wired or wireless communication channels. The interfaces 120, 128 may include hardware that enables such wired or wireless communications. The communication protocol used between the host system 104 and the controller 108 may correspond to any type of known host/memory control protocol. Non-limiting examples of protocols that may be used between interfaces 120, 128 include SAS, SATA, SCSI, FibreChannel (FC), iSCSI, ATA over Ethernet, InfiniBand, or the like.
  • The controller 108 may provide the ability to represent the entire storage array 112 to the host system 104 as a single high volume data storage device. Any known mechanism can be used to accomplish this task. The controller 108 may help to manage the storage devices 136 a-N (which can be hard disk drives, solid-state drives, or combinations thereof) so as to operate as a logical unit. In some embodiments, the controller 108 may be physically incorporated into the host device 104 as a Peripheral Component Interconnect (PCI) expansion (e.g., PCI Express (PCIe)) card or the like. In such situations, the controller 108 may be referred to as a RAID adapter.
  • The storage devices 136 a-N in the storage array 112 may be of similar types or may be of different types without departing from the scope of the present disclosure. The storage devices 136 a-N may be co-located with one another or may be physically located in different geographical locations. The nature of the storage interface 132 may depend upon the types of storage devices 136 a-N used in the storage array 112 and the desired capabilities of the array 112. The storage interface 132 may correspond to a virtual interface or an actual interface. As with the other interfaces described herein, the storage interface 132 may include serial or parallel interface technologies. Examples of the storage interface 132 include, without limitation, SAS, SATA, SCSI, FC, iSCSI, ATA over Ethernet, InfiniBand, or the like.
  • The controller 108 is shown to have communication capabilities with a controller cache 140. While depicted as being separate from the controller 108, it should be appreciated that the controller cache 140 may be integral to the controller 108, meaning that components of the controller 108 and the controller cache 140 may be contained within a single physical housing or computing unit (e.g., server blade). The controller cache 140 is provided to enable the controller 108 to perform caching operations. The controller 108 may employ caching operations during execution of I/O commands received from the host system 104. Depending upon the nature of the I/O command and the amount of information being processed during the command, the controller 108 may require a large number of cache memory modules 148 (also referred to as cache memory) or a smaller number of cache memory modules 148. The memory modules 148 may correspond to flash memory, RAM, DRAM, DDR memory, or some other type of computer memory that is quickly accessible and can be rewritten multiple times. The number of separate memory modules 148 in the controller cache 140 is typically larger than one, although a controller cache 140 may be configured to operate with a single memory module 148 if desired.
  • The cache interface 144 may correspond to any interconnect that enables the controller 108 to access the memory modules 148, temporarily store data thereon, and/or retrieve data stored thereon in connection with performing an I/O command or some other executable command. In some embodiments, the controller cache 140 may be integrated with the controller 108 and may be executed on a CPU chip or placed on a separate chip within the controller 108. In such a scenario, the interface 144 may correspond to a separate bus interconnect within the CPU or traces connecting a chip of the controller cache 140 with a chip executing the processor of the controller 108. In other embodiments, the controller cache 140 may be external to the controller 108 in which case the interface 144 may correspond to a serial or parallel data port.
  • With reference now to FIG. 2 additional details of a controller 108 will be described in accordance with at least some embodiments of the present disclosure. The controller 108 is shown to include the host interface(s) 128 and storage interface(s) 132. The controller 108 is also shown to include a processor 204, memory 208 (e.g., a main controller memory), one or more drivers 212, and a power source 216.
  • The processor 204 may include an Integrated Circuit (IC) chip or multiple IC chips, a CPU, a microprocessor, or the like. The processor 204 may be configured to execute instructions in memory 208 that are shown to include a host I/O manager 232, a buffer manager 248, a cache manager 252, a RAID manager 256, and a SAS manager 260. Furthermore, in connection with performing caching functions, buffer functions, or bad block management functions, the processor 204 may utilize buffer memory 220, one or more Internal Scatter Gather Lists (ISGLs) 224, and a bad block table 228. The host I/O manager 232 is shown to include a plurality of sub-routines that include, without limitation, a host message unit 236, a command extraction unit 240, and a completion engine 244.
  • Each of the components (e.g., host I/O manager 232, buffer manager 248, cache manager 252, RAID manager 256, and SAS manager 260) may correspond to different functional blocks that operate in their own local memory loading the global memory (e.g. a global buffer memory 220 or memory 208) on an as-needed basis. Each of these different functional blocks can be accelerated by different hardware threads without departing from the scope of the present disclosure. The controller 108 may be considered to have hardware and firmware components. The various manager components (e.g., host I/O manager 232, buffer manager 248, cache manager 252, RAID manager 256, and SAS manager 260) may be considered firmware components even though they can be accelerated by different hardware threads. The hardware components of the controller 108 may include drivers 212, the processor 204, the interfaces 128, 132, the controller cache 140, etc. As will be discussed in further detail herein, the hardware components and software components of the controller 108 may be enabled to communicate with one another using specialized messages (e.g., LMIDs). These messages may contain information describing operations or routines to be executed by the various components of the controller 108 as well as results of operations already performed by the controller 108.
  • The memory 208 may be volatile and/or non-volatile in nature. As indicated above, the memory 208 may include any hardware component or collection of hardware components that are capable of storing instructions and communicating those instructions to the processor 204 for execution. Non-limiting examples of memory 208 include RAM, ROM, flash memory, EEPROM, variants thereof, combinations thereof, and the like. Similarly, the buffer memory 220 may be volatile or non-volatile in nature. The buffer memory may be configured for multiple read/writes and may be adapted for quick access by the processor 204.
  • The instructions stored in memory 208 are shown as different instruction sets, but it should be appreciated that the instructions can be combined into a smaller number of instruction sets without departing from the scope of the present disclosure. The host I/O manager 232, when executed, enables the processor 204 to manage I/O commands received from the host system 104 and facilitates higher-level communications with the host system 104. In some embodiments, the host I/O manager 232 may utilize the host message unit 236 to process incoming messages received from the host system 104. As a non-limiting example, the controller 108 may receive messages from the host system 104 in an MPI protocol. The host message unit 236 may retrieve the messages received from the host system 104 and pass the content of the messages to the command extraction unit 240. The command extraction unit 240 may be configured to determine if a particular command in a message is acceleratable (e.g., capable of being passed to a particular functional block to facilitate hardware acceleration). If a command is determined to be acceleratable, then the command extraction unit 240 may implement a hardware acceleration process and generate an appropriate Local Message ID (LMID) that represents all of the information received from the host system 104 in the command. The LMID effectively represents the command received from the host system 104, but in a different format that is understandable by the managers 248, 252, 256, 260. The command extraction unit 240 may, in some embodiments, route the various commands (e.g., LMIDs) to one or more of the buffer manager 248, cache manager 252, RAID manager 256, and SAS manager 260. The routing of the commands may depend upon the type of the command and the function to be executed. The completion engine 244 of the host I/O manager 232 may be responsible for reporting to the host system 104 that an I/O command has been completed by the controller 108. A minimal routing sketch follows.
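  • The following is a minimal sketch, in C, of the kind of routing decision the command extraction unit 240 might make once a host command has been repackaged as an LMID. The structure layout, field names, and the routine send_to_manager are illustrative assumptions and not part of the disclosure.

      #include <stdbool.h>
      #include <stdint.h>

      typedef enum { MGR_BUFFER, MGR_CACHE, MGR_RAID, MGR_SAS } manager_id_t;

      typedef struct lmid_msg {
          uint32_t lmid;            /* local message ID standing in for the host command */
          uint64_t lba;             /* starting logical block address of the command     */
          uint32_t num_blocks;      /* number of blocks spanned by the command           */
          bool     is_write;        /* true for write commands                           */
          bool     needs_buffering; /* true when write buffering is required             */
      } lmid_msg_t;

      /* Assumed hand-off routine to one of the manager functional blocks. */
      extern void send_to_manager(manager_id_t target, const lmid_msg_t *msg);

      /* Route an acceleratable command (as an LMID) to the next manager. */
      void dispatch_lmid(const lmid_msg_t *msg)
      {
          if (msg->is_write && msg->needs_buffering)
              send_to_manager(MGR_BUFFER, msg);   /* buffer manager 248 */
          else
              send_to_manager(MGR_CACHE, msg);    /* cache manager 252  */
      }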
  • The buffer manager 248 may include instructions that, when executed, enable the processor 204 to perform various buffer functions. As an example, the buffer manager 248 may enable the processor 204 to recognize a write command and utilize the buffer memory 220 in connection with executing the write command. In some embodiments, any command or function that leverages the buffer memory 220 may utilize the buffer manager 248.
  • The cache manager 252 may include instructions that, when executed, enable the processor 204 to perform various caching functions. As an example, the cache manager 252 may enable the processor 204 to respond to read commands or read-ahead commands. The cache manager 252 may also enable the processor 204 to communicate with the controller cache 140 and leverage the memory modules 148 of the controller cache 140. The cache manager 252 may also manage the creation and lifecycle of cache frame anchors and/or ISGLs 224. As an example, as caching functions are executed, one or more cache frame anchors may be created or utilized to facilitate the caching function. As used herein, an ISGL may represent a snapshot of data at the point in time at which it is used. In some embodiments, the ISGL is capable of encapsulating all of the metadata that is required for an I/O read/write and/or read-ahead request, thereby providing an efficient communication mechanism between various modules for processing the read/write and/or read-ahead operations.
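  • As one possible illustration, and purely as an assumption about layout rather than a description of the actual ISGL format, an ISGL 224 could be sketched in C as a short list of buffer references that collectively describe the data and metadata for a read/write or read-ahead request:

      #include <stdint.h>

      /* Hypothetical ISGL element: one reference into buffer memory 220. */
      typedef struct isgl_element {
          uint32_t buffer_id;   /* identifier of a buffer segment        */
          uint32_t offset;      /* byte offset within that buffer        */
          uint32_t length;      /* number of bytes covered by this entry */
      } isgl_element_t;

      /* Hypothetical ISGL: a fixed-size internal scatter gather list. */
      typedef struct isgl {
          uint16_t       count;      /* number of valid elements below   */
          isgl_element_t elems[16];  /* scatter/gather entries           */
      } isgl_t;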
  • The RAID manager 256 and/or SAS manager 260 may include instructions that, when executed, enable the processor 204 to communicate with the storage array 112 or storage devices 136 therein. In some embodiments, the RAID manager 256 and/or SAS manager 260 may receive commands either directly from the host I/O manager 232 (if no caching was needed) or they may receive commands from the cache manager 252 after an appropriate caching process has been performed. When invoked, the RAID manager 256 and/or SAS manager 260 may enable the processor 204 to finalize read or write commands and exchange data with the storage array 112.
  • The driver(s) 212 may comprise firmware, hardware, software, or combinations thereof that enable the processor 204 to make use of other hardware components in the controller 108. For instance, different drivers 212 may be provided to support functions of the interfaces 128, 132. As another example, separate drivers 212 may be provided to support functions of the buffer memory 220. The drivers 212 may perform the low-level routines that allow the processor 204 to communicate with the other hardware components and respond to commands received from the processor 204.
  • The power source 216 may correspond to hardware components that provide the controller 108 with the power necessary to run the processor 204 and other components. As an example, the power source 216 may correspond to a power converter that receives AC power from an external source (e.g., a power outlet) and converts the AC power into DC power that is useable by the other hardware components of the controller 108. Alternatively or additionally, the power source 216 may correspond to an internal power source (e.g., a battery pack, bank of capacitors, etc.) that provides power to the hardware components of the controller 108.
  • With reference now to FIG. 3, additional details of a data structure 300 will be described in accordance with at least some embodiments of the present disclosure. The data structure 300 is shown to include a number of fields that can facilitate management of bad blocks in logical drives, virtual drives, physical drives, etc. In some embodiments, the data structure 300 may correspond to a hash slot data structure used to store one or a plurality of hash slot IDs, associated flags, and the like. The depicted data structure 300, which should not be construed as limiting embodiments of the present disclosure, is shown to include a first hash slot tag field 304, a second hash slot tag field 316, a cache segment ID (CSID) field 308, and a flag(s) field 312. The hash slot tag fields 304, 316 may be used to store hash values used in connection with managing bad blocks of data that have been reported by devices 136.
  • The data structure 300 contains one or more flags 312 that indicate the presence of a bad block and, if a bad block is present, whether the corresponding CSID 308 is valid or invalid. If the flags value is a first predetermined value (e.g., “100”), this may provide an indication that the cache segment/row in the CSID field 308 is valid. If the flags value is a second predetermined value (e.g., “110”), this may provide an indication of the presence of bad block(s) in the strip/row; it may also represent that the CSID field 308 is valid and represents an existing cache segment/row. If the flags value is a third predetermined value (e.g., “111”), this may provide an indication of the presence of a bad block in the strip/row and that the CSID field 308 is not valid. A sketch of one possible encoding follows.
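  • A minimal sketch, in C, of a hash slot element consistent with the fields and flag values described above; the field widths, names, and packing are assumptions for illustration and do not reflect the exact hardware layout of data structure 300.

      #include <stdbool.h>
      #include <stdint.h>

      /* Example flag encodings taken from the predetermined values above. */
      #define FLAGS_CS_VALID                0x4u  /* '100': CSID holds a valid cache segment/row      */
      #define FLAGS_BAD_BLOCK_CSID_VALID    0x6u  /* '110': bad block(s) in strip/row, CSID valid     */
      #define FLAGS_BAD_BLOCK_CSID_INVALID  0x7u  /* '111': bad block(s) in strip/row, CSID not valid */

      /* Assumed layout of one hash slot element (data structure 300). */
      typedef struct hash_slot_element {
          uint32_t hash_slot_tag;   /* hash slot tag field 304/316 */
          uint32_t csid;            /* cache segment ID field 308  */
          uint8_t  flags;           /* flag(s) field 312           */
      } hash_slot_element_t;

      /* True when the flags indicate a bad block somewhere in the strip/row. */
      static inline bool slot_has_bad_block(const hash_slot_element_t *slot)
      {
          return slot->flags == FLAGS_BAD_BLOCK_CSID_VALID ||
                 slot->flags == FLAGS_BAD_BLOCK_CSID_INVALID;
      }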
  • Since a row/strip contains more than one LBA, an indication of a bad block in the data structure 300 does not guarantee the presence of a bad block entry in the firmware's bad block table 228 for the specific LBAs being accessed. Accordingly, the firmware of the controller 108 would need to resolve these false positives. When there is an unrecoverable medium error on a RAID volume, an entry is added to the bad block table 228. Along with the entry in the bad block table 228, the hash slot 304 is updated with flags ‘111’ if a hash entry is not already present and with ‘110’ if a hash entry is already present with a valid CSID. A new write request to the same row may be diverted to firmware after performing write buffering. This enables the firmware to check the bad block table 228 and see whether the new write is overwriting the bad blocks. If so, the entry in the bad block table 228 is cleared and the flags 312 in the hash entry are cleared to indicate that there is no bad block. If the new write is not overwriting all of the bad blocks in the row, then the firmware leaves the bad block table alone and the information in the flags 312 is left as-is. This handling is sketched below.
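  • A minimal sketch, in C, of the firmware-side handling just described: when a buffered write is diverted because the row is flagged, the firmware clears the bad block table entry and the hash flags only if the write overwrites all of the bad blocks in the row. The helper routines (bbt_* and hash_slot_clear_bad_block_flags) are assumed names for illustration.

      #include <stdbool.h>
      #include <stdint.h>

      /* Assumed helpers backed by the bad block table 228 and the hash slot. */
      extern bool bbt_write_covers_all_bad_blocks(uint64_t row,
                                                  uint64_t start_lba,
                                                  uint32_t num_blocks);
      extern void bbt_clear_row(uint64_t row);
      extern void hash_slot_clear_bad_block_flags(uint64_t row);

      /* Called after write buffering when the hash flags indicated a bad block. */
      void firmware_handle_diverted_write(uint64_t row, uint64_t start_lba,
                                          uint32_t num_blocks)
      {
          if (bbt_write_covers_all_bad_blocks(row, start_lba, num_blocks)) {
              /* The new data overwrites every bad block in the row: clear the
               * bad block table 228 entry and the flags 312 in the hash entry. */
              bbt_clear_row(row);
              hash_slot_clear_bad_block_flags(row);
          }
          /* Otherwise the bad block table and the flags 312 are left as-is. */
      }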
  • While processing a read request, if the hash slot 304, 316 indicates the presence of a bad block, the read command is diverted to firmware. The firmware checks whether there is an entry in the bad block table 228. If such an entry is present, then the I/O (e.g., read request) is failed back to the host 104 indicating a bad block. If not, then the I/O is resubmitted to hardware for read command processing.
  • When a row/strip needs to be cached, the CSID that represents the row/strip is added to the hash slot along with the hash tag (e.g., a row/strip number or a combination of the row/strip number and a logical device number). When an unrecoverable medium error is detected on an LBA, an entry is added to the bad block table 228 and, at the same time, a unique signature (e.g., 0's) is written to those LBAs. In a hardware automated caching solution, when the bad block is present in the bad block table 228, the I/O commands cannot go through an automated hardware path completely. Searching the bad block table 228 with the caching hardware would result in significant performance degradation. Hence, the indication of a bad block is provided to the hardware with a simple method utilizing any type of known or yet-to-be-developed hashing algorithm. The false positives are resolved by the firmware of the controller 108, which can search the bad block table 228 to check whether the LBA is present in the bad block table. If not, the command can be re-processed in an automated hardware path. A sketch of how such an entry might be recorded follows.
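  • A minimal sketch, in C, of how an unrecoverable medium error might be recorded per the description above: add an entry to the bad block table 228, write the unique signature (0's) to the affected LBAs, and set the hash slot flags to '110' or '111' depending on whether a valid CSID is already present. All helper names are assumptions for illustration.

      #include <stdbool.h>
      #include <stdint.h>

      #define FLAGS_BAD_BLOCK_CSID_VALID    0x6u  /* '110' */
      #define FLAGS_BAD_BLOCK_CSID_INVALID  0x7u  /* '111' */

      /* Assumed helpers for the bad block table 228 and the hash slot. */
      extern void bbt_add_entry(uint64_t lba, uint32_t num_blocks);
      extern void write_zero_signature(uint64_t lba, uint32_t num_blocks);
      extern bool hash_slot_has_valid_csid(uint64_t row);
      extern void hash_slot_set_flags(uint64_t row, uint8_t flags);

      void record_unrecoverable_medium_error(uint64_t row, uint64_t lba,
                                             uint32_t num_blocks)
      {
          bbt_add_entry(lba, num_blocks);          /* entry in bad block table 228 */
          write_zero_signature(lba, num_blocks);   /* unique signature (0's)       */

          if (hash_slot_has_valid_csid(row))
              hash_slot_set_flags(row, FLAGS_BAD_BLOCK_CSID_VALID);
          else
              hash_slot_set_flags(row, FLAGS_BAD_BLOCK_CSID_INVALID);
      }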
  • With reference now to FIG. 4, additional details of a method of processing a read command will be described in accordance with at least some embodiments of the present disclosure. The method begins when a read request is received at the controller 108 (step 404). When such a command is received, the method continues with the hardware of the controller 108 performing a hash search (step 408). In this step, the hardware computes a hash value from information contained in the read request (e.g., an LBA number, an LD number value, etc.), which serves as an index into the hash table. The hardware then extracts the information from the hash table at this index and checks whether specific details match; for example, whether the hash tag derived from the row number for this request matches the hash tag present in the hash slot (Hash Slot Tag = Strip_or_Row_number >> numb_of_hash_slots_in_bits). It should be appreciated that there could be other ways of identifying whether there is a hash hit, and the current method does not assume or require a specific implementation. One possible formulation is sketched below.
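  • A minimal sketch, in C, of one possible hash index and hash slot tag computation following the formula above; the table size (the number of hash-slot index bits) is an assumed value used only for illustration.

      #include <stdbool.h>
      #include <stdint.h>

      #define NUM_HASH_SLOTS_IN_BITS 20u   /* assumed: a table of 2^20 hash slots */

      /* Low-order bits of the strip/row number index into the hash table. */
      static inline uint32_t hash_index(uint64_t strip_or_row_number)
      {
          return (uint32_t)(strip_or_row_number &
                            ((1u << NUM_HASH_SLOTS_IN_BITS) - 1u));
      }

      /* Hash Slot Tag = Strip_or_Row_number >> numb_of_hash_slots_in_bits */
      static inline uint32_t hash_slot_tag(uint64_t strip_or_row_number)
      {
          return (uint32_t)(strip_or_row_number >> NUM_HASH_SLOTS_IN_BITS);
      }

      /* A hash hit occurs when the stored tag matches the computed tag. */
      static inline bool is_hash_hit(uint32_t stored_tag, uint64_t strip_or_row_number)
      {
          return stored_tag == hash_slot_tag(strip_or_row_number);
      }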
  • If a hash hit is detected (step 412), then the hardware will check the data structure 300 to determine if a bad block is present (step 416). Specifically, the hardware can check the flag(s) 312 and/or CSID 308 to determine if a predetermined value is maintained therein (e.g., a value of ‘110’ or ‘111’) indicating the existence of a bad block.
  • If a bad block is present (step 420), then the read request is diverted to firmware of the controller 108 (step 424). The firmware then references the bad block table 228 to see if any entries exist in the bad block table 228 that match the read request (step 428). Specifically, the firmware of the controller 108 may analyze the LBA or LBA range associated with the read request, and if any address listed in the bad block table 228 falls within the LBA range associated with the read request, then the firmware will confirm the existence of a bad block associated with the read request (step 432). If this query is answered affirmatively, then the command is completed (step 436). The manner in which this step is performed may depend upon the nature of the RAID system being used. In some embodiments, if the query is answered affirmatively, then the firmware understands that the drive data for this LBA was already cleared with 0's because of the presence of bad blocks and that this I/O command should not be issued to the drive. For a RAID 0 volume, the firmware will complete the read request with a status of “read failed” because the bad block contains data needed to complete the read request. For a RAID 1, RAID 5, or RAID 6 volume, on the other hand, the firmware recovers the data from the other arms to satisfy the host read request (step 436). Any type of known recovery process can be used without departing from the scope of the present disclosure. Embodiments of the present disclosure indicate a bad block to the hardware so that the hardware does not process the read request through the accelerated method when such a bad block is present. The method then ends (step 440).
  • If no hash hit is detected in step 412, there is no indication of a bad block in the flag(s) 312, or there is no bad block present in the bad block table 228, then the method proceeds with the hardware performing the read operation with or without diverting the read request to the firmware (step 444). If either of the following conditions is true, then the hardware performs the read request directly: (i) no hash hit is detected or (ii) no flag 312 indicates the presence of a bad block. Conversely, if the method resulted in the read request being diverted to firmware, then the read request will only be performed by the hardware if a false positive condition was detected at step 432. Said another way, if the method proceeds to step 432 and there is no matching address in the bad block table 228 that corresponds to an address associated with the read request, then the initial match made in steps 412 and 420 is treated as a false positive and the read command is re-issued through a hardware automated path. The method then proceeds with the hardware completing the command (step 448). Again, as with step 436, the manner in which the command is completed in step 448 may depend upon the nature of the RAID array. In a RAID 0 volume, the read request is completed with a status reported back as a read success. In a RAID 1, RAID 5, or RAID 6 volume, the data is recovered from the other arms as needed to satisfy the host read request. Thereafter, the method ends (step 440). The overall read-path decision flow is sketched below.
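  • The read path of FIG. 4, as described above, can be condensed into the following sketch in C: a hash hit with a bad block flag diverts the read to firmware, which either fails the request (RAID 0), recovers the data from the other arms (RAID 1/5/6), or re-issues the read through the hardware automated path on a false positive. The helper routines and the raid_level parameter are assumed names for illustration.

      #include <stdbool.h>
      #include <stdint.h>

      typedef enum { READ_SUCCESS, READ_FAILED_BAD_BLOCK } read_status_t;

      /* Assumed helpers for the hash slot, bad block table 228, and data path. */
      extern bool hash_hit(uint64_t row);
      extern bool hash_flags_indicate_bad_block(uint64_t row);   /* '110' or '111' */
      extern bool bbt_range_contains_bad_block(uint64_t lba, uint32_t num_blocks);
      extern read_status_t hardware_read(uint64_t lba, uint32_t num_blocks);
      extern read_status_t recover_from_other_arms(uint64_t lba, uint32_t num_blocks);

      read_status_t process_read(uint64_t row, uint64_t lba,
                                 uint32_t num_blocks, int raid_level)
      {
          if (hash_hit(row) && hash_flags_indicate_bad_block(row)) {
              /* Steps 424-432: diverted to firmware for a bad block table lookup. */
              if (bbt_range_contains_bad_block(lba, num_blocks)) {
                  if (raid_level == 0)
                      return READ_FAILED_BAD_BLOCK;                /* step 436 (RAID 0) */
                  return recover_from_other_arms(lba, num_blocks); /* RAID 1/5/6        */
              }
              /* False positive: re-issue through the hardware automated path. */
          }
          return hardware_read(lba, num_blocks);                   /* steps 444 and 448 */
      }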
  • With reference now to FIG. 5, additional details of processing a write command will be described in accordance with at least some embodiments of the present disclosure. The method begins when a write request is received at the controller 108 from the host 104 (step 504). The controller hardware initially responds by initiating a buffering process (step 508). In this step, some or all of the data contained in the write request is temporarily stored in buffer memory 220 until the data can be committed to the storage device(s) 136. In general, write buffering may include allocating buffer memory 220, allocating cache segments/rows, stitching the buffers into the cache, and adding the cache segments/rows to the hash.
  • The method continues with the hardware performing a hash search and/or analysis of flag(s) 312 in a hash slot element to determine if a bad block is present (step 512). If the hash slot element indicates the presence of a bad block, then the write command is diverted from hardware to firmware of the controller 108 (step 516). The firmware then checks the I/O range (e.g., the LBA spanned by the write request) against bad blocks identified in the bad block table 228 (step 520). If the range covered by the write request contains a bad block as determined with reference to the bad block table 228, then the entry in the bad block table 228 is cleared and the firmware is allowed to complete the write request (step 524).
  • If, however, the hash slot element 300 does not indicate the presence of a bad block as determined in step 512, then the write request is completed through the automated hardware path and the write request does not need to be processed by the firmware of the controller 108 (step 528).
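  • The write path of FIG. 5, as described above, can be condensed into the following sketch in C: write buffering always occurs in hardware, and the request is handed to firmware only when the hash slot element flags a bad block. The helper routines are assumed names for illustration; firmware_handle_diverted_write corresponds to the firmware handling sketched earlier.

      #include <stdbool.h>
      #include <stdint.h>

      /* Assumed helpers for the buffering hardware, hash slot, and firmware path. */
      extern void hardware_write_buffer(uint64_t row, uint64_t lba, uint32_t num_blocks);
      extern bool hash_flags_indicate_bad_block(uint64_t row);
      extern void firmware_handle_diverted_write(uint64_t row, uint64_t lba,
                                                 uint32_t num_blocks);
      extern void hardware_complete_write(uint64_t row);

      void process_write(uint64_t row, uint64_t lba, uint32_t num_blocks)
      {
          hardware_write_buffer(row, lba, num_blocks);          /* step 508 */

          if (hash_flags_indicate_bad_block(row)) {
              /* Steps 516-524: firmware checks the I/O range against the bad
               * block table 228, clears matching entries, and completes the write. */
              firmware_handle_diverted_write(row, lba, num_blocks);
          } else {
              /* Step 528: completed through the automated hardware path. */
              hardware_complete_write(row);
          }
      }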
  • Specific details were given in the description to provide a thorough understanding of the embodiments. However, it will be understood by one of ordinary skill in the art that the embodiments may be practiced without these specific details. In other instances, well-known circuits, processes, algorithms, structures, and techniques may be shown without unnecessary detail in order to avoid obscuring the embodiments.
  • While illustrative embodiments of the disclosure have been described in detail herein, it is to be understood that the inventive concepts may be otherwise variously embodied and employed, and that the appended claims are intended to be construed to include such variations, except as limited by the prior art.

Claims (20)

What is claimed is:
1. A method for handling bad blocks in a hardware accelerated caching solution, the method comprising:
maintaining a data structure in connection with a plurality of blocks used for data caching, the data structure including one or more hash values that (i) indicate a presence of a bad block and (ii) if a bad block is present, whether a corresponding Cache Segment Identifier (CSID) is valid or invalid;
receiving an Input/Output (I/O) request;
performing a hash search for the I/O request against the data structure; and
based on the results of the hash search, either performing the I/O request with a data block identified in the I/O request or diverting the I/O request to a new data block not identified in the I/O request.
2. The method of claim 1, further comprising:
determining that the hash search resulted in a hash hit;
in response to determining that the hash search resulted in the hash hit, checking the data structure to determine if a corresponding bad block is identified as valid or invalid; and
in the event that the corresponding bad block is identified as invalid, diverting the I/O request to firmware such that the firmware is enabled to perform a table lookup and decide whether to identify the I/O request as failed with a medium error or to be treated as a false positive.
3. The method of claim 2, wherein the I/O request comprises a write request.
4. The method of claim 3, further comprising:
checking an I/O range of the write request during the table lookup;
determining that the write request is covered by the I/O range; and
in response to determining that the write request is covered by the I/O range, completing the write request and clearing an entry in the table.
5. The method of claim 4, wherein the write request is diverted from hardware to firmware such that the firmware is enabled to check the I/O range of the write request.
6. The method of claim 2, wherein the I/O request comprises a read request.
7. The method of claim 5, further comprising:
processing the read request through a hardware automated path.
8. The method of claim 1, wherein the data structure comprises a hash slot data structure that includes at least three bits that indicate: (i) whether a cache segment or cache row is valid in a CSID field; (ii) a presence of a bad block in a strip or row; and (iii) whether the CSID field is valid or invalid.
9. A memory control system, comprising:
a host interface that receives one or more host Input/Output (I/O) commands;
a storage interface that enables communication with a plurality of storage drives configured in a storage array;
a microprocessor; and
memory that includes computer-readable instructions that are executable by the microprocessor, the instructions including:
instructions that maintain a data structure in connection with a plurality of blocks used for data caching, the data structure including one or more hash values that (i) indicate a presence of a bad block and (ii) if a bad block is present, whether a corresponding Cache Segment Identifier (CSID) is valid or invalid;
instructions that perform a hash search for the I/O command against the data structure; and
instructions that, based on the results of the hash search, either perform the I/O command with hardware or divert the I/O command to firmware for further analysis against a bad block table.
10. The system of claim 9, wherein the instructions further include:
instructions that determine that the hash search resulted in a hash hit;
instructions that, in response to determining that the hash search resulted in the hash hit, check the data structure to determine if a corresponding bad block is identified as valid or invalid; and
instructions that, in the event that the corresponding bad block is identified as invalid, divert the I/O command to the firmware such that the firmware is enabled to perform a table lookup against the bad block table and decide whether to identify the I/O command as failed with a medium error or to be treated as a false positive.
11. The system of claim 10, wherein the I/O command comprises a write command.
12. The system of claim 11, wherein the instructions further include:
instructions that check an I/O range of the write command during the table lookup;
instructions that determine that the write command is covered by the I/O range; and
instructions that, in response to determining that the write command is covered by the I/O range, complete the write command and clear an entry in the bad block table.
13. The system of claim 12, wherein the write command is diverted from the hardware to the firmware such that the firmware is enabled to check the I/O range of the write command.
14. The system of claim 10, wherein the I/O command comprises a read command.
15. The system of claim 14, wherein the instructions further comprise:
instructions that process the read request through a hardware automated path.
16. The system of claim 10, wherein the data structure comprises a hash slot data structure that includes at least three bits that indicate: (i) whether a cache segment or cache row is valid in a CSID field; (ii) a presence of a bad block in a strip or row; and (iii) whether the CSID field is valid or invalid.
17. A controller for managing memory that includes hardware accelerated caching, the controller comprising:
a microprocessor; and
memory that includes computer-readable instructions that are executable by the microprocessor, the instructions including:
instructions that maintain a data structure in connection with a plurality of blocks used for data caching, the data structure including one or more hash values that (i) indicate a presence of a bad block and (ii) if a bad block is present, whether a corresponding Cache Segment Identifier (CSID) is valid or invalid;
instructions that perform a hash search for an I/O command against the data structure; and
instructions that, based on the results of the hash search, either perform the I/O command with hardware or divert the I/O command to firmware for further analysis against a bad block table.
18. The controller of claim 17, wherein the instructions further include:
instructions that determine that the hash search resulted in a hash hit;
instructions that, in response to determining that the hash search resulted in the hash hit, check the data structure to determine if a corresponding bad block is identified as valid or invalid; and
instructions that, in the event that the corresponding bad block is identified as invalid, divert the I/O command to the firmware such that the firmware is enabled to perform a table lookup against the bad block table and decide whether to identify the I/O command as failed with a medium error or to be treated as a false positive.
19. The controller of claim 17, wherein the data structure comprises a hash slot data structure that includes at least three bits that indicate: (i) whether a cache segment or cache row is valid in a CSID field; (ii) a presence of a bad block in a strip or row; and (iii) whether the CSID field is valid or invalid.
20. The controller of claim 17, wherein the I/O command comprises either a read command or a write command and wherein the read or write command is diverted to the firmware for further analysis if the hash search results in a hash hit.
US15/605,348 2017-05-25 2017-05-25 Method and system for handling bad blocks in a hardware accelerated caching solution Active 2037-11-03 US10528438B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/605,348 US10528438B2 (en) 2017-05-25 2017-05-25 Method and system for handling bad blocks in a hardware accelerated caching solution

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US15/605,348 US10528438B2 (en) 2017-05-25 2017-05-25 Method and system for handling bad blocks in a hardware accelerated caching solution

Publications (2)

Publication Number Publication Date
US20180341564A1 true US20180341564A1 (en) 2018-11-29
US10528438B2 US10528438B2 (en) 2020-01-07

Family

ID=64400564

Family Applications (1)

Application Number Title Priority Date Filing Date
US15/605,348 Active 2037-11-03 US10528438B2 (en) 2017-05-25 2017-05-25 Method and system for handling bad blocks in a hardware accelerated caching solution

Country Status (1)

Country Link
US (1) US10528438B2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220405201A1 (en) * 2019-08-16 2022-12-22 SK Hynix Inc. Storage device for performing dump operation, method of operating storage device, computing system including storage device and host device for controlling storage device, and method of operating computing system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050050383A1 (en) * 2003-08-27 2005-03-03 Horn Robert L. Method of managing raid level bad blocks in a networked storage system
US20130124931A1 (en) * 2011-11-15 2013-05-16 Stec, Inc. Transmission error detector for flash memory controller
US20140006712A1 (en) * 2011-03-16 2014-01-02 Joseph A. Tucek Systems and methods for fine granularity memory sparing
US20140115235A1 (en) * 2012-10-18 2014-04-24 Hitachi, Ltd. Cache control apparatus and cache control method
US20140164715A1 (en) * 2012-12-12 2014-06-12 Lsi Corporation Methods and structure for using region locks to divert i/o requests in a storage controller having multiple processing stacks
US20140359216A1 (en) * 2013-06-03 2014-12-04 Lsi Corporation Confirmed divert bitmap to synchronize raid firmware operations with fast-path hardware i/o processing
US20170242794A1 (en) * 2016-02-19 2017-08-24 Seagate Technology Llc Associative and atomic write-back caching system and method for storage subsystem

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8255613B2 (en) 2009-04-30 2012-08-28 International Business Machines Corporation Wear-leveling and bad block management of limited lifetime memory devices
US8954790B2 (en) 2010-07-05 2015-02-10 Intel Corporation Fault tolerance of multi-processor system with distributed cache
US9092357B2 (en) 2010-10-29 2015-07-28 Microsoft Technology Licensing, Llc Remapping of inoperable memory blocks

Also Published As

Publication number Publication date
US10528438B2 (en) 2020-01-07

Similar Documents

Publication Publication Date Title
US9772802B2 (en) Solid-state device management
US9785575B2 (en) Optimizing thin provisioning in a data storage system through selective use of multiple grain sizes
US10223009B2 (en) Method and system for efficient cache buffering supporting variable stripe sizes to enable hardware acceleration
US8938584B2 (en) System and method to keep parity consistent in an array of solid state drives when data blocks are de-allocated
US9792350B2 (en) Real-time classification of data into data compression domains
US8732411B1 (en) Data de-duplication for information storage systems
US9423981B2 (en) Logical region allocation with immediate availability
US8935304B2 (en) Efficient garbage collection in a compressed journal file
KR101654807B1 (en) Data storage device and method for operating thereof
US9459800B2 (en) Storage region metadata management
US10089015B1 (en) Per-drive memory resident zeroing maps for drive zeroing in a data storage system
US10282116B2 (en) Method and system for hardware accelerated cache flush
US10528438B2 (en) Method and system for handling bad blocks in a hardware accelerated caching solution
US10282301B2 (en) Method and system for hardware accelerated read-ahead caching
US10599530B2 (en) Method and apparatus for recovering in-memory data processing system
CN110737395B (en) I/O management method, electronic device, and computer-readable storage medium
US10649906B2 (en) Method and system for hardware accelerated row lock for a write back volume
US8140800B2 (en) Storage apparatus
US11315028B2 (en) Method and apparatus for increasing the accuracy of predicting future IO operations on a storage system
US10394673B2 (en) Method and system for hardware accelerated copyback
US11455106B1 (en) Identifying and recovering unused storage resources on a storage system
US11294751B2 (en) Storage apparatus, dump data management method, and dump data management program
EP2924576A1 (en) Storage control apparatus, control program, and control method
US20200057576A1 (en) Method and system for input/output processing for write through to enable hardware acceleration

Legal Events

Date Code Title Description
AS Assignment

Owner name: AVAGO TECHNOLOGIES GENERAL IP (SINGAPORE) PTE. LTD

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:SIMIONESCU, HORIA;RADHAKRISHNAN, GOWRISANKAR;HOGLUND, TIMOTHY;AND OTHERS;SIGNING DATES FROM 20170522 TO 20170525;REEL/FRAME:042509/0299

AS Assignment

Owner name: AVAGO TECHNOLOGIES INTERNATIONAL SALES PTE. LIMITED, SINGAPORE

Free format text: MERGER;ASSIGNOR:AVAGO TECHNOLOGIES GENERAL IP (SINGAPORE) PTE. LTD.;REEL/FRAME:047231/0369

Effective date: 20180509

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

AS Assignment

Owner name: AVAGO TECHNOLOGIES INTERNATIONAL SALES PTE. LIMITED, SINGAPORE

Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE EXECUTION DATE OF THE MERGER AND APPLICATION NOS. 13/237,550 AND 16/103,107 FROM THE MERGER PREVIOUSLY RECORDED ON REEL 047231 FRAME 0369. ASSIGNOR(S) HEREBY CONFIRMS THE MERGER;ASSIGNOR:AVAGO TECHNOLOGIES GENERAL IP (SINGAPORE) PTE. LTD.;REEL/FRAME:048549/0113

Effective date: 20180905

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4