WO2014168603A1 - System for increasing utilization of storage media - Google Patents
- Publication number
- WO2014168603A1 (PCT/US2013/035584)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- buffer
- data
- block
- blocks
- regions
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/0223—User address space allocation, e.g. contiguous or non contiguous base addressing
- G06F12/023—Free address space management
- G06F12/0238—Memory management in non-volatile memory, e.g. resistive RAM or ferroelectric memory
- G06F12/0246—Memory management in non-volatile memory, e.g. resistive RAM or ferroelectric memory in block erasable memory, e.g. flash memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/72—Details relating to flash memory management
- G06F2212/7202—Allocation control and policies
Definitions
- Storage systems typically present a plurality of physical media devices as one or more logical devices which may have practical advantages over the organization of the underlying physical media. These advantages can be in the form of manageability (performing per-device operations to a group of devices), redundancy (allowing and correcting media errors on one or more devices transparently), scalability (allowing the size of logical devices to change dynamically by adding more physical devices), or performance (using parallelism to spread storage operations over multiple media devices).
- storage systems may additionally employ intelligent operations such as caching, prefetch, or other performance-enhancing techniques.
- Capacity may be described in terms of bytes (the basic unit of computer storage, conceptually equivalent to one letter on a typed page) or blocks, where a block is typically 512 bytes.
- the number of bytes in a storage system can be very large (several million millions of bytes, or terabytes).
- Performance of a storage device is typically dependent on the physical capabilities of the storage medium. This performance may typically be considered in terms of three parameters: Input/Output Operations per Second (IOPs), throughput (bytes per second that can be accessed), and latency (time required to perform a data access).
- the IOPs metric may be further specified for sequential or random access patterns.
- Capacity optimization may be achieved, for example, by aggregating the capacity of all physical devices into a single logical device. This logical device may have higher capacity than the constituent devices but may have equivalent or slightly lower performance. Reliability optimization may involve using replication of data that sacrifices half the capacity. Alternatively, reliability optimization may involve some error correction encoding which sacrifices some capacity, but less than that associated with replication.
- Performance optimization may involve duplication which allows twice as many read operations per unit time assuming some balancing mechanism, striping which increases throughput by spreading operations over an array of devices, or caching which uses memory to act as a buffer to the physical storage media.
- the storage system will be optimized for a desired performance metric at the cost of another performance metric or by incorporating additional physical elements (such as logic, memory or redundancy).
- Determining the optimal, or most suitable, configuration of a storage system requires matching the requirements of the user of the system to the capabilities of the physical devices and the optimization capabilities of the storage system.
- the performance of the constituent physical devices is typically the determining factor.
- a storage system design may favor IOPs over capacity and thus choose to use a large number of smaller capacity disks rather than creating the equivalent aggregate capacity from larger capacity devices.
- as media technology evolves, new methods of increasing performance and compensating for shortcomings of the physical media are constantly sought.
- the storage media may take the form of a Solid State Storage technology known as Multi-Level Cell (MLC) NAND flash memory.
- MLC NAND flash memory is commonly used in cameras, portable devices such as Universal Serial Bus (USB) memory sticks, and music players as well as consumer electronics such as cellular telephones.
- Other forms of flash in common use include Single-Level Cell (SLC) NAND flash memory and NOR flash memory. Both of these latter types offer higher performance at a higher cost as compared to MLC NAND flash.
- Many manufacturers are currently offering NAND flash with an interface that emulates that of traditional rotating storage devices (disk drives). These flash devices may be referred to as flash Solid State Drives (SSDs) and may be constructed using either MLC or SLC technology.
- Flash SSD devices differ from traditional rotating-disk drives in a number of aspects. Flash SSD devices have certain undesirable aspects. In particular, flash SSD devices suffer from poor random write performance that may further degrade over time. Because flash media has a lifetime measured as a limited number of write operations (a physical limitation of the storage material that eventually causes the device to "wear out"), write performance is also unpredictable.
- the flash SSD may be configured to periodically rebalance the written sections of the media in a process called "wear leveling". This process assures that the storage medium is used evenly, thus extending the viable life of the device.
- the inability to anticipate, or definitively know, when and for how long such background operations may occur (lack of transparency) is a principal cause of the performance uncertainty.
- a user cannot typically access data stored in the flash SSD device or store data in the flash SSD device while these rebalancing operations are being performed.
- the flash SSD device does not typically provide prior notification of when the background operations are going to occur. This prevents an application from anticipating the storage non-availability and scheduling other tasks during the flash SSD rebalancing operations.
- the significant performance advantage of flash SSDs over rotating media in performing random and sequential read operations makes SSDs ideal media for high performance storage systems, provided that performance issues can be overcome or avoided.
- An apparatus for storing data has storage media; a buffer capable of storing data of a plurality of user write requests, each user write request having a logical address and an indirection table capable of mapping the logical address of each user write request to a physical address of the storage media.
- a processor is configured to write the data stored in the buffer to a contiguous address range, which may comprise a buffer region, within the storage media and to update the indirection table to include the addresses in the storage media where the data corresponding to the user write request logical address has been stored.
- the contiguous address range may be a physical address range of the storage media and the logical address of data from the user writes is associated with a physical address in the storage media by an indirection table.
- the processor may be configured to aggregate or consolidate data for a plurality of write operations, which may be user write operations, into a staging buffer and write the aggregated data from the staging buffer into the contiguous buffer region within the storage media.
- the indirection table maps the logical addresses of the write operations into a contiguous address within the buffer.
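To make the relationship between user writes, the staging buffer, and the indirection table concrete, the following is a minimal sketch in Python. The class names, the dictionary-backed media, and the 4 KB / 1000-block sizing are illustrative assumptions rather than the patent's implementation.

```python
BLOCK_SIZE = 4 * 1024   # assumed user-visible block size (4 KB)
BUFFER_BLOCKS = 1000    # assumed number of blocks aggregated per buffer region


class IndirectionTable:
    """Maps a user logical address to (device_id, physical_block_address)."""

    def __init__(self):
        self._map = {}

    def update(self, logical_addr, device_id, physical_addr):
        self._map[logical_addr] = (device_id, physical_addr)

    def lookup(self, logical_addr):
        return self._map.get(logical_addr)  # None means the address was never written


class StagingBuffer:
    """Accumulates user writes until a full contiguous buffer region is ready."""

    def __init__(self):
        self.pending = []  # list of (logical_addr, data) tuples

    def add(self, logical_addr, data):
        self.pending.append((logical_addr, data))
        return len(self.pending) >= BUFFER_BLOCKS  # True when ready to flush


def flush(staging, table, media, device_id, region_base):
    """Write the aggregated blocks to one contiguous region and update the table."""
    for offset, (logical_addr, data) in enumerate(staging.pending):
        physical_addr = region_base + offset * BLOCK_SIZE
        media[(device_id, physical_addr)] = data
        table.update(logical_addr, device_id, physical_addr)
    staging.pending.clear()
```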
- the storage media may comprise an array of Solid State Devices (SSDs). Such devices may use any type of non-volatile storage such as NAND flash memory. Other memory types may be used, and further memory types are continually being developed having similar function. Additional memory arrays may be used for data storage, such as an array of disks, where the SSD array may be used to store data for more rapid access, and where a copy of the data may also be stored on the additional memory array. Prior to writing a buffer to the storage media the processor may be operable to discard data from the buffer or to aggregate data from a plurality of buffers into a buffer region of the plurality of buffer regions according to a number of used storage media buffer regions.
- a plurality of block counters may be configured, each block counter of the plurality of block counters containing block count values identifying a number of the blocks in the buffer in the storage media containing valid data, wherein the processor is configured to discard data from the buffer or aggregate data from a plurality of buffers into a same buffer of the plurality of buffers according to the block count values.
- a plurality of read counters may be configured, each read counter of the plurality of read counters may contain read count values associated with the buffer region and wherein the processor is further configured to discard data from the buffer region according to the associated read count values of a plurality of read counters.
- a bit map may be configured for each of the buffer regions, where bits in the bit maps identify a used or unused status for data of a block within the buffer region, and the processor is further configured to combine blocks from a plurality of buffers together into a same buffer of the plurality of buffers according to the bit maps.
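The block counters, read counters, and bit maps described in the preceding bullets might be kept per buffer region roughly as follows; the field names and the 1000-block region size are assumptions made for the sketch.

```python
class BufferRegionMeta:
    """Per-region bookkeeping: a validity bit map, a used-block counter, a read counter."""

    def __init__(self, blocks_per_region=1000):
        self.valid_bits = [False] * blocks_per_region  # bit map: True = block holds valid data
        self.used_blocks = 0                           # block counter
        self.reads = 0                                 # read counter

    def mark_written(self, block_index):
        if not self.valid_bits[block_index]:
            self.valid_bits[block_index] = True
            self.used_blocks += 1

    def invalidate(self, block_index):
        if self.valid_bits[block_index]:
            self.valid_bits[block_index] = False
            self.used_blocks -= 1

    def record_read(self):
        self.reads += 1


# Regions with few used blocks are candidates for combining; regions with a low
# read count are candidates for discarding, per the policies described later.
```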
- a storage system for data having a control element configured to establish buffer regions within a storage media to store data as blocks in contiguous address locations; and, to identify blocks within the buffer regions that store subgroups of the data, the control element further configured to relocate the data from the blocks of a plurality of buffer regions into a same buffer region of the plurality of buffer regions or discard the data from the buffer regions, according to utilization of the buffer regions.
- the utilization of the buffer regions corresponds with a number of the buffer regions that are currently being used in the storage media.
- a block counter may be configured for each buffer region of the plurality of buffer regions, the block counter identifying a number of used blocks in the buffer region.
- the control element may be configured to rank order the different buffer regions according to the number of used blocks of each buffer region of the plurality of buffer regions and to combine blocks from different buffer regions together in a same buffer region of the plurality of buffer regions according to the ranked order.
- a read counter may be configured to identify a read count for the buffer region.
- the control element is configured to rank order buffer regions of the plurality of buffer regions according to the read counts and to discard data from different buffer regions depending on the rank order, in accordance with a policy.
- the read count may be representative of the number of read operations that occur within a time period determined by a policy.
- control element may be configured to discard data from buffer regions having a zero read count when the number of buffer regions currently being used in the storage media is below a first threshold; and, to discard data in the buffer regions according to rank order when the number of buffer regions currently being used in the storage media is above the first threshold.
- a bit map associated with a buffer region of the plurality of buffer regions may have bits indicating a utilization status of a set of blocks in the buffer region, or the utilization status of a subset of blocks in the buffer region.
- the control element may be configured to clear the bit associated with the block when the data in the block is invalidated and to set the bits associated with the block when the data in the associated blocks is currently valid.
- control element may be configured to consolidate data for a plurality of user write operations into a buffer and write the data in the buffer into contiguous block locations in buffer region of the plurality of buffer regions in the storage media.
- a method for operating a data storage apparatus includes receiving a plurality of write operations; accumulating data from the different write operations into a staging buffer; and, writing the data in the staging buffer into contiguous block regions within a same buffer region of a storage media.
- An indirection table is created to identify the physical address of the block within the buffer region.
- the user address is used to identify the indirection table entry that maps the user address to the physical address in the storage media; and data from the identified block is supplied responsive to the read operation.
- a bit map having different bits corresponding to the blocks in the buffer region is managed by setting bits in the bit map when the data in the staging buffer is written into the buffer region; receiving invalidation requests; and clearing the bits in the bit map corresponding to the blocks of the invalidated data.
- Data from a plurality of buffer regions may be moved into a same one of the buffer regions according to the bit maps associated with the different buffer regions.
- a number of read operations performed on each of a plurality of buffer regions may be calculated, and data may be discarded from the buffer regions according to the number or frequency of reads to the buffer regions, in accordance with a policy.
- FIG. 1 is a block diagram of a storage system used for accessing a Solid State Device (SSD) array;
- FIG. 2 shows in more detail some of the operations performed by the storage system shown in FIG. 1;
- FIG. 3 is a flow diagram showing in more detail how the storage system operates;
- FIG. 4 is a block diagram showing a control element used in the storage system of FIG. 1;
- FIG. 5 is a block diagram showing an example write operation performed by the storage system;
- FIG. 6 shows how the control element tracks data utilization;
- FIG. 7 is a flow diagram showing in more detail the operations performed by the control element during a write operation;
- FIG. 8 is a flow diagram showing in more detail the operations performed by the control element during a read operation;
- FIG. 9 is a flow diagram showing in more detail the operations performed by the control element during a data invalidate operation;
- FIG. 10 is a block diagram showing how the control element combines together data from different buffers;
- FIG. 11 is a flow diagram showing in more detail the operations performed by the control element in FIG. 10;
- FIG. 12 is a flow diagram showing how the control element ranks utilization of buffers.
- the storage system may include an indirection mechanism, a control element, and a plurality of storage media.
- the storage media may be Solid State Devices (SSD).
- An abstraction of flash SSD media permits random write operations of arbitrary size by a user to be consolidated into large sequential write operations to an SSD array. This approach reduces the number of individual random write operations performed to the SSD device.
- the sequential writes to the SSD device may increase storage throughput since the SSD device may have to perform fewer defragmentation operations.
- a defragmentation operation is a type of background activity that can involve a number of internal read and write operations and which may inhibit timely user access to the SSD.
- the storage system may increase storage availability by using transparency and a handshaking scheme that allows users to eliminate or minimize the background operations performed in an array of SSDs.
- the storage system may also provide the user with the actual physical addresses to which data is stored in the SSD array by a data address indirection mechanism. This differs from conventional SSD arrays where the data address indirection process and the actual physical addresses for stored data in the SSD device are hidden from the user. Read operations performed on each of the different SSD devices in the SSD array are monitored to determine usage patterns.
- a first SSD device of a plurality of SSD devices may be read more often than a second SSD device to provide data to the user.
- the storage system may choose to write new data into the second SSD device, even when the second SSD device may currently be storing more data than the first SSD device. This may increase throughput in the SSD array for particular applications where data is typically read from the storage array more often than written to storage array.
- a web server may provide web pages to clients (users). New or updated web pages may infrequently be written into memory by the web server. However, the same web server may constantly read web pages from the storage system and supply the web pages to clients. But, writes to different SSD devices of the storage system may be performed based on a measure of SSD device utilization, not solely on SSD device available capacity. An optimal performance balance may be reached, for example, when all SSD devices experience substantially the same read demand. Different write loads may be required to achieve this balance.
- the storage system can be configured to use different data block sizes for writing data into the SSD array according to performance characteristics of the SSD devices. For example, a particular SSD device may be able to perform a single 4 Mega Byte (MB) block write significantly faster than 1000 separate 4K block writes. In this situation, the storage system might be configured to perform all of the writes to the SSD array in 4 MB blocks. A plurality of 4K block writes would be assembled (aggregated) into a single 4 MB write operation.
- a control element determines when data from different write buffers should be combined together or discarded based on fragmentation and read activity. This optimization scheme increases memory capacity and improves memory utilization. Optimizing the combination requires aggregating smaller writes into larger writes without wasting available space within the larger write. Maintaining the metadata information of all smaller writes is the function of the control element.
- FIG. 1 shows a storage system 100 that includes an indirection mechanism 200 and a control element 300, which may be a processor, a controller, or the like.
- the storage system 100 and storage users 500 are realized by software executed by one or more processors 105 and memory located in a server 502.
- the storage user 500 may be a proxy for external users connected to the storage server by a network or other communications environment.
- Some elements in the storage system 100 may be implemented in hardware and other elements may be implemented in software.
- the storage system 100 may be located between the users 500 and a disk 20.
- the disk 20 may be a conventional hard disk, a disk array, or another solid state storage device.
- the storage system 100 can be a stand-alone appliance, device, or blade, and the disk 20 can be a stand-alone disk storage array.
- the users 500, storage system 100, and disk 20 are each coupled to each other via wired or wireless Internet connections.
- the users 500 may access one or more disks 20 over an internal or external data bus.
- the storage system 100 could be located in a personal computer or server, or could be a standalone device coupled to the computer/client via a computer bus or packet switched network connection, or the like.
- the storage system 100 accepts reads and writes to disk 20 from users 500 and uses the SSD array 400 for accelerating accesses to data.
- the SSD array 400 could be any combination of Dynamic Random Access Memory (DRAM) and/or Flash memory.
- the SSD array 400 could be implemented with any memory device that provides relatively faster data access than the disk 20.
- the storage users 500 include a software application or hardware that accesses or "uses" or "requests" data stored in the SSD array 400 or disk array 20.
- the storage users 500 may comprise a cache application used by an application 504 operated on a storage server 502.
- application 504 may need to access data stored in SSD array 400 responsive to communications with clients 506 via a Wide Area Network (WAN) 505 or Local Area Network (LAN) 505, referred to generally as the Internet or an intranet.
- the storage users 500, storage system 100, and SSD array 400 may all be part of the same appliance that is located in the server or computing device 502. In another example, any combination of the storage users 500, storage system 100, and SSD array 400 may operate in different computing devices or servers. In other examples, the storage system 100 may be operated in conjunction with a personal computer, portable video or audio device, or some other type of consumer product. The storage system 100 could operate in any computing environment and with any application that needs to write and read data to and from memory devices.
- the storage system 100 presents the SSD array 400 as a logical volume to storage users 500.
- Storage system 100 presents logical blocks 150 of virtual storage that correspond to physical blocks 450 of physical storage in SSD array 400.
- the SSD array 400 consists of a plurality of SSD devices 402, two of which are referenced as SSD device 402A and SSD device 402B. The total number of SSD devices 402 in the SSD array 400 may vary. While shown being used in conjunction with an SSD array 400, it should also be understood that the storage system 100 can be used with any type or any combination of memory devices.
- Storage users 500 may consist of a number of actual users or a single user presenting virtual logical storage to other users indirectly.
- the storage users 500 could include a cache application that presents virtual storage to a web application 504 operating on the web server 502.
- the logical volume presented to the users 500 may have a configurable block size which may be considered fixed during a configured operating mode.
- the size of the virtual logical blocks 150, a block size for data transfers between the storage system 100 and SSD array 400, and the scheme used for selecting SSD devices 402 is contained within configuration registers 110, which may be a memory.
- storage system 100 interprets the configuration data in register 110 to set configuration parameters.
- the virtual block size 150 is assumed to be configured as 4 KB. Read and write operations performed by storage system 100 reference an integral number of the virtual blocks 150 each of size 4 KB.
- the indirection mechanism 200 is operated in response to the storage users 500 and is populated by the control element 300 with the physical addresses where the blocks of data are located in SSD array 400.
- Indirection mechanism 200 comprises an indirection table 220 having a plurality of indirection entries 230, two of which may be referenced as indirection entry 230A and indirection entry 230B.
- the indirection table 220 consists of a block level index representation of a logical storage device. The index representation allows virtual logical blocks 150 to be mapped to physical storage blocks 450 in SSD array 400. This may require, for example, one entry per virtual logical block 150 of storage, to uniquely map any block of logical storage to a block of physical storage in SSD array 400.
- the indirection mechanism 200 may comprise a search architecture or schema, such as a hash, binary tree or other data structure, such that any physical block 450 within the SSD array 400 can be mapped to a unique indirection entry 230 associated with a unique virtual logical block 150.
- This search structure may be constructed at the time that the data is written to the storage media 400.
- indirection table 220 grows as more unique virtual logical blocks 150 are written to the storage system 100 and the storage array 400.
- indirection table 220 may comprise a multi-level bitmap or tree search structure such that certain components are static in size while other components grow as more unique virtual logical blocks 150 are created in the storage system 100.
- the indirection mechanism 200 may be implemented as a hardware component or device such as a content addressable memory (CAM). Multiple levels of indirection may be used, some of which may be embodied in software.
- the examples of the indirection mechanism 200 may resolve a logical block address of a read or write operation from users 500 into a unique indirection entry 230.
- the indirection entry 230 may comprise, for example, a SSD device ID 232, a user address 233, a block address 234, and a block state 236.
- the SSD device ID 232 corresponds to a unique SSD device 402 in SSD array 400.
- Block address 234 corresponds to the unique physical address of a physical block 450 of storage within the SSD device 402 that corresponds with the device ID 232.
- a block refers to a contiguous group of physical address locations within the SSD array 400.
- Block state 236 contains state information associated with block address 234 for device ID 232. This block state 236 may include, but is not limited to, timestamp information, validity flags, and other information.
- device ID 232 and block address 234 may correspond to physical SSD devices 402 through a secondary level of indirection.
- a disk controller (not shown) may be used to create logical devices from multiple physical devices.
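An indirection entry with the fields named above (device ID 232, user address 233, block address 234, and block state 236) could be represented as a simple record; the dataclass form and field types below are illustrative assumptions, not the patent's data layout.

```python
from dataclasses import dataclass, field


@dataclass
class IndirectionEntry:
    device_id: int      # identifies a unique SSD device 402 in the array
    user_address: int   # logical address supplied by the storage user
    block_address: int  # physical block address, or a logical address resolved
                        # through a secondary level of indirection as noted above
    block_state: dict = field(default_factory=dict)  # e.g. timestamp, validity flags
```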
- FIG. 3 shows an operation performed on the apparatus of FIGS. 1 and 2.
- the storage user 500 writes data 502 of a random size without a specified SSD address to the storage system 100.
- Data 502 contains a user address which will be used in the future to read data 502.
- the control element 300 assigns the write data 502 to one or more 4 KB blocks 508 within a 4 MB staging buffer 370, depending on the amount of data to be stored.
- the control element 300 also identifies a SSD device 402 within the SSD array 400 for storing the contents of 4MB staging buffer 370.
- the control element 300 in operation 254, notifies the indirection mechanism 200 of the particular SSD device 402 and physical block address where the data 502 is written into the SSD array 400.
- the user address 233 specified as part of the write operation of data 502 is stored within indirection mechanism 200 such that a lookup of the user address 233 will return the corresponding physical block address 234.
- Storage user 500 can subsequently retrieve data 502 using this physical block address.
- the data 502 in the staging buffer 370 is written into the SSD array 400.
- Since the user has not specified an SSD storage address for data 502, some implementation-specific transaction state may exist. For example, the user may submit multiple instances of write data 502 serially, awaiting a returned physical block address for each write request and recording this address within a memory. In another example, the user may submit several instances of write data 502 concurrently along with a transaction descriptor or numeric identifier that can be used to match the returned physical block addresses. In yet another example, the user submits several instances of write data 502 concurrently without a transaction descriptor or numeric identifier and relies on the ordering of responses to match returned physical block addresses.
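One of the options just described, submitting several writes concurrently with a transaction descriptor and matching the returned physical block addresses against it, might look like the following sketch. The storage stub and all method names here are invented for illustration.

```python
import itertools


class FakeStorage:
    """Stand-in for the storage system: it chooses and returns the physical address."""

    def __init__(self):
        self._next = itertools.count(0)

    def write(self, data):
        return next(self._next) * 4096  # pretend contiguous 4 KB physical blocks


class WriteTransactions:
    """Matches returned physical block addresses to outstanding writes by descriptor."""

    def __init__(self, storage):
        self.storage = storage
        self._descriptors = itertools.count(1)
        self.completed = {}  # transaction descriptor -> physical block address

    def submit(self, data):
        descriptor = next(self._descriptors)
        self.completed[descriptor] = self.storage.write(data)
        return descriptor


# Usage: the returned descriptor lets the user later find the physical address.
txn = WriteTransactions(FakeStorage())
d1 = txn.submit(b"block A")
d2 = txn.submit(b"block B")
assert txn.completed[d1] != txn.completed[d2]
```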
- the storage users 500 refer to the indirection mechanism 200 to identify the particular SSD device 402 and physical address in SSD array 400 where the stored data 510 to be read is located.
- Control element 300 reads the physical SSD device 402 referenced by device ID 232 at physical block address 234 and returns the data 510 to the particular one of the storage users 500.
- the control element 300 checks block state 236 and may perform the read operation only if data has been previously written to the specified physical block 450.
- a data block of some initial state (customarily all 0's) would be returned to the storage user 500 as the result of an invalid read operation.
- if the indirection mechanism 200 has no indirection entry 230, a similar block would be returned to the storage user 500 indicating that no writes have occurred for the user address that maps to the physical address of the specified physical block 450.
- the storage system 100 accepts user write operations of an integral number of (4 KB) data blocks from storage users 500, but performs writes to the physical SSD array 400 in large (4 MB) data blocks by aggregating the data in staging buffers 370.
- the optimal size of the staging buffers 370 may be determined experimentally, depending on the specific SSD type being used. For the purpose of subsequent examples the large blocks are assumed, through configuration, to be set to 4 MBs. For this configuration, up to 1000 user sub-blocks of 4 KBs can be contained within each staging buffer 370.
- performing large 4 MB writes, preferably of uniform size, from the storage system 100 to the SSD array 400 improves the overall performance of the SSD array 400 since fewer defragmentation operations may be required later.
- a fewer number of larger block writes may increase write throughput compared with a larger number of smaller random block writes.
- To service write operations from the storage users 500, the storage system 100 uses control element 300 to identify the most suitable indirect location for storing data and executes a sequence of operations to perform the write operation and to update the indirection table 220.
- the control element 300 maintains a device list 320 with information regarding each physical SSD device 402 in SSD array 400. Each physical SSD device 402 has a corresponding device buffer list 340 and a corresponding device block map 360. Control element 300 may consult device list 320 to determine the least utilized physical SSD device 402.
- Utilization may be considered in terms of the number of physical blocks 450 used in the SSD device 402, the number of pending read operations to the SSD devices 402, or the number of recent read operations. That is, the number of read operations to specific 4 MB buffers 405 in the SSD devices 402 over some previous time interval may be considered. This is explained further below in FIGS. 10-12.
- a high read utilization rate for a particular SSD device 402 may cause the control element 300 to select the second SSD device 402B for a next 4 MB block write, even when SSD device 402A may currently be storing less data.
- control element 300 After determining the optimal SSD device 402 for writing, control element 300 consults device buffer list 340 associated with the selected SSD device 402.
- the device buffer list 340 contains a list of buffer entries 342 that identify free 4 MB buffers 405 of storage in SSD array 400. Each buffer entry 342 represents the same buffer size and contains separate block entries 345 that identify the 4 KB blocks 450 within each 4 MB buffer 405 (FIG. 1).
- device buffer list 340 is maintained as a separate structure referenced by the device entries in device list 320.
- Device buffer list 340 has sufficient entries 345 to cover the contiguous block space for each device entry 342 in device list 320.
- Each buffer entry 342 in device buffer list 340 contains, minimally, a block map pointer 355 that points to a subset of bits 365 in the device block map 360.
- the buffer entries 342 may each contain a subset of the bits 365 from the device block map 360 that correspond with a same 4 MB block in the same SSD device 402.
- Device block map 360 contains a one-to-one mapping of 4 KB blocks 450 (FIG. 1) for each buffer entry 342 in device buffer list 340.
- each device block map 360 contains 1000 bits 365.
- Each bit 365 represents the valid/invalid state of one 4 KB physical block 450 within a 4 MB physical buffer 405 in SSD array 400.
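The device list 320, device buffer list 340, and device block map 360 described above might be arranged along these lines; the container choices and sizes are assumptions made for the sketch.

```python
BLOCKS_PER_BUFFER = 1000  # assumed number of 4 KB blocks tracked per 4 MB buffer


class DeviceBlockMap:
    """One valid/invalid bit per 4 KB physical block in a 4 MB buffer."""

    def __init__(self):
        self.bits = [0] * BLOCKS_PER_BUFFER


class BufferEntry:
    """Identifies one 4 MB buffer on a device and points at its block map."""

    def __init__(self, buffer_base_addr):
        self.buffer_base_addr = buffer_base_addr
        self.block_map = DeviceBlockMap()  # the role of block map pointer 355 / bits 365


class DeviceEntry:
    """Per-SSD bookkeeping: a buffer list covering the device's buffer space."""

    def __init__(self, device_id, num_buffers, buffer_bytes=4 * 1024 * 1024):
        self.device_id = device_id
        self.buffer_list = [BufferEntry(i * buffer_bytes) for i in range(num_buffers)]


# A small device list covering two hypothetical SSD devices.
device_list = [DeviceEntry(device_id=i, num_buffers=4) for i in range(2)]
```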
- write operations 600 are sent to the storage system 100 from one or more of the storage users 500.
- Staging buffer 370 is selected as the next available buffer for the least utilized physical device.
- Data for write operations A, B, and C are copied into staging buffer 370, which is subsequently written to the SSD array 400 (FIG. 1).
- the write operations A, B, and C each include data and an associated user address (write address). Other write operations may have occurred after write operation C but before the write by control element 300 to a physical disk in SSD array 400.
- indirection mechanism 200 is updated such that the logical 4 KB blocks A, B, and C point to valid indirection entries 230A, 230B, and 230C, respectively. These indirection entries maintain the mapping between the user logical address and the physical storage address location 234 in the SSD array 400 where the data A, B, and C is written.
- the block address 234 within each indirection entry 230 is the exact physical address for the written block.
- physical block addresses 234 are logical addresses derived from the physical address through another indirection mechanism.
- the block addresses 234 may be encoded with the device ID 232 (FIG. 1).
- the control element 300 in FIG. 4 may not directly perform writes to the selected SSD devices 402.
- a copy of the write data may be placed in the staging buffer 370 using as much space as necessary.
- Staging buffer 370 is the same size as the 4 MB buffer entries 405 in the SSD array 400. Thus, up to 1000 4 KB block writes can fit inside the staging buffer 370.
- Each 4 KB write from user 500 causes the corresponding bit 365 in device block map 360 to be set. Multiple bits 365 are set for writes larger than 4 KB.
- Staging buffer 370 is written to the physical SSD device 402 in SSD array 400 when the staging buffer 370 is full, nearly full, or a predetermined time has lapsed from the first copy into staging buffer 370, depending on a policy.
- the corresponding indirection entry 230 is updated with the physical address location (block address 234) of the data in SSD array 400.
- the indirection entry 230 is used in subsequent read operations to retrieve the stored data.
- An acknowledgement of the original write operation may not be returned to the user 500 until the physical write into SSD array 400 has occurred and the indirection mechanism 200 has been updated.
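A sketch of the flush policy just described, where the staging buffer is written out when it is full, nearly full, or has been held past a configured time; the specific thresholds are assumed example values, not figures from the patent.

```python
import time


def should_flush(used_blocks, first_copy_time,
                 capacity_blocks=1000, nearly_full_fraction=0.95,
                 max_hold_seconds=1.0, now=None):
    """Decide whether the staging buffer should be written to the SSD array."""
    now = time.monotonic() if now is None else now
    if used_blocks >= capacity_blocks:
        return True  # full
    if used_blocks >= capacity_blocks * nearly_full_fraction:
        return True  # nearly full
    if first_copy_time is not None and now - first_copy_time >= max_hold_seconds:
        return True  # held too long since the first copy into the buffer
    return False
```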
- the write data A, B, & C is copied into the staging buffer 370 by control element 300.
- the staging buffer 370 uses references to the original write operation to avoid the need to copy.
- staging buffer 370 maintains the list of links to be used by the write operation to SSD array 400.
- storage system 100 may periodically invalidate storage regions or specific blocks of storage. This invalidation may be initiated by activity such as deletion of data or expiration of cached information initiated by the storage users 500.
- the granularity of the invalidation may be the same as the granularity of the storage in terms of block size. That is, invalidation may occur in an integral number of blocks (each 4 KB, from the previous examples).
- Invalidation clears the corresponding valid bit 365 in the device block map 360.
- device list 320 is consulted for the appropriate device buffer list 340.
- the physical block address 234 in indirection entry 230 is then used to determine the exact bit 365 in the device block map 360 to clear. Once cleared, the indirection entry 230 is updated to indicate that the entry is no longer valid.
- control element 300 (FIG. 4) periodically reads the device buffer list entries 342 to determine if multiple 4 MB buffer regions can be combined. In one embodiment, suitability for combination is determined through a count of the number of valid block entries 345 within each buffer entry 342. Each block entry 345 in a buffer entry 342 corresponds to a 4 KB block 450 within the same 4 MB buffer 405 (FIG. 1). Combining more data from different buffers 405 into the same buffer 405 increases the efficiency and capacity of read and write operations to the SSD array 400.
- two or more 4 MB buffers 405 are read from the SSD array 400 and the valid 4 KB physical blocks 450 are copied into the same empty 4 MB staging buffer 370.
- the 4 KB blocks 450 are packed sequentially (repositioned within the 4 MB staging buffer 370) such that any holes created by the invalidated entries are eliminated.
- the staging buffer 370 may now be written back into a same new 4 MB buffer 405 on the most suitable SSD device 402, as determined by referring to the device list 320.
- the associated indirection entries 230 are updated to reflect the new physical address locations for all of the repositioned 4 KB blocks 450.
- all of the originally read 4 MB buffers 405 can be reused and are made available on the corresponding device buffer list 340. In a flash memory, the 4 MB buffer would be erased prior to reuse.
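The combining step might be sketched as follows: valid 4 KB blocks from two or more sparsely used buffers are packed sequentially into an empty staging buffer and the indirection entries are pointed at their new locations. The tuple-based region description and the dictionary indirection map are assumptions carried over from the earlier sketches.

```python
def combine_buffers(source_regions, staging, indirection, new_region_base,
                    block_size=4 * 1024):
    """Pack valid blocks from several regions into one new region.

    source_regions: iterable of (region_base, valid_bits, blocks) where
    valid_bits[i] says whether blocks[i] = (logical_addr, data) is still valid.
    """
    packed = 0
    for _region_base, valid_bits, blocks in source_regions:
        for idx, valid in enumerate(valid_bits):
            if not valid:
                continue  # holes left by invalidated blocks are simply skipped
            logical_addr, data = blocks[idx]
            staging.append((logical_addr, data))
            # The block's new physical location inside the destination buffer.
            indirection[logical_addr] = new_region_base + packed * block_size
            packed += 1
    return packed  # number of valid blocks repositioned
```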
- One aspect of the remapping operation is that a handshaking operation may be performed between the storage users 500 and the storage system 100.
- the control element 300 of FIG. 4 may send a remap notification message to the storage users 500 prior to remapping multiple different 4 KB blocks 450 from different 4MB buffers 405 into the same 4 MB buffer 405.
- the remap notification message may identify the valid buffer entries 345 that are being moved to a new 4 MB buffer 405.
- the physical data blocks 450 that are being moved are committed in the new 4 MB buffer 405 in the SSD device 402 prior to the control element 300 sending out the remap notification message to the storage users 500.
- the storage users 500 then acknowledge the remap notification message and update the indirection entries 230 in indirection mechanism 200 to contain the new device ID 232 and new block addresses 234 for the remapped data blocks 450 (FIG. 1).
- Defragmentation in prior SSD devices is typically done autonomously without providing any notification to the storage users.
- the remapping described above is transparent to the storage users 500 through the handshaking operation described above. This handshaking allows the storage users 500 to complete operations on particular 4 KB blocks 450 before enabling remapping of the blocks into another 4 MB buffer 405.
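The handshake could be orchestrated roughly as below. The control and user objects and their method names (commit, notify_remap, update_indirection, release_old_buffers) are assumed interfaces, not the patent's API; the ordering mirrors the description: commit the moved data first, notify, wait for the acknowledgement, then treat the old buffers as reusable.

```python
def remap_with_handshake(control, user, moved_blocks, new_buffer):
    # 1. Commit the moved 4 KB blocks into the new 4 MB buffer before notifying.
    control.commit(moved_blocks, new_buffer)
    # 2. Tell the storage user which blocks moved and where they now live.
    notification = {"blocks": moved_blocks, "new_buffer": new_buffer}
    acknowledged = user.notify_remap(notification)  # user finishes in-flight operations
    # 3. Only after the acknowledgement are the indirection entries updated and
    #    the old, now sparsely used buffers released for reuse.
    if acknowledged:
        user.update_indirection(moved_blocks, new_buffer)
        control.release_old_buffers(moved_blocks)
```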
- the staging buffers 370 in FIG. 4 might only be partially filled when ready to be written into a particular 4 MB buffer 405 in SSD array 400.
- the control element 300 may take this opportunity to remap blocks 450 from other partially filled 4 MB buffers 405 in SSD array 400 into the same 4 MB buffer where the current contents in staging buffer 370 are going to be written.
- the control element 300 identifies free 4 KB blocks in the new 4 MB buffer 405 using the device buffer list 340.
- a remap notification message may be sent to the storage users 500 for the data blocks 450 that will be copied into the staging buffer 370 and remapped.
- all of the contents of the staging buffer 370, including the new data and the remapped data from storage array 400, may be written into the same 4 MB buffer 405. This remaps the 4 KB blocks 450 from other sparsely populated 4 MB buffers 405 into the new 4 MB buffer 405 along with any new write data previously contained in the staging buffer 370.
- the control element 300 may start reading 4 KB blocks 450 from SSD array 400 for one or more sparsely filled 4 MB buffers 405 into the staging buffer 370.
- the write data may be loaded into the remaining free blocks in the staging buffer 370. All of the contents in the staging buffer 370 may then be written into the same 4 MB buffer 405 after the remap acknowledgement is received from the storage users 500.
- the blocks previously read from the sparsely filled 4 MB blocks in the SSD array may then be made available for other block write operations.
- FIGS. 6-12 describe in more detail examples of how the storage system 100 may be used to remap and optimize storage usage in the SSD array 400.
- the SSD array 400 may be virtualized into, for example, 4 MB buffers 405 with 4 KB physical blocks 450.
- other delineations could be used for the buffer size and block size within the buffers.
- the control element 300 in the storage system 100 maintains a buffer entry 342 for the 4 KB data blocks 450 in each 4 MB buffer 405 in SSD array 400.
- the buffer entry 342 contains the pointer 355 to the physical location of the 4 MB buffer 405 in SSD array 400.
- Different combinations of the 4 KB blocks 450 within the 4 MB buffer 405 may either contain valid data, be designated as used space, or may contain empty or invalid data designated as free space.
- the control element 300 uses a register counter 356 to keep track of the number of blocks 450 that are used for each 4 MB buffer 405 and uses another register counter 357 to track the number of times the blocks 450 are read from the same 4 MB buffer 405. For example, whenever data is written into a previously empty buffer 405, the control element 300 will reset the value in used block count register 356 to 1024. The control element 300 will then decrement the value in used block count register 356 for each 4 KB block 450 that is subsequently invalidated. Whenever there is a read operation to any 4 KB block 450 in a 4 MB buffer 405, the control element 300 will increment the value in a block read count register 357 associated with that particular buffer 405.
- the count value in register 357 may be based on a particular time window.
- the number of reads in register 357 may be a running average for the last minute, hour, day, etc. If the time window was, for example, 1 day, then the number of reads for a last hour may be averaged in with other read counts for the previous 23 hours. If a buffer 405 has not existed for 24 hours, then an average over the time period that the buffer has retained data may be extrapolated to an average value per hour.
- Other counting schemes that indicate the relative read activity of a particular buffer 405 with respect to the other buffers in the SSD array 400 can also be used.
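A time-windowed read counter along the lines described, tracking reads per hour and averaging over the hours the buffer has actually held data, could be sketched as follows; the hourly bucket granularity and the simplification that idle hours are not back-filled are assumptions.

```python
import time
from collections import deque


class WindowedReadCount:
    """Running per-hour read count for one buffer over a bounded window."""

    def __init__(self, window_hours=24):
        self.buckets = deque(maxlen=window_hours)  # completed hours' read counts
        self.current = 0                           # reads in the hour in progress
        self.hour_start = time.monotonic()

    def record_read(self, now=None):
        now = time.monotonic() if now is None else now
        if now - self.hour_start >= 3600:
            self.buckets.append(self.current)      # roll over to a new hour
            self.current = 0
            self.hour_start = now
        self.current += 1

    def reads_per_hour(self):
        hours = list(self.buckets) + [self.current]
        return sum(hours) / len(hours)             # average over the hours observed
```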
- the device block map 360 as described above is a bit map where each bit indicates whether or not an associated 4 KB data block 450 in a particular 4 MB buffer 405 is used or free.
- a first group of bits 365A in the bit map 360 indicate that a corresponding first group of 4 KB blocks 450A in 4 MB buffer 405 are used.
- a second group of bits 365B in the bit map 360 may indicate that, for example, a corresponding second group of 4 KB blocks 450B in buffer 405 are all free.
- the bits 365 can be configured to represent smaller or larger block sizes.
- the overall storage system 100 (FIG. 1) performs three basic data activities in SSD array 400: read, write, and invalidate.
- FIG. 7 explains the operations performed by the control element 300 for write operations.
- the storage system 100 receives a user write operation.
- the control element 300 determines if there is a staging buffer 370 currently in use in operation 602. If not, the control element 300 initializes a new staging buffer 370 in operation 614 and initializes a new buffer entry 342 for the data associated with the write operation in operation 616.
- the control element 300 copies the user data contained in the write operation from the user 500 into the staging buffer 370 in operation 604.
- the bits 365 in the device block map 360 associated with the data are then set in operation 606. For example, the bits 365 corresponding to the locations of each 4 KB block of data in the 4 MB staging buffer 370 used for storing the data from the user write operation will be set in operation 606.
- Operation 606 also increments the used block counter 356 in buffer entry 342 for each 4 KB block 450 of data used in the staging buffer 370 for storing user write data.
- If the staging buffer 370 is full in operation 608, the control element 300 writes the data in the staging buffer 370 into an unused 4 MB buffer 405 in the SSD array 400 in operation 618.
- the control element 300 may also keep track how long the staging buffer 370 has been holding data. If data has been held in staging buffer 370 beyond some configured time period in operation 610, the control element 300 may also write the data into the 4 MB buffer 405 in operation 618.
- the control element 300 may update the indirection table 220 in FIG. 1 to include the SSD device ID 232, user addresses 233, and block addresses 234 for the indirection entries 230 associated with the data blocks 450 written into SSD array 400. The process then returns to operation 600 for processing other write operations.
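The FIG. 7 write flow (operations 600 through 620) can be tied together in one compact sketch; the class below is an illustrative aggregate of the earlier sketches with assumed thresholds, not the patent's implementation.

```python
import time


class WritePath:
    """Minimal write path: stage user blocks, then flush one large region at a time."""

    def __init__(self, blocks_per_buffer=1000, max_hold_seconds=1.0):
        self.blocks_per_buffer = blocks_per_buffer
        self.max_hold_seconds = max_hold_seconds
        self.staging = None          # operation 602: is a staging buffer in use?
        self.first_copy = None
        self.media = {}              # (region, offset) -> data
        self.indirection = {}        # user address -> (region, offset)
        self.next_region = 0

    def write(self, user_addr, block):
        if self.staging is None:     # operations 614/616: start a new staging buffer
            self.staging, self.first_copy = [], time.monotonic()
        self.staging.append((user_addr, block))              # operations 604/606
        full = len(self.staging) >= self.blocks_per_buffer   # operation 608
        timed_out = time.monotonic() - self.first_copy >= self.max_hold_seconds  # 610
        if full or timed_out:
            self._flush()            # operation 618: one large write to the SSD array

    def _flush(self):
        region = self.next_region
        self.next_region += 1
        for offset, (user_addr, block) in enumerate(self.staging):
            self.media[(region, offset)] = block
            self.indirection[user_addr] = (region, offset)   # operation 620
        self.staging = None
```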
- FIG. 8 explains the operations performed by the control element 300 for read operations.
- the storage system 100 receives a read request from one of the users 500.
- the control device determines if the user read address in the read request is contained in the indirection table 220. If not, a read error message is sent back to the user in operation 634.
- the control element 300 identifies the corresponding device ID 232 and physical block address 234 (FIG. 1) in operation 632. Note that the physical block address 234 may actually have an additional layer of abstraction used internally by the individual SSD devices 402.
- the control element 300 in operation 636 reads the 4 KB data block 450 from SSD array 400 that corresponds with the mapped block address 234.
- the read count value in register 357 (FIG. 6) is then incremented and the control device returns to processing other read requests from the users 500.
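A corresponding sketch of the FIG. 8 read path: resolve the user address through the indirection table, read the mapped physical block, and increment the buffer's read counter. The tuple shape of the entry and the exception-based error path are assumptions.

```python
class ReadError(Exception):
    """Raised when a read references a user address with no indirection entry."""


def handle_user_read(indirection, media, read_counters, user_addr):
    entry = indirection.get(user_addr)        # operation 630: look up the entry
    if entry is None:
        raise ReadError(f"no mapping for user address {user_addr!r}")  # operation 634
    device_id, block_addr, region_id = entry  # operation 632: device ID / block address
    data = media[(device_id, block_addr)]     # operation 636: read the 4 KB block
    read_counters[region_id] = read_counters.get(region_id, 0) + 1    # operation 638
    return data
```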
- FIG. 9 shows the operations that are performed by the control element 300 for invalidate operations.
- the storage system 100 receives an invalidate command from one of the users 500 in operation 642.
- the control element 300 in operation 644 determines if the user address 233 in the invalidate request is contained in the indirection table 220 (FIG. 1). If not, an invalidate error message is sent back to the user in operation 648.
- the control element 300 identifies the corresponding device ID 232 and physical block address 234 (FIG. 1) in operation 644.
- the control element 300 in operation 646 clears the bits 365 in the device block map 360 (FIG. 6) that correspond with the identified block addresses 234.
- the used block counter value in register 356 is then decremented once for each invalidated 4 KB block 450.
- the control element 300 checks to determine whether the used block counter value in register 356 is zero. If the value is zero, the 4 MB buffer 405 no longer contains any valid data and can be reused in operation 652. When the used block counter 356 is not zero, the control element 300 returns and processes other memory access requests.
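And a sketch of the FIG. 9 invalidate path: clear the block-map bit, decrement the used-block counter, and recycle the 4 MB buffer once the counter reaches zero. The dictionary-based region records are assumptions carried over from the earlier sketches.

```python
def handle_invalidate(indirection, regions, free_regions, user_addr):
    entry = indirection.pop(user_addr, None)  # operation 644: must be a mapped address
    if entry is None:
        raise KeyError(f"invalidate of unmapped address {user_addr!r}")  # operation 648
    region_id, block_index = entry
    region = regions[region_id]
    if region["bits"][block_index]:           # operation 646: clear the valid bit
        region["bits"][block_index] = False
        region["used_blocks"] -= 1            # decrement the used block counter
    if region["used_blocks"] == 0:            # operation 650: no valid data remains
        free_regions.append(region_id)        # operation 652: the 4 MB buffer is reusable
```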
- FIGS. 10 and 11 show how data from different 4 MB buffers 405 in the SSD array 400 may be combined together.
- three different buffer entries 342A, 342B, and 342C are identified by the control element 300 for resource recovery and optimization.
- a ranking scheme identifies the best candidate buffers 405 for recovery based on the associated used block count value in register 356, the read count value in register 357 in the buffer entries 342, and a buffer utilization.
- One example of the ranking scheme is described in more detail below in FIG. 12.
- the buffer entry 342A associated with 4 MB buffer 405A has an associated block count of 16 and a read count of 1. This means that the valid data A1 and A2 in buffer 405A has a combination of 16 valid 4 KB blocks and has been read once. Sixteen different bits are set in the device block map 360A that correspond to the sixteen 4 KB valid blocks of data A1 and A2.
- the buffer entry 342B associated with 4 MB buffer 405B has a block count of 20 and a read count of 0, and the buffer entry 342C associated with 4 MB buffer 405C has an associated block count of 24 and a read count of 10.
- 20 bits will be set in the device block map 360B that correspond to the locations of the twenty 4 KB blocks of data B1 in buffer 405B.
- 24 bits will be set in the device block map 360C that correspond to the twenty-four 4 KB blocks of data C1 in buffer 405C.
- the control element 300 combines the data A1 and A2 from buffer 405A, the data B1 from buffer 405B, and the data C1 from buffer 405C into a free 4 MB buffer 405D.
- the data A1 and A2 from buffer 405A are first copied into the first two contiguous address ranges D1 and D2 of buffer 405D, respectively.
- the data B1 from buffer 405B is copied into a next contiguous address range D3 in buffer 405D after data A2.
- the data C1 from buffer 405C is copied into a fourth contiguous address range D4 in buffer 405D immediately following data B1.
- a new buffer entry 342D is created for 4 MB buffer 405D and the block count 356D is set to the total number of 4 KB blocks 450 that were copied into buffer 405D.
- 60 total blocks 450 were copied into buffer 405D and the used block count value in register 356D is set to 60.
- the read count 357D is also set to the total of the previous read counts recorded in buffer entries 342A, 342B, and 342C.
- the device block map 360D for buffer 405D is updated by setting the bits corresponding with the physical address locations for each of the sixty 4 KB blocks 450 of data A1, A2, B1, and C1 copied into buffer 405D.
- the data A1, A2, B1, and C1 substantially fills the 4 MB buffer 405D. Any remaining 4 KB blocks 450 in buffer 405D remain as free space and the corresponding bits in device block map 360D remain cleared at zero.
- the different free spaces shown in FIG. 10 may have previously contained valid data that was then later invalidated.
- the writes to SSD array 400 are in 4 MB blocks. Therefore, this free space remains unused until the control element 300 aggregates the data A1, A2, B1, and C1 into another buffer 405D. After the aggregation, 4 MBs of data can again be written into 4 MB buffers 405A, 405B, and 405C and the free space reused.
- the storage system 100 reduces the overall write times with respect to random write operations. By then aggregating partially used 4 MB buffers 405, the control element 300 improves the overall utilization of the SSD array 400.
- the control element 300 ranks the 4 MB buffers 405 according to their usefulness in operation 670. Usefulness refers to how much the storage system 100 is using of the data in the 4 MB buffer 405. Ranking buffers will be explained in more detail below in FIG. 12.
- one of the staging buffers 370 (FIG. 4) is cleared for copying data from other currently used 4 MB buffers 405. For example in FIG. 10, a staging buffer 370 is cleared for loading data that will eventually be loaded into 4 MB buffer 405D.
- the control element 300 reads the information from the buffer entry 342 associated with the highest ranked 4 MB buffer 405. For example, the information in buffer entry 342A and device block map 360A in FIG. 10 is read. The control element 300 identifies the valid data in buffer 405A using the associated buffer entry 342A and device block map 360A in operation 686. The valid 4 KB blocks in buffer 405A are then copied into the staging buffer 370 in operation 688. This process may be repeated in order of the highest ranked 4 MB buffers until the staging buffer (FIG. 5) is full in operation 674.
- the control element 300 then creates a new buffer entry 342 in operation 676 and sets the used block counter value in the associated register 356 to the total number of 4 KB blocks copied into the staging buffer 370. For example, the control element 300 creates a new buffer entry 342D for the 4 MB buffer 405D in FIG. 10. The control element 300 also sets the bits for the associated device block map 360D for all of the valid 4 KB blocks 450 in the new 4 MB buffer 405D.
- the data in the staging buffer 370 is written into one of the 4 MB buffers 405 in the SSD array 400 that is not currently being used.
- the aggregated data for A1, A2, B1, and C1 are stored in 4 MB buffer 405D of the SSD array 400.
- the control element 300 in operation 680 updates the indirection mechanism 200 in FIG. 1 to include a new indirection entry 230 that contains the device ID 232, user addresses 233, and corresponding physical block addresses 234 for each of the 4 KB blocks in 4 MB buffer 405D. The process then returns in operation 682.
- the SSD array 400 may be used to tier data that is also stored in the disk array 20 (FIG. 1) and data in any of the 4 MB buffers 405 may be deleted or "ejected" whenever that data has little usefulness arising from being stored in the SSD array 400. For example, storing data in the SSD array 400 that is seldom read may have little impact in improving the overall read access time provided by the storage system 100 and is therefore less useful. However, storing data in the SSD array 400 that is frequently read could have a substantial impact in reducing the overall read access time provided by storage system 100 and is therefore more useful. Accordingly, the control element 300 may remove data from SSD array 400 that is seldom read and replace it with data that is more frequently read. This is different from conventional SSD devices that cannot eject any data that is currently being used, regardless of the usefulness of the data.
- FIG. 12 explains a scheme for determining which of the 4 MB buffers 405 to recover, and the criteria used for determining which of the buffers to recover first.
- a buffer 405 refers to a 4 MB section or region of memory in the SSD array 400 and a block 450 refers to a 4 KB portion of memory space within one of the 4 MB buffers 405.
- the 4 MB buffer size and the 4 KB block size are just examples and other buffer and block sizes could be used.
- control element 300 calculates the number of used buffers 405 in the SSD array 400 by comparing the number of buffer entries 342 with the overall memory space provided by SSD array 400.
- Operation 702 calculates the total number of 4 KB blocks 450 currently being used (valid) in the SSD array 400. This number may be determined by summing all of the used block counter values in each of the registers 356 for each of the buffer entries 342.
- the control element 300 in operation 704 calculates a fragmentation value that characterizes how much of the SSD array 400 is actually being used. Fragmentation can be calculated globally for all buffer entries 342 or can be calculated for a single 4 MB buffer 405. For example, the number of used blocks 450 identified in operation 702 can be divided by the total number of available 4 KB blocks 450 in the SSD array 400. A fragmentation value close to 1 is optimal, and a value below 50% indicates that at least 2:1 buffer space recovery potential exists.
- Operation 708 calculates a utilization value that is a measure of how soon the SSD array 400 will likely run out of space. A utilization above 50% indicates the SSD array may be starting to run out of space and a utilization above 90% indicates the SSD array 400 in the storage system 100 will likely run out of space soon.
- the control element 300 determines the utilization value by dividing the number of used 4 MB buffers 405 identified in operation 700 by the total number of available 4 MB buffers 405 in SSD array 400.
- a fragmentation less than 50% indicates that there are a relatively large percentage of 4 KB blocks 450 within the 4 MB buffers 405 that are currently free/invalid and defragmenting the buffers 405 based on their used block count values in registers 356 will likely provide the most efficient way to free up buffers 405 in the SSD array 400.
- the control element 300 ranks all of the 4 MB buffers 405 in ascending order according to their used block count values in their associated registers 356. For example, the 4 MB buffer 405 with the lowest block count value in associated register 356 is ranked the highest.
- the control element 300 then performs the defragmentation operations described above in FIGS. 10 and 11 for the highest ranked buffers 405. The results of the defragmentation may cause the utilization value in operation 708 to fall back down below 50%. If not, additional defragmentation may be performed.
- if the fragmentation value in operation 710 is greater than 50%, then defragmenting buffers is less likely to free up substantial numbers of 4 MB buffers 405, because a relatively large percentage of the 4 KB blocks 450 within each of the 4 MB buffers 405 are currently being used.
- Operation 712 first determines if the utilization is above 90%. If the utilization value is below 90% in operation 712, then the supply of 4 MB buffers 405 is running low, but is not likely to run out immediately. In this condition, the control element 300 in operation 718 may discard the data in 4 MB buffers 405 that have a read count of zero in the associated registers 357. This represents data in the SSD array 400 that has had relatively little use, since it has not been read for a particular period of time.
- a utilization value in operation 712 above 90% represents an SSD array 400 that is likely to run out of 4 MB buffers 405 relatively soon.
- the control element 300 in operation 720 ranks the 4 MB buffers 405 in ascending order according to the read counts in their associated read count registers 357. For example, any 4 MB buffers 405 with a zero read count would be ranked highest and any 4 MB buffers 405 with a read count of 1 would be ranked next highest.
- the control element 300 may then discard the data in the 4 MB buffers 405 according to the rankings (lowest number of reads) until the utilization value in operation 712 drops below 90%.
- alternatively, the control element 300 can recover space by discarding only those buffers 405 that have never been read.
- the system described above can use dedicated processor systems, microcontrollers, programmable logic devices, or microprocessors that perform some or all of the operations. Some of the operations described above may be implemented in software and other operations may be implemented in hardware.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A storage system creates an abstraction of flash Solid State Device (SSD) media allowing random write operations of arbitrary size by a user while performing large sequential write operations of a uniform size to an SSD array. This reduces the number of random write operations performed in the SSD array and as a result increases performance of the SSD array. A control element determines when blocks from different buffers should be combined together or discarded based on fragmentation and read activity. This optimization scheme increases memory capacity and improves memory utilization and performance.
Description
SYSTEM FOR INCREASING UTILIZATION OF STORAGE MEDIA
BACKGROUND
[0001] Storage systems typically present a plurality of physical media devices as one or more logical devices which may have practical advantages over the organization of the underlying physical media. These advantages can be in the form of manageability (performing per device operations to a group of devices), redundancy (allowing and correcting media errors on one or more devices transparently), scalability (allowing the size of logical devices to change
dynamically by adding more physical devices) or performance (using parallelism to spread storage operations over multiple media devices). Additionally, storage systems may employ intelligent operations such as caching, prefetch or other performance-enhancing techniques.
[0002] For comparative purposes, storage systems may be described in terms of capacity and performance. Capacity may be described in terms of bytes (basic unit of computer storage -conceptually equivalent to one letter on a typed page) or blocks where a block is typically 512 Bytes. The number of bytes in a storage system can be very large (several million millions of bytes- or terabytes). Performance of a storage device is typically dependent of the physical capabilities of the storage medium. This performance may typically be considered in terms of three parameters: Input/Output Operations per Second (IOPs), throughput (bytes per second that can be accessed) and latency (time required to perform a data access). The IOPs metric may be further specified for sequential or random access patterns.
[0003] Configuration of a storage system allows for selective optimization of capacity and performance. Capacity optimization may be achieved, for example, by aggregating the capacity of all physical devices into a single logical device. This logical device may have higher capacity than the constituent devices but may have equivalent or slightly lower performance. Reliability optimization may involve using replication of data that sacrifices half the capacity. Alternatively,
reliability optimization may involve some error correction encoding which sacrifices some capacity, but less than that associated with replication.
Performance optimization may involve duplication which allows twice as many read operations per unit time assuming some balancing mechanism, striping which increases throughput by spreading operations over an array of devices, or caching which uses memory to act as a buffer to the physical storage media. In general, the storage system will be optimized for a desired performance metric at the cost of another performance metric or by incorporating additional physical elements (such as logic, memory or redundancy).
[0004] Determining the optimal, or most suitable, configuration of a storage system requires matching the requirements of the user of the system to the capabilities of the physical devices and the optimization capabilities of the storage system. The performance of the constituent physical devices is typically the determining factor. As an example, a storage system design may favor IOPs over capacity and thus choose to use a large number of smaller capacity disks rather than creating the equivalent aggregate capacity from larger capacity devices. As media technology evolves, new methods of increasing performance and compensating for shortcomings of the physical media are constantly sought.
[0005] Physical media may take the form of a Solid State Storage technology known as Multi-Level Cell (MLC) NAND flash memory. The MLC NAND flash memory is commonly used in cameras, portable devices such as Universal Serial Bus (USB) memory sticks, and music players as well as consumer electronics such as cellular telephones. Other forms of flash in common use include Single-Level Cell (SLC) NAND flash memory and NOR flash memory. Both of these latter types offer higher performance at a higher cost as compared to MLC NAND flash. Many manufacturers are currently offering NAND flash with an interface that emulates that of traditional rotating storage devices (disk drives). These flash devices may be referred to as flash Solid State Drives (SSDs) and may be constructed using either MLC or SLC technology.
[0006] Flash SSD devices differ from traditional rotating-disk drives in a number of aspects. Flash SSD devices have certain undesirable aspects. In
particular, flash SSD devices suffer from poor random write performance that may further degrade over time. Because flash media has a lifetime measured as a limited number of write operations (a physical limitation of the storage material that eventually causes the device to "wear out"), write performance is also unpredictable.
[0007] Internally, the flash SSD may be configured to periodically rebalance the written sections of the media in a process called "wear leveling". This process assures that the storage medium is used evenly, thus extending the viable life of the device. The inability to anticipate, or definitively know, when and for how long such background operations may occur (lack of transparency) is a principal cause of the performance uncertainty.
[0008] For example, a user cannot typically access data stored in the flash SSD device or store data in the flash SSD device while these rebalancing operations are being performed. The flash SSD device does not typically provide prior notification of when the background operations are going to occur. This prevents an application using the flash SSD from anticipating the storage non-availability and scheduling other tasks during the flash SSD rebalancing operations. However, the significant performance advantage of flash SSDs over rotating media in performing random and sequential read operations makes SSDs ideal media for high performance storage systems, provided that performance issues can be overcome or avoided.
SUMMARY
[0009] An apparatus for storing data has storage media; a buffer capable of storing data of a plurality of user write requests, each user write request having a logical address; and an indirection table capable of mapping the logical address of each user write request to a physical address of the storage media. A processor is configured to write the data stored in the buffer to a contiguous address range, which may comprise a buffer region, within the storage media and to update the indirection table to include the addresses in the storage media where the data corresponding to the user write request logical address has been stored.
[0010] The contiguous address range may be a physical address range of the storage media and the logical address of data from the user writes is associated with a physical address in the storage media by an indirection table.
[0011] In an aspect, the processor may be configured to aggregate or consolidate data for a plurality of write operations, which may be user write operations, into a staging buffer and write the aggregated data from the staging buffer into the contiguous buffer region within the storage media. The indirection table maps the logical addresses of the write operations into a contiguous address within the buffer.
[0012] The storage media may comprise an array of Solid State Devices (SSDs). Such devices may use any type of non-volatile storage such as NAND flash memory. Other memory types may be used, and further memory types are continually being developed having similar function. Additional memory arrays may be used for data storage, such as an array of disks, where the SSD array may be used to store data for more rapid access, and where a copy of the data may also be stored on the additional memory array. Prior to writing a buffer to the storage media, the processor may be operable to discard data from the buffer or to aggregate data from a plurality of buffers into a buffer region of the plurality of buffer regions according to a number of used storage media buffer regions.
[0013] In an aspect, a plurality of block counters may be configured, each block counter of the plurality of block counters containing block count values identifying a number of the blocks in the buffer in the storage media containing valid data, wherein the processor is configured to discard data from the buffer or aggregate data from a plurality of buffers into a same buffer of the plurality of buffers according to the block count values.
[0014] In another aspect, a plurality of read counters may be configured, each read counter of the plurality of read counters may contain read count values associated with the buffer region and wherein the processor is further configured to discard data from the buffer region according to the associated read count values of a plurality of read counters.
[0015] In still another aspect, a bit map may be configured for each of the buffer regions, where bits in the bit maps identify a used or unused status for data of a block within the buffer region, and the processor is further configured to combine blocks from a plurality of buffers together into a same buffer of the plurality of buffers according to the bit maps.
[0016] A storage system for data is disclosed having a control element configured to establish buffer regions within a storage media to store data as blocks in contiguous address locations; and, to identify blocks within the buffer regions that store subgroups of the data, the control element further configured to relocate the data from the blocks of a plurality of buffer regions into a same buffer region of the plurality of buffer regions or discard the data from the buffer regions, according to utilization of the buffer regions.
[0017] In an aspect, the utilization of the buffer regions corresponds with a number of the buffer regions that are currently being used in the storage media. A block counter may be configured for each buffer region of the plurality of buffer regions, the block counter identifying a number of used blocks in the buffer region. The control element may be configured to rank order the different buffer regions according to the number of used blocks of each buffer region of the plurality of buffer regions and to combine blocks from different buffer regions together in a same buffer region of the plurality of buffer regions according to the ranked order.
[0018] A read counter may be configured to identify a read count for the buffer region. The control element is configured to rank order buffer regions of the plurality of buffer regions according to the read counts and to discard data from different buffer regions depending on the rank order, in accordance with a policy. The read count may be representative of the number of read operations that occur within a time period determined by a policy.
[0019] In another aspect, the control element may be configured to discard data from buffer regions having a zero read count when the number of buffer regions currently being used in the storage media is below a first threshold; and, to discard
data in the buffer regions according to rank order when the number of buffer regions currently being used in the storage media is above the first threshold.
[0020] A bit map associated with a buffer region of the plurality of buffer regions may have bits indicating a utilization status of a set of blocks in the buffer region, or the utilization status of a subset of blocks in the buffer region. The control element may be configured to clear the bit associated with a block when the data in the block is invalidated and to set the bit associated with a block when the data in that block is currently valid.
[0021] In yet another aspect, the control element may be configured to consolidate data for a plurality of user write operations into a buffer and write the data in the buffer into contiguous block locations in buffer region of the plurality of buffer regions in the storage media.
[0022] A method for operating a data storage apparatus includes receiving a plurality of write operations; accumulating data from the different write operations into a staging buffer; and, writing the data in the staging buffer into contiguous block regions within a same buffer region of a storage media. An indirection table is created to identify the physical address of the block within the buffer region.
[0023] When a user read operation is received, the user address is used to identify the indirection table entry that maps the user address to the physical address in the storage media; and data from the identified block, responsive to the read operation, is supplied.
[0024] In an aspect, a bit map having different bits corresponding to the blocks in a buffer region is managed by setting bits in the bit map when the data in the staging buffer is written into the buffer region; receiving invalidation requests; and clearing the bits in the bit map corresponding to the blocks of the invalidated data. Data from a plurality of buffer regions may be moved into a same one of the buffer regions according to the bit maps associated with the different buffer regions.
[0025] A number of read operations performed on each of a plurality of buffer regions may be calculated and data may be discarded from the buffer regions
according to the number or frequency of reads to the buffer regions, in accordance with a policy.
BRIEF DESCRIPTION OF THE DRAWINGS
[0026] FIG. 1 is a block diagram of a storage system used for accessing a Solid State Device (SSD) array;
[0027] FIG. 2 shows in more detail some of the operations performed by the storage system shown in FIG. 1 ;
[0028] FIG. 3 is a flow diagram showing in more detail how the storage system operates;
[0029] FIG. 4 is a block diagram showing a control element used in the storage system of FIG. 1;
[0030] FIG. 5 is a block diagram showing an example write operation performed by the storage system;
[0031] FIG. 6 shows how the control element tracks data utilization;
[0032] FIG. 7 is a flow diagram showing in more detail the operations performed by the control element during a write operation;
[0033] FIG. 8 is a flow diagram showing in more detail the operations performed by the control element during a read operation;
[0034] FIG. 9 is a flow diagram showing in more detail the operations performed by the control element during a data invalidate operation;
[0035] FIG. 10 is a block diagram showing how the control element combines together data from different buffers;
[0036] FIG. 11 is a flow diagram showing in more detail the operations performed by the control element in FIG. 10; and
[0037] FIG. 12 is a flow diagram showing how the control element ranks utilization of buffers.
DESCRIPTION
[0038] The storage system may include an indirection mechanism, a control element, and a plurality of storage media. The storage media may be Solid State Devices (SSD). An abstraction of the flash SSD media permits random write operations of arbitrary size by a user to be consolidated into large sequential write operations of a uniform size to an SSD array. This approach reduces the number of individual random write operations performed to the SSD device. The sequential writes to the SSD device may increase storage throughput since the SSD device may have to perform fewer defragmentation operations. A defragmentation operation is a type of background activity that can involve a number of internal read and write operations and which may inhibit timely user access to the SSD.
[0039] The storage system may increase storage availability by using transparency and a handshaking scheme that allows users to eliminate or minimize the background operations performed in an array of SSDs. The storage system may also provide the user with the actual physical addresses at which data is stored in the SSD array through a data address indirection mechanism. This differs from conventional SSD arrays where the data address indirection process and the actual physical addresses for stored data in the SSD device are hidden from the user. Read operations performed on each of the different SSD devices in the SSD array are monitored to determine usage patterns.
[0040] In an example, a first SSD device of a plurality of SSD devices may be read more often than a second SSD device to provide data to the user. The storage system may choose to write new data into the second SSD device, even when the second SSD device may currently be storing more data than the first SSD device. This may increase throughput in the SSD array for particular applications where data is typically read from the storage array more often than written to the storage array.
[0041] For example, a web server may provide web pages to clients (users). New or updated web pages may infrequently be written into memory by the web server. However, the same web server may constantly read web pages from the storage system and supply the web pages to clients. But, writes to different SSD devices of the storage system may be performed based on a measure of SSD device utilization, not solely on SSD device available capacity. An optimal performance balance may be reached, for example, when all SSD devices experience substantially the same read demand. Different write loads may be required to achieve this balance.
[0042] The storage system can be configured to use different data block sizes for writing data into the SSD array according to performance characteristics of the SSD devices. For example, a particular SSD device may be able to perform a single 4 Mega Byte (MB) block write significantly faster than 1000 separate 4 KB block writes. In this situation, the storage system might be configured to perform all of the writes to the SSD array in 4 MB blocks. A plurality of 4 KB block writes would be assembled (aggregated) into a single 4 MB write operation.
[0043] In another example, a control element determines when data from different write buffers should be combined together or discarded based on fragmentation and read activity. This optimization scheme increases memory capacity and improves memory utilization. Optimizing the combination requires aggregating smaller writes into larger writes without wasting available space within the larger write. Maintaining the metadata information of all smaller writes is the function of the control element.
[0044] FIG. 1 shows a storage system 100 that includes an indirection mechanism 200 and a control element 300, which is a processor, controller, or the like. In an example, the storage system 100 and storage users 500 are realized by software executed by one or more processors 105 and memory located in a server 502. Here the storage user 500 may be a proxy for external users connected to the storage server by a network or other communications environment. Some elements in the storage system 100 may be implemented in hardware and other elements may be implemented in software.
[0045] The storage system 100 may be located between the users 500 and a disk 20. The disk 20 may be a conventional hard disk, a disk array, or another solid state storage device. The storage system 100 can be a stand-alone appliance, device, or blade, and the disk 20 can be a stand-alone disk storage array. In this embodiment, the users 500, storage system 100, and disk 20 are each coupled to each other via wired or wireless Internet connections. Alternatively, the users 500 may access one or more disks 20 over an internal or external data bus. The storage
system 100 could be located in a personal computer or server, or could be a standalone device coupled to the computer/client via a computer bus or packet switched network connection, or the like.
[0046] The storage system 100 accepts reads and writes to disk 20 from users 500 and uses the SSD array 400 for accelerating accesses to data. In one embodiment, the SSD array 400 could be any combination of Dynamic Random Access Memory (DRAM) and/or Flash memory. The SSD array 400 could be implemented with any memory device that provides relatively faster data access than the disk 20.
[0047] The storage users 500 include a software application or hardware that accesses or "uses" or "requests" data stored in the SSD array 400 or disk array 20. For example, the storage users 500 may comprise a cache application used by an application 504 operated on a storage server 502. In this example, application 504 may need to access data stored in SSD array 400 responsive to communications with clients 506 via a Wide Area Network (WAN) 505 or Local Area Network (LAN) 505, referred to generally as the Internet or an intranet.
[0048] In an aspect, the storage users 500, storage system 100, and SSD array 400 may all be part of the same appliance that is located in the server or computing device 502. In another example, any combination of the storage users 500, storage system 100, and SSD array 400 may operate in different computing devices or servers. In other examples, the storage system 100 may be operated in conjunction with a personal computer, portable video or audio device, or some other type of consumer product. The storage system 100 could operate in any computing environment and with any application that needs to write and read data to and from memory devices.
[0049] The storage system 100 presents the SSD array 400 as a logical volume to storage users 500. Storage system 100 presents logical blocks 150 of virtual storage that correspond to physical blocks 450 of physical storage in SSD array 400. The SSD array 400 consists of a plurality of SSD devices 402, two of which are referenced as SSD device 402A and SSD device 402B. The total number of SSD devices 402 in the SSD array 400 may vary. While shown being used in
conjunction with an SSD array 400, it should also be understood that the storage system 100 can be used with any type or any combination of memory devices.
[0050] Storage users 500 may consist of a number of actual users or a single user presenting virtual logical storage to other users indirectly. For example, as described above, the storage users 500 could include a cache application that presents virtual storage to a web application 504 operating on the web server 502. The logical volume presented to the users 500 may have a configurable block size which may be considered fixed during a configured operating mode.
[0051] The size of the virtual logical blocks 150, a block size for data transfers between the storage system 100 and SSD array 400, and the scheme used for selecting SSD devices 402 is contained within configuration registers 110, which may be a memory. Upon initialization, storage system 100 interprets the configuration data in register 110 to set configuration parameters. For the purpose of subsequent examples, the virtual block size 150 is assumed to be configured as 4 KB. Read and write operations performed by storage system 100 reference an integral number of the virtual blocks 150 each of size 4 KB.
[0052] The indirection mechanism 200 is operated in response to the storage users 500 and is populated by the control element 300 with the physical addresses where the blocks of data are located in SSD array 400. Indirection mechanism 200 comprises an indirection table 220 having a plurality of indirection entries 230, two of which may be referenced as indirection entry 230A and indirection entry 230B. In an example, the indirection table 220 consists of a block level index representation of a logical storage device. The index representation allows virtual logical blocks 150 to be mapped to physical storage blocks 450 in SSD array 400. This may require, for example, one entry per virtual logical block 150 of storage, to uniquely map any block of logical storage to a block of physical storage in SSD array 400.
[0053] The indirection mechanism 200 may comprise a search architecture or schema, such as a hash, binary tree or other data structure, such that any physical block 450 within the SSD array 400 can be mapped to a unique indirection entry 230 associated with a unique virtual logical block 150. This search structure may
be constructed at the time that the data is written to the storage media 400. In this example, indirection table 220 grows as more unique virtual logical blocks 150 are written to the storage system 100 and the storage array 400.
[0054] In another example, indirection table 220 may comprise a multi-level bitmap or tree search structure such that certain components are static in size while other components grow as more unique virtual logical blocks 150 are created in the storage system 100.
[0055] In yet another example, the indirection mechanism 200 may be implemented as a hardware component or device such as a content addressable memory (CAM). Multiple levels of indirection may be used, some of which may be embodied in software.
[0056] The examples of the indirection mechanism 200 may resolve a logical block address of a read or write operation from users 500 into a unique indirection entry 230. The indirection entry 230 may comprise, for example, a SSD device ID 232, a user address 233, a block address 234, and a block state 236. The SSD device ID 232 corresponds to a unique SSD device 402 in SSD array 400. Block address 234 corresponds to the unique physical address of a physical block 450 of storage within the SSD device 402 that corresponds with the device ID 232. A block refers to a contiguous group of physical address locations within the SSD array 400. Block state 236 contains state information associated with block address 234 for device ID 232. This block state 236 may include, but is not limited to, timestamp information, validity flags, and other information.
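For illustration only, the indirection entry and table described above can be sketched as a small keyed record. The following Python sketch is an assumption for explanatory purposes, not the claimed implementation; the field names simply mirror the reference numerals in the text (device ID 232, user address 233, block address 234, block state 236), and the dictionary-backed table stands in for whichever hash, tree, or CAM structure is actually chosen.

```python
from dataclasses import dataclass, field

@dataclass
class IndirectionEntry:
    # Mirrors the fields described for indirection entry 230.
    device_id: int          # SSD device ID (232)
    user_address: int       # logical block address supplied by the user (233)
    block_address: int      # physical 4 KB block address within the device (234)
    block_state: dict = field(default_factory=dict)  # timestamps, validity flags, etc. (236)

class IndirectionTable:
    """Maps a user logical block address to a unique indirection entry."""
    def __init__(self):
        self._entries = {}   # user_address -> IndirectionEntry

    def update(self, user_address, device_id, block_address, **state):
        self._entries[user_address] = IndirectionEntry(
            device_id, user_address, block_address, dict(state))

    def lookup(self, user_address):
        # Returns None when no write has occurred for this logical address.
        return self._entries.get(user_address)

# Example: record that logical block 7 now lives at device 2, physical block 0x1A40.
table = IndirectionTable()
table.update(user_address=7, device_id=2, block_address=0x1A40, valid=True)
print(table.lookup(7).block_address)
```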
[0057] In an aspect, device ID 232 and block address 234 may correspond to physical SSD devices 402 through a secondary level of indirection. A disk controller (not shown) may be used to create logical devices from multiple physical devices.
[0058] In subsequent description, the choice of data blocks of size 4 KB and data buffers of size 4 MB is used extensively for explanation purposes. Both block and buffer sizes are configurable and the example sizes used below are not intended to be limiting. Chosen sizes as well as the ratio of sizes may differ significantly without compromising the function of the present embodiments.
Overall Operation
[0059] FIG. 3 shows an operation performed on the apparatus of FIGS. 1 and 2. In a first operation 250, the storage user 500 writes data 502 of a random size without a specified SSD address to the storage system 100. Data 502 contains a user address which will be used in the future to read data 502. In operation 252, the control element 300 assigns the write data 502 to one or more 4 KB blocks 508 within a 4 MB staging buffer 370, depending on the amount of data to be stored. The control element 300 also identifies an SSD device 402 within the SSD array 400 for storing the contents of the 4 MB staging buffer 370.
[0060] The control element 300, in operation 254, notifies the indirection mechanism 200 of the particular SSD device 402 and physical block address where the data 502 is written into the SSD array 400. The user address 233 specified as part of the write operation of data 502 is stored within indirection mechanism 200 such that a lookup of the user address 233 will return the corresponding physical block address 234. Storage user 500 can subsequently retrieve data 502 using this physical block address. In operation 256, the data 502 in the staging buffer 370 is written into the SSD array 400.
[0061] Although the user has not specified an SSD storage address for data 502, some implementation specific transaction state may exist. For example, the user may submit multiple instances of write data 502 serially, awaiting a returned physical block address for each write request and recording this address within a memory. In another example, the user may submit several instances of write data 502 concurrently along with a transaction descriptor or numeric identifier that can be used to match the returned physical block addresses. In yet another example, the user submits several instances of write data 502 concurrently without a transaction descriptor or numeric identifier and relies on the ordering of responses to match returned physical block addresses.
[0062] In subsequent read operations 258, the storage users 500 refer to the indirection mechanism 200 to identify the particular SSD device 402 and physical address in SSD array 400 where the stored data 510 to be read is located. Control
element 300 reads the physical SSD device 402 referenced by device ID 232 at physical block address 234 and returns the data 510 to the particular one of the storage users 500.
[0063] The control element 300 checks block state 236 and may perform the read operation only if data has been previously written to the specified physical block 450. A data block of some initial state (customarily all 0's) would be returned to the storage user 500 as the result of an invalid read operation. In any situation wherein indirection mechanism 200 has no indirection entry 230, a similar block would be returned to the storage user 500 indicating that no writes have occurred for the user address that maps to the physical address of the specified physical block 450.
Write Operation
[0064] Referring to FIGS. 1-4, the storage system 100 accepts user write operations of an integral number of (4 KB) data blocks from storage users 500, but performs writes to the physical SSD array 400 in large (4 MB) data blocks by aggregating the data in staging buffers 370. The optimal size of the staging buffers 370 may be determined experimentally, depending on the specific SSD type being used. For the purpose of subsequent examples the large blocks are assumed, through configuration, to be set to 4 MB. For this configuration, up to 1000 user sub-blocks of 4 KB can be contained within each staging buffer 370.
[0065] As explained above, performing large 4 MB writes, preferably of uniform size, from the storage system 100 to the SSD array 400 improves the overall performance of the SSD array 400 since fewer defragmentation operations may be required later. As also explained above, a fewer number of larger block writes may increase write throughput compared with a larger number of smaller random block writes.
[0066] To service write operations from the storage users 500, the storage system 100 uses control element 300 to identify the most suitable indirect location for storing data and executes a sequence of operations to perform the write operation and to update the indirection table 220.
[0067] The control element 300 maintains a device list 320 with information regarding each physical SSD device 402 in SSD array 400. Each physical SSD device 402 has a corresponding device buffer list 340 and a corresponding device block map 360. Control element 300 may consult device list 320 to determine the least utilized physical SSD device 402.
[0068] Utilization may be considered in terms of the number of physical blocks 450 used in the SSD device 402, the number of pending read operations to the SSD devices 402, or the number of recent read operations. That is, the number of read operations to specific 4 MB buffers 405 in the SSD devices 402 over some previous time interval may be considered. This is explained further below in FIGS. 10-12.
[0069] A high read utilization rate for a particular SSD device 402, such as SSD device 402A in FIG. 1 , may cause the control element 300 to select the second SSD device 402B for a next 4 MB block write, even when SSD device 402A may currently be storing less data. In some applications, there may be significantly more reads from the SSD devices than writes into the SSD devices. Therefore, evenly distributing read operations may require some SSD devices 402 to store significantly more data than other SSD devices.
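As a rough, non-limiting sketch of the selection policy just described, the control element could prefer the SSD device with the lightest recent read load that still has a free 4 MB buffer, rather than the device with the most free space. The field names (recent_reads, used_buffers, total_buffers) are hypothetical stand-ins for the statistics kept in device list 320.

```python
def select_device_for_write(devices):
    """Pick the least read-utilized SSD device that still has a free 4 MB buffer.

    `devices` is a list of dicts such as:
        {"id": 0, "recent_reads": 1200, "used_buffers": 800, "total_buffers": 1024}
    Read utilization is preferred over free capacity, as described above.
    """
    candidates = [d for d in devices if d["used_buffers"] < d["total_buffers"]]
    if not candidates:
        raise RuntimeError("no free 4 MB buffers on any SSD device")
    # Primary key: fewest recent reads; tie-break on fewer used buffers.
    return min(candidates, key=lambda d: (d["recent_reads"], d["used_buffers"]))

devices = [
    {"id": 0, "recent_reads": 5000, "used_buffers": 300, "total_buffers": 1024},
    {"id": 1, "recent_reads": 1200, "used_buffers": 900, "total_buffers": 1024},
]
print(select_device_for_write(devices)["id"])  # 1: less read traffic despite holding more data
```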
[0070] After determining the optimal SSD device 402 for writing, control element 300 consults device buffer list 340 associated with the selected SSD device 402. The device buffer list 340 contains a list of buffer entries 342 that identify free 4 MB buffers 405 of storage in SSD array 400. Each buffer entry 342 represents the same buffer size and contains separate block entries 345 that identify the 4 KB blocks 450 within each 4 MB buffer 405 (FIG. 1). In an example, device buffer list 340 is maintained as a separate structure referenced by the device entries in device list 320.
[0071] Device buffer list 340 has sufficient entries 345 to cover the contiguous block space for each device entry 342 in device list 320. Each buffer entry 342 in device buffer list 340 contains, minimally, a block map pointer 355 that points to a subset of bits 365 in the device block map 360. In another example, the buffer
entries 342 may each contain a subset of the bits 365 from the device block map 360 that correspond with a same 4 MB block in the same SSD device 402.
[0072] Device block map 360 contains a one-to-one mapping of 4 KB blocks 450 (FIG. 1) for each buffer entry 342 in device buffer list 340. In this example, for a buffer entry 342 for a 4 MB region 405 with 4 KB sub-blocks 450, each device block map 360 contains 1000 bits 365. Each bit 365 represents the valid/invalid state of one 4 KB physical block 450 within a 4 MB physical memory space 405 in SSD array 400. Using the combination of buffer entry 342 and device block map 360, all unused or invalid 4 KB blocks 450 within the selected SSD device 402 for all 4 MB regions 405 in the SSD array 400 may be identified.
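A minimal sketch of such a device block map, assuming one valid/invalid bit per 4 KB block of a 4 MB buffer region and the 1000-bit example figure used above; the class and method names are illustrative only.

```python
class DeviceBlockMap:
    """One valid/invalid bit per 4 KB block within a 4 MB buffer region
    (1000 bits per region in the running example)."""

    def __init__(self):
        self.bits = 0                    # bit i == 1 means block i holds valid data

    def set_block(self, index):          # data written to block `index`
        self.bits |= 1 << index

    def clear_block(self, index):        # block `index` invalidated
        self.bits &= ~(1 << index)

    def is_valid(self, index):
        return bool((self.bits >> index) & 1)

    def used_block_count(self):          # plays the role of used block counter 356
        return bin(self.bits).count("1")

bm = DeviceBlockMap()
bm.set_block(0); bm.set_block(1); bm.clear_block(0)
print(bm.used_block_count(), bm.is_valid(1))  # 1 True
```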
[0073] Referring to FIG. 5, write operations 600 are sent to the storage system 100 from one or more of the storage users 500. Staging buffer 370 is selected as the next available buffer for the least utilized physical device. Data for write operations A, B and C are copied into staging buffer 370 which is subsequently written to the SSD array 400 (FIG. 1). The write operations A, B, and C each include data and an associated user address (write address). Other write operations may have occurred after write operation C but before the write by control element 300 to a physical disk in SSD array 400. When the 4 MB write operation to SSD array 400 is completed, indirection mechanism 200 is updated such that the logical 4 KB blocks A, B and C point to valid indirection entries 230A, 230B and 230C, respectively. These indirection entries maintain the mapping between the user logical address and the physical storage address location 234 in the SSD array 400 where the data A, B, and C is written.
[0074] For example, the block address 234 within each indirection entry 230 is the exact physical address for the written block. In another embodiment, physical block addresses 234 are logical addresses derived from the physical address through another indirection mechanism. In another example, the block addresses 234 may be encoded with the device ID 232 (FIG. 1).
[0075] The control element 300 in FIG. 4 may not directly perform writes to the selected SSD devices 402. A copy of the write data may be placed in the staging buffer 370 using as much space as necessary. Staging buffer 370 is the
same size as the 4 MB buffers 405 in the SSD array 400. Thus, up to 1000 4 KB block writes can fit inside the staging buffer 370. Each 4 KB write from user 500 causes the corresponding bit 365 in device block map 360 to be set. Multiple bits 365 are set for writes larger than 4 KB.
[0076] Staging buffer 370 is written to the physical SSD device 402 in SSD array 400 when the staging buffer 370 is full, nearly full, or a predetermined time has lapsed from the first copy into staging buffer 370, depending on a policy. Upon success of the write of the contents of the staging buffer 370 into SSD array 400, the corresponding indirection entry 230 is updated with the physical address location (block address 234) of the data in SSD array 400. The indirection entry 230 is used in subsequent read operations to retrieve the stored data.
[0077] An acknowledgement of the original write operation may not be returned to the user 500 until the physical write into SSD array 400 has occurred and the indirection mechanism 200 has been updated.
[0078] In another example, the write data A, B, & C is copied into the staging buffer 370 by control element 300. Alternatively, the staging buffer 370 uses references to the original write operation to avoid the need to copy. In this case, staging buffer 370 maintains the list of links to be used by the write operation to SSD array 400.
Invalidation Operation
[0079] For various reasons, storage system 100 may periodically invalidate storage regions or specific blocks of storage. This invalidation may be initiated by activity such as deletion of data or expiration of cached information initiated by the storage users 500. In an example, the granularity of the invalidation may be the same as the granularity of the storage in terms of block size. That is, invalidation may occur in integral number of blocks (each 4KB, from the previous examples).
[0080] Invalidation clears the corresponding valid bit 365 in the device block map 360. For a specific storage block 450, device list 320 is consulted for the appropriate device buffer list 340. The physical block address 234 in indirection
entry 230 is then used to determine the exact bit 365 in the device block map 360 to clear. Once cleared, the indirection entry 230 is updated to indicate that the entry is no longer valid.
[0081] The process of invalidation leaves unused 4 KB gaps within the 4 MB regions 405 of the SSD devices 402 which constitute wasted space unless reclaimed. However, the entire 4 MB region 405 cannot be reclaimed as long as other valid 4 KB blocks 450 are still stored within that 4 MB region 405.
Remapping
[0082] To reclaim space freed during invalidation operations without losing existing valid 4 KB blocks 450, control element 300 (FIG. 4) periodically reads the device buffer list entries 342 to determine if multiple 4 MB buffer regions can be combined. In one embodiment, suitability for combination is determined through a count of the number of valid block entries 345 within each buffer entry 342. Each block entry 345 in a buffer entry 342 corresponds to a 4 KB block 450 within the same 4 MB buffer 405 (FIG. 1). Combining more data from different buffers 405 into the same buffer 405 increases the efficiency and capacity of read and write operations to the SSD array 400.
[0083] In a remapping operation, two or more 4 MB buffers 405 are read from the SSD array 400 and the valid 4 KB physical blocks 450 are copied into the same empty 4 MB staging buffer 370. The 4 KB blocks 450 are packed sequentially (repositioned within the 4 MB staging buffer 370) such that any holes created by the invalidated entries are eliminated. When all of the data from one or more 4 MB buffers 405 in SSD array 400 has been read and processed into the same staging buffer 370, the staging buffer 370 may now be written back into a same new 4 MB buffer 405 on the most suitable SSD device 402, as determined by referring to the device list 320. Upon completion of the write operations, the associated indirection entries 230 are updated to reflect the new physical address locations for all of the repositioned 4 KB blocks 450. Upon completion of the update, all of the originally read 4 MB buffers 405 can be reused and are made
available on the corresponding device buffer list 340. In a flash memory, the 4 MB buffer would be erased prior to reuse.
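The packing step of this remapping operation can be sketched as follows, under the assumption that each source buffer is represented simply as a list of (logical address, 4 KB payload) pairs for its still-valid blocks. The sketch only shows how valid blocks are packed sequentially into one staging buffer and how their new block offsets are collected for the indirection update; device selection, the actual SSD write, and erasure of the source buffers are omitted.

```python
def remap_buffers(source_buffers, blocks_per_buffer=1000):
    """Pack valid 4 KB blocks from several buffers into one staging buffer.

    `source_buffers` is a list of buffers; each buffer is a list of
    (logical_address, payload) tuples for its valid blocks only.
    Returns (staging_buffer, new_locations) where new_locations maps each
    logical address to its block offset within the new 4 MB region.
    """
    staging_buffer = []
    new_locations = {}
    for buf in source_buffers:
        for logical_address, payload in buf:
            if len(staging_buffer) == blocks_per_buffer:
                raise RuntimeError("staging buffer full; remap fewer source buffers")
            new_locations[logical_address] = len(staging_buffer)  # packed sequentially, no holes
            staging_buffer.append(payload)
    return staging_buffer, new_locations

# Two sparsely used buffers collapse into one region with no invalidated gaps.
buf_a = [(10, b"A" * 4096), (11, b"A" * 4096)]
buf_b = [(57, b"B" * 4096)]
staging, moves = remap_buffers([buf_a, buf_b])
print(len(staging), moves)  # 3 {10: 0, 11: 1, 57: 2}
```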
Remap Control and Optimization
[0084] One aspect of the remapping operation is that a handshaking operation may be performed between the storage users 500 and the storage system 100. The control element 300 of FIG. 4 may send a remap notification message to the storage users 500 prior to remapping multiple different 4 KB blocks 450 from different 4MB buffers 405 into the same 4 MB buffer 405.
[0085] The remap notification message may identify the valid buffer entries 345 that are being moved to a new 4 MB buffer 405. The physical data blocks 450 that are being moved are committed in the new 4 MB buffer 405 in the SSD device 402 prior to the control element 300 sending out the remap notification message to the storage users 500. The storage users 500 then have to
acknowledge the remap notification message before the control element 300 can reclaim the 4 MB buffers 405 previously storing the remapped 4 KB data blocks 450.
[0086] The storage users 500 acknowledge the remap notification message and then update the indirection entries 230 in indirection mechanism 200 to contain the new device ID 232 and new block addresses 234 for the remapped data blocks 450 (FIG. 1).
[0087] Defragmentation in prior SSD devices is typically done autonomously without providing any notification to the storage users. The remapping described above is transparent to the storage users 500 through the handshaking operation described above. This handshaking allows the storage users 500 to complete operations on particular 4 KB blocks 450 before enabling remapping of the blocks into another 4 MB buffer 405.
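The notification/acknowledgement exchange can be thought of as a small state machine in which the old 4 MB buffers are reclaimed only after every affected storage user has acknowledged the remap. The sketch below is an assumed, in-process rendering of that handshake; in practice the messages would travel over whatever channel connects the control element 300 and the storage users 500.

```python
class RemapHandshake:
    """Old 4 MB buffers are reclaimed only after every user acknowledges the remap."""

    def __init__(self, users, moved_blocks, old_buffers):
        self.pending = set(users)          # users that still owe an acknowledgement
        self.moved_blocks = moved_blocks   # {logical address: new physical block address}
        self.old_buffers = old_buffers     # buffers eligible for reuse after the handshake

    def notify(self, send):
        # `send(user, message)` is an assumed transport hook.
        for user in self.pending:
            send(user, {"type": "remap_notification", "blocks": self.moved_blocks})

    def acknowledge(self, user):
        self.pending.discard(user)
        return self.old_buffers if not self.pending else []   # reclaimable buffers, if any

hs = RemapHandshake(users=["cache_app"], moved_blocks={10: 0x2000}, old_buffers=["buffer_A"])
hs.notify(lambda user, msg: print("notify", user, msg))
print(hs.acknowledge("cache_app"))   # ['buffer_A'] -> the old 4 MB buffer may now be reused
```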
[0088] In another aspect, the staging buffers 370 in FIG. 4 might only be partially filled when ready to be written into a particular 4 MB buffer 405 in SSD array 400. The control element 300 may take this opportunity to remap blocks 450 from other partially filled 4 MB buffers 405 in SSD array 400 into the same 4
MB buffer where the current contents in staging buffer 370 are going to be written.
[0089] Similarly, as described above, the control element 300 identifies free 4 KB blocks in the new 4 MB buffer 405 using the device buffer list 340. A remap notification message may be sent to the storage users 500 for the data blocks 450 that will be copied into the staging buffer 370 and remapped. After the storage users 500 reply with an acknowledgement, all of the contents of the staging buffer 370, including the new data and the remapped data from storage array 400, may be written into the same 4 MB buffer 405. This remaps the 4 KB blocks 450 from other sparsely populated 4 MB buffers 405 into the new 4 MB buffer 405 along with any new write data previously contained in the staging buffer 370.
[0090] In yet another aspect, there may not be many write operations 600 currently being performed by the storage users 500. The control element 300 may start reading 4 KB blocks 450 from SSD array 400 for one or more sparsely filled 4 MB buffers 405 into the staging buffer 370. When writes 600 are received, the write data may be loaded into the remaining free blocks in the staging buffer 370. All of the contents in the staging buffer 370 may then be written into the same 4 MB buffer 405 after the remap acknowledgement is received from the storage users 500. The blocks previously read from the sparsely filled 4 MB blocks in the SSD array may then be made available for other block write operations.
[0091] FIGS. 6-12 describe in more detail examples of how the storage system 100 may be used to remap and optimize storage usage in the SSD array 400. As described above, the SSD array 400 may be virtualized into, for example, 4 MB buffers 405 with 4 KB physical blocks 450. Thus, in this example, there will be 1024 4 KB physical blocks in each 4 MB buffer 405 in the SSD array 400. Of course, other delineations could be used for the buffer size and block size within the buffers.
[0092] Referring to FIG. 6, the control element 300 in the storage system 100 maintains a buffer entry 342 for each 4 MB buffer 405 in SSD array 400, tracking the 4 KB data blocks 450 within that buffer. The buffer entry 342 contains the pointer 355 to the physical
location of the 4 MB buffer 405 in SSD array 400. Different combinations of the 4 KB blocks 450 within the 4 MB buffer 405 may either contain valid data, be designated as used space, or may contain empty or invalid data designated as free space.
[0093] The control element 300 uses a register counter 356 to keep track of the number of blocks 450 that are used for each 4 MB buffer 405 and uses another register counter 357 to track the number of times the blocks 450 are read from the same 4 MB buffer 405. For example, whenever data is written into a previously empty buffer 405, the control element 300 will reset the value in used block count register 356 to 1024. The control element 300 will then decrement the value in used block count register 356 for each 4 KB block 450 that is subsequently invalidated. Whenever there is a read operation to any 4 KB block 450 in a 4 MB buffer 405, the control element 300 will increment the value in a block read count register 357 associated with that particular buffer 405.
[0094] The count value in register 357 may be based on a particular time window. For example, the number of reads in register 357 may be a running average for the last minute, hour, day, etc. If the time window was, for example, 1 day, then the number of reads for the last hour may be averaged in with other read counts for the previous 23 hours. If a buffer 405 has not existed for 24 hours, then an average over the time period that the buffer has retained data may be extrapolated to an average value per hour. Other counting schemes that indicate the relative read activity of a particular buffer 405 with respect to the other buffers in the SSD array 400 can also be used.
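One simple, assumed realization of such a windowed read count is a fixed ring of per-interval buckets whose average is taken when the count is needed; the patent leaves the exact counting scheme to policy, so the class below is purely illustrative.

```python
from collections import deque

class WindowedReadCounter:
    """Running average of reads per interval over the last `window` intervals."""

    def __init__(self, window=24):            # e.g. 24 one-hour buckets for a one-day window
        self.buckets = deque([0], maxlen=window)

    def record_read(self):
        self.buckets[-1] += 1                 # count a read in the current interval

    def roll_interval(self):
        self.buckets.append(0)                # called by a timer at each interval boundary

    def reads_per_interval(self):
        return sum(self.buckets) / len(self.buckets)

rc = WindowedReadCounter(window=4)
for _ in range(8):
    rc.record_read()
rc.roll_interval()
print(rc.reads_per_interval())  # 4.0 -> eight reads averaged over two intervals
```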
[0095] The device block map 360 as described above is a bit map where each bit indicates whether or not an associated 4 KB data block 450 in a particular 4 MB buffer 405 is used or free. In the example, in FIG. 6, a first group of bits 365A in the bit map 360 indicate that a corresponding first group of 4 KB blocks 450A in 4 MB buffer 405 are used. A second group of bits 365B in the bit map 360 may indicate that, for example, a corresponding second group of 4 KB blocks 450B in buffer 405 are all free. In other examples, the bits 365 can be configured to represent smaller or larger block sizes.
[0096] The overall storage system 100 (FIG. 1) performs three basic data activities in SSD array 400: read, write, and invalidate. FIG. 7 shows in more detail the write operations performed by the control element 300. In operation 600, the storage system 100 receives a user write operation. The control element 300 determines if there is a staging buffer 370 currently in use in operation 602. If not, the control element 300 initializes a new staging buffer 370 in operation 614 and initializes a new buffer entry 342 for the data associated with the write operation in operation 616.
[0097] The control element 300 copies the user data contained in the write operation from the user 500 into the staging buffer 370 in operation 604. The bits 365 in the device block map 360 associated with the data are then set in operation 606. For example, the bits 365 corresponding to the locations of each 4 KB block of data in the 4 MB staging buffer 370 used for storing the data from the user write operation will be set in operation 606. Operation 606 also increments the used block counter 356 in buffer entry 342 for each 4 KB block 450 of data used in the staging buffer 370 for storing user write data.
[0098] If the staging buffer 370 is full, in operation 608, the control element 300 writes the data in the staging buffer 370 into an unused 4 MB buffer 405 in the SSD array 400 in operation 618. The control element 300 may also keep track of how long the staging buffer 370 has been holding data. If data has been held in staging buffer 370 beyond some configured time period in operation 610, the control element 300 may also write the data into the 4 MB buffer 405 in operation 618. The control element 300 may update the indirection table 220 in FIG. 1 to include the SSD device ID 232, user addresses 233, and block addresses 234 for the indirection entries 230 associated with the data blocks 450 written into SSD array 400. The process then returns to operation 600 for processing other write operations.
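Putting the FIG. 7 steps together, a simplified write handler might look like the sketch below. The 4 MB/4 KB sizes follow the running example, the timeout value is an arbitrary assumption, and the flush_to_ssd callback stands in for operation 618 and the subsequent indirection table update.

```python
import time

BLOCK_SIZE = 4 * 1024            # 4 KB user blocks
BLOCKS_PER_BUFFER = 1000         # ~4 MB staging buffer, per the example above
FLUSH_TIMEOUT_S = 5.0            # assumed policy value

class StagingWriter:
    def __init__(self, flush_to_ssd):
        self.flush_to_ssd = flush_to_ssd   # callback that writes a full 4 MB region
        self._reset()

    def _reset(self):
        self.blocks, self.bitmap, self.opened_at = [], 0, None

    def write(self, user_address, data):
        if self.opened_at is None:
            self.opened_at = time.monotonic()          # initialize a new staging buffer
        for offset in range(0, len(data), BLOCK_SIZE): # copy user data 4 KB at a time
            self.bitmap |= 1 << len(self.blocks)       # set the block's bit in the map
            self.blocks.append((user_address + offset, data[offset:offset + BLOCK_SIZE]))
        self._maybe_flush()

    def _maybe_flush(self):
        full = len(self.blocks) >= BLOCKS_PER_BUFFER
        expired = self.opened_at is not None and time.monotonic() - self.opened_at > FLUSH_TIMEOUT_S
        if full or expired:
            self.flush_to_ssd(self.blocks, self.bitmap)  # one large sequential write
            self._reset()                                # indirection update happens in the callback

writer = StagingWriter(lambda blocks, bitmap: print(f"flushing {len(blocks)} blocks as one 4 MB write"))
writer.write(user_address=0, data=b"x" * 8192)
print(len(writer.blocks))  # 2 -> two 4 KB blocks staged, awaiting a full buffer or timeout
```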
[0099] FIG. 8 explains the operations performed by the control element 300 for read operations. In operation 630, the storage system 100 receives a read request from one of the users 500. The control element 300 determines if the user read address
in the read request is contained in the indirection table 220. If not, a read error message is sent back to the user in operation 634.
[00100] When the read address is located, the control element 300 identifies the corresponding device ID 232 and physical block address 234 (FIG. 1) in operation 632. Note that the physical block address 234 may actually have an additional layer of abstraction used internally by the individual SSD devices 402. The control element 300 in operation 636 reads the 4 KB data block 450 from SSD array 400 that corresponds with the mapped block address 234. The read count value in register 357 (FIG. 6) is then incremented and the control element 300 returns to processing other read requests from the users 500.
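A correspondingly small read handler, again using assumed structures: look up the user address, return an error if it was never written, otherwise read the mapped 4 KB block and increment the read count of its 4 MB region (register 357). The read_block callback is a hypothetical media-access hook.

```python
def handle_read(user_address, table, read_counters, read_block):
    """`table` maps user_address -> (device_id, block_address, buffer_id);
    `read_counters` maps buffer_id -> read count (register 357);
    `read_block(device_id, block_address)` is an assumed media-access hook."""
    entry = table.get(user_address)
    if entry is None:
        raise KeyError(f"read error: user address {user_address} was never written")
    device_id, block_address, buffer_id = entry
    data = read_block(device_id, block_address)            # 4 KB block from the SSD array
    read_counters[buffer_id] = read_counters.get(buffer_id, 0) + 1
    return data

table = {7: (2, 0x1A40, "buffer_3")}
counters = {}
print(len(handle_read(7, table, counters, lambda d, b: b"\0" * 4096)), counters)
# 4096 {'buffer_3': 1}
```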
[00101] FIG. 9 shows the operations that are performed by the control element 300 for invalidate operations. The storage system 100 receives an invalidate command from one of the users 500 in operation 642. The control element 300 in operation 644 determines if the user address 233 in the invalidate request is contained in the indirection table 220 (FIG. 1). If not, an invalidate error message is sent back to the user in operation 648.
[00102] When the address is successfully located in the indirection table, the control element 300 identifies the corresponding device ID 232 and physical block address 234 (FIG. 1) in operation 644. The control element 300 in operation 646 clears the bits 365 in the device block map 360 (FIG. 6) that correspond with the identified block addresses 234. The used block counter value in register 356 is then decremented once for each invalidated 4 KB block 450. In operation 650, the control element 300 checks to determine whether the used block counter value in register 356 is zero. If the value is zero, the 4 MB buffer 405 no longer contains any valid data and can be reused in operation 652. When the used block counter 356 is not zero, the control element 300 returns and processes other memory access requests.
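The invalidate path in the same illustrative style: clear the block's bit in the device block map, decrement the region's used block counter (register 356), and report the 4 MB buffer as reusable once that counter reaches zero. All structures shown are the assumed stand-ins from the earlier sketches.

```python
def handle_invalidate(user_address, table, block_maps, used_counts):
    """`block_maps` maps buffer_id -> bit field of valid 4 KB blocks;
    `used_counts` maps buffer_id -> used block counter (register 356).
    Returns the buffer_id if the whole 4 MB buffer became free, else None."""
    entry = table.pop(user_address, None)
    if entry is None:
        raise KeyError(f"invalidate error: unknown user address {user_address}")
    device_id, block_index, buffer_id = entry
    block_maps[buffer_id] &= ~(1 << block_index)   # clear the bit for this 4 KB block
    used_counts[buffer_id] -= 1                    # one fewer valid block in the region
    if used_counts[buffer_id] == 0:
        return buffer_id                           # region holds no valid data: reusable
    return None

table = {7: (2, 5, "buffer_3")}
maps, counts = {"buffer_3": 1 << 5}, {"buffer_3": 1}
print(handle_invalidate(7, table, maps, counts))   # buffer_3 -> the 4 MB buffer can be reclaimed
```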
[00103] FIGS. 10 and 11 show how data from different 4 MB buffers 405 in the SSD array 400 may be combined together. Referring first to FIG. 10, three different buffer entries 342A, 342B, and 342C are identified by the control element 300 for resource recovery and optimization. A ranking scheme identifies
the best candidate buffers 405 for recovery based on the associated used block count value in register 356, the read count value in register 357 in the buffer entries 342, and a buffer utilization. One example of the ranking scheme is described in more detail below in FIG. 12.
[00104] In this example, the buffer entry 342A associated with 4 MB buffer 405A has an associated block count of 16 and a read count of 1. This means that the valid data A1 and A2 in buffer 405A has a combination of 16 valid 4 KB blocks and has been read once. Sixteen different bits are set in the device block map 360A that correspond to the sixteen 4 KB valid blocks of data A1 and A2.
[00105] The buffer entry 342B associated with 4 MB buffer 405B has a block count of 20 and a read count of 0, and the buffer entry 342C associated with 4 MB buffer 405C has an associated block count of 24 and a read count of 10. Similarly, 20 bits will be set in the device block map 360B that correspond to the locations of the twenty 4 KB blocks of data B1 in buffer 405B, and 24 bits will be set in the device block map 360C that correspond to the twenty-four 4 KB blocks of data C1 in buffer 405C.
[00106] The control element 300 combines the data A1 and A2 from buffer 405A, the data B1 from buffer 405B, and the data C1 from buffer 405C into a free 4 MB buffer 405D.
[00107] In this example, the data A1 and A2 from buffer 405A are first copied into the first two contiguous address ranges D1 and D2 of buffer 405D, respectively. The data B1 from buffer 405B is copied into a next contiguous address range D3 in buffer 405D after data A2. The data C1 from buffer 405C is copied into a fourth contiguous address range D4 in buffer 405D immediately following data B1.
[00108] A new buffer entry 342D is created for 4 MB buffer 405D and the block count 356D is set to the total number of 4 KB blocks 450 that were copied into buffer 405D. In this example, 60 total blocks 450 were copied into buffer 405D and the used block count value in register 356D is set to 60. The read count 357D is also set to the total number of previous reads of buffers 342A, 342B, and 342C. The device block map 360D for buffer 405D is updated by setting the bits
corresponding with the physical address locations for each of the 60 4 KB blocks 450 of data A1, A2, B1 and C1 copied into buffer 405D. In this example, the data A1, A2, B1 and C1 substantially fill the 4 MB buffer 405D. Any remaining 4 KB blocks 450 in buffer 405D remain as free space and the corresponding bits in device block map 360D remain at zero.
[00109] The different free spaces shown in FIG. 10 may have previously contained valid data that was then later invalidated. The writes to SSD array 400 are in 4 MB blocks. Therefore, this free space remains unused until the control element 300 aggregates the data A1, A2, B1, and C1 into another buffer 405D. After the aggregation, 4 MB of data can again be written into 4 MB buffers 405A, 405B, and 405C and the free space reused. By performing contiguous 4 MB writes to SSD array 400, the storage system 100 reduces the overall write times with respect to random write operations. By then aggregating partially used 4 MB buffers 405, the control element 300 improves the overall utilization of the SSD array 400.
[00110] Referring to FIG. 11, the control element 300 ranks the 4 MB buffers 405 according to their usefulness in operation 670. Usefulness refers to how much of the data in the 4 MB buffer 405 the storage system 100 is using. Ranking buffers will be explained in more detail below in FIG. 12. After the buffers are ranked, one of the staging buffers 370 (FIG. 4) is cleared for copying data from other currently used 4 MB buffers 405. For example in FIG. 10, a staging buffer 370 is cleared for loading data that will eventually be loaded into 4 MB buffer 405D.
[00111] In operation 684, the control element 300 reads the information from the buffer entry 342 associated with the highest ranked 4 MB buffer 405. For example, the information in buffer entry 342A and device block map 360A in FIG. 10 is read. The control element 300 identifies the valid data in buffer 405A using the associated buffer entry 342A and device block map 360A in operation 686. The valid 4 KB blocks in buffer 405A are then copied into the staging buffer 370 in operation 688. This process is repeated in order of the highest ranked 4 MB buffers until the staging buffer 370 is determined to be full in operation 674.
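Operations 670-688 can be sketched as a ranking pass followed by a staging copy, again reusing the BufferEntry type above. fill_staging_buffer() is an assumed name, and the ascending used block count ranking shown here is only one of the ranking policies discussed with FIG. 12.

```python
# Hypothetical sketch of filling the staging buffer from the highest ranked
# (least used) buffers until it is full.
def fill_staging_buffer(entries_and_data):
    # Rank ascending by used block count: emptier buffers are recovered first.
    ranked = sorted(entries_and_data, key=lambda pair: pair[0].block_count)
    staging = bytearray()
    drained = []                                    # buffers that can be reused
    for entry, data in ranked:
        valid_bytes = entry.block_count * BLOCK_SIZE
        if len(staging) + valid_bytes > BUFFER_SIZE:
            break                                   # staging buffer is full
        for i, valid in enumerate(entry.block_map):
            if valid:
                staging += data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
        drained.append(entry)
    return staging, drained
```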
[00112] The control element 300 then creates a new buffer entry 342 in operation 676 and sets the used block counter value in the associated register 356 to the total number of 4 KB blocks copied into the staging buffer 370. For example, the control element 300 creates a new buffer entry 342D for the 4 MB buffer 405D in FIG. 10. The control element 300 also sets the bits in the associated device block map 360D for all of the valid 4 KB blocks 450 in the new 4 MB buffer 405D.
[00113] In operation 678, the data in the staging buffer 370 is written into one of the 4 MB buffers 405 in the SSD array 400 that is not currently being used. For example, as described in FIG. 10, the aggregated data A1, A2, B1 and C1 are stored in 4 MB buffer 405D of the SSD array 400. The control element 300 in operation 680 updates the indirection mechanism 200 in FIG. 1 to include a new indirection entry 230 that contains the device ID 232, the user addresses 233, and the corresponding physical block addresses 234 for each of the 4 KB blocks in 4 MB buffer 405D. The process then returns in operation 682.
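The indirection update of operation 680 can likewise be sketched as a re-mapping step. The dictionary-based table and the repoint() helper are assumptions for illustration, standing in for the indirection mechanism 200 and its indirection entries 230.

```python
# Hypothetical sketch of operation 680: repoint relocated 4 KB blocks at their
# new physical addresses inside the destination 4 MB buffer.
def repoint(indirection, moved_user_addrs, device_id, dest_buffer_base):
    """moved_user_addrs: user block addresses in the order they were packed."""
    for offset, user_addr in enumerate(moved_user_addrs):
        indirection[user_addr] = (device_id,
                                  dest_buffer_base + offset * BLOCK_SIZE)
    return indirection
```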
Ranking Buffers
[00114] The SSD array 400 may be used to tier data that is also stored in the disk array 20 (FIG. 1), and data in any of the 4 MB buffers 405 may be deleted or "ejected" whenever storing that data in the SSD array 400 provides little benefit. For example, storing data in the SSD array 400 that is seldom read may have little impact on the overall read access time provided by the storage system 100 and is therefore less useful. However, storing data in the SSD array 400 that is frequently read could substantially reduce the overall read access time provided by the storage system 100 and is therefore more useful. Accordingly, the control element 300 may remove data from the SSD array 400 that is seldom read and replace it with data that is read more frequently. This is different from conventional SSD devices, which cannot eject data that is currently being used, regardless of the usefulness of that data.
[00115] FIG. 12 explains a scheme for determining which of the 4 MB buffers 405 to recover, and the criteria used for determining which of the buffers to
recover first. As explained above, a buffer 405 refers to a 4 MB section or region of memory in the SSD array 400 and a block 450 refers to a 4 KB portion of memory space within one of the 4 MB buffers 405. Of course, the 4 MB buffer size and the 4 KB block size are just examples and other buffer and block sizes could be used.
[0100] In operation 700, the control element 300 calculates the number of used buffers 405 in the SSD array 400 by comparing the number of buffer entries 342 with the overall memory space provided by the SSD array 400. Operation 702 calculates the total number of 4 KB blocks 450 currently being used (valid) in the SSD array 400. This number may be determined by summing the used block counter values in the registers 356 of all of the buffer entries 342.
[0101] The control element 300 in operation 704 calculates a fragmentation value that characterizes how much of the SSD array 400 is actually being used. Fragmentation can be calculated globally over all buffer entries 342 or for a single 4 MB buffer 405. For example, the number of used blocks 450 identified in operation 702 can be divided by the total number of available 4 KB blocks 450 in the SSD array 400. A fragmentation value close to 1 is optimal, and a value below 50% indicates that at least 2:1 buffer space recovery potential exists.
[0102] Operation 708 calculates a utilization value that is a measure of how soon the SSD array 400 will likely run out of space. A utilization above 50% indicates the SSD array 400 may be starting to run out of space, and a utilization above 90% indicates the SSD array 400 in the storage system 100 will likely run out of space soon. The control element 300 determines the utilization value by dividing the number of used 4 MB buffers 405 identified in operation 700 by the total number of available 4 MB buffers 405 in the SSD array 400.
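The two metrics can be sketched directly from operations 700-708, continuing the hypothetical BufferEntry sketch above; fragmentation() and utilization() are assumed names, not terms from the specification.

```python
# Hypothetical sketch of operations 700-708, reusing BufferEntry and
# BLOCKS_PER_BUFFER from the sketch above.
def fragmentation(entries, total_buffers):
    # Operation 704 (global form): valid 4 KB blocks divided by all 4 KB
    # blocks the SSD array provides. Close to 1.0 is optimal; below 0.5
    # suggests at least 2:1 buffer space recovery potential.
    used_blocks = sum(e.block_count for e in entries)      # operation 702
    total_blocks = total_buffers * BLOCKS_PER_BUFFER
    return used_blocks / total_blocks

def utilization(entries, total_buffers):
    # Operation 708: used 4 MB buffers (one buffer entry each) divided by
    # all 4 MB buffers in the SSD array.
    return len(entries) / total_buffers
```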
[0103] If the utilization of the 4 MB buffers is less than 50% in operation 708, for example, no buffer ranking is performed, no buffers are discarded, and no blocks from different buffers are aggregated together in operation 714. There is still plenty of space in the SSD array 400 available for storing additional data and space is not likely to run out soon.
[0104] If the utilization is greater than 50% in operation 708, there is a possibility that the SSD array 400 could run out of space relatively soon. The control element 300 first determines in operation 710 whether the fragmentation value is greater than 50%. A fragmentation value less than 50% indicates that a relatively large percentage of the 4 KB blocks 450 within the 4 MB buffers 405 are currently free/invalid, and defragmenting the buffers 405 based on their used block count values in registers 356 will likely be the most efficient way to free up buffers 405 in the SSD array 400.
[0105] In operation 716, the control element 300 ranks all of the 4 MB buffers 405 in ascending order according to their used block count values in their associated registers 356. For example, the 4 MB buffer 405 with the lowest block count value in its associated register 356 is ranked highest. The control element 300 then performs the defragmentation operations described above in FIGS. 10 and 11 for the highest ranked buffers 405. The results of the defragmentation may cause the utilization value in operation 708 to fall back below 50%. If not, additional defragmentation may be performed.
[0106] If the fragmentation value is greater than 50% in operation 710, then defragmenting buffers is less likely to free up substantial numbers of 4 MB buffers 405, because a relatively large percentage of the 4 KB blocks 450 within each of the 4 MB buffers 405 are currently being used.
[0107] Operation 712 then determines whether the utilization is above 90%. If the utilization value is below 90% in operation 712, the SSD array 400 is running low on free 4 MB buffers 405 but is not likely to run out immediately. In this condition, the control element 300 in operation 718 may discard the data in any 4 MB buffers 405 that have a read count of zero in the associated registers 357. This represents data in the SSD array 400 that has had relatively little use, since it has not been read for a particular period of time.
[0108] A utilization value above 90% in operation 712 represents an SSD array 400 that is likely to run out of 4 MB buffers 405 relatively soon. The control element 300 in operation 720 ranks the 4 MB buffers 405 in ascending order according to the read counts in their associated read count registers 357. For example, any 4 MB buffers 405 with a zero read count are ranked highest, and any 4 MB buffers 405 with a read count of 1 are ranked next highest. The control element 300 may then discard the data in the 4 MB buffers 405 according to the rankings (lowest number of reads first) until the utilization value in operation 712 drops below 90%.
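Putting the thresholds together, the decision flow of FIG. 12 (operations 708-720) can be summarized as a small policy function. This continues the sketches above; plan_recovery() is a hypothetical name and the returned actions are illustrative, but the 50% and 90% thresholds and the ascending block count and read count rankings follow the text.

```python
# Hypothetical sketch of the FIG. 12 decision flow (operations 708-720).
def plan_recovery(entries, total_buffers):
    util = utilization(entries, total_buffers)
    if util <= 0.50:
        # Operation 714: plenty of space; no ranking, discarding or aggregation.
        return ("do_nothing", [])
    if fragmentation(entries, total_buffers) <= 0.50:
        # Operation 716: defragment, emptiest buffers (lowest block count) first.
        return ("defragment", sorted(entries, key=lambda e: e.block_count))
    if util <= 0.90:
        # Operation 718: discard only buffers that have never been read.
        return ("discard", [e for e in entries if e.read_count == 0])
    # Operation 720: candidates returned in ascending read count order; the
    # caller discards until utilization drops back below 90%.
    return ("discard", sorted(entries, key=lambda e: e.read_count))
```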
[0109] Note that defragmentation as described above in FIGS. 10 and 11 compacts the data without any data loss. If utilization is below 90%, the control element 300 can alternatively recover space by discarding only those buffers that have never been read.
[0110] Conventional SSD drives perform defragmentation to improve read access time; however, the usable capacity of the SSD drives remains the same. The optimization scheme described above increases memory capacity and improves memory utilization by first determining whether data blocks from fragmented buffers can be combined together.
[0111] When blocks from different buffers cannot be combined together efficiently, data is discarded based on read activity. When the fast storage media begins to run out of space, the data most useful for improving memory access times is kept in the fast storage media, while less useful data is accessed from the slower, but typically higher capacity and lower cost, disk storage media.
[0112] The system described above can use dedicated processor systems, microcontrollers, programmable logic devices, or microprocessors that perform some or all of the operations. Some of the operations described above may be implemented in software and other operations may be implemented in hardware.
[0113] For the sake of convenience, the operations are described as various interconnected functional blocks or distinct software modules. This is not necessary, however, and there may be cases where these functional blocks or modules are equivalently aggregated into a single logic device, program or operation with unclear boundaries. In any event, the functional blocks and software modules or features of the flexible interface can be implemented by themselves, or in combination with other operations in either hardware or software.
[0114] Having described and illustrated the principles of the invention, it should be apparent that the invention may be modified in arrangement and detail without departing from such principles. Any modifications and variations coming within the spirit and scope of the present invention are also claimed.
Claims
1. An apparatus, comprising: a storage media; a buffer capable of storing data of a plurality of user write requests, each user write request having a logical address; an indirection table capable of mapping the logical address of each user write request to a physical address of the storage media; a processor configured to write the data stored in the buffer to a contiguous address range within the storage media and to update the indirection table to include the addresses in the storage media where the data corresponding to the user write request logical address has been stored.
2. The apparatus of claim 1, wherein the contiguous address range is a physical address range of the storage media.
3. The apparatus of claim 1, wherein a logical address of data from the user write requests is associated with a physical address in the storage media by an indirection table.
4. The apparatus of claim 3, wherein the processor is configured to select the size of the contiguous address range.
5. The apparatus of claim 1 wherein the processor is configured to aggregate data for a plurality of write operations into a staging buffer and write the
aggregated data from the staging buffer into the contiguous buffer region within the storage media.
6. The apparatus of claim 1, wherein the indirection table maps the logical addresses of the write operations into a contiguous address within the buffer.
7. The apparatus of claim 1, wherein the storage media comprises an array of Solid State Devices (SSDs).
8. The apparatus of claim 7, wherein the SSD comprises a NAND flash memory.
9. The apparatus of claim 1, wherein the processor is configured to discard data from the buffer or to aggregate data from a plurality of buffers into a buffer region of the plurality of buffer regions according to a number of used storage media buffer regions.
10. The apparatus of claim 9, further comprising a plurality of block counters, each block counter of the plurality of block counters containing block count values identifying a number of the blocks in the buffer containing valid data, wherein the processor is configured to discard data from the buffer or to aggregate data from a plurality of buffers into a same buffer of the plurality of buffers according to the block count values.
11. The apparatus of claim 10, further comprising a plurality of read counters, each read counter of the plurality of read counters containing read count values associated with the buffer region, and wherein the processor is further configured to discard data from the buffer region according to the associated read count values of the plurality of read counters.
12. The apparatus according to claim 1, further comprising a bit map for each of the buffer regions, where bits in the bit map identify a used or unused status for data of a block within the buffer region, and the processor is further configured to combine data from a plurality of buffers into a same buffer of the plurality of buffers according to the bit maps.
13. A storage system, comprising:
a control element configured to establish buffer regions within a storage media to store data as blocks in contiguous address locations; and to identify blocks within the buffer regions that store subgroups of the data, the control element further configured to relocate the data from the blocks of a plurality of buffer regions into a same buffer region of the plurality of buffer regions or discard the data from the buffer regions, according to utilization of the buffer regions.
14. The storage system of claim 13, wherein the utilization of the buffer regions corresponds with a number of the buffer regions that are currently being used in the storage media.
15. The storage system of claim 14, further comprising a block counter for each buffer region of the plurality of buffer regions, the block counter identifying a number of used blocks in the buffer region.
16. The storage system of claim 15, wherein the control element is configured to rank order the different buffer regions according to the number of used blocks of each buffer region of the plurality of buffer regions and combine blocks from different buffer regions in a same buffer region of the plurality of buffer regions according to the rank order, in accordance with a policy.
17. The storage system of claim 14, further comprising a read counter configured to identify a read count for the buffer region.
18. The storage system of claim 14, wherein the control element is configured to rank order buffer regions of the plurality of buffer regions according to the read counter value and discard data from different buffer regions depending on the rank order, in accordance with a policy.
19. The storage system of claim 18, wherein the control element is configured to:
discard data from buffer regions having a zero read counter value when the number of buffer regions currently being used in the storage media is below a first threshold; and
discard data in the buffer regions according to the rank order when the number of buffer regions currently being used in the storage media is above the first threshold,
wherein the read count is a measure of a frequency of reading data from the buffer.
20. The storage system of claim 13, further comprising a bit map associated with a buffer region of the plurality of buffer regions, wherein bits in the bit map indicate a utilization status of a set of blocks in the buffer region, or the utilization status of a subset of blocks in the buffer region.
21. The storage system of claim 20, wherein the control element is configured to clear the bit associated with the block when the data in the block is invalidated and to set the bit associated with the block when the data in the block is currently valid.
22. The storage system of claim 13, further comprising buffer entries mapping user logical addresses with physical address locations of the blocks in the storage media.
23. The storage system of claim 13, wherein the control element is further configured to consolidate data for a plurality of user write operations into a buffer and write the data in the buffer into contiguous block locations in a buffer region of the plurality of buffer regions in the storage media.
24. A method for operating an apparatus, comprising: receiving different write operations;
accumulating data from the different write operations into a staging buffer;
writing the data in the staging buffer into contiguous block regions within a same buffer region of a storage media; and
creating an indirection table identifying the physical address of the block within the buffer region.
25. The method of claim 24, further comprising: receiving a read operation;
identifying a user address of the read operation;
identifying one of the indirection table entries that maps the user address to the physical address in the storage media; and
supplying data from the identified block responsive to the read operation.
26. The method of claim 24, further comprising:
generating a bit map having bits corresponding to the blocks in the buffer region;
setting bits in the bit map when the data in the staging buffer is written into the buffer region;
receiving invalidation requests; and
clearing the bits in the bit map corresponding to the block of the invalidated data.
27. The method of claim 26, further comprising combining data from a plurality of buffer regions into a same one of the buffer regions according to the bit maps associated with the different buffer regions.
28. The method of claim 24, further comprising:
calculating a number of buffer regions used in the storage media; calculating a number of blocks used within each buffer region; and combining the data from blocks of different buffer regions into a same one of the buffer regions according to the number of buffer regions and the number of blocks within the buffer regions.
29. The method of claim 28, further comprising: calculating a number of read operations performed on each of a plurality of buffer regions; and
discarding data from the buffer regions according to the number of reads to the buffer regions, in accordance with a policy.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2013/035584 WO2014168603A1 (en) | 2013-04-08 | 2013-04-08 | System for increasing utilization of storage media |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2014168603A1 true WO2014168603A1 (en) | 2014-10-16 |
Family
ID=51689864
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080162795A1 (en) * | 2006-12-28 | 2008-07-03 | Genesys Logic, Inc. | Hard disk cache device and method |
US20120117309A1 (en) * | 2010-05-07 | 2012-05-10 | Ocz Technology Group, Inc. | Nand flash-based solid state drive and method of operation |
US20120173794A1 (en) * | 2011-01-05 | 2012-07-05 | Royer Jr Robert J | Drive assisted system checkpointing |
US20130031301A1 (en) * | 2011-07-29 | 2013-01-31 | Stec, Inc. | Backend organization of stored data |
Non-Patent Citations (1)
Title |
---|
OUYANG, XIANGYONG ET AL.: "Enhancing Checkpoint Performance with Staging 10 and SSD", PROCEEDINGS OF THE 2010 INTERNATIONAL WORKSHOP ON STORAGE NETWORK ARCHITECTURE AND PARALLEL I/OS, 3 May 2010 (2010-05-03), pages 13 - 20, Retrieved from the Internet <URL:http://nowlab.cse.ohio-state.edu/publications/conf-papers/2010/ouyangx-snapi10.pdf> * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109885255A (en) * | 2019-01-07 | 2019-06-14 | 北京小米移动软件有限公司 | Memory space method for sorting and device |
CN112732178A (en) * | 2020-12-29 | 2021-04-30 | 北京浪潮数据技术有限公司 | Data clearing method of SSD and related device |
CN112732178B (en) * | 2020-12-29 | 2024-02-13 | 北京浪潮数据技术有限公司 | SSD data clearing method and related device |
Legal Events
Code | Title | Description |
---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 13881802; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 19.02.2016) |
122 | Ep: pct application non-entry in european phase | Ref document number: 13881802; Country of ref document: EP; Kind code of ref document: A1 |