WO2018002999A1

WO2018002999A1 - Storage device and storage equipment

Info

Publication number: WO2018002999A1
Application number: PCT/JP2016/069077
Authority: WO
Inventors: 英通小関; 藤本　和久
Original assignee: 株式会社日立製作所
Priority date: 2016-06-28
Filing date: 2016-06-28
Publication date: 2018-01-04

Abstract

A storage device according to an embodiment of the present invention comprises a device controller, which provides a logical storage space of a predetermined size to a storage controller, and a nonvolatile semiconductor storage medium having a plurality of blocks, which are data erase units. Further, each block is configured such that the cells in the block can be switched from operating in a first mode, in which the cells can store n-bit information, to operating in a second mode, in which the cells can store m-bit information (where n < m). The device controller manages, as reserve capacity, the portion of the combined available storage area of the plurality of blocks other than a selected storage area required to allocate the logical storage space, and if the reserve capacity becomes less than a predetermined threshold value, the device controller switches some of the blocks from operating in the first mode to operating in the second mode, thereby increasing available storage areas.

Description

Storage device and storage equipment

The present invention relates to storage device control.

The storage apparatus has a large number of storage devices for storing data and a storage controller for controlling the storage devices, and is intended to provide a large-capacity data storage space to a computer.

An HDD (Hard Disk Drive) is generally mounted as a storage device, but recently a storage device having a non-volatile semiconductor memory (for example, FM: flash memory) as its storage medium (for example, SSD: Solid State Drive) has been drawing attention as a new storage device replacing the HDD.

Generally, many SSDs are equipped with a plurality of NAND type FM chips, and the FM chips read and write data in units called pages. Since FM cannot directly overwrite the data stored in the page, it is necessary to erase the data once by performing a reclamation process in order to store new data. Note that erasure of data is performed on a collection of pages called a block.

The reclamation process is a process of generating a reusable block by erasing one or more blocks. When erasing a block, the data on the page in use must be moved to another block. Therefore, as described in Patent Document 1, for example, an SSD generally has a physical storage capacity larger than the size of a logical address space provided to an initiator such as a host or a storage controller. This physical storage area exceeding the size of the logical address space is called a spare area. The SSD performs reclamation processing using a spare area.

International Publication No. 2014/045329

As mentioned earlier, an SSD requires spare blocks (a spare area) for reclamation processing. If the spare blocks run out (are depleted), operation cannot be continued. As is well known, there is an upper limit on the number of times each block can be erased. When the number of erasures of a block exceeds the upper limit, the block can no longer be used, and the spare area is reduced.

However, the growth in the number of erasures can be predicted to some extent by observing the usage trend of the SSD. Therefore, operation can be continued by stopping the use of an SSD whose number of erasures has increased and moving the data of that SSD to another storage device. Rather, the problem is a failure that occurs suddenly (at an unexpected timing).

The quality of FM blocks or dies varies. If the quality of certain blocks (or dies) is inherently worse than that of others, some of these blocks (or dies) may fail suddenly, depleting the spare area, and as a result operation may become impossible to continue. It is difficult to detect such a sudden decrease in blocks in advance.

A storage device according to an embodiment of the present invention includes a device controller that provides a storage controller with a logical storage space of a predetermined size, and a nonvolatile semiconductor storage medium having a plurality of blocks that are data erasure units. Each block is configured so that the cells in the block can be changed from operating in a first mode, in which each cell can store n-bit information, to operating in a second mode, in which each cell can store m-bit information (n < m).

The device controller manages, as a spare capacity, the portion of the usable storage area in the blocks that exceeds the amount of storage area required for allocation to the logical storage space. When the spare capacity falls below a predetermined threshold value, the device controller increases the usable storage area by changing some of the blocks operating in the first mode to operate in the second mode.

It is possible to avoid the risk of operation stoppage of the SSD or the storage apparatus that may occur due to sudden depletion of spare capacity.

A diagram showing an outline of the present invention.
A diagram showing a configuration example of the storage system 10000 including the storage apparatus 1 according to the first embodiment.
A diagram explaining the volume configuration of the storage system.
An explanatory diagram of a RAID group.
A diagram explaining the configuration of the virtual volume management table.
A diagram explaining the configuration of the pool management table.
A diagram explaining the configuration of the RAID group management table.
A diagram showing a configuration example of the SSD 21.
An explanatory diagram of the data arrangement in the SSD.
A diagram explaining the configuration of the configuration information management table.
A diagram explaining the configuration of the logical-physical conversion table.
A diagram explaining the configuration of the block management table.
A flowchart of the storage controller task.
A flowchart of the data write process of the storage controller.
A flowchart of the data read process of the storage controller.
A flowchart of the SSD controller task.
A flowchart of the data read/write process of the SSD controller.
A flowchart of the FM diagnosis process of the SSD controller.
A flowchart of the FM depletion recovery process of the SSD controller.
A flowchart of the drive monitoring process of the storage controller.
A diagram showing an outline of the second embodiment.
A flowchart of the data read/write process of the SSD controller in the second embodiment.
A flowchart of the FM diagnosis process of the SSD controller in the second embodiment.
A diagram explaining an outline of the capacity rebalance process.
A flowchart of the drive monitoring process of the storage controller in the second embodiment.
A diagram explaining the configuration of the logical-physical conversion table in the second embodiment.
A diagram explaining the configuration of the block management table in the second embodiment.

Hereinafter, some embodiments will be described. The following description is based on the premise that the storage device is an SSD and that the nonvolatile semiconductor storage medium included in the SSD is a flash memory (FM). The flash memory is assumed to be of a type that is read and written in units of pages, typically a NAND flash memory. However, the flash memory may be of a type other than the NAND type. Further, instead of the flash memory, another type of non-volatile semiconductor storage medium such as phase change memory may be employed.

Hereinafter, this embodiment will be described in detail. First, the outline of the embodiment will be described with reference to FIG. 1.

FIG. 1 is a time-series representation of the capacity change of the SSD according to the first embodiment. The SSD in this embodiment is equipped with an FM capable of changing the cell mode. In this embodiment, the description assumes that each cell can be operated in either a mode capable of storing n-bit data or a mode capable of storing m-bit data (where n < m). In the following, the case where n = 2 and m = 3 will be described unless otherwise specified. The mode in which a cell can store n-bit (2-bit) data is called the MLC (Multi-Level Cell) mode, and the mode in which a cell can store m-bit (3-bit) data is called the TLC (Triple-Level Cell) mode. Further, the MLC mode and the TLC mode are collectively referred to as “cell mode” or “mode”.

In the present embodiment, description will be made on the assumption that all cells are set to the MLC mode in the initial state of each SSD (immediately after the start of use). After a while, the SSD controller changes the mode of some cells (changes from MLC mode to TLC mode).

The SSD has a logical address space (LBA space) provided to an initiator such as a storage controller and a physical address space (PBA space) for storing actual data. Note that the size of the logical address space is the logical capacity, and the size of the physical address space is the physical capacity. In FIG. 1, a rectangular object 1000 represents a logical address space, and objects 1001 and 1002 represent areas on the physical address space. The lengths of the objects 1000, 1001, and 1002 represent the size (capacity) of the space. Therefore, in the description of FIG. 1, the objects 1000, 1001, and 1002 are referred to as a logical capacity 1000, a physical capacity 1001, and a physical capacity 1002, respectively.

In the initial state at time t1, the SSD provides the logical capacity 1000 to the initiator and has a physical address space (physical storage area) having a size equal to the sum of the physical capacities 1001 and 1002. Note that at t1, the physical capacities 1001 and 1002 are constructed in the MLC mode. The physical capacity 1001 is the total capacity of physical areas (a set of cells) reserved for allocation to the logical address space, and is equal to the logical capacity 1000. The physical capacity 1002 is an area that the SSD has beyond the physical capacity 1001 and is a kind of surplus area.

As mentioned earlier, an SSD needs to perform reclamation. The physical capacity 1002 is the amount of physical storage area that the SSD has for that purpose, and is referred to as “reserve capacity” in this specification.

The state at time t2 shows a situation when a failure occurs in some blocks in the SSD and the spare capacity is reduced.

At t2, the physical storage area corresponding to the physical capacity 1004 is unusable due to a block failure. Even if some of the blocks become unusable, the logical capacity 1000 cannot be reduced, so the physical capacity that must remain allocated to it is still equal to the logical capacity 1000. Therefore, when some blocks become unusable, the reserve capacity is reduced by the amount of the unusable blocks. As a result, the reserve capacity 1002 existing at time t1 is reduced to the reserve capacity 1003 at time t2. If further block failures occur from here, there is a high possibility that the reserve capacity will be exhausted. At this point, the FM cells remain in the MLC mode.

At t2, the SSD controller that has recognized that there is a high risk of reserve capacity depletion expands (recovers) the reserve capacity by changing some cells from the MLC mode to the TLC mode. The state at time t3 indicates the internal state of the SSD after the mode change is performed.

At t3, the SSD controller has changed some of the blocks constituting the physical capacity 1001 (or physical capacity 1003) from the MLC mode to the TLC mode. The MLC mode area is therefore reduced from the physical capacity 1001 (and physical capacity 1003) to the physical capacity 1005. On the other hand, in the area corresponding to the difference between the physical capacity 1001 and the physical capacity 1005, the capacity has increased by a factor of 1.5 (3 bits per cell instead of 2) due to the TLC mode. This area corresponds to the physical capacities 1006 and 1007. The SSD controller allocates the physical capacities 1005 and 1006 (an area having a size equal to the logical capacity 1000) to user data storage, and uses the physical capacity 1007 as the reserve capacity. As a result, the reserve capacity has expanded from the physical capacity 1003 at time t2 to the physical capacity 1007.

Through the above processing, the SSD controller can expand the physical capacity by changing the cell mode when the reserve capacity decreases. As a result, the reduced reserve capacity can be recovered, and the risk of operation stoppage due to exhaustion of the reserve capacity can be avoided. Furthermore, in a series of processes, the logical capacity provided to the initiator such as the storage controller is not changed, so that the storage controller can continue operation without being influenced by the internal state of the SSD.
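
As an illustration only, the capacity arithmetic of FIG. 1 can be sketched as follows. The block counts, the per-block capacity, the threshold, and the function names are assumptions introduced for this example and do not appear in the specification. The sketch also ignores the data movement and erasure that an actual mode change would involve.

```python
from fractions import Fraction

MLC_BITS = 2  # n: bits per cell in the MLC mode
TLC_BITS = 3  # m: bits per cell in the TLC mode


def reserve_capacity(mlc_blocks, tlc_blocks, mlc_block_size, logical_capacity):
    """Reserve capacity = usable physical capacity minus logical capacity."""
    # A block in the TLC mode holds m/n = 1.5 times as much as in the MLC mode.
    tlc_block_size = mlc_block_size * Fraction(TLC_BITS, MLC_BITS)
    physical = mlc_blocks * mlc_block_size + tlc_blocks * tlc_block_size
    return physical - logical_capacity


def blocks_to_switch(mlc_blocks, tlc_blocks, mlc_block_size,
                     logical_capacity, threshold):
    """Smallest number of MLC blocks to switch so that reserve >= threshold.

    Hypothetical policy: switch one block at a time until the reserve
    capacity reaches the threshold or no MLC block is left.
    """
    switched = 0
    while (switched < mlc_blocks and
           reserve_capacity(mlc_blocks - switched, tlc_blocks + switched,
                            mlc_block_size, logical_capacity) < threshold):
        switched += 1
    return switched
```

Under these assumed figures (120 blocks of 4 units each, a logical capacity of 400 units), losing 15 blocks reduces the reserve capacity from 80 to 20 units, and switching 10 of the remaining blocks to the TLC mode restores it to an assumed threshold of 40 units while the logical capacity stays unchanged.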

FIG. 2 is a diagram illustrating a configuration example of the storage system 10000 including the storage apparatus 1 according to the first embodiment.

The storage device 1 includes a storage controller 10 and a plurality of SSDs 21 connected to the storage controller 10.

The SSD 21 is a storage device for storing write data from an initiator such as the host 2, and is a storage device that employs a nonvolatile semiconductor memory such as a flash memory as a storage medium. The internal configuration of the SSD 21 will be described later. As an example, the SSD 21 is connected to the storage controller 10 by a transmission line (SAS link) conforming to the SAS (Serial Attached SCSI) standard, a transmission line (PCI link) conforming to the PCI (Peripheral Component Interconnect) standard, or the like.

Further, as shown in FIG. 2, the storage apparatus 1 of this embodiment can be equipped with an HDD (Hard Disk Drive) 25 in addition to the SSD 21. The HDD 25 is a storage device that uses a magnetic disk as a recording medium. Like the SSD 21, the HDD 25 is connected to the storage controller 10 by a SAS link or the like. In this embodiment, storage devices such as the SSD 21 and the HDD 25 installed in the storage apparatus 1 may be referred to as “drives”. However, the following description mainly covers the configuration in which only the SSDs 21 are connected as storage devices to the storage apparatus 1 of the present embodiment.

One or more hosts 2 are connected to the storage controller 10. A management host 5 is connected to the storage controller 10. The storage controller 10 and the host 2 are connected via a SAN (Storage Area Network) 3 formed using a fiber channel as an example. The storage controller 10 and the management host 5 are connected via a LAN (Local Area Network) 6 formed using Ethernet as an example.

The storage controller 10 includes at least a processor (CPU) 11, a host interface (denoted as “host I/F” in the figure) 12, a disk interface (denoted as “disk I/F” in the figure) 13, a memory 14, and a management I/F 15. The processor 11, host I/F 12, disk I/F 13, memory 14, and management I/F 15 are interconnected via an internal switch (internal SW) 16. Although only one of each of these components is shown in FIG. 2, a plurality of each may be mounted in the storage controller 10 in order to achieve high performance and high availability. Further, instead of the internal SW 16, the components may be connected to each other via a common bus.

The disk I/F 13 has at least an interface controller and a transfer circuit. The interface controller is a component for converting the protocol used by the SSD 21 (SAS, as an example) into the communication protocol used inside the storage controller 10 (PCI-Express, as an example). The transfer circuit is used when the storage controller 10 transfers data (reads and writes) to and from the SSD 21.

The host I/F 12 has at least an interface controller and a transfer circuit, like the disk I/F 13. The interface controller of the host I/F 12 is for converting between the communication protocol used on the data transfer path between the host 2 and the storage controller 10 (for example, Fibre Channel) and the communication protocol used inside the storage controller 10.

The processor 11 performs various controls of the storage apparatus 1. The memory 14 is used to store programs executed by the processor 11 and various management information of the storage apparatus 1 used by the processor 11. The memory 14 is also used for temporarily storing I/O target data for the SSD 21. Hereinafter, the storage area in the memory 14 used for temporarily storing the I/O target data for the SSD 21 is referred to as “cache”. The memory 14 is configured by a volatile storage medium such as DRAM or SRAM. However, as another embodiment, the memory 14 may be configured by using a nonvolatile memory.

FIG. 3 is a diagram for explaining the volume configuration of the storage system.

The storage controller in this embodiment is assumed to have a capacity virtualization function. The capacity virtualization function is a technique for providing a virtual capacity larger than the physical capacity of the storage apparatus to the host computer as a virtual volume. Details will be described below with reference to FIG. 3.

The storage device 1 is equipped with SSDs 21-1 to 21-3. The SSD 21-1 has a logical address space presented to the storage controller 10 and a physical address space for storing actual data. The size of the logical address space is the logical capacity, and the size of the physical address space is the physical capacity. The association between the area on the logical address space and the area on the physical address space can be dynamically changed, and is managed by a logical-physical conversion table 1100 described later.

The physical address space is composed of a plurality of blocks 211 described later. The block 211 is used in either the MLC mode (“M” in the figure) or the TLC mode (“T” in the figure). In this embodiment, the SSD 21 can change the cell mode for each block 211.

The storage controller 10 forms a RAID group (RG) 30-1 using the logical address space provided by the SSDs 21-1 to 21-3. Although not shown, the RAID group 30-2 is configured using another SSD. Further, the storage controller 10 makes the two RAID groups 30-1 and 30-2 belong to a management unit called a pool 35. The pool 35 is a set of storage areas that can be allocated to virtual chunks of a virtual volume to be described later. The storage controller 10 manages the storage area of the RAID group by dividing it into partitions of a predetermined size. This partition is called “chunk”. A chunk 31 is created in the RAID group 30-1, and a chunk 32 is created in the RAID group 30-2.

The storage device 1 is connected to the host computer 2 and provides the virtual volume 40 to the host computer 2. The virtual volume 40 is a virtual volume formed by the capacity virtualization function. When the storage controller 10 receives a write request for the virtual volume 40 from the host computer 2, the storage controller 10 allocates an arbitrary chunk in the pool 35 to the virtual chunk 41 of the virtual volume 40 and writes data associated with the write request to the chunk.

Next, storage areas in the RAID group will be described with reference to FIG. 4.

The storage apparatus 1 manages a plurality of SSDs 21 as one RAID group. When one (or two) of the SSDs 21 in the RAID group fails and its data can no longer be accessed, the data stored in the failed SSD 21 can be recovered using the data in the remaining SSDs 21.

In FIG. 4, SSD #0 (20-0) to SSD #3 (20-3) each represent the logical address space that an SSD 21 provides to the storage controller 10. The storage controller 10 forms one RAID group 30 from a plurality of (four in the example of FIG. 4) SSDs 21, and manages the logical address space of each SSD belonging to the RAID group 30 (SSD #0 (20-0) to SSD #3 (20-3)) by dividing it into a plurality of fixed-size storage areas called stripe blocks (301).

FIG. 4 shows an example in which the RAID level of the RAID group 30 (representing the data redundancy method in the RAID technology and generally having RAID levels of RAID1 to RAID6) is RAID5. In FIG. 4, boxes such as “0”, “1”, and “P” in the RAID group 30 represent stripe blocks. A number such as “1” assigned to each stripe block is referred to as a “stripe block number”.

In FIG. 4, a stripe block described as “P” in the stripe block is a stripe block in which redundant data (parity) is stored, and this is called a “parity stripe”. On the other hand, a stripe block in which numbers (0, 1, etc.) are written is a stripe block in which data written from an initiator such as the host 2 (data that is not redundant data) is stored. This stripe block is called “data stripe”.

In the RAID group 30 shown in FIG. 4, for example, the stripe block located at the head of SSD #3 (20-3) is the parity stripe 301-3. When the storage controller 10 creates the redundant data to be stored in the parity stripe 301-3, it generates the redundant data by performing a predetermined operation (for example, exclusive OR (XOR)) on the data stored in the data stripes located at the head of each of the other SSDs (SSD #0 (20-0) to SSD #2 (20-2)), that is, the stripe blocks 301-0, 301-1, and 301-2.
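
When the predetermined operation is exclusive OR, parity generation and the recovery of a lost data stripe can be sketched as below. This is a minimal illustration; the function names and the two-byte stripe contents are hypothetical, and real stripe blocks are far larger.

```python
def xor_parity(stripes):
    """Return the byte-wise XOR of equal-length stripe blocks."""
    parity = bytearray(len(stripes[0]))
    for stripe in stripes:
        for i, value in enumerate(stripe):
            parity[i] ^= value
    return bytes(parity)


def rebuild_lost_stripe(surviving_data_stripes, parity):
    """Recover one lost data stripe of a stripe line.

    XOR-ing the parity with all surviving data stripes cancels them out
    and leaves the contents of the missing stripe.
    """
    return xor_parity(list(surviving_data_stripes) + [parity])


# Data stripes standing in for stripe blocks 301-0, 301-1 and 301-2.
d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\xff\x00"
p = xor_parity([d0, d1, d2])          # stands in for parity stripe 301-3
recovered = rebuild_lost_stripe([d0, d2], p)   # the drive holding d1 failed
```

Because XOR is its own inverse, the same routine serves both parity generation and recovery, which is why a single failed drive in a RAID5 group can be rebuilt from the remaining drives.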

Hereinafter, a set consisting of a parity stripe and the data stripes used to generate the redundant data stored in that parity stripe (for example, the element 300 in FIG. 4) is referred to as a “stripe line”. In the storage apparatus 1 according to the present embodiment, stripe lines are configured according to the rule that each stripe block belonging to one stripe line is located at the same position (address) in the logical address space of the SSDs 21-0 to 21-3, as in the stripe line 300 shown in FIG. 4.

The “chunk” described above is an area composed of a plurality of stripe lines continuously arranged in the RAID group, as shown in FIG. Further, the number of data stripes included in each chunk in the storage device 1 is the same. In the present embodiment, one chunk 31 is a region composed of a plurality of stripe lines, but one chunk 31 may be configured to have only one stripe line.

As mentioned earlier, chunks are assigned to “virtual chunks” on the virtual volume. A virtual chunk is a partition of a predetermined size on the storage space of a virtual volume.

One chunk is mapped to one virtual chunk. When the storage apparatus 1 receives a data write request for a virtual chunk from the host 2, the storage apparatus 1 stores the data in the mapped chunk. However, when a chunk is mapped to a virtual chunk, only the data stripes in the chunk are mapped. Therefore, the size of the virtual chunk is equal to the total size of all data stripes included in the chunk. The storage controller 10 manages the storage area (chunk) allocated to the virtual chunk by recording the mapping between the virtual chunk and the chunk in a virtual volume management table 500 described later.
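
The statement that a virtual chunk is as large as the data stripes of its chunk reduces to simple arithmetic, sketched below. The stripe block size, the number of stripe lines per chunk, and the function name are assumed values for illustration, not figures from the specification.

```python
def virtual_chunk_size(stripe_lines, drives, parity_stripes_per_line,
                       stripe_block_size):
    """Size of a virtual chunk: only the data stripes of the chunk count."""
    data_stripes_per_line = drives - parity_stripes_per_line
    return stripe_lines * data_stripes_per_line * stripe_block_size


# Assumed example: RAID5 over four SSDs (three data stripes and one parity
# stripe per stripe line), eight stripe lines per chunk, 512 KiB stripe blocks.
size = virtual_chunk_size(stripe_lines=8, drives=4,
                          parity_stripes_per_line=1,
                          stripe_block_size=512 * 1024)
```

With these assumed values the chunk occupies 8 x 4 stripe blocks in the RAID group, but only 8 x 3 of them are visible through the virtual chunk.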

Immediately after the virtual volume is defined, no chunk is mapped to any virtual chunk of the virtual volume. Only when the storage controller 10 receives from the host 2 a write request for an area on a virtual chunk does it determine the storage area (chunk) on the SSDs 21 to which the data for that area is to be written. The chunk determined here is one chunk selected from among the chunks not yet assigned to any virtual chunk (unused chunks).
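
The allocate-on-first-write behavior described above can be sketched as follows. This is a simplified model under stated assumptions: the class names, the first-come selection of an unused chunk, and the tuple representation of a chunk are all hypothetical, and the storage controller's actual selection policy is not specified here.

```python
class Pool:
    """A set of chunks that can be allocated to virtual chunks."""

    def __init__(self, chunks):
        # Each chunk is identified by (RAID group number, chunk number).
        self.unused = list(chunks)

    def allocate(self):
        if not self.unused:
            raise RuntimeError("pool has no unused chunk")
        return self.unused.pop(0)  # assumed policy: take the first unused chunk


class VirtualVolume:
    def __init__(self, pool):
        self.pool = pool
        self.mapping = {}  # virtual chunk number -> (RG number, chunk number)

    def chunk_for_write(self, virtual_chunk_number):
        """Return the chunk backing a virtual chunk, allocating on first write."""
        if virtual_chunk_number not in self.mapping:
            self.mapping[virtual_chunk_number] = self.pool.allocate()
        return self.mapping[virtual_chunk_number]


pool = Pool([(1, 0), (1, 1), (2, 0)])
volume = VirtualVolume(pool)
first = volume.chunk_for_write(2)    # first write: a chunk is allocated
again = volume.chunk_for_write(2)    # later writes reuse the same chunk
```

Virtual chunks that are never written consume no chunk, which is what allows the virtual volume to be larger than the physical capacity behind the pool.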

Here, although not shown, at least a storage control program, a virtual volume management table 500, a pool management table 550, and a RAID group management table 650 exist in the memory 14 of the storage controller 10. The contents of these programs and management tables will be described below.

FIG. 5 is a diagram for explaining the configuration of the virtual volume management table. The virtual volume management table 500 is a table for managing the mapping relationship between the virtual chunks in each virtual volume defined in the storage apparatus 1 and the chunks.

The virtual volume management table 500 has columns of virtual volume # 501, pool # 502, virtual volume LBA range 503, virtual chunk number 504, RAID group number 505, and chunk number 506. Each row (record) of the virtual volume management table 500 expresses that the chunk specified by the RAID group number 505 and the chunk number 506 is mapped to the virtual chunk specified by the virtual volume # 501 and the virtual chunk number 504. Hereinafter, not only for the virtual volume management table 500 but for every table that manages information, each row is referred to as a “record”.

When no chunk is mapped to the virtual chunk specified by the virtual chunk number 504, an invalid value (NULL) is stored in the RAID group number 505 and the chunk number 506 of the record.

Pool # 502 stores the identification number of the pool to which the chunks that can be allocated to the virtual volume belong. That is, the chunks that can be allocated to the virtual chunks of the virtual volume identified by the virtual volume # 501 are limited in principle to the chunks (or RAID groups) belonging to the pool # 502. The virtual volume LBA range 503 is information indicating which range on the virtual volume the virtual chunk specified by the virtual chunk number 504 corresponds to. As an example, in the row (record) 500-1 of FIG. 5, the virtual volume LBA range 503 is “0x0500 to 0x09FF” and the virtual chunk number 504 is “2”. This indicates that virtual chunk number 2 corresponds to the area of virtual volume #0 whose LBAs range from 0x0500 to 0x09FF.
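
A lookup against a table of this shape can be sketched as below. Only the LBA range 0x0500 to 0x09FF and the virtual chunk number 2 are taken from the example record above; the other record, the pool, RAID group and chunk numbers, and the names `Record` and `lookup` are assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class Record(NamedTuple):
    virtual_volume: int
    pool: int
    lba_first: int
    lba_last: int
    virtual_chunk: int
    raid_group: Optional[int]  # None stands for the invalid value (NULL)
    chunk: Optional[int]       # None stands for the invalid value (NULL)


TABLE = [
    # Hypothetical record: a virtual chunk with no chunk mapped yet.
    Record(0, 0, 0x0000, 0x04FF, 1, None, None),
    # LBA range and virtual chunk number follow the example record 500-1.
    Record(0, 0, 0x0500, 0x09FF, 2, 1, 7),
]


def lookup(table, virtual_volume, lba):
    """Return the record whose LBA range contains lba, or None if absent."""
    for record in table:
        if (record.virtual_volume == virtual_volume
                and record.lba_first <= lba <= record.lba_last):
            return record
    return None
```

A record whose RAID group number and chunk number are both `None` corresponds to a virtual chunk that has not yet been written, so a read or write to it would first have to go through chunk allocation.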

FIG. 6 is a diagram for explaining the configuration of the pool management table. The pool is managed by a pool management table 550. The pool management table 550 includes columns of pool # 551, RG # 552, chunk # 553, RAID group LBA 554, status 555, and remaining capacity 556. In the pool management table 550, each record is for storing information about a chunk. RG # 552 of each record represents the RAID group number of the RAID group to which the chunk belongs, and pool # 551 represents the pool number of the pool to which the chunk belongs. Furthermore, pool # 551 represents the pool number to which the RAID group specified by RG # 552 belongs.

Further, the RAID group LBA 554 of each record is information indicating in which range on the RAID group the chunk is positioned. The status 555 is information indicating whether the chunk is assigned to the virtual chunk (whether mapped). When “assigned” is stored in the status 555, it indicates that the chunk is assigned to the virtual chunk. Conversely, when “unallocated” is stored in the status 555, it means that the chunk is not allocated to the virtual chunk. The remaining capacity 556 is a total value of unused capacity of the RAID group in the pool, and is equal to the total value of the capacity of chunks whose status 555 is “unallocated”. Note that the storage apparatus 1 according to the present embodiment manages the remaining capacity 556 for each RAID group. For example, in the pool management table 550 illustrated in FIG. 6, it is managed that the remaining capacity 556 of RG # 1 is 400 GB and the remaining capacity 556 of RG # 2 is 600 GB. However, as another embodiment, the storage apparatus 1 may manage the remaining capacity for each pool.
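
The relationship between the status 555 and the remaining capacity 556 can be sketched as follows. The 100 GB chunk size, the record layout as dictionaries, and the function name are assumptions chosen so that the totals match the 400 GB and 600 GB figures mentioned above; they are not taken from FIG. 6.

```python
def remaining_capacity_by_rg(records, chunk_capacity_gb):
    """Sum the capacity of unallocated chunks for each RAID group."""
    remaining = {}
    for record in records:
        remaining.setdefault(record["rg"], 0)
        if record["status"] == "unallocated":
            remaining[record["rg"]] += chunk_capacity_gb
    return remaining


# Hypothetical pool: RG #1 has 2 assigned and 4 unallocated chunks,
# RG #2 has 6 unallocated chunks, each chunk being 100 GB.
records = (
    [{"pool": 0, "rg": 1, "chunk": i, "status": "assigned"} for i in range(2)]
    + [{"pool": 0, "rg": 1, "chunk": i, "status": "unallocated"} for i in range(2, 6)]
    + [{"pool": 0, "rg": 2, "chunk": i, "status": "unallocated"} for i in range(6)]
)
remaining = remaining_capacity_by_rg(records, chunk_capacity_gb=100)
```

Managing the remaining capacity per pool instead, as the alternative embodiment allows, would only require summing these per-RAID-group values.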

FIG. 7 is a diagram for explaining the configuration of a RAID group management table. The RAID group is managed by a RAID group management table 650. The RAID group management table 650 includes columns of RG # 651, RAID level 652, drive number 653, drive attribute 654, RAID group LBA 655, drive logical capacity 656, and average data compression rate 657.

RG # 651 stores the RAID group number of the RAID group. The RAID level 652 indicates the RAID configuration of the RAID group. The drive number 653 stores an identifier of the SSD 21 belonging to the RAID group specified by RG # 651. The drive attribute 654 indicates whether the drive specified by the drive number 653 is an active drive or a spare drive. An active drive is a drive that currently stores user data, and “active” is set in the drive attribute 654 of an active drive. A spare drive, on the other hand, is a drive that starts operation as an alternative drive when an active drive fails, and “spare” is set in the drive attribute 654 of a spare drive. The RAID group LBA 655 is information indicating where on the RAID group each area of the SSD 21 specified by the drive number 653 is positioned. The drive logical capacity 656 indicates the capacity (logical capacity) of the drive. The average data compression rate 657 is information indicating how much the data transferred by the storage controller 10 is reduced when the drive has a data compression function. In the first embodiment, it is assumed that the data compression function is disabled. Therefore, “N/A”, indicating an invalid state, is stored in FIG. 7.

FIG. 8 is a diagram illustrating a configuration example of the SSD 21. The SSD 21 includes an SSD controller 200 and a plurality of FM chips 210. The SSD controller 200 includes a processor (CPU) 201, a disk I/F 202, an FM chip I/F 203, a memory 204, a parity operation circuit 206, and a compression/decompression circuit 207, which are interconnected via an internal connection switch 205.

The disk I/F 202 is an interface controller for performing communication between the SSD 21 and the storage controller 10. The disk I/F 202 is connected to the disk I/F 13 of the storage controller 10 via a transmission line (SAS link or PCI link). On the other hand, the FM chip I/F 203 is an interface controller for performing communication between the SSD controller 200 and the FM chip 210.

The FM chip I/F 203 also has functions for generating an ECC (Error Correcting Code) and for detecting and correcting errors using the ECC. As an example of the ECC, a BCH code, an LDPC (Low Density Parity Check) code, or the like may be used.

When data is transmitted (written) from the SSD controller 200 to the FM chip 210, the FM chip I/F 203 generates an ECC. The FM chip I/F 203 adds the generated ECC to the data, and writes the data with the ECC added to the FM chip 210.

When the SSD controller 200 reads data from the FM chip 210, the data with the ECC added is read from the FM chip 210 and arrives at the FM chip I/F 203. The FM chip I/F 203 performs a data error check using the ECC (it generates an ECC from the data and checks whether the generated ECC matches the ECC added to the data), and if a data error is detected, it corrects the data using the ECC.
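
The write-time generation and read-time check and correction described above can be illustrated with a deliberately simple code. The sketch below uses a Hamming(7,4) code as a stand-in; the specification names BCH and LDPC codes as examples, not Hamming, and unlike the scheme described above this code interleaves its check bits with the data bits rather than appending them. The function names are hypothetical.

```python
def ecc_encode(nibble):
    """Encode 4 data bits into a 7-bit Hamming(7,4) codeword (list of bits)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]   # covers codeword positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]   # covers codeword positions 2, 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]   # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d[0], p4, d[1], d[2], d[3]]


def ecc_decode(codeword):
    """Check a codeword read back and correct a single-bit error.

    Returns (nibble, error_detected).
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4   # 1-based position of the flipped bit
    if syndrome:
        c[syndrome - 1] ^= 1          # correct the erroneous bit
    data_bits = [c[2], c[4], c[5], c[6]]
    nibble = sum(bit << i for i, bit in enumerate(data_bits))
    return nibble, syndrome != 0


stored = ecc_encode(0b1011)   # write path: the ECC is generated
stored[5] ^= 1                # a single bit is corrupted on the medium
value, had_error = ecc_decode(stored)   # read path: check and correct
```

A code of this kind corrects only one flipped bit per codeword; the stronger BCH and LDPC codes mentioned above exist precisely because flash pages can accumulate many bit errors.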

The CPU 201 performs processing related to various commands coming from the storage controller 10. The memory 204 stores a program executed by the processor 201 and various management information. A part of the memory 204 is also used as a buffer for temporarily storing write data transmitted from the storage controller 10 together with a write command and data read from the FM chip 210. Hereinafter, an area used as a buffer in the area of the memory 204 is referred to as a “buffer area”. As the memory 204, a volatile memory such as a DRAM is used. However, a nonvolatile memory may be used for the memory 204.

Although not shown, at least an SSD control program, a logical-physical conversion table 1100, a block management table 1150, and a configuration information management table 1300 are stored in the memory 204 of the SSD 21.

The parity operation circuit 206 is a circuit for creating parity data in the SSD 21. The SSD 21 has a function of configuring a RAID group from a plurality of FM chips 210 and performing data recovery using RAID technology. The parity operation circuit 206 is hardware for generating redundant data (parity) in RAID technology. In this embodiment, the redundant data generated by the parity operation circuit 206 is expressed as “parity” or “parity data”, while the ECC generated by the FM chip I/F 203 is expressed as “ECC”.

The compression / decompression circuit 207 is a circuit for compressing and decompressing data. However, in the first embodiment, an example in which the SSD 21 does not compress data will be described. Therefore, in the SSD 21 according to the first embodiment, the compression / decompression circuit 207 may not be provided. A usage example of the compression / decompression circuit 207 will be described in a second embodiment.

FM chip 210 is a non-volatile semiconductor memory chip such as a NAND flash memory. The FM chip 210 has a plurality of dies 213, and each die 213 has a plurality of cells 214. The cell 214 is a memory element including a transistor or the like, and each cell 214 can hold one or a plurality of bits of data. Write data from the SSD controller 200 is stored in the cell 214. In the cell 214 of the FM chip 210 in this embodiment, the amount of information (number of bits) that can be stored in the cell 214 can be changed by an instruction from the SSD controller 200.

As is well known, reading / writing of data in the flash memory cannot be performed in units of cells 214. Reading / writing is performed for each area of a predetermined size (for example, 8 KB) called a page, which is a set of a plurality of cells 214. Data erasure is performed for each block 211 that is a set of pages.

In general, “SSD” means a storage device having the same form factor as an HDD. However, in this embodiment, the SSD means any storage device including a plurality of flash memories and a controller for controlling them, and the external shape is not limited to a general HDD or SSD form factor. Further, a nonvolatile semiconductor memory such as NOR or NAND may be used for the flash memory. Also, instead of flash memory, various semiconductor memories may be used, such as MRAM (Magnetoresistive Random Access Memory), which is a magnetoresistive memory, ReRAM (Resistance Random Access Memory), which is a resistance change memory, and FeRAM (Ferroelectric Random Access Memory), which is a ferroelectric memory.

FIG. 9 is an explanatory diagram of data arrangement in the SSD. This is equivalent to the case where the RAID group 30 in FIG. 4 is replaced with the SSD 21 and the SSD is replaced with the FM chip. In the SSD according to this embodiment, the unit of the stripe block (401-0, 401-1, 401-2, 401-3, etc.) is one physical page. However, the RAID group may be configured such that the block 211, the die 213, or the FM chip 210 is a unit of a stripe block. When write data is written to a data stripe (physical page), the SSD 21 generates parity belonging to the same stripe line, writes write data to the data stripe, and stores parity in the parity stripe. Since the physical page cannot be overwritten, when updating the data stripe and parity stripe, an unused physical page in the chip is selected and data and parity are written to the unused physical page.
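The relationship between data stripes and the parity stripe on one stripe line can be illustrated with a minimal Python sketch. It assumes single-parity, RAID 5-style XOR parity, which this description does not specify; the function names `xor_parity` and `recover_page` and the 8-byte page size are hypothetical and chosen only for readability.

```python
from functools import reduce

def xor_parity(pages):
    """Return the byte-wise XOR of equally sized pages (assumed RAID 5-style parity)."""
    if len({len(p) for p in pages}) != 1:
        raise ValueError("all pages in a stripe line must have the same size")
    # zip(*pages) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*pages))

def recover_page(surviving_pages, parity):
    """Rebuild one lost data page from the surviving data pages and the parity page."""
    return xor_parity(list(surviving_pages) + [parity])

# Three data stripes (tiny 8-byte "pages") and the parity stripe of the same stripe line.
data = [b"AAAAAAAA", b"BBBBBBBB", b"CCCCCCCC"]
parity = xor_parity(data)

# Pretend the page holding data[1] became unreadable and rebuild it.
rebuilt = recover_page([data[0], data[2]], parity)
```

Because a written physical page cannot be overwritten, an actual device would write the updated data and the regenerated parity to unused physical pages rather than in place.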

Here, the relationship between the physical address space (physical capacity) and the data arrangement in the SSD will be described with reference to FIG. 1 and FIG. In FIG. 1, the physical capacity 1001 is equal to the size of the logical capacity 1000, and the physical address space corresponding to the physical capacity 1001 is an area where write data from the storage controller 10 serving as the initiator can be stored. In other words, the physical address space is a space composed of data stripes in the SSD 21 and does not include the capacity of the parity stripe in FIG. Also, the ECC generated by the FM chip I / F 203 is not included in the physical address space. Therefore, the total of the physical capacities 1001 and 1002 is equal to the total capacity of the blocks 211 (or cells 214) used as data stripes among the blocks 211 in the SSD 21.

FIG. 10 is a diagram for explaining the configuration of the configuration information management table. The configuration information management table 1300 mainly stores information related to the capacity of the SSD 21. The configuration information management table 1300 includes columns of logical capacity 1301, block status 1302, number of blocks 1303, capacity 1304, spare capacity 1305, FM mode change threshold 1306, and average data compression ratio 1307.

The logical capacity 1301 indicates the size of the logical address space provided by the SSD. The block status 1302, the number of blocks 1303, and the capacity 1304 indicate what state the FM block is currently in. In the block status 1302, “normal (MLC mode)” indicates a block in a normal state and operating in the MLC mode, and “normal (TLC mode)” indicates a block in a normal state and operating in the TLC mode. “Failure (MLC mode)” indicates each block in a failure state. For example, the configuration information management table 1300 illustrated in FIG. 10 indicates that 1000 blocks are operating in the normal state and in the MLC mode, and the capacity thereof is 1500 GB.

The number of blocks 1303 and the capacity 1304 in the row in which the block status 1302 is “normal (MLC mode)” respectively store the number of blocks that are operating in the normal state and in the MLC mode, and the total capacity of the blocks. Similarly, the number of blocks 1303 and the capacity 1304 in the row in which the block status 1302 is “normal (TLC mode)” respectively store the number of blocks operating in the normal state and the TLC mode, and the total capacity of the blocks. Further, the number of blocks 1303 and the capacity 1304 in the row where the block status 1302 is “failed” respectively store the number of failed blocks and the total capacity of the blocks.

Hereinafter, the number of blocks 1303 and the capacity 1304 of the row in which the block status 1302 is “normal (MLC mode)” are referred to as “the number of blocks in the normal state (MLC mode)” and “the block capacity in the normal state (MLC mode)”, respectively. Similarly, the number of blocks 1303 and the capacity 1304 of the row whose block status 1302 is “normal (TLC mode)” are referred to as “the number of blocks in the normal state (TLC mode)” and “the block capacity in the normal state (TLC mode)”, respectively. The number of blocks 1303 and the capacity 1304 in the row whose block status 1302 is “failed” are referred to as “number of blocks in failure state” and “block capacity in failure state”, respectively.

Next, the spare capacity 1305 indicates the size of the spare capacity of the SSD. In the SSD 21 according to the first embodiment, assuming that the normal (MLC mode) block capacity is M, the normal (TLC mode) block capacity is T, and the logical capacity is L, the reserve capacity 1305 is a value calculated by the following equation (1).
Reserve capacity 1305 = M + T - L   (1)
In the SSD 21 according to the present embodiment, in the initial state, about 30% to 40% of the physical area of the SSD is secured as a spare capacity, as in the case of a general SSD. If some blocks fail and cannot be used while using the SSD 21, the value of M (or T) decreases. Since L is a constant value (the logical capacity does not change), the reserve capacity 1305 is reduced as a result.

The FM mode change threshold value 1306 is a value used by the SSD controller 200 to determine that a block mode change is necessary. When the value of the spare capacity 1305 falls below the value of the FM mode change threshold value 1306, the SSD controller 200 determines that there is a risk that the spare blocks will be exhausted, and changes the mode of some of the MLC mode blocks in the FM.

The average data compression rate 1307 is information indicating how much the data is compressed when the data compression function is enabled. However, since the first embodiment describes an example in which data compression is not performed, “N / A”, indicating an invalid state, is stored in FIG. 10.

In the example of the configuration information management table 1300 shown in FIG. 10, the SSD 21 currently has a total physical capacity of 1875 GB, of which 150 GB is in a failure state and the remaining 1725 GB is in a normal state; of the latter, 1500 GB is operating in MLC mode and 225 GB is operating in TLC mode. The SSD 21 provides a capacity of 1000 GB to the user or the storage controller 10 as a logical capacity, and has a reserve capacity of 725 GB.
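The capacity figures of FIG. 10 can be checked with a minimal Python sketch that computes the reserve capacity as M + T - L. The class `ConfigInfo`, its field names, and the threshold value of 200 GB are hypothetical; this description gives no concrete value for the FM mode change threshold 1306.

```python
from dataclasses import dataclass

@dataclass
class ConfigInfo:
    """Hypothetical in-memory form of the configuration information management table (GB)."""
    logical_capacity: int          # L (logical capacity 1301)
    normal_mlc_capacity: int       # M (normal state, MLC mode)
    normal_tlc_capacity: int       # T (normal state, TLC mode)
    failed_capacity: int           # capacity of blocks in the failure state
    fm_mode_change_threshold: int  # assumed value; not given in the description

    def reserve_capacity(self) -> int:
        # Reserve capacity = M + T - L
        return self.normal_mlc_capacity + self.normal_tlc_capacity - self.logical_capacity

    def total_physical_capacity(self) -> int:
        return self.normal_mlc_capacity + self.normal_tlc_capacity + self.failed_capacity

    def mode_change_needed(self) -> bool:
        # A mode change is needed when the reserve falls below the threshold.
        return self.reserve_capacity() < self.fm_mode_change_threshold

fig10 = ConfigInfo(logical_capacity=1000, normal_mlc_capacity=1500,
                   normal_tlc_capacity=225, failed_capacity=150,
                   fm_mode_change_threshold=200)
```

With these values the sketch yields a reserve capacity of 725 GB and a total physical capacity of 1875 GB, matching the figures stated above.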

As described above, a part of the block 211 is used as a parity stripe. However, the physical capacity (capacity 1304, spare capacity 1305) shown in FIG. 10 represents the capacity of the block 211 used as the data stripe. Similarly, the number of blocks 1303 in FIG. 10 also represents the number of blocks 211 used as data stripes. However, as another embodiment, the SSD 21 may record the capacity of all blocks 211 in the SSD 21 in the capacity 1304 and the reserve capacity 1305, and record the number of all blocks 211 in the SSD 21 in the number of blocks 1303.

FIG. 11 is a diagram for explaining the configuration of the logical-physical conversion table. The logical / physical conversion table 1100 is a table for managing the mapping between logical pages and physical pages managed by the SSD 21. The SSD 21 employs a flash memory as a storage medium. As is well known, the minimum access (read, write) unit of the flash memory (FM chip 210) is a page (physical page). The size of the physical page is, for example, 8 KB. For this reason, the SSD 21 according to the first embodiment manages the logical address space provided to the storage controller 10 by dividing the logical address space into areas of the same size as the physical page. An area having the same size as the physical page is called a “logical page”. The SSD 21 according to the first embodiment maps one physical page to one logical page.

The SSD 21 according to the present embodiment manages each block in all the FM chips 210 with a unique identification number in the SSD 21, and this identification number is called a block number (block #). Each physical page in the block is managed with a unique number in the block, and this number is called a physical page number (or physical page #). By specifying the block # and the physical page #, the physical page in the SSD 21 is uniquely specified.

Further, the SSD 21 according to the present embodiment manages each logical page in the SSD 21 by assigning a unique identification number in the SSD. This identification number is called a logical page number (logical page #). The logical-physical conversion table 1100 stores information on block # and physical page # of a physical page mapped to a certain logical page for each logical page.

The logical-physical conversion table 1100 has columns of SSD LBA 1101, logical page # 1102, status 1103, block # 1104, and physical page # 1105, as shown in FIG. Each record of the logical-physical conversion table 1100 stores information about the logical page specified by the logical page # 1102. The SSD LBA 1101 stores the LBA (range) on the logical address space provided by the SSD 21 to the storage controller 10 corresponding to the logical page. When the SSD 21 receives an access request from the storage controller 10, the SSD 21 can convert the LBA included in the access request into a logical page # using the SSD LBA 1101 and the logical page # 1102. In block # 1104 and physical page # 1105, information for specifying the physical page mapped to the logical page (that is, block # and physical page #) is stored.

Status 1103 stores information indicating whether a physical page is mapped to a logical page. No physical page is mapped to the logical page of the SSD 21 in the initial state. When a write request is received from the storage controller 10, a physical page is mapped to a logical page to be written by the write request. When “allocated” is stored in the status 1103, it indicates that the physical page is mapped to the logical page. Conversely, when “unallocated” is stored in the status 1103, it means that the physical page is not mapped to the logical page (at this time, NULL (an invalid value) is stored in the block # 1104 and the physical page # 1105 corresponding to the logical page).

As is well known, a physical page once written cannot be overwritten (if it is desired to overwrite the physical page, the entire block to which the physical page belongs needs to be erased once). Therefore, in the SSD 21, when an update (overwrite) request for a certain logical page is received from the storage controller 10, the update data is stored in a physical page (referred to as a new physical page) different from the physical page in which the pre-update data is written (referred to as an old physical page). Then, the block # and physical page # of the new physical page are stored in block # 1104 and physical page # 1105 corresponding to the logical page to be updated.
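The out-of-place update described above can be illustrated with a minimal Python sketch of the logical-physical mapping. The class `LogicalPhysicalTable`, the 512-byte LBA granularity, and the `invalid_pages` list (old physical pages waiting for a block erase) are assumptions made for illustration and are not taken from this description; only the 8 KB page size comes from the text.

```python
PAGE_SIZE = 8 * 1024   # physical / logical page size given as an example in the text
SECTOR_SIZE = 512      # assumed size of one LBA; not stated in the description

class LogicalPhysicalTable:
    def __init__(self, free_pages):
        self.mapping = {}                   # logical page # -> (block #, physical page #)
        self.free_pages = list(free_pages)  # unallocated physical pages
        self.invalid_pages = []             # old physical pages awaiting block erase

    @staticmethod
    def logical_page_of(lba):
        """Convert an LBA on the logical address space into a logical page #."""
        return lba * SECTOR_SIZE // PAGE_SIZE

    def write(self, lba):
        """Map the logical page to a new physical page; never overwrite in place."""
        logical_page = self.logical_page_of(lba)
        new_page = self.free_pages.pop(0)
        old_page = self.mapping.get(logical_page)
        if old_page is not None:
            self.invalid_pages.append(old_page)
        self.mapping[logical_page] = new_page
        return new_page

table = LogicalPhysicalTable([(0, 0), (0, 1), (1, 0)])
first = table.write(lba=16)    # LBA 16 falls in logical page 1
second = table.write(lba=16)   # the update lands on a different physical page
```

After the second write, logical page 1 points to the new physical page, and the old physical page remains unusable until its block is erased.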

FIG. 12 is a diagram illustrating the configuration of the block management table. The block management table 1150 is a table for managing the states of blocks and physical pages. Each record in the block management table 1150 stores information about a physical page in the SSD 21. The block management table 1150 has columns of block # 1151, FM cell mode 1152, physical page # 1153, status 1154, and erase count 1155.

Block # 1151, physical page # 1153, and status 1154 hold the same information as block # 1104, physical page # 1105, and status 1103 in the logical-physical conversion table 1100, respectively. That is, when a physical page is allocated to a logical page, the block # and physical page # of the allocated physical page are stored in block # 1104 and physical page # 1105 of the logical-physical conversion table 1100, and “allocated” is stored in the status 1103. At the same time, “allocated” is also stored in the status 1154 (in the block management table 1150) of the allocated physical page.

Also, when a failure occurs in a block and the physical page in the block cannot be accessed, the SSD 21 manages the block as an unusable block. Therefore, the SSD 21 stores “failure (blocked)” in the status 1154 of each physical page belonging to the block.

The FM cell mode 1152 is information indicating which mode the cells of the block are in, and stores either “TLC” or “MLC”. The erase count 1155 stores the cumulative number of block erasures.

Hereafter, the flow of each process will be described.

FIG. 13 is a flowchart of the storage controller task. The storage controller task is realized by the CPU 11 of the storage controller 10 executing a storage control program. The storage controller 10 periodically executes this task process. In the following, there is a process described with the storage controller 10 as the subject, but this means that it is executed by the CPU 11 unless otherwise specified.

The storage controller 10 determines whether a read or write request has been received from the host computer 2 (S10). If no request has been received (S10: No), S20 is performed next.

When a read or write request has been received (S10: Yes), the storage controller 10 determines whether this request is a read or a write (S40). If this request is a read (S40: Read), the storage controller 10 executes a read process (S50), and then executes S20. Details of the read process will be described later (see FIG. 15). If this request is a write (S40: write), the storage controller 10 executes a write process (S60), and then executes S20. Details of the write processing will be described later (see FIG. 14).

Next, the storage controller 10 executes a drive monitoring process (S20), and then makes a determination in S30. Details of the drive monitoring process will be described later (FIG. 20).

In S30, the storage controller 10 determines whether or not a request for stopping the storage apparatus 1 has been received (S30). When the stop request has been received (S30: Yes), the storage controller 10 executes the stop process of the storage apparatus 1 and ends the process (END). When the stop request has not been received (S30: No), the storage controller 10 repeats the process from S10 again.

FIG. 14 is a flowchart of storage controller write processing. Similar to the storage controller task, the write processing is also realized by the CPU 11 of the storage controller 10 executing the storage control program.

The host 2 transmits a write request and write data to the storage controller 10 (S61). When the storage controller 10 receives a write request from the host 2, the storage controller 10 starts a write process. First, the storage controller 10 refers to the virtual volume management table 500 and the pool management table 550 to determine whether or not a chunk has been allocated to a virtual chunk including the write destination address of the virtual volume specified by the write request (S62).

When the chunk is not assigned to the write-destination virtual chunk (S62: No), the storage controller 10 assigns the chunk to the virtual chunk (S63), and executes S64 after the chunk is assigned. When the chunk has been allocated to the write-destination virtual chunk (S62: Yes), the storage controller 10 executes S64 without performing S63.

In S64, the storage controller 10 stores the write data in the cache and generates parity (S64). The parity generated here is parity data of a parity stripe belonging to the same stripe line as the write data. Further, when it is necessary to read data or parity before update from the storage device (SSD 21) for parity generation, the processing is also performed.

After S64, the storage controller 10 transmits a write command and write data to the write destination storage device (S65). Here, in addition to the write data, the parity is also written to the storage device. Then, the storage controller 10 receives a write completion notification from the write destination storage device (S66). The storage controller 10 transmits a completion response to the write request to the host 2 (S67). The host 2 receives a completion response to the write request from the storage controller 10 (S68), and ends the processing (END).

Note that the flow of the write process described here is an example, and each step may be executed in a different order. For example, in the processing described above, after write data and parity are written to the storage device, a completion response is transmitted to the host 2. However, as another embodiment, the storage controller 10 may transmit a completion response to the host 2 immediately after storing the write data in the cache (S64), and then transmit the write data and parity to the storage device.

Through the above processing, the storage apparatus 1 can store the write data transmitted from the host 2.

FIG. 15 is a flowchart of the storage controller read processing. The read process is also realized by the CPU 11 of the storage controller 10 executing a storage control program.

The host 2 sends a read request to the storage controller 10 (S51). When the storage controller 10 receives a read request from the host 2, the storage controller 10 starts a read process. First, the storage controller 10 specifies a virtual chunk including the address of the virtual volume specified by the read request, specifies a chunk assigned to the virtual chunk, and specifies the read destination storage device from among the storage devices constituting the chunk (S52). The storage controller 10 transmits a read command to the specified storage device (S53). Then, the storage controller 10 receives read data from the storage device (S54). The storage controller 10 stores the read data in the cache (S55). The storage controller 10 transmits a completion response to the read request and read data to the host 2 (S56). The host 2 receives a completion response and read data from the storage controller 10 (S57), and ends the processing (END).

Through the above processing, the storage apparatus 1 can respond with read data in response to a read request from the host 2.

FIG. 16 is a flowchart of the SSD controller task. The SSD controller task is performed by the CPU 201 of the SSD controller 200 executing the SSD control program. The SSD controller task is executed periodically. In the following, each process will be described with the SSD controller 200 as the subject. However, unless otherwise specified, it means that each process is executed by the CPU 201.

The SSD controller 200 first performs a process for a read or write request from the storage controller 10 as an initiator (S100). Details of the read / write processing will be described later (FIG. 17). Next, FM diagnosis processing for checking the presence or absence of a block failure is performed (S120), then an FM depletion recovery process for restoring the spare area as necessary is executed (S140), and the determination of S160 is performed. Details of the FM diagnosis process will be described later (FIG. 18). Further, the FM depletion recovery process will be described later (FIG. 19).

In S160, the SSD controller 200 determines whether or not a request to stop the SSD 21 has been received from the storage controller 10. When the stop request has been received (S160: Yes), the SSD controller 200 executes the stop process of the SSD 21 and ends the process (END). When the stop request has not been received (S160: No), the SSD controller 200 repeats the process from S100 again.

Through the above processing, the SSD 21 can store the write data transmitted from the storage controller 10 and read the read data. Further, the state of the FM chip 210 can be monitored, and the recovery process of the spare capacity can be executed according to the result.

FIG. 17 is a flowchart of data read / write processing of the SSD controller. Similar to the SSD controller task, the CPU 201 of the SSD controller 200 executes the SSD control program, thereby realizing data read / write processing. In order to avoid complicated description, an example will be described here in which the data access size specified by the access request (read request or write request) issued by the storage controller 10 to the SSD 21 is the size of one logical page, and the range on the logical address space specified by the access request matches the logical page boundary.

The SSD controller 200 determines whether a read or write request has been received from the storage controller 10 as an initiator (S101). If the request has not been received (S101: No), the SSD controller 200 ends this process. When the request from the storage controller 10 is received (S101: Yes), the SSD controller 200 determines the content of the request (S102).

If this request is a read command (S102: read command), the SSD controller 200 transfers data from the physical page storing the read target data to the buffer area based on the information in the logical-physical conversion table 1100 (S103). Next, the SSD controller 200 determines whether or not the read data has been read normally based on the error detection result of the ECC circuit included in the FM chip I / F 203 (S104). If the data could not be read normally in S104, the SSD controller 200 determines that the data is in an uncorrectable error state, in which error correction by ECC is impossible (S104: Yes), and proceeds to S105. On the other hand, when the data can be read normally in S104 (S104: No), the SSD controller 200 transfers the data in the buffer area to the storage controller 10 (S117), and ends the series of processing.

In S105, the SSD controller 200 determines that the block to which the physical page storing the read target data belongs is in a failure state, and closes the block (S105). Next, the SSD controller 200 performs data recovery processing of the block using the parity data (S106), and transfers the read target data from the recovered data to the storage controller 10 (S108). Next, in order to store the recovered data, the SSD controller 200 secures one block of physical pages whose status 1154 is unallocated based on the information in the block management table 1150, and stores the recovered data in the secured block (S109).

After S109, the SSD controller 200 performs an update operation of various tables (the logical physical conversion table 1100, the block management table 1150, and the configuration information management table 1300) (S110). In particular, in S110, the SSD controller 200 adds 1 to the number of failed blocks 1303 in the configuration information management table 1300, and adds the capacity of one block to the failed block capacity 1304. When the blocked block is in the MLC mode, the SSD controller 200 subtracts 1 from the number of blocks 1303 in the normal state (MLC mode) and subtracts the capacity for one block from the block capacity 1304 in the normal state (MLC mode). (When a block in the TLC mode is blocked, the number of blocks 1303 in the normal state (TLC mode) and the block capacity 1304 in the normal state (TLC mode) are subtracted). Furthermore, the SSD controller 200 updates (subtracts) the reserve capacity 1305. After S110, the SSD controller 200 ends the process.

On the other hand, when the request received in S102 is a write command (S102: write command), the SSD controller 200 first stores the write target data in the buffer area (S111), and then, based on the information in the block management table 1150, specifies a physical page whose status 1154 is unallocated and stores the data held in the buffer area in the specified physical page (S113). Furthermore, the SSD controller 200 generates the parity of the same stripe line as the physical page in which the data is stored, selects one unallocated physical page to store the generated parity, and writes the parity data to the selected physical page (S114). Thereafter, the SSD controller 200 returns a notification (response) of the completion of processing related to the write command to the storage controller 10 (S115).

After S115, the SSD controller 200 performs S110. When S110 is performed after S115, the SSD controller 200 updates the logical-physical conversion table 1100 and the block management table 1150 in S110.
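The bookkeeping performed on the configuration information management table 1300 in S110 when a block is closed can be sketched in Python as follows. The dictionary layout, the function name `record_block_failure`, the per-block capacity of 1.5 GB (derived from 1000 MLC blocks totalling 1500 GB in FIG. 10), and the block counts of the TLC and failed rows are assumptions made for illustration.

```python
def record_block_failure(table, mode, block_capacity_gb):
    """Apply the S110 update for one closed block; mode is "mlc" or "tlc"."""
    row = table["normal_" + mode]
    # Move one block and its capacity from the normal row to the failed row.
    row["blocks"] -= 1
    row["capacity_gb"] -= block_capacity_gb
    table["failed"]["blocks"] += 1
    table["failed"]["capacity_gb"] += block_capacity_gb
    # Recompute the reserve capacity as M + T - L.
    table["reserve_gb"] = (table["normal_mlc"]["capacity_gb"]
                           + table["normal_tlc"]["capacity_gb"]
                           - table["logical_gb"])
    return table["reserve_gb"]

table = {
    "logical_gb": 1000.0,
    "normal_mlc": {"blocks": 1000, "capacity_gb": 1500.0},
    "normal_tlc": {"blocks": 100, "capacity_gb": 225.0},  # block count assumed
    "failed": {"blocks": 100, "capacity_gb": 150.0},      # block count assumed
    "reserve_gb": 725.0,
}
new_reserve = record_block_failure(table, "mlc", 1.5)
```

In this example, closing one MLC mode block lowers the reserve capacity from 725 GB to 723.5 GB while the logical capacity stays constant.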

FIG. 18 is a flowchart of the FM diagnosis process of the SSD controller. Similar to the SSD controller task, the CPU 201 of the SSD controller 200 executes the SSD control program, thereby realizing FM diagnosis processing.

The SSD controller 200 determines whether there is a block that needs to be diagnosed (S121). For example, this processing may be performed periodically at specific time intervals (cycles), when a specific command such as an execution instruction from the storage controller 10 is received, or when a specific event occurs, such as when the number of block erasures or the number of page accesses reaches N. When it is determined that the diagnosis is unnecessary (S121: No), the SSD controller 200 ends the process (END). If it is determined that the diagnosis is necessary (S121: Yes), the SSD controller 200 selects a block to be inspected (S122), and reads the data of the pages in that block (S123). Thereafter, S124 is performed. In S123, the SSD controller 200 does not need to read all pages. For example, it may read only the physical pages allocated to logical pages, or only specific physical pages such as those with even or odd physical page numbers.

In S124, the SSD controller 200 determines whether there is a page in which an uncorrectable error has occurred due to a sudden hardware failure or the like. If there is a page where an uncorrectable error has occurred (S124: Yes), the SSD controller 200 closes the block (S127). Next, the parity data is used to recover the data in the block (S128), and the recovered data is stored in another block (S129). Thereafter, various tables are updated (S130). The processing of S127 to S130 is the same as S105 to S110 except that data transmission to the storage controller 10 (S108) is not performed. Thereafter, S121 is performed again.

On the other hand, when there is no page in which an uncorrectable error has occurred in S124 (S124: No), the SSD controller 200 executes S125. In S125, the SSD controller 200 determines whether or not the number of error bits generated on the inspection target page is greater than a predetermined threshold. If the number of generated error bits is not greater than the predetermined threshold value in S125 (S125: No), the SSD controller 200 performs the process from S121 again. On the other hand, when the number of generated error bits is larger than the predetermined threshold (S125: Yes), the SSD controller 200 performs a refresh process on the block (S126), and then executes S130. Note that “refresh” means a process of reading data stored in a physical page (or block) and moving it to another physical page (block). Therefore, in S130, the logical / physical conversion table 1100 and the block management table 1150 are updated. However, when S130 is executed after S126, no block has been closed, so the configuration information management table 1300 is not updated.
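The branching of S124 and S125 can be summarized in a minimal Python sketch. The `Action` enumeration, the function name `diagnose_block`, and the representation of each read page as a pair (uncorrectable flag, number of error bits) are hypothetical.

```python
from enum import Enum

class Action(Enum):
    NONE = "none"                            # S125: No, return to S121
    REFRESH = "refresh"                      # S126, then S130
    CLOSE_AND_RECOVER = "close_and_recover"  # S127 to S130

def diagnose_block(page_results, error_bit_threshold):
    """Decide the action for one block from the pages read in S123.

    page_results is a list of (uncorrectable, error_bits) tuples, one per page read.
    """
    # S124: any page with an uncorrectable error causes the block to be closed.
    if any(uncorrectable for uncorrectable, _ in page_results):
        return Action.CLOSE_AND_RECOVER
    # S125: refresh when the error-bit count of a page exceeds the threshold.
    if any(bits > error_bit_threshold for _, bits in page_results):
        return Action.REFRESH
    return Action.NONE
```

For example, `diagnose_block([(False, 3), (False, 12)], error_bit_threshold=10)` selects the refresh path, because no page is uncorrectable but one page exceeds the threshold.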

FIG. 19 is a flowchart of the FM depletion recovery process of the SSD controller. Similar to the SSD controller task, the CPU 201 of the SSD controller 200 executes the SSD control program, thereby realizing FM depletion recovery processing.

The SSD controller 200 refers to the configuration information management table 1300 to determine whether or not the spare capacity is almost exhausted (S141). Specifically, when the value of the reserve capacity 1305 is below the FM mode change threshold value 1306, the SSD controller 200 determines that the reserve capacity is almost exhausted. If it is determined in S141 that the depletion is not near (S141: No), the process ends.

On the other hand, if it is determined that the depletion is near (S141: Yes), the SSD controller 200 determines, based on the block management table 1150, whether there is a block whose cell mode can be changed (S142). If every block whose status 1154 is not “failure” has “TLC” as its FM cell mode 1152 (S142: No), the SSD controller 200 determines that there is no cell whose mode can be changed and ends the process. On the other hand, when there is a block whose status 1154 is not “failure” and whose FM cell mode 1152 is “MLC” (S142: Yes), the SSD controller 200 determines that there is a cell whose mode can be changed, and executes S143.

In S143, the SSD controller 200 expands the physical capacity by changing the cell mode of some blocks from MLC to TLC. Here, the SSD controller 200 can arbitrarily determine the number of blocks whose cell mode is changed. However, at least the number of blocks necessary for the reserve capacity 1305 to exceed the FM mode change threshold 1306 is changed from the MLC mode to the TLC mode. Further, the cell mode of a block in use (a block having a physical page assigned to a logical page) may be changed. However, in that case, the SSD controller 200 changes the cell mode after moving the data of the physical pages allocated to logical pages, among the physical pages in the block, to unused physical pages. If the data is moved to an unused physical page, the mapping between the logical page and the physical page is changed, so that the logical-physical conversion table 1100, the block management table 1150, etc. are updated (in S144 described later).

Thereafter, the SSD controller 200 updates the configuration information management table 1300 and, if necessary, updates the logical-physical conversion table 1100 and the block management table 1150 (S144). Further, the SSD controller 200 notifies the storage controller 10 that the FM cell mode has been changed and that the cause is a block failure (S145), and ends this processing. The notification transmitted from the SSD 21 to the storage controller 10 in S145 is referred to as “mode change notification”. The mode change notification includes information included in the configuration information management table 1300. Therefore, the storage controller 10 can grasp the state of the SSD 21 in detail by receiving the mode change notification.
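The minimum number of blocks to be changed in S143 can be estimated with a minimal Python sketch. It assumes that an MLC mode block holds 1.5 GB (1000 blocks totalling 1500 GB in FIG. 10) and that the same block holds 1.5 times as much, 2.25 GB, in TLC mode (3 bits per cell instead of 2). The function name `blocks_to_convert`, the threshold value, and the choice to convert all remaining MLC blocks when even that cannot exceed the threshold are assumptions; this description does not specify the last case.

```python
import math

def blocks_to_convert(reserve_gb, threshold_gb, mlc_blocks,
                      mlc_block_gb=1.5, tlc_block_gb=2.25):
    """Return how many MLC blocks to switch to TLC so the reserve exceeds the threshold."""
    if reserve_gb >= threshold_gb:   # S141: No, reserve is not below the threshold
        return 0
    if mlc_blocks == 0:              # S142: No, nothing left to convert
        return 0
    gain_per_block = tlc_block_gb - mlc_block_gb
    deficit = threshold_gb - reserve_gb
    # Smallest count whose total gain is strictly greater than the deficit.
    needed = math.floor(deficit / gain_per_block) + 1
    return min(needed, mlc_blocks)   # assumed: cap at the available MLC blocks

count = blocks_to_convert(reserve_gb=190, threshold_gb=200, mlc_blocks=1000)
```

With a 10 GB deficit and a gain of 0.75 GB per block, 13 blocks would add only 9.75 GB, so the sketch selects 14 blocks, which raises the reserve to 200.5 GB.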

FIG. 20 is a flowchart of storage controller drive monitoring processing. Similar to the storage controller task, the drive monitoring process is also realized by the CPU 11 of the storage controller 10 executing the storage control program.

Storage controller 10 determines whether a mode change notification has been received from SSD controller 200 (S21). If no mode change notification has been received (S21: No), this process ends. If a mode change notification has been received (S21: Yes), the storage controller 10 determines whether there is room for expanding the physical capacity (S22).

When S22 is performed, the storage controller 10 receives the information of the configuration information management table 1300 from the SSD 21 (SSD controller 200) together with the mode change notification. Therefore, the storage controller 10 uses the information in the configuration information management table 1300 to determine whether there is room for expanding the physical capacity (whether the normal state (MLC mode) block 1303 is greater than 0). If there is a normal block that is operating in the MLC mode (S22: Yes), it is determined that there is still room for enlargement, and this process is terminated. On the other hand, if it is determined in S22 that there is no room for enlargement (S22: No), the storage controller 10 next executes S23.

In S23, the storage controller 10 determines that, because the relevant SSD 21 has no remaining room for avoiding depletion of the spare capacity, there is a very high risk that the spare capacity will be completely depleted if a further block failure occurs. The storage controller 10 therefore copies the data to the spare drive and blocks the relevant SSD (S23). A known storage apparatus that employs RAID technology can process access requests to a drive and move data to a spare drive in parallel (some such apparatuses can be controlled so that data corruption and access to incorrect data do not occur even when these two processes run in parallel). In the storage apparatus according to the present embodiment as well, an access request (read or write request) to the SSD 21 may be accepted in parallel with the processing of S23. Thereafter, this process is terminated.

Further, in the determination of S22, instead of determining whether the normal state (MLC mode) block 1303 is larger than 0, the storage controller 10 may determine whether it is larger than a predetermined threshold. Alternatively, when the determination of S21 is affirmative (when the mode change notification is received), the storage controller 10 may always execute S23 without performing the determination of S22. This is because an SSD that issues the mode change notification is likely to become unusable soon through exhaustion of its spare capacity.
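The drive monitoring flow of S21 to S23, including the threshold variant described above, can be sketched as follows. The function name, the dictionary key, and the returned action strings are hypothetical and serve only to illustrate the branching.

```python
def drive_monitoring(notification, mlc_threshold=0):
    """Return the action taken by the storage controller for one monitoring pass.

    notification: None if no mode change notification was received, otherwise a
    dict carrying the configuration information sent by the SSD (hypothetical layout).
    """
    if notification is None:                              # S21: No
        return "none"
    # S22: room for expansion remains while normal MLC-mode blocks exceed the threshold.
    if notification["normal_mlc_blocks"] > mlc_threshold:
        return "none"
    return "copy_to_spare_and_block_ssd"                  # S23
```

Setting `mlc_threshold` to 0 corresponds to the basic determination of S22, and a larger value corresponds to the variant that uses a predetermined threshold.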

The above is the description of the storage device (SSD) and the storage apparatus according to the first embodiment. In the SSD according to the present embodiment, when the reserve capacity decreases and falls below a predetermined threshold, some of the blocks operated in the MLC mode (or a mode in which each cell can store n-bit data) are changed so as to operate in the TLC mode (or a mode in which each cell can store m-bit (m > n) data), and the reserve capacity is thereby increased. As a result, it is possible to prevent a situation in which the SSD spare capacity is depleted and the SSD cannot be used.

In addition, when the SSD according to the present embodiment changes some of the blocks operated in the MLC mode to the TLC mode, the SSD transmits a mode change notification to the storage controller of the storage apparatus. When the storage controller receives this notification, it moves the data of the SSD that issued the notification to the spare drive. The SSD that issues the mode change notification, that is, the SSD that has changed the mode of its cells, has little spare capacity, and there is a high possibility that it cannot be used if a further failure occurs. Because the storage controller moves the data of the SSD that has issued the mode change notification to the spare drive, it is possible to prevent a situation in which the data cannot be accessed even if a further failure occurs.

Alternatively, when the spare capacity of the SSD falls below a predetermined threshold, the storage controller could immediately move the data to the spare drive without the cell mode being changed in the SSD. In that case, however, if a further failure occurs, the SSD may become unusable and it may no longer be possible to move the data to the spare drive. Therefore, as explained in the present embodiment, the data can be moved more reliably when the storage controller moves the data to the spare drive after the SSD has first changed the cell mode to secure a certain spare capacity.

Subsequently, the storage apparatus according to the second embodiment will be described. The configuration of the storage apparatus according to the second embodiment is the same as that described in the first embodiment. The hardware configuration of the SSD according to the second embodiment is almost the same as that described in the first embodiment. Therefore, each component of the storage apparatus or SSD is described below using the same terms (or reference numbers) as in the first embodiment.

Note that the SSD according to the first embodiment does not necessarily include the compression / decompression circuit 207, but the SSD according to the second embodiment always includes the compression / decompression circuit 207 or an equivalent thereof.

In this specification, “compression” means a process of reducing the data size while maintaining the meaning of the data using a reversible compression algorithm such as the LZW algorithm. When the SSD 21 receives the write data from the storage controller 10, it compresses the data using the compression / decompression circuit 207 and stores the compressed data in the FM 210. Data compressed by the compression / decompression circuit 207 is referred to as “compressed data”.

Also, the process of restoring compressed data to its original size using the lossless compression algorithm is called “decompression”. When the SSD 21 receives a read request from the storage controller 10, the SSD 21 reads the compressed data from the FM 210, decompresses the compressed data using the compression / decompression circuit 207, and returns the data to the storage controller 10. That is, in the storage device 1 according to the present embodiment, data compression and decompression are performed transparently to the host 2 and the storage controller 10.

Hereinafter, an outline of the second embodiment will be described with reference to the drawings. The compression rate in the present embodiment is defined as follows. When compression of certain data (with size x) reduces the data size to y (0 < y ≦ x), the compression rate is obtained by calculating “y ÷ x”. The higher the data compression efficiency and the smaller the data size after compression, the smaller the value of the compression rate (it takes a value close to 0). Conversely, the larger the data size after compression, the larger the value of the compression rate (it takes a value close to 1). Therefore, in this embodiment, a “small compression rate” means that the compression efficiency is good and the data size after compression is small, and a “high compression rate” means that the compression efficiency is poor and the data size after compression is not much smaller than the original size.
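As an illustrative sketch of this definition, the compression rate y ÷ x can be computed as follows. Python's zlib module (DEFLATE) is used here only as a stand-in for a reversible compression algorithm; it is not the algorithm of the compression / decompression circuit 207, and the function name is hypothetical. The input is assumed to be non-empty (x > 0).

```python
import zlib


def compression_rate(data: bytes) -> float:
    """Return y / x, where x is the original size and y is the compressed size."""
    compressed = zlib.compress(data)
    return len(compressed) / len(data)


# Highly repetitive data compresses well, so its rate is close to 0.
repetitive = b"A" * 4096
rate = compression_rate(repetitive)

# Reversible compression: decompression restores the original data exactly.
restored = zlib.decompress(zlib.compress(repetitive))
```

Data that is hard to compress would instead yield a rate close to 1, which in the terminology of this embodiment is a “high compression rate”.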

FIG. 21 is a diagram showing an outline of the second embodiment. Similar to FIG. 1, the change in the internal capacity of the SSD is expressed in time series. At time t1, the physical capacity 1021 is the size of the physical storage area required for allocation to the logical address space, and the physical capacity 1022 is a spare capacity. At time t1, the physical area corresponding to the physical capacities 1021 and 1022 is constructed in the MLC mode.

In the SSD according to the second embodiment, the logical capacity is determined in expectation that the size of data stored in the FM is reduced. Therefore, in the initial state, the logical capacity 1020 of the SSD is set to a value larger than the physical capacity 1021. This is because when the data actually stored in the physical capacity is reduced, an amount of data exceeding the physical capacity can be stored in the SSD.

In the SSD according to the second embodiment, the logical capacity is determined on the basis of the average compression rate obtained when general data is compressed by the compression / decompression circuit 207 (referred to as the “assumed compression rate” in this specification; the assumed compression rate is a predetermined constant). In the initial state, if the assumed compression rate of data is a, the logical capacity 1020 is equal to the physical capacity 1021 ÷ a. In FIG. 21, the state at time t1 is a state in which data is written in the entire logical address space (or almost the entire region) and the compression rate of each data is the assumed compression rate (a). At this time, an amount of physical area corresponding to the physical capacity 1021 is mapped to the entire logical address space.

However, a value other than the average compression rate may be used as the assumed compression rate. That is, the logical capacity may be determined based on an index other than the average value of the compression rate. For example, the logical capacity may be determined on the assumption that all data is compressed at the minimum compression rate (this is b) that can be realized by the compression / decompression circuit 207. In this case, the logical capacity is physical capacity 1021 / b. For example, when b = 0.25 (when all data is compressed to a quarter), the logical capacity is four times the physical capacity 1021.
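The relationship between the physical capacity 1021 and the logical capacity can be checked with a short calculation. The function name and the capacity values below are hypothetical and chosen only for illustration.

```python
def logical_capacity(physical_capacity, assumed_rate):
    """Logical capacity = physical capacity / assumed compression rate (0 < rate <= 1)."""
    if not 0 < assumed_rate <= 1:
        raise ValueError("the compression rate must satisfy 0 < rate <= 1")
    return physical_capacity / assumed_rate


# With b = 0.25, the logical capacity is four times the physical capacity.
at_minimum_rate = logical_capacity(1000, 0.25)
# With an assumed average rate a = 0.5, it is twice the physical capacity.
at_average_rate = logical_capacity(1000, 0.5)
```

A smaller assumed compression rate therefore yields a larger logical capacity for the same physical capacity.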

Time t2 represents a case in which the data already stored at t1 has been overwritten and the update data has changed to a data pattern that is difficult to compress (that is, the data compression rate has deteriorated). When the data compression rate deteriorates, the data size after compression increases. For this reason, the physical capacity required to store the user data increases from the physical capacity 1021 at t1 to the physical capacity 1024. Along with this, the physical capacity 1022 reserved as the spare capacity is reduced to the physical capacity 1025.

That is, in an SSD having a data compression function and a logical capacity expansion function, when the compression rate of stored data deteriorates, there is a risk that the reserve capacity is squeezed and eventually runs out. Therefore, in the second embodiment, by switching a part of the physical capacity 1024 or 1025 from the MLC mode to the TLC mode at t3, the reserve capacity is expanded and the risk of exhaustion is avoided.

Hereinafter, only the difference from Example 1 will be described.

FIG. 26 is a configuration example of the logical-physical conversion table 1100 ′ included in the SSD 21 according to the second embodiment. The SSD 21 according to the second embodiment includes a logical-physical conversion table 1100 'instead of the logical-physical conversion table 1100 (FIG. 11) included in the SSD 21 according to the first embodiment.

Among the columns of the logical / physical conversion table 1100 ′, the SSD LBA 1101 to physical page # 1105 are the same as the logical / physical conversion table 1100 described in the first embodiment. The logical / physical conversion table 1100 ′ further includes columns of Offset 1106 and Length 1107.

Offset 1106 represents a relative address (offset) where the top of the physical page is 0, and Length 1107 represents the length of the area. The unit of the value stored in Offset 1106 and Length 1107 is a byte. Thus, each row (record) of the logical-physical conversion table 1100 ′ indicates that, within the physical page specified by block # 1104 and physical page # 1105, an area of Length 1107 (bytes) starting at the position Offset 1106 (bytes) from the top of the physical page is allocated to the logical page specified by logical page # 1102.
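How a record with Offset and Length locates the compressed data of a logical page can be sketched as follows. The `Record` class, the `flash` dictionary standing in for the FM, and the function name are hypothetical; only the meaning of the block #, physical page #, Offset, and Length fields is taken from the description.

```python
from dataclasses import dataclass


@dataclass
class Record:
    block: int    # block #
    page: int     # physical page #
    offset: int   # Offset: byte position from the top of the physical page
    length: int   # Length: size of the allocated area in bytes


def read_compressed(record, flash):
    """Return the bytes of the area allocated to a logical page."""
    page_data = flash[(record.block, record.page)]
    return page_data[record.offset:record.offset + record.length]


# One physical page holding the compressed data of three logical pages.
flash = {(3, 7): b"AAAABBBBBBCC"}
second = read_compressed(Record(block=3, page=7, offset=4, length=6), flash)
```

Because several records can point into the same physical page with different offsets, one physical page can hold the compressed data of more than one logical page.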

FIG. 27 is a configuration example of the block management table 1150 ′ included in the SSD 21 according to the second embodiment. The SSD 21 according to the second embodiment has a block management table 1150 ′ instead of the block management table 1150 (FIG. 12) included in the SSD 21 according to the first embodiment. As shown in FIG. 27, the block management table 1150 ′ has columns of Offset 1156 and Length 1157 in addition to the columns of the block management table 1150 in the first embodiment.

Offset 1156 and Length 1157 are the same information as Offset 1106 and Length 1107 of the logical-physical conversion table 1100 ′. Offset 1156 represents the relative address (offset) when the top position of the physical page is 0, and Length 1157 represents the length of the area. The unit of the value stored in Offset 1156 and Length 1157 is a byte. Thus, each row of the block management table 1150 ′ indicates that, within the physical page specified by block # 1151 and physical page # 1153, the area specified by Offset 1156 and Length 1157 is allocated to a logical page.

Subsequently, the difference between the configuration information management table 1300 in the second embodiment and the first embodiment will be described. Since the format of the configuration information management table 1300 included in the SSD 21 according to the second embodiment is the same as that described in the first embodiment, the illustration is omitted. However, the definition of the reserve capacity 1305 in the second embodiment is different from that described in the first embodiment.

In the first embodiment, where M is the normal (MLC mode) block capacity, T is the normal (TLC mode) block capacity, and L is the logical capacity, the reserve capacity 1305 was determined by equation (1) (reserve capacity 1305 = M + T − L). On the other hand, in the second embodiment, the reserve capacity 1305 is the value calculated by the following equation (2).
Reserve capacity 1305 = Min(M + T − L × a, M + T − A) (2)
Note that a is the assumed compression rate, A is the total capacity of the physical pages allocated to the logical address space, and Min(α, β) means the smaller of α and β.

The meaning of equation (2) is outlined below. Similarly to the SSD according to the first embodiment, the SSD 21 according to the second embodiment secures an area of about 30% to 40% of the physical area of the SSD as a spare capacity in the initial state. In the initial state, there is no physical page already assigned to the logical address space, so A = 0. Therefore, the reserve capacity is equal to “M + T − L × a”. As described in the first embodiment, when some blocks fail and cannot be used while the SSD 21 is in use, the value of M (or T) decreases, and as a result the value of “M + T − L × a” (the reserve capacity 1305) decreases.

However, in the case of the SSD according to the second embodiment, even when no block failure occurs, “M + T − A” becomes smaller when the number of physical pages allocated to the logical address space increases (that is, when A used in equation (2) increases). One example in which the number of physical pages allocated to the logical address space increases is the case where the compression rate of the post-update data becomes large and exceeds the assumed compression rate. In that case, “M + T − A” may become smaller than “M + T − L × a”, and the reserve capacity 1305 decreases.
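Equation (2), read here as Min(M + T − L × a, M + T − A), can be sketched as a small function. The function name and the numerical values are hypothetical and only illustrate how the reserve capacity 1305 reacts to a block failure and to a deteriorating compression rate.

```python
def reserve_capacity(m, t, l, a, allocated):
    """Reserve capacity per equation (2).

    m, t      : normal MLC-mode and TLC-mode block capacities (M, T)
    l         : logical capacity (L)
    a         : assumed compression rate
    allocated : total capacity of physical pages allocated to the logical space (A)
    """
    return min(m + t - l * a, m + t - allocated)


# Initial state: A = 0, so the first term (M + T - L * a) is the smaller one.
initial = reserve_capacity(m=600, t=0, l=800, a=0.5, allocated=0)
# A block failure lowers M, which lowers the reserve.
after_failure = reserve_capacity(m=580, t=0, l=800, a=0.5, allocated=0)
# Poorly compressible updates raise A above L * a, so the second term takes over.
after_bad_compression = reserve_capacity(m=600, t=0, l=800, a=0.5, allocated=450)
```

In this example the reserve falls from 200 to 180 after the failure, and to 150 when the allocated physical capacity A grows to 450.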

FIG. 22 is a flowchart of the data read / write processing of the SSD controller according to the second embodiment. The main differences from FIG. 17 are that a data compression process (S112) is added before the data is stored in the FM in the write process, and that data decompression is added before the data is transferred to the storage controller 10 in the read process (S107 and S116). Also, when the contents of the logical-physical conversion table 1100 ′ or the block management table 1150 ′ are updated in S110, Offset 1106, Length 1107, Offset 1156, and Length 1157 are also updated. Further, when the reserve capacity 1305 is updated, the SSD controller 200 determines the reserve capacity 1305 based on equation (2) described above.

Also, when the write process (S111 to S115) is performed, the compression rate of the SSD 21 may fluctuate. Therefore, in the management information update process (S110) performed after S115, the SSD controller 200 obtains the data compression rate and updates the content of the average data compression rate 1307. The compression rate can be obtained, for example, by performing the following calculation. In the area on the logical address space, X is the number of logical pages to which physical pages are allocated, and Y is the number of physical pages allocated to logical pages. At this time, “Y ÷ X” is the compression rate. Both X and Y can be counted by referring to the logical-physical conversion table 1100 '.
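The calculation of “Y ÷ X” can be sketched as follows. The dictionary used for the table is a hypothetical simplification of the logical-physical conversion table 1100 ′, and counting each physical page once even when several logical pages share it is an assumption of this sketch.

```python
def average_compression_rate(table):
    """Return Y / X, or None when no logical page has a physical page allocated.

    table maps logical page # -> (block #, physical page #), or None if unallocated.
    """
    mapped = [location for location in table.values() if location is not None]
    if not mapped:
        return None
    x = len(mapped)        # X: logical pages that have a physical page allocated
    y = len(set(mapped))   # Y: distinct physical pages allocated to logical pages
    return y / x


# Logical pages 0 and 1 share physical page (1, 0); logical page 3 is unallocated.
table = {0: (1, 0), 1: (1, 0), 2: (1, 1), 3: None}
rate = average_compression_rate(table)
```

Here three logical pages occupy two physical pages, so the rate is two thirds.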

These are the differences between the data read / write process described in the first embodiment and the data read / write process in the second embodiment, and other processes are the same as those in FIG.

Note that the process illustrated in FIG. 22 is an example, and the processing may be executed by a procedure other than this. In particular, in the write processing procedure of FIG. 22, compressed data is written to a physical page each time a write command is received, so an unused area (an area where received data is not written) remains in the latter half of the physical page. Since the minimum write unit of the FM 210 is a physical page, data cannot be written into this unused area later. Therefore, the SSD controller 200 may accumulate the compressed write data in the buffer area and perform S115 (return of completion notification) before executing S113 and S114. Then, the SSD controller 200 may execute S113 and S114 when compressed data of one physical page or more has accumulated in the buffer area. Thereby, more compressed data can be stored in one physical page.
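The buffering variant described above can be sketched as follows. The class name, the page size, and the list standing in for the FM are hypothetical; letting compressed data span a page boundary is also an assumption of this sketch, since the description does not specify how the buffered data is packed.

```python
PAGE_SIZE = 16 * 1024  # hypothetical physical page size in bytes


class WriteBuffer:
    """Accumulate compressed write data and program only full physical pages."""

    def __init__(self, page_size=PAGE_SIZE):
        self.page_size = page_size
        self.pending = bytearray()   # compressed data not yet written to FM
        self.flushed_pages = []      # stands in for pages programmed into the FM

    def write(self, compressed):
        """Buffer the data, flush any full pages, and acknowledge the write."""
        self.pending += compressed
        while len(self.pending) >= self.page_size:
            self.flushed_pages.append(bytes(self.pending[:self.page_size]))
            del self.pending[:self.page_size]
        return "complete"            # corresponds to the completion notification


buf = WriteBuffer(page_size=8)
buf.write(b"abcde")   # 5 bytes buffered, nothing programmed yet
buf.write(b"fghij")   # 10 bytes buffered, one full 8-byte page is programmed
```

With this approach no unused tail is left in a programmed page, at the cost of holding acknowledged data in the buffer until a full page has accumulated.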

FIG. 23 is a flowchart of the FM depletion recovery process of the SSD controller according to the second embodiment. In the SSD 21 according to the second embodiment, the reserve capacity recovery process by changing the cell mode is triggered not only by the block failure described in the first embodiment but also by deterioration of the compression rate of stored data.

Therefore, the FM depletion recovery process according to the second embodiment differs from the process described in the first embodiment (FIG. 19) in that a process (S146) for determining the cause of the shortage of the spare capacity is added after the process of updating the internal information following the change of the FM cell mode (S144). In S146, the SSD controller 200 refers to the average data compression rate 1307 and the number of failed blocks 1303 (or failed block capacity 1304) in the configuration information management table 1300, and determines whether the exhaustion of the spare capacity is caused by the deterioration of the data compression rate or by the increase in the number of failed blocks. If the cause is the deterioration of the data compression rate (S146: deterioration of the data compression rate), the SSD controller 200 notifies the storage controller 10 to that effect (S147).

Subsequently, of the processes performed by the storage controller 10 according to the second embodiment, differences from the first embodiment will be described. In the second embodiment, when the storage controller 10 detects, in the drive monitoring process, an SSD in which the cell mode has been changed because of deterioration of the data compression rate, the storage controller 10 performs a capacity rebalancing process in which a part of the data of the SSD is moved to another SSD, in order to avoid depletion of the spare capacity due to further deterioration of the data compression rate. First, the outline will be described with reference to the drawings.

In this embodiment, capacity rebalancing is performed by means of moving data in chunk units, not SSD units. In the figure, RG30-1 is a RAID group to which the SSD that has changed the cell mode belongs, and has a large amount of data storage. On the other hand, the RG 30-2 is a RAID group with a small data storage amount. Therefore, the storage controller 10 reduces the data storage amount of the RG 30-1 by moving the data of the chunk 31 belonging to the RG 30-1 to the chunk 32 belonging to the RG 30-2. Along with this, a reduction in the data storage amount of the SSD belonging to RG 30-1 is realized. Note that the above-described chunk movement is performed transparently to the host 2 by the capacity virtualization function of the storage controller.

FIG. 25 is a flowchart of the drive monitoring process of the storage controller in the second embodiment.

The storage controller 10 determines whether or not a cell mode change notification has been received from the SSD controller 200 (S21). When the notification has not been received (S21: No), this process ends. If a notification has been received in S21 (S21: Yes), the storage controller 10 analyzes the cause (S24). If the cause is a block failure (S24: block failure), the storage controller 10 executes S22 and S23, and ends this process after executing S23. Since the processing contents of S22 and S23 have already been described in the first embodiment, description thereof is omitted here.

On the other hand, in S24, when the cause is the deterioration of the data compression rate (S24: deterioration of the data compression rate), the storage controller 10 determines whether or not the capacity rebalancing can be executed (S25). More specifically, in S25, the storage controller 10 refers to the pool management table 550 and determines whether there is a RAID group with sufficient capacity (a RAID group whose remaining capacity 556 is equal to or greater than a predetermined threshold value). If another RAID group with sufficient capacity exists (S25: Yes), the storage controller 10 moves some chunks of data to the RAID group with sufficient capacity (S26), and ends this process. On the other hand, in S25, when there is no other RAID group with sufficient capacity (S25: No), the storage controller 10 displays a notification requesting capacity addition to the pool on the display screen of the management host 5 and prompts the user to add a storage area to the pool (S27). After the user adds a storage area (RAID group) to the pool, the storage controller 10 moves some chunks of data to the storage area added to the pool (S28). After S28, this process ends.
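The branching of S21 and S24 to S27 can be sketched as follows. The function name, the `cause` key of the notification, the returned action tuples, and the choice of the RAID group with the largest remaining capacity as the destination are all hypothetical; the description only requires a RAID group whose remaining capacity 556 is at or above the threshold.

```python
def drive_monitoring_v2(notification, remaining_capacity, source_rg, threshold):
    """Return (action, target RAID group) for one monitoring pass.

    remaining_capacity maps RAID group name -> remaining capacity, a simplified
    stand-in for the pool management table 550.
    """
    if notification is None:                           # S21: No
        return ("none", None)
    if notification["cause"] == "block_failure":       # S24: block failure
        return ("run_s22_s23", None)
    # S25: look for another RAID group with sufficient remaining capacity.
    candidates = {rg: cap for rg, cap in remaining_capacity.items()
                  if rg != source_rg and cap >= threshold}
    if candidates:
        target = max(candidates, key=candidates.get)   # assumed selection policy
        return ("move_chunks", target)                 # S26
    return ("request_capacity_addition", None)         # S27 (S28 follows the addition)


pool = {"RG30-1": 10, "RG30-2": 500}
decision = drive_monitoring_v2({"cause": "compression_rate"}, pool, "RG30-1", 100)
```

In this example RG30-2 has enough remaining capacity, so chunks are moved there without asking the user to extend the pool.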

The embodiments of the present invention have been described above, but they are examples for describing the present invention and are not intended to limit the scope of the present invention to these embodiments. That is, the present invention can be implemented in various other forms. For example, in the above embodiments, the configuration example in which the SSD is mounted on the storage apparatus and connected to the storage controller has been described. However, the SSD may instead be mounted directly on a server.

Further, in the above embodiment, the case where only the physical capacity is increased when the cell mode is changed has been described. However, in addition to the physical capacity, the logical capacity may be increased simultaneously.

In the second embodiment, the SSD compresses data using the compression / decompression circuit 207 and reduces the amount of data stored in the FM. However, instead of providing the compression / decompression circuit 207, the CPU 201 may be configured to perform data compression or decompression using the lossless compression algorithm.

In the above embodiment, data compression processing using a lossless compression algorithm has been described as an example of a function for reducing the amount of data stored in the FM. Instead of using a lossless compression algorithm, the amount of data stored in the FM may be reduced by, for example, performing deduplication processing. Deduplication is a technique for reducing the amount of data by searching the entire storage area for identical data and deleting all but one copy, and can be regarded as data compression processing in a broad sense. In that case, the deterioration of the data compression rate may be read as the deterioration of the deduplication rate. That is, when the deduplication rate of data already stored in the SSD deteriorates, the size of the data after deduplication increases, so that the reserve capacity is squeezed as in the case of compression. Further, the SSD may be configured to further reduce the amount of data stored in the FM by using both the deduplication process and the compression process using the lossless compression algorithm.

Each program (storage control program or SSD control program) that causes the CPU to execute the processing described above may be provided by a program distribution server or a computer-readable storage medium and installed in each device that executes the program. The computer-readable storage medium is a non-transitory computer-readable medium, for example, a non-volatile storage medium such as an IC card, an SD card, or a DVD.

1: storage device, 2: host, 3: SAN, 5: management host, 10: storage controller, 11: CPU, 12: host I / F, 13: disk I / F, 14: memory, 15: management I / F, 21: SSD, 25: HDD, 200: SSD controller, 201: CPU, 202: disk I / F, 203: FM chip I / F, 204: memory, 205: internal connection switch, 206: parity calculation circuit, 207: compression / decompression circuit, 210: FM chip, 211: block, 213: die, 214: cell

Claims

A storage device connected to the storage controller,
The storage device includes a device controller that provides a logical storage space of a predetermined size to the storage controller, and a nonvolatile semiconductor storage medium having a plurality of blocks that are data erasure units,
Each of the blocks has a plurality of cells for storing data, and each block can be changed from a state in which its cells operate in a first mode capable of storing n-bit information to a state in which its cells operate in a second mode capable of storing m-bit (n < m) information,
The device controller manages, as a spare capacity, the portion of the usable storage area in the blocks that exceeds the amount of storage area in the blocks necessary for allocation to the logical storage space,
When the reserve capacity falls below a predetermined threshold, the device controller increases the usable storage area in the blocks by changing some of the blocks operating in the first mode so that they operate in the second mode,
Storage device.
When the block becomes unusable due to a failure, the reserve capacity is reduced by an amount corresponding to the storage capacity of the block that has become unusable.
The storage device according to claim 1.
The device controller provides the logical storage space having a size exceeding the sum of the storage areas of the blocks;
When the device controller receives a write command and write data from the storage controller, the device controller allocates a storage area in the block to the position on the logical storage space specified by the write command, and stores the write data in the storage area in the block allocated to that position on the logical storage space,
The device controller manages, as the spare capacity, a difference between a total of usable storage areas in the block and an amount of the storage area in the block allocated to the logical storage space.
The storage device according to claim 1.
The device controller is configured to compress the write data and store the compressed data of the write data in the storage area in the block.
The storage device according to claim 3.
The device controller notifies the storage controller that a part of the block being operated in the first mode has been changed to be operated in the second mode;
The storage device according to claim 1.
A storage controller and a plurality of storage devices connected to the storage controller;
The storage device includes a device controller that provides a logical storage space of a predetermined size to the storage controller, and a nonvolatile semiconductor storage medium having a plurality of blocks that are data erasure units,
Each of the blocks has a plurality of cells for storing data, and each block can be changed from a state in which its cells operate in a first mode capable of storing n-bit information to a state in which its cells operate in a second mode capable of storing m-bit (n < m) information,
The device controller manages, as a spare capacity, the portion of the usable storage area in the blocks that exceeds the amount of storage area in the blocks necessary for allocation to the logical storage space,
When the reserve capacity falls below a predetermined threshold, the device controller increases the usable storage area in the blocks by changing some of the blocks operating in the first mode so that they operate in the second mode,
Storage device.
When the block becomes unusable due to a failure, the reserve capacity is reduced by an amount corresponding to the storage capacity of the block;
The storage apparatus according to claim 6.
The device controller provides the logical storage space having a size exceeding the sum of the storage areas of the blocks;
When the device controller receives a write command and write data from the storage controller, the device controller allocates a storage area in the block to the position on the logical storage space specified by the write command, and stores the write data in the storage area in the block allocated to that position on the logical storage space,
The device controller manages, as the spare capacity, a difference between a total of usable storage areas in the block and an amount of the storage area in the block allocated to the logical storage space.
The storage apparatus according to claim 6.
The device controller is configured to compress the write data and store the compressed data of the write data in the storage area in the block.
The storage device according to claim 8.
When the storage controller receives from the storage device that a part of the block being operated in the first mode has been changed to be operated in the second mode,
Moving all the data stored in the storage device to an alternative storage device among the plurality of storage devices;
The storage apparatus according to claim 6.
The storage controller comprises at least a first RAID group and a second RAID group using a plurality of the storage devices,
When the storage controller receives from the storage device belonging to the first RAID group that a part of the block being operated in the first mode has been changed to be operated in the second mode,
Moving a portion of the data stored in the first RAID group to the second RAID group;
The storage device according to claim 9.
The storage controller provides a virtual volume composed of a plurality of partitions to a host computer, and each of the partitions is configured to be assigned a storage area of the first or second RAID group when a write request is received from the host computer,
When the storage controller receives from the storage device belonging to the first RAID group that a part of the block being operated in the first mode has been changed to be operated in the second mode, Moving the data in the storage area of the first RAID group assigned to the partition to a storage area not assigned to the virtual volume in the storage areas of the second RAID group;
The storage apparatus according to claim 11.