WO2015173925A1 - Storage device - Google Patents

Storage device

Info

Publication number
WO2015173925A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
storage
stored
storage medium
compressed
Application number
PCT/JP2014/062959
Other languages
French (fr)
Japanese (ja)
Inventor
Akifumi Suzuki (鈴木 彬史)
Yoshihiro Yoshii (吉井 義裕)
Kazue Hironaka (弘中 和衛)
Akira Yamamoto (山本 彰)
Original Assignee
Hitachi, Ltd. (株式会社日立製作所)
Application filed by Hitachi, Ltd.
Priority to PCT/JP2014/062959
Publication of WO2015173925A1

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 — Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 — Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 12/00 — Accessing, addressing or allocating within memory systems or architectures

Definitions

  • The present invention relates to a storage apparatus that uses a semiconductor recording device as a primary data storage device, and to a device control method.
  • Storage devices generally include a component called a cache for the purpose of improving request-processing performance (hereinafter simply referred to as performance).
  • The cache plays two major roles in the storage device. The first role is to hold data from areas with relatively high read/write access frequency in the cache, thereby improving the average performance of the storage apparatus.
  • The second role is to temporarily store write data when a write request is issued from the server to the storage apparatus.
  • NVM: nonvolatile semiconductor memory
  • FM: NAND-type flash memory
  • Storage devices are required to store data at low cost while maintaining the reliability of data retention.
  • For this purpose, storage apparatuses that record data using lossless compression (hereinafter simply referred to as compression) are known.
  • Compression: lossless compression
  • Data retention cost: bit cost of the storage medium, power consumption cost of the storage device, etc.
  • A storage apparatus having such a compression function generally conceals from the server, to which it provides a storage area, the fact that recorded data is compressed, and provides the storage area as if the data were recorded uncompressed. With this function, the user can enjoy the lower retention cost brought by compression without changing existing software such as applications and operating systems.
  • A storage apparatus having a data compression function and a function for concealing the data changes caused by compression must associate the virtual uncompressed storage area provided to the server (hereinafter, the virtual uncompressed volume) with the physical area that is the recording destination of the compressed data.
  • In general, the compression ratio changes depending on the data content.
  • The data size after compression depends on the content of the data to be compressed and can only be determined heuristically, by actually compressing the data. Accordingly, the association between the virtual uncompressed volume and the physical recording destination of the compressed data changes dynamically every time the recorded data changes.
  • The storage device therefore manages the correspondence by dividing the virtual uncompressed volume into fixed-size areas, and updates the information managing this dynamically changing correspondence every time a data update and the accompanying data compression are completed.
  • Hereinafter, the information for managing this correspondence relationship is referred to as compression management information; one entry is sketched below.
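  • As a rough illustration (not taken from the patent text), one entry of such compression management information might look like the following C sketch; all field names and widths are hypothetical assumptions.

```c
#include <stdint.h>

/* Hypothetical sketch of one compression-management entry: it maps one
 * fixed-size area of the virtual uncompressed volume to the physical
 * location of its compressed image. Field names/widths are assumptions. */
typedef struct {
    uint64_t virtual_lba;    /* start of the fixed area in the virtual volume */
    uint32_t medium_id;      /* final storage medium holding the compressed data */
    uint64_t physical_lba;   /* recording-destination address on that medium */
    uint32_t comp_sectors;   /* compressed size, in 512-byte sectors */
} comp_map_entry_t;
```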
  • Compression management information is generally larger than the other management information managed by the storage apparatus.
  • For example, suppose an 800 TB virtual uncompressed volume is provided using a 100 TB physical area.
  • Much of a storage device's management information is generally stored in DRAM, which the processor controlling the device can access at high speed.
  • However, storing gigabytes of compression management information in DRAM, which has a high bit cost and high power consumption, increases the data retention cost.
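  • As a back-of-the-envelope estimate (with assumed unit sizes, not taken from the patent): if the 800 TB virtual volume above is managed in fixed 1 MB areas with 16-byte entries, the table holds 800 TB / 1 MB ≈ 8.4 × 10^8 entries, i.e. roughly 13 GB of compression management information — gigabytes, as stated above.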
  • Furthermore, the compression management information maps the virtual uncompressed volume to the physical recording destination of the compressed data, and is indispensable for responding to data read requests from the server. From the viewpoint of the reliability of the storage apparatus, losing the compression management information is therefore equivalent to losing the retained data itself, so it must be held with at least the same reliability as the data.
  • To solve this, the storage apparatus of the present invention provides a virtual uncompressed volume to a host apparatus such as a server in order to conceal the data changes caused by compression.
  • The storage apparatus also divides the area of the virtual uncompressed volume into stripe units and manages each stripe in association with one of the plurality of final storage media constituting a RAID group.
  • Parity is generated from the data of each stripe; the generated parity and the data of each stripe are compressed; and the compressed parity and stripe data are stored in the final storage medium associated with each stripe.
  • Further, the storage device divides the compression management information, which manages the correspondence between compressed data and its recording destination on the final storage media, per recording medium, and records the compression management information covering only the correspondences of one recording medium in a specific area of that same medium.
  • In this way, the compression management information can be held with the same reliability as the data in the storage device. Furthermore, the processing load of updating the compression management information is reduced and the performance of the storage apparatus improves.
  • FIG. 1 is a diagram showing a schematic configuration of a computer system centered on a storage apparatus according to an embodiment of the present invention.
  • FIG. 2-A is a conceptual diagram showing a logical space configuration of the storage apparatus according to the embodiment of the present invention.
  • FIG. 2-B is another conceptual diagram showing the logical space configuration of the storage apparatus according to the embodiment of the present invention.
  • FIG. 3-A is a diagram showing a data flow when the storage apparatus receives a write command from the host apparatus.
  • FIG. 3-B is a diagram showing a data flow when the storage apparatus receives a write command from the host apparatus.
  • FIG. 4 is a diagram showing an internal configuration of the NVM module.
  • FIG. 5 is a diagram showing the internal configuration of the FM.
  • FIG. 6 is a diagram showing the internal configuration of the physical block.
  • FIG. 7 is a diagram showing the concept of associating the LBA0 and LBA1 spaces, which are logical spaces provided by the NVM module to the storage controller, and the PBA space, which is a physical area designating address space.
  • FIG. 8 is a diagram showing the contents of the LBA0-PBA conversion table 810 and the LBA1-PBA conversion table 820 managed by the NVM module.
  • FIG. 9 is a diagram showing block management information used by the NVM module.
  • FIG. 10 is a diagram showing a write command and response information to the write command received by the NVM module.
  • FIG. 11 is a diagram showing a compressed data size acquisition command and response information to the compressed data size acquisition command received by the NVM module.
  • FIG. 12 is a diagram showing an LBA1 mapping command received by the NVM module and response information for the LBA1 mapping command.
  • FIG. 13 is a diagram illustrating a full stripe parity generation command and response information to the full stripe parity generation command received by the NVM module.
  • FIG. 14 is a diagram illustrating an update parity generation command and response information to the update parity generation command received by the NVM module.
  • FIG. 15 is a diagram showing a compressed information acquisition command and response information to the compressed information acquisition command received by the NVM module.
  • FIG. 16 is a diagram showing a read command and response information to the read command received by the NVM module.
  • FIG. 17 is a diagram illustrating a mapping cancellation command and response information to the mapping cancellation command received by the NVM module.
  • FIG. 18 is a diagram illustrating a compressed information transfer command and response information to the compressed information transfer command received by the NVM module.
  • FIG. 19 is a diagram illustrating the LBA0 mapping command and response information to the LBA0 mapping command received by the NVM module.
  • FIG. 20A is a diagram showing an example of cache management information.
  • FIG. 20B is a diagram showing an example of a free list.
  • FIG. 21 is a conceptual diagram showing a correspondence relationship between a virtual volume, a RAID group, and a PDEV in the storage apparatus according to the embodiment of the present invention.
  • FIG. 22 is a diagram showing an example of management information for managing the correspondence between virtual volumes, RAID groups, and PDEVs in the storage apparatus according to an embodiment of the present invention.
  • FIG. 23 is a diagram showing a configuration of compression management information used by the storage apparatus according to the embodiment of the present invention.
  • FIG. 24 is a flowchart of the decompression read process.
  • FIG. 25 is a flowchart of the write data cache storage process.
  • FIG. 26 is a flowchart of parity generation processing.
  • FIG. 27 is a flowchart of the destage process.
  • FIG. 28 is a flowchart of the partial recovery operation of the compression management information 2300.
  • FIG. 29 is a flowchart of the rebuild process.
  • FM: NAND flash memory
  • The present invention is not limited to FM and covers all nonvolatile memories.
  • In this embodiment, a mode in which data compression is performed by a dedicated hardware circuit is described.
  • However, the present invention is not limited to this embodiment; data may instead be compressed by a data compression process executed on a general-purpose processor.
  • Similarly, a mode in which parity generation is implemented by a dedicated hardware circuit is described.
  • The present invention is not limited to this embodiment either; RAID parity may be generated by a parity generation process executed on a general-purpose processor.
  • FIG. 1 is a diagram showing a schematic configuration of a computer system centered on a storage device according to an embodiment of the present invention.
  • The NVM module 126 shown in FIG. 1 is a semiconductor recording device using FM as a recording medium.
  • The storage apparatus 101 includes a plurality of storage controllers 110.
  • Each storage controller 110 includes a host interface (host I / F) 124 that connects to a host device and a disk interface (disk I / F) 123 that connects to a recording device.
  • Examples of the host interface 124 include devices that support protocols such as FC (Fibre Channel), iSCSI (Internet Small Computer System Interface), and FCoE (Fibre Channel over Ethernet); examples of the disk interface 123 include devices that support protocols such as FC, SAS (Serial Attached SCSI), SATA (Serial Advanced Technology Attachment), and PCI (Peripheral Component Interconnect) Express.
  • The storage controller 110 includes hardware resources such as a processor 121 and a memory (DRAM) 125, and under the control of the processor it reads from and writes to final storage media such as the SSD 111 and the HDD 112 in response to read/write requests from the host device 103. It also has an NVM module 126 used as a cache device, which can be controlled from the processor 121 via the internal SW 122.
  • The storage controller 110 has a RAID (Redundant Arrays of Inexpensive Disks) parity generation function and a data restoration function using RAID parity, and manages a plurality of SSDs 111 and a plurality of HDDs 112 as RAID groups in arbitrary units.
  • The storage controller 110 also has a function of monitoring and managing the failures, usage status, operating status, and so on of the recording devices.
  • The storage apparatus 101 is connected to the management apparatus 104 via a network.
  • An example of this network is a LAN (Local Area Network). Although omitted from FIG. 1 for simplicity, this network is connected to each storage controller 110 in the storage apparatus 101; it may also be the same network as the SAN 102.
  • The management device 104 is a computer having hardware resources such as a processor, a memory, a network interface, and a local input/output device, and software resources such as a management program.
  • The management device 104 acquires information from the storage apparatus using this program and displays a management screen.
  • The system administrator uses the management screen displayed on the management apparatus to monitor the storage apparatus 101 and control its operation.
  • The SSD 111 stores data transferred in response to a write request from the storage controller, retrieves stored data in response to a read request, and transfers the data to the storage controller.
  • At this time, the disk interface 123 designates the logical storage location of a read/write request by a logical address (hereinafter, LBA: Logical Block Address).
  • The plurality of SSDs 111 are managed as a plurality of RAID groups and are configured so that lost data can be restored when data is lost.
  • A plurality of HDDs (Hard Disk Drives) 112 are also provided in the storage apparatus 101 and, like the SSDs 111, are connected to the plurality of storage controllers 110 in the same storage apparatus via the disk interface 123.
  • The HDD 112 stores data transferred in response to a write request from the storage controller 110, retrieves stored data in response to a read request, and transfers it to the storage controller 110.
  • The plurality of HDDs 112 are likewise managed as a plurality of RAID groups and are configured so that lost data can be restored when data is lost.
  • The storage controller 110 is connected via the host interface 124 to the SAN 102, which connects to the host device 103. Although omitted in FIG. 1 for simplicity, a connection path for mutually communicating data and control information between the storage controllers 110 is also provided.
  • The host device 103 corresponds to, for example, a computer or file server that forms the core of a business system.
  • The host device 103 includes hardware resources such as a processor, a memory, a network interface, and a local input/output device, and software resources such as a device driver, an operating system (OS), and application programs.
  • The host apparatus 103 executes various programs under processor control to communicate with the storage apparatus 101 and to issue data read/write requests.
  • It also acquires management information, such as the usage and operating status of the storage apparatus 101, by executing various programs under processor control.
  • It can further designate and change the management unit of the recording devices, the recording-device control method, data compression settings, and the like.
  • FIG. 2-A shows the transition of the write data management state when a write request is issued from the host device 103 in the storage apparatus of this embodiment.
  • The host apparatus 103 recognizes a virtual volume (indicated as "virtual Vol" in the figure) 200 as a storage area, and accesses data by designating an address in the virtual volume 200.
  • The virtual volume 200 is a virtual space that the storage apparatus 101 provides to the host apparatus 103.
  • Write data is compressed inside the storage apparatus 101 and stored in the final storage medium (SSD 111 or HDD 112), but a host accessing the virtual volume 200 cannot tell that the data is compressed and stored (the data changes due to data compression are concealed).
  • In FIG. 2-A, an example in which the storage apparatus 101 has one virtual volume 200 is described, but the present invention is not limited to this example.
  • The storage apparatus 101 may manage a plurality of virtual volumes.
  • The plurality of managed volumes may also include volumes that are not compressed.
  • Hereinafter, the virtual volume 200, which conceals data compression from the host device 103, will mainly be described.
  • The storage apparatus 101 of the present invention logically manages the storage areas of each physical SSD 111 or HDD 112 as a PDEV 205 (Physical Device), and manages each PDEV 205 in association with one virtual PDEV 204 whose capacity is virtually expanded.
  • The storage apparatus 101 configures and manages an RG 203 (RAID group) from a plurality of virtual PDEVs 204, and manages the RG 203 and the virtual volume 200 in association with each other.
  • FIG. 2-A shows an example in which one RG is associated with one virtual volume 200 (virtual volume 200 and RAID group 0), but the present invention is not limited to this example.
  • One RG may be associated with a plurality of virtual volumes, or one virtual volume may be associated with a plurality of RGs.
  • The storage apparatus 101 manages the area designated as the write destination in the virtual volume 200 as being cached in the LBA0 space provided by the NVM module 126.
  • The LBA0 space is a virtual logical space that the NVM module 126 provides to the storage apparatus 101, in which the data compressed and stored by the NVM module 126 appears as if stored uncompressed. This space is accessible from the processor 121 of the storage controller 110.
  • After the storage apparatus 101 receives the write data from the host apparatus 103, the data is transferred to the NVM module 126.
  • The NVM module 126 of this embodiment compresses the data and records it inside the NVM module 126.
  • The storage apparatus 101 then regards the write data as stored in the cache area (the LBA0 space provided by the NVM module 126) and notifies the host apparatus 103 that the write has completed.
  • The storage apparatus 101 transfers the compressed write data recorded in the LBA0 space to the SSD 111 or HDD 112 serving as the final storage medium at an arbitrary timing. At this time, the storage apparatus 101 must acquire the compressed data from the NVM module 126. As shown in FIG. 2-A, the storage apparatus 101 of this embodiment acquires compressed data using the LBA1 space 202 provided by the NVM module 126: it issues to the NVM module 126 a command that associates the compressed data stored behind the uncompressed LBA0 area with the LBA1 space 202.
  • The NVM module 126, on receiving this association command for the LBA1 space 202, associates the compressed data of the designated LBA0 area with the LBA1 space.
  • The storage apparatus 101 then acquires the compressed data from the NVM module 126 by designating addresses in the LBA1 space, as sketched below.
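  • A minimal sketch of this map-then-read flow, assuming hypothetical wrapper functions (nvm_map_lba1 and nvm_read are illustrative, not a documented API):

```c
#include <stdint.h>

/* Assumed wrappers around the NVM module commands described later. */
extern uint32_t nvm_map_lba1(uint64_t lba0, uint32_t len_sectors, uint64_t lba1);
extern int nvm_read(uint64_t lba, uint32_t sectors, void *buf);

/* Obtain the compressed image of an LBA0 area for destaging. */
int fetch_compressed(uint64_t lba0, uint32_t len_sectors,
                     uint64_t lba1, void *buf)
{
    /* 1. Map the compressed data behind [lba0, lba0+len) into LBA1. */
    uint32_t comp_sectors = nvm_map_lba1(lba0, len_sectors, lba1);

    /* 2. Read it through LBA1: the data comes back still compressed. */
    if (nvm_read(lba1, comp_sectors, buf) != 0)
        return -1;

    /* 3. The caller then writes buf to the final storage medium. */
    return (int)comp_sectors;
}
```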
  • When transferring compressed data to the final storage medium, the storage apparatus 101 identifies, from the address in the virtual volume associated with the data, the virtual PDEV 204 and the address within the virtual PDEV 204 at which the data is to be stored; it then determines the address of the PDEV 205 associated with that address in the virtual PDEV 204 and transfers the data to the physical device.
  • Note that the LBA1 space 202 for acquiring the compressed data is not strictly necessary.
  • Instead, the storage apparatus 101 may issue a read command that includes an LBA0 address together with an instruction to transfer the compressed data without decompressing it, thereby reading the compressed data from the NVM module 126 using the LBA0 space.
  • As described above, the storage apparatus 101 compresses the data acquired from the host apparatus 103 and stores it in the NVM module 126, which serves as the cache.
  • Hereinafter, this operation is referred to as the host write operation.
  • Next, the data transfers that occur in the host write operation will be described.
  • The first data transfer in the host write operation occurs when the write data is acquired from the host device.
  • This transfer is from the host interface 124 to the DRAM 125 of the storage controller (311).
  • The storage apparatus 101 performs this transfer by issuing a command to the host interface 124.
  • Next, the storage apparatus 101 issues a command to the NVM module 126 and transfers the write data stored in the DRAM 125 to the NVM module 126 (312).
  • The NVM module 126 compresses the write data with its internal compression hardware (compression circuit) and stores the result in the DRAM (data buffer) 416 inside the NVM module 126 (313).
  • The NVM module 126 then notifies the storage apparatus that storage of the write data is complete.
  • Thereafter, the compressed data stored in the DRAM 416 may be transferred to the NVM (FM) 420 in the NVM module 126 and recorded there, or may be kept in the DRAM 416; whether the transfer from the DRAM 416 to the NVM (FM) 420 is necessary depends on the control method of the NVM module 126.
  • The storage apparatus 101, having received the write-data storage completion from the NVM module 126, notifies the host apparatus 103 of the completion of the write command.
  • The above are the data transfers that occur in the host write operation in this embodiment; a condensed sketch follows.
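  • A condensed sketch of the sequence just described, with assumed wrapper names (the step numbers match the figure references above):

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed wrappers, not a documented API. */
extern int host_if_receive(void *dram_buf, size_t bytes);          /* 311 */
extern int nvm_write_lba0(uint64_t lba0, const void *src,
                          uint32_t len_sectors);                   /* 312, 313 */
extern void notify_host_write_complete(void);

int host_write(uint64_t lba0, void *dram_buf, uint32_t len_sectors)
{
    /* host I/F 124 -> DRAM 125 of the storage controller */
    if (host_if_receive(dram_buf, (size_t)len_sectors * 512) != 0)
        return -1;
    /* DRAM 125 -> NVM module 126; compressed internally (313) */
    if (nvm_write_lba0(lba0, dram_buf, len_sectors) != 0)
        return -1;
    notify_host_write_complete();   /* write completion to host 103 */
    return 0;
}
```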
  • After the host write operation, the storage apparatus 101 generates RAID parity for the write data at an arbitrary timing. Hereinafter, this operation is referred to as the parity generation operation. Next, the data transfers that occur in the parity generation operation will be described.
  • As noted, during the parity generation operation the storage apparatus 101 generates RAID parity for the write data.
  • Here, parity is not generated from the compressed write data; parity is generated from the uncompressed write data.
  • As one conceivable way of achieving this, write data could be stored uncompressed when recorded in the NVM module 126 and compressed only after parity generation.
  • However, many nonvolatile memories, such as NAND flash memory and ReRAM, have a limited number of writes, and reducing the amount of data written to the NVM (FM) 420 through compression extends the device life of the NVM module 126.
  • Compression also effectively expands the capacity of the NVM module 126, enlarging the cache area of the storage apparatus at a lower apparatus cost.
  • For these reasons, the NVM module 126 compresses data when storing it in the DRAM 416 or NVM inside the NVM module 126, and decompresses the data when generating parity.
  • This behavior is not indispensable: the data may instead be recorded in the NVM (FM) 420 or the DRAM 416 without being compressed, and compressed together with the generated parity after parity generation.
  • In the NVM module 126 according to the embodiment of the present invention, the compressed data recorded in the DRAM 416 or the NVM (FM) 420 is decompressed, and the decompressed data is provided to the parity generation circuit (317). With this function, the NVM module 126 generates parity over the uncompressed data while still aiming for longer device life and lower cost.
  • The parity generated by the parity generation circuit is transferred to the compression circuit (318) and then transferred to the DRAM 416 as compressed data (319).
  • The compressed parity stored in the DRAM 416 may be recorded in the NVM (FM) 420 or kept in the DRAM 416, at the discretion of the NVM module 126. Furthermore, it is not always necessary to compress the parity generated by the parity generation circuit: since parity generally cannot be expected to compress as well as data, it may be handled without compression. A sketch of the parity computation itself follows.
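  • A minimal sketch of RAID 5-style XOR parity over N data stripes, here in software rather than the dedicated circuit described above:

```c
#include <stddef.h>
#include <stdint.h>

/* Compute the RAID 5 parity stripe as the byte-wise XOR of all
 * data stripes (the same XOR operation the parity circuit performs). */
void xor_parity(uint8_t *parity, const uint8_t *const stripes[],
                size_t nstripes, size_t stripe_len)
{
    for (size_t i = 0; i < stripe_len; i++) {
        uint8_t p = 0;
        for (size_t s = 0; s < nstripes; s++)
            p ^= stripes[s][i];
        parity[i] = p;
    }
}
```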
  • The storage apparatus 101 transfers the compressed parity and write data to the final storage medium at an arbitrary timing.
  • Hereinafter, this operation is referred to as the destage operation. Next, the data transfers that occur in the destage operation will be described.
  • During the destage operation, the storage apparatus 101 reads the compressed parity and write data from the NVM module 126. At this time, the NVM module 126 transfers the designated write data and compressed parity to the DRAM 125 of the storage apparatus 101 (322). Thereafter, the storage apparatus 101 transfers the compressed write data and parity to the SSD or HDD (323).
  • The above is the outline of the write-data transfer processing performed by the storage apparatus 101 according to the embodiment of the present invention.
  • Note that the decompressed data may first be recorded in the DRAM 416 and then transferred from there to the parity generation circuit.
  • Likewise, the parity generated by the parity generation circuit need not be transferred directly to the compression circuit; the generated parity may be recorded in the DRAM 416 first and then transferred to the compression circuit.
  • The NVM module 126 includes an FM controller (FM CTL) 410 and a plurality of (for example, 32) FMs 420.
  • The FM controller 410 includes a processor 415, a RAM (DRAM) 413, a data compression/decompression unit 418, a parity generation unit 419, a data buffer 416, an I/O interface (I/F) 411, an FM interface (I/F) 417, and a switch 414 for transferring data between them.
  • The switch 414 interconnects the processor 415, RAM 413, data compression/decompression unit 418, parity generation unit 419, data buffer 416, I/O interface 411, and FM interface 417 in the FM controller 410, and routes and forwards data between these parts by address or ID.
  • The I/O interface 411 is connected to the internal switch 122 of the storage controller 110 in the storage apparatus 101, and is connected to each part of the FM controller 410 via the switch 414.
  • The I/O interface 411 receives read/write requests, together with the requested logical storage location (LBA: Logical Block Address), from the processor 121 of the storage controller 110 in the storage apparatus 101 and processes the requests. For a write request, it also receives the write data, which is then recorded in the FM 420. In addition, the I/O interface 411 receives instructions from the processor 121 of the storage controller 110 and issues interrupts to the processor 415 inside the FM controller 410.
  • The I/O interface 411 also receives control commands for the NVM module 126 from the processor 121 of the storage controller 110, and according to those commands can notify the storage controller 110 of the operating status, usage status, current setting values, and so on of the NVM module 126.
  • The processor 415 is connected to each part of the FM controller 410 via the switch 414 and controls the entire FM controller 410 based on the program and management information recorded in the RAM 413. It also monitors the entire FM controller 410 through periodic information acquisition and an interrupt reception function.
  • The data buffer 416 is implemented with DRAM, for example, and holds temporary data in the middle of data transfer processing in the FM controller 410.
  • The FM interface 417 is connected to the FMs 420 by a plurality of buses (for example, 16).
  • A plurality of (for example, 2) FMs 420 are connected to each bus, and the FMs 420 connected to the same bus are controlled independently using the CE (Chip Enable) signals that are also connected to them.
  • The FM interface 417 operates on the read/write requests instructed by the processor 415. At this time, the processor 415 tells the FM interface 417 the chip, block, and page numbers of the request target. For a read request, it reads the stored data from the FM 420 and transfers it to the data buffer 416; for a write request, it fetches the data to be stored from the data buffer 416 and transfers it to the FM 420.
  • The FM interface 417 includes an ECC generation circuit, an ECC-based data loss detection circuit, and an ECC correction circuit.
  • When writing data to the FM 420, the data is written with ECC added by the ECC generation circuit. When data is read, the data from the FM 420 is inspected by the data loss detection circuit using the ECC, and if data loss is detected, the data is corrected by the ECC correction circuit.
  • The data compression/decompression unit 418 has a data compression function using a reversible (lossless) compression algorithm. It supports a plurality of compression algorithms and also provides a function for changing the compression level.
  • The data compression/decompression unit 418 reads data from the data buffer 416 according to instructions from the processor 415, performs either data compression or its inverse transformation (decompression) with the lossless algorithm, and writes the result back to the data buffer.
  • The data compression/decompression unit 418 may be implemented as a logic circuit, or an equivalent function may be realized by a processor executing a compression/decompression program.
  • The parity generation unit 419 generates the parity, i.e., the redundant data required by RAID technology. Specifically, it can compute the XOR parity used in RAID 5 and 6, as well as the Reed-Solomon codes and the diagonal parity of the EVENODD method used in RAID 6. According to instructions from the processor 415, the parity generation unit 419 reads the parity-generation target data from the data buffer 416 and generates RAID 5 or RAID 6 parity with these functions.
  • The switch 414, I/O interface 411, processor 415, data buffer 416, FM interface 417, data compression/decompression unit 418, and parity generation unit 419 described above may be configured in one semiconductor element as an ASIC (Application Specific Integrated Circuit) or FPGA (Field Programmable Gate Array), or configured by connecting a plurality of individual dedicated ICs (Integrated Circuits) to one another.
  • A volatile memory such as DRAM is used for the RAM 413.
  • The RAM 413 stores the management information of the FMs 420 used in the NVM module 126, transfer lists containing the transfer control information used by each DMA engine, and the like.
  • Part or all of the data-holding role of the data buffer 416 may also be taken over by the RAM 413, with the RAM 413 used for data storage.
  • The configuration of the NVM module 126 to which the present invention is applied has been described above with reference to FIG. 4.
  • This embodiment describes an NVM module 126 equipped with flash memory, but the nonvolatile memory mounted on the NVM module 126 is not limited to flash memory.
  • A nonvolatile memory such as Phase Change RAM or Resistance RAM may be used instead.
  • A configuration in which part or all of the FMs 420 are replaced by volatile RAM (DRAM or the like) may also be adopted.
  • The nonvolatile memory area in the FM 420 is composed of a plurality of (for example, 4096) blocks (physical blocks) 502, and stored data is erased in units of physical blocks.
  • The FM 420 contains an I/O register 501.
  • The I/O register 501 is a register with a recording capacity equal to or larger than the physical page size (for example, 8 KB).
  • The FM 420 operates according to the read/write request instructions from the FM interface 417.
  • The flow of a write operation is as follows. First, the FM 420 receives a write command with the requested physical block and physical page from the FM interface 417. Next, the write data transferred from the FM interface 417 is stored in the I/O register 501. Thereafter, the data stored in the I/O register 501 is written to the designated physical page.
  • The flow of a read operation is as follows. First, the FM 420 receives a read command with the requested physical block and page from the FM interface 417. Next, the data stored in that physical page of the designated physical block is read into the I/O register 501. Thereafter, the data stored in the I/O register 501 is transferred to the FM interface 417.
  • The physical block 502 is divided into a plurality of (for example, 128) pages 601, and reading and writing of stored data are processed in units of pages.
  • The order of writing to the physical pages 601 within a block 502 is fixed, and writing proceeds in order from the first page. That is, data must be written in the order Page 1, Page 2, Page 3, and so on.
  • Overwriting a written page 601 is prohibited in principle: data cannot be written to a written page 601 again until the data of the entire block 502 to which that page belongs has been erased. These rules are sketched below.
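  • A minimal sketch of these write rules for one block (sizes follow the examples in the text; the structure is illustrative):

```c
#include <stdint.h>

#define PAGES_PER_BLOCK 128   /* example page count from the text */

typedef struct {
    uint32_t next_page;       /* next writable page index */
} fm_block_t;

/* Pages must be programmed strictly in order; returns the page to
 * write, or -1 when the block is full and must be erased first. */
int fm_block_alloc_page(fm_block_t *blk)
{
    if (blk->next_page >= PAGES_PER_BLOCK)
        return -1;                 /* no rewrite without a block erase */
    return (int)blk->next_page++;
}

/* Erase works only on the whole block, making all pages writable again. */
void fm_block_erase(fm_block_t *blk)
{
    blk->next_page = 0;
}
```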
  • The NVM module 126 includes a plurality of FM chips 420, manages the storage area composed of their blocks and pages, and provides a logical storage space to the storage controller 110 (processor 121) to which the NVM module 126 is connected.
  • Here, "providing a storage space" means that an address is assigned to each storage area to be accessed by the storage controller 110, so that the processor 121 of the storage controller 110 to which the NVM module 126 is connected can access the stored data by designating that address.
  • Inside the NVM module 126, the physical storage area composed of the FMs 420 is managed as uniquely associated with an address space used only within the NVM module 126.
  • Hereinafter, this physical-area-designating address space (physical address space) used only within the NVM module 126 is referred to as the PBA (Physical Block Address) space, and the position (address) of each physical storage area in the PBA space (one sector = 512 bytes) is described as a PBA.
  • The NVM module 126 of this embodiment manages the association between these PBAs and the LBAs (Logical Block Addresses), the addresses of the areas of the logical storage space provided to the storage apparatus.
  • A conventional storage device such as an SSD provides one storage space to the host device (such as a host computer) to which it is connected.
  • In contrast, the NVM module 126 of this embodiment has two logical storage spaces and provides both to the storage controller 110 to which the NVM module 126 is connected. The relationship between these two logical storage spaces and the PBA space will be described with reference to FIG. 7.
  • FIG. 7 illustrates the concept of the association between the LBA0 space 701 and the LBA1 space 702, the logical storage spaces provided by the NVM module 126 of this embodiment to the storage controller 110, and the PBA space 703.
  • The NVM module 126 provides two logical storage spaces: the LBA0 space 701 and the LBA1 space 702.
  • Hereinafter, the addresses assigned to storage areas in the LBA0 space 701 are referred to as "LBA0" or "LBA0 addresses", and the addresses assigned to storage areas in the LBA1 space 702 are referred to as "LBA1" or "LBA1 addresses".
  • In this embodiment, the size of the LBA0 space 701 and the size of the LBA1 space 702 are both equal to or smaller than the size of the PBA space; however, the invention remains effective even when the size of the LBA0 space 701 is larger than the size of the PBA space.
  • The LBA0 space 701 is a logical storage space for allowing the processor 121 of the storage controller 110 to access the compressed data recorded in the physical storage area composed of the FMs 420 as uncompressed data.
  • When the processor 121 designates an address (LBA0) in the LBA0 space 701 and issues a write request to the NVM module 126, the NVM module 126 acquires the write data from the storage controller 110 and compresses it in the data compression/decompression unit 418. The NVM module 126 then records the data in the physical storage area of the FM 420 designated by a dynamically selected PBA, and associates the LBA0 with that PBA.
  • Conversely, on a read request the NVM module 126 acquires the (compressed) data from the physical storage area of the FM 420 indicated by the PBA associated with the LBA0, decompresses it in the data compression/decompression unit 418, and transfers the decompressed data to the storage controller 110 as read data.
  • The association between LBA0 and PBA is managed by the LBA0-PBA conversion table described later.
  • The LBA1 space 702 is a logical storage space for allowing the storage controller 110 to access the compressed data recorded in the physical storage area composed of the FMs 420 as it is (without decompression).
  • When the processor 121 of the storage controller 110 designates an LBA1 and issues a write request to the NVM module 126, the NVM module 126 acquires the (already compressed) write data from the storage controller 110, records it in the FM storage area designated by a dynamically selected PBA, and associates the LBA1 with that PBA.
  • Conversely, on a read request the NVM module 126 acquires the compressed data from the physical storage area of the FM 420 indicated by the PBA associated with the LBA1, and transfers the compressed data as-is to the storage controller 110 as read data.
  • The association between LBA1 and PBA is managed by the LBA1-PBA conversion table described later.
  • An area of the PBA space holding compressed data 713 may be associated with both an LBA0-space area and an LBA1-space area at the same time.
  • In FIG. 7, the decompressed image of the compressed data 713 is associated with the LBA0 space as decompressed data 711, while the compressed data 713 itself is directly associated with the LBA1 space as compressed data 712.
  • For example, when the processor 121 designates an LBA0 (say 0x000000011000) and writes data to the NVM module 126, the data is compressed by the data compression/decompression unit 418 in the NVM module 126, placed at a dynamically selected location in the PBA space (specifically, some unwritten page among the plurality of pages of the FMs 420), and managed as associated with LBA0 address 0x000000011000. Thereafter, when the processor 121 issues to the NVM module 126 a request to associate the data at 0x000000011000 with an address in the LBA1 space (say 0x80000000010), this data also becomes associated with the LBA1 space; by then issuing a request (command) to read the data at address 0x80000000010, the processor 121 can read out, still in its compressed state, the data it wrote to LBA0 address 0x000000011000.
  • Thus, by writing data to the NVM module 126 with an LBA0 specified, associating it with an area of the LBA1 space, and issuing a RAID parity generation instruction that designates the LBA1, the storage apparatus 101 of this embodiment enables RAID parity generation for compressed data.
  • The size of the compressed data generated by the NVM module 126 in the embodiment of the present invention is limited to a multiple of 512 bytes (one sector) and does not exceed the size of the uncompressed data. That is, when 4 KB of data is compressed, the minimum size is 512 bytes and the maximum size is 4 KB, as in the sketch below.
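  • A small sketch of this sizing rule (the function name is hypothetical; the 4 KB unit is the compression unit used in this embodiment):

```c
#include <stdint.h>

#define SECTOR 512u
#define UNIT   4096u   /* 4 KB compression unit of this embodiment */

/* Round the raw compressed size up to whole sectors, capped at the
 * uncompressed size (data that does not compress stays at 4 KB). */
static uint32_t compressed_stored_size(uint32_t raw_bytes)
{
    uint32_t padded = (raw_bytes + SECTOR - 1) / SECTOR * SECTOR;
    return padded > UNIT ? UNIT : padded;
}
/* e.g. 700 raw bytes -> 1024 stored; 3900 -> 4096; 4096 -> 4096 */
```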
  • NVM Module Management Information 1: LBA-PBA Conversion Tables. Next, the management information used for control by the NVM module 126 in this embodiment will be described.
  • First, the LBA0-PBA conversion table 810 and the LBA1-PBA conversion table 820 will be described with reference to FIG. 8.
  • The LBA0-PBA conversion table 810 is stored in the DRAM 413 in the NVM module 126 and contains the fields NVM module LBA0 (811), NVM module PBA (812), and PBA length (813).
  • When a read request arrives from the host device, the processor 415 of the NVM module 126 takes the specified LBA0 and uses this table to obtain the PBA indicating the location where the actual data is stored.
  • Also, the NVM module 126 records update data (write data) in a physical storage area different from the PBA holding the pre-update data, records the PBA and PBA length of the newly recorded update data in the corresponding rows of the LBA0-PBA conversion table, and thereby updates the table. By operating in this way, the NVM module 126 enables (pseudo) overwriting of data in areas of the LBA0 space.
  • The NVM module LBA0 (811) column lists the logical areas of the LBA0 space provided by the NVM module 126, arranged in order in 4 KB units (each address (LBA0) in the LBA0 space is assigned per 512-byte sector).
  • This is because the association between the NVM module LBA0 (811) and the NVM module PBA (812) is managed in units of 4 KB (8 sectors) in this embodiment.
  • However, the association between the NVM module LBA0 (811) and the NVM module PBA (812) may be managed in any unit other than 4 KB.
  • The NVM module PBA (812) is a field storing the head address of the PBA associated with the NVM module LBA0 (811).
  • In this embodiment, the physical storage area of the PBA space is divided and managed in 512-byte (one-sector) units.
  • In FIG. 8, the value "XXX" is associated as the PBA (Physical Block Address) for the NVM module LBA0 (811) value "0x000_0000_0000". This value is an address that uniquely identifies a storage area among the plurality of FMs 420 mounted on the NVM module 126.
  • The PBA length (813) records the actual stored size of the 4 KB of data designated by the NVM module LBA0 (811).
  • The storage size is recorded as a number of sectors.
  • The NVM module 126 in this embodiment compresses the uncompressed data supplied by the processor 121 of the storage controller 110 in 4 KB units.
  • For example, when a write request for 8 KB of data (uncompressed data) starting at LBA0 address 0x000_0000_0000 is received from the processor 121, the 4 KB of data in the address range 0x000_0000_0000 to 0x000_0000_0007 (in the LBA0 space) is compressed as one unit to generate compressed data, then the 4 KB of data in the address range 0x000_0000_0008 to 0x000_0000_000F is compressed as another unit, and each piece of compressed data is written to the physical storage area of the FM 420.
  • However, the present invention is not limited to compressing data in 4 KB units and remains effective with configurations that compress data in other units.
  • The LBA1-PBA conversion table 820 is stored in the DRAM 413 in the NVM module 126 and contains two fields: NVM module LBA1 (821) and NVM module PBA (822).
  • When a read request arrives from the host device, the processor 415 of the NVM module 126 takes the specified LBA1 and uses the LBA1-PBA conversion table 820 to convert it into the PBA indicating the location where the actual data is stored.
  • The NVM module LBA1 (821) column lists the logical areas of the LBA1 space provided by the NVM module 126, arranged in order per sector (a value of 1 in the NVM module LBA1 (821) means one sector, i.e., 512 bytes). This embodiment is described on the premise that the association between the NVM module LBA1 (821) and the NVM module PBA (822) is managed in 512 B units, but this association is not limited to 512 B units and may be managed in any unit. However, since LBA1 is a space that directly maps the PBA physical storage area holding the compressed data, its management unit is preferably equal to the PBA division unit; in this embodiment, LBA1 is divided and managed in 512 B units.
  • The NVM module PBA (822) is a field storing the head address of the PBA associated with the LBA1.
  • In this embodiment, the PBA is divided and managed in 512 B units.
  • In FIG. 8, the PBA value "ZZZ" is associated with the NVM module LBA1 value "0x800_0000_0002".
  • This PBA value is an address that uniquely identifies a storage area on one of the FMs 420 mounted on the NVM module 126. Accordingly, when "0x800_0000_0002" is received as the start address (LBA1) of a read request, "ZZZ" is obtained as the physical read start address inside the NVM module 126.
  • For areas with no associated PBA, a value indicating "unallocated" is stored in the NVM module PBA (822).
  • The above are the contents of the LBA0-PBA conversion table 810 and the LBA1-PBA conversion table 820 used by the NVM module 126; a C rendering follows.
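  • A hypothetical C rendering of the two tables of FIG. 8 (field names, widths, and the "unallocated" sentinel are assumptions):

```c
#include <stdint.h>

#define PBA_UNALLOCATED UINT64_MAX   /* assumed "unallocated" marker */

typedef struct {        /* one row of the LBA0-PBA conversion table 810 */
    uint64_t lba0;      /* start of a 4 KB (8-sector) LBA0 area */
    uint64_t pba;       /* head PBA of the compressed data */
    uint8_t  pba_len;   /* PBA length 813: compressed size in sectors */
} lba0_pba_entry_t;

typedef struct {        /* one row of the LBA1-PBA conversion table 820 */
    uint64_t lba1;      /* one 512 B sector of the LBA1 space */
    uint64_t pba;       /* associated PBA, or PBA_UNALLOCATED */
} lba1_pba_entry_t;

/* Read-side translation: find the PBA behind an LBA1 sector. */
uint64_t lba1_to_pba(const lba1_pba_entry_t *tbl, uint64_t nrows,
                     uint64_t lba1)
{
    for (uint64_t i = 0; i < nrows; i++)
        if (tbl[i].lba1 == lba1)
            return tbl[i].pba;
    return PBA_UNALLOCATED;
}
```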
  • NVM Module Management Information 3: Block Management Information. Next, the block management information used by the NVM module to which the present invention is applied will be described with reference to FIG. 9.
  • The block management information 900 is stored in the DRAM 413 in the NVM module 126 and includes the fields NVM module PBA 901, NVM chip number 902, block number 903, and invalid PBA amount 904.
  • The NVM module PBA 901 is a field storing a PBA value that uniquely identifies each area in all the FMs 420 managed by the NVM module 126.
  • In this embodiment, the NVM module PBA 901 is divided and managed in block units.
  • FIG. 9 shows an example in which the head address of each block is stored as the NVM module PBA value.
  • For example, the field value "0x000_0000_0000" indicates that the NVM module PBA range from "0x000_0000_0000" to "0x000_0000_0FFF" applies.
  • The NVM chip number 902 is a field storing a number that uniquely identifies an FM chip 420 mounted on the NVM module 126.
  • The block number 903 is a field storing the block number within the FM chip 420 identified by the stored value of the NVM chip number 902.
  • The invalid PBA amount 904 is a field storing the invalid PBA amount of the block identified by the stored value of the block number 903 within the FM chip identified by the stored value of the NVM chip number 902.
  • Here, the invalid PBA amount is the amount of PBA-space area that was once associated with the LBA0 space and/or LBA1 space through the NVM module LBA0 (811) and NVM module LBA1 (821) entries of the LBA0-PBA conversion table 810 and the LBA1-PBA conversion table 820, and was later released from that association.
  • Conversely, a PBA currently associated with an NVM module LBA0 or LBA1 by the LBA0-PBA conversion table 810 or the LBA1-PBA conversion table 820 is referred to in this specification as a valid PBA.
  • Invalid PBA areas arise inevitably when pseudo-overwriting is attempted on a nonvolatile memory in which data cannot be overwritten in place.
  • Specifically, at the time of a data update the NVM module 126 records the update data in an unwritten PBA (different from the PBA holding the pre-update data) and rewrites the NVM module PBA 812 and PBA length 813 fields of the LBA0-PBA conversion table 810 to the start address and PBA length of the area in which the update data was recorded. At this time, the association in the LBA0-PBA conversion table 810 is released for the PBA area holding the pre-update data.
  • The NVM module 126 also checks the LBA1-PBA conversion table 820, and treats areas that have no association in the LBA1-PBA conversion table either as invalid PBA areas.
  • The NVM module 126 counts the invalid PBA amount for each block, the minimum erase unit of the FM, and preferentially selects blocks with a large invalid PBA amount as garbage collection target areas.
  • FIG. 9 shows, as an example, that block number 0 of NVM chip number 0 managed by the NVM module 126 has 160 KB of invalid PBA area.
  • When the total amount of invalid PBA areas managed by the NVM module 126 exceeds a predetermined garbage collection start threshold (unwritten pages becoming depleted), blocks containing invalid PBA areas are erased to create unwritten PBA area. This operation is called garbage collection.
  • When an erasure-target block contains valid PBA areas at garbage collection time, those valid PBA areas must be copied to another block before the block is erased. Since this data copying involves write operations to the FM, it advances the wear of the FM and consumes resources such as the processor and bus bandwidth of the NVM module 126 for the copy operation, causing a drop in performance. For this reason, the amount of valid PBA area to copy should be as small as possible.
  • Therefore, at garbage collection time the NVM module 126 refers to the block management information 900 and erases blocks in descending order of their stored invalid PBA amount 904 (i.e., those containing many invalid PBA areas first), thereby reducing the amount of valid PBA area that must be copied, as sketched below.
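  • A minimal sketch of this victim-selection rule, mirroring the fields of the block management information 900 (field widths are assumptions):

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t pba_head;     /* NVM module PBA 901: block head address */
    uint16_t chip_no;      /* NVM chip number 902 */
    uint16_t block_no;     /* block number 903 */
    uint32_t invalid_kb;   /* invalid PBA amount 904, in KB */
} block_info_t;

/* Pick the block with the most invalid PBA area: erasing it frees the
 * most space while requiring the least copying of valid data. */
size_t pick_gc_victim(const block_info_t *blocks, size_t nblocks)
{
    size_t best = 0;
    for (size_t i = 1; i < nblocks; i++)
        if (blocks[i].invalid_kb > blocks[best].invalid_kb)
            best = i;
    return best;
}
```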
  • In this embodiment, the amount of area released from association with the NVM module LBA0 (811) and LBA1 (821) is managed as a PBA amount (in KB), but the present invention is not limited to this management unit; for example, the number of pages (the minimum write unit) could be managed instead of the invalid PBA amount.
  • The above is the content of the block management information 900 used by the NVM module to which the present invention is applied.
  • NVM Module Control Command 1: Write Command. Next, the commands used by the NVM module 126 to which the present invention is applied will be described.
  • When the NVM module 126 in this embodiment receives a command from the processor 121 of the storage controller 110, it analyzes the content of the received command, performs the prescribed processing, and, after the processing completes, returns one response (response information) to the storage controller.
  • This processing is realized by the processor 415 in the NVM module 126 executing a command processing program stored in the RAM 413.
  • A command contains the set of information the NVM module 126 needs in order to perform the prescribed processing. For example, a write command, which instructs the NVM module 126 to write data, contains the write instruction and the information necessary for the write (write position, data length, and so on).
  • The NVM module 126 supports a plurality of command types. First, the information common to all commands will be described.
  • Each command includes an operation code (Opcode) and a command ID at its head as common information; after the command ID, information (parameters) unique to each command follows to form one complete command, as in the layout sketch below.
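  • A hypothetical C layout of this common framing (field widths are assumptions; only the field order is given by the text):

```c
#include <stdint.h>

typedef struct {
    uint16_t opcode;      /* command type: e.g. 0x01 = write, 0x02 = read */
    uint32_t command_id;  /* unique ID, echoed back in the response */
    /* command-specific parameters follow here */
} nvm_cmd_header_t;

typedef struct {
    uint32_t command_id;  /* ID of the command this response answers */
    uint16_t status;      /* normal completion, or an error cause code */
    /* response-specific fields follow here */
} nvm_resp_header_t;
```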
  • FIG. 10 is a diagram showing the format of the write command of the NVM module 126 and the format of the response information for the write command in this embodiment.
  • In FIG. 10, field 1011 is the Opcode and field 1012 is the command ID.
  • Fields 1013 to 1016 are parameters specific to the write command.
  • Similarly, the command ID and status (Status) are information included in all response information, and information unique to each kind of response information may follow the status.
  • The operation code is information that notifies the NVM module 126 of the command type; the NVM module 126, on acquiring a command, recognizes the notified command type by referring to this field. For example, an Opcode of 0x01 is recognized as a write command and an Opcode of 0x02 as a read command.
  • The command ID is a field storing a unique ID for the command.
  • The ID specified in this field is attached to the response information so that the storage controller 110 can recognize which command the response information belongs to.
  • The storage controller 110 generates an ID that uniquely identifies the command when creating it, creates the command with this ID stored in the command ID field, and transmits the command to the NVM module 126. When the processing for the received command completes, the NVM module 126 includes that command's ID in the response information and returns it to the storage controller 110.
  • The storage controller 110 recognizes the completion of the command by acquiring the ID included in the response information.
  • The status (element 1022 in FIG. 10) included in the response information is a field storing information indicating whether the command processing completed normally. If the command processing did not complete normally (an error), the status stores, for example, a number that identifies the cause of the error.
  • FIG. 10 is a diagram showing the LBA0 write command of the NVM module 126 and the response information to the write command in this embodiment.
  • The LBA0 write command 1010 of the NVM module 126 in this embodiment consists of an operation code 1011, a command ID 1012, an LBA0/1 start address 1013, an LBA0/1 length 1014, a compression necessity flag 1015, and a write data address 1016 as command information.
  • Hereinafter, a command based on the above information is described as an example, but additional information may be present.
  • For example, the present invention remains effective even if information related to DIF (Data Integrity Field) or the like is added to the command.
  • The operation code 1011 is a field that notifies the NVM module 126 of the command type; from this field, the NVM module 126 that acquired the command recognizes it as a write command.
  • The command ID 1012 is a field storing a unique ID for the command.
  • The ID specified in this field is attached to the command's response information so that the storage apparatus can recognize which command the response information belongs to.
  • The storage apparatus 101 assigns an ID that uniquely identifies the command when the command is created.
  • the LBA 0/1 start address 1013 is a field for designating the start address of the write destination logical space.
• In this embodiment, the LBA0 space is defined as the range of addresses 0x000_0000_0000 to 0x07F_FFFF_FFFF, and the LBA1 space is defined as the range from address 0x800_0000_0000 onward. Therefore, the NVM module 126 can determine from the LBA0/1 start address 1013 of the write command whether the write destination is the LBA0 space or the LBA1 space.
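• A minimal sketch of the address-range check this implies, assuming 64-bit sector addresses; the constants are the ranges given above, and addresses between the two ranges are treated as invalid:

    #include <stdbool.h>
    #include <stdint.h>

    #define LBA0_END   0x07FFFFFFFFFULL   /* 0x07F_FFFF_FFFF: last LBA0 address  */
    #define LBA1_BASE  0x80000000000ULL   /* 0x800_0000_0000: first LBA1 address */

    /* Decide which logical space an LBA0/1 start address falls in. */
    static bool is_lba0(uint64_t lba) { return lba <= LBA0_END;  }
    static bool is_lba1(uint64_t lba) { return lba >= LBA1_BASE; }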
  • the LBA 0/1 length 1014 is a field for designating the range (length) of the recording destination LBA 0 or LBA 1 starting from the LBA 0/1 start address 1013, and stores the length represented by the number of sectors.
  • the NVM module 126 performs processing for associating the PBA area storing the write data with the LBA0 or LBA1 area in the range indicated by the LBA0 or LBA1 start address 1013 and the LBA0 / 1 length 1014 described above.
  • the compression necessity flag 1015 is a field for designating whether to compress the write target data indicated by this command.
• When the storage controller 110 creates a write command, if no size reduction from data compression can be expected for the write target data (for example, when the data is already recognized as compressed by image compression), it controls this flag to notify the NVM module 126 that compression is unnecessary.
• In this embodiment, when writing to LBA1, the write target data has already been compressed, so this flag is used to explicitly notify that compression is unnecessary. If it is fixed by setting that transfer data is not compressed when writing to LBA1, the compression necessity flag 1015 may be omitted.
  • the write data address 1016 is a field for storing the start address of the current storage destination of the write target data indicated by this command. For example, when data temporarily stored in the DRAM 125 of the storage apparatus 101 is written to the NVM module 126, the processor of the storage apparatus 101 issues a write command in which the address on the DRAM 125 in which the data is stored is stored in the write data address 1016. Create it.
  • the NVM module 126 acquires write data by acquiring, from the storage apparatus 101, data of an area having a length designated by the LBA 0/1 length 1014 from the address indicated in this field.
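• Putting the fields together, a write command might be assembled as in the following sketch; the struct layout and the concrete values are illustrative assumptions, not the module's wire format:

    #include <stdint.h>

    /* Hypothetical in-memory form of the LBA0/1 write command 1010. */
    typedef struct {
        uint8_t  opcode;           /* 1011: 0x01 = write                        */
        uint32_t command_id;       /* 1012: unique per outstanding command      */
        uint64_t lba01_start;      /* 1013: start address in LBA0 or LBA1 space */
        uint32_t lba01_length;     /* 1014: length in sectors                   */
        uint8_t  compress;         /* 1015: 0 = compression unnecessary         */
        uint64_t write_data_addr;  /* 1016: e.g., a buffer address in DRAM 125  */
    } lba01_write_cmd;

    /* Example: write 16 sectors of already-compressed data to LBA1,
       so compression is explicitly turned off. */
    static const lba01_write_cmd example_cmd = {
        .opcode          = 0x01,
        .command_id      = 7,
        .lba01_start     = 0x80000000000ULL,  /* an LBA1 address      */
        .lba01_length    = 16,
        .compress        = 0,
        .write_data_addr = 0x0010000,         /* hypothetical address */
    };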
  • the write response information 1020 includes a command ID 1021, a status 1022, and a compressed data length 1023.
  • the command ID 1021 is a field for storing a number that can uniquely identify a completed command.
• the status 1022 is a field for notifying the storage device of the completion or error of the command. If the command processing ends in an error, a number identifying the cause of the error is stored.
  • the compressed data length 1023 is a field for recording the data length when the written data is reduced by data compression.
  • the storage apparatus 101 can grasp the data size after compression of the written data by acquiring this field.
• the storage apparatus 101 cannot accurately grasp the actual compressed data size associated with a specific LBA0 area as update writes are performed. For this reason, when the total of the compressed data lengths 1023 acquired by write commands reaches a certain value, the storage apparatus 101 issues a compressed data size acquisition command, described later, in preparation for mapping to LBA1.
• When data is written to the LBA1 space, already-compressed data is recorded, so this field is invalid.
• FIG. 11 is a diagram showing a compressed data size acquisition command of the NVM module 126 and response information to the compressed data size acquisition command in this embodiment.
  • the compressed data size acquisition command 1110 of the NVM module 126 in the present embodiment is constituted by an operation code 1111, a command ID 1012, an LBA 0 start address 1113, and an LBA 0 length 1114 as command information.
• In the present embodiment, an example of a command containing the above information is described, but additional information may be included. Since the command ID 1012 has the same contents as the previous LBA0 write command, description thereof is omitted.
  • the operation code 1111 is a field for notifying the command type to the NVM module 126, and the NVM module 126 that has acquired the command recognizes that the command notified by this field is a compressed data size acquisition command.
  • the LBA 0 start address 1113 is a field for designating the start address of the LBA 0 area that is the target of acquiring the data size after compression.
  • the LBA 0 length 1114 is a field for designating a range of LBA 0 starting from the LBA 0 start address 1113.
• the NVM module 126 calculates the size of the compressed data associated with the LBA0 area in the range indicated by the LBA0 start address 1113 and the LBA0 length 1114, and notifies the storage apparatus.
  • the address that can be specified as the LBA 0 start address 1113 is limited to a multiple of 8 sectors (4 KB).
• the length that can be designated as the LBA0 length 1114 is also limited to a multiple of 8 sectors (4 KB). If an address that does not match an 8-sector boundary (for example, 0x000_0000_0001) or such a length is specified as the LBA0 start address 1113 or the LBA0 length 1114, an error is returned.
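• The boundary rule can be expressed as a small check, assuming addresses and lengths are given in sectors (512 B each, so 8 sectors = 4 KB):

    #include <stdbool.h>
    #include <stdint.h>

    #define SECTORS_PER_4KB 8u

    /* A compressed data size acquisition command is valid only when both
       the LBA0 start address and the LBA0 length sit on 8-sector (4 KB)
       boundaries; otherwise the NVM module answers with an error status. */
    static bool size_query_aligned(uint64_t lba0_start, uint64_t lba0_len)
    {
        return (lba0_start % SECTORS_PER_4KB == 0) &&
               (lba0_len   % SECTORS_PER_4KB == 0);
    }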
  • the compressed data size acquisition response 1120 includes a command ID 1021, a status 1022, and a compressed data length 1123.
• In the present embodiment, an example of response information containing the above fields is described, but additional information may be included. Since the command ID 1021 and the status 1022 have the same contents as the previous LBA0 write response, description thereof is omitted.
  • the compressed data length 1123 is a field for storing the size of the compressed data associated with the LBA0 area specified by the compressed data size acquisition command.
• the storage controller 110 acquires the value of this compressed data length, and thereby recognizes the size of the LBA1 area required as the mapping destination of the LBA1 mapping command described later.
  • FIG. 12 is a diagram schematically showing an LBA1 mapping command and response information to the LBA1 mapping command supported by the NVM module 126 in the present embodiment.
  • the LBA1 mapping command 1210 of the NVM module 126 in the present embodiment is configured by an operation code 1211, a command ID 1012, an LBA0 start address 1213, an LBA0 length 1214, and an LBA1 start address 1215 as command information.
• In the present embodiment, an example of a command containing the above information is described, but additional information may be included. Since the command ID 1012 has the same contents as the previous write command, description thereof is omitted.
  • the operation code 1211 is a field for notifying the type of command to the NVM module 126, and the NVM module 126 that has acquired the command recognizes that the command notified by this field is an LBA1 mapping command.
• the LBA0 start address 1213 is a field for designating the head address of the LBA0 area of the target data whose compressed data is to be mapped to LBA1.
  • the LBA0 length 1214 is a field for designating a range of LBA0 starting from the LBA0 start address 1213 to be mapped to LBA1. As with the compressed data size acquisition command, the LBA 0 start address 1213 and the LBA 0 length 1214 are limited to multiples of 8 sectors (4 KB).
  • the LBA1 start address 1215 is a field for designating the start address of LBA1 to be mapped.
• the storage controller 110 knows the data size to be mapped in advance using the compressed data size acquisition command, reserves an LBA1 area to which this data size can be mapped, stores its head address in the LBA1 start address 1215 field, and issues the command to the NVM module 126.
• the NVM module 126 maps the compressed data associated with the LBA0 space in the range indicated by the LBA0 start address 1213 and the LBA0 length 1214 to an area, starting from the LBA1 start address 1215, corresponding to the compressed data size. More specifically, the PBAs (NVM module PBA 812) associated with that LBA0 range are acquired by referring to the LBA0-PBA conversion table. Then, the acquired PBA addresses are entered in the NVM module PBA 822 fields of the entries (specified by the NVM module LBA1 (821)) of the LBA1 range that has the same size as the total size of the acquired PBAs.
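• The table update can be sketched as follows. This is a deliberate simplification that assumes array-backed conversion tables with one PBA entry per sector, treats both LBA values as offsets within their respective spaces, and packs the compressed data's PBAs densely from the LBA1 start address; the names are illustrative stand-ins for the tables in the text.

    #include <stdint.h>

    #define UNALLOCATED   UINT64_MAX
    #define TABLE_SECTORS 4096u   /* illustrative table size */

    static uint64_t lba0_to_pba[TABLE_SECTORS];  /* stand-in for NVM module PBA 812 */
    static uint64_t lba1_to_pba[TABLE_SECTORS];  /* stand-in for NVM module PBA 822 */

    /* LBA1 mapping: point the LBA1 entries at the PBAs already holding the
       compressed data for the given LBA0 range, packing them densely. */
    static void lba1_map(uint64_t lba0_start, uint64_t lba0_len,
                         uint64_t lba1_start)
    {
        uint64_t dst = lba1_start;
        for (uint64_t i = 0; i < lba0_len; i++) {
            uint64_t pba = lba0_to_pba[lba0_start + i];
            if (pba != UNALLOCATED)        /* compressed data occupies fewer sectors */
                lba1_to_pba[dst++] = pba;
        }
    }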
  • the LBA1 mapping response 1220 includes a command ID 1021 and a status 1022.
• In the present embodiment, an example of response information containing the above fields is described, but additional information may be included. Since the command ID 1021 and the status 1022 have the same contents as the previous write response, description thereof is omitted.
  • NVM Module Control Command 4 Full Stripe Parity Generation Command
• There are two main parity generation methods in RAID technology. One is a method of generating parity by an operation such as XOR over all of the data necessary for generating the parity; this method is referred to as the "full stripe parity generation method" in this specification. The other is used when update data is written to a RAID-configured storage medium group: the parity corresponding to the update data (updated parity) is generated by an XOR operation over the update data, the pre-update data stored in the storage medium, and the pre-update parity corresponding to that data. This method is called the "update parity generation method" in this specification.
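• As a sketch, both methods reduce to XOR loops for the horizontal (RAID 5 / P) parity; RAID 6 Q parity (Reed-Solomon) or diagonal parity would use different arithmetic. A minimal C illustration, assuming uncompressed stripe buffers of equal length:

    #include <stddef.h>
    #include <stdint.h>

    /* Full stripe parity generation: XOR all n_stripes data stripes together. */
    static void full_stripe_parity(const uint8_t *data[], size_t n_stripes,
                                   size_t stripe_len, uint8_t *parity)
    {
        for (size_t i = 0; i < stripe_len; i++) {
            uint8_t p = 0;
            for (size_t d = 0; d < n_stripes; d++)
                p ^= data[d][i];
            parity[i] = p;
        }
    }

    /* Update parity generation: new_parity = new_data ^ old_data ^ old_parity. */
    static void update_parity(const uint8_t *new_data, const uint8_t *old_data,
                              const uint8_t *old_parity, size_t stripe_len,
                              uint8_t *new_parity)
    {
        for (size_t i = 0; i < stripe_len; i++)
            new_parity[i] = new_data[i] ^ old_data[i] ^ old_parity[i];
    }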
  • the full stripe parity generation command can be used when all the data constituting the RAID parity is stored in the NVM module 126 and mapped in the LBA1 space. Therefore, in the case of a RAID configuration that generates parity for six data, six data must be stored in the NVM module 126.
  • the write data from the higher level apparatus 103 is stored in a compressed state in the NVM module 126.
• However, for parity generation, parity is generated from the uncompressed data. Therefore, the parity generation target data needs to be mapped to the LBA0 space.
  • FIG. 13 is a diagram showing the response information to the full stripe parity generation command and the full stripe parity generation command of the NVM module 126 in the present embodiment.
  • the full stripe parity generation command 1310 includes, as command information, an operation code (Opcode) 1311, a command ID 1012, an LBA0 length 1313, a stripe number 1314, an LBA0 start address 0 to X (1315 to 1317), and an LBA0 start address (for XOR parity) 1318, an LBA0 start address (for RAID 6 parity) 1319.
  • the operation code 1311 is a field for notifying the command type to the NVM module 126, and the NVM module 126 that has acquired the command recognizes that the command notified by this field is a full stripe parity generation command.
  • the LBA 0 length 1313 is a field for designating the length of the parity to be generated (for RAID parity, the parity and the parity generation source data have the same length).
  • the number of stripes 1314 designates the number of data used for generating parity. For example, when parity is generated for 6 data, 6 is stored in the stripe number 1314.
  • LBA 0 start addresses 0 to X are fields for designating the start address of LBA 0 to which the parity generation source data is associated.
• the number of these fields matches the number specified by the stripe number 1314 (when a command that does not match is issued, the NVM module 126 returns an error). For example, in a configuration in which two parities are created for six data (RAID6 6D+2P), six LBA0 start addresses are designated.
  • LBA 0 start address (for XOR parity) 1318 is a field for designating the storage destination of the generated RAID parity (XOR parity).
  • the parity (RAID 5 parity, RAID 6 P parity, or horizontal parity) generated in the area specified by the LBA 0 length 1313 is mapped from this start address.
  • the LBA 0 start address (for RAID 6) 1319 is a field for designating the storage destination of the parity for RAID 6 to be generated.
  • the parity for RAID 6 is Q parity of Reed-Solomon code or diagonal parity in the EVENODD system.
  • the generated parity is stored in an area in a range specified by the LBA 0 start address (for RAID 6) 1319 and the LBA 0 length 1313.
  • the NVM module 126 of this embodiment acquires a plurality of compressed data from the FM 420 indicated by the PBA associated with the area specified by the LBA 0 start addresses 0 to X (1315 to 1317) described above. Subsequently, the acquired data is decompressed using the data compression / decompression unit 418, and one or two parities are generated from the decompressed data using the parity generation unit 419 inside the NVM module 126. Thereafter, the generated parity is compressed using the data compression / decompression unit 418 and then recorded in the FM 420.
• At this time, in the rows of the LBA0-PBA management information 810 corresponding to the LBA0 start address (for XOR parity) 1318 and the LBA0 start address (for RAID6) 1319, the PBA of the recording destination FM area and the compressed data length are recorded in the NVM module PBA (812) and PBA length (813) fields.
  • the full stripe parity generation response 1320 includes a command ID 1021 and a status 1022.
• In the present embodiment, an example of response information containing the above fields is described, but additional information may be included. Since the command ID 1021 and the status 1022 have the same contents as the previous LBA0 write response, description thereof is omitted.
• NVM Module Control Command 5 Update Parity Generation Command
• Update parity generation is performed when update data is to be recorded in an area of the final storage medium for which parity has already been created, and requires that three pieces of information are mapped on LBA0: the update data, the old data in the area, and the old parity protecting the old data.
• Therefore, the storage controller 110 of this embodiment reads the compressed data of the old data and the old parity from the RAID-configured final storage medium, writes them to areas in the LBA1 space of the NVM module 126, and then performs update parity generation by issuing the update parity generation command.
  • FIG. 14 is a diagram showing an update parity generation command of the NVM module 126 and response information to the update parity generation command in the present embodiment.
• the update parity generation command 1410 of the NVM module 126 in this embodiment includes, as command information, an operation code 1411, a command ID 1012, an LBA0 length 1413, an LBA0 start address 0 (1414), an LBA0 start address 1 (1415), an LBA0 start address 2 (1416), an LBA0 start address 3 (1417), an LBA0 start address 4 (1418), and an LBA0 start address 5 (1419).
• In the present embodiment, an example of a command containing the above information is described, but additional information may be included. Since the command ID 1012 has the same contents as the previous LBA0 write command, description thereof is omitted.
  • the operation code 1411 is a field for notifying the type of command to the NVM module 126, and the NVM module 126 that has acquired the command recognizes that the command notified by this field is an update parity generation command.
• the LBA0 length 1413 is a field for designating the length of the parity to be generated (as noted above, RAID parity and its generation source data have the same length).
  • LBA 0 start address 0 (1414) is a field indicating the start address of the LBA 0 area to which new data for parity update is mapped.
  • the storage apparatus 101 uses this field to notify the NVM module 126 that the data in the area from the LBA 0 start address 0 (1414) to the LBA 0 length 1413 is new data.
  • LBA0 start address 1 (1415) is a field indicating the start address of the LBA0 area to which the old data for parity update is mapped.
  • the storage apparatus 101 uses this field to notify the NVM module 126 that the data in the area from the LBA 0 start address 1 (1415) to the LBA 0 length 1413 is old data.
  • LBA0 start address 2 (1416) is a field indicating the start address of the LBA0 area to which the XOR parity before update for parity update is mapped.
  • the storage apparatus 101 uses this field to notify the NVM module 126 that the data in the area from the LBA 0 start address 2 (1416) to the LBA 0 length 1413 is the pre-update XOR parity.
  • LBA 0 start address 3 (1417) is a field indicating the start address of the LBA 0 area to which the parity for RAID 6 before update for parity update is mapped.
• the storage apparatus 101 uses this field to notify the NVM module 126 that the data in the area from the LBA0 start address 3 (1417) to the LBA0 length 1413 is the pre-update parity for RAID 6.
  • LBA0 start address 4 (1418) is a field indicating the start address of the LBA0 area to which the XOR parity newly created by updating is associated.
  • the storage apparatus 101 uses this field to instruct the NVM module 126 to map a new XOR parity from the LBA 0 start address 4 (1418) to the LBA 0 length 1413 area.
  • LBA 0 start address 5 (1419) is a field indicating the start address of the LBA 0 area to which a parity for RAID 6 newly created by update is associated.
  • the storage controller 110 uses this field to instruct the NVM module 126 to map a new parity for RAID 6 from the LBA 0 start address 5 (1419) to the LBA 0 length 1413 area.
  • the processing when the NVM module 126 receives the update parity generation command is the same as the processing performed when the full stripe parity generation command is received.
• the NVM module 126 acquires and decompresses a plurality of compressed data from the storage areas on the FM 420 indicated by the PBAs associated with the areas specified by the LBA0 start addresses 0 to 3 (1414 to 1417), generates one or two parities using the parity generation unit 419, and then compresses the parities. Thereafter, the generated parities are recorded in the FM 420 and mapped to the LBA0 areas specified by the LBA0 start address 4 (1418) and the LBA0 start address 5 (1419).
  • the update parity generation response 1420 includes a command ID 1021 and a status 1022.
• In the present embodiment, an example of response information containing the above fields is described, but additional information may be included. Since the command ID 1021 and the status 1022 have the same contents as the previous LBA0 write response, description thereof is omitted.
• NVM Module Control Command 6 Compression Information Acquisition Command
• As described above, the NVM module 126, which is a cache apparatus, compresses the write data, generates the parity corresponding to the data, and stores each piece of data, including the parity, in compressed form.
  • the storage apparatus 101 acquires the compressed data from the NVM module 126 and records the compressed data in the final storage medium. At this time, information necessary for decompressing the compressed data (hereinafter referred to as compressed information) is also recorded in the final storage medium.
  • the present invention does not depend on this method, and the NVM module 126 may permanently hold information necessary for decompression.
• When recording the compression information in the final storage medium as in the present embodiment, the storage apparatus 101 needs to acquire the compression information from the NVM module 126, which is a cache apparatus.
  • the compression information acquisition command is used when the storage controller 110 acquires compression information from the NVM module 126.
  • FIG. 15 is a diagram illustrating a compression information acquisition command and response information to the compression information acquisition command of the NVM module 126 in the present embodiment.
  • the compression information acquisition command 1510 of the NVM module 126 in this embodiment is constituted by an operation code 1511, a command ID 1012, an LBA1 start address 1513, an LBA1 length 1514, and a compression information address 1515 as command information.
• In the present embodiment, an example of a command containing the above information is described, but additional information may be included. Since the command ID 1012 has the same contents as the previous write command, description thereof is omitted.
  • the operation code 1511 is a field for notifying the command type to the NVM module 126, and the NVM module 126 that has acquired the command recognizes that the command notified by this field is a compression information acquisition command.
  • the LBA1 start address 1513 is a field for designating the start address of the area on the LBA1 from which compression information is to be acquired.
  • the LBA1 length 1514 is a field for designating a range of LBA1 starting from the LBA1 start address 1513.
• the compression information address 1515 is a field for designating the storage destination of the compression information that the storage controller 110 acquires from the NVM module 126.
• the NVM module 126 creates the compression information necessary for decompressing the data recorded in the LBA1 area in the range indicated by the LBA1 start address 1513 and the LBA1 length 1514, and transfers it to the compression information address 1515 specified by the storage controller 110.
  • the compression information specifically indicates the structure of the compressed data mapped to LBA1. For example, when four pieces of independently decompressable compressed data are mapped to the designated LBA1 area, the information stores the start position of the four pieces of compressed data and the length of the compressed data.
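• A plausible shape for such compression information, sketched in C; the names and the fixed bound of four pieces follow the example above and are illustrative assumptions only:

    #include <stdint.h>

    /* One independently decompressable piece of compressed data mapped
       into the designated LBA1 area. */
    typedef struct {
        uint64_t start;     /* start position of the piece within the LBA1 area */
        uint32_t length;    /* length of the compressed piece */
    } comp_piece;

    /* Compression information for one LBA1 area, e.g., four pieces. */
    typedef struct {
        uint32_t   n_pieces;
        comp_piece piece[4];
    } comp_info;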
  • the storage apparatus 101 acquires the compression information from the NVM module 126 by the compression information acquisition command, and then records the compression information together with the compressed data on the final storage medium.
• When data is read, the compression information is acquired from the final storage medium together with the compressed data; the compressed data is written to the NVM module 126, and then the compression information is transferred by the compression information transfer command described later, so that the NVM module 126 can decompress the compressed data.
  • the compression information acquisition response 1520 includes a command ID 1021 and a status 1022.
• In the present embodiment, an example of response information containing the above fields is described, but additional information may be included. Since the command ID 1021 and the status 1022 have the same contents as the previous LBA0 write response, description thereof is omitted.
• NVM Module Control Command 7 Read Command
• FIG. 16 is a diagram showing a read command of the NVM module 126 and response information to the read command in this embodiment.
  • the read command 1610 of the NVM module 126 in this embodiment is constituted by an operation code 1611, a command ID 1012, an LBA 0/1 start address 1613, an LBA 0/1 length 1614, an expansion necessity flag 1615, and a read data address 1616 as command information.
• In this embodiment, an example of a command containing the above information is described, but additional information may be included. Since the command ID 1012 has the same contents as the previous LBA0 write command, description thereof is omitted.
• the operation code 1611 is a field for notifying the command type to the NVM module 126, and the NVM module 126 that has acquired the command recognizes that the command notified by this field is a read command.
  • the LBA 0/1 start address 1613 is a field for designating the start address of the logical space of the read destination.
  • the LBA 0/1 length 1614 is a field for designating the range of the recording destination LBA 0 or LBA 1 starting from the LBA 0/1 start address 1613.
  • the NVM module 126 obtains data from the PBA associated with the LBA0 or LBA1 area in the range indicated by the LBA0 or LBA1 start address 1613 and the LBA0 / 1 length 1614 described above, and transfers the data to the storage device to perform read processing. I do.
• the decompression necessity flag 1615 is a field for designating whether the read target data indicated by this command needs to be decompressed. When the storage controller 110 creates a read command, it controls this flag to notify the NVM module 126 that decompression is unnecessary. In this embodiment, when reading from the LBA1 space, the read target data must be acquired without being decompressed, so this flag is used to explicitly indicate that decompression is unnecessary. Note that the decompression necessity flag 1615 may be omitted if it is fixed by setting that reads from LBA1 do not decompress the acquired data.
• the read data address 1616 designates the head address (for example, an address in the DRAM 125) of the output destination area for the read target data.
  • data having a length designated by the LBA 0/1 length 1614 is continuously stored from the area of the address designated by the read data address 1616.
  • the read response 1620 includes a command ID 1021 and a status 1022.
• In the present embodiment, an example of response information containing the above fields is described, but additional information may be included. Since the command ID 1021 and the status 1022 have the same contents as the previous LBA0 write response, description thereof is omitted.
  • NVM Module Control Command 8 Mapping Cancel Command
• the storage controller 110 acquires the write data and parity compressed and recorded in the NVM module 126 in their compressed state; for this purpose, the data is mapped to LBA1. Further, in order to acquire in decompressed form the compressed data that was written by designating LBA1, the data recorded in the NVM module 126 is mapped to LBA0. A mapped area needs to be unmapped when the processing is completed and the mapping becomes unnecessary.
  • the storage apparatus 101 of this embodiment releases the association of LBA0 or LBA1 associated with the PBA using a mapping release command.
  • FIG. 17 is a diagram showing a mapping cancellation command of the NVM module 126 and response information to the mapping cancellation command in this embodiment.
  • the mapping cancellation command 1710 of the NVM module 126 in this embodiment is constituted by an operation code 1711, a command ID 1012, an LBA0 / 1 start address 1713, and an LBA0 / 1 length 1714 as command information.
• In this embodiment, an example of a command containing the above information is described, but additional information may be included. Since the command ID 1012 has the same contents as the previous LBA0 write command, description thereof is omitted.
• the operation code 1711 is a field for notifying the command type to the NVM module 126, and the NVM module 126 that has acquired the command recognizes that the command notified by this field is a mapping release command.
• the LBA0/1 start address 1713 is a field for designating the start address of the logical space to be unmapped, and addresses in both the LBA0 space and the LBA1 space can be designated. However, if an address in the LBA0 space is specified, it must be on a 4 KB (8-sector) boundary; if an address that is not on a 4 KB boundary is specified, the NVM module 126 returns an error.
  • the LBA 0/1 length 1714 is a field for designating the range of the LBA 0 space or the LBA 1 space starting from the LBA 0/1 start address 1713.
  • the processing when the NVM module 126 receives a mapping release command from the storage controller 110 is as follows.
• the NVM module 126 deletes the association between the PBAs and the LBA0 or LBA1 space in the range indicated by the LBA0/1 start address 1713 and the LBA0/1 length 1714 (hereinafter referred to as the "target LBA0/1 area"). Specifically, for each entry whose NVM module LBA0 (811) or NVM module LBA1 (821) value belongs to the range of the target LBA0/1 area, the NVM module PBA 812 or NVM module PBA 822 field is changed to unallocated.
• the NVM module 126 in the embodiment of the present invention selects blocks having a relatively large invalid PBA amount 904 from among the plurality of blocks (that is, selects blocks in descending order of invalid PBA amount 904) and carries out garbage collection. Garbage collection is a well-known process and is not described here.
  • NVM Module Control Command 9 Compressed Information Transfer Command
• After the storage apparatus 101 stores the data compressed by the NVM module 126 in the final storage medium, it needs to decompress the compressed data and transfer it to the host device in response to a read request from the user. At this time, the storage apparatus 101 acquires the compressed data from the final storage medium, transfers the compressed data to the NVM module 126, and then transfers the compression information necessary for decompressing the compressed data.
  • FIG. 18 is a diagram showing the compression information transfer command of the NVM module 126 and the response information to the compression information transfer command in the present embodiment.
  • the compression information transfer command 1810 of the NVM module 126 in the present embodiment is constituted by an operation code 1811, a command ID 1012, an LBA1 start address 1813, an LBA1 length 1814, and a compression information address 1815 as command information.
• In the present embodiment, an example of a command containing the above information is described, but additional information may be included. Since the command ID 1012 has the same contents as the previous write command, description thereof is omitted.
  • the operation code 1811 is a field for notifying the command type to the NVM module 126, and the NVM module 126 that has acquired the command recognizes that the command notified by this field is a compressed information transfer command.
  • the LBA1 start address 1813 is a field that specifies the start address of the area on the LBA1 that is the target of the compressed information to be transferred.
• the LBA1 length 1814 is a field for designating a range of LBA1 starting from the LBA1 start address 1813.
  • the compression information address 1815 is a field for designating the storage destination of the compression information transferred from the storage controller 110 to the NVM module 126.
• the NVM module 126 acquires the compression information from the address specified by the compression information address 1815, thereby enabling decompression of the plurality of compressed data in the area specified by the LBA1 start address 1813 and the LBA1 length 1814. Specifically, after the compressed data associated with LBA1 is mapped to LBA0 with the LBA0 mapping command described later, when a read request for that LBA0 area is received from the storage controller, the compressed data is decompressed using the compression information transferred by the compression information transfer command and transferred to the storage controller.
  • the compressed information transfer response 1820 includes a command ID 1021 and a status 1022.
• In the present embodiment, an example of response information containing the above fields is described, but additional information may be included. Since the command ID 1021 and the status 1022 have the same contents as the previous write response, description thereof is omitted.
  • NVM Module Control Command 10 LBA0 Mapping Command
  • the NVM module 126 records the compressed data written by designating the LBA1 area in the FM.
• For the storage controller 110 to acquire this compressed data in decompressed form, the compressed data is mapped to an LBA0 area different from the LBA1 area that was the write destination of the compressed data.
  • FIG. 19 is a diagram showing the LBA0 mapping command of the NVM module 126 and the response information to the LBA0 mapping command in the present embodiment.
• the LBA0 mapping command 1910 of the NVM module 126 in the present embodiment is configured by an operation code 1911, a command ID 1012, an LBA1 start address 1913, an LBA1 length 1914, and an LBA0 start address 1915 as command information.
• In the present embodiment, an example of a command containing the above information is described, but additional information may be included. Since the command ID 1012 has the same contents as the previous write command, description thereof is omitted.
  • the operation code 1911 is a field for notifying the command type to the NVM module 126, and the NVM module 126 that has acquired the command recognizes that the command notified by this field is an LBA0 mapping command.
• the LBA1 start address 1913 is a field for designating the head address of the LBA1 area of the target data whose compressed data is to be mapped to LBA0.
• the LBA1 length 1914 is a field for designating the range of LBA1, starting from the LBA1 start address 1913, that is to be mapped to LBA0.
  • the LBA 0 start address 1915 is a field for designating the start address of LBA 0 to be mapped.
• the storage apparatus 101 knows the post-decompression data size of the compressed data recorded in LBA1 from the compression information the storage controller 110 acquired from the PDEV, secures an LBA0 area to which this data size can be mapped, and enters its head address in the LBA0 start address 1915 field.
  • the address that can be specified as the LBA 0 start address 1915 is limited to a multiple of 8 sectors (4 KB).
• the NVM module 126 of this embodiment maps the compressed data associated with the LBA1 area in the range indicated by the LBA1 start address 1913 and the LBA1 length 1914 to an area, starting from the LBA0 start address 1915, corresponding to the data size after decompression. More specifically, referring to the LBA1-PBA conversion table, the PBAs associated with that LBA1 range are acquired. Then, referring to the LBA0-PBA conversion table, the acquired PBA addresses are entered in the NVM module PBA (812) fields of the LBA0 range that starts from the LBA0 start address 1915 and has the size after decompression, which is determined from the compression information the NVM module 126 acquired from the storage apparatus by the compression information transfer command.
• the LBA0 mapping response 1920 includes a command ID 1021 and a status 1022.
• In the present embodiment, an example of response information containing the above fields is described, but additional information may be included. Since the command ID 1021 and the status 1022 have the same contents as the previous write response, description thereof is omitted.
  • the storage apparatus 101 manages the virtual volume in association with one or more RAID group areas.
  • the storage apparatus 101 manages one or more virtual PDEVs in association with the RAID group.
  • the storage apparatus 101 manages one PDEV (SSD 111 or HDD 112) in association with each virtual PDEV.
  • the storage apparatus 101 provides a virtual volume (denoted as “virtual Vol” in the drawing) 200 to the host apparatus 103.
  • the example of FIG. 21 shows the association in the storage apparatus of the data area “Data14” recognized by the higher-level apparatus 103.
• Hereinafter, a case where the RAID type of the RAID group associated with the virtual volume 200 is RAID 5 will be described as an example.
  • data areas Data0 to Data14 are fixed-size areas divided by RAID parity calculation units, and these units are hereinafter referred to as RAID stripes.
  • the data areas Data0 to Data14 are storage areas recognized by the host apparatus 103, and are therefore areas in which uncompressed data is stored in the storage apparatus 101.
• For example, for the RAID stripe Data14, a parity for RAID 5 is generated by performing an XOR operation with Data13 and Data12.
  • a set of RAID stripes necessary for generating a RAID parity, such as Data12, Data13, and Data14 is referred to as a RAID stripe column.
• the length of a RAID stripe is, for example, 64 KB or 32 KB.
• each stripe in one virtual volume is assigned a serial number starting from 0; this number is referred to herein as the "stripe number".
• the stripe number of the stripe located at the head of the virtual volume is 0, and stripe numbers 1, 2, ... are assigned in order to the subsequent stripes.
• In FIG. 21, the numbers given after "Data" (Data0 to Data14) are the stripe numbers.
  • each RAID stripe column is also given a number starting from 0 (called a stripe column number) in order from the stripe column located at the head of the RAID group.
• the stripe column number of the stripe column located at the head of the RAID group is 0, and stripe column numbers 1, 2, ... are assigned in order to the subsequent stripe columns.
• the virtual PDEV is a concept defined in the storage apparatus 101 in order to convert virtual volume addresses into PDEV addresses. The storage apparatus 101 treats the virtual PDEV as a storage device that stores the data written from the host apparatus 103 as is (uncompressed).
  • the virtual volume 200 is associated with the RAID group 0, and the RAID group 0 is configured with virtual PDEVs 0 to 3.
• RAID group 0 is configured to protect data with RAID 5 using virtual PDEVs 0 to 3, and each stripe of the virtual volume 200 is associated, by a statically defined and computable correspondence, with one of the stripes in the plurality of virtual PDEVs belonging to RAID group 0. This is the same as the association performed in a storage apparatus adopting a conventional RAID configuration. Further, since each PDEV has a 1:1 relationship with a virtual PDEV, it can also be said that each stripe of the virtual volume 200 is associated with one of the PDEVs belonging to RAID group 0.
  • the correspondence relationship between each virtual PDEV configuring the RAID group 0 and the PDEV installed in the storage apparatus 101 is managed by virtual PDEV information 2230 described later.
  • the correspondence relationship between the storage destination area of the RAID stripe “Data14” in the virtual PDEV2 and the area in the PDEV2 in which the data of the RAID stripe “Data14” is compressed and stored (“compressed D14” in FIG. 21) is as follows. It is managed by the compression management information (“management information 2” in FIG. 21) stored in the PDEV 2.
  • the compression management information (management information 2) is recorded at a predetermined location in the PDEV 2. Details regarding the contents of the compression management information and the recording position in the PDEV 2 will be described later. Although only the compression management information in the PDEV 2 is described here, the compression management information (management information 0, 1, 3 in FIG. 21) is also stored in the other PDEVs.
• Next, the "virtual volume management information", "RAID group management information", and "virtual PDEV information" stored in the DRAM 125 of the storage apparatus 101 so that the processor 121 can access them at high speed will be described with reference to FIG. 22. Note that the management information stored in the DRAM 125 of the storage apparatus 101 is not limited to the above, and other management information may be stored.
  • the virtual volume management information 2210 is management information generated each time the storage apparatus 101 creates one virtual volume, and management information for one virtual volume is stored in one virtual volume management information.
  • the storage apparatus 101 manages the association between the virtual volume and the RAID group using the virtual volume management information 2210, and identifies the RAID group to be referred to for the address requested from the host apparatus.
  • the virtual volume management information 2210 includes items of a virtual volume start address 2211, a virtual volume size 2212, a RAID group number 2213, and a RAID group start address 2214. The present invention is not limited to these four types of items.
  • the virtual volume management information 2210 may include management information other than that shown in FIG.
• the virtual volume start address 2211 is an item for storing the head address of the area in the virtual volume to which the RAID group is associated.
  • the virtual volume size 2212 is an item for storing the area size in the virtual volume associated with the RAID group.
  • the storage apparatus 101 associates the area of the size specified by the virtual volume size 2212 from the start address specified by the virtual volume start address 2211 with the RAID group specified by the RAID group number 2213 described later.
  • the RAID group number 2213 is an item for storing the number of the RAID group associated with the virtual volume.
  • the RAID group start address 2214 is an item for designating an address in the RAID group designated by the RAID group number 2213. In this specification, the RAID group address does not include an area in which parity is stored.
  • the RAID group address will be described below with reference to FIG.
• For example, when the stripe size is 64 KB, the position (RAID group address) where the first stripe "Data0" of virtual PDEV0, the first virtual PDEV constituting RAID group 0, is stored is address 0; the position where the first stripe "Data1" of the next virtual PDEV1 is stored is address 64 KB; the position where the first stripe "Data2" of the next virtual PDEV2 is stored is address 128 KB; and the position where "Data4" of virtual PDEV0 is stored is address 192 KB.
• By referring to the virtual volume management information 2210, the processor 121 can specify the RAID group associated with an access target area.
  • the virtual volume management information for associating the virtual volume with the RAID group does not require as much information as the compression management information described later for associating the areas. The reason for this will be described with reference to FIG.
• Note that the example described here applies when the RAID type of the RAID group is RAID 5; in the case of a RAID group adopting another RAID type, the address on the virtual volume is converted by a different calculation method. In either case, from an address on the virtual volume (an LBA, a stripe number, or the like), the virtual PDEV and the address in the virtual PDEV associated with it can be specified by simple calculation. Similarly, the virtual PDEV and the in-virtual-PDEV address of the stripe storing the parity corresponding to a certain RAID stripe are obtained by simple calculation.
  • the correspondence between the address of the virtual volume and the address in the RAID group can be uniquely specified by calculation considering the RAID configuration. For this reason, a lot of information is not required for the association, and the information can be recorded in the DRAM accessible by the processor 121 at a high speed.
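• As a concrete illustration, a RAID 5 stripe lookup of this kind might look as follows. The parity rotation used here (parity on the last drive for stripe column 0, then rotating) is an assumption made for the sketch; the actual placement follows the layout of the figure.

    #include <stdint.h>

    typedef struct {
        uint32_t pdev;     /* which virtual PDEV holds the stripe */
        uint64_t offset;   /* byte offset of the stripe inside that virtual PDEV */
    } stripe_loc;

    /* Locate a data stripe of a RAID 5 group with n_pdevs virtual PDEVs.
       Each stripe column holds (n_pdevs - 1) data stripes plus one parity. */
    static stripe_loc locate_stripe(uint64_t stripe_no, uint32_t n_pdevs,
                                    uint64_t stripe_size)
    {
        uint64_t per_col = n_pdevs - 1;                  /* data stripes per column */
        uint64_t col     = stripe_no / per_col;          /* stripe column number    */
        uint32_t slot    = (uint32_t)(stripe_no % per_col);
        uint32_t parity  = (uint32_t)((n_pdevs - 1) - (col % n_pdevs)); /* assumed */
        uint32_t pdev    = (slot >= parity) ? slot + 1 : slot;
        stripe_loc loc   = { pdev, col * stripe_size };
        return loc;
    }

• Under these assumptions, stripe number 1 with four virtual PDEVs and 64 KB stripes resolves to virtual PDEV1 at offset 0, consistent with Data1 in the example above.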
  • the above is the virtual volume management information in the embodiment of the present invention.
  • the storage apparatus 101 manages the association between virtual volumes and RAID groups using this management information.
  • the RAID group management information 2220 is information for managing a virtual PDEV that constitutes a RAID group, and the storage apparatus 101 generates one RAID group management information 2220 when one RAID group is defined.
  • One RAID group management information manages one RAID group.
• the RAID group management information 2220 is composed of a RAID configuration virtual PDEV number 2221, a registered virtual PDEV number 2222, and a RAID type 2223.
  • the RAID group management information 2220 of the present invention is not limited to the items shown in FIG.
  • the RAID group management information 2220 may include items other than those shown in FIG.
  • the RAID configuration virtual PDEV number 2221 is an item for storing the number of virtual PDEVs constituting the RAID group. In the example of FIG. 22, it is shown that four units are configured.
  • the registered virtual PDEV number 2222 is an item for storing a number for identifying the virtual PDEV that constitutes the RAID group.
• In the example of FIG. 22, the RAID group is configured with four units: virtual PDEV3, virtual PDEV8, virtual PDEV9, and virtual PDEV15.
  • the RAID type 2223 is an item for storing a RAID type (RAID level).
  • the example of FIG. 22 indicates that the RAID group to which this management information corresponds is configured with RAID5.
  • the above is the RAID group management information in the embodiment of the present invention.
  • the storage apparatus 101 uses this management information to manage a RAID group configured with virtual PDEVs.
  • the virtual PDEV information 2230 is information for managing the association between the virtual PDEV and the PDEV.
  • the virtual PDEV number 2231 is an identification number assigned to each virtual PDEV managed in the storage apparatus 101
  • the PDEV Addr 2232 is an identification number of each PDEV managed in the storage apparatus 101.
• In this embodiment, since the PDEV is a storage medium that conforms to the SAS standard, the PDEV Addr 2232 stores the SAS address assigned to each PDEV.
  • the PDEV can be specified from the virtual PDEV number.
  • the compression management information is information for managing the association between the virtual PDEV area and the PDEV area, and is recorded in each PDEV.
  • the recording position of the compression management information in the PDEV is common to all PDEVs.
  • the head address in the PDEV in which the compression management information is stored is referred to as “head address in the PDEV of the compression management information”.
  • the information on the head address in the PDEV of the compression management information may be stored in the DRAM 125 of the storage controller 110, or the address information is embedded in a program for accessing the compression management information. Also good.
• Since the compression rate achieved when data is compressed depends on the data content, the association between the virtual PDEV area and the PDEV area changes dynamically. For this reason, the compression management information is updated as the recorded data changes.
  • FIG. 23 shows compression management information 2300 used by the storage apparatus 101 according to the embodiment of the present invention.
  • the compression management information 2300 is composed of three fields: a head address 2301 in the PDEV in which compressed data is stored, a length 2302 of the compressed data, and a compression flag 2303.
• the compression management information 2300 in the present invention is not limited to this configuration, and may have fields other than these three.
  • the virtual PDEV capacity is set to 8 TB, which is eight times the 1 TB PDEV capacity, but the present invention is not limited to this value.
  • the virtual PDEV capacity may be set to an arbitrary value according to the assumed compression rate. For example, when the compression rate of stored data can only be expected to be about 50%, it is preferable to set the virtual PDEV capacity to 2 TB with respect to the 1 TB PDEV capacity because the compression management information 2300 can be reduced. Further, the virtual PDEV capacity may be dynamically changed according to the compression rate of the stored data. For example, at first, the virtual PDEV capacity is set to 2 TB with respect to the 1 TB PDEV capacity.
• If, for example, it is found from the compressed data length information obtained from the NVM module 126 that the stored data is compressed to less than 1/8 of its size, the capacity of the virtual PDEV may be dynamically raised to 8 TB. In that case, the compression management information 2300 is also enlarged dynamically. On the other hand, if it is found after operation starts that the stored data cannot be compressed much, the capacity of the virtual PDEV is dynamically reduced.
  • an example of the compression management information 2300 is shown in which the virtual PDEV area is divided and associated with each 4 KB, but the present invention is not limited to this division unit.
  • the head address 2301 in the PDEV in which the compressed data is stored is a field for storing the head address of the area in which the compressed data in the corresponding virtual PDEV area is stored.
• When the data is stored uncompressed, this field stores the start address of the area where the uncompressed data is stored.
• When NULL is stored in the head address 2301 in the PDEV in which the compressed data is stored (described as "unassigned" in FIG. 23), no PDEV area is allocated to the corresponding virtual PDEV area (unassigned).
  • the compressed data and the uncompressed data are managed with the sector length 512B as the minimum unit. The present invention is not limited to this sector length.
  • the compressed data length 2302 is a field for storing the length of the compressed data stored in the PDEV.
  • the unit of the value stored in this field is a sector.
• That is, the compressed data is stored in the area that starts at the address given by the head address 2301 in the PDEV and extends for the number of sectors recorded in the compressed data length 2302 field plus 1.
  • the storage area of the virtual volume is divided into 4 KB units, and compression is performed for each divided 4 KB area.
• In this embodiment, the minimum data size after compression is 512 B, and when there is no compression effect the data is stored uncompressed, so the compressed data length ranges from 512 B to 4096 B (4 KB), that is, from 1 to 8 sectors. However, the field is used with the rule that a value of 0 in the compressed data length 2302 means a length of 1 sector (512 B); in other words, the field stores the number of sectors minus 1. As a result, the compressed data length 2302 field manages data lengths of 512 B to 4 KB in 3 bits.
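• A sketch of the encoding and decoding under that reading (sector size and rounding behavior as stated above):

    #include <stdint.h>

    #define SECTOR_BYTES 512u

    /* Encode a compressed length of 512 B - 4 KB into the 3-bit field:
       values 0-7 stand for 1-8 sectors. */
    static uint8_t encode_comp_len(uint32_t bytes)
    {
        uint32_t sectors = (bytes + SECTOR_BYTES - 1) / SECTOR_BYTES; /* round up */
        return (uint8_t)(sectors - 1);   /* 0 means one sector (512 B) */
    }

    static uint32_t decode_comp_len(uint8_t field)
    {
        return ((uint32_t)field + 1) * SECTOR_BYTES;   /* 1-8 sectors */
    }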
  • the compression flag 2303 is a field indicating that the corresponding virtual PDEV area is compressed and stored. When the value of the compression flag 2303 is 1, it indicates that the data is compressed and stored. On the other hand, when the value of the compression flag 2303 is 0, it indicates that the data is stored uncompressed.
  • the compression management information 2300 has each value of the leading address 2301 in the PDEV, the length 2302 in the PDEV of the compressed data, and the compression flag 2303 for each 4 KB area of the virtual PDEV.
  • a set of the head address 2301 in the PDEV, the length 2302 in the PDEV of the compressed data, and the compression flag 2303 is referred to as a compression information entry.
  • the recording position in the PDEV of the compression information entry that manages the association of the virtual PDEV area is uniquely determined by the virtual PDEV area to be associated.
  • the example of FIG. 23 indicates that the compressed information entry of the 4 KB area with the virtual PDEV area “0x0000_0000_1000” as the head address is fixedly recorded at the recording position “0x00_0000_0008”.
  • the units of the start address of the virtual PDEV area and the address of the recording position are both bytes. Therefore, the address “0x0000 — 0000 — 1000” of the virtual PDEV area represents a position of 4 KB from the top of the virtual PDEV area.
• the recording position is expressed as a relative address whose origin (address 0) is the head address in the PDEV of the compression management information.
• the storage apparatus 101 can store the write data (compressed data) from the host apparatus 103 in an arbitrary area, as long as it is an area (unused area) other than areas in which data is already stored.
• When update data for certain data is written, the compressed update data is written to a location different from the pre-update data (compressed data), and the PDEV area where the pre-update data was stored is thereafter treated as an unused area.
• In this way, the update data may be stored at a position different from the pre-update data; even in this case, only the values in the compression information entry are changed, and the compression information entry of the virtual PDEV area "0x0000_0000_1000" is always recorded at the recording position "0x00_0000_0008". Likewise, the compression information entry of each subsequent 4 KB virtual PDEV area is recorded at a position incremented by 8 B per area.
• In other words, the location of the compression information entry that manages a given virtual PDEV address can be specified by the calculation formula: (virtual PDEV address ÷ 4 KB) × 8 B + (head address in the PDEV of the compression management information).
  • the present invention is not limited to this calculation formula. It is only necessary to uniquely calculate the recording position of the compressed information entry recorded in the PDEV from the virtual PDEV address.
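• The formula above, written out as a small helper (byte units throughout, matching the example in FIG. 23):

    #include <stdint.h>

    #define AREA_BYTES  4096u   /* one compression information entry per 4 KB area */
    #define ENTRY_BYTES 8u      /* entries are spaced 8 B apart */

    /* Position of the compression information entry for a virtual PDEV address
       (mgmt_head is the head address in the PDEV of the compression
       management information). */
    static uint64_t comp_entry_pos(uint64_t vpdev_addr, uint64_t mgmt_head)
    {
        return (vpdev_addr / AREA_BYTES) * ENTRY_BYTES + mgmt_head;
    }

    /* Example from FIG. 23: vpdev_addr 0x0000_0000_1000 with mgmt_head 0
       yields the relative recording position 0x00_0000_0008. */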
  • Such a fixed arrangement of compressed information entries makes it possible to calculate a virtual PDEV area that has become unreadable when the compressed information entry is lost from the recording address of the lost entry. For this reason, after the lost virtual PDEV area is restored by rebuilding by RAID, the contents of the compressed information entry can be regenerated when the restored data is compressed and recorded. Therefore, in the present invention, the reliability of the storage apparatus can be maintained even if the compression management information 2300 is not made redundant.
• When the storage apparatus 101 responds to a read request from the host apparatus 103 for data that was recorded in the final storage medium by the write data compression operation of the storage apparatus 101 described with reference to FIG., the data is decompressed and returned to the host apparatus 103.
  • each process is executed by the processor 121 of the storage apparatus 101.
• the storage controller 110 uses the storage area provided by the NVM module 126 as a cache area for temporarily storing write data from the higher-level device 103 and read data from the SSD 111 or the HDD 112.
• the NVM module 126 provides the LBA0 space and the LBA1 space to the storage controller 110 (its processor 121), and within the provided LBA0 and LBA1 spaces the processor 121 manages the regions used for storing data and the unused regions (referred to as free space). The information used for managing these areas is called cache management information.
  • FIG. 20A shows an example of the cache management information 3000 managed by the storage controller 110.
  • the cache management information 3000 is stored on the DRAM 125.
  • the storage apparatus 101 uses the LBA0 space provided by the NVM module 126 as a cache area for storing write data from the host apparatus 103.
• On the other hand, for caching data read from the final storage medium, the LBA1 space is used. This is because the data read from the final storage medium is compressed data.
  • the cache area allocation unit is the stripe size. In the following description, the stripe size is 64 KB as an example.
• Each row (entry) of the cache management information 3000 indicates that the data of the one-stripe area of the virtual volume specified by VOL# 3010 (an identification number, or virtual volume number, given to the virtual volume) and the address 3020 in the virtual volume is cached in the stripe-size area of the LBA0 space starting from cache LBA0 (3030) and the stripe-size area of the LBA1 space starting from cache LBA1 (3040).
  • When an area is not allocated, the invalid value NULL is stored in cache LBA0 (3030) or cache LBA1 (3040).
  • For example, the area caching the data of the area (stripe) whose VOL# 3010 is 0 and whose address 3020 is 0 is the 64 KB (stripe-size) area starting at cache LBA0 (3030) = 0. Since cache LBA1 (3040) is NULL, no LBA1-space area is allocated to this stripe.
  • The address 3020 stores a stripe number.
  • The bitmap 3050 is 16-bit information indicating in which parts of the one-stripe area specified by cache LBA0 (3030) data is stored. Each bit corresponds to a 4 KB area within the stripe: when a bit is 1, data is stored in the corresponding area; when it is 0, no data is stored there.
  • In the example, the bitmap 3050 of the row (first row) whose cache LBA0 (3030) is 0 is 0x8000, that is, only the first of the 16 bits is 1, indicating that data is stored only in the first 4 KB of the one-stripe area starting at cache LBA0 = 0.
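As an illustration of these bitmap semantics, the following hypothetical Python sketch (names are illustrative) tests whether the 4 KB sub-area containing a given offset within a 64 KB stripe holds data; it assumes, as the example above suggests, that the first of the 16 bits corresponds to the leading 4 KB:

```python
# Hypothetical sketch of the 16-bit bitmap 3050: one bit per 4 KB sub-area
# of a 64 KB stripe, with the first sub-area mapped to the most significant bit.

STRIPE_SIZE = 64 * 1024
SUB_AREA = 4 * 1024
BITS = STRIPE_SIZE // SUB_AREA  # 16

def has_data(bitmap: int, offset_in_stripe: int) -> bool:
    """True if the 4 KB sub-area containing offset_in_stripe holds data."""
    index = offset_in_stripe // SUB_AREA           # 0..15, leading area first
    return bool(bitmap & (1 << (BITS - 1 - index)))

# 0x8000: only the first 4 KB of the stripe holds data, as in the example.
assert has_data(0x8000, 0) is True
assert has_data(0x8000, 4096) is False
```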
  • The attribute 3060 stores “Dirty” or “Clean” as information indicating the state of the data cached in the area specified by cache LBA0 (3030). “Dirty” means that the data in that area has not yet been reflected in the final storage medium (SSD 111 or HDD 112); “Clean” means that the cached data has already been reflected in the final storage medium.
  • The last access time 3070 represents the time when the cached data was last accessed. It is used as reference information when selecting, from among the data stored in the cache area from the higher-level device 103, data to be destaged to the final storage medium (for example, the data with the oldest last access time is selected). The cache management information 3000 may therefore store, in place of the last access time 3070, other information used for selecting the destage target data.
  • Since the storage controller 110 needs to manage unused areas (areas holding no cache data) in the LBA0 and LBA1 spaces of the NVM module 126, it also holds lists of unused areas. These are called the free list 3500, an example of which is shown in FIG. 20-B.
  • The free list 3500 consists of a free LBA0 list 3510 and a free LBA1 list 3520, which store the addresses of unused LBA0 and LBA1 areas, respectively.
  • To secure a cache area for a given row of the cache management information 3000, an LBA0 address is acquired from the free LBA0 list 3510 and stored in cache LBA0 (3030) of that row. Likewise, when an LBA1-space area is needed, an LBA1 address is acquired from the free LBA1 list 3520 and stored in cache LBA1 (3040).
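A minimal sketch of this allocation, using simple list and dictionary stand-ins for the free list 3500 and the cache management information 3000, might look as follows (all structures here are simplifications, not the apparatus's actual formats):

```python
# Simplified stand-ins: securing a cache area pops an unused LBA0 (and,
# when compressed read data must also be cached, an LBA1) from the free
# lists and records them in the row for the stripe.

free_lba0 = [0x00000, 0x10000, 0x20000]   # free LBA0 list 3510 (illustrative)
free_lba1 = [0x00000, 0x08000]            # free LBA1 list 3520 (illustrative)
cache_table = {}                          # (VOL#, stripe#) -> row

def secure_cache_area(vol, stripe, need_lba1=False):
    row = {
        "cache_lba0": free_lba0.pop(0),
        "cache_lba1": free_lba1.pop(0) if need_lba1 else None,  # None = NULL
        "bitmap": 0,
        "attribute": "Clean",
    }
    cache_table[(vol, stripe)] = row
    return row

row = secure_cache_area(vol=0, stripe=0)
assert row["cache_lba1"] is None   # no LBA1-space area allocated yet
```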
  • In the first step S2401 of the decompression read operation, the storage apparatus 101 receives a read request and a read target address from the host apparatus 103.
  • In step S2402 following S2401, the processor 121 checks, using the read address acquired in S2401, whether the read target data exists in the NVM module 126 (the cache), that is, whether there is a cache hit.
  • Specifically, the processor 121 checks whether a value is stored in cache LBA0 (3030) of the row of the cache management information 3000 corresponding to the read address acquired in S2401. If a value is stored, a cache hit is determined; if no value is stored, a cache miss is determined.
  • In step S2404, the processor 121 acquires the RAID group management information 2220 of the RAID group indicated by the value of the RAID group number 2213 and obtains the virtual PDEV numbers 2222 registered in that RAID group. It then calculates, from the read address acquired in S2401, the virtual PDEV number storing the target data and the address within the virtual PDEV. Depending on the read address and the request size, the read request area from the host apparatus may straddle a plurality of virtual PDEVs; in that case, a plurality of virtual PDEVs and addresses within them are calculated in order to respond to the read request.
  • In step S2405 following S2404, the entry of the compression management information 2300 that manages the virtual PDEV number and virtual PDEV address acquired in S2404 is acquired from the PDEV.
  • Specifically, the processor 121 refers to the virtual PDEV information 2230 and identifies the PDEV associated with the virtual PDEV number acquired in S2404.
  • The PDEV is uniquely associated with the virtual PDEV, and the compression management information 2300 is recorded in a specific area of the PDEV. Therefore, as described above, the address at which the target entry of the compression management information 2300 is stored in the PDEV is calculated from the address within the virtual PDEV, and the compression information entry is acquired from the PDEV.
  • In step S2406 following S2405, the processor 121 refers to the compression management information entry acquired in S2405 and identifies the storage area of the compressed data in the PDEV from the start address 2301 in the PDEV at which the compressed data is stored and the length 2302 of the compressed data.
  • In step S2407 following S2406, the processor 121 reads the compressed data from the compressed data storage area identified in S2406. The read data is temporarily stored in the DRAM 125.
  • In step S2408, the processor 121 writes the compressed data to the NVM module 126, the cache device, by designating LBA1. Specifically, the processor 121 designates LBA1 and writes the compressed data using the write command 1010.
  • In step S2409 following S2408, the storage apparatus 101 creates, based on the entry of the compression management information acquired in S2405, the compression information necessary for decompressing the compressed data, and transfers it to the NVM module 126. Specifically, the processor 121 transfers the compression information to the NVM module 126 using the compression information transfer command 1810 shown in FIG. 18.
  • Step S2410 following S2409 is a step of mapping the compressed data written by the storage apparatus 101 in S2408 to LBA0 in order to decompress and acquire it. Specifically, the processor 121 instructs the NVM module 126 to map the compressed data to LBA0 using the LBA0 mapping command shown in FIG. 19.
  • The NVM module 126 that receives the command refers to the compression information for the compressed data associated with LBA1 and associates the compressed data with an LBA0 area corresponding to the post-decompression size of the compressed data.
  • In step S2411, following S2413 or S2410, the processor 121 designates LBA0 and issues a read, thereby obtaining in decompressed form the data that was staged into the cache area by the processing of S2407 to S2409 and mapped to the LBA0 space (or, in the case of a cache hit, the data already associated with LBA0).
  • The NVM module 126 that receives the read command designating LBA0 acquires the compressed data associated with LBA0 from the FM 420, decompresses it with the data compression/decompression unit 418, and returns it to the storage controller 110 (DRAM 125).
  • In step S2412, the processor 121 returns the decompressed data acquired in S2411 to the host apparatus as response data to the read request. Since the cache LBA1 (3040) area is now an unused area, the cache LBA1 (3040) value is returned to the free list, the cache LBA1 (3040) field of the cache management information 3000 is set to NULL, and the processing ends.
  • Step S2413 is performed when a cache hit is determined in step S2403: the processor 121 refers to the cache management information 3000 and obtains the LBA0 (cache LBA0 (3030)) in which the read target area is already stored.
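Putting steps S2401 to S2413 together, the control flow can be illustrated with the following runnable toy model; zlib and the dictionaries are stand-ins for the NVM module's compressor and for the PDEV, entry, and cache structures, and none of the names below are actual interfaces of the apparatus:

```python
# Runnable toy model of the decompression read path (S2401-S2413).
import zlib

pdev = {}         # PDEV address -> compressed bytes (final storage medium stand-in)
entries = {}      # virtual PDEV address -> PDEV address (compression info entry stand-in)
cache_lba0 = {}   # cached compressed bytes, keyed by virtual address for simplicity

def stage_write(vaddr, data):
    """Compress data and store it, recording a compression information entry."""
    cdata = zlib.compress(data)
    paddr = len(pdev)              # pick the next free PDEV slot (simplification)
    pdev[paddr] = cdata
    entries[vaddr] = paddr

def decompression_read(vaddr):
    if vaddr in cache_lba0:        # S2402/S2403: cache hit -> S2413
        cdata = cache_lba0[vaddr]
    else:                          # S2404/S2405: locate the compression info entry
        paddr = entries[vaddr]
        cdata = pdev[paddr]        # S2406/S2407: read the compressed data
        cache_lba0[vaddr] = cdata  # S2408-S2410: stage to cache, map to LBA0
    return zlib.decompress(cdata)  # S2411/S2412: decompress and respond

stage_write(0x1000, b"hello" * 100)
assert decompression_read(0x1000) == b"hello" * 100   # miss path
assert decompression_read(0x1000) == b"hello" * 100   # hit path
```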
  • The write data cache storage operation of this embodiment corresponds to the processes 311 to 314 of the write data compression operation of this embodiment shown in FIG. It is described below with reference to the flowchart of FIG. 25.
  • The first step S2501 of the write data cache storage operation is a step in which the storage apparatus 101 receives write data and a write destination address from the host apparatus. At this time, the write data is temporarily recorded in the DRAM 125 of the storage apparatus 101, as in the data flow 311 shown in FIG. If there is a function for transferring data directly from the host interface 124 to the NVM module 126, the data need not be recorded in the DRAM 125 of the storage apparatus 101.
  • Step S2502 following S2501 is a step of performing a cache hit determination using the write address acquired by the processor 121 in S2501. The same processing as step S2402 of the decompression read operation is performed here.
  • Step S2503 following S2502 branches depending on the determination result of S2502: if the result of S2502 is a cache hit, the process proceeds to S2504; if it is a cache miss, the process proceeds to S2509.
  • Step S2509 following S2503 is a step in which the processor 121 newly secures LBA0 of the NVM module 126 for recording the write data.
  • The securing of LBA0 is the same as the processing performed in S2404 of the decompression read processing, except that there is no need to secure an LBA1-space area here.
  • In step S2505, the processor 121 designates the LBA0 acquired in S2504 or S2509 and writes the data to the NVM module 126 using the write command 1010 shown in FIG. 10.
  • At this time, the write data is transferred from the DRAM 125 of the storage apparatus 101 to the data compression/decompression unit 418 of the NVM module 126 and compressed, as in the data flow 312 shown in FIG., and recorded in the data buffer 416. The compressed data recorded in the data buffer 416 is recorded in the FM 420 at an arbitrary timing, as in the data flow 314.
  • In step S2506 following S2505, the processor 121 obtains the write response shown in FIG. 10 from the NVM module 126 and acquires the compressed size of the data written in S2505 from the compressed data length 1023 field of the write response information 1020.
  • Step S2508 is a step in which the storage apparatus 101 determines whether the total amount of compressed data held in the cache configured by the NVM module 126 for which RAID parity has not been generated is equal to or greater than a threshold. If that total amount exceeds the threshold, the storage apparatus 101 judges that parity needs to be generated for the compressed data held in the cache and moves to the parity generation operation. If it is at or below the threshold, the storage apparatus 101 judges that parity generation is unnecessary and ends the write data cache storage operation. The above is the write data cache storage operation in this embodiment.
  • Next, the RAID parity generation operation of the storage apparatus in this embodiment will be described.
  • The RAID parity generation operation is not limited to being performed only when, in step S2508 of the write data cache storage operation shown in FIG. 25, the total amount of compressed data held in the cache for which RAID parity has not been generated reaches or exceeds the threshold. The storage apparatus 101 may perform it at an arbitrary timing, for example when there are few or no requests from the host apparatus 103.
  • The RAID parity generation operation of this embodiment corresponds to the processes 315 to 320 of the write data compression operation of this embodiment shown in FIG. It is described below with reference to the flowchart of FIG. 26.
  • The first step S2601 of the RAID parity generation processing of the storage apparatus is a step in which the processor 121 selects the parity generation target data from the data recorded in the cache area configured by LBA0 of the NVM module 126. At this time, the processor 121 refers to the last access time 3070 of the cache management information 3000 and selects data for which a long time has elapsed since the last access. The parity generation target may instead be selected according to some other rule; for example, data with a relatively low update frequency may be selected.
  • Step S2602 following S2601 is a step in which the processor 121 secures, on LBA0, the logical space provided by the NVM module 126, a recording destination area for the parity to be generated. Specifically, the processor 121 refers to the free list 3500 and secures an unused LBA0. The secured LBA0 is managed by parity cache area management information (not shown) similar to the cache management information 3000.
  • Step S2603 following S2602 is a step of determining whether to perform full stripe parity generation. If all data belonging to the same stripe column as the data selected in S2601 exists in the cache, the processor 121 proceeds to S2604 to generate full stripe parity. If only part of the data belonging to that stripe column is present, the process proceeds to S2607 to generate updated parity.
  • To search the cache for data belonging to the same stripe column as the data selected in S2601, the processor refers to VOL# 3010 and address 3020 of each row stored in the cache management information 3000 and checks whether they fall within the same stripe column range as the selected data. Taking FIG. 21 as an example, if the data selected in S2601 is Data14, then each row whose VOL# 3010 equals the virtual volume number to which Data14 belongs and whose address 3020, divided by the number of data stripes per stripe column (3), yields the same quotient as Data14's stripe number (14) divided by 3 holds data belonging to the same stripe column. Furthermore, if the values of the respective bitmaps 3050 are the same, it can be determined that all data belonging to the same stripe column as the data selected in S2601 is stored in the cache.
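The stripe column membership test described above can be sketched as follows (hypothetical Python, assuming 3 data stripes per stripe column as in the FIG. 21 example):

```python
# Hypothetical sketch of the same-stripe-column test described above,
# assuming 3 data stripes per stripe column as in the example.

DATA_STRIPES_PER_COLUMN = 3

def same_stripe_column(vol_a: int, stripe_a: int,
                       vol_b: int, stripe_b: int) -> bool:
    """True if two cached stripes belong to the same RAID stripe column."""
    return (vol_a == vol_b and
            stripe_a // DATA_STRIPES_PER_COLUMN
            == stripe_b // DATA_STRIPES_PER_COLUMN)

# Stripe 14 shares a stripe column with stripe 12 (same quotient, 4),
# but not with stripe 15 (quotient 5).
assert same_stripe_column(0, 14, 0, 12)
assert not same_stripe_column(0, 14, 0, 15)
```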
  • Step S2604 following S2603 is a step of instructing the NVM module 126 to generate RAID parity and map it to the LBA0 area secured by the storage apparatus 101 in S2602.
  • Specifically, the processor 121 uses the full stripe parity generation command 1310 shown in FIG. 13 to designate, by the LBA0 start addresses 0 to X (1315 to 1317), the compressed data from which the parity is to be generated, and also designates the mapping locations of the generated parity by the LBA0 start address (for XOR parity) 1318 and the LBA0 start address (for RAID 6 parity) 1319.
  • The NVM module 126 that receives the full stripe parity generation command reads the compressed data recorded in the FM 420 into the data buffer 416 of the NVM module 126 if the areas associated with the designated LBA0 are in the FM 420 (this is unnecessary if the associated areas are already in the data buffer 416 of the NVM module 126).
  • The parity generation unit 419 in the NVM module 126 is then instructed to generate parity for the compressed data in the data buffer 416. Upon receiving the instruction, the parity generation unit 419 obtains the data from the data buffer 416 decompressed by the data compression/decompression unit 418 and generates parity from the decompressed data.
  • The parity generation unit 419 transfers the generated parity to the data compression/decompression unit 418, which compresses it and records it in the data buffer 416 or the FM 420 of the NVM module 126.
  • The PBA of the area in which the generated parity is recorded is associated with the LBA0 designated by the command (the LBA0 start address (for XOR parity) 1318 and the LBA0 start address (for RAID 6 parity) 1319).
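As a rough illustration of full stripe parity generation, the sketch below decompresses the stripes, XORs them byte by byte, and recompresses the parity; zlib stands in for the data compression/decompression unit 418, and this is a simplified model, not the parity generation unit's actual implementation:

```python
# Illustrative full-stripe XOR parity generation over compressed stripes.
import zlib

def full_stripe_parity(compressed_stripes):
    """Return the compressed XOR parity of equally sized stripes."""
    stripes = [zlib.decompress(c) for c in compressed_stripes]
    parity = bytearray(len(stripes[0]))
    for stripe in stripes:
        for i, b in enumerate(stripe):
            parity[i] ^= b
    return zlib.compress(bytes(parity))

a, b, c = (bytes([x]) * 16 for x in (0x0F, 0xF0, 0xAA))
p = zlib.decompress(full_stripe_parity([zlib.compress(d) for d in (a, b, c)]))
assert p == bytes([0x0F ^ 0xF0 ^ 0xAA]) * 16   # byte-wise XOR of the stripes
```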
  • Step S2607 following S2603 is a step in which, in order to generate updated parity, the storage apparatus 101 acquires the compressed data of the old data and the compressed data of the old parity from the final storage media configured as a RAID and writes them by designating LBA1.
  • Specifically, the processor 121 acquires from the free list LBA1 areas for storing the compressed data of the old data and of the old parity, and temporarily stores the acquired LBA1 information.
  • The old data necessary for parity generation resides in the same virtual PDEV as the new data (the data selected in S2601), and the virtual PDEV of the old parity can be obtained by simple calculation from the address (stripe number) of the new data. Since the addresses within the virtual PDEV of the old data and the old parity are the same as the address within the virtual PDEV at which the new data is to be stored, the processor 121 only needs to identify the virtual PDEV address at which the data selected in S2601 should be stored.
  • By the same processing as S2404 to S2407 of the read operation described above, the storage positions in the PDEVs of the compressed data are identified from the virtual PDEV addresses of the old data and the old parity required for parity generation, and the data are read from the PDEVs.
  • The processor 121 then writes the old compressed data and the old parity to the secured LBA1 using the write command 1010 shown in FIG. 10.
  • Step S2608 following S2607 is a step of mapping the compressed old data and old parity recorded in the LBA1 areas in S2607 to LBA0-space areas.
  • Specifically, the processor 121 acquires from the free list 3500 LBA0 areas to which the post-decompression size of each piece of compressed data can be mapped. It then transfers to the NVM module 126 a plurality of the LBA0 mapping commands shown in FIG. 19, each designating an LBA0 and an LBA1, thereby mapping decompressed images of the compressed data recorded in the LBA1 areas written in S2607 to the LBA0 areas.
  • Step S2609 following S2608 is a step of generating updated parity using the data (update data) selected in S2601 and the old data and old parity mapped to LBA0 in S2608.
  • Specifically, the processor 121 uses the updated parity generation command 1410 shown in FIG. 14 to designate the areas of the update data, the old data, and the old parity by LBA0, and also designates the storage location of the updated parity by LBA0.
  • The flow of processing performed by the NVM module 126 upon receiving the updated parity generation command is substantially the same as the processing performed upon receiving the full stripe parity generation command described above.
  • Step S2605, following S2604 or S2609, is a step of obtaining the correct post-compression data size of the parity generated in S2604 or S2609.
  • Specifically, the processor 121 creates a compressed data size acquisition command 1110 in which the LBA0 storing the generated parity is designated in the LBA0 start address 1113 field of the command parameters, and issues it to the NVM module 126. The processor 121 then acquires the post-compression data size of the parity from the compressed data size acquisition response 1120.
  • In step S2606, it is determined whether destaging is necessary. The processor 121 determines whether the compressed data on the cache for which parity has been generated should be recorded in the final storage medium. This determination is made, for example, based on the free area in the cache: if the free area in the cache is at or below a threshold, the storage apparatus 101 starts the destage processing to create free space; if it is determined that there is sufficient free area in the cache, the parity generation processing ends.
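A minimal sketch of this destage-necessity decision follows; the threshold value is illustrative, not taken from the patent:

```python
# Minimal sketch: start destaging when the free area in the cache falls
# to or below an (assumed) threshold fraction of the cache capacity.

FREE_AREA_THRESHOLD = 0.10   # assumed: destage when <= 10 % of cache is free

def destage_needed(free_bytes: int, cache_bytes: int) -> bool:
    return free_bytes <= cache_bytes * FREE_AREA_THRESHOLD

assert destage_needed(free_bytes=50, cache_bytes=1000)
assert not destage_needed(free_bytes=500, cache_bytes=1000)
```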
  • The destage operation is not limited to being executed only when destaging is determined to be necessary in step S2606 of the RAID parity generation operation shown in FIG. 26.
  • The storage apparatus 101 may perform the destage operation of this embodiment at an arbitrary timing, for example at an arbitrary time when there are few or no requests from the host apparatus.
  • The destage operation of this embodiment corresponds to the processes 321 to 323 of the write data compression operation of this embodiment shown in FIG. It is described below with reference to the flowchart of FIG. 27.
  • The first step S2701 of the destage operation of the storage apparatus is a step of selecting the data to be destaged from the NVM module 126, the cache device. Specifically, the processor 121 selects the area to be destaged from the LBA0 space.
  • The destage target may be selected by referring to the last access time 3070 of the cache management information 3000 and targeting data that has not recently been accessed from the higher-level device 103, or a method may be adopted of targeting data determined, on the basis of other statistical information managed by the storage apparatus 101, to be sequential write data. Note that the parity generated by the processing of FIG. 26 is also a destage target.
  • In step S2702 following step S2701, the storage apparatus 101 acquires from the NVM module 126 the post-compression data size of the data in the LBA0-space area selected in S2701.
  • Specifically, the processor 121 transfers the compressed data size acquisition command 1110 shown in FIG. 11 to the NVM module 126, acquires the compressed data length 1123 from the compressed data size acquisition response 1120, and thereby grasps the size of the compressed data to be acquired in the destage operation.
  • In step S2703, the storage apparatus 101 maps the compressed data of the LBA0 area determined in S2701 to an LBA1 area. Specifically, the processor 121 transfers to the NVM module 126 an LBA1 mapping command 1210 describing an LBA1 area to which the compressed data length acquired in step S2702 can be mapped.
  • In step S2704, the storage apparatus 101 acquires the compressed data from the LBA1 area mapped in step S2703. Specifically, the processor 121 describes the LBA1 mapped in S2703 in the read command 1610 shown in FIG. 16 and transfers it to the NVM module 126 to acquire the compressed data.
  • Step S2704' following step S2704 is a step of identifying the storage destination address of the write target data. Since the address in the storage destination virtual volume of each piece of write target data in the cache area (the LBA0 space of the NVM module 126) is stored in the address (3020) field of the cache management information 3000, the processor 121 uses it to calculate the virtual PDEV associated with this address and the address within that virtual PDEV. The calculation method is as described above.
  • Step S2705 subsequent to step S2704 ' is a step of recording the compressed data acquired in step S2704 on the PDEV.
  • First, the virtual PDEV that is the storage destination of the write data is identified. Specifically, the processor 121 refers to the virtual PDEV information 2230 and identifies the PDEV associated with the write data storage destination virtual PDEV. A free area of the identified PDEV is then selected, the selected free area is determined as the storage location of the compressed data, and the compressed data acquired in S2704 is recorded in the PDEV area of the determined storage location.
  • The storage apparatus 101 manages, in the storage controller 110 (for example, in the DRAM 125), information about the free areas (areas not associated with virtual PDEVs) of each PDEV (SSD or HDD); this information is used when the processor 121 selects a free area in the PDEV. Alternatively, since any area other than the areas registered in the compression management information 2300 (the areas designated by the start address 2301 in the PDEV at which compressed data is stored) is a free area, a method may be employed in which the processor 121 reads the compression management information 2300 from the storage destination PDEV and identifies a free area based on it.
  • Step S2706 following step S2705 is a step of releasing the LBA1 area mapped for obtaining compressed data in S2703.
  • Specifically, the processor 121 releases the LBA1 using the mapping release command 1710 shown in FIG. 17. The processor 121 further stores the released LBA1 information in the free LBA1 list 3520 and deletes it from the cache management information 3000.
  • In step S2707 following step S2706, the compression management information 2300 is updated and recorded in the PDEV.
  • Specifically, the processor 121 reads the compression management information 2300 and records, in the start address 2301 field (the start address in the PDEV at which the compressed data is stored) of the compression information entry corresponding to the destaged virtual PDEV area, the address of the PDEV area in which the compressed data was stored in S2705, thereby updating the entry. The updated compression management information 2300 is then recorded in the PDEV.
  • When the compression management information 2300 is updated, it is not necessary to read and update all the information stored in it; only the necessary area may be read and updated.
  • The operation of regenerating an entry of the compression management information 2300 in this embodiment is performed when the loss of an entry of the compression management information 2300 is detected. Entry loss is detected, for example, during periodic monitoring of the entries in the storage apparatus, or when an entry of the compression management information 2300 is acquired by a read or destage operation.
  • The storage apparatus 101 is characterized in that the compression management information 2300 can be regenerated by the partial recovery operation of the compression management information 2300 and the rebuild processing described later. With this function, the storage apparatus 101 can maintain its reliability without holding the compression management information 2300 redundantly.
  • First, the partial recovery operation of the compression management information 2300 will be described with reference to FIG. 28. For simplicity of description, the case where only one entry of the compression management information 2300 is lost is described below; however, this processing is applicable even when a plurality of entries of the compression management information 2300 are lost.
  • S2801, the first step of the partial recovery operation of the compression management information 2300, is a step of calculating the virtual PDEV area (the address within the virtual PDEV) managed by the lost entry of the compression management information 2300.
  • The storage apparatus 101 fixedly assigns the recording position of each entry of the compression management information 2300 in the PDEV according to the virtual PDEV area the entry manages. Therefore, the processor 121 can identify, from the address within the PDEV of the lost entry, the virtual PDEV area that the entry managed. For example, if the compression management information 2300 has the contents shown in FIG. 23 and the compression information entry stored at recording position (relative address in the PDEV) 0x00_0000_0008 cannot be read, it can be seen that the unreadable entry is the compression information entry for the 4 KB area starting at address 0x0000_0000_1000 of the virtual PDEV.
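This inverse calculation, from the PDEV address of a lost entry back to the virtual PDEV area it managed, can be sketched as follows (same assumed constants as the earlier sketch: 4 KiB units and 8 B entries):

```python
# Sketch of the inverse of the fixed-placement formula: from the PDEV
# address of a lost entry, recover the 4 KB virtual PDEV area it managed.

MGMT_UNIT = 4 * 1024
ENTRY_SIZE = 8

def managed_virtual_area(entry_address: int, mgmt_head_address: int) -> int:
    """Return the start of the 4 KB virtual PDEV area managed by the entry."""
    return ((entry_address - mgmt_head_address) // ENTRY_SIZE) * MGMT_UNIT

# The unreadable entry at relative position 0x00_0000_0008 managed the 4 KB
# area starting at virtual PDEV address 0x0000_0000_1000, as in the text.
assert managed_virtual_area(0x00_0000_0008, 0) == 0x0000_0000_1000
```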
  • The data stored in the 4 KB area starting at address 0x0000_0000_1000 of the virtual PDEV is regenerated using data read from the other PDEVs constituting the RAID group. The processing of rewriting it to the PDEV and recreating the compression information entry based on it is performed from S2802 onward.
  • In step S2802 following step S2801, the RAID group to which the virtual PDEV that stored the lost compression management information 2300 belongs is identified. Specifically, the processor 121 searches the RAID group management information 2220 and identifies the corresponding RAID group.
  • In step S2803, in order to restore the virtual PDEV area managed by the lost entry, the processor 121 obtains the data necessary for restoring the data of the virtual PDEV area identified in S2801 from each of the PDEVs.
  • This processing is described below using as an example the case, described above, where the compression information entry for the 4 KB area starting at virtual PDEV address 0x0000_0000_1000 has been lost.
  • From the virtual PDEVs other than the virtual PDEV that stored the lost compression management information 2300 (hereinafter, these virtual PDEVs are referred to as the "other virtual PDEVs"), the compressed data corresponding to the virtual PDEV area identified in S2801 (that is, the 4 KB area starting at virtual PDEV address 0x0000_0000_1000) is read. To do so, the processor 121 reads the compression management information 2300 of each of the other virtual PDEVs, acquires the PDEV address and the compressed data length associated with the 4 KB area starting at virtual PDEV address 0x0000_0000_1000 of each other virtual PDEV, and reads the compressed data from the PDEV associated with that other virtual PDEV using the acquired PDEV address and compressed data length.
  • In step S2804 following step S2803, the processor 121 records the plurality of compressed data acquired in S2803 (the data necessary for restoring the data of the virtual PDEV area identified in S2801) in the NVM module 126 and maps their decompressed images to LBA0.
  • For this, the same processing as steps S2408 to S2410 in FIG. 24 may be performed. Note that LBA0- and LBA1-space areas must be secured before the data is recorded in the NVM module 126; this is the same as the processing performed in S2404 of FIG. 24.
  • In step S2805, the processor 121 restores the data of the virtual PDEV area identified in step S2801, using the RAID function, from the plurality of data of the RAID stripe column mapped to LBA0 of the NVM module 126 in step S2804.
  • For this restoration, the full stripe parity generation command 1310 may be used. The addresses of the data mapped into the LBA0 space in step S2804 are stored in the LBA0 start addresses (1314 to 1316), an LBA0-space area for storing the data to be restored is secured, and a command in which the start address of the secured LBA0-space area is stored in the LBA0 start address (for XOR parity) 1317 is created and issued to the NVM module 126.
  • In step S2806 following step S2805, the processor 121 maps the compressed data of the decompressed data generated in S2805 to LBA1 and acquires the compressed data. Here, the same processing as steps S2702 to S2704 in FIG. 27 is performed.
  • Step S2807 following step S2806 is a step of recording the compressed data acquired in S2806 on the PDEV.
  • As in the destage operation, the processor 121 identifies the PDEV associated with the virtual PDEV that is the storage destination of the restored data, selects a free area in the identified PDEV, determines the selected free area as the storage location of the compressed data, and records the compressed data acquired in S2806 in the PDEV area of the determined storage location.
  • Step S2808 following step S2807 is a step of updating the compression management information 2300 and recording it in the PDEV.
  • Specifically, the processor 121 records, in the start address 2301 field (the start address in the PDEV at which the compressed data is stored) of the compression management information 2300 entry for the restored virtual PDEV area, the address of the PDEV area to which the compressed data was written in S2807, and updates the entry. The lost entry is then restored by recording the updated compression management information entry in the PDEV.
  • By this entry regeneration operation of the compression management information 2300, the storage apparatus 101 can regenerate each entry of the compression management information 2300 from the data even if the compression management information 2300 is lost.
  • The storage apparatus 101 performs the rebuild processing shown in FIG. 29 when one of the PDEVs constituting a RAID group fails and becomes inaccessible.
  • In the first step S2901, the processor 121 identifies the virtual PDEV associated with the failed PDEV (hereinafter referred to as the failed virtual PDEV).
  • In step S2902 following step S2901, the RAID group to which the failed virtual PDEV belongs is identified. Specifically, the processor 121 searches the RAID group management information 2220 and identifies the corresponding RAID group.
  • In step S2903 following step S2902, in order to restore the failed virtual PDEV area, the processor 121 obtains the data necessary for recovering the data of each area of the failed virtual PDEV identified in S2901 from the plurality of PDEVs associated with the plurality of virtual PDEVs other than the failed virtual PDEV in the RAID group identified in S2902.
  • Specifically, the processing described in step S2803 of FIG. 28 is performed for all areas of the failed virtual PDEV (all addresses from virtual PDEV address 0 to the maximum address).
  • That is, the compression management information 2300 is read from the (plurality of) virtual PDEVs other than the failed virtual PDEV (hereinafter, the "other virtual PDEVs"), and the data at the other virtual PDEVs' addresses 0x0000_0000_0000, 0x0000_0000_1000, ... are read in order. However, an area for which the start address 2301 in the PDEV at which the compressed data is stored is unassigned (NULL) in the compression management information 2300 is not associated with any PDEV area and therefore need not be read.
  • Steps S2904 to S2908 described below perform the same processing as S2804 to S2808 in FIG. 28. The difference from the processing of FIG. 28 is that the processing of FIG. 28 is performed on only part of the areas of the virtual PDEV (the specific area of the virtual PDEV managed by the lost entry), whereas the processing here is performed on all areas of the failed virtual PDEV (excluding areas with which no PDEV area is associated).
  • In step S2904 following step S2903, the processor 121 records the compressed data acquired in S2903 in the NVM module 126 and maps their decompressed images to LBA0; the same processing as step S2804 in FIG. 28 is performed.
  • In step S2905, the processor 121 restores the data of the failed virtual PDEV area identified in S2901, using the RAID function, from the plurality of data of the RAID stripe column mapped to LBA0 of the NVM module 126 in step S2904. The restored data is compressed and recorded in the LBA0 space of the NVM module 126. The same processing as step S2805 in FIG. 28 is performed.
  • In step S2906 following step S2905, the processor 121 maps the compressed data of the decompressed data generated in S2905 to LBA1 and acquires the compressed data; the same processing as step S2806 in FIG. 28 is performed.
  • Step S2907 following step S2906 is a step of recording the compressed data acquired in S2906 in a new PDEV.
  • The storage apparatus 101 holds one or more spare PDEVs for failed PDEVs; such a PDEV is used as a substitute for the failed PDEV and is hereinafter referred to as an alternative PDEV.
  • The processor 121 selects the virtual PDEV corresponding to the alternative PDEV, registers the virtual PDEV number of that virtual PDEV in the virtual PDEV number column 2222 of the RAID group management information 2220, and deletes the virtual PDEV number corresponding to the failed PDEV. The selected virtual PDEV is thus added, as a substitute for the failed virtual PDEV, to the RAID group with which the failed virtual PDEV was associated.
  • The processor 121 records the compressed data acquired in step S2906 in the alternative PDEV area. The compressed data can be stored in the alternative PDEV by performing the same processing as step S2705 in FIG. 27.
  • In step S2908 following step S2907, the processor 121 generates compression management information that manages the association between the areas of the substitute virtual PDEV and the areas of the alternative PDEV in which data was stored in step S2907, and records the generated compression management information 2300 in the PDEV. This completes the data restoration for the failed PDEV.
  • In this way, when a PDEV fails, the data is recovered using the decompressed data of the other virtual PDEV areas of the RAID group to which the failed virtual PDEV belongs. The recovered data is then compressed and recorded in the alternative PDEV. By regenerating, at this time, the compression management information that associates the substitute virtual PDEV with the areas of the alternative PDEV, the compression management information lost by the PDEV failure is regenerated.
  • In FIG. 29, steps S2903 to S2908 are performed for all areas of the failed virtual PDEV, but it is not necessary to process the entire area at once in each step; the steps may be repeated for each partial area (for example, per stripe, or per 4 KB, the data compression unit of the storage apparatus 101 according to the embodiment of the present invention).
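The per-area rebuild loop can be illustrated with the following simplified sketch, which recovers one 4 KB area of the failed member by XORing the decompressed surviving members of the stripe; zlib again stands in for the module's compressor, and the single-parity layout is an assumption for illustration:

```python
# Simplified per-area rebuild (S2903-S2908 for one 4 KB area): decompress
# the surviving members of the stripe, XOR them to recover the lost area,
# then recompress before recording to the alternative PDEV.
import zlib

def rebuild_area(surviving_compressed):
    """Recover one 4 KB area of the failed member from the survivors."""
    recovered = bytearray(4096)
    for cdata in surviving_compressed:
        block = zlib.decompress(cdata)
        for i, b in enumerate(block):
            recovered[i] ^= b
    return zlib.compress(bytes(recovered))   # recompress before storing

d0 = bytes([0x11]) * 4096
d1 = bytes([0x22]) * 4096
parity = bytes(a ^ b for a, b in zip(d0, d1))
lost = rebuild_area([zlib.compress(d0), zlib.compress(parity)])
assert zlib.decompress(lost) == d1   # d1 is recovered from d0 and the parity
```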
  • As described above, the storage apparatus according to the embodiment of the present invention conceals the change in data caused by compression from a host device such as a server and provides a storage area (virtual uncompressed volume) in which data appears to be recorded in an uncompressed state. To do so, it maintains management information called compression management information.
  • The compression management information manages the association between the virtual uncompressed volume and the physical recording destination of the compressed data, and is indispensable for responding to data read requests from the server. From the viewpoint of the reliability of the storage apparatus, therefore, the loss of compression management information is equivalent to the loss of retained data, and the compression management information must be retained with at least the same level of reliability as the data.
  • In the storage apparatus of this embodiment, the compression management information that manages the association of the compressed data with its recording destination on the final storage media is divided per final storage medium, so that each piece manages only the correspondence relating to one final storage medium, and each piece of compression management information is recorded in a specific area of the final storage medium it manages.
  • Even if the compression management information is lost for some reason and a failure occurs that makes it inaccessible, the data managed by the compression management information that has become inaccessible can be regenerated by RAID technology, and the regenerated data (recovery data) can be compressed and written to the final storage medium. Compression management information corresponding to the recovery data written to the final storage medium can then be created and written to the final storage medium, thereby restoring the compression management information. Therefore, in the storage apparatus of the present invention, the compression management information need not be stored redundantly, and the consumption of storage area by the compression management information can be reduced.
  • In the embodiment described above, the storage apparatus forms a virtual volume to which the storage areas of a RAID group configured using the storage areas of a plurality of virtual PDEVs are statically assigned, and provides this virtual volume to the host apparatus. Alternatively, a volume formed using so-called Thin Provisioning technology (also referred to as Dynamic Provisioning technology), which dynamically allocates physical storage areas, may be provided to the host apparatus.
  • Dynamic Provisioning is a function that can define a volume larger than the storage capacity of the final storage media (SSD 111 or HDD 112) installed in the storage apparatus (hereinafter, such a volume is referred to as a "DP volume"). With this function, the user does not necessarily have to install, in the initial state, final storage media of the same capacity as the defined volume (DP volume) in the storage apparatus; final storage media can be added as the amount of stored data grows.
  • The DP volume is one of the volumes virtually created by the storage apparatus and is created with an arbitrary capacity designated by the user or the host apparatus. In the initial state, no storage area is allocated to the DP volume; when data is written from the host apparatus 103, storage areas are allocated as needed.
  • The virtual volume 200 of the embodiment described above is managed in units of fixed-size storage areas (each such storage area is called a Dynamic Provisioning page, or DP page), and these DP pages may be assigned to the DP volume.
  • The storage area of the virtual volume 200 is a storage area treated as if pre-compression data were stored in it. Therefore, a DP volume using storage areas allocated from the virtual volume 200 also conceals from the higher-level device 103 that the data is compressed and stored.
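As a toy illustration of on-demand allocation to a DP volume, the sketch below assigns a DP page carved from the virtual volume 200 to an address range only on first write; the page size and structures are illustrative assumptions, not values from the patent:

```python
# Toy sketch of on-demand DP page allocation: a DP volume starts with no
# storage, and a DP page cut out of the virtual volume 200 is assigned to
# an address range only when that range is first written.

DP_PAGE_SIZE = 42 * 1024 * 1024   # assumed page size, not from the patent

class DPVolume:
    def __init__(self, capacity, free_pages):
        self.capacity = capacity      # may exceed installed media capacity
        self.free_pages = free_pages  # DP pages carved from virtual volume 200
        self.page_map = {}            # page index in DP volume -> DP page

    def write(self, address, data):
        index = address // DP_PAGE_SIZE
        if index not in self.page_map:            # allocate on first write only
            self.page_map[index] = self.free_pages.pop(0)
        # ... the data would then be stored via the page's virtual volume area

vol = DPVolume(capacity=800 * 2**40, free_pages=[0, 1, 2])
vol.write(0, b"x")
assert vol.page_map == {0: 0}   # exactly one page allocated so far
```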
  • The storage controller 110 may increase or decrease the size of the virtual PDEVs constituting the virtual volume based on the compression information acquired from the NVM module 126, such as the compressed data length or the compression ratio (the ratio between the pre-compression and post-compression data amounts). As the size of the virtual PDEVs increases or decreases, the number of DP pages that can be cut out of the virtual volume 200 also increases or decreases.
  • The storage apparatus manages this compression-ratio-dependent increase or decrease in the number of DP pages, and when the amount of remaining DP pages falls below a certain level, the storage apparatus 101 or the management terminal of the storage apparatus 101 issues a notification that final storage media must be added. The user may add final storage media to the storage apparatus 101 upon receiving the notification.
  • By providing the host apparatus 103 with a DP volume, a volume of a predetermined fixed size, the host apparatus 103 and the users using it need not be aware of increases or decreases in the storage area even when such changes occur, and when the usable storage area (DP pages) increases owing to an improved compression ratio, there is the advantage that the increased storage area can be used effectively.
  • 101: Storage device 102: SAN 103: Host device 104: Management device 110: Storage controller 111: SSD 112: HDD 121: Processor 122: Internal SW 123: Disk interface 124: Host interface 125: DRAM 126: NVM module 410: FM controller 411: I/O interface 413: RAM 414: Switch 416: Data buffer 417: FM interface 418: Data compression/decompression unit 419: Parity generation unit

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

 This storage device has a plurality of final storage media and a cache device provided with a data compression function and a parity generation function. The storage device stores write data from a host device in the final storage medium after compressing the data, and, for the host device, provides a virtual non-compressed volume for concealing that the data is stored after being compressed. The storage device divides the area of the virtual non-compressed volume into stripe units, and manages each stripe in correlation with one of the plurality of final storage media constituting a RAID group. When storing the data of each stripe in the final storage medium, the storage device generates a parity from the data of each stripe and compresses the generated parity and the data of each stripe by using the cache device, and stores the parity and the data of each stripe that have been compressed by the cache device in each of the final storage media constituting the RAID group.

Description

US Patent Application Publication No. 2009/0216945 (Patent Document 1)
 On the other hand, with a lossless compression algorithm the compression ratio changes depending on the data content. For example, comparing a case where the information in the compression target data is complex (rich) with a case where it is monotonous (sparse), the compressed data size is smaller in the monotonous case. Thus, the post-compression data size depends on the content of the data to be compressed and can be obtained only by the heuristic method of actually compressing the data. Accordingly, the association between the virtual uncompressed volume and the physical recording destination of the compressed data changes dynamically every time the recorded data changes.
 For this reason, the storage apparatus divides the virtual uncompressed volume into fixed-size areas to manage the correspondence, and updates the information managing the dynamically changing correspondence every time a data update and the accompanying data compression are completed. Hereinafter, the information managing this correspondence is referred to as compression management information.
 Compression management information is generally larger than the other management information managed by the storage. As an example, consider providing an 800 TB virtual uncompressed volume using a 100 TB physical area. In a configuration in which the 800 TB area of the virtual uncompressed volume is divided into 4 KB areas and the association of each area is managed with 8 B of information (for example, 6 B of start position information + 2 B of length information), 800 TB ÷ 4 KB × 8 B = 1600 GB of compression management information is required.
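For illustration, one such 8 B entry could be packed and unpacked as follows (hypothetical Python; the big-endian layout is an assumption, while the 6 B + 2 B split follows the example above):

```python
# Hypothetical packing of one 8 B compression information entry: 6 B of
# start-position information plus 2 B of length information.
import struct

def pack_entry(start: int, length: int) -> bytes:
    """Pack a (start, length) pair into the assumed 8 B entry format."""
    return start.to_bytes(6, "big") + struct.pack(">H", length)

def unpack_entry(raw: bytes):
    start = int.from_bytes(raw[:6], "big")
    (length,) = struct.unpack(">H", raw[6:8])
    return start, length

raw = pack_entry(0x0000_0000_1000, 0x0800)
assert len(raw) == 8 and unpack_entry(raw) == (0x0000_0000_1000, 0x0800)
```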
 Much of a storage apparatus's management information is generally stored in DRAM, which the processor controlling the storage apparatus can access at high speed. However, storing gigabytes of compression management information in DRAM, whose bit cost and power consumption are high, increases the data retention cost.
 The compression management information manages the virtual uncompressed volume and the physical recording destination of the compressed data, and is indispensable for responding to data read requests from the server. From the viewpoint of the reliability of the storage apparatus, therefore, the loss of compression management information is equivalent to the loss of retained data, and the compression management information must be retained with at least the same level of reliability as the data.
 To achieve the above object, the storage apparatus of the present invention has a function of providing a virtual uncompressed volume to a host apparatus such as a server in order to conceal data changes caused by compression. The storage apparatus also divides the area of the virtual uncompressed volume into stripe units and manages each stripe in association with one of the plurality of final storage media constituting a RAID group. When storing the data of each stripe in the final storage media, it generates parity from the data of the stripes, compresses the generated parity and the data of each stripe, and stores the compressed parity and stripe data in the final storage media associated with the stripes.
 The storage apparatus further divides the compression management information, which manages the association of the compressed data with its recording destination on the final storage media, per recording medium, and records the compression management information managing the correspondence relating to one recording medium in a specific area of that recording medium.
 According to the present invention, the compression management information can be retained in a storage apparatus with the same reliability as the data. In addition, the processing load associated with updating the compression management information can be reduced, improving the performance of the storage apparatus.
FIG. 1 is a diagram showing a schematic configuration of a computer system centered on a storage apparatus according to an embodiment of the present invention.
FIG. 2-A is a conceptual diagram showing the logical space configuration of the storage apparatus according to the embodiment of the present invention.
FIG. 2-B is another conceptual diagram showing the logical space configuration of the storage apparatus according to the embodiment of the present invention.
FIG. 3-A is a diagram showing the data flow when the storage apparatus receives a write command from the host apparatus.
FIG. 3-B is a diagram showing the data flow when the storage apparatus receives a write command from the host apparatus.
FIG. 4 is a diagram showing the internal configuration of the NVM module.
FIG. 5 is a diagram showing the internal configuration of the FM.
FIG. 6 is a diagram showing the internal configuration of a physical block.
FIG. 7 is a diagram showing the concept of associating the LBA0 and LBA1 spaces, the logical spaces the NVM module provides to the storage controller, with the PBA space, the address space for designating physical areas.
FIG. 8 is a diagram showing the contents of the LBA0-PBA conversion table 810 and the LBA1-PBA conversion table 820 managed by the NVM module.
FIG. 9 is a diagram showing block management information used by the NVM module.
FIG. 10 is a diagram showing the write command received by the NVM module and the response information to it.
FIG. 11 is a diagram showing the compressed data size acquisition command received by the NVM module and the response information to it.
FIG. 12 is a diagram showing the LBA1 mapping command received by the NVM module and the response information to it.
FIG. 13 is a diagram showing the full stripe parity generation command received by the NVM module and the response information to it.
FIG. 14 is a diagram showing the updated parity generation command received by the NVM module and the response information to it.
FIG. 15 is a diagram showing the compression information acquisition command received by the NVM module and the response information to it.
FIG. 16 is a diagram showing the read command received by the NVM module and the response information to it.
FIG. 17 is a diagram showing the mapping release command received by the NVM module and the response information to it.
FIG. 18 is a diagram showing the compression information transfer command received by the NVM module and the response information to it.
FIG. 19 is a diagram showing the LBA0 mapping command received by the NVM module and the response information to it.
FIG. 20-A is a diagram showing an example of cache management information.
FIG. 20-B is a diagram showing an example of the free list.
FIG. 21 is a conceptual diagram showing the correspondence among virtual volumes, RAID groups, and PDEVs in the storage apparatus according to the embodiment of the present invention.
FIG. 22 is a diagram showing an example of management information for managing the correspondence among virtual volumes, RAID groups, and PDEVs in the storage apparatus according to the embodiment of the present invention.
FIG. 23 is a diagram showing the configuration of the compression management information used by the storage apparatus according to the embodiment of the present invention.
FIG. 24 is a flowchart of the decompression read process.
FIG. 25 is a flowchart of the write data cache storage process.
FIG. 26 is a flowchart of parity generation processing.
FIG. 27 is a flowchart of the destage process.
FIG. 28 is a flowchart of the partial recovery operation of the compression management information 2300.
FIG. 29 is a flowchart of the rebuild process.
Next, embodiments of the present invention will be described with reference to the drawings. The present invention is not limited to the embodiments described below. A NAND flash memory (hereinafter, FM) is described as an example of the semiconductor recording element, but the present invention is not limited to FM and covers all nonvolatile memories. This embodiment describes a mode in which data compression is performed by a dedicated hardware circuit; however, the present invention is not limited to this embodiment, and data may instead be compressed by data compression processing executed on a general-purpose processor. Likewise, this embodiment describes a mode in which parity generation is performed by a dedicated hardware circuit; however, the present invention is not limited to this embodiment, and RAID parity may instead be generated by parity generation processing executed on a general-purpose processor.
(1-1) Configuration of Storage Device
FIG. 1 is a diagram showing a schematic configuration of a computer system centered on a storage apparatus according to an embodiment of the present invention. The NVM module 126 shown in FIG. 1 is a semiconductor recording device that uses FM as its recording medium.
The storage apparatus 101 includes a plurality of storage controllers 110. Each storage controller 110 includes a host interface (host I/F) 124 for connecting to a host apparatus and a disk interface (disk I/F) 123 for connecting to recording devices. Examples of the host interface 124 include devices supporting protocols such as FC (Fibre Channel), iSCSI (internet Small Computer System Interface), and FCoE (Fibre Channel over Ethernet); examples of the disk interface 123 include devices supporting protocols such as FC, SAS (Serial Attached SCSI), SATA (Serial Advanced Technology Attachment), and PCI (Peripheral Component Interconnect)-Express. Furthermore, the storage controller 110 includes hardware resources such as a processor 121 and a memory (DRAM) 125, and, under the control of the processor, issues read/write requests to final storage medium devices such as the SSDs 111 and HDDs 112 in response to read/write requests from the host apparatus 103. The storage controller also contains an NVM module 126 used as a cache device, which the processor 121 can control via the internal switch (SW) 122.
The storage controller 110 also has a RAID (Redundant Arrays of Inexpensive Disks) parity generation function and a data restoration function using RAID parity, and manages pluralities of SSDs 111 and HDDs 112 as RAID groups in arbitrary units. In addition, the storage controller 110 has functions for monitoring and managing failures, usage status, operating status, and the like of the recording devices.
The storage apparatus 101 is connected to a management apparatus 104 via a network, for example a LAN (Local Area Network). Although omitted from FIG. 1 for simplicity, this network connects to each storage controller 110 inside the storage apparatus 101. This network may also be the same network as the SAN 102.
The management apparatus 104 is a computer provided with hardware resources such as a processor, memory, network interface, and local input/output devices, and with software resources such as a management program. The management apparatus 104 acquires information from the storage apparatus by means of this program and displays a management screen.
A system administrator uses the management screen displayed on the management apparatus 104 to monitor the storage apparatus 101 and to control its operation.
There are a plurality of SSDs 111 (for example, 16) in the storage apparatus 101, connected via the disk interfaces 123 to the storage controllers 110, of which there are likewise a plurality in the storage apparatus. An SSD 111 stores data transferred in response to a write request from a storage controller, and in response to a read request retrieves the stored data and transfers it to the storage controller. At this time, the disk interface 123 designates the logical storage location of the read/write request by a logical address (hereinafter, LBA: Logical Block Address). The plurality of SSDs 111 are managed as a plurality of RAID groups, configured so that lost data can be restored when data loss occurs.
There are also a plurality of HDDs (Hard Disk Drives) 112 (for example, 120) in the storage apparatus 101, connected, like the SSDs 111, via the disk interfaces 123 to the plurality of storage controllers 110 in the same storage apparatus. An HDD 112 stores data transferred in response to a write request from the storage controller 110, and in response to a read request retrieves the stored data and transfers it to the storage controller 110. At this time, the disk interface 123 designates the logical storage location of the read/write request by a logical address (LBA). The plurality of HDDs 112 are likewise managed as a plurality of RAID groups, configured so that lost data can be restored when data loss occurs.
The storage controller 110 connects via the host interface 124 to the SAN 102, which connects to the host apparatus 103. Although omitted from FIG. 1 for simplicity, a connection path over which the storage controllers 110 mutually communicate data and control information is also provided.
The host apparatus 103 corresponds to, for example, a computer or a file server forming the core of a business system. The host apparatus 103 includes hardware resources such as a processor, memory, network interface, and local input/output devices, and software resources such as device drivers, an operating system (OS), and application programs. By executing various programs under processor control, the host apparatus 103 communicates with the storage apparatus 101 and issues data read/write requests. It also acquires management information such as the usage status and operating status of the storage apparatus 101 by executing various programs under processor control, and can designate and change the management units of the recording devices, the recording device control method, data compression settings, and the like.
This concludes the description of the configuration of the computer system including the NVM module 126 to which the present invention is applied.
(1-3) Logical Configuration of Storage Device
Next, the logical space configuration of the storage apparatus of this embodiment will be described with reference to FIG. 2-A. FIG. 2-A shows the transitions in the management state of write data when a write request is issued from the host apparatus 103 to the storage apparatus of this embodiment.
The host apparatus 103 recognizes a virtual volume 200 (denoted "virtual Vol" in the figure) as its storage area, and accesses data by designating an address within the virtual volume 200.
The virtual volume 200 is a virtual space that the storage apparatus 101 provides to the host apparatus 103. When the host apparatus 103 writes data to the virtual volume 200, the write data is compressed inside the storage apparatus 101 and stored in a final storage medium (SSD 111 or HDD 112); however, the host apparatus 103 cannot recognize that the data is stored in the virtual volume 200 in compressed form (the changes to the data caused by compression are concealed). FIG. 2-A shows an example in which the storage apparatus 101 has a single virtual volume 200, but the present invention is not limited to this example: the storage apparatus 101 may manage a plurality of virtual volumes, and some of the managed volumes may be volumes to which compression is not applied. The embodiments of the present invention, however, focus on a virtual volume 200 that conceals data compression from the host apparatus 103.
The storage apparatus 101 of the present invention logically manages the storage area of each physical SSD 111 or HDD 112 as a PDEV 205 (Physical Device), and manages each PDEV 205 in association with one virtual PDEV 204 whose capacity is virtually expanded. The storage apparatus 101 composes and manages an RG 203 (RAID group) from a plurality of virtual PDEVs 204, and manages this RG 203 in association with the virtual volume 200. FIG. 2-A shows an example in which one RG is associated with one virtual volume 200 (virtual volume 200 and RAID group 0), but the present invention is not limited to this example: one RG may be associated with a plurality of virtual volumes, and one virtual volume may be associated with a plurality of RGs.
The storage apparatus 101 manages the area designated as the write destination within the virtual volume 200 as being cached in the LBA0 space provided by the NVM module 126. The LBA0 space is a virtual logical space that the NVM module 126 provides to the storage apparatus 101; it makes the data that the NVM module 126 has compressed and stored accessible to the storage apparatus 101 (the processor 121 of the storage controller 110) as if it were stored uncompressed.
Write data, after the storage apparatus 101 receives it from the host apparatus 103, is transferred to the NVM module 126. The NVM module 126 of this embodiment then compresses the data and records it inside the NVM module 126.
Once recording to the NVM module 126 is complete, the storage apparatus 101 judges that the write data has been stored in the cache area (the LBA0 space provided by the NVM module 126) and notifies the host apparatus 103 that the write is complete.
The storage apparatus 101 also transfers, at an arbitrary timing, the compressed form of the write data recorded in the LBA0 space to the SSDs 111 or HDDs 112 serving as the final storage media. At this time the storage apparatus 101 needs to obtain the compressed data from the NVM module 126. As shown in FIG. 2-A, the storage apparatus 101 of this embodiment obtains the compressed data by using the LBA1 space 202 provided by the NVM module 126. For this purpose, the storage apparatus 101 issues to the NVM module 126 a command that associates the compressed form of the data stored in the uncompressed area of the LBA0 space with the LBA1 space 202.
On receiving the association command for the LBA1 space 202, the NVM module 126 associates the compressed data mapped to the designated LBA0 area with the LBA1 space. The storage apparatus 101 then obtains the compressed data from the NVM module 126 by designating addresses in the LBA1 space.
Subsequently, from the address in the virtual volume with which the compressed data to be transferred to the final storage medium is associated, the storage apparatus 101 identifies the virtual PDEV 204 that is to store the data and the address within that virtual PDEV 204. It then determines the address of the PDEV 205 associated with the address in the virtual PDEV 204 and transfers the data to the physical device.
The above is an overview of the logical configuration of the storage apparatus of this embodiment. As shown in FIG. 2-B, the LBA1 space 202 for obtaining compressed data may be omitted in the present invention. For example, the storage apparatus 101 may issue a read command containing an LBA0 address and an instruction to transfer the compressed data without decompressing it, and read the compressed data from the NVM module 126 through the LBA0 space.
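As a non-authoritative illustration of the address resolution just described, the following Python sketch traces a virtual volume address down to a PDEV address. The data structures, stripe size, and round-robin placement rule are assumptions introduced only for this sketch; the embodiment does not specify them.

    # Illustrative resolution: virtual volume address -> RAID group ->
    # virtual PDEV -> PDEV. All structures and sizes are hypothetical.

    STRIPE_SIZE = 64 * 1024  # hypothetical stripe size per virtual PDEV

    def resolve(volume_addr, vpdev_maps):
        """vpdev_maps: one dict per virtual PDEV, mapping a virtual PDEV
        address to the PDEV address actually allocated for it."""
        n = len(vpdev_maps)
        stripe_no, offset = divmod(volume_addr, STRIPE_SIZE)
        vpdev_idx = stripe_no % n                      # round-robin placement
        vpdev_addr = (stripe_no // n) * STRIPE_SIZE + offset
        pdev_addr = vpdev_maps[vpdev_idx].get(vpdev_addr)  # None if unallocated
        return vpdev_idx, pdev_addr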
(1-4) Write Data Transfer
Write data transfer in the storage apparatus according to the embodiment of the present invention will be described with reference to FIG. 3-A. The storage apparatus 101 compresses the data obtained from the host apparatus 103 and stores it in the NVM module 126 serving as the cache. Hereinafter this operation is referred to as the host write operation. The data transfers that occur in the host write operation are described next.
The first data transfer in the host write operation takes place when the write data is obtained from the host apparatus. This is a transfer from the host interface 124 to the DRAM 125 of the storage controller (311). The storage apparatus 101 performs this transfer by issuing a command to the host interface 124.
Next, the storage apparatus 101 issues a command to the NVM module 126 and transfers the write data stored in the DRAM 125 to the NVM module 126 (312). The NVM module 126 compresses the write data with its internal compression hardware (compression circuit) and stores it in the DRAM (data buffer) 416 inside the NVM module 126 (313). When storage of the compressed data in this DRAM is complete, the NVM module 126 notifies the storage apparatus that storage of the write data is complete. The compressed data stored in the DRAM 416 may later be transferred from the DRAM 416 to the NVM (FM) 420 in the NVM module 126 and recorded there, or it may continue to be held in the DRAM 416; whether a transfer from the DRAM 416 to the NVM (FM) 420 is needed depends on the control method within the NVM module 126.
On receiving notice from the NVM module 126 that storage of the write data is complete, the storage apparatus 101 notifies the host apparatus 103 of the completion of the write command. The above are the data transfers that occur in the host write operation in this embodiment.
After the host write operation, the storage apparatus 101 generates RAID parity for the write data at an arbitrary timing. Hereinafter this operation is referred to as the parity generation operation. The data transfers that occur in the parity generation operation are described next.
The storage apparatus 101 generates RAID parity for the write data during the parity generation operation. In the present invention, parity is generated not for the compressed write data but for the uncompressed write data. With this parity generation scheme, one conceivable method is to store the write data uncompressed when recording it in the NVM module 126 and to compress the data after parity generation. However, many nonvolatile memories such as NAND flash memory and RRAM have a limited number of write cycles, and reducing the amount of data written to the NVM (FM) 420 through compression extends the device life of the NVM module 126. Compressing stored data also expands the effective capacity of the NVM module 126, so the cache area of the storage apparatus can be expanded at lower device cost.
To obtain these effects, the NVM module 126 compresses data and stores it in the DRAM 416 or the NVM within the NVM module 126, and decompresses it at parity generation time. This behavior is not essential: the data may instead be recorded uncompressed in the NVM (FM) 420 or DRAM 416 and compressed, together with the generated parity, after parity generation.
In the NVM module 126 according to the embodiment of the present invention, the compressed data recorded in the DRAM 416 or NVM (FM) 420 is decompressed and the decompressed data is supplied to the parity generation circuit (317). With this function the NVM module 126 generates parity for uncompressed data while still aiming at the longer device life and lower cost of the apparatus.
The parity generated by the parity generation circuit is transferred to the compression circuit (318) and, now as compressed data, transferred to the DRAM 416 (319). The compressed parity stored in the DRAM 416 may be recorded in the NVM (FM) 420 or may continue to be held in the DRAM 416, at the discretion of the NVM module 126. It is also not strictly necessary to compress the parity generated by the parity generation circuit: since parity generally cannot be expected to show the data reduction effect of compression that data shows, a control policy of not compressing it may be adopted.
The above are the data transfers that occur in the parity generation operation.
After the parity generation operation, the storage apparatus 101 transfers the compressed parity and write data to the final storage media at an arbitrary timing. Hereinafter this operation is referred to as the destage operation. The data transfers that occur in the destage operation are described next.
In the destage operation, the storage apparatus 101 reads the compressed parity and write data from the NVM module 126. The NVM module 126 transfers the designated compressed write data and parity to the DRAM 125 of the storage apparatus 101 (322). The storage apparatus 101 then transfers the compressed write data and parity to the SSDs or HDDs (323).
The above is an overview of the write data transfer processing performed by the storage apparatus 101 according to the embodiment of the present invention. In the present invention, transferring data decompressed by the decompression circuit directly to the parity generation circuit is not essential; as shown in FIG. 3-B, the decompressed data may be recorded in the DRAM 416 and then transferred to the parity generation circuit. Similarly, the parity generated by the parity generation circuit need not be transferred directly to the compression circuit; the generated parity may be recorded in the DRAM 416 and then transferred to the compression circuit.
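The three phases above can be summarized schematically in code. The Python sketch below shows the control flow only; the method names (write_lba0, generate_full_stripe_parity, read_lba1, and so on) are hypothetical stand-ins suggested by the NVM module commands described later, not an actual API of the embodiment.

    # Schematic of the three phases of the write path (hypothetical names).

    def host_write_phase(nvm, dram, lba0, write_data):
        dram.store(write_data)              # (311) host I/F -> controller DRAM 125
        nvm.write_lba0(lba0, dram.load())   # (312) DRAM -> NVM module, compressed
                                            # (313) into data buffer 416
        # completion is then reported to the host apparatus 103

    def parity_phase(nvm, data_extents, parity_extent):
        # Internally: decompress (317), generate parity, recompress (318, 319).
        nvm.generate_full_stripe_parity(data_extents, parity_extent)

    def destage_phase(nvm, dram, disk, lba1_extents):
        for ext in lba1_extents:
            dram.store(nvm.read_lba1(ext))  # (322) compressed data -> DRAM 125
            disk.write(dram.load())         # (323) -> SSD 111 / HDD 112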
(1-5) Configuration of NVM Module
Next, the internal configuration of the NVM module 126 will be described with reference to FIG. 4.
The NVM module 126 internally includes an FM controller (FM CTL) 410 and a plurality of (for example, 32) FMs 420.
The FM controller 410 internally includes a processor 415, a RAM (DRAM) 413, a data compression/decompression unit 418, a parity generation unit 419, a data buffer 416, an I/O interface (I/F) 411, an FM interface (I/F) 417, and a switch 414 for transferring data among them.
The switch 414 connects the processor 415, RAM 413, data compression/decompression unit 418, parity generation unit 419, data buffer 416, I/O interface 411, and FM interface 417 within the FM controller 410, and routes and transfers the data passing between these parts by address or ID.
The I/O interface 411 connects to the internal switch 122 of the storage controller 110 in the storage apparatus 101, and connects to each part of the FM controller 410 via the switch 414. The I/O interface 411 receives read/write requests, together with the logical storage locations (LBA: Logical Block Address) targeted by those requests, from the processor 121 of the storage controller 110 in the storage apparatus 101, and processes the requests. On a write request it further receives the write data and records the write data to the FMs 420. The I/O interface 411 also receives instructions from the processor 121 of the storage controller 110 and issues interrupts to the processor 415 inside the FM controller 410. Furthermore, the I/O interface 411 receives control commands for the NVM module 126 from the processor 121 of the storage controller 110 and, in response to such commands, can notify the storage controller 110 of the operating status, usage status, current settings, and the like of the NVM module 126.
The processor 415 is connected to each part of the FM controller 410 via the switch 414, and controls the entire FM controller 410 based on the programs and management information recorded in the RAM 413. It also monitors the entire FM controller 410 through periodic information acquisition and an interrupt reception function.
The data buffer 416 is implemented using DRAM, for example, and stores temporary data partway through data transfer processing in the FM controller 410.
The FM interface 417 connects to the FMs 420 by a plurality of buses (for example, 16). A plurality of (for example, 2) FMs 420 are connected to each bus, and the FMs 420 sharing a bus are controlled independently by using the CE (Chip Enable) signals also connected to the FMs 420.
The FM interface 417 operates in response to read/write requests directed by the processor 415. At this time the processor 415 specifies the request target to the FM interface 417 as chip, block, and page numbers. For a read request it reads the stored data from the FM 420 and transfers it to the data buffer 416; for a write request it fetches the data to be stored from the data buffer 416 and transfers it to the FM 420.
The FM interface 417 also includes an ECC generation circuit, an ECC-based data loss detection circuit, and an ECC correction circuit; when writing data to the FM 420 it appends an ECC to the data before writing. When data is read, the ECC-based data loss detection circuit checks the data read from the FM 420, and when data loss is detected, the ECC correction circuit corrects the data.
The data compression/decompression unit 418 has a data compression function using lossless compression algorithms. It supports several compression algorithms and also has a function for changing the compression level. Following instructions from the processor 415, the data compression/decompression unit 418 reads data from the data buffer 416, performs either a data compression operation or a data decompression operation (the inverse transform of compression) using a lossless compression algorithm, and writes the result back to the data buffer. The data compression/decompression unit 418 may be implemented as a logic circuit, or the same function may be realized by having a processor execute a compression/decompression program.
The parity generation unit 419 has functions for generating parity, the redundant data required by RAID technology; specifically, it can generate the XOR used in RAID 5 and RAID 6, the Reed-Solomon code used in RAID 6, and the diagonal parity computed by the EVENODD method. Following instructions from the processor 415, the parity generation unit 419 reads the data for which parity is to be generated from the data buffer 416 and generates RAID 5 or RAID 6 parity using the parity generation functions described above.
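As a concrete illustration of the XOR parity used for RAID 5, the short sketch below computes a parity block from the data blocks of one stripe. This is the standard XOR definition of RAID 5 parity, not an implementation of the parity generation unit 419 itself.

    # RAID 5 parity is the byte-wise XOR of the data blocks in one stripe.
    def xor_parity(blocks):
        parity = bytearray(len(blocks[0]))
        for block in blocks:
            for i, b in enumerate(block):
                parity[i] ^= b
        return bytes(parity)

    # Any single lost block can be restored as the XOR of the parity
    # and the surviving blocks of the same stripe.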
The switch 414, I/O interface 411, processor 415, data buffer 416, FM interface 417, data compression/decompression unit 418, and parity generation unit 419 described above may be implemented within a single semiconductor element as an ASIC (Application Specific Integrated Circuit) or FPGA (Field Programmable Gate Array), or may be configured as a plurality of individual dedicated ICs (Integrated Circuits) connected to one another.
For the RAM 413, a volatile memory such as DRAM is used. The RAM 413 stores management information for the FMs 420 used in the NVM module 126, transfer lists containing the transfer control information used by each DMA, and the like. Part or all of the role of the data buffer 416 that stores data may be subsumed into the RAM 413, so that the RAM 413 is used for data storage as well.
The configuration of the NVM module 126 to which the present invention is applied has been described above with reference to FIG. 4. Although this embodiment describes an NVM module 126 equipped with flash memory as shown in FIG. 4, the nonvolatile memory mounted in the NVM module 126 is not limited to flash memory; it may be a nonvolatile memory such as Phase Change RAM or Resistance RAM. A configuration in which part or all of the FMs 420 are volatile RAM (DRAM or the like) is also possible.
Next, the FM 420 will be described with reference to FIG. 5. The nonvolatile memory area in the FM 420 is composed of a plurality of (for example, 4096) blocks (physical blocks) 502, and stored data is erased in units of physical blocks. The FM 420 also contains an I/O register 501, a register with a recording capacity equal to or larger than the physical page size (for example, 8 KB).
The FM 420 operates according to read/write request instructions from the FM interface 417. The flow of a write operation is as follows. The FM 420 first receives from the FM interface 417 a write command together with the physical block and physical page targeted by the request. Next, the write data transferred from the FM interface 417 is stored in the I/O register 501. The data stored in the I/O register 501 is then written to the designated physical page.
The flow of a read operation is as follows. The FM 420 first receives from the FM interface 417 a read command together with the physical block and page targeted by the request. Next, the data stored in the designated physical page of the designated physical block is read out and placed in the I/O register 501. The data stored in the I/O register 501 is then transferred to the FM interface 417.
Next, the physical block 502 will be described with reference to FIG. 6. The physical block 502 is divided into a plurality of (for example, 128) pages 601, and reading of stored data and writing of data are processed in units of pages. The order of writing to the physical pages 601 within a block 502 is fixed: writing proceeds in order from the first page, that is, data must be written in the order Page 1, Page 2, Page 3, and so on. Overwriting an already written page 601 is in principle prohibited; to overwrite data on a written page 601, data can be written to that page 601 only after all the data in the block 502 to which the page 601 belongs has been erased.
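A minimal sketch of the constraints just described (page-sequential writes, overwrite only after a block-unit erase) might look as follows; the class is purely illustrative and models behavior, not any actual FM command set.

    # Toy model of one physical block: pages written strictly in order,
    # overwrite possible only after erasing the whole block.
    class PhysicalBlock:
        PAGES = 128                           # example page count from the text

        def __init__(self):
            self.pages = [None] * self.PAGES
            self.next_page = 0                # Page 1 must be written first

        def program(self, page_no, data):
            if page_no != self.next_page:
                raise ValueError("pages must be written in ascending order")
            self.pages[page_no] = data
            self.next_page += 1

        def erase(self):                      # erase acts on the whole block
            self.pages = [None] * self.PAGES
            self.next_page = 0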
The configuration of the NVM module 126 to which the present invention is applied, and the computer system in which the NVM module 126 is used, have been described above. Next, the storage areas that the NVM module 126 provides to the storage apparatus in this embodiment will be described.
(1-6) Overview of Correspondence between LBA and PBA of NVM Module
Next, the storage spaces that the NVM module 126 provides to the storage apparatus 101 in this embodiment will be described. The NVM module 126 of this embodiment carries a plurality of FMs (chips) 420, manages a storage area composed of a plurality of blocks and a plurality of pages, and provides a logical storage space to the storage controller 110 (its processor 121) to which it is connected. Here, "providing a storage space" means that an address is assigned to each storage area that the storage controller 110 is allowed to access, and that the processor 121 of the storage controller 110 to which the NVM module 126 is connected can refer to and update the data stored in the area identified by an address by issuing an access request (command) designating that address. The physical storage area composed of the FMs 420 is managed in unique association with an address space used only inside the NVM module 126. Hereinafter, this address space for designating physical areas (physical address space), used only inside the NVM module 126, is called the PBA (Physical Block Address) space, and the position (address) of each physical storage area in the PBA space (a sector; in the embodiments of the present invention, one sector is 512 bytes) is written PBA (Physical Block Address). The NVM module 126 of this embodiment manages the association between these PBAs and the LBAs (Logical Block Addresses), which are the addresses of the areas of the logical storage spaces provided to the storage apparatus.
A conventional storage device such as an SSD provides a single storage space to the host device (a host computer or the like) to which it is connected. In contrast, the NVM module 126 of this embodiment is characterized by having two logical storage spaces and providing both of them to the storage controller 110 to which it is connected. The relationship between these two logical storage spaces (LBA) and the PBA space is described with reference to FIG. 7.
FIG. 7 is a diagram showing the concept of the association between the LBA0 space 701 and LBA1 space 702, the logical storage spaces that the NVM module 126 of this embodiment provides to the storage controller 110, and the PBA space 703.
The NVM module 126 provides two logical storage spaces, the LBA0 space 701 and the LBA1 space 702. Hereinafter, an address assigned to a storage area in the LBA0 space 701 is called "LBA0" or an "LBA0 address", and an address assigned to a storage area in the LBA1 space 702 is called "LBA1" or an "LBA1 address". In the embodiments of the present invention, the sizes of the LBA0 space 701 and the LBA1 space 702 are both no larger than the size of the PBA space, but the present invention is effective even when the size of the LBA0 space 701 is larger than the size of the PBA space.
The LBA0 space 701 is a logical storage space for allowing the processor 121 of the storage controller 110 to access, as uncompressed data, the compressed data recorded in the physical storage area composed of the FMs 420. When the processor 121 designates an address (LBA0) in the LBA0 space 701 and issues a write request to the NVM module 126, the NVM module 126 obtains the write data from the storage controller 110, compresses it with the data compression/decompression unit 418, records the data in the physical storage area on the FMs 420 designated by a PBA that the NVM module 126 selects dynamically, and associates the LBA0 with that PBA. When the processor 121 designates an LBA0 and issues a read request to the NVM module 126, the NVM module 126 obtains the data (compressed data) from the physical storage area of the FM 420 indicated by the PBA associated with the LBA0, decompresses it with the data compression/decompression unit 418, and transfers the decompressed data to the storage controller 110 as read data. This association between LBA0 and PBA is managed in the LBA0-PBA conversion table described later.
The LBA1 space 702 is a logical storage space for allowing the storage controller 110 to access the compressed data recorded in the physical storage area composed of the FMs 420 as compressed data (without decompression). When the processor 121 of the storage controller 110 designates an LBA1 and issues a write request to the NVM module 126, the NVM module 126 obtains the data (already compressed write data) from the storage controller 110, records the data in the FM storage area designated by a PBA that the NVM module 126 selects dynamically, and associates the LBA1 with that PBA. When the processor 121 designates an LBA1 and issues a read request, the NVM module 126 obtains the data (compressed data) from the physical storage area of the FM 420 indicated by the PBA associated with the LBA1 and transfers the compressed data as-is to the storage controller 110 as read data. This association between LBA1 and PBA is managed in the LBA1-PBA conversion table described later.
As shown in FIG. 7, an area in the PBA space, that is, the physical storage area in which compressed data 713 is recorded, may be associated simultaneously with both an area of the LBA0 space and an area of the LBA1 space. For example, the decompressed form of the compressed data 713 is mapped onto the LBA0 space as decompressed data 711, while the compressed data 713 itself is mapped onto the LBA1 space as compressed data 712. For example, when the processor 121 designates an LBA0 (say, 0x000_0000_1000) and writes data to the NVM module 126, the data is compressed by the data compression/decompression unit 418 in the NVM module 126, and the compressed data is placed in the PBA space dynamically selected by the NVM module 126 (specifically, in one of the unwritten pages among the pages of the FMs 420). The data is then managed as being associated with address 0x000_0000_1000 of the LBA0 space. If the processor 121 subsequently issues to the NVM module 126 a request to associate the data mapped at 0x000_0000_1000 with an address of the LBA1 space (say, 0x800_0000_0010), this data also becomes associated with the LBA1 space; when the processor 121 then issues to the NVM module 126 a request (command) to read the data at LBA1 address 0x800_0000_0010, it can read, in compressed form, the data that it wrote to LBA0 address 0x000_0000_1000.
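This double mapping can be sketched with two dictionaries standing in for the conversion tables of FIG. 8. The addresses and granularities follow the example in the text (4 KB LBA0 units, 512-byte LBA1 sectors); everything else is a simplification for illustration only.

    # Two dictionaries standing in for the conversion tables of FIG. 8.
    lba0_to_pba = {}   # 4 KB logical unit -> (pba, length in 512 B sectors)
    lba1_to_pba = {}   # one 512 B sector -> one PBA sector

    def record_lba0_write(lba0, pba, nsectors):
        lba0_to_pba[lba0] = (pba, nsectors)      # e.g. lba0 = 0x000_0000_1000

    def map_compressed_to_lba1(lba0, lba1):
        pba, nsectors = lba0_to_pba[lba0]        # the same physical extent ...
        for i in range(nsectors):                # ... now also reachable,
            lba1_to_pba[lba1 + i] = pba + i      # compressed, via LBA1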
The storage apparatus 101 in this embodiment associates data written to the NVM module 126 through LBA0 with an area of the LBA1 space and, by designating LBA1 addresses, instructs generation of the RAID parity corresponding to that data, thereby enabling RAID parity generation for the compressed data.
Note that the size of the compressed data generated by the NVM module 126 in the embodiments of the present invention is limited to a multiple of 512 bytes (one sector) and is made never to exceed the size of the uncompressed data. That is, when 4 KB of data is compressed, the minimum size is 512 bytes and the maximum size is 4 KB.
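This size rule reduces to rounding the compressor output up to a whole number of sectors and capping it at the uncompressed size, as in this small sketch:

    SECTOR = 512

    def stored_size(compressed_len, uncompressed_len=4096):
        """Size recorded for one 4 KB compression unit, in bytes."""
        rounded = ((compressed_len + SECTOR - 1) // SECTOR) * SECTOR
        return min(rounded, uncompressed_len)   # never exceeds the 4 KB input

    # stored_size(700) -> 1024 (2 sectors); stored_size(4000) -> 4096 (no gain)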
(1-7) NVM Module Management Information 1: LBA-PBA Conversion Table
Next, the management information that the NVM module 126 of this embodiment uses for control will be described.
As the management information used by the NVM module 126, the LBA0-PBA conversion table 810 and the LBA1-PBA conversion table 820 are described first with reference to FIG. 8.
The LBA0-PBA conversion table 810 is stored in the DRAM 413 in the NVM module 126 and consists of the fields NVM module LBA0 (811), NVM module PBA (812), and PBA length (813). After receiving the LBA0 designated by a read request from the higher-level device, the processor 415 of the NVM module 126 uses that LBA0 to obtain the PBA indicating the location where the actual data is stored.
On an update write, the NVM module 126 records the update data (write data) in a physical storage area different from the PBA in which the pre-update data is recorded, writes the PBA at which the update data was recorded, together with the PBA length, into the corresponding entries, and thereby updates the LBA0-PBA conversion table. By operating in this way, the NVM module 126 makes it possible to (pseudo-)overwrite data in areas of the LBA0 space.
The NVM module LBA0 (811) column lists the logical areas of the LBA0 space provided by the NVM module 126 in order, in units of 4 KB (each address (LBA0) of the LBA0 space is assigned per sector (512 bytes)). The LBA0-PBA conversion table 810 in this embodiment is intended to manage the association between NVM module LBA0 (811) and NVM module PBA (812) in units of 4 KB (8 sectors). However, this association between NVM module LBA0 (811) and NVM module PBA (812) may be managed in any unit other than 4 KB.
The NVM module PBA (812) is a field storing the start address of the PBA associated with the NVM module LBA0 (811). In this embodiment, the physical storage area of the PBA space is divided into, and managed in, 512-byte (one-sector) units. In the example of FIG. 8, the value "XXX" is associated as the PBA (Physical Block Address) corresponding to NVM module LBA0 (811) "0x000_0000_0000". This value is an address that uniquely identifies a storage area among the plurality of FMs 420 mounted in the NVM module 126. Thus, when "0x000_0000_0000" is received as the start address (LBA0) of a read request, "XXX" is obtained as the start address (PBA) of the physical storage area (read source) in the NVM module 126. When no PBA is associated with the LBA0 identified by NVM module LBA0 (811), a value indicating "unallocated" (such as NULL or 0xFFFFFFFF) is stored in NVM module PBA (812).
The PBA length 813 records the actual stored size of the 4 KB of data designated by NVM module LBA0 (811). The stored size is recorded as a number of sectors. In the example shown in FIG. 8, the 4 KB of data starting at LBA0 "0x000_0000_0000" (8 sectors of the LBA0 space) is recorded with a PBA length of "2", that is, a length of 512 B x 2 = 1 KB. Combined with the information in NVM module PBA (812), this indicates that the 4 KB of data starting at LBA0 "0x000_0000_0000" is compressed and stored in the 1 KB area from PBA "XXX" to "XXX+1". The NVM module 126 in this embodiment compresses the uncompressed data written under instruction from the processor 121 of the storage controller 110 in 4 KB units. For example, when the processor 121 issues a write request for 8 KB of data (uncompressed) starting at LBA0 space address 0x000_0000_0000, the module compresses the 4 KB of data in the address range 0x000_0000_0000 to 0x000_0000_0007 (of the LBA0 space) as one unit to generate compressed data, then compresses the 4 KB of data in the address range 0x000_0000_0008 to 0x000_0000_000F as another unit to generate compressed data, and writes each piece of compressed data to the physical storage area of the FMs 420. However, the present invention is not limited to a mode in which data is compressed in 4 KB units, and remains effective in configurations where data is compressed in other units.
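A minimal sketch of this table behavior, including the out-of-place update write described earlier, might look as follows. The 4 KB management unit and the sector-counted PBA length follow the text; the allocator and compressor are hypothetical parameters.

    UNIT_SECTORS = 8     # 4 KB management unit of the LBA0-PBA table
    table = {}           # unit-aligned LBA0 -> (pba, pba_len in sectors)

    def lookup(lba0):
        return table.get(lba0)         # None plays the role of "unallocated"

    def update_write(lba0, data, compress, allocate_pba):
        cdata = compress(data)                 # one 4 KB unit in, <= 4 KB out
        nsectors = (len(cdata) + 511) // 512   # PBA length, counted in sectors
        pba = allocate_pba(nsectors)           # always a fresh, unwritten area
        table[lba0] = (pba, nsectors)          # old PBA association is released
        return pba, nsectors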
Next, the LBA1-PBA conversion table 820 is described. The LBA1-PBA conversion table 820 is stored in the DRAM 413 in the NVM module 126 and consists of two pieces of information: NVM module LBA1 (821) and NVM module PBA (822). After receiving the LBA1 designated by a read request from the higher-level device, the processor 415 of the NVM module 126 converts the received LBA1, using the LBA1-PBA conversion table 820, into the PBA indicating the location where the actual data for that LBA1 is stored.
The NVM module LBA1 (821) column lists the logical areas of the LBA1 space provided by the NVM module 126 in order, sector by sector (the value 1 in NVM module LBA1 (821) means one sector, 512 bytes). This reflects the premise that the NVM module 126 in this embodiment manages the association between NVM module LBA1 (821) and NVM module PBA (822) in 512 B units; however, this association is not limited to being managed in 512 B units and may be managed in any unit. That said, LBA1 is a space that directly maps the physical storage areas (PBAs) where compressed data is stored, so its management unit is desirably equal to the PBA division management size; in this embodiment it is therefore divided and managed in 512 B units.
The NVM module PBA (822) is a field storing the start address of the PBA associated with the LBA1. In this embodiment, PBAs are divided into, and managed in, 512 B units. In the example of FIG. 8, the PBA value "ZZZ" is associated with NVM module LBA1 "0x800_0000_0002". This PBA value is an address that uniquely identifies a storage area on one of the FMs 420 mounted in the NVM module 126. Thus, when "0x800_0000_0002" is received as the start address (LBA1) of a read request, "ZZZ" is obtained as the physical start address of the read source in the NVM module 126. When no PBA is associated with the LBA1 identified by NVM module LBA1 (821), a value indicating "unallocated" is stored in NVM module PBA (822).
The above are the contents of the LBA0-PBA conversion table 810 and the LBA1-PBA conversion table 820 used by the NVM module 126.
(1-9) NVM Module Management Information 3: Block Management Information
Next, the block management information used by the NVM module to which the present invention is applied will be described with reference to FIG. 9.
The block management information 900 is stored in the DRAM 413 in the NVM module 126 and consists of the following items: NVM module PBA 901, NVM chip number 902, block number 903, and invalid PBA amount 904.
The NVM module PBA 901 is a field storing a PBA value that uniquely identifies each area in all the FM 420 chips managed by the NVM module 126. In this embodiment, the NVM module PBA 901 is partitioned and managed in block units. FIG. 9 shows an example in which the head address of each block is stored as the NVM module PBA value. For example, the field "0x000_0000_0000" corresponds to the NVM module PBA range "0x000_0000_0000" to "0x000_0000_0FFF".
The NVM chip number 902 is a field storing a number that uniquely identifies an FM chip 420 mounted on the NVM module 126. The block number 903 is a field storing the block number within the FM chip 420 designated by the value stored in the NVM chip number 902.
The invalid PBA amount 904 is a field storing the invalid PBA amount of the block designated by the block number 903 within the FM chip designated by the NVM chip number 902. The invalid PBA amount is the amount of area (in the PBA space) that was once associated with the LBA0 space and/or the LBA1 space identified by the NVM module LBA0 (811) and the NVM module LBA1 (821) in the LBA0-PBA conversion table 810 and the LBA1-PBA conversion table 820, but whose association was later released. Conversely, a PBA that is associated with an NVM module LBA0 or LBA1 by the LBA0-PBA conversion table 810 or the LBA1-PBA conversion table 820 is referred to in this specification as a valid PBA.
Invalid PBA areas arise inevitably when overwriting is emulated on a nonvolatile memory in which data cannot be overwritten in place. Specifically, when data is updated, the NVM module 126 records the update data in an unwritten PBA (different from the PBA holding the pre-update data) and rewrites the NVM module PBA 812 and PBA length 813 fields of the LBA0-PBA conversion table 810 with the head address and PBA length of the PBA area in which the update data was recorded. At this point, the association of the PBA area holding the pre-update data is released from the LBA0-PBA conversion table 810. The NVM module 126 then also checks the LBA1-PBA conversion table 820, and treats an area that has no association in the LBA1-PBA conversion table either as an invalid PBA area. The NVM module 126 counts the amount of invalid PBA for each block, the minimum erase unit of the FM, and preferentially selects blocks with a large invalid PBA amount as garbage collection targets. The example of FIG. 9 shows that block number 0 of NVM chip number 0 managed by the NVM module 126 contains 160 KB of invalid PBA area.
In this embodiment, when the total amount of invalid PBA area managed by the NVM module 126 reaches or exceeds a predetermined garbage collection start threshold (i.e., unwritten pages are close to exhaustion), blocks containing invalid PBA areas are erased to create unwritten PBA area. This operation is called garbage collection. If a block to be erased contains valid PBA areas, those valid areas must be copied to another block before the block is erased. Because this data copy involves write operations to the FM, it accelerates wear of the FM, and because the copy consumes resources of the NVM module 126 such as processor time and bus bandwidth, it also degrades performance. The amount of valid PBA area copied should therefore be as small as possible. At garbage collection time, the NVM module 126 of this embodiment refers to the block management information 900 and erases blocks in descending order of the value stored in the invalid PBA amount 904 (i.e., blocks containing the most invalid PBA area first), thereby reducing the amount of valid PBA area that must be copied.
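As a concrete illustration of this victim-selection policy, the following C sketch picks the block with the largest invalid PBA amount; the structure and function names are assumptions introduced for illustration.

    #include <stddef.h>
    #include <stdint.h>

    typedef struct {
        uint64_t pba_base;     /* NVM module PBA 901: first PBA of the block */
        uint16_t chip_no;      /* NVM chip number 902 */
        uint32_t block_no;     /* block number 903 */
        uint32_t invalid_kb;   /* invalid PBA amount 904, in KB */
    } block_info_t;

    /* Return the index of the block holding the most invalid data, i.e. the
     * block whose erasure requires copying the least valid data. */
    static size_t pick_gc_victim(const block_info_t *blocks, size_t nblocks)
    {
        size_t victim = 0;
        for (size_t i = 1; i < nblocks; i++)
            if (blocks[i].invalid_kb > blocks[victim].invalid_kb)
                victim = i;
        return victim;
    }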
In this embodiment, the amount of area whose association with the NVM module LBA0 (811) and LBA1 (821) has been released is managed as a PBA amount (in KB), but the present invention is not limited to this management unit. For example, the number of pages, the minimum write unit, could be managed instead of the invalid PBA amount.
The above describes the contents of the block management information 900 used by the NVM module to which the present invention is applied.
(1-10) NVM Module Control Command 1: Write Command
Next, the commands used by the NVM module 126 to which the present invention is applied will be described.
When the NVM module 126 in this embodiment receives a command from the processor 121 of the storage controller 110, it analyzes the contents of the received command, performs the corresponding processing, and returns one response (response information) to the storage controller when the processing completes. This processing is realized by the processor 415 in the NVM module 126 executing a command processing program stored in the RAM 413. A command contains the set of information the NVM module 126 needs to perform the requested processing; for example, a write command instructing the NVM module 126 to write data contains an indication that the command is a write command together with the information needed for the write (such as the write position and data length of the write data). The NVM module 126 supports several types of commands, but first the information common to all commands will be described.
Each command begins with two pieces of common information: an operation code (Opcode) and a command ID. Information (parameters) specific to each command is appended after the command ID to form one complete command. For example, FIG. 10 shows the format of the write command of the NVM module 126 in this embodiment and the format of the response information for that write command; element (field) 1011 in FIG. 10 is the Opcode and element 1012 is the command ID, while elements 1013 through 1016 are parameters specific to the write command. In the response information returned when processing of a command completes, the command ID and status (Status) are common to all responses, and information specific to each response may be appended after the status.
The operation code (Opcode) is information that notifies the NVM module 126 of the type of command; the NVM module 126 that receives a command recognizes the type of the command by referring to this information. For example, it recognizes a command with an Opcode of 0x01 as a write command and a command with an Opcode of 0x02 as a read command.
The command ID is a field storing an ID unique to the command; this ID is placed in the response information for the command so that the storage controller 110 can recognize which command the response corresponds to. When creating a command, the storage controller 110 generates an ID that uniquely identifies the command, stores it in the command ID field, and sends the command to the NVM module 126. When the NVM module 126 completes the processing corresponding to the received command, it includes that command's ID in the response information and returns it to the storage controller 110. On receiving the response information, the storage controller 110 recognizes completion of the command by reading the ID contained in it. The status (element 1022 in FIG. 10) contained in the response information is a field storing information indicating whether the command completed normally. If the command did not complete normally (an error occurred), the status stores, for example, a number identifying the cause of the error.
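To make the common header concrete, a minimal C sketch of these shared fields follows; the struct layout and names are illustrative assumptions (the patent specifies the fields, not a wire format).

    #include <stdint.h>

    /* Fields common to every command. */
    typedef struct {
        uint8_t  opcode;       /* e.g. 0x01 = write, 0x02 = read, per the text */
        uint32_t command_id;   /* unique ID generated by the storage controller */
        /* command-specific parameters follow here */
    } cmd_header_t;

    /* Fields common to every response. */
    typedef struct {
        uint32_t command_id;   /* echoes the ID of the completed command */
        uint8_t  status;       /* normal completion, or a number identifying the error cause */
        /* response-specific fields may follow here */
    } resp_header_t;

    /* The controller matches a response to its outstanding command by ID. */
    static int response_matches(const cmd_header_t *cmd, const resp_header_t *resp)
    {
        return cmd->command_id == resp->command_id;
    }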
FIG. 10 shows the LBA0 write command of the NVM module 126 in this embodiment and the response information to that write command. The LBA0 write command 1010 of the NVM module 126 in this embodiment consists of the following command information: operation code 1011, command ID 1012, LBA0/1 start address 1013, LBA0/1 length 1014, compression necessity flag 1015, and write data address 1016. This embodiment describes a command made up of the above information, but additional information may also be present. For example, the present invention remains effective even if information related to a DIF (Data Integrity Field) or the like is attached to the command.
The operation code 1011 is a field that notifies the NVM module 126 of the command type; the NVM module 126 that receives the command recognizes from this field that the command is a write command.
The command ID 1012 is a field storing an ID unique to the command; the ID specified in this field is attached to the response information so that the storage apparatus can recognize which command the response corresponds to. The storage apparatus 101 assigns an ID that uniquely identifies the command when the command is created, and when it receives the response information from the NVM module 126, it recognizes completion of the command by reading the ID contained in the response.
The LBA0/1 start address 1013 is a field designating the head address of the write destination in the logical space. In the embodiment of the present invention, the LBA0 space is defined as the address range 0x000_0000_0000 to 0x07F_FFFF_FFFF and the LBA1 space as the range starting at address 0x800_0000_0000, so when an address in the range 0x000_0000_0000 to 0x07F_FFFF_FFFF is stored in the LBA0/1 start address 1013 of a write command, the NVM module 126 recognizes that an LBA0-space address has been specified, and when an address in the range 0x800_0000_0000 to 0x8FF_FFFF_FFFF is specified, it recognizes that an LBA1-space address has been specified. Methods other than the one described above may also be adopted for recognizing which address space the specified address belongs to; for example, the LBA0 and LBA1 spaces could be distinguished by the contents of the Opcode 1011.
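The address-space recognition described above amounts to a simple range test, sketched below in C for illustration; the enum and function names are assumptions.

    #include <stdint.h>

    enum lba_space { SPACE_LBA0, SPACE_LBA1, SPACE_INVALID };

    /* LBA0: 0x000_0000_0000 .. 0x07F_FFFF_FFFF
     * LBA1: 0x800_0000_0000 .. 0x8FF_FFFF_FFFF */
    static enum lba_space classify_lba(uint64_t addr)
    {
        if (addr <= 0x07FFFFFFFFFULL)
            return SPACE_LBA0;
        if (addr >= 0x80000000000ULL && addr <= 0x8FFFFFFFFFFULL)
            return SPACE_LBA1;
        return SPACE_INVALID;
    }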
The LBA0/1 length 1014 is a field designating the range (length) of the write destination LBA0 or LBA1 starting at the LBA0/1 start address 1013, expressed as a number of sectors. The NVM module 126 associates the PBA area in which the write data is stored with the LBA0 or LBA1 area in the range indicated by the LBA0/1 start address 1013 and the LBA0/1 length 1014.
The compression necessity flag 1015 is a field designating whether the write target data indicated by this command needs to be compressed. When the storage controller 110 creates a write command and no size reduction from data compression can be expected for the write target data (for example, when the data is already known to be compressed, e.g. by image compression), it sets this flag to notify the NVM module 126 that compression is unnecessary. In this embodiment, the flag is used when writing to LBA1 to state explicitly that compression is unnecessary because the write target data is already compressed. If it is a fixed rule that transferred data is never compressed when writing to LBA1, this compression necessity flag 1015 may be omitted.
The write data address 1016 is a field storing the head address of the current location of the write target data indicated by this command. For example, when data temporarily stored in the DRAM 125 of the storage apparatus 101 is to be written to the NVM module 126, the processor of the storage apparatus 101 creates a write command in which the address on the DRAM 125 where the data resides is stored in the write data address 1016. The NVM module 126 acquires the write data by fetching, from the storage apparatus 101, the data in the area of the length designated by the LBA0/1 length 1014 starting at the address indicated in this field.
The write response information 1020 consists of a command ID 1021, a status 1022, and a compressed data length 1023. This embodiment describes response information made up of the above items, but additional information may also be present.
The command ID 1021 is a field storing a number that uniquely identifies the completed command.
The status 1022 is a field for notifying the storage apparatus of completion of the command or of an error. In the case of an error, it stores, for example, a number identifying the cause of the error.
The compressed data length 1023 is a field recording the data length to which the written data was reduced by compression. By reading this field, the storage apparatus 101 can determine the post-compression size of the data it wrote. However, as update writes accumulate, the storage apparatus 101 loses an accurate picture of the actual compressed data size associated with a given LBA0 area. For this reason, when the total of the compressed data lengths 1023 obtained from write commands reaches a certain value, the storage apparatus 101 issues the compressed data size acquisition command described later in order to map the data to LBA1.
In this embodiment, when the write destination is LBA1, already-compressed data is being recorded, so this field is invalid.
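Purely as an illustration of fields 1011 through 1016 and 1021 through 1023, a C sketch of the command and response layouts follows; the field widths and struct packing are assumptions (the figures specify the fields, not their encoding).

    #include <stdbool.h>
    #include <stdint.h>

    /* LBA0 write command 1010 (fields 1011 to 1016). */
    typedef struct {
        uint8_t  opcode;            /* 1011 */
        uint32_t command_id;        /* 1012 */
        uint64_t lba_start;         /* 1013: LBA0/1 start address */
        uint32_t lba_len_sectors;   /* 1014: length in sectors */
        bool     compress;          /* 1015: false when the data is already compressed */
        uint64_t write_data_addr;   /* 1016: e.g. an address in the DRAM 125 */
    } write_cmd_t;

    /* Write response 1020 (fields 1021 to 1023). */
    typedef struct {
        uint32_t command_id;        /* 1021 */
        uint8_t  status;            /* 1022 */
        uint32_t compressed_len;    /* 1023: invalid for writes to LBA1 */
    } write_resp_t;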
(1-11) NVM Module Control Command 2: Compressed Data Size Acquisition Command
FIG. 11 shows the compressed data size acquisition command of the NVM module 126 in this embodiment and the response information to that command. The compressed data size acquisition command 1110 of the NVM module 126 in this embodiment consists of the following command information: operation code 1111, command ID 1012, LBA0 start address 1113, and LBA0 length 1114. This embodiment describes a command made up of the above information, but additional information may also be present. The command ID 1012 has the same contents as in the LBA0 write command described earlier, so its description is omitted.
The operation code 1111 is a field that notifies the NVM module 126 of the command type; the NVM module 126 that receives the command recognizes from this field that the command is a compressed data size acquisition command.
The LBA0 start address 1113 is a field designating the head address of the LBA0 area whose post-compression data size is to be obtained. The LBA0 length 1114 is a field designating the range of LBA0 starting at the LBA0 start address 1113. The NVM module 126 calculates the size of the compressed data associated with the LBA0 area in the range indicated by the LBA0 start address 1113 and the LBA0 length 1114, and notifies the storage apparatus. The addresses that can be specified in the LBA0 start address 1113 are limited to multiples of 8 sectors (4 KB); likewise, the lengths that can be specified in the LBA0 length 1114 are limited to multiples of 8 sectors (4 KB). If an address that does not fall on an 8-sector boundary (for example, 0x000_0000_0001) or a misaligned length is specified in the LBA0 start address 1113 or the LBA0 length 1114, an error is returned.
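The alignment rule above reduces to two modulo checks, sketched here in C for illustration; the function name is an assumption.

    #include <stdbool.h>
    #include <stdint.h>

    #define LBA0_ALIGN_SECTORS 8   /* 8 sectors = 4 KB */

    /* Both parameters must be multiples of 8 sectors, or the module returns an error. */
    static bool size_query_aligned(uint64_t lba0_start, uint64_t lba0_len)
    {
        return (lba0_start % LBA0_ALIGN_SECTORS) == 0 &&
               (lba0_len   % LBA0_ALIGN_SECTORS) == 0;
    }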
The compressed data size acquisition response 1120 consists of a command ID 1021, a status 1022, and a compressed data length 1123. This embodiment describes response information made up of the above items, but additional information may also be present. The command ID 1021 and the status 1022 have the same contents as in the LBA0 write response described earlier, so their descriptions are omitted.
The compressed data length 1123 is a field storing the size of the compressed data associated with the LBA0 area designated by the compressed data size acquisition command. By obtaining the value of this compressed data length, the storage controller 110 learns the area size required on LBA1 as the mapping destination for the LBA1 mapping command described later.
(1-12) NVM Module Control Command 3: LBA1 Mapping Command
In the NVM module 126 of this embodiment, data written with an LBA0 area designated is compressed by the NVM module 126 and recorded in the FM 420. To retrieve compressed data recorded in the FM 420, that compressed data must be mapped onto the LBA1 space. The LBA1 mapping command is used for this purpose.
FIG. 12 schematically shows the LBA1 mapping command supported by the NVM module 126 in this embodiment and the response information to that command. The LBA1 mapping command 1210 of the NVM module 126 in this embodiment consists of the following command information: operation code 1211, command ID 1012, LBA0 start address 1213, LBA0 length 1214, and LBA1 start address 1215. This embodiment describes a command made up of the above information, but additional information may also be present. The command ID 1012 has the same contents as in the write command described earlier, so its description is omitted.
The operation code 1211 is a field that notifies the NVM module 126 of the command type; the NVM module 126 that receives the command recognizes from this field that the command is an LBA1 mapping command.
The LBA0 start address 1213 is a field designating the head address of the LBA0 area of the target data whose compressed data is to be mapped to LBA1. The LBA0 length 1214 is a field designating the range of LBA0, starting at the LBA0 start address 1213, that is to be mapped to LBA1. As with the compressed data size acquisition command, the LBA0 start address 1213 and the LBA0 length 1214 are limited to multiples of 8 sectors (4 KB).
The LBA1 start address 1215 is a field designating the start address of the LBA1 range to be mapped. The storage controller 110, which already knows the size of the data to be mapped from the compressed data size acquisition command, reserves an LBA1 area large enough for that data size, stores its head address in the LBA1 start address 1215 field, and issues the command to the NVM module 126.
The NVM module 126 of this embodiment maps the compressed data associated with the LBA0 space in the range indicated by the LBA0 start address 1213 and the LBA0 length 1214 onto the area extending from the LBA1 start address 1215 for the size of the compressed data. More specifically, it refers to the LBA0-PBA conversion table and obtains the PBAs (NVM module PBA 812) associated with the LBA0 space in the range indicated by the LBA0 start address 1213 and the LBA0 length 1214. It then refers to the LBA1-PBA conversion table and, starting at the LBA1 start address 1215, writes the obtained PBA addresses into the PBA 822 fields of the LBA1 range (the entries identified by the NVM module LBA1 (821)) equal in size to the total size of the obtained PBAs.
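A minimal sketch of that table-to-table copy follows, under the same assumed structures as the earlier lookup sketches; note that only PBA pointers move, not the compressed data itself.

    #include <stdint.h>

    #define LBA0_UNIT_SECTORS 8    /* one LBA0 table entry per 4 KB (8 sectors) */

    typedef struct {
        uint64_t pba_start;        /* NVM module PBA (812) */
        uint16_t pba_len_sectors;  /* PBA length (813), in 512 B sectors */
    } lba0_pba_entry_t;

    /* Copy the PBAs of a (4 KB-aligned) LBA0 range into consecutive
     * 512 B-granularity LBA1 entries starting at lba1_index. */
    static uint64_t map_lba0_range_to_lba1(const lba0_pba_entry_t *lba0_table,
                                           uint64_t *lba1_table, uint64_t lba1_index,
                                           uint64_t lba0_start, uint64_t lba0_len)
    {
        for (uint64_t s = lba0_start; s < lba0_start + lba0_len; s += LBA0_UNIT_SECTORS) {
            const lba0_pba_entry_t *e = &lba0_table[s / LBA0_UNIT_SECTORS];
            for (uint16_t i = 0; i < e->pba_len_sectors; i++)
                lba1_table[lba1_index++] = e->pba_start + i;
        }
        return lba1_index;   /* first LBA1 entry after the mapped range */
    }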
The LBA1 mapping response 1220 consists of a command ID 1021 and a status 1022. This embodiment describes response information made up of the above items, but additional information may also be present. The command ID 1021 and the status 1022 have the same contents as in the write response described earlier, so their descriptions are omitted.
(1-13) NVM Module Control Command 4: Full Stripe Parity Generation Command
There are broadly two methods of generating parity in RAID technology. One generates parity by performing a parity computation such as XOR over all of the data needed to generate the parity; this method is called the "full stripe parity generation method" in this specification. In the other, when update data is written to a RAID-configured group of storage media, the parity corresponding to the update data (the post-update parity) is generated by performing an XOR operation or the like over the update data, the pre-update data stored on the storage medium, and the pre-update parity corresponding to that pre-update data; this method is called the "update parity generation method" in this specification.
The full stripe parity generation command can be used when all of the data making up a RAID parity stripe is stored in the NVM module 126 and mapped onto the LBA0 space. Accordingly, in a RAID configuration that generates parity over six pieces of data, all six pieces of data must be stored in the NVM module 126.
As described earlier, in the storage apparatus 101 according to the embodiment of the present invention, write data from the higher-level device 103 is stored in the NVM module 126 in compressed form, but parity is generated from the data in its uncompressed state. The parity generation target data must therefore be mapped onto the LBA0 space.
FIG. 13 shows the full stripe parity generation command of the NVM module 126 in this embodiment and the response information to that command. The full stripe parity generation command 1310 consists of the following command information: operation code (Opcode) 1311, command ID 1012, LBA0 length 1313, number of stripes 1314, LBA0 start addresses 0 through X (1315 to 1317), LBA0 start address (for XOR parity) 1318, and LBA0 start address (for RAID6 parity) 1319. This embodiment describes a command made up of the above information, but additional information may also be present.
The operation code 1311 is a field that notifies the NVM module 126 of the command type; the NVM module 126 that receives the command recognizes from this field that the command is a full stripe parity generation command.
The LBA0 length 1313 is a field designating the length of the parity to be generated (for RAID parity, the parity and its source data have the same length). The number of stripes 1314 designates the number of pieces of data used to generate the parity; for example, when parity is generated over six pieces of data, 6 is stored in the number of stripes 1314.
The LBA0 start addresses 0 through X (1315 to 1317) are fields designating the start addresses in LBA0 with which the parity source data is associated. The number of these fields must match the number designated by the number of stripes 1314 (if a command in which they do not match is issued, the NVM module 126 returns an error). For example, in a configuration that creates two parities over six pieces of data (RAID6 6D+2P), six LBA0 start addresses are designated.
The LBA0 start address (for XOR parity) 1318 is a field designating the storage destination of the RAID parity (XOR parity) to be generated. In a RAID5 configuration, the generated parity (the RAID5 parity, or the RAID6 P parity / horizontal parity) is mapped to the area of the range designated by the LBA0 length 1313 starting at this address.
The LBA0 start address (for RAID6) 1319 is a field designating the storage destination of the RAID6 parity to be generated. As noted earlier, the parity for RAID6 is the Q parity of a Reed-Solomon code or the diagonal parity of the EVENODD scheme. The generated parity is stored in the area of the range designated by the LBA0 length 1313 starting at this LBA0 start address (for RAID6) 1319.
The NVM module 126 of this embodiment acquires the pieces of compressed data from the FM 420 locations indicated by the PBAs associated with the areas designated by the LBA0 start addresses 0 through X (1315 to 1317). It then decompresses the acquired data using the data compression/decompression unit 418 and generates one or two parities from the decompressed data using the parity generation unit 419 inside the NVM module 126. The generated parity is compressed using the data compression/decompression unit 418 and then recorded in the FM 420. Finally, to associate the PBA of the FM area where the parity was recorded with the LBA0 start address (for XOR parity) 1318 and the LBA0 start address (for RAID6) 1319, the PBA of the recording destination FM area and the post-compression data length are recorded in the NVM module PBA (812) and PBA length (813) fields of the corresponding rows of the LBA0-PBA management information 810 (the rows whose NVM module LBA0 (811) equals the value of the LBA0 start address (for XOR parity) 1318 or the LBA0 start address (for RAID6) 1319).
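For reference, the XOR (P) parity computation at the core of full stripe generation is sketched below in C; the RAID6 Q parity (Reed-Solomon or EVENODD diagonal) is omitted, and the function name is an assumption.

    #include <stddef.h>
    #include <stdint.h>

    /* XOR each byte position across all stripes; the parity and every stripe
     * share the same length, matching the LBA0 length 1313. */
    static void full_stripe_xor(const uint8_t *const *stripes, size_t nstripes,
                                size_t len, uint8_t *parity)
    {
        for (size_t i = 0; i < len; i++) {
            uint8_t p = 0;
            for (size_t s = 0; s < nstripes; s++)
                p ^= stripes[s][i];
            parity[i] = p;
        }
    }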
The full stripe parity generation response 1320 consists of a command ID 1021 and a status 1022. This embodiment describes response information made up of the above items, but additional information may also be present. The command ID 1021 and the status 1022 have the same contents as in the LBA0 write response described earlier, so their descriptions are omitted.
(1-14) NVM Module Control Command 5: Update Parity Generation Command
Update parity generation is performed when update data is to be recorded in an area of the final storage medium for which parity has already been created, and three pieces of information are mapped on LBA0: the update data, the old data of the area to be updated by the update data, and the old parity protecting that old data. When generating an update parity, the storage controller 110 of this embodiment reads the compressed old data and old parity from the RAID-configured final storage media and writes them to areas in the LBA1 space of the NVM module 126. It then maps each piece of compressed data in the LBA1 space onto the LBA0 space, so that the update data, the old data of the area to be updated, and the old parity protecting the old data are all present in the LBA0 space, and then performs update parity generation by issuing the update parity generation command.
FIG. 14 shows the update parity generation command of the NVM module 126 in this embodiment and the response information to that command. The update parity generation command 1410 of the NVM module 126 in this embodiment consists of the following command information: operation code 1411, command ID 1012, LBA0 length 1413, LBA0 start address 0 (1414), LBA0 start address 1 (1415), LBA0 start address 2 (1416), LBA0 start address 3 (1417), LBA0 start address 4 (1418), and LBA0 start address 5 (1419). This embodiment describes a command made up of the above information, but additional information may also be present. The command ID 1012 has the same contents as in the LBA0 write command described earlier, so its description is omitted.
The operation code 1411 is a field that notifies the NVM module 126 of the command type; the NVM module 126 that receives the command recognizes from this field that the command is an update parity generation command.
The LBA0 length 1413 is a field designating the length of the parity to be generated (the RAID parity and its source data have the same length).
LBA0 start address 0 (1414) is a field indicating the start address of the LBA0 area to which the new data for the parity update is mapped. Using this field, the storage apparatus 101 notifies the NVM module 126 that the data in the area of LBA0 length 1413 starting at LBA0 start address 0 (1414) is the new data.
LBA0 start address 1 (1415) is a field indicating the start address of the LBA0 area to which the old data for the parity update is mapped. Using this field, the storage apparatus 101 notifies the NVM module 126 that the data in the area of LBA0 length 1413 starting at LBA0 start address 1 (1415) is the old data.
LBA0 start address 2 (1416) is a field indicating the start address of the LBA0 area to which the pre-update XOR parity for the parity update is mapped. Using this field, the storage apparatus 101 notifies the NVM module 126 that the data in the area of LBA0 length 1413 starting at LBA0 start address 2 (1416) is the pre-update XOR parity.
LBA0 start address 3 (1417) is a field indicating the start address of the LBA0 area to which the pre-update RAID6 parity for the parity update is mapped. Using this field, the storage apparatus 101 notifies the NVM module 126 that the data in the area of LBA0 length 1413 starting at LBA0 start address 3 (1417) is the pre-update RAID6 parity.
LBA0 start address 4 (1418) is a field indicating the start address of the LBA0 area with which the XOR parity newly created by the update is to be associated. Using this field, the storage apparatus 101 instructs the NVM module 126 to map the new XOR parity to the area of LBA0 length 1413 starting at LBA0 start address 4 (1418).
LBA0 start address 5 (1419) is a field indicating the start address of the LBA0 area with which the RAID6 parity newly created by the update is to be associated. Using this field, the storage controller 110 instructs the NVM module 126 to map the new RAID6 parity to the area of LBA0 length 1413 starting at LBA0 start address 5 (1419).
The processing performed when the NVM module 126 receives an update parity generation command is similar to the processing performed when it receives a full stripe parity generation command. The module acquires the pieces of compressed data from the FM 420 storage areas indicated by the PBAs associated with the areas designated by LBA0 start addresses 0 through 3 (1414 to 1417) and decompresses them, generates one or two parities using the parity generation unit 419 inside the NVM module 126, and then compresses the parities. The generated parities are then recorded in the FM 420 and mapped to the LBA0 areas designated by LBA0 start address 4 (1418) and LBA0 start address 5 (1419).
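For the XOR case, the update computation reduces to new parity = new data XOR old data XOR old parity, sketched below in C for illustration; the function name is an assumption.

    #include <stddef.h>
    #include <stdint.h>

    /* Recompute the XOR parity from the new data, old data, and old parity. */
    static void update_xor_parity(const uint8_t *new_data, const uint8_t *old_data,
                                  const uint8_t *old_parity, uint8_t *new_parity,
                                  size_t len)
    {
        for (size_t i = 0; i < len; i++)
            new_parity[i] = new_data[i] ^ old_data[i] ^ old_parity[i];
    }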
The update parity generation response 1420 consists of a command ID 1021 and a status 1022. This embodiment describes response information made up of the above items, but additional information may also be present. The command ID 1021 and the status 1022 have the same contents as in the LBA0 write response described earlier, so their descriptions are omitted.
(1-15) NVM Module Control Command 6: Compression Information Acquisition Command
In the storage apparatus 101 of this embodiment, after the NVM module 126 serving as the cache device generates the parity corresponding to the data and compresses each piece of data including the parity, the storage apparatus 101 acquires the compressed data from the NVM module 126 and records it on the final storage medium. At that time, the information needed to decompress the compressed data (hereinafter, compression information) is also recorded on the final storage medium. The present invention does not depend on this scheme; the NVM module 126 could instead permanently hold the information needed for decompression.
When the compression information is recorded on the final storage medium as in this embodiment, the storage apparatus 101 needs to acquire the compression information from the NVM module 126 serving as the cache device. The compression information acquisition command is used when the storage controller 110 acquires the compression information from the NVM module 126.
FIG. 15 shows the compression information acquisition command of the NVM module 126 in this embodiment and the response information to that command. The compression information acquisition command 1510 of the NVM module 126 in this embodiment consists of the following command information: operation code 1511, command ID 1012, LBA1 start address 1513, LBA1 length 1514, and compression information address 1515. This embodiment describes a command made up of the above information, but additional information may also be present. The command ID 1012 has the same contents as in the write command described earlier, so its description is omitted.
The operation code 1511 is a field that notifies the NVM module 126 of the command type; the NVM module 126 that receives the command recognizes from this field that the command is a compression information acquisition command.
The LBA1 start address 1513 is a field designating the start address of the area on LBA1 for which compression information is to be acquired.
The LBA1 length 1514 is a field designating the range of LBA1 starting at the LBA1 start address 1513.
The compression information address 1515 is a field designating the storage destination for the compression information that the storage controller 110 acquires from the NVM module 126.
The NVM module 126 creates the compression information needed to decompress the data recorded in the LBA1 area of the range indicated by the LBA1 start address 1513 and the LBA1 length 1514, and transfers it to the compression information address 1515 designated by the storage controller 110. Concretely, the compression information describes the structure of the compressed data mapped onto LBA1. For example, if four independently decompressable pieces of compressed data are mapped into the designated LBA1 area, the compression information stores the start positions of those four pieces of compressed data and their lengths.
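One way such compression information could be laid out is sketched below in C; this layout (a count followed by start/length descriptors) is an assumption consistent with the description above, not a format defined by the embodiment.

    #include <stdint.h>

    /* One descriptor per independently decompressable compressed unit. */
    typedef struct {
        uint64_t start_offset;     /* where the unit begins within the LBA1 range */
        uint32_t compressed_len;   /* length of the compressed unit in bytes */
    } comp_unit_desc_t;

    typedef struct {
        uint32_t nunits;           /* e.g. 4 when four compressed units are mapped */
        comp_unit_desc_t unit[];   /* flexible array member: nunits descriptors */
    } compression_info_t;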
After acquiring the compression information from the NVM module 126 with the compression information acquisition command, the storage apparatus 101 of this embodiment records the compression information on the final storage medium together with the compressed data. When the compressed data is later to be decompressed and read, the storage apparatus acquires the compression information together with the compressed data from the final storage medium, writes the compressed data to the NVM module 126, and then transfers the compression information using the compression information transfer command described later, thereby enabling the NVM module 126 to decompress the data.
The compression information acquisition response 1520 consists of a command ID 1021 and a status 1022. This embodiment describes response information made up of the above items, but additional information may also be present. The command ID 1021 and the status 1022 have the same contents as in the LBA0 write response described earlier, so their descriptions are omitted.
(1-16) NVM Module Control Command 7: Read Command
FIG. 16 shows the read command of the NVM module 126 in this embodiment and the response information to that command. The read command 1610 of the NVM module 126 in this embodiment consists of the following command information: operation code 1611, command ID 1012, LBA0/1 start address 1613, LBA0/1 length 1614, decompression necessity flag 1615, and read data address 1616. This embodiment describes a command made up of the above information, but additional information may also be present. The command ID 1012 has the same contents as in the LBA0 write command described earlier, so its description is omitted.
The operation code 1611 is a field that notifies the NVM module 126 of the command type; the NVM module 126 that receives the command recognizes from this field that the command is a read command.
The LBA0/1 start address 1613 is a field designating the head address of the read source in the logical space.
The LBA0/1 length 1614 is a field designating the range of the read source LBA0 or LBA1 starting at the LBA0/1 start address 1613. The NVM module 126 performs the read by acquiring the data from the PBAs associated with the LBA0 or LBA1 area in the range indicated by the LBA0/1 start address 1613 and the LBA0/1 length 1614, and transferring it to the storage apparatus.
The decompression necessity flag 1615 is a field designating whether the read target data indicated by this command needs to be decompressed. When the storage controller 110 creates a read command, it can set this flag to notify the NVM module 126 that decompression is unnecessary. This field need not be included in the read command. In this embodiment, when reading from the LBA1 space, the read target data must be acquired deliberately without decompression, so the flag is used to state explicitly that decompression is unnecessary. If it is a fixed rule that data read from LBA1 never requires decompression, this decompression necessity flag 1615 may be omitted.
The read data address 1616 designates the head address of the output destination area for the read target data (for example, an address in the DRAM 125). The read data of the length designated by the LBA0/1 length 1614 is stored contiguously starting at the address designated by the read data address 1616.
The read response 1620 consists of a command ID 1021 and a status 1022. This embodiment describes response information made up of the above items, but additional information may also be present. The command ID 1021 and the status 1022 have the same contents as in the LBA0 write response described earlier, so their descriptions are omitted.
(1-17) NVM Module Control Command 8: Unmapping Command
The storage controller 110 according to the embodiment of the present invention maps data onto LBA1 in order to retrieve, still in compressed form, the write data and parity recorded compressed in the NVM module 126. Likewise, to decompress and retrieve the compressed information, it maps data recorded in the NVM module 126 via LBA1 onto LBA0. Areas mapped in this way must be unmapped once the processing is finished and they are no longer needed. The storage apparatus 101 of this embodiment uses the unmapping command to release the association of an LBA0 or LBA1 with a PBA.
FIG. 17 shows the unmapping command of the NVM module 126 in this embodiment and the response information to that command. The unmapping command 1710 of the NVM module 126 in this embodiment consists of the following command information: operation code 1711, command ID 1012, LBA0/1 start address 1713, and LBA0/1 length 1714. This embodiment describes a command made up of the above information, but additional information may also be present. The command ID 1012 has the same contents as in the LBA0 write command described earlier, so its description is omitted.
The operation code 1711 is a field that notifies the NVM module 126 of the command type; the NVM module 126 that receives the command recognizes from this field that the command is an unmapping command.
The LBA0/1 start address 1713 is a field designating the head address of the logical space to be unmapped; addresses in both the LBA0 space and the LBA1 space can be designated. However, when an LBA0-space address is designated, the address must fall on a 4 KB (8-sector) boundary; if an address not on a 4 KB (8-sector) boundary is designated, the NVM module 126 returns an error. The LBA0/1 length 1714 is a field designating the range of the LBA0 space or the LBA1 space starting at the LBA0/1 start address 1713.
 The processing performed when the NVM module 126 receives a mapping release command from the storage controller 110 is as follows. The NVM module 126 deletes the associations of the PBAs associated with the LBA0 or LBA1 space in the range indicated by the LBA0/1 start address 1713 and the LBA0/1 length 1714 described above (hereinafter the "target LBA0/1 area"). Specifically, it refers to the LBA0-PBA conversion table 810 or the LBA1-PBA conversion table 820 and, for each entry whose NVM module LBA0 (811) or NVM module LBA1 (821) value belongs to the target LBA0/1 area, updates the NVM module PBA 812 or NVM module PBA 822 field to "unallocated".
 At this time, the NVM module detects PBAs whose associations with both LBA0 and LBA1 have been released, and reflects that information in the block management information 900 (that is, it adds the amount of area that has become invalid PBA to the invalid PBA amount 904). The NVM module 126 in the embodiment of the present invention selects, from among the blocks, those with a relatively large invalid PBA amount 904 (that is, it selects blocks in descending order of invalid PBA amount 904) and performs garbage collection on them; since garbage collection is a well-known process, its description is omitted here.
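 To make the bookkeeping above concrete, the following is a minimal Python sketch of how mapping-release handling might update the two address-translation tables and the per-block invalid-PBA counters. The dictionary-based tables, the 4 KB granularity, the 2 MB block size, and all names (lba0_to_pba, block_invalid, and so on) are illustrative assumptions, not the patent's implementation.

```python
# Minimal sketch of mapping-release handling (illustrative, not the patent's code).
# Tables map 4 KB-aligned logical addresses to PBAs; absence means "unallocated".

PAGE = 4096               # assumed management granularity

lba0_to_pba = {}          # stands in for LBA0-PBA conversion table 810
lba1_to_pba = {}          # stands in for LBA1-PBA conversion table 820
block_invalid = {}        # block number -> invalid PBA amount (cf. field 904)

def block_of(pba):
    """Illustrative: derive the FM block number containing a PBA."""
    BLOCK_SIZE = 2 * 1024 * 1024   # assumed 2 MB erase block
    return pba // BLOCK_SIZE

def unmap(table, other_table, start, length):
    """Release the LBA->PBA associations in [start, start+length)."""
    for addr in range(start, start + length, PAGE):
        pba = table.pop(addr, None)
        if pba is None:
            continue
        # A PBA becomes invalid only once neither LBA0 nor LBA1 references it.
        if pba not in other_table.values():
            blk = block_of(pba)
            block_invalid[blk] = block_invalid.get(blk, 0) + PAGE

def gc_victim():
    """Pick the block with the largest invalid-PBA amount for garbage collection."""
    return max(block_invalid, key=block_invalid.get) if block_invalid else None

# Example: release an 8 KB LBA0 range, then pick a garbage-collection victim.
lba0_to_pba[0] = 0
lba0_to_pba[4096] = 4096
unmap(lba0_to_pba, lba1_to_pba, 0, 2 * PAGE)
print(gc_victim())   # -> 0
```

 A real controller would more likely keep per-PBA reference counts than scan the other table, but that level of detail is not specified in the text.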
 (1-18) NVM Module Control Command 9: Compression Information Transfer Command
 In this embodiment, after the storage apparatus 101 stores data compressed by the NVM module 126 in the final storage medium, it must, in response to a read request from the host apparatus, decompress the compressed data and transfer it to the host apparatus. At that time, the storage apparatus 101 reads the compressed data from the final storage medium, transfers it to the NVM module 126, and then also transfers the compression information needed to decompress the compressed data.
 FIG. 18 shows the compression information transfer command of the NVM module 126 and the response information to that command in this embodiment. The compression information transfer command 1810 of the NVM module 126 in this embodiment consists of an operation code 1811, a command ID 1012, an LBA1 start address 1813, an LBA1 length 1814, and a compression information address 1815 as command information. This embodiment describes a command made up of the above fields, but additional information beyond these may be included. Since the command ID 1012 has the same contents as in the write command described earlier, its description is omitted.
 The operation code 1811 is a field that notifies the NVM module 126 of the command type; the NVM module 126 that receives the command recognizes from this field that the notified command is a compression information transfer command.
 The LBA1 start address 1813 is a field that designates the start address of the area on LBA1 to which the transferred compression information applies.
 The LBA1 length 1814 is a field that designates the range of LBA1 starting from the LBA1 start address 1813.
 The compression information address 1815 is a field that designates the storage location of the compression information that the storage controller 110 transfers to the NVM module 126.
 The NVM module 126 acquires the compression information from the address designated by the compression information address 1815, which makes it possible to decompress the pieces of compressed data in the area designated by the LBA1 start address 1813 and the LBA1 length 1814. Specifically, after the compressed data associated with LBA1 has been mapped to LBA0 by the LBA0 mapping command described later, when the NVM module receives a read request for LBA0 from the storage controller, it decompresses the compressed data using the compression information transferred by the compression information transfer command and transfers the result to the storage controller.
 The compression information transfer response 1820 consists of a command ID 1021 and a status 1022. This embodiment describes response information made up of the above fields, but additional information beyond these may be included. Since the command ID 1021 and the status 1022 have the same contents as in the write response described earlier, their description is omitted.
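 As a concrete illustration of the command format just described, the sketch below models the compression information transfer command and its response as Python dataclasses. The text specifies only the field list, so the class names, field types, and the status encoding noted in the comments are assumptions for illustration.

```python
from dataclasses import dataclass

# Illustrative models of the command/response formats described in FIG. 18.
# Field names follow the text; types and widths are assumptions.

@dataclass
class CompressionInfoTransferCommand:
    opcode: int                 # operation code 1811 (identifies the command type)
    command_id: int             # command ID 1012
    lba1_start: int             # LBA1 start address 1813 (sector units assumed)
    lba1_length: int            # LBA1 length 1814
    compression_info_addr: int  # compression information address 1815 (source buffer)

@dataclass
class CompressionInfoTransferResponse:
    command_id: int             # command ID 1021 (echoes the command)
    status: int                 # status 1022 (e.g., 0 = success; assumed encoding)
```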
 (1-19) NVM Module Control Command 10: LBA0 Mapping Command
 In this embodiment, the NVM module 126 records in the FM the compressed data written by designating an area of LBA1. For the storage controller 110 to retrieve this compressed data in decompressed form, the data is mapped to an LBA0 different from the LBA1 that was the write destination of the compressed data.
 FIG. 19 shows the LBA0 mapping command of the NVM module 126 and the response information to that command in this embodiment. The LBA0 mapping command 1910 of the NVM module 126 in this embodiment consists of an operation code 1911, a command ID 1012, an LBA1 start address 1913, an LBA1 length 1914, and an LBA0 start address 1915 as command information. This embodiment describes a command made up of the above fields, but additional information beyond these may be included. Since the command ID 1012 has the same contents as in the write command described earlier, its description is omitted.
 The operation code 1911 is a field that notifies the NVM module 126 of the command type; the NVM module 126 that receives the command recognizes from this field that the notified command is an LBA0 mapping command.
 The LBA1 start address 1913 is a field that designates the start address of the LBA1 area holding the compressed data to be mapped.
 The LBA1 length 1914 is a field that designates the range of LBA1, starting from the LBA1 start address 1913, that is to be mapped.
 The LBA0 start address 1915 is a field that designates the start address of the LBA0 area to be mapped. From the compression information that the storage controller 110 acquires from the PDEV, the storage apparatus 101 knows the post-decompression size of the data recorded at LBA1; it reserves an LBA0 area to which this data size can be mapped and writes the start address of that area in the LBA0 start address 1915 field. The address that can be designated as the LBA0 start address 1915 is limited to a multiple of 8 sectors (4 KB).
 The NVM module 126 of this embodiment maps the compressed data associated with the LBA1 area in the range indicated by the LBA1 start address 1913 and the LBA1 length 1914 onto the area extending from the LBA0 start address 1915 for the post-decompression data size. More specifically, it refers to the LBA1-PBA conversion table and acquires the PBAs associated with the LBAs in the range indicated by the LBA1 start address 1913 and the LBA1 length 1914. Then, referring to the LBA0-PBA conversion table, it writes the acquired PBA addresses into the PBA 822 fields of the LBA0 range that starts at the LBA0 start address 1915 and matches the post-decompression size derived from the compression information that the NVM module 126 acquired from the storage controller by the compression information transfer command.
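 The table manipulation just described can be pictured with the following minimal Python sketch. The dictionary-based tables, the sector/page constants, and the compression_info lookup (LBA1 address to decompressed size) are illustrative stand-ins for tables 810/820 and the transferred compression information, not the module's actual firmware.

```python
# Illustrative sketch of LBA0-mapping-command handling (not the patent's code).

SECTOR = 512
PAGE = 4096

def map_lba0(lba1_to_pba, lba0_to_pba, compression_info,
             lba1_start, lba1_length, lba0_start):
    """Expose compressed data written at LBA1 through LBA0 for decompressed reads."""
    if lba0_start % PAGE != 0:
        raise ValueError("LBA0 start address must lie on a 4 KB boundary")
    lba0 = lba0_start
    for lba1 in range(lba1_start, lba1_start + lba1_length, SECTOR):
        pba = lba1_to_pba.get(lba1)
        if pba is None:
            continue
        # Reserve as many LBA0 pages as the decompressed size requires and
        # point them at the PBA that holds the compressed data.
        decompressed = compression_info[lba1]          # bytes after decompression
        for off in range(0, decompressed, PAGE):
            lba0_to_pba[lba0 + off] = pba
        lba0 += decompressed
```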
 The LBA0 mapping response 1920 consists of a command ID 1021 and a status 1022. This embodiment describes response information made up of the above fields, but additional information beyond these may be included. Since the command ID 1021 and the status 1022 have the same contents as in the write response described earlier, their description is omitted.
 (1-20) Overview of Data Management in the Storage Apparatus:
 Next, data management in the storage apparatus 101 according to the embodiment of the present invention will be described. The storage apparatus 101 manages a virtual volume in association with the areas of one or more RAID groups. It also manages one or more virtual PDEVs in association with each RAID group. Furthermore, the storage apparatus 101 manages each virtual PDEV in association with exactly one PDEV (SSD 111 or HDD 112). These associations are described with reference to FIG. 21.
 The storage apparatus 101 according to the embodiment of the present invention provides a virtual volume (denoted "virtual Vol" in the figure) 200 to the host apparatus 103. The example of FIG. 21 shows how the data area "Data14" recognized by the host apparatus 103 is associated inside the storage apparatus. The following description takes as an example the case where the RAID type of the RAID group associated with the virtual volume 200 is RAID 5.
 In FIG. 21, the data areas Data0 to Data14 are fixed-size areas partitioned at the RAID parity calculation unit; hereinafter this unit is called a RAID stripe. Since the data areas Data0 to Data14 are storage areas recognized by the host apparatus 103, they are areas in which data not compressed inside the storage apparatus 101 is stored. For example, the RAID stripe Data14 is XORed with Data13 and Data12 to generate the RAID 5 parity. Hereinafter, a set of RAID stripes needed to generate a RAID parity, such as Data12, Data13, and Data14, is called a RAID stripe column. The RAID stripe length is, for example, 64 KB or 32 KB, and the RAID stripe column length is the product of the RAID stripe length and the number of RAID stripes constituting the column. For example, when the RAID stripe length is 32 KB and the number of RAID stripes constituting a RAID stripe column is 3, the RAID stripe column length is 32 KB × 3 = 96 KB. In this case, a RAID parity is generated for every 96 KB of data area.
 Each stripe in one virtual volume is assigned a serial number starting from 0; in this specification this number is called the "stripe number". The stripe located at the head of the virtual volume has stripe number 0, and subsequent stripes are numbered 1, 2, and so on. In FIG. 21, the number appended after "Data", as in Data0 to Data14, is the stripe number.
 Each RAID stripe column is likewise assigned a number starting from 0 (called the stripe column number) in order from the stripe column located at the head of the RAID group. The stripe column at the head of the RAID group has stripe column number 0, and subsequent stripe columns are numbered 1, 2, and so on.
 Next, the correspondence between virtual volumes and RAID groups and the correspondence between RAID groups and virtual PDEVs will be described, but first the virtual PDEV itself is explained. A virtual PDEV is a concept defined inside the storage apparatus 101 in order to convert a virtual volume address into a PDEV address; the storage apparatus 101 treats a virtual PDEV as a storage device that stores the data written from the host apparatus 103 as-is (in uncompressed form).
 In FIG. 21, the virtual volume 200 is associated with RAID group 0, and RAID group 0 consists of virtual PDEV0 to virtual PDEV3. RAID group 0 is configured to protect data by RAID 5 across virtual PDEV0 to virtual PDEV3, and each stripe of the virtual volume 200 is associated, by a statically defined and computable correspondence, with one of the stripes in the virtual PDEVs belonging to RAID group 0. This is the same kind of association performed in storage apparatuses adopting a conventional RAID configuration. Moreover, since PDEVs correspond 1:1 with virtual PDEVs, each stripe of the virtual volume 200 can also be said to be associated with one of the PDEVs belonging to RAID group 0.
 As an example, consider the stripe "Data14" of the virtual volume 200. By the computable correspondence, the stripe "Data14" of the virtual volume 200 is associated with the area storing the fourth stripe from the head of virtual PDEV2 (here the head stripe of virtual PDEV2 is counted as the 0th stripe), that is, "Data14" inside "virtual PDEV2" in FIG. 21. For example, if the RAID stripe length is 32 KB, the 32 KB area of virtual PDEV2 whose start address is 32 KB × 4 = 128 KB is associated as the storage destination of the RAID stripe "Data14" of the virtual volume 200.
 The correspondence between each virtual PDEV constituting RAID group 0 and the PDEVs installed in the storage apparatus 101 is managed by virtual PDEV information 2230, described later. The correspondence between the storage destination area of the RAID stripe "Data14" inside virtual PDEV2 and the area inside PDEV2 where the data of the RAID stripe "Data14" is compressed and stored ("compressed D14" in FIG. 21) is managed by compression management information ("management information 2" in FIG. 21) stored inside PDEV2. The compression management information (management information 2) is recorded at a predetermined location in PDEV2; the contents of this compression management information and its recording position in PDEV2 are detailed later. Although only the compression management information in PDEV2 is described here, compression management information (management information 0, 1, and 3 in FIG. 21) is likewise stored in the other PDEVs.
 The above is the overview of data management in this embodiment.
 (1-21) Storage Apparatus Management Information 1: Virtual Volume Management Information
 Next, the "virtual volume management information", "RAID group management information", and "virtual PDEV information" that the storage apparatus 101 stores in its DRAM 125 so that the processor 121 can access them at high speed are described with reference to FIG. 22. The management information stored in the DRAM 125 of the storage apparatus 101 is not limited to the above; other management information may also be stored.
 First, the virtual volume management information 2210 is described. The virtual volume management information 2210 is management information generated each time the storage apparatus 101 creates one virtual volume; one piece of virtual volume management information stores the management information for one virtual volume. Using the virtual volume management information 2210, the storage apparatus 101 manages the association between virtual volumes and RAID groups and identifies, for an address requested by the host apparatus, the RAID group to be referenced.
 The virtual volume management information 2210 consists of the items in-virtual-volume start address 2211, in-virtual-volume size 2212, RAID group number 2213, and in-RAID-group start address 2214. The present invention is not limited to these four items; the virtual volume management information 2210 may include management information other than that shown in FIG. 22.
 The in-virtual-volume start address 2211 is an item that stores the start of the address range in the virtual volume to which a RAID group is associated.
 The in-virtual-volume size 2212 is an item that stores the size of the area in the virtual volume to which the RAID group is associated. The storage apparatus 101 associates the area of the size designated by the in-virtual-volume size 2212, starting from the start address designated by the in-virtual-volume start address 2211, with the RAID group designated by the RAID group number 2213 described below.
 The RAID group number 2213 is an item that stores the number of the RAID group associated with the virtual volume.
 The in-RAID-group start address 2214 is an item that designates an address inside the RAID group designated by the RAID group number 2213. In this specification, RAID group addresses do not include the areas in which parity is stored. RAID group addressing is explained below with reference to FIG. 21. When the stripe size is 64 KB, in RAID group 0 of FIG. 21, the head stripe "Data0" of virtual PDEV0, the first virtual PDEV constituting RAID group 0, is at address 0, and the position where the head stripe "Data1" of the next virtual PDEV1 is stored is address 64 KB. The position where the head stripe "Data2" of the next virtual PDEV2 is stored is address 128 KB. Furthermore, since stripes in which parity is stored are excluded from RAID group addresses, the position where "Data4" of virtual PDEV0 is stored is address 192 KB.
 In the example shown in FIG. 22, the virtual volume management information 2210 indicates that the 0 to 20 (= 0 + 20) TB area of the virtual volume is associated with the 0 to 20 TB area of RAID group number 0. It further indicates that the area from 20 TB to 100 (= 20 + 80) TB of the virtual volume is associated with the 10 to 90 (= 10 + 80) TB area of RAID group number 1. Thus, when an access request designating an access target area of the virtual volume (the access target area is designated by an LBA and an access data length) is received from the host apparatus 103, the processor 121 can refer to the virtual volume management information 2210 and identify the RAID group to which the access target area is associated.
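 The lookup just described amounts to a range search over the table rows. The following minimal Python sketch reproduces the FIG. 22 example; the tuple layout and function name are assumptions for illustration.

```python
# Illustrative lookup over virtual volume management information 2210.
# Each row associates a virtual-volume range with a range of a RAID group.

TB = 1024 ** 4

# (in-vol start 2211, in-vol size 2212, RAID group number 2213, in-group start 2214)
virtual_volume_info = [
    (0,       20 * TB, 0,  0),        # vol 0..20 TB   -> RAID group 0, 0..20 TB
    (20 * TB, 80 * TB, 1, 10 * TB),   # vol 20..100 TB -> RAID group 1, 10..90 TB
]

def resolve(vol_addr):
    """Return (raid_group_number, in_raid_group_address) for a volume address."""
    for start, size, rg, rg_start in virtual_volume_info:
        if start <= vol_addr < start + size:
            return rg, rg_start + (vol_addr - start)
    raise ValueError("address not mapped to any RAID group")

print(resolve(25 * TB))   # -> RAID group 1, at 15 TB (10 TB base + 5 TB offset)
```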
 As described above, the virtual volume management information that associates virtual volumes with RAID groups does not require as much information for associating areas as the compression management information described later. The reason is explained with reference to FIG. 21.
 In the example of FIG. 21, to identify the recording destination area of the RAID stripe "Data14" in the virtual volume, the virtual PDEV can be identified as the remainder of the stripe number divided by the number of drives in the RAID configuration, plus 1. In the case of FIG. 21, the remainder of 14 ÷ 4 is 2, and adding 1 gives 3; from this it can be calculated that the stripe is recorded on the third virtual PDEV of RAID group 0. The start address of the recording location of the RAID stripe Data14 inside virtual PDEV2 can be identified from the stripe number divided by the number of data stripes contained in a RAID stripe column. For example, in the case of FIG. 21, the number of data stripes contained in a RAID stripe column is 3; hence, from 14 ÷ 3 = 4, the recording location of the RAID stripe "Data14" starts after the fourth stripe stored in virtual PDEV2.
 The example described here applies when the RAID type of the RAID group is RAID 5; for RAID groups adopting other RAID types, the in-virtual-PDEV address associated with an address on the virtual volume (LBA, stripe number, and so on) can likewise be identified by a different calculation method. Furthermore, the virtual PDEV and in-virtual-PDEV address to which the stripe storing the parity corresponding to a given RAID stripe is associated can similarly be obtained by a simple calculation.
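 The Data14 calculation above can be written out as a short Python function; the function name is an invented convenience, but the arithmetic follows the rule the text gives for RAID 5 with 4 drives and 3 data stripes per column.

```python
# Illustrative computation of the RAID 5 placement rule described above:
# virtual PDEV index from (stripe number mod number of drives), and the
# offset within that PDEV from (stripe number / data stripes per column).

def locate_stripe(stripe_no, n_drives, stripe_size):
    n_data = n_drives - 1                      # RAID 5: one parity per column
    pdev = stripe_no % n_drives                # 14 % 4 = 2 -> virtual PDEV2
    row = stripe_no // n_data                  # 14 // 3 = 4 -> after the 4th stripe
    return pdev, row * stripe_size             # byte offset inside the virtual PDEV

pdev, offset = locate_stripe(14, 4, 32 * 1024)
print(pdev, offset)                            # -> 2, 131072 (= 32 KB x 4 = 128 KB)
```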
 As described above, the correspondence between a virtual volume address and an address within a RAID group can be uniquely identified by a calculation that takes the RAID configuration into account. The association therefore does not require much information and can be recorded in the DRAM, which the processor 121 can access at high speed.
 The above is the virtual volume management information in the embodiment of the present invention. The storage apparatus 101 uses this management information to manage the association between virtual volumes and RAID groups.
 (1-22) Storage Apparatus Management Information 2: RAID Group Management Information
 Next, the RAID group management information 2220 is described.
 The RAID group management information 2220 is information for managing the virtual PDEVs that constitute a RAID group; when one RAID group is defined, the storage apparatus 101 generates one piece of RAID group management information 2220. One piece of RAID group management information manages one RAID group.
 The RAID group management information 2220 consists of the RAID configuration count 2221, the registered virtual PDEV numbers 2222, and the RAID type 2223. The RAID group management information 2220 of the present invention is not limited to the items shown in FIG. 22 and may include other items.
 The RAID configuration virtual PDEV count 2221 is an item that stores the number of virtual PDEVs constituting the RAID group. The example of FIG. 22 shows a configuration of four virtual PDEVs.
 The registered virtual PDEV number 2222 is an item that stores the numbers identifying the virtual PDEVs constituting the RAID group. The example of FIG. 22 shows that the RAID group consists of four virtual PDEVs: virtual PDEV3, virtual PDEV8, virtual PDEV9, and virtual PDEV15.
 The RAID type 2223 is an item that stores the RAID type (RAID level). The example of FIG. 22 shows that the RAID group to which this management information corresponds is configured as RAID 5.
 The above is the RAID group management information in the embodiment of the present invention. The storage apparatus 101 uses this management information to manage RAID groups composed of virtual PDEVs.
 Next, the virtual PDEV information 2230 is described. The virtual PDEV information 2230 is information for managing the association between virtual PDEVs and PDEVs. The virtual PDEV number 2231 is the identification number assigned to each virtual PDEV managed inside the storage apparatus 101, and the PDEV Addr 2232 is the identification number of each PDEV managed inside the storage apparatus 101. For example, when the PDEVs are storage media conforming to the SAS standard, the PDEV Addr 2232 stores the SAS address assigned to each PDEV. With the virtual PDEV information 2230, a PDEV can be identified from a virtual PDEV number.
 (1-23) Storage Apparatus Management Information 3: Compression Management Information
 Next, the compression management information stored in the PDEVs is described. The compression management information is information for managing the association between virtual PDEV areas and PDEV areas, and is recorded inside each PDEV. In the storage apparatus 101 according to the embodiment of the present invention, the recording position of the compression management information within the PDEV is common to all PDEVs. Hereinafter, the start address within the PDEV at which the compression management information is stored is called the "in-PDEV start address of the compression management information". The information on this address may be stored in the DRAM 125 of the storage controller 110, or the address information may be embedded in the program that accesses the compression management information.
 Because the compression ratio achieved when data is compressed depends on the data content, the association between virtual PDEV areas and PDEV areas changes dynamically. The compression management information is therefore modified as the recorded data changes.
 FIG. 23 shows the compression management information 2300 used by the storage apparatus 101 according to the embodiment of the present invention. The compression management information 2300 consists of three fields: the in-PDEV start address 2301 of the stored compressed data, the compressed data length 2302, and the compression flag 2303. The compression management information 2300 in the present invention is not limited to this configuration and may contain additional fields beyond these three.
 In the example of FIG. 23, the virtual PDEV capacity is set to 8 TB, eight times the 1 TB PDEV capacity, but the present invention is not limited to this value. The virtual PDEV capacity may be set to an arbitrary value according to the expected compression ratio. For example, when the stored data can be expected to compress only to about 50%, setting the virtual PDEV capacity to 2 TB for a 1 TB PDEV is preferable because it reduces the compression management information 2300. The virtual PDEV capacity may also be changed dynamically according to the compression ratio of the stored data. For example, operation may start with a virtual PDEV capacity of 2 TB for a 1 TB PDEV; if it later turns out, for example from compressed data length information obtained from the NVM module 126, that the stored data can be expected to compress to less than 1/8, the virtual PDEV capacity may be dynamically raised to 8 TB. In that case the compression management information 2300 is also grown dynamically. Conversely, if it becomes clear after operation starts that the stored data does not compress well, the virtual PDEV capacity is dynamically reduced.
 The example of FIG. 23 shows compression management information 2300 in which the virtual PDEV area is partitioned and associated in 4 KB units, but the present invention is not limited to this partition unit. When the host is expected to frequently issue requests for data larger than 4 KB, partitioning in units larger than 4 KB is preferable because it reduces the amount of compression management information 2300.
 The in-PDEV start address 2301 of the stored compressed data is a field that stores the start address of the area where the compressed data for the corresponding virtual PDEV area is stored. When data is stored uncompressed, for example because compression yields no benefit, this field stores the start address of the area where the uncompressed data is stored. When NULL is stored in this field (shown as "unallocated" in FIG. 23), no PDEV area has been allocated to the corresponding virtual PDEV area (it is unallocated). In the storage apparatus 101 according to the embodiment of the present invention, compressed and uncompressed data are managed with a sector length of 512 B as the minimum unit; the present invention is not limited to this sector length.
 The compressed data length 2302 is a field that stores the length of the compressed data stored in the PDEV, in units of sectors. This expresses that the compressed data is stored in the area starting at the in-PDEV start address 2301 of the stored compressed data and extending for the number of sectors recorded in the compressed data length 2302 field plus 1. In the storage apparatus 101 according to the embodiment of the present invention, the storage area of the virtual volume is divided into 4 KB units, and compression is performed for each of these 4 KB areas. The minimum size after compression is 512 B, and when compression yields no benefit the data is stored uncompressed, so the range of values the compressed data length can take is 512 B to 4096 B (4 KB), that is, 1 to 8 sectors. The field is used with the rule that when the value of the compressed data length 2302 is 0, the compressed data length is 1 sector (512 B); the compressed data length 2302 field thus manages data lengths of 512 B to 4 KB in 3 bits.
 The compression flag 2303 is a field indicating whether the data of the corresponding virtual PDEV area is stored compressed. When the value of the compression flag 2303 is 1, the data is stored compressed; when it is 0, the data is stored uncompressed.
 The compression management information 2300 holds, for each 4 KB area of the virtual PDEV, the values of the in-PDEV start address 2301, the in-PDEV length 2302 of the compressed data, and the compression flag 2303. Hereinafter, the set of the in-PDEV start address 2301, the in-PDEV length 2302 of the compressed data, and the compression flag 2303 is called a compression information entry. The size of a compression information entry is 60 bits for the in-PDEV start address 2301 of the stored compressed data, 3 bits for the in-PDEV length 2302 of the compressed data, and 1 bit for the compression flag 2303, for a total of 64 bits = 8 B.
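 The 64-bit entry layout, including the rule that a length field value of 0 means 1 sector, can be sketched as follows in Python. The exact bit order within the 64 bits is an assumption for illustration; the text specifies only the field widths.

```python
# Illustrative packing of one 8 B compression information entry:
# 60-bit in-PDEV start address (2301), 3-bit length (2302), 1-bit flag (2303).
# The length field stores (sectors - 1), so value 0 means 1 sector (512 B)
# and value 7 means 8 sectors (4 KB), per the rule described above.

def pack_entry(start_addr, length_bytes, compressed):
    sectors = (length_bytes + 511) // 512
    assert 1 <= sectors <= 8, "compressed length must be 512 B .. 4 KB"
    assert start_addr < (1 << 60)
    return (start_addr << 4) | ((sectors - 1) << 1) | (1 if compressed else 0)

def unpack_entry(entry):
    start_addr = entry >> 4
    sectors = ((entry >> 1) & 0b111) + 1   # 3-bit field; 0 -> 1 sector
    compressed = bool(entry & 1)
    return start_addr, sectors * 512, compressed

e = pack_entry(0x1000, 1536, True)         # 3 sectors of compressed data
print(unpack_entry(e))                     # -> (4096, 1536, True)
```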
 The recording position within the PDEV of the compression information entry that manages a given virtual PDEV area is uniquely determined by that virtual PDEV area. For example, FIG. 23 shows that the compression information entry for the 4 KB area whose start address is virtual PDEV address "0x0000_0000_1000" is always recorded at recording position "0x00_0000_0008". In FIG. 23, the units of the virtual PDEV area start addresses and the recording position addresses are both bytes; the virtual PDEV address "0x0000_0000_1000" therefore represents the position 4 KB from the head of the virtual PDEV area. The recording positions are expressed as addresses relative to the in-PDEV start address of the compression management information (taken as address 0). Hereinafter, unless otherwise noted, the description assumes that the in-PDEV start address of the compression management information is 0, that is, that the compression management information is recorded in the head area of each PDEV.
 When storing write data (compressed data) from the host apparatus 103 in a PDEV, the storage apparatus 101 according to the embodiment of the present invention can store the compressed data in any area other than areas where data is already stored (that is, in any unused area). When update data for some existing data (pre-update data) is received from the host apparatus 103, the compressed update data is written to a location different from (the compressed data of) the pre-update data, and the PDEV area where the pre-update data was stored is then treated as an unused area.
 Thus, update data may be stored at a position different from the pre-update data, but even in that case only the values inside the compression information entry change; the compression information entry for the virtual PDEV area "0x0000_0000_1000" is always recorded at recording position "0x00_0000_0008". Likewise, each time the virtual PDEV area address increases by 4 KB, the PDEV area in which that area's compression information entry is recorded advances by 8 B. The location of the compression information entry that manages a virtual PDEV address can therefore be identified by the formula:
 (virtual PDEV address ÷ 4 KB) × 8 B + in-PDEV start address of the compression management information
 The present invention is not limited to this formula; it suffices that the recording position of the compression information entry recorded in the PDEV can be uniquely calculated from the virtual PDEV address.
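 Written out in Python, the formula and the FIG. 23 example look as follows; the function name is an invented convenience, while the 8 B entry size and 4 KB area size follow the text.

```python
# The fixed-placement formula above: the entry for the 4 KB virtual PDEV
# area at a given address lives at a position derived only from that address.

PAGE = 4096        # 4 KB virtual PDEV areas
ENTRY_SIZE = 8     # 8 B compression information entries

def entry_position(virtual_pdev_addr, mgmt_info_base=0):
    """In-PDEV recording position of the compression information entry."""
    return (virtual_pdev_addr // PAGE) * ENTRY_SIZE + mgmt_info_base

print(hex(entry_position(0x0000_0000_1000)))   # -> 0x8, matching FIG. 23
```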
 This fixed placement of compression information entries makes it possible, when a compression information entry is lost, to calculate from the recording address of the lost entry which virtual PDEV area has become unreadable. Therefore, after the lost virtual PDEV area is restored by a RAID rebuild, the contents of the compression information entry can be regenerated when the restored data is compressed and recorded. For this reason, in the present invention the reliability of the storage apparatus can be maintained even without making the compression management information 2300 redundant.
 (1-24) Decompression Read Operation of the Storage Apparatus
 Next, the decompression read operation of the storage apparatus 101 in this embodiment is described with reference to FIG. 24. In response to a read request from the host apparatus 103, the storage apparatus 101 according to the embodiment of the present invention decompresses the data recorded on the final storage medium by the write data compression operation of the storage apparatus 101 described with reference to FIG. 3-A, and returns it to the host apparatus 103. Hereinafter, unless otherwise noted, each process is executed by the processor 121 of the storage apparatus 101.
 Before describing the decompression read operation, the management information for the cache area managed by the storage controller 110 according to the embodiment of the present invention is described. The storage controller 110 uses the storage area provided by the NVM module 126 as a cache area for temporarily storing write data from the host apparatus 103 and read data from the SSDs 111 and HDDs 112. The NVM module 126 provides the LBA0 space and LBA1 space to (the processor 121 of) the storage controller 110, but the processor 121 manages which parts of the provided LBA0 and LBA1 spaces are in use for storing data and which are not (the latter are called free areas). The information used to manage these areas is called cache management information.
 FIG. 20-A shows an example of the cache management information 3000 managed by the storage controller 110. The cache management information 3000 is stored on the DRAM 125. In the storage apparatus 101 according to the embodiment of the present invention, as a rule, the LBA0 space provided by the NVM module 126 is used as the cache area for storing write data from the host apparatus 103. When storing data read from the final storage medium, the LBA1 space is used, because the data read from the final storage medium is compressed. The allocation unit of the cache area is the stripe size. In the following description, a stripe size of 64 KB is used as an example.
 Each row (entry) of the cache management information 3000 indicates that the area for caching the data of one stripe of a virtual volume, identified by VOL# 3010 (the identification number assigned to the virtual volume, that is, the virtual volume number) and the in-virtual-volume address 3020, is the stripe-size area of LBA0 space addresses starting at cache LBA0 (3030) together with the stripe-size area of LBA1 space addresses starting at cache LBA1 (3040). When no cache area is allocated to that one-stripe area of the virtual volume, an invalid value (NULL) is stored in cache LBA0 (3030) or cache LBA1 (3040). In the example of FIG. 20-A, the area for caching the data of the area (stripe) with VOL# 3010 = 0 and address 3020 = 0 is the 64 KB (stripe-size) area of LBA0 starting at 0; since cache LBA1 (3040) is NULL, no LBA1 space area is allocated. The address 3020 stores the stripe number.
 The bitmap 3050 is 16 bits of information indicating in which parts of the one-stripe area identified by cache LBA0 (3030) data is stored. Each bit represents a 4 KB area within the stripe; when a bit is 1, data is stored in the corresponding area, and when it is 0, no data is stored there. In the example of FIG. 20-A, the bitmap 3050 corresponding to the row whose cache LBA0 (3030) is 0 (the first row) is "0x8000", that is, the leading bit of the 16 bits is 1. This indicates that, of the one-stripe area starting at cache LBA0 (3030) = 0, data is stored in the first 4 KB.
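 A small Python sketch of interpreting and updating this bitmap is shown below. The most-significant-bit-first ordering follows the "0x8000 marks the first 4 KB" example above; the function names are invented for illustration.

```python
# Illustrative interpretation of the 16-bit bitmap 3050: one bit per 4 KB
# sub-area of a 64 KB stripe, most-significant bit first (so 0x8000 marks
# the first 4 KB, as in the FIG. 20-A example).

STRIPE = 64 * 1024
SUB = 4 * 1024
NBITS = STRIPE // SUB          # 16 sub-areas per stripe

def cached_offsets(bitmap):
    """Byte offsets within the stripe whose 4 KB sub-areas hold cached data."""
    return [i * SUB for i in range(NBITS) if bitmap & (1 << (NBITS - 1 - i))]

def mark_cached(bitmap, offset):
    """Set the bit for the 4 KB sub-area containing the given stripe offset."""
    return bitmap | (1 << (NBITS - 1 - offset // SUB))

print(cached_offsets(0x8000))  # -> [0]: only the first 4 KB is cached
```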
 The attribute 3060 stores "Dirty" or "Clean" as information representing the state of the data cached in the area identified by cache LBA0 (3030). When "Dirty" is stored in the attribute 3060, the data in the area identified by cache LBA0 (3030) has not yet been reflected to the final storage medium (SSD 111 or HDD 112); when "Clean" is stored, the cached data has already been reflected to the final storage medium. The last access time 3070 represents the time at which the cached data was last accessed. It is used as reference information when selecting, from among the pieces of data stored in the cache area from the host apparatus 103, the data to destage to the final storage medium (for example, selecting the data with the oldest last access time). Accordingly, the cache management information 3000 may store, in the last access time 3070 field, other information used for selecting destage target data.
 The storage controller 110 also needs to manage the unused areas (where no cache data is stored) within the LBA0 and LBA1 storage spaces of the NVM module 126, and therefore holds lists of unused areas. These are called the free list 3500; an example is shown in FIG. 20-B. The free list 3500 comprises a free LBA0 list 3510 and a free LBA1 list 3520, each of which (3510, 3520) stores the addresses of unused LBA0/LBA1 areas.
 When a write request for the area with virtual volume address (stripe number) N is received from the host apparatus 103, a cache area can be secured by obtaining an LBA0 address from the free LBA0 list 3510 and storing the obtained LBA0 address in cache LBA0 (3030) of the row of the cache management information 3000 whose address 3020 is N.
 Also, when compressed data is read from the final storage medium and stored in the NVM module 126, an area in the LBA1 space must be allocated; an LBA1 address is therefore obtained from the free LBA1 list 3520 and stored in cache LBA1 (3040) of the cache management information 3000.
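 The two allocation paths just described can be pictured with the following minimal Python sketch. The list-based free lists, the dictionary keyed by (volume number, stripe number), and the sample addresses are all illustrative assumptions standing in for free list 3500 and cache management information 3000.

```python
# Illustrative sketch of cache-area allocation from the free list 3500.

free_lba0 = [0x00000, 0x10000, 0x20000]   # free LBA0 list 3510 (sample addresses)
free_lba1 = [0x00000, 0x08000]            # free LBA1 list 3520 (sample addresses)

cache_mgmt = {}   # (VOL# 3010, address 3020) -> row dict, standing in for info 3000

def _row(vol_no, stripe_no):
    return cache_mgmt.setdefault((vol_no, stripe_no),
                                 {"lba0": None, "lba1": None, "bitmap": 0})

def allocate_lba0(vol_no, stripe_no):
    """Secure an LBA0 cache area for one stripe on a host write."""
    row = _row(vol_no, stripe_no)
    if row["lba0"] is None:
        row["lba0"] = free_lba0.pop(0)
    return row["lba0"]

def allocate_lba1(vol_no, stripe_no):
    """Secure an LBA1 cache area before staging compressed data from a PDEV."""
    row = _row(vol_no, stripe_no)
    if row["lba1"] is None:
        row["lba1"] = free_lba1.pop(0)
    return row["lba1"]

print(allocate_lba0(0, 5), allocate_lba1(0, 5))
```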
 The decompression read operation is now described. In S2401, the first step of the decompression read operation of the storage apparatus 101, the storage apparatus 101 receives a read request and a read target address from the host apparatus 103.
 In step S2402 following S2401, the processor 121 uses the read address obtained in S2401 to check whether the read target data exists in the NVM module 126 (cache), that is, whether there is a cache hit. The processor 121 checks whether a value is stored in cache LBA0 (3030) of the row of the cache management information 3000 corresponding to the read address obtained in S2401; if a value is stored, it judges a cache hit, and if not, it judges a cache miss.
 Step S2403 following S2402 branches on the condition judged in S2402. If a cache miss was judged in S2402, the processor 121 performs the processing from S2404 onward; if a cache hit was judged, the processor 121 performs the processing of S2413.
 In step S2404 following S2403, since a cache miss was judged in S2402, the virtual PDEV to which the read target address is associated is obtained. More specifically, the processor 121 refers to the virtual volume management information 2210 and obtains the RAID group number 2213 of the RAID group corresponding to the read target address and the in-RAID-group start address 2214. It also secures cache areas (LBA0, LBA1) corresponding to the read target address and stores them in cache LBA0 (3030) and cache LBA1 (3040) of the cache management information 3000.
 The processor 121 then obtains the RAID group management information 2220 of the RAID group indicated by the value of the RAID group number 2213, and obtains the virtual PDEV numbers 2222 registered in that RAID group. From the read address obtained in S2401, it calculates the virtual PDEV number and in-virtual-PDEV address where the target data is stored. Depending on the read address and request size, the read request area from the host apparatus may span multiple virtual PDEVs; in that case, multiple virtual PDEVs and in-virtual-PDEV addresses are calculated in order to respond to the read request.
 In step S2405 following S2404, the entry of the compression management information 2300 that manages the virtual PDEV number and in-virtual-PDEV address obtained in S2404 is obtained from the PDEV. The processor 121 refers to the virtual PDEV information 2230 and identifies the PDEV associated with the virtual PDEV number obtained in S2404. In the storage apparatus 101, a PDEV is uniquely associated with each virtual PDEV, and the compression management information 2300 is recorded in a specific area within the PDEV. Therefore, as described above, the address at which the entry of the compression management information 2300 is recorded in the PDEV is calculated from the in-virtual-PDEV address, and the compression information entry is read from the PDEV.
 In step S2406, which follows S2405, the processor 121 refers to the compression management information entry obtained in S2405 and identifies the storage area of the compressed data recorded in the PDEV from the start address 2301 within the PDEV at which the compressed data is stored and the length 2302 of the compressed data.
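 Because each entry's recording position in the PDEV is fixed by the virtual PDEV region it manages, the entry address can be computed arithmetically. A minimal sketch follows (Python); the 8-byte entry size is an assumption consistent with the worked FIG. 23 example given later in this description, and `table_base` is a hypothetical offset of the specific area holding the table:

```python
ENTRY_SIZE = 8   # assumed bytes per compression management entry
UNIT = 4096      # one entry manages one 4 KB virtual PDEV region

def entry_addr(vpdev_addr, table_base=0):
    """Relative PDEV address of the compression management entry that
    manages the 4 KB virtual PDEV region containing vpdev_addr."""
    return table_base + (vpdev_addr // UNIT) * ENTRY_SIZE

# e.g. entry_addr(0x1000) == 0x8: the region at virtual PDEV address
# 0x1000 is managed by the entry recorded at relative address 0x8.
```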
 In step S2407, which follows S2406, the processor 121 reads the compressed data from the compressed data storage area identified in S2406. The read data is temporarily stored in the DRAM 125.
 In step S2408, which follows S2407, the processor 121 writes the compressed data to the NVM module 126, which serves as the cache device, by designating an LBA1 address. The processor 121 uses the write command 1010 to designate the LBA1 and write the compressed data.
 In step S2409, which follows S2408, the storage apparatus 101 creates the compression information necessary for decompressing the compressed data, based on the compression management information entry obtained in S2405, and transfers it to the NVM module 126. The processor 121 transfers the compression information to the NVM module 126 using the compression information transfer command 1810 shown in FIG. 18.
 Step S2410, which follows S2409, maps the compressed data to LBA0 so that the storage apparatus 101 can later read the compressed data written in S2408 in decompressed form. The processor 121 instructs the NVM module 126 to map the compressed data to LBA0 using the LBA0 mapping command shown in FIG. 19. On receiving the command, the NVM module 126 refers to the compression information associated with the compressed data at LBA1 and associates the compressed data with an LBA0 area whose size corresponds to the decompressed size of that data.
 Step S2411, which follows S2413 or S2410, is a step in which the processor 121 reads, by designating LBA0, the data that was staged into the cache area by the processing of S2407 to S2409 and mapped to the LBA0 space in S2410 (or, in the case of a cache hit, the data already associated with LBA0), thereby obtaining it in decompressed form. On receiving the read command designating LBA0, the NVM module 126 reads the compressed data associated with that LBA0 from the FM 420, decompresses it with the compression/decompression unit 418, and returns it to the storage controller 110 (DRAM 125).
 In step S2412, which follows step S2411, the processor 121 returns the decompressed data obtained in S2411 to the server as the response data for the read request. To return the cache LBA1 (3040) area to the unused state, the processor returns the value of cache LBA1 (3040) to the free list, sets the cache LBA1 (3040) value in the cache management information 3000 to NULL, and ends the processing.
 Step S2413, to which processing transitions when step S2403 determines that there was no cache miss (that is, a cache hit), is a step in which the processor 121 refers to the cache management information 3000 and obtains the LBA0 (cache LBA0 (3030)) at which the read target area is already stored.
 The above is the decompression read operation in this embodiment.
 (1-25) Write Data Cache Storage Operation of the Storage Apparatus
 Next, the write data cache storage operation of the storage apparatus will be described. The write data cache storage operation of this embodiment corresponds to processes 311 to 314 of the write data compression operation of this embodiment shown in FIG. 3-A. The operation is described below with reference to the flow of FIG. 25.
 The first step of the write data cache storage operation, S2501, is a step in which the storage apparatus 101 receives write data and the write destination address from the host device. At this point the write data is temporarily recorded in the DRAM 125 of the storage apparatus 101, as shown by data flow 311 in FIG. 3-A. If the host interface 124 has a function for transferring data directly to the NVM module 126, the data need not be recorded in the DRAM 125 of the storage apparatus 101.
 Step S2502, which follows S2501, is a step in which the processor 121 performs a cache hit determination using the write address obtained in S2501. The same processing as in step S2402 of the decompression read operation is performed here.
 Step S2503, which follows S2502, branches according to the determination result of S2502. If the result of S2502 is a cache hit, the processing proceeds to S2504; if the result of S2502 is a cache miss, the processing proceeds to S2509.
 In step S2504, which follows S2503, the processor 121 acquires the LBA0 of the area already staged in the NVM module 126.
 Step S2509, which follows S2503, is a step in which the processor 121 newly secures an LBA0 of the NVM module 126 for recording the write data. Securing the LBA0 is performed in the same way as in S2404 of the decompression read processing, except that no LBA1 space area needs to be secured here.
 In step S2505, which follows S2504 or S2509, the processor 121 designates the LBA0 obtained in S2504 or S2509 and writes the data to the NVM module 126 using the write command 1010 shown in FIG. 10. At this point the write data is transferred from the DRAM 125 of the storage apparatus 101 to the data compression/decompression unit 418 of the NVM module 126, compressed, and then recorded in the data buffer 416 within the NVM module 126, as shown by data flow 312 in FIG. 3-A. The compressed data recorded in the data buffer 416 is written to the FM 420 at an arbitrary timing, as shown by data flow 314.
 In step S2506, which follows S2505, the processor 121 obtains from the NVM module 126 the write response shown in FIG. 10, and obtains the post-compression size of the data written in S2505 from the compressed data length 1023 field of the write response information 1020.
 In step S2507, which follows S2506, the processor 121 updates the bitmap 3050, the attribute 3060, and the last access time 3070 of the cache management information 3000.
 Step S2508, which follows S2507, is a step in which the storage apparatus 101 determines whether the total amount of compressed data held in the cache constituted by the NVM module 126 for which RAID parity has not yet been generated has reached a threshold. If that total amount has reached the threshold, the storage apparatus 101 determines that parity must be generated for the compressed data held in the cache and transitions to the parity generation operation. If that total amount is below the threshold, the storage apparatus 101 determines that parity generation is unnecessary and ends the write data cache storage operation. The above is the write data cache storage operation in this embodiment.
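 A minimal sketch of the S2508 check (Python; the field names `compressed_len` and `parity_generated` are hypothetical stand-ins for fields tracked via the cache management information 3000):

```python
def next_action(cache_rows, threshold_bytes):
    """Sum the compressed bytes cached without RAID parity and decide
    whether to transition to the parity generation operation (FIG. 26)."""
    pending = sum(r.compressed_len for r in cache_rows
                  if not r.parity_generated)
    return "parity_generation" if pending >= threshold_bytes else "done"
```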
 (1-26) RAID Parity Generation Operation of the Storage Apparatus
 Next, the RAID parity generation operation of the storage apparatus in this embodiment will be described. The RAID parity generation operation of this embodiment is not limited to being executed only when step S2508 of the write data cache storage operation shown in FIG. 25 finds that the total amount of compressed data held in the cache for which RAID parity has not been generated has reached the threshold. The storage apparatus 101 may perform the RAID parity generation operation at an arbitrary timing, for example when there are few or no requests from the host device 103.
 The RAID parity generation operation of this embodiment corresponds to processes 315 to 320 of the write data compression operation of this embodiment shown in FIG. 3-A. The operation is described below with reference to the flow of FIG. 26.
 The first step of the RAID parity generation processing, S2601, is a step in which the processor 121 selects the data to be the target of parity generation from among the data recorded in the cache area constituted by LBA0 of the NVM module 126. At this time, the processor 121 refers to the last access time 3070 of the cache management information 3000 and selects data for which a long time has elapsed since it was last accessed. Data may instead be selected by some other rule, for example by choosing data with a relatively low update frequency.
 Step S2602, which follows S2601, is a step in which the processor 121 secures, on LBA0 (the logical space provided by the NVM module 126), an area in which the parity about to be generated will be recorded. The processor 121 refers to the free list 3500 and secures an unused LBA0. The selected LBA0 is managed by cache area management information for parity (not shown), which is similar to the cache management information 3000.
 Step S2603, which follows S2602, determines whether full stripe parity generation is to be performed. If all the data belonging to the same stripe column as the data selected in S2601 exists in the cache, the processor 121 proceeds to S2604 to perform full stripe parity generation. If only part of the data belonging to the same stripe column as the data selected in S2601 is present, the processor proceeds to S2607 to perform update parity generation.
 To search the cache for data belonging to the same stripe column as the data selected in S2601, the processor refers to VOL# 3010 and address 3020 of each row stored in the cache management information 3000 and checks whether they fall within the range of the same stripe column as the selected data. Taking FIG. 21 as an example, suppose the data selected in S2601 is Data14. The processor refers to VOL# 3010 and address 3020 of each row in the cache management information 3000; a row belongs to the same stripe column if its VOL# 3010 equals the virtual volume number to which Data14 belongs and its address 3020 divided by 3 equals Data14's stripe number (14) divided by 3. Furthermore, if the values of the respective bitmaps 3050 of those rows are the same, it can be determined that the data belonging to the same stripe column as the data selected in S2601 is stored in the cache.
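 A minimal sketch of this stripe-column membership test (Python; the field names are hypothetical, and the divisor 3 reflects the three data stripes per stripe column of the FIG. 21 example):

```python
DATA_PER_COLUMN = 3   # data stripes per stripe column in the FIG. 21 example

def same_stripe_column(row, selected):
    """True if a cache management row holds data of the same stripe column
    (same virtual volume, same column index) as the selected data."""
    return (row.vol == selected.vol and
            row.stripe_no // DATA_PER_COLUMN ==
            selected.stripe_no // DATA_PER_COLUMN)

def full_stripe_possible(cache_rows, selected):
    # Full stripe parity generation (S2604) requires every data stripe of
    # the column to be cached (with matching bitmap 3050 values).
    peers = [r for r in cache_rows if same_stripe_column(r, selected)]
    return len(peers) == DATA_PER_COLUMN
```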
 Step S2604, which follows S2603, is a step in which the storage apparatus 101 instructs the NVM module 126 to map the RAID parity to the LBA0 area secured in S2602. Using the full stripe parity generation command 1310 shown in FIG. 13, the processor 121 designates the compressed data from which parity is to be generated by LBA0 start addresses 0 to X (1315 to 1317), and also designates the mapping locations of the generated parity by the LBA0 start address (for XOR parity) 1318 and the LBA0 start address (for RAID 6 parity) 1319.
 On receiving the full stripe parity generation command, the NVM module 126 reads the compressed data recorded in the FM 420 into the data buffer 416 within the NVM module 126 if the area associated with the LBA0 is on the FM 420 (this is unnecessary if the area associated with the LBA0 is already in the data buffer 416). The NVM module then instructs its parity generation unit 419 to generate parity for the compressed data in the data buffer 416. On receiving the instruction, the parity generation unit 419 has the data compression/decompression unit 418 fetch and decompress the data in the data buffer 416, and generates parity from the decompressed data. The parity generation unit 419 then transfers the generated parity to the data compression/decompression unit 418 for compression and records it in the data buffer 416 or the FM 420 within the NVM module 126. Finally, the PBA associated with the area (data buffer 416 or FM 420) in which the generated parity was recorded is associated with the LBA0 designated by the LBA0 start address (for XOR parity) 1318 and the LBA0 start address (for RAID 6 parity) 1319.
 Step S2607, which follows S2603, is a step in which, in order to generate an update parity, the storage apparatus 101 obtains the compressed data of the old data and the compressed data of the old parity from the RAID-configured final storage media and writes them by designating LBA1. The processor 121 obtains from the free list the LBA1 addresses for storing the compressed old data and the compressed old parity, and temporarily stores the obtained LBA1 information.
 Next, the virtual PDEVs in which the old data and old parity required for parity generation are stored, and the intra-virtual-PDEV addresses, must be identified. The virtual PDEV in which the old data is stored is the same as the virtual PDEV in which the new data (the data selected in S2601) is to be stored. As described earlier, the virtual PDEV of the old parity can be obtained by a simple calculation from the address (stripe number) of the new data. Because the intra-virtual-PDEV addresses of the old data and the old parity are the same as the intra-virtual-PDEV address at which the new data (the data selected in S2601) is to be stored, the processor 121 only needs to identify the intra-virtual-PDEV address at which the data selected in S2601 is to be stored. Subsequently, by the same processing as S2404 to S2407 of the read operation described above, the processor identifies the storage positions of the compressed data within the PDEVs from the intra-virtual-PDEV addresses of the old data and old parity required for parity generation, and reads the data from the PDEVs. The processor 121 then writes the old compressed data and the old parity to the secured LBA1 using the write command 1010 shown in FIG. 10.
 Step S2608, which follows S2607, is a step of mapping the compressed-state old data and old parity recorded in the LBA1 areas in S2607 onto areas of the LBA0 space. The processor 121 obtains from the free list 3500 LBA0 areas large enough for the decompressed sizes of the respective compressed data to be mapped. The processor then transfers to the NVM module 126 several LBA0 mapping commands, shown in FIG. 19, each designating an LBA0 and an LBA1, thereby mapping the decompressed images of the compressed data recorded in the LBA1 areas written in S2607 onto the LBA0 areas.
 Step S2609, which follows S2608, is a step of generating the update parity using the data (update data) selected in S2601 and the old compressed data and old parity mapped to LBA0 in S2608. Using the update parity generation command 1410 shown in FIG. 14, the processor 121 designates the areas of the compressed data, the old compressed data, and the old parity by LBA0, and also designates the storage destination of the update parity by LBA0. The flow of the processing performed by the NVM module 126 on receiving the update parity generation command is broadly the same as the flow performed on receiving the full stripe parity generation command, described above.
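 Conceptually, for RAID 5 the update parity is the bytewise XOR of the new data, the old data, and the old parity, computed over their decompressed images. A minimal illustrative sketch (Python; the NVM module performs this internally on decompressed buffers, so this is not the module's actual code):

```python
def update_parity(new: bytes, old: bytes, old_parity: bytes) -> bytes:
    """RAID 5 update parity: new_parity = new XOR old XOR old_parity,
    computed over equal-length decompressed buffers."""
    assert len(new) == len(old) == len(old_parity)
    return bytes(a ^ b ^ c for a, b, c in zip(new, old, old_parity))
```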
 Step S2605, which follows S2604 or S2609, is a step of obtaining the exact post-compression data size of the parity generated in S2604 or S2609. In this step, the processor 121 creates a compressed data size acquisition command 1110 whose LBA0 start address 1113 parameter designates the LBA0 at which the generated parity is stored, and issues it to the NVM module 126. The processor 121 then obtains the post-compression size of the parity from the compressed data size acquisition response 1120.
 Step S2606, which follows S2605, determines whether destaging is necessary. The processor 121 determines whether the compressed data on the cache for which parity has been generated should be recorded in the final storage media. This determination can be based, for example, on the amount of free cache space: if the free space in the cache is at or below a threshold, the storage apparatus 101 starts the destage processing to create free space; if it determines that there is sufficient free space in the cache, the parity generation processing ends.
 The above is the parity generation operation in this embodiment. Although the description so far has assumed the RAID 5 parity generation operation, the same applies to RAID 6.
 (1-27) Destage Operation of the Storage Apparatus
 Next, the destage operation of the storage apparatus in this embodiment will be described. The destage operation of this embodiment is not limited to being executed only when step S2606 of the RAID parity generation operation shown in FIG. 26 determines that destaging is necessary. The storage apparatus 101 may perform the destage operation at an arbitrary timing, for example when there are few or no requests from the host device.
 The destage operation of this embodiment corresponds to processes 321 to 323 of the write data compression operation of this embodiment shown in FIG. 3-A. The operation is described below with reference to the flow of FIG. 27.
 The first step of the destage operation, S2701, is a step of selecting the data to be destaged from the NVM module 126, which serves as the cache device. At this time, the processor 121 selects the area to be destaged from LBA0. The destage target may be chosen by referring to the last access time 3070 of the cache management information 3000 and selecting data that has not recently been accessed by the host device 103, or by other methods based on statistical information managed by the storage apparatus 101, such as selecting data judged to be sequential write data. Note that the parity generated by the processing of FIG. 26 is also subject to destaging here.
 In step S2702, which follows step S2701, the storage apparatus 101 obtains from the NVM module 126 the post-compression size of the data in the LBA0 space area selected in S2701. The processor 121 transfers the compressed data size acquisition command 1110 shown in FIG. 11 to the NVM module 126 and obtains the compressed data length 1123 in the compressed data size acquisition response 1120, thereby learning the compressed data size to be obtained in the destage operation.
 In step S2703, which follows step S2702, the storage apparatus 101 maps the compressed data in the LBA0 area selected in S2701 onto an LBA1 area. The processor 121 transfers to the NVM module 126 an LBA1 mapping command 1210 describing an LBA1 area large enough for the compressed data length obtained in step S2702 to be mapped.
 In step S2704, which follows step S2703, the storage apparatus 101 obtains the compressed data from the LBA1 area mapped in S2703. The processor 121 writes the LBA1 mapped in S2703 into the read command 1610 shown in FIG. 16 and transfers it to the NVM module 126 to obtain the compressed data.
 Step S2704', which follows step S2704, is a step of identifying the storage destination address of the write target data. Because the address, in the storage destination virtual volume, of each piece of write target data on the cache area (the LBA0 space of the NVM module 126) is stored in the address field (3020) of the cache management information 3000, the processor 121 uses it to calculate the virtual PDEV associated with that address and the address within that virtual PDEV. The calculation method is as described earlier.
 Step S2705, which follows step S2704', is a step of recording the compressed data obtained in step S2704 in a PDEV. Since the storage destination virtual PDEV of the write data was identified in step S2704', in step S2705 the processor 121 refers to the virtual PDEV information 2230 and identifies the PDEV associated with that virtual PDEV. The processor then selects a free area in the identified PDEV, determines the selected free area as the storage destination of the compressed data, and records the compressed data obtained in S2704 in that PDEV area.
 The storage apparatus 101 according to the embodiment of the present invention manages, within the storage controller 110 (for example in the DRAM 125), information about the free areas of each PDEV (SSD or HDD), that is, areas not associated with any virtual PDEV. The processor 121 uses this information when selecting a free area in a PDEV. Alternatively, because any area other than the areas registered in the compression management information 2300 (the areas identified by the start address 2301 within the PDEV at which compressed data is stored) is a free area, the processor 121 may read the compression management information 2300 from the storage destination PDEV of the compressed data and identify free areas based on it.
 Step S2706, which follows step S2705, is a step of releasing the LBA1 area mapped for obtaining the compressed data in S2703. The processor 121 releases the LBA1 using the mapping release command 1710 shown in FIG. 17. The processor 121 also stores the information of the released LBA1 in the free LBA1 list 3520 and deletes it from the cache management information 3000.
 Step S2707, which follows step S2706, updates the compression management information 2300 and records it in the PDEV. The processor 121 reads the compression management information 2300 and records, in the field for the start address 2301 within the PDEV at which compressed data is stored, of the compression information entry corresponding to the destaged virtual PDEV area, the address of the PDEV area in which the compressed data was stored in S2705, thereby updating the entry. The processor then records the updated compression management information 2300 in the PDEV. When updating the compression management information 2300, it is not necessary to read and update all the information stored in it; only the necessary area need be read and updated.
 The above is the destage operation in this embodiment.
 (1-28) Entry Regeneration Operation for the Compression Management Information of the Storage Apparatus
 Next, the partial recovery operation for the compression management information 2300 in this embodiment will be described. The entry regeneration operation for the compression management information 2300 is performed when the loss of an entry of the compression management information 2300 is detected. In this embodiment, entry loss can be detected, for example, during the storage apparatus's periodic monitoring of entries, or when an entry of the compression management information 2300 is obtained during a read or destage operation.
 The storage apparatus 101 according to the embodiment of the present invention is characterized in that the compression management information 2300 can be regenerated by this partial recovery operation and by the rebuild processing described later. With this function, the storage apparatus 101 can maintain its reliability without holding the compression management information 2300 redundantly.
 The partial recovery operation of the compression management information 2300 is described with reference to FIG. 28. For simplicity, the following describes the case where only one entry of the compression management information 2300 has been lost; the processing is applicable, however, even when multiple entries of the compression management information 2300 have been lost.
 S2801, the first step of the partial recovery operation of the compression management information 2300, is a step of calculating the virtual PDEV area (intra-virtual-PDEV address) that was managed by the lost entry of the compression management information 2300. As described above, the storage apparatus 101 according to the embodiment of the present invention fixes the recording position of each entry of the compression management information 2300 within the PDEV according to the virtual PDEV area that the entry manages. The processor 121 can therefore identify, from the intra-PDEV address of the lost entry, the virtual PDEV area that was managed by that entry. For example, if the compression management information 2300 with the contents shown in FIG. 23 is stored in a PDEV and the compression management information entry whose recording position (relative address within the PDEV) is 0x00_0000_0008 can no longer be read, it is known to be the compression management information entry for the 4 KB area starting at address 0x0000_0000_1000 of the virtual PDEV area. In this case, the partial recovery operation according to the embodiment of the present invention regenerates, in S2802 and the subsequent steps, the data that was stored in the 4 KB area starting at address 0x0000_0000_1000 of the virtual PDEV area using data read from the other PDEVs constituting the RAID group, writes it back to the PDEV, and recreates the compression management information entry based on it.
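 A minimal sketch of the S2801 calculation, the inverse of the entry address calculation sketched earlier (Python; the 8-byte entry size and 4 KB management unit are assumptions consistent with the worked example above):

```python
ENTRY_SIZE = 8   # assumed bytes per compression management entry
UNIT = 4096      # each entry manages one 4 KB virtual PDEV region

def managed_region(entry_rel_addr):
    """Virtual PDEV region managed by the entry recorded at the given
    relative address within the PDEV."""
    start = (entry_rel_addr // ENTRY_SIZE) * UNIT
    return start, UNIT

# managed_region(0x8) -> (0x1000, 4096): the unreadable entry at relative
# address 0x00_0000_0008 managed the 4 KB region at 0x0000_0000_1000.
```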
 Step S2802, which follows step S2801, identifies the RAID group to which the virtual PDEV whose compression management information 2300 was lost belongs. The processor 121 searches the RAID group management information 2220 and identifies the corresponding RAID group.
 In step S2803, which follows step S2802, the processor 121 obtains from the PDEVs the data necessary for recovering the data of the virtual PDEV area identified in S2801, in order to restore the virtual PDEV area that the lost entry managed. This processing is explained using the example described above, in which the compression management information entry for the 4 KB area starting at address 0x0000_0000_1000 of the virtual PDEV area has been lost. In this case, for the virtual PDEVs other than the one whose compression management information 2300 was lost among the virtual PDEVs constituting the RAID group identified in S2802 (hereinafter these virtual PDEVs are called the "other virtual PDEVs"), the compressed data corresponding to the virtual PDEV area identified in S2801 (that is, the 4 KB area starting at address 0x0000_0000_1000) is read. To do so, the processor 121 reads the compression management information 2300 from each of the other virtual PDEVs, obtains the PDEV address and the compressed data length associated with the 4 KB area starting at address 0x0000_0000_1000 of each other virtual PDEV, and uses the obtained PDEV address and compressed data length to read the compressed data from the PDEV associated with that other virtual PDEV.
 In step S2804, which follows step S2803, the processor 121 records the multiple pieces of compressed data obtained in S2803 (the data necessary for recovering the data of the virtual PDEV area identified in S2801) in the NVM module 126 and maps their decompressed images to LBA0. The same processing as steps S2408 to S2410 of FIG. 24 may be performed here. Before recording the data in the NVM module 126, LBA0 and LBA1 space areas must be secured, but this is the same as the processing performed in S2404 of FIG. 24 and elsewhere, so the description is omitted.
 In step S2805, which follows step S2804, the processor 121 restores the data of the virtual PDEV area identified in S2801 using the RAID function, from the multiple pieces of data of the RAID stripe column mapped to LBA0 of the NVM module 126 in step S2804. The full stripe parity generation command 1310 can be used for this. As parameters of the full stripe parity generation command 1310, the addresses of the data mapped into the LBA0 space in step S2804 are stored in the LBA0 start addresses (1314 to 1316). An LBA0 space area for storing the restored data is then secured, a command storing the start address of the secured LBA0 space area in the LBA0 start address (for XOR parity) 1317 is created, and the command is issued to the NVM module 126.
 In step S2806, which follows step S2805, the processor 121 maps the compressed form of the restored data generated in S2805 to LBA1 and obtains the compressed data. The same processing as steps S2702 to S2704 of FIG. 27 is performed here.
 Step S2807, which follows step S2806, is a step of recording the compressed data obtained in S2806 in a PDEV. In this processing, as in step S2705 of FIG. 27, the processor 121 identifies the PDEV associated with the storage destination virtual PDEV of the restored data, selects a free area in the identified PDEV, determines the selected free area as the storage destination of the compressed data, and records the compressed data obtained in S2806 in that PDEV area.
 Step S2808, which follows step S2807, is a step of updating the compression management information 2300 and recording it in the PDEV. The processor 121 records, in the field for the start address 2301 within the PDEV at which compressed data is stored, of the compression management information 2300 entry for the restored virtual PDEV area, the address of the PDEV area to which the compressed data was written in S2807, thereby updating the entry. The lost entry is then restored by recording the updated compression management information entry in the PDEV.
 Through the entry regeneration operation for the compression management information 2300 described above, the storage apparatus 101 can regenerate each entry of the compression management information 2300 from the data even when the compression management information 2300 has been lost.
 The above is the entry regeneration operation for the compression management information in this embodiment.
 (1-29) Rebuild Operation of the Storage Apparatus
 Next, the rebuild operation in this embodiment will be described. The storage apparatus 101 according to the embodiment of the present invention performs the rebuild processing shown in FIG. 29 when one of the PDEVs constituting a RAID group fails and becomes inaccessible. In the first step of the rebuild processing, S2901, the processor 121 identifies the virtual PDEV associated with the failed PDEV (hereinafter referred to as the failed virtual PDEV).
 Step S2902, which follows step S2901, identifies the RAID group to which the failed virtual PDEV belongs. The processor 121 searches the RAID group management information 2220 and identifies the corresponding RAID group.
 In step S2903, which follows step S2902, in order to restore the failed virtual PDEV area, the processor 121 obtains the data necessary for recovering the data of each area of the failed virtual PDEV identified in S2901 from the PDEVs associated with the virtual PDEVs other than the failed virtual PDEV in the RAID group identified in S2902. Here, the processing described in step S2803 of FIG. 28 is performed for the entire area of the failed virtual PDEV (everything from address 0 of the virtual PDEV to its maximum address). That is, the compression management information 2300 is read from the virtual PDEVs other than the failed virtual PDEV (hereinafter called the "other virtual PDEVs"), and the data at the other virtual PDEVs' addresses 0x0000_0000_0000, 0x0000_0000_1000, and so on is read out sequentially. However, an area whose start address 2301 within the PDEV at which compressed data is stored is unassigned (NULL) in the compression management information 2300 has no PDEV area associated with it and therefore need not be read.
 Steps S2904 to S2908, described below, also perform the same processing as S2804 to S2808 of FIG. 28. The difference from the processing of FIG. 28 is that while the processing of FIG. 28 is performed only for a part of a virtual PDEV (the specific area within the virtual PDEV that the lost entry managed), the processing of FIG. 29 is performed for the entire area of the failed virtual PDEV (excluding areas with which no PDEV area is associated).
 In step S2904, which follows step S2903, the processor 121 records the compressed data obtained in S2903 in the NVM module 126 and maps the decompressed images to LBA0. The same processing as step S2804 of FIG. 28 is performed here.
 In step S2905, which follows step S2904, the processor 121 restores the data of the failed virtual PDEV area identified in S2901 using the RAID function, from the multiple pieces of data of the RAID stripe column mapped to LBA0 of the NVM module 126 in step S2904. The data restored at this time is compressed and recorded in the LBA0 space of the NVM module 126. The same processing as step S2805 of FIG. 28 is performed here.
 In step S2906, which follows step S2905, the processor 121 maps the compressed form of the restored data generated in S2905 to LBA1 and obtains the compressed data. The same processing as step S2806 of FIG. 28 is performed here.
 Step S2907, which follows step S2906, is a step of recording the compressed data obtained in S2906 in a new PDEV. The storage apparatus 101 holds one or more PDEVs as spares for a failed PDEV and uses one as a replacement for the failed PDEV; hereinafter this PDEV is referred to as the replacement PDEV. The processor 121 selects the virtual PDEV corresponding to the replacement PDEV, registers the virtual PDEV number of that virtual PDEV in the virtual PDEV number column 2222 of the RAID group management information 2220, and deletes the number of the virtual PDEV corresponding to the failed PDEV. This adds the selected virtual PDEV to the RAID group with which the failed virtual PDEV was associated, as a replacement for the failed virtual PDEV. The processor 121 records the compressed data obtained in S2906 in the area of the replacement PDEV. Storing the compressed data in the replacement PDEV can be done by the same processing as step S2705 of FIG. 27.
 In step S2908, which follows step S2907, the processor 121 generates the compression management information that manages the association between the areas of the replacement virtual PDEV and the replacement PDEV areas in which the data was stored in S2907, and records this generated compression management information 2300 in the PDEV. This completes the data restoration performed when a PDEV fails.
 As stated above, in the storage apparatus 101 according to the embodiment of the present invention, when a PDEV fails, the data is recovered using the decompressed data of the other virtual PDEV areas of the RAID group to which the failed virtual PDEV belongs. The recovered data is then compressed and recorded in the replacement PDEV. At this time, by regenerating the compression management information that associates the areas of the replacement virtual PDEV with those of the replacement PDEV, the compression management information lost through the PDEV failure is regenerated.
 In the processing of FIG. 29, steps S2903 to S2908 are performed for the entire area of the failed virtual PDEV, but it is not necessary to process the entire area at once in each step; the processing of each step may instead be executed per partial area (per stripe, or per 4 KB, which is the data compression unit of the storage apparatus 101 according to the embodiment of the present invention, and so on).
 The above concludes the description of each process in the storage apparatus according to the embodiment of the present invention. In the case of a storage apparatus that compresses data and stores it in the final storage media, like the storage apparatus according to the embodiment of the present invention, a storage area that appears to record data in uncompressed form (a virtual uncompressed volume) is provided to a host device such as a server, concealing the changes to the data caused by compression. In this case, information (compression management information) is needed to manage the association between the areas of the virtual uncompressed volume provided to the host device and the physical areas in which the compressed data is recorded.
 The compression management information manages the virtual uncompressed volume and the physical recording destinations of the compressed data, and it is indispensable for responding to data read requests from the server. From the viewpoint of the reliability of the storage apparatus, therefore, loss of the compression management information is equivalent to loss of the stored data. For this reason, the compression management information must be retained with at least the same level of reliability as the data itself.
 One conceivable way to record the compression management information on inexpensive storage media while maintaining the same level of reliability as the data is to protect the compression management information with RAID (mirroring, parity), as is done for the data. However, with this method, whenever recorded data is updated, the parity of the compression management information must be updated as well as that of the data, which degrades the performance of the storage apparatus. At the same time, redundant information such as mirror data and parity is required, so a large amount of storage area is consumed for the compression management information, increasing the cost of the storage apparatus.
 In the storage apparatus of the present invention, the compression management information that manages the association between compressed data and its recording destinations on the final storage media is divided per final storage medium, and the compression management information managing only the correspondences related to one final storage medium is recorded in a specific area of that medium. With this recording scheme, even if the compression management information is lost for some reason and becomes inaccessible, the data that the inaccessible compression management information managed can be regenerated by RAID technology; the regenerated data (recovery data) is compressed and written to the final storage medium, and at the same time compression management information corresponding to the recovery data written to the final storage medium is created and written to the final storage medium, thereby recovering the compression management information. Therefore, in the storage apparatus of the present invention, the compression management information need not be stored redundantly, and the amount of storage area consumed by the compression management information can be reduced.
 Although an embodiment of the present invention has been described above, this is an illustration for explaining the present invention and is not intended to limit the present invention to the embodiment described above. The present invention can also be implemented in various other forms.
 For example, in the embodiment described above, the storage apparatus forms a virtual volume to which the storage area of a RAID group configured using the storage areas of multiple virtual PDEVs is statically allocated, and provides this virtual volume to the host device. As another embodiment, however, a volume formed using so-called Thin Provisioning technology (also called Dynamic Provisioning technology), which dynamically allocates physical storage areas, may be adopted as the volume provided to the host device.
 Dynamic Provisioning is a function that makes it possible to define a volume larger than the storage capacity of the final storage media (SSDs 111 or HDDs 112) installed in the storage apparatus (hereinafter such a volume is called a "DP volume"). With this function, the user does not necessarily need to install, in the initial state, final storage media with the same capacity as the defined volume (DP volume); final storage media can be added as needed after operation of the DP volume begins.
 A DP volume is one of the volumes virtually created by the storage apparatus and is created with an arbitrary capacity designated by the user or the host device. In the initial state, no storage area is allocated to the DP volume; when data is written from the host device 103, only as much storage area as is needed is allocated. As the storage areas to be allocated to the DP volume, for example, the virtual volume 200 of the embodiment described above may be managed in fixed-size storage areas (each such area is called a Dynamic Provisioning page, or DP page) and allocated to the DP volume. As stated above, the storage area of the virtual volume 200 is a storage area treated as holding data in its pre-compression form. Therefore, a DP volume that uses storage areas allocated from the virtual volume 200 likewise conceals from the host device 103 the fact that the data is stored in compressed form.
 Further, as described above, the storage controller 110 may increase or decrease the size of the virtual PDEVs that constitute the virtual volume, based on compression information acquired from the NVM module 126, such as the compressed data length or the compression ratio (the ratio of the data amount before compression to the data amount after compression). As the size of the virtual PDEVs increases or decreases, the number of DP pages that can be carved out of the virtual volume 200 increases or decreases accordingly. The storage apparatus manages this compression-ratio-driven change in the number of DP pages, and when the number of remaining DP pages falls below a certain level, it notifies the host device 103 or the management terminal of the storage apparatus 101 that additional final storage media are required. Upon receiving the notification, the user can add final storage media to the storage apparatus 101.
 When a DP volume is provided to the host device 103, the host device 103 is presented with a volume of a predetermined, fixed size. Therefore, even if the amount of usable storage area (DP pages) increases or decreases as the compression ratio fluctuates, neither the host device 103 nor the user operating it needs to be aware of the change; moreover, when an improved compression ratio increases the usable storage area (DP pages), the additional area can be put to effective use.
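 The compression-ratio-driven bookkeeping described above can be pictured with the following sketch; the watermark value and the print standing in for the notification to the host device or the management terminal are placeholders.

    LOW_WATERMARK = 100  # assumed threshold for "remaining DP pages too low"

    def remaining_dp_pages(physical_capacity, compression_ratio, used_pages, page_size):
        # compression_ratio = (bytes before compression) / (bytes after compression),
        # so a ratio of 2.0 lets the media effectively hold twice their raw capacity.
        effective_capacity = physical_capacity * compression_ratio
        return int(effective_capacity // page_size) - used_pages

    def check_pool(physical_capacity, compression_ratio, used_pages, page_size):
        free = remaining_dp_pages(physical_capacity, compression_ratio, used_pages, page_size)
        if free < LOW_WATERMARK:
            # Stand-in for notifying the host device 103 or the management terminal.
            print("final storage media must be added:", free, "DP pages left")
        return free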
101: Storage device
102: SAN
103: Host device
104: Management device
110: Storage controller
111: SSD
112: HDD
121: Processor
122: Internal SW
123: Disk interface
124: Host interface
125: DRAM
126: NVM module
410: FM controller
411: I/O interface
413: RAM
414: Switch
416: Data buffer
417: FM interface
418: Data compression/decompression unit
419: Parity generation unit

Claims (13)

  1.  A storage apparatus comprising a controller having one or more processors and a cache device, and a plurality of storage media, the plurality of storage media constituting one or more RAID groups, wherein:
     one of the RAID groups is composed of (n + m) storage media (n ≥ 1, m ≥ 1) among the plurality of storage media;
     the processor manages a virtual volume provided to a host device in association with the RAID group;
     the processor divides the storage area of the virtual volume into stripes of a predetermined size, and manages each of n stripes, selected in order starting from the stripe located at the head of the storage area of the virtual volume, in association with one of the (n + m) storage media constituting the RAID group; and
     when storing the data stored in the n stripes onto the storage media, the processor:
     causes the cache device to generate m parities from the data stored in the n stripes;
     causes the cache device to compress the data stored in the n stripes and the generated m parities;
     stores each piece of the compressed data of the n stripes onto the storage medium associated with the corresponding stripe; and
     stores each of the m compressed parities onto the m storage media, among the (n + m) storage media constituting the RAID group, with which none of the n stripes is associated.
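 Purely as an illustrative aid, the following sketch models the full-stripe write path recited in claim 1 for the special case m = 1 (a single XOR parity, as in RAID 5), with zlib standing in for the cache device's compression unit; every identifier is an assumption made for illustration, not part of the disclosure itself.

    import zlib

    def xor_block(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def full_stripe_write(stripes, media):
        # stripes: n equal-length byte blocks (one per data stripe).
        # media:   (n + 1) lists standing in for the storage media; the last
        #          is the medium to which no data stripe is mapped.
        assert stripes and len(media) == len(stripes) + 1
        # 1) the cache device generates the parity from the n data stripes
        parity = stripes[0]
        for s in stripes[1:]:
            parity = xor_block(parity, s)
        # 2) the cache device compresses the data stripes and the parity
        blobs = [zlib.compress(s) for s in stripes] + [zlib.compress(parity)]
        # 3) each compressed stripe goes to its associated medium, and the
        #    compressed parity to the remaining medium
        for medium, blob in zip(media, blobs):
            medium.append(blob)

    # e.g. full_stripe_write([b"A" * 512, b"B" * 512, b"C" * 512], [[], [], [], []])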
  2.  The storage apparatus according to claim 1, wherein:
     the cache device comprises a storage medium composed of nonvolatile semiconductor memory (NVM), a data compression unit, and a parity generation unit;
     upon receiving from the host device write data to be stored in the n stripes, the processor transmits the write data to the cache device by issuing a write command to the cache device;
     the cache device, having received the write command, stores the received write data in the nonvolatile semiconductor memory in a compressed state;
     when causing the cache device to generate the m parities, the processor issues a parity generation command to the cache device;
     the cache device, having received the parity generation command, reads and decompresses the write data stored in the nonvolatile semiconductor memory, generates parity from the decompressed write data, and stores the generated parity in the nonvolatile semiconductor memory in a compressed state; and
     the processor, by issuing a read command to the cache device, reads the compressed write data and the compressed parity that the cache device holds in the nonvolatile semiconductor memory, and stores each of the compressed write data and the compressed parity onto the respective storage media.
  3.  The storage apparatus according to claim 1, wherein, upon receiving from the host device write data together with a write request specifying a storage position on the virtual volume, the processor:
     1) identifies, from the storage position on the virtual volume, the storage medium in which the write data is to be stored;
     2) calculates a virtual physical address on the identified storage medium, the virtual physical address being the address at which the data of the storage area on the virtual volume would be stored on the storage medium in an uncompressed state;
     3) selects an unused area on the storage medium;
     4) stores, in the unused area, the write data compressed by the cache device; and
     5) stores compression management information, which is information for managing the correspondence between the virtual physical address and the address of the unused area in which the write data is stored, on the storage medium in which the write data is stored.
  4.  The storage apparatus according to claim 3, wherein, upon receiving from the host device a read request specifying a storage position on the virtual volume, the processor:
     identifies, based on the storage position on the virtual volume, the storage medium storing the data specified by the read request and the virtual physical address on that storage medium;
     reads the compression management information from the identified storage medium;
     reads the data in its compressed state, using the address on the storage medium at which the data specified by the read request is stored, which address is contained in the read compression management information, and stores the compressed data in the cache device;
     decompresses the compressed data in the cache device; and
     returns the decompressed data to the host device.
  5.  The storage apparatus according to claim 3, wherein, before storing write data onto the storage medium, the processor:
     stores the write data in the cache device;
     reads, in a compressed state, the pre-update data of the write data from the storage medium in which the write data is to be stored, and stores it in the cache device;
     reads, in a compressed state, the parity corresponding to the write data from the storage medium in which that parity is stored, and stores it in the cache device; and
     generates, in the cache device, a post-update parity from the pre-update data, the parity, and the write data.
  6.  The storage apparatus according to claim 5, wherein the processor:
     calculates the virtual physical address on the storage medium in which the write data is to be stored;
     reads the compression management information from the storage medium in which the write data is to be stored;
     reads the pre-update data in its compressed state, using the address on the storage medium at which the pre-update data of the write data is stored, which address is contained in the read compression management information, and stores it in the cache device;
     decompresses the compressed pre-update data in the cache device;
     identifies the storage medium in which the parity corresponding to the write data is stored;
     calculates the virtual physical address on the storage medium in which the parity is stored;
     reads the compression management information from the storage medium in which the parity is stored;
     reads the parity in its compressed state, using the address on the storage medium at which the parity is stored, which address is contained in the read compression management information, and stores it in the cache device;
     decompresses the compressed parity in the cache device; and
     generates, in the cache device, a post-update parity from the decompressed pre-update data, the decompressed parity, and the write data.
  7.  The storage apparatus according to claim 4, wherein:
     the compression management information is stored at a predetermined position on the storage medium;
     the compression management information consists of entries stored in order of virtual physical address starting from the predetermined position on the storage medium, each entry comprising the address of the area on the storage medium associated with a virtual physical address and the length, in the compressed state, of the data stored in that area; and
     when reading the compression management information, the processor reads, from the compression management information, the entry corresponding to the access target data, based on the virtual physical address on the storage medium at which the access target data is stored.
  8.  The storage apparatus according to claim 7, wherein, when reading of an entry of the compression management information from the storage medium fails, the processor:
     identifies, based on the address on the storage medium at which the entry was stored, the virtual physical address with which the information recorded in the entry was associated;
     reads, from the other storage media belonging to the same RAID group as the storage medium, the data necessary to regenerate the data stored at the identified virtual physical address, and regenerates the data corresponding to the virtual physical address from the read data;
     stores the regenerated data in an unused area of the storage medium in which the entry was stored; and
     records, in the entry of the compression management information, the address of the unused area in which the regenerated data is stored and the length of the regenerated data in its compressed state.
  9.  The storage apparatus according to claim 8, wherein, when regenerating the data, the processor:
     reads, based on the compression management information stored in each of the other storage media belonging to the same RAID group as the storage medium, the compressed data stored at the addresses on the other storage media corresponding to the identified virtual physical address, and stores it in the cache device; and
     regenerates the data by decompressing the compressed data in the cache device and generating parity from the decompressed data.
  10.  The storage apparatus according to claim 7, wherein, when one storage medium constituting the RAID group becomes inaccessible:
     the processor regenerates the data that was stored on the one storage medium from the other storage media belonging to the same RAID group as the one storage medium, and stores the regenerated data on an alternative storage medium; and
     the processor creates the compression management information corresponding to the regenerated data and stores it on the alternative storage medium.
  11.  A method of controlling a storage apparatus that comprises a controller having one or more processors and a cache device, and a plurality of storage media, the plurality of storage media constituting one or more RAID groups, wherein:
     one of the RAID groups is composed of (n + m) storage media (n ≥ 1, m ≥ 1) among the plurality of storage media; and
     the storage apparatus manages a virtual volume provided to a host device in association with the RAID group, divides the storage area of the virtual volume into stripes of a predetermined size, and manages each of n stripes, selected in order starting from the stripe located at the head of the storage area of the virtual volume, in association with one of the (n + m) storage media constituting the RAID group;
     the method comprising, when the processor stores the data stored in the n stripes onto the storage media:
     causing the cache device to generate m parities from the data stored in the n stripes;
     causing the cache device to compress the data stored in the n stripes and the generated m parities;
     storing each piece of the compressed data of the n stripes onto the storage medium associated with the corresponding stripe; and
     storing each of the m compressed parities into storage areas of the m storage media, among the (n + m) storage media constituting the RAID group, with which none of the n stripes is associated.
  12.  The method of controlling a storage apparatus according to claim 11, wherein, upon receiving from the host device write data together with a write request specifying a storage position on the virtual volume, the processor:
     1) identifies, from the storage position on the virtual volume, the storage medium in which the write data is to be stored;
     2) calculates a virtual physical address on the identified storage medium, the virtual physical address being the address at which the data of the storage area on the virtual volume would be stored on the storage medium in an uncompressed state;
     3) selects an unused area on the storage medium;
     4) stores, in the unused area, the write data compressed by the cache device; and
     5) stores compression management information, which is information for managing the correspondence between the virtual physical address and the address of the unused area in which the write data is stored, on the storage medium in which the write data is stored.
  13.  The method of controlling a storage apparatus according to claim 12, wherein, upon receiving from the host device a read request specifying a storage position on the virtual volume, the processor:
     identifies, based on the storage position on the virtual volume, the storage medium storing the data specified by the read request and the virtual physical address on that storage medium;
     reads the compression management information from the identified storage medium;
     reads the data in its compressed state, using the address on the storage medium at which the data specified by the read request is stored, which address is contained in the read compression management information, and stores the compressed data in the cache device;
     decompresses the compressed data in the cache device; and
     returns the decompressed data to the host device.
PCT/JP2014/062959 2014-05-15 2014-05-15 Storage device WO2015173925A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2014/062959 WO2015173925A1 (en) 2014-05-15 2014-05-15 Storage device

Publications (1)

Publication Number Publication Date
WO2015173925A1 (en)

Family

ID=54479494

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2014/062959 WO2015173925A1 (en) 2014-05-15 2014-05-15 Storage device

Country Status (1)

Country Link
WO (1) WO2015173925A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5778411A (en) * 1995-05-16 1998-07-07 Symbios, Inc. Method for virtual to physical mapping in a mapped compressed virtual storage subsystem
WO2013186828A1 (en) * 2012-06-11 2013-12-19 株式会社 日立製作所 Computer system and control method

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017109931A1 (en) * 2015-12-25 2017-06-29 株式会社日立製作所 Computer system
JPWO2017109931A1 (en) * 2015-12-25 2018-08-16 株式会社日立製作所 Computer system
US10628088B2 (en) 2015-12-25 2020-04-21 Hitachi, Ltd. Computer system
JP2019053485A (en) * 2017-09-14 2019-04-04 Necプラットフォームズ株式会社 Storage control device, storage control system, storage control method, and, storage control program
JP2021515298A (en) * 2018-02-26 2021-06-17 インターナショナル・ビジネス・マシーンズ・コーポレーションInternational Business Machines Corporation Virtual storage drive management in a data storage system
JP7139435B2 (en) 2018-02-26 2022-09-20 インターナショナル・ビジネス・マシーンズ・コーポレーション Virtual storage drive management in data storage systems
US20200081780A1 (en) * 2018-09-11 2020-03-12 Silicon Motion, Inc. Data storage device and parity code processing method thereof
CN116107516A (en) * 2023-04-10 2023-05-12 苏州浪潮智能科技有限公司 Data writing method and device, solid state disk, electronic equipment and storage medium
CN116107516B (en) * 2023-04-10 2023-07-11 苏州浪潮智能科技有限公司 Data writing method and device, solid state disk, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
JP6212137B2 (en) Storage device and storage device control method
US9880766B2 (en) Storage medium storing control program, method of controlling information processing device, information processing system, and information processing device
EP2042995B1 (en) Storage device and deduplication method
JP6429963B2 (en) Storage device and storage device control method
US7831764B2 (en) Storage system having plural flash memory drives and method for controlling data storage
US9946616B2 (en) Storage apparatus
US7761655B2 (en) Storage system and method of preventing deterioration of write performance in storage system
US7386758B2 (en) Method and apparatus for reconstructing data in object-based storage arrays
US10061710B2 (en) Storage device
US9304685B2 (en) Storage array system and non-transitory recording medium storing control program
WO2014102882A1 (en) Storage apparatus and storage control method
US20150378613A1 (en) Storage device
KR20170125178A (en) Raid storage device and management method thereof
US20100100664A1 (en) Storage system
WO2015173925A1 (en) Storage device
JP6513888B2 (en) Computer system having data volume reduction function, and storage control method
JPWO2017068904A1 (en) Storage system
CN112596673A (en) Multi-active multi-control storage system with dual RAID data protection
CN118051179A (en) Techniques for partition namespace storage using multiple partitions
JP7093799B2 (en) Storage system and restore control method
JP2021114164A (en) Storage device and storage control method
JP6817340B2 (en) calculator
WO2015118680A1 (en) Storage device
JP6605762B2 (en) Device for restoring data lost due to storage drive failure
JP6693181B2 (en) Storage control device, storage control method, and storage control program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14891797

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14891797

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP