WO2018078741A1

WO2018078741A1 - Storage apparatus

Info

Publication number: WO2018078741A1
Application number: PCT/JP2016/081714
Authority: WO
Inventors: 豊大島; 黒川　勇; 栄寿葛城; 敦田代
Original assignee: 株式会社日立製作所
Priority date: 2016-10-26
Filing date: 2016-10-26
Publication date: 2018-05-03

Abstract

A storage apparatus according to an aspect of the present invention includes: a storage controller that accepts write data from a host computer; and a plurality of storage devices that have predetermined amounts of physical areas and that compress data and store the compressed data in the physical areas. The storage controller manages more than (N + M) units of the storage devices as a parity group, and the storage controller generates M parities from N items of data written from the host computer and stores the M parities individually in different ones of the storage devices in the parity group. Meanwhile, in logical storage spaces of the storage devices, spare stripes serving as areas for storing data recovered at the time of rebuild processing are provided. The storage controller is characterized by inhibiting writing to the spare stripes in normal situations, and permitting the use of certain amounts of physical areas needed for storing data to be written to the spare stripes when triggered by the occurrence of a fault in the storage devices.

Description

Storage device

The present invention relates to a data recovery technique in a storage apparatus.

Many storage devices use so-called RAID (Redundant Arrays of Inexpensive/Independent Disks) technology to make the system highly available. RAID technology calculates redundant data such as parity from write data received from a host device such as a host computer, and distributes and stores the write data and the parity in different storage devices. By adopting RAID technology, even if a failure occurs in some storage devices and data cannot be read from them, the data can be regenerated using information stored in the other storage devices. The process of regenerating data is called a rebuild process.

A storage apparatus employing RAID technology divides a storage area on a volume provided to a host computer or the like into data units of a predetermined size called stripes, and uses N stripes (N ≧ 1) among these stripes to generate M (M ≧ 1) pieces of redundant data (parity). Note that each of the M pieces of redundant data generated here is the same size as a stripe. These pieces of redundant data are also called stripes, or sometimes parity stripes. A set of N stripes and the M parity stripes generated from them is called a stripe line.

The storage apparatus stores the (N + M) stripes belonging to one stripe line in different storage devices. Because each stripe is stored on a different storage device, even if a specific storage device fails, the data stored on the failed storage device can be regenerated using the stripes stored on the other storage devices. Therefore, if the storage apparatus has at least (N + M) storage devices, data can be regenerated even if one or more storage devices fail.

In recent years, however, as disclosed in Patent Document 1, for example, a technique called distributed RAID has appeared, in which stripes are distributed and stored across more than (N + M) storage devices. Distributing data across more storage devices has the effect of distributing the processing load.

In RAID technology, the data regenerated by the rebuild process is stored in a spare storage area secured in advance. In general, an empty storage device (spare drive) that stores neither data nor redundant data is secured as the spare storage area. In contrast, Patent Document 1 discloses a method in which a part of each storage device used for storing data or redundant data is reserved as an unused area (spare area), and the regenerated data is distributed and stored in the spare areas of the storage devices. Because the write destinations of the regenerated data are distributed in this way, the load can be expected to be distributed even during the rebuild process.

Patent Document 1: International Publication No. 2014/115320

By the way, in recent years, with increasing needs for bit cost reduction, a technique for reducing the amount of data stored in a storage device has appeared. As one of the techniques for this purpose, a technique for recording data by lossless compression (hereinafter simply referred to as compression) is known. When the data size is reduced using a compression technique and recorded in a storage device, the data retention cost (bit cost of the storage medium, power consumption cost of the storage device, etc.) can be reduced.

However, when compressing data, the compression ratio varies depending on the data contents. Therefore, when the compression rate of the data stored in the storage device is worse than originally assumed, there may occur a case where data cannot be stored in the storage device. For example, when the rebuild process described above is performed, there may occur a situation in which the regenerated data cannot be stored in the spare area. In this case, the data may be lost.

A storage apparatus according to an aspect of the present invention includes a storage controller that receives write data from a host computer, and a plurality of storage devices that each have a predetermined amount of physical area and that compress data and store the compressed data in the physical area. The storage controller manages more than (N + M) storage devices as a parity group, generates M parities from N pieces of data written from the host computer, and stores each of the N pieces of data and the M parities in different storage devices in the parity group. In addition, in the logical storage space of each storage device, besides data stripes, which are areas for storing data, and parity stripes, which are areas for storing parity, spare stripes are provided, which are areas for storing data restored during the rebuild process.

The storage controller normally suppresses the use of the physical area necessary for storing the data written to the spare stripe. Then, when a failure occurs in the storage device, use of a physical area of an amount necessary for storing data to be written in the spare stripe is permitted.

The present invention can prevent data loss when a storage device fails.

It is a block diagram of the storage apparatus according to the embodiment of the present invention. It is a block diagram of the SSD. FIG. 3 is a conceptual diagram showing an example of the arrangement of stripes in the storage apparatus according to the embodiment. It is a diagram showing the relationship between a logical volume and a virtual parity group. It is a conceptual diagram showing the relationship between a virtual parity group and a parity group. It is an explanatory diagram of the spare area. It is a diagram explaining the concept of the logical address space and the compression function that the SSD provides. It is a diagram showing the structure of the logical physical capacity management table. It is a diagram showing the structure of the PG management table. It is a diagram showing the structure of the depletion threshold management table. It is a diagram showing the structure of the spare conversion table. It is a flowchart of the I/O process. It is a flowchart of the depletion check process. It is a flowchart of the rebuild process. It is a flowchart (2) of the rebuild process. It is a flowchart of the depletion capacity update process. It is a flowchart of the destage process. It is a flowchart of the depletion check process according to the second embodiment.

Hereinafter, a storage apparatus according to an embodiment of the present invention will be described with reference to the drawings. Note that the present invention is not limited to the embodiments described below.

In the following description, “program” may be used as the subject, but precisely, the program is executed by a CPU (processor) to perform a predetermined process. However, to prevent the explanation from becoming redundant, the program may be described as the subject. Therefore, in the following description, the process described with the program as the subject actually means that the CPU executes the process. Further, part or all of the program may be realized by dedicated hardware. Various programs may be installed in each apparatus by a program distribution server or a computer-readable storage medium. The computer-readable storage medium is a non-transitory computer-readable medium such as a non-volatile storage medium such as an IC card, an SD card, or a DVD.

FIG. 1 shows a configuration of an information processing system (computer system) including a storage apparatus according to an embodiment of the present invention. The information processing system includes a storage apparatus 1 and one or more hosts 2.

The storage apparatus 1 is connected to the host 2 via a storage area network (SAN) 3. The host 2 is a computer that executes application programs used by users, and operates as a requester that issues I/O requests to the storage apparatus 1. In the first embodiment, the host 2 is a general-purpose computer such as a mainframe or a personal computer. An application program or the like executed on the host 2 issues I/O requests such as read commands or write commands to a volume (logical volume) provided by the storage apparatus 1.

The SAN 3 is a network composed of, for example, Fibre Channel cables and Fibre Channel switches.

The storage apparatus 1 includes a storage controller (hereinafter also abbreviated as “controller” or “DKC”) 10 and a disk unit 11 including a plurality of storage devices. In the storage controller 10, a CPU 12 that executes control such as I/O processing performed in the storage apparatus 1, a memory 13, a front-end interface (FE I/F) 14 that is a data transfer interface with the host 2, a back-end interface (BE I/F) 15 that is a data transfer interface with the disk unit 11, and a management I/F 16 are interconnected via an internal switch 17. The number of each component, such as the CPU 12 and the FE I/F 14, is not limited to the number shown in the figure.

The disk unit 11 is equipped with a plurality of storage devices for storing write data from the host 2, and each storage device is connected to the storage controller 10 via the BE I/F 15. The storage device is, for example, an SSD 21 that uses a nonvolatile semiconductor memory such as a flash memory as a storage medium, or an HDD 22 that uses a magnetic disk as a storage medium. Hereinafter, a case where all the storage devices mounted on the disk unit 11 are SSDs 21 will be described unless otherwise specified.

The BE I/F 15 has at least an interface controller and a transfer circuit. The interface controller is a component for converting a protocol (SAS, as an example) used by the storage device into a communication protocol (PCI-Express, as an example) used in the storage controller 10. The transfer circuit is used when the storage controller 10 transfers data (reads and writes) to and from the storage device.

The FE I/F 14 has at least an interface controller and a transfer circuit, similarly to the BE I/F 15. The interface controller of the FE I/F 14 converts between the communication protocol (for example, Fibre Channel) used on the data transfer path between the host 2 and the storage controller 10 and the communication protocol used in the storage controller 10.

The CPU 12 performs various controls of the storage apparatus 1. The memory 13 is used to store programs executed by the CPU 12 and various management information of the storage apparatus 1 used by the CPU 12. The memory 13 is also used for temporarily storing I/O target data for the storage devices. Hereinafter, the storage area in the memory 13 used for temporarily storing I/O target data for the storage devices is referred to as the “cache”. The memory 13 is composed of a volatile storage medium such as DRAM or SRAM, but as another embodiment, the memory 13 may be composed of a non-volatile memory.

FIG. 2 is a diagram illustrating a configuration example of the SSD 21. The SSD 21 includes an SSD controller 200 and a plurality of FM chips 206. The SSD controller 200 includes a processor (CPU) 201, an upstream I/F 202, a downstream I/F 203, a memory 204, and a compression / decompression circuit 207, which are interconnected via an internal connection switch 205.

In general, “SSD” means a storage device that uses a semiconductor memory, particularly a nonvolatile semiconductor memory, and has the same form factor as an HDD. In the present embodiment, however, SSD is used to mean any storage device that includes a plurality of flash memories and a controller for controlling them, and its external shape and the like are not limited to general HDD or SSD form factors. In addition to flash memory, various semiconductor memories may be used as the nonvolatile semiconductor memory of the SSD, such as MRAM (Magnetoresistive Random Access Memory), which is a magnetoresistive memory, ReRAM (Resistive Random Access Memory), which is a resistance-change memory, and FeRAM (Ferroelectric Random Access Memory), which is a ferroelectric memory.

The upstream I/F 202 is an interface controller for performing communication between the SSD 21 and the storage controller 10. The upstream I/F 202 is connected to the BE I/F 15 of the storage controller 10 via a transmission line (SAS link or PCI link). On the other hand, the downstream I/F 203 is an interface controller for performing communication between the SSD controller 200 and the FM chip 206.

The CPU 201 performs processing related to various commands coming from the storage controller 10. The memory 204 stores a program for controlling the SSD 21, management information, and the like. A part of the memory 204 is also used as a buffer for temporarily storing write data transmitted from the storage controller 10 together with a write command and data read from the FM chip 206. A volatile memory such as a DRAM is used as the memory 204. However, a nonvolatile memory may be used for the memory 204.

The FM chip 206 is a nonvolatile semiconductor memory chip such as a NAND flash memory. As is well known, data in a flash memory is read and written in units called pages, each of which is a set of a plurality of cells with a predetermined size (for example, 8 KB). Data is erased in units of blocks, each of which is a set of pages.

The compression / decompression circuit 207 is hardware having a function of compressing data or decompressing the compressed data. The SSD 21 can compress the write data from the storage controller 10 by the compression / decompression circuit 207 and store it in the FM chip 206. However, in principle, data compression and decompression are performed transparently to the storage controller 10.

In the above, an example in which data compression and decompression is performed by hardware such as the compression / decompression circuit 207 has been described. However, data compression and decompression are not necessarily performed using hardware. The SSD 21 may be configured such that data is compressed and decompressed by the CPU 201 executing a program that performs compression and decompression.

Next, the concept of storage areas used in the storage apparatus 1 will be described. The storage apparatus 1 stores the write data from the host 2 in the storage devices (SSD 21 or HDD 22), but does not directly provide the host 2 with the storage space provided by the storage devices. Instead, the storage apparatus 1 provides the host 2 with a storage space different from the storage space provided by the storage devices. In this embodiment, the storage space provided to the host 2 is called a “logical volume”.

The storage apparatus 1 manages the storage area on the logical volume by dividing it into partial areas of a predetermined size called “stripes”. When the storage apparatus 1 stores write data from the host 2 in the storage devices, it performs a predetermined operation such as exclusive OR on a predetermined number (for example, N) of stripes to create M pieces of redundant data (parity), each of the same size as a stripe. Since the created redundant data is the same size as a stripe, it is called a “parity stripe”. On the other hand, a stripe used for creating redundant data is sometimes called a “data stripe”. In this embodiment, a set of parity stripes and the data stripes used to generate them is referred to as a “stripe line”.
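As a minimal sketch of the parity creation described above, the following Python code assumes the single-parity case (M = 1) with exclusive OR as the predetermined operation. The function names `make_parity` and `rebuild_stripe` and the 4-byte stripe size are hypothetical and chosen only for illustration; they do not appear in the embodiment.

```python
from functools import reduce


def make_parity(data_stripes: list[bytes]) -> bytes:
    """XOR N equal-sized stripes byte by byte into one parity stripe (M = 1)."""
    size = len(data_stripes[0])
    if any(len(s) != size for s in data_stripes):
        raise ValueError("all stripes must have the same size")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*data_stripes))


def rebuild_stripe(surviving: list[bytes]) -> bytes:
    """Regenerate the one missing stripe of a stripe line from the N survivors."""
    # With XOR parity, the missing stripe is the XOR of all remaining stripes.
    return make_parity(surviving)


# Hypothetical 3D+1P stripe line with 4-byte stripes (real stripes are far larger).
d0, d1, d2 = b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"
p = make_parity([d0, d1, d2])
recovered_d1 = rebuild_stripe([d0, d2, p])  # as if the device holding d1 had failed
```

Because the parity is the same size as each data stripe, it can be treated as one more stripe of the stripe line, and any single lost stripe can be regenerated from the remaining ones.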

In RAID technology, each stripe (data stripe and parity stripe) included in one stripe line is stored in a different storage device. Hereinafter, a case where the number of data stripes included in one stripe line is N and the number of parity stripes is M will be described (N and M are integers of 1 or more). In this case, the data stripes and parity stripes included in one stripe line are distributed and stored across (N + M) storage devices. In traditional RAID technology, the plurality of stripe lines included in one logical volume are all stored in the same (N + M) storage devices, and the combination of these (N + M) storage devices is called a “RAID group”. In the storage apparatus 1 according to the present embodiment, distributed RAID technology is used as in Patent Document 1, and the plurality of stripe lines included in one logical volume are distributed and stored across more than (N + M) storage devices.

An example of the arrangement of each stripe in the storage apparatus 1 according to this embodiment will be described with reference to FIGS. FIG. 3 is a diagram conceptually showing the relationship between the stripe line and the storage area of each storage device. Hereinafter, an example in which the number of data stripes included in one stripe line is 3 and the number of parity stripes is 1 will be described.

In FIG. 3, elements 21-1 to 21-8 represent storage spaces (“logical address spaces” to be described later) provided to the storage controller 10 by the respective SSDs 21 (SSD 21-1 to SSD 21-8). Each box in 21-1 to 21-8 represents a stripe (data stripe or the like). Among each box, a box described as “D” represents a data stripe, and a box described as “P” means a parity stripe. A box labeled “S” is a stripe used when a failure occurs in the SSD 21, and in the present embodiment, this is called a “spare stripe” or “spare area”. Details will be described later.

In FIG. 3, elements 320-1 and 320-2 each represent a stripe line. In traditional RAID technology, the stripes belonging to the same stripe line exist at the same position (address) on each storage device. In the storage apparatus 1 according to this embodiment, however, the stripes belonging to the same stripe line do not necessarily have to exist at the same address on the storage devices.

In traditional RAID technology, all stripe lines exist in the same set of storage devices. In the storage apparatus 1 according to this embodiment, however, the stripe lines 320-1 and 320-2 are distributed across more than (N + M) storage devices (N = 3 and M = 1 in the example of FIG. 3). In this embodiment, an example will be described in which the stripe lines belonging to one logical device are distributed and arranged across eight storage devices, and the combination of these eight storage devices is called a “parity group”.

When the host 2 issues an I/O request (read command or write command) to the storage apparatus 1, an address (called an LBA) on the logical volume is designated. The storage apparatus 1 needs to convert the received logical volume address into an address on the SSD 21. The outline of this address translation will be described with reference to FIGS.

FIG. 4 is a diagram showing the relationship between logical volumes and virtual parity groups. In FIG. 4, an element 300 represents a logical volume, and each box (boxes such as D (0) and D (1)) in the logical volume 300 represents a stripe (data stripe). The logical volume does not include a parity stripe.

On the other hand, the elements 350-0, 350-1, 350-2, and 350-3 are storage devices that are virtually defined by the storage apparatus 1 for address conversion, and are hereinafter referred to as virtual storage devices. A combination of virtual storage devices 350-0, 350-1, 350-2, and 350-3 (element 340 in the figure) is called a virtual parity group, which may be abbreviated as VPG. The virtual parity group is similar to the RAID group in traditional RAID technology, and the stripes belonging to the same stripe line 320 are located at the same address on each virtual storage device. For example, in FIG. 4, stripes D (0), D (1), D (2), and (P0) belong to the same stripe line 320, and all of them are positioned at the first address (LBA = 0) of their virtual storage devices. Therefore, the process of converting an address (LBA) on the logical volume into an address on the virtual parity group (an address on a virtual storage device) can be realized by a relatively simple calculation similar to that of traditional RAID technology.
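The following Python sketch illustrates one possible form of this calculation. It is an assumption for illustration only, not the conversion actually used by the embodiment: the stripe size of 1024 blocks, the round-robin assignment of data stripes to virtual storage devices 0 to N − 1 (with parity on the remaining virtual storage device), and the name `volume_lba_to_vpg` are all hypothetical.

```python
STRIPE_BLOCKS = 1024  # assumed stripe size in blocks (not given in the text)
N_DATA = 3            # data stripes per stripe line (the 3D+1P example)


def volume_lba_to_vpg(lba: int) -> tuple[int, int]:
    """Return (virtual storage device index, LBA on that virtual storage device)."""
    # Which data stripe D(stripe_no) of the logical volume, and where inside it.
    stripe_no, offset = divmod(lba, STRIPE_BLOCKS)
    # Which stripe line, and which position within the stripe line.
    line_no, vdev = divmod(stripe_no, N_DATA)
    # All stripes of one stripe line share the same address on their virtual devices.
    return vdev, line_no * STRIPE_BLOCKS + offset
```

Under these assumptions, the first blocks of D (0), D (1), and D (2) map to LBA 0 of virtual storage devices 0, 1, and 2 respectively, and the next data stripe starts a new stripe line at LBA 1024 of virtual storage device 0.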

FIG. 5 is a conceptual diagram showing the relationship between virtual parity groups and parity groups. Each stripe included in a stripe line on the virtual parity group is distributed and arranged in the storage devices (SSDs 21) in the parity group. For example, as shown in FIG. 5, the stripes included in the stripe line 320-1 on the virtual parity group are arranged in the SSDs 21-1, 21-2, 21-3, and 21-4, and the stripes included in the stripe line 320-2 are arranged in the SSDs 21-2, 21-3, 21-4, and 21-5. However, this arrangement of stripes on the SSDs 21 is only an example, and other arrangements may be used. Distributing stripes across more storage devices (SSDs 21) than the number of virtual storage devices included in the virtual parity group has the advantage of distributing the I/O processing load.

Various methods can be adopted for converting an address on the virtual parity group (virtual storage device) into an address of the parity group (storage device). For example, the storage apparatus 1 may hold, for each stripe, mapping information between the stripes on the virtual parity group (virtual storage devices) and the stripes of the parity group (storage devices). Alternatively, as in the example shown in Patent Document 1, the storage apparatus 1 may give the address on the virtual storage device and the address of the storage device a relationship that can be expressed by some conversion formula (or function).
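The first of these methods, holding mapping information for each stripe, can be sketched as a lookup table. The table below is purely illustrative: the placement of the two stripe lines on SSDs follows the example of FIG. 5 (PDEV numbers 1 to 5 stand in for SSD 21-1 to SSD 21-5), but the stripe indexes on each SSD, the stripe size, and the names `vpg_to_pg` and `vpg_to_pg_address` are assumptions.

```python
# Key:   (virtual storage device index, stripe index on the virtual storage device)
# Value: (PDEV number of the SSD, stripe index on that SSD)
# The stripe indexes on the SSDs are made up for illustration.
vpg_to_pg = {
    (0, 0): (1, 0), (1, 0): (2, 0), (2, 0): (3, 0), (3, 0): (4, 0),  # stripe line 320-1
    (0, 1): (2, 1), (1, 1): (3, 1), (2, 1): (4, 1), (3, 1): (5, 0),  # stripe line 320-2
}


def vpg_to_pg_address(vdev: int, vlba: int, stripe_blocks: int = 1024) -> tuple[int, int]:
    """Convert an address on a virtual storage device to (PDEV number, LBA on the SSD)."""
    stripe_idx, offset = divmod(vlba, stripe_blocks)
    pdev, pstripe = vpg_to_pg[(vdev, stripe_idx)]
    return pdev, pstripe * stripe_blocks + offset
```

A table of this kind allows arbitrary placements at the cost of one entry per stripe, whereas a conversion formula needs no per-stripe state but constrains the placements that can be expressed.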

Subsequently, the spare area will be described with reference to FIG. A storage apparatus employing the traditional RAID technology has one or a plurality of spare storage devices in case a storage device in a RAID group fails, and this storage device is called a “spare drive”. During normal operation (when no storage device has failed), no data is written to the spare drive. When the storage device fails and the data recorded in the failed storage device is restored (rebuilt) by the RAID technology, the restored data is stored in the spare drive.

On the other hand, in the storage apparatus 1 according to the present embodiment, an area for storing restored data is provided in advance in each storage device in the parity group, as in the technique described in Patent Document 1. This area is not used during normal times (when no failure has occurred in any storage device), but is used when a failure has occurred in one of the storage devices. As described above, this area is called a “spare area” or “spare stripe”. In this embodiment, the ratio of spare stripes to all stripes in the storage device, that is, the value expressed by (the number of spare stripes in the storage device) ÷ (the total number of stripes in the storage device), is referred to as the “spare area ratio”.

FIG. 6 shows the same parity group configuration as FIG. In FIG. 6, a box labeled “S” is a spare stripe. For example, if one SSD 21 (SSD 21-1 in the figure) fails, some spare stripes are used to store the restored data. The boxes labeled “S → D” in FIG. 6 represent the spare stripes used to store the restored data (or parity).

When the restored data is stored in a specific storage device (spare drive) as in traditional RAID technology, the load is concentrated on the spare drive. If, as in this embodiment (or Patent Document 1), a spare stripe is provided in each storage device in the parity group and the restored data is stored in the spare stripes of the storage devices, it is possible to prevent the load from concentrating on a specific storage device at the time of data recovery. Details of processing at the time of data recovery will be described later.

Subsequently, a compression function of the SSD 21 and a storage space (referred to as a logical address space) provided by the SSD 21 to the storage controller 10 will be described with reference to FIG. As described above, the SSD 21 compresses the write data from the storage controller 10 by the compression / decompression circuit 207 and stores it in the FM chip 206. However, in principle, data compression is performed transparently for the storage controller 10.

Specifically, the following processing is performed. The SSD 21 provides a storage space of a predetermined size to the storage controller 10. In this embodiment, this storage space is called a “logical address space”. In FIG. 7, element 20 represents a logical address space. When the storage controller 10 writes data to the SSD 21, it issues a write command specifying the address in this logical address space and the size of the write target data. As an example, it is assumed that the storage controller 10 transmits a write command (and 64 KB write data) for writing 64 KB data to the head (address 0) of the logical address space to the SSD 21. Further, it is assumed that the SSD 21 compresses the 64 KB data to result in 8 KB compressed data, and the 8 KB compressed data is stored in the FM chip 206.

In this state, when the storage controller 10 reads this data, it can read out the 64 KB of data stored earlier by issuing a read command designating the head of the logical address space (address 0) and the read data size (for example, 64 KB). This is because, during the read process, the SSD 21 decompresses (restores) the 8 KB of compressed data with the compression / decompression circuit 207 to obtain the 64 KB of data before compression, and returns the decompressed data to the storage controller 10. Therefore, the storage controller 10 perceives the data as if it were stored uncompressed in the logical address space (even though it is actually stored compressed).

The relationship (mapping) between an area in the logical address space and a storage area of the FM chip 206 (called a “physical area”) need not be static. The SSD 21 manages the mapping between each area in the logical address space and the physical areas of the FM chips 206, and changes this mapping dynamically. For example, FIG. 7 represents a state in which the data written to the area 401 in the logical address space is stored, after compression, in the physical area 411 of the FM chip 206. At this time, the SSD 21 manages the information that the physical area 411 is mapped to the area 401. When the SSD 21 receives an update request for the area 401 from the storage controller 10, the SSD 21 compresses the data written together with the update request, secures a new physical area for the compressed data, and stores the compressed data in the secured physical area. The SSD 21 then changes the mapping so that the newly secured physical area is mapped to the area 401.
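The transparent compression and the dynamic remapping described above can be illustrated with a toy model in Python. This is a sketch under stated assumptions, not the SSD 21 itself: `zlib` stands in for the compression / decompression circuit 207 (the compression algorithm is not named in the text), the class `CompressingSSD` and its members are hypothetical, and the old physical area is freed immediately for simplicity.

```python
import zlib


class CompressingSSD:
    """Toy model: logical areas are mapped to compressed blobs in 'physical' areas."""

    def __init__(self) -> None:
        self.physical: dict[int, bytes] = {}  # physical area id -> compressed data
        self.mapping: dict[int, int] = {}     # logical address -> physical area id
        self.next_area = 0

    def write(self, addr: int, data: bytes) -> None:
        area = self.next_area                 # always secure a new physical area
        self.next_area += 1
        self.physical[area] = zlib.compress(data)
        old = self.mapping.get(addr)
        self.mapping[addr] = area             # remap the logical area to the new physical area
        if old is not None:
            del self.physical[old]            # old area is no longer referenced

    def read(self, addr: int) -> bytes:
        # The caller receives uncompressed data and never sees the compressed form.
        return zlib.decompress(self.physical[self.mapping[addr]])

    def physical_usage(self) -> int:
        return sum(len(blob) for blob in self.physical.values())


ssd = CompressingSSD()
ssd.write(0, bytes(64 * 1024))        # 64 KB of zeros compresses very well
first_area = ssd.mapping[0]
ssd.write(0, b"A" * 64 * 1024)        # update: stored in a newly secured physical area
```

After the second write, the logical address 0 is mapped to a different physical area than before, while a read of address 0 still returns the full 64 KB that was written.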

In this embodiment, the size of the logical address space provided by the SSD 21 to the storage controller 10 is called the “logical capacity”. On the other hand, the total capacity of the storage areas of all the FM chips 206 of the SSD 21 is called the “physical capacity”. The logical capacity is set larger than the physical capacity. This is because the data written to the SSD 21 is compressed, so that a larger amount of data than the physical capacity is expected to be storable in the SSD 21. For example, when the logical capacity is set to 8 times the physical capacity and the physical capacity of the SSD 21 is 1 TB, the logical capacity is 8 TB (that is, the storage controller 10 recognizes that the SSD 21 has a storage capacity of 8 TB).

However, as is well known, when data compression is performed, the data size after compression can vary depending on the content of the data. When the logical capacity is set to 8 times the physical capacity, if all the data written from the storage controller 10 to the SSD 21 is compressed to 1/8 of its size or less, all the data can be stored in the FM chips 206. However, the data size after compression may exceed 1/8 of the data size before compression. When such data is written to the SSD 21, there may be no physical area left in which the compressed data can be written.
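The arithmetic behind this can be worked through with the 1 TB / 8 TB example. The following Python sketch is illustrative only; the function name `physical_needed` and the use of a single uniform compressed fraction for all data are simplifying assumptions.

```python
TB = 10**12


def physical_needed(logical_written: int, compressed_fraction: float) -> float:
    """Physical bytes consumed when data shrinks to `compressed_fraction` of its size."""
    return logical_written * compressed_fraction


physical_capacity = 1 * TB
logical_capacity = 8 * physical_capacity  # the 8x setting from the example

# Filling the whole logical capacity with data that compresses to 1/8 just fits.
fits = physical_needed(logical_capacity, 1 / 8) <= physical_capacity

# The same amount of data compressing only to 1/4 needs 2 TB of physical area.
overflows = physical_needed(logical_capacity, 1 / 4) > physical_capacity

# With a compressed fraction of 1/4, the physical area is full after 4 TB of logical writes.
max_logical_at_quarter = physical_capacity / (1 / 4)
```

In the second case the physical area runs out when only half of the logical capacity has been written, even though the storage controller 10 still sees 4 TB of unused logical address space.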

In this embodiment, the state where there is no physical area in which data can be written is referred to as “the physical area is depleted”. When the physical area is depleted, the SSD 21 cannot store the write data received from the storage controller 10 in the physical area. Therefore, when the amount of physical area into which data can be written falls below a predetermined amount (for example, 10% of all physical areas), the SSD 21 according to the present embodiment does not accept write requests from the storage controller 10 thereafter. Details will be described later.
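A minimal sketch of such a check is shown below, assuming that the SSD stops accepting writes once its writable physical area drops below a fixed percentage of the physical capacity. The function name `accepts_write`, the parameter names, and the use of integer percentages are hypothetical; the actual criterion used by the SSD 21 is described later in the specification.

```python
def accepts_write(physical_capacity: int, physical_usage: int,
                  min_free_percent: int = 10) -> bool:
    """True while the writable physical area is at least `min_free_percent` of capacity."""
    free = physical_capacity - physical_usage
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return free * 100 >= physical_capacity * min_free_percent
```

For example, with a physical capacity of 1000 units, a usage of 900 leaves exactly 10% free and writes are still accepted, whereas a usage of 901 causes further writes to be rejected.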

When the SSD 21 (that is, a storage device having a compression function) is used for the parity group described with reference to FIGS. 3 and 6, the following problems may occur. As shown in FIGS. 3 and 6, the storage controller 10 provides a spare stripe in addition to the data stripe and the parity stripe on the storage space (logical address space) of the SSD 21 constituting the parity group. The spare stripe is an area that is not used (write is not performed) unless any storage device (SSD 21) in the parity group fails.

Assume that writing has been performed to almost all data stripes and parity stripes in the logical address space of the SSDs 21, and that as a result almost no usable physical area remains in the SSDs 21. If any SSD 21 in the parity group fails in this state, the storage controller 10 restores the information stored in the data stripes (or parity stripes) of the failed SSD 21 and attempts to write the restored information to the spare stripes of the SSDs 21 that have not failed. However, if the usable physical area is depleted at this time, the storage controller 10 cannot write the restored information, and data loss occurs. In the storage apparatus 1 according to the present embodiment, control is performed to prevent a situation in which data restored during the rebuild process cannot be written. Details of this control will be described below.
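The idea of withholding physical area for the spare stripes in normal times and releasing it on a failure can be sketched as follows. This is only an illustration of the concept stated in the summary, not the control actually performed by the embodiment: the class `SsdBudget`, its fields, and the way the reserve is sized are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class SsdBudget:
    physical_capacity: int  # bytes
    physical_usage: int     # bytes currently consumed
    spare_reserve: int      # bytes held back for spare stripes (assumed sizing)

    def usable_limit(self, failure_in_group: bool) -> int:
        """Physical bytes that writes may consume in the current state."""
        if failure_in_group:
            # After a failure, the reserve is released so rebuilt data can be stored.
            return self.physical_capacity
        # In normal operation, the reserve is withheld from ordinary writes.
        return self.physical_capacity - self.spare_reserve

    def can_write(self, compressed_size: int, failure_in_group: bool) -> bool:
        return self.physical_usage + compressed_size <= self.usable_limit(failure_in_group)
```

With a capacity of 1000, a usage of 850, and a reserve of 125, a write that needs 50 units of physical area is refused in normal operation but permitted once a failure has occurred in the parity group, which is the behavior needed for restored data to reach the spare stripes.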

First, management information related to the rebuild process described in the present embodiment will be described among the management information of the storage apparatus 1. FIG. 8 shows an example of the logical physical capacity management table 800. The logical physical capacity management table 800 is a table for recording information on the capacity (logical capacity, physical capacity) of each storage device, and is stored in the memory 13.

Information about one storage device is stored in each row (record) of the logical physical capacity management table 800. This record has columns of PG # (801), PDEV # (802), logical capacity (803), physical capacity (804), logical usage (805), physical usage (806), and status (807).

PDEV # (802) is an identification number of the storage device. PG # (801) is the identification number of the parity group to which the storage device belongs. In this embodiment, the identification number of the storage device is referred to as “PDEV number”. The PDEV number may be written as “PDEV #”. Further, the identification number of the parity group is called “PG number” and may be written as “PG #”.

The logical capacity (803) represents the logical capacity of the storage device, and the physical capacity (804) represents the physical capacity of the storage device.

The logical usage (805) and physical usage (806) represent the logical usage and physical usage of the storage device, respectively. The definitions of the logical usage and the physical usage in the present embodiment are as follows. The logical usage is the total size of the areas in the logical address space that have been accessed (written) by the storage controller 10. The physical usage is the total size of the physical areas allocated to the areas written by the storage controller 10. In FIG. 5, when the storage controller 10 writes only to the areas 401, 402, and 403 on the logical address space 20, and the size of each of the areas 401, 402, and 403 is 64 KB, the logical usage is 192 KB. In addition, when the data written from the storage controller 10 to each of the areas 401, 402, and 403 is compressed (here assumed to be compressed to 8 KB each) and written to the areas 411, 412, and 413 in the FM chip, the physical usage is 24 KB.
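The arithmetic of this example can be checked with a short sketch. The Python below is illustrative only; the function names and the representation of each written area as a (logical size, compressed size) pair are assumptions made for this sketch and do not appear in the embodiment.

```python
KB = 1024

def logical_usage(written_areas):
    """Total size of the logical areas that have been written."""
    return sum(logical for logical, _ in written_areas)

def physical_usage(written_areas):
    """Total size of the physical areas allocated after compression."""
    return sum(physical for _, physical in written_areas)

# Areas 401, 402, 403: 64 KB each, each assumed to compress to 8 KB
# (stored in areas 411, 412, 413 of the FM chip).
areas = [(64 * KB, 8 * KB)] * 3

print(logical_usage(areas) // KB)   # 192
print(physical_usage(areas) // KB)  # 24
```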

Status (807) represents the state of the storage device. When the status (807) is “1”, this indicates that the storage device has failed and cannot be accessed.

Among these pieces of information, for information that must be obtained from the storage device (SSD 21) (for example, physical capacity and physical usage), the storage controller 10 issues a request to the storage device (for example, periodically) to acquire it, and stores the obtained information in logical capacity (803), physical capacity (804), logical usage (805), and physical usage (806). Since the logical capacity and the physical capacity do not change, the storage controller 10 may acquire the logical capacity and the physical capacity once at the start of use of the storage device, and thereafter periodically acquire only the physical usage from the storage device.

FIG. 9 is a configuration example of the PG management table 900. The PG management table 900 is a table for managing information about parity groups, and is stored in the memory 13. Information about one parity group is stored in each record of the PG management table 900. Hereinafter, the contents of each column in the record will be described.

The column PG # (901) is a column for storing the PG number, and the drive type (902) stores information indicating the type of the storage devices constituting the parity group. In the example of FIG. 9, when “NF” is stored in the drive type (902), it indicates that the parity group is composed of SSDs 21. When “HDD” is stored, it indicates that the parity group is composed of HDDs 22. In principle, the types of storage devices used for one parity group are the same.

The capacity (903) represents the logical capacity of the storage devices constituting the parity group. In principle, the storage devices used for one parity group have the same logical capacity. However, storage devices having different logical capacities may be included in the parity group. In that case, the logical capacity of the storage device having the smallest logical capacity among the storage devices in the parity group is stored in the capacity (903).

Compression (904) indicates whether the storage device in the parity group has a compression function. When “ON” is stored in the compression (904), it means that the storage device in the parity group has a compression function, and the data written in the parity group is compressed.

The copy pointer (905) is information used at the time of rebuilding, and stores the address (LBA) of the rebuilding area in the failed storage device area (logical address space area). If none of the storage devices in the parity group has failed and the rebuild process has not been performed, an invalid value (referred to as “NULL”) is stored in the copy pointer (905). In this embodiment, the invalid value (NULL) is a value that is not normally used for the address of the storage device, such as -1.

When the storage apparatus 1 according to the present embodiment performs the rebuild process, data recovery is performed in order from the first stripe of the failed storage device. The initial value of the copy pointer (905) is 0 (the LBA of the first stripe), and when rebuilding is completed for the stripe indicated by the copy pointer (905), the copy pointer (905) is updated to the address of the next stripe. Therefore, data at addresses less than the copy pointer (905) in the area of the failed storage device has already been restored and stored in spare stripes of other storage devices, and data at addresses equal to or greater than the copy pointer (905) has not been restored yet. Details of how the copy pointer (905) is used will be described later.

FIG. 10 shows a configuration example of the depletion threshold management table 1000. As described above, when the physical area of the SSD 21 is exhausted, the storage controller 10 cannot write to the SSD 21. Therefore, the storage controller 10 needs to check whether the physical area of the SSD 21 is exhausted. The depletion threshold management table 1000 is a table for managing the usage state of the physical area, and is stored in the memory 13.

Note that the depletion threshold value management table 1000 manages the usage state of the storage devices constituting the parity group for each parity group. The record of the depletion threshold management table 1000 has columns of PG # (1001), a depletion threshold (1002), a physical capacity usage rate (1003), and an I / O suppression flag (1004). PG # (1001) is an identification number of a parity group to be managed by the record of the depletion threshold management table 1000.

Before describing the depletion threshold (1002), the physical capacity usage rate will be described. The physical capacity usage rate is a value that can be calculated for each storage device. For example, when the physical usage amount of a certain SSD 21 is u and the physical capacity is p, the physical capacity usage rate of the SSD 21 is
u ÷ p
(Note that, as shown in FIG. 10, the physical capacity usage rate (1003) is expressed as a percentage, so a value obtained by multiplying u ÷ p by 100 is stored in the physical capacity usage rate (1003)). Since the physical usage and the physical capacity of the storage device are stored in the physical usage (806) column and the physical capacity (804) column of the logical physical capacity management table 800, respectively, the storage controller 10 can calculate the physical capacity usage rate of each storage device by referring to the logical physical capacity management table 800.

However, in the storage apparatus 1 according to the present embodiment, the physical capacity usage rate of the parity group is stored and managed in the depletion threshold management table 1000 (its physical capacity usage rate (1003)). In this embodiment, the largest of the physical capacity usage rates of the storage devices constituting the parity group is defined as the physical capacity usage rate of the parity group. Therefore, the storage controller 10 calculates the physical capacity usage rate of each storage device belonging to the parity group, and stores the maximum value in the physical capacity usage rate (1003) of the depletion threshold management table 1000.
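As a minimal sketch of this calculation, the Python below derives the per-device usage rate (u ÷ p × 100) and takes the maximum over a parity group. The record type, field names, and sample values are hypothetical and chosen only for illustration; the embodiment specifies only the columns of table 800 and the use of the maximum.

```python
from dataclasses import dataclass

@dataclass
class DeviceRecord:
    """One row of a table resembling table 800 (fields are assumptions)."""
    pg: int                 # PG #
    pdev: int               # PDEV #
    physical_capacity: int  # p
    physical_usage: int     # u

def device_usage_rate(rec):
    """Physical capacity usage rate of one device, as a percentage."""
    return rec.physical_usage * 100 / rec.physical_capacity

def pg_usage_rate(records, pg):
    """Usage rate of a parity group: the maximum over its devices."""
    return max(device_usage_rate(r) for r in records if r.pg == pg)

# Hypothetical sample data: three devices in PG 0, one in PG 1.
records = [
    DeviceRecord(pg=0, pdev=0, physical_capacity=1000, physical_usage=500),
    DeviceRecord(pg=0, pdev=1, physical_capacity=1000, physical_usage=800),
    DeviceRecord(pg=0, pdev=2, physical_capacity=1000, physical_usage=650),
    DeviceRecord(pg=1, pdev=3, physical_capacity=2000, physical_usage=200),
]

print(pg_usage_rate(records, 0))  # 80.0
```

Taking the maximum rather than an average means the group is treated as depleted as soon as its fullest device reaches the threshold.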

The depletion threshold (1002) is an index value used by the storage controller 10 to determine whether or not data can be written to the parity group. Specifically, when the physical capacity usage rate (1003) of the parity group is equal to or greater than the value stored in the depletion threshold (1002), the storage controller 10 does not write data to the parity group. A value such as 90% is set as the depletion threshold (1002). The depletion threshold (1002) is a value that does not vary during normal times (when no failure occurs in the storage device), and the initially set value is maintained.

In the storage apparatus according to the present embodiment, a value slightly smaller than the value that should originally be set is set as the depletion threshold (1002) at the time of initial setting (when a parity group is defined). During the rebuild process, the depletion threshold (1002) is returned to the value that should originally be set. The value that should originally be set in the depletion threshold (1002) is called the “pre-correction depletion threshold”. Assuming that the pre-correction depletion threshold is Y% and the spare area ratio is X%, “Y% − X%” is set at the time of initialization as shown in FIG. At the time of rebuild processing, the value is returned to Y%. During normal operation (when rebuilding is not being performed), “Y% − X%” is set in the depletion threshold (1002), so the physical area to be used for the spare stripes is kept in reserve, and a shortage of the physical area in which restored data (data written to the spare stripes) is to be written can be prevented.
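The effect of the corrected threshold can be illustrated with a short sketch. The function names are hypothetical, and the values Y = 90 and X = 9 are example values only; the sketch simply applies the rule that writes are refused when the usage rate is equal to or greater than the threshold.

```python
def initial_depletion_threshold(pre_correction, spare_ratio):
    """Threshold set when the parity group is defined: Y% - X%."""
    return pre_correction - spare_ratio

def writes_allowed(pg_usage_rate, threshold):
    """Writes are refused once the usage rate reaches the threshold."""
    return pg_usage_rate < threshold

Y = 90  # pre-correction depletion threshold (example value)
X = 9   # spare area ratio (example value)

normal = initial_depletion_threshold(Y, X)  # threshold in normal operation
rebuild = Y                                 # threshold during rebuild

usage = 85  # hypothetical parity group usage rate (%)
print(normal, writes_allowed(usage, normal))    # 81 False
print(rebuild, writes_allowed(usage, rebuild))  # 90 True
```

In this example, the 9 points between 81% and 90% are the physical area that stays unused in normal operation and becomes writable once a rebuild starts.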

The logical physical capacity management table 800 and the depletion threshold management table 1000 are periodically updated by a configuration information management program that is periodically executed by the CPU 12 of the storage controller 10. By writing data from the host 2 to the logical volume, the logical usage (805) and physical usage (806) in the logical physical capacity management table 800 are updated. Accordingly, the physical capacity usage rate (1003) in the depletion threshold management table 1000 is also updated.

FIG. 11 shows a configuration example of the spare conversion table 1100. The spare conversion table 1100 is used to specify the position of the storage destination stripe (spare stripe) of the rebuilt data when the storage device fails. In each record of the spare conversion table 1100, the position information of the stripe in the failed storage device and the position information of the corresponding spare stripe are recorded.

Each column of the record of the spare conversion table 1100 will be described. PDEV # (1101) and address (1102) are stripe position information in the failed storage device. Specifically, PDEV # (1101) is the PDEV number of the failed storage device, and address (1102) is the address of the stripe in the failed storage device. Note that information used as an address is, for example, LBA (Logical Block Address).

Meanwhile, the spare PDEV # (1103) and the address (1104) are spare stripe position information. Specifically, spare PDEV # (1103) is the PDEV number of the storage device where the spare stripe exists, and address (1104) is the address of the spare stripe.

For example, in FIG. 11, the PDEV # (1101) of the first record is “0x0000”, the address (1102) is “0”, the spare PDEV # (1103) is “0x0004”, and the address (1104) is “0x1000”. This indicates that the data stored in the stripe with address 0 (the first stripe) in the PDEV whose PDEV # is 0x0000 is stored, after restoration, in the stripe (spare stripe) with address 0x1000 in the PDEV whose PDEV # is 0x0004.
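A lookup of this kind can be sketched as a dictionary keyed by the failed stripe position. The Python below is an illustrative assumption about the data structure, not the embodiment's implementation; the two rows are the ones described in this document for FIG. 11, and the function name is hypothetical.

```python
# (failed PDEV #, address) -> (spare PDEV #, spare address)
SPARE_CONVERSION_TABLE = {
    (0x0000, 0x0000): (0x0004, 0x1000),
    (0x0000, 0x0100): (0x0005, 0x1000),
}

def to_spare(pdev, address):
    """Return the spare stripe position for a failed stripe, or None."""
    return SPARE_CONVERSION_TABLE.get((pdev, address))

spare_pdev, spare_addr = to_spare(0x0000, 0x0100)
print(hex(spare_pdev), hex(spare_addr))  # 0x5 0x1000
```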

In this embodiment, each record in the spare conversion table 1100 is created when any storage device in the parity group fails. However, each record in the spare conversion table 1100 may be created before the storage device fails. For example, the storage controller 10 may create each record in the spare conversion table 1100 at the start of using the parity group. Further, the method of determining the spare stripe (the spare stripe specified by spare PDEV # (1103) and address (1104)) to be associated with a stripe of the failed storage device (the stripe specified by PDEV # (1101) and address (1102)) is arbitrary and is not directly related to the present embodiment, so description thereof is omitted here.

The storage apparatus 1 also stores information other than the management information described above in the memory 13. For example, the information for managing the mapping between the logical volume and the virtual parity group described with reference to FIG. 4 and the information for managing the mapping between the virtual parity group and the parity group described with reference to FIG. are also stored. However, since these pieces of information are not directly related to the description of the present embodiment, the description is omitted.

Hereinafter, the flow of various processes performed by the storage apparatus 1 will be described. The storage apparatus 1 stores several programs for performing various processes in the memory 13. Here, only an outline of the programs related to the description of the present embodiment will be given. The memory 13 stores an I / O program for processing an I / O command (read command or write command; hereinafter simply referred to as “command”) received from the host 2, a destage program for writing data to the storage device, a depletion check program for checking whether the physical area is depleted, and a rebuild program for performing the rebuild process.

First, the flow of I / O processing will be described with reference to FIG. The I / O processing is performed when an I / O command for a logical volume is received from the host 2, and is executed by the I / O program. In the following, to keep the explanation simple, an example is described in which the access range (area on the logical volume) specified by the I / O command from the host 2 is smaller than one stripe and does not span a plurality of stripes.

Step 1201: The I / O program converts the address of the logical volume specified by the I / O command into an address on the virtual storage device constituting the virtual parity group.

Step 1202: The I / O program converts the address on the virtual storage device obtained in Step 1201 into the PDEV number of the storage device (SSD 21) constituting the parity group and the address on this storage device. In this embodiment, this conversion process is sometimes referred to as “VP conversion”. The address obtained here is an address on the logical address space provided by the SSD 21 to the storage controller 10.

Step 1203: The I / O program determines whether the I / O command received from the host 2 is a write command or a read command. If the command is a write command (step 1203: Y), then step 1204 is performed. If the command is a read command (step 1203: N), then step 1211 is performed.

Steps 1204 and 1205: In step 1204, the I / O program performs a depletion check process. Although details of this process will be described later, the I / O program calls the depletion check program to cause the depletion check program to perform the depletion check process. If it is detected as a result of step 1204 that the physical area has been depleted (step 1205: Y), the CPU 12 next executes step 1208. If the physical area has not been depleted (step 1205: N), step 1206 is performed next.

Step 1206: When a write command is issued from the host 2, data (write data) to be written to the logical volume is sent from the host 2 to the storage device 1 together with the write command. In step 1206, the I / O program stores this write data in the cache. Note that when data is stored in the cache, the I / O program stores the data with write position information (specifically, PDEV number and LBA). The data stored in the cache is later written to the storage device (at an appropriate timing). The process in which the storage controller 10 writes data to the storage device is called “destage”. The flow of destage processing will be described later.

Step 1207: The I / O program responds to the host 2 that the write process is completed, and ends the process.

Step 1208: This process is performed when it is determined that the physical area is depleted. Here, the I / O program does not write the write data sent from the host 2, responds to the host 2 that the data cannot be written, and ends the processing.
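The write path of steps 1203 to 1208 can be summarized in a short sketch. The Python below is a simplified illustration under several assumptions: the cache is modeled as a list of dictionaries, the depletion check is reduced to a comparison of two numbers, and the function names and return strings are hypothetical. Address conversion (steps 1201 and 1202) is omitted.

```python
def is_depleted(usage_rate, threshold):
    """Simplified depletion check: usage at or above the threshold."""
    return usage_rate >= threshold

def handle_write(cache, pdev, lba, data, usage_rate, threshold):
    """Sketch of the write branch of the I/O program."""
    # Steps 1204 and 1205: depletion check before accepting the data.
    if is_depleted(usage_rate, threshold):
        # Step 1208: reject the write and report the failure to the host.
        return "WRITE_REJECTED"
    # Step 1206: store the data in the cache with its write position.
    cache.append({"pdev": pdev, "lba": lba, "data": data, "dirty": True})
    # Step 1207: report completion; destaging happens later.
    return "WRITE_COMPLETE"

cache = []
print(handle_write(cache, 0x0001, 0x0200, b"abc", 70, 81))  # WRITE_COMPLETE
print(handle_write(cache, 0x0001, 0x0300, b"def", 85, 81))  # WRITE_REJECTED
print(len(cache))                                           # 1
```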

Step 1211: The I / O program refers to the status (807) in the logical physical capacity management table 800 to determine whether the access destination storage device obtained in Step 1202 is inaccessible (that is, whether the status (807) is “1”). When access is impossible (step 1211: Y), step 1212 is performed next, and when access is possible (step 1211: N), step 1214 is performed next.

Step 1212: The I / O program compares the address of the access target storage device obtained in Step 1202 with the copy pointer (905) of the parity group to which the access target storage device belongs. If the copy pointer (905) is larger than the address of the access target storage device (step 1212: Y), it means that the data stored at that address has been restored by the rebuild process. In this case, step 1214 (normal read processing) is performed next. Conversely, when the address of the access target storage device is not less than the copy pointer (905) (step 1212: N), step 1213 (collection read processing) is performed next.

Step 1213: Here, collection read processing is performed. Since the collection read process is a known process, only the outline will be described here. The I / O program reads data from all stripes (data stripe and parity stripe) belonging to the same stripe line as the access target stripe, and restores the read target data using the read data.

Step 1214: The I / O program performs normal read processing. Unlike the collection read process, the normal read process refers to a process of reading data stored in the storage device as it is without performing data restoration. In step 1214, if the access target storage device has not failed (if the determination in step 1211 is negative), the I / O program reads data from the access target storage device. On the other hand, if the access target storage device is out of order (if the determination in step 1211 is affirmative), the data (restored data) is recorded in the spare stripe. Therefore, the I / O program refers to the spare conversion table 1100 to identify the spare stripe where the restored data is stored, and reads the data from the identified spare stripe.

For example, consider a case in which the contents of the spare conversion table 1100 are as shown in FIG. 11 and the PDEV # and the address obtained in step 1202 are 0x0000 and 0x0100, respectively. Referring to the second row of the spare conversion table 1100 (the row in which PDEV # (1101) and address (1102) are 0x0000 and 0x0100, respectively), spare PDEV # (1103) and address (1104) are 0x0005 and 0x1000, respectively. In this case, the I / O program reads data from the address 0x1000 of the storage device whose PDEV # is 0x0005.

Step 1215: The I / O program returns the data read in Step 1214 or the data restored in Step 1213 to the host 2 and ends the process.
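The branch structure of the read path (steps 1211 to 1214) can be sketched as follows. This Python is only an illustration: the function name and the returned labels are hypothetical, and the sketch assumes that the copy pointer holds a valid address (that is, a rebuild is in progress) whenever the device has failed.

```python
def choose_read_path(device_failed, address, copy_pointer):
    """Decide how a read request is served (sketch of steps 1211-1214)."""
    # Step 1211: N. The device is accessible, so read it directly.
    if not device_failed:
        return "normal_read"
    # Step 1212: Y. The address is below the copy pointer, so the data
    # has already been restored; read it from the spare stripe found
    # through the spare conversion table (step 1214).
    if address < copy_pointer:
        return "read_spare_stripe"
    # Step 1212: N. Not yet restored; rebuild the data on the fly
    # from the other stripes of the stripe line (step 1213).
    return "collection_read"

print(choose_read_path(False, 0x0100, 0x0200))  # normal_read
print(choose_read_path(True, 0x0100, 0x0200))   # read_spare_stripe
print(choose_read_path(True, 0x0200, 0x0200))   # collection_read
```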

Next, the flow of processing (destage) when the storage controller 10 writes data to the storage device will be described with reference to FIG. The destage process is executed when the data stored in the cache in step 1206 described above is written to the storage device, or when the storage controller 10 writes the restored data to the spare stripe in the rebuild process (described later). This process is performed by the destage program.

Step 1701: The destage program selects data that has not yet been written to the storage device from the data on the cache, and obtains the write position information assigned to the data. The destage program then performs the depletion check process by calling the depletion check program. At this time, the destage program passes the write position information to the depletion check program, which determines whether the physical area of the write destination storage device is depleted.

Step 1702: If it is determined in step 1701 that the physical area is exhausted, step 1704 is performed next. If the physical area is not exhausted, step 1703 is performed next.

Step 1703: The destage program issues a write command to the storage device that is the data write destination, writes the data, and ends the process.

Step 1704: The destage program ends the process without writing to the storage device.
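Steps 1701 to 1704 can be sketched as below. The cache layout, the callback parameters, the "dirty" flag, and the return strings are assumptions made for this illustration; the embodiment states only that unwritten data is selected, checked for depletion, and then either written or left unwritten.

```python
def destage(cache, is_depleted, write_to_device):
    """Write one not-yet-written cache entry to its device (sketch)."""
    for entry in cache:
        if not entry["dirty"]:
            continue
        # Steps 1701 and 1702: depletion check for the write position.
        if is_depleted(entry["pdev"], entry["lba"]):
            return "skipped"  # Step 1704: end without writing.
        # Step 1703: issue the write to the destination device.
        write_to_device(entry["pdev"], entry["lba"], entry["data"])
        entry["dirty"] = False  # bookkeeping assumed for this sketch
        return "written"
    return "idle"  # nothing left to destage

device = {}
cache = [{"pdev": 1, "lba": 0, "data": b"x", "dirty": True}]

def store(pdev, lba, data):
    device[(pdev, lba)] = data

print(destage(cache, lambda pdev, lba: False, store))  # written
print(destage(cache, lambda pdev, lba: False, store))  # idle
```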

Subsequently, the flow of the depletion check process performed in step 1204 will be described with reference to FIG. This process is executed by the depletion check program. The depletion check program starts processing by being called from the I / O program (or rebuild program). When the depletion check program is called, the PDEV number of the access target storage device and the address (LBA) of the access target stripe are received from the caller program (I / O program or the like).

Step 1303: The depletion check program determines whether the access target storage device has a compression function. Specifically, the depletion check program first refers to the logical physical capacity management table 800 to identify the parity group number of the parity group to which the access target storage device belongs, and then refers to the PG management table 900. It is determined whether the compression (904) of the parity group to which the access target storage device belongs is “ON”. When the access target storage device does not have a compression function (step 1303: N), step 1307 is performed next. When the access target storage device has a compression function (step 1303: Y), step 1305 is performed next.

Step 1305: The depletion check program refers to the depletion threshold management table 1000 to determine whether the physical capacity usage rate (1003) of the parity group to which the access target area belongs is less than the depletion threshold (1002). When the physical capacity usage rate (1003) is equal to or greater than the depletion threshold (1002) (step 1305: N), the depletion check program next executes step 1306. When the physical capacity usage rate (1003) is less than the depletion threshold (1002) (step 1305: Y), step 1307 is executed next.

Step 1306: The depletion check program determines that physical area depletion has occurred in the parity group to be accessed, returns to the calling program that the physical area has been depleted, and ends the process.

Step 1307: The depletion check program returns to the calling program that the physical area has not been depleted and ends the processing.
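The depletion check of steps 1303 to 1307 can be sketched as a single function. The dictionaries standing in for tables 800, 900, and 1000, their key names, and the sample values are hypothetical; the function returns True when the physical area is judged to be depleted.

```python
def depletion_check(pdev, device_to_pg, pg_table, threshold_table):
    """Return True if the parity group of `pdev` is depleted (sketch)."""
    # Step 1303: find the parity group (table 800) and check whether
    # its devices have a compression function (table 900).
    pg = device_to_pg[pdev]
    if pg_table[pg]["compression"] != "ON":
        return False  # Step 1307: not depleted
    # Step 1305: compare the usage rate with the depletion threshold
    # (table 1000).
    row = threshold_table[pg]
    if row["usage_rate"] < row["threshold"]:
        return False  # Step 1307: not depleted
    return True       # Step 1306: depleted

# Hypothetical table contents for illustration.
device_to_pg = {0x0000: 0, 0x0010: 1}
pg_table = {0: {"compression": "ON"}, 1: {"compression": "OFF"}}
threshold_table = {
    0: {"threshold": 81, "usage_rate": 85},
    1: {"threshold": 81, "usage_rate": 99},
}

print(depletion_check(0x0000, device_to_pg, pg_table, threshold_table))  # True
print(depletion_check(0x0010, device_to_pg, pg_table, threshold_table))  # False
```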

Subsequently, the flow of the rebuild process will be described with reference to FIG. When the rebuild program is executed by the CPU 12, a rebuild process is performed. The rebuild process is started when access from the storage controller 10 becomes impossible because one (or a plurality of) storage devices in the parity group have failed. Hereinafter, the failed storage device is referred to as “failed device”, and the parity group to which the failed device belongs is referred to as “failed PG”.

It should be noted that the rebuild process can be executed in parallel with the I / O process (FIG. 12). That is, the storage apparatus 1 can execute the rebuild process while receiving an I / O command from the host 2.

Step 1401: The rebuild program determines a spare stripe corresponding to each stripe of the failed device and records the information in the spare conversion table 1100.

Step 1402: If the failed device has a compression function (this can be determined by referring to the compression (904) column of the failed PG), the rebuild program returns the depletion threshold (1002) of the failed PG to its original value (the pre-correction depletion threshold). For example, when the spare area ratio of the failed PG is 9%, the rebuild program adds 9% to the depletion threshold (1002).

Step 1403: The rebuild program initializes the copy pointer (905) of the failed PG (sets it to 0).

Step 1404, Step 1405: The rebuild program executes data restoration of the stripe specified by the copy pointer (905) (hereinafter referred to as “repair target stripe”). This process is similar to the known RAID technology.

Specifically, in step 1404, the rebuild program identifies the positions of the stripes belonging to the same stripe line as the repair target stripe (the PDEV # of the storage device to which each stripe belongs and the LBA at which the stripe exists), and reads those stripes. To identify these positions, the rebuild program obtains the virtual storage device in the virtual parity group to which the repair target stripe belongs and the address on that virtual storage device, and based on these obtains the address (PDEV # and LBA of each storage device) of each stripe belonging to the same stripe line. A description of this processing is omitted.

Subsequently, in step 1405, the rebuild program restores the data using the read stripes, and writes the restored data to the spare stripe. Since the information of the spare stripe corresponding to the repair target stripe is recorded in the spare conversion table 1100 generated in step 1401, the rebuild program identifies the spare stripe to which the restored data is to be written based on this information, and writes the restored data there.

Step 1406: The rebuild program updates the copy pointer (905) of the failed PG. Specifically, the rebuild program adds the stripe size to the copy pointer (905) so that the address pointed to by the copy pointer (905) becomes the stripe next to the repair target stripe. However, the stripe next to the repair target stripe may be a spare stripe. In that case, the rebuild program adds the stripe size to the copy pointer (905) again. This is because the spare stripe is not used in normal operation and there is no need to restore its data.

Step 1410: The rebuild program determines whether the copy pointer (905) of the failed PG has exceeded the end LBA of the failed device. If the copy pointer (905) of the failed PG has not yet exceeded the end LBA of the failed device (step 1410: N), the rebuild program returns to step 1404. Conversely, when the copy pointer (905) exceeds the end LBA of the failed device (step 1410: Y), step 1411 is performed next.

Step 1411: The rebuild program stores an invalid value (called “NULL”) in the copy pointer (905) of the failed PG and ends the process. Specifically, NULL is a value that is not normally used for the address of the storage device (for example, −1).
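The control flow of steps 1402 to 1411 can be sketched as a loop over the copy pointer. This Python is a simplified illustration: the dictionary standing in for the failed PG's management records, the callbacks `is_spare` and `restore_stripe`, and the sample values are all hypothetical. Step 1401 (building the spare conversion table) and the actual data restoration are left to the callbacks, and the sketch keeps skipping while the pointer lands on a spare stripe.

```python
NULL = -1  # invalid copy pointer value

def rebuild(pg, end_lba, stripe_size, is_spare, restore_stripe,
            spare_ratio, compressed=True):
    """Sketch of the rebuild loop of the first embodiment."""
    if compressed:
        pg["depletion_threshold"] += spare_ratio  # Step 1402
    pg["copy_pointer"] = 0                        # Step 1403
    while pg["copy_pointer"] <= end_lba:          # Step 1410: N
        restore_stripe(pg["copy_pointer"])        # Steps 1404 and 1405
        pg["copy_pointer"] += stripe_size         # Step 1406
        # Spare stripes hold no data in normal operation, so skip them.
        while pg["copy_pointer"] <= end_lba and is_spare(pg["copy_pointer"]):
            pg["copy_pointer"] += stripe_size
    pg["copy_pointer"] = NULL                     # Step 1411

# Hypothetical failed device: five stripes of 0x100 blocks,
# with the stripe at LBA 0x200 being a spare stripe.
pg = {"depletion_threshold": 81, "copy_pointer": NULL}
restored = []
rebuild(pg, end_lba=0x4FF, stripe_size=0x100,
        is_spare=lambda lba: lba == 0x200,
        restore_stripe=restored.append, spare_ratio=9)

print([hex(lba) for lba in restored])  # ['0x0', '0x100', '0x300', '0x400']
print(pg["depletion_threshold"], pg["copy_pointer"])  # 90 -1
```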

In the storage apparatus according to the present embodiment, when the rebuild process is started, the depletion threshold is updated (step 1402). Before the rebuild process, the depletion threshold is kept lower than the original value (reduced by the amount corresponding to the spare area ratio), so writing to part of the physical area is effectively impossible. This is to prevent depletion of the physical area for storing the data written to the spare stripes during the rebuild process (that is, to reserve the physical area for data to be written to the spare stripes). On the other hand, when the rebuild process is executed, the depletion threshold is increased by step 1402, so that the rebuild program can write data to the physical area reserved before the rebuild process.

Subsequently, a second embodiment will be described. Since the configuration of the storage apparatus according to the second embodiment is the same as that of the storage apparatus according to the first embodiment, the illustration of the configuration is omitted.

In the storage device according to the second embodiment, the contents performed in the rebuild process are slightly different from those described in the first embodiment. With reference to FIG. 15, the rebuild process performed in the storage apparatus according to the second embodiment will be described. Since many parts of this process are the same as those described in the first embodiment, the following description focuses on the differences.

First, in the rebuild process in the second embodiment, the process in step 1402 described in the first embodiment is not performed. Instead, step 1407 is performed after step 1406 (processing for updating the copy pointer).

In step 1407, the rebuild program updates the depletion threshold (1002) of the depletion threshold management table 1000. Although details of this process will be described later, unlike step 1402 described in the first embodiment, the depletion threshold is changed gradually according to the progress of the rebuild process.

Except for the points described above, the process executed by the rebuild program in the second embodiment is the same as that described in the first embodiment. Next, details of the processing performed in step 1407 will be described with reference to FIG.

Step 1601: The rebuild program determines whether the failed device has a compression function. This determination is the same as step 1303 described in the first embodiment. If the failed device does not have a compression function (step 1601: N), the process ends. On the other hand, if the failed device has a compression function (step 1601: Y), then step 1602 is performed.

Step 1602: The rebuild program calculates the progress rate of the rebuild process. In the present embodiment, an index value indicating the progress of rebuilding the failed device is referred to as the “copy progress rate”. When the copy progress rate is Z, the end LBA of the failed device is c, and the copy pointer (905) of the failed PG is p, the copy progress rate Z is obtained by the following equation (Z is a value expressed as a percentage).
Z = p ÷ c × 100
Therefore, when the rebuild process is not performed at all, Z is 0, and Z is 100 when the rebuild process is completed.

Step 1603: The rebuild program calculates the depletion threshold value of the failure PG (value to be stored in the depletion threshold value (1002) of the depletion threshold value management table 1000) using the copy progress rate obtained in Step 1602. Specifically, the depletion threshold is obtained by the following formula.
Depletion threshold = (pre-correction depletion threshold) − (spare area ratio) ÷ 100 × (100 − Z)

That is, when Z is 0 (when the rebuild process has not been performed at all), the depletion threshold is equal to “pre-correction depletion threshold − spare area ratio”, and as the copy pointer increases (that is, as the rebuild process is performed for more stripes), the depletion threshold increases gradually. When the rebuild process is completed, the depletion threshold becomes equal to the pre-correction depletion threshold.

Step 1604: Finally, the rebuild program stores the depletion threshold obtained in Step 1603 in the depletion threshold management table 1000 (the depletion threshold (1002)), and ends the process.
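The two formulas used in steps 1602 and 1603 can be tried out with a short sketch. The function names are hypothetical, and the pre-correction threshold of 90% and spare area ratio of 9% are example values only.

```python
def copy_progress_rate(copy_pointer, end_lba):
    """Z = p / c * 100 (step 1602)."""
    return copy_pointer / end_lba * 100

def depletion_threshold(pre_correction, spare_ratio, z):
    """Threshold = Y - X / 100 * (100 - Z) (step 1603)."""
    return pre_correction - spare_ratio / 100 * (100 - z)

Y, X = 90, 9    # example values, in percent
end_lba = 1000  # hypothetical end LBA of the failed device

for copy_pointer in (0, 500, 1000):
    z = copy_progress_rate(copy_pointer, end_lba)
    # Rounded for display; the threshold moves from Y - X up to Y.
    print(z, round(depletion_threshold(Y, X, z), 2))
```

With these values the threshold starts at Y − X when Z is 0 and reaches Y when Z is 100, which matches the behavior described above.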

Subsequently, the flow of the depletion check process in the second embodiment will be described with reference to FIG. However, since this process is mostly the same as the depletion check process described in the first embodiment, differences from the first embodiment will be described below.

In the depletion check program in the second embodiment, when it is determined in step 1303 that the access target storage device has the compression function, step 1304 is executed before step 1305. In step 1304, the depletion check program determines whether or not the access target area is a spare stripe. If the area is not a spare stripe (step 1304: N), next step 1305 is executed as in the processing described in the first embodiment.

On the other hand, when the access target area is a spare stripe (step 1304: Y), the depletion check program next executes step 1307. In other words, in the case of a spare stripe, the depletion check program returns to the calling program a result indicating that the physical area is not depleted (step 1307), regardless of whether the physical capacity usage rate (1003) of the parity group to which the access target area belongs is equal to or higher than the depletion threshold (1002). The depletion check process in the second embodiment is the same as that described in the first embodiment except that step 1304 is added.
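The decision flow of the depletion check in the second embodiment can be sketched in Python as follows. This is an illustrative sketch only: the function name and parameters are hypothetical, the handling of a storage device without the compression function is an assumption, and the comparison in step 1305 is inferred from the description of the depletion threshold rather than quoted from the flowchart.

```python
def is_depleted(has_compression: bool, is_spare_stripe: bool,
                usage_rate: float, threshold: float) -> bool:
    """Return True when the physical area is treated as depleted (write suppressed)."""
    if not has_compression:
        # Assumption: devices without the compression function skip the check.
        return False
    if is_spare_stripe:
        # Step 1304: Y. Report "not depleted" regardless of the usage rate.
        return False
    # Step 1305 (inferred): compare the physical capacity usage rate (1003)
    # with the depletion threshold (1002).
    return usage_rate >= threshold
```

With this structure, a write of restored data to a spare stripe is never rejected by the depletion check, while a normal write is rejected once the usage rate reaches the threshold.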

In this way, in the storage apparatus according to the second embodiment, the depletion threshold increases gradually as the rebuild process progresses. This prevents a situation in which normal write processing (processing that stores write data from the host 2) occurring during the rebuild process consumes too much of the physical area, so that the data restored by the rebuild process can no longer be stored. In addition, the depletion check process skips the depletion check when the access target area is a spare stripe (data can be written to the storage device even if the physical capacity usage rate exceeds the depletion threshold), so the data restored by the rebuild process can be reliably written to the storage device (spare stripe).

Note that the depletion check process (FIG. 18) described in the second embodiment may be performed in the storage apparatus according to the first embodiment.

As a third embodiment, a method in which the storage device physical area management method according to the first embodiment is slightly modified will be described. Since the configuration of the storage apparatus according to the third embodiment is the same as that of the storage apparatus according to the first or second embodiment, its illustration is omitted.

In the first or second embodiment, when a parity group is defined, a value obtained by subtracting the spare area ratio from the pre-correction depletion threshold is set as the depletion threshold (the depletion threshold (1002) of the depletion threshold management table 1000), thereby securing the physical area for the spare stripes (the physical area for writing the data restored by the rebuild process). However, the physical area for the spare stripes may be secured by other methods. In the third embodiment, two such methods will be described.

The first method is to treat part of the physical area as (apparently) already in use. Specifically, at the time of initial setting, the configuration information management program adds a value corresponding to the spare area ratio in advance to the physical usage (806) of the logical physical capacity management table 800. In this case, the pre-correction depletion threshold is set as the depletion threshold (1002) (that is, it is not necessary to set (pre-correction depletion threshold − spare area ratio)).

For example, when the physical capacity of a storage device is P (TB) and the spare area ratio is S (%), the configuration information management program sets (P × S ÷ 100) in the physical usage (806) of the storage device at the initial setting. During normal times (when the rebuild process is not executed), the configuration information management program updates the physical usage (806) of the logical physical capacity management table 800 using the physical usage acquired from the storage device. At that time, the value set in the physical usage (806) should be the physical usage acquired from the storage device + (P × S ÷ 100).

When the configuration information management program updates the physical capacity usage rate (1003) of the depletion threshold management table 1000, it uses the physical capacity (804) and the physical usage (806) of the logical physical capacity management table 800. Consequently, a value larger than the actual physical capacity usage rate (a value to which the spare area ratio has been added) is stored in the physical capacity usage rate (1003). As a result, the physical area corresponding to the spare area ratio remains unused.

When the rebuild process is started, in step 1402 the rebuild program, instead of correcting the depletion threshold (1002), subtracts (P × S ÷ 100) from the physical usage (806) of the storage device to return it to the actual physical usage. As a result, during the rebuild process, the physical area of (P × S ÷ 100) (TB) that had not been used until then becomes usable.
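The first method can be illustrated by the following minimal Python sketch, in which the recorded physical usage is inflated by (P × S ÷ 100) during normal times and the inflation is removed once the rebuild process starts. The function names and the boolean flag are assumptions introduced for illustration.

```python
def recorded_physical_usage(device_usage_tb: float, capacity_tb: float,
                            spare_area_ratio: float, rebuilding: bool) -> float:
    """Return the value to record as the physical usage (806), in TB."""
    reserve_tb = capacity_tb * spare_area_ratio / 100  # P x S / 100
    if rebuilding:
        # After step 1402 the actual usage reported by the device is used.
        return device_usage_tb
    # Normal times: part of the physical area is treated as already in use.
    return device_usage_tb + reserve_tb


def physical_capacity_usage_rate(usage_tb: float, capacity_tb: float) -> float:
    """Return the physical capacity usage rate (1003) in percent."""
    return usage_tb / capacity_tb * 100
```

For a hypothetical device with P = 10 TB, S = 10% and 4 TB actually used, the recorded usage is 5 TB during normal times and 4 TB during the rebuild process, so the usage rate compared against the unchanged depletion threshold drops once the rebuild starts.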

The second method is to reduce the physical capacity in advance. Specifically, at the time of initial setting, the configuration information management program sets a value obtained by subtracting a value corresponding to the spare area ratio from the physical capacity of the original storage device in the physical capacity (804) of the logical physical capacity management table 800. As in the first method, the pre-correction depletion threshold may be set as the depletion threshold (1002).

For example, when the original physical capacity of a storage device is P (TB) and the spare area ratio is S (%), the configuration information management program sets (P − P × S ÷ 100) in the physical capacity (804) of the storage device at the initial setting.

When the rebuild process is started, the rebuild program returns the physical capacity (804) of the storage device to the original physical capacity instead of correcting the depletion threshold (1002) in step 1402. As a result, during the rebuild process, the physical area for (P × S ÷ 100) (TB) that has not been used until now can be used.
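The second method can likewise be sketched in Python. Here the recorded physical capacity, rather than the physical usage, is adjusted. The function names and the boolean flag are assumptions introduced for illustration.

```python
def recorded_physical_capacity(original_capacity_tb: float,
                               spare_area_ratio: float, rebuilding: bool) -> float:
    """Return the value to record as the physical capacity (804), in TB."""
    if rebuilding:
        # After step 1402 the original physical capacity is restored.
        return original_capacity_tb
    # Normal times: P - P x S / 100
    return original_capacity_tb - original_capacity_tb * spare_area_ratio / 100


def physical_capacity_usage_rate(usage_tb: float, capacity_tb: float) -> float:
    """Return the physical capacity usage rate (1003) in percent."""
    return usage_tb / capacity_tb * 100
```

For a hypothetical device with P = 10 TB and S = 10%, the recorded capacity is 9 TB during normal times, so the same physical usage yields a higher usage rate than it does after the capacity is returned to 10 TB at the start of the rebuild process.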

In the above description, the method of returning the physical usage (806) or the physical capacity (804) of the storage device to its original value at the start of rebuilding (step 1402) has been described. However, as with the storage apparatus according to the second embodiment, the storage apparatus according to the third embodiment may return the physical usage (806) or the physical capacity (804) of the storage device to its original value gradually, in accordance with the progress of the rebuild process. Also in the storage apparatus according to the third embodiment, the depletion check process described in the second embodiment (the process of FIG. 18, that is, a method that does not perform the depletion check when writing to a spare stripe) may be performed.

The embodiment of the present invention has been described above, but this is an example for explaining the present invention, and is not intended to limit the present invention to the embodiment described above. The present invention can be implemented in various other forms. For example, in the storage apparatus 1 described in the embodiment, the example in which the storage device is an SSD (storage apparatus using a flash memory) has been described, but the type of storage medium used for the storage device is not limited to the flash memory. Any storage device may be used as long as it has a compression function. For example, when the external storage apparatus 4 illustrated in FIG. 1 has a compression function, the external storage apparatus 4 may be used as a storage device.

In the above description, the storage area on the logical volume provided to the host and the storage area of the (virtual) parity group are statically mapped (the storage area on the (virtual) parity group to which each storage area of the logical volume is mapped is uniquely determined at the time of definition). However, the relationship between the storage area of the logical volume and the storage area (stripe line) of the parity group is not limited to a fixed (static) configuration. For example, a logical volume formed using the well-known thin provisioning technique may be provided to the host. In this case, a storage area of the parity group (or a stripe line) is allocated to a storage area on the logical volume only when a write request for that storage area is received from the host computer.
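The allocate-on-write behavior of thin provisioning mentioned above can be illustrated by the following minimal Python sketch. The class, its methods, and the use of integers to identify logical volume areas and stripe lines are assumptions introduced for illustration and are not part of the embodiment.

```python
class ThinVolume:
    """Maps logical volume areas to stripe lines only when they are first written."""

    def __init__(self, free_stripe_lines):
        self._free = list(free_stripe_lines)  # stripe lines not yet allocated
        self._map = {}                        # logical volume area -> stripe line

    def is_allocated(self, volume_area: int) -> bool:
        return volume_area in self._map

    def write(self, volume_area: int) -> int:
        """Return the stripe line backing volume_area, allocating one on first write."""
        if volume_area not in self._map:
            if not self._free:
                raise RuntimeError("no unallocated stripe line remains")
            self._map[volume_area] = self._free.pop(0)
        return self._map[volume_area]
```

In this sketch, an area that has never been written consumes no stripe line, and repeated writes to the same area reuse the stripe line allocated on the first write.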

Further, the components described as programs in the embodiments may be realized by hardware using hard wired logic or the like. In addition, each program in the embodiment may be stored in a storage medium such as a CD-ROM or DVD and provided.

1: storage apparatus, 2: host, 3: SAN, 10: storage controller (DKC), 11: disk unit, 12: processor (CPU), 13: memory, 14: front-end interface (FE I/F), 15: back-end interface (BE I/F), 16: management I/F, 17: internal switch, 21: storage device (SSD), 22: storage device (HDD), 200: SSD controller, 201: processor (CPU), 202: upstream I/F, 203: downstream I/F, 204: memory, 205: internal connection switch, 206: FM chip, 207: compression/decompression circuit

Claims

A storage apparatus comprising:
a storage controller that accepts write data from a host computer; and
a plurality of storage devices each having a predetermined amount of physical area,
wherein each storage device is configured to provide the storage controller with a logical storage space larger than the capacity of the physical area, to compress data written from the storage controller to the logical storage space, and to store the compressed data in the physical area,
the storage controller manages more than (N + M) of the storage devices as a parity group (N ≧ 1, M ≧ 1),
the storage controller is configured to generate M parities from N pieces of data of a predetermined size written from the host computer, and to store the N pieces of data and the M parities in mutually different storage devices in the parity group,
the logical storage space provided by each storage device includes, in addition to data stripes that are areas for storing the data and parity stripes that are areas for storing the parities, spare stripes that are areas for storing data restored when a failure occurs in a storage device, and
the storage controller
normally suppresses use of the amount of physical area needed to store the data written to the spare stripes, and
when a failure occurs in a storage device, permits use of the amount of physical area needed to store the data written to the spare stripes.
The storage apparatus according to claim 1, wherein
the storage controller is configured to acquire, from the storage device, a physical usage that is the amount of the physical area the storage device uses to store the compressed data, and to suppress writing to the storage device when a physical capacity usage rate, which is the ratio of the physical usage to the capacity of the physical area, is equal to or greater than a predetermined depletion threshold, and
the storage controller normally uses, as the depletion threshold, a value obtained by subtracting, from the value used when a failure has occurred in the storage device, an amount corresponding to the amount of physical area needed to store the data written to the spare stripes.
The storage apparatus according to claim 2, wherein
when a failure occurs in the storage device, the storage controller starts a rebuild process that restores the data stored in the failed storage device, and
the storage controller increases the depletion threshold in accordance with the progress of the rebuild process.
The storage apparatus according to claim 3, wherein
when writing data to the spare stripe, the storage controller does not suppress writing to the storage device even when the physical capacity usage rate is equal to or greater than the depletion threshold.
The storage apparatus according to claim 1, wherein
the storage controller is configured to acquire, from the storage device, a physical usage that is the amount of the physical area the storage device uses to store the compressed data, and to suppress writing to the storage device when a physical capacity usage rate, which is the ratio of the physical usage to the capacity of the physical area, is equal to or greater than a predetermined depletion threshold,
the storage controller normally calculates the physical capacity usage rate using a value obtained by adding, to the physical usage acquired from the storage device, the amount of physical area needed to store the data written to the spare stripes, and
when a failure occurs in the storage device, the storage controller calculates the physical capacity usage rate using the physical usage acquired from the storage device.
The storage apparatus according to claim 1, wherein
the storage controller is configured to acquire, from the storage device, a physical usage that is the amount of the physical area the storage device uses to store the compressed data, and to suppress writing to the storage device when a physical capacity usage rate, which is the ratio of the physical usage to the capacity of the physical area, is equal to or greater than a predetermined depletion threshold,
the storage controller normally calculates the physical capacity usage rate by dividing the physical usage by a value obtained by subtracting, from the capacity of the physical area, the amount of physical area needed to store the data written to the spare stripes, and
when a failure occurs in the storage device, the storage controller calculates the physical capacity usage rate by dividing the physical usage by the capacity of the physical area.
A method for controlling a storage apparatus, the storage apparatus comprising:
a storage controller that accepts write data from a host computer; and
a plurality of storage devices each having a predetermined amount of physical area,
wherein each storage device is configured to provide the storage controller with a logical storage space larger than the capacity of the physical area, to compress data written from the storage controller to the logical storage space, and to store the compressed data in the physical area,
the storage controller manages more than (N + M) of the storage devices as a parity group (N ≧ 1, M ≧ 1),
the storage controller is configured to generate M parities from N pieces of data of a predetermined size written from the host computer, and to store the N pieces of data and the M parities in mutually different storage devices in the parity group, and
the logical storage space provided by each storage device includes, in addition to data stripes that are areas for storing the data and parity stripes that are areas for storing the parities, spare stripes that are areas for storing data restored when a failure occurs in a storage device,
the method comprising:
normally suppressing, by the storage controller, use of the amount of physical area needed to store the data written to the spare stripes; and
when a failure occurs in a storage device, permitting, by the storage controller, use of the amount of physical area needed to store the data written to the spare stripes.
The storage apparatus control method according to claim 7, wherein
the storage controller is configured to acquire, from the storage device, a physical usage that is the amount of the physical area the storage device uses to store the compressed data, and to suppress writing to the storage device when a physical capacity usage rate, which is the ratio of the physical usage to the capacity of the physical area, is equal to or greater than a predetermined depletion threshold, and
the storage controller normally uses, as the depletion threshold, a value obtained by subtracting, from the value used when a failure has occurred in the storage device, an amount corresponding to the amount of physical area needed to store the data written to the spare stripes.
The storage apparatus control method according to claim 7, wherein
the storage controller is configured to acquire, from the storage device, a physical usage that is the amount of the physical area the storage device uses to store the compressed data, and to suppress writing to the storage device when a physical capacity usage rate, which is the ratio of the physical usage to the capacity of the physical area, is equal to or greater than a predetermined depletion threshold,
the storage controller normally calculates the physical capacity usage rate using a value obtained by adding, to the physical usage acquired from the storage device, the amount of physical area needed to store the data written to the spare stripes, and
when a failure occurs in the storage device, the storage controller calculates the physical capacity usage rate using the physical usage acquired from the storage device.
The storage apparatus control method according to claim 7, wherein
the storage controller is configured to acquire, from the storage device, a physical usage that is the amount of the physical area the storage device uses to store the compressed data, and to suppress writing to the storage device when a physical capacity usage rate, which is the ratio of the physical usage to the capacity of the physical area, is equal to or greater than a predetermined depletion threshold,
the storage controller normally calculates the physical capacity usage rate by dividing the physical usage by a value obtained by subtracting, from the capacity of the physical area, the amount of physical area needed to store the data written to the spare stripes, and
when a failure occurs in the storage device, the storage controller calculates the physical capacity usage rate by dividing the physical usage by the capacity of the physical area.