WO2020082888A1

WO2020082888A1 - Method, system and apparatus for restoring data in storage system

Info

Publication number: WO2020082888A1
Application number: PCT/CN2019/103085
Authority: WO
Inventors: 蒲贵友
Original assignee: 华为技术有限公司
Priority date: 2018-10-25
Filing date: 2019-08-28
Publication date: 2020-04-30

Abstract

Embodiments of the present invention provide a method for restoring data in a storage system. The method comprises: dividing a solid state disk (SSD) into multiple failure domains, each of the failure domains being used for providing a physical address for a logical address of the SSD within a certain range, so that when failures occur to the failure domains of the SSD, it is unnecessary to reconstruct data in the whole SSD.

Description

Data recovery method, system and device in storage system

Technical field

The present invention relates to the field of information technology, and in particular, to a data recovery method, system, and device in a storage system.

Background

Redundant Array of Independent Disks (RAID) technology is widely used in storage systems to ensure data reliability. When a hard disk in the storage system is damaged, the data on the damaged hard disk can be recalculated from the data and parity data on the undamaged hard disks. This process is called RAID reconstruction.

In a RAID-based storage system composed of solid state disks (SSDs), if the reconstruction speed is 1 terabyte (TB) every 5 hours, then even a partial failure of an SSD with a capacity of 1TB takes 5 hours to reconstruct; if the SSD capacity is 100TB, the reconstruction time becomes 500 hours.
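The arithmetic above can be checked with a short sketch. The function name `rebuild_hours` and its parameters are illustrative and do not come from the source; only the rate of 5 hours per TB and the two capacities are taken from the text.

```python
def rebuild_hours(capacity_tb: float, hours_per_tb: float = 5.0) -> float:
    """Time to reconstruct a whole SSD at a fixed rate (5 hours per TB in the text)."""
    return capacity_tb * hours_per_tb


print(rebuild_hours(1))    # 5.0
print(rebuild_hours(100))  # 500.0
```

Because the cost grows linearly with the amount of data rebuilt, shrinking the rebuilt range shortens recovery in the same proportion.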

Summary of the invention

In a first aspect, a data recovery method in a storage system is provided. The storage system includes a controller, a first solid state drive (SSD), and a second SSD. The first SSD and the second SSD each include multiple fault domains. The storage system includes a block group formed based on an erasure coding algorithm, and the block group includes a first block and a second block. The address of the first block is mapped to a physical address provided by a first fault domain of the first SSD, and the address of the second block is mapped to a physical address provided by a second fault domain of the second SSD. The method includes: the controller receives fault information of the first SSD, where the fault information indicates that a fault has occurred in the first fault domain; in response to the fault information, the controller restores, according to the erasure coding algorithm, the data stored at the logical address of the first block in the block group. Both the first SSD and the second SSD include multiple fault domains, but the numbers of fault domains in the first SSD and the second SSD may differ. Therefore, compared with the prior art, the storage system in the embodiment of the present invention does not need to reconstruct the data at all logical addresses of the faulty SSD. It only needs to reconstruct the data at some of the SSD's logical addresses, namely those mapped to physical addresses in the fault domain where the fault occurred, which increases the speed of data reconstruction.

In a specific implementation, that the address of the first block is mapped to the physical address provided by the first fault domain of the first SSD includes: the address of the first block is a first logical address of the first SSD, and the first logical address is mapped to the physical address provided by the first fault domain of the first SSD. That the address of the second block is mapped to the physical address provided by the second fault domain of the second SSD includes: the address of the second block is a second logical address of the second SSD, and the second logical address is mapped to the physical address provided by the second fault domain of the second SSD. In another implementation, in a scenario where the SSDs support Open-Channel, the address of the first block is a physical address provided by the first fault domain of the first SSD, so the address of the first block is mapped directly to that physical address; likewise, the address of the second block is a physical address provided by the second fault domain of the second SSD and is mapped directly to that physical address. In another implementation, in the Open-Channel SSD scenario, the embodiment of the present invention also supports indirect mapping between the block address and the physical address provided by the fault domain.

With reference to the first aspect, in some implementations of the first aspect, the storage system stores the correspondence between the address of the first block and the first fault domain, and the correspondence between the address of the second block and the second fault domain. The address of the first block is the first logical address of the first SSD, and the address of the second block is the second logical address of the second SSD. Further, the storage system stores the correspondence between the blocks included in the block group and the fault domains. For example, the first block belongs to the first fault domain and the second block belongs to the second fault domain. Further, the storage system also stores a fault domain index table. For example, the fault domain index table contains the correspondence between fault domains and block groups. Because the same block group contains blocks from fault domains of different SSDs, different fault domains in the fault domain index table can correspond to the same block group. When a fault domain of an SSD fails, the controller can quickly find the block groups affected by the fault domain according to the fault domain index table, and can therefore quickly reconstruct the data in the affected blocks of those block groups.

Optionally, one fault domain in the first SSD and the second SSD is multiple die packages connected to one channel, or one or more die packages, or one or more dies, or one or more flash planes.

With reference to the first aspect, in some implementations of the first aspect, the responding to the fault information includes: the controller querying the correspondence between the first fault domain and the block group to determine the block group.

With reference to the first aspect, in some implementations of the first aspect, the storage system stores the correspondence between the address of the first block and the first fault domain, and the correspondence between the address of the second block and the second fault domain.

In a second aspect, a method for managing a solid state drive (SSD) is provided. The SSD includes a first fault domain and a second fault domain. The method includes: assigning logical addresses of a first range of the SSD to the first fault domain, and assigning logical addresses of a second range of the SSD to the second fault domain.

With reference to the second aspect, in some implementations of the second aspect, the method further includes: separately recording the correspondence between the first fault domain and the logical addresses of the first range, and the correspondence between the second fault domain and the logical addresses of the second range.

With reference to the second aspect, in some implementations of the second aspect, the logical addresses of the first range and the logical addresses of the second range are continuous logical addresses; or the logical addresses of the first range and the logical addresses of the second range are discontinuous logical addresses.

With reference to the second aspect, in some implementations of the second aspect, the method further includes: the SSD sends, to a controller of a storage system, the correspondence between the first fault domain and the logical addresses of the first range and the correspondence between the second fault domain and the logical addresses of the second range, where the storage system includes the SSD.

In a third aspect, an embodiment of the present invention provides a controller applied to a storage system, including units, for implementing various solutions of the first aspect.

According to a fourth aspect, an embodiment of the present invention provides an SSD management device, including units, for implementing various solutions of the second aspect.

According to a fifth aspect, an embodiment of the present invention provides a computer-readable storage medium in which computer instructions are stored. The computer instructions are used to execute various methods of the first aspect.

According to a sixth aspect, an embodiment of the present invention provides a computer program product containing computer instructions, which are used to execute various methods of the first aspect.

According to a seventh aspect, an embodiment of the present invention provides a computer-readable storage medium in which computer instructions are stored, and the computer instructions are used to perform various methods of the second aspect.

In an eighth aspect, an embodiment of the present invention provides a computer program product containing computer instructions, which are used to execute various methods of the second aspect.

In a ninth aspect, an embodiment of the present invention provides a solid state drive SSD, which includes an SSD controller, a first fault domain, and a second fault domain; the SSD controller is used to implement various solutions of the second aspect.

According to a tenth aspect, an embodiment of the present invention provides a controller applied to a storage system. The controller includes an interface and a processor, and is used to implement various solutions of the first aspect.

According to an eleventh aspect, an embodiment of the present invention provides a data recovery method in a storage system. The storage system includes a controller, a first solid state drive (SSD), and a second SSD. The first SSD and the second SSD each include multiple namespaces, and one namespace corresponds to one fault domain. The storage system includes a block group formed based on an erasure coding algorithm, and the block group includes a first block and a second block. The address of the first block is a first logical address of a first namespace of the first SSD, and the address of the second block is a second logical address of a second namespace of the second SSD. The first logical address is mapped to a physical address provided by a first fault domain of the first SSD, and the second logical address is mapped to a physical address provided by a second fault domain of the second SSD. The method includes: the controller receives fault information of the first SSD, where the fault information indicates a fault in the first fault domain or a fault in the first namespace; in response to the fault information, the controller restores, according to the erasure coding algorithm, the data stored at the first logical address of the first block in the block group.

According to a twelfth aspect, a method for managing a solid state drive (SSD) is provided. The SSD includes a first fault domain and a second fault domain. The method includes: assigning a first namespace of the SSD to the first fault domain, and assigning a second namespace of the SSD to the second fault domain.

With reference to the second aspect, in some implementations of the second aspect, the method further includes: separately recording the correspondence between the first fault domain and the first namespace, and the correspondence between the second fault domain and the second namespace.

With reference to the second aspect, in some implementations of the second aspect, the method further includes: the SSD sends, to a controller of a storage system, the correspondence between the first fault domain and the first namespace and the correspondence between the second fault domain and the second namespace, where the storage system includes the SSD.

With reference to the second aspect, in some implementations of the second aspect, the method further includes: the SSD sends, to a controller of a storage system, the correspondence between the first fault domain and the logical addresses of the first namespace and the correspondence between the second fault domain and the logical addresses of the second namespace, where the storage system includes the SSD.

Brief description of the drawings

In order to more clearly explain the technical solutions of the embodiments of the present application, the drawings required in the description of the embodiments will be briefly introduced below.

FIG. 1 is a schematic diagram of a storage system according to an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of a storage array controller according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a distributed storage system according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a server in a distributed storage system according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of an SSD according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of the relationship between fault domains and logical addresses in an SSD according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of the block group relationship in the storage system;

FIG. 8 is a schematic diagram of the fault domain index table;

FIG. 9 is a schematic diagram of the namespace index table;

FIG. 10 is a schematic diagram of the controller structure;

FIG. 11 is a schematic diagram of the structure of an SSD management device.

Detailed description

The technical solutions in the embodiments of the present application are described in more detail below.

The embodiment of the present invention addresses the case in which some components of an SSD in a storage system partially fail. By dividing the physical space of the SSD into fault domains, the impact of the fault is confined to a fault domain. This reduces the affected range and the reconstruction overhead on the storage system side: less storage space is reconstructed in less time, which improves reliability.

As shown in FIG. 1, the storage system in the embodiment of the present invention may be a storage array (such as Huawei's 18000 series or V3 series). The storage array includes a controller 101 and multiple SSDs. As shown in FIG. 2, the controller 101 includes a central processing unit (CPU) 201, a memory 202, and an interface 203. The memory 202 stores computer instructions, and the CPU 201 executes the computer instructions in the memory 202 to perform management, data access, data recovery, and other operations on the storage system. In addition, to save the computing resources of the CPU 201, a field programmable gate array (FPGA) or other hardware may perform all of the operations of the CPU 201 in the embodiment of the present invention, or the FPGA or other hardware and the CPU 201 may each perform part of those operations. For ease of description, the embodiment of the present invention uses the term processor to refer to the combination of the CPU 201 and the memory 202 as well as the various implementations described above, and the processor communicates with the interface 203. The interface 203 may be a network interface card (NIC) or a host bus adapter (HBA).

Further, the storage system in this embodiment of the present invention may also be a distributed storage system (such as a Huawei distributed storage series). As an example, as shown in FIG. 3, the distributed block storage system includes multiple servers, such as server 1, server 2, server 3, server 4, server 5, and server 6, and the servers communicate with each other through InfiniBand or Ethernet. In practical applications, the number of servers in the distributed block storage system can be increased according to actual needs, which is not limited in the embodiments of the present invention.

The server of the distributed block storage system has the structure shown in FIG. 4. As shown in FIG. 4, each server in the distributed block storage system includes a central processing unit (CPU) 401, a memory 402, an interface 403, SSD 1, SSD 2, and SSD 3. The memory 402 stores computer instructions, and the CPU 401 executes the computer instructions in the memory 402 to perform corresponding operations. The interface 403 may be a hardware interface, such as a network interface card (NIC) or a host bus adapter (HBA), or a program interface module. In addition, to save the computing resources of the CPU 401, a field programmable gate array (FPGA) or other hardware may replace the CPU 401 to perform the above operations, or the FPGA or other hardware and the CPU 401 may perform them together. For convenience of description, the embodiments of the present invention collectively refer to the combination of the CPU 401 and the memory 402, the FPGA or other hardware replacing the CPU 401, and the combination of the FPGA or other hardware with the CPU 401 as a processor. In a distributed storage system, the server responsible for storage management is called the controller. Specifically, the controller performs storage space management and data access.

SSDs use pages as the read and write unit and blocks as the erase unit. An SSD can parallelize data access at several levels: channel, die package, flash memory chip, die, and flash plane. The SSD organizes its flash die packages in a multi-channel manner. Multiple die packages can be connected to each channel; the die packages on a channel share the transmission channel but can execute instructions independently. For the specific structure of the SSD, refer to FIG. 5, which includes an interface 501, an SSD controller 502, channels 503, and packages 504. One package 504 contains multiple flash memory chips, each flash memory chip includes one or more dies, each die includes multiple flash planes, each flash plane includes multiple blocks, and each block includes multiple pages. The interface 501 may be an interface that supports the Serial Attached Small Computer System Interface (SAS) protocol, the Non-Volatile Memory Express (NVMe) protocol, or the Peripheral Component Interconnect Express (PCIe) protocol, among others.

If an SSD fails, usually only some of its components, such as physical blocks, fail, not the entire SSD. That is, when a fault occurs inside the SSD, the potentially affected range is only a part of the SSD. The embodiment of the present invention refers to this potentially affected part as a fault domain. The SSD is divided into multiple fault domains according to its structure: for example, multiple die packages connected to one channel form a fault domain, or one or more dies form a fault domain, or one or more flash planes form a fault domain. In the embodiment of the present invention, when the SSD fails, the fault domain is regarded as the range potentially affected by the fault, and the data in the faulty fault domain needs to be restored. In an actual application scenario, a fault in a fault domain of the SSD may be a fault of the entire fault domain or of part of it. Other SSD components may also be used as a fault domain, which is not limited in the embodiment of the present invention. The SSD monitors the status of each fault domain. In a specific implementation, the controller of the SSD monitors fault domain status using methods such as background inspection. The SSD can also determine the health of a fault domain from the erase counts of the physical blocks in that fault domain, that is, determine the status of the fault domain from its degree of wear.

An SSD provides storage space in the form of logical addresses. In the SSD, the logical address is a logical block address (LBA). The SSD uses a flash translation layer (FTL) to map LBAs to pages on its physical blocks and establishes the LBA-to-page address mapping. In the embodiment of the present invention, to avoid full-disk data recovery when an SSD in the storage system fails, the SSD configures the LBA-to-page mapping per fault domain. For example, an SSD contains 128 dies and has an available capacity of 32TB, so it can provide 32TB of logical addresses, that is, a 32TB address space. If the LBA range affected by an SSD fault is to be limited to 1TB, the number of fault domains is 32, that is, 32TB / 1TB = 32. Because the SSD contains 128 dies, each fault domain contains 4 dies, that is, 128 / 32 = 4. As shown in FIG. 6, the SSD contains 32 fault domains with identifiers 0 to 31. In a specific implementation, the SSD may identify fault domains by number or by other means, which is not limited in this embodiment of the present invention. In one implementation, each fault domain of the SSD corresponds to a certain range of LBAs: for example, fault domain 0 corresponds to LBAs 0 to (1TB-1), fault domain 1 corresponds to LBAs 1TB to (2TB-1), and so on, up to fault domain 31, which corresponds to LBAs 31TB to (32TB-1). That is, the logical addresses corresponding to one fault domain are continuous. This is also referred to as assigning a certain range of logical addresses, that is, a certain range of LBAs, to the fault domain.
Assigning a certain range of logical addresses to a fault domain also means that the SSD, based on the FTL, maps that range of logical addresses to physical addresses in the specific fault domain. In the embodiment of the present invention, assigning a certain range of logical addresses to a fault domain does not require that mappings from all logical addresses in the range to physical addresses in the fault domain be established. In one implementation, when a mapping from a specific logical block address in the range to a physical address needs to be established, the SSD selects a physical address in the fault domain to establish the mapping. In another implementation of the embodiment of the present invention, still taking the above SSD as an example, the LBAs of each fault domain may be discontinuous, that is, a certain range of logical addresses may be discontinuous logical addresses. For example, the 32TB of LBAs is divided among the 32 fault domains at a granularity of 1 gigabyte (GB), and each fault domain provides physical addresses for 1TB of LBAs. That is, fault domain 0 corresponds to LBAs 0 to (1GB-1), fault domain 1 corresponds to LBAs 1GB to (2GB-1), and so on, up to fault domain 31, which corresponds to LBAs 31GB to (32GB-1). Then fault domain 0 corresponds to LBAs 32GB to (33GB-1), and so on, up to fault domain 31, which corresponds to LBAs 63GB to (64GB-1). The correspondence between fault domains and LBAs is thus established in a cyclically interleaved manner, and in this implementation the LBAs corresponding to one fault domain are not continuous. The SSD stores the above mapping relationship between LBAs and fault domains and reports it to the controller 101.
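The two assignment schemes described above (continuous 1TB ranges and 1GB cyclic interleaving) can be sketched as follows. This is a minimal illustration rather than the SSD's actual FTL logic: the function names are hypothetical, and addresses are treated as byte offsets into the 32TB logical space, matching the way the text writes ranges such as 0 to (1TB-1).

```python
TB = 1 << 40
GB = 1 << 30

NUM_DIES = 128
CAPACITY = 32 * TB
DOMAIN_SPAN = 1 * TB                       # LBA range affected by one fault
NUM_DOMAINS = CAPACITY // DOMAIN_SPAN      # 32TB / 1TB = 32
DIES_PER_DOMAIN = NUM_DIES // NUM_DOMAINS  # 128 / 32 = 4


def _check(addr: int) -> None:
    if not 0 <= addr < CAPACITY:
        raise ValueError("address outside the SSD's logical space")


def domain_contiguous(addr: int) -> int:
    """Fault domain when each domain owns one continuous 1TB range."""
    _check(addr)
    return addr // DOMAIN_SPAN


def domain_interleaved(addr: int, granularity: int = GB) -> int:
    """Fault domain when 1GB stripes are dealt to domains 0..31 cyclically."""
    _check(addr)
    return (addr // granularity) % NUM_DOMAINS


print(domain_contiguous(TB))        # 1
print(domain_interleaved(32 * GB))  # 0
```

In both schemes each fault domain still backs 1TB of logical addresses; only the layout of those addresses differs.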

In the embodiment of the present invention, take the storage array shown in FIG. 1 as an example of the storage system. The SSDs provide fixed-length chunks (CK), and the controller 101 uses a redundancy algorithm, such as an erasure coding (EC) algorithm, to combine chunks from different SSDs into a chunk group (CKG). In a specific implementation, the EC algorithm may be a RAID algorithm. As shown in FIG. 7, the CKG consists of CK1, CK2, and CK3. CK1 is provided by SSD1, CK2 is provided by SSD2, and CK3 is provided by SSD3. The address of CK1 is LBA1 of SSD1, the address of CK2 is LBA2 of SSD2, and the address of CK3 is LBA3 of SSD3. LBA1 is mapped to a physical address provided by fault domain 1 of SSD1, which is referred to here as the address of CK1 being mapped to the physical address provided by fault domain 1 of SSD1; LBA2 is mapped to a physical address provided by fault domain 2 of SSD2; and LBA3 is mapped to a physical address provided by fault domain 3 of SSD3. In the embodiment of the present invention, when CKs are selected from multiple SSDs to form a CKG, the fault domain of the SSD that provides each CK may be decided based on load. The load may be the input/output (IO) type, how hot or cold the IO data is, and so on. In one implementation, the SSD sends the correspondence between fault domains and LBAs to the controller 101, and the controller 101 can determine, from that correspondence, the fault domain corresponding to the logical address of each CK in the CKG. The controller 101 obtains status information of the SSDs. For example, if fault domain 1 of SSD1 fails, SSD1 sends fault information to the controller 101 to indicate that fault domain 1 has failed.
The controller 101 can determine the LBAs affected by the fault in fault domain 1 of SSD1 according to the correspondence between the SSD's fault domains and logical addresses. The storage array contains multiple CKGs, and the controller 101 finds the CKGs that contain a CK whose address is an LBA mapped to fault domain 1 of SSD1. For example, it determines that the address of CK1 in CKG1 is an LBA mapped to fault domain 1 of SSD1. The controller 101 restores the data of CK1 in CKG1 according to the redundancy algorithm, such as the EC algorithm. Therefore, compared with the prior art, the embodiments of the present invention do not need to reconstruct the CKs corresponding to all the logical addresses provided by SSD1, which increases the speed of data reconstruction. In a specific implementation, the data in CK1 may be restored to another fault domain of SSD1 or to another SSD, which is not limited in the embodiment of the present invention.
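As a rough illustration of restoring only CK1, the sketch below uses single XOR parity as the redundancy algorithm. This is an assumption chosen for brevity: the source only says "EC algorithm" (which may be a RAID algorithm) and does not fix the code. The helper `xor_chunks` and the sample chunk contents are hypothetical.

```python
from functools import reduce


def xor_chunks(chunks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized chunks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


# A CKG of three CKs: two data chunks plus one XOR parity chunk
# (assumed layout; the source does not specify which CK holds parity).
ck1 = b"data on SSD1"
ck2 = b"data on SSD2"
ck3 = xor_chunks([ck1, ck2])  # parity, stored on SSD3

# Fault domain 1 of SSD1 fails: only CK1 is rebuilt, from the surviving CKs.
recovered_ck1 = xor_chunks([ck2, ck3])
print(recovered_ck1 == ck1)  # True
```

The rebuilt chunk could then be written to another fault domain of SSD1 or to another SSD, as the text allows.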

Further, because the SSD reports the above mapping relationship between LBAs and fault domains to the controller 101, the storage array stores the correspondence between the addresses of the CKs contained in a CKG and the fault domains. For example, the first CK belongs to the first fault domain and the second CK belongs to the second fault domain. Further, to quickly find the CKGs that contain a CK whose address is an LBA mapped to a given fault domain of SSD1, the storage array also stores a fault domain index table built from the mapping relationship between LBAs and fault domains. The fault domain index table contains the correspondence between fault domains and CKGs, for example, the correspondence between fault domain identifiers and CKG identifiers. Because the same CKG contains CKs from fault domains of different SSDs, different fault domains in the fault domain index table can correspond to the same CKG. When a fault domain of an SSD fails, the controller 101 can quickly find the CKGs affected by the fault domain according to the fault domain index table and quickly reconstruct the data in the affected CKs of those CKGs. In a specific implementation, when creating a CKG, the controller 101 may record the corresponding entry in the fault domain index table according to the mapping relationship between LBAs and fault domains; the entry contains the correspondence between the fault domain and the CKG. To facilitate query and management of the fault domain index table, one implementation establishes a multi-level fault domain index table in which, for example, the first level is an index table of SSDs and fault domains and the second level is an index table of fault domains and CKGs. In another implementation, as shown in FIG. 8, the fault domain index table is partitioned by SSD, which facilitates quick query.
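One possible shape for such an index is sketched below: a two-level table keyed first by SSD and then by fault domain, each entry holding the identifiers of the CKGs that own a CK in that fault domain. The class name, method names, and identifiers are hypothetical; the source does not prescribe a data structure.

```python
from collections import defaultdict


class FaultDomainIndex:
    """Index partitioned by SSD: ssd_id -> fault_domain_id -> set of CKG IDs."""

    def __init__(self) -> None:
        self._table: dict[str, dict[int, set[str]]] = defaultdict(
            lambda: defaultdict(set)
        )

    def record_ckg(self, ckg_id: str, members: list[tuple[str, int]]) -> None:
        """Called when a CKG is created; one (ssd_id, fault_domain_id) per CK."""
        for ssd_id, domain_id in members:
            self._table[ssd_id][domain_id].add(ckg_id)

    def affected_ckgs(self, ssd_id: str, domain_id: int) -> set[str]:
        """CKGs that own a CK in the failed fault domain."""
        return set(self._table.get(ssd_id, {}).get(domain_id, set()))


index = FaultDomainIndex()
index.record_ckg("CKG1", [("SSD1", 1), ("SSD2", 2), ("SSD3", 3)])
index.record_ckg("CKG2", [("SSD1", 1), ("SSD2", 5), ("SSD3", 3)])
index.record_ckg("CKG3", [("SSD1", 7), ("SSD2", 2), ("SSD3", 0)])

print(sorted(index.affected_ckgs("SSD1", 1)))  # ['CKG1', 'CKG2']
```

A lookup touches only the failed fault domain's entry, so the controller does not have to scan every CKG in the array.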

In another embodiment of the present invention, for an SSD that supports the NVMe interface specification, namespaces can be allocated to the SSD according to the number of fault domains, that is, one fault domain corresponds to one namespace. The logical addresses of different namespaces of an SSD can therefore be addressed independently. For example, taking an SSD with an available capacity of 32TB, the SSD is divided into 32 fault domains, and one namespace is allocated to each fault domain. The LBA range of each namespace is 0 to (1TB-1). The LBAs of a namespace are mapped to physical addresses in the fault domain corresponding to the namespace. The SSD stores the mapping relationship between namespaces and fault domains and reports it to the controller 101. In another implementation, the mapping relationship between the LBAs in a namespace and the fault domain can also be reported. In the embodiment of the present invention, when CKs are selected from multiple SSDs to form a CKG, the namespace of the SSD that provides each CK may be decided based on load. The load may be the input/output (IO) type, how hot or cold the IO data is, and so on.
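A minimal sketch of the one-namespace-per-fault-domain layout follows. The numbering (namespace n backed by fault domain n - 1) and the function name are assumptions made for illustration; the source only states that each fault domain has its own namespace with LBAs 0 to (1TB-1).

```python
TB = 1 << 40
NUM_FAULT_DOMAINS = 32

# Hypothetical numbering: namespace n (1..32) is backed by fault domain n - 1.
NAMESPACE_TO_DOMAIN = {nsid: nsid - 1 for nsid in range(1, NUM_FAULT_DOMAINS + 1)}


def fault_domain_of(nsid: int, offset: int) -> int:
    """Fault domain backing an address; each namespace is addressed from 0."""
    if nsid not in NAMESPACE_TO_DOMAIN:
        raise KeyError(f"unknown namespace {nsid}")
    if not 0 <= offset < TB:
        raise ValueError("address outside the namespace")
    return NAMESPACE_TO_DOMAIN[nsid]


print(fault_domain_of(1, 0))        # 0
print(fault_domain_of(2, 0))        # 1
print(fault_domain_of(32, TB - 1))  # 31
```

The same offset in two namespaces resolves to two different fault domains, which is what independent addressing per namespace means here.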

Correspondingly, as mentioned above, the storage array stores the fault domain index table. In another implementation, the storage array stores a namespace index table, which contains the correspondence between namespaces and CKGs, for example, the correspondence between namespace identifiers and CKG identifiers. Because the same CKG contains CKs from namespaces of different SSDs, different namespaces in the namespace index table can correspond to the same CKG. When a fault domain of an SSD fails, the SSD reports fault information to the controller 101. The fault information indicates the namespace in which the fault occurred; for example, the fault information includes a namespace identifier. The controller 101 can quickly find the CKGs affected by the fault domain according to the namespace index table and quickly reconstruct the data in the affected CKs of those CKGs. In a specific implementation, when allocating and creating a CKG, the controller 101 may record the corresponding entry in the namespace index table according to the mapping relationship between namespaces and fault domains; the entry contains the correspondence between the namespace and the CKG. To facilitate query and management of the namespace index table, one implementation establishes a multi-level namespace index table in which, for example, the first level is an index table of SSDs and namespaces and the second level is an index table of namespaces and CKGs. In another implementation, as shown in FIG. 9, the namespace index table is partitioned by SSD, which facilitates quick query.

In the embodiment of the present invention, when the SSD performs garbage collection, valid data is rewritten to different physical addresses within the same fault domain.

In the embodiment of the present invention, the controller of the SSD collects wear information of each fault domain in the SSD and reports the wear information of each fault domain to the controller 101. When the controller 101 creates a CKG, it selects CKs mapped to physical addresses of the appropriate fault domains according to the wear level of each fault domain of the SSD and the frequency of data modification.
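One possible selection policy is sketched below. The text states only that wear level and data modification frequency are considered; the specific rule used here (frequently modified data goes to the least-worn fault domain, rarely modified data to the most-worn one) is an assumption for illustration, and `pick_fault_domain` is a hypothetical name.

```python
def pick_fault_domain(wear_by_domain, frequently_modified):
    """Choose a fault domain for a new CK.

    `wear_by_domain` maps fault domain ID -> reported wear level (0.0 to 1.0).
    Assumed policy: hot data is placed on the least-worn domain so that wear
    evens out; cold data is placed on the most-worn domain.
    """
    if not wear_by_domain:
        raise ValueError("no wear information reported")
    chooser = min if frequently_modified else max
    # Ties are broken by fault domain ID to keep the choice deterministic.
    return chooser(wear_by_domain, key=lambda d: (wear_by_domain[d], d))


reported_wear = {0: 0.10, 1: 0.55, 2: 0.30}
hot_target = pick_fault_domain(reported_wear, frequently_modified=True)
cold_target = pick_fault_domain(reported_wear, frequently_modified=False)
```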

The embodiments of the present invention can also be applied to SSDs that support Open-Channel. In one implementation for an Open-Channel SSD, the SSD is divided into multiple fault domains, and the controller 101 of the storage system can directly access the physical addresses of the SSD. The SSD establishes the mapping relationship between the fault domains and the physical addresses of the SSD. The address of a CK that forms a CKG in the storage system can then be a physical address of the SSD, that is, the address of the CK is a physical address provided by a fault domain of the SSD. For other operations required to implement the embodiment of the present invention on an SSD supporting Open-Channel, reference may be made to the description of the other embodiments of the present invention, and details are not described herein again.

Various operations performed by the SSD in the embodiments of the present invention may be performed by the SSD controller.

Correspondingly, an embodiment of the present invention also provides a controller, which is applied to a storage system, where the storage system includes the controller, a first solid state drive (SSD), and a second SSD. The first SSD and the second SSD each contain multiple fault domains, the storage system includes a block group formed based on an erasure coding algorithm, and the block group includes a first block and a second block. The address of the first block is mapped to the physical address provided by the first fault domain of the first SSD, and the address of the second block is mapped to the physical address provided by the second fault domain of the second SSD. As shown in FIG. 10, the controller includes a receiving unit 1001 and a recovery unit 1002. The receiving unit 1001 is used to receive the fault information of the first SSD, where the fault information is used to indicate a fault in the first fault domain. The recovery unit 1002 is used to restore, in response to the fault information, the data stored in the address of the first block in the block group according to the erasure coding algorithm. Further, the storage system stores the correspondence between the address of the first block and the first fault domain, and the correspondence between the address of the second block and the second fault domain, and the controller further includes a query unit configured to query the correspondence between the first fault domain and the block group to determine the block group. For the specific implementation of the controller shown in FIG. 10, reference may be made to the foregoing implementations of the embodiments of the present invention, for example, the structure of the controller 101 shown in FIG. 2, and details are not repeated here. In another implementation, the controller shown in FIG. 10 may also be implemented by software.
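The cooperation of the receiving, query, and recovery units can be sketched as below. Single-parity XOR stands in for the erasure coding algorithm purely to keep the example short; the embodiment does not name a specific code. The function names, the layout of the fault information, and the index format are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def recover_block(block_group, lost_index):
    """Rebuild one block of a block group from the surviving blocks."""
    survivors = [b for i, b in enumerate(block_group) if i != lost_index]
    return xor_blocks(survivors)


def handle_fault(fault_info, fault_domain_index, block_groups):
    """Rebuild only the blocks mapped to the failed fault domain.

    `fault_domain_index` maps (ssd_id, fault_domain_id) to a list of
    (block_group_id, index_of_block_in_group) pairs.
    """
    key = (fault_info["ssd"], fault_info["fault_domain"])
    return {
        group_id: recover_block(block_groups[group_id], lost_index)
        for group_id, lost_index in fault_domain_index.get(key, [])
    }


data = [b"\x01\x02", b"\x10\x20", b"\xa0\x0b"]
block_groups = {"bg-1": data + [xor_blocks(data)]}  # three data blocks, one parity
fault_domain_index = {("ssd-1", 4): [("bg-1", 1)]}

rebuilt = handle_fault({"ssd": "ssd-1", "fault_domain": 4},
                       fault_domain_index, block_groups)
```

Only the block groups listed for the failed fault domain are touched; blocks of the same SSD that sit in other fault domains are left alone.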

As shown in FIG. 11, an embodiment of the present invention further provides an SSD management device, where the SSD includes a first fault domain and a second fault domain. The SSD management device includes a first allocation unit 1101, used to allocate a first range of logical addresses of the SSD to the first fault domain, and a second allocation unit 1102, used to allocate a second range of logical addresses of the SSD to the second fault domain. Further, the SSD management device includes a sending unit for sending, to the controller of the storage system, the correspondence between the first fault domain and the first range of logical addresses and the correspondence between the second fault domain and the second range of logical addresses, where the storage system includes the SSD. Further, the SSD management device includes a recording unit for separately recording the correspondence between the first fault domain and the first range of logical addresses and the correspondence between the second fault domain and the second range of logical addresses. For one hardware implementation of the SSD management device provided by the embodiment of the present invention, reference may be made to the structure of the SSD controller, and details are not repeated here. In another implementation, the SSD management device provided by the embodiment of the present invention may also be implemented by software, or jointly implemented by an SSD controller and software.
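A minimal sketch of what the allocation and recording units might produce is shown below, assuming contiguous, equally sized ranges within a single logical address space. The function names and the tuple layout are hypothetical.

```python
import bisect


def allocate_ranges(total_lbas, num_fault_domains):
    """Allocate one contiguous logical address range to each fault domain.

    Returns a list of (first_lba, last_lba, fault_domain_id) records, which
    is also what a sending unit could report to the storage controller.
    """
    if total_lbas % num_fault_domains:
        raise ValueError("address space must divide evenly across fault domains")
    per_domain = total_lbas // num_fault_domains
    return [(d * per_domain, (d + 1) * per_domain - 1, d)
            for d in range(num_fault_domains)]


def fault_domain_for_lba(ranges, lba):
    """Look up the fault domain whose recorded range contains `lba`."""
    starts = [first for first, _, _ in ranges]
    i = bisect.bisect_right(starts, lba) - 1
    if i < 0 or lba > ranges[i][1]:
        raise ValueError("LBA is not allocated to any fault domain")
    return ranges[i][2]


ranges = allocate_ranges(1000, 2)
```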

An embodiment of the present invention provides a computer-readable storage medium that stores computer instructions. When the computer instructions run on the controller 101 shown in FIG. 1 or the server shown in FIG. 4, the method in the embodiment of the present invention is executed.

An embodiment of the present invention provides a computer program product containing computer instructions. When the computer instructions run on the controller 101 shown in FIG. 1 or the server shown in FIG. 4, the method in the embodiment of the present invention is executed.

Each unit of the data recovery apparatus provided by the embodiment of the present invention may be implemented by a processor, or may be implemented by a processor and a memory together, or may be implemented by software.

Embodiments of the present invention provide a computer program product containing computer instructions. When the computer instructions run on an SSD controller, the SSD management method in the embodiments of the present invention is executed.

The logical address in the embodiment of the present invention may also be a KV in a key-value (KV) disk, or a log in a log disk.

In the embodiment of the present invention, the correspondence relationship and the mapping relationship have the same meaning. The expression of the correspondence between the address of the block and the fault domain has the same meaning as the correspondence between the fault domain and the address of the block.

It should be noted that the memories described herein are intended to include, but are not limited to these and any other suitable types of memories.

Those of ordinary skill in the art may realize that the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are executed in hardware or software depends on the specific application of the technical solution and its design constraints. A person skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of the present invention.

Those skilled in the art can clearly understand that for the convenience and conciseness of the description, the specific working process of the system, device and unit described above can refer to the corresponding process in the foregoing method embodiments, which will not be repeated here.

In the several embodiments provided by the present invention, it should be understood that the disclosed system, device, and method may be implemented in other ways. For example, the device embodiments described above are only schematic. The division of the units is only a division of logical functions, and in actual implementation there may be other divisions; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the displayed or discussed mutual coupling, direct coupling, or communication connection may be indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.

If the functions are implemented in the form of a software functional unit and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several computer instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage media include a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and other media that can store computer instructions.

The above are only specific embodiments of the present invention, but the scope of protection of the present invention is not limited to them. Any change or replacement that a person skilled in the art can readily conceive of within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

A data recovery method in a storage system, characterized in that the storage system includes a controller, a first solid state drive (SSD), and a second SSD; wherein the first SSD and the second SSD each include multiple fault domains, the storage system includes a block group formed based on an erasure coding algorithm, and the block group includes a first block and a second block; the address of the first block is mapped to the physical address provided by the first fault domain of the first SSD, and the address of the second block is mapped to the physical address provided by the second fault domain of the second SSD;

The method includes:

The controller receives fault information of the first SSD, and the fault information is used to indicate a fault in the first fault domain;

In response to the fault information, the controller restores the data stored in the address of the first block in the block group according to the erasure coding algorithm.
The method according to claim 1, wherein a fault domain in the first SSD and the second SSD is a plurality of particle packages connected on one channel.
The method according to claim 1, wherein one fault domain in the first SSD and the second SSD is one or more particle packages.
The method according to claim 1, wherein one fault domain in the first SSD and the second SSD is one or more particles.
The method according to claim 1, wherein one fault domain in the first SSD and the second SSD is one or more flash memory chips.
The method according to any one of claims 1 to 5, wherein the storage system stores the correspondence between the first fault domain and the block group, and the correspondence between the second fault domain and the block group.
The method according to claim 6, wherein the response to the fault information includes:

The controller queries the correspondence between the first fault domain and the block group to determine the block group.
The method according to any one of claims 1 to 5, wherein the storage system stores the correspondence between the address of the first block and the first fault domain, and the correspondence between the address of the second block and the second fault domain.
A solid-state drive SSD management method, characterized in that the SSD includes a first fault domain and a second fault domain, and the method includes:

Allocating a first range of logical addresses of the SSD to the first fault domain;

Allocating a second range of logical addresses of the SSD to the second fault domain.
The method according to claim 9, wherein the method further comprises:

The corresponding relationship between the first fault domain and the logical address in the first range and the corresponding relationship between the second fault domain and the logical address in the second range are recorded separately.
The method according to claim 9, wherein the logical addresses in the first range and the logical addresses in the second range are consecutive logical addresses.
The method according to claim 9, wherein the logical addresses in the first range and the logical addresses in the second range are discontinuous logical addresses.
The method according to claim 9, wherein the method further comprises:

The SSD sends the correspondence between the first fault domain and the logical addresses of the first range and the correspondence between the second fault domain and the logical addresses of the second range to the controller of the storage system; wherein the storage system includes the SSD.
The method according to claim 9, wherein a fault domain of the SSD is a plurality of particle packages connected to one channel.
The method according to claim 9, wherein a fault domain of the SSD is one or more particle packages.
The method according to claim 9, wherein a fault domain of the SSD is one or more particles.
The method according to claim 9, wherein a fault domain of the SSD is one or more flash memory chips.
A storage system, characterized in that the storage system includes a controller, a first solid state drive (SSD), and a second SSD; wherein the first SSD and the second SSD each include multiple fault domains, the storage system includes a block group formed based on an erasure coding algorithm, and the block group includes a first block and a second block; the address of the first block is mapped to the physical address provided by the first fault domain of the first SSD, and the address of the second block is mapped to the physical address provided by the second fault domain of the second SSD;

The controller is used to:

Receiving fault information of the first SSD, the fault information is used to indicate a fault in the first fault domain;

In response to the fault information, restoring the data stored in the address of the first block in the block group according to the erasure coding algorithm.
The storage system according to claim 18, wherein the storage system stores the correspondence between the first fault domain and the block group, and the correspondence between the second fault domain and the block group;

The controller is also used to query the correspondence between the first fault domain and the block group to determine the block group.
A solid state drive SSD, characterized in that the SSD includes an SSD controller, a first fault domain, and a second fault domain; the SSD controller is used for:

Assign a logical address of the first range of the SSD to the first fault domain;

A second range of logical addresses of the SSD is allocated to the second fault domain.
The SSD according to claim 20, wherein the SSD controller is further used to:

The corresponding relationship between the first fault domain and the logical address in the first range and the corresponding relationship between the second fault domain and the logical address in the second range are recorded separately.
The SSD according to claim 20, wherein the logical addresses in the first range and the logical addresses in the second range are consecutive logical addresses.
The SSD according to claim 20, wherein the logical address in the first range and the logical address in the second range are discontinuous logical addresses.
The SSD according to claim 20, wherein the SSD controller is further used to:

Sending the correspondence between the first fault domain and the logical addresses of the first range and the correspondence between the second fault domain and the logical addresses of the second range to the controller of the storage system; wherein the storage system includes the SSD.
A controller, characterized in that the controller is applied to a storage system, and the storage system includes the controller, a first solid state drive (SSD), and a second SSD; wherein the first SSD and the second SSD each include multiple fault domains, the storage system includes a block group formed based on an erasure coding algorithm, and the block group includes a first block and a second block; the address of the first block is mapped to the physical address provided by the first fault domain of the first SSD, and the address of the second block is mapped to the physical address provided by the second fault domain of the second SSD;

The controller includes:

A receiving unit, configured to receive fault information of the first SSD, and the fault information is used to indicate a fault in the first fault domain;

The recovery unit is configured to recover the data stored in the address of the first block in the block group according to the erasure code algorithm in response to the fault information.
The controller according to claim 25, wherein the storage system stores the correspondence between the first fault domain and the block group, and the correspondence between the second fault domain and the block group;

The controller further includes a query unit for querying the correspondence between the first fault domain and the block group to determine the block group.
A solid-state drive SSD management device, characterized in that the SSD includes a first fault domain and a second fault domain; the SSD management device includes:

A first allocation unit, configured to allocate a first range of logical addresses of the SSD to the first fault domain;

A second allocation unit is configured to allocate a second range of logical addresses of the SSD to the second fault domain.
The SSD management device according to claim 27, wherein the SSD management device further includes a recording unit for separately recording the correspondence between the first fault domain and the logical addresses of the first range and the correspondence between the second fault domain and the logical addresses of the second range.
The SSD management device according to claim 27, wherein the SSD management device further includes a sending unit for sending, to the controller of the storage system, the correspondence between the first fault domain and the logical addresses of the first range and the correspondence between the second fault domain and the logical addresses of the second range; wherein the storage system includes the SSD.
A controller, characterized in that the controller is applied to a storage system, and the storage system includes the controller, a first solid state drive (SSD), and a second SSD; wherein the first SSD and the second SSD each include multiple fault domains, the storage system includes a block group formed based on an erasure coding algorithm, and the block group includes a first block and a second block; the address of the first block is mapped to the physical address provided by the first fault domain of the first SSD, and the address of the second block is mapped to the physical address provided by the second fault domain of the second SSD;

The controller includes a processor and an interface;

The interface is used to receive fault information of the first SSD, and the fault information is used to indicate a fault in the first fault domain;

The processor is configured to restore the data stored in the address of the first block in the block group according to the erasure code algorithm in response to the fault information.
The controller according to claim 30, wherein the storage system stores the correspondence between the first fault domain and the block group, and the correspondence between the second fault domain and the block group;

The processor is also used to query the correspondence between the first fault domain and the block group to determine the block group.
A computer program product, characterized in that the computer program product includes computer instructions applied to a storage system, and the storage system includes a controller, a first solid state drive (SSD), and a second SSD; wherein the first SSD and the second SSD each include multiple fault domains, the storage system includes a block group formed based on an erasure coding algorithm, and the block group includes a first block and a second block; the address of the first block is mapped to the physical address provided by the first fault domain of the first SSD, and the address of the second block is mapped to the physical address provided by the second fault domain of the second SSD; when the controller executes the computer instructions, the following steps are performed:

Receiving fault information of the first SSD, the fault information is used to indicate a fault in the first fault domain;

In response to the fault information, restoring the data stored in the address of the first block in the block group according to the erasure coding algorithm.
The computer program product of claim 32, wherein when the controller executes the computer instruction, it is further used to perform the following steps:

The corresponding relationship between the first fault domain and the block group is queried to determine the block group.
A computer program product, characterized in that the computer program product includes computer instructions applied to a solid state drive (SSD), and the SSD includes an SSD controller, a first fault domain, and a second fault domain; when the SSD controller executes the computer instructions, the following steps are performed:

Assign a logical address of the first range of the SSD to the first fault domain;

A second range of logical addresses of the SSD is allocated to the second fault domain.
The computer program product of claim 34, wherein when the SSD controller executes the computer instruction, it is further used to perform the following steps:

The corresponding relationship between the first fault domain and the logical address in the first range and the corresponding relationship between the second fault domain and the logical address in the second range are recorded separately.
The computer program product of claim 34, wherein when the SSD controller executes the computer instruction, it is further used to perform the following steps:

Sending the correspondence between the first fault domain and the logical addresses of the first range and the correspondence between the second fault domain and the logical addresses of the second range to the controller of the storage system; wherein the storage system includes the SSD.