WO2020082888A1 - Method, system and apparatus for restoring data in storage system - Google Patents

Publication number
WO2020082888A1
Authority
WO
WIPO (PCT)
Prior art keywords
ssd
fault domain
fault
block
controller
Prior art date
Application number
PCT/CN2019/103085
Other languages
French (fr)
Chinese (zh)
Inventor
蒲贵友
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Priority claimed from CN201811560345.7A external-priority patent/CN111104056B/en
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to KR1020217012802A priority Critical patent/KR102648688B1/en
Priority to EP19875722.1A priority patent/EP3851949A4/en
Publication of WO2020082888A1 publication Critical patent/WO2020082888A1/en
Priority to US17/233,893 priority patent/US20210240584A1/en
Priority to US17/883,708 priority patent/US20230076381A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers

Definitions

  • The present invention relates to the field of information technology, and in particular, to a data recovery method, system, and device in a storage system.
  • Redundant Array of Independent Disks (RAID) technology is widely used in storage systems to ensure data reliability.
  • When a hard disk in the storage system is damaged, the data on the damaged hard disk can be recalculated from the data and parity data on the undamaged hard disks; this process is called RAID reconstruction.
  • SSD: Solid State Disk.
  • A data recovery method in a storage system, where the storage system includes a controller, a first solid state drive (SSD), and a second SSD; the first SSD and the second SSD each include multiple fault domains; the storage system includes a block group formed based on an erasure coding algorithm, and the block group includes a first block and a second block; the address of the first block is mapped to a physical address provided by a first fault domain of the first SSD, and the address of the second block is mapped to a physical address provided by a second fault domain of the second SSD. The method includes: the controller receives fault information of the first SSD, where the fault information indicates that a fault has occurred in the first fault domain; in response to the fault information, the controller restores, according to the erasure coding algorithm, the data stored at the logical address of the first block in the block group.
  • Both the first SSD and the second SSD include multiple fault domains, but the numbers of fault domains in the first SSD and the second SSD may differ. Therefore, compared with the prior art, the storage system according to the embodiment of the present invention does not need to reconstruct the data at all logical addresses of the failed SSD, but only the data at some of the logical addresses of the SSD.
  • These logical addresses are the ones mapped to physical addresses in the fault domain where the fault occurred; limiting reconstruction to them increases the speed of data reconstruction.
  • Mapping the address of the first block to the physical address provided by the first fault domain of the first SSD includes: the address of the first block is a first logical address of the first SSD, and the first logical address is mapped to the physical address provided by the first fault domain of the first SSD. Mapping the address of the second block to the physical address provided by the second fault domain of the second SSD includes: the address of the second block is a second logical address of the second SSD, and the second logical address is mapped to the physical address provided by the second fault domain of the second SSD.
  • Alternatively, the address of the first block is itself a physical address provided by the first fault domain of the first SSD, so the address of the first block is directly mapped to the physical address provided by the first fault domain of the first SSD; likewise, the address of the second block is a physical address provided by the second fault domain of the second SSD and is directly mapped to that physical address.
  • the embodiment of the present invention also supports indirect mapping between the block address and the physical address provided by the fault domain.
  • The storage system stores the correspondence between the address of the first block and the first fault domain, and the correspondence between the address of the second block and the second fault domain.
  • the address of the first block is the first logical address of the first SSD
  • the address of the second block is the second logical address of the second SSD.
  • the storage system stores the correspondence between the blocks included in the block group and the fault domain.
  • the first block belongs to the first fault domain and the second block belongs to the second fault domain.
  • the storage system also stores a fault domain index table.
  • the fault domain index table contains the correspondence between the fault domain and the block group.
  • In the fault domain index table, different fault domains can correspond to the same block group.
  • the controller can quickly find the block groups affected by the fault domain according to the fault domain index table, so as to quickly reconstruct the data in the blocks affected by the fault domain in these block groups.
  • One fault domain in the first SSD and the second SSD is the plurality of particle packages connected to one channel, or one or more particle packages, or one or more particles, or one or more flash memory chips.
  • the responding to the fault information includes: the controller querying the correspondence between the first fault domain and the block group to determine the block group.
  • In a method for managing a solid state drive (SSD), the SSD includes a first fault domain and a second fault domain.
  • The method includes: assigning a first range of logical addresses of the SSD to the first fault domain; and assigning a second range of logical addresses of the SSD to the second fault domain.
  • The method further includes: separately recording the correspondence between the first fault domain and the first range of logical addresses and the correspondence between the second fault domain and the second range of logical addresses.
  • The logical addresses in the first range and the logical addresses in the second range are each continuous logical addresses; or, the logical addresses in the first range and the logical addresses in the second range are discontinuous logical addresses.
  • The method further includes: the SSD sends, to a controller of a storage system, the correspondence between the first fault domain and the first range of logical addresses and the correspondence between the second fault domain and the second range of logical addresses; the storage system includes the SSD.
  • an embodiment of the present invention provides a controller applied to a storage system, including units, for implementing various solutions of the first aspect.
  • an embodiment of the present invention provides an SSD management device, including units, for implementing various solutions of the second aspect.
  • an embodiment of the present invention provides a computer-readable storage medium in which computer instructions are stored.
  • the computer instructions are used to execute various methods of the first aspect.
  • an embodiment of the present invention provides a computer program product containing computer instructions, which are used to execute various methods of the first aspect.
  • an embodiment of the present invention provides a computer-readable storage medium in which computer instructions are stored, and the computer instructions are used to perform various methods of the second aspect.
  • an embodiment of the present invention provides a computer program product containing computer instructions, which are used to execute various methods of the second aspect.
  • an embodiment of the present invention provides a solid state drive SSD, which includes an SSD controller, a first fault domain, and a second fault domain; the SSD controller is used to implement various solutions of the second aspect.
  • an embodiment of the present invention provides a controller applied to a storage system.
  • the controller includes an interface and a processor, and is used to implement various solutions of the first aspect.
  • An embodiment of the present invention provides a data recovery method in a storage system, where the storage system includes a controller, a first solid state drive (SSD), and a second SSD; the first SSD and the second SSD each include multiple namespaces, and one namespace corresponds to one fault domain; the storage system includes a block group formed based on an erasure coding algorithm, and the block group includes a first block and a second block; the address of the first block is a first logical address of a first namespace of the first SSD, and the address of the second block is a second logical address of a second namespace of the second SSD; the first logical address is mapped to a physical address provided by a first fault domain of the first SSD, and the second logical address is mapped to a physical address provided by a second fault domain of the second SSD. The method includes: the controller receives fault information of the first SSD, where the fault information indicates a fault in the first fault domain or a fault in the first namespace; in response to the fault information, the controller according
  • In a method for managing a solid state drive (SSD) with namespaces, the SSD includes a first fault domain and a second fault domain.
  • The method includes: allocating a first namespace of the SSD to the first fault domain; and allocating a second namespace of the SSD to the second fault domain.
  • The method further includes: separately recording the correspondence between the first fault domain and the first namespace and the correspondence between the second fault domain and the second namespace.
  • The method further includes: the SSD sends, to the controller of the storage system, the correspondence between the first fault domain and the first namespace and the correspondence between the second fault domain and the second namespace; the storage system includes the SSD.
  • The method further includes: the SSD sends, to the controller of the storage system, the correspondence between the first fault domain and the logical addresses of the first namespace and the correspondence between the second fault domain and the logical addresses of the second namespace; the storage system includes the SSD.
  • FIG. 1 is a schematic diagram of a storage system according to an embodiment of the invention.
  • FIG. 2 is a schematic structural diagram of a storage array controller according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of a distributed storage system according to an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of a server in a distributed storage system according to an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of an SSD according to an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of the relationship between a fault domain and a logical address in an SSD in an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of the block group relationship in the storage system.
  • FIG. 8 is a schematic diagram of the fault domain index table.
  • FIG. 9 is a schematic diagram of the namespace index table.
  • FIG. 10 is a schematic diagram of the controller structure.
  • FIG. 11 is a schematic diagram of the structure of an SSD management device.
  • The embodiment of the present invention addresses the case in which some components of an SSD in the storage system partially fail. By dividing the physical space of the SSD into fault domains, the impact of the failure is confined to the affected fault domain, which reduces the range of impact on the storage system side and the cost of reconstruction.
  • Less storage space has to be reconstructed, in less time, thereby improving reliability.
  • The storage system in the embodiment of the present invention may be a storage array (such as the Huawei 18000 series or V3 series).
  • the storage array includes a controller 101 and multiple SSDs.
  • the controller 101 includes a central processing unit (Central Processing Unit, CPU) 201, a memory 202, and an interface 203.
  • the memory 202 stores computer instructions.
  • The CPU 201 executes the computer instructions in the memory 202 to perform management, data access, data recovery, and other operations on the storage system.
  • A field programmable gate array (FPGA) or other hardware can also be used to perform all operations of the CPU 201 in the embodiment of the present invention, or an FPGA or other hardware and the CPU 201 can each perform some of the operations of the CPU 201.
  • For ease of description, the embodiment of the present invention uses the term processor to refer to the combination of the CPU 201 and the memory 202, as well as to the various implementations described above; the processor communicates with the interface 203.
  • the interface 203 may be a network interface card (NIC) or a host bus adaptor (Host Bus Adaptor, HBA).
  • The storage system in this embodiment of the present invention may also be a distributed storage system (for example, a Huawei product series).
  • the distributed block storage system includes multiple servers, such as server 1, server 2, server 3, server 4, server 5, and server 6, and the servers communicate with each other through InfiniBand or Ethernet.
  • the number of servers in the distributed block storage system can be increased according to actual needs, which is not limited in the embodiments of the present invention.
  • the server of the distributed block storage system includes the structure shown in FIG. 4.
  • Each server in the distributed block storage system includes a central processing unit (CPU) 401, a memory 402, an interface 403, SSD 1, SSD 2, and SSD 3. The memory 402 stores computer instructions, and the CPU 401 executes the instructions in the memory 402 to perform the corresponding operations.
  • The interface 403 may be a hardware interface, such as a network interface card (NIC) or a host bus adapter (HBA), or a program interface module.
  • The embodiments of the present invention collectively refer to the following as a processor: the CPU 401 together with the memory 402; a field programmable gate array (FPGA) or other hardware that replaces the CPU 401; or the combination of the CPU 401 with an FPGA or other hardware.
  • the server responsible for storage management in the distributed storage system is called the controller. Specifically, the controller is used to perform storage space management and data access.
  • SSDs use pages as read and write units and blocks as erase units.
  • An SSD can access data in parallel at multiple levels: channel, particle package, flash memory chip, particle (die), and flash memory slice.
  • The SSD organizes its flash particle packages in a multi-channel manner. Multiple particle packages can be connected to each channel; the packages on a channel share the transmission channel but can execute instructions independently.
  • For the specific structure of the SSD, refer to FIG. 5, which includes an interface 501, an SSD controller 502, channels 503, and packages 504. One package 504 contains multiple flash memory chips, each flash memory chip includes one or more particles (dies), each particle includes multiple flash memory slices, each flash memory slice includes multiple blocks, and each block includes multiple pages.
  • The interface 501 may be an interface that supports the Serial Attached Small Computer System Interface (SAS) protocol, the Non-Volatile Memory Express (NVMe) protocol, the Peripheral Component Interconnect Express (PCIe) protocol, or the like.
  • When an SSD fails, usually only some elements of the SSD, such as physical blocks, fail, rather than the entire SSD. That is, when a fault occurs inside the SSD, the potentially affected range is only a part of the SSD, not the entire SSD.
  • the embodiment of the present invention refers to this potentially affected part as the fault domain.
  • The SSD is divided into multiple fault domains: for example, the multiple particle packages connected to one channel serve as a fault domain, or one or more particles serve as a fault domain, or one or more flash memory slices serve as a fault domain.
  • The fault domain is regarded as the range potentially affected by a failure, and the data in a failed fault domain needs to be restored.
  • A failure in a fault domain of the SSD may affect the entire fault domain or only a part of it.
  • the embodiment of the present invention may also use other SSD components as a fault domain, which is not limited in the embodiment of the present invention.
  • the SSD monitors the status of each fault domain.
  • the controller of the SSD monitors the status of the fault domain using methods such as background inspection.
  • The SSD can also determine the health status of a fault domain from the erase counts of the physical blocks in that fault domain, that is, determine the status of the fault domain from its degree of wear.
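  • A minimal sketch of wear-based status classification, under stated assumptions: the rated erase-cycle limit, the warning ratio, the three status labels, and the function names are all hypothetical, since the text only says that status is derived from the erase counts of the physical blocks in each fault domain.

```python
from statistics import mean

# Hypothetical values: the source gives no endurance rating or thresholds.
RATED_ERASE_CYCLES = 3000
WARN_RATIO = 0.8


def fault_domain_wear(erase_counts):
    """Return the wear ratio of one fault domain from its blocks' erase counts."""
    return mean(erase_counts) / RATED_ERASE_CYCLES


def fault_domain_status(erase_counts):
    """Classify a fault domain as 'healthy', 'worn', or 'failed' by degree of wear."""
    wear = fault_domain_wear(erase_counts)
    if wear >= 1.0:
        return "failed"
    if wear >= WARN_RATIO:
        return "worn"
    return "healthy"
```

  • An SSD controller could run such a check per fault domain during background inspection and report any domain that leaves the healthy state.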
  • the SSD provides storage space in the form of logical addresses.
  • The logical address is a logical block address (LBA).
  • The SSD uses the flash translation layer (FTL) to map each LBA to a page on a physical block of the SSD, establishing the LBA-to-page address mapping.
  • The SSD configures the LBA-to-page mapping according to fault domains. For example, an SSD contains 128 particles and has an available capacity of 32 TB, so it can provide 32 TB of logical addresses, that is, a 32 TB address space.
  • the SSD contains 32 fault domains, and the fault domain identifiers are 0-31.
  • the SSD may use numbers or other methods to identify the fault domain, which is not limited in this embodiment of the present invention.
  • Each fault domain of the SSD corresponds to a certain range of LBAs. For example, the LBAs corresponding to fault domain 0 range from 0 to (1 TB - 1), the LBAs corresponding to fault domain 1 range from 1 TB to (2 TB - 1), ..., and the LBAs corresponding to fault domain 31 range from 31 TB to (32 TB - 1); that is, the logical addresses corresponding to one fault domain are continuous.
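  • The continuous layout in this example (32 fault domains, 1 TB of logical addresses each) can be sketched as follows. The function names are hypothetical, and LBAs are treated as byte offsets purely to match the TB figures in the text.

```python
TB = 1 << 40  # LBAs are treated as byte offsets here (a simplifying assumption)
NUM_FAULT_DOMAINS = 32
RANGE_SIZE = 1 * TB  # each fault domain is assigned 1 TB of continuous LBAs


def fault_domain_of(lba):
    """Return the fault domain whose continuous LBA range contains lba."""
    if not 0 <= lba < NUM_FAULT_DOMAINS * RANGE_SIZE:
        raise ValueError("LBA outside the SSD address space")
    return lba // RANGE_SIZE


def lba_range_of(fault_domain):
    """Return the (first, last) LBA assigned to a fault domain."""
    start = fault_domain * RANGE_SIZE
    return start, start + RANGE_SIZE - 1
```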
  • This is referred to as assigning a certain range of logical addresses, that is, a certain range of LBAs, to the fault domain.
  • Assigning a certain range of logical addresses to a fault domain is also described as the SSD mapping, based on the FTL, a certain range of logical addresses to physical addresses in a specific fault domain.
  • assigning a certain range of logical addresses to the fault domain does not require all mappings of the certain range of logical addresses to physical addresses in the fault domain to be established.
  • the SSD selects a physical address in the fault domain to establish the mapping.
  • the LBAs in each fault domain may be discontinuous, that is, a certain range of logical addresses may be discontinuous logical addresses.
  • For example, the 32 TB of LBAs is divided among the 32 fault domains at a granularity of 1 gigabyte (GB), and each fault domain provides physical addresses for 1 TB of LBAs: the LBAs corresponding to fault domain 0 are 0 to (1 GB - 1), the LBAs corresponding to fault domain 1 are 1 GB to (2 GB - 1), ..., and the LBAs corresponding to fault domain 31 are 31 GB to (32 GB - 1); the next LBAs corresponding to fault domain 0 are 32 GB to (33 GB - 1), ..., and the next LBAs corresponding to fault domain 31 are 63 GB to (64 GB - 1), and so on.
  • the corresponding relationship between the fault domain and the LBA is established in an interleaved manner.
  • the LBA corresponding to a fault domain is not continuous.
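  • The interleaved layout can be sketched as a round-robin assignment of 1 GB stripes to the 32 fault domains, so that stripe 0 goes to fault domain 0, stripe 31 to fault domain 31, and stripe 32 wraps back to fault domain 0. The function names are hypothetical, and LBAs are again treated as byte offsets to match the GB figures in the text.

```python
GB = 1 << 30  # LBAs are treated as byte offsets here (a simplifying assumption)
NUM_FAULT_DOMAINS = 32
STRIPE = 1 * GB  # interleaving granularity from the example


def interleaved_fault_domain(lba):
    """Return the fault domain of lba when 1 GB stripes are assigned round-robin."""
    return (lba // STRIPE) % NUM_FAULT_DOMAINS


def interleaved_ranges(fault_domain, capacity):
    """Yield the (first, last) LBA of every stripe owned by fault_domain."""
    start = fault_domain * STRIPE
    step = NUM_FAULT_DOMAINS * STRIPE
    while start < capacity:
        yield start, start + STRIPE - 1
        start += step
```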
  • the SSD stores the above mapping relationship between the LBA and the fault domain.
  • the SSD reports the mapping relationship between the LBA and the fault domain to the controller 101.
  • The controller 101 uses a redundancy algorithm, such as an erasure coding (EC) algorithm, to form a chunk group (CKG) from chunks (CK) provided by different SSDs.
  • The EC algorithm can be a RAID algorithm. As shown in FIG. 7, the CKG consists of CK1, CK2, and CK3. CK1 is provided by SSD1, CK2 is provided by SSD2, and CK3 is provided by SSD3.
  • The address of CK1 is LBA1 of SSD1, the address of CK2 is LBA2 of SSD2, and the address of CK3 is LBA3 of SSD3.
  • LBA1 is mapped to the physical address provided by fault domain 1 of SSD1; the embodiment of the present invention describes this as the address of CK1 being mapped to the physical address provided by fault domain 1 of SSD1. Likewise, LBA2 is mapped to the physical address provided by fault domain 2 of SSD2, and LBA3 is mapped to the physical address provided by fault domain 3 of SSD3.
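  • As an illustration of the CKG structure, the sketch below models a CK as an (SSD, LBA) pair and resolves the fault domain of each CK from a per-SSD LBA-to-fault-domain function. The class names, field names, and the shape of the lookup table are assumptions made for this example only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    ssd: str  # SSD that provides the CK
    lba: int  # address of the CK: an LBA of that SSD


@dataclass(frozen=True)
class ChunkGroup:
    ckg_id: int
    chunks: tuple  # CKs taken from different SSDs


def fault_domains_of_ckg(ckg, lba_to_fd):
    """Map each CK to (ssd, fault domain).

    lba_to_fd maps an SSD name to a function lba -> fault domain id, standing in
    for the LBA/fault-domain correspondence that each SSD reports.
    """
    return {ck: (ck.ssd, lba_to_fd[ck.ssd](ck.lba)) for ck in ckg.chunks}
```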
  • The fault domain of the SSD that provides a CK may be chosen based on the load.
  • The load can be characterized by the type of input/output (IO), how hot or cold the IO is, and so on.
  • the SSD sends the correspondence between the fault domain and the LBA to the controller 101, and the controller 101 can determine the fault domain corresponding to the logical address of each CK in the CKG according to the correspondence between the SSD's fault domain and the logical address.
  • the controller 101 acquires the status information of the SSD. For example, if the fault domain 1 of SSD1 fails, SSD1 sends fault information to the controller 101 to indicate that the fault domain 1 fails.
  • The controller 101 can determine the LBAs affected by the failure of fault domain 1 of SSD1 according to the correspondence between the SSD's fault domains and logical addresses. The storage array contains multiple CKGs, and the controller 101 finds the CKGs that contain a CK whose address is an LBA mapped to fault domain 1 of SSD1. For example, it determines that the address of CK1 in CKG1 is an LBA mapped to fault domain 1 of SSD1.
  • The controller 101 restores the data of CK1 in CKG1 according to a redundancy algorithm, such as the EC algorithm. Therefore, compared with the prior art, the embodiments of the present invention do not need to reconstruct the CKs corresponding to all the logical addresses provided by SSD1, which increases the speed of data reconstruction.
  • the data in CK1 may be restored to other fault domains of SSD1 or other SSDs, which is not limited in the embodiment of the present invention.
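  • The recovery flow can be sketched in two steps: find the CKs whose addresses map to the failed fault domain, then rebuild only those CKs. The sketch assumes single-parity XOR as the redundancy algorithm, which is one simple RAID-style instance and not necessarily the EC algorithm the embodiment uses; the function names and the dictionary layout of the CKGs are hypothetical.

```python
def xor_bytes(blocks):
    """XOR equal-length byte strings together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)


def affected_chunks(ckgs, failed_ssd, failed_fd, fd_of):
    """Return (ckg_id, index) for each CK whose LBA maps to the failed fault domain.

    ckgs maps a CKG id to a list of (ssd, lba) pairs; fd_of(ssd, lba) returns the
    fault domain id according to the correspondence reported by the SSD.
    """
    hits = []
    for ckg_id, chunks in ckgs.items():
        for idx, (ssd, lba) in enumerate(chunks):
            if ssd == failed_ssd and fd_of(ssd, lba) == failed_fd:
                hits.append((ckg_id, idx))
    return hits


def rebuild_chunk(data_by_index, lost_index):
    """Recompute the lost CK as the XOR of the surviving CKs (single-parity assumption)."""
    survivors = [d for i, d in data_by_index.items() if i != lost_index]
    return xor_bytes(survivors)
```

  • Only the CKs returned by `affected_chunks` are rebuilt; CKs of SSD1 that map to healthy fault domains are left untouched.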
  • the SSD reports the above mapping relationship between the LBA and the fault domain to the controller 101, so the storage array stores the correspondence between the address of the CK contained in the CKG and the fault domain.
  • the first CK belongs to the first fault domain and the second CK belongs to the second fault domain.
  • The storage array also stores a fault domain index table. For example, the fault domain index table contains the correspondence between fault domains and CKGs, such as the correspondence between fault domain IDs and CKG IDs.
  • the controller 101 can quickly find the CKGs affected by the fault domain according to the fault domain index table, so as to quickly reconstruct the data in the CKs of the CKGs affected by the fault domain.
  • the controller 101 may record the corresponding entry in the fault domain index table according to the mapping relationship between the LBA and the fault domain when creating the CKG, and the entry contains the correspondence between the fault domain and the CKG.
  • One implementation establishes a multi-level fault domain index table, for example, with a first-level index from SSD to fault domain and a second-level index from fault domain to CKG. In another implementation, as shown in FIG. 8, the fault domain index table is partitioned by SSD, which facilitates quick queries.
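  • A minimal sketch of a two-level fault domain index table, keyed first by SSD and then by fault domain, with CKG IDs as the values. The class and method names are hypothetical; the text specifies only that the table relates fault domains to CKGs and is filled in when a CKG is created.

```python
from collections import defaultdict


class FaultDomainIndex:
    """Two-level index: SSD id -> fault domain id -> set of CKG ids."""

    def __init__(self):
        self._table = defaultdict(lambda: defaultdict(set))

    def record(self, ssd, fault_domain, ckg_id):
        """Add an entry when a CKG is created with a CK in this fault domain."""
        self._table[ssd][fault_domain].add(ckg_id)

    def affected_ckgs(self, ssd, fault_domain):
        """Return the sorted CKG ids that have a CK in the given fault domain."""
        return sorted(self._table.get(ssd, {}).get(fault_domain, ()))
```

  • Because one CKG holds CKs from several SSDs, the same CKG ID appears under several fault domains, which matches the statement that different fault domains can correspond to the same CKG.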
  • In an SSD that supports the NVMe interface specification, namespaces can be allocated according to the number of fault domains, that is, one fault domain corresponds to one namespace. The logical addresses of different namespaces of an SSD can therefore be addressed independently. For example, an SSD with an available capacity of 32 TB is divided into 32 fault domains, and a namespace is allocated to each fault domain. The LBA range of each namespace is 0 to (1 TB - 1). The LBAs of a namespace are mapped to physical addresses in the fault domain corresponding to that namespace.
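  • With one namespace per fault domain, the fault domain of a CK follows from its namespace alone, and the LBA only has to be valid within that namespace. The sketch below assumes namespace IDs are numbered consecutively from a configurable first ID; this numbering and the function names are assumptions for illustration.

```python
TB = 1 << 40  # LBAs are treated as byte offsets here (a simplifying assumption)


def build_namespace_map(num_fault_domains, first_nsid=1):
    """Return {namespace id: fault domain id}, one namespace per fault domain."""
    return {first_nsid + fd: fd for fd in range(num_fault_domains)}


def fault_domain_of_namespace_lba(ns_map, namespace_id, lba, namespace_size):
    """Resolve a (namespace, LBA) address to its fault domain.

    Every namespace is addressed independently from 0 to namespace_size - 1.
    """
    if not 0 <= lba < namespace_size:
        raise ValueError("LBA outside the namespace")
    return ns_map[namespace_id]
```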
  • The SSD stores the mapping relationship between namespaces and fault domains.
  • The SSD reports the mapping relationship between namespaces and fault domains to the controller 101.
  • The mapping relationship between the LBAs in a namespace and the fault domain can also be reported.
  • The namespace of the SSD that provides a CK may be chosen based on the load.
  • The load can be characterized by the type of input/output (IO), how hot or cold the IO is, and so on.
  • the storage array stores the fault domain index table.
  • the storage array stores a namespace index table
  • the namespace index table contains the correspondence between the namespace and the CKG, for example, the correspondence between the namespace ID and the CKG ID. Because the same CKG contains CKs from different SSD namespaces, in the namespace index table, different namespaces can correspond to the same CKG.
  • the SSD reports fault information to the controller 101.
  • the fault information is used to indicate the namespace in which the fault occurs.
  • the fault information includes a namespace identifier.
  • the controller 101 can quickly find the CKGs affected by the fault domain according to the namespace index table, so as to quickly reconstruct the data in the CKs of the CKGs affected by the fault domain.
  • the controller 101 may record the corresponding entry in the namespace index table according to the mapping relationship between the namespace and the fault domain when the CKG is allocated and established, and the entry contains the correspondence between the namespace and the CKG.
  • an implementation method can establish a multi-level namespace index table, for example, the first level is the SSD and namespace index tables, and the second level is the namespace and CKG index tables; another implementation, As shown in Figure 9, the namespace index table can be partitioned according to SSD, which facilitates quick query.
  • valid data is also written in different physical addresses in the same fault domain.
  • the controller of the SSD collects wear information of each fault domain in the SSD, and reports the wear information of the fault domain to the controller 101.
  • When the controller 101 creates a CKG, it selects CKs mapped to physical addresses of the corresponding fault domains according to the wear level of each fault domain of the SSD and the frequency of data modification.
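  • One possible selection policy is sketched below. It is an assumption, not the policy stated in the text: frequently modified data is placed in the least-worn fault domain and rarely modified data in the most-worn one, so that wear evens out over time. The function name and the wear-ratio representation are hypothetical.

```python
def pick_fault_domain(wear_by_fd, hot_data):
    """Pick a fault domain for a new CK.

    wear_by_fd maps a fault domain id to a wear ratio in [0, 1], as reported by
    the SSD. Hot (frequently modified) data goes to the least-worn fault domain;
    cold data goes to the most-worn one. Ties are broken by fault domain id.
    """
    ranked = sorted(wear_by_fd, key=lambda fd: (wear_by_fd[fd], fd))
    return ranked[0] if hot_data else ranked[-1]
```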
  • The embodiments of the present invention can also be applied to SSDs that support Open-Channel. In one implementation, an Open-Channel SSD is divided into multiple fault domains, and the controller 101 of the storage system can directly access the physical addresses of the SSD.
  • The SSD establishes the mapping relationship between fault domains and the physical addresses of the SSD. The address of a CK that forms a CKG in the storage system can then be a physical address of the SSD; that is, the address of the CK is a physical address provided by a fault domain of the SSD, and the address of the CK is directly mapped to that physical address.
  • An embodiment of the present invention also provides a controller applied to a storage system, where the storage system includes the controller, a first solid state drive (SSD), and a second SSD; the first SSD and the second SSD each include multiple fault domains; the storage system includes a block group formed based on an erasure coding algorithm, and the block group includes a first block and a second block; the address of the first block is mapped to the physical address provided by the first fault domain of the first SSD, and the address of the second block is mapped to the physical address provided by the second fault domain of the second SSD. As shown in FIG. 10, the controller includes a receiving unit 1001 and a recovery unit 1002.
  • The receiving unit 1001 is used to receive the fault information of the first SSD, where the fault information indicates a fault in the first fault domain; the recovery unit 1002 is used to restore, in response to the fault information and according to the erasure coding algorithm, the data stored at the address of the first block in the block group.
  • The storage system stores the correspondence between the address of the first block and the first fault domain, and the correspondence between the address of the second block and the second fault domain; the controller further includes a query unit configured to query the correspondence between the first fault domain and the block group to determine the block group.
  • the controller provided in FIG. 10 in the embodiment of the present invention may also be implemented by software.
  • An embodiment of the present invention further provides an SSD management device, where the SSD includes a first fault domain and a second fault domain. The SSD management device includes a first allocation unit 1101, used to allocate a first range of logical addresses of the SSD to the first fault domain, and a second allocation unit 1102, used to allocate a second range of logical addresses of the SSD to the second fault domain.
  • The SSD management device further includes a sending unit for sending, to the controller of the storage system, the correspondence between the first fault domain and the first range of logical addresses and the correspondence between the second fault domain and the second range of logical addresses; the storage system includes the SSD.
  • The SSD management device further includes a recording unit for separately recording the correspondence between the first fault domain and the first range of logical addresses and the correspondence between the second fault domain and the second range of logical addresses.
  • An embodiment of the present invention provides a computer-readable storage medium that stores computer instructions. When the computer instructions run on the controller 101 shown in FIG. 1 or the server shown in FIG. 4, the method in the embodiment of the present invention is executed.
  • An embodiment of the present invention provides a computer program product containing computer instructions. When the computer instructions run on the controller 101 shown in FIG. 1 or the server shown in FIG. 4, the method in the embodiment of the present invention is executed.
  • Each unit of the data recovery apparatus provided by the embodiment of the present invention may be implemented by a processor, or may be implemented by a processor and a memory together, or may be implemented by software.
  • An embodiment of the present invention provides a computer program product containing computer instructions. When the computer instructions run on an SSD controller, the SSD management method in the embodiments of the present invention is executed.
  • The logical address in the embodiments of the present invention may also be a key-value pair (KV) in a key-value (KV) disk, or a log in a log disk.
  • In the embodiments of the present invention, "correspondence" and "mapping relationship" have the same meaning.
  • The correspondence between the address of a block and a fault domain has the same meaning as the correspondence between the fault domain and the address of the block.
  • The disclosed system, device, and method may be implemented in other ways.
  • For example, the device embodiments described above are only schematic.
  • The division of the units is only a division of logical functions; in an actual implementation there may be other divisions. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • In addition, the displayed or discussed mutual coupling, direct coupling, or communication connection may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.
  • The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • In addition, the functional units in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • If the function is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product.
  • The computer software product is stored in a storage medium and includes several computer instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention.
  • The aforementioned storage media include a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and other media that can store computer instructions.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Techniques For Improving Reliability Of Storages (AREA)

Abstract

Embodiments of the present invention provide a method for restoring data in a storage system. The method comprises: dividing a solid state disk (SSD) into multiple failure domains, each of the failure domains being used for providing a physical address for a logical address of the SSD within a certain range, so that when failures occur to the failure domains of the SSD, it is unnecessary to reconstruct data in the whole SSD.

Description

Data recovery method, system and device in storage system
Technical field
The present invention relates to the field of information technology, and in particular, to a data recovery method, system, and device in a storage system.
Background
Redundant Array of Independent Disks (RAID) technology is widely used in storage systems to ensure data reliability. When a hard disk in the storage system is damaged, the data on the damaged disk can be recomputed from the data and parity data on the undamaged disks. This process is called RAID reconstruction.
In a RAID-based storage system composed of solid state disks (SSDs), if the reconstruction speed is 1 terabyte (TB) every 5 hours, then reconstruction takes 5 hours when an SSD with a capacity of 1 TB suffers a partial failure; if the SSD capacity is 100 TB, the reconstruction time grows to 500 hours.
Summary of the invention
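The reconstruction time in this example scales linearly with the amount of data to rebuild. A minimal sketch of that arithmetic, assuming a constant rebuild rate of 5 hours per TB (the figure given in the text); the function name `rebuild_hours` is illustrative only:

```python
def rebuild_hours(capacity_tb: float, hours_per_tb: float = 5.0) -> float:
    """Hours needed to rebuild `capacity_tb` terabytes at a constant rate."""
    return capacity_tb * hours_per_tb


print(rebuild_hours(1))    # 5.0   (whole 1 TB SSD)
print(rebuild_hours(100))  # 500.0 (whole 100 TB SSD)
```

Under the same assumption, rebuilding only a 1 TB portion of a 100 TB SSD would take 5 hours rather than 500.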
In a first aspect, a data recovery method in a storage system is provided. The storage system includes a controller, a first solid state drive (SSD), and a second SSD. The first SSD and the second SSD each contain multiple fault domains, and the storage system includes a block group formed based on an erasure coding algorithm, the block group including a first block and a second block. The address of the first block is mapped to a physical address provided by a first fault domain of the first SSD, and the address of the second block is mapped to a physical address provided by a second fault domain of the second SSD. The method includes: the controller receives fault information of the first SSD, the fault information indicating that the first fault domain has failed; and in response to the fault information, the controller recovers, according to the erasure coding algorithm, the data stored at the logical address of the first block in the block group. The first SSD and the second SSD each contain multiple fault domains, but the numbers of fault domains in the first SSD and the second SSD may differ. Therefore, compared with the prior art, the storage system in the embodiments of the present invention does not need to reconstruct the data at all logical addresses of the failed SSD. It only needs to reconstruct the data at some of the logical addresses of that SSD, namely the logical addresses mapped to physical addresses in the failed fault domain, which increases the speed of data reconstruction.
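The recovery step can be illustrated with the simplest erasure code, single XOR parity. This is a sketch under stated assumptions, not the algorithm of the embodiments: the text does not name a particular code, and the helper names `xor_blocks` and `recover_block` as well as the block contents are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def recover_block(surviving: list[bytes]) -> bytes:
    """With single XOR parity, the lost block is the XOR of all survivors."""
    return xor_blocks(surviving)


# Assumed block group layout: two data blocks plus one parity block.
first_block = b"\x01\x02\x03\x04"    # in the first fault domain of the first SSD
second_block = b"\x10\x20\x30\x40"   # in the second fault domain of the second SSD
parity_block = xor_blocks([first_block, second_block])

# The first fault domain fails: only the first block is rebuilt,
# not every logical address of the first SSD.
rebuilt = recover_block([second_block, parity_block])
assert rebuilt == first_block
```

A production system would more likely use a Reed-Solomon style code that tolerates more than one lost block; the point here is only that recovery is scoped to the blocks in the failed fault domain.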
In a specific implementation, that the address of the first block is mapped to a physical address provided by the first fault domain of the first SSD includes: the address of the first block is a first logical address of the first SSD, and the first logical address is mapped to the physical address provided by the first fault domain of the first SSD. That the address of the second block is mapped to a physical address provided by the second fault domain of the second SSD includes: the address of the second block is a second logical address of the second SSD, and the second logical address is mapped to the physical address provided by the second fault domain of the second SSD. In another implementation, in a scenario with SSDs that support Open-Channel, the address of the first block is a physical address provided by the first fault domain of the first SSD, so the address of the first block maps directly to that physical address; likewise, the address of the second block is a physical address provided by the second fault domain of the second SSD and maps directly to that physical address. In yet another implementation, in the Open-Channel SSD scenario, the embodiments of the present invention also support indirect mapping between a block address and a physical address provided by a fault domain.
With reference to the first aspect, in some implementations of the first aspect, the storage system stores the correspondence between the address of the first block and the first fault domain, and the correspondence between the address of the second block and the second fault domain. The address of the first block is the first logical address of the first SSD, and the address of the second block is the second logical address of the second SSD. Further, the storage system stores the correspondence between the blocks in the block group and the fault domains; for example, the first block belongs to the first fault domain and the second block belongs to the second fault domain. Further, the storage system also stores a fault domain index table, which contains, for example, the correspondence between fault domains and block groups. Because one block group contains blocks from fault domains of different SSDs, different fault domains in the fault domain index table can correspond to the same block group. When a fault domain of an SSD fails, the controller can use the fault domain index table to quickly find the block groups affected by that fault domain, and then quickly reconstruct the data in the affected blocks of those block groups.
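A fault domain index table of this kind can be pictured as an inverted index from (SSD, fault domain) to block groups. The following is a minimal sketch; the data layout, the identifiers such as `BG-0` and `SSD1`, and the function name `build_fault_domain_index` are all assumptions made for illustration, not structures defined by the text.

```python
from collections import defaultdict

# block group id -> list of (ssd id, fault domain id, block address); values are made up
block_groups = {
    "BG-0": [("SSD1", 0, 0x0000), ("SSD2", 3, 0x1000)],
    "BG-1": [("SSD1", 0, 0x2000), ("SSD3", 1, 0x3000)],
    "BG-2": [("SSD1", 5, 0x4000), ("SSD2", 3, 0x5000)],
}


def build_fault_domain_index(groups):
    """Map each (ssd, fault domain) pair to the block groups holding a block there."""
    index = defaultdict(set)
    for group_id, blocks in groups.items():
        for ssd, domain, _address in blocks:
            index[(ssd, domain)].add(group_id)
    return index


index = build_fault_domain_index(block_groups)

# Fault domain 0 of SSD1 fails: only these block groups need reconstruction.
print(sorted(index[("SSD1", 0)]))  # ['BG-0', 'BG-1']
```

Note that `BG-0` appears under both `("SSD1", 0)` and `("SSD2", 3)`, which matches the statement that different fault domains can correspond to the same block group.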
Optionally, a fault domain in the first SSD and the second SSD is multiple packages connected to one channel, or one or more packages, or one or more dies, or one or more planes.
With reference to the first aspect, in some implementations of the first aspect, responding to the fault information includes: the controller queries the correspondence between the first fault domain and the block group to determine the block group.
With reference to the first aspect, in some implementations of the first aspect, the storage system stores the correspondence between the address of the first block and the first fault domain, and the correspondence between the address of the second block and the second fault domain.
In a second aspect, a solid state drive (SSD) management method is provided. The SSD includes a first fault domain and a second fault domain. The method includes: allocating a first range of logical addresses of the SSD to the first fault domain; and allocating a second range of logical addresses of the SSD to the second fault domain.
With reference to the second aspect, in some implementations of the second aspect, the method further includes: separately recording the correspondence between the first fault domain and the first range of logical addresses and the correspondence between the second fault domain and the second range of logical addresses.
With reference to the second aspect, in some implementations of the second aspect, the first range of logical addresses and the second range of logical addresses are both contiguous logical addresses; alternatively, the first range of logical addresses and the second range of logical addresses are non-contiguous logical addresses.
With reference to the second aspect, in some implementations of the second aspect, the method further includes: the SSD sends, to a controller of a storage system, the correspondence between the first fault domain and the first range of logical addresses and the correspondence between the second fault domain and the second range of logical addresses, where the storage system includes the SSD.
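The allocate, record, and send steps of the second aspect can be sketched as a small bookkeeping class. This is an illustrative model only: the class `SsdManager`, its method names, and the address values are hypothetical, and the `report` method merely stands in for whatever message the SSD would actually send to the controller.

```python
from dataclasses import dataclass, field


@dataclass
class SsdManager:
    """Hypothetical SSD-side record of fault domain -> logical address ranges."""
    table: dict[int, list[range]] = field(default_factory=dict)

    def allocate(self, fault_domain: int, lba_range: range) -> None:
        # Allocate a range of logical addresses to a fault domain and record it.
        self.table.setdefault(fault_domain, []).append(lba_range)

    def report(self) -> dict[int, list[tuple[int, int]]]:
        # Correspondence to send to the storage system controller,
        # expressed as inclusive (first, last) address pairs.
        return {
            fault_domain: [(r.start, r.stop - 1) for r in ranges]
            for fault_domain, ranges in self.table.items()
        }


manager = SsdManager()
manager.allocate(0, range(0, 1000))      # first range -> first fault domain
manager.allocate(1, range(1000, 2000))   # second range -> second fault domain
print(manager.report())  # {0: [(0, 999)], 1: [(1000, 1999)]}
```

Keeping a list of ranges per fault domain lets the same structure represent both the contiguous case (one range per domain) and the non-contiguous case (several ranges per domain).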
In a third aspect, an embodiment of the present invention provides a controller applied to a storage system, including units for implementing the solutions of the first aspect.
In a fourth aspect, an embodiment of the present invention provides an SSD management device, including units for implementing the solutions of the second aspect.
In a fifth aspect, an embodiment of the present invention provides a computer-readable storage medium that stores computer instructions for performing the methods of the first aspect.
In a sixth aspect, an embodiment of the present invention provides a computer program product containing computer instructions for performing the methods of the first aspect.
In a seventh aspect, an embodiment of the present invention provides a computer-readable storage medium that stores computer instructions for performing the methods of the second aspect.
In an eighth aspect, an embodiment of the present invention provides a computer program product containing computer instructions for performing the methods of the second aspect.
In a ninth aspect, an embodiment of the present invention provides a solid state drive (SSD) that includes an SSD controller, a first fault domain, and a second fault domain; the SSD controller is configured to perform the solutions of the second aspect.
In a tenth aspect, an embodiment of the present invention provides a controller applied to a storage system; the controller includes an interface and a processor and is configured to implement the solutions of the first aspect.
In an eleventh aspect, an embodiment of the present invention provides a data recovery method in a storage system. The storage system includes a controller, a first solid state drive (SSD), and a second SSD. The first SSD and the second SSD each contain multiple namespaces, and one namespace corresponds to one fault domain. The storage system includes a block group formed based on an erasure coding algorithm, the block group including a first block and a second block. The address of the first block is a first logical address in a first namespace of the first SSD, and the address of the second block is a second logical address in a second namespace of the second SSD. The first logical address is mapped to a physical address provided by a first fault domain of the first SSD, and the second logical address is mapped to a physical address provided by a second fault domain of the second SSD. The method includes: the controller receives fault information of the first SSD, the fault information indicating that the first fault domain has failed or that the first namespace has failed; and in response to the fault information, the controller recovers, according to the erasure coding algorithm, the data stored at the logical address of the first block in the block group.
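Because one namespace corresponds to one fault domain, fault information that names either of them identifies the same scope of recovery. A minimal sketch of that resolution step, assuming the fault information arrives as a dictionary with either a `fault_domain` or a `namespace` key; the message format, the mapping values, and the function name `failed_fault_domain` are all hypothetical.

```python
# Assumed one-to-one mapping of namespace id -> fault domain id (values are made up).
NAMESPACE_TO_FAULT_DOMAIN = {1: 0, 2: 1, 3: 2}


def failed_fault_domain(fault_info: dict) -> int:
    """Resolve fault information to the failed fault domain."""
    if "fault_domain" in fault_info:
        return fault_info["fault_domain"]
    # Otherwise the SSD reported a failed namespace; look up its fault domain.
    return NAMESPACE_TO_FAULT_DOMAIN[fault_info["namespace"]]


print(failed_fault_domain({"fault_domain": 2}))  # 2
print(failed_fault_domain({"namespace": 1}))     # 0
```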
In a twelfth aspect, a solid state drive (SSD) management method is provided. The SSD includes a first fault domain and a second fault domain. The method includes: allocating a first namespace of the SSD to the first fault domain; and allocating a second namespace of the SSD to the second fault domain.
With reference to the second aspect, in some implementations of the second aspect, the method further includes: separately recording the correspondence between the first fault domain and the first namespace and the correspondence between the second fault domain and the second namespace.
With reference to the second aspect, in some implementations of the second aspect, the method further includes: the SSD sends, to a controller of a storage system, the correspondence between the first fault domain and the first namespace and the correspondence between the second fault domain and the second namespace, where the storage system includes the SSD.
With reference to the second aspect, in some implementations of the second aspect, the method further includes: the SSD sends, to a controller of a storage system, the correspondence between the first fault domain and the logical addresses of the first namespace and the correspondence between the second fault domain and the logical addresses of the second namespace, where the storage system includes the SSD.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below.
FIG. 1 is a schematic diagram of a storage system according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a storage array controller according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a distributed storage system according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a server in a distributed storage system according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an SSD according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the relationship between fault domains and logical addresses in an SSD according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of block group relationships in a storage system;
FIG. 8 is a schematic diagram of a fault domain index table;
FIG. 9 is a schematic diagram of a namespace index table;
FIG. 10 is a schematic structural diagram of a controller;
FIG. 11 is a schematic structural diagram of an SSD management device.
Detailed description
The technical solutions in the embodiments of the present application are described in more detail below.
The embodiments of the present invention address partial failures of some components of an SSD in a storage system. By making the fault domains of the SSD correspond to the physical space of the SSD, the scope of a failure is confined to a fault domain. This reduces the scope of impact on the storage system side and the reconstruction overhead, so that less storage space is reconstructed in less time, which improves reliability.
As shown in FIG. 1, the storage system in the embodiments of the present invention may be a storage array (such as a Huawei 18000 series or V3 series storage array). The storage array includes a controller 101 and multiple SSDs. As shown in FIG. 2, the controller 101 includes a central processing unit (CPU) 201, a memory 202, and an interface 203. The memory 202 stores computer instructions, and the CPU 201 executes the computer instructions in the memory 202 to manage the storage system and to perform data access, data recovery, and other operations. In addition, to save the computing resources of the CPU 201, a field programmable gate array (FPGA) or other hardware may perform all the operations of the CPU 201 in the embodiments of the present invention, or the FPGA or other hardware and the CPU 201 may each perform part of those operations. For ease of description, the embodiments of the present invention use the term processor to refer to the combination of the CPU 201 and the memory 202 as well as the various implementations described above. The processor communicates with the interface 203. The interface 203 may be a network interface card (NIC) or a host bus adapter (HBA).
Further, the storage system in the embodiments of the present invention may also be a distributed storage system (such as a Huawei distributed storage product series). Taking such a series as an example, as shown in FIG. 3, the distributed block storage system includes multiple servers, such as server 1, server 2, server 3, server 4, server 5, and server 6, which communicate with each other over InfiniBand, Ethernet, or the like. In practice, the number of servers in the distributed block storage system may be increased according to actual needs, which is not limited in the embodiments of the present invention.
Each server in the distributed block storage system has the structure shown in FIG. 4. As shown in FIG. 4, each server in the distributed block storage system includes a central processing unit (CPU) 401, a memory 402, an interface 403, SSD 1, SSD 2, and SSD 3. The memory 402 stores computer instructions, and the CPU 401 executes the program instructions in the memory 402 to perform the corresponding operations. The interface 403 may be a hardware interface, such as a network interface card (NIC) or a host bus adapter (HBA), or may be a program interface module. In addition, to save the computing resources of the CPU 401, a field programmable gate array (FPGA) or other hardware may perform the corresponding operations in place of the CPU 401, or the FPGA or other hardware may perform them together with the CPU 401. For ease of description, the embodiments of the present invention collectively refer to the CPU 401 with the memory 402, the FPGA or other hardware that replaces the CPU 401, and the combination of such hardware with the CPU 401 as a processor. In a distributed storage system, the server responsible for storage management is called the controller. Specifically, the controller performs storage space management, data access, and similar tasks.
An SSD reads and writes in units of pages and erases in units of blocks. An SSD can parallelize data access at multiple levels: channel, package, flash chip, die, and plane. The SSD organizes its flash packages in a multi-channel manner. Multiple packages can be connected to each channel; the packages on a channel share the transmission channel but can execute instructions independently. For the specific structure of the SSD, refer to FIG. 5, which includes an interface 501, an SSD controller 502, channels 503, and packages 504. One package 504 contains multiple flash chips, each flash chip contains one or more dies, each die contains multiple planes, each plane contains multiple blocks, and each block contains multiple pages. The interface 501 may be an interface that supports the Serial Attached Small Computer System Interface (SAS) protocol, the Non-Volatile Memory Express (NVMe) protocol, the Peripheral Component Interconnect Express (PCIe) protocol, or the like.
When an SSD fails, usually only some elements of the SSD fail, such as physical blocks, rather than the entire SSD. In other words, when a fault occurs inside the SSD, the range potentially affected by the fault is not the entire SSD but a part of it. The embodiments of the present invention call this potentially affected part a fault domain. The SSD is divided into multiple fault domains according to its structure. For example, the multiple packages connected to one channel form one fault domain, or one or more dies form one fault domain, or one or more planes form one fault domain. In the embodiments of the present invention, when the SSD fails, the fault domain is treated as the range potentially affected by the fault, and the data in the failed fault domain needs to be recovered. In practice, a failure of a fault domain of the SSD may be a failure of the entire fault domain or of part of it. Other components of the SSD may also serve as a fault domain, which is not limited in the embodiments of the present invention. The SSD monitors the status of each fault domain. In a specific implementation, the SSD controller monitors the status of the fault domains by means such as background inspection. The SSD can also determine the health status of a fault domain from the erase counts of the physical blocks in that fault domain, that is, determine the status of the fault domain from its degree of wear.
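Wear-based health determination could look like the following sketch. The text only says that erase counts are used; the rated cycle count of 3000, the 90% warning threshold, the use of the most worn block, and the status labels are assumptions chosen for illustration.

```python
def fault_domain_health(erase_counts: list[int], rated_cycles: int = 3000) -> str:
    """Classify a fault domain by the wear of its most worn physical block.

    `rated_cycles` and the thresholds below are hypothetical values.
    """
    wear = max(erase_counts) / rated_cycles
    if wear >= 1.0:
        return "failed"
    if wear >= 0.9:
        return "degraded"
    return "healthy"


print(fault_domain_health([100, 250, 180]))   # healthy
print(fault_domain_health([2800, 1200]))      # degraded
print(fault_domain_health([3000, 40]))        # failed
```

An SSD controller might instead use the average erase count or combine wear with background inspection results; the text leaves that choice open.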
An SSD exposes its storage space in the form of logical addresses. In an SSD, a logical address is a logical block address (LBA). The SSD uses a flash translation layer (FTL) to map LBAs to pages on its physical blocks and maintains the mapping from LBAs to page addresses. To avoid having to recover the data of the entire SSD when an SSD in a storage system fails, the SSD configures the LBA-to-page mapping per fault domain. For example, suppose one SSD contains 128 dies and has a usable capacity of 32 TB, that is, it provides 32 TB of logical addresses (a 32 TB address space). If the LBA range affected by a fault is to be limited to 1 TB, the number of fault domains is 32, that is, 32 TB / 1 TB = 32. Because the SSD contains 128 dies, each fault domain contains 4 dies, that is, 128 / 32 = 4. As shown in FIG. 6, the SSD contains 32 fault domains, identified as 0 to 31. In a specific implementation, the SSD may identify fault domains by numbers or in other ways, which is not limited in the embodiments of the present invention. In one implementation, each fault domain corresponds to a certain range of the SSD's LBAs: fault domain 0 corresponds to LBAs 0 to (1 TB - 1), fault domain 1 corresponds to LBAs 1 TB to (2 TB - 1), and so on, up to fault domain 31, which corresponds to LBAs 31 TB to (32 TB - 1). In this case the logical addresses corresponding to one fault domain are contiguous. This is also described as allocating a certain range of logical addresses, that is, a certain range of LBAs, to a fault domain, or as the SSD mapping, based on the FTL, a certain range of logical addresses to physical addresses in a specific fault domain. Allocating a range of logical addresses to a fault domain does not require that mappings from all logical addresses in that range to physical addresses in the fault domain be established in advance. In one implementation, only when a mapping from a specific logical block address in the range to a physical address is needed does the SSD select a physical address in that fault domain and establish the mapping.
In another implementation, still using the SSD above as an example, the LBAs of each fault domain may be non-contiguous, that is, the certain range of logical addresses may consist of non-contiguous logical addresses. For example, the 32 TB of LBAs are divided among the 32 fault domains at a granularity of 1 gigabyte (GB), and each fault domain provides physical addresses for 1 TB of LBAs: fault domain 0 corresponds to LBAs 0 to (1 GB - 1), fault domain 1 corresponds to LBAs 1 GB to (2 GB - 1), and so on, up to fault domain 31, which corresponds to LBAs 31 GB to (32 GB - 1). The assignment then wraps around: fault domain 0 corresponds to LBAs 32 GB to (33 GB - 1), and so on, up to fault domain 31, which corresponds to LBAs 63 GB to (64 GB - 1). The correspondence between fault domains and LBAs is thus established by cyclic interleaving, and in this implementation the LBAs corresponding to one fault domain are non-contiguous. The SSD stores the above mapping between LBAs and fault domains and reports it to the controller 101.
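The two address layouts described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: addresses are treated as byte offsets into the 32 TB logical space (the text gives ranges in TB and GB), and the function and constant names are hypothetical rather than taken from the embodiments.

```python
TB = 1 << 40
GB = 1 << 30

SSD_CAPACITY = 32 * TB   # usable capacity in the example
DOMAIN_SPAN = 1 * TB     # logical range one fault-domain failure may affect
NUM_DIES = 128           # dies in the example SSD

NUM_DOMAINS = SSD_CAPACITY // DOMAIN_SPAN    # 32 TB / 1 TB = 32
DIES_PER_DOMAIN = NUM_DIES // NUM_DOMAINS    # 128 / 32 = 4


def _check(addr: int) -> None:
    if not 0 <= addr < SSD_CAPACITY:
        raise ValueError("address outside the SSD's logical address space")


def domain_contiguous(addr: int) -> int:
    """Fault domain when each domain owns one contiguous 1 TB range."""
    _check(addr)
    return addr // DOMAIN_SPAN


def domain_interleaved(addr: int, stripe: int = GB) -> int:
    """Fault domain when 1 GB stripes are assigned to the domains cyclically."""
    _check(addr)
    return (addr // stripe) % NUM_DOMAINS
```

With the contiguous layout a failed domain maps to a single 1 TB range, while with the interleaved layout it maps to 1024 separate 1 GB stripes; in both cases the affected logical space is 1 TB.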
In the embodiments of the present invention, the storage array shown in FIG. 1 is used as an example of the storage system. Each SSD provides fixed-length chunks (CK), and the controller 101 uses a redundancy algorithm, for example an erasure coding (EC) algorithm, to combine chunks from different SSDs into a chunk group (CKG). In a specific implementation, the EC algorithm may be a RAID algorithm. As shown in FIG. 7, a CKG consists of CK1, CK2, and CK3. CK1 is provided by SSD1, CK2 by SSD2, and CK3 by SSD3. The address of CK1 is LBA1 of SSD1, the address of CK2 is LBA2 of SSD2, and the address of CK3 is LBA3 of SSD3. LBA1 is mapped to a physical address provided by fault domain 1 of SSD1, which is described here as the address of CK1 being mapped to a physical address provided by fault domain 1 of SSD1; LBA2 is mapped to a physical address provided by fault domain 2 of SSD2, and LBA3 is mapped to a physical address provided by fault domain 3 of SSD3. When CKs are selected from multiple SSDs to form a CKG, the fault domain of the SSD that provides each CK may be decided based on load. The load may be the type of input/output (IO), how hot or cold the IO is, and so on. In one implementation, the SSD sends the correspondence between fault domains and LBAs to the controller 101, and the controller 101 can use this correspondence to determine the fault domain corresponding to the logical address of each CK in a CKG. The controller 101 obtains status information of the SSDs. For example, if fault domain 1 of SSD1 fails, SSD1 sends fault information to the controller 101 indicating that fault domain 1 has failed. From the correspondence between fault domains and logical addresses, the controller 101 can determine the LBAs affected by the failure of fault domain 1 of SSD1. The storage array contains multiple CKGs, and the controller 101 searches for the CKGs that contain a CK whose address is an LBA mapped to fault domain 1 of SSD1. For example, it determines that the address of CK1 in CKG1 is an LBA mapped to fault domain 1 of SSD1. The controller 101 then recovers the data of CK1 in CKG1 according to the redundancy algorithm, for example the EC algorithm. Therefore, compared with the prior art, the embodiments of the present invention do not need to reconstruct the CKs corresponding to all logical addresses provided by SSD1, which increases the speed of data reconstruction. In a specific implementation, the data of CK1 may be recovered to another fault domain of SSD1 or to another SSD, which is not limited in the embodiments of the present invention.
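The recovery step can be illustrated with a small sketch. It assumes a single XOR parity chunk as a stand-in for the erasure code (the text only says the EC algorithm may be a RAID algorithm); the `Chunk` type and the `rebuild_failed_chunk` function are hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from functools import reduce
from typing import List, Optional


@dataclass(frozen=True)
class Chunk:
    ssd: str            # SSD that provides the chunk
    fault_domain: int   # fault domain the chunk's address maps to
    data: bytes         # chunk contents (data or parity)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length buffers."""
    return bytes(x ^ y for x, y in zip(a, b))


def rebuild_failed_chunk(ckg: List[Chunk], ssd: str,
                         fault_domain: int) -> Optional[bytes]:
    """Rebuild the chunk of `ckg` that lives in the failed (ssd, fault_domain).

    Returns None when the chunk group has no chunk in that fault domain,
    so unaffected chunk groups are skipped instead of being rebuilt.
    """
    def is_failed(c: Chunk) -> bool:
        return c.ssd == ssd and c.fault_domain == fault_domain

    failed = [c for c in ckg if is_failed(c)]
    if not failed:
        return None
    if len(failed) > 1:
        raise ValueError("single parity can rebuild only one chunk per group")
    survivors = [c.data for c in ckg if not is_failed(c)]
    return reduce(xor_bytes, survivors)
```

Only chunk groups that hold a chunk in the failed fault domain do any work here, which is the point of the scheme: the other chunks provided by the same SSD are left alone.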
Further, because the SSD reports the mapping between LBAs and fault domains to the controller 101, the storage array stores the correspondence between the addresses of the CKs contained in a CKG and fault domains; for example, the first CK belongs to the first fault domain and the second CK belongs to the second fault domain. Further, to quickly find the CKGs containing a CK whose address is an LBA mapped to a fault domain of SSD1, the storage array also stores, based on the mapping between LBAs and fault domains, a fault domain index table. The fault domain index table contains the correspondence between fault domains and CKGs, for example between fault domain identifiers and CKG identifiers. Because one CKG contains CKs from fault domains of different SSDs, different fault domains may correspond to the same CKG in the fault domain index table. When a fault domain of an SSD fails, the controller 101 can use the fault domain index table to quickly find the CKGs affected by that fault domain and thus quickly reconstruct the data of the affected CKs in those CKGs. In a specific implementation, when creating a CKG, the controller 101 may record corresponding entries in the fault domain index table based on the mapping between LBAs and fault domains; each entry contains the correspondence between a fault domain and a CKG. To make the fault domain index table easier to query and manage, one implementation builds a multi-level fault domain index table, for example with an SSD-to-fault-domain index table as the first level and a fault-domain-to-CKG index table as the second level; in another implementation, as shown in FIG. 8, the fault domain index table is partitioned by SSD to allow fast lookup.
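A two-level fault domain index table of the kind described above might look like the following sketch. The class and method names are hypothetical, and the nested-dictionary layout (SSD, then fault domain, then a set of CKG identifiers) is just one plausible way to realize the first and second levels.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple


class FaultDomainIndex:
    """Two-level index: SSD id -> fault domain id -> set of CKG ids."""

    def __init__(self) -> None:
        self._table = defaultdict(lambda: defaultdict(set))

    def record_ckg(self, ckg_id: str,
                   chunks: Iterable[Tuple[str, int]]) -> None:
        """Record one entry per chunk when a CKG is created.

        `chunks` lists the (ssd_id, fault_domain_id) of every chunk in the CKG.
        """
        for ssd_id, domain_id in chunks:
            self._table[ssd_id][domain_id].add(ckg_id)

    def affected_ckgs(self, ssd_id: str, domain_id: int) -> Set[str]:
        """CKGs that hold a chunk in the given fault domain of the given SSD."""
        # .get() avoids creating empty entries for unknown SSDs or domains.
        return set(self._table.get(ssd_id, {}).get(domain_id, set()))
```

For example, after `record_ckg("CKG1", [("SSD1", 1), ("SSD2", 2), ("SSD3", 3)])`, a fault report for fault domain 1 of SSD1 resolves to `{"CKG1"}` with two dictionary lookups instead of a scan over every CKG.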
In another implementation of the embodiments of the present invention, for an SSD that supports the NVMe interface specification, namespaces may be allocated to the SSD according to the number of fault domains, that is, one fault domain corresponds to one namespace. The logical addresses of the different namespaces of one SSD can therefore be addressed independently. For example, still taking an SSD with a usable capacity of 32 TB, the SSD is divided into 32 fault domains and one namespace is allocated to each fault domain; the LBA range of every namespace is 0 to (1 TB - 1). The LBAs of a namespace are mapped to physical addresses in the fault domain corresponding to that namespace. The SSD stores the mapping between namespaces and fault domains and reports it to the controller 101. In another implementation, the mapping between the LBAs in a namespace and the fault domain may also be reported. When CKs are selected from multiple SSDs to form a CKG, the namespace of the SSD that provides each CK may be decided based on load. The load may be the type of input/output (IO), how hot or cold the IO is, and so on.
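The one-namespace-per-fault-domain layout can be sketched as follows. The names are hypothetical, offsets are treated as bytes, and numbering namespaces from 1 (fault domain d gets namespace d + 1) is an assumption made for the example; the text does not say how namespace identifiers are chosen.

```python
from dataclasses import dataclass
from typing import List

TB = 1 << 40


@dataclass(frozen=True)
class Namespace:
    nsid: int           # namespace identifier (assumed to start at 1)
    fault_domain: int   # fault domain backing this namespace
    size: int           # bytes addressable in this namespace

    def contains(self, offset: int) -> bool:
        """Each namespace is addressed independently, starting at 0."""
        return 0 <= offset < self.size


def build_namespaces(capacity: int = 32 * TB,
                     num_domains: int = 32) -> List[Namespace]:
    """Allocate one equally sized namespace per fault domain."""
    size = capacity // num_domains
    return [Namespace(nsid=d + 1, fault_domain=d, size=size)
            for d in range(num_domains)]


def failed_namespace(namespaces: List[Namespace], fault_domain: int) -> int:
    """Namespace identifier to report when the given fault domain fails."""
    for ns in namespaces:
        if ns.fault_domain == fault_domain:
            return ns.nsid
    raise KeyError(fault_domain)
```

Because every namespace starts at offset 0, the controller can identify the affected storage by namespace identifier alone, without translating a global LBA range.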
Correspondingly, as described above, the storage array stores a fault domain index table. In another implementation, the storage array stores a namespace index table, which contains the correspondence between namespaces and CKGs, for example between namespace identifiers and CKG identifiers. Because one CKG contains CKs from namespaces of different SSDs, different namespaces may correspond to the same CKG in the namespace index table. When a fault domain of an SSD fails, the SSD reports fault information to the controller 101. The fault information indicates the namespace in which the fault occurred; for example, it contains a namespace identifier. The controller 101 can use the namespace index table to quickly find the CKGs affected by the fault domain and thus quickly reconstruct the data of the affected CKs in those CKGs. In a specific implementation, when allocating and creating a CKG, the controller 101 may record corresponding entries in the namespace index table based on the mapping between namespaces and fault domains; each entry contains the correspondence between a namespace and a CKG. To make the namespace index table easier to query and manage, one implementation builds a multi-level namespace index table, for example with an SSD-to-namespace index table as the first level and a namespace-to-CKG index table as the second level; in another implementation, as shown in FIG. 9, the namespace index table is partitioned by SSD to allow fast lookup.
In the embodiments of the present invention, when the SSD performs garbage collection, the valid data is rewritten to different physical addresses within the same fault domain.
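A minimal sketch of garbage collection confined to one fault domain is shown below. The `Block` model (a dictionary of valid pages per block) and the function name are simplifying assumptions for illustration, not the SSD's actual data structures.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Block:
    fault_domain: int
    # Valid pages only: page index -> (LBA, data).
    pages: Dict[int, Tuple[int, bytes]] = field(default_factory=dict)


def garbage_collect(victim: Block, free_blocks: List[Block]) -> Block:
    """Move the victim's valid pages to a free block in the SAME fault domain.

    Keeping the destination in the same domain means the LBA-to-fault-domain
    correspondence reported to the array controller stays valid afterwards.
    """
    target = next((b for b in free_blocks
                   if b.fault_domain == victim.fault_domain and not b.pages),
                  None)
    if target is None:
        raise RuntimeError("no free block left in the victim's fault domain")
    for new_index, (lba, data) in enumerate(victim.pages.values()):
        target.pages[new_index] = (lba, data)
    victim.pages.clear()  # models erasing the victim block
    return target
```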
In the embodiments of the present invention, the controller of the SSD collects wear information for each fault domain inside the SSD and reports it to the controller 101. When creating a CKG, the controller 101 selects CKs mapped to physical addresses of suitable fault domains according to the degree of wear of each fault domain of the SSD and the modification frequency of the data.
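The text does not state how wear and modification frequency are combined. One plausible policy, shown here purely as an assumption, is to place frequently modified (hot) data in the least-worn fault domain and rarely modified (cold) data in the most-worn one; the function name is hypothetical.

```python
from typing import Dict


def pick_fault_domain(wear: Dict[int, float], hot: bool) -> int:
    """Choose a fault domain for a new chunk.

    `wear` maps fault domain id -> reported wear level (for example, a mean
    erase count). Assumed policy: hot data goes to the least-worn domain,
    cold data to the most-worn domain, so that wear evens out over time.
    """
    if not wear:
        raise ValueError("no wear information reported")
    choose = min if hot else max
    return choose(wear, key=wear.get)
```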
The embodiments of the present invention may also be applied to SSDs that support Open-Channel. In one implementation for an Open-Channel SSD, the SSD is divided into multiple fault domains, and the controller 101 of the storage system can access the physical addresses of the SSD directly. The SSD establishes the mapping between fault domains and its physical addresses, so the address of a CK that forms part of a CKG in the storage system may be a physical address of the SSD; that is, the address of the CK is a physical address provided by a fault domain of the SSD, and the address of the CK is mapped to a physical address provided by that fault domain. For the other operations required by an implementation based on an Open-Channel SSD, refer to the descriptions of the other embodiments of the present invention; details are not repeated here.
The various operations performed by the SSD in the embodiments of the present invention may be performed by the controller of the SSD.
Correspondingly, an embodiment of the present invention also provides a controller applied to a storage system, where the storage system includes the controller, a first solid state drive (SSD), and a second SSD; the first SSD and the second SSD each contain multiple fault domains; the storage system contains a chunk group formed based on an erasure coding algorithm, and the chunk group contains a first chunk and a second chunk; the address of the first chunk is mapped to a physical address provided by a first fault domain of the first SSD, and the address of the second chunk is mapped to a physical address provided by a second fault domain of the second SSD. As shown in FIG. 10, the controller includes a receiving unit 1001 and a recovery unit 1002. The receiving unit 1001 is configured to receive fault information of the first SSD, where the fault information indicates that the first fault domain has failed; the recovery unit 1002 is configured to, in response to the fault information, recover the data stored at the address of the first chunk in the chunk group according to the erasure coding algorithm. Further, the storage system stores the correspondence between the address of the first chunk and the first fault domain and the correspondence between the address of the second chunk and the second fault domain, and the controller further includes a query unit configured to query the correspondence between the first fault domain and the chunk group to determine the chunk group. For a specific implementation of the controller shown in FIG. 10, refer to the foregoing implementations of the embodiments of the present invention, such as the structure of the controller 101 shown in FIG. 2; details are not repeated here. In another implementation, the controller shown in FIG. 10 may also be implemented in software.
As shown in FIG. 11, an embodiment of the present invention further provides an SSD management apparatus, where the SSD includes a first fault domain and a second fault domain. The SSD management apparatus includes a first allocation unit 1101, configured to allocate a first range of logical addresses of the SSD to the first fault domain, and a second allocation unit 1102, configured to allocate a second range of logical addresses of the SSD to the second fault domain. Further, the SSD management apparatus includes a sending unit, configured to send, to the controller of a storage system, the correspondence between the first fault domain and the first range of logical addresses and the correspondence between the second fault domain and the second range of logical addresses, where the storage system includes the SSD. Further, the SSD management apparatus includes a recording unit, configured to record the correspondence between the first fault domain and the first range of logical addresses and the correspondence between the second fault domain and the second range of logical addresses. For one hardware implementation of the SSD management apparatus, refer to the structure of the controller of the SSD; details are not repeated here. In another implementation, the SSD management apparatus may also be implemented in software, or jointly by the controller of the SSD and software.
An embodiment of the present invention provides a computer-readable storage medium storing computer instructions. When the computer instructions run on the controller 101 shown in FIG. 1 or the server shown in FIG. 4, the method in the embodiments of the present invention is performed.
An embodiment of the present invention provides a computer program product containing computer instructions. When the computer instructions run on the controller 101 shown in FIG. 1 or the server shown in FIG. 4, the method in the embodiments of the present invention is performed.
Each unit of the data recovery apparatus provided by the embodiments of the present invention may be implemented by a processor, jointly by a processor and a memory, or in software.
An embodiment of the present invention provides a computer program product containing computer instructions. When the computer instructions run on the controller of an SSD, the SSD management method in the embodiments of the present invention is performed.
The logical address in the embodiments of the present invention may also be a KV in a key-value (KV) disk, a log in a log disk, or the like.
In the embodiments of the present invention, "correspondence" and "mapping relationship" have the same meaning. The expression "correspondence between the address of a chunk and a fault domain" has the same meaning as "correspondence between a fault domain and the address of a chunk".
It should be noted that the memories described herein are intended to include, but are not limited to, these and any other suitable types of memory.
A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each specific application, but such implementations should not be considered to go beyond the scope of the present invention.
A person skilled in the art will clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided by the present invention, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or of other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of an embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several computer instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The foregoing storage medium includes any medium that can store computer instructions, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (36)

  1. A data recovery method in a storage system, wherein the storage system comprises a controller, a first solid state drive (SSD), and a second SSD; the first SSD and the second SSD each comprise multiple fault domains; the storage system comprises a chunk group formed based on an erasure coding algorithm, and the chunk group comprises a first chunk and a second chunk; an address of the first chunk is mapped to a physical address provided by a first fault domain of the first SSD, and an address of the second chunk is mapped to a physical address provided by a second fault domain of the second SSD;
    the method comprises:
    receiving, by the controller, fault information of the first SSD, wherein the fault information indicates that the first fault domain has failed; and
    in response to the fault information, recovering, by the controller according to the erasure coding algorithm, the data stored at the address of the first chunk in the chunk group.
  2. The method according to claim 1, wherein one fault domain in the first SSD and the second SSD is multiple die packages connected to one channel.
  3. The method according to claim 1, wherein one fault domain in the first SSD and the second SSD is one or more die packages.
  4. The method according to claim 1, wherein one fault domain in the first SSD and the second SSD is one or more dies.
  5. The method according to claim 1, wherein one fault domain in the first SSD and the second SSD is one or more flash planes.
  6. The method according to any one of claims 1 to 5, wherein the storage system stores a correspondence between the first fault domain and the chunk group and a correspondence between the second fault domain and the chunk group.
  7. The method according to claim 6, wherein the responding to the fault information comprises:
    querying, by the controller, the correspondence between the first fault domain and the chunk group to determine the chunk group.
  8. The method according to any one of claims 1 to 5, wherein the storage system stores a correspondence between the address of the first chunk and the first fault domain and a correspondence between the address of the second chunk and the second fault domain.
  9. A solid state drive (SSD) management method, wherein the SSD comprises a first fault domain and a second fault domain, and the method comprises:
    allocating a first range of logical addresses of the SSD to the first fault domain; and
    allocating a second range of logical addresses of the SSD to the second fault domain.
  10. The method according to claim 9, wherein the method further comprises:
    recording a correspondence between the first fault domain and the first range of logical addresses and a correspondence between the second fault domain and the second range of logical addresses.
  11. The method according to claim 9, wherein the first range of logical addresses and the second range of logical addresses are each contiguous logical addresses.
  12. The method according to claim 9, wherein the first range of logical addresses and the second range of logical addresses are non-contiguous logical addresses.
  13. The method according to claim 9, wherein the method further comprises:
    sending, by the SSD to a controller of a storage system, the correspondence between the first fault domain and the first range of logical addresses and the correspondence between the second fault domain and the second range of logical addresses, wherein the storage system comprises the SSD.
  14. The method according to claim 9, wherein one fault domain of the SSD is multiple die packages connected to one channel.
  15. The method according to claim 9, wherein one fault domain of the SSD is one or more die packages.
  16. The method according to claim 9, wherein one fault domain of the SSD is one or more dies.
  17. The method according to claim 9, wherein one fault domain in the first SSD and the second SSD is one or more flash planes.
  18. A storage system, wherein the storage system comprises a controller, a first solid state drive (SSD), and a second SSD; the first SSD and the second SSD each comprise multiple fault domains; the storage system comprises a chunk group formed based on an erasure coding algorithm, and the chunk group comprises a first chunk and a second chunk; an address of the first chunk is mapped to a physical address provided by a first fault domain of the first SSD, and an address of the second chunk is mapped to a physical address provided by a second fault domain of the second SSD;
    the controller is configured to:
    receive fault information of the first SSD, wherein the fault information indicates that the first fault domain has failed; and
    in response to the fault information, recover, according to the erasure coding algorithm, the data stored at the address of the first chunk in the chunk group.
  19. The storage system according to claim 18, wherein the storage system stores a correspondence between the first fault domain and the chunk group and a correspondence between the second fault domain and the chunk group;
    the controller is further configured to query the correspondence between the first fault domain and the chunk group to determine the chunk group.
  20. A solid-state drive (SSD), wherein the SSD comprises an SSD controller, a first fault domain, and a second fault domain; the SSD controller is configured to:
    allocate a first range of logical addresses of the SSD to the first fault domain; and
    allocate a second range of logical addresses of the SSD to the second fault domain.
  21. The SSD according to claim 20, wherein the SSD controller is further configured to:
    separately record the correspondence between the first fault domain and the first range of logical addresses and the correspondence between the second fault domain and the second range of logical addresses.
  22. The SSD according to claim 20, wherein the first range of logical addresses and the second range of logical addresses are both contiguous logical addresses.
  23. The SSD according to claim 20, wherein the first range of logical addresses and the second range of logical addresses are non-contiguous logical addresses.
  24. The SSD according to claim 20, wherein the SSD controller is further configured to:
    send the correspondence between the first fault domain and the first range of logical addresses and the correspondence between the second fault domain and the second range of logical addresses to a controller of a storage system, wherein the storage system comprises the SSD.
  25. A controller, wherein the controller is applied to a storage system, and the storage system comprises the controller, a first solid-state drive (SSD), and a second SSD; the first SSD and the second SSD each comprise a plurality of fault domains; the storage system comprises a block group formed based on an erasure coding algorithm, and the block group comprises a first block and a second block; an address of the first block is mapped to a physical address provided by a first fault domain of the first SSD, and an address of the second block is mapped to a physical address provided by a second fault domain of the second SSD;
    the controller comprises:
    a receiving unit, configured to receive fault information of the first SSD, wherein the fault information indicates that the first fault domain has failed; and
    a recovery unit, configured to recover, in response to the fault information and according to the erasure coding algorithm, the data stored at the address of the first block in the block group.
  26. The controller according to claim 25, wherein the storage system stores a correspondence between the first fault domain and the block group and a correspondence between the second fault domain and the block group; and
    the controller further comprises a query unit, configured to query the correspondence between the first fault domain and the block group to determine the block group.
  27. A solid-state drive (SSD) management apparatus, wherein the SSD comprises a first fault domain and a second fault domain; the SSD management apparatus comprises:
    a first allocation unit, configured to allocate a first range of logical addresses of the SSD to the first fault domain; and
    a second allocation unit, configured to allocate a second range of logical addresses of the SSD to the second fault domain.
  28. The SSD management apparatus according to claim 27, wherein the SSD management apparatus further comprises a recording unit, configured to separately record the correspondence between the first fault domain and the first range of logical addresses and the correspondence between the second fault domain and the second range of logical addresses.
  29. The SSD management apparatus according to claim 27, wherein the SSD management apparatus further comprises a sending unit, configured to send the correspondence between the first fault domain and the first range of logical addresses and the correspondence between the second fault domain and the second range of logical addresses to a controller of a storage system, wherein the storage system comprises the SSD.
  30. A controller, wherein the controller is applied to a storage system, and the storage system comprises the controller, a first solid-state drive (SSD), and a second SSD; the first SSD and the second SSD each comprise a plurality of fault domains; the storage system comprises a block group formed based on an erasure coding algorithm, and the block group comprises a first block and a second block; an address of the first block is mapped to a physical address provided by a first fault domain of the first SSD, and an address of the second block is mapped to a physical address provided by a second fault domain of the second SSD;
    the controller comprises a processor and an interface;
    the interface is configured to receive fault information of the first SSD, wherein the fault information indicates that the first fault domain has failed; and
    the processor is configured to recover, in response to the fault information and according to the erasure coding algorithm, the data stored at the address of the first block in the block group.
  31. The processor according to claim 30, wherein the storage system stores a correspondence between the first fault domain and the block group and a correspondence between the second fault domain and the block group; and
    the processor is further configured to query the correspondence between the first fault domain and the block group to determine the block group.
  32. A computer program product, wherein the computer program product comprises computer instructions applied to a storage system, and the storage system comprises a controller, a first solid-state drive (SSD), and a second SSD; the first SSD and the second SSD each comprise a plurality of fault domains; the storage system comprises a block group formed based on an erasure coding algorithm, and the block group comprises a first block and a second block; an address of the first block is mapped to a physical address provided by a first fault domain of the first SSD, and an address of the second block is mapped to a physical address provided by a second fault domain of the second SSD; when the controller executes the computer instructions, the following steps are performed:
    receiving fault information of the first SSD, wherein the fault information indicates that the first fault domain has failed; and
    in response to the fault information, recovering, according to the erasure coding algorithm, the data stored at the address of the first block in the block group.
  33. The computer program product according to claim 32, wherein when the controller executes the computer instructions, the following step is further performed:
    querying the correspondence between the first fault domain and the block group to determine the block group.
  34. A computer program product, wherein the computer program product comprises an SSD controller applied to a solid-state drive (SSD), and the SSD comprises the SSD controller, a first fault domain, and a second fault domain; when the SSD controller executes the computer instructions, the following steps are performed:
    allocating a first range of logical addresses of the SSD to the first fault domain; and
    allocating a second range of logical addresses of the SSD to the second fault domain.
  35. The computer program product according to claim 34, wherein when the SSD controller executes the computer instructions, the following step is further performed:
    separately recording the correspondence between the first fault domain and the first range of logical addresses and the correspondence between the second fault domain and the second range of logical addresses.
  36. The computer program product according to claim 34, wherein when the SSD controller executes the computer instructions, the following step is further performed:
    sending the correspondence between the first fault domain and the first range of logical addresses and the correspondence between the second fault domain and the second range of logical addresses to a controller of a storage system, wherein the storage system comprises the SSD.
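
The arrangement recited in claims 18 to 24 can be illustrated with a minimal Python sketch. It is not the claimed implementation. It assumes three SSDs, a single XOR parity block standing in for the erasure coding algorithm, and contiguous logical address ranges per fault domain. The names `SSD`, `Controller`, `write_group`, `on_fault`, and `fail_domain` are hypothetical and do not appear in the application.

```python
from dataclasses import dataclass, field
from functools import reduce


def xor_bytes(chunks):
    """XOR equal-length byte strings (assumed single-parity stand-in for EC)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))


@dataclass
class SSD:
    name: str
    num_domains: int
    lbas_per_domain: int
    data: dict = field(default_factory=dict)  # logical address -> stored bytes

    def domain_ranges(self):
        """Allocate one contiguous logical address range per fault domain."""
        n = self.lbas_per_domain
        return {d: range(d * n, (d + 1) * n) for d in range(self.num_domains)}

    def fail_domain(self, domain):
        """Simulate a fault: drop the data held in this domain's range."""
        for lba in self.domain_ranges()[domain]:
            self.data.pop(lba, None)
        return (self.name, domain)  # fault information for the controller


class Controller:
    def __init__(self, ssds):
        self.ssds = {s.name: s for s in ssds}
        # Each SSD reports its fault domain -> logical address range mapping.
        self.ranges = {s.name: s.domain_ranges() for s in ssds}
        self.groups = []            # block group: list of (ssd, domain, lba)
        self.domain_to_groups = {}  # (ssd, domain) -> block group indices

    def write_group(self, domain, offset, data_blocks):
        """Store the data blocks plus one XOR parity block, one per SSD."""
        blocks = list(data_blocks) + [xor_bytes(data_blocks)]
        assert len(blocks) == len(self.ssds), "one block per SSD is assumed"
        group = []
        for name, block in zip(self.ssds, blocks):
            lba = self.ranges[name][domain][offset]
            self.ssds[name].data[lba] = block
            group.append((name, domain, lba))
            key = (name, domain)
            self.domain_to_groups.setdefault(key, []).append(len(self.groups))
        self.groups.append(group)

    def on_fault(self, fault):
        """Rebuild only the blocks that lived in the failed fault domain."""
        recovered = {}
        for gi in self.domain_to_groups.get(fault, []):
            survivors = [self.ssds[n].data[lba]
                         for n, d, lba in self.groups[gi] if (n, d) != fault]
            recovered[gi] = xor_bytes(survivors)
        return recovered
```

In this sketch, a fault reported for one fault domain of one SSD causes the controller to look up only the block groups mapped to that domain and to rebuild their lost blocks from the surviving SSDs. Block groups placed in the other fault domains of the same SSD are not touched, which is the point of recovering per fault domain rather than per drive.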
PCT/CN2019/103085 2018-10-25 2019-08-28 Method, system and apparatus for restoring data in storage system WO2020082888A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
KR1020217012802A KR102648688B1 (en) 2018-10-25 2019-08-28 How to restore data from storage systems, systems and devices
EP19875722.1A EP3851949A4 (en) 2018-10-25 2019-08-28 Method, system and apparatus for restoring data in storage system
US17/233,893 US20210240584A1 (en) 2018-10-25 2021-04-19 Data recovery method, system, and apparatus in storage system
US17/883,708 US20230076381A1 (en) 2018-10-25 2022-08-09 Data recovery method, system, and apparatus in storage system

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201811248415 2018-10-25
CN201811248415.5 2018-10-25
CN201811560345.7 2018-12-20
CN201811560345.7A CN111104056B (en) 2018-10-25 2018-12-20 Data recovery method, system and device in storage system

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/233,893 Continuation US20210240584A1 (en) 2018-10-25 2021-04-19 Data recovery method, system, and apparatus in storage system

Publications (1)

Publication Number Publication Date
WO2020082888A1 (en) 2020-04-30

Family

ID=70330911

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/103085 WO2020082888A1 (en) 2018-10-25 2019-08-28 Method, system and apparatus for restoring data in storage system

Country Status (1)

Country Link
WO (1) WO2020082888A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160277039A1 (en) * 2013-08-05 2016-09-22 Intel Corporation Storage systems with adaptive erasure code generation
CN106844098A (en) * 2016-12-29 2017-06-13 中国科学院计算技术研究所 A kind of fast data recovery method and system based on right-angled intersection erasure code
CN107085546A (en) * 2016-02-16 2017-08-22 深圳市深信服电子科技有限公司 Data managing method and device based on failure field technique
CN107203328A (en) * 2016-03-17 2017-09-26 伊姆西公司 Memory management method and storage device
CN108540315A (en) * 2018-03-28 2018-09-14 新华三技术有限公司成都分公司 Distributed memory system, method and apparatus

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3851949A4 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112596949A (en) * 2020-12-23 2021-04-02 厦门市美亚柏科信息股份有限公司 High-efficiency SSD (solid State disk) deleted data recovery method and system
CN112596949B (en) * 2020-12-23 2022-12-16 厦门市美亚柏科信息股份有限公司 High-efficiency SSD (solid State disk) deleted data recovery method and system

Similar Documents

Publication Publication Date Title
CN111552436B (en) Data recovery method, system and device in storage system
US8555029B2 (en) Virtualized storage system and method of operating thereof
US10082959B1 (en) Managing data placement in storage systems
US8918619B2 (en) Virtualized storage system and method of operating thereof
US10037152B2 (en) Method and system of high-throughput high-capacity storage appliance with flash translation layer escalation and global optimization on raw NAND flash
US11797387B2 (en) RAID stripe allocation based on memory device health
US20190205053A1 (en) Storage apparatus and distributed storage system
WO2020019267A1 (en) Data processing method and device
CN111124264A (en) Method, apparatus and computer program product for reconstructing data
US20210326207A1 (en) Stripe reassembling method in storage system and stripe server
CN113918087B (en) Storage device and method for managing namespaces in the storage device
US20210318826A1 (en) Data Storage Method and Apparatus in Distributed Storage System, and Computer Program Product
CN113687978A (en) Data processing method for storage array controller
WO2020082888A1 (en) Method, system and apparatus for restoring data in storage system
TWI607303B (en) Data storage system with virtual blocks and raid and management method thereof
US8688908B1 (en) Managing utilization of physical storage that stores data portions with mixed zero and non-zero data
WO2014045329A1 (en) Storage system and storage control method
WO2022143741A1 (en) Storage device management method, device, and storage system
US20240069814A1 (en) Storage server and operation method of storage server
CN110688056B (en) Storage medium replacement for NVM group
CN108664210B (en) IO command control method, IO command control system and solid-state storage device
CN116401063A (en) RAID resource allocation method, device, equipment and medium

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 19875722; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2019875722; Country of ref document: EP; Effective date: 20210415)
NENP Non-entry into the national phase (Ref country code: DE)
ENP Entry into the national phase (Ref document number: 20217012802; Country of ref document: KR; Kind code of ref document: A)