WO2020042850A1 - Data storage method and apparatus, and storage system - Google Patents

Data storage method and apparatus, and storage system

Info

Publication number
WO2020042850A1
WO2020042850A1 PCT/CN2019/098256 CN2019098256W WO2020042850A1 WO 2020042850 A1 WO2020042850 A1 WO 2020042850A1 CN 2019098256 W CN2019098256 W CN 2019098256W WO 2020042850 A1 WO2020042850 A1 WO 2020042850A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
target
storage
valid data
storage area
Prior art date
Application number
PCT/CN2019/098256
Other languages
English (en)
Chinese (zh)
Inventor
王英
赵小宝
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to EP19855114.5A priority Critical patent/EP3839716A4/fr
Publication of WO2020042850A1 publication Critical patent/WO2020042850A1/fr
Priority to US17/183,657 priority patent/US12008263B2/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614Improving the reliability of storage systems
    • G06F3/0619Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1008Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices
    • G06F11/1044Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices with specific ECC/EDC distribution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1008Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices
    • G06F11/1068Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices in sector programmable memories, e.g. flash disk
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1458Management of the backup or restore process
    • G06F11/1469Backup restoration techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2094Redundant storage or storage space
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/0223User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/023Free address space management
    • G06F12/0238Memory management in non-volatile memory, e.g. resistive RAM or ferroelectric memory
    • G06F12/0246Memory management in non-volatile memory, e.g. resistive RAM or ferroelectric memory in block erasable memory, e.g. flash memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/0223User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/023Free address space management
    • G06F12/0253Garbage collection, i.e. reclamation of unreferenced memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • G06F3/064Management of blocks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • G06F3/0644Management of space entities, e.g. partitions, extents, pools
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0647Migration mechanisms
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0652Erasing, e.g. deleting, data cleaning, moving of data to a wastebasket
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0683Plurality of storage devices
    • G06F3/0688Non-volatile semiconductor memory arrays
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0683Plurality of storage devices
    • G06F3/0689Disk arrays, e.g. RAID, JBOD
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B20/00Signal processing not specific to the method of recording or reproducing; Circuits therefor
    • G11B20/10Digital recording or reproducing
    • G11B20/12Formatting, e.g. arrangement of data block or words on the record carriers
    • G11B20/1217Formatting, e.g. arrangement of data block or words on the record carriers on discs
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/03Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words
    • H03M13/05Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits
    • H03M13/13Linear codes
    • H03M13/15Cyclic codes, i.e. cyclic shifts of codewords produce other codewords, e.g. codes defined by a generator polynomial, Bose-Chaudhuri-Hocquenghem [BCH] codes
    • H03M13/151Cyclic codes, i.e. cyclic shifts of codewords produce other codewords, e.g. codes defined by a generator polynomial, Bose-Chaudhuri-Hocquenghem [BCH] codes using error location or error correction polynomials
    • H03M13/154Error and erasure correction, e.g. by using the error and erasure locator or Forney polynomial
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/82Solving problems relating to consistency
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0659Command handling arrangements, e.g. command buffers, queues, command scheduling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0673Single storage device
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B20/00Signal processing not specific to the method of recording or reproducing; Circuits therefor
    • G11B20/10Digital recording or reproducing
    • G11B20/12Formatting, e.g. arrangement of data block or words on the record carriers
    • G11B20/1217Formatting, e.g. arrangement of data block or words on the record carriers on discs
    • G11B2020/1218Formatting, e.g. arrangement of data block or words on the record carriers on discs wherein the formatting concerns a specific area of the disc
    • G11B2020/1238Formatting, e.g. arrangement of data block or words on the record carriers on discs wherein the formatting concerns a specific area of the disc track, i.e. the entire a spirally or concentrically arranged path on which the recording marks are located
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B20/00Signal processing not specific to the method of recording or reproducing; Circuits therefor
    • G11B20/10Digital recording or reproducing
    • G11B20/12Formatting, e.g. arrangement of data block or words on the record carriers
    • G11B2020/1291Formatting, e.g. arrangement of data block or words on the record carriers wherein the formatting serves a specific purpose
    • G11B2020/1292Enhancement of the total storage capacity

Definitions

  • The present application relates to the field of information technology, and more particularly, to a data storage method, a data storage apparatus, and a storage system.
  • In storage systems, data security is a major indicator of storage system performance. To enable data to be stored safely in a storage system, the prior art provides many data protection mechanisms, for example erasure coding.
  • The principle of erasure coding (EC) is to encode the original data into multiple data fragments and store the fragments in multiple memories. If a memory fails and the data in it is lost, the lost data can be recovered from the data fragments stored in the other memories.
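  • The recovery principle described above can be illustrated with the simplest possible erasure code, a single XOR parity fragment. This is only a sketch: the source does not say which EC scheme is used, and the function names `encode` and `recover` are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(fragments):
    """Return one parity fragment: the XOR of all equal-length data fragments."""
    return reduce(xor_bytes, fragments)


def recover(surviving, parity):
    """Rebuild the single missing data fragment from the survivors and the parity."""
    return reduce(xor_bytes, surviving, parity)


# Three data fragments, each imagined to live on a different memory.
data = [b"abcd", b"efgh", b"ijkl"]
parity = encode(data)  # imagined to live on a fourth memory

# Pretend the memory holding data[1] failed; rebuild it from the rest.
rebuilt = recover([data[0], data[2]], parity)
print(rebuilt == data[1])
```

  • A single parity fragment tolerates the loss of only one fragment; practical EC schemes add more redundancy, but the recover-from-survivors idea is the same.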
  • Mirror storage means storing data in a mirrored manner: the same data is stored in both the master node and the standby node of the storage system. When a memory failure in the master node causes data loss, the lost data can be copied from the standby node to other memories in the master node.
  • However, whether the lost data is recovered by the EC method or by the mirror method, the recovered data is stored randomly in the storage areas of the memory, so the expiration times of the data that ends up in a given storage area differ widely.
  • In the garbage collection (GC) mechanism, the storage area is the basic unit of garbage collection. The valid data in a storage area must first be migrated to other storage areas, and only then can garbage collection be performed on the storage area to be recycled. As a result, garbage collection takes a long time and is very inefficient.
  • The present application provides a data storage method and apparatus that help reduce the time taken by garbage collection and improve its efficiency.
  • In a first aspect, a data storage method is provided. The method is applied to a storage system that includes at least one first storage and a second storage. The at least one first storage includes a plurality of storage areas, each of which is a unit for garbage collection. The method includes: after a data failure occurs in the second storage, obtaining the expiration times of a plurality of target valid data to be recovered, and recovering the plurality of target valid data, where an expiration time indicates the time at which valid data becomes invalid data; and selecting a target storage area from the plurality of storage areas and storing at least a part of the target valid data in the target storage area, where, after the at least part of the target valid data is stored in the target storage area, the length of time between the earliest expiration time and the latest expiration time among the expiration times of the valid data stored in the target storage area is less than or equal to a preset time length.
  • Before the multiple target valid data is stored in it, the target storage area may be a blank storage area holding no data, or a storage area in which some data has already been stored.
  • The valid data stored in the target storage area may include the at least part of the target valid data among the multiple target valid data, and may also include other valid data. The other valid data may be data that was stored in the target storage area before the at least part of the target valid data was stored, or data newly written after the target storage area stores the at least part of the target valid data.
  • The condition that the time length between the earliest expiration time and the latest expiration time is less than or equal to a preset time length covers the following three cases.
  • Case 1: valid data is already stored in the target storage area before the at least part of the target valid data is stored. After the at least part of the target valid data is stored this time, among the valid data stored this time and the existing valid data, the length of time between the earliest expiration time and the latest expiration time is less than or equal to the preset time length.
  • Case 2: there is no valid data in the target storage area before the at least part of the target valid data is stored this time. Among the valid data stored this time, the length of time between the earliest expiration time and the latest expiration time is less than or equal to the preset time length.
  • Case 3: after the at least part of the target valid data is stored in the target storage area, other valid data continues to be stored in the target storage area. In the set formed by the at least part of the target valid data and that other valid data, the length of time between the earliest expiration time and the latest expiration time is less than or equal to the preset time length.
  • The cases can also be combined; for example, case 3 can be combined with case 1 or with case 2.
  • When case 3 and case 1 are combined, three parts of data are stored in the target storage area: the valid data stored there before the at least part of the target valid data, the at least part of the target valid data itself, and the other valid data stored there afterwards. In the set formed by these three parts, the length of time between the earliest expiration time and the latest expiration time is less than or equal to the preset time length.
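  • The three cases reduce to one check over whichever sets of valid data are present in the target storage area. The following is a minimal sketch, assuming expiration times are plain numbers (for example seconds) and that the incoming set is never empty; the names `expiration_span` and `satisfies_limit` are hypothetical.

```python
def expiration_span(existing, incoming, later=()):
    """Time between the earliest and latest expiration time in one storage area.

    existing: expiration times of valid data already in the area (empty in case 2)
    incoming: expiration times of the target valid data stored this time
    later:    expiration times of valid data written afterwards (case 3)
    """
    times = [*existing, *incoming, *later]
    return max(times) - min(times)


def satisfies_limit(existing, incoming, preset, later=()):
    """True if the span stays within the preset time length."""
    return expiration_span(existing, incoming, later) <= preset


# Case 1: the area already holds valid data.
print(satisfies_limit(existing=[100, 110], incoming=[105, 120], preset=30))
# Case 2: the area is empty, but the incoming data is too spread out.
print(satisfies_limit(existing=[], incoming=[200, 260], preset=30))
# Case 3 combined with case 1: data before, this time, and afterwards.
print(satisfies_limit(existing=[100], incoming=[110], preset=30, later=[125]))
```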
  • By limiting the time length between the earliest expiration time and the latest expiration time of the valid data stored in the target storage area to at most a preset time length, the expiration times of the valid data stored in the target storage area are kept relatively concentrated. When garbage collection is performed on the target storage area, this helps reduce the time spent migrating valid data and improves the efficiency of garbage collection.
  • If the time of garbage collection is earlier than the earliest expiration time, all the data in the storage area is still valid and the storage area does not need to be recycled. If the time of garbage collection is later than the latest expiration time, all the data in the storage area has become invalid and the area can be reclaimed directly, without the step of migrating valid data. If the time of garbage collection falls between the earliest and the latest expiration time, then, because the expiration times of the data stored in the storage area are more concentrated, the amount of valid data that needs to be migrated from each storage area is, on the whole, smaller than under the traditional mechanism for storing recovered data.
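  • The three situations above can be sketched as a small decision function. This is an illustration only: the function name `gc_action` and the action labels are hypothetical, and the sketch assumes that an item counts as invalid once the garbage collection time is greater than or equal to its expiration time.

```python
def gc_action(expirations, gc_time):
    """Decide what garbage collection must do for one storage area.

    expirations: expiration times of the data stored in the area
    gc_time:     the moment at which garbage collection runs
    Returns (action, number_of_items_to_migrate).
    """
    earliest, latest = min(expirations), max(expirations)
    if gc_time < earliest:
        # Everything is still valid: nothing to reclaim yet.
        return ("skip", 0)
    if gc_time >= latest:
        # Everything has expired: reclaim without migrating anything.
        return ("reclaim_directly", 0)
    # Mixed area: the still-valid items must be migrated first.
    still_valid = sum(1 for t in expirations if t > gc_time)
    return ("migrate_then_reclaim", still_valid)


area = [10, 12, 15]
print(gc_action(area, 5))
print(gc_action(area, 12))
print(gc_action(area, 20))
```

  • The narrower the range of expiration times in an area, the shorter the window in which the third, costly branch can be taken.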
  • The expiration times of the valid data stored in the target storage area may all be the same. In that case, when garbage collection is performed on the target storage area after that expiration time, all the data in it is invalid, so the step of migrating valid data is not needed at all, which minimizes the time spent migrating valid data during garbage collection.
  • Data is written to the multiple storage areas sequentially.
  • The at least one first memory and the second memory are shingled magnetic recording (SMR) hard disks, and each of the plurality of storage areas is a sequential zone (Szone).
  • The at least one memory is a solid state drive (SSD), and each of the plurality of storage areas is a block.
  • The method further includes: determining a storage order of the multiple target valid data according to the expiration time of each of the multiple target valid data. The storage order indicates that all the valid data, among the multiple target valid data, that is to be stored in the target storage area is written first, and only then is the valid data that is to be stored in the next storage area written. The next storage area is a storage area other than the target storage area among the plurality of storage areas. In this case, selecting the target storage area from the plurality of storage areas and storing at least a part of the target valid data in the target storage area includes: selecting the target storage area from the plurality of storage areas, and storing the at least part of the target valid data to the target storage area in accordance with the storage order.
  • Because the target valid data is stored in the target storage area in the storage order, and the storage order indicates that the data to be stored in the same storage area is written contiguously, adjacent writes do not target different storage areas. This avoids switching back and forth between target storage areas while the multiple target valid data is being stored, and helps improve the efficiency of storing the target valid data.
  • On an SMR hard disk, switching back and forth between storage areas makes the head swing, which increases the time taken to store data and seriously degrades the data storage performance of the SMR hard disk.
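  • One way to derive such a storage order is to sort the target valid data by expiration time and cut the sorted sequence into per-area batches. The sketch below assumes that the target storage areas start empty (case 2) and that each area holds a fixed number of items; `plan_storage_order`, `preset`, and `capacity` are hypothetical names, and the source does not prescribe this particular grouping.

```python
def plan_storage_order(items, preset, capacity):
    """Group (name, expiration_time) pairs into per-area batches.

    Items are sorted by expiration time, so each batch is written
    contiguously and its expiration span never exceeds `preset`.
    Returns a list of batches; writing them in order is the storage order.
    """
    batches = []
    current = []
    first_time = None
    for name, t in sorted(items, key=lambda item: item[1]):
        # Start a new area when the span or the capacity would be exceeded.
        if current and (t - first_time > preset or len(current) == capacity):
            batches.append(current)
            current = []
        if not current:
            first_time = t  # earliest expiration time in this area
        current.append(name)
    if current:
        batches.append(current)
    return batches


recovered = [("a", 1), ("e", 2), ("b", 1), ("f", 2), ("i", 3)]
for area_index, batch in enumerate(plan_storage_order(recovered, preset=0, capacity=4)):
    print(area_index, batch)
```

  • With `preset=0`, every area ends up holding data with a single expiration time, which corresponds to the situation in which no valid data needs to be migrated.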
  • The expiration time of any one of the plurality of target valid data is stored in the metadata of the data fragment corresponding to that target valid data, and that target valid data and its corresponding data fragment are generated by erasure-code encoding of the original data.
  • The expiration time is represented by the life cycle of the data and the time at which the data was written.
  • The life cycle of the data can be determined according to the service to which the data belongs; data belonging to different services has its own corresponding life cycle.
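  • A minimal sketch of this representation follows, assuming the expiration time is obtained by adding the life cycle to the writing time. The service names and life cycle values in `LIFE_CYCLE_BY_SERVICE` are invented for illustration and do not come from the source.

```python
from datetime import datetime, timedelta

# Hypothetical per-service life cycles (assumption, not from the source).
LIFE_CYCLE_BY_SERVICE = {
    "logs": timedelta(days=7),
    "video": timedelta(days=30),
}


def expiration_time(write_time, service):
    """Expiration time = data writing time + life cycle of the owning service."""
    return write_time + LIFE_CYCLE_BY_SERVICE[service]


written = datetime(2019, 7, 1, 12, 0, 0)
print(expiration_time(written, "logs"))
print(expiration_time(written, "video"))
```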
  • In a second aspect, a data storage device is provided. The device includes modules for performing the foregoing method.
  • A storage system is provided. The storage system includes modules for performing the foregoing method.
  • A storage system is provided, including at least one processor and at least one memory. The at least one memory is configured to store a computer program, and the at least one processor is configured to call the computer program from the memory and run it, so that the storage system executes the foregoing method.
  • The at least one processor and the at least one memory may be located in multiple different storage nodes, or in the same storage node.
  • A computer program product is provided. The computer program product includes computer program code that, when run on a computer, causes the computer to execute the methods in the above aspects.
  • The computer program code may be stored in whole or in part on a first storage medium. The first storage medium may be packaged together with the processor or packaged separately from the processor, which is not specifically limited in the embodiments of the present application.
  • A computer-readable medium is provided. The computer-readable medium stores program code that, when run on a computer, causes the computer to execute the methods in the foregoing aspects.
  • FIG. 1 is an architecture diagram of a storage system to which an embodiment of the present application is applied.
  • FIG. 2 is a schematic diagram showing a storage result for storing recovered data based on a random storage mechanism.
  • FIG. 3 is a schematic diagram of a time sequence between failure times.
  • FIG. 4 is a schematic flowchart of a data storage method according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a time attribute structure according to an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a storage result of recovered data according to an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a data storage device according to an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a controller according to an embodiment of the present application.
  • Garbage collection: a mechanism for managing storage space. When the remaining storage space is insufficient, the invalid data stored in the storage space can be deleted through the garbage collection mechanism so that storage resources are reclaimed. The reclaimed storage resources can be used to store other valid data, which improves the utilization of the storage space.
  • Invalid data can be understood as data that needs to be deleted from the storage system, for example, data that has expired, that is, data that has reached its expiration time.
  • Valid data can be understood as data that the storage system needs to continue to store, for example, data that has not yet reached its expiration time.
  • The storage area may be the recycling unit for garbage collection, that is, the basic unit in which the memory is garbage collected. For example, it may be the smallest unit of garbage collection for the memory, or an integer multiple of that smallest unit.
  • When the memory is an SMR hard disk, the basic unit of garbage collection may include at least one sequential zone (Szone), and the size of each sequential zone is usually 256 MB.
  • When the memory is an SSD, the basic unit of garbage collection may include at least one block.
  • FIG. 1 is an architecture diagram of a distributed storage system to which an embodiment of the present application is applied.
  • The distributed storage system 100 shown in FIG. 1 includes a plurality of storage nodes 110 and a controller 120.
  • The storage nodes 110 are storage node 1 to storage node n. Each storage node may be provided with multiple memories, for example m memories, that is, memory 1 to memory m. Each storage node provides storage space for data through its memories, where n and m are positive integers greater than 1.
  • The foregoing storage node may specifically be a storage node in a distributed storage system, such as a storage server.
  • The storage node may also be a storage node in a cluster storage system.
  • Any of the storage nodes may be used as a master node, and the other storage nodes may be used as mirror nodes (also called backup nodes) of the master node.
  • The controller 120 is configured to recover lost data.
  • The controller may be a controller that provides the control function independently of the storage nodes, or a controller located in one of the storage nodes.
  • The foregoing storage system is a distributed system, and the controller may be composed of at least one processor, and the at least one processor may be distributed across at least one storage node (server).
  • The controller may be located on the master node, for example as a processor in the master node.
  • When data stored in a memory is lost, the controller can use a data protection mechanism (for example, EC or mirror storage) to recover the lost data from the data stored in other memories in the storage system, and the recovered data is stored randomly in the storage system.
  • This mechanism of storing the recovered data randomly causes the expiration times of the recovered data stored in each storage area of the storage system to be random.
  • During garbage collection, usually only a small part of the data in the storage area to be recycled has reached its expiration time and become invalid data; the remaining data is still valid. Garbage collection can be performed on the storage area to be recycled only after this large amount of valid data has been migrated from it to other storage areas.
  • FIG. 2 shows a schematic diagram of storing the storage result of the recovered data based on the random storage mechanism.
  • Data a to data p are stored in memory 2 among the multiple memories.
  • The failure time of data a, data b, data c, and data d is the first failure time.
  • The failure time of data e, data f, data g, and data h is the second failure time.
  • The failure time of data i, data j, data k, and data l is the third failure time.
  • The failure time of data m, data n, data o, and data p is the fourth failure time.
  • the time sequence between the first failure time, the second failure time, the third failure time, and the fourth failure time is shown in FIG. 3.
  • When memory 2 fails, causing data a to data p stored in memory 2 to be lost, the lost data must be recovered from the data stored in other memories, and the recovered data is randomly stored to storage area 1, storage area 2, storage area 3, and storage area 4 of memory 1.
  • In the end, storage area 1 stores data a, data e, data i, and data n.
  • Storage area 2 stores data c, data f, data k, and data p.
  • Storage area 3 stores data d, data h, data l, and data m.
  • Storage area 4 stores data b, data g, data j, and data o.
  • If garbage collection is performed on the above 4 storage areas within the time period between the first failure time and the second failure time, only data a in storage area 1, data c in storage area 2, data d in storage area 3, and data b in storage area 4 are invalid.
  • The remaining data are all valid data.
  • The valid data in each storage area therefore needs to be migrated to other storage areas. A large amount of data is migrated in this way, so garbage collection takes a long time and is very inefficient.
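The cost described in the FIG. 2 example can be checked with a short sketch. It is only an illustration: the encoding of the four failure times as group numbers 1 to 4, the helper name `valid_to_migrate`, and the grouped layout used for comparison are assumptions made here, not part of the application.

```python
# Failure-time group of each data item: a-d -> 1, e-h -> 2, i-l -> 3, m-p -> 4.
# (Group numbers stand in for the first..fourth failure times; hypothetical encoding.)
FAILURE_GROUP = {name: i // 4 + 1 for i, name in enumerate("abcdefghijklmnop")}

# Random placement as described for FIG. 2.
RANDOM_LAYOUT = {
    1: ["a", "e", "i", "n"],
    2: ["c", "f", "k", "p"],
    3: ["d", "h", "l", "m"],
    4: ["b", "g", "j", "o"],
}

# Hypothetical alternative: items with the same failure time share an area.
GROUPED_LAYOUT = {
    1: ["a", "b", "c", "d"],
    2: ["e", "f", "g", "h"],
    3: ["i", "j", "k", "l"],
    4: ["m", "n", "o", "p"],
}


def valid_to_migrate(layout, expired_groups):
    """Count valid items that must be moved before areas holding invalid data can be reclaimed."""
    total = 0
    for items in layout.values():
        valid = [d for d in items if FAILURE_GROUP[d] not in expired_groups]
        if len(valid) < len(items):  # the area holds at least one invalid item
            total += len(valid)
    return total


# Garbage collection between the first and second failure times:
print(valid_to_migrate(RANDOM_LAYOUT, {1}))   # 12 valid items to migrate
print(valid_to_migrate(GROUPED_LAYOUT, {1}))  # 0 valid items to migrate
```

Under the random layout, reclaiming the four invalid items requires moving twelve valid ones; under the grouped layout, one area can be reclaimed without any migration.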
  • To avoid the above problem, this application provides a new technique for storing recovered data.
  • In this technique, recovered data (that is, the target valid data below) whose expiration times are the same or close to one another is stored in the same storage area (the target storage area below).
  • When garbage collection is performed in units of storage areas, this helps reduce the time taken to migrate valid data and improves the efficiency of garbage collection.
  • It avoids the situation in which the recovered data is randomly stored in the storage areas, so that the expiration times of the recovered data stored in each storage area are random.
  • In that situation, when garbage collection is performed in units of storage areas, often only a scattered part of the data in each storage area in the storage system is invalid, so the amount of valid data that needs to be migrated is large.
  • the following describes the data storage method in the embodiment of the present application with reference to FIG. 4.
  • the method is applied to a storage system (for example, the storage system shown in FIG. 1).
  • a memory that works normally in a storage system is called a first memory
  • a memory that has a data failure in the storage system is called a second memory, where the normally working memory includes multiple storage areas.
  • FIG. 4 is a schematic flowchart of a data storage method according to an embodiment of the present application.
  • the method shown in FIG. 4 can be executed by a device having a control function in the storage system, for example, the controller shown in FIG. 1.
  • the method shown in FIG. 4 includes steps 410 and 420.
  • The expiration time is used to indicate the time when valid data becomes invalid data. The expiration time can be stored directly in the storage system.
  • The expiration time can also be stored in the storage system indirectly through a time attribute.
  • The time attribute can include the data life cycle and the data write time. The life cycle indicates the total length of time that the data is to be stored in the storage system, and the write time indicates the time at which the data is written to the storage system, which is the start time of the life cycle. Once the total time indicated by the life cycle has elapsed, the end of the life cycle is the expiration time.
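The relationship between the time attribute and the expiration time can be written down as a minimal sketch. The function name `expiration_time` and the use of Python `datetime` values are assumptions for illustration; the application does not prescribe a representation.

```python
from datetime import datetime, timedelta


def expiration_time(write_time: datetime, life_cycle: timedelta) -> datetime:
    """Expiration time = start of the life cycle (write time) + length of the life cycle."""
    return write_time + life_cycle


# Data written on June 2, 2018 with a 10-day life cycle expires on June 12, 2018.
print(expiration_time(datetime(2018, 6, 2), timedelta(days=10)))
```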
  • the life cycle of the above data can be determined according to the service to which the data belongs. Data belonging to different services have their own corresponding life cycles.
  • Original data that has not yet been stored in the storage system is usually EC coded to generate multiple data fragments. These data fragments are generated from the same original data, and when they are written to the storage system they are written continuously to different memories in the storage system, so the write times of the data fragments are also the same. Therefore, the expiration times of the data fragments are the same.
  • As a result, when a target valid data fragment (that is, the target valid data) among the multiple data fragments stored in the storage system is lost, the expiration time of the target valid data fragment can be obtained from the other data fragments.
  • For example, when the storage system is a distributed storage system, the client of the distributed storage system divides the data written by the user into multiple data fragments through EC coding, and can add a life cycle to the metadata of each data fragment according to the service to which the data belongs.
  • After each storage node (including data fragment nodes and redundant fragment nodes) in the distributed storage system receives at least some of the multiple data fragments, the storage node stores those data fragments, and the write time can be added to the metadata of each data fragment.
  • Alternatively, the storage node can add the expiration time directly to the metadata of a data fragment in the process of writing the data fragment.
  • When mirror storage is used, data needs to be stored not only on the master node but also on the standby node.
  • Because the data stored on the standby node is a copy of the data stored on the master node, the expiration time of the data stored on the master node is the same as that of its replica stored on the standby node.
  • Therefore, the expiration time of the target valid data can be obtained from the standby node.
  • Recovering the target valid data may include recovering multiple target valid data according to the data stored in the storage system.
  • For example, multiple data fragments stored in the storage system can be EC-decoded to recover the multiple target valid data.
  • For the specific decoding method, refer to the traditional EC decoding method, which is not described in detail in the embodiment of this application.
  • multiple target valid data can be copied from the standby node to recover the multiple target valid data.
  • The timing of the data recovery process relative to step 410, step 420, and the selection of the target storage area is relatively flexible.
  • The data recovery process may be performed before step 410, before the target storage area is selected, after the target storage area is selected, or after step 410, which is not specifically limited in this embodiment of the present application.
  • If the data recovery process is performed before step 410, the expiration times of the multiple target valid data may be obtained directly from the metadata of the recovered data.
  • Select a target storage area from the multiple storage areas, and send at least a part of the target valid data among the restored multiple target valid data to the memory where the target storage area is located. After the at least part of the target valid data is stored in the target storage area, the length of time between the earliest failure time and the latest failure time of the valid data stored in the target storage area is less than or equal to a preset time length.
  • Before the multiple target valid data are stored, the target storage area may be a blank storage area in which no data is stored, or a storage area in which some data has already been stored.
  • The valid data stored in the target storage area may include the at least part of the target valid data among the multiple target valid data, and may also include other valid data in addition to the at least part of the target valid data.
  • The other valid data may be data stored in the target storage area before the at least part of the target valid data is stored, or data newly written after the target storage area stores the at least part of the target valid data.
  • The requirement that the time length between the earliest expiration time and the latest expiration time be less than or equal to a preset time length can be divided into the following three cases.
  • Case 1: if valid data is already stored in the target storage area before the at least part of the target valid data is stored, then, after the at least part of the target valid data is stored, the time length between the earliest expiration time and the latest expiration time among the valid data stored this time and the existing valid data is less than or equal to the preset time length.
  • Case 2: if there is no valid data in the target storage area before the at least part of the target valid data is stored, then, among the valid data stored this time, the time length between the earliest expiration time and the latest expiration time is less than or equal to the preset time length.
  • Case 3: if other valid data continues to be stored in the target storage area after the at least part of the target valid data is stored, then, in the set composed of the at least part of the target valid data and the other valid data, the time length between the earliest expiration time and the latest expiration time is less than or equal to the preset time length.
  • The above cases can also be combined; for example, case 3 can be combined with case 1 or with case 2.
  • When case 3 and case 1 are combined, three parts of data are stored in the target storage area: the valid data stored in the target storage area before the at least part of the target valid data, the at least part of the target valid data itself, and the other valid data stored in the target storage area after the at least part of the target valid data. In the set composed of these three parts, the time length between the earliest failure time and the latest failure time of the valid data is less than or equal to the preset time length.
  • The valid data stored in the target storage area may include some of the multiple target valid data, or may include all of the multiple target valid data.
  • If the target storage area is originally a blank storage area, the at least part of the target valid data can be stored in it directly. If the target storage area originally contains data, it must be ensured that, across the data to be stored and the data already stored in the target storage area, the length of time between the earliest expiration time and the latest expiration time is less than or equal to the preset time length.
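The selection rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: expiration times are plain numbers (for example, days), the names `fits` and `select_target_area` are hypothetical, storage capacity is ignored, and the first qualifying area is taken; the application does not specify how to choose among several qualifying areas.

```python
def fits(stored_expirations, new_expirations, preset_length):
    """True if, after adding the new data, latest - earliest expiration <= preset_length."""
    combined = list(stored_expirations) + list(new_expirations)
    if not combined:
        return True
    return max(combined) - min(combined) <= preset_length


def select_target_area(areas, new_expirations, preset_length):
    """Return the id of the first area that can take the new data, or None.

    `areas` maps an area id to the expiration times of the valid data it already holds;
    an empty list models a blank storage area.
    """
    for area_id, stored in areas.items():
        if fits(stored, new_expirations, preset_length):
            return area_id
    return None


areas = {"area1": [5, 6], "area2": []}
print(select_target_area(areas, [7], preset_length=3))       # area1: span 5..7 is 2
print(select_target_area(areas, [20, 21], preset_length=3))  # area2: area1 would span 5..21
```

A blank area only has to satisfy the constraint among the newly stored data, while a partly filled area has to satisfy it across old and new data together, which mirrors the two situations in the text.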
  • By limiting the time length between the earliest expiration time and the latest expiration time of the valid data stored in the target storage area to be less than or equal to a preset time length, the expiration times of the valid data stored in the target storage area are relatively concentrated. During garbage collection in units of target storage areas, this helps reduce the time taken to migrate valid data and improves the efficiency of garbage collection.
  • Specifically, if garbage collection takes place earlier than the earliest expiration time, all the data in the storage area is still valid and the storage area does not need to be recycled. If garbage collection takes place later than the latest expiration time, all the data in the storage area has become invalid and the area can be recycled directly, without performing the step of migrating valid data. If garbage collection takes place between the earliest expiration time and the latest expiration time, then because the expiration times of the data stored in the storage area are more concentrated, from a macro perspective the amount of valid data that needs to be migrated in each storage area is less than under the traditional mechanism for storing recovered data.
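The three situations can be expressed as a small decision function. This is a sketch only: the name `gc_action`, the string labels, and the convention that data counts as invalid once the current time reaches its expiration time are assumptions made for illustration.

```python
def gc_action(now, expirations):
    """Decide how to treat one storage area at garbage-collection time `now`.

    Assumes data is invalid once `now` is greater than or equal to its expiration time.
    Returns (action, number_of_valid_items_to_migrate).
    """
    valid = [e for e in expirations if e > now]
    if len(valid) == len(expirations):
        return ("skip", 0)               # nothing has expired yet
    if not valid:
        return ("reclaim", 0)            # everything expired, no migration needed
    return ("migrate_then_reclaim", len(valid))


area = [10, 10, 11, 12]
print(gc_action(5, area))    # ('skip', 0)
print(gc_action(10, area))   # ('migrate_then_reclaim', 2)
print(gc_action(12, area))   # ('reclaim', 0)
```

The narrower the spread of expiration times in an area, the shorter the window in which the middle branch, the only one that costs migration work, can occur.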
  • The expiration times of the valid data stored in the target storage area can all be the same; that is, the data stored in the target storage area either all becomes invalid at the same time or is all still valid. In this way, if all the data in the target storage area has become invalid when garbage collection is needed, the step of migrating valid data need not be performed, which minimizes the time spent migrating valid data during garbage collection.
  • Data with the same expiration time may be data with the same life cycle and the same write time, or may be data with different write times and different life cycles whose life cycles end at the same time, for example, data with the same remaining life cycle and the same recovery time. The remaining life cycle indicates how long the data remains valid after recovery; that is, counting from the time the data is recovered, the end of the remaining life cycle is the expiration time of the data.
  • FIG. 6 is a schematic diagram of a storage result of recovered data according to an embodiment of the present application.
  • In FIG. 6, the expiration time of data is represented by a life cycle (period), denoted p, and a write time, denoted t.
  • The failure time of data a, data b, data c, and data d is the first failure time, represented by (p1, t1).
  • The failure time of data e, data f, data g, and data h is the second failure time, represented by (p1, t2).
  • The failure time of data i, data j, data k, and data l is the third failure time, represented by (p2, t1).
  • The failure time of data m, data n, data o, and data p is the fourth failure time, represented by (p2, t2).
  • the same storage strategy can be used to recover the lost data in memory 2 and store the recovered data to storage area 1 and memory 3 in memory 1, respectively.
  • the plurality of storage areas may be located in one or more first memories.
  • The at least one first storage and the second storage may be located in one storage node, or may be located in different storage nodes, which is not limited in the embodiment of the present application.
  • The memory that stores the expiration times of the multiple target valid data may be the same memory as the memory that stores the target valid data (that is, the memory where the target storage area is located), or may be a different memory, which is not limited in the embodiment of the present application.
  • The memory that stores the expiration times may be located in the same storage node as the at least one first storage, or in a different storage node.
  • The storage node where the controller performing steps 410 and 420 is located may be the same storage node as the storage node where the target storage area is located. That is, after a data failure occurs in the second storage in the storage node, the controller in the storage node stores the target valid data to the target storage area by performing steps 410 and 420.
  • The storage node where the controller performing steps 410 and 420 is located may also be a storage node different from the storage node where the target storage area is located. That is, the controller may send at least part of the recovered target valid data to the storage node where the target storage area is located, and that storage node stores the at least part of the target valid data to the target storage area.
  • The storage node where the controller performing steps 410 and 420 is located may be the same storage node as the storage node where the second storage is located, or may be a different storage node.
  • the storage area for storing the data for recovering the target valid data may be in the same storage node as the target storage area, or even in the same memory.
  • the storage area for storing the data for recovering the target valid data may be located at a different storage node from the target storage area.
  • For a storage medium that supports sequential writing, the data of the same data stream is naturally written into continuous storage space.
  • The expiration times of the data stored in a target storage area are therefore usually the same. Even if they are not exactly the same, they are usually concentrated, so the amount of valid data migrated during garbage collection in units of target storage areas is usually not large. However, as described above, when recovered data is stored randomly, the expiration times of the recovered data stored in a target storage area are also random. Therefore, the method in the embodiment of the present application also applies to storage media that support sequential writing.
  • The memory mentioned in the embodiment of the present invention may be a sequential-write medium such as an SSD or an SMR hard disk, and may also be an append-write memory, for example one implemented by software such as storage management software.
  • In an append-write memory, newly written data cannot directly overwrite existing data; for example, newly written data is appended after the existing data stored in the memory.
  • In the process of storing target valid data in the target storage area, the data to be stored in the same target storage area can be written continuously. This avoids the situation in which adjacently written target valid data belongs to different target storage areas, which would require switching back and forth between target storage areas, and so helps improve the efficiency of storing target valid data.
  • For example, on an SMR hard disk such switching causes the head of the SMR hard disk to swing, which increases the time needed to store data and seriously affects the write performance of the SMR hard disk.
  • The storage order of multiple target valid data may be determined according to the expiration times of the multiple target valid data, and, based on the storage order, the target valid data that is to be stored in the same storage area is written continuously into the target storage area.
  • That is, a storage order of the plurality of target valid data is determined according to the expiration time of each of the plurality of target valid data. The storage order indicates that, after all the data among the plurality of target valid data that is to be stored in the target storage area has been written, all the valid data to be stored in the next storage area is written to the next storage area, where the next storage area is one of the multiple storage areas.
  • the specific sorting method of the above storage order can be used in conjunction with the storage strategy introduced above.
  • the above storage order may specifically indicate that target valid data with the same expiration time is continuously written.
  • If the storage policy is to store target valid data whose expiration times fall within a preset time length threshold in the same storage area, the above storage order may indicate that the target valid data that satisfies the storage policy is written continuously.
  • the restored valid data of the target is stored in the target storage area according to the storage sequence.
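The effect of the storage order can be illustrated with a short sketch that counts how often consecutive writes land in different storage areas. The helper names `storage_order` and `count_area_switches`, the numeric expiration times, and the rule that each distinct expiration time maps to its own area are assumptions for illustration.

```python
def storage_order(expirations):
    """Order data names so that items with the same expiration time are adjacent."""
    return sorted(expirations, key=lambda name: (expirations[name], name))


def count_area_switches(write_sequence, area_of):
    """Count consecutive writes that target different storage areas."""
    return sum(
        1
        for prev, cur in zip(write_sequence, write_sequence[1:])
        if area_of[prev] != area_of[cur]
    )


# Hypothetical recovered items and their expiration times (in days).
expirations = {"a": 1, "e": 2, "b": 1, "f": 2}
# Assumed policy: one storage area per distinct expiration time.
area_of = dict(expirations)

arrival = ["a", "e", "b", "f"]          # order in which the items happen to be recovered
ordered = storage_order(expirations)    # ['a', 'b', 'e', 'f']

print(count_area_switches(arrival, area_of))  # 3 switches between areas
print(count_area_switches(ordered, area_of))  # 1 switch between areas
```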
  • the above-mentioned process of establishing a storage sequence may also be performed before recovering multiple target valid data.
  • For example, the data to be stored in one storage area can be aggregated into a data set according to the expiration time and temporarily stored. The target valid data is then stored sequentially in the storage area in units of data sets.
  • If the above-mentioned process of establishing the storage order is performed before the multiple target valid data are recovered, the multiple target valid data fragments have not yet been recovered, and the storage order can be adjusted by adjusting the data recovery order.
  • The method of adjusting the storage order is described below, taking as an example the case in which the valid data stored in the target storage area has the same expiration time. Because the expiration time, or the time attribute used to indicate it, can be added to the metadata of the data as a key, a traditional hash algorithm can be reused to hash the keys of the multiple target valid data. Data carrying the same key is thus hashed into the same bucket; the multiple target valid data are then restored bucket by bucket, and the restored target valid data is stored in the corresponding target storage area.
  • the data stored in the second memory needs to be stored for 30 days, and the time attributes (life cycle and write time) stored in the metadata of the data are 1020180602, 1020180601, and 3020180602, respectively.
  • the time attributes of the data that needs to be recovered from the storage system are 1020180602, 1020180601, and 3020180602, respectively, and the above time attributes are hashed respectively as keys.
  • Data written on June 2, 2018 that needs to be stored for 10 days, data written on June 1, 2018 that needs to be stored for 10 days, and data written on June 2, 2018 that needs to be stored for 30 days are hashed to the hash buckets 1020180602, 1020180601, and 3020180602, respectively.
  • The data in buckets 1020180602, 1020180601, and 3020180602 is then restored bucket by bucket, so that data to be stored in the same storage area is written consecutively.
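The bucketing step can be sketched with a Python dictionary, which is itself a hash table keyed by the time attribute. Several things here are assumptions for illustration: the fragment identifiers, the helper names, and in particular the reading of the key as "life cycle in days followed by an 8-digit YYYYMMDD write date", which is inferred from the three example keys rather than stated in the application.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def parse_time_attribute(key):
    """Split a key such as '1020180602' into (life cycle, write time).

    Assumed layout: leading digits = life cycle in days, last 8 digits = YYYYMMDD.
    """
    life_cycle = timedelta(days=int(key[:-8]))
    write_time = datetime.strptime(key[-8:], "%Y%m%d")
    return life_cycle, write_time


def bucket_by_key(fragments):
    """Group (fragment_id, time_attribute_key) pairs into buckets keyed by the attribute."""
    buckets = defaultdict(list)
    for fragment_id, key in fragments:
        buckets[key].append(fragment_id)
    return dict(buckets)


# Hypothetical fragments that need to be recovered.
fragments = [
    ("f1", "1020180602"),
    ("f2", "1020180601"),
    ("f3", "3020180602"),
    ("f4", "1020180602"),
]

for key, members in bucket_by_key(fragments).items():
    life_cycle, write_time = parse_time_attribute(key)
    # Recover one bucket at a time so fragments bound for the same area are written together.
    print(key, members, "expires", (write_time + life_cycle).date())
```

Recovering bucket by bucket means the recovery order already matches the storage order, so no separate reordering pass is needed after recovery.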
  • FIG. 7 is a schematic diagram of a data storage device according to an embodiment of the present application.
  • the apparatus 700 shown in FIG. 7 may be applied to a storage system, where the storage system includes at least one first memory, the storage system further includes a second memory, the at least one first memory includes multiple storage areas, and the multiple Each of the storage areas is a unit for garbage collection.
  • the apparatus 700 may include a processing unit 710.
  • a processing unit 710 configured to obtain the invalidation time of a plurality of target valid data to be recovered after a data failure occurs in the second memory, where the invalidation time is used to indicate a time when valid data becomes invalid data;
  • The processing unit 710 is further configured to select a target storage area from the plurality of storage areas, and store at least a part of the target valid data of the restored plurality of target valid data to the target storage area, where, after the at least part of the target valid data is stored in the target storage area, the length of time between the earliest invalidation time and the latest invalidation time of valid data stored in the target storage area is less than or equal to a preset time length.
  • By limiting the time length between the earliest expiration time and the latest expiration time of the valid data stored in the target storage area to be less than or equal to a preset time length, the expiration times of the valid data stored in the target storage area are relatively concentrated. During garbage collection in units of target storage areas, this helps reduce the time taken to migrate valid data and improves the efficiency of garbage collection.
  • the expiration time of the valid data stored in the target storage area is the same.
  • data stored in the multiple storage areas is written in a sequential writing manner.
  • The at least one first memory and the second memory are shingled magnetic recording (SMR) hard disks, and each of the plurality of storage areas is a sequential zone (Szone).
  • The at least one memory is a solid state drive (SSD), and each of the plurality of storage areas is a block.
  • the processing unit is further configured to determine a storage order of the plurality of target valid data according to the expiration time of each valid data in the plurality of target valid data.
  • The storage order is used to indicate that data among the plurality of target valid data that is to be stored in the same storage area is written continuously. The processing unit is further configured to select a target storage area from the plurality of storage areas and, in accordance with the storage order, store the at least part of the target valid data in the target storage area.
  • The expiration time of any one of the plurality of target valid data is stored in metadata of a data fragment corresponding to that target valid data, where that target valid data and its corresponding data fragment are generated by erasure code encoding of the original data.
  • the expiration time is represented by a data life cycle and data writing time.
  • the foregoing apparatus 700 may also be a controller 800.
  • the processing unit 710 may be at least one processor 820.
  • the controller 800 may further include at least one memory 810 and at least one input / output interface 830, as shown in FIG. 8.
  • FIG. 8 is a schematic diagram of a controller according to an embodiment of the present application.
  • the controller 800 shown in FIG. 8 may include at least one memory 810, at least one processor 820, and at least one input / output interface 830. Among them, at least one memory 810, at least one processor 820, and at least one input / output interface 830 are connected through a communication connection.
  • The at least one memory 810 is used to store program instructions, and the at least one processor 820 is used to execute the program instructions stored in the memory 810, so as to control the at least one input / output interface 830 to receive input data and information and to output data such as operation results.
  • The memory 810 may include a read-only memory and a random access memory, and provide instructions and data to the processor 820.
  • A portion of the memory 810 may also include non-volatile random access memory.
  • The memory 810 may also store information of a device type.
  • each step of the above method may be completed by using hardware integrated logic circuits or instructions in the form of software in the processor 820.
  • the method disclosed in combination with the embodiments of the present application may be directly implemented by a hardware processor, or may be performed by a combination of hardware and software modules in the processor.
  • the software module may be located in a mature storage medium such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, or an electrically erasable programmable memory, a register, and the like.
  • the storage medium is located in the memory 810, and the processor 820 reads information in the memory 810 and completes the steps of the foregoing method in combination with its hardware. To avoid repetition, it will not be described in detail here.
  • The processor may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or the like.
  • A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
  • An embodiment of the present application further provides a storage system.
  • the storage system includes at least one first storage, and the storage system further includes a second storage.
  • The at least one first storage includes multiple storage areas, and each of the multiple storage areas is a unit of garbage collection.
  • The storage system may include a processing unit.
  • a processing unit configured to obtain the invalidation time of a plurality of target valid data to be recovered after a data failure occurs in the second memory, where the invalidation time is used to indicate a time when valid data becomes invalid data;
  • a processing unit configured to select a target storage area from the plurality of storage areas, and store at least a part of the target valid data of the restored plurality of target valid data to the target storage area, where After at least part of the target valid data is stored in the target storage area, a length of time between the earliest failure time and the latest failure time of the valid data stored in the target storage area is less than or equal to a preset time length.
  • The above storage system involves three types of storage nodes: the storage node where the second memory in which the data failure occurs is located, the storage node where the target storage area is located, and the storage node where the processing unit is located.
  • the above three types of storage nodes may be the same storage node (for convenience of description, hereinafter referred to as the first storage node).
  • After a data failure occurs in the second storage in the first storage node, the controller in the first storage node recovers the target valid data and then stores at least part of the restored target valid data in a target storage area in the first storage node.
  • the above storage system includes at least one storage node, that is, the first storage node.
  • the above three types of storage nodes may also be different storage nodes.
  • For example, the storage node where the second storage is located and the storage node where the processing unit is located may be the same storage node (referred to as a second storage node), while the second storage node and the storage node where the target storage area is located (referred to as a third storage node) are different storage nodes.
  • That is, after a data failure occurs in the second storage in the second storage node, the processing unit in the second storage node may perform data recovery on multiple target valid data, and send at least a part of the target valid data among the restored multiple target valid data to the third storage node, so that the third storage node stores the at least part of the target valid data in the target storage area.
  • the above storage system includes at least two storage nodes, that is, a second storage node and a third storage node.
  • by limiting the length of time between the earliest expiration time and the latest expiration time of the valid data stored in the target storage area to be less than or equal to a preset time length, the expiration times of the valid data stored in the target storage area are kept relatively concentrated. During garbage collection on the target storage area, this helps reduce the time spent migrating valid data and improves the efficiency of garbage collection.
  • the expiration times of all valid data stored in the target storage area are the same.
  • data stored in the multiple storage areas is written in a sequential writing manner.
  • the at least one first memory and the second memory are shingled magnetic recording (SMR) hard disks, and each of the plurality of storage areas is a sequential zone (Szone).
  • the at least one memory is a solid state drive (SSD), and each of the plurality of storage areas is a block.
  • the processing unit is further configured to determine a storage order of the plurality of target valid data according to the expiration time of each valid data in the plurality of target valid data.
  • the storage order is used to indicate that those of the plurality of target valid data that are to be stored in the same storage area are written contiguously; the processing unit selects a target storage area from the plurality of storage areas and, in accordance with the storage order, stores the at least part of the target valid data in the target storage area.
  • the expiration time of any one of the plurality of target valid data is stored in the metadata of the data fragment corresponding to that target valid data, and that target valid data and its corresponding data fragment are generated by erasure-code encoding of the original data.
  • the expiration time is represented by a data life cycle and data writing time.
  • the magnitude of the sequence numbers of the above processes does not imply their order of execution.
  • the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
  • the disclosed systems, devices, and methods may be implemented in other ways.
  • the device embodiments described above are only schematic.
  • the division of the unit is only a logical function division.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, which may be electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objective of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each of the units may exist separately physically, or two or more units may be integrated into one unit.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wirelessly (such as infrared, radio, or microwave).
  • the computer-readable storage medium may be any usable medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more usable media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, a solid state disk (SSD)), and so on.
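
The placement rule described in the list above (an expiration time derived from the data writing time and the data life cycle, a storage order determined by expiration time, and a bound on the spread of expiration times within one storage area) can be sketched as follows. This is only an illustrative sketch, not the claimed implementation: the names Fragment, plan_storage, preset and capacity are hypothetical, and the greedy packing strategy is an assumption made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Fragment:
    """A restored piece of target valid data (hypothetical structure)."""
    name: str
    write_time: datetime   # data writing time kept in the fragment metadata
    life_cycle: timedelta  # data life cycle kept in the fragment metadata

    @property
    def expiration(self) -> datetime:
        # Expiration time represented by writing time plus life cycle.
        return self.write_time + self.life_cycle


def plan_storage(fragments, preset, capacity):
    """Group fragments into storage areas (assumed greedy strategy).

    Fragments are ordered by expiration time and packed so that, within
    each area, latest expiration - earliest expiration <= preset and no
    area holds more than `capacity` fragments.
    """
    areas = []
    for frag in sorted(fragments, key=lambda f: f.expiration):
        current = areas[-1] if areas else None
        # current[0] is the earliest expiration in the area because the
        # input is processed in ascending expiration order.
        if (current is not None
                and len(current) < capacity
                and frag.expiration - current[0].expiration <= preset):
            current.append(frag)
        else:
            areas.append([frag])
    return areas


t0 = datetime(2019, 7, 30)
frags = [
    Fragment("a", t0, timedelta(days=30)),
    Fragment("b", t0 + timedelta(days=1), timedelta(days=30)),
    Fragment("c", t0, timedelta(days=90)),
    Fragment("d", t0 + timedelta(days=2), timedelta(days=7)),
]
areas = plan_storage(frags, preset=timedelta(days=2), capacity=8)
print([[f.name for f in area] for area in areas])
```

Because each area is filled in ascending expiration order, the fragments destined for the same area are written one after another, which matches the sequential-write constraint of SMR zones and SSD blocks mentioned above. Setting preset to zero would correspond to the variant in which all valid data in a storage area share the same expiration time.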

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Mathematical Physics (AREA)
  • Computer Security & Cryptography (AREA)
  • Algebra (AREA)
  • Pure & Applied Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Signal Processing (AREA)
  • Techniques For Improving Reliability Of Storages (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a data storage method and apparatus and a storage system, the method being applied to the storage system. The storage system comprises at least one first memory; the storage system further comprises a second memory; the at least one first memory comprises a plurality of storage areas; and each storage area of the plurality of storage areas is a unit on which garbage collection is performed. By setting the length of time between the earliest expiration time and the latest expiration time among the expiration times of valid data stored in a target storage area to be less than or equal to a preset length of time, the method makes the expiration times of the valid data stored in the target storage area relatively concentrated, which, during garbage collection on the target storage area, helps reduce the time spent migrating valid data and thereby improves garbage collection efficiency.
PCT/CN2019/098256 2018-08-27 2019-07-30 Data storage method and apparatus, and storage system WO2020042850A1 (fr)
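
The efficiency argument in the abstract can be illustrated with a small comparison. The sketch below is an assumption-laden toy model, not part of the application: a storage area is modelled as a list of (expiration time, size) pairs, sizes are in arbitrary units, and the names bytes_to_migrate, mixed and concentrated are hypothetical.

```python
from datetime import datetime, timedelta


def bytes_to_migrate(area, gc_time):
    """Amount of still-valid data that must be moved out of a storage
    area before the area can be reclaimed at gc_time."""
    return sum(size for expiration, size in area if expiration > gc_time)


t0 = datetime(2019, 7, 30)
day = timedelta(days=1)

# Area whose expiration times are spread over 80 days.
mixed = [(t0 + 10 * day, 4), (t0 + 40 * day, 4), (t0 + 90 * day, 4)]
# Area whose expiration times fall within a 2-day window.
concentrated = [(t0 + 10 * day, 4), (t0 + 11 * day, 4), (t0 + 12 * day, 4)]

gc_time = t0 + 15 * day
print(bytes_to_migrate(mixed, gc_time), bytes_to_migrate(concentrated, gc_time))
```

In this toy model the area with concentrated expiration times holds no valid data shortly after its first item expires, so it can be reclaimed without migration, whereas the mixed area still holds data that must be copied elsewhere first.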

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP19855114.5A EP3839716A4 (fr) 2018-08-27 2019-07-30 Data storage method and apparatus, and storage system
US17/183,657 US12008263B2 (en) 2018-08-27 2021-02-24 Garbage collection and data storage method and apparatus, and storage system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810983555.0A CN109445681B (zh) 2018-08-27 2018-08-27 数据的存储方法、装置和存储系统
CN201810983555.0 2018-08-27

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/183,657 Continuation US12008263B2 (en) 2018-08-27 2021-02-24 Garbage collection and data storage method and apparatus, and storage system

Publications (1)

Publication Number Publication Date
WO2020042850A1 true WO2020042850A1 (fr) 2020-03-05

Family

ID=65532785

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/098256 WO2020042850A1 (fr) 2018-08-27 2019-07-30 Data storage method and apparatus, and storage system

Country Status (4)

Country Link
US (1) US12008263B2 (fr)
EP (1) EP3839716A4 (fr)
CN (1) CN109445681B (fr)
WO (1) WO2020042850A1 (fr)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109445681B (zh) 2018-08-27 2021-05-11 华为技术有限公司 数据的存储方法、装置和存储系统
CN115989485A (zh) * 2020-11-30 2023-04-18 华为技术有限公司 一种数据处理方法、装置及系统
CN112714031B (zh) * 2021-03-29 2021-06-22 中南大学 一种基于带宽感知的故障节点快速修复方法
CN113687774A (zh) * 2021-07-19 2021-11-23 锐捷网络股份有限公司 空间回收方法、装置及设备
CN113900590B (zh) * 2021-09-28 2023-01-31 重庆紫光华山智安科技有限公司 叠瓦式磁盘存储方法、装置、设备及介质

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105204783A (zh) * 2015-10-13 2015-12-30 华中科技大学 一种基于数据生存期的固态盘垃圾回收方法
CN107102819A (zh) * 2014-12-12 2017-08-29 西安三星电子研究有限公司 向固态硬盘写入数据的方法及设备
CN109445681A (zh) * 2018-08-27 2019-03-08 华为技术有限公司 数据的存储方法、装置和存储系统

Family Cites Families (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6526448B1 (en) * 1998-12-22 2003-02-25 At&T Corp. Pseudo proxy server providing instant overflow capacity to computer networks
US8195912B2 (en) * 2007-12-06 2012-06-05 Fusion-io, Inc Apparatus, system, and method for efficient mapping of virtual and physical addresses
EP3696676B1 (fr) * 2009-10-09 2023-12-06 Violin Systems LLC Système de mémoire avec multiples bandes de groupes raid et son procédé d'exécution
JP5066199B2 (ja) * 2010-02-12 2012-11-07 株式会社東芝 半導体記憶装置
US8316176B1 (en) * 2010-02-17 2012-11-20 Western Digital Technologies, Inc. Non-volatile semiconductor memory segregating sequential data during garbage collection to reduce write amplification
FI124455B (fi) * 2010-04-20 2014-09-15 Tellabs Oy Menetelmä ja laite verkko-osoitteiden konfiguroimiseksi
US20120030260A1 (en) * 2010-07-30 2012-02-02 Maohua Lu Scalable and parallel garbage collection method and system for incremental backups with data de-duplication
US8886990B2 (en) * 2011-01-27 2014-11-11 Apple Inc. Block management schemes in hybrid SLC/MLC memory
EP2662774A4 (fr) * 2011-10-27 2014-01-08 Huawei Tech Co Ltd Procédé de commande de mappage de tampons et système de tampons
CN103577338B (zh) * 2013-11-14 2016-06-29 华为技术有限公司 一种回收垃圾数据的方法及存储设备
US9501393B2 (en) * 2014-01-27 2016-11-22 Western Digital Technologies, Inc. Data storage system garbage collection based on at least one attribute
KR102289919B1 (ko) * 2014-04-15 2021-08-12 삼성전자주식회사 스토리지 컨트롤러, 스토리지 장치, 스토리지 시스템 및 상기 스토리지 컨트롤러의 동작 방법
US10114557B2 (en) * 2014-05-30 2018-10-30 Sandisk Technologies Llc Identification of hot regions to enhance performance and endurance of a non-volatile storage device
US9705990B2 (en) * 2014-06-05 2017-07-11 Toyota Jidosha Kabushiki Kaisha Transfer of digital data to mobile software systems
US9600409B2 (en) * 2014-08-29 2017-03-21 EMC IP Holding Company LLC Method and system for garbage collection in a storage system based on longevity of stored data
US9734051B2 (en) * 2015-02-16 2017-08-15 Quantum Corporation Garbage collection and defragmentation for solid state drives (SSD) and shingled magnetic recording (SMR) drives
US10261725B2 (en) * 2015-04-10 2019-04-16 Toshiba Memory Corporation Storage system capable of invalidating data stored in a storage device thereof
JPWO2016175028A1 (ja) * 2015-04-28 2018-02-22 日本電気株式会社 情報処理システム、記憶制御装置、記憶制御方法および記憶制御プログラム
US10650024B2 (en) * 2015-07-30 2020-05-12 Google Llc System and method of replicating data in a distributed system
CN106548789B (zh) * 2015-09-17 2019-05-17 伊姆西公司 用于操作叠瓦式磁记录设备的方法和装置
US20170153842A1 (en) * 2015-12-01 2017-06-01 HGST Netherlands B.V. Data allocation in hard drives
CN105677243B (zh) * 2015-12-31 2018-12-28 华为技术有限公司 数据写入装置及方法
CN106951375B (zh) * 2016-01-06 2021-11-30 北京忆恒创源科技股份有限公司 在存储系统中删除快照卷的方法及装置
KR102533389B1 (ko) * 2016-02-24 2023-05-17 삼성전자주식회사 장치 수명을 향상시키는 데이터 저장 장치 및 이를 포함하는 raid 시스템
US10339044B2 (en) * 2016-03-30 2019-07-02 Sandisk Technologies Llc Method and system for blending data reclamation and data integrity garbage collection
CN107515728B (zh) * 2016-06-17 2019-12-24 清华大学 发挥闪存设备内部并发特性的数据管理方法和装置
US9880745B2 (en) * 2016-06-21 2018-01-30 International Business Machines Corporation Reducing concurrency of garbage collection operations
US10467134B2 (en) * 2016-08-25 2019-11-05 Sandisk Technologies Llc Dynamic anneal characteristics for annealing non-volatile memory
US10740294B2 (en) * 2017-01-12 2020-08-11 Pure Storage, Inc. Garbage collection of data blocks in a storage system with direct-mapped storage devices
US10496293B2 (en) * 2017-03-14 2019-12-03 International Business Machines Corporation Techniques for selecting storage blocks for garbage collection based on longevity information
CN107391774B (zh) * 2017-09-15 2019-11-19 厦门大学 基于重复数据删除的日志文件系统的垃圾回收方法
CN107766180B (zh) * 2017-09-22 2020-08-14 成都华为技术有限公司 存储介质的管理方法、装置及可读存储介质
CN111562880A (zh) * 2019-02-14 2020-08-21 英韧科技(上海)有限公司 一种数据存储装置、系统及数据写入方法
KR20210046377A (ko) * 2019-10-18 2021-04-28 에스케이하이닉스 주식회사 마이그레이션 동작을 위한 메모리 시스템 및 메모리 시스템의 동작방법

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3839716A4 *

Also Published As

Publication number Publication date
EP3839716A4 (fr) 2021-10-20
US20210181992A1 (en) 2021-06-17
CN109445681B (zh) 2021-05-11
US12008263B2 (en) 2024-06-11
CN109445681A (zh) 2019-03-08
EP3839716A1 (fr) 2021-06-23

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19855114

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2019855114

Country of ref document: EP

Effective date: 20210316