WO2019000950A1 - Fragment management method and fragment management apparatus - Google Patents

Fragment management method and fragment management apparatus

Info

Publication number
WO2019000950A1
Authority
WO
WIPO (PCT)
Prior art keywords
fragment
storage
verification
data
slice
Prior art date
Application number
PCT/CN2018/075188
Other languages
English (en)
French (fr)
Inventor
王晨
姚唐仁
王�锋
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to EP18823199.7A (granted as EP3617867B1)
Priority to EP22182872.6A (published as EP4137924A1)
Publication of WO2019000950A1
Priority to US16/718,976 (granted as US11243706B2)
Priority to US17/574,262 (published as US20220137849A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1076 Parity data used in redundant arrays of independent storages, e.g. in RAID systems
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604 Improving or facilitating administration, e.g. storage management
    • G06F3/0608 Saving storage space on storage systems
    • G06F3/061 Improving I/O performance
    • G06F3/0614 Improving the reliability of storage systems
    • G06F3/0619 Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0647 Migration mechanisms
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G06F3/0671 In-line storage system
    • G06F3/0683 Plurality of storage devices
    • G06F3/0685 Hybrid storage combining heterogeneous device types, e.g. hierarchical storage, hybrid arrays

Definitions

  • the present invention relates to computer technology, and more particularly to the field of storage.
  • the present invention provides an implementation of a fragment management method that may be applied to a distributed storage system, where the distributed storage system includes a computing node and at least one storage node, each storage node includes at least one storage medium, the distributed storage system includes a plurality of storage media, different fragments are stored in different storage media, and the data fragments and the first check fragment are all initially located in first-level storage media. The method includes: the computing node reads metadata of the first check fragment to obtain a first storage location where the first check fragment is located; the computing node selects a second storage location, where the second storage location is in a second-level storage medium, the read/write speed of the second-level storage medium is lower than that of the first-level storage medium, and the second storage location has free space; the computing node sends a migration indication to the storage node where the first check fragment is located, instructing that storage node to send the first check fragment to the storage node where the second storage location is located; the storage node where the second storage location is located stores the first check fragment; and the computing node updates the information of the second storage location into the metadata of the first check fragment.
  • the check fragment of EC or the global check fragment of LRC can thus be migrated from a high-cost, high-speed storage medium to a low-cost, low-speed storage medium, reducing occupation of the high-speed storage medium.
  • in an alternative, the computing node reads the metadata of the second check fragment and obtains a third storage location where the second check fragment is located; the computing node selects a fourth storage location, the fourth storage location being in a third-level storage medium whose read/write speed is higher than that of the second-level storage medium and lower than that of the first-level storage medium, and the fourth storage location has free space; the computing node sends a migration indication to the storage node where the second check fragment is located, instructing that storage node to send the second check fragment to the storage node where the fourth storage location is located; the storage node where the fourth storage location is located stores the second check fragment to the fourth storage location; and the computing node updates the information of the fourth storage location into the metadata of the second check fragment.
  • the LRC local check fragment can thus be migrated from a higher-cost, high-speed storage medium to a lower-cost, low-speed storage medium, reducing occupation of the high-speed storage medium.
  • the read/write speed of the storage medium that the local check fragment is migrated into is higher than the read/write speed of the medium that holds the LRC global check fragment.
  • the data fragments, the first check fragment, and the second check fragment conform to a local reconstruction code (LRC) algorithm, where the first check fragment is a global check fragment in the LRC algorithm and the second check fragment is a local check fragment in the LRC algorithm.
  • in a second alternative of the first aspect, before the method, the computing node receives a write data request, divides the target data carried in the write data request into data fragments, and generates, from the data fragments according to the LRC algorithm, the global check fragment and the local check fragment; the global check fragment is used to check a plurality of data fragments; the local check fragment is used to check a subset of the plurality of data fragments.
  • This scheme introduces the LRC-based generation process of data fragments, local check fragments, and global check fragments, and the check relationship between these fragments.
  • in a third alternative of the first aspect, the data fragments and the first check fragment conform to an erasure code (EC) algorithm, and the method further includes: the computing node receives a write data request, divides the target data carried in the write data request into data fragments, and generates the first check fragment from the data fragments according to the EC algorithm.
  • This scheme introduces the EC-based generation process of data fragments and check fragments (also referred to as "global check fragments" in this application) and the check relationship between these fragments.
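The EC-based generation and recovery just described can be sketched as follows. This is a minimal illustration using a single XOR parity fragment, the simplest special case of erasure coding; production systems typically use Reed-Solomon codes with several check fragments, and all function names here are illustrative assumptions, not the patent's API.

```python
def split_into_fragments(data, k):
    """Split target data into k equally sized data fragments (zero-padded)."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(k * size, b"\x00")
    return [padded[i * size:(i + 1) * size] for i in range(k)]

def xor_check_fragment(fragments):
    """Generate one check fragment as the bytewise XOR of all fragments."""
    check = bytearray(len(fragments[0]))
    for frag in fragments:
        for i, b in enumerate(frag):
            check[i] ^= b
    return bytes(check)

def recover(fragments, check):
    """Recover a single lost data fragment (marked None) from the survivors."""
    lost = fragments.index(None)
    survivors = [f for f in fragments if f is not None] + [check]
    fragments[lost] = xor_check_fragment(survivors)
    return fragments

data = b"hello distributed storage"
frags = split_into_fragments(data, 4)
check = xor_check_fragment(frags)
frags[2] = None                      # one data fragment is lost
frags = recover(frags, check)
assert b"".join(frags).rstrip(b"\x00") == data
```

With one XOR parity, any single failed fragment can be rebuilt; recovering more simultaneous failures requires more check fragments, as the claims note.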
  • the present invention provides an embodiment of a computing node, the computing node including a processor unit and a memory, the memory storing a computer program, where, by running the computer program, the processor unit is configured to: read metadata of the first check fragment and obtain a first storage location where the first check fragment is located; select a second storage location, where the second storage location is in a second-level storage medium, the read/write speed of the second-level storage medium is lower than that of the first-level storage medium, and the second storage location has free space; send a migration indication to the storage node where the first check fragment is located, instructing that storage node to migrate the first check fragment to the second storage location; and, after the migration is completed, update the information of the second storage location into the metadata of the first check fragment.
  • the processor unit is further configured to: read metadata of the second check fragment and obtain a third storage location where the second check fragment is located; select a fourth storage location, the fourth storage location being in a third-level storage medium whose read/write speed is higher than that of the second-level storage medium and lower than that of the first-level storage medium, where the fourth storage location has free space; send a migration indication to the storage node where the second check fragment is located, instructing that storage node to migrate the second check fragment to the fourth storage location; and, after the migration is completed, update the information of the fourth storage location into the metadata of the second check fragment.
  • the data fragments, the first check fragment, and the second check fragment conform to a local reconstruction code (LRC) algorithm, where the first check fragment is a global check fragment in the LRC algorithm and the second check fragment is a local check fragment in the LRC algorithm.
  • the processor unit is further configured to: receive a write data request, divide the target data carried in the write data request into data fragments, and generate, from the data fragments according to the LRC algorithm, the global check fragment and the local check fragment; the global check fragment is used to check a plurality of data fragments; the local check fragment is used to check a subset of the plurality of data fragments.
  • the data fragments and the first check fragment conform to an erasure code (EC) algorithm, and the processor unit is further configured to: receive a write data request, divide the target data carried in the write data request into data fragments, and generate the first check fragment from the data fragments according to the EC algorithm.
  • the present invention provides an embodiment of a fragment management method, the method including: a computing node receives a data unit through an interface and generates data fragments from the data unit; generates a first check fragment from the data fragments; selects storage space located in a first-level storage medium as the data fragment storage location; selects storage space located in a second-level storage medium as the first check fragment storage location, where the read/write speed of the second-level storage medium is lower than the read/write speed of the first-level storage medium; and sends the data fragments and the first check fragment to the selected storage locations for storage, where the write request of a data fragment carries the data fragment and the data fragment storage location, and the write request of the first check fragment carries the first check fragment and the first check fragment storage location.
  • in this embodiment, fragments are sent directly to different levels of storage media for storage as soon as they are generated, so the "migration" operation is no longer required; this directly achieves the effect of the first/second aspects, and the write efficiency of the fragments is further improved.
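The direct-placement write path above can be sketched as follows: each fragment is assigned a storage tier at write time, so no later migration is needed. The tier names and the per-kind placement policy are illustrative assumptions, not the patent's API.

```python
from dataclasses import dataclass

@dataclass
class WriteRequest:
    fragment: bytes
    location: tuple      # (tier name, node index)

def place_fragments(data_frags, global_checks, local_checks):
    """Build one write request per fragment, carrying fragment + storage location."""
    requests = []
    for i, frag in enumerate(data_frags):
        requests.append(WriteRequest(frag, ("standard", i)))  # fastest tier
    for i, frag in enumerate(local_checks):
        requests.append(WriteRequest(frag, ("warm", i)))      # middle tier
    for i, frag in enumerate(global_checks):
        requests.append(WriteRequest(frag, ("cold", i)))      # slowest tier
    return requests

reqs = place_fragments([b"d1", b"d2"], [b"g1"], [b"l1"])
assert [r.location[0] for r in reqs] == ["standard", "standard", "warm", "cold"]
```

Each write request bundles the fragment with its chosen storage location, matching the claim that the write request "carries the fragment and the fragment storage location".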
  • in an alternative, a second check fragment is generated from the data fragments, where the data fragments, the first check fragment, and the second check fragment conform to the local reconstruction code (LRC) algorithm, the first check fragment is a global check fragment in the LRC algorithm, and the second check fragment is a local check fragment in the LRC algorithm; storage space located in a third-level storage medium is selected as the second check fragment storage location, where the read/write speed of the third-level storage medium is lower than the read/write speed of the first-level storage medium and higher than or equal to the read/write speed of the second-level storage medium; and a write data request is sent to store the second check fragment at the selected storage location, where the write request of the second check fragment carries the second check fragment and the second check fragment storage location.
  • This scheme introduces the relationship between data fragments, global check fragments, and local check fragments under the LRC algorithm.
  • the present invention provides a computing node including a processor unit and a memory, where the memory stores a computer program, and the processor unit is configured to: receive a data unit through an interface and generate data fragments from the data unit; generate a first check fragment from the data fragments; select storage space located in a first-level storage medium as the data fragment storage location, and select storage space located in a second-level storage medium as the first check fragment storage location, where the read/write speed of the second-level storage medium is lower than the read/write speed of the first-level storage medium; and send the data fragments and the first check fragment to the selected storage locations for storage, where the storage location of the data fragments is the storage space of the first-level storage medium and the storage location of the first check fragment is the storage space of the second-level storage medium.
  • in an alternative, a second check fragment is generated from the data fragments, where the data fragments, the first check fragment, and the second check fragment conform to the local reconstruction code (LRC) algorithm, the first check fragment is a global check fragment in the LRC algorithm, and the second check fragment is a local check fragment in the LRC algorithm; storage space of a third-level storage medium is selected as the second check fragment storage location, where the read/write speed of the third-level storage medium is lower than the read/write speed of the first-level storage medium and higher than or equal to the read/write speed of the second-level storage medium; and the second check fragment is sent to the selected storage location for storage, where the storage location of the second check fragment is the storage space of the third-level storage medium.
  • the present invention also provides an embodiment of a storage medium in which program code can be stored; by running the program code, a computer/server/distributed storage system can execute the first aspect described above and its various alternatives, or execute the third aspect described above and its various alternatives.
  • the present invention further provides an embodiment of a fragment management apparatus.
  • the fragment management apparatus may be software or hardware.
  • the fragment management apparatus is composed of modules, and each module has a function corresponding to the foregoing method embodiment.
  • an embodiment provides a fragment management apparatus, including: a reading module, a location selection module, a migration module, and a metadata management module.
  • the reading module is configured to read metadata of the first check fragment and obtain a first storage location where the first check fragment is located.
  • the location selection module is configured to select a second storage location, where the second storage location is in a second-level storage medium, the read/write speed of the second-level storage medium is lower than that of the first-level storage medium, and the second storage location has free space.
  • the migration module is configured to send a migration indication to the storage node where the first check fragment is located, instructing that storage node to migrate the first check fragment to the second storage location.
  • another fragment management apparatus includes: a fragmentation module, a location selection module, and a storage module.
  • the fragmentation module is configured to receive a data unit, generate data fragments from the data unit, and generate a first check fragment from the data fragments.
  • the location selection module is configured to select storage space located in a first-level storage medium as the data fragment storage location, and select storage space located in a second-level storage medium as the first check fragment storage location, where the read/write speed of the second-level storage medium is lower than the read/write speed of the first-level storage medium.
  • the storage module is configured to send the data fragments and the first check fragment to the selected storage locations for storage, where the write request of a data fragment carries the data fragment and the data fragment storage location, and the write request of the first check fragment carries the first check fragment and the first check fragment storage location.
  • a metadata management module is configured to record the storage location of each fragment in the metadata of that fragment.
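The module decomposition above (reading, location selection, migration, metadata management) can be sketched as cooperating objects. All class and method names here are illustrative assumptions for the sketch, not the patent's API.

```python
class ReadingModule:
    """Reads a fragment's metadata to find its current storage location."""
    def __init__(self, metadata):
        self.metadata = metadata
    def first_location(self, frag_id):
        return self.metadata[frag_id]["location"]

class LocationSelectionModule:
    """Selects a target location that has free space."""
    def select(self, free_locations):
        return free_locations[0]

class MigrationModule:
    """Sends the migration indication to the node holding the fragment."""
    def migrate(self, frag_id, src, dst):
        # (a real implementation would message the storage node at `src`)
        return dst

class MetadataModule:
    """Records the new storage location in the fragment's metadata."""
    def __init__(self, metadata):
        self.metadata = metadata
    def update(self, frag_id, location):
        self.metadata[frag_id]["location"] = location

meta = {"check-1": {"location": "standard-node-1"}}
reader, selector = ReadingModule(meta), LocationSelectionModule()
mover, mdm = MigrationModule(), MetadataModule(meta)

src = reader.first_location("check-1")
dst = selector.select(["warm-node-2"])
mdm.update("check-1", mover.migrate("check-1", src, dst))
assert meta["check-1"]["location"] == "warm-node-2"
```

Note that the metadata is updated only after the migration call returns, mirroring the claims' "after the migration is completed, update the metadata" ordering.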
  • FIG. 1 is a fragment distribution diagram before migration according to an embodiment of a fragment management method;
  • FIG. 2 is a fragment distribution diagram after migration according to an embodiment of a fragment management method;
  • FIG. 3 is a flow chart of an embodiment of a fragment management method;
  • FIG. 4 is a hardware structural diagram of an embodiment of a computing node;
  • FIG. 5 is a flow chart of another embodiment of a fragment management method.
  • the embodiments of the present invention can be applied to a distributed storage system scenario.
  • the distributed storage system referred to in the embodiments of the present invention means a storage system including a plurality of storage media (such as solid-state drives (SSDs), magnetic disks, USB flash drives, rewritable optical discs, and magnetic tapes), and the storage media may be located in one node or in multiple nodes.
  • Each storage medium can store one data fragment or one check fragment.
  • A check fragment is obtained from one or more data fragments through a check calculation.
  • a cloud storage system is also a distributed storage system. In a cloud storage system, the storage nodes are divided into multiple data centers, and each data center includes at least one storage node.
  • a distributed storage system includes a plurality of storage nodes, such as computers, servers, or combinations of a storage controller and storage media.
  • a data unit (such as a file or a file fragment) is split into multiple data fragments, and an erasure code (EC) calculation is performed on data fragments from the same data unit or from different data units to generate check (redundant) fragments.
  • Data fragments and check fragments are collectively referred to as fragments.
  • These fragments (data fragments + check fragments) are distributed and stored in different storage nodes, or in different storage media. If some fragments are lost or corrupted, the EC algorithm can recover the failed fragments from the remaining fragments.
  • the more check fragments there are, the more failed fragments can be recovered by means of the EC algorithm.
  • LRC (local reconstruction code)
  • another type of check fragment is provided, which is computed from only a part of the data fragments and checks and protects only that part. If the data fragments are divided into several data fragment groups, then each group of data fragments, together with the check fragments generated from it, forms a check group. A check group can be stored in the same data center or the same storage node. For failed fragments that appear in a check group, if the number of failed fragments is not greater than the number of local check fragments, they can be recovered from the remaining fragments in the group. Since the physical storage locations of the fragments of the same check group are close, recovery is very fast. Such check fragments are called local check fragments.
  • a global check fragment may also be included; the global check fragment is used to check all data fragments.
  • failed fragments can be recovered using the check algorithm, which can be the same as the EC check algorithm. For example, when a large number of fragment faults occur in the same check group, it is difficult to recover the failed fragments using the remaining fragments in the group; it is often still possible to recover them using the global check fragments.
  • in this application, the check fragment of EC technology and the global check fragment of LRC technology are both referred to as "global check fragments"; the new check fragments of LRC technology (which check only some of the data fragments) are referred to as "local check fragments".
  • the fragments used by the distributed storage system for storage include: data fragment 1, data fragment 2, and local check fragment 3; data fragment 4, data fragment 5, and local check fragment 6; global check fragment 7, global check fragment 8, and local check fragment 9.
  • Local check group 1 includes 3 fragments: local check fragment 3, data fragment 1, and data fragment 2.
  • Local check fragment 3 is the check fragment of data fragment 1 and data fragment 2; the three fragments are stored in different storage nodes of data center 1.
  • Local check group 2 includes 3 fragments: local check fragment 6, data fragment 4, and data fragment 5.
  • Local check fragment 6 is the check fragment of data fragment 4 and data fragment 5; the three fragments are stored in different storage nodes of data center 2.
  • Local check group 3 includes 3 fragments: local check fragment 9, global check fragment 7, and global check fragment 8.
  • Local check fragment 9 is the check fragment of global check fragment 7 and global check fragment 8; the three fragments are stored in different storage nodes of data center 3.
  • The global check group includes 6 fragments: global check fragment 7, global check fragment 8, data fragment 1, data fragment 2, data fragment 4, and data fragment 5.
  • Global check fragment 7 and global check fragment 8 are the check fragments of data fragment 1, data fragment 2, data fragment 4, and data fragment 5.
  • Global check fragment 7 and global check fragment 8 may be located in data center 3.
  • the global check fragment is used to check a plurality of data fragments; the local check fragment is used to check a part of the plurality of data fragments (usually fewer than the plurality of data fragments).
  • the data fragments checked by the global check fragment are divided into multiple groups, and each data fragment group has at least one local check fragment.
  • the data fragments that are checked by different local check fragments may be located in different physical locations, for example, in different data centers, different equipment rooms, different chassis, or different storage nodes.
  • the global check fragments themselves also have a corresponding local check fragment that checks them. Although the name of this local check fragment contains the word "local", its read frequency is lower than that of the data fragments; therefore, in the embodiments of this application, its migration mode is not the same as that of the local check fragments of the data fragments, but the same as that of the global check fragments.
  • if any two fragments fail in the entire storage system, they can be recovered from the remaining fragments in the storage system. If the three local check fragments are ignored, the remainder can be regarded as an EC check group.
  • compared with EC technology, LRC technology further improves data reliability and fragment recovery speed.
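The LRC layout described above (FIG. 1) can be sketched numerically. XOR stands in for the real check calculation here, so global check fragment 8 and local check fragment 9 are omitted (XOR admits only one independent global parity); real LRC uses Reed-Solomon-style coefficients so that two global check fragments can repair two failures.

```python
def xor(*frags):
    """Bytewise XOR of the given fragments (stand-in check calculation)."""
    out = bytearray(len(frags[0]))
    for f in frags:
        for i, b in enumerate(f):
            out[i] ^= b
    return bytes(out)

d1, d2, d4, d5 = b"\x01", b"\x02", b"\x04", b"\x05"
l3 = xor(d1, d2)          # local check group 1: {d1, d2, l3} in data center 1
l6 = xor(d4, d5)          # local check group 2: {d4, d5, l6} in data center 2
g7 = xor(d1, d2, d4, d5)  # global check fragment over all data fragments

# A single failure inside a local group is repaired locally (and quickly),
# reading only the fragments of that group:
assert xor(d2, l3) == d1

# The same failure can also be repaired from the global check fragment,
# at the cost of reading fragments from the other group too:
assert xor(d2, d4, d5, g7) == d1
```

This illustrates why local recovery is faster: it touches two nearby fragments instead of four spread across data centers.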
  • the EC algorithm/LRC algorithm refers to an algorithm that calculates the check fragments of data fragments according to the principle of EC/LRC; or, when fragments are damaged, an algorithm that recovers the damaged fragments from the undamaged fragments according to the principle of EC/LRC.
  • correspondingly, the occupied storage space is also increased.
  • Different types of storage media can be used for different storage nodes in the same data center.
  • the same data center contains storage nodes with standard storage media, storage nodes with warm storage media, and storage nodes with cold storage media.
  • These three types of storage media provide different read/write speeds: standard storage media (such as solid-state drives, SSDs) are the fastest, warm storage media (such as high-speed disks) are in between, and cold storage media (such as low-speed disks) are the slowest.
  • the costs of the three storage media also differ: standard storage media cost the most, warm storage media are second, and cold storage media cost the least.
  • in FIG. 1, the first row is storage nodes with standard storage media, the second row is warm storage medium nodes, and the third row is cold storage medium nodes.
  • FIG. 1 is a schematic representation of three levels; in practice, there can be more levels or only two.
  • alternatively, the same storage node may include standard storage media, warm storage media, and cold storage media.
  • Different fragments are distributed on different storage media but may be located in the same storage node. Reading the different nodes of a data center in FIG. 1 as a plurality of storage media in at least one node describes such a scenario. Since there is no essential difference between the two, it is not described in detail, and only the scenario shown in FIG. 1 is described below.
  • The embodiment of the present invention proposes an innovative idea: perform finer-grained management of the storage locations of slices and migrate the verification slices to lower-cost storage media. For example, referring to FIG. 2, the data fragments remain in the standard storage nodes; considering that local parity fragments are read and written less frequently than data fragments, the local parity fragments can be migrated to nodes with lower read/write speed media. It should be noted that the focus of this embodiment is migrating the verification fragments between media of different rates.
  • The media of each node in Figure 2 are uniform, so migrating a slice to a cold storage media node means the data is migrated to cold storage media. Where the same node has media of different levels, the migration need not cross nodes; for example, check slices may be migrated from the standard storage media to the warm storage media of the same node.
  • Alternatively, both types of verification fragments are migrated to warm storage medium nodes, or both are migrated to cold storage medium nodes.
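The tiering policy described above (data fragments on standard media; local parity on faster media than global parity, or both parity types merged onto one lower tier) can be sketched as a simple placement rule. This is an illustrative sketch only; the tier names and the `placement_medium` helper are assumptions, not terminology from the patent:

```python
# Illustrative sketch of the tiered-placement rule described above.
# Media levels by read/write speed: standard (fastest), warm, cold (slowest).

def placement_medium(fragment_kind: str, merge_parities: bool = False) -> str:
    """Return the storage media level for a fragment kind.

    fragment_kind: "data", "local_parity", or "global_parity".
    merge_parities: if True, both parity kinds go to the same warm tier,
    mirroring the variant where both are migrated to one node type.
    """
    if fragment_kind == "data":
        return "standard"          # data stays on the fastest media
    if fragment_kind in ("local_parity", "global_parity"):
        if merge_parities:
            return "warm"          # both parity kinds on one lower tier
        # Local parity is read more often than global parity, so it gets
        # the faster of the two lower tiers.
        return "warm" if fragment_kind == "local_parity" else "cold"
    raise ValueError(f"unknown fragment kind: {fragment_kind}")
```

The asymmetry between the two parity kinds reflects the patent's observation that local check fragments are accessed more frequently than global check fragments.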
  • the following describes an embodiment of a fragment management method of the present invention, which specifically describes reducing the occupation of a high-cost storage medium by migrating the verification fragments.
  • This embodiment can be applied to a distributed storage system including a compute node and a storage node.
  • Compute nodes have computing capabilities, and storage nodes are primarily used to store data.
  • the two can be different physical nodes, or they can be integrated into the same physical node.
  • the compute node is, for example, a computer, a server, or a storage controller, and may also be a virtual machine.
  • The compute node includes at least one processor and a memory in which program code is stored; the processor performs the following steps by running the program code.
  • the storage node is, for example, a computer, a server or a storage controller, and may also be a virtual machine.
  • The storage node includes at least one processor, a memory, and a storage medium. Program code is stored in the memory, and the processor performs the functions of the storage node by running the program code (for example, receiving a fragment sent by the computing node and then storing it in the storage medium); the storage medium is used to store slices and/or metadata.
  • the computing node receives the data unit and splits the data unit into data fragments.
  • If the data unit itself is relatively small (less than or equal to the size of a slice), the data fragment can be obtained directly, with no need to split.
  • the computing node generates a verification slice according to the data fragment.
  • the data slice and the check slice are stored, and the storage location of each slice is saved in the metadata.
  • one or more check fragments generated by data fragmentation are referred to as a first check fragment.
  • the verification fragments generated by the data fragmentation include a local parity slice and a global parity slice.
  • the global verification fragment is called a first verification fragment
  • the local verification fragment is called a second verification fragment.
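The relationship between data fragments, local parities, and the global parity can be illustrated with the simplest possible check computation. This is a minimal sketch using XOR as the parity function (production LRC implementations use Reed-Solomon-style coding so that multiple parities per group are independent); the group sizes and function names are assumptions for illustration:

```python
# Minimal illustration of LRC-style parity generation, using XOR as the
# check computation (the simplest single-parity case).

def xor_fragments(fragments):
    """XOR equal-length byte fragments together."""
    out = bytearray(len(fragments[0]))
    for frag in fragments:
        for i, b in enumerate(frag):
            out[i] ^= b
    return bytes(out)

def lrc_parities(data_fragments, group_size):
    """Return (local_parities, global_parity) for the data fragments.

    Each local parity covers one group of `group_size` data fragments;
    the global parity covers all data fragments.
    """
    groups = [data_fragments[i:i + group_size]
              for i in range(0, len(data_fragments), group_size)]
    local = [xor_fragments(g) for g in groups]
    global_parity = xor_fragments(data_fragments)
    return local, global_parity
```

A single lost fragment in a group can then be rebuilt from that group's survivors plus its local parity alone, without reading fragments from other groups — the locality property that makes local parities worth keeping on faster media.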
  • the computing node sends the data fragment and the verification fragment to each storage medium for storage.
  • These storage media belong to the same level (graded mainly by read/write speed).
  • Case 1: each slice is stored in a storage medium located at a different storage node.
  • Case 2: some or all of the slices are stored in storage media located at the same storage node.
  • The former case is more reliable; it is even possible for different shards to be stored in storage nodes of different data centers, which is more reliable still.
  • the storage locations of the individual shards are saved in the metadata of each shard.
  • the metadata can be stored in the storage node.
  • the cloud storage system includes multiple data centers, each data center includes at least one storage node, and the metadata of the same data center is saved in the same storage of the data center. In the node.
  • The computing node reads the metadata of the first verification fragment and obtains the storage location where the first verification fragment is located, that is, the migration-source storage location of the first verification fragment.
  • The storage location is described, for example, as [storage node ID, logical address], from which the first verification slice can be read directly. Alternatively, the storage location is described as [storage node ID, fragment ID]; the storage node storing the first verification fragment records the correspondence between the fragment ID and the logical/physical address, so that after receiving the storage location it can obtain the logical/physical address of the slice from the slice ID.
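The two location descriptions above can be modeled as small records, with the second form requiring a node-side lookup table. This is an illustrative sketch; the class and field names are hypothetical, not from the patent:

```python
# Sketch of the two storage-location descriptions mentioned above.
from dataclasses import dataclass

@dataclass
class AddressLocation:
    node_id: str          # storage node holding the fragment
    logical_address: int  # the fragment can be read directly from here

@dataclass
class FragmentIdLocation:
    node_id: str
    fragment_id: str      # the node maps this ID to an address itself

class NodeAddressMap:
    """Node-side mapping for the second form: fragment ID -> address."""
    def __init__(self):
        self._map = {}

    def record(self, fragment_id: str, logical_address: int):
        self._map[fragment_id] = logical_address

    def resolve(self, loc: FragmentIdLocation) -> int:
        return self._map[loc.fragment_id]
```

The second form keeps the compute node's metadata stable even if the storage node relocates the fragment internally, since only the node-local map changes.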
  • This step is performed before the verification fragment migration; at this point, the verification fragments and the data fragments are often located in the same type of storage medium, for example the first type of storage medium.
  • the computing node selects the second storage location as the migration storage location of the first verification fragment.
  • the second storage location is located in the second level storage medium and has a lower read/write speed than the first level storage medium.
  • The storage medium where the second storage location is located and the first storage location may be at the same storage node or at different storage nodes.
  • the second storage location is located in the second type of storage medium or the third type of storage medium.
  • the read/write speeds of the first type of storage medium, the second type of storage medium, and the third type of storage medium are sequentially lowered.
  • the reason why the storage medium with low read/write speed is used as the migration destination of the verification slice is to reduce the occupation of expensive high-speed storage medium, so as to save cost.
  • The second storage location is used as the migration destination. If the storage node where the second storage location resides is composed of a single kind of storage medium, the storage location may be described simply as [storage node ID], and that storage node may itself select the storage medium as the destination of the migrated fragment.
  • the computing node sends a migration indication to the storage node (the egress node) where the first verification fragment is located, and instructs to migrate the first verification fragment from the first storage location to the second storage location.
  • the compute node then sends an indication to the evicted node.
  • the eviction node migrates the first verification slice from the first storage location to the second storage location.
  • The egress node sends the first parity slice to the ingress node (the storage node where the second storage location is located). After the ingress node receives the first verification slice, it stores the slice at the second storage location.
  • Alternatively, the computing node sends an indication to the egress node indicating that the first verification shard is to be migrated from the first storage location to the ingress node, without indicating the second storage location.
  • The ingress node allocates a storage medium that satisfies the performance requirement (such as read/write rate). For example, if every storage medium of the ingress node meets the performance requirement, the ingress node may choose any of them; if some storage media of the ingress node do not meet the performance requirement, the computing node may directly or indirectly notify the ingress node of the performance requirement, so that the ingress node selects a storage medium that meets it.
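The ingress node's medium selection described above can be sketched as follows. The speed-ranking model and function names are assumptions for illustration, not part of the patent:

```python
# Sketch of ingress-node medium selection: the compute node passes a
# performance requirement (a minimum speed rank here), and the ingress
# node picks any of its media that satisfies it.

SPEED_RANK = {"standard": 3, "warm": 2, "cold": 1}  # higher = faster

def choose_medium(node_media, min_speed_rank):
    """Return a medium of the node whose speed meets the requirement,
    or None if the node has no medium that qualifies."""
    for medium in node_media:
        if SPEED_RANK[medium] >= min_speed_rank:
            return medium
    return None
```

If every medium of the node qualifies (for example a node built entirely from warm media receiving a warm-or-better requirement), any medium may be returned, which matches the case where the compute node need not name a specific medium at all.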
  • the computing node instructs to update the information of the second storage location to the metadata of the first verification slice.
  • The storage of metadata was introduced above.
  • the metadata is updated, and the new location (second storage location) of the first verification fragment is updated into the metadata of the first verification fragment.
  • The local check fragments of the global check fragments also follow the migration scheme of steps 13-16.
  • this embodiment further includes the following steps 17-20. Steps 17-20 are similar to steps 13-16, and therefore will not be described in detail.
  • the difference is that the migrated object becomes the local check fragment of the data fragment (excluding the local check fragment of the global check fragment).
  • The migration source changes from the first storage location to the third storage location, and the migration destination changes from the second storage location to the fourth storage location. The read/write speed of the destination storage medium (the third type of storage medium) is lower than that of the storage medium where the data fragments are located (the first type of storage medium), and higher than or equal to that of the storage medium where the global check fragments are located (the second type of storage medium).
  • read/write includes any one of three cases of “read”, “write”, and “read and write”.
  • Referring to FIG. 4, which shows an embodiment of a computing node. The computing node can execute the foregoing fragment management method; since it corresponds to that method, it is described only briefly.
  • The computing node 2 is applied in a distributed storage system that includes the computing node and at least one storage node, the storage node including at least one storage medium. The distributed storage system includes multiple storage media, different fragments are stored in different storage media, and the data fragments and the first verification fragment are all located in first-level storage media.
  • The computing node includes the processor unit 21 and the memory 22, and may further include an external interface (not shown) and a storage medium (not shown).
  • The processor unit 21 is, for example, a single-core CPU, a multi-core CPU, a combination of multiple CPUs, or an FPGA; the memory 22 is, for example, a volatile storage medium (such as RAM), a nonvolatile storage medium (such as a hard disk or an SSD), or a portion of such a storage medium.
  • the memory 22 is used to store a computer program.
  • The processor unit 21 is configured to: read the metadata of the first verification fragment and obtain the first storage location where the first verification fragment is located; select a second storage location, the second storage location being located in a second-level storage medium whose read speed is lower than that of the first-level storage medium, and the second storage location having free space; send a migration indication to the storage node where the first verification fragment is located, instructing that storage node to migrate the first verification fragment to the second storage location; and, after the migration is completed, update the information of the second storage location to the metadata of the first verification slice.
  • The processor unit 21 is further configured to: read the metadata of the second verification slice and obtain the third storage location where the second verification fragment is located; select a fourth storage location, the fourth storage location being located in a third-level storage medium whose read speed is higher than that of the second-level storage medium and lower than that of the first-level storage medium, and the fourth storage location having free space; send a migration indication to the storage node where the second verification fragment is located, instructing that storage node to migrate the second verification fragment to the fourth storage location; and, after the migration is completed, update the information of the fourth storage location to the metadata of the first verification slice.
  • the data slice, the first check slice, and the second check slice conform to a local reconstructed code LRC algorithm, where the first check slice is a global check in an LRC algorithm Fragmentation, the second check fragment is a local check fragment in the LRC algorithm.
  • The processor unit 21 is further configured to: receive a write data request, divide the target data carried in the write data request into data fragments, and generate the global check fragment and the local check fragment from the data fragments according to the LRC algorithm.
  • the global check fragment is used to check a plurality of data fragments; the local check fragment is used to verify a part of the data fragments in the plurality of data fragments.
  • The processor unit 21 is further configured to: receive a write data request, divide the target data carried in the write data request into data fragments, and generate the first verification fragment from the data fragments according to the erasure code EC algorithm, the data fragments and the first verification fragment conforming to the EC algorithm.
  • The present invention also provides an embodiment of a slice management device, which may be hardware (for example, a computing node) or software (for example, a computer program running in a computing node).
  • The slice management device can execute the preceding slice management method; since it corresponds to that method, it is described only briefly.
  • the fragment management device includes: a reading module, a location selection module, a migration module, and a metadata management module.
  • the reading module is configured to read metadata of the first verification fragment, and obtain a first storage location where the first verification fragment is located.
  • a location selection module configured to select a second storage location, the second storage location is located in a second level storage medium, a read speed of the second level storage medium is lower than the first level storage medium, and the second The storage location has free space.
  • a migration module configured to send a migration indication to the storage node where the first verification fragment is located, where the storage node where the first verification fragment is located migrates the first verification fragment to the second storage location.
  • The metadata management module updates the information of the second storage location to the metadata of the first verification slice.
  • the foregoing module is further configured to perform the functions: the reading module is further configured to read metadata of the second verification fragment, and obtain a third location where the second verification fragment is located a storage location; the location selection module is further configured to select a fourth storage location, where the fourth storage location is located in a third-level storage medium, and the read speed of the third-level storage medium is higher than the second-level storage The medium is lower than the first level storage medium, and the fourth storage location has a free space; the migration module is further configured to send a migration indication to the storage node where the second verification fragment is located, indicating the The storage node where the second verification fragment is located migrates the second verification fragment to the fourth storage location; the metadata management module is further configured to: after the migration is completed, the fourth storage location The information is updated to the metadata of the first verification slice.
  • In the method above, fragments are first written to the first-level storage medium, and the verification slices already written there are then migrated, the migration destination being a lower-cost second-level or third-level storage medium. FIG. 5 shows another embodiment of a fragment management method according to the present invention. The difference from the former method is that, after a verification fragment is generated, it is written directly to the lower-cost second-level/third-level storage medium.
  • the embodiment described in FIG. 5 omits the step of migration and is therefore more efficient.
  • The former method also has its own advantage: the first-level storage medium is faster to write, and once the write succeeds the host (the sender of the data unit) can be notified that the write operation is complete, so the host can be answered sooner. This advantage is even more pronounced when the second-level/third-level storage medium is located in a cold storage node, because a cold storage node is usually powered down and is powered on only when data is written, so its response is very slow.
  • the computing node receives the data unit from the host or the server through the external interface, and splits the data unit into data fragments.
  • If the data unit itself is relatively small (less than or equal to the size of a fragment), no splitting is needed and the data fragment can be obtained directly (if it is smaller than one fragment, it can be zero-padded to the fragment size).
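The split-and-pad step above can be sketched in a few lines. The function name and the choice of zero bytes for padding follow the text; everything else is an illustrative assumption:

```python
# Sketch of splitting a data unit into fixed-size fragments, zero-padding
# a short final piece to the fragment size as described above.

def split_into_fragments(data_unit: bytes, fragment_size: int):
    fragments = []
    for i in range(0, len(data_unit), fragment_size):
        piece = data_unit[i:i + fragment_size]
        if len(piece) < fragment_size:
            # Pad with zeros so every fragment has the same length.
            piece = piece + b"\x00" * (fragment_size - len(piece))
        fragments.append(piece)
    # An empty data unit still yields one all-zero fragment.
    return fragments or [b"\x00" * fragment_size]
```

Equal-length fragments are what make the EC/LRC parity computations over them well-defined.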
  • the computing node generates a verification slice according to the data fragment.
  • the verification slice includes a first verification slice and a second verification slice.
  • the verification slice is also referred to as the first verification slice.
  • the verification fragments generated by the data fragmentation include a local parity slice and a global parity slice.
  • The first check fragment is the global check fragment and the second check fragment is a local check fragment.
  • The global check fragment verifies all the data fragments; a local check fragment verifies a subset of the data fragments.
  • the computing node selects the first-level storage medium, and sends the data fragment to the storage node where the first-level storage medium is located for storage.
  • the first level of storage media is the medium with the fastest read and write speed.
  • the computing node selects the second-level storage medium, and sends the first verification fragment to the storage node where the second-level storage medium is located for storage.
  • The read/write rate of the second-level storage medium is lower than that of the first-level storage medium.
  • the first verification slice can verify the data slice.
  • In the case of EC, the first check fragment is simply referred to as the check fragment.
  • the first check fragment is equivalent to the global check fragment.
  • Step 35: the computing node selects the third-level storage medium and sends the second verification fragment to the storage node where the third-level storage medium is located for storage.
  • The read/write rate of the third-level storage medium is lower than that of the first-level storage medium and higher than that of the second-level storage medium.
  • step 35 is an optional step. In the case of the LRC, step 35 is performed, and in the case of the EC, step 35 is not performed.
  • steps 33, 34 and 35 can be performed in any chronological order.
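Steps 33-35 above, where placement is decided at write time so no later migration is needed, can be sketched as a placement plan. Here "level 2" is the slowest tier and "level 3" sits between levels 1 and 2, as in this embodiment; the function name and tuple encoding are illustrative assumptions:

```python
# Sketch of the direct tiered write of steps 33-35: data fragments go to
# the first-level (fastest) medium, the global parity to the second-level
# (slowest) medium, and — in the LRC case — local parities to the
# third-level (intermediate) medium.

def plan_writes(data_fragments, global_parity, local_parities=None):
    """Return (fragment, level) pairs; local parities exist only for LRC."""
    plan = [(frag, 1) for frag in data_fragments]   # step 33
    plan.append((global_parity, 2))                 # step 34
    for lp in (local_parities or []):               # step 35 (LRC only)
        plan.append((lp, 3))
    return plan
```

For plain EC there are no local parities, so step 35 contributes nothing and the plan contains only level-1 and level-2 writes, matching the note that step 35 is optional.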
  • The storage node that receives a fragment stores it.
  • In this embodiment the migration step is omitted, and the data fragments and the verification fragments are directly stored in tiers.
  • the rest of the content (such as the interpretation of algorithms and nodes, the relationship between the verification slice and the data slice, the storage location/node selection scheme, and the definition of the noun, etc.) can be referred to the previous embodiment.
  • Alternatively, the computing node may specify only the storage node for storing a fragment without specifying the specific storage medium; the storage node receiving the fragment then determines the storage medium that stores it. For the sake of brevity, this embodiment does not repeat similar content; refer directly to the foregoing embodiments.
  • The present invention also provides a computing node that can perform the method described in steps 31-36; see FIG. 4.
  • The computing node 2 includes a processor unit 21 and a memory 22 in which a computer program is stored, and may further include an external interface (not shown) and a storage medium (not shown).
  • The processor unit 21 is, for example, a single-core CPU, a multi-core CPU, a combination of multiple CPUs, or an FPGA; the memory 22 is, for example, a volatile storage medium (such as RAM), a nonvolatile storage medium (such as a hard disk or an SSD), or a portion of such a storage medium.
  • the memory 22 is used to store a computer program.
  • The processor unit 21 is configured to: receive a data unit through an interface and generate data fragments from the data unit; generate a first verification fragment from the data fragments; select storage space located in the first-level storage medium as the data fragment storage location, and storage space located in the second-level storage medium as the first verification fragment storage location, where the read speed of the second-level storage medium is lower than that of the first-level storage medium; and send the data fragments and the first verification fragment to the selected storage locations for storage, where the storage location of the data fragments is the storage space in the first-level storage medium and the storage location of the first verification fragment is the storage space in the second-level storage medium.
  • The processor unit 21 is further configured to: generate a second verification fragment from the data fragments, where the data fragments, the first verification fragment, and the second verification fragment conform to the local reconstruction code LRC algorithm, the first verification fragment being a global check fragment in the LRC algorithm and the second verification fragment being a local check fragment in the LRC algorithm; select storage space located in the third-level storage medium as the second verification fragment storage location, where the read speed of the third-level storage medium is lower than that of the first-level storage medium and higher than or equal to that of the second-level storage medium; and send the second verification fragment to the selected storage location for storage, where the storage location of the second verification fragment is the storage space in the third-level storage medium.
  • The present invention also provides an embodiment of a slice management device, which may be hardware (for example, a computing node) or software (for example, a computer program running in a computing node).
  • The slice management device can execute the preceding slice management method; since it corresponds to that method, it is described only briefly.
  • the fragment management device includes: a fragmentation module, a location selection module, and a storage module.
  • the fragmentation module is configured to receive a data unit, generate a data fragment according to the data unit, and generate a first verification fragment according to the data fragment.
  • the location selection module is configured to select a storage space located in the first-level storage medium as a data fragment storage location, and select a storage space located in the second-level storage medium as a first verification fragment storage location, where The read speed of the second level storage medium is lower than the read speed of the first level storage medium.
  • The storage module is configured to send the data fragments and the first verification fragment to the selected storage locations for storage, where the write request for a data fragment carries the data fragment and the data fragment storage location, and the write request for the first verification fragment carries the first verification fragment and the first verification fragment storage location.
  • a metadata management module is configured to record the storage location of the slice in the metadata of the slice.
  • The fragmentation module is further configured to generate a second verification fragment from the data fragments, where the data fragments, the first verification fragment, and the second verification fragment conform to the local reconstruction code LRC algorithm, the first verification fragment being a global check fragment in the LRC algorithm and the second verification fragment being a local check fragment in the LRC algorithm.
  • The location selection module is further configured to select storage space located in the third-level storage medium as the second verification fragment storage location, where the read speed of the third-level storage medium is lower than that of the first-level storage medium and higher than or equal to that of the second-level storage medium. The storage module is further configured to send a write data request with the selected storage location to store the second verification slice, where the write request for the second verification slice carries the second verification slice and the second verification slice storage location.
  • The above integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium.
  • the technical solution of the present invention which is essential or contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium.
  • the instructions include a plurality of instructions for causing a computer device (which may be a personal computer, server or network device, etc., and in particular a processor in a computer device) to perform all or part of the steps of the above-described methods of various embodiments of the present invention.
  • the foregoing storage medium may include: a U disk, a mobile hard disk, a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM), and the like.


Abstract

A fragment management technique applied in a distributed storage system: a computing node reads the metadata of the first verification fragment and obtains from it the first storage location where the first verification fragment is located; the computing node selects a second storage location, the second storage location being located in a second-level storage medium whose read speed is lower than that of the first-level storage medium, and the second storage location having free space; the computing node sends a migration indication to the storage node where the first verification fragment is located, instructing that storage node to send the first verification fragment to the storage node where the second storage location is located; the storage node where the second storage location is located stores the first verification fragment at the second storage location; the computing node instructs that the information of the second storage location be updated to the metadata of the first verification fragment. Applying this technique reduces the occupation of high-speed storage media and lowers the cost of the storage system.

Description

[Title of the invention established by the ISA under Rule 37.2] Fragment Management Method and Fragment Management Apparatus
Background Art
The present invention relates to computer technology, and in particular to the field of storage.
Summary of the Invention
In a first aspect, the present invention provides an embodiment of a fragment management method. The method can be applied in a distributed storage system that includes a computing node and at least one storage node, the storage node including at least one storage medium; the distributed storage system includes multiple storage media, different fragments are stored in different storage media, and the data fragments and the first verification fragment are all located in first-level storage media. The method includes: the computing node reads the metadata of the first verification fragment and obtains from it the first storage location where the first verification fragment is located; the computing node selects a second storage location, the second storage location being located in a second-level storage medium whose read speed is lower than that of the first-level storage medium, and the second storage location having free space; the computing node sends a migration indication to the storage node where the first verification fragment is located, instructing that storage node to send the first verification fragment to the storage node where the second storage location is located; the storage node where the second storage location is located stores the first verification fragment at the second storage location; the computing node instructs that the information of the second storage location be updated to the metadata of the first verification fragment.
Using this method, the EC check fragments or the LRC global check fragments can be migrated from higher-cost high-speed storage media to lower-cost low-speed storage media, saving occupation of the high-speed storage media.
A first optional solution of the first aspect: the computing node reads the metadata of the second verification fragment and obtains from it the third storage location where the second verification fragment is located; the computing node selects a fourth storage location, the fourth storage location being located in a third-level storage medium whose read/write speed is higher than that of the second-level storage medium and lower than that of the first-level storage medium, and the fourth storage location having free space; the computing node sends a migration indication to the storage node where the second verification fragment is located, instructing that storage node to send the second verification fragment to the storage node where the fourth storage location is located; the storage node where the fourth storage location is located stores the second verification fragment at the second storage location; the computing node instructs that the information of the fourth storage location be updated to the metadata of the first verification fragment.
Using this method, the LRC local check fragments can be migrated from higher-cost high-speed storage media to lower-cost low-speed storage media, saving occupation of the high-speed storage media. Moreover, considering that LRC local check fragments are used more frequently than LRC global check fragments, the read/write speed of the storage medium they are migrated into is higher than that of the storage medium the LRC global check fragments are migrated into.
Optionally, in the first optional solution of the first aspect: the data fragments, the first verification fragment, and the second verification fragment conform to a local reconstruction code (LRC) algorithm, where the first verification fragment is a global check fragment in the LRC algorithm and the second verification fragment is a local check fragment in the LRC algorithm.
A second optional solution of the first aspect includes, before the method: the computing node receives a write data request, divides the target data carried in the write data request into data fragments, and generates the global check fragment and the local check fragment from the data fragments according to the LRC algorithm; the global check fragment is used to verify multiple data fragments; the local check fragment is used to verify a subset of the multiple data fragments.
This solution describes, based on the LRC algorithm, the generation process of data fragments, local check fragments, and global check fragments, and the verification relationships among these fragments.
A third optional solution of the first aspect: the data fragments and the first verification fragment conform to an erasure code (EC) algorithm, and the method further includes: the computing node receives a write data request, divides the target data carried in the write data request into data fragments, and generates the first verification fragment from the data fragments according to the EC algorithm.
This solution describes, based on the EC algorithm, the generation process of data fragments and check fragments (also called "global check fragments" in this application) and the verification relationships among these fragments.
In a second aspect, the present invention provides an embodiment of a computing node. The computing node includes a processor unit and a memory, the memory being configured to store a computer program. By running the computer program, the processor unit is configured to: read the metadata of the first verification fragment and obtain from it the first storage location where the first verification fragment is located; select a second storage location, the second storage location being located in a second-level storage medium whose read speed is lower than that of the first-level storage medium, and the second storage location having free space; send a migration indication to the storage node where the first verification fragment is located, instructing that storage node to migrate the first verification fragment to the second storage location; and, after the migration is completed, update the information of the second storage location to the metadata of the first verification fragment.
In a first possible implementation of the second aspect, the processor is further configured to: read the metadata of the second verification fragment and obtain from it the third storage location where the second verification fragment is located; select a fourth storage location, the fourth storage location being located in a third-level storage medium whose read speed is higher than that of the second-level storage medium and lower than that of the first-level storage medium, and the fourth storage location having free space; send a migration indication to the storage node where the second verification fragment is located, instructing that storage node to migrate the second verification fragment to the fourth storage location; and, after the migration is completed, update the information of the fourth storage location to the metadata of the first verification fragment.
Optionally, in the first possible implementation of the second aspect, the data fragments, the first verification fragment, and the second verification fragment conform to the local reconstruction code (LRC) algorithm, where the first verification fragment is a global check fragment in the LRC algorithm and the second verification fragment is a local check fragment in the LRC algorithm.
In a second possible implementation of the second aspect, the processor is further configured to: receive a write data request, divide the target data carried in the write data request into data fragments, and generate the global check fragment and the local check fragment from the data fragments according to the LRC algorithm; the global check fragment is used to verify multiple data fragments; the local check fragment is used to verify a subset of the multiple data fragments.
In a third possible implementation of the second aspect, the data fragments and the first verification fragment conform to the erasure code (EC) algorithm, and the processor is further configured to: receive a write data request, divide the target data carried in the write data request into data fragments, and generate the first verification fragment from the data fragments according to the EC algorithm.
In the second aspect and its various possible implementations, the problems solved and the beneficial effects are similar to those of the corresponding embodiments of the first aspect and are therefore not repeated.
In a third aspect, the present invention provides an embodiment of a fragment management method, including: a computing node receives a data unit through an interface and generates data fragments from the data unit; generates a first verification fragment from the data fragments; selects storage space located in a first-level storage medium as the data fragment storage location; selects storage space located in a second-level storage medium as the first verification fragment storage location, where the read/write speed of the second-level storage medium is lower than that of the first-level storage medium; and sends the data fragments and the first verification fragment to the selected storage locations for storage, where the write request for a data fragment carries the data fragment and its storage location, and the write request for the first verification fragment carries the first verification fragment and its storage location.
Compared with the solutions of the first/second aspects, in this implementation the fragments are sent directly to storage media of different levels for storage right after they are generated, so no "migration" operation is needed; the result is directly equivalent to the effect after the migration operation in the first/second aspect solutions, further improving the efficiency of fragment storage.
In a first possible implementation of the third aspect: a second verification fragment is generated from the data fragments, where the data fragments, the first verification fragment, and the second verification fragment conform to the local reconstruction code (LRC) algorithm, the first verification fragment being a global check fragment in the LRC algorithm and the second verification fragment being a local check fragment in the LRC algorithm; storage space located in a third-level storage medium is selected as the second verification fragment storage location, where the read/write speed of the third-level storage medium is lower than that of the first-level storage medium and higher than or equal to that of the second-level storage medium; and a write data request is sent with the selected storage location to store the second verification fragment, where the write request for the second verification fragment carries the second verification fragment and its storage location.
This solution describes the relationships among data fragments, global check fragments, and local check fragments when the LRC algorithm is used.
In a fourth aspect, the present invention provides a computing node including a processor unit and a memory, the memory storing a computer program. By running the computer program, the processor unit is configured to: receive a data unit through an interface and generate data fragments from the data unit; generate a first verification fragment from the data fragments; select storage space located in a first-level storage medium as the data fragment storage location and storage space located in a second-level storage medium as the first verification fragment storage location, where the read/write speed of the second-level storage medium is lower than that of the first-level storage medium; and send the data fragments and the first verification fragment to the selected storage locations for storage, where the storage location of the data fragments is the storage space in the first-level storage medium and the storage location of the first verification fragment is the storage space in the second-level storage medium.
Optionally, in the solution of the fourth aspect: a second verification fragment is generated from the data fragments, where the data fragments, the first verification fragment, and the second verification fragment conform to the local reconstruction code (LRC) algorithm, the first verification fragment being a global check fragment in the LRC algorithm and the second verification fragment being a local check fragment in the LRC algorithm; storage space located in a third-level storage medium is selected as the second verification fragment storage location, where the read/write speed of the third-level storage medium is lower than that of the first-level storage medium and higher than or equal to that of the second-level storage medium; and the data fragments and the second verification fragment are sent to the selected storage locations to store the second verification fragment, where the storage location of the second verification fragment is the storage space in the third-level storage medium.
In the fourth aspect and its optional solutions, the beneficial effects and the technical problems solved are the same as those of the third aspect and its optional solutions, and are not repeated.
In a fifth aspect, the present invention also provides an embodiment of a storage medium that can store program code; by running the stored code, a computer/server/distributed storage system can execute the first aspect and its various possible options, or execute the third aspect and its various possible options.
In a sixth aspect, the present invention also provides embodiments of a fragment management apparatus. The fragment management apparatus may be software or hardware and is composed of modules, each module having functions corresponding to the foregoing method embodiments.
For example, one embodiment provides a fragment management apparatus including: a reading module, a location selection module, a migration module, and a metadata management module. The reading module is configured to read the metadata of the first verification fragment and obtain from it the first storage location where the first verification fragment is located. The location selection module is configured to select a second storage location, the second storage location being located in a second-level storage medium whose read speed is lower than that of the first-level storage medium, and the second storage location having free space. The migration module is configured to send a migration indication to the storage node where the first verification fragment is located, instructing that storage node to migrate the first verification fragment to the second storage location.
As another example, another embodiment also provides a fragment management apparatus including: a fragmentation module, a location selection module, and a storage module. The fragmentation module is configured to receive a data unit, generate data fragments from the data unit, and generate a first verification fragment from the data fragments. The location selection module is configured to select storage space located in a first-level storage medium as the data fragment storage location and storage space located in a second-level storage medium as the first verification fragment storage location, where the read speed of the second-level storage medium is lower than that of the first-level storage medium. The storage module is configured to send the data fragments and the first verification fragment to the selected storage locations for storage, where the write request for a data fragment carries the data fragment and its storage location, and the write request for the first verification fragment carries the first verification fragment and its storage location. The metadata management module is configured to record the storage location of each fragment in the fragment's metadata.
附图说明
为了更清楚地说明本发明实施例技术方案,下面将对实施例和现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本发明的一些实施例,对于本领域普通技术人员来讲,还可以根据这些附图获得其它的附图。
图1是依照分片管理方法实施例迁移前的分片分布图;
图2是依照分片管理方法实施例迁移后的分片分布图;
图3是分片管理方法实施例的流程图;
图4是计算节点实施例的硬件结构图;
图5是另一分片管理方法实施例的流程图。
具体实施方式
本发明的说明书和权利要求书及上述附图中的术语“包括”和“具有”以及它们任何变形,意图在于覆盖不排他的包含。例如包括了一系列步骤或单元的过程、方法、系统、产品或设备没有限定于已列出的步骤或单元,而是可选地还包括没有列出的步骤或单元,或者可选地还包括对于这些过程、方法、产品或设备固有的其它步骤或单元。术语 “第一”、“第二”、“第三”和“第四”等是用于区别不同对象,而不是用于描述特定顺序。
本发明实施例可以应用于分布式存储系统场景。本发明实施例所指的分布式存储系统,意指包括多个存储介质(存储介质例如固态硬盘SSD,磁盘,U盘,可擦写光盘、磁带等)的存储系统,这些存储介质可以位于同一个节点或者多个节点。每个存储介质可以存储一个数据分片或者一个校验分片。其中,校验分片由一个或者多个数据分片通过校验计算获得。云存储系统也是一种分布式存储系统,在云存储系统中,存储节点被划分为多个数据中心,每个数据中心包括至少一个存储节点。
一方面存储位置可以不同。分布式存储系统包括多个存储节点,存储节点例如是计算机、服务器、或者存储控制器+存储介质。数据单元(例如文件或者文件分片)拆分为多个数据分片,对来自同一个数据单元或者来自不同数据单元的数据分片进行纠删码(erasure code,EC)计算,生成校验(冗余)分片。数据分片和校验分片统称为分片,这些分片(数据分片+校验分片)被分散的存储到不同的存储节点中,或者被分散的存储到不同的存储介质中。如果其中有部分分片的数据丢失或者损坏,借助于EC算法,可以用余下的分片把出现故障的分片恢复出来。校验分片的数量越多,借助于EC算法可以恢复的故障分片数量也越多。
本地重建码(local reconstruction code,LRC)技术可以看成是EC的一种扩展形态,LRC可以提高分片恢复的效率。在LRC技术中,提供另外一种校验分片,这种校验分片通过一部分数据分片计算获得,并仅对一部分数据分片进行校验保护。如果把数据分片分成几个数据分片组,那么,一组数据分片对应生成自己的校验分片共同形成一个校验组。这个校验组可以存储于同一个数据中心、或者同一个存储节点。对于校验组内出现的故障分片,如果故障分片的数量不大于本地校验分片的数量,可以通过校验组内余下的分片对其进行恢复。由于同一校验组的分片的物理存储位置接近,因此恢复速度很快,这类校验分片称为本地校验片(local parity)。
LRC技术中,还可以包括全局校验片(global parity),全局校验片用于对全部数据分片进行校验。在数据分片和全局校验分片的组合中,如果故障分片的数量不大于全局校验分配的数量,那么使用校验算法可以对故障分片进行恢复,校验算法可以和EC校验算法相同。例如当同一个校验组内出现大量的分片故障,用组内余下的分片难以恢复出故障分片时。往往可以使用全局校验分片进行恢复。
为了对以上两种类型的校验分片进行区分。把EC技术的校验分片和LRC技术的全局校验分片都称为“全局校验分片”;把LRC技术新增的校验分片(仅对部分数据分片进行校验)称为“本地校验分片”。
为了更加易于理解,下面参见附图1对一种LRC的应用场景举例。在一个分布式存储系统中,包括数据中心1、数据中心2和数据中心3。每个数据中心包括多个存储节点。该分布式存储系统用于存储的数据分片包括:数据分片1、数据分片2、本地校验分片3;数据分片4、数据分片5、本地校验分片6;全局校验分片7、全局校验分片8,本地校验分片9。
本地校验组1:包括3个分片,这3个分片分别是:本地校验分片3、数据分片1 和数据分片2。其中,本地校验分片3是数据分片1和数据分片2的校验分片,这三个分片存储于数据中心1的不同存储节点。
本地校验组2:包括3个分片,这3个分片分别是:本地校验分片6、数据分片4和数据分片5。其中,本地校验分片6是数据分片4和数据分片5的校验分片,这三个分片存储于数据中心2的不同存储节点。
本地校验组3:包括3个分片,这3个分片分别是:本地校验分片9、全局校验分片7和全局校验分片8。其中,本地校验分片9是全局校验分片7和全局校验分片8的校验分片,这三个分片存储于数据中心3的不同存储节点。
全局校验组:包括6个分片,这6个分片分别是:全局校验分片7、全局分片8、数据分片1、数据分片2、数据分片4和数据分片5。
其中全局校验分片7和全局分片8是数据分片1、数据分片2、数据分片4、数据分片5的校验分片。全局校验分片7和全局分片8可以位于数据中心3。
由此可见,所述全局校验分片用于对多个数据分片进行校验;所述本地校验分片用于对所述多个数据分片中的一部分数据分片(通常少于所述多个数据分片)进行校验。例如:把全局校验分片所校验的数据分片分成多个组,每个数据分片组拥有至少一个本地校验分片。不同的本地校验分片所负责校验的数据分片可以位于不同的物理位置,例如位于不同的数据中心、不同的机房、不同的机框、不同的存储节点。
需要特别说明的是,所述全局校验分片本身也拥有对应的本地校验分片,拥有对所述全局校验分片进行校验,全局校验分片的本地校验分片虽然名称里有“本地”二字,但是考虑到其读取频度低于数据分片的本地校验分片。因此,在没有特别说明的情况下,本申请各个实施例中,对迁移方式不与数据分片的本地校验分片相同,而是和全局校验分片相同。
如果任意一个数据中心中,有1个分片出现故障,使用余下的2个分片可以对故障分片进行恢复。如果在整个存储系统中,有任意2个分片出现故障,可以该存储系统中余下的分片进行恢复。如果忽略3个本地校验分片,那么可以认为这是一个EC校验组。
由此可见,LRC技术相对于EC技术,数据可靠性和分片恢复速度有了进一步提高。
本发明各个实施例中,EC算法/LRC算法,是指根据EC/LRC的原理,计算数据分片的校验分片的算法;或者,在有分片损坏时,根据EC/LRC的原理,根据未损坏的分片恢复被损坏的分片的算法。
另一方面,然而,不论是LRC技术还是EC技术,伴随着数据可靠性的提高,也增加了存储空间的占用。同一个数据中心的不同存储节点,可以使用不同类型的存储介质。例如同一个数据中心中包含了拥有标准存储介质的存储节点、拥有温存储介质的存储节点和拥有冷存储介质的存储节点。这三种存储介质提供的读写速度不同,标准存储介质最高(例如固态硬盘SSD)、温存储介质(例如高速磁盘)其次,冷存储介质(例如低速磁盘)最低。相应的,三种存储介质的成本也不同,标准存储的成本最高、温存储介质的成本其次,冷存储介质的成本最低。参见图1,在各个数据中心中,位于第一 行的是标准存储介质的存储节点,位于第二行的是温存储介质节点,位于第三行的是冷存储介质节点。图1示意性的指列出了三个层次,在实际应用中,可以有更多的分层次或者仅有2层。
此外需要说明的是,即使在同一个节点中,也可以使用不同类型的存储介质。例如同一个存储节点包括了:标准存储介质、温存储介质和冷存储介质,不同的分片分布于不同的存储介质,但是可以位于同一个存储节点。把图1中一个数据中心内的不同节点,理解成至少一个节点内的多个存储器,就是对这种场景的一种描述。由于二者没有本质的区别,因此不做详述,下面仅以图1描述的场景进行描述。
对于存储在标准存储介质中的数据分片,其校验分片也存在同类型的存储介质中。这无疑占用了大量昂贵的标准存储介质。考虑到校验分片的利用机会并不高,本发明实施例提出一种创新的思路:对分片的存储位置进行更细粒度的管理,把校验分片迁移到成本更低的存储介质中去。例如参见图2,保留数据分片在标准存储节点;考虑到本地校验分片的读写频度低于数据分片,因此可以把把本地校验分片迁移到读写速度较低的温存储介质节点;而全局校验分片的读写频度更低,因此可以迁移到读写速度更低的冷存储介质节点,全局校验分片的本地校验分片也可以迁移到冷存储介质节点。需要说明的是,本实施例的侧重点在于在不同速率的介质之间进行校验分片的迁移。图2中每个节点的介质是统一的,因此把分片迁移到冷存储介质节点,就意味着数据被迁移到冷存储介质中。而对于同一个节点拥有不同等级介质的情况,则可以不跨节点进行分配迁移,例如把校验分片从标准存储介质迁移到同一个节点的温存储介质。
当然,具体还可以有更多的变形。例如把两种类型的校验分片都迁移到温存储介质节点;或者都迁移到冷存储介质节点。
下面介绍本发明一种分片管理方法的实施例,具体描述了通过对校验分片进行迁移,减少对高成本存储介质的占用。该实施例可以应用于分布式存储系统中,分布式存储系统包括计算节点和存储节点。计算节点拥有计算功能,存储节点主要用于存储数据。二者可以是不同的物理节点,也可以把它们的功能集成在同一个物理节点中。
计算节点例如是计算机、服务器或者存储控制器,还可以是虚拟机。计算机节点包括至少一个处理器和存储器,存储器中存储有程序代码,处理器通过运行所述程序代码执行下面的步骤。存储节点例如是计算机、服务器或者存储控制器,还可以是虚拟机。存储节点包括至少一个处理器、存储器和存储介质,存储器中存储有程序代码,处理器通过运行所述程序代码执行存储节点的功能(例如接收计算节点发送的分片,然后存储于所述存储接中),存储介质用于存储分片和/或元数据。
11,计算节点接收数据单元,把数据单元拆分成数据分片。
需要说明的是,如果数据单元本身比较小,小于等于分片的大小。那么可以直接获得数据分片即可,不需要拆分。
12,计算节点根据数据分片生成校验分片。存储所述数据分片和校验分片,以及把各个分片的存储位置保存在元数据中。
对于EC算法,利用数据分片生成的一个或者多个校验分片,称之为第一校验分片。
对于LRC算法,利用数据分片生成的校验分片包括本地校验分片和全局校验分片。为了对这两种校验分片进行区分,全局校验分片称为第一校验分片,本地校验分片称为第二校验分片。
计算节点把数据分片和校验分片发送给各个存储介质进行存储。这些存储介质属于同一种等级(主要是读/写速度)。如前所述,因此具体而言,包括两种情形:情形一:每个分片存储于位于不同存储节点的存储介质;情形二:部分或者全部分片存储于位于同一个存储节点的存储介质。相较而言,前一种情形的可靠性更高。甚至还可以是:不同的分片存储于不同数据中心的存储节点中,这种做法可靠性更高。
把各个分片的存储位置保存在各个分片的元数据中。元数据可以保存在存储节点中,例如:云存储系统包括多个数据中心,每个数据中心包括至少一个存储节点,同一个数据中心的分片的元数据,保存在本数据中心的同一个存储节点中。
13,计算节点读取第一校验分片的元数据,从中获得第一校验分片所在的存储位置,也就是第一校验分片的迁出存储位置。
存储位置例如描述为:[存储节点ID,逻辑地址],由存储位置可以读取第一校验分片。或者存储位置描述为:[存储节点ID,分片ID],存储第一校验分片的存储节点记录分片ID和逻辑地址/物理地址的对应关系,因此它在收到获得这个存储位置后,可以根据分片ID获得分片的逻辑地址/物理地址。
本步骤发送在校验分片迁移前,校验分片和数据分片往往位于同一类存储介质,例如第一类存储介质。
14,计算节点选择第二存储位置,作为第一校验分片的迁入存储位置。第二存储位置位于第二级存储介质,其读/写速度低于第一级存储介质。
第二存储位置所在的存储介质和第一存储位置可以位于同一个存储节点,也可以位于不同的存储节点。
第二存储位置位于第二类存储介质或者第三类存储介质。第一类存储介质、第二类存储介质、第三类存储介质的读/写速度依次降低。之所以把读/写速度较低的存储介质作为校验分片的迁入目的地,是为了减少昂贵的高速存储介质的占用,以便节约成本。
除了步骤13中提及的存储位置描述方式以外。第二存储位置作为迁入存储位置,如果是由同一等级存储介质组成,那么存储位置可以仅描述为[存储节点ID],由第二存储位置所在的存储节点自行选择存储介质作为迁入分片的目的地。
15,计算节点向所述第一校验分片所在的存储节点(迁出节点)发送迁移指示,指示把所述第一校验分片从所述第一存储位置迁移到第二存储位置。
如果这两个位置位于同一个存储节点。则计算节点把指示发送给迁出节点。迁出节点把第一校验分片从所述第一存储位置迁移到第二存储位置。
如果这两个位置位于不同存储节点。则计算节点把指示发送给迁出节点。迁出节点 把第一校验分片发送给迁入节点(第二存储位置所在的存储节点)。所述迁入节点接收到第一校验分片后,存入所述第二存储位置。
在另外一种实施方式中,计算节点把指示发送给迁出节点,指示把第一校验分片从第一存储位置迁移到迁入节点,但是不指示第二存储位置。迁出节点把第一校验分片发送给迁入节点后,由迁入节点自行分配分配满足性能要求(例如读/写速率)的存储介质。例如,如果迁入节点的任意存储介质均满足性能要求,则可由迁入节点任意分片;如果迁入节点存在部分存储介质不满足性能要求,则可以在计算节点把性能要求直接或者间接的通知迁入节点,以便迁入节点选择满足性能要求的存储介质。
16,迁移完成后,所述计算节点指示把所述第二存储位置的信息更新到所述第一校验分片的元数据。
在上面的步骤12中,介绍了元数据的存储。本步骤中,对元数据进行了更新,把第一校验分片的新位置(第二存储位置)更新到第一校验分片的元数据中。以便后续对第一校验分片进行读取或者修改。
如果校验算法是LRC,则全局校验分片的本地校验分片也沿用步骤13-16的迁移方案。此外,本实施例还包括下面的步骤17~20。步骤17~20和步骤13-16类似,因此不做详述。不同之处在于:被迁移的对象变为数据分片的本地校验分片(不包括全局校验分片的本地校验分片)。迁出位置由第一存储位置变为第三存储位置;迁入位置由第二存储位置变为第四存储位置;第三存储位置所在的存储介质(第三类存储介质)的读/写速度低于数据分片所在的存储介质(第一类存储介质),高于、或者等于全局校验分片所在的存储介质(第二类存储介质)。
需要说明的是,本发明实施例中,“读/写”包括“读”、“写”以及“读和写”这三种情况中的任意一个。
参见附图4,是计算节点的一种实施例,可以执行前面的分片管理方法,由于对应于分片管理方法,因此仅做简单描述。
所述计算节点2,应用于在分布式存储系统中,所述分布式存储系统包括所述计算节点和至少一个存储节点,所述存储节点包括至少一个存储介质,分布式存储系统包括多个存储介质,不同分片存储于不同的存储介质,数据分片和第一校验分片均位于第一级存储介质中,所述计算节点包括处理器单元21和存储器22,还可以包括对外接口(未图示)、存储介质(未图示)。其中存储器单元21例如单核CPU、多核CPU、多个CPU的组合、FPGA,存储器22例如易失性存储介质(例如RAM)、非易失性存储介质(例如硬盘或者SSD),还可以是所述存储介质的一部分。所述存储器22用于存储计算机程序。
通过运行所述计算机程序,所述处理器单元21用于:读取所述第一校验分片的元数据,从中获得所述第一校验分片所在的第一存储位置;选择第二存储位置,所述第二存储位置位于第二级存储介质,所述第二级存储介质的读取速度低于所述第一级存储介质,所述第二存储位置拥有空闲空间;向所述第一校验分片所在的存储节点发送迁移指示,指示所述第一校验分片所在的存储节点把所述第一校验分片迁移到所述第二存储位 置;迁移完成后,把所述第二存储位置的信息更新到所述第一校验分片的元数据。
所述处理器22还用于执行:读取所述第二校验分片的元数据,从中获得所述第二校验分片所在的第三存储位置;选择第四存储位置,所述第四存储位置位于第三级存储介质,所述第三级存储介质的读取速度高于所述第二级存储介质、且低于第一级存储介质,所述第四存储位置拥有空闲空间;所向所述第二校验分片所在的存储节点发送迁移指示,指示所述第二校验分片所在的存储节点把所述第二校验分片迁移到所述第四存储位置;迁移完成后,把所述第四存储位置的信息更新到所述第一校验分片的元数据。
所述数据分片、所述第一校验分片和所述第二校验分片符合本地重构码LRC算法,其中,所述第一校验分片是LRC算法中的的全局校验分片,所述第二校验分片是LRC算法中的本地校验分片。
所述处理器22还用于执行:接收写数据请求,把写数据请求中携带的目标数据分成数据分片,根据所述数据分片按照LRC算法生成所述全局校验分片和所述本地校验分片;
所述全局校验分片用于对多个数据分片进行校验;所述本地校验分片用于对所述多个数据分片中的一部分数据分片进行校验。
当所述数据分片和所述第一校验分片符合纠删码EC算法,所述处理器还用于执行:
接收写数据请求,把写数据请求中携带的目标数据分成数据分片,根据所述数据分片按照EC算法生成所述第一校验分片。
此外,本发明还提供一种是分片管理装置的实施例,分片管理装置可以是硬件(例如计算节点)还可以是软件(例如计算节点中运行的计算机程序),所述分片管理装置可以执行前面的分片管理方法,由于对应于分片管理方法,因此仅做简单描述。
分片管理装置包括:读取模块、位置选择模块、迁移模块和元数据管理模块。
所述读取模块,用于读取所述第一校验分片的元数据,从中获得所述第一校验分片所在的第一存储位置。
位置选择模块,用于选择第二存储位置,所述第二存储位置位于第二级存储介质,所述第二级存储介质的读取速度低于所述第一级存储介质,所述第二存储位置拥有空闲空间。
迁移模块,用于向所述第一校验分片所在的存储节点发送迁移指示,指示所述第一校验分片所在的存储节点把所述第一校验分片迁移到所述第二存储位置。
元数据管理节点,把所述第二存储位置的信息更新到所述第一校验分片的元数据。
可选的,上述模块还用于执行这些功能:所述读取模块,还用于读取所述第二校验分片的元数据,从中获得所述第二校验分片所在的第三存储位置;所述位置选择模块,还用于选择第四存储位置,所述第四存储位置位于第三级存储介质,所述第三级存储介质的读取速度高于所述第二级存储介质、且低于第一级存储介质,所述第四存储位置拥有空闲空间;所述迁移模块,还用于所向所述第二校验分片所在的存储节点发送迁移指 示,指示所述第二校验分片所在的存储节点把所述第二校验分片迁移到所述第四存储位置;所述元数据管理模块,还用于在迁移完成后,把所述第四存储位置的信息更新到所述第一校验分片的元数据。
在图3所示的方法实施例中,校验分片生成后先写入第一级存储介质,然后对已经写入到第一级存储介质的校验分片进行迁移,迁移目的地是成本较低的第二级存储介质/第三级存储介质。参见图5是本发明另外一个分片管理方法的实施例,和前一种方式的区别在于:在生成校验分片后,直接把校验分片写入到成本较低的第二级存储介质/第三级存储介质。
和前一种方式相比,图5所介绍的实施方式省略了迁移的步骤,因此效率更高。而前一种方式也有自己的优势,那就是第一级存储介质的写入速度更快,写入成功后就可以通知主机(数据单元的发出者)该写入操作依据完成,因此可以更快速的响应主机,尤其是如果第二级存储介质/第三级存储介质是位于冷存储节点的情况下,这种优势更加明显。因为冷存储节点通常是下电状态,在有数据写入是才上电启动,因此响应速度非常慢。
31,计算节点通过对外接口从主机或者服务器接收数据单元,把数据单元拆分成数据分片。
需要说明的是,如果数据单元本身比较小,小于等于分片的大小,那么不需要拆分,而是可以直接获得数据分片(如果小于一个分片的大小,可以通过补0达到分片的大小)。
32,计算节点根据数据分片生成校验分片。校验分片包括第一校验分片和第二校验分片。
关于第一校验分片和第二校验分片的含义参见前面的实施例。对于EC算法,利用数据分片生成一个或者多个校验分片。校验分片也称为第一校验分片。对于LRC算法,利用数据分片生成的校验分片包括本地校验分片和全局校验分片。本实施例中,第一校验分片全局校验分片,第二校验分片是本地校验分片。全局校验分片是对所有数据分片进行校验;本地校验分片是对部分数据分片进行校验。
33,计算节点选择第一级存储介质,把数据分片发送给第一级存储介质所在的存储节点进行存储。第一级存储介质是读写速度最快的介质。
34,计算节点选择第二级存储介质,把第一校验分片发送给第二级存储介质所在的存储节点进行存储。第一级存储介质的读/写速率低于第一级存储介质。
如前所述,在EC场景下或者LRC场景下,第一校验分片可以对数据分片进行校验。在EC场景下,第一校验分片简称为分配。在LRC场景下,第一校验分片相当于全局校验分片。
35,计算节点选择第三级存储介质,把第三校验分片发送给第三级存储介质所在的存储节点进行存储。第三级存储介质的读/写速率低于第一级存储介质、高于第二存储介质。如前所述,步骤35是可选步骤。在LRC的情况下执行步骤35,在EC的情况下, 不执行步骤35。
步骤33、34和35这三个步骤可以按照任意时间顺序执行。
36,收到分条的存储节点对分条进行存储。
由于和步骤11-16所描述的方法相比,本实施例的主要区别在于少了迁移的步骤,改为直接对数据分条和校验分条进行分级存储。其余内容(例如算法和节点的解释,校验分片和数据分片的关系,存储位置/节点的选择方案,以及名词的定义等)均可参照前面的实施例。例如,参见步骤15,在步骤33、34和35中,计算节点可以仅指定用于存储分片的存储节点,而不指定存储分片具体的存储介质,由收到分片的存储节点来决定存储分片的存储介质。为了简洁,本实施例对相似的内容不做赘述,直接参照前面的实施例即可。
本发明还提供一种计算节点,可以执行步骤31-36所介绍的方法,同样可以参见图4。
计算节点2,包括处理器单元21和存储器22,所述存储器22中存储有计算机程序。所述计算节点包括处理器单元21和存储器22,还可以包括对外接口(未图示)、存储介质(未图示)。其中存储器单元21例如单核CPU、多核CPU、多个CPU的组合、FPGA,存储器22例如易失性存储介质(例如RAM)、非易失性存储介质(例如硬盘或者SSD),还可以是所述存储介质的一部分。所述存储器22用于存储计算机程序。
通过运行所述计算机程序,所述处理器单元21用于:通过接口接收数据单元,根据所述数据单元生成数据分片;根据所述数据分片生成第一校验分片;选择位于第一级存储介质的存储空间作为数据分片存储位置,以及选择位于第二级存储介质的存储空间作为第一校验分片存储位置,其中,所述第二级存储介质的读速度低于所述第一级存储介质的读速度;以选择的存储位置发送所述数据分片和所述第一校验分片,以对所述数据分片和所述第一校验分片进行存储,其中,所述数据分片的存储位置是所述位于第一级存储介质的存储空间,所述第一校验分片的存储位置是所述位于第二级存储介质的存储空间。
可选的,通过运行所述计算机程序,所述处理器单元21还用于:根据所述数据分片生成第二校验分片,其中,所述数据分片、所述第一校验分片和所述第二校验分片符合本地重构码LRC算法,所述第一校验分片是LRC算法中的的全局校验分片,所述第二校验分片是LRC算法中的本地校验分片;选择位于第三级存储介质的存储空间作为第二校验分片存储位置,其中,所述第三级存储介质的读速度低于所述第一级存储介质的读速度,且高于或者等于所述第二级存储介质的读速度;以选择的存储位置发送所述数据分片和所述第二校验分片,以对所述第二校验分片进行存储,其中,所述第二校验分片的存储位置是所述位于第三级存储介质的存储空间。
此外,本发明还提供一种是分片管理装置的实施例,分片管理装置可以是硬件(例如计算节点)还可以是软件(例如计算节点中运行的计算机程序),所述分片管理装置可以执行前面的分片管理方法,由于对应于分片管理方法,因此仅做简单描述。
分片管理装置包括:分片模块、位置选择模块、存储模块。
所述分片模块,用于接收数据单元,根据所述数据单元生成数据分片;根据所述数据分片生成第一校验分片。
所述位置选择模块,用于选择位于第一级存储介质的存储空间作为数据分片存储位置,以及选择位于第二级存储介质的存储空间作为第一校验分片存储位置,其中,所述第二级存储介质的读速度低于所述第一级存储介质的读速度。
所述存储模块,用于以选择的存储位置发送所述数据分片和所述第一校验分片,以对数据分片和第一校验分片进行存储,其中所述,数据分片的写请求中携带所述数据分片以及所述数据分片存储位置,第一校验分片的写请求中携带所述第一校验分片以及所述第一校验分片存储位置。
元数据管理模块,用于把分片的存储位置记录在分片的元数据中。
可选的:所述分片模块还用于根据所述数据分片生成第二校验分片,其中,所述数据分片、所述第一校验分片和所述第二校验分片符合本地重构码LRC算法,所述第一校验分片是LRC算法中的的全局校验分片,所述第二校验分片是LRC算法中的本地校验分片;
述所述位置选择模块还用于选择位于第三级存储介质的存储空间作为第二校验分片存储位置,其中,所述第三级存储介质的读速度低于所述第一级存储介质的读速度,且高于或者等于所述第二级存储介质的读速度;所述所述存储模块还用于以选择的存储位置发送写数据请求以第二校验分片进行存储,其中,第二校验分片的写请求中携带所述第二校验分片以及所述第二校验分片存储位置。
上述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可获取存储介质中。基于这样的理解,本发明的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以为个人计算机、服务器或者网络设备等,具体可以是计算机设备中的处理器)执行本发明的各个实施例上述方法的全部或部分步骤。其中,而前述的存储介质可包括:U盘、移动硬盘、磁碟、光盘、只读存储器(ROM,Read-Only Memory)或者随机存取存储器(RAM,Random Access Memory)等各种可以存储程序代码的介质。换言之,本发明提供一种存储介质实施例,该存储介质用于记录计算机程序/软件,通过运行该存储计算机程序/软件,计算机/服务器/计算节点/分布式存储系统,可以执行前述的各个分片管理方法实施例。
以上所述,以上实施例仅用以说明本发明的技术方案而非对其限制;尽管参照前述实施例对本发明进行了详细的说明,然而本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本发明各实施例技术方案的精神和范围。

Claims (14)

  1. 一种分片管理方法,应用于在分布式存储系统中,所述分布式存储系统包括计算节点和至少一个存储节点,所述存储节点包括至少一个存储介质,分布式存储系统包括多个存储介质,不同分片存储于不同的存储介质,数据分片和第一校验分片均位于第一级存储介质中,其特征在于,该方法包括:
    计算节点读取所述第一校验分片的元数据,从中获得所述第一校验分片所在的第一存储位置;
    所述计算节点选择第二存储位置,所述第二存储位置位于第二级存储介质,所述第二级存储介质的读取速度低于所述第一级存储介质,所述第二存储位置拥有空闲空间;
    所述计算节点向所述第一校验分片所在的存储节点发送迁移指示,指示所述第一校验分片所在的存储节点把所述第一校验分片发送给第二存储位置所在的存储节点;
    第二存储位置所在的存储节点把所述第一校验分片存储到第二存储位置;
    所述计算节点指示把所述第二存储位置的信息更新到所述第一校验分片的元数据。
  2. 根据权利要求1所述的方法,其特征在于,该方法进一步包括:
    计算节点读取所述第二校验分片的元数据,从中获得所述第二校验分片所在的第三存储位置;
    所述计算节点选择第四存储位置,所述第四存储位置位于第三级存储介质,所述第三级存储介质的读取速度高于所述第二级存储介质、且低于第一级存储介质,所述第四存储位置拥有空闲空间;
    所述计算节点向所述第二校验分片所在的存储节点发送迁移指示,指示所述第二校验分片所在的存储节点把所述第二校验分片发送给所述第四存储位置所在的存储节点;
    所述第四存储位置所在的存储节点把所述第二校验分片存储到第二存储位置;
    所述计算节点指示把所述第四存储位置的信息更新到所述第一校验分片的元数据。
  3. 根据权利要求2所述的方法,其特征在于:
    所述数据分片、所述第一校验分片和所述第二校验分片符合本地重构码LRC算法,其中,所述第一校验分片是LRC算法中的的全局校验分片,所述第二校验分片是LRC算法中的本地校验分片。
  4. 根据权利要求1所述的方法,其特征在于,所述方法之前包括:
    计算节点接收写数据请求,把写数据请求中携带的目标数据分成数据分片,根 据所述数据分片按照LRC算法生成所述全局校验分片和所述本地校验分片;
    所述全局校验分片用于对多个数据分片进行校验;所述本地校验分片用于对所述多个数据分片中的一部分数据分片进行校验。
  5. 根据权利要求1所述的方法,其特征在于所述数据分片和所述第一校验分片符合纠删码EC算法,该方法进一步包括:
    计算节点接收写数据请求,把写数据请求中携带的目标数据分成数据分片,根据所述数据分片按照EC算法生成所述第一校验分片。
  6. 一种计算节点,所述计算节点包括处理器单元和存储器,所述存储器用于存储计算机程序,其特征在于,通过运行所述计算机程序,所述处理器单元用于:
    读取所述第一校验分片的元数据,从中获得所述第一校验分片所在的第一存储位置;
    选择第二存储位置,所述第二存储位置位于第二级存储介质,所述第二级存储介质的读取速度低于所述第一级存储介质,所述第二存储位置拥有空闲空间;
    向所述第一校验分片所在的存储节点发送迁移指示,指示所述第一校验分片所在的存储节点把所述第一校验分片迁移到所述第二存储位置;
    迁移完成后,把所述第二存储位置的信息更新到所述第一校验分片的元数据。
  7. 根据权利要求1所述的计算节点,其特征在于,所述处理器还用于执行:
    读取所述第二校验分片的元数据,从中获得所述第二校验分片所在的第三存储位置;
    选择第四存储位置,所述第四存储位置位于第三级存储介质,所述第三级存储介质的读取速度高于所述第二级存储介质、且低于第一级存储介质,所述第四存储位置拥有空闲空间;
    所向所述第二校验分片所在的存储节点发送迁移指示,指示所述第二校验分片所在的存储节点把所述第二校验分片迁移到所述第四存储位置;
    迁移完成后,把所述第四存储位置的信息更新到所述第一校验分片的元数据。
  8. 根据权利要求7所述的计算节点,其特征在于:
    所述数据分片、所述第一校验分片和所述第二校验分片符合本地重构码LRC算法,其中,所述第一校验分片是LRC算法中的的全局校验分片,所述第二校验分片是LRC算法中的本地校验分片。
  9. 根据权利要求6所述的计算节点,其特征在于,所述处理器还用于执行:
    接收写数据请求,把写数据请求中携带的目标数据分成数据分片,根据所述数据分片按照LRC算法生成所述全局校验分片和所述本地校验分片;
    所述全局校验分片用于对多个数据分片进行校验;所述本地校验分片用于对所 述多个数据分片中的一部分数据分片进行校验。
  10. 根据权利要求6所述的计算节点,其特征在于,所述数据分片和所述第一校验分片符合纠删码EC算法,所述处理器还用于执行:
    接收写数据请求,把写数据请求中携带的目标数据分成数据分片,根据所述数据分片按照EC算法生成所述第一校验分片。
  11. 一种分片管理方法,其特征在于,该方法包括:
    计算节点通过接口接收数据单元,根据所述数据单元生成数据分片;
    根据所述数据分片生成第一校验分片;
    选择位于第一级存储介质的存储空间作为数据分片存储位置;
    选择位于第二级存储介质的存储空间作为第一校验分片存储位置,其中,所述第二级存储介质的读速度低于所述第一级存储介质的读速度;
    以选择的存储位置发送所述数据分片和所述第一校验分片,以对数据分片和第一校验分片进行存储,其中所述,数据分片的写请求中携带所述数据分片以及所述数据分片存储位置,第一校验分片的写请求中携带所述第一校验分片以及所述第一校验分片存储位置。
  12. 根据权利要求11所述的分片管理方法,其特征在于,该方法还包括:
    根据所述数据分片生成第二校验分片,其中,所述数据分片、所述第一校验分片和所述第二校验分片符合本地重构码LRC算法,所述第一校验分片是LRC算法中的的全局校验分片,所述第二校验分片是LRC算法中的本地校验分片;
    选择位于第三级存储介质的存储空间作为第二校验分片存储位置,其中,所述第三级存储介质的读速度低于所述第一级存储介质的读速度,且高于或者等于所述第二级存储介质的读速度;
    以选择的存储位置发送写数据请求以第二校验分片进行存储,其中,第二校验分片的写请求中携带所述第二校验分片以及所述第二校验分片存储位置。
  13. 一种计算节点,包括处理器单元和存储器,所述存储器中存储有计算机程序,通过运行所述计算机程序,所述处理器单元用于:
    通过接口接收数据单元,根据所述数据单元生成数据分片;
    根据所述数据分片生成第一校验分片;
    选择位于第一级存储介质的存储空间作为数据分片存储位置,以及选择位于第二级存储介质的存储空间作为第一校验分片存储位置,其中,所述第二级存储介质的读速度低于所述第一级存储介质的读速度;
    以选择的存储位置发送所述数据分片和所述第一校验分片,以对所述数据分片和所述第一校验分片进行存储,其中,所述数据分片的存储位置是所述位于第一级存储介质的存储空间,所述第一校验分片的存储位置是所述位于第二级存储介质的 存储空间。
  14. 根据权利要求13所述的计算节点,其特征在于,通过运行所述计算机程序,所述处理器单元还用于:
    根据所述数据分片生成第二校验分片,其中,所述数据分片、所述第一校验分片和所述第二校验分片符合本地重构码LRC算法,所述第一校验分片是LRC算法中的的全局校验分片,所述第二校验分片是LRC算法中的本地校验分片;
    选择位于第三级存储介质的存储空间作为第二校验分片存储位置,其中,所述第三级存储介质的读速度低于所述第一级存储介质的读速度,且高于或者等于所述第二级存储介质的读速度;
    以选择的存储位置发送所述数据分片和所述第二校验分片,以对所述第二校验分片进行存储,其中,所述第二校验分片的存储位置是所述位于第三级存储介质的存储空间。
PCT/CN2018/075188 2017-06-29 2018-02-03 分片管理方法和分片管理装置 WO2019000950A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP18823199.7A EP3617867B1 (en) 2017-06-29 2018-02-03 Fragment management method and fragment management apparatus
EP22182872.6A EP4137924A1 (en) 2017-06-29 2018-02-03 Fragment management method and fragment management apparatus
US16/718,976 US11243706B2 (en) 2017-06-29 2019-12-18 Fragment management method and fragment management apparatus
US17/574,262 US20220137849A1 (en) 2017-06-29 2022-01-12 Fragment Management Method and Fragment Management Apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710515966.2 2017-06-29
CN201710515966.2A CN107436733B (zh) 2017-06-29 2017-06-29 分片管理方法和分片管理装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/718,976 Continuation US11243706B2 (en) 2017-06-29 2019-12-18 Fragment management method and fragment management apparatus

Publications (1)

Publication Number Publication Date
WO2019000950A1 true WO2019000950A1 (zh) 2019-01-03

Family

ID=60459642

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/075188 WO2019000950A1 (zh) 2017-06-29 2018-02-03 分片管理方法和分片管理装置

Country Status (4)

Country Link
US (2) US11243706B2 (zh)
EP (2) EP3617867B1 (zh)
CN (2) CN112328168A (zh)
WO (1) WO2019000950A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021249418A1 (zh) * 2020-06-12 2021-12-16 华为技术有限公司 一种数据写入方法和装置

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112328168A (zh) 2017-06-29 2021-02-05 华为技术有限公司 分片管理方法和分片管理装置
WO2019183958A1 (zh) 2018-03-30 2019-10-03 华为技术有限公司 数据写入方法、客户端服务器和系统
CN110555012B (zh) * 2018-05-14 2022-03-25 杭州海康威视数字技术股份有限公司 数据迁移方法及装置
CN108920099B (zh) * 2018-06-22 2021-11-16 中国人民解放军战略支援部队信息工程大学 基于多种分片方式的数据动态存储系统及方法
CN108920104B (zh) * 2018-06-29 2021-06-25 吴俊杰 一种无中心的视频监控云存取方法
CN109067852A (zh) * 2018-07-15 2018-12-21 中国人民解放军国防科技大学 一种基于纠删码的跨中心协同修复方法
CN111936960B (zh) 2018-12-25 2022-08-19 华为云计算技术有限公司 分布式存储系统中数据存储方法、装置及计算机程序产品
CN110825698B (zh) * 2019-11-07 2021-02-09 重庆紫光华山智安科技有限公司 元数据管理方法及相关装置
CN112860599B (zh) * 2019-11-28 2024-02-02 中国电信股份有限公司 数据缓存处理方法、装置以及存储介质
CN113918378A (zh) * 2020-07-10 2022-01-11 华为技术有限公司 数据存储方法、存储系统、存储设备及存储介质
CN114816226A (zh) * 2021-01-29 2022-07-29 伊姆西Ip控股有限责任公司 用于管理存储系统的方法、设备和计算机程序产品
WO2023125507A1 (zh) * 2021-12-29 2023-07-06 华为技术有限公司 生成块组的方法、装置和设备

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103699494A (zh) * 2013-12-06 2014-04-02 北京奇虎科技有限公司 一种数据存储方法、数据存储设备和分布式存储系统
CN105487823A (zh) * 2015-12-04 2016-04-13 华为技术有限公司 一种数据迁移的方法及装置
CN106201338A (zh) * 2016-06-28 2016-12-07 华为技术有限公司 数据存储方法及装置
CN106383665A (zh) * 2016-09-05 2017-02-08 华为技术有限公司 数据存储系统中的数据存储方法及协调存储节点
CN107436733A (zh) * 2017-06-29 2017-12-05 华为技术有限公司 分片管理方法和分片管理装置

Family Cites Families (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6910079B2 (en) * 2002-01-25 2005-06-21 University Of Southern California Multi-threshold smoothing
US7614068B2 (en) * 2005-03-18 2009-11-03 Nokia Corporation Prioritization of electronic service guide carousels
JP4749140B2 (ja) * 2005-12-05 2011-08-17 株式会社日立製作所 データマイグレーション方法及びシステム
WO2011036020A1 (en) * 2009-09-25 2011-03-31 International Business Machines Corporation Data storage
CN106469029B (zh) * 2011-12-31 2019-07-23 华为数字技术(成都)有限公司 数据分层存储处理方法、装置和存储设备
CN103095805B (zh) * 2012-12-20 2018-03-30 江苏辰云信息科技有限公司 一种对数据进行智能分层管理的云存储系统
US9378084B2 (en) * 2013-06-25 2016-06-28 Microsoft Technology Licensing, Llc Erasure coding across multiple zones
GB2519815A (en) * 2013-10-31 2015-05-06 Ibm Writing data cross storage devices in an erasure-coded system
CN103744620B (zh) * 2013-12-31 2017-11-07 百度在线网络技术(北京)有限公司 一种用于数据存储的方法与设备
US9378088B1 (en) * 2014-12-30 2016-06-28 Datadirect Networks, Inc. Method and system for reclamation of distributed dynamically generated erasure groups for data migration between high performance computing architectures and data storage using non-deterministic data addressing
US9595979B2 (en) * 2015-01-20 2017-03-14 International Business Machines Corporation Multiple erasure codes for distributed storage
US10558538B2 (en) * 2017-11-22 2020-02-11 Netapp, Inc. Erasure coding repair availability
US10187083B2 (en) * 2015-06-26 2019-01-22 Microsoft Technology Licensing, Llc Flexible erasure coding with enhanced local protection group structures
US9530442B1 (en) * 2015-09-23 2016-12-27 Western Digital Technologies, Inc. Enhanced low overhead data protection in data storage drives
US10552038B2 (en) * 2016-05-13 2020-02-04 International Business Machines Corporation Object storage architecture based on file_heat
US9672905B1 (en) * 2016-07-22 2017-06-06 Pure Storage, Inc. Optimize data protection layouts based on distributed flash wear leveling
US20180039425A1 (en) * 2016-08-02 2018-02-08 Alibaba Group Holding Limited Method and apparatus for improved flash memory storage latency and robustness
US10564883B2 (en) * 2016-12-13 2020-02-18 EMC IP Holding Company LLC Efficient migration to distributed storage
US10635340B2 (en) * 2016-12-21 2020-04-28 EMC IP Holding Company LLC Storage tiering with efficient allocation of redundant data
CN106776111A (zh) * 2017-01-06 2017-05-31 东北大学 一种基于lrc纠删码的可恢复云存储系统
US11630729B2 (en) * 2020-04-27 2023-04-18 Fungible, Inc. Reliability coding with reduced network traffic

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103699494A (zh) * 2013-12-06 2014-04-02 北京奇虎科技有限公司 一种数据存储方法、数据存储设备和分布式存储系统
CN105487823A (zh) * 2015-12-04 2016-04-13 华为技术有限公司 一种数据迁移的方法及装置
CN106201338A (zh) * 2016-06-28 2016-12-07 华为技术有限公司 数据存储方法及装置
CN106383665A (zh) * 2016-09-05 2017-02-08 华为技术有限公司 数据存储系统中的数据存储方法及协调存储节点
CN107436733A (zh) * 2017-06-29 2017-12-05 华为技术有限公司 分片管理方法和分片管理装置

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3617867A4 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021249418A1 (zh) * 2020-06-12 2021-12-16 华为技术有限公司 一种数据写入方法和装置

Also Published As

Publication number Publication date
CN112328168A (zh) 2021-02-05
US20200125286A1 (en) 2020-04-23
US20220137849A1 (en) 2022-05-05
EP3617867B1 (en) 2022-07-27
US11243706B2 (en) 2022-02-08
EP3617867A4 (en) 2020-05-27
CN107436733B (zh) 2020-11-06
EP3617867A1 (en) 2020-03-04
EP4137924A1 (en) 2023-02-22
CN107436733A (zh) 2017-12-05

Similar Documents

Publication Publication Date Title
WO2019000950A1 (zh) 分片管理方法和分片管理装置
EP3726364B1 (en) Data write-in method and solid-state drive array
US10664367B2 (en) Shared storage parity on RAID
US9524101B2 (en) Modeling workload information for a primary storage and a secondary storage
US9547552B2 (en) Data tracking for efficient recovery of a storage array
US9916478B2 (en) Data protection enhancement using free space
US9405625B2 (en) Optimizing and enhancing performance for parity based storage
WO2016090541A1 (zh) 数据存储系统和数据存储方法
WO2019001521A1 (zh) 数据存储方法、存储设备、客户端及系统
CN110851401B (zh) 用于管理数据存储的方法、装置和计算机可读介质
JP2016512365A (ja) 不揮発性メモリシステムにおける同期ミラーリング
JP2016525249A (ja) ガベージデータを収集するための方法及び記憶装置
US11449400B2 (en) Method, device and program product for managing data of storage device
JP6089844B2 (ja) 制御装置,ストレージ装置,及び制御プログラム
WO2019000949A1 (zh) 分布式存储系统中元数据存储方法、系统及存储介质
WO2016116930A1 (en) Reusable memory devices with wom codes
WO2018188618A1 (zh) 固态硬盘访问
US11625193B2 (en) RAID storage device, host, and RAID system
US11269745B2 (en) Two-node high availability storage system
US10852951B1 (en) System and method for improving I/O performance by introducing extent pool level I/O credits and user I/O credits throttling on Mapped RAID
US10956245B1 (en) Storage system with host-directed error scanning of solid-state storage devices
US11663080B1 (en) Techniques for performing live rebuild in storage systems that operate a direct write mode
US10747610B2 (en) Leveraging distributed metadata to achieve file specific data scrubbing
CN109542359B (zh) 一种数据重建方法、装置、设备及计算机可读存储介质
CN117149062A (zh) 一种磁带损坏数据的处理方法以及计算装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18823199

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2018823199

Country of ref document: EP

Effective date: 20191126

NENP Non-entry into the national phase

Ref country code: DE