WO2020215223A1 - Distributed storage system and garbage collection method in distributed storage system - Google Patents

Distributed storage system and garbage collection method in distributed storage system

Info

Publication number
WO2020215223A1
WO2020215223A1 · PCT/CN2019/083960 · CN2019083960W
Authority
WO
WIPO (PCT)
Prior art keywords
target
node
valid data
address
logical unit
Prior art date
Application number
PCT/CN2019/083960
Other languages
English (en)
French (fr)
Inventor
罗小东
陈飘
何益
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to CN201980089025.4A (published as CN113302597A)
Priority to PCT/CN2019/083960
Publication of WO2020215223A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation

Definitions

  • This application relates to the storage field, and more specifically, to a distributed storage system and a garbage collection method in a distributed storage system.
  • In a distributed storage system, data is usually written to the multiple storage nodes included in the system by append writing. Append writing differs from overwriting: when data is modified, the original data is not deleted immediately (the modified data is the valid data), so a large amount of garbage data inevitably appears in the system. To release the storage space occupied by garbage data, the system periodically performs garbage collection. Garbage collection takes logical units as its object: a certain number of storage nodes are selected in the distributed storage system, new logical units are created on these storage nodes, the valid data in the logical units to be recycled is written into the new logical units, and the storage space occupied by the logical units to be recycled is then released. Because the storage nodes holding the new logical units are usually selected at random, they are often different from the nodes holding the logical units to be recycled, so rewriting the valid data into the new logical units usually involves data forwarding between storage nodes and consumes a large amount of bandwidth.
  • This application provides a distributed storage system and a garbage collection method in a distributed storage system, which can ensure that at least a part of the valid data is migrated within the same storage node, reducing cross-node data migration to a certain extent and thereby saving bandwidth.
  • A first aspect provides a garbage collection method in a distributed storage system. The distributed storage system includes a plurality of storage nodes, one of which is a master node. The master node selects a target node from the plurality of storage nodes according to the amount of valid data that the source logical unit has distributed on each storage node, where the amount of the first valid data stored on the target node exceeds a set threshold.
  • The master node creates a target logical unit, and the storage nodes over which the target logical unit is distributed include the target node. In other words, at least a part of the storage space occupied by the target logical unit comes from the target node. Then, the master node instructs the target node to migrate the first valid data from a first source address to a first target address. Both the first source address and the first target address are actual addresses, and both are located inside the target node; however, the storage space indicated by the first source address belongs to the source logical unit, while the storage space indicated by the first target address belongs to the target logical unit. After the master node confirms that all the valid data in the source logical unit has been migrated to the target logical unit, it releases the storage space occupied by the source logical unit, which includes the storage space indicated by the first source address.
  • According to the garbage collection method provided in the first aspect, a storage node that holds a relatively large amount of the valid data of the source logical unit is taken as the target node, and the storage nodes over which the target logical unit created by the master node is distributed include the target node, so the target node provides storage space not only for the source logical unit but also for the target logical unit. When migrating the valid data of the source logical unit, the master node can therefore instruct the target node to migrate the first valid data from the first source address inside the target node to the first target address inside the target node. Since the first valid data is migrated inside the target node, data forwarding between storage nodes is avoided to a certain extent, and network bandwidth is saved.
  • In a first implementation of the first aspect, before the master node instructs the target node to migrate the first valid data from the first source address inside the target node to the first target address inside the target node, the master node creates a migration list, which includes the first source address of the first valid data and the first target address of the first valid data. The master node then sends the migration list to the target node. The migration list can be created according to a minimum-migration principle.
  • With reference to the first implementation, in a second implementation of the first aspect, the plurality of storage nodes further include another storage node that is independent of the storage nodes over which the target logical unit is distributed. The migration list further includes a second source address of second valid data stored on that other storage node and a second target address of the second valid data; the second source address is located on the other storage node, and the second target address is located on the target node. The master node also sends the migration list to the other storage node. The other storage node is a node over which the source logical unit is distributed but which stores relatively little valid data; it is not selected as a target node and therefore provides no storage space for the target logical unit. In this case, the other storage node needs to migrate the second valid data it stores to a target node; if there are multiple target nodes, it can migrate to any of them. Alternatively, the other storage node may migrate the second valid data to a node, other than the target node, on which the target logical unit is located.
  • With reference to the first implementation, in a third implementation of the first aspect, the first source address and the first target address are both located in a first hard disk of the target node. In this case, the target node may send the first source address and the first target address to the first hard disk, and the first hard disk performs the migration operation, which reduces the burden on the processor of the target node.
  • With reference to the first implementation, in a fourth implementation of the first aspect, the first source address is located in a first hard disk of the target node and the first target address is located in a second hard disk of the target node. In this case, the migration operation is that the processor of the target node reads the first valid data from the first source address into a cache and then writes it from the cache to the first target address.
  • In another implementation of the first aspect, the master node instructing the target node to migrate the first valid data from the first source address inside the target node to the first target address inside the target node includes the master node instructing the target node to migrate the first valid data from the first source address to the first target address according to the offset of the first valid data within the source logical unit, so that the offset of the first valid data within the target logical unit after the migration is the same as the offset of the first valid data within the source logical unit before the migration. With this migration manner, the position of the first valid data in the source logical unit is the same as its position in the target logical unit. If all the valid data of the stripe in which the first valid data is located is migrated in this way, the data fragments contained in that stripe do not change before and after the migration, so the parity fragments do not need to be recalculated and the original parity fragments can be kept, which reduces the computation performed by the master node and saves computing resources.
  • In a further implementation, combined with the preceding offset-preserving implementation, after all the valid data in the source logical unit has been migrated to the target logical unit and the storage space occupied by the source logical unit has been released, the master node may modify the identifier of the target logical unit to the identifier of the source logical unit. The logical address of a piece of data consists of the identifier of the logical unit in which the data is located and the offset within that logical unit. Because the target logical unit inherits the identifier of the source logical unit, and because the position of the first valid data in the source logical unit is the same as its position in the target logical unit, the logical address of the first valid data does not change before and after the migration. This avoids modifying the metadata of the first valid data and forwarding the modified metadata between storage nodes, further saving network bandwidth.
  • A second aspect of the present application provides a master node. The master node is located in a distributed storage system, and the distributed storage system includes a plurality of storage nodes. The master node includes an interface and a processor, where the interface is used to communicate with the plurality of storage nodes, and the processor is configured to execute any implementation provided in the first aspect.
  • A third aspect of the present application provides a garbage collection device. The device is located in a master node of a distributed storage system, the distributed storage system includes a plurality of storage nodes, and the master node is one of the plurality of storage nodes. The garbage collection device is used to implement any one of the implementations provided in the first aspect.
  • A fourth aspect of the present application provides a computer program product for garbage collection, including a computer-readable storage medium that stores program code, where the program code includes instructions for executing the method described in the first aspect.
  • Figure 1 is an application scenario diagram provided by an embodiment of the present invention.
  • Figure 2 is a schematic diagram of a logic unit provided by an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of the effect of a garbage collection method provided by an embodiment of the present invention.
  • FIG. 4 is a schematic flowchart of a garbage collection method provided by an embodiment of the present invention.
  • Figure 5 is a schematic diagram of a migration list provided by an embodiment of the present invention.
  • FIG. 6 is a schematic flowchart of another garbage collection method provided by an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of the effect of another garbage collection method provided by an embodiment of the present invention.
  • FIG. 8 is a schematic structural diagram of a master node provided by an embodiment of the present invention.
  • Fig. 9 is a schematic structural diagram of a garbage collection device of a master node provided by an embodiment of the present invention.
  • the embodiment of the present application can ensure that at least a part of valid data is migrated in the same storage node during garbage collection, which reduces cross-node data migration to a certain extent, thereby achieving the purpose of saving bandwidth.
  • The technical solutions of the embodiments of this application can be applied to various storage systems. In the following, a distributed storage system is used as an example to describe the technical solutions of the embodiments of this application, but the embodiments of the present invention are not limited thereto.
  • In a distributed storage system, data is stored on multiple storage nodes (hereinafter referred to as "nodes"), and the multiple storage nodes share the storage load. This storage method not only improves the reliability, availability, and access efficiency of the system, but is also easy to expand. A storage node is, for example, a server, or a combination of a storage controller and a storage medium.
  • Fig. 1 is a schematic diagram of a scenario to which the technical solution of this embodiment can be applied. As shown in Fig. 1, a plurality of client servers 101 communicate with a storage system 100. The storage system 100 includes a switch 103, a plurality of storage nodes (or "nodes" for short) 104, and the like, where the switch 103 is an optional device. Each storage node 104 may include multiple mechanical hard disks or other types of storage media (for example, solid-state drives or shingled magnetic recording disks) for storing data.
  • Fig. 2 is an example of the logical units provided in this embodiment. A logical unit is a section of logical space, and the actual physical space of each logical unit comes from multiple nodes. The number of nodes occupied by a logical unit depends on the Redundant Array of Independent Disks (RAID) type corresponding to that logical unit. As shown in Fig. 2, node 2, node 3, node 4, node 5, node 6, and node 7 each provide a portion of storage space, thereby constructing a logical unit 1 with a RAID type of "4+2", where node 2, node 3, node 4, and node 5 are used to store data fragments, and node 6 and node 7 are used to store parity fragments. Among these six nodes, one node (for example, node 2) is elected as the master node. The master node divides the received data into 4 data fragments, calculates 2 parity fragments for the 4 data fragments, and then sends each data fragment and parity fragment to the corresponding node for storage. The master node can be the node where one of the fragments is located, or a node independent of logical unit 1.
  • When data fragments are written to a node, they are usually written at a set granularity, such as 8 KB or 16 KB. A data fragment or parity fragment can be divided into multiple data blocks according to the set granularity. For example, a data fragment stored in node 2 includes data block D1 and data block D2, a data fragment stored in node 3 includes data blocks D3 and D4, ..., and a parity fragment stored in node 6 includes Q1 and Q2, and so on. The parity fragments include Q1 and Q2, as well as P1 and P2. D1, D2, D3, D4, D5, D6, D7, and D8, together with Q1, Q2, P1, and P2, form a stripe. When any two data fragments or parity fragments are damaged, the remaining fragments can be used for recovery, which guarantees data reliability. For example, logical unit 1 may also include another stripe, which is composed of D9, D10, D11, D12, D13, D14, D15, and D16 together with Q3, Q4, P3, and P4.
  • For each fragment (data fragment or parity fragment), the identifier of the logical unit in which it is located and its location inside that logical unit constitute the logical address of the fragment, and the actual address of the fragment within its node is the physical address of the fragment.
  • Each logical unit may include one or more stripes; this embodiment does not limit the number of stripes included in a logical unit, and Fig. 2 is only an example. The situations of logical unit 2 and logical unit 3 are similar to that of logical unit 1 and are not described in detail here. This embodiment limits neither the number of nodes nor the number of logical units and their corresponding RAID types.
  • In practice, the system often uses an append write mode to write data into logical units. When a logical unit is full, the system allocates a new logical unit for data writing. As data is continuously modified, the data written before the modification becomes invalid data; this invalid data will not be read but still occupies storage space. Therefore, when the system space is insufficient, logical units need to be recycled to free up storage space. Append writing is also called redirect-on-write (ROW).
  • In this embodiment, the logical unit is the basic unit of garbage collection. In other words, when a certain condition is triggered, the system selects one or more logical units to be recycled (also called source logical units) from the multiple logical units, migrates the valid data in these logical units elsewhere, and then releases these logical units, thereby reclaiming storage space.
  • The following describes the garbage collection method provided in this embodiment with reference to Fig. 3 and Fig. 4. The method can be applied to the distributed storage system shown in Fig. 1, and the object of garbage collection is the logical units shown in Fig. 2. Fig. 3 is a schematic diagram of the effect of the garbage collection method, and Fig. 4 is a schematic flowchart of the garbage collection method. As shown in Fig. 4, the method includes the following steps.
  • In S401, the master node determines the source logical units. This step is usually performed under certain trigger conditions, for example, when the amount of garbage data in the system reaches a certain threshold, when the available storage space in the system falls below a certain threshold, or when the number of logical units that meet the recycling condition reaches a certain number. A source logical unit also needs to meet certain conditions, for example, the amount of garbage data contained in the logical unit reaches a first garbage threshold, or the amount of valid data contained in the logical unit is lower than a second garbage threshold. In general, one or more source logical units may be determined. Taking Fig. 3 as an example, it is assumed that the determined source logical units are logical unit 1, logical unit 2, and logical unit 3.
  • In S402, the master node determines the nodes on which the source logical units are located. Assume that logical unit 1, logical unit 2, and logical unit 3 are all source logical units. As can be seen from Fig. 3, the RAID type corresponding to logical unit 1 is "4+2", and it is distributed on node 2, node 3, node 4, node 5, node 6, and node 7. The RAID type of logical unit 2 is the same as that of logical unit 1, and it is distributed on node 1, node 2, node 3, node 5, node 6, and node 7. The RAID type of logical unit 3 is the same as that of logical unit 1, and it is distributed on node 1, node 2, node 3, node 4, node 5, and node 7. Therefore, the nodes on which the source logical units are located include node 1, node 2, node 3, node 4, node 5, node 6, and node 7.
  • In S403, the master node counts the amount of valid data contained on each node on which the source logical units are located, and selects the nodes whose amount of valid data exceeds a set threshold as target nodes. In practice, the amount of valid data is often counted at the granularity of the data blocks described above. If a data block contains only valid data, it is called a valid data block (the white Dn in Fig. 3); if a data block contains invalid data, it is called an invalid data block (the gray Dn in Fig. 3). The P and Q data blocks in Fig. 3 store parity data; because the data fragments contained in the original stripes usually change after garbage collection is completed, the parity fragments are recalculated and stored again to guarantee reliability. Accordingly, "valid" and "invalid" apply only to the data blocks in data fragments, and only the valid data blocks contained in the data fragments need to be counted. For example, as shown in Fig. 3, node 2, node 3, node 4, and node 5 each contain 3 or 4 valid data blocks; if the threshold is 2, then node 2, node 3, node 4, and node 5 are all used as target nodes. Fig. 3 is only an example; in this embodiment, any node whose amount of valid data exceeds the set threshold can be used as a target node, and the number of target nodes may be one or more, which is not limited in this embodiment.
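  • As a rough illustration of S401-S403 only, the following Python sketch shows one possible way to count valid data blocks per node and pick target nodes. It is not taken from the patent; the assumed data layout (a `blocks` attribute of (node_id, is_valid) pairs) and all names are hypothetical.

    ```python
    from collections import defaultdict

    def select_target_nodes(source_units, valid_threshold=2):
        """Count the valid data blocks that the source logical units place on
        each node and pick every node whose count exceeds the threshold
        (roughly S401-S403). `source_units` is assumed to be a list of objects
        with a `blocks` attribute: an iterable of (node_id, is_valid) pairs
        covering data blocks only (P/Q parity blocks are not counted)."""
        valid_blocks_per_node = defaultdict(int)
        for unit in source_units:
            for node_id, is_valid in unit.blocks:
                if is_valid:
                    valid_blocks_per_node[node_id] += 1
        # Nodes holding more valid data blocks than the threshold become targets.
        return [node for node, count in valid_blocks_per_node.items()
                if count > valid_threshold]
    ```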
  • In S404, the master node creates a target logical unit (logical unit 4 shown in Fig. 3), and at least part of the storage space occupied by the target logical unit comes from the target nodes. The RAID type of the newly created target logical unit is consistent with the RAID type of the source logical units (logical unit 1, logical unit 2, and logical unit 3), so logical unit 4 needs to span 6 nodes. These 6 nodes include the target nodes selected in S403; if the selected target nodes are not enough, some additional nodes are selected from the distributed storage system to make up the 6 nodes. For example, assuming the number of target nodes selected in S403 is 4, 2 more nodes need to be selected. As shown in Fig. 3, the storage space of logical unit 4 comes from node 2, node 3, node 4, node 5, node 6, and node 7, where node 2, node 3, node 4, and node 5 are the target nodes selected in S403, and node 6 and node 7 are the two additionally selected nodes. The selection strategy for the nodes other than the target nodes can be load balancing or random selection, which is not limited in this embodiment.
  • The RAID type of logical unit 4 is "4+2": node 2, node 3, node 4, and node 5 can be used to store data fragments, and node 6 and node 7 are used to store parity fragments.
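  • The following sketch illustrates, under assumed data structures, how the node set for the new target logical unit in S404 might be assembled: target nodes first, then extra nodes up to the RAID width. The function name and the random fill-up are illustrative assumptions, not the patent's prescribed implementation.

    ```python
    import random

    def choose_nodes_for_target_unit(target_nodes, all_nodes, raid_width=6):
        """Assemble the node set backing the new target logical unit (S404):
        the target nodes selected in S403 come first, and if they are fewer
        than the RAID width (6 for a "4+2" layout) the remainder is filled
        from the rest of the system. Random fill-up is used here; a
        load-balancing choice would work equally well."""
        chosen = list(target_nodes)
        if len(chosen) < raid_width:
            candidates = [n for n in all_nodes if n not in chosen]
            # Assumes the system has enough nodes to reach the RAID width.
            chosen += random.sample(candidates, raid_width - len(chosen))
        return chosen
    ```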
  • In S405, the valid data blocks in the source logical units are migrated to the target logical unit (logical unit 4). Since the valid data blocks in the source logical units are distributed on multiple nodes, the master node needs to send an instruction to the node where each valid data block is located, instructing that node to migrate the valid data blocks it stores to the target logical unit. This can be divided into two cases. Case 1: for a target node, the valid data blocks only need to be migrated inside the node. In this case, the master node may instruct the target node to migrate the valid data block from a first source address inside the target node to a first target address inside the target node, where both the first source address and the first target address are actual addresses, the storage space indicated by the first source address belongs to the source logical unit, and the storage space indicated by the first target address belongs to the target logical unit. For example, the source address and target address of data block D1 are both located inside node 2, so data block D1 only needs to be migrated inside node 2.
  • Case 2: for a node other than the target nodes, the valid data blocks stored on that node need to be sent to one of the target nodes. In this case, the master node instructs that node to migrate the stored valid data block from a second source address to a second target address. For example, the source address of data block D19 is located in node 1 and its target address is located in node 2; since node 1 does not provide storage space for logical unit 4, D19 needs to be sent to one of the target nodes (node 2 as shown in Fig. 3), and node 2 saves it in logical unit 4.
  • An optional implementation is that, before S405, the master node allocates a target address for each valid data block according to a minimum-migration strategy and creates a migration list 50 (as shown in Fig. 5). The migration list 50 includes the source address and the target address of each valid data block, where the source address of a valid data block is its actual address before migration and the target address is its actual address after migration. The minimum-migration strategy is a migration strategy that minimizes cross-node data migration, that is, it avoids migrating data across nodes as much as possible.
  • For example, a target node such as node 2 provides storage space for both the source logical units and the target logical unit, so the valid data blocks on that target node do not need to be migrated to other nodes. A non-target node such as node 6 does not provide storage space for the target logical unit, so the valid data blocks on that node have to be migrated to a node on which the target logical unit is located. After creating the migration list 50, the master node sends the list 50 to the nodes where the valid data blocks are located and instructs these nodes to perform the migration according to the target addresses in the list.
  • It should be noted that the minimum-migration strategy is only one of the possible migration strategies, and the embodiments of the present invention may also use other migration strategies. Because the destination node of a migration in the prior art is selected at random, as long as at least one data block on some node is designated to remain on its current node after the migration (in other words, for this data block, the source node and the target node are the same node), the beneficial effect of reducing cross-node migration is produced compared with the prior art, and such a solution therefore falls within the scope that the embodiments of the present invention intend to protect. Another optional implementation is that the master node does not generate the migration list 50 but, after allocating a target address for each valid data block according to the minimum-migration strategy, directly instructs the node where each valid data block is located to perform the data migration according to that target address.
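  • A minimal sketch of how a migration list in the spirit of the minimum-migration strategy could be built is given below. The dictionary layout loosely mirrors the idea of Fig. 5 (a source address and a target address per valid data block), but every name and field is a hypothetical assumption; allocation of the concrete target disk and in-disk offset is omitted.

    ```python
    def build_migration_list(valid_blocks, target_nodes):
        """Assign a target address to every valid data block, keeping a block
        on its current node whenever that node is also a target node (the
        minimum-migration idea); otherwise the block is sent to a target node.
        `valid_blocks` is assumed to be a list of dicts such as
        {"block": "D1", "node": 2, "disk": 0, "offset": 2}."""
        migration_list = []
        for blk in valid_blocks:
            if blk["node"] in target_nodes:
                dst_node = blk["node"]       # stays inside the same node
            else:
                dst_node = target_nodes[0]   # any target node would do
            migration_list.append({
                "block": blk["block"],
                "source": {"node": blk["node"], "disk": blk["disk"],
                           "offset": blk["offset"]},
                # The concrete disk and in-disk offset in the target logical
                # unit would be allocated by the master node; omitted here.
                "target": {"node": dst_node, "disk": None, "offset": None},
            })
        return migration_list
    ```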
  • Further, for a target node, the valid data blocks are migrated inside the node, but the processing differs in different scenarios. If the source address and target address of a valid data block point to different hard disks, then during migration the valid data block needs to be read from the source address into the cache of the node, after which the data block is obtained from the cache and written to the target address. For example, the source address of data block D5 is located on hard disk 0 of node 4 and its target address is located on hard disk 1 of node 4; node 4 therefore needs to read D5 from hard disk 0 into the cache, and then obtain D5 from the cache and write it to hard disk 1. If the source address and target address of a valid data block point to the same hard disk, there is no need to read the valid data block from the hard disk into the cache during migration; the migration can be carried out directly inside the hard disk. In this case, the processor of the node may send a migration instruction to the hard disk where the valid data block is located, and the migration instruction includes the source address and target address of the valid data block. The hard disk can directly read the data from the source address and then write it to the target address. For example, the source address and target address of data block D3 are both located on hard disk 0 of node 3, so the processor of node 3 sends a migration instruction to the read-write chip of hard disk 0, and the read-write chip writes D3 from in-disk offset address 2 to in-disk offset address 10. In this embodiment, the in-disk offset address is used to indicate the specific location where the data is stored within the hard disk.
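  • The two intra-node cases described above (source and target on different hard disks versus on the same hard disk) could be handled along the lines of the following sketch. The `disks` and `cache` objects and their methods are assumptions made for illustration only.

    ```python
    def migrate_block_on_node(entry, disks, cache):
        """Carry out one migration-list entry inside a target node. If source
        and target lie on the same hard disk, the move is pushed down to the
        disk itself; otherwise the block is staged in the node cache and then
        written to the other disk. `disks` is assumed to map a disk id to an
        object offering read(offset), write(offset, data) and
        move(src_offset, dst_offset)."""
        src, dst = entry["source"], entry["target"]
        if src["disk"] == dst["disk"]:
            # Same disk: e.g. D3 moved from in-disk offset 2 to offset 10.
            disks[src["disk"]].move(src["offset"], dst["offset"])
        else:
            # Different disks: e.g. D5 read from disk 0 into the cache and
            # then written to disk 1 of the same node.
            data = disks[src["disk"]].read(src["offset"])
            cache.append(data)
            disks[dst["disk"]].write(dst["offset"], cache.pop())
    ```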
  • For the target logical unit (logical unit 4 shown in Fig. 3), after the data fragments are written to the corresponding nodes, the master node also needs to calculate the parity fragments of these data fragments. A parity fragment includes parity data blocks; as shown in Fig. 3, P and Q are parity data blocks. After the parity data blocks are calculated, the master node sends the parity fragments (a parity fragment includes parity data blocks) to the corresponding nodes for storage.
  • After all the valid data in the source logical units has been migrated to the target logical unit, the master node updates the metadata of the data. Metadata includes the logical address and the physical address of the data. The logical address refers to the identifier of the logical unit in which the data is located and the offset within that logical unit. It can be understood that after a valid data block is migrated from a source logical unit to the target logical unit, its logical address changes; in order for the client server 101 to subsequently read the correct data, the master node needs to modify the logical address of the data. The physical address refers to the physical location where the data is actually stored; it indicates the identifier of the node where the data is located, the identifier of the hard disk within the node, and the in-disk offset address (refer to Fig. 5). When data is actually migrated from one node to another, from one hard disk to another, or within the same hard disk, its physical address changes, and the changed physical address must be recorded in the metadata of the data.
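  • As an illustration of the metadata just described (logical address = logical unit identifier plus in-unit offset; physical address = node, hard disk, in-disk offset), a possible representation is sketched below. The class and field names are hypothetical, not taken from the patent.

    ```python
    from dataclasses import dataclass

    @dataclass
    class LogicalAddress:
        unit_id: int      # identifier of the logical unit holding the data
        offset: int       # offset of the data inside that logical unit

    @dataclass
    class PhysicalAddress:
        node_id: int      # node on which the data is stored
        disk_id: int      # hard disk inside that node
        disk_offset: int  # in-disk offset address

    @dataclass
    class BlockMetadata:
        logical: LogicalAddress
        physical: PhysicalAddress

    def update_after_migration(md, new_logical, new_physical):
        """After a valid data block moves from the source logical unit to the
        target logical unit, both address parts may change and must be
        recorded so that client servers still read the correct data."""
        md.logical = new_logical
        md.physical = new_physical
        return md
    ```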
  • In S406, the master node releases the storage space occupied by the source logical units (logical unit 1, logical unit 2, and logical unit 3 shown in Fig. 3). Before the release, all data stored in the source logical units, including valid data and invalid data, is deleted; the storage space obtained after the release can be used by other logical units. It should be noted that S406 occurs after all the valid data in the source logical units has been migrated to the target logical unit, and it specifically includes the master node separately releasing the corresponding storage space of the source logical units distributed on each storage node.
  • According to the garbage collection method shown in Fig. 4, the nodes containing more valid data are selected to continue providing storage space for the target logical unit, so the valid data stored on these nodes can remain on them, which avoids forwarding between nodes and saves network bandwidth. Even though a small amount of valid data located on other nodes still needs to be sent to the nodes containing more valid data, network bandwidth is still saved to a certain extent compared with the prior art.
  • Further, S405 has at least two implementations. One implementation is to migrate each valid data block in logical unit 1, logical unit 2, and logical unit 3 to the storage space corresponding to logical unit 4 without considering the logical address of the valid data block. In other words, the position of a valid data block within the logical unit changes before and after the migration; the data fragments contained in a stripe then also change, and in this case the parity fragments have to be recalculated. Another implementation is to migrate the valid data blocks from the source logical units to the target logical unit according to their original logical addresses, so that the offset of each valid data block within logical unit 4 after the migration is consistent with its original offset. With this migration manner, when a large number of logical units need to be recycled, there is a high probability that the data fragments contained in some stripes do not change after the migration compared with before the migration; for these stripes there is no need to recalculate the parity fragments. Therefore, compared with the previous implementation, this implementation can save the computing resources of the system.
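  • The benefit of the offset-preserving implementation can be illustrated with the following sketch: a stripe whose data fragments are unchanged after migration keeps its parity fragments, and only changed stripes trigger a recalculation. The helper name and the block lists in the usage lines are hypothetical examples, not values from the figures.

    ```python
    def stripe_needs_new_parity(data_fragments_before, data_fragments_after):
        """Return True when a stripe's parity fragments must be recalculated.
        With offset-preserving migration, a stripe whose data fragments are
        identical before and after migration keeps its existing parity
        fragments; only changed stripes need new parity. Both arguments are
        assumed to be lists of data blocks (parity blocks excluded), in
        offset order, with None marking blank blocks."""
        return data_fragments_before != data_fragments_after

    # Hypothetical usage: an unchanged stripe keeps its parity fragments,
    # while a stripe whose blanks were filled with new blocks does not.
    assert stripe_needs_new_parity(["A1", "A2"], ["A1", "A2"]) is False
    assert stripe_needs_new_parity(["B1", None], ["B1", "C7"]) is True
    ```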
  • the following is a specific example to illustrate.
  • Refer to Fig. 6 and Fig. 7. Fig. 6 shows another garbage collection method provided in this embodiment. This method can be applied to the distributed storage system shown in Fig. 1, and the object of garbage collection is the logical units shown in Fig. 2. Fig. 6 is a schematic flowchart of the method, and Fig. 7 is a schematic diagram of its effect. As shown in Fig. 6, the method includes the following steps.
  • S601: The master node determines at least two source logical units. This step is usually performed under certain trigger conditions; the trigger conditions here are consistent with the trigger conditions in S401 shown in Fig. 4, and the description of S401 may be referred to. The at least two source logical units also need to meet certain conditions; optionally, the master node may set the conditions that the source logical units must meet, and for the specific condition setting, refer to the description of S401.
  • For example, suppose two source logical units are determined, namely logical unit 22 and logical unit 33. In this embodiment, it can be set that every source logical unit meets the trigger condition, or that only any one of the source logical units needs to meet the trigger condition; it can also be set that when the amount of garbage data in logical unit 22 reaches the first garbage threshold, the amount of garbage data in logical unit 33 is lower than the second garbage threshold. Such a setting makes the amounts of garbage data contained in the two source logical units differ to a certain extent. Further, in some scenarios, logical unit 22 may be the logical unit with the largest amount of garbage data, and logical unit 33 the logical unit with the smallest amount of garbage data. Likewise, other equivalent conditions can be set to filter the two source logical units.
  • S602: The master node determines the nodes on which the source logical units are located. For the nodes on which logical unit 22 and logical unit 33 are located, refer to the example in Fig. 7.
  • S603 The master node counts the data amount of valid data contained in the node where the source logical unit is located, and selects a node whose data amount of valid data exceeds a set number threshold as a target node.
  • the specific implementation of S603 is the same as S403, please refer to the description of S403.
  • S604 The master node creates a target logical unit (for example, the logical unit 44 shown in FIG. 7), and at least part of the storage space occupied by the target logical unit comes from the target node.
  • the specific implementation of S604 is the same as S404, please refer to the description of S404.
  • In the example of Fig. 7, the nodes on which logical unit 44 is located completely overlap with the nodes on which logical unit 22 and logical unit 33 are located. This is just an example; it should be understood that in an actual application scenario, the nodes on which logical unit 44 is located may only partially overlap with the nodes on which logical unit 22 and logical unit 33 are located (as shown in Fig. 3).
  • S605 Migrate the valid data blocks in the logical unit 33 to the logical unit 44, without changing the offset of each valid data block in the logical unit during the migration process.
  • the offset in the logical unit 33 before the migration of these valid data blocks is the same as the offset in the logical unit 44 after the migration.
  • S606 Write the valid data block in the logical unit 22 into the blank data block of the logical unit 44.
  • As shown in Fig. 7, after the stripe in which D50 is located is migrated to logical unit 44, blank data blocks appear in some data fragments, because the original data blocks have become invalid data blocks in logical unit 33. Therefore, when migrating the valid data blocks in logical unit 22, the blank data blocks can be filled first, and if there is overflow, new stripes are written; as shown in Fig. 7, D1, D3, and so on are all written into the stripe in which D51 is located. To distinguish it from the valid data in logical unit 33, the valid data of logical unit 22 is named padding data in the example of Fig. 7 and is represented by dotted lines.
  • With the migration manner of S605-S606, the data fragments contained in some stripes of logical unit 33 do not change after the migration, such as the stripe in which D33 is located and the stripe in which D41 is located in Fig. 7; for these stripes there is no need to recalculate the parity fragments. As for the stripe in which D50 is located and the stripe in which D58 is located, new data blocks have been filled in and the stripes have changed, so their parity fragments need to be recalculated. It can be understood that even though some stripes still require parity recalculation, at least part of the parity calculation is eliminated, which saves computing resources. In addition, since parity fragments are calculated by the master node and then sent to the corresponding nodes (node 5 and node 6 shown in Fig. 7) for storage, when no parity fragment needs to be calculated, no calculated parity fragment needs to be sent either, which also saves bandwidth resources.
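  • A sketch of the blank-block filling in S606 is shown below: valid blocks of logical unit 22 (the "padding data") first fill blank data blocks left in logical unit 44, and any overflow is written as new stripes. The list-of-lists representation and the function name are assumptions made for illustration.

    ```python
    def fill_blank_blocks(target_stripes, padding_blocks, stripe_width=4):
        """Write the valid blocks of the higher-garbage source unit (the
        padding data of logical unit 22 in Fig. 7) into the blank slots of
        the target unit, then append any overflow as new stripes (S606).
        `target_stripes` is assumed to be a list of lists in which None marks
        a blank data block left behind by an invalid block of logical unit 33."""
        padding = list(padding_blocks)
        for stripe in target_stripes:
            for i, block in enumerate(stripe):
                if block is None and padding:
                    stripe[i] = padding.pop(0)
        while padding:                      # overflow goes into new stripes
            new_stripe = padding[:stripe_width]
            padding = padding[stripe_width:]
            new_stripe += [None] * (stripe_width - len(new_stripe))
            target_stripes.append(new_stripe)
        return target_stripes
    ```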
  • Further, in this embodiment, logical unit 44 can also inherit the identifier of logical unit 33. In this way, for data blocks such as D33 and D34, not only do their offsets within the logical unit remain unchanged, but the identifier of the logical unit in which they are located also remains unchanged, which means that the logical addresses of these data blocks do not change (a logical address includes the identifier of the logical unit and the offset within that logical unit). Therefore, modification of the metadata of valid data blocks such as D33 and D34 and forwarding of the modified metadata between storage nodes are avoided, which further saves network bandwidth.
  • S607 The master node releases the storage space occupied by the logical unit 22 and the logical unit 33. For this step, refer to the description of S406 shown in FIG. 4.
  • This embodiment also provides a storage node, and the storage node may be a storage array or a server.
  • When the storage node is a storage array, it includes a storage controller and a storage medium; for the structure of the storage controller, refer to the schematic structural diagram of Fig. 8. When the storage node is a server, the structural diagram of Fig. 8 can also be referred to. Therefore, no matter what type of device the storage node is, it includes at least a processor 801 and a memory 802.
  • a program 803 is stored in the memory 802.
  • the processor 801, the memory 802, and the interface 804 are connected through the system bus 805 and complete mutual communication.
  • The processor 801 is a single-core or multi-core central processing unit, or an application-specific integrated circuit, or one or more integrated circuits configured to implement the embodiments of the present invention. The memory 802 may be a random access memory (RAM) or a non-volatile memory, for example at least one hard disk memory.
  • the memory 802 is used to store computer execution instructions.
  • the program 803 may be included in the computer execution instruction.
  • When the storage node runs, the processor 801 runs the program 803 to execute the method flow of S401-S406 shown in Fig. 4, or to execute the method flow of S601-S607 shown in Fig. 6.
  • Referring to Fig. 9, this embodiment also provides a garbage collection device. The device is located in the master node of a distributed storage system, the distributed storage system includes multiple storage nodes, and the master node is one of the multiple storage nodes. The garbage collection device includes the following modules.
  • The selection module 901 is configured to select a target node from the multiple storage nodes according to the amount of valid data that the source logical unit has distributed on each storage node, where the amount of the first valid data stored by the target node exceeds a set threshold.
  • the specific functions of this module can refer to S401, S402, and S403 shown in Fig. 4, and S601, S602, and S603 shown in Fig. 6.
  • the function of this module can be executed by the processor 801 shown in FIG. 8 running the program 803 in the memory 802.
  • the creation module 902 is configured to create a target logical unit, and the storage nodes distributed by the target logical unit include the target node.
  • For the specific functions of this module, refer to S404 shown in Fig. 4 and S604 shown in Fig. 6.
  • the function of this module can be executed by the processor 801 shown in FIG. 8 running the program 803 in the memory 802.
  • the instruction module 903 is configured to instruct the target node to migrate the first valid data from the first source address in the target node to the first target address in the target node, where the first source address indicates The storage space belongs to the source logical unit, and the storage space indicated by the first target address belongs to the target logical unit.
  • For the specific functions of this module, refer to S405 shown in Fig. 4 and S605 and S606 shown in Fig. 6.
  • the function of this module can be executed by the processor 801 shown in FIG. 8 running the program 803 in the memory 802.
  • the releasing module 904 is configured to release the storage space indicated by the first source address.
  • the specific functions of this module can refer to S406 shown in FIG. 4 and S607 shown in FIG. 6.
  • the function of this module can be executed by the processor 801 shown in FIG. 8 running the program 803 in the memory 802.
  • Optionally, the creation module 902 is further configured to create a migration list before the master node instructs the target node to migrate the first valid data from the first source address in the target node to the first target address in the target node, the migration list including the first source address of the first valid data and the first target address of the first valid data.
  • the garbage collection device may further include a sending module 905, which is used to send the migration list to the target node.
  • the migration list further includes a second source address of the second valid data stored in the other storage node and a second target address of the second valid data, and the second source address is located in the other storage node.
  • the second target address is located in the target node.
  • the sending module 905 is further configured to send the migration list to the other storage nodes.
  • the instruction module 903 is further configured to instruct the other storage nodes to migrate the stored second valid data from the second source address to the second target address, and the storage space indicated by the second source address belongs to the The source logical unit, and the storage space indicated by the second target address belongs to the target logical unit.
  • the releasing module 904 is also used to release the storage space indicated by the second source address.
  • both the first source address and the first target address are located in the first hard disk of the target node.
  • the first source address is located in a first hard disk of the target node
  • the first target address is located in a second hard disk of the target node.
  • Optionally, the instruction module 903 is specifically configured to instruct the target node to migrate the first valid data from the first source address to the first target address according to the offset of the first valid data within the source logical unit, so that the offset of the first valid data within the target logical unit after the migration is the same as the offset of the first valid data within the source logical unit before the migration.
  • Optionally, the first valid data is distributed in a first stripe of the source logical unit before the migration and in a second stripe of the target logical unit after the migration, and the instruction module 903 is further configured to determine whether the data fragments included in the first stripe are the same as the data fragments included in the second stripe, and to retain the parity fragments included in the first stripe when the data fragments included in the first stripe are the same as the data fragments included in the second stripe.
  • The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented by software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions; when the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of this application are generated in whole or in part.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a storage node or data center, that integrates one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, and a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).
  • the disclosed system, device, and method may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • The division of the units is only a logical function division; in actual implementation there may be other division manners, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • each unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer readable storage medium.
  • The technical solutions of this application essentially, or the part contributing to the prior art, or part of the technical solutions, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a storage node, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of this application.
  • The aforementioned storage media include any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A distributed storage system and a garbage collection method in a distributed storage system. A master node selects a target node from a plurality of storage nodes according to the amount of valid data that a source logical unit has distributed on each storage node, where the amount of the first valid data stored on the target node exceeds a set threshold. The master node creates a target logical unit, and the storage nodes over which the target logical unit is distributed include the target node; in other words, at least part of the storage space occupied by the target logical unit comes from the target node. The master node then instructs the target node to migrate the first valid data from a first source address to a first target address, and afterwards releases the storage space indicated by the first source address. Network bandwidth between storage nodes is thereby saved.

Description

分布式存储系统和分布式存储系统中垃圾回收方法 技术领域
本申请涉及存储领域,并且更具体地,涉及一种分布式存储系统和分布式存储系统中垃圾回收方法。
背景技术
在分布式存储系统中,数据通常通过追加写的方式写入系统所包含的多个存储节点中。追加写不同于覆盖写,对数据进行修改时,原来的数据并不会立即删除,因此系统中不可避免地会出现大量垃圾数据(修改后的数据是有效数据)。为了释放垃圾数据所占用的存储空间,系统会定期进行垃圾回收。垃圾回收以逻辑单元为对象,其具体过程是,在所述分布式存储系统中选择一定数量的存储节点,在这些存储节点中创建新的逻辑单元,然后将待回收的逻辑单元中的有效数据写入新的逻辑单元,再释放该待回收的逻辑单元所占用的存储空间。由于所述新的逻辑单元所位于的一定数量的存储节点通常是随机选择的,所以这些存储节点往往不同于所述待回收的存储节点所在的节点,那么在将有效数据重新写入新的逻辑单元的过程中,往往涉及到存储节点之间的数据转发,会消耗大量的带宽资源。
发明内容
本申请提供了一种分布式存储系统,以及分布式存储系统中垃圾回收方法,能够保证至少有一部分有效数据在同一个存储节点中迁移,在一定程度上减少了跨节点迁移数据,从而达到节省带宽的目的。
第一方面提供了一种分布式存储系统中的垃圾回收方法,所述分布式存储系统包括多个存储节点,其中一个存储节点是主节点。在该方法中,主节点根据源逻辑单元分布在每个存储节点中的有效数据的数据量,从所述多个存储节点中选择目标节点,所述目标节点中存储的第一有效数据的数据量超过设定的数量阈值。所述主节点创建目标逻辑单元,所述目标逻辑单元所分布的存储节点中包括所述目标节点。换而言之,所述目标逻辑单元所占用的存储空间中至少有一部分存储空间是来自于所述目标节点的。然后,所述主节点指令所述目标节点将所述第一有效数据从第一源地址迁移至第一目标地址。第一源地址和第一目标地址均是指实际地址,并且第一源地址和第一目标地址都位于所述目标节点内,然而第一源地址指示的存储空间属于所述源逻辑单元,第一目标地址指示的存储空间属于所述目标逻辑单元。当所述主节点确认所述源逻辑单元中所有的有效数据均已迁移至所述目标逻辑单元之后,释放所述源逻辑单元占用的存储空间。所述源逻辑单元占用的存储空间包括所述第一源地址指示的存储空间。
按照第一方面提供的垃圾回收方法,将源逻辑单元所分布的中保存有效数据较多的存储节点作为目标节点,主节点创建的目标逻辑单元所分布的存储节点包括所述目标节点,因此目标节点不仅为源逻辑单元提供存储空间,也为目标逻辑单元提供了存储空间。那么主节点在迁移源逻辑单元的有效数据的过程中,可以指令所述目标节点将所述第一有效数据从所述目标节点内的第一源地址迁移至所述目标节点内的第一目标地址。由于 所述第一有效数据是在所述目标节点内部迁移,因此在一定程度上避免了数据在存储节点之间的转发,节省了网络带宽。
在第一方面的第一种实现中,在所述主节点指令所述目标节点将所述第一有效数据从所述目标节点内的第一源地址迁移至所述目标节点内的第一目标地址之前,所述主节点创建迁移列表,所述迁移列表包括所述第一有效数据的第一源地址以及所述第一有效数据的第一目标地址。然后,主节点将所述迁移列表发送给所述目标节点。所述迁移列表可以根据最少迁移原则创建。
结合第一方面的第一种实现,在第一方面的第二种实现中,所述多个存储节点还包括其他存储节点,所述其他存储节点独立于所述目标逻辑单元所分布的存储节点,所述迁移列表还包括所述其他存储节点内存储的第二有效数据的第二源地址和所述第二有效数据的第二目标地址,所述第二源地址位于所述其他存储节点中,所述第二目标地址位于所述目标节点中。所述主节点还将所述迁移列表发送给所述其他存储节点。其他存储节点是源逻辑单元所分布的中保存有效数据较少的存储节点,没有被选作目标节点,因此没有为目标逻辑单元提供存储空间。在这种情况下,所述其他存储节点需要将它存储的第二有效数据迁移至目标节点。如果目标节点有多个,则可以迁移至任意一个目标节点。或者,所述其他存储节点还可以将所述第二有效数据迁移至所述目标逻辑单元所位于的除目标节点之外的其他节点。
结合第一方面的第一种实现,在第一方面的第三种实现中,所述第一源地址和所述第一目标地址均位于所述目标节点的第一硬盘中。在这种情况下,所述目标节点可以将所述第一源地址和所述第一目标地址发送给所述第一硬盘,由所述第一硬盘执行迁移操作。由此减轻所述目标节点的处理器的负担。
结合第一方面的第一种实现,在第一方面的第四种实现中,所述第一源地址位于所述目标节点的第一硬盘中,所述第一目标地址位于所述目标节点的第二硬盘中。在这种情况下,具体的迁移操作则是所述目标节点的处理器将所述第一有效数据从所述第一源地址读取至缓存,再从缓存中写入所述第一目标地址。
结合第一方面的第一种实现,在第一方面的第四种实现中,所述主节点指令所述目标节点将所述第一有效数据从所述目标节点内的第一源地址迁移至所述目标节点内的第一目标地址包括所述主节点指令所述目标节点根据所述第一有效数据位于所述源逻辑单元内的偏移量将所述第一有效数据从所述第一源地址迁移至所述第一目标地址,使得所述迁移后所述第一有效数据位于所述目标逻辑单元内的偏移量与迁移前所述第一有效数据位于所述源逻辑单元内的偏移量相同。按照这种迁移方式,第一有效数据位于所述源逻辑单元中的位置和位于所述目标逻辑单元中的位置相同。如果所述第一有效数据所位于的第一分条中所包含的所有的有效数据都按照这样的方式迁移,则所述第一分条所包含的数据分片在迁移前后不会发生变化,因此不需要重新计算校验分片,保留所述第一分条原有的校验分片即可。从而减轻了主节点的计算量,节省了计算资源。
结合第一方面的第四种实现,在第一方面的第五种实现中,当所述源逻辑单元中所有的有效数据均迁移至所述目标逻辑单元,并且释放了所述源逻辑单元所占用的存储空间之后,所述主节点可以将所述目标逻辑单元的标识修改为所述源逻辑单元的标识。由于数据的逻辑地址是由该数据所在的逻辑单元的标识,以及在所述逻辑单元内的偏移量 组成的。因为所述目标逻辑单元继承了所述源逻辑单元的标识,并且由第四种实现可知,所述第一有效数据位于所述源逻辑单元中的位置和位于所述目标逻辑单元中的位置相同,因此所述第一有效数据的逻辑地址在迁移前后并没有发生变化,从而避免了所述第一有效数据的元数据的修改,以及修改后的元数据在存储节点间的转发,进一步节省了网络带宽。
本申请第二方面提供了一种主节点,所述主节点位于分布式存储系统中,所述分布式存储系统包括多个存储节点,所述主节点包括接口和处理器,其中所述接口用于与所述多个存储节点进行通信;所述处理器用于执行第一方面提供的任意一种实现。
本申请第三方面提供了一种垃圾回收装置,所述装置位于分布式存储系统的主节点中,所述分布式存储系统包括多个存储节点,所述主节点是所述多个存储节点中的一个存储节点,所述垃圾回收装置用于执行第一方面提供的任意一种实现。
本申请第四方面提供了一种垃圾回收的计算机程序产品,包括存储了程序代码的计算机可读存储介质,所述程序代码包括的指令用于执行第一方面所描述的方法。
附图说明
图1是本发明实施例提供的应用场景图;
图2是本发明实施例提供的逻辑单元的示意图;
图3是本发明实施例提供的一种垃圾回收方法的效果示意图;
图4是本发明实施例提供的一种垃圾回收方法的流程示意图;
图5是本发明实施例提供的一种迁移列表的示意图;
图6是本发明实施例提供的另一种垃圾回收方法的流程示意图;
图7是本发明实施例提供的另一种垃圾回收方法的效果示意图;
图8是本发明实施例提供的主节点的结构示意图;
图9是本发明实施例提供的主节点的垃圾回收装置的结构示意图。
具体实施方式
本申请实施例在垃圾回收(garbage collection)时,能够保证至少有一部分有效数据在同一个存储节点中迁移,在一定程度上减少了跨节点迁移数据,从而达到节省带宽的目的。下面将结合附图,对本发明实施例中的技术方案进行描述。
本申请实施例的技术方案可以应用于各种存储系统。在下文中以分布式存储系统为例描述本申请实施例的技术方案,但本发明实施例对此并不限定。在分布式存储系统中,数据分散存储在多台存储节点(下面简称为“节点”)上,由多台存储节点分担存储负荷,这种存储方式不但提高了系统的可靠性、可用性和存取效率,还易于扩展。存储节点例如是服务器,或者是存储控制器和存储介质的组合。
图1是可应用本实施例的技术方案的场景的示意图。如图1所示,多个客户端服务器(client server)101和存储系统100通信,存储系统100包括交换机103和多个存储节点(或简称“节点”)104等。其中,交换机103是可选设备。每个存储节点104可以包括多个机械硬盘或者其他类型的存储介质(例如固态硬盘或者叠瓦式磁记录),用于存储数据。
图2是本实施例提供的逻辑单元的示例。逻辑单元是一段逻辑空间,每个逻辑单元的实际的物理空间来自多个节点。逻辑单元所占用的节点的数量取决于该逻辑单元对应的独立硬盘冗余阵列(Redundant Array of Independent Disks,RAID)类型。如图2所示,节点2、节点3、节点4、节点5、节点6和节点7各自提供一部分存储空间,从而构建出RAID类型为“4+2”的逻辑单元1,其中,节点2、节点3、节点4和节点5用于存储数据分片,节点6和节点7用于存储校验分片。在这6个节点中,有一个节点(例如,节点2)被选举为主节点。主节点将接收的数据划分成4个数据分片,并计算4个数据分片的2个校验分片,再将每个数据分片及其校验分片发送到相应的节点进行存储。主节点可以是其中一个分片所在的节点,也可以是独立于逻辑单元1之外的节点。在数据分片写入节点时,通常会以设定的粒度写入,粒度的大小例如8KB或者16KB。一个数据分片或者校验分片按照所述设定的粒度可以划分为多个数据块。例如节点2中存储的一个数据分片包括数据块D1和数据块D2,节点3中存储的一个数据分片包括数据D3和D4,……,节点6中存储的一个校验分片包括Q1和Q2等等。校验分片包括Q1、Q2,以及P1、P2。D1、D2、D3、D4、D5、D6、D7、D8和Q1、Q2、P1、P2共同组成一个分条(stripe)。当任意两个数据分片/校验分片发生损坏时,可以利用其它分片进行恢复,从而保证数据的可靠性。示例性的,逻辑单元1还可以包括另一个分条,它由D9、D10、D11、D12、D13、D14、D15、D16和Q3、Q4、P3、P4组成。对于每个分片(数据分片或者校验分片)而言,它所在的逻辑单元的标识以及位于所述逻辑单元内部的位置组成所述分片的逻辑地址,该分片位于节点中的实际地址是所述分片的物理地址。
每个逻辑单元可以包含一个或者多个分条,本实施例中不限定逻辑单元所包含的分条的数量,图2仅为示例。逻辑单元2和逻辑单元3的情况与逻辑单元1类似,这里不再详细描述。本实施例并不限定节点的数量,也不限定逻辑单元的数量以及其所对应的RAID类型。
在实际应用中,系统往往采用追加写的模式将数据写入逻辑单元,当一个逻辑单元写满之后系统会分配新的逻辑单元供数据写入。随着数据的不断修改,修改前写入的数据会变成无效数据。这些无效数据不会被读取,但仍然占据着存储空间。因此,当系统空间不足时就需要回收逻辑单元以释放存储空间了。追加写也称为写时重定向ROW(redirect-On-write)。
在本实施例中,逻辑单元是垃圾回收的基本单位。换而言之,当一定条件触发时,系统从多个逻辑单元中选择出一个或多个待回收的逻辑单元(又称为源逻辑单元),将这些逻辑单元中的有效数据迁移至其他地方以后再释放这些逻辑单元,以达到回收存储空间的目的。
下面结合图3和图4介绍本实施例提供的垃圾回收方法。该方法可以应用在图1所示的分布式存储系统中,垃圾回收的对象为如图2所示的逻辑单元。图3是垃圾回收方法的效果示意图,图4是是垃圾回收方法的流程示意图。如图4所示,该方法包括以下步骤。
在S401中,主节点确定源逻辑单元。该步骤通常在一定触发条件下进行,例如系统中垃圾数据的数据量达到特定数量阈值,或者系统中可用的存储空间的大小低于特定空间阈值,或者满足回收条件的逻辑单元达到一定数量等等。所述源逻辑单元也需要满足 一定条件,例如该逻辑单元所包含的垃圾数据的数据量达到第一垃圾阈值,或者该逻辑单元所包含的有效数据的数据量低于第二垃圾阈值等等。通常情况下,确定出的源逻辑单元可以是一个也可以是多个。以图3为例,假设确定出的源逻辑单元为逻辑单元1、逻辑单元2和逻辑单元3。
在S402中,主节点确定源逻辑单元所位于的节点。假设逻辑单元1、逻辑单元2和逻辑单元3都是源逻辑单元。由图3可知,逻辑单元1对应的RAID类型为“4+2”,分布在节点2、节点3、节点4、节点5、节点6和节点7中。逻辑单元2的RAID类型与逻辑单元1一致,它分布在节点1、节点2、节点3、节点5、节点6和节点7。逻辑单元3的RAID类型与逻辑单元1一致,它分布在节点1、节点2、节点3、节点4、节点5和节点7。因此,源逻辑单元所位于的节点包括节点1、节点2、节点3、节点4、节点5、节点6和节点7。
在S403中,主节点统计所述源逻辑单元所位于的节点包含的有效数据的数据量,并选择出有效数据的数据量超过设定的数量阈值的节点作为目标节点。实际应用中,往往按照上面描述的数据块的粒度来统计有效数据的数据量。如果某个数据块中只包含有效数据,这样的数据块被称为有效数据块(如图3中白色的Dn所示),如果某个数据块中包含无效数据,那么这个数据块被称为无效数据块(如图3中灰色的Dn所示)。另外,图3中的P、Q数据块中存储的是校验数据,由于通常情况下在垃圾回收完成之后原有的分条所包含的数据分片发生变化,因此会重新计算并存储校验分片以保证可靠性。由此,有效和无效仅针对数据分片中的数据块;P、Q数据块没有有效数据和无效数据之分,只统计数据分片所包含的有效数据块即可。
示例性的,如图3所示,节点2、节点3、节点4和节点5均包含3个或4个有效数据块,如果数量阈值为2,则可以将节点2、节点3、节点4和节点5均作为目标节点。然而,图3仅是一种示例,在本实施例中,只要有效数据的数据量超过设定的数量阈值的节点都可以作为目标节点,目标节点的数量可以是一个也可以是多个,本实施例不做限定。
在S404中,所述主节点创建目标逻辑单元(如图3所示的逻辑单元4),所述目标逻辑单元所占用的存储空间至少有部分来源于所述目标节点。新创建的目标逻辑单元的RAID类型与源逻辑单元(逻辑单元1、逻辑单元2和逻辑单元3)的RAID类型一致,因此逻辑单元4需要跨越6个节点,所述6个节点包括S403中选择出的目标节点,若选择出的目标节点不够,则再从所述分布式存储系统中选择一些节点以凑够6个节点。例如:假设S403中的目标节点的数量是4个,那么需要再选择2个节点。如图3所示,逻辑单元4的存储空间来源于节点2、节点3、节点4、节点5、节点6和节点7。其中,节点2、节点3、节点4、节点5是S403中选择出的目标节点,节点6和节点7是另外选择的两个节点,目标节点之外的节点的选择策略,可以是负载均衡或者随机原则,本实施例不做限定。逻辑单元4的RAID类型为“4+2”,节点2、节点3、节点4、节点5可以用于数据分片,节点6和节点7用于存储校验分片。
在S405中,将所述源逻辑单元中的有效数据块迁移到目标逻辑单元(逻辑单元4)。由于所述源逻辑单元中的有效数据块是分布在多个节点上的,因此具体的,主节点需要向各个有效数据块所在的节点发送指令,指示该节点将其存储的有效数据块迁移到目标 逻辑单元。这里可以分为两种情况,情况1,对于目标节点而言,有效数据块只需要在该节点内部迁移。在这种情况下,主节点可以指令所述目标节点将所述有效数据块从所述目标节点内的第一源地址迁移至所述目标节点内的第一目标地址,所述第一源地址和所述第一目标地址都是实际地址,所述第一源地址指示的存储空间属于所述源逻辑单元,所述第一目标地址指示的存储空间属于所述目标逻辑单元。例如,数据块D1的源地址和目标地址都位于节点2内部,因此数据块D1只需要在节点2内部迁移。情况2,对于除目标节点之外的节点而言,则需要将该节点内存储的有效数据块发送给其中一个目标节点。在这种情况下,主节点指令该节点将存储的有效数据块从第二源地址迁移至第二目标地址。例如数据块D19,它的源地址位于节点1内,而目标地址位于节点2内。由于节点1并没有给逻辑单元4提供存储空间,因此需要将D26发送给其中一个目标节点(如图3所示的节点2),由节点2将D26保存在逻辑单元4中。
一种可选的实施方式是,在S405之前主节点根据最少迁移策略为各个有效数据块分配目标地址,并创建迁移列表50(如图5所示)。该迁移列表50包括每个有效数据块的源地址和目标地址。有效数据块的源地址是指有效数据块迁移前的实际地址,有效数据块的目标地址是指所述有效数据块迁移后的实际地址。最少迁移策略是一种迁移策略,是指使得跨节点的迁移数据尽量最少的一种策略,尽量避免数据跨节点迁移的原则。例如,对于目标节点(如节点2)而言,它既为源逻辑单元提供存储空间也为目标逻辑单元提供存储空间,因此该目标节点中的有效数据块无需迁移至其他节点。而对于非目标节点(如节点6),由于它并没有为目标逻辑单元提供存储空间,所以不得不将该节点中的有效数据块迁移至所述目标逻辑单元所在的节点。主节点创建迁移列表50后,将所述列表50发送给有效数据块所在的节点,并指示这些节点按照列表中的目标地址进行迁移。需要特别说明的是,最少迁移策略只是迁移策略中的一种,本发明实施例还可以使用其他迁移策略。由于现有技术中迁移的目的节点是随机选择的,因此,只要有至少一个节点的数据块被指定为:在迁移之后仍然保留在本节点,换句话说,对这个数据块而言,源节点和目标节点是同一个节点。那么,和现有技术相比,就可以产生减少跨节点迁移的有益效果,因此属于本发明实施例所欲保护的范围。
另一种可选的实施方式是,主节点无需生成迁移列表50,而是在根据所述最少迁移策略为各个有效数据块分配目标地址之后直接指示所述有效数据块所在的节点按照所述目标地址进行数据迁移。
进一步地,对于目标节点而言,有效数据块是在其节点内部实现迁移,然而在不同的场景下处理方式也有所差异。若一个有效数据块的源地址和目标地址指向不同硬盘,那么在迁移时需要将该有效数据块从源地址读取至节点内的缓存,再从缓存中获取该数据块重新写入目标地址。例如数据块D5,其源地址位于节点4的硬盘0,而目标地址位于节点4的硬盘1,此时节点4则需要从硬盘0中读取D5至缓存,再从缓存中获取D5并写入硬盘1。若一个有效数据块的源地址和目标地址指向同一个硬盘,那么迁移时则无需将有效数据块从硬盘读取至缓存,直接在硬盘内部实现迁移即可。此时,该节点的处理器可以向所述有效数据块所在的硬盘发送迁移指令,该迁移指令包括所述有效数据块的源地址和目标地址。所述硬盘可直接从源地址读取数据,再写入目标地址。例如,数据块D3,其源地址和目标地址均位于节点3的硬盘0,那么节点3的处理器向硬盘0的 读写芯片发送迁移指令,所述读写芯片将D3从盘内偏移地址2写入盘内偏移地址10。在本实施例中,盘内偏移地址用于指示数据存储在硬盘内的具体位置。
对于目标逻辑单元(如图3所示的逻辑单元4),在各个数据分片写入相应的节点之后,主节点还需要计算这些数据分片的校验分片。校验分片包括校验数据块,如图3所示的P、Q就是校验数据块。校验数据块计算完成之后,主节点再将校验分片(校验分片包括校验数据块)发送给相应的节点的存储。
当源逻辑单元中的所有有效数据均迁移至目标逻辑单元之后,主节点对数据的元数据进行更新。元数据包括数据的逻辑地址和物理地址,逻辑地址是指该数据所位于的逻辑单元的标识以及在所述逻辑单元内部的偏移量。可以理解的是,有效数据块从源逻辑单元迁移至目标逻辑单元后,其逻辑地址会发生变化。为了使得客户端服务器101后续能够读取到正确的数据,主节点需要修改数据的逻辑地址。物理地址是指实际存储所述数据的物理位置,它指示了该数据所在的节点的标识,在节点内的硬盘的标识以及盘内偏移地址(可参考图5)。当数据真实地从一个节点迁移至另一个节点,或者从一个硬盘迁移至另一个硬盘,或者在同一个硬盘内部迁移其物理地址都会发生变化,变化后的物理地址需记录在数据的元数据中。
在S406中,主节点释放所述源逻辑单元(图3所示的逻辑单元1、逻辑单元2和逻辑单元3)所占用的存储空间。在释放之前删除所述源逻辑单元中存储的所有数据,包括有效数据和无效数据。释放后所获得的存储空间可供其他逻辑单元使用。需要说明的是,S406发生在源逻辑单元中的有效数据全部迁移至目标逻辑单元之后,它具体包括:主节点分别释放所述源逻辑单元所分布在各个存储节点中的相应的存储空间。
按照图4所示的垃圾回收方法,将包含有效数据较多的节点选择出来继续为目标逻辑单元提供存储空间,那么这些节点中存储的有效数据就可以继续保留在该节点中,避免了节点间的转发,节省了网络带宽。即使有少量位于其他节点上的有效数据仍然需要发送给这些包含有效数据较多的节点,与现有技术相比,也能在一定程度上节省网络带宽。
进一步的,S405至少有两种实现方式。一种实现是将逻辑单元1、逻辑单元2和逻辑单元3中的各个有效数据块迁移至逻辑单元4对应的存储空间即可,不考虑有效数据块的逻辑地址。换而言之,迁移前、后相比,有效数据块在逻辑单元内的所处的位置会发生变化。那么,一个分条所包含的数据分片也会发生变化,这种情况下,不得不重新计算校验分片。另一种实现方式是,将有效数据块从源逻辑单元迁移至目标逻辑单元的过程中,按照所述有效数据块原有的逻辑地址进行迁移,使得迁移后的各个有效数据块在逻辑单元4内的偏移量与原来的偏移量一致。按照这种迁移方式,在大量逻辑单元需要被回收的情况下,大概率地会出现:和迁移前相比,某些分条在迁移之后,其包含的数据分片也没有发生变化,对于这些分条就不需要重新计算校验分片了。因此,和前一种方式相比,这种方式可以节省系统的计算资源。下面以一个具体的例子来说明。
请参考图6和图7,图6是本实施例提供的另一种垃圾回收方法。该方法可以应用在图1所示的分布式存储系统中,垃圾回收的对象为如图2所示的逻辑单元。图6是所述方法的流程示意图,图7是所述方法的效果示意图。如图6所示,该方法包括以下步骤。
S601,主节点确定至少两个源逻辑单元。该步骤通常在一定触发条件下进行,这里 的触发条件与图4所示的S401中的触发条件一致,可参考S401的描述。所述至少两个源逻辑单元也需要满足一定条件。可选的,主节点可以设置所述源逻辑单元符合一定条件,具体的条件设置可以参考S401的描述。
示例性的,以确定两个源逻辑单元为例,这两个逻辑单元分别为逻辑单元22和逻辑单元33。本实施例可以设置每个源逻辑单元均满足所述触发条件,也可以设置只需要其中任意一个源逻辑单元满足所述触发条件,还可以设置逻辑单元22的垃圾数据的数据量达到第一垃圾阈值时,逻辑单元33的垃圾数据的数据量低于第二垃圾阈值。这样的设置使得两个源逻辑单元所包含的垃圾数据的数据量有一定的差异。进一步地,在某些场景下,逻辑单元22可以是拥有的垃圾数据的数据量最高的逻辑单元,而逻辑单元33是拥有的垃圾数据的数据量最低的逻辑单元。同理,本实施例也可以设定其他等同条件筛选两个源逻辑单元。
S602,主节点确定源逻辑单元所位于的节点。逻辑单元22和逻辑单元33所位于的节点可参考图7的示例。
S603,所述主节点统计所述源逻辑单元所位于的节点包含的有效数据的数据量,并选择出有效数据的数据量超过设定的数量阈值的节点作为目标节点。S603的具体实现和S403一致,请参考S403的描述。
S604,所述主节点创建目标逻辑单元(例如图7所示的逻辑单元44),所述目标逻辑单元所占用的存储空间至少有部分来源于所述目标节点。S604的具体实现和S404一致,请参考S404的描述。在图7的示例中,逻辑单元44所位于的节点,与逻辑单元22、逻辑单元33所在的节点完全重合,这只是一种示例,应理解,在实际应用场景中,逻辑单元44所位于的节点可以仅与逻辑单元22、逻辑单元33所在的节点部分重合(如图3所示)。
S605,将逻辑单元33中的有效数据块迁移至逻辑单元44,在迁移的过程中不改变各个有效数据块在逻辑单元内的偏移量。也就是说,这些有效数据块迁移前在逻辑单元33内的偏移量与迁移后在逻辑单元44内的偏移量相同。
S606,将逻辑单元22中的有效数据块写入逻辑单元44的空白的数据块中。如图7所示,D50所在的分条迁移至逻辑单元44后,某些数据分片中会出现空白的数据块,因为原来的数据块在逻辑单元33中变成无效数据块了。因此,在迁移逻辑单元22中的有效数据块时,可以优先将空白的数据块填满,如有溢出再写入新的分条。如图7所示的D1、D3等都写入了D51所在的分条。为了和逻辑单元33中的有效数据相区别,在图7的示例中将逻辑单元22中的有效数据命名为填充数据,用虚线表示。
按照S605-S606的迁移方式,逻辑单元33中有一些分条所包含的数据分片在迁移后没有发生改变,如图7中D33所在的分条以及D41所在的分条。因此,D33所在的分条和D41所在的分条就不用重新计算校验分片。而对于D50所在的分条,以及D58所在的分条,由于都有新的数据块填充进来,分条发生了改变,因此需要重新计算校验分片。可以理解的是,即使仍然有一些分条需要重新计算校验分片,但至少减少了一部分校验分片的计算量,节省了计算资源。另一方面,由于校验分片是由主节点计算之后再发送给相应的节点(如图7所示的节点5和节点6)存储,既然不需要计算校验分片了,自然也不会发送计算后的校验分片了,节省了带宽资源。
进一步地,在本实施例中,逻辑单元44还可以继承逻辑单元33的标识,这样一来,对于D33、D34等数据块来说,不但在逻辑单元内的偏移量没有发生改变,它们所位于的逻辑单元的标识也没有发生改变,那么相当于这些数据块的逻辑地址没有改变(逻辑地址包括逻辑单元的标识以及位于该逻辑单元内的偏移量)。因此,避免了D33、D34等有效数据块的元数据的修改,以及修改后的元数据在存储节点间的转发,进一步节省了网络带宽。
S607,主节点释放逻辑单元22和逻辑单元33所占用的存储空间。该步骤可参考图4所示的S406的描述。
本实施例还提供了一种存储节点,所述存储节点可以是存储阵列,也可以是服务器。当存储节点是存储阵列时,该存储节点包括存储控制器和存储介质。所述存储控制器的结构可以参考图8的结构示意图。当存储节点是服务器时,也可以参考图8的结构示意图。由此,无论存储节点是哪种形态的设备,都至少包括了处理器801和存储器802。所述存储器802中存储有程序803。处理器801、存储器802和接口804之间通过系统总线805连接并完成相互间的通信。
处理器801是单核或多核中央处理单元,或者为特定集成电路,或者为被配置成实施本发明实施例的一个或多个集成电路。存储器802可以为随机存取存储器(Random Access Memory,RAM),也可以为非易失性存储器(non-volatile memory),例如至少一个硬盘存储器。存储器802用于存储计算机执行指令。具体的,计算机执行指令中可以包括程序803。当所述存储节点运行时,处理器801运行所述程序803以执行图4所示的S401-S406的方法流程,或者执行图6所示S601-S607的方法流程。
请参考图9,本实施例还提供一种垃圾回收装置,所述装置位于分布式存储系统的主节点中,所述分布式存储系统包括多个存储节点,所述主节点是所述多个存储节点中的一个存储节点,所述垃圾回收装置包括以下模块。
选择模块901,用于根据源逻辑单元分布在每个存储节点中的有效数据的数据量,从所述多个存储节点中选择目标节点,所述目标节点存储的第一有效数据的数据量超过设定的数量阈值。该模块的具体功能可参考图4所示的S401、S402、S403,以及图6所示的S601、S602和S603。另外,该模块的功能可由图8所示的处理器801运行存储器802中的程序803执行。
创建模块902,用于创建目标逻辑单元,所述目标逻辑单元所分布的存储节点中包括所述目标节点。该模块的具体功能可参考图4所示的S404以及图6所示的S604。另外,该模块的功能可由图8所示的处理器801运行存储器802中的程序803执行。
指示模块903,用于指令所述目标节点将所述第一有效数据从所述目标节点内的第一源地址迁移至所述目标节点内的第一目标地址,所述第一源地址指示的存储空间属于所述源逻辑单元,所述第一目标地址指示的存储空间属于所述目标逻辑单元。该模块的具体功能可参考图4所示的S405以及图6所示的S605和S606。另外,该模块的功能可由图8所示的处理器801运行存储器802中的程序803执行。
释放模块904,用于释放所述第一源地址指示的存储空间。该模块的具体功能可参考图4所示的S406以及图6所示的S607。另外,该模块的功能可由图8所示的处理器801运行存储器802中的程序803执行。
可选的,创建模块902还用于在所述主节点指令所述目标节点将所述第一有效数据从所述目标节点内的第一源地址迁移至所述目标节点内的第一目标地址之前,创建迁移列表,所述迁移列表包括所述第一有效数据的第一源地址以及所述第一有效数据的第一目标地址。所述垃圾回收装置还可以包括发送模块905,该模块用于将所述迁移列表发送给所述目标节点。
可选的,所述迁移列表还包括所述其他存储节点内存储的第二有效数据的第二源地址和所述第二有效数据的第二目标地址,所述第二源地址位于所述其他存储节点中,所述第二目标地址位于所述目标节点中。所述发送模块905还用于将所述迁移列表发送给所述其他存储节点。指示模块903还用于指令所述其他存储节点将存储的所述第二有效数据从所述第二源地址迁移至所述第二目标地址,所述第二源地址指示的存储空间属于所述源逻辑单元,所述第二目标地址指示的存储空间属于所述目标逻辑单元。释放模块904还用于释放所述第二源地址指示的存储空间。
Optionally, the first source address and the first target address are both located in a first hard disk of the target node.
Optionally, the first source address is located in a first hard disk of the target node, and the first target address is located in a second hard disk of the target node.
Optionally, the instruction module 903 is specifically configured to instruct the target node to migrate the first valid data from the first source address to the first target address according to the offset of the first valid data within the source logical unit, so that the offset of the first valid data within the target logical unit after the migration is the same as the offset of the first valid data within the source logical unit before the migration.
Optionally, the first valid data is distributed in a first stripe of the source logical unit before the migration and in a second stripe of the target logical unit after the migration. The instruction module 903 is further configured to determine whether the data shards contained in the first stripe are the same as the data shards contained in the second stripe, and, when they are the same, retain the parity shards contained in the first stripe.
All or part of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof. When software is used for implementation, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of this application are generated wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device such as a storage node or a data center integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).
It should be understood that, in the embodiments of this application, terms such as "first" are merely used to refer to objects and do not indicate an order of the corresponding objects.
A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware or a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the particular application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementation should not be considered to go beyond the scope of this application.
A person skilled in the art may clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, and details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative. For example, the division into units is merely a division by logical function; in actual implementation there may be other divisions, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of this application essentially, or the part contributing to the prior art, or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a storage node, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of this application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (21)

  1. A garbage collection method in a distributed storage system, the distributed storage system comprising multiple storage nodes, wherein the method comprises:
    selecting, by a master node among the multiple storage nodes, a target node from the multiple storage nodes according to the amount of valid data of a source logical unit distributed in each storage node, wherein the amount of first valid data stored in the target node exceeds a set quantity threshold;
    creating, by the master node, a target logical unit, wherein the storage nodes across which the target logical unit is distributed include the target node;
    instructing, by the master node, the target node to migrate the first valid data from a first source address within the target node to a first target address within the target node, wherein the storage space indicated by the first source address belongs to the source logical unit and the storage space indicated by the first target address belongs to the target logical unit; and
    releasing, by the master node, the storage space indicated by the first source address.
  2. The method according to claim 1, wherein before the master node instructs the target node to migrate the first valid data from the first source address within the target node to the first target address within the target node, the method further comprises:
    creating, by the master node, a migration list, wherein the migration list comprises the first source address of the first valid data and the first target address of the first valid data; and
    sending, by the master node, the migration list to the target node.
  3. The method according to claim 2, wherein the multiple storage nodes further comprise another storage node, the other storage node is independent of the storage nodes across which the target logical unit is distributed, the migration list further comprises a second source address of second valid data stored in the other storage node and a second target address of the second valid data, the second source address is located in the other storage node, and the second target address is located in the target node, and the method further comprises:
    sending, by the master node, the migration list to the other storage node;
    instructing, by the master node, the other storage node to migrate the stored second valid data from the second source address to the second target address, wherein the storage space indicated by the second source address belongs to the source logical unit and the storage space indicated by the second target address belongs to the target logical unit; and
    releasing, by the master node, the storage space indicated by the second source address.
  4. The method according to claim 1, wherein the first source address and the first target address are both located in a first hard disk of the target node.
  5. The method according to claim 1, wherein the first source address is located in a first hard disk of the target node, and the first target address is located in a second hard disk of the target node.
  6. The method according to claim 1, wherein the instructing, by the master node, the target node to migrate the first valid data from the first source address within the target node to the first target address within the target node comprises:
    instructing, by the master node, the target node to migrate the first valid data from the first source address to the first target address according to the offset of the first valid data within the source logical unit, so that the offset of the first valid data within the target logical unit after the migration is the same as the offset of the first valid data within the source logical unit before the migration.
  7. The method according to claim 6, wherein the first valid data is distributed in a first stripe of the source logical unit before the migration and in a second stripe of the target logical unit after the migration, and the method further comprises:
    determining whether the data shards contained in the first stripe are the same as the data shards contained in the second stripe; and
    when the data shards contained in the first stripe are the same as the data shards contained in the second stripe, retaining the parity shards contained in the first stripe.
  8. A master node, wherein the master node is located in a distributed storage system, the distributed storage system comprises multiple storage nodes, and the master node comprises an interface and a processor, wherein
    the interface is configured to communicate with the multiple storage nodes; and
    the processor is configured to:
    select a target node from the multiple storage nodes according to the amount of valid data of a source logical unit distributed in each storage node, wherein the amount of first valid data stored in the target node exceeds a set quantity threshold;
    create a target logical unit, wherein the storage nodes across which the target logical unit is distributed include the target node;
    instruct, through the interface, the target node to migrate the first valid data from a first source address within the target node to a first target address within the target node, wherein the storage space indicated by the first source address belongs to the source logical unit and the storage space indicated by the first target address belongs to the target logical unit; and
    release the storage space indicated by the first source address.
  9. The master node according to claim 8, wherein the processor is further configured to:
    before instructing the target node to migrate the first valid data from the first source address within the target node to the first target address within the target node, create a migration list, wherein the migration list comprises the first source address of the first valid data and the first target address of the first valid data; and send the migration list to the target node.
  10. The master node according to claim 8, wherein the multiple storage nodes comprise another storage node, the other storage node is independent of the storage nodes across which the target logical unit is distributed, the migration list further comprises a second source address of second valid data stored in the other storage node and a second target address of the second valid data, the second source address is located in the other storage node, and the second target address is located in the target node, and the processor is further configured to:
    send, through the interface, the migration list to the other storage node;
    instruct the other storage node to migrate the stored second valid data from the second source address to the second target address, wherein the storage space indicated by the second source address belongs to the source logical unit and the storage space indicated by the second target address belongs to the target logical unit; and
    release the storage space indicated by the second source address.
  11. The master node according to claim 8, wherein the first source address and the first target address are both located in a first hard disk of the target node.
  12. The master node according to claim 8, wherein the first source address is located in a first hard disk of the target node, and the first target address is located in a second hard disk of the target node.
  13. The master node according to claim 8, wherein the processor is specifically configured to:
    instruct the target node to migrate the first valid data from the first source address to the first target address according to the offset of the first valid data within the source logical unit, so that the offset of the migrated first valid data within the target logical unit is the same as the offset of the first valid data within the source logical unit before the migration.
  14. The master node according to claim 13, wherein the first valid data is distributed in a first stripe of the source logical unit before the migration and in a second stripe of the target logical unit after the migration, and the processor is further configured to:
    determine whether the data shards contained in the first stripe are the same as the data shards contained in the second stripe; and
    when the data shards contained in the first stripe are the same as the data shards contained in the second stripe, retain the parity shards contained in the first stripe.
  15. A garbage collection apparatus, wherein the apparatus is located in a master node of a distributed storage system, the distributed storage system comprises multiple storage nodes, the master node is one of the multiple storage nodes, and the garbage collection apparatus comprises:
    a selection module, configured to select a target node from the multiple storage nodes according to the amount of valid data of a source logical unit distributed in each storage node, wherein the amount of first valid data stored in the target node exceeds a set quantity threshold;
    a creation module, configured to create a target logical unit, wherein the storage nodes across which the target logical unit is distributed include the target node;
    an instruction module, configured to instruct the target node to migrate the first valid data from a first source address within the target node to a first target address within the target node, wherein the storage space indicated by the first source address belongs to the source logical unit and the storage space indicated by the first target address belongs to the target logical unit; and
    a release module, configured to release the storage space indicated by the first source address.
  16. The apparatus according to claim 15, wherein the creation module is further configured to: before the master node instructs the target node to migrate the first valid data from the first source address within the target node to the first target address within the target node, create a migration list, wherein the migration list comprises the first source address of the first valid data and the first target address of the first valid data; and
    the apparatus further comprises a sending module, configured to send the migration list to the target node.
  17. The apparatus according to claim 16, wherein the multiple storage nodes further comprise another storage node, the other storage node is independent of the storage nodes across which the target logical unit is distributed, the migration list further comprises a second source address of second valid data stored in the other storage node and a second target address of the second valid data, the second source address is located in the other storage node, and the second target address is located in the target node, wherein
    the sending module is further configured to send the migration list to the other storage node;
    the instruction module is further configured to instruct the other storage node to migrate the stored second valid data from the second source address to the second target address, wherein the storage space indicated by the second source address belongs to the source logical unit and the storage space indicated by the second target address belongs to the target logical unit; and
    the release module is further configured to release the storage space indicated by the second source address.
  18. The apparatus according to claim 15, wherein the first source address and the first target address are both located in a first hard disk of the target node.
  19. The apparatus according to claim 15, wherein the first source address is located in a first hard disk of the target node, and the first target address is located in a second hard disk of the target node.
  20. The apparatus according to claim 15, wherein the instruction module is specifically configured to:
    instruct the target node to migrate the first valid data from the first source address to the first target address according to the offset of the first valid data within the source logical unit, so that the offset of the first valid data within the target logical unit after the migration is the same as the offset of the first valid data within the source logical unit before the migration.
  21. The apparatus according to claim 20, wherein the first valid data is distributed in a first stripe of the source logical unit before the migration and in a second stripe of the target logical unit after the migration, and the instruction module is further configured to:
    determine whether the data shards contained in the first stripe are the same as the data shards contained in the second stripe; and
    when the data shards contained in the first stripe are the same as the data shards contained in the second stripe, retain the parity shards contained in the first stripe.
PCT/CN2019/083960 2019-04-23 2019-04-23 Distributed storage system and garbage collection method in distributed storage system WO2020215223A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201980089025.4A CN113302597A (zh) 2019-04-23 2019-04-23 Distributed storage system and garbage collection method in distributed storage system
PCT/CN2019/083960 WO2020215223A1 (zh) 2019-04-23 2019-04-23 Distributed storage system and garbage collection method in distributed storage system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/083960 WO2020215223A1 (zh) 2019-04-23 2019-04-23 Distributed storage system and garbage collection method in distributed storage system

Publications (1)

Publication Number Publication Date
WO2020215223A1 true WO2020215223A1 (zh) 2020-10-29

Family

ID=72941022

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/083960 WO2020215223A1 (zh) 2019-04-23 2019-04-23 Distributed storage system and garbage collection method in distributed storage system

Country Status (2)

Country Link
CN (1) CN113302597A (zh)
WO (1) WO2020215223A1 (zh)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030187888A1 (en) * 2000-03-28 2003-10-02 Andrew Hayward Garbage collection
CN102024018A (zh) * 2010-11-04 2011-04-20 曙光信息产业(北京)有限公司 Online garbage metadata collection method in a distributed file system
CN102591789A (zh) * 2011-12-26 2012-07-18 成都市华为赛门铁克科技有限公司 Storage space reclamation method and apparatus
CN103858092A (zh) * 2013-12-19 2014-06-11 华为技术有限公司 Data migration method and apparatus


Also Published As

Publication number Publication date
CN113302597A (zh) 2021-08-24


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19925749

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19925749

Country of ref document: EP

Kind code of ref document: A1