WO2024055529A1 - Placement group member selection method and apparatus, device, and readable storage medium

Info

Publication number
WO2024055529A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
placement
nodes
placement group
members
Prior art date
Application number
PCT/CN2023/078429
Other languages
French (fr)
Chinese (zh)
Inventor
张凯
孙润宇
丁纯杰
孟祥瑞
Original Assignee
浪潮电子信息产业股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浪潮电子信息产业股份有限公司
Publication of WO2024055529A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793 Remedial or corrective actions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0644 Management of space entities, e.g. partitions, extents, pools
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/0674 Disk device
    • G06F3/0676 Magnetic disk device

Definitions

  • the present application relates to the field of computer technology, and in particular to a method, device, equipment and readable storage medium for selecting placement group members.
  • placement group A1 in storage pool A corresponds to placement group B1 in storage pool B
  • the primary member in placement group A1 can forward the data to be processed to placement group B1 for processing.
  • the primary member in placement group B1 can also forward the data to be processed to placement group A1 for processing.
  • the primary member in a certain placement group is any member of the corresponding placement group.
  • the number of members in a placement group depends on the erasure design of the current storage pool and the number of redundant replicas.
  • the purpose of this application is to provide a method, device, equipment and readable storage medium for selecting placement group members, so as to improve the data forwarding efficiency of corresponding placement groups.
  • the plan is as follows:
  • This application provides a method for selecting placement group members, including:
  • N is the number of members of the second placement group
  • N nodes are selected among the target nodes, including:
  • selecting one disk in each of the N nodes includes:
  • it also includes:
  • if the number of members of the first placement group is less than the number of members of the second placement group, determine the other nodes in the current distributed system besides the target nodes, select nodes among the other nodes so that the sum of the number of selected nodes and target nodes is N, and then perform the step of selecting a disk in each of the N nodes to obtain N members of the second placement group.
  • selecting nodes among other nodes so that the sum of the number of selected nodes and target nodes is N includes:
  • it also includes:
  • it also includes:
  • the nodes where each member of the fault placement group is located form an object node set, including:
  • it also includes:
  • if the fault placement group has multiple corresponding placement groups in another storage pool, determine the node where each member of each placement group is located, and obtain multiple corresponding node sets;
  • it also includes:
  • if the node selected from the non-overlapping nodes has no available disk, determine the other nodes in the current distributed system besides the object node set, select the node with the smallest number of corresponding placement groups among the other nodes, and then perform the steps of selecting the disk with the smallest number of corresponding placement groups in the selected node and replacing the failed member with the currently selected disk.
  • it also includes:
  • This application also provides a placement group member selection device, including:
  • the determination module is used to determine the set of placement groups in the second storage pool that corresponds to any first placement group in the first storage pool; wherein the first storage pool includes multiple first placement groups, and the second storage pool includes multiple second placement groups; the total number of placement groups in the first storage pool is less than the total number of placement groups in the second storage pool;
  • the placement group selection module is used to select a second placement group in the placement group set and determine the target node where each member of the first placement group is located;
  • a node selection module used to select N nodes among the target nodes if the number of members of the first placement group is not less than the number of members of the second placement group; N is the number of members of the second placement group;
  • the member selection module is used to select a disk in each of the N nodes to obtain N members of the second placement group.
  • the node selection module is used to:
  • the member selection module is used to:
  • it also includes:
  • another node selection module is used to determine, if the number of members of the first placement group is less than the number of members of the second placement group, the other nodes in the current distributed system besides the target nodes, to select nodes among the other nodes so that the sum of the number of selected nodes and target nodes is N, and then to perform the step of selecting a disk in each of the N nodes to obtain N members of the second placement group.
  • another node selection module is used to:
  • it also includes:
  • the member selection module for other second placement groups is used, after members have been selected for any second placement group in the placement group set, to select members in each of the N nodes for the other unselected second placement groups if the placement group set contains other unselected second placement groups.
  • it also includes: a fault processing module, and the fault processing module includes:
  • the object node set determination unit is used to determine the fault placement group to which the faulty member belongs if a member in any storage pool fails, and configure the nodes where each member of the fault placement group is located to form an object node set;
  • the corresponding node set determination unit is used to determine the placement group corresponding to the fault placement group in another storage pool, and configure the nodes where each member of the placement group is located to form a corresponding node set;
  • the non-overlapping node determination unit is used to determine the non-overlapping nodes that belong to the corresponding node set but do not belong to the object node set;
  • the member replacement unit is used to select the node with the smallest number of corresponding placement groups among non-overlapping nodes, select the disk with the smallest number of corresponding placement groups among the selected nodes, and replace the failed member with the currently selected disk.
  • the object node set determination unit is used for:
  • the corresponding node set determining unit is also used for:
  • if the fault placement group has multiple corresponding placement groups in another storage pool, determine the node where each member of each placement group is located, and obtain multiple corresponding node sets;
  • the member replacement unit is also used to:
  • if the node selected from the non-overlapping nodes has no available disk, determine the other nodes in the current distributed system besides the object node set, select the node with the smallest number of corresponding placement groups among the other nodes, and then perform the steps of selecting the disk with the smallest number of corresponding placement groups in the selected node and replacing the failed member with the currently selected disk.
  • the fault handling module also includes:
  • the data recovery unit is used to recover the data in the failed member to the currently selected disk after replacing the failed member with the currently selected disk.
  • it also includes:
  • the primary member selection module is used to select members located on the same node in the mutually corresponding first placement group and second placement group as the primary members of the respective placement groups.
  • This application also provides a distributed storage system, which is characterized in that it includes multiple nodes, and each node includes: multiple disks;
  • a part of the disks constitutes the first storage pool of any of the above items, and the other part of the disks constitutes the second storage pool of any of the above items.
  • the performance of each disk of the first storage pool is higher than the performance of each disk of the second storage pool.
  • This application also provides an electronic device, including:
  • Memory used to store computer programs
  • a processor is configured to execute a computer program to implement the aforementioned disclosed placement group member selection method.
  • This application also provides a non-volatile computer-readable storage medium for saving a computer program, wherein, when the computer program is executed by a processor, the aforementioned disclosed placement group member selection method is implemented.
  • this application provides a method for selecting placement group members, which includes: determining the placement group set in the second storage pool that corresponds to any first placement group in the first storage pool, wherein the first storage pool includes multiple first placement groups, the second storage pool includes multiple second placement groups, and the total number of placement groups in the first storage pool is less than the total number of placement groups in the second storage pool; selecting any second placement group in the placement group set and determining the target node where each member of the first placement group is located; if the number of members of the first placement group is not less than the number of members of the second placement group, selecting N nodes among the target nodes, where N is the number of members of the second placement group; and selecting a disk in each of the N nodes to obtain N members of the second placement group.
  • this application uses a placement group in a storage pool with a smaller total number of placement groups as a benchmark to select members for the placement group corresponding to the placement group in another storage pool.
  • the placement group member selection device, equipment and readable storage medium provided by this application also have the above technical effects.
  • Figure 1 is a flow chart of a method for selecting placement group members provided by an embodiment of the present application
  • Figure 2 is a schematic diagram of the correspondence between PGs in two storage pools provided by the embodiment of the present application;
  • Figure 3 is a schematic diagram of placement group member selection provided by an embodiment of the present application.
  • Figure 4 is a schematic diagram of fault processing provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of another fault processing provided by the embodiment of the present application.
  • Figure 6 is a schematic diagram of a placement group member selection device provided by an embodiment of the present application.
  • Figure 7 is a schematic diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a non-volatile computer-readable storage medium provided by an embodiment of the present application.
  • this application provides a placement group member selection scheme that can improve the data forwarding efficiency of corresponding placement groups.
  • an embodiment of the present application provides a method for selecting placement group members, including:
  • the first storage pool includes multiple first placement groups
  • the second storage pool includes multiple second placement groups
  • the total number of placement groups in the first storage pool is smaller than the total number of placement groups in the second storage pool.
  • the first storage pool is represented by A
  • the first placement groups in it are: A1 to A4, a total of 4 first placement groups
  • the second storage pool is represented by B
  • the second placement groups in it are: B1 to B8, a total of 8 second placement groups; then, one first placement group in the first storage pool A corresponds to two second placement groups in the second storage pool B.
  • the corresponding relationship is: A1 corresponds to B1 and B5, A2 corresponds to B2 and B6, A3 corresponds to B3 and B7, and A4 corresponds to B4 and B8. Accordingly, the set of placement groups corresponding to A1 is {B1, B5}, the set of placement groups corresponding to A2 is {B2, B6}, the set of placement groups corresponding to A3 is {B3, B7}, and the set of placement groups corresponding to A4 is {B4, B8}.
  • the first storage pool may be a cache pool, and the second storage pool may be a low-speed storage pool.
  • a placement group is a vehicle for placing objects.
  • One placement group corresponds to multiple objects, and one object corresponds to one disk.
  • The members of a placement group are the disks corresponding to that placement group. Because the disks are distributed across the nodes of the distributed system, the node where each member of a placement group resides can be determined.
  • PG: Placement Group
  • OSD: Object-based Storage Device
  • N is the number of members of the second placement group.
  • the number of members in a placement group depends on the erasure design and the number of redundant copies of the storage pool to which the placement group belongs. If the storage pool to which a placement group belongs has a 4+2 erasure design, the number of members of the placement group is 6; if the number of redundant copies of the storage pool to which a placement group belongs is 3, the number of members of the placement group is 3.
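The two cases above can be sketched as a small helper. This is only an illustration: the function name `member_count` and its parameters are hypothetical and do not come from the application.

```python
from typing import Optional


def member_count(ec_data: Optional[int] = None,
                 ec_parity: Optional[int] = None,
                 replicas: Optional[int] = None) -> int:
    """Return how many members (disks) one placement group needs.

    Hypothetical helper: a pool is described either by an erasure layout
    (data + parity chunks) or by a redundant-copy count.
    """
    if ec_data is not None and ec_parity is not None:
        # A "4+2" erasure design needs one member per chunk.
        return ec_data + ec_parity
    if replicas is not None:
        # A replicated pool needs one member per copy.
        return replicas
    raise ValueError("specify either an erasure layout or a replica count")


print(member_count(ec_data=4, ec_parity=2))  # 6
print(member_count(replicas=3))              # 3
```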
  • because the purpose in some embodiments is to make the nodes where the members of corresponding placement groups in the two storage pools are located coincide, after the nodes where the members of the first placement group are located have been determined, members are selected on these nodes for the second placement group corresponding to the first placement group, so that the nodes of the members of the mutually corresponding first and second placement groups coincide.
  • selecting N nodes among the target nodes includes: arranging the target nodes in ascending order according to the number of the second placement groups corresponding to the nodes, obtaining a node sequence, and selecting the first N nodes in the node sequence.
  • alternatively, arrange the target nodes in descending order according to the number of second placement groups corresponding to each node, obtain the node sequence, and select the last N nodes in the node sequence. After a member is selected for a second placement group on a certain node, the number of second placement groups corresponding to that node is increased by one.
  • each node also records its number of first placement groups. If a member is selected for a first placement group on a node, the number of first placement groups corresponding to that node is increased by one.
  • if M < N, the M nodes where the members of the first placement group are located are given priority, and then nodes with a smaller number of second placement groups are selected from the other nodes in the system to make up N nodes.
  • the method further includes: if the number of members of the first placement group is less than the number of members of the second placement group, determining other nodes except the target node in the current distributed system, and on other nodes After selecting nodes so that the sum of the number of selected nodes and target nodes is N, perform the step of selecting a disk in each of the N nodes to obtain N members of the second placement group.
  • selecting nodes among the other nodes so that the sum of the number of selected nodes and target nodes is N includes: arranging the other nodes in ascending order according to the number of second placement groups corresponding to each node to obtain a node sequence, and selecting the first N-M nodes in the node sequence, where M is the number of members of the first placement group; or arranging the other nodes in descending order according to the number of second placement groups corresponding to each node to obtain a node sequence, and selecting the last N-M nodes in the node sequence, where M is the number of members of the first placement group.
  • when selecting disks in the selected nodes, disks with a smaller number of second placement groups are also given priority. That is to say, each time a disk is selected as a member of a second placement group, the number of second placement groups corresponding to that disk is increased by one.
  • each disk also records its number of first placement groups. Each time a disk is selected as a member of a first placement group, the number of first placement groups corresponding to that disk is increased by one. Therefore, in some embodiments, selecting a disk in each of the N nodes includes: selecting, in each of the N nodes, the disk with the smallest number of corresponding second placement groups.
  • after members have been selected for any second placement group in the placement group set, if there are other unselected second placement groups in the placement group set, members are selected in each of the N nodes for the other unselected second placement groups. That is to say, the other unselected second placement groups in the placement group set select members according to the second placement group whose members have been determined.
  • for example, if the members of B1 are determined first, and the N nodes determined when selecting members for B1 are D1 to DN, then members for B5 are selected directly from D1 to DN. That is, the disk with the smallest number of second placement groups is selected in each node from D1 to DN, and the N selected disks are the members of B5.
  • the N nodes D1 to DN can be: when M ≥ N, the N nodes selected from the M nodes; or when M < N, the M nodes together with the additionally selected N-M nodes.
  • N disks can be selected by selecting a disk in each node, and these N disks are the N members of the second placement group. In this way, the nodes where the members of the first placement group and the second placement group that correspond to each other are located overlap.
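The node and disk selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's implementation: all function and parameter names are hypothetical, M is taken as the number of distinct target nodes, and ties are broken by identifier only to keep the result deterministic.

```python
def choose_nodes(target_nodes, all_nodes, n, tier_pg_count):
    """Pick n nodes for a second placement group.

    target_nodes: nodes hosting the members of the first placement group.
    all_nodes: every node in the cluster.
    tier_pg_count: node -> number of second placement groups already on it.
    """
    def by_load(node):
        return (tier_pg_count[node], node)

    targets = sorted(set(target_nodes), key=by_load)
    if len(targets) >= n:
        # M >= N: ascending order, keep the first N target nodes.
        return targets[:n]
    # M < N: keep all M target nodes, add the N-M least loaded other nodes.
    others = sorted(set(all_nodes) - set(targets), key=by_load)
    needed = n - len(targets)
    if len(others) < needed:
        raise ValueError("not enough nodes in the cluster")
    return targets + others[:needed]


def select_members(target_nodes, all_nodes, n, disks_by_node,
                   tier_pg_count, disk_pg_count):
    """Select one disk per chosen node and update both counters."""
    members = []
    for node in choose_nodes(target_nodes, all_nodes, n, tier_pg_count):
        # Prefer the disk with the fewest second placement groups.
        disk = min(disks_by_node[node], key=lambda d: (disk_pg_count[d], d))
        members.append(disk)
        tier_pg_count[node] += 1
        disk_pg_count[disk] += 1
    return members
```

Calling `select_members` again with the same target nodes for another second placement group in the same set tends to return disks on the same nodes, which is the overlap the text aims for.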
  • when primary members are selected for the first placement group and the second placement group, there is a high probability that the primary members of the two placement groups can be on one node.
  • if the primary members of the corresponding placement groups are on the same node, the data forwarding between the two placement groups can be completed on that node without going through the network, which improves the data forwarding efficiency of the corresponding placement groups.
  • if a member in any storage pool fails, determine the fault placement group to which the failed member belongs, and form an object node set from the nodes where the members of the fault placement group are located; determine the placement group corresponding to the fault placement group in the other storage pool, and form a corresponding node set from the nodes where the members of that placement group are located; determine the non-overlapping nodes that belong to the corresponding node set but do not belong to the object node set; select the node with the smallest number of corresponding placement groups among the non-overlapping nodes, select the disk with the smallest number of corresponding placement groups in the selected node, and replace the failed member with the currently selected disk.
  • forming an object node set from the nodes where the members of the fault placement group are located includes: determining the object nodes where the members of the fault placement group are located, deleting the node where the failed member is located from the object nodes, and forming the remaining nodes into the object node set. It should be noted that whether or not the node where the failed member is located is deleted from the object nodes does not affect the implementation of this application, although it does affect system balance. If the node where the failed member is located is deleted from the object nodes, another disk in that node may be selected to replace the failed member.
  • if the fault placement group has multiple corresponding placement groups in another storage pool, determine the node where each member of each placement group is located, and obtain multiple corresponding node sets; select one corresponding node set among the multiple corresponding node sets, and perform the step of determining the non-overlapping nodes that belong to the corresponding node set but do not belong to the object node set.
  • if the node selected from the non-overlapping nodes has no available disk, the other nodes in the current distributed system besides the object node set are determined, the node with the smallest number of corresponding placement groups is selected among the other nodes, and then the steps of selecting the disk with the smallest number of corresponding placement groups in the selected node and replacing the failed member with the currently selected disk are performed.
  • the method further includes: restoring data in the failed member to the currently selected disk.
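The fault-handling steps above can be sketched as one function. This is a hedged sketch: the name `pick_replacement`, its parameters, and the dictionary-based bookkeeping are hypothetical. It also assumes that nodes without an available disk are skipped within each candidate group, which is slightly more general than the single fallback the text describes. Data recovery onto the chosen disk is out of scope here.

```python
def pick_replacement(failed_disk, fault_pg, peer_pg, node_of,
                     free_disks, node_count, disk_count):
    """Choose a disk to replace failed_disk and update the counters.

    fault_pg / peer_pg: member disks of the fault placement group and of its
        corresponding placement group in the other storage pool.
    node_of: disk -> node.
    free_disks: node -> list of available disks (covers every cluster node).
    node_count / disk_count: number of corresponding placement groups
        recorded per node / per disk.
    """
    # Object node set, with the failed member's node removed.
    object_nodes = {node_of[d] for d in fault_pg} - {node_of[failed_disk]}
    corresponding_nodes = {node_of[d] for d in peer_pg}
    non_overlapping = corresponding_nodes - object_nodes
    # Fallback: any node of the system outside the object node set.
    fallback = set(free_disks) - object_nodes

    for group in (non_overlapping, fallback):
        usable = [n for n in group if free_disks.get(n)]
        if usable:
            node = min(usable, key=lambda n: (node_count.get(n, 0), n))
            disk = min(free_disks[node],
                       key=lambda d: (disk_count.get(d, 0), d))
            free_disks[node].remove(disk)
            node_count[node] = node_count.get(node, 0) + 1
            disk_count[disk] = disk_count.get(disk, 0) + 1
            return disk
    raise RuntimeError("no available disk outside the object node set")
```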
  • the method further includes: selecting members located on the same node in the mutually corresponding first placement group and second placement group as the primary members of the respective placement groups. If the primary members of the first placement group and the second placement group are on one node, the data forwarding between the two placement groups can be completed on that node without going through the network, which improves the data forwarding efficiency of the corresponding placement groups.
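A minimal sketch of this primary member selection follows. The function name `pick_primaries` and the rule of taking the first co-located pair found are assumptions for illustration; the text only requires that the two primaries share a node.

```python
def pick_primaries(first_pg, second_pg, node_of):
    """Return (primary of first PG, primary of second PG) on a shared node.

    first_pg / second_pg: member disks of two corresponding placement groups.
    node_of: disk -> node.
    Returns None when the two groups share no node.
    """
    second_by_node = {}
    for disk in second_pg:
        # Keep the first member seen on each node.
        second_by_node.setdefault(node_of[disk], disk)
    for disk in first_pg:
        peer = second_by_node.get(node_of[disk])
        if peer is not None:
            return disk, peer
    return None


node_of = {"a1": "n1", "a2": "n2", "b1": "n2", "b2": "n3"}
print(pick_primaries(["a1", "a2"], ["b1", "b2"], node_of))  # ('a2', 'b1')
```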
  • the following embodiment uses the cache pool and the data pool as examples to introduce the solution. After the cache pool and data pool are created, the PGs in them need to be bound. During binding, this embodiment adjusts the PG distribution in the storage pool with the larger number of PGs. It is known that after a storage pool is created, the PGs in the storage pool have already been evenly distributed. The storage pool with the smaller number of PGs is used as the base pool (base_pool), and the other storage pool is called the bound pool (tier_pool) of the base pool.
  • base_pool: base pool
  • tier_pool: bound pool
  • because the number of PGs in a storage pool is an integer power of 2, even if the numbers of PGs in the two storage pools differ, the larger number is still an integer multiple of the smaller one (the ratio is itself a power of 2). It can be seen that the following PG correspondence always holds: taking pool1, the storage pool with the smaller number of PGs, as the benchmark, the PGs in the storage pool with the larger number of PGs can be divided into portions according to the number of PGs in pool1, so that the number of PGs in each portion equals the number of PGs in pool1. Each portion is then matched one-to-one with the PGs of pool1, which gives the PG correspondence. For example, divide the 4096 PGs in pool2 into 4 portions of 1024 PGs, and each portion corresponds to the 1024 PGs in pool1. For the resulting correspondence between PGs, refer to Figure 2.
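The partition-and-match rule can be sketched as an index calculation. The indexing scheme is an assumption: it uses 0-based PG indices and contiguous portions, inferred from the earlier example in which A1 corresponds to {B1, B5}. The function name is hypothetical.

```python
def corresponding_pgs(base_index, base_pg_num, tier_pg_num):
    """Return the 0-based indices of the bound-pool PGs matched to one base PG.

    The bound pool is cut into tier_pg_num // base_pg_num contiguous portions
    of base_pg_num PGs each; the PG at offset base_index in every portion
    corresponds to the base PG with that index.
    """
    if tier_pg_num % base_pg_num != 0:
        raise ValueError("tier_pg_num must be a multiple of base_pg_num")
    if not 0 <= base_index < base_pg_num:
        raise ValueError("base_index out of range")
    portions = tier_pg_num // base_pg_num
    return [base_index + k * base_pg_num for k in range(portions)]


# 4 PGs in pool A, 8 in pool B: index 0 (A1) maps to indices 0 and 4 (B1, B5).
print(corresponding_pgs(0, 4, 8))        # [0, 4]
# 1024 PGs in pool1, 4096 in pool2: four corresponding PGs per base PG.
print(corresponding_pgs(0, 1024, 4096))  # [0, 1024, 2048, 3072]
```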
  • a disk corresponds to two reference counts: one reference count records the number of times the disk is selected by PGs in the benchmark pool (that is, the "number of first placement groups corresponding to the disk" in the above embodiment), and the other reference count records the number of times the disk is selected by PGs in the bound pool (that is, the "number of second placement groups corresponding to the disk" in the above embodiment).
  • for a PG in the benchmark pool (denoted as base_pg), the node id of each member of base_pg is inserted into an array (sequence) S; if S is not yet full, UNDEF is inserted until it is full. Then members are selected, according to the sequence S, for a tier_pg (denoted as X) corresponding to base_pg in the binding pool: at each position, the disk with the smallest reference count is selected as a member of X, and the reference counts of the disk and the node are increased by 1. After the sequence S has been traversed, every member of X has been selected.
  • for example, for PG 1.1 in pool1 (the base pool), the array of node identifiers determined for the corresponding PG 2.1 in pool2 (the binding pool) is: node 1, node 2, node 3, UNDEF, UNDEF, UNDEF.
  • PG number 1 in pool2 is PG 2.1, and its determined members are: 2, 11, 21, 31, 41, 51. The reference counts of these members are then increased from 0 to 1, and at the same time the reference counts of the nodes in the array are increased from 0 to 1.
  • for the other tier_pgs (denoted as Y) corresponding to base_pg in the binding pool besides X, members are also selected according to the sequence S. Similarly, traverse the sequence S. If a position in the sequence S is a node id, select the disk with the smallest reference count in that node as a member of Y, and increase the reference counts of the disk and the node by 1; if a position in the sequence S is UNDEF, select the node with the smallest reference count from the other nodes in the system, select the disk with the smallest reference count in that node as a member of Y, and increase the reference counts of the disk and the node by 1. After the sequence S has been traversed, every member of Y has been selected.
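The sequence-S traversal can be sketched as below. This is an illustration under assumptions: `UNDEF` is modelled as `None`, the function names and the disk layout in the usage lines are hypothetical, and an UNDEF slot is filled only from nodes not already used for the same PG. The usage lines reproduce the member list 2, 11, 21, 31, 41, 51 from the example above only because the hypothetical layout was chosen to match it.

```python
UNDEF = None  # placeholder for "no node fixed at this position"


def build_sequence(base_pg_nodes, size):
    """Build S: node ids of base_pg's members, padded with UNDEF to `size`."""
    if len(base_pg_nodes) > size:
        raise ValueError("base_pg has more members than the sequence holds")
    return list(base_pg_nodes) + [UNDEF] * (size - len(base_pg_nodes))


def select_tier_members(seq, disks_by_node, node_ref, disk_ref):
    """Traverse S and pick one disk per position for one tier_pg."""
    used = {n for n in seq if n is not UNDEF}
    members = []
    for slot in seq:
        if slot is UNDEF:
            # Least referenced node among those not yet used for this PG.
            candidates = [n for n in disks_by_node if n not in used]
            node = min(candidates, key=lambda n: (node_ref[n], n))
            used.add(node)
        else:
            node = slot
        disk = min(disks_by_node[node], key=lambda d: (disk_ref[d], d))
        disk_ref[disk] += 1
        node_ref[node] += 1
        members.append(disk)
    return members


# Hypothetical layout: six nodes with two disks each.
disks_by_node = {1: [1, 2], 2: [11, 12], 3: [21, 22],
                 4: [31, 32], 5: [41, 42], 6: [51, 52]}
node_ref = {n: 0 for n in disks_by_node}
disk_ref = {d: 0 for ds in disks_by_node.values() for d in ds}
disk_ref[1] = 1  # assume disk 1 is already referenced once

S = build_sequence([1, 2, 3], 6)
print(select_tier_members(S, disks_by_node, node_ref, disk_ref))
# [2, 11, 21, 31, 41, 51]
```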
  • in this way, the members of each PG in the binding pool can be determined, and the nodes of the members of corresponding PGs in the two storage pools coincide as far as possible, which makes it easier for the primary members of the two PGs to be located on the same node.
  • if a member in the benchmark pool fails, first obtain the ID of the failed member, determine the node ID of the failed member based on this ID, collect the node IDs of the members of the PG (denoted as R1) to which the failed member belongs, and remove the node ID of the failed member, forming the object node set. Determine a tier_pg corresponding to R1 in the binding pool, obtain the node IDs of all members in the tier_pg, and form the corresponding node set. Find all node IDs that are in the corresponding node set but not in the object node set.
  • UNDEF: select the node with the smallest reference count among the other nodes in the system that do not coincide with the already selected nodes, and select a disk with the smallest reference count within that node.
  • the reference counts of the corresponding disk and the corresponding node are increased by 1. In this way, the newly selected disk belongs, as far as possible, to a node where a member of the corresponding PG is located. After the fault is handled, the nodes where the members of the two corresponding PGs are located therefore overlap as much as possible.
  • for example, the nodes of the members of base_pg in the benchmark pool are node 1, node 2, and node 3, one of the members of base_pg fails, and the failed member is located on node 1; in the binding pool, the nodes of the members of the tier_pg corresponding to this base_pg are node 1, node 2, node 3, node 4, node 5, and node 6.
  • If a member in the binding pool fails, first obtain the ID of the failed member and use it to determine the ID of the node hosting that member. Collect the node IDs of all members of the PG to which the failed member belongs (denoted R2) and remove the node ID of the failed member, forming the object node set. Determine the base_pg in the benchmark pool that corresponds to R2 and collect the node IDs of all its members, forming the corresponding node set. Find all node IDs that are in the corresponding node set but not in the object node set.
  • If the result is UNDEF, select, from the other nodes in the system that do not coincide with the nodes already selected, the node with the smallest reference count, and select a disk with the smallest reference count within that node.
  • The reference counts of the selected disk and its node are then increased by 1. In this way, the newly selected disk belongs, where possible, to a node that hosts a member of the corresponding PG, so after the fault is handled the nodes hosting the members of the two corresponding PGs overlap as much as possible.
  • For example, suppose the members of base_pg in the benchmark pool are located on node 1, node 2, and node 3.
  • In the binding pool, the members of the tier_pg corresponding to the base_pg are located on node 1, node 2, node 3, node 4, node 5, and node 6, and the member of the tier_pg located on node 1 fails.
  • Finding the nodes that host a base_pg member but do not host a surviving tier_pg member gives node 1; a disk with the smallest reference count in node 1 is then selected to replace the failed member of the tier_pg.
  • This embodiment provides a solution for optimizing the selection of placement group members in distributed storage. It tries to ensure that the members of corresponding PGs in the two storage pools are placed on the same nodes, which makes the subsequent selection of main members easier. When a member fails, the optimized PG member selection also avoids redundant reconstruction, adapts to a wider range of scenarios, minimizes the number of times business data is forwarded over the network, reduces network pressure, improves the performance of the storage cluster, and thereby improves product competitiveness.
  • A placement group member selection apparatus provided by an embodiment of the present application is introduced below.
  • The placement group member selection apparatus described below and the placement group member selection method described above may be cross-referenced.
  • An embodiment of the present application provides a placement group member selection apparatus, which includes:
  • a determining module 601, configured to determine, in the second storage pool, the placement group set corresponding to any first placement group in the first storage pool, wherein the first storage pool includes multiple first placement groups, the second storage pool includes multiple second placement groups, and the total number of placement groups in the first storage pool is less than the total number of placement groups in the second storage pool;
  • a placement group selection module 602, configured to select any second placement group from the placement group set and determine the target nodes where the members of the first placement group are located;
  • a node selection module 603, configured to select N nodes from the target nodes if the number of members of the first placement group is not less than the number of members of the second placement group, where N is the number of members of the second placement group;
  • a member selection module 604, configured to select one disk in each of the N nodes to obtain the N members of the second placement group.
  • the node selection module is used to:
  • the member selection module is used to:
  • it also includes:
  • another node selection module, configured to, if the number of members of the first placement group is less than the number of members of the second placement group, determine the nodes in the current distributed system other than the target nodes, select nodes from those other nodes so that the number of selected nodes plus the number of target nodes equals N, and then perform the step of selecting one disk in each of the N nodes to obtain the N members of the second placement group.
  • another node selection module is used to:
  • it also includes:
  • a member selection module for other second placement groups, configured to, after any second placement group is selected from the placement group set, if the set still contains other unselected second placement groups, select members for those second placement groups in each of the N nodes.
  • it also includes: a fault processing module, and the fault processing module includes:
  • an object node set determination unit, configured to, if a member in either storage pool fails, determine the fault placement group to which the failed member belongs and form an object node set from the nodes where the members of the fault placement group are located;
  • a corresponding node set determination unit, configured to determine the placement group in the other storage pool that corresponds to the fault placement group and form a corresponding node set from the nodes where the members of that placement group are located;
  • a non-overlapping node determination unit, configured to determine the non-overlapping nodes that belong to the corresponding node set but not to the object node set;
  • a member replacement unit, configured to select, among the non-overlapping nodes, the node with the fewest corresponding placement groups, select in that node the disk with the fewest corresponding placement groups, and replace the failed member with the selected disk.
  • the object node set determination unit is used for:
  • the corresponding node set determining unit is also used for:
  • if the fault placement group has multiple corresponding placement groups in the other storage pool, determine the nodes where the members of each of those placement groups are located to obtain multiple corresponding node sets;
  • the member replacement unit is also used to:
  • if there are no non-overlapping nodes, or the node selected from the non-overlapping nodes has no available disk, determine the nodes in the current distributed system other than those in the object node set, select among them the node with the fewest corresponding placement groups, and then perform the step of selecting in the selected node the disk with the fewest corresponding placement groups and replacing the failed member with the selected disk.
  • the fault handling module also includes:
  • the data recovery unit is used to recover the data in the failed member to the currently selected disk after replacing the failed member with the currently selected disk.
  • it also includes:
  • the main member selection module is used to select members with the same nodes in the first placement group and the second placement group that correspond to each other as main members of the corresponding placement group.
  • This embodiment provides a placement group member selection apparatus that makes the nodes hosting the members of corresponding placement groups in the two storage pools coincide, so that the main members of the two placement groups can be located on the same node as far as possible. Data forwarding between the two placement groups can then be completed on that node without going through the network, which improves the data forwarding efficiency of corresponding placement groups.
  • Embodiments of the present application provide a distributed storage system, including multiple nodes, each node including multiple disks, wherein some of the disks constitute the first storage pool in any of the above embodiments, and the other disks constitute the second storage pool in any of the above embodiments.
  • The performance of each disk in the first storage pool is higher than the performance of each disk in the second storage pool.
  • The first storage pool is a cache pool, and the second storage pool is a low-speed storage pool.
  • An electronic device provided by an embodiment of the present application is introduced below.
  • The electronic device described below and the placement group member selection method and apparatus described above may be cross-referenced.
  • An embodiment of the present application provides an electronic device, including:
  • a memory 701, used to store a computer program;
  • a processor 702, used to execute the computer program to implement the method disclosed in any of the above embodiments.
  • Embodiments of the present application also provide a server as the above-mentioned electronic device.
  • the server may include: at least one processor, at least one memory, a power supply, a communication interface, an input/output interface and a communication bus.
  • the memory is used to store a computer program, and the computer program is loaded and executed by the processor to implement relevant steps in the placement group member selection method disclosed in any of the foregoing embodiments.
  • the power supply is used to provide operating voltage for each hardware device on the server;
  • the communication interface can create a data transmission channel between the server and external devices, and the communication protocol it follows may be any communication protocol applicable to the technical solution of this application, which is not limited here; the input/output interface is used to obtain external input data or to output data externally, and its interface type can be selected according to application needs, which is not limited here.
  • the memory as a carrier for resource storage, can be a read-only memory, a random access memory, a magnetic disk or an optical disk, etc.
  • the resources stored thereon include operating systems, computer programs and data, etc.
  • the storage method can be short-term storage or permanent storage.
  • the operating system is used to manage and control various hardware devices and computer programs on the server to implement the processor's calculation and processing of data in the memory. It can be Windows Server, Netware, Unix, Linux, etc.
  • the computer program can further include computer programs that can be used to complete other specific tasks.
  • data such as the virtual machine
  • the data may also include data such as the developer information of the virtual machine.
  • Embodiments of the present application also provide a terminal as the above-mentioned electronic device.
  • the terminal may include but is not limited to a smartphone, a tablet, a laptop or a desktop computer.
  • a terminal in some embodiments includes: a processor and a memory.
  • the processor may include one or more processing cores, such as a 4-core processor, an 8-core processor, etc.
  • the processor can be implemented in at least one hardware form among DSP (digital signal processing), FPGA (field-programmable gate array), and PLA (programmable logic array).
  • the processor can also include a main processor and a co-processor.
  • the main processor, also called the CPU (central processing unit), is used to process data in the wake-up state; the co-processor is a low-power processor used to process data in standby mode.
  • the processor may be integrated with a GPU (graphics processing unit), which is responsible for rendering and drawing the content to be displayed on the display screen.
  • the processor may also include an AI (artificial intelligence) processor, which is used to handle computing operations related to machine learning.
  • The memory may include one or more computer-readable storage media, which may be non-transitory. The memory may also include high-speed random access memory and non-volatile memory, such as one or more disk storage devices or flash memory storage devices. In some embodiments, the memory is at least used to store the following computer program, wherein, after the computer program is loaded and executed by the processor, the relevant steps of the placement group member selection method executed on the terminal side, as disclosed in any of the foregoing embodiments, can be implemented.
  • the resources stored in the memory may also include operating systems and data, and the storage method may be short-term storage or permanent storage. Among them, the operating system can include Windows, Unix, Linux, etc. Data may include, but is not limited to, application update information.
  • the terminal may also include a display screen, an input and output interface, a communication interface, a sensor, a power supply, and a communication bus.
  • the electronic device may be any node with management functions in the distributed system.
  • A non-volatile computer-readable storage medium provided by an embodiment of the present application is introduced below.
  • The non-volatile computer-readable storage medium described below and the placement group member selection method, apparatus, and device described above may be cross-referenced.
  • Figure 8 is a schematic structural diagram of a non-volatile computer-readable storage medium provided by the present application.
  • a non-volatile computer-readable storage medium 8 is used to store a computer program 81, wherein when the computer program 81 is executed by a processor, the method for selecting placement group members disclosed in the aforementioned embodiments is implemented.
  • The storage medium may be random access memory (RAM), read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of non-volatile computer-readable storage medium known in the technical field.

Abstract

The present application discloses a placement group member selection method and apparatus, a device, and a readable storage medium in the technical field of computers. According to the present application, target nodes where members of a first placement group serving as a reference are located are determined; if the number of the members of the first placement group is not less than the number of members of a second placement group for which members are to be selected, N nodes are selected from the target nodes; and a disk is selected from each of the N nodes to obtain N members of the second placement group, so that the nodes where the members of the second placement group are located coincide with the target nodes, main members of the first placement group and the second placement group can subsequently conveniently be made to be on the same node, and data forwarding of the two placement groups can be completed on the same node without using a network. Therefore, the data forwarding efficiency of placement groups corresponding to each other can be improved. The placement group member selection apparatus, the device, and the readable storage medium provided by the present application also have the described technical effects.

Description

Placement group member selection method, apparatus, device, and readable storage medium
Cross-reference to related applications
This application claims priority to Chinese patent application No. 202211112880.2, filed with the China Patent Office on September 14, 2022 and entitled "Placement Group Member Selection Method, Apparatus, Device, and Readable Storage Medium", the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of computer technology, and in particular to a placement group member selection method, apparatus, device, and readable storage medium.
Background
In a distributed storage scenario, two bound storage pools contain placement groups that correspond to each other. For example, if placement group A1 in storage pool A corresponds to placement group B1 in storage pool B, the primary member of placement group A1 can forward data to be processed to placement group B1 for processing. Likewise, the primary member of placement group B1 can forward data to be processed to placement group A1 for processing. The primary member of a placement group is any one of that group's members. The number of members in a placement group depends on the erasure coding design of the current storage pool and the number of redundant replicas.
It should be noted that when system load reaches a certain level, the forwarding efficiency between corresponding placement groups decreases; if the network becomes a bottleneck, the forwarding speed is limited and the performance of the distributed storage cluster falls short of expectations.
Therefore, how to improve the data forwarding efficiency of corresponding placement groups is a problem to be solved by those skilled in the art.
Summary of the invention
In view of this, the purpose of the present application is to provide a placement group member selection method, apparatus, device, and readable storage medium that improve the data forwarding efficiency of corresponding placement groups. The solution is as follows:
The present application provides a placement group member selection method, including:
determining, in a second storage pool, the placement group set corresponding to any first placement group in a first storage pool, wherein the first storage pool includes multiple first placement groups, the second storage pool includes multiple second placement groups, and the total number of placement groups in the first storage pool is less than the total number of placement groups in the second storage pool;
selecting any second placement group from the placement group set, and determining the target nodes where the members of the first placement group are located;
if the number of members of the first placement group is not less than the number of members of the second placement group, selecting N nodes from the target nodes, where N is the number of members of the second placement group;
selecting one disk in each of the N nodes to obtain the N members of the second placement group.
In some embodiments, selecting N nodes from the target nodes includes:
sorting the target nodes in ascending order of the number of second placement groups corresponding to each node to obtain a node sequence, and selecting the first N nodes in the node sequence;
or,
sorting the target nodes in descending order of the number of second placement groups corresponding to each node to obtain a node sequence, and selecting the last N nodes in the node sequence.
In some embodiments, selecting one disk in each of the N nodes includes:
selecting, in each of the N nodes, the disk with the fewest corresponding second placement groups.
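The selection steps above can be illustrated with a minimal Python sketch. The function name `select_members`, the dictionary layout of the counters, and the step that increments the counters after each pick are assumptions made for illustration; they are not names or structures defined by the application.

```python
from typing import Dict, List, Tuple


def select_members(target_nodes: List[str],
                   n: int,
                   node_pg_count: Dict[str, int],
                   disk_pg_count: Dict[str, Dict[str, int]]) -> List[Tuple[str, str]]:
    """Pick n (node, disk) members for a second placement group.

    Covers the case where the first placement group has at least as many
    members as the second one, so the target nodes alone are sufficient.
    """
    if len(target_nodes) < n:
        raise ValueError("fewer target nodes than required members")
    # Ascending order by number of second placement groups per node;
    # sorted() is stable, so ties keep their original order.
    ordered = sorted(target_nodes, key=lambda node: node_pg_count[node])
    members = []
    for node in ordered[:n]:
        disks = disk_pg_count[node]
        disk = min(disks, key=disks.get)  # disk with the fewest placement groups
        members.append((node, disk))
        # Assumed bookkeeping: record the new placement on disk and node.
        disks[disk] += 1
        node_pg_count[node] += 1
    return members
```

Sorting in descending order and taking the last N nodes selects the same least-loaded nodes when the counts are distinct, so the sketch implements only the ascending variant.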
In some embodiments, the method further includes:
if the number of members of the first placement group is less than the number of members of the second placement group, determining the nodes in the current distributed system other than the target nodes, selecting nodes from those other nodes so that the number of selected nodes plus the number of target nodes equals N, and then performing the step of selecting one disk in each of the N nodes to obtain the N members of the second placement group.
In some embodiments, selecting nodes from the other nodes so that the number of selected nodes plus the number of target nodes equals N includes:
sorting the other nodes in ascending order of the number of second placement groups corresponding to each node to obtain a node sequence, and selecting the first N-M nodes in the node sequence, where M is the number of members of the first placement group;
or,
sorting the other nodes in descending order of the number of second placement groups corresponding to each node to obtain a node sequence, and selecting the last N-M nodes in the node sequence, where M is the number of members of the first placement group.
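As a rough sketch of the N-M top-up described above, the following Python function extends the target nodes with the least-loaded remaining nodes. The name `extend_target_nodes` is hypothetical, and the sketch assumes each member of the first placement group sits on a distinct node, so that the number of target nodes equals M.

```python
from typing import Dict, List


def extend_target_nodes(target_nodes: List[str],
                        all_nodes: List[str],
                        n: int,
                        node_pg_count: Dict[str, int]) -> List[str]:
    """Return n nodes: all target nodes plus the n - m least-loaded others."""
    m = len(target_nodes)  # assumed equal to M, the first group's member count
    if m >= n:
        raise ValueError("target nodes already cover all members")
    targets = set(target_nodes)
    others = [node for node in all_nodes if node not in targets]
    # Ascending order by number of second placement groups, first n - m nodes.
    ordered = sorted(others, key=lambda node: node_pg_count[node])
    extra = ordered[:n - m]
    if len(extra) < n - m:
        raise ValueError("not enough nodes in the system")
    return list(target_nodes) + extra
```

One disk is then chosen in each returned node, exactly as in the case where the target nodes alone are sufficient.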
In some embodiments, the method further includes:
after any second placement group is selected from the placement group set, if the set still contains other unselected second placement groups, selecting members for those second placement groups in each of the N nodes.
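A short sketch of reusing the same N nodes for every second placement group in the set is given below. The function name `place_group_set` and the per-disk counter layout are illustrative assumptions; the counter increment follows the reference-count bookkeeping described in the detailed embodiments.

```python
from typing import Dict, List, Tuple


def place_group_set(nodes: List[str],
                    pg_ids: List[str],
                    disk_pg_count: Dict[str, Dict[str, int]]) -> Dict[str, List[Tuple[str, str]]]:
    """Give every placement group in pg_ids one member on each of the nodes."""
    placement = {}
    for pg in pg_ids:
        members = []
        for node in nodes:
            disks = disk_pg_count[node]
            disk = min(disks, key=disks.get)  # least-used disk in this node
            disks[disk] += 1                  # spreads later groups across disks
            members.append((node, disk))
        placement[pg] = members
    return placement
```

Because the counters are updated after every pick, successive placement groups land on the same nodes but tend to use different disks within them.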
In some embodiments, the method further includes:
if a member in either storage pool fails, determining the fault placement group to which the failed member belongs, and forming an object node set from the nodes where the members of the fault placement group are located;
determining the placement group in the other storage pool that corresponds to the fault placement group, and forming a corresponding node set from the nodes where the members of that placement group are located;
determining the non-overlapping nodes that belong to the corresponding node set but not to the object node set;
selecting, among the non-overlapping nodes, the node with the fewest corresponding placement groups, selecting in that node the disk with the fewest corresponding placement groups, and replacing the failed member with the selected disk.
In some embodiments, forming an object node set from the nodes where the members of the fault placement group are located includes:
determining the object nodes where the members of the fault placement group are located, deleting the node of the failed member from the object nodes, and forming the object node set from the remaining nodes.
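The set arithmetic behind the fault-handling steps can be sketched in Python as follows. The function names and the integer node IDs are illustrative assumptions; the second function breaks ties by node ID, which the application does not specify.

```python
from typing import Dict, Iterable, Set, Tuple


def non_overlapping_nodes(fault_pg_nodes: Iterable[int],
                          failed_node: int,
                          corresponding_pg_nodes: Iterable[int]) -> Set[int]:
    """Nodes in the corresponding node set that are not in the object node set."""
    object_set = set(fault_pg_nodes) - {failed_node}  # drop the failed member's node
    corresponding_set = set(corresponding_pg_nodes)
    return corresponding_set - object_set


def pick_replacement(candidates: Set[int],
                     node_pg_count: Dict[int, int],
                     disk_pg_count: Dict[int, Dict[str, int]]) -> Tuple[int, str]:
    """Least-loaded candidate node, then the least-loaded disk inside it."""
    # sorted() makes tie-breaking deterministic (lowest node ID wins).
    node = min(sorted(candidates), key=lambda x: node_pg_count[x])
    disks = disk_pg_count[node]
    return node, min(disks, key=disks.get)
```

With a first placement group on nodes 1, 2, 3 and a second on nodes 1 to 6, a failure of the second group's member on node 1 leaves node 1 as the only non-overlapping node, so the replacement disk is taken from node 1 again.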
In some embodiments, the method further includes:
if the fault placement group has multiple corresponding placement groups in the other storage pool, determining the nodes where the members of each of those placement groups are located to obtain multiple corresponding node sets;
selecting one of the multiple corresponding node sets, and performing the step of determining the non-overlapping nodes that belong to the corresponding node set but not to the object node set.
In some embodiments, the method further includes:
if there are no non-overlapping nodes, or the node selected from the non-overlapping nodes has no available disk, determining the nodes in the current distributed system other than those in the object node set, selecting among them the node with the fewest corresponding placement groups, and then performing the step of selecting in the selected node the disk with the fewest corresponding placement groups and replacing the failed member with the selected disk.
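The fallback rule above might be sketched as follows. All names here are hypothetical, `free_disks` is an assumed map from node to the placement-group counts of its still-available disks, and skipping fallback nodes that have no free disk is an assumption of this sketch rather than something the application states.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


def least_loaded(nodes: Iterable[str], node_pg_count: Dict[str, int]) -> Optional[str]:
    """Node with the fewest placement groups, or None if there are no nodes."""
    ordered = sorted(nodes, key=lambda x: (node_pg_count[x], x))
    return ordered[0] if ordered else None


def pick_with_fallback(non_overlap: Set[str],
                       object_set: Set[str],
                       all_nodes: Iterable[str],
                       node_pg_count: Dict[str, int],
                       free_disks: Dict[str, Dict[str, int]]) -> Tuple[str, str]:
    """Prefer a non-overlapping node; otherwise use a node outside the object set."""
    node = least_loaded(non_overlap, node_pg_count)
    if node is None or not free_disks.get(node):
        # Fallback: nodes outside the object node set that still have a free disk.
        others = [x for x in all_nodes
                  if x not in object_set and free_disks.get(x)]
        node = least_loaded(others, node_pg_count)
        if node is None:
            raise RuntimeError("no node with an available disk")
    disks = free_disks[node]
    return node, min(disks, key=disks.get)
```

Excluding the object node set in the fallback keeps the replacement off nodes that already hold a surviving member of the fault placement group.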
In some embodiments, after the failed member is replaced with the selected disk, the method further includes:
recovering the data of the failed member onto the selected disk.
In some embodiments, the method further includes:
selecting, in a first placement group and a second placement group that correspond to each other, members located on the same node as the main members of the respective placement groups.
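Main member selection can be illustrated with the small sketch below. The function name is hypothetical and members are modelled as (node, disk) tuples; taking the first shared node in the first group's member order is an arbitrary choice of this sketch, since the application only requires that the two main members share a node.

```python
from typing import List, Optional, Tuple

Member = Tuple[str, str]  # (node, disk)


def pick_main_members(first_pg: List[Member],
                      second_pg: List[Member]) -> Optional[Tuple[Member, Member]]:
    """Return one member from each group located on the same node, if any."""
    second_by_node = {}
    for member in second_pg:
        # Keep the first member seen on each node.
        second_by_node.setdefault(member[0], member)
    for member in first_pg:
        if member[0] in second_by_node:
            return member, second_by_node[member[0]]
    return None  # no shared node; forwarding would have to cross the network
```

When such a pair exists, data forwarded between the two placement groups stays on a single node instead of crossing the network.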
The present application also provides a placement group member selection apparatus, including:
a determination module, configured to determine, in a second storage pool, the placement group set corresponding to any first placement group in a first storage pool, wherein the first storage pool includes multiple first placement groups, the second storage pool includes multiple second placement groups, and the total number of placement groups in the first storage pool is less than the total number of placement groups in the second storage pool;
a placement group selection module, configured to select any second placement group from the placement group set and determine the target nodes where the members of the first placement group are located;
a node selection module, configured to select N nodes from the target nodes if the number of members of the first placement group is not less than the number of members of the second placement group, where N is the number of members of the second placement group;
a member selection module, configured to select one disk in each of the N nodes to obtain the N members of the second placement group.
In some embodiments, the node selection module is configured to:
sort the target nodes in ascending order of the number of second placement groups corresponding to each node to obtain a node sequence, and select the first N nodes in the node sequence;
or,
sort the target nodes in descending order of the number of second placement groups corresponding to each node to obtain a node sequence, and select the last N nodes in the node sequence.
In some embodiments, the member selection module is configured to:
select, in each of the N nodes, the disk with the fewest corresponding second placement groups.
In some embodiments, the apparatus further includes:
another node selection module, configured to, if the number of members of the first placement group is less than the number of members of the second placement group, determine the nodes in the current distributed system other than the target nodes, select nodes from those other nodes so that the number of selected nodes plus the number of target nodes equals N, and then perform the step of selecting one disk in each of the N nodes to obtain the N members of the second placement group.
In some embodiments, the other node selection module is configured to:
sort the other nodes in ascending order of the number of second placement groups corresponding to each node to obtain a node sequence, and select the first N-M nodes in the node sequence, where M is the number of members of the first placement group;
or,
sort the other nodes in descending order of the number of second placement groups corresponding to each node to obtain a node sequence, and select the last N-M nodes in the node sequence, where M is the number of members of the first placement group.
In some embodiments, the apparatus further includes:
a member selection module for other second placement groups, configured to, after any second placement group is selected from the placement group set, if the set still contains other unselected second placement groups, select members for those second placement groups in each of the N nodes.
在一些实施例中,还包括:故障处理模块,故障处理模块包括:In some embodiments, it also includes: a fault processing module, and the fault processing module includes:
对象节点集确定单元,用于若任一存储池中的成员故障,则确定故障成员所属的故障放置组,并将故障放置组的各成员所在的节点组成对象节点集;The object node set determination unit is used to determine the fault placement group to which the faulty member belongs if a member in any storage pool fails, and configure the nodes where each member of the fault placement group is located to form an object node set;
对应节点集确定单元,用于确定故障放置组在另一存储池中对应的放置组,并将该放置组的各成员所在节点组成对应节点集;The corresponding node set determination unit is used to determine the placement group corresponding to the fault placement group in another storage pool, and configure the nodes where each member of the placement group is located to form a corresponding node set;
非重合节点确定单元,用于确定属于对应节点集但不属于对象节点集的非重合节点; The non-overlapping node determination unit is used to determine the non-overlapping nodes that belong to the corresponding node set but do not belong to the object node set;
成员替换单元,用于在非重合节点中选择对应的放置组个数最少的节点,在所选节点中选择对应的放置组个数最少的磁盘,用当前所选磁盘替换故障成员。The member replacement unit is used to select the node with the smallest number of corresponding placement groups among non-overlapping nodes, select the disk with the smallest number of corresponding placement groups among the selected nodes, and replace the failed member with the currently selected disk.
在一些实施例中,对象节点集确定单元用于:In some embodiments, the object node set determination unit is used for:
确定故障放置组的各成员所在的对象节点,从对象节点中删除故障成员所在节点,并将剩余节点组成对象节点集。Determine the object node where each member of the fault placement group is located, delete the node where the fault member is located from the object nodes, and form the remaining nodes into an object node set.
在一些实施例中,对应节点集确定单元还用于:In some embodiments, the corresponding node set determining unit is also used for:
若故障放置组在另一存储池中对应的放置组有多个,则确定每个放置组的各成员所在节点,得到多个对应节点集;If the fault placement group has multiple corresponding placement groups in another storage pool, determine the node where each member of each placement group is located, and obtain multiple corresponding node sets;
在多个对应节点集中选择一个对应节点集,并执行确定属于对应节点集但不属于对象节点集的非重合节点的步骤。Select one corresponding node set among the plurality of corresponding node sets, and perform the step of determining non-overlapping nodes that belong to the corresponding node set but do not belong to the object node set.
In some embodiments, the member replacement unit is further configured to:
If there are no non-overlapping nodes, or the node selected from the non-overlapping nodes has no available disk, determine the nodes in the current distributed system other than those in the object node set, select among these other nodes the node with the fewest corresponding placement groups, and then perform the step of selecting, within the selected node, the disk with the fewest corresponding placement groups and replacing the failed member with the currently selected disk.
In some embodiments, the fault handling module further includes:
A data recovery unit, configured to restore the data of the failed member to the currently selected disk after the failed member has been replaced with the currently selected disk.
In some embodiments, the apparatus further includes:
A primary member selection module, configured to select, in a first placement group and a second placement group that correspond to each other, members located on the same node as the primary members of the respective placement groups.
This application also provides a distributed storage system comprising multiple nodes, each node comprising multiple disks;
Among all the disks, some constitute the first storage pool of any one of the above, and the others constitute the second storage pool of any one of the above.
In some embodiments, the performance of each disk in the first storage pool is higher than that of each disk in the second storage pool.
This application also provides an electronic device, including:
A memory, configured to store a computer program;
A processor, configured to execute the computer program to implement the placement group member selection method disclosed above.
This application also provides a non-volatile computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the placement group member selection method disclosed above.
As can be seen from the above solution, this application provides a placement group member selection method, including: determining the placement group set in the second storage pool that corresponds to any one first placement group in the first storage pool, wherein the first storage pool includes multiple first placement groups, the second storage pool includes multiple second placement groups, and the total number of placement groups in the first storage pool is smaller than that in the second storage pool; selecting any one second placement group from the placement group set, and determining the target nodes on which the members of the first placement group reside; if the number of members of the first placement group is not less than the number of members of the second placement group, selecting N nodes from the target nodes, N being the number of members of the second placement group; and selecting one disk in each of the N nodes to obtain the N members of the second placement group.
It can be seen that this application takes a placement group in the storage pool with fewer placement groups as the baseline and selects members for the placement group that corresponds to it in the other storage pool. During selection, the target nodes on which the members of the baseline first placement group reside are determined first. If the number of members of the first placement group is not less than the number of members of the second placement group whose members are to be selected, the target nodes are numerous enough to host every member of the second placement group. N nodes are therefore selected directly from the target nodes, N being the number of members of the second placement group, and one disk is then selected in each of the N nodes, yielding the N members of the second placement group. The nodes hosting these N members thus coincide with target nodes, so when primary members are later designated for the first and second placement groups, the two primary members are very likely to be on the same node. When the primary members of corresponding placement groups are on the same node, data forwarding between the two placement groups is completed within that node without crossing the network, which improves the data forwarding efficiency of corresponding placement groups.
Correspondingly, the placement group member selection apparatus, device, and readable storage medium provided by this application have the same technical effects.
Brief description of the drawings
To explain the technical solutions of the embodiments of this application or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely embodiments of this application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Figure 1 is a flowchart of a placement group member selection method provided by an embodiment of this application;
Figure 2 is a schematic diagram of the PG correspondence between two storage pools provided by an embodiment of this application;
Figure 3 is a schematic diagram of placement group member selection provided by an embodiment of this application;
Figure 4 is a schematic diagram of fault handling provided by an embodiment of this application;
Figure 5 is a schematic diagram of another form of fault handling provided by an embodiment of this application;
Figure 6 is a schematic diagram of a placement group member selection apparatus provided by an embodiment of this application;
Figure 7 is a schematic diagram of an electronic device provided by an embodiment of this application;
Figure 8 is a schematic structural diagram of a non-volatile computer-readable storage medium provided by an embodiment of this application.
Detailed description
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art from the embodiments in this application without creative effort fall within the scope of protection of this application.
At present, when the load on a distributed storage system reaches a certain level, the forwarding efficiency between corresponding placement groups drops; if the network becomes a bottleneck, the forwarding speed is capped and the distributed storage cluster cannot deliver its expected performance. To address this, this application provides a placement group member selection scheme that can improve the data forwarding efficiency of corresponding placement groups.
Referring to Figure 1, an embodiment of this application provides a placement group member selection method, including:
S101. Determine the placement group set in the second storage pool that corresponds to any one first placement group in the first storage pool.
The first storage pool includes multiple first placement groups, and the second storage pool includes multiple second placement groups; the total number of placement groups in the first storage pool is smaller than that in the second storage pool. Suppose the first storage pool is denoted A and contains 4 first placement groups, A1 to A4, and the second storage pool is denoted B and contains 8 second placement groups, B1 to B8. Then each first placement group in storage pool A corresponds to two second placement groups in storage pool B, as follows: A1 corresponds to B1 and B5, A2 to B2 and B6, A3 to B3 and B7, and A4 to B4 and B8. Accordingly, the placement group set corresponding to A1 is {B1, B5}, that of A2 is {B2, B6}, that of A3 is {B3, B7}, and that of A4 is {B4, B8}. The first storage pool may be a high-speed cache pool, and the second storage pool a low-speed storage pool.
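The A/B example above can be reproduced with a small sketch. It assumes the correspondence follows a fixed stride equal to the number of first placement groups, which matches the example but is not stated as a general rule at this point in the text; the function name `corresponding_tier_pgs` and the 1-based indexing are illustrative choices.

```python
def corresponding_tier_pgs(base_index, base_count, tier_count):
    """Return the 1-based indices of the second-pool PGs matched to one first-pool PG.

    Assumption: PG i of the smaller pool matches PGs i, i + base_count,
    i + 2 * base_count, ... of the larger pool.
    """
    if tier_count % base_count != 0:
        raise ValueError("tier_count must be a multiple of base_count")
    return [base_index + k * base_count for k in range(tier_count // base_count)]


# Pool A has 4 PGs (A1..A4), pool B has 8 PGs (B1..B8).
mapping = {
    f"A{i}": [f"B{j}" for j in corresponding_tier_pgs(i, 4, 8)]
    for i in range(1, 5)
}
print(mapping)
```

With 4 and 8 placement groups this yields the same sets as the example, for instance {B1, B5} for A1.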
S102. Select any one second placement group from the placement group set, and determine the target nodes on which the members of the first placement group reside.
A placement group is a carrier for placing objects. One placement group corresponds to multiple objects, and one object corresponds to one disk. The members of a placement group are the disks corresponding to that placement group. Because the disks are distributed across the nodes of the distributed system, the node hosting each member of a given placement group can be determined.
In a distributed storage system, stored content is cut into fixed-size pieces, and each fixed-size piece of data is called an object. A PG (Placement Group) is an aggregation of multiple objects and is a logical concept. PGs and objects are mapped to each other through a consistent hashing algorithm, while PGs are mapped to disks through a data distribution algorithm. In some embodiments, a disk may be an OSD (Object-based Storage Device).
S103. If the number of members of the first placement group is not less than the number of members of the second placement group, select N nodes from the target nodes, N being the number of members of the second placement group.
It should be noted that the number of members of a placement group depends on the erasure coding scheme or the number of redundant replicas of the storage pool to which it belongs. If the storage pool uses 4+2 erasure coding, each of its placement groups has 6 members; if the storage pool keeps 3 redundant replicas, each of its placement groups has 3 members.
The aim in some embodiments is for the members of corresponding placement groups in the two storage pools to reside on the same nodes. Therefore, once the nodes hosting the members of the first placement group have been determined, members for the corresponding second placement group are selected on those nodes, so that the nodes hosting the members of the first placement group and of its corresponding second placement group coincide.
When the number of members M of the first placement group is not less than the number of members N of the second placement group (M ≥ N), the members of the first placement group reside on M target nodes, so N nodes can be selected directly from these M nodes, and members for the second placement group are then selected on the N chosen nodes. When M ≥ N, nodes hosting fewer second placement groups are preferred. Therefore, in some embodiments, selecting N nodes from the target nodes includes: sorting the target nodes in ascending order of the number of second placement groups corresponding to each node to obtain a node sequence, and selecting the first N nodes of the sequence; or sorting the target nodes in descending order of that number to obtain a node sequence, and selecting the last N nodes of the sequence. Each time a member of a second placement group is selected on a node, the number of second placement groups corresponding to that node is incremented by one. Likewise, each node also records a number of first placement groups; each time a member of a first placement group is selected on a node, the number of first placement groups corresponding to that node is incremented by one.
When the number of members M of the first placement group is less than the number of members N of the second placement group (M < N), the M nodes are not enough to host the members of the second placement group, so N-M additional nodes must be found to make up N nodes, and members for the second placement group are then selected on the N nodes. When M < N, the M nodes hosting the members of the first placement group are selected first, and nodes hosting fewer second placement groups are then selected from the other nodes of the system to make up N nodes. Therefore, in some embodiments, the method further includes: if the number of members of the first placement group is less than the number of members of the second placement group, determining the nodes in the current distributed system other than the target nodes, selecting nodes among these other nodes so that the selected nodes and the target nodes total N, and then performing the step of selecting one disk in each of the N nodes to obtain the N members of the second placement group.
In some embodiments, selecting nodes among the other nodes so that the selected nodes and the target nodes total N includes: sorting the other nodes in ascending order of the number of second placement groups corresponding to each node to obtain a node sequence, and selecting the first N-M nodes of the sequence; or sorting the other nodes in descending order of that number to obtain a node sequence, and selecting the last N-M nodes of the sequence. Here M is the number of members of the first placement group.
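The node selection for both cases (M ≥ N and M < N) can be sketched as follows. This is an illustrative sketch rather than the application's implementation: `choose_nodes`, `tier_pg_count` (the per-node number of second placement groups), and the tie-breaking by input order are all assumptions, and the members of the first placement group are assumed to sit on distinct nodes.

```python
def choose_nodes(target_nodes, all_nodes, tier_pg_count, n):
    """Pick n nodes for a second placement group.

    target_nodes: nodes hosting the members of the first placement group.
    tier_pg_count: node -> number of second placement groups already on it.
    """
    targets = list(dict.fromkeys(target_nodes))  # drop duplicates, keep order

    def load(node):
        return tier_pg_count.get(node, 0)

    if len(targets) >= n:
        # M >= N: ascending sort, take the first N (least-loaded target nodes).
        return sorted(targets, key=load)[:n]

    # M < N: keep all M target nodes, add the N - M least-loaded other nodes.
    others = [node for node in all_nodes if node not in targets]
    extra = sorted(others, key=load)[: n - len(targets)]
    if len(targets) + len(extra) < n:
        raise ValueError("not enough nodes in the system")
    return targets + extra


counts = {"n1": 3, "n2": 1, "n3": 2, "n4": 0, "n5": 5}
nodes = ["n1", "n2", "n3", "n4", "n5"]
print(choose_nodes(["n1", "n2", "n3"], nodes, counts, 2))
print(choose_nodes(["n1", "n2", "n3"], nodes, counts, 4))
```

Sorting in descending order and taking the last N nodes, as the text also allows, selects the same set of nodes whenever the counts are distinct.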
S104. Select one disk in each of the N nodes to obtain the N members of the second placement group.
When selecting a disk within a chosen node, disks hosting fewer second placement groups are likewise preferred. That is, each time a disk is selected as a member of a second placement group, the number of second placement groups corresponding to that disk is incremented by one. Likewise, each disk also records a number of first placement groups; each time a disk is selected as a member of a first placement group, the number of first placement groups corresponding to that disk is incremented by one. Therefore, in some embodiments, selecting one disk in each of the N nodes includes: selecting, in each of the N nodes, the disk with the fewest corresponding second placement groups.
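Step S104 together with the counter updates might look like the sketch below. The data structures (`disks_by_node`, `disk_count`, `node_count`) and the function name are hypothetical; ties between equally loaded disks are broken by list order, which the text does not specify.

```python
def pick_members(nodes, disks_by_node, disk_count, node_count):
    """For each chosen node, take the disk with the fewest second placement groups.

    disk_count / node_count hold the per-disk and per-node numbers of
    second placement groups and are updated in place.
    """
    members = []
    for node in nodes:
        disk = min(disks_by_node[node], key=lambda d: disk_count.get(d, 0))
        disk_count[disk] = disk_count.get(disk, 0) + 1   # disk counter + 1
        node_count[node] = node_count.get(node, 0) + 1   # node counter + 1
        members.append(disk)
    return members


disks_by_node = {"n1": ["d1", "d2"], "n2": ["d3", "d4"]}
disk_count = {"d1": 2, "d2": 0, "d3": 1, "d4": 1}
node_count = {}
members = pick_members(["n1", "n2"], disks_by_node, disk_count, node_count)
print(members, disk_count, node_count)
```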
In some embodiments, after any one second placement group has been selected from the placement group set, if the set still contains other second placement groups that have not been selected, members for these other second placement groups are selected in each of the same N nodes. In other words, the remaining second placement groups in the set select their members by following the second placement group whose members have already been determined. Suppose that, for the placement group set {B1, B5}, the members of B1 are determined first, and the N nodes determined while selecting members for B1 are D1 to DN; then members for B5 are selected directly from D1 to DN. That is, in each of the nodes D1 to DN, the disk with the fewest second placement groups is selected, and the N selected disks are the members of B5. After each selection, the number of second placement groups corresponding to the node and to the disk concerned is incremented by one. The N nodes D1 to DN may be the N nodes selected from the M nodes when M ≥ N, or, when M < N, the M nodes together with the N-M additionally selected nodes.
According to some embodiments, once N nodes have been chosen, selecting one disk in each node yields N disks, and these N disks are the N members of the second placement group. As a result, the nodes hosting the members of the corresponding first and second placement groups overlap, so when primary members are later designated for the first and second placement groups, the two primary members are very likely to be on the same node. When the primary members of corresponding placement groups are on the same node, data forwarding between the two placement groups is completed within that node without crossing the network, which improves the data forwarding efficiency of corresponding placement groups.
Based on the above embodiments, it should be noted that if a member in either storage pool fails, the failed placement group to which the failed member belongs is determined, and the nodes on which the members of the failed placement group reside form an object node set; the placement group in the other storage pool that corresponds to the failed placement group is determined, and the nodes on which its members reside form a corresponding node set; the non-overlapping nodes, which belong to the corresponding node set but not to the object node set, are determined; among the non-overlapping nodes, the node with the fewest corresponding placement groups is selected, within the selected node the disk with the fewest corresponding placement groups is selected, and the failed member is replaced with the currently selected disk.
In some embodiments, forming the object node set from the nodes on which the members of the failed placement group reside includes: determining the object nodes on which the members of the failed placement group reside, removing the node hosting the failed member from the object nodes, and forming the object node set from the remaining nodes. It should be noted that this application can be implemented whether or not the node hosting the failed member is removed from the object nodes; the choice only affects how evenly the system is loaded. If the node hosting the failed member is removed from the object nodes, another disk on that same node may be selected to replace the failed member. During subsequent data recovery, the data to be recovered must then be read both from the node hosting the failed member and from other nodes, and the recovered data must be written to the node hosting the failed member. Every node therefore takes part in the data recovery process, and the overall load on the system is fairly balanced. If the node hosting the failed member is not removed from the object nodes, no other disk on that node will be selected to replace the failed member; instead, a disk on another node is selected. During subsequent data recovery, the data to be recovered only needs to be read from the node hosting the failed member, and that node performs no other operations, so it is relatively idle compared with the other nodes.
In some embodiments, if the failed placement group has multiple corresponding placement groups in the other storage pool, the nodes on which the members of each such placement group reside are determined, obtaining multiple corresponding node sets; one of the multiple corresponding node sets is selected, and the step of determining the non-overlapping nodes that belong to the corresponding node set but not to the object node set is performed.
In some embodiments, if there are no non-overlapping nodes, or the node selected from the non-overlapping nodes has no available disk, the nodes in the current distributed system other than those in the object node set are determined, the node with the fewest corresponding placement groups is selected among these other nodes, and the step of selecting, within the selected node, the disk with the fewest corresponding placement groups and replacing the failed member with the currently selected disk is then performed.
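The replacement choice, including the fallback, can be sketched roughly as below. The names (`choose_replacement`, `free_disks`, `node_count`, `disk_count`) are hypothetical. Skipping fallback nodes that have no usable disk is an added assumption; the text only says to take the least-loaded node outside the object node set.

```python
def choose_replacement(object_nodes, corresponding_nodes, all_nodes,
                       node_count, disk_count, free_disks):
    """Return (node, disk) for replacing a failed member.

    free_disks: node -> list of disks that may be used as the replacement.
    node_count / disk_count: placement-group counts, updated in place.
    """
    def node_load(node):
        return node_count.get(node, 0)

    chosen = None
    # Nodes in the corresponding node set but not in the object node set.
    non_overlapping = [n for n in corresponding_nodes if n not in object_nodes]
    if non_overlapping:
        candidate = min(non_overlapping, key=node_load)
        if free_disks.get(candidate):
            chosen = candidate
    if chosen is None:
        # Fallback: any node outside the object node set (assumed to need a free disk).
        others = [n for n in all_nodes
                  if n not in object_nodes and free_disks.get(n)]
        if not others:
            raise ValueError("no node with an available disk")
        chosen = min(others, key=node_load)

    disk = min(free_disks[chosen], key=lambda d: disk_count.get(d, 0))
    node_count[chosen] = node_count.get(chosen, 0) + 1
    disk_count[disk] = disk_count.get(disk, 0) + 1
    return chosen, disk


all_nodes = ["n1", "n2", "n3", "n4", "n5"]
free = {"n3": ["d5"], "n4": ["d7", "d8"], "n5": ["d9"]}
first = choose_replacement({"n1", "n2"}, ["n1", "n2", "n3", "n4"], all_nodes,
                           {"n3": 4, "n4": 2, "n5": 0}, {"d7": 3, "d8": 1}, free)
second = choose_replacement({"n1", "n2"}, ["n1", "n2"], all_nodes,
                            {"n3": 4, "n4": 2, "n5": 0}, {}, free)
print(first, second)
```

In the first call the corresponding node set contributes two non-overlapping nodes and the less loaded one is used; in the second call there is no non-overlapping node, so the fallback path is taken.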
In some embodiments, after the failed member is replaced with the currently selected disk, the method further includes: restoring the data of the failed member to the currently selected disk.
Based on the above embodiments, it should be noted that the method further includes: selecting, in a first placement group and a second placement group that correspond to each other, members located on the same node as the primary members of the respective placement groups. With the primary members of the first and second placement groups on one node, data forwarding between the two placement groups is completed within that node without crossing the network, which improves the data forwarding efficiency of corresponding placement groups.
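One possible way to pick co-located primary members is sketched below. The text does not say which shared node to prefer, so taking the first shared node in member order is an assumption, and the function name, disk numbers, and node numbers are illustrative.

```python
def choose_primaries(first_members, second_members, node_of):
    """Return (primary of first PG, primary of second PG) on a shared node.

    node_of maps a disk id to the id of the node hosting it.
    Returns None when the two placement groups share no node.
    """
    second_by_node = {}
    for disk in second_members:
        # Remember the first member of the second PG seen on each node.
        second_by_node.setdefault(node_of[disk], disk)
    for disk in first_members:
        node = node_of[disk]
        if node in second_by_node:
            return disk, second_by_node[node]
    return None


# Illustrative layout: first PG on nodes 1-3, second PG on nodes 1-6.
node_of = {1: 1, 10: 2, 20: 3, 2: 1, 11: 2, 21: 3, 31: 4, 41: 5, 51: 6}
primaries = choose_primaries([1, 10, 20], [2, 11, 21, 31, 41, 51], node_of)
print(primaries)
```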
The following embodiment uses a cache pool and a data pool as an example. After the cache pool and the data pool have been created, their PGs need to be bound. During binding, this embodiment adjusts the PG distribution within the storage pool that has more PGs, on the premise that once a storage pool has been created, its PGs are already evenly distributed. The storage pool with fewer PGs serves as the base pool (base_pool), and the other storage pool is called the bound pool (tier_pool) of that base pool.
Because the number of PGs in a storage pool is an integer power of 2, even if the two storage pools have different numbers of PGs, the larger number is an integer multiple (itself a power of 2) of the smaller. The following PG correspondence therefore always holds: taking pool1, the storage pool with fewer PGs, as the baseline, the PGs of the storage pool with more PGs can be split into parts according to the PG count of pool1, so that each part contains as many PGs as pool1. Matching each part one-to-one with the PGs of pool1 gives the PG correspondence. For example, the 4096 PGs of pool2 are split into 4 parts of 1024 PGs each, and each part is matched with the 1024 PGs of pool1, giving the PG correspondence shown in Figure 2.
Before the PG distribution is adjusted, the reference counts (the number of times selected by PGs) of every disk and node for both storage pools are set to 0. That is, each disk has two reference counts: one records the number of times the disk has been selected by PGs in the base pool (the "number of first placement groups corresponding to the disk" in the above embodiments), and the other records the number of times it has been selected by PGs in the bound pool (the "number of second placement groups corresponding to the disk" in the above embodiments).
When adjusting the PG distribution, after a PG in the base pool (denoted base_pg) is reached during traversal, the tier_pg in the bound pool that corresponds to the current base_pg is determined according to the correspondence shown in Figure 2; if base_pg corresponds to multiple tier_pgs in the bound pool, the one with the smallest PG ID (identifier) is selected. The members of the current base_pg are traversed, the id of the node hosting each member is obtained, and the node id is inserted into a sequence S whose length equals the number of members of the tier_pg. If S is full while node ids remain unprocessed, a remaining node id whose reference count is smaller than the largest reference count in the current S is inserted and the node id with the largest reference count is removed from S, until all node ids have been processed. This yields a sequence S of node ids. If the ids of the nodes hosting all members of base_pg have been inserted but S is not yet full, UNDEF is inserted until S is full. Members are then selected according to S for the tier_pg with the smallest PG ID (denoted X) as follows: S is traversed; if a position in S holds a node id, the disk with the smallest reference count within that node is selected as a member of X, and the reference counts of that disk and that node are incremented by 1; if a position in S holds UNDEF, the node with the smallest reference count is selected among the other nodes of the system, the disk with the smallest reference count within that node is selected as a member of X, and the reference counts of that disk and that node are incremented by 1. Once S has been fully traversed, all members of X have been selected.
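The construction of the sequence S can be sketched as follows. This is an interpretation under stated assumptions: `UNDEF` is represented by `None`, `node_count` is taken to be each node's reference count for the bound pool, and a replaced entry keeps its position in S. None of these details are fixed by the text.

```python
UNDEF = None  # placeholder for "choose some other node later"


def build_sequence(base_nodes, tier_size, node_count):
    """Build the node sequence S used to place one tier_pg.

    base_nodes: ids of the nodes hosting the members of base_pg.
    tier_size: number of members of the tier_pg (length of S).
    node_count: node id -> reference count (assumed: bound-pool count).
    """
    def load(node):
        return node_count.get(node, 0)

    seq = []
    for node in base_nodes:
        if len(seq) < tier_size:
            seq.append(node)
            continue
        # S is full: swap in this node only if it is less loaded than
        # the most loaded node currently in S.
        worst = max(seq, key=load)
        if load(node) < load(worst):
            seq[seq.index(worst)] = node
    # Pad with UNDEF when base_pg has fewer members than tier_pg.
    seq += [UNDEF] * (tier_size - len(seq))
    return seq


print(build_sequence([1, 2, 3], 6, {}))
print(build_sequence(["a", "b", "c", "d"], 2, {"a": 5, "b": 1, "c": 0, "d": 3}))
```

The first call corresponds to a 3-member base_pg bound to a 6-member tier_pg, where the last three positions are left as UNDEF; the second shows the replacement rule when base_pg has more members than tier_pg.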
As shown in Figure 3, suppose that members 1, 10, and 20 of PG 1 in pool1 (the base pool), labeled 1.1 in Figure 3, reside on node 1, node 2, and node 3 respectively. Following the above procedure, the sequence of node identifiers determined for 2.1, the PG in pool2 (the bound pool) that corresponds to 1.1, is: node 1, node 2, node 3, UNDEF, UNDEF, UNDEF. The members determined from this sequence for PG 1 in pool2 (2.1) are: 2, 11, 21, 31, 41, 51. The reference count of each of these members is incremented from 0 to 1, and the reference count of each node in the sequence is likewise incremented from 0 to 1.
For each other tier_pg (denoted Y) that corresponds to base_pg in the binding pool, apart from X, members are selected according to the same sequence S. S is traversed in the same way. If a position in S holds a node id, the disk with the smallest reference count in that node is selected as a member of Y, and the reference counts of that disk and that node are each incremented by 1. If a position in S holds UNDEF, the node with the smallest reference count is selected from the other nodes in the system, the disk with the smallest reference count in that node is selected as a member of Y, and the reference counts of that disk and that node are each incremented by 1. Once S has been fully traversed, every member of Y has been selected.
In this way the members of every PG in the binding pool can be determined, and the nodes hosting the members of mutually corresponding PGs in the two storage pools overlap as much as possible, which makes it easy to place the primary members of the two PGs on the same node.
If a member in the base pool fails, the id of the failed member is obtained first, and the id of the node hosting the failed member is determined from it. The node ids of all members of the PG to which the failed member belongs (denoted R1) are collected, and the node id of the failed member is removed; the remaining node ids form the object node set. One tier_pg corresponding to R1 in the binding pool is determined, and the node ids of all of its members are obtained to form the corresponding node set. All node ids that are in the corresponding node set but not in the object node set are found. If there are several, the node with the smallest reference count is selected, and the disk with the smallest reference count within that node is selected. If no node id satisfies the condition, or the qualifying node has no disk available for selection, the result is recorded as UNDEF. For UNDEF, the node with the smallest reference count is selected from the other nodes in the system that do not coincide with the nodes already selected, and the disk with the smallest reference count within that node is selected. The reference counts of the chosen disk and the chosen node are each incremented by 1. In this way the newly selected disk is placed, wherever possible, on a node that hosts a member of the corresponding PG, so that after the failure has been handled the nodes hosting the members of the two corresponding PGs still overlap as much as possible.
As shown in Figure 4, the members of base_pg in the base pool are located on node 1, node 2, and node 3, and one member of base_pg, the one on node 1, has failed. The members of the tier_pg corresponding to this base_pg in the binding pool are located on node 1, node 2, node 3, node 4, node 5, and node 6. Following the principle above, the nodes that host tier_pg members but are not in the object node set (the nodes hosting base_pg members, with the failed member's node removed, that is, node 2 and node 3) are found. The result is node 1, node 4, node 5, and node 6. The node with the smallest reference count is then selected from these nodes, and the disk with the smallest reference count in the selected node is chosen to replace the failed member in base_pg.
If a member in the binding pool fails, the id of the failed member is obtained first, and the id of the node hosting the failed member is determined from it. The node ids of all members of the PG to which the failed member belongs (denoted R2) are collected, and the node id of the failed member is removed; the remaining node ids form the object node set. The base_pg corresponding to R2 in the base pool is determined, and the node ids of all of its members are obtained to form the corresponding node set. All node ids that are in the corresponding node set but not in the object node set are found. If there are several, the node with the smallest reference count is selected, and the disk with the smallest reference count within that node is selected. If no node id satisfies the condition, or the qualifying node has no disk available for selection, the result is recorded as UNDEF. For UNDEF, the node with the smallest reference count is selected from the other nodes in the system that do not coincide with the nodes already selected, and the disk with the smallest reference count within that node is selected. The reference counts of the chosen disk and the chosen node are each incremented by 1. In this way the newly selected disk is placed, wherever possible, on a node that hosts a member of the corresponding PG, so that after the failure has been handled the nodes hosting the members of the two corresponding PGs still overlap as much as possible.
As shown in Figure 5, the members of base_pg in the base pool are located on node 1, node 2, and node 3, and the members of the tier_pg corresponding to this base_pg in the binding pool are located on node 1, node 2, node 3, node 4, node 5, and node 6, but the tier_pg member on node 1 has failed. Following the principle above, the nodes that host base_pg members but do not host any surviving tier_pg member are found. The result is node 1. The disk with the smallest reference count in node 1 is then selected to replace the failed member in tier_pg.
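The failure handling for both pools reduces to one set difference followed by two minimum-reference-count choices. The sketch below is illustrative only: the function names and data layout are hypothetical, `disks_by_node` is assumed to list only healthy, available disks of the pool in which the failure occurred, and nodes without available disks are filtered out up front rather than being recorded as UNDEF after selection.

```python
def candidate_nodes(pg_nodes, failed_node, peer_nodes):
    """Nodes in the corresponding node set but not in the object node set."""
    object_set = set(pg_nodes) - {failed_node}
    return sorted(set(peer_nodes) - object_set)


def pick_replacement(pg_nodes, failed_node, peer_nodes,
                     disks_by_node, node_refs, disk_refs):
    """Return (node, disk) for the replacement member, or None if none exists."""
    object_set = set(pg_nodes) - {failed_node}
    usable = [n for n in candidate_nodes(pg_nodes, failed_node, peer_nodes)
              if disks_by_node.get(n)]
    if not usable:
        # UNDEF case: fall back to any other node not already holding a member.
        usable = [n for n in sorted(disks_by_node)
                  if n not in object_set and disks_by_node[n]]
    if not usable:
        return None
    node = min(usable, key=lambda n: node_refs[n])
    disk = min(disks_by_node[node], key=lambda d: disk_refs[d])
    node_refs[node] += 1
    disk_refs[disk] += 1
    return node, disk


# Figure 4: base_pg on nodes 1-3, member on node 1 failed, tier_pg on nodes 1-6.
fig4 = candidate_nodes([1, 2, 3], 1, [1, 2, 3, 4, 5, 6])
# Figure 5: tier_pg on nodes 1-6, member on node 1 failed, base_pg on nodes 1-3.
fig5 = candidate_nodes([1, 2, 3, 4, 5, 6], 1, [1, 2, 3])
print(fig4)  # [1, 4, 5, 6]
print(fig5)  # [1]

# Assumed binding-pool layout; the failed disk has already been removed from node 1.
disks = {1: [3], 2: [11, 12], 3: [21, 22], 4: [31, 32], 5: [41, 42], 6: [51, 52]}
node_refs = {n: 0 for n in disks}
disk_refs = {d: 0 for ds in disks.values() for d in ds}
choice = pick_replacement([1, 2, 3, 4, 5, 6], 1, [1, 2, 3],
                          disks, node_refs, disk_refs)
print(choice)  # (1, 3)
```

Note that the failed member's own node can reappear as a candidate, as it does in both figures, provided it still has a healthy disk available.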
It can be seen that some embodiments provide a scheme for optimizing placement group membership in distributed storage. The scheme tries to ensure that the members of mutually corresponding PGs in the two storage pools are placed on the same nodes, which simplifies the subsequent selection of primary members. When a member fails, the optimized selection of PG members also avoids unnecessary reconstruction and applies to a wider range of situations. It reduces the number of times service data is forwarded over the network, relieves network pressure, improves the performance of the storage cluster, and improves product competitiveness.
A placement group member selection apparatus provided by an embodiment of the present application is introduced below. The apparatus described below and the placement group member selection method described above may be read with reference to each other.
Referring to Figure 6, an embodiment of the present application provides a placement group member selection apparatus, including:
a determination module 601, configured to determine the set of placement groups in the second storage pool that corresponds to any first placement group in the first storage pool, where the first storage pool includes multiple first placement groups, the second storage pool includes multiple second placement groups, and the total number of placement groups in the first storage pool is less than the total number of placement groups in the second storage pool;
a placement group selection module 602, configured to select any one second placement group from the placement group set and to determine the target nodes on which the members of the first placement group are located;
a node selection module 603, configured to select N nodes from the target nodes if the number of members of the first placement group is not less than the number of members of the second placement group, where N is the number of members of the second placement group;
a member selection module 604, configured to select one disk in each of the N nodes to obtain the N members of the second placement group.
In some embodiments, the node selection module is configured to:
sort the target nodes in ascending order of the number of second placement groups corresponding to each node to obtain a node sequence, and select the first N nodes in that sequence;
or,
sort the target nodes in descending order of the number of second placement groups corresponding to each node to obtain a node sequence, and select the last N nodes in that sequence.
In some embodiments, the member selection module is configured to:
select, in each of the N nodes, the disk with the fewest corresponding second placement groups.
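The two selection rules above amount to a sort by placement group count followed by a per-node minimum. The sketch below shows only the ascending variant; the function names and the two count dictionaries (`pg_count_by_node`, `pg_count_by_disk`) are hypothetical bookkeeping structures assumed for illustration, not something the application specifies.

```python
def pick_n_nodes(target_nodes, n, pg_count_by_node):
    """Sort target nodes by ascending second-PG count and keep the first n."""
    ordered = sorted(target_nodes, key=lambda node: pg_count_by_node[node])
    return ordered[:n]


def pick_disks(nodes, disks_by_node, pg_count_by_disk):
    """In each chosen node, take the disk serving the fewest second PGs."""
    return [min(disks_by_node[node], key=lambda d: pg_count_by_disk[d])
            for node in nodes]


pg_count_by_node = {1: 3, 2: 1, 3: 2, 4: 0}
chosen_nodes = pick_n_nodes([1, 2, 3, 4], 2, pg_count_by_node)
print(chosen_nodes)  # [4, 2]

disks_by_node = {2: [20, 21], 4: [40, 41]}
pg_count_by_disk = {20: 0, 21: 5, 40: 2, 41: 1}
chosen_disks = pick_disks(chosen_nodes, disks_by_node, pg_count_by_disk)
print(chosen_disks)  # [41, 20]
```

Sorting in descending order and taking the last N nodes selects the same set of nodes whenever the counts are distinct; with ties, the two variants may break them differently.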
In some embodiments, the apparatus further includes:
a further node selection module, configured to, if the number of members of the first placement group is less than the number of members of the second placement group, determine the nodes in the current distributed system other than the target nodes, select nodes from those other nodes so that the number of selected nodes plus the number of target nodes equals N, and then perform the step of selecting one disk in each of the N nodes to obtain the N members of the second placement group.
In some embodiments, the further node selection module is configured to:
sort the other nodes in ascending order of the number of second placement groups corresponding to each node to obtain a node sequence, and select the first N-M nodes in that sequence, where M is the number of members of the first placement group;
or,
sort the other nodes in descending order of the number of second placement groups corresponding to each node to obtain a node sequence, and select the last N-M nodes in that sequence, where M is the number of members of the first placement group.
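When the first placement group has fewer members than the second, the target nodes are kept and topped up with the least-loaded other nodes. A minimal sketch, assuming that the M members of the first placement group sit on M distinct nodes (so that the number of target nodes equals M) and using a hypothetical `pg_count_by_node` dictionary:

```python
def extend_with_other_nodes(target_nodes, all_nodes, n, pg_count_by_node):
    """Keep the M target nodes and add the N-M least-loaded other nodes."""
    m = len(target_nodes)  # assumption: one target node per first-PG member
    others = [node for node in all_nodes if node not in target_nodes]
    ordered = sorted(others, key=lambda node: pg_count_by_node[node])
    return list(target_nodes) + ordered[:n - m]


pg_count_by_node = {1: 4, 2: 4, 3: 4, 4: 2, 5: 0, 6: 1, 7: 3, 8: 0}
nodes = extend_with_other_nodes([1, 2, 3], range(1, 9), 6, pg_count_by_node)
print(nodes)  # [1, 2, 3, 5, 8, 6]
```

Python's `sorted` is stable, so nodes 5 and 8, which both have a count of 0, keep their original relative order.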
In some embodiments, the apparatus further includes:
a member selection module for the other second placement groups, configured to, after any one second placement group has been selected from the placement group set, select members in each of the N nodes for any other second placement groups in the set that have not yet been selected.
In some embodiments, the apparatus further includes a fault handling module, which includes:
an object node set determination unit, configured to, if a member in either storage pool fails, determine the failed placement group to which the failed member belongs, and form an object node set from the nodes on which the members of the failed placement group are located;
a corresponding node set determination unit, configured to determine the placement group in the other storage pool that corresponds to the failed placement group, and form a corresponding node set from the nodes on which the members of that placement group are located;
a non-overlapping node determination unit, configured to determine the non-overlapping nodes that belong to the corresponding node set but not to the object node set;
a member replacement unit, configured to select, from the non-overlapping nodes, the node with the fewest corresponding placement groups, select in that node the disk with the fewest corresponding placement groups, and replace the failed member with the selected disk.
In some embodiments, the object node set determination unit is configured to:
determine the object nodes on which the members of the failed placement group are located, delete the node of the failed member from those object nodes, and form the object node set from the remaining nodes.
In some embodiments, the corresponding node set determination unit is further configured to:
if the failed placement group corresponds to multiple placement groups in the other storage pool, determine the nodes on which the members of each of those placement groups are located, obtaining multiple corresponding node sets;
select one corresponding node set from the multiple corresponding node sets, and perform the step of determining the non-overlapping nodes that belong to the corresponding node set but not to the object node set.
In some embodiments, the member replacement unit is further configured to:
if there is no non-overlapping node, or the node selected from the non-overlapping nodes has no available disk, determine the nodes in the current distributed system other than those in the object node set, select from those other nodes the node with the fewest corresponding placement groups, and then perform the step of selecting in the selected node the disk with the fewest corresponding placement groups and replacing the failed member with the selected disk.
In some embodiments, the fault handling module further includes:
a data recovery unit, configured to recover the data of the failed member onto the selected disk after the failed member has been replaced with that disk.
In some embodiments, the apparatus further includes:
a primary member selection module, configured to select, in a first placement group and a second placement group that correspond to each other, members located on the same node as the primary members of the respective placement groups.
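Primary member selection can be illustrated as a search for a node shared by the two placement groups. The sketch below is one possible reading, not a rule stated in the application: it returns the first shared node in the member order of the first placement group, and the `node_of_disk` mapping is a hypothetical lookup table.

```python
def pick_primaries(first_pg, second_pg, node_of_disk):
    """Return (first_primary, second_primary) located on a shared node, or None."""
    second_by_node = {}
    for disk in second_pg:
        # Remember the first member of the second PG seen on each node.
        second_by_node.setdefault(node_of_disk[disk], disk)
    for disk in first_pg:
        node = node_of_disk[disk]
        if node in second_by_node:
            return disk, second_by_node[node]
    return None  # the two PGs share no node


# Disk-to-node layout assumed to match the Figure 3 example.
node_of_disk = {1: 1, 2: 1, 10: 2, 11: 2, 20: 3, 21: 3, 31: 4, 41: 5, 51: 6}
primaries = pick_primaries([1, 10, 20], [2, 11, 21, 31, 41, 51], node_of_disk)
print(primaries)  # (1, 2)
```

With both primaries on node 1, data forwarded between the two placement groups stays on that node instead of crossing the network.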
For the working process of each module and unit, reference may be made to the corresponding content of the foregoing embodiments, which is not repeated here.
It can be seen that some embodiments provide a placement group member selection apparatus that makes the nodes hosting the members of mutually corresponding placement groups in the two storage pools coincide, so that the primary members of the two placement groups are on the same node wherever possible. Data forwarding between the two placement groups can then be completed within one node without passing through the network, which improves the data forwarding efficiency of the corresponding placement groups.
A distributed storage system provided by an embodiment of the present application is introduced below. The distributed storage system described below and the placement group member selection method and apparatus described above may be read with reference to each other.
An embodiment of the present application provides a distributed storage system including multiple nodes, each node including multiple disks, where some of the disks form the first storage pool of any of the above embodiments and others form the second storage pool of any of the above embodiments.
In one example, the disks of the first storage pool have higher performance than the disks of the second storage pool. For example, the first storage pool is a high-speed cache pool and the second storage pool is a low-speed storage pool.
An electronic device provided by an embodiment of the present application is introduced below. The electronic device described below and the placement group member selection method and apparatus described above may be read with reference to each other.
Referring to Figure 7, an embodiment of the present application provides an electronic device, including:
a memory 701, configured to store a computer program;
a processor 702, configured to execute the computer program to implement the method disclosed in any of the above embodiments.
Further, an embodiment of the present application provides a server as the above electronic device. The server may include at least one processor, at least one memory, a power supply, a communication interface, an input/output interface, and a communication bus. The memory stores a computer program, which is loaded and executed by the processor to implement the relevant steps of the placement group member selection method disclosed in any of the foregoing embodiments.
In some embodiments, the power supply provides the operating voltage for the hardware devices on the server. The communication interface creates a data transmission channel between the server and external devices; it may follow any communication protocol applicable to the technical solution of the present application, which is not limited here. The input/output interface obtains input data from outside or outputs data to the outside; its interface type may be chosen according to application needs and is not limited here.
In addition, the memory, as a carrier for resource storage, may be a read-only memory, a random access memory, a magnetic disk, an optical disc, or the like. The resources stored on it include an operating system, computer programs, and data, and the storage may be transient or persistent.
The operating system manages and controls the hardware devices and computer programs on the server so that the processor can operate on and process the data in the memory; it may be Windows Server, Netware, Unix, Linux, or the like. Besides the computer program that carries out the placement group member selection method disclosed in any of the foregoing embodiments, the computer programs may include programs for other specific tasks. Besides data such as virtual machines, the data may include data such as developer information for the virtual machines.
Further, an embodiment of the present application provides a terminal as the above electronic device. The terminal may include, but is not limited to, a smartphone, a tablet computer, a notebook computer, or a desktop computer.
Typically, the terminal in some embodiments includes a processor and a memory.
The processor may include one or more processing cores, for example a 4-core or 8-core processor. The processor may be implemented in at least one of the hardware forms DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor may also include a main processor and a coprocessor: the main processor, also called the CPU (Central Processing Unit), processes data in the awake state, and the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor may integrate a GPU (Graphics Processing Unit), which renders and draws the content to be shown on the display. In some embodiments, the processor may also include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
The memory may include one or more computer-readable storage media, which may be non-transitory. The memory may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices or flash memory devices. In some embodiments, the memory stores at least the following computer program, which, after being loaded and executed by the processor, implements the relevant steps of the terminal-side placement group member selection method disclosed in any of the foregoing embodiments. The resources stored in the memory may also include an operating system and data, and the storage may be transient or persistent. The operating system may include Windows, Unix, Linux, and the like. The data may include, but is not limited to, update information for application programs.
In some embodiments, the terminal may further include a display, an input/output interface, a communication interface, sensors, a power supply, and a communication bus.
In one example, the electronic device may be any node with management functions in the distributed system.
A non-volatile computer-readable storage medium provided by an embodiment of the present application is introduced below. The storage medium described below and the placement group member selection method, apparatus, and device described above may be read with reference to each other.
Referring to Figure 8, Figure 8 is a schematic structural diagram of a non-volatile computer-readable storage medium provided by the present application.
A non-volatile computer-readable storage medium 8 stores a computer program 81, where the computer program 81, when executed by a processor, implements the placement group member selection method disclosed in the foregoing embodiments.
The terms "first", "second", "third", "fourth", and so on (if present) in the present application are used to distinguish similar objects and do not necessarily describe a particular order or sequence. It should be understood that data so designated are interchangeable where appropriate, so that the embodiments described here can be practiced in orders other than those illustrated or described. Furthermore, the terms "include" and "have", and any variations of them, are intended to cover non-exclusive inclusion: a process, method, or device that includes a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, or device.
It should be noted that descriptions involving "first", "second", and the like in the present application are for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments may be combined with one another, but only on the basis that a person of ordinary skill in the art can implement the combination. Where a combination of technical solutions is contradictory or cannot be implemented, the combination should be regarded as nonexistent and outside the scope of protection claimed by the present application.
The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the others, and identical or similar parts of the embodiments may be cross-referenced.
The steps of the methods or algorithms described in connection with the embodiments disclosed here may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of non-volatile computer-readable storage medium known in the technical field.
Specific examples are used here to explain the principles and implementations of the present application. The description of the above embodiments is intended only to aid understanding of the method of the present application and its core idea. A person of ordinary skill in the art may, following the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (20)

  1. A placement group member selection method, characterized by comprising:
    determining the set of placement groups in a second storage pool that corresponds to any first placement group in a first storage pool, wherein the first storage pool comprises multiple first placement groups, the second storage pool comprises multiple second placement groups, and the total number of placement groups in the first storage pool is less than the total number of placement groups in the second storage pool;
    selecting any one second placement group from the placement group set, and determining the target nodes on which the members of the first placement group are located;
    if the number of members of the first placement group is not less than the number of members of the second placement group, selecting N nodes from the target nodes, N being the number of members of the second placement group;
    selecting one disk in each of the N nodes to obtain the N members of the second placement group.
  2. The method according to claim 1, characterized in that selecting N nodes from the target nodes comprises:
    sorting the target nodes in ascending order of the number of second placement groups corresponding to each node to obtain a node sequence, and selecting the first N nodes in that sequence;
    or,
    sorting the target nodes in descending order of the number of second placement groups corresponding to each node to obtain a node sequence, and selecting the last N nodes in that sequence.
  3. The method according to claim 1, characterized in that selecting one disk in each of the N nodes comprises:
    selecting, in each of the N nodes, the disk with the fewest corresponding second placement groups.
  4. The method according to claim 1, characterized by further comprising:
    if the number of members of the first placement group is less than the number of members of the second placement group, determining the nodes in the current distributed system other than the target nodes, selecting nodes from those other nodes so that the number of selected nodes plus the number of target nodes equals N, and then performing the step of selecting one disk in each of the N nodes to obtain the N members of the second placement group.
  5. The method according to claim 4, wherein selecting nodes from the other nodes so that the selected nodes and the target nodes total N comprises:
    sorting the other nodes in ascending order of the number of second placement groups corresponding to each node to obtain a node sequence, and selecting the first N-M nodes in the node sequence, where M is the number of members of the first placement group;
    or
    sorting the other nodes in descending order of the number of second placement groups corresponding to each node to obtain a node sequence, and selecting the last N-M nodes in the node sequence, where M is the number of members of the first placement group.
  6. The method according to claim 1, further comprising:
    after selecting any one second placement group from the placement group set, if the placement group set still contains other second placement groups that have not been selected, selecting members for those unselected second placement groups on each of the N nodes.
  7. The method according to any one of claims 1 to 6, further comprising:
    if a member in any storage pool fails, determining the faulty placement group to which the faulty member belongs, and forming an object node set from the nodes on which the members of the faulty placement group are located;
    determining the placement group that corresponds to the faulty placement group in the other storage pool, and forming a corresponding node set from the nodes on which the members of that placement group are located;
    determining the non-overlapping nodes that belong to the corresponding node set but not to the object node set; and
    selecting, from the non-overlapping nodes, the node with the fewest corresponding placement groups, selecting, on the selected node, the disk with the fewest corresponding placement groups, and replacing the faulty member with the currently selected disk.
  8. The method according to claim 7, wherein forming an object node set from the nodes on which the members of the faulty placement group are located comprises:
    determining the object nodes on which the members of the faulty placement group are located, removing from the object nodes the node on which the faulty member is located, and forming the object node set from the remaining nodes.
  9. The method according to claim 7, further comprising:
    if the faulty placement group corresponds to multiple placement groups in the other storage pool, determining the nodes on which the members of each of those placement groups are located to obtain multiple corresponding node sets; and
    selecting one corresponding node set from the multiple corresponding node sets, and performing the step of determining the non-overlapping nodes that belong to the corresponding node set but not to the object node set.
  10. The method according to claim 7, further comprising:
    if no non-overlapping node exists, or the node selected from the non-overlapping nodes has no available disk, determining the nodes in the current distributed system other than those in the object node set, selecting from those other nodes the node with the fewest corresponding placement groups, and then performing the step of selecting, on the selected node, the disk with the fewest corresponding placement groups and replacing the faulty member with the currently selected disk.
  11. The method according to claim 7, further comprising, after replacing the faulty member with the currently selected disk:
    recovering the data of the faulty member onto the currently selected disk.
  12. The method according to any one of claims 1 to 6, further comprising:
    in a first placement group and a second placement group that correspond to each other, selecting members located on the same node as the primary members of the respective placement groups.
  13. The method according to claim 1, wherein the first storage pool is a high-speed cache pool and the second storage pool is a low-speed storage pool.
  14. The method according to claim 1, wherein a placement group is a carrier for placing objects, one placement group corresponds to multiple objects, and one object corresponds to one disk.
  15. The method according to claim 14, wherein the disks are distributed across the nodes of the distributed system.
  16. The method according to claim 1, further comprising, after selecting a member for a second placement group on a node:
    incrementing by one the number of second placement groups corresponding to that node.
  17. The method according to claim 16, wherein the node also records a corresponding number of first placement groups, the method further comprising, after selecting a member for a first placement group on a node:
    incrementing by one the number of first placement groups corresponding to that node.
  18. A placement group member selection apparatus, comprising:
    a determining module, configured to determine the placement group set in a second storage pool that corresponds to any first placement group in a first storage pool, wherein the first storage pool comprises multiple first placement groups, the second storage pool comprises multiple second placement groups, and the total number of placement groups in the first storage pool is less than the total number of placement groups in the second storage pool;
    a placement group selection module, configured to select any one second placement group from the placement group set and determine the target nodes on which the members of the first placement group are located;
    a node selection module, configured to select N nodes from the target nodes if the number of members of the first placement group is not less than the number of members of the second placement group, where N is the number of members of the second placement group; and
    a member selection module, configured to select one disk on each of the N nodes to obtain the N members of the second placement group.
  19. An electronic device, comprising:
    a memory, configured to store a computer program; and
    a processor, configured to execute the computer program to implement the method according to any one of claims 1 to 17.
  20. A non-volatile computer-readable storage medium, configured to store a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 17.
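
The member selection recited in claims 1 to 5 and 16 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The `Node` and `Disk` classes, their `pg_count` fields, and the `select_members` function are hypothetical names introduced for illustration. The sketch assumes that each member of the first placement group sits on a distinct node, so the number of target nodes equals M, and it also keeps a per-disk counter, which the claims do not recite.

```python
from dataclasses import dataclass, field


@dataclass
class Disk:
    name: str
    pg_count: int = 0  # second placement groups already on this disk (assumed counter)


@dataclass
class Node:
    name: str
    disks: list = field(default_factory=list)
    pg_count: int = 0  # second placement groups already on this node


def select_members(target_nodes, other_nodes, n):
    """Pick n (node name, disk name) members for one second placement group."""
    if len(target_nodes) >= n:
        # Claims 1-2: take the N least-loaded nodes of the first placement group.
        chosen = sorted(target_nodes, key=lambda node: node.pg_count)[:n]
    else:
        # Claims 4-5: keep every target node and add the N-M least-loaded other nodes.
        shortfall = n - len(target_nodes)
        extra = sorted(other_nodes, key=lambda node: node.pg_count)[:shortfall]
        chosen = list(target_nodes) + extra
    members = []
    for node in chosen:
        # Claim 3: on each node, take the disk with the fewest placement groups.
        disk = min(node.disks, key=lambda d: d.pg_count)
        # Claim 16: the node's counter grows by one; the disk counter is an assumption.
        node.pg_count += 1
        disk.pg_count += 1
        members.append((node.name, disk.name))
    return members


a = Node("a", [Disk("a1", 2), Disk("a2", 1)], pg_count=3)
b = Node("b", [Disk("b1", 0)], pg_count=0)
c = Node("c", [Disk("c1", 5), Disk("c2", 4)], pg_count=9)
print(select_members([a, b, c], [], 2))  # [('b', 'b1'), ('a', 'a2')]
```

Sorting in ascending order and taking the first N nodes is equivalent to the descending-order variant of claim 2, so only one branch is shown.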
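
The fault handling recited in claims 7, 8 and 10 can likewise be sketched in Python. This is an illustration under stated assumptions, not the claimed implementation. The `cluster` dictionary layout, the `least_loaded` helper, and the `pick_replacement` function are hypothetical. The sketch assumes that each node's `disks` map lists only healthy, available disks with their placement group counts, and that ties are broken by node name.

```python
def least_loaded(nodes, cluster):
    """Return the node with the fewest placement groups, or None if there is none."""
    return min(sorted(nodes), key=lambda n: cluster[n]["pg_count"], default=None)


def pick_replacement(failed_pg, failed_member, peer_pg, cluster):
    """Choose a (node, disk) to replace failed_member.

    failed_pg and peer_pg are lists of (node, disk) members; peer_pg is the
    placement group in the other storage pool that corresponds to failed_pg.
    """
    failed_node = failed_member[0]
    # Claims 7-8: nodes of the faulty group, minus the faulty member's node.
    object_nodes = {node for node, _ in failed_pg} - {failed_node}
    # Claim 7: nodes of the corresponding group in the other pool.
    corresponding_nodes = {node for node, _ in peer_pg}
    non_overlapping = corresponding_nodes - object_nodes
    node = least_loaded(non_overlapping, cluster)
    if node is None or not cluster[node]["disks"]:
        # Claim 10: fall back to any node outside the object node set.
        # Skipping nodes without available disks is an assumption of this sketch.
        others = [n for n in cluster
                  if n not in object_nodes and cluster[n]["disks"]]
        node = least_loaded(others, cluster)
    if node is None:
        return None
    disks = cluster[node]["disks"]
    # Claim 7: on the chosen node, take the disk with the fewest placement groups.
    return node, min(sorted(disks), key=disks.get)


cluster = {
    "n1": {"pg_count": 6, "disks": {"d11": 3}},
    "n2": {"pg_count": 5, "disks": {"d21": 2, "d22": 3}},
    "n3": {"pg_count": 4, "disks": {"d32": 1}},  # failed disk d31 is not listed
    "n4": {"pg_count": 2, "disks": {"d41": 2, "d42": 0}},
    "n5": {"pg_count": 1, "disks": {"d51": 1}},
}
failed_pg = [("n1", "d11"), ("n2", "d21"), ("n3", "d31")]
peer_pg = [("n1", "d11"), ("n2", "d22"), ("n4", "d41")]
print(pick_replacement(failed_pg, ("n3", "d31"), peer_pg, cluster))  # ('n4', 'd42')
```

In this example node n5 carries fewer placement groups than n4, yet n4 is chosen because it already hosts a member of the corresponding placement group, which keeps the two groups on overlapping nodes.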
PCT/CN2023/078429 2022-09-14 2023-02-27 Placement group member selection method and apparatus, device, and readable storage medium WO2024055529A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211112880.2A CN115202589B (en) 2022-09-14 2022-09-14 Placement group member selection method, device and equipment and readable storage medium
CN202211112880.2 2022-09-14

Publications (1)

Publication Number Publication Date
WO2024055529A1 (en) 2024-03-21

Family

ID=83571761

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/078429 WO2024055529A1 (en) 2022-09-14 2023-02-27 Placement group member selection method and apparatus, device, and readable storage medium

Country Status (2)

Country Link
CN (1) CN115202589B (en)
WO (1) WO2024055529A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115202589B (en) * 2022-09-14 2023-02-24 浪潮电子信息产业股份有限公司 Placement group member selection method, device and equipment and readable storage medium

Citations (5)

Publication number Priority date Publication date Assignee Title
CN110018800A (en) * 2019-04-12 2019-07-16 苏州浪潮智能科技有限公司 Group is put in order in distributed memory system selects main method, apparatus, equipment and medium
US20210365192A1 (en) * 2018-06-28 2021-11-25 Zhengzhou Yunhai Information Technology Co., Ltd. Method, system, and apparatus for allocating hard disks to placement group, and storage medium
CN113791730A (en) * 2021-08-16 2021-12-14 济南浪潮数据技术有限公司 Placement group adjusting method, system and device based on double storage pools and storage medium
CN114138181A (en) * 2021-10-24 2022-03-04 济南浪潮数据技术有限公司 Method, device, equipment and readable medium for placing, grouping and selecting owners in binding pool
CN115202589A (en) * 2022-09-14 2022-10-18 浪潮电子信息产业股份有限公司 Placement group member selection method, device, equipment and readable storage medium

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
CN111124289B (en) * 2019-12-06 2022-02-18 浪潮电子信息产业股份有限公司 Method, device and medium for selecting homing group members of distributed storage system
CN111880747B (en) * 2020-08-01 2022-11-08 广西大学 Automatic balanced storage method of Ceph storage system based on hierarchical mapping
CN112181736A (en) * 2020-09-23 2021-01-05 星辰天合(北京)数据科技有限公司 Distributed storage system and configuration method thereof
CN114756620A (en) * 2020-12-25 2022-07-15 深信服科技股份有限公司 Data storage method, distributed storage system and storage medium
CN114546286B (en) * 2022-02-27 2023-08-08 苏州浪潮智能科技有限公司 Method, system, storage medium and equipment for selecting members of homing group

Patent Citations (5)

Publication number Priority date Publication date Assignee Title
US20210365192A1 (en) * 2018-06-28 2021-11-25 Zhengzhou Yunhai Information Technology Co., Ltd. Method, system, and apparatus for allocating hard disks to placement group, and storage medium
CN110018800A (en) * 2019-04-12 2019-07-16 苏州浪潮智能科技有限公司 Group is put in order in distributed memory system selects main method, apparatus, equipment and medium
CN113791730A (en) * 2021-08-16 2021-12-14 济南浪潮数据技术有限公司 Placement group adjusting method, system and device based on double storage pools and storage medium
CN114138181A (en) * 2021-10-24 2022-03-04 济南浪潮数据技术有限公司 Method, device, equipment and readable medium for placing, grouping and selecting owners in binding pool
CN115202589A (en) * 2022-09-14 2022-10-18 浪潮电子信息产业股份有限公司 Placement group member selection method, device, equipment and readable storage medium

Also Published As

Publication number Publication date
CN115202589A (en) 2022-10-18
CN115202589B (en) 2023-02-24

Similar Documents

Publication Publication Date Title
US9052824B2 (en) Content addressable stores based on sibling groups
CN103513938B (en) A kind of RAID RAID system expansion method and device
US20140379656A1 (en) System and Method for Maintaining a Cluster Setup
CN108733311B (en) Method and apparatus for managing storage system
CN104166606A (en) File backup method and main storage device
US8892836B2 (en) Automated migration to a new copy services target storage system to manage multiple relationships simultaneously while maintaining disaster recovery consistency
WO2019001521A1 (en) Data storage method, storage device, client and system
JP2012185687A (en) Control device, control method, and storage device
CN110058965B (en) Data reconstruction method and device in storage system
EP3739450A1 (en) Data processing method and apparatus, and computing device
WO2024055529A1 (en) Placement group member selection method and apparatus, device, and readable storage medium
CN115576505B (en) Data storage method, device and equipment and readable storage medium
CN110825543B (en) Method for quickly recovering data on fault storage device
CN104040512A (en) Method and device for processing storage space and non-volatile computer readable storage medium
US20170109246A1 (en) Database-level automatic storage management
US20200349081A1 (en) Method, apparatus and computer program product for managing metadata
CN107632781A (en) A kind of method and storage architecture of the more copy rapid verification uniformity of distributed storage
WO2019091349A1 (en) Data balancing method, apparatus and computer device
CN109840051A (en) A kind of date storage method and device of storage system
WO2024027140A1 (en) Data processing method and apparatus, and device, system and readable storage medium
CN111143113A (en) Method, electronic device and computer program product for copying metadata
US11003541B2 (en) Point-in-time copy on a remote system
US11216204B2 (en) Degraded redundant metadata, DRuM, technique
US11620080B2 (en) Data storage method, device and computer program product
US10809927B1 (en) Online conversion of storage layout

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23864272

Country of ref document: EP

Kind code of ref document: A1