WO2024055529A1

WO2024055529A1 - Placement group member selection method and apparatus, device, and readable storage medium

Info

Publication number: WO2024055529A1
Application number: PCT/CN2023/078429
Authority: WO
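Inventors: 张凯 (Zhang Kai); 孙润宇 (Sun Runyu); 丁纯杰 (Ding Chunjie); 孟祥瑞 (Meng Xiangrui)
Original assignee: 浪潮电子信息产业股份有限公司 (Inspur Electronic Information Industry Co., Ltd.)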
Priority date: 2022-09-14
Filing date: 2023-02-27
Publication date: 2024-03-21
Also published as: CN115202589A; CN115202589B

Abstract

The present application discloses a placement group member selection method and apparatus, a device, and a readable storage medium in the technical field of computers. According to the present application, the target nodes where the members of a first placement group serving as a reference are located are determined. If the number of members of the first placement group is not less than the number of members of a second placement group for which members are to be selected, N nodes are selected from the target nodes, and one disk is selected from each of the N nodes to obtain the N members of the second placement group. The nodes where the members of the second placement group are located therefore coincide with the target nodes. As a result, the primary members of the first and second placement groups can subsequently be placed on the same node, and data forwarding between the two placement groups can be completed within that node without using the network. This improves the data forwarding efficiency of corresponding placement groups. The placement group member selection apparatus, device, and readable storage medium provided by the present application have the same technical effects.

Description

Placement group member selection method and apparatus, device, and readable storage medium

Cross-references to related applications

This application claims priority to Chinese patent application No. 202211112880.2, filed with the China Patent Office on September 14, 2022 and entitled "Placement Group Member Selection Method, Device, Equipment and Readable Storage Medium", the entire content of which is incorporated herein by reference.

Technical field

The present application relates to the field of computer technology, and in particular to a placement group member selection method and apparatus, a device, and a readable storage medium.

Background

In a distributed storage scenario, two bound storage pools contain placement groups that correspond to each other. For example, suppose placement group A1 in storage pool A corresponds to placement group B1 in storage pool B. The primary member of placement group A1 can then forward data to be processed to placement group B1 for processing. Likewise, the primary member of placement group B1 can forward data to be processed to placement group A1. The primary member of a placement group can be any one of its members. The number of members in a placement group depends on the erasure-coding scheme or the number of redundant replicas of the storage pool it belongs to.

It should be noted that when system load reaches a certain level, the forwarding efficiency between corresponding placement groups decreases. Once the network becomes a bottleneck, forwarding speed is limited and the performance of the distributed storage cluster falls short of expectations.

Therefore, how to improve the data forwarding efficiency of corresponding placement groups is a problem that those skilled in the art need to solve.

Summary of the invention

In view of this, the purpose of the present application is to provide a placement group member selection method and apparatus, a device, and a readable storage medium, so as to improve the data forwarding efficiency of corresponding placement groups. The solution is as follows:

This application provides a method for selecting placement group members, including:

Determine, for any first placement group in the first storage pool, the set of corresponding placement groups in the second storage pool. The first storage pool includes multiple first placement groups, the second storage pool includes multiple second placement groups, and the total number of placement groups in the first storage pool is less than the total number of placement groups in the second storage pool;

Select any second placement group in the placement group set, and determine the target nodes where the members of the first placement group are located;

If the number of members of the first placement group is not less than the number of members of the second placement group, select N nodes from the target nodes, where N is the number of members of the second placement group;

Select a disk in each of the N nodes to get N members of the second placement group.

In some embodiments, selecting N nodes from the target nodes includes:

Arrange the target nodes in ascending order of the number of second placement groups corresponding to each node to obtain a node sequence, and select the first N nodes in the node sequence;

or,

Arrange the target nodes in descending order of the number of second placement groups corresponding to each node to obtain a node sequence, and select the last N nodes in the node sequence.

In some embodiments, selecting one disk in each of the N nodes includes:

Select the disk with the smallest number of corresponding second placement groups from each node among the N nodes.

In some embodiments, the method further includes:

If the number of members of the first placement group is less than the number of members of the second placement group, determine the nodes in the current distributed system other than the target nodes. Select nodes among these other nodes until the selected nodes and the target nodes total N. Then perform the step of selecting one disk in each of the N nodes to obtain the N members of the second placement group.

In some embodiments, selecting nodes among the other nodes so that the selected nodes and the target nodes total N includes:

Arrange the other nodes in ascending order of the number of second placement groups corresponding to each node to obtain a node sequence, and select the first N-M nodes in the node sequence, where M is the number of members of the first placement group;

or,

Arrange the other nodes in descending order of the number of second placement groups corresponding to each node to obtain a node sequence, and select the last N-M nodes in the node sequence, where M is the number of members of the first placement group.

In some embodiments, the method further includes:

After any second placement group in the placement group set has been selected, if the placement group set contains other unselected second placement groups, select members for those second placement groups in each of the N nodes.

In some embodiments, the method further includes:

If a member in either storage pool fails, determine the fault placement group to which the failed member belongs, and form an object node set from the nodes where the members of the fault placement group are located;

Determine the placement group in the other storage pool that corresponds to the fault placement group, and form a corresponding node set from the nodes where its members are located;

Determine the non-overlapping nodes, which belong to the corresponding node set but not to the object node set;

Among the non-overlapping nodes, select the node with the smallest number of corresponding placement groups. On that node, select the disk with the smallest number of corresponding placement groups, and replace the failed member with the selected disk.

In some embodiments, forming the object node set from the nodes where the members of the fault placement group are located includes:

Determine the object nodes where the members of the fault placement group are located, delete the node where the failed member is located from these object nodes, and form the remaining nodes into the object node set.

In some embodiments, the method further includes:

If the fault placement group has multiple corresponding placement groups in the other storage pool, determine the nodes where the members of each of these placement groups are located to obtain multiple corresponding node sets;

Select one of the multiple corresponding node sets, and perform the step of determining the non-overlapping nodes that belong to the corresponding node set but not to the object node set.

In some embodiments, the method further includes:

If there are no non-overlapping nodes, or the node selected from the non-overlapping nodes has no available disk, determine the nodes in the current distributed system outside the object node set, and select among them the node with the smallest number of corresponding placement groups. Then perform the steps of selecting the disk with the smallest number of corresponding placement groups on the selected node and replacing the failed member with the selected disk.

In some embodiments, after the failed member is replaced with the currently selected disk, the method further includes:

Restoring the data of the failed member to the currently selected disk.

In some embodiments, the method further includes:

In a first placement group and the second placement group corresponding to it, select members located on the same node as the primary members of the respective placement groups.

This application also provides a placement group member selection apparatus, including:

The determination module is used to determine, for any first placement group in the first storage pool, the set of corresponding placement groups in the second storage pool. The first storage pool includes multiple first placement groups, the second storage pool includes multiple second placement groups, and the total number of placement groups in the first storage pool is less than the total number of placement groups in the second storage pool;

The placement group selection module is used to select any second placement group in the placement group set and determine the target nodes where the members of the first placement group are located;

The node selection module is used to select N nodes from the target nodes if the number of members of the first placement group is not less than the number of members of the second placement group, where N is the number of members of the second placement group;

The member selection module is used to select one disk in each of the N nodes to obtain the N members of the second placement group.

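In some embodiments, the node selection module is used to: arrange the target nodes in ascending order of the number of second placement groups corresponding to each node to obtain a node sequence, and select the first N nodes in the node sequence;

or,

arrange the target nodes in descending order of the number of second placement groups corresponding to each node to obtain a node sequence, and select the last N nodes in the node sequence.

In some embodiments, the member selection module is used to: select, in each of the N nodes, the disk with the smallest number of corresponding second placement groups.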

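In some embodiments, the apparatus further includes:

An additional node selection module, used to handle the case where the number of members of the first placement group is less than the number of members of the second placement group. In that case, it determines the nodes in the current distributed system other than the target nodes and selects nodes among them until the selected nodes and the target nodes total N. It then performs the step of selecting one disk in each of the N nodes to obtain the N members of the second placement group.

In some embodiments, the additional node selection module is used to: arrange the other nodes in ascending order of the number of second placement groups corresponding to each node to obtain a node sequence, and select the first N-M nodes in the node sequence, where M is the number of members of the first placement group;

or,

arrange the other nodes in descending order of the number of second placement groups corresponding to each node to obtain a node sequence, and select the last N-M nodes in the node sequence, where M is the number of members of the first placement group.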

In some embodiments, the apparatus further includes:

A member selection module for other second placement groups, used to: after any second placement group in the placement group set has been selected, if the placement group set contains other unselected second placement groups, select members for those second placement groups in each of the N nodes.

In some embodiments, the apparatus further includes a fault processing module, and the fault processing module includes:

The object node set determination unit is used to: if a member in either storage pool fails, determine the fault placement group to which the failed member belongs, and form an object node set from the nodes where the members of the fault placement group are located;

The corresponding node set determination unit is used to determine the placement group in the other storage pool that corresponds to the fault placement group, and form a corresponding node set from the nodes where its members are located;

The non-overlapping node determination unit is used to determine the non-overlapping nodes, which belong to the corresponding node set but not to the object node set;

The member replacement unit is used to select, among the non-overlapping nodes, the node with the smallest number of corresponding placement groups. It then selects the disk with the smallest number of corresponding placement groups on that node, and replaces the failed member with the selected disk.

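In some embodiments, the object node set determination unit is used to: determine the object nodes where the members of the fault placement group are located, delete the node where the failed member is located from these object nodes, and form the remaining nodes into the object node set.

In some embodiments, the corresponding node set determination unit is further used to handle the case where the fault placement group has multiple corresponding placement groups in the other storage pool. In that case, it determines the nodes where the members of each of these placement groups are located to obtain multiple corresponding node sets. It then selects one of them for determining the non-overlapping nodes.

In some embodiments, the member replacement unit is further used to handle the case where there are no non-overlapping nodes, or the node selected from the non-overlapping nodes has no available disk. In that case, it determines the nodes in the current distributed system outside the object node set and selects among them the node with the smallest number of corresponding placement groups. It then selects the disk with the smallest number of corresponding placement groups on that node, and replaces the failed member with that disk.

In some embodiments, the fault processing module further includes:

The data recovery unit is used to restore the data of the failed member to the currently selected disk after the failed member is replaced with that disk.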

In some embodiments, the apparatus further includes:

The primary member selection module is used to select, in a first placement group and the second placement group corresponding to it, members located on the same node as the primary members of the respective placement groups.

This application also provides a distributed storage system, including multiple nodes, each of which includes multiple disks;

Among all the disks, one part constitutes the first storage pool described in any of the above embodiments, and the other part constitutes the second storage pool described in any of the above embodiments.

In some embodiments, the performance of each disk of the first storage pool is higher than the performance of each disk of the second storage pool.

This application also provides an electronic device, including:

Memory, used to store computer programs;

A processor is configured to execute a computer program to implement the aforementioned disclosed placement group member selection method.

This application also provides a non-volatile computer-readable storage medium for storing a computer program. When the computer program is executed by a processor, the placement group member selection method disclosed above is implemented.

It can be seen from the above solution that the present application provides a placement group member selection method, including the following steps:

Determining, for any first placement group in a first storage pool, the set of corresponding placement groups in a second storage pool. The first storage pool includes multiple first placement groups, the second storage pool includes multiple second placement groups, and the total number of placement groups in the first storage pool is less than that in the second storage pool;

Selecting any second placement group in the placement group set, and determining the target nodes where the members of the first placement group are located;

If the number of members of the first placement group is not less than the number of members of the second placement group, selecting N nodes from the target nodes, where N is the number of members of the second placement group;

Selecting one disk in each of the N nodes to obtain the N members of the second placement group.

It can be seen that the present application takes a placement group in the storage pool with the smaller total number of placement groups as a reference, and selects members for its corresponding placement group in the other storage pool. To do so, it first determines the target nodes where the members of the reference first placement group are located.

If the number of members of the first placement group is not less than the number of members of the second placement group whose members are to be selected, the target nodes are sufficient to host all members of the second placement group. N nodes (N being the number of members of the second placement group) are therefore selected directly from the target nodes, and one disk is selected on each of them to obtain the N members of the second placement group. The nodes where these N members are located then coincide with the target nodes.

When primary members are later specified for the first and second placement groups, the primary members of the two groups are very likely to be on the same node. In that case, data forwarding between the two placement groups is completed within that node without going through the network, which improves the data forwarding efficiency of corresponding placement groups.

Correspondingly, the placement group member selection apparatus, device, and readable storage medium provided by the present application have the same technical effects.

Description of drawings

To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. The drawings in the following description show only some embodiments of the present application. Those of ordinary skill in the art can derive other drawings from them without creative effort.

Figure 1 is a flow chart of a method for selecting placement group members provided by an embodiment of the present application;

Figure 2 is a schematic diagram of the correspondence between PGs in two storage pools provided by an embodiment of the present application;

Figure 3 is a schematic diagram of placement group member selection provided by an embodiment of the present application;

Figure 4 is a schematic diagram of fault processing provided by an embodiment of the present application;

Figure 5 is a schematic diagram of another fault processing provided by an embodiment of the present application;

Figure 6 is a schematic diagram of a placement group member selection apparatus provided by an embodiment of the present application;

Figure 7 is a schematic diagram of an electronic device provided by an embodiment of the present application;

Figure 8 is a schematic structural diagram of a non-volatile computer-readable storage medium provided by an embodiment of the present application.

Detailed description of the embodiments

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are only some of the embodiments of the present application, rather than all of the embodiments. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative efforts fall within the scope of protection of this application.

Currently, when the pressure on the distributed storage system reaches a certain level, the forwarding efficiency of the corresponding placement groups will decrease; if a network bottleneck is reached, the forwarding speed will be limited, and the performance of the distributed storage cluster will not meet expectations. To this end, this application provides a placement group member selection scheme that can improve the data forwarding efficiency of corresponding placement groups.

As shown in Figure 1, an embodiment of the present application provides a method for selecting placement group members, including:

S101. Determine the set of placement groups corresponding to any first placement group in the first storage pool in the second storage pool.

The first storage pool includes multiple first placement groups, and the second storage pool includes multiple second placement groups. The total number of placement groups in the first storage pool is smaller than that in the second storage pool.

Suppose the first storage pool is denoted A and contains the first placement groups A1 to A4 (4 in total), and the second storage pool is denoted B and contains the second placement groups B1 to B8 (8 in total). Each first placement group in storage pool A then corresponds to two second placement groups in storage pool B: A1 corresponds to B1 and B5, A2 to B2 and B6, A3 to B3 and B7, and A4 to B4 and B8. Accordingly, the placement group set corresponding to A1 is {B1, B5}, that of A2 is {B2, B6}, that of A3 is {B3, B7}, and that of A4 is {B4, B8}.

The first storage pool may be a cache pool, and the second storage pool may be a low-speed storage pool.
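The A/B correspondence in this example can be reproduced with the short Python sketch below. The binding rule it uses is that Bj belongs to the set of A((j-1) mod 4)+1. This rule is only an assumption inferred from the example, since the text does not state a general rule, and the function name `pg_sets` is hypothetical.

```python
from typing import Dict, List


def pg_sets(num_first: int, num_second: int) -> Dict[str, List[str]]:
    """Map each first placement group A_i to its set of second placement groups.

    Hypothetical rule inferred from the A1..A4 / B1..B8 example:
    B_j is bound to A_{((j - 1) mod num_first) + 1}.
    """
    if num_first <= 0 or num_second % num_first != 0:
        raise ValueError("this sketch assumes num_second is a multiple of num_first")
    sets: Dict[str, List[str]] = {f"A{i}": [] for i in range(1, num_first + 1)}
    for j in range(1, num_second + 1):
        sets[f"A{(j - 1) % num_first + 1}"].append(f"B{j}")
    return sets


if __name__ == "__main__":
    print(pg_sets(4, 8))
    # {'A1': ['B1', 'B5'], 'A2': ['B2', 'B6'], 'A3': ['B3', 'B7'], 'A4': ['B4', 'B8']}
```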

S102. Select any second placement group in the placement group set, and determine the target node where each member of the first placement group is located.

A placement group is a container for objects: one placement group corresponds to multiple objects, and one object corresponds to one disk. The members of a placement group are the disks corresponding to that placement group. Because the disks are distributed across the nodes of the distributed system, the node where each member of a placement group resides can be determined.

In a distributed storage system, stored content is cut into pieces of fixed size, and each such piece of data is called an object. A PG (Placement Group) is a logical aggregation of multiple objects. Objects are mapped to PGs by a consistent hash algorithm, and PGs are mapped to disks by a data distribution algorithm. In some embodiments, a disk may be an OSD (Object-based Storage Device).
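A simplified Python sketch of the object-to-PG step follows. It substitutes a plain stable hash taken modulo the PG count (CRC32 from the standard `zlib` module) for the consistent hash algorithm mentioned above, so it is not consistent hashing. It also does not model the PG-to-disk data distribution algorithm. The function names and the object naming scheme are hypothetical.

```python
import zlib
from typing import List


def split_into_objects(data: bytes, object_size: int) -> List[bytes]:
    """Cut stored content into fixed-size objects (the last one may be shorter)."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    return [data[i:i + object_size] for i in range(0, len(data), object_size)]


def object_to_pg(object_name: str, pg_count: int) -> int:
    """Return the 0-based index of the PG that aggregates this object.

    Simplified stand-in: CRC32 is deterministic across runs, unlike Python's
    salted built-in hash() for strings, so the mapping stays stable.
    """
    return zlib.crc32(object_name.encode("utf-8")) % pg_count


if __name__ == "__main__":
    objects = split_into_objects(b"x" * 10, 4)
    print([len(o) for o in objects])  # [4, 4, 2]
    for idx in range(len(objects)):
        name = f"file1.{idx}"  # hypothetical object naming scheme
        print(name, "-> PG", object_to_pg(name, 8))
```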

S103. If the number of members of the first placement group is not less than the number of members of the second placement group, select N nodes from the target nodes, where N is the number of members of the second placement group.

It should be noted that the number of members in a placement group depends on the erasure-coding scheme or the number of redundant replicas of the storage pool to which the placement group belongs. If that storage pool uses a 4+2 erasure-coding scheme, the placement group has 6 members. If that storage pool keeps 3 redundant replicas, the placement group has 3 members.
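The member-count rule above reduces to simple arithmetic. A k+m erasure-coded pool gives k+m members per PG (k data chunks plus m parity chunks), and an r-replica pool gives r members. The helper name `pg_member_count` is hypothetical.

```python
from typing import Optional, Tuple


def pg_member_count(replicas: Optional[int] = None,
                    ec: Optional[Tuple[int, int]] = None) -> int:
    """Number of members of a PG, given its pool's redundancy scheme.

    ec=(k, m): k data chunks + m parity chunks -> k + m members.
    replicas=r: r redundant copies -> r members.
    """
    if (replicas is None) == (ec is None):
        raise ValueError("pass exactly one of replicas or ec")
    if ec is not None:
        k, m = ec
        return k + m
    return replicas


print(pg_member_count(ec=(4, 2)))   # 6
print(pg_member_count(replicas=3))  # 3
```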

In some embodiments, the goal is to make the nodes hosting the members of corresponding placement groups in the two storage pools coincide. Therefore, once the nodes where the members of the first placement group are located have been determined, members for the corresponding second placement group are selected on those same nodes. The members of corresponding first and second placement groups are then located on the same nodes.

When the number M of members of the first placement group is not less than the number N of members of the second placement group (that is, M ≥ N), the members of the first placement group are located on M target nodes. N nodes can therefore be selected directly from these M nodes, and members for the second placement group are selected on those N nodes. When M ≥ N, nodes hosting fewer second placement groups are preferred.

Therefore, in some embodiments, selecting N nodes from the target nodes includes one of the following:

Arranging the target nodes in ascending order of the number of second placement groups corresponding to each node to obtain a node sequence, and selecting the first N nodes in the node sequence; or

Arranging the target nodes in descending order of that number to obtain a node sequence, and selecting the last N nodes in the node sequence.

Each time a member of a second placement group is selected on a node, the count of second placement groups corresponding to that node is incremented by one. Likewise, each node records a count of first placement groups, which is incremented by one each time a member of a first placement group is selected on that node.
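The M ≥ N branch can be sketched as below. The per-node counter of second placement groups is modelled as a plain dictionary, and the node names, counts, and function name are hypothetical. Python's sort is stable, so nodes with equal counts keep their input order. This is one possible tie-breaking choice; the text leaves it open.

```python
from typing import Dict, List


def select_nodes_from_targets(target_nodes: List[str],
                              second_pg_count: Dict[str, int],
                              n: int) -> List[str]:
    """M >= N case: choose the N target nodes hosting the fewest second PGs."""
    if len(target_nodes) < n:
        raise ValueError("this branch requires M >= N")
    # Ascending order by second-PG count, keep the first N. Sorting in
    # descending order and keeping the last N yields the same node set,
    # apart from how ties at the cut-off are broken.
    ordered = sorted(target_nodes, key=lambda node: second_pg_count.get(node, 0))
    return ordered[:n]


targets = ["n1", "n2", "n3", "n4"]  # M = 4 nodes hosting the first PG
counts = {"n1": 3, "n2": 1, "n3": 2, "n4": 0}
print(select_nodes_from_targets(targets, counts, 3))  # ['n4', 'n2', 'n3']
```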

When the number M of members of the first placement group is less than the number N of members of the second placement group (that is, M < N), the M target nodes are not enough to host the members of the second placement group. N-M additional nodes must be found to make up N nodes, after which members for the second placement group are selected on these N nodes. When M < N, the M nodes where the members of the first placement group are located are used first. Nodes hosting fewer second placement groups are then selected from the other nodes in the system to make up N nodes.

Therefore, in some embodiments, the method further includes the following. If the number of members of the first placement group is less than the number of members of the second placement group, the nodes in the current distributed system other than the target nodes are determined, and nodes are selected among them until the selected nodes and the target nodes total N. The step of selecting one disk in each of the N nodes is then performed to obtain the N members of the second placement group.

In some embodiments, selecting nodes among the other nodes so that the selected nodes and the target nodes total N includes one of the following, where M is the number of members of the first placement group:

Arranging the other nodes in ascending order of the number of second placement groups corresponding to each node to obtain a node sequence, and selecting the first N-M nodes in the node sequence; or

Arranging the other nodes in descending order of that number to obtain a node sequence, and selecting the last N-M nodes in the node sequence.
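A matching sketch for the M < N branch follows. It uses the same hypothetical data model (a dictionary of second-placement-group counts per node). The cluster layout and the function name `fill_up_nodes` are illustrative only.

```python
from typing import Dict, List


def fill_up_nodes(target_nodes: List[str],
                  all_nodes: List[str],
                  second_pg_count: Dict[str, int],
                  n: int) -> List[str]:
    """M < N case: keep all M target nodes, add the N - M least-loaded others."""
    m = len(target_nodes)
    if m >= n:
        raise ValueError("this branch requires M < N")
    others = [node for node in all_nodes if node not in target_nodes]
    # Ascending order by second-PG count, take the first N - M nodes.
    extra = sorted(others, key=lambda node: second_pg_count.get(node, 0))[: n - m]
    if len(extra) < n - m:
        raise ValueError("not enough nodes in the cluster to place N members")
    return list(target_nodes) + extra


cluster = ["n1", "n2", "n3", "n4", "n5"]
counts = {"n1": 4, "n2": 4, "n3": 2, "n4": 0, "n5": 1}
print(fill_up_nodes(["n1", "n2"], cluster, counts, 4))  # ['n1', 'n2', 'n4', 'n5']
```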

S104. Select a disk from each of the N nodes to obtain N members of the second placement group.

When disks are selected on the chosen nodes, disks hosting fewer second placement groups are likewise preferred. Each time a disk is selected as a member of a second placement group, the count of second placement groups corresponding to that disk is incremented by one. Likewise, each disk records a count of first placement groups, which is incremented by one each time the disk is selected as a member of a first placement group. Therefore, in some embodiments, selecting one disk in each of the N nodes includes: selecting, in each of the N nodes, the disk with the smallest number of corresponding second placement groups.
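Disk selection and the counter updates described above can be sketched as follows. Disk names such as `osd.0`, the nested-dictionary layout, and the function name are hypothetical. Only the second-placement-group counters are modelled.

```python
from typing import Dict, List


def select_disks(nodes: List[str],
                 disk_pg_count: Dict[str, Dict[str, int]],
                 node_pg_count: Dict[str, int]) -> List[str]:
    """Pick one disk per node: the disk hosting the fewest second PGs.

    disk_pg_count maps node -> {disk: number of second PGs on that disk}.
    Both counters are incremented in place for every selection.
    """
    members = []
    for node in nodes:
        disks = disk_pg_count[node]
        disk = min(disks, key=disks.get)  # first disk wins on ties
        disks[disk] += 1                  # disk now hosts one more second PG
        node_pg_count[node] = node_pg_count.get(node, 0) + 1
        members.append(disk)
    return members


disk_counts = {"n1": {"osd.0": 2, "osd.1": 1}, "n2": {"osd.2": 0, "osd.3": 0}}
node_counts: Dict[str, int] = {}
print(select_disks(["n1", "n2"], disk_counts, node_counts))  # ['osd.1', 'osd.2']
print(node_counts)                                           # {'n1': 1, 'n2': 1}
```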

In some embodiments, after any second placement group in the placement group set has been selected, if the placement group set contains other unselected second placement groups, members for those second placement groups are selected in each of the N nodes. In other words, the remaining second placement groups in the set select their members based on the second placement group whose members have already been determined.

For example, for the placement group set {B1, B5}, suppose the members of B1 are determined first and the N nodes determined when selecting members for B1 are D1 to DN. Members for B5 are then selected directly from D1 to DN. That is, the disk with the smallest number of second placement groups is selected in each of the nodes D1 to DN, and the N selected disks are the members of B5. After selection, the second placement group counts of the corresponding nodes and disks are each incremented by one.

The N nodes D1 to DN are either the N nodes selected from the M target nodes when M ≥ N, or the M target nodes together with the N-M additionally selected nodes when M < N.
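Reusing the N nodes D1 to DN for the remaining second placement groups in a set such as {B1, B5} might look like the sketch below. Each selection updates the disk counter before the next placement group is handled, so in this example B1 and B5 land on different disks of the same nodes. Names and the data layout are hypothetical, and node-level counters are omitted for brevity.

```python
from typing import Dict, List


def assign_pg_set(pg_set: List[str],
                  nodes: List[str],
                  disk_pg_count: Dict[str, Dict[str, int]]) -> Dict[str, List[str]]:
    """Give every second PG in the set one disk on each of the same N nodes."""
    members: Dict[str, List[str]] = {}
    for pg in pg_set:
        chosen = []
        for node in nodes:
            disks = disk_pg_count[node]
            disk = min(disks, key=disks.get)  # least-loaded disk on this node
            disks[disk] += 1                  # update before the next PG picks
            chosen.append(disk)
        members[pg] = chosen
    return members


layout = {"n1": {"osd.0": 0, "osd.1": 0}, "n2": {"osd.2": 0, "osd.3": 0}}
print(assign_pg_set(["B1", "B5"], ["n1", "n2"], layout))
# {'B1': ['osd.0', 'osd.2'], 'B5': ['osd.1', 'osd.3']}
```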

According to some embodiments, once N nodes have been selected, selecting one disk in each node yields N disks, which are the N members of the second placement group. In this way, the members of corresponding first and second placement groups are located on the same nodes. When primary members are later specified for the first and second placement groups, the primary members of the two placement groups are very likely to be on the same node. When they are, data forwarding between the two placement groups is completed within that node without going through the network, which improves the data forwarding efficiency of corresponding placement groups.

Based on the above embodiments, it should be noted that if a member in either storage pool fails, the failed member is replaced as follows:

The fault placement group to which the failed member belongs is determined, and the nodes where the members of the fault placement group are located form an object node set;

The placement group in the other storage pool corresponding to the fault placement group is determined, and the nodes where its members are located form a corresponding node set;

The non-overlapping nodes, which belong to the corresponding node set but not to the object node set, are determined;

Among the non-overlapping nodes, the node with the smallest number of corresponding placement groups is selected. On that node, the disk with the smallest number of corresponding placement groups is selected, and the failed member is replaced with the selected disk.
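The replacement rule can be sketched as a small function. The counters stand in for "the number of corresponding placement groups" per node and per disk, and all names are hypothetical. Returning None to signal that a fallback is needed is a design choice of this sketch, not something the text specifies.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


def pick_replacement(object_nodes: Set[str],
                     corresponding_nodes: Iterable[str],
                     node_pg_count: Dict[str, int],
                     disk_pg_count: Dict[str, Dict[str, int]]) -> Optional[Tuple[str, str]]:
    """Return (node, disk) to replace a failed member, or None if no choice.

    None means the caller must fall back (no non-overlapping node, or the
    chosen node has no available disk).
    """
    # Non-overlapping nodes: in the corresponding set but not the object set.
    non_overlapping = [n for n in corresponding_nodes if n not in object_nodes]
    if not non_overlapping:
        return None
    node = min(non_overlapping, key=lambda n: node_pg_count.get(n, 0))
    disks = disk_pg_count.get(node, {})
    if not disks:
        return None
    return node, min(disks, key=disks.get)


print(pick_replacement(
    object_nodes={"n1", "n2"},
    corresponding_nodes=["n1", "n2", "n3", "n4"],
    node_pg_count={"n3": 5, "n4": 2},
    disk_pg_count={"n3": {"osd.6": 0}, "n4": {"osd.8": 3, "osd.9": 1}},
))  # ('n4', 'osd.9')
```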

In some embodiments, forming the object node set from the nodes where the members of the fault placement group are located includes the following: the object nodes where the members of the fault placement group are located are determined, the node where the failed member is located is deleted from these object nodes, and the remaining nodes form the object node set.

It should be noted that deleting or keeping the failed member's node does not affect whether the present application can be implemented, but it does affect system balance.

If the failed member's node is deleted from the object nodes, another disk on that node may be selected to replace the failed member. During subsequent data recovery, the data to be restored is read from the other nodes and the recovered data is written to the failed member's node. All nodes therefore take part in the recovery, and the overall system load is relatively balanced.

If the failed member's node is not deleted from the object nodes, no replacement disk is selected on that node; a disk on another node is selected instead. During subsequent data recovery, the data only needs to be read from nodes other than the failed member's node, and the failed member's node performs no other operations. That node is therefore relatively idle compared with the other nodes.
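The effect of deleting the failed member's node can be seen in a small sketch. When the node is dropped from the object node set, it can reappear among the non-overlapping candidates, so a replacement disk may be chosen on that node. The node names and helper functions are hypothetical.

```python
from typing import Iterable, List, Set


def object_node_set(member_nodes: Iterable[str], failed_node: str,
                    drop_failed_node: bool = True) -> Set[str]:
    """Nodes of the fault PG's members, optionally without the failed member's node."""
    nodes = set(member_nodes)
    if drop_failed_node:
        nodes.discard(failed_node)
    return nodes


def non_overlapping(corresponding_nodes: Iterable[str], object_nodes: Set[str]) -> List[str]:
    """Nodes in the corresponding node set that are not in the object node set."""
    return [n for n in corresponding_nodes if n not in object_nodes]


members = ["n1", "n2", "n3"]  # fault PG; the member on n2 failed
corresponding = ["n1", "n2", "n3", "n4"]
kept = object_node_set(members, "n2", drop_failed_node=False)
dropped = object_node_set(members, "n2")
print(non_overlapping(corresponding, kept))     # ['n4']
print(non_overlapping(corresponding, dropped))  # ['n2', 'n4']
```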

In some embodiments, the fault placement group may have multiple corresponding placement groups in the other storage pool. In that case, the nodes where the members of each of these placement groups are located are determined, giving multiple corresponding node sets. One corresponding node set is selected from them, and the step of determining the non-overlapping nodes that belong to the corresponding node set but not to the object node set is performed.

In some embodiments, a non-overlapping node may be unavailable: either none exists, or the node selected from the non-overlapping nodes has no available disk. In that case:

1. Determine the nodes in the current distributed system other than those in the object node set.
2. Among these other nodes, select the node with the smallest number of corresponding placement groups.
3. Perform the steps of selecting, in the selected node, the disk with the smallest number of corresponding placement groups, and replacing the failed member with the currently selected disk.
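The fallback path can be added to the same selection logic, as in the sketch below. Here `choose_replacement`, `_least`, the "free disks per node" dictionary and the tie-break by name are hypothetical. The fallback searches every node outside the object node set that still has a free disk.

```python
from typing import Dict, Iterable, List, Set, Tuple


def _least(items: Iterable[str], counts: Dict[str, int]) -> str:
    # Smallest placement-group count wins; the name breaks ties.
    return min(items, key=lambda x: (counts.get(x, 0), x))


def choose_replacement(
    object_nodes: Set[str],
    corresponding_nodes: Set[str],
    all_nodes: List[str],
    node_pg_count: Dict[str, int],
    node_disks: Dict[str, List[str]],
    disk_pg_count: Dict[str, int],
) -> Tuple[str, str]:
    """Return (node, disk) replacing a failed member, with the fallback path."""
    candidates = corresponding_nodes - object_nodes
    if candidates:
        node = _least(candidates, node_pg_count)
        if node_disks.get(node):  # the chosen node still has a free disk
            return node, _least(node_disks[node], disk_pg_count)
    # Fallback: no non-overlapping node, or the chosen one has no free disk.
    others = [n for n in all_nodes if n not in object_nodes and node_disks.get(n)]
    if not others:
        raise LookupError("no node outside the object node set has a free disk")
    node = _least(others, node_pg_count)
    return node, _least(node_disks[node], disk_pg_count)
```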

In some embodiments, after replacing the failed member with the currently selected disk, the method further includes: restoring data in the failed member to the currently selected disk.

Based on the above embodiments, it should be noted that the method further includes the following. In a first placement group and a second placement group that correspond to each other, members located on the same node are selected as the main members of the respective placement groups. When the main members of the two placement groups are on the same node, data forwarding between them only needs to be completed within that node, without going through the network. This improves the data forwarding efficiency of corresponding placement groups.
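The choice of main members on a shared node can be sketched as below. The text does not say which shared node to prefer when there are several. This sketch assumes the first member of the first placement group, in member order, whose node also hosts a member of the second placement group. The function name `pick_primaries` is hypothetical.

```python
from typing import Dict, Optional, Tuple


def pick_primaries(
    first_pg: Dict[str, str], second_pg: Dict[str, str]
) -> Optional[Tuple[str, str]]:
    """Return (first_pg_main, second_pg_main) located on a shared node, if any.

    Both arguments map member (disk) id -> node id, in member order.
    """
    # Remember the first member of the second PG on each node.
    second_by_node: Dict[str, str] = {}
    for disk, node in second_pg.items():
        second_by_node.setdefault(node, disk)
    # Walk the first PG in order; the first member whose node is shared wins.
    for disk, node in first_pg.items():
        if node in second_by_node:
            return disk, second_by_node[node]
    return None  # no shared node: the main members cannot be co-located
```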

The following embodiment uses a cache pool and a data pool as an example to describe the solution. After the cache pool and the data pool are created, their PGs need to be bound to each other. During binding, this embodiment adjusts the PG distribution in the storage pool with the larger number of PGs. Once a storage pool is created, its PGs are already evenly distributed. The storage pool with fewer PGs is used as the base pool (base_pool), and the other storage pool is called the bound pool (tier_pool) of the base pool.

The number of PGs in a storage pool is an integer power of 2. So even if the two storage pools have different numbers of PGs, the larger number is a power-of-2 multiple of the smaller one. Hence the following PG correspondence always holds:

- Take pool1, the storage pool with fewer PGs, as the reference.
- Divide the PGs of the storage pool with more PGs into portions, each containing as many PGs as pool1.
- Match each portion one-to-one with the PGs of pool1 to obtain the PG correspondence.

For example, the 4096 PGs in pool2 are divided into 4 portions of 1024 PGs each, and each portion corresponds to the 1024 PGs in pool1. The resulting PG correspondence is shown in Figure 2.
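One way to realize the portion-by-portion correspondence is sketched below. It assumes that PG IDs start at 0 and that portion k holds bound-pool PGs k·1024 to k·1024+1023. Under that assumption, bound-pool PG j pairs with base-pool PG j mod 1024. Both the numbering and the helper names are assumptions; Figure 2 is not reproduced here.

```python
from typing import List


def is_power_of_two(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0


def tier_pgs_for(base_pg: int, base_count: int, tier_count: int) -> List[int]:
    """PG IDs in the bound pool matched to base_pg (assumed portion layout)."""
    if not (is_power_of_two(base_count) and is_power_of_two(tier_count)):
        raise ValueError("PG counts must be powers of two")
    if tier_count < base_count or not 0 <= base_pg < base_count:
        raise ValueError("base pool must be the smaller pool and base_pg in range")
    # Position base_pg in every portion of size base_count pairs with base_pg.
    return list(range(base_pg, tier_count, base_count))


def base_pg_for(tier_pg: int, base_count: int) -> int:
    # Inverse mapping under the same assumed layout.
    return tier_pg % base_count


print(tier_pgs_for(1, 1024, 4096))  # [1, 1025, 2049, 3073]
```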

Before the PG distribution is adjusted, assume that the reference counts of disks and nodes for both storage pools are 0. A reference count is the number of times a disk or node has been selected by a PG. Each disk has two reference counts:

- One records the number of times it is selected by PGs in the base pool. This is the "number of first placement groups corresponding to the disk" in the above embodiments.
- The other records the number of times it is selected by PGs in the bound pool. This is the "number of second placement groups corresponding to the disk" in the above embodiments.
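The two per-pool reference counts can be held in a small structure like the one below. It is a sketch, and the class name `RefCounts` and the pool keys `"base"`/`"tier"` are hypothetical. `collections.Counter` returns 0 for ids it has not seen, which matches the all-zero starting state.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict


def _pools() -> Dict[str, Counter]:
    # One counter per storage pool: "base" (first PGs) and "tier" (second PGs).
    return {"base": Counter(), "tier": Counter()}


@dataclass
class RefCounts:
    """Per-pool reference counts for disks and nodes (hypothetical structure)."""

    disk: Dict[str, Counter] = field(default_factory=_pools)
    node: Dict[str, Counter] = field(default_factory=_pools)

    def record(self, pool: str, disk_id: str, node_id: str) -> None:
        # Selecting a disk as a PG member adds 1 to the disk and to its node.
        self.disk[pool][disk_id] += 1
        self.node[pool][node_id] += 1
```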

When adjusting the PG distribution, the PGs in the base pool are traversed one by one; the current one is denoted base_pg. The corresponding tier_pg of base_pg in the bound pool is determined according to the correspondence shown in Figure 2. If base_pg corresponds to multiple tier_pgs in the bound pool, the one with the smallest PG ID is selected.

A sequence S of node IDs is then built as follows:

1. Traverse the members of base_pg, obtain the node ID of each member, and insert it into an array S whose length equals the number of members of the tier_pg.
2. If S is full but some node IDs have not been processed, insert a remaining node ID only if its reference count is smaller than the maximum reference count in the current array S. In that case, delete the node ID with the maximum reference count from S. Continue until all node IDs are processed.
3. If the node IDs of all members of base_pg have been inserted and S is still not full, insert UNDEF until S is full.

Members are then selected for this tier_pg (denoted X) by traversing S:

- If a position in S holds a node ID, the disk with the smallest reference count in that node is selected as a member of X.
- If a position in S is UNDEF, the node with the smallest reference count is selected from the other nodes in the system. The disk with the smallest reference count in that node is selected as a member of X.

In both cases, the reference counts of the disk and the node are increased by 1. After S has been traversed, all members of X have been selected.
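The construction of the sequence S can be sketched as follows, with `None` standing in for UNDEF. This is a sketch under stated assumptions:

- The reference counts consulted when S is full are taken to be node reference counts; the text does not say which pool's counts are meant.
- A replaced entry is overwritten in place.
- The helper name `build_sequence` is hypothetical.

```python
from typing import Dict, List, Optional

UNDEF = None  # marks a slot whose node is chosen later


def build_sequence(
    base_member_nodes: List[str], tier_size: int, node_ref: Dict[str, int]
) -> List[Optional[str]]:
    """Build sequence S for a tier_pg from the nodes of its base_pg members."""
    if tier_size <= 0:
        return []
    seq: List[Optional[str]] = []
    for node in base_member_nodes:
        if len(seq) < tier_size:
            seq.append(node)
            continue
        # S is full: locate the node with the maximum reference count in S ...
        worst = max(range(tier_size), key=lambda i: node_ref.get(seq[i], 0))
        # ... and replace it only if the new node has a smaller count.
        if node_ref.get(node, 0) < node_ref.get(seq[worst], 0):
            seq[worst] = node
    # Fewer base members than tier members: pad with UNDEF.
    seq.extend([UNDEF] * (tier_size - len(seq)))
    return seq


# Three base members on nodes 1-3, six tier members (the Figure 3 situation).
print(build_sequence(["node1", "node2", "node3"], 6, {}))
```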

As shown in Figure 3, assume that members 1, 10 and 20 of PG 1 in pool1 (the base pool), labelled 1.1 in Figure 3, are located on node 1, node 2 and node 3 respectively. PG 2.1 in pool2 (the bound pool) corresponds to 1.1. Following the above process, the sequence of node identifiers determined for PG 2.1 is: node 1, node 2, node 3, UNDEF, UNDEF, UNDEF. Based on this sequence, the members selected for PG 2.1 are 2, 11, 21, 31, 41 and 51. The reference counts of these members are then increased from 0 to 1, and the reference counts of the corresponding nodes are likewise increased from 0 to 1.

Base_pg may correspond to other tier_pgs in the bound pool besides X; each is denoted Y. Members are selected for each Y in the same way, according to the sequence S. S is traversed:

- If a position in S holds a node ID, the disk with the smallest reference count in that node is selected as a member of Y.
- If a position in S is UNDEF, the node with the smallest reference count is selected from the other nodes in the system. The disk with the smallest reference count in that node is selected as a member of Y.

In both cases, the reference counts of the disk and the node are increased by 1. After S has been traversed, all members of Y have been selected.
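The same routine serves for X and for every Y, because the reference counts are updated as members are chosen. A second pass over the same S therefore spreads load onto less-used disks and nodes. The sketch below makes these assumptions:

- "Other nodes in the system" for an UNDEF slot means nodes not yet used by this tier_pg.
- Ties are broken by name.
- The helper `select_members` is hypothetical.

```python
from typing import Dict, List, Optional


def select_members(
    seq: List[Optional[str]],
    node_disks: Dict[str, List[str]],
    node_ref: Dict[str, int],
    disk_ref: Dict[str, int],
) -> List[str]:
    """Select one disk per slot of S for a tier_pg; updates counts in place."""
    members: List[str] = []
    used = {n for n in seq if n is not None}  # nodes already named in S
    for slot in seq:
        if slot is None:  # UNDEF slot
            # Least-referenced node among nodes not yet used by this PG.
            others = [n for n in node_disks if n not in used]
            node = min(others, key=lambda n: (node_ref.get(n, 0), n))
            used.add(node)
        else:
            node = slot
        disk = min(node_disks[node], key=lambda d: (disk_ref.get(d, 0), d))
        members.append(disk)
        # The chosen disk and its node each gain one reference.
        disk_ref[disk] = disk_ref.get(disk, 0) + 1
        node_ref[node] = node_ref.get(node, 0) + 1
    return members
```

Calling it once for X and again for each Y with the same S, node counts and disk counts reproduces the "traverse S, pick least-referenced disk, add 1" loop described above.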

In this way, the members of every PG in the bound pool are determined. The nodes hosting the members of corresponding PGs in the two storage pools overlap as much as possible, which makes it convenient to place the main members of the two PGs on the same node.

If a member in the base pool fails, it is handled as follows:

1. Obtain the ID of the failed member and determine its node ID from it.
2. Collect the node IDs of all members of the PG to which the failed member belongs (denoted R1). Remove the node ID of the failed member; the remaining node IDs form the object node set.
3. Determine a tier_pg corresponding to R1 in the bound pool. The node IDs of all its members form the corresponding node set.
4. Find all node IDs that are in the corresponding node set but not in the object node set. If there are several, select the node with the smallest reference count, and then the disk with the smallest reference count within that node.
5. If no node ID satisfying the condition is found, or the qualifying node has no available disk, record the result as UNDEF. For UNDEF, select the node with the smallest reference count among the other nodes in the system outside the object node set, and then the disk with the smallest reference count within that node.
6. Increase the reference counts of the selected disk and node by 1.

In this way, the newly selected disk belongs, where possible, to a node hosting a member of the corresponding PG. After the fault is handled, the nodes hosting the members of the two corresponding PGs therefore still overlap as much as possible.

As shown in Figure 4, the members of base_pg in the base pool are located on node 1, node 2 and node 3. One member of base_pg fails, and it is located on node 1. The members of the tier_pg corresponding to base_pg in the bound pool are located on nodes 1 to 6.

According to the above principle, the object node set is node 2 and node 3, since node 1 of the failed member is removed. The nodes that host tier_pg members but are not in the object node set are node 1, node 4, node 5 and node 6. The node with the smallest reference count among them is selected, and the disk with the smallest reference count in that node replaces the failed member of base_pg.

If a member in the bound pool fails, it is handled as follows:

1. Obtain the ID of the failed member and determine its node ID from it.
2. Collect the node IDs of all members of the PG to which the failed member belongs (denoted R2). Remove the node ID of the failed member; the remaining node IDs form the object node set.
3. Determine a base_pg corresponding to R2 in the base pool. The node IDs of all its members form the corresponding node set.
4. Find all node IDs that are in the corresponding node set but not in the object node set. If there are several, select the node with the smallest reference count, and then the disk with the smallest reference count within that node.
5. If no node ID satisfying the condition is found, or the qualifying node has no available disk, record the result as UNDEF. For UNDEF, select the node with the smallest reference count among the other nodes in the system outside the object node set, and then the disk with the smallest reference count within that node.
6. Increase the reference counts of the selected disk and node by 1.

In this way, the newly selected disk belongs, where possible, to a node hosting a member of the corresponding PG. After the fault is handled, the nodes hosting the members of the two corresponding PGs therefore still overlap as much as possible.

As shown in Figure 5, the members of base_pg in the base pool are located on node 1, node 2 and node 3. The members of the corresponding tier_pg in the bound pool are located on nodes 1 to 6, and the tier_pg member located on node 1 fails.

According to the above principle, the object node set is node 2 to node 6, since node 1 of the failed member is removed. The only node that hosts a base_pg member but is not in the object node set is node 1. The disk with the smallest reference count in node 1 is then selected to replace the failed member of tier_pg.
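Both worked examples follow from one set difference, applied with the roles of the two pools swapped. The helper name `non_overlapping` is hypothetical, and the node numbers are taken from Figures 4 and 5.

```python
from typing import Iterable, Set


def non_overlapping(
    fault_pg_nodes: Iterable[int], failed_node: int, corresponding_nodes: Iterable[int]
) -> Set[int]:
    """Candidate nodes: in the corresponding set but not in the object set."""
    object_set = set(fault_pg_nodes) - {failed_node}  # drop the failed member's node
    return set(corresponding_nodes) - object_set


# Figure 4: the base_pg member on node 1 fails; tier_pg spans nodes 1-6.
print(sorted(non_overlapping([1, 2, 3], 1, range(1, 7))))  # [1, 4, 5, 6]
# Figure 5: the tier_pg member on node 1 fails; base_pg spans nodes 1-3.
print(sorted(non_overlapping(range(1, 7), 1, [1, 2, 3])))  # [1]
```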

It can be seen that some embodiments provide a solution for optimizing the members of placement groups in distributed storage. It tries to ensure that the members of corresponding PGs in the two storage pools are selected on the same nodes, which makes the subsequent selection of main members convenient. When a member fails, the optimized PG member selection also:

- avoids redundant reconstruction and has wider applicability;
- minimizes the number of times service data is forwarded over the network, reducing network pressure;
- improves the performance of the storage cluster and enhances product competitiveness.

The following is an introduction to a placement group member selection device provided by an embodiment of the present application. The placement group member selection device described below and the placement group member selection method described above may be mutually referenced.

Referring to Figure 6, an embodiment of the present application provides a placement group member selection device, which includes:

Determining module 601 is used to determine, in the second storage pool, the set of placement groups corresponding to any first placement group in the first storage pool. The first storage pool includes multiple first placement groups, and the second storage pool includes multiple second placement groups. The total number of placement groups in the first storage pool is less than the total number of placement groups in the second storage pool;

The placement group selection module 602 is used to select any second placement group in the placement group set and determine the target node where each member of the first placement group is located;

The node selection module 603 is used to select N nodes in the target node if the number of members of the first placement group is not less than the number of members of the second placement group; N is the number of members of the second placement group;

The member selection module 604 is used to select a disk in each of the N nodes to obtain N members of the second placement group.
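Taken together, the node selection and member selection steps can be pictured as one function. The sketch follows the ascending-order variant of the claims: take the N least-loaded target nodes, and top up from the other nodes when the first placement group has fewer members. The descending-order variant, which takes the last N, selects the same nodes. `select_second_pg_members` and its dictionaries are hypothetical, and the sketch assumes the first PG's members sit on distinct nodes.

```python
from typing import Dict, List, Tuple


def select_second_pg_members(
    target_nodes: List[str],
    n: int,
    all_nodes: List[str],
    node_count: Dict[str, int],
    node_disks: Dict[str, List[str]],
    disk_count: Dict[str, int],
) -> List[str]:
    """Pick N members for a second placement group (hypothetical helper).

    target_nodes: nodes hosting the members of the first placement group.
    node_count / disk_count: number of second placement groups per node / disk.
    """

    def by_load(x: str) -> Tuple[int, str]:
        return (node_count.get(x, 0), x)

    targets = list(dict.fromkeys(target_nodes))  # de-duplicate, keep order
    # M >= N: ascending order of load, keep the first N target nodes.
    chosen = sorted(targets, key=by_load)[:n]
    if len(chosen) < n:
        # M < N: add the N-M least-loaded nodes outside the target nodes.
        others = sorted((x for x in all_nodes if x not in targets), key=by_load)
        chosen += others[: n - len(chosen)]
    members = []
    for node in chosen:
        # One disk per node: the disk with the fewest second placement groups.
        disk = min(node_disks[node], key=lambda d: (disk_count.get(d, 0), d))
        members.append(disk)
        node_count[node] = node_count.get(node, 0) + 1  # +1 per selection
        disk_count[disk] = disk_count.get(disk, 0) + 1
    return members
```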

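In some embodiments, the node selection module is used to: arrange the target nodes in ascending order of the number of second placement groups corresponding to each node to obtain a node sequence, and select the first N nodes in the node sequence;

or, arrange the target nodes in descending order of the number of second placement groups corresponding to each node to obtain a node sequence, and select the last N nodes in the node sequence.

In some embodiments, the member selection module is used to: select, in each of the N nodes, the disk with the smallest number of corresponding second placement groups.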

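In some embodiments, it also includes: another node selection module. If the number of members of the first placement group is less than the number of members of the second placement group, this module determines the other nodes in the current distributed system apart from the target nodes. It then selects nodes among them so that the total number of selected nodes and target nodes is N.

In some embodiments, the other node selection module is used to: arrange the other nodes in ascending order of the number of second placement groups corresponding to each node to obtain a node sequence, and select the first N-M nodes in the node sequence, M being the number of members of the first placement group;

or, arrange the other nodes in descending order of the number of second placement groups corresponding to each node to obtain a node sequence, and select the last N-M nodes in the node sequence.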

In some embodiments, it also includes:

In some embodiments, the object node set determination unit is used for:

Select one corresponding node set from the multiple corresponding node sets, and perform the step of determining the non-overlapping nodes that belong to the corresponding node set but not to the object node set.

In some embodiments, the member replacement unit is also used to:

In some embodiments, the fault handling module also includes:

In some embodiments, it also includes:

Regarding the working process of each module and unit, reference can be made to the corresponding content provided in the foregoing embodiments, and details will not be described again here.

It can be seen that some embodiments provide a placement group member selection device. It makes the nodes hosting the members of corresponding placement groups in the two storage pools coincide, so that the main members of the two placement groups can be placed on the same node as far as possible. Data forwarding between these two placement groups can then be completed within that node without going through the network. This improves the data forwarding efficiency of corresponding placement groups.

The following is an introduction to a distributed storage system provided by embodiments of the present application. The distributed storage system described below and the placement group member selection method and device described above may be referred to each other.

Embodiments of the present application provide a distributed storage system including multiple nodes, each of which includes multiple disks. Some of the disks constitute the first storage pool of any of the above embodiments, and the remaining disks constitute the second storage pool of any of the above embodiments.

In one example, the performance of each disk in the first storage pool is higher than the performance of each disk in the second storage pool. For example, the first storage pool is a cache pool, and the second storage pool is a low-speed storage pool.

An electronic device provided by an embodiment of the present application is introduced below. The electronic device described below and the placement group member selection method and device described above may be referred to each other.

Referring to Figure 7, an embodiment of the present application provides an electronic device, including:

Memory 701, used to store computer programs;

The processor 702 is used to execute computer programs to implement the methods disclosed in any of the above embodiments.

Furthermore, embodiments of the present application also provide a server as the above-mentioned electronic device. The server may include: at least one processor, at least one memory, a power supply, a communication interface, an input/output interface and a communication bus. The memory is used to store a computer program, and the computer program is loaded and executed by the processor to implement relevant steps in the placement group member selection method disclosed in any of the foregoing embodiments.

In some embodiments, the power supply provides the operating voltage for each hardware device on the server. The communication interface can create a data transmission channel between the server and external devices. It can follow any communication protocol applicable to the technical solution of this application, which is not limited here. The input/output interface obtains external input data or outputs data to the outside world. Its interface type can be selected according to application needs, which is not limited here.

In addition, the memory, as a carrier for resource storage, can be a read-only memory, a random access memory, a magnetic disk or an optical disk, etc. The resources stored thereon include operating systems, computer programs and data, etc. The storage method can be short-term storage or permanent storage.

The operating system manages and controls the hardware devices and computer programs on the server, so that the processor can compute and process the data in the memory. It can be Windows Server, NetWare, Unix, Linux, etc. Besides the computer program that completes the placement group member selection method disclosed in any of the foregoing embodiments, the computer programs may include computer programs for other specific tasks. Besides virtual machine data, the data may also include information such as the developer information of the virtual machine.

Furthermore, embodiments of the present application also provide a terminal as the above-mentioned electronic device. The terminal may include but is not limited to a smartphone, a tablet, a laptop or a desktop computer.

Generally, a terminal in some embodiments includes: a processor and a memory.

The processor may include one or more processing cores, such as a 4-core processor or an 8-core processor. It can be implemented in at least one of the following hardware forms: DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array) or PLA (Programmable Logic Array). The processor can also include a main processor and a coprocessor:

- The main processor, also called the CPU (Central Processing Unit), processes data in the awake state.
- The coprocessor is a low-power processor that processes data in the standby state.

In some embodiments, the processor may integrate a GPU (Graphics Processing Unit), which renders and draws the content to be shown on the display screen. In some embodiments, the processor may also include an AI (Artificial Intelligence) processor for computing operations related to machine learning.

The memory may include one or more computer-readable storage media, which may be non-transitory. It may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In some embodiments, the memory stores at least the following computer program. When loaded and executed by the processor, the program implements the relevant steps of the placement group member selection method performed on the terminal side, as disclosed in any of the foregoing embodiments. The resources stored in the memory may also include an operating system and data, and the storage may be short-term or permanent. The operating system may include Windows, Unix, Linux, etc. The data may include, but is not limited to, application update information.

In some embodiments, the terminal may also include a display screen, an input and output interface, a communication interface, a sensor, a power supply, and a communication bus.

In one example, the electronic device may be any node with management functions in the distributed system.

The following introduces a non-volatile computer-readable storage medium provided by embodiments of the present application. The non-volatile computer-readable storage medium described below and the placement group member selection method and device described above may be referred to each other.

Referring to Figure 8, Figure 8 is a schematic structural diagram of a non-volatile computer-readable storage medium provided by the present application.

A non-volatile computer-readable storage medium 8 is used to store a computer program 81, wherein when the computer program 81 is executed by a processor, the method for selecting placement group members disclosed in the aforementioned embodiments is implemented.

"First", "second", "third", "fourth", etc. (if present) in this application are used to distinguish similar objects. They do not necessarily describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described herein can be implemented in orders other than those illustrated or described. Furthermore, the terms "including" and "having" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method or device that includes a series of steps or units is not necessarily limited to the steps or units expressly listed. It may include other steps or units that are not expressly listed or that are inherent to such a process, method or device.

It should be noted that descriptions involving “first”, “second”, etc. in this application are for descriptive purposes only. They cannot be understood as indicating or implying relative importance, or as implicitly indicating the number of the technical features concerned. Therefore, features defined as "first" and "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments can be combined with each other, but only where those of ordinary skill in the art can realize the combination. When a combination of technical solutions is contradictory or cannot be realized, it should be considered not to exist. Such a combination does not fall within the scope of protection claimed by this application.

Each embodiment in this specification is described in a progressive manner. Each embodiment focuses on its differences from other embodiments. The same or similar parts between the various embodiments can be referred to each other.

The steps of the methods or algorithms described in conjunction with the embodiments disclosed herein may be implemented directly in hardware, in software modules executed by a processor, or in a combination of both. Software modules may reside in any of the following: random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disks, removable disks, CD-ROMs, or any other form of non-volatile computer-readable storage medium known in the art.

Specific examples are used herein to illustrate the principles and implementations of the present application. The description of the above embodiments is only intended to help understand the method of the present application and its core idea. Those of ordinary skill in the art may, based on the idea of this application, make changes to the specific implementations and the scope of application. In summary, the content of this description should not be understood as limiting this application.

Claims

A method for selecting placement group members, which is characterized by including:

Determine, in the second storage pool, the set of placement groups corresponding to any first placement group in the first storage pool; wherein the first storage pool includes multiple first placement groups, the second storage pool includes multiple second placement groups, and the total number of placement groups in the first storage pool is less than the total number of placement groups in the second storage pool;

Select any second placement group in the placement group set, and determine the target node where each member of the first placement group is located;

If the number of members of the first placement group is not less than the number of members of the second placement group, select N nodes from the target node; N is the number of members of the second placement group;

Select one disk in each of the N nodes to obtain N members of the second placement group.
The method according to claim 1, wherein selecting N nodes among the target nodes includes:

Arrange the target nodes in ascending order according to the number of the second placement groups corresponding to the nodes, obtain a node sequence, and select the first N nodes in the node sequence;

or,

Arrange the target nodes in descending order according to the number of the second placement groups corresponding to the nodes to obtain a node sequence, and select the last N nodes in the node sequence.
The method of claim 1, wherein selecting a disk in each of the N nodes includes:

Select the disk with the smallest number of corresponding second placement groups from each node among the N nodes.
The method according to claim 1, further comprising:

If the number of members of the first placement group is less than the number of members of the second placement group, determine the other nodes in the current distributed system apart from the target nodes, select nodes among the other nodes so that the sum of the numbers of selected nodes and target nodes is N, and then perform the step of selecting a disk in each of the N nodes to obtain N members of the second placement group.
The method according to claim 4, characterized in that, selecting a node among other nodes so that the sum of the number of the selected node and the target node is N, includes:

Arrange other nodes in ascending order according to the number of the second placement group corresponding to the node, obtain the node sequence, and select the first N-M nodes in the node sequence; M is the number of members of the first placement group;

or,

Arrange the other nodes in descending order of the number of second placement groups corresponding to each node to obtain a node sequence, and select the last N-M nodes in the node sequence; M is the number of members of the first placement group.
The method according to claim 1, further comprising:

After any second placement group in the placement group set is selected, if the placement group set contains other second placement groups that have not been selected, select members for these other unselected second placement groups in each of the N nodes.
The method according to any one of claims 1 to 6, further comprising:

If a member in any storage pool fails, determine the fault placement group to which the faulty member belongs, and form an object node set from the nodes where the members of the fault placement group are located;

Determine the placement group corresponding to the fault placement group in the other storage pool, and form a corresponding node set from the nodes where the members of that placement group are located;

Determine non-overlapping nodes that belong to the corresponding node set but do not belong to the object node set;

Select the node with the smallest number of corresponding placement groups among the non-overlapping nodes, select the disk with the smallest number of corresponding placement groups among the selected nodes, and replace the faulty member with the currently selected disk.
The method according to claim 7, characterized in that said forming an object node set by nodes where each member of the fault placement group is located includes:

Determine the object node where each member of the fault placement group is located, delete the node where the fault member is located from the object node, and form the remaining nodes into the object node set.
The method according to claim 7, further comprising:

If the fault placement group has multiple corresponding placement groups in another storage pool, determine the node where each member of each placement group is located, and obtain multiple corresponding node sets;

Select one corresponding node set from the plurality of corresponding node sets, and perform the step of determining non-overlapping nodes that belong to the corresponding node set but do not belong to the object node set.
The method according to claim 7, further comprising:

If no non-overlapping node exists, or the node selected from the non-overlapping nodes has no available disk, determine the other nodes in the current distributed system apart from the object node set, select among them the node with the smallest number of corresponding placement groups, and then perform the steps of selecting, in the selected node, the disk with the smallest number of corresponding placement groups and replacing the failed member with the currently selected disk.
The method according to claim 7, characterized in that after replacing the failed member with the currently selected disk, it further includes:

Recover data from the failed member to the currently selected disk.
The method according to any one of claims 1 to 6, further comprising:

Select members with the same nodes in the first placement group and the second placement group that correspond to each other as the main members of the corresponding placement groups.
The method of claim 1, wherein the first storage pool is a cache pool, and the second storage pool is a low-speed storage pool.
The method according to claim 1, characterized in that the placement group is a carrier for placing objects, one placement group corresponds to multiple objects, and one object corresponds to one disk.
The method according to claim 14 is characterized in that each disk is distributed on each node of the distributed system.
The method according to claim 1, characterized in that after selecting a member for the second placement group on a node, it further includes:

The number of the second placement group corresponding to this node is increased by one.
The method according to claim 16, characterized in that the node also records the number of the first placement group; if a member is selected for the first placement group on a node, it also includes:

The number of the first placement group corresponding to the node is increased by one.
A device for selecting placement group members, including:

Determining module, configured to determine, in the second storage pool, the set of placement groups corresponding to any first placement group in the first storage pool; wherein the first storage pool includes multiple first placement groups, the second storage pool includes multiple second placement groups, and the total number of placement groups in the first storage pool is less than the total number of placement groups in the second storage pool;

A placement group selection module, configured to select a second placement group in the placement group set and determine the target node where each member of the first placement group is located;

A node selection module, configured to select N nodes from the target nodes if the number of members of the first placement group is not less than the number of members of the second placement group; N is the number of members of the second placement group;

A member selection module is used to select a disk in each of the N nodes to obtain N members of the second placement group.
An electronic device, characterized by including:

Memory, used to store computer programs;

A processor, configured to execute the computer program to implement the method according to any one of claims 1 to 17.
A non-volatile computer-readable storage medium, characterized in that it is used to store a computer program, wherein when the computer program is executed by a processor, the method according to any one of claims 1 to 17 is implemented.