CN111651125A - Method for determining storage area block in distributed system and related device


Info

Publication number
CN111651125A
Authority
CN
China
Prior art keywords
storage
node
capacity
storage nodes
block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010500583.XA
Other languages
Chinese (zh)
Inventor
杨骥 (Yang Ji)
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority application: CN202010500583.XA
Publication: CN111651125A
Legal status: Pending

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 — Input arrangements for transferring data to be processed into a form capable of being handled by the computer; output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 — Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 — Interfaces specially adapted for storage systems
    • G06F 3/0602 — Interfaces specifically adapted to achieve a particular effect
    • G06F 3/0604 — Improving or facilitating administration, e.g. storage management
    • G06F 3/0628 — Interfaces making use of a particular technique
    • G06F 3/0638 — Organizing or formatting or addressing of data
    • G06F 3/064 — Management of blocks
    • G06F 3/0668 — Interfaces adopting a particular infrastructure
    • G06F 3/067 — Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval; Database Structures and File System Structures Therefor (AREA)

Abstract

The embodiments of this application disclose a method and related apparatus for determining storage blocks in a distributed system. The method accesses storage nodes with small node capacity to the storage blocks with large block capacity, and storage nodes with large node capacity to the storage blocks with small block capacity, so that the block capacities of the n storage blocks remain relatively balanced after the nodes are accessed. Moreover, because the N node groups access their storage nodes to the n storage blocks in the second capacity order, the block-capacity differences among the n storage blocks change smoothly. This balance and smoothness of block capacity among the storage blocks provides a favorable basis for the management-zone approach, so that storage volumes built on management zones can avoid stop-write operations during system upgrades.

Description

Method for determining storage area block in distributed system and related device
Technical Field
The present application relates to the field of data processing, and in particular, to a method and a related apparatus for determining a storage block in a distributed system.
Background
A distributed system, such as a distributed cloud storage system, can provide services such as data storage and computation to users and companies, and may be composed of a massive number of storage nodes. When a user or company needs such a service, a storage Volume carried by storage nodes can be created in the distributed cloud storage system to serve as the basis for data storage and computation.
However, the distributed system itself often needs service upgrades. To prevent the stop-write operation during an upgrade from affecting the services or storage volumes it provides, the related art introduces a Management Zone (MZ) mechanism: the storage nodes in the distributed system are divided into a number of storage blocks that serve as management zones. When a storage volume is created, the system tries to ensure that different data blocks (Vlets) of the volume reside in different storage blocks. Correspondingly, the distributed system is upgraded one storage block at a time. During the upgrade, at most a portion of the data blocks of any one storage volume is affected, so the overall service of the volume is not substantially impacted, and the distributed system can be upgraded without stopping writes.
The above related art places strict requirements on the block capacity of each storage block in the distributed system, yet the number of storage nodes in such a system frequently grows or shrinks. How to reasonably control the block capacity of each storage block in the distributed system is therefore an urgent problem to be solved.
Disclosure of Invention
To solve the above technical problem, the present application provides a method and related apparatus for determining a storage block in a distributed system, so as to reasonably control the block capacity of each storage block in the distributed system.
The embodiment of the application discloses the following technical scheme:
in one aspect, an embodiment of the present application provides a method for determining a storage block in a distributed system, where the distributed system includes n storage blocks, and the method includes:
acquiring a storage node set to be accessed to the distributed system, wherein the storage node set comprises s storage nodes;
sorting the n storage blocks in a first capacity order according to their respective block capacities, to obtain a first sorting result;
sorting the s storage nodes in a second capacity order according to their respective node capacities, to obtain a second sorting result, where the first capacity order is opposite to the second capacity order;
dividing the s storage nodes sequentially into N node groups according to the second sorting result, where each node group includes at most n storage nodes;
and accessing the ith storage node of each of the N node groups to the ith storage block in the first sorting result, where i ∈ {1, …, n}.
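The steps above can be sketched as follows. This is an illustrative reading of the claim; the function name, index conventions, and capacity values are assumptions, not from the patent.

```python
def assign_nodes_to_blocks(block_caps, node_caps):
    """Sketch of the claimed method: sort blocks ascending and nodes
    descending by capacity, split the nodes in order into groups of at
    most n, and access the i-th node of each group to the i-th block."""
    n = len(block_caps)
    new_caps = list(block_caps)
    # First sorting result: block indices in ascending block capacity.
    blocks = sorted(range(n), key=lambda b: block_caps[b])
    # Second sorting result: node indices in descending node capacity
    # (the opposite capacity order).
    nodes = sorted(range(len(node_caps)), key=lambda h: -node_caps[h])
    assignments = {}
    for pos, node in enumerate(nodes):
        i = pos % n                       # i-th position within its node group
        assignments[node] = blocks[i]     # access the node to the i-th block
        new_caps[blocks[i]] += node_caps[node]
    return assignments, new_caps
```

With `block_caps = [10, 20, 30, 40]` and `node_caps = [5, 4, 3, 2, 1]`, the largest node (capacity 5) joins the smallest block, and the block capacities after access are `[16, 24, 33, 42]`.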
On the other hand, an embodiment of the present application provides an apparatus for determining a storage block in a distributed system, where the distributed system includes n storage blocks, and the apparatus includes an obtaining unit, a sorting unit, a grouping unit, and an access unit:
the acquisition unit is used for acquiring a storage node set to be accessed to the distributed system, and the storage node set comprises s storage nodes;
the sorting unit is used for sorting the n storage blocks according to a first capacity sequence according to the block capacities corresponding to the n storage blocks respectively to obtain a first sorting result;
the sorting unit is further configured to sort the s storage nodes according to a second capacity order according to node capacities corresponding to the s storage nodes, respectively, to obtain a second sorting result, where the first capacity order is opposite to the second capacity order;
the grouping unit is configured to divide the s storage nodes sequentially into N node groups according to the second sorting result, where each node group includes at most n storage nodes;
the access unit is configured to access the ith storage node of each of the N node groups to the ith storage block in the first sorting result, where i ∈ {1, …, n}.
In another aspect, an embodiment of the present application provides a device for determining a storage block in a distributed system, where the device includes a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to perform the method of the above aspect according to instructions in the program code.
In another aspect, the present application provides a computer-readable storage medium for storing a computer program for executing the method of the above aspect.
According to the above technical solutions, for a distributed system comprising n storage blocks, when a set of storage nodes to be accessed is obtained, the n storage blocks are sorted in a first capacity order to obtain a first sorting result, the s storage nodes in the set are sorted in a second capacity order to obtain a second sorting result, and the s storage nodes are divided sequentially into N node groups according to the second sorting result, each group containing at most n storage nodes corresponding to the n storage blocks. When the storage nodes are accessed to the distributed system, the ith storage node of each group is accessed to the ith storage block of the first sorting result. Because the first capacity order is opposite to the second, storage nodes with small node capacity are accessed to storage blocks with large block capacity and storage nodes with large node capacity to storage blocks with small block capacity, so the block capacities of the n storage blocks remain relatively balanced after the nodes are accessed. Moreover, since the N node groups access their storage nodes to the n storage blocks in the second capacity order, the block-capacity differences among the n storage blocks change smoothly. This balance and smoothness of block capacity among the storage blocks provides a favorable basis for the management-zone approach, so that storage volumes built on management zones can avoid stop-write operations during system upgrades.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description are briefly introduced below. The drawings described here show only some embodiments of the present application; those skilled in the art can derive other drawings from them without inventive effort.
Fig. 1 is a schematic application scenario diagram of a method for determining a storage block in a distributed system according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a method for determining a storage block in a distributed system according to an embodiment of the present application;
fig. 3 is a schematic flowchart of another method for determining a storage block in a distributed system according to an embodiment of the present disclosure;
fig. 4 is a schematic application scenario diagram of another method for determining a storage block in a distributed system according to an embodiment of the present application;
fig. 5 is a schematic application scenario diagram of another method for determining a storage block in a distributed system according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an apparatus for determining a storage block in a distributed system according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below with reference to the accompanying drawings.
In the related art, distributed storage clusters are not managed by MZ, so a stop-write operation is required when performing a binary upgrade across the large number of storage nodes in a distributed system. A stop-write operation is an operation that stops writing data: if an MZ stops writes, the storage volumes in that MZ cannot provide service, which degrades the user experience.
In view of the foregoing problem, an embodiment of the present application manages stored data based on MZ; that is, the storage space corresponding to a number of storage nodes is managed as one MZ, and one storage node can carry multiple Vlets.
In distributed storage, a Volume may be composed of multiple Vlets and provides services such as data storage and data computation to users. When multiple Vlets are selected for the same Volume, they must be placed in different MZs. On this basis, when the distributed system is upgraded MZ by MZ (for example, a binary upgrade), a service request that falls on different Vlets of a Volume loses at most one Shard of service data, and a single-Shard failure does not affect the service. An upgrade without stopping writes therefore becomes possible. A no-stop-write operation is an operation that does not stop writing data; if a storage block stops writes, that is, all storage nodes in the block stop writing, the block can no longer provide service or accept new service requests. When an upgrade without stopping writes is achieved, upgrading any particular MZ does not affect the normal service of the storage volumes, and service consumers (such as users and companies) can still write all kinds of data to the storage volumes normally.
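The placement constraint above — each Vlet of a Volume in a distinct MZ — can be sketched minimally as follows; the helper name is invented, since the patent only states the rule.

```python
def place_vlets(volume_vlets, mz_list):
    """Assign each Vlet of one Volume to a distinct MZ, so that upgrading
    any single MZ touches at most one Shard of the Volume's data."""
    if len(volume_vlets) > len(mz_list):
        raise ValueError("not enough MZs to keep the Vlets apart")
    # One MZ per Vlet; any MZ carries at most one Vlet of this Volume.
    return dict(zip(volume_vlets, mz_list))
```

A real placement policy would also weigh free space per MZ, which is exactly why the rest of the document works to keep block capacities balanced.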
In practice, if block capacity is unevenly distributed among MZs, then when Vlets are allocated to a Volume, the storage space of the MZs with smaller block capacity is exhausted first, while a large amount of space remains in the MZs with larger block capacity. Because the Vlets of one Volume must be carried by different MZs, if the number of MZs that still have free space is insufficient to allocate all Vlets of a Volume, the remaining storage space becomes unusable and storage resources are wasted. Keeping the block capacities of the MZs as balanced as possible is therefore one of the foundations of the management-zone approach described above.
In order to ensure that the block capacity among different MZs is kept as balanced as possible after each time a storage node accesses a distributed system, the embodiment of the present application provides a method for determining a storage block in a distributed system. The method for determining the storage area block is suitable for the technical field of distributed cloud storage (cloud storage), and can be applied to a distributed cloud storage system.
A distributed cloud storage system is a storage system that aggregates a large number of heterogeneous storage devices (also called storage nodes) in a network through application software or application interfaces, using functions such as cluster application, grid technology, and distributed storage file systems, so that the devices work together to provide data storage and service access to the outside.
Cloud computing is a computing model, which distributes computing tasks over a resource pool formed by a large number of computers, so that various application systems can acquire computing power, storage space and information services as required. The network that provides the resources is referred to as the "cloud". Resources in the "cloud" appear to the user as being infinitely expandable and available at any time, available on demand, expandable at any time, and paid for on-demand.
As a basic capability provider of cloud computing, a cloud computing resource pool (referred to as an IaaS (Infrastructure as a Service) platform) is established, and multiple types of virtual resources are deployed in the pool for external clients to use as needed.
The method for determining the storage block in the distributed system is applied to storage block determination equipment with cloud computing capability, such as a server. The server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud computing services. In this embodiment, a server is used as a storage block determination device to describe a determination method for a storage block in a distributed system provided in this embodiment.
Referring to fig. 1, fig. 1 is a schematic view of an application scenario of a method for determining a storage block in a distributed system according to an embodiment of the present application.
In the application scenario shown in fig. 1, a server 101 determines the storage block for each storage node to be accessed. For convenience, storage blocks are denoted MZ in this application; the distributed system shown in fig. 1 contains 4 MZs, i.e., n = 4.
In a task of accessing storage nodes to the distributed system, the server 101 acquires the set of storage nodes to be accessed, which contains 5 storage nodes, i.e., s = 5.
First, the server 101 sorts the 4 MZs according to the block capacities corresponding to the 4 MZs respectively and according to a first capacity order, so as to obtain a first sorting result. The block capacity corresponding to one MZ is equal to the sum of the node capacities corresponding to all storage nodes in the MZ.
In the embodiment of the present application, the capacity order is used to characterize the arrangement logic based on the capacity size, and may be, for example, the order of the capacity from small to large, or the order of the capacity from large to small. The first capacity order and the second capacity order described below represent different capacity orders, and the first capacity order and the second capacity order are logically opposite in arrangement as characterized by the first capacity order and the second capacity order.
In the scenario shown in fig. 1, the first capacity order is from small to large, and the 4 MZs in the distributed system are sorted accordingly to obtain the first sorting result: MZ1, MZ2, MZ3, MZ4, where MZ1 has the smallest block capacity and MZ4 the largest.
Secondly, the server may sort the 5 storage nodes to be accessed to the distributed storage system according to the node capacity corresponding to the storage node and the second capacity order, so as to obtain a second sorting result.
In the scenario shown in fig. 1, the second capacity order is an order from large capacity to small capacity, and 5 storage nodes in the storage node set are ordered according to the second capacity order, so as to obtain a second ordering result, i.e., node 1, node 2, node 3, node 4, and node 5. The node capacity corresponding to the node 1 is the largest, and the node capacity corresponding to the node 5 is the smallest.
The first sorting result obtained above lists the MZs in ascending order of block capacity, and the second sorting result lists the storage nodes in descending order of node capacity. Storage nodes with smaller node capacity in the second sorting result can therefore be allocated to MZs with larger block capacity in the first sorting result, and storage nodes with larger node capacity to MZs with smaller block capacity, so that the block capacities of the different MZs remain relatively balanced after the distributed system accesses a batch of storage nodes.
Specifically, the server 101 groups the 5 storage nodes according to the second sorting result and, given the number of MZs (4), obtains 2 node groups, i.e., N = 2, where N = ceil(s/n) and ceil() denotes rounding up. The server 101 can then, taking node groups as the unit, access the ith storage node of each node group to the ith MZ in the first sorting result, where i ∈ {1, …, n}.
In the scenario shown in fig. 1, the 5 storage nodes in the second sorting result are grouped by the number of MZs (4) into 2 node groups: node group 1 (node 1, node 2, node 3, node 4) and node group 2 (node 5). The ith storage node of each node group is then accessed to the ith MZ in the first sorting result. For example, the 2nd storage node of node group 1, node 2, is accessed to MZ2, and the 1st storage node of node group 2, node 5, is accessed to MZ1.
The node groups are divided by the number of the MZs, so that the storage nodes in the node groups are sequentially accessed into the MZs according to the arrangement sequence of the second sequencing result by taking the node groups as a unit, and the capacity difference among different MZs in the distributed system can be relatively smooth.
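The fig. 1 walk-through can be replayed numerically. The capacity values below are invented for illustration (the patent gives none); the labels follow the figure.

```python
mz = {"MZ1": 10, "MZ2": 12, "MZ3": 15, "MZ4": 20}         # block capacities (assumed)
hosts = {"node1": 9, "node2": 7, "node3": 5, "node4": 3, "node5": 1}

sorted_mz = sorted(mz, key=mz.get)                         # first sorting result (ascending)
sorted_hosts = sorted(hosts, key=hosts.get, reverse=True)  # second (opposite) order

n = len(sorted_mz)                                         # n = 4
# Divide the 5 nodes, in order, into N = 2 groups of at most n.
groups = [sorted_hosts[k:k + n] for k in range(0, len(sorted_hosts), n)]
for group in groups:
    for i, host in enumerate(group):                       # i-th node -> i-th MZ
        mz[sorted_mz[i]] += hosts[host]

print(mz)   # {'MZ1': 20, 'MZ2': 19, 'MZ3': 20, 'MZ4': 23}
```

Node 1 (the largest node) lands in MZ1 (the smallest block), and after both groups are accessed the block capacities sit within a narrow band.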
Based on the above, after the storage nodes are respectively connected to the corresponding MZs, the block capacities corresponding to different MZs have balance and smoothness, which provides a favorable implementation basis for the management domain mode, so that the storage volume is established based on the management domain mode, and the write stop operation during system upgrade is avoided.
The following describes a method for determining a storage block in a distributed system according to an embodiment of the present application with reference to the accompanying drawings.
Referring to fig. 2, fig. 2 is a schematic flowchart of a method for determining a storage block in a distributed system according to an embodiment of the present application. As shown in fig. 2, the method comprises the steps of:
s201: and acquiring a storage node set to be accessed to the distributed system, wherein the storage node set comprises s storage nodes.
In practical applications, the server may provide a physical-resource metadata service through a Node Manager, which is responsible for block-capacity management among MZs. If the distributed storage cluster contains n MZs, denoted MZ_LIST = {MZ1, MZ2, …, MZn}, then MZ_LIST[i] denotes the ith MZ.
If, in a task of accessing storage nodes to the distributed system, the server acquires a set of storage nodes to be accessed containing s storage nodes, denoted HOST_LIST = {HOST1, HOST2, …, HOSTs}, then HOST_LIST[p] denotes the pth storage node. The server must determine the MZ for each of the s storage nodes so that the block capacities of the MZs are as balanced as possible after the nodes are accessed.
A storage node may be any machine with data storage space, for example a computer device or a server. Depending on machine performance or hardware configuration, storage nodes may correspond to different node types (e.g., models). Storage nodes of different node types may have different data-processing characteristics: in terms of machine performance, high-performance and low-performance machines can be distinguished by node type; in terms of specialization, machines with strong graphics-processing capability and machines with strong data-computing capability can likewise be distinguished by node type.
In this step, the s storage nodes included in the storage node set may have different node types or may have the same node type.
S202: and according to the block capacity corresponding to the n storage blocks respectively, sequencing the n storage blocks according to a first capacity sequence to obtain a first sequencing result.
For the n MZs, the server firstly sequences the n MZs according to the block capacity corresponding to the MZ and the first capacity sequence.
In the embodiment of the present application, the capacity order is used to characterize the arrangement logic of the capacity size, and for example, the capacity order may be from small to large, or from large to small. The first capacity order ordering here is opposite to the second capacity order in S203.
Thus, there are at least two possible combinations of the first capacity order and the second capacity order, one combination being: the first capacity order is from small to large according to capacity, and the second capacity order is from large to small according to capacity. The other combination form is as follows: the first capacity order is from large to small according to capacity, and the second capacity order is from small to large according to capacity. In any combination, the effect of balancing the block capacity between n MZs as much as possible after accessing the storage node can be achieved.
In some application scenarios, if the first capacity order is from small to large, the n MZs are sorted in ascending order of block capacity to obtain the first sorting result, denoted SORTED_MZ_LIST, where SORTED_MZ_LIST[i] is the MZ with the ith smallest block capacity.
In other application scenarios, if the first capacity order is from large to small, the n MZs are sorted in descending order of block capacity, and SORTED_MZ_LIST[i] is the MZ with the ith largest block capacity.
S203: and according to the node capacities corresponding to the s storage nodes respectively, sequencing the s storage nodes according to a second capacity sequence to obtain a second sequencing result, wherein the first capacity sequence is opposite to the second capacity sequence. Based on the above S202, in some application scenarios, if the first capacity order is set from small to large according to the capacity, the second capacity order is from large to small according to the capacity. Therefore, the server may SORT the storage nodes according to the node capacities corresponding to the s storage nodes respectively, in reverse order from large to small, to obtain a second sorting result, which is represented by the SORTED _ HOST _ LIST, and the SORTED _ HOST _ LIST [ p ] represents the storage node with the pth node capacity.
In other application scenarios, if the first capacity order is from large to small, the second capacity order is from small to large. The server then sorts the s storage nodes in ascending order of node capacity to obtain the second sorting result, and SORTED_HOST_LIST[p] is the storage node with the pth smallest node capacity.
Because the first sorting result captures the relative block capacities of the n MZs and the second sorting result captures the relative node capacities of the s storage nodes, storage nodes with larger node capacity can be accessed to MZs with smaller block capacity, and storage nodes with smaller node capacity to MZs with larger block capacity, keeping the block capacities of the different MZs relatively balanced after the nodes are accessed.
S204: and according to the second sequencing result, sequentially dividing the s storage nodes into N node groups, wherein the node groups comprise at most N storage nodes.
The server groups the s storage nodes of the second sorting result by the number of MZs, n, obtaining N node groups, where N = ceil(s/n) and ceil() denotes rounding up. When s ≤ n, N = 1: there is a single node group containing all s storage nodes. When s > n and n divides s, N = s/n: each node group contains n storage nodes. When s > n and n does not divide s, N = floor(s/n) + 1: (N − 1) node groups contain n storage nodes each, and the remaining group contains s − (N − 1)·n storage nodes. For example, with n = 3 and s = 10, N = 4: three node groups contain 3 storage nodes each and the fourth contains 1.
The N node groups are denoted Group1, Group2, …, GroupN, where Groupj denotes the jth node group:
Group1:
SORTED_HOST_LIST[1], SORTED_HOST_LIST[2], …, SORTED_HOST_LIST[n];
Group2:
SORTED_HOST_LIST[n+1], SORTED_HOST_LIST[n+2], …, SORTED_HOST_LIST[2n];
GroupN:
SORTED_HOST_LIST[(N−1)·n+1], …, SORTED_HOST_LIST[s−1], SORTED_HOST_LIST[s].
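The grouping cases above all reduce to N = ceil(s/n); a small helper (illustrative, not from the patent) makes the resulting group sizes explicit:

```python
import math

def node_group_sizes(s, n):
    """Sizes of the N = ceil(s/n) node groups when s storage nodes are
    divided, in order, into groups of at most n (one slot per MZ)."""
    N = math.ceil(s / n)
    # The first N-1 groups are full; the last takes the remainder.
    return [n] * (N - 1) + [s - (N - 1) * n]
```

For example, `node_group_sizes(10, 3)` gives `[3, 3, 3, 1]`, matching the example above, and `node_group_sizes(5, 4)` gives `[4, 1]`, matching the fig. 1 scenario.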
Because the N node groups are obtained by grouping the second sorting result, the storage nodes within each node group are arranged in the second capacity order, and the node capacities of storage nodes across different node groups also follow a definite size order.
In some application scenarios, if the second capacity order is from large to small, the storage nodes included in each of the N node groups obtained by grouping are arranged from large to small by node capacity, and the node capacity of the last storage node of the jth node group is greater than or equal to the node capacity of the first storage node of the (j + 1)th node group.
In other application scenarios, if the second capacity order is from small to large, the storage nodes included in each of the N node groups obtained by grouping are arranged from small to large by node capacity, and the node capacity of the last storage node of the jth node group is less than or equal to the node capacity of the first storage node of the (j + 1)th node group.
Since the N node groups are grouped according to the number n of MZs, and the storage nodes are divided into the N node groups in the order of the second sorting result, each node group can be treated as a whole when accessing storage nodes: the n storage nodes in the group are accessed to the n MZs in sequence, so that the block capacity difference between MZs remains relatively balanced. It can be understood that if the node capacities of different storage nodes differ, the block capacity differences between different MZs change after those MZs access their corresponding storage nodes. After a storage node with a large node capacity is accessed to an MZ, the block capacity of that MZ changes by a large amount, so the impact on the block capacity differences between MZs is large; after a storage node with a small node capacity is accessed, the change is small and the impact is small. Therefore, to balance the block capacities of different MZs as much as possible, in one possible implementation, storage nodes with larger node capacities may be accessed to the MZs first, and storage nodes with smaller node capacities accessed afterwards.
In a specific implementation, the first capacity order is set to be from small to large and the second capacity order from large to small. The MZs in the first sorting result obtained according to the first capacity order are then arranged in ascending order of block capacity, and the storage nodes in the second sorting result obtained according to the second capacity order are arranged in descending order of node capacity. During grouping, every n consecutive storage nodes in the second sorting result are divided into the same node group, so the node capacity of the nth storage node in the jth node group is greater than or equal to the node capacity of the 1st storage node in the (j + 1)th node group. That is, the node capacity of the last storage node in a node group is greater than or equal to the node capacity of the first storage node in the next node group.
For example, suppose there are 4 storage nodes (s = 4) and 2 MZs (n = 2). First, the 4 storage nodes are arranged in descending order of node capacity to obtain the second sorting result: node 1, node 2, node 3, node 4, where node 1 ≥ node 2 ≥ node 3 ≥ node 4 in node capacity. Then, according to the second sorting result, every 2 storage nodes are divided into one group, yielding node group 1 (node 1 and node 2) and node group 2 (node 3 and node 4). It follows that the node capacity of the 2nd storage node in node group 1 (node 2) is greater than or equal to that of the 1st storage node in node group 2 (node 3).
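The four-node example above can be reproduced with a short sketch. The capacities below are hypothetical; only their relative order matters.

```python
# Hypothetical node capacities for the example (s = 4, n = 2);
# only the relative order matters.
capacity = {"node1": 40, "node2": 30, "node3": 20, "node4": 10}

# Second sorting result: storage nodes in descending capacity order.
sorted_nodes = sorted(capacity, key=capacity.get, reverse=True)

# Divide every n = 2 consecutive nodes into one node group.
group1, group2 = sorted_nodes[:2], sorted_nodes[2:]

# Boundary property: the last node of group 1 has capacity greater
# than or equal to the first node of group 2.
assert capacity[group1[-1]] >= capacity[group2[0]]
```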
After grouping in the above manner, the server may access the storage nodes to the MZs node group by node group. Specifically, the n storage nodes in the node group with the largest node capacities are first accessed to the n MZs in the first sorting result, then the n storage nodes in the node group with the second-largest node capacities, and so on until all storage nodes are accessed to the MZs. In this way, storage nodes with large node capacities are accessed to MZs with small block capacities, and storage nodes with small node capacities to MZs with large block capacities, keeping the block capacities of different MZs as balanced as possible.
In this access process, across different node groups, the storage nodes in the node group with larger node capacities are allocated to their corresponding MZs first; within the same node group, the storage nodes with larger node capacities are likewise allocated first. That is, based on a multi-round grouped greedy approach, storage nodes with large node capacities are accessed to the MZs before storage nodes with small node capacities, which reduces the impact of large-capacity storage nodes on the block capacity differences between MZs and keeps the block capacities of different MZs as balanced as possible.
S205: access the ith storage node of each node group in the N node groups to the ith storage block in the first sorting result, where i ∈ [1, n].
After the server groups the s storage nodes to obtain N node groups, the node groups Group1, Group2, …, and GroupN may be processed one by one. And for each node group, accessing the ith storage node in the node group to the ith MZ in the first sequencing result, wherein the value of i is from 1 to n in sequence.
Let CURRENT_GROUP denote the node group performing the access operation in each round, and CURRENT_GROUP[i] denote the ith storage node in CURRENT_GROUP. The ith storage node in CURRENT_GROUP, i.e., CURRENT_GROUP[i], is accessed to the ith MZ in the first sorting result, i.e., SORTED_MZ_LIST[i], where i takes values from 1 to n in sequence. For example, when CURRENT_GROUP is Group2, the 1st storage node in Group2 is accessed to the 1st MZ in the first sorting result, and so on until the nth storage node in Group2 is accessed to the nth MZ.
Through the steps, after the storage nodes are accessed to the corresponding MZs, the block capacity among the MZs is kept relatively balanced as much as possible, and the block capacity difference is kept relatively smooth, so that a favorable implementation basis is provided for managing the storage space based on a management domain mode, the storage volume is established based on the management domain mode, and the write stop operation during system upgrading is avoided.
It can be understood that after the n storage nodes in a node group access their corresponding MZs, the block capacities of those MZs change, and the block capacity size relationship between different MZs may also change. To further improve the smoothness of the block capacity differences between MZs, in a possible implementation, after the n storage nodes in the jth node group are accessed to the n MZs, the n MZs are reordered in the first capacity order according to their respective block capacities to update the first sorting result, and then the ith storage node in the (j + 1)th node group is accessed to the ith MZ in the updated first sorting result.
After all n storage nodes in one node group have been accessed to the n MZs, the n MZs are reordered in the first capacity order and the first sorting result is updated. When the storage nodes in the next node group are then accessed, storage nodes with large node capacities are still accessed to MZs with small block capacities, and storage nodes with small node capacities to MZs with large block capacities, which improves both the balance of the MZ block capacities and the smoothness of the block capacity differences between MZs.
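Putting S201-S205 together with the per-group re-sorting just described, one possible sketch of the multi-round grouped greedy assignment is shown below. All names and capacities are illustrative assumptions, not from the patent text.

```python
def assign_nodes_to_mzs(mz_capacity, node_capacity):
    """Multi-round grouped greedy assignment (a sketch of S201-S205).

    mz_capacity: dict mapping MZ name -> current block capacity
                 (mutated in place as nodes are accessed)
    node_capacity: dict mapping node name -> node capacity
    Returns a dict mapping MZ name -> list of nodes accessed to it.
    """
    n = len(mz_capacity)
    # Second sorting result: nodes in descending order of capacity.
    nodes = sorted(node_capacity, key=node_capacity.get, reverse=True)
    assignment = {mz: [] for mz in mz_capacity}
    for start in range(0, len(nodes), n):        # one node group per round
        group = nodes[start:start + n]
        # First sorting result, refreshed every round:
        # MZs in ascending order of current block capacity.
        mzs = sorted(mz_capacity, key=mz_capacity.get)
        for i, node in enumerate(group):         # i-th node -> i-th MZ
            assignment[mzs[i]].append(node)
            mz_capacity[mzs[i]] += node_capacity[node]
    return assignment

mzs = {"MZ1": 0, "MZ2": 0}
nodes = {"n1": 40, "n2": 30, "n3": 20, "n4": 10}
result = assign_nodes_to_mzs(mzs, nodes)
# Both MZs end with block capacity 50 (n1 + n4 and n2 + n3).
```

Note how the re-sort between rounds is what steers the large remaining nodes toward the currently smallest MZs.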
In the method for determining a storage block in a distributed system provided in the foregoing embodiment, for a distributed system including n storage blocks, when a storage node set to be accessed is obtained, the n storage blocks are sorted in the first capacity order to obtain the first sorting result, the s storage nodes in the storage node set are sorted in the second capacity order to obtain the second sorting result, and the s storage nodes are sequentially divided into N node groups according to the second sorting result, where each node group includes at most n storage nodes, corresponding to the n storage blocks. In the process of accessing the storage nodes in the storage node set to the distributed system, the ith storage node in each node group is accessed to the ith storage block in the first sorting result. Because the first capacity order is opposite to the second capacity order, storage nodes with small node capacities are accessed to storage blocks with large block capacities, and storage nodes with large node capacities to storage blocks with small block capacities, so the block capacities of the n storage blocks remain relatively balanced after the storage nodes are accessed. Moreover, because the N node groups access their storage nodes to the n storage blocks in the order of the second sorting result, the block capacity differences between the n storage blocks change smoothly. The balance and smoothness of the block capacities between the storage blocks provide a favorable implementation basis for the management domain mode, so that storage volumes can be established based on the management domain mode and the write-stop operation during system upgrade can be avoided.
There may be differences in the node types of the storage nodes accessing the distributed system. If all storage nodes of the same node type were accessed to the same MZ, then when a service (for example, a graphics processing service) is provided, at most one storage node with strong graphics processing capability could be selected among the Vlets chosen by the Volume, which is obviously not the optimal selection mode for the Volume.
Therefore, in the process of accessing the storage node to the MZ, the node types of the storage nodes in the MZ need to be balanced as much as possible, so that each MZ includes storage nodes of different node types, and therefore, when the Volume selects the Vlet, the Volume can have a better combination mode, and better service is provided for users.
Based on the foregoing, an embodiment of the present application provides another method for determining a storage block in a distributed system. As shown in fig. 3, the method includes the following steps before S201:
S301: acquire m storage nodes to be accessed to the distributed system, where m is greater than s.
S302: determine the node types corresponding to the m storage nodes respectively, where the determined node types include M types.
S303: classify the m storage nodes according to the node types to obtain M type groups, where each type group includes storage nodes of the same node type.
S304: sort the M type groups in the second capacity order according to their respective average capacities to obtain a third sorting result.
S305: select a target type group from the M type groups according to the third sorting result, where the target type group includes the storage node set.
As shown in fig. 4, for the m storage nodes to be accessed, the server determines the node types corresponding to the m storage nodes, of which there are M. They are represented by HOST_TYPE_LIST = {HOST_TYPE1, HOST_TYPE2, …, HOST_TYPEM}, where HOST_TYPE[q] represents the qth node type. The server groups the m storage nodes according to the M node types, dividing storage nodes of the same node type into the same type group, to obtain M type groups.
In a possible implementation manner, the node types of the storage nodes may be divided according to functions of the storage nodes, or divided according to machine types of the storage nodes. Storage nodes of different node types have different data processing characteristics.
The function of the storage node refers to the machine performance of the storage node for data processing. For example, the node types of the storage nodes may be divided into: storage nodes that perform graphics processing, storage nodes that perform data computation, and so on. The model of the storage node refers to the hardware configuration corresponding to the storage node. For example, the node types of the storage nodes can be divided into: a storage node configured with 100G storage space, a storage node configured with 1T storage space, and so on. In practical applications, the node type may be set according to an application scenario, and is not limited herein.
Because the node types corresponding to the storage nodes in the same type group are the same, in the process of accessing the storage nodes into the MZ, the type group can be used as a whole, and a plurality of storage nodes in the same type group can be accessed into the MZ in a balanced manner, so that the node types of the storage nodes in different MZs are kept balanced as much as possible, and the performance of the distributed storage system based on MZ management is improved.
It can be understood that a storage node with a large node capacity has a large influence on the block capacity of the MZ, and a storage node with a small node capacity has a small influence on the block capacity of the MZ. In order to keep the block capacities of different MZs relatively balanced, i.e. to reduce the difference between the block capacities of different MZs as much as possible, the storage node with large node capacity may be connected to the MZ first, and then the storage node with small node capacity may be connected to the MZ.
In a possible implementation, after the storage nodes to be accessed are grouped by node type, the server may calculate the average capacity of all storage nodes in each type group, and then sort the M type groups in the second capacity order of S203. For example, if the second capacity order in S203 is from large to small, the type groups are sorted by average capacity from large to small in this step to obtain the third sorting result, which may be represented by SORTED_HOST_TYPE_LIST, where SORTED_HOST_TYPE_LIST[q] represents the type group with the qth largest average capacity. In the scenario shown in fig. 4, after the M type groups are sorted in the second capacity order, the third sorting result is: type group 1, type group 2, …, type group k, …, type group M.
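The grouping-by-type and average-capacity ordering described above can be sketched as follows. This is a minimal illustration; all names and capacities are hypothetical, not from the patent text.

```python
from collections import defaultdict

def third_sorting_result(node_capacity, node_type):
    """Group storage nodes by node type, then order the M type groups
    by the average capacity of their members, largest first."""
    groups = defaultdict(list)
    for node, t in node_type.items():
        groups[t].append(node)

    def avg_capacity(members):
        return sum(node_capacity[m] for m in members) / len(members)

    return sorted(groups.values(), key=avg_capacity, reverse=True)

# Two hypothetical node types with different capacities (in T).
caps = {"a": 384, "b": 384, "c": 144, "d": 144, "e": 144}
types = {"a": "big", "b": "big", "c": "small", "d": "small", "e": "small"}
ordered = third_sorting_result(caps, types)
```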
Since the third sorting result identifies the size relationship between the average capacities of the storage nodes in the different type groups, the storage nodes in the type group with the larger average capacity can be accessed to the MZs first, and the storage nodes in the type group with the smaller average capacity afterwards, based on the third sorting result. This reduces the impact of storage nodes with large node capacities on the block capacity differences between MZs, so that the block capacities of the MZs remain relatively balanced after the storage nodes are accessed.
Based on the arrangement order of the M type groups in the third sorting result, the storage nodes in each type group are accessed to the MZs group by group. Specifically, after the server selects the kth type group, i.e., type group k, as the target type group, all storage nodes in type group k are accessed to the n MZs.
Assume the storage nodes in type group k form the storage node set HOST_LIST in S201; that is, the s storage nodes in the storage node set of S201 belong to the same node type, and type group k is the type group with the kth largest average capacity in the third sorting result. Then S201-S205 above are executed to access the s storage nodes to the n MZs, so that after the different storage nodes in the same type group are accessed to the MZs, the block capacities of the different MZs remain relatively balanced.
After all the storage nodes in type group k are accessed to the MZs, the server continues by selecting the (k + 1)th type group, i.e., type group k + 1, as the target type group and repeatedly executes S201-S205 for its storage nodes, until all m storage nodes in the M type groups are accessed to the n MZs. Because a type group identifies storage nodes of one node type, when the storage nodes in each type group are accessed to the MZs group by group, storage nodes of the same node type are distributed evenly across different MZs, so that each MZ contains storage nodes of different node types, improving the performance of the MZ-managed distributed storage system. In addition, large-capacity storage nodes are accessed to small-capacity MZs and small-capacity storage nodes to large-capacity MZs, which keeps the block capacities of different MZs balanced and improves the resource utilization of the MZ-managed distributed storage system.
It can be understood that after all storage nodes in a type group are accessed to the MZs, the block capacity of each MZ changes, so the block capacity size relationship between different MZs may also change. Therefore, in a possible implementation, after all storage nodes in type group k are accessed to the MZs, the server may reorder the n MZs in the first capacity order according to their respective block capacities and update the first sorting result. Then the storage nodes in the (k + 1)th type group are accessed to the MZs, until all m storage nodes in the M type groups are accessed to the n MZs.
Since the updated first ordering result re-identifies the size relationship of the block capacities among the n MZs, when accessing the storage node in the next type group, the storage node with a large node capacity in the type group can still be accessed into the MZ with a small block capacity, and the storage node with a small node capacity in the type group is accessed into the MZ with a large block capacity, thereby further improving the smoothness among MZ block capacity differences.
For ease of understanding, in the scenario shown in fig. 5, assume that one distributed storage cluster (e.g., YottaStore) includes 18 MZs, i.e., n = 18. In one access task, there are 180 storage machines to be accessed, i.e., m = 180. These 180 storage nodes are of 2 types: storage nodes configured with 384T of storage space and storage nodes configured with 144T of storage space.
The server divides the 180 storage nodes according to the model of the storage nodes to obtain 2 types of groups, which are respectively: type group1 and type group 2. The type group1 includes 108 storage nodes configured with 384T storage space, and the type group2 includes 72 storage nodes configured with 144T storage space. Based on this, the average capacity of all storage nodes in type group1 is 384T, and the average capacity of all storage nodes in type group2 is 144T.
Then, the server sorts the type groups in descending order of average capacity according to the second capacity order, obtaining the third sorting result: type group 1, type group 2. Based on this, type group 1, which has the larger average capacity, is selected as the target type group: its 108 storage nodes are accessed to the 18 MZs first, and the 72 storage nodes in type group 2 are processed afterwards.
In the process of accessing the storage nodes to the MZs, the server first sorts the 18 MZs in ascending order of block capacity according to the first capacity order, obtaining the first sorting result: MZ1, MZ2, …, MZ18. Then, the server sorts the 108 storage nodes in descending order of node capacity according to the second capacity order, obtaining the second sorting result: node 1, node 2, …, node 108. Furthermore, the server groups the 108 storage nodes according to the second sorting result and the number of MZs (18), obtaining 6 node groups: node group 1 includes node 1 to node 18; node group 2 includes node 19 to node 36; …; node group 6 includes node 91 to node 108.
For the 108 storage nodes in type group 1, the server selects node group 1, whose storage nodes have the largest node capacities, as the target node group, and accesses its 18 storage nodes to the 18 MZs respectively. Specifically, the 1st storage node (node 1) is accessed to the 1st storage block (MZ1), and so on until the 18th storage node (node 18) is accessed to the 18th storage block (MZ18).
After all 18 storage nodes in node group 1 are accessed to the MZs, the 18 MZs are reordered in the first capacity order and the first sorting result is updated. Then node group 2 is taken as the target node group and its 18 storage nodes are likewise accessed to the corresponding 18 MZs. The MZ ordering is repeatedly updated and the storage nodes in each node group are accessed in turn, until all 108 storage nodes in type group 1 are accessed to the 18 MZs. The 72 storage nodes in type group 2 are then accessed to the 18 MZs in the same way, which is not described again here.
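Under the stated scenario, the whole procedure can be simulated in a few lines (an illustrative sketch; variable names are not from the patent). Because all nodes within a type group have equal capacity here, every MZ ends up with exactly the same block capacity:

```python
# Scenario from the text: n = 18 MZs; 108 nodes of 384T (type group 1)
# are accessed first, then 72 nodes of 144T (type group 2).
n = 18
mz_capacity = {f"MZ{i}": 0 for i in range(1, n + 1)}

for node_cap, count in [(384, 108), (144, 72)]:   # type groups in order
    for start in range(0, count, n):              # one node group per round
        # Refresh the first sorting result: MZs ascending by capacity.
        mzs = sorted(mz_capacity, key=mz_capacity.get)
        for i in range(min(n, count - start)):
            mz_capacity[mzs[i]] += node_cap

# Each MZ receives 108/18 = 6 nodes of 384T and 72/18 = 4 nodes of
# 144T, ending at 6*384 + 4*144 = 2880T: perfectly balanced.
```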
As can be seen from the above embodiments, storage nodes with small node capacities are accessed to storage blocks with large block capacities, and storage nodes with large node capacities to storage blocks with small block capacities, so the block capacities of the n storage blocks remain relatively balanced after the storage nodes are accessed. Moreover, the N node groups access their storage nodes to the n storage blocks in the order of the second sorting result, and after all storage nodes in a node group are accessed to the MZs, the first sorting result is updated in the first capacity order, so the block capacity differences between the n storage blocks change smoothly. The balance and smoothness of the block capacities between the storage blocks provide a favorable implementation basis for the management domain mode, so that storage volumes can be established based on the management domain mode and the write-stop operation during system upgrade can be avoided.
For the method provided above, an embodiment of the present application further provides a device for determining a storage block in a distributed system.
Referring to fig. 6, fig. 6 is a schematic structural diagram of an apparatus for determining a storage block in a distributed system according to an embodiment of the present application. As shown in fig. 6, the distributed system includes n memory blocks, and the determining apparatus 600 includes an obtaining unit 601, a sorting unit 602, a grouping unit 603, and an accessing unit 604:
the obtaining unit 601 is configured to obtain a storage node set to be accessed to the distributed system, where the storage node set includes s storage nodes;
the sorting unit 602 is configured to sort the n storage blocks according to a first capacity order according to the block capacities corresponding to the n storage blocks, respectively, to obtain a first sorting result;
the sorting unit 602 is further configured to sort, according to node capacities corresponding to the s storage nodes respectively, the s storage nodes according to a second capacity order to obtain a second sorting result, where the first capacity order is opposite to the second capacity order;
the grouping unit 603 is configured to sequentially divide the s storage nodes into N node groups according to the second sorting result, where each node group includes at most N storage nodes;
the accessing unit 604 is configured to access the ith storage node of each node group in the N node groups to the ith storage block in the first sorting result, where i ∈ [1, n].
Wherein the first capacity order is from small to large and the second capacity order is from large to small; alternatively,
the first capacity order is from large to small and the second capacity order is from small to large.
Wherein, where the jth node group is the jth of the N node groups sequentially divided according to the second sorting result, the access unit 604 is configured to:
after the n storage nodes in the jth node group are accessed to the n storage blocks, reorder the n storage blocks in the first capacity order according to their respective block capacities to update the first sorting result;
and access the ith storage node in the (j + 1)th node group to the ith storage block in the updated first sorting result.
In a possible implementation, the apparatus further includes a determining unit, a classifying unit, and a selecting unit:
the obtaining unit 601 is further configured to obtain m storage nodes to be accessed to the distributed system, where m is greater than s;
the determining unit is configured to determine the node types corresponding to the m storage nodes respectively, where the determined node types include M types;
the classification unit is configured to classify the m storage nodes according to the node types to obtain M type groups, where each type group includes storage nodes of the same node type;
the sorting unit 602 is further configured to sort according to the second capacity order and the average capacities corresponding to the M type groups, respectively, to obtain a third sorting result;
the selecting unit is configured to select a target type group from the M type groups according to the third sorting result, where the target type group includes the storage node set.
The node types are divided according to functions of the storage nodes, or the node types are divided according to models of the storage nodes.
If the first capacity sequence is from small to large according to capacity, the second capacity sequence is from large to small according to capacity, the target type group is a kth type group in the M type groups, and after the s storage nodes are accessed into the distributed system, the selecting unit is further configured to select a (k + 1) th type group from the M type groups, use the storage nodes included in the (k + 1) th type group as the storage node set, and trigger the obtaining unit 601.
If the first capacity sequence is from small to large according to capacity, the second capacity sequence is from large to small according to capacity, and the node capacity of the nth storage node in the jth node group is greater than or equal to the node capacity of the 1 st storage node in the (j + 1) th node group.
With the apparatus for determining a storage block in a distributed system according to the foregoing embodiment, for a distributed system including n storage blocks, when a storage node set to be accessed is obtained, the n storage blocks may be sorted in the first capacity order to obtain the first sorting result, the s storage nodes in the storage node set may be sorted in the second capacity order to obtain the second sorting result, and the s storage nodes are sequentially divided into N node groups according to the second sorting result, where each node group includes at most n storage nodes, corresponding to the n storage blocks. In the process of accessing the storage nodes in the storage node set to the distributed system, the ith storage node in each node group is accessed to the ith storage block in the first sorting result. Because the first capacity order is opposite to the second capacity order, storage nodes with small node capacities are accessed to storage blocks with large block capacities, and storage nodes with large node capacities to storage blocks with small block capacities, so the block capacities of the n storage blocks remain relatively balanced after the storage nodes are accessed. Moreover, because the N node groups access their storage nodes to the n storage blocks in the order of the second sorting result, the block capacity differences between the n storage blocks change smoothly. The balance and smoothness of the block capacities between the storage blocks provide a favorable implementation basis for the management domain mode, so that storage volumes can be established based on the management domain mode and the write-stop operation during system upgrade can be avoided.
The embodiment of the present application further provides a determination server for a storage block in a distributed system, and the following introduces the determination server for a storage block in a distributed system provided in the embodiment of the present application from the perspective of hardware implementation.
Referring to fig. 7, fig. 7 is a schematic diagram of a server 1400 according to an embodiment of the present application. The server 1400 may vary considerably in configuration or performance, and may include one or more central processing units (CPUs) 1422 (e.g., one or more processors), memory 1432, and one or more storage media 1430 (e.g., one or more mass storage devices) storing applications 1442 or data 1444. The memory 1432 and the storage media 1430 may be transient or persistent storage. The program stored on the storage medium 1430 may include one or more modules (not shown), each of which may include a series of instruction operations on the server. Furthermore, the central processing unit 1422 may be configured to communicate with the storage medium 1430 to execute, on the server 1400, the series of instruction operations in the storage medium 1430.
The server 1400 may also include one or more power supplies 1426, one or more wired or wireless network interfaces 1450, one or more input/output interfaces 1458, and/or one or more operating systems 1441, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
The steps performed by the server in the above embodiments may be based on the server structure shown in fig. 7.
The CPU 1422 is configured to perform the following steps:
acquiring a storage node set to be accessed to the distributed system, wherein the storage node set includes s storage nodes;
sorting the n storage blocks according to a first capacity order based on the block capacities respectively corresponding to the n storage blocks, to obtain a first sorting result;
sorting the s storage nodes according to a second capacity order based on the node capacities respectively corresponding to the s storage nodes, to obtain a second sorting result, wherein the first capacity order is opposite to the second capacity order;
sequentially dividing the s storage nodes into N node groups according to the second sorting result, wherein each node group includes at most n storage nodes;
and accessing the ith storage node of each of the N node groups to the ith storage block in the first sorting result, wherein i ∈ [1, n].
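For illustration only, the following hypothetical Python sketch shows the optional refinement in which the first sorting result is re-computed after each node group is attached, so that each group always targets the currently smallest blocks; all names and capacity values are assumptions, not part of this application:

```python
# Hypothetical sketch: the block order is re-sorted after every node group
# is attached, so the running block capacities steer each later group.
# Capacities are illustrative only.

def assign_with_resort(block_capacities, node_capacities):
    caps = list(block_capacities)  # running block capacities
    n = len(caps)
    # Second sorting result: node indices, largest node capacity first.
    node_order = sorted(range(len(node_capacities)),
                        key=lambda v: node_capacities[v], reverse=True)
    assignments = []
    for g in range(0, len(node_order), n):
        # Update the first sorting result: blocks ascending by current capacity.
        block_order = sorted(range(n), key=lambda b: caps[b])
        for i, node in enumerate(node_order[g:g + n]):
            block = block_order[i]
            assignments.append((node, block))
            caps[block] += node_capacities[node]  # keep running capacity current
    return assignments, caps
```

Re-sorting between groups trades a little extra sorting work for smoother capacity differences, since a block that grew large while absorbing one group is pushed to the back of the order for the next group.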
Optionally, the CPU 1422 may further execute the steps of any implementation of the method for determining a storage block in a distributed system provided in the embodiments of the present application.
The embodiment of the present application further provides a computer-readable storage medium for storing a computer program, where the computer program is used to execute the method for determining the storage block in the distributed system provided in the foregoing embodiment.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be implemented by hardware under the instruction of a program; the program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments. The aforementioned storage medium may be any medium that can store program code, such as a read-only memory (ROM), a RAM, a magnetic disk, or an optical disk.
It should be noted that the embodiments in this specification are described in a progressive manner; identical and similar parts among the embodiments may be referred to each other, and each embodiment focuses on its differences from the others. In particular, the apparatus and system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the descriptions of the method embodiments. The apparatus and system embodiments described above are merely illustrative. Units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. Those of ordinary skill in the art can understand and implement the embodiments without inventive effort.
The above description is only a specific embodiment of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution readily conceivable by those skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method for determining a storage block in a distributed system, wherein the distributed system includes n storage blocks, the method comprising:
acquiring a storage node set to be accessed to the distributed system, wherein the storage node set includes s storage nodes;
sorting the n storage blocks according to a first capacity order based on the block capacities respectively corresponding to the n storage blocks, to obtain a first sorting result;
sorting the s storage nodes according to a second capacity order based on the node capacities respectively corresponding to the s storage nodes, to obtain a second sorting result, wherein the first capacity order is opposite to the second capacity order;
sequentially dividing the s storage nodes into N node groups according to the second sorting result, wherein each node group includes at most n storage nodes;
and accessing the ith storage node of each of the N node groups to the ith storage block in the first sorting result, wherein i ∈ [1, n].
2. The method of claim 1, wherein the first capacity order is from small to large capacity, and the second capacity order is from large to small capacity; or,
the first capacity order is from large to small capacity, and the second capacity order is from small to large capacity.
3. The method according to claim 1 or 2, wherein a jth node group is the jth of the N node groups sequentially divided according to the second sorting result, and the accessing the ith storage node of each of the N node groups to the ith storage block in the first sorting result comprises:
after the n storage nodes in the jth node group are accessed to the n storage blocks, re-sorting the n storage blocks according to the first capacity order based on their respectively corresponding block capacities, to update the first sorting result;
and accessing the ith storage node in the (j+1)th node group to the ith storage block in the updated first sorting result.
4. The method according to claim 1 or 2, wherein before the acquiring the storage node set to be accessed to the distributed system, the method further comprises:
acquiring m storage nodes to be accessed to the distributed system, wherein m is greater than s;
determining the node types respectively corresponding to the m storage nodes, wherein M node types are determined;
classifying the m storage nodes according to the node types to obtain M type groups, wherein each type group includes storage nodes of the same node type;
sorting the M type groups according to the second capacity order based on their respectively corresponding average capacities, to obtain a third sorting result;
and selecting a target type group from the M type groups according to the third sorting result, wherein the target type group includes the storage node set.
5. The method of claim 4, wherein the node types are divided according to the functions of the storage nodes or according to the models of the storage nodes.
6. The method of claim 4, wherein if the first capacity order is from small to large capacity and the second capacity order is from large to small capacity, the target type group is a kth type group of the M type groups, and after the s storage nodes are accessed to the distributed system, the method further comprises:
selecting a (k+1)th type group from the M type groups;
and taking the storage nodes included in the (k+1)th type group as the storage node set, and performing the step of acquiring the storage node set to be accessed to the distributed system.
7. The method of claim 3, wherein if the first capacity order is from small to large capacity and the second capacity order is from large to small capacity, the node capacity of the nth storage node in the jth node group is greater than or equal to the node capacity of the 1st storage node in the (j+1)th node group.
8. An apparatus for determining a storage block in a distributed system, characterized in that the distributed system includes n storage blocks, and the apparatus comprises an acquisition unit, a sorting unit, a grouping unit, and an access unit:
the acquisition unit is configured to acquire a storage node set to be accessed to the distributed system, wherein the storage node set includes s storage nodes;
the sorting unit is configured to sort the n storage blocks according to a first capacity order based on the block capacities respectively corresponding to the n storage blocks, to obtain a first sorting result;
the sorting unit is further configured to sort the s storage nodes according to a second capacity order based on the node capacities respectively corresponding to the s storage nodes, to obtain a second sorting result, wherein the first capacity order is opposite to the second capacity order;
the grouping unit is configured to sequentially divide the s storage nodes into N node groups according to the second sorting result, wherein each node group includes at most n storage nodes;
the access unit is configured to access the ith storage node of each of the N node groups to the ith storage block in the first sorting result, wherein i ∈ [1, n].
9. A device for determining a storage block in a distributed system, the device comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to perform the method of any of claims 1-7 according to instructions in the program code.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium is used to store a computer program for performing the method of any one of claims 1-7.
CN202010500583.XA 2020-06-04 2020-06-04 Method for determining storage area block in distributed system and related device Pending CN111651125A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010500583.XA CN111651125A (en) 2020-06-04 2020-06-04 Method for determining storage area block in distributed system and related device


Publications (1)

Publication Number Publication Date
CN111651125A 2020-09-11

Family

ID=72347259

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010500583.XA Pending CN111651125A (en) 2020-06-04 2020-06-04 Method for determining storage area block in distributed system and related device

Country Status (1)

Country Link
CN (1) CN111651125A (en)

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code
Ref country code: HK; Ref legal event code: DE; Ref document number: 40028553; Country of ref document: HK

SE01 Entry into force of request for substantive examination