CN115712390A - Method and system for determining available data strip slicing number - Google Patents

Method and system for determining available data strip slicing number

Info

Publication number
CN115712390A
CN115712390A (application CN202211421670.1A)
Authority
CN
China
Prior art keywords
data
available
node
stripe
disks
Prior art date
Legal status
Granted
Application number
CN202211421670.1A
Other languages
Chinese (zh)
Other versions
CN115712390B (en)
Inventor
Name not disclosed at the inventor's request
Current Assignee
Anchao Cloud Software Co Ltd
Original Assignee
Anchao Cloud Software Co Ltd
Priority date
Filing date
Publication date
Application filed by Anchao Cloud Software Co Ltd filed Critical Anchao Cloud Software Co Ltd
Priority to CN202211421670.1A priority Critical patent/CN115712390B/en
Publication of CN115712390A publication Critical patent/CN115712390A/en
Application granted granted Critical
Publication of CN115712390B publication Critical patent/CN115712390B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and a system for determining the number of available data stripe fragments in a distributed storage system. The method comprises the following steps: determining the number of available disks on each node, the number of copies and the number of nodes in the current state, and calculating from these the number of data stripe fragments in the current state; and performing at least one loop iteration over the nodes constituting the distributed storage system until the number of data stripe fragments in the current state is not lower than the number of available disks on any node, taking the number determined by the last loop iteration as the number of available data stripe fragments. The method reduces the trial-and-error time caused by an unbalanced distribution of available disks among nodes, avoids the large and meaningless resource overhead this imposes on the distributed storage system, and improves the responsiveness of the distributed storage system to data read-write requests.

Description

Method and system for determining available data strip slicing number
Technical Field
The invention relates to the technical field of distributed storage systems, in particular to a method and a system for determining the number of available data stripe fragments.
Background
In a distributed storage system, storing data across multiple nodes refers to a storage technology in which data is divided by sharding into multiple data fragments, and the different fragments are then handed to different nodes for storage. There are two main ways of distributing data in a distributed storage system.
In the first data distribution mode, data is evenly divided into small data blocks (typically no larger than 10 MB), and the blocks are then distributed to different disks of each node in the cluster server by a consistent hashing algorithm. This mode is better suited to large-scale distributed storage systems; its advantage is that no unified metadata storage service is needed, since the location of a fragment can be computed directly when it is looked up. In the second data distribution mode, data is divided into a limited number of data blocks; when a storage object is created, a data distribution scheme for that object is generated automatically by rules, and the generated scheme is recorded by a metadata storage service. The second mode is better suited to small distributed storage systems, because it makes the locations of data fragments easy to control and can satisfy special requirements such as data localization.
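By way of illustration only, the first distribution mode can be sketched in a few lines of Python (this sketch, including names such as ConsistentHashRing and the virtual-node count, is an illustrative assumption and not part of the patented implementation): fixed-size data blocks are hashed onto a ring of disks, so the placement of any block can be recomputed on lookup without consulting a metadata service.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    """Stable integer hash for placing keys on the ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Minimal consistent-hash ring: a block's disk is computed, not looked up."""
    def __init__(self, disks, vnodes=100):
        self._ring = sorted((_hash(f"{d}#{v}"), d) for d in disks for v in range(vnodes))
        self._keys = [h for h, _ in self._ring]

    def locate(self, block_id: str) -> str:
        i = bisect.bisect(self._keys, _hash(block_id)) % len(self._ring)
        return self._ring[i][1]

# Twelve disks spread over three nodes, as in the cluster example of fig. 1.
ring = ConsistentHashRing([f"node{n}/disk{d}" for n in (1, 2, 3) for d in range(1, 5)])
print(ring.locate("file-A/block-0007"))   # placement is recomputed on every lookup
```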
In the second data distribution mode, the number of available stripe fragments is first calculated from the number of available disks in the cluster server and the copy-count policy of data storage, and disks are then allocated to the several copies of each stripe fragment in a certain combination order. In each allocation pass, a disk is assigned to one copy of each fragment on the stripe; only after one copy of every fragment on the whole stripe has received a disk does allocation move on to the next copy of every fragment. Moreover, once a node is taken, all of its available disks are used up before moving to the next node, unless doing so would cause multiple copies of the same stripe fragment to fall on the same node. If, however, the numbers of available disks on the nodes deployed in the cluster server are uneven, this mode allocates disks to data fragments inefficiently: when a given stripe fragment number admits no feasible distribution, that number is reduced by 1 only after every node permutation (i.e., the six permutations shown in figs. 2 to 7) has been tried, and the whole process is then repeated. Referring to fig. 1, the applicant illustrates the data distribution of a file object (i.e., root data) in a scenario where the cluster server comprises three nodes (node1, node2 and node3) and each node comprises four disks (node1 contains disk1 to disk4, node2 contains disk5 to disk8, and node3 contains disk9 to disk12). Suppose each data fragment requires two copies and the data fragments should use as many disks as possible. When all disk space in the cluster server is sufficient, the total number of available disks is 3 × 4 = 12, and the usable number of stripe fragments follows from the copy count and the total disk count: 12 ÷ 2 = 6. As shown in fig. 1, the whole root data is thus split into 6 stripe fragments, each fragment has two copies (i.e., mirror0 and mirror1), and the two copies of a fragment must be stored on two disks located on different nodes, so that two copies of the same stripe are never stored in the same node.
According to this flow for generating the data distribution scheme, disks are first allocated to copy mirror0 of stripe1 through stripe6. Nodes are used in their combination order, so node1's disks are used first: its four disks are allocated in turn to stripe1 mirror0, stripe2 mirror0, stripe3 mirror0 and stripe4 mirror0. The available disks on node1 are then exhausted, so allocation switches to node2, whose disks are allocated to stripe5 mirror0 and stripe6 mirror0. At this point the mirror0 copy of the whole stripe has been allocated, and allocation moves on to the mirror1 copy. Node2 still has two available disks, which are allocated to stripe1 mirror1 and stripe2 mirror1; allocation then moves to node3, whose four disks are allocated one by one to stripe3 mirror1, stripe4 mirror1, stripe5 mirror1 and stripe6 mirror1. Every stripe fragment has now been allocated two copies, satisfying the copy-count requirement.
For example, suppose the numbers of available disks on the three nodes constituting the cluster server are 4, 1 and 1 respectively. If the specified copy count is 2, the calculated number of stripe fragments is (4 + 1 + 1) ÷ 2 = 3. However, because two copies of the same fragment may not be placed on the same node, at most three disks can actually be used on the node with four disks; no matter how the node order is arranged, no feasible distribution exists for a stripe fragment number of 3. Only after all node orderings have been tried and have failed is the stripe fragment number lowered to 2, at which point a feasible distribution scheme can be found. In this scenario, all six permutations of the three nodes are tried with a stripe fragment number of 3 without finding a reasonable data distribution, and a usable distribution is found only after the number is reduced to 2. This approach becomes time-consuming and resource-intensive as the number of nodes in the cluster server grows. For a cluster server with ten nodes, for example, the ten nodes can be ordered in 10 factorial (i.e., 10!) = 3,628,800 different ways, so computing the number of available data stripe fragments is extremely expensive. The current technique for determining the number of available data stripe fragments therefore cannot produce a usable data distribution policy in a short time, and even when it does, it imposes a large and meaningless resource overhead on the distributed storage system, leaving the system with insufficient responsiveness to the data read and write requests initiated by users.
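To make the cost of this prior-art search concrete, the following sketch (an illustrative reconstruction under the assumptions of the example above, not code from the patent) tries every node permutation for a candidate stripe fragment number and lowers the number only after all permutations fail; with available disk counts of 4, 1 and 1 and a copy count of 2, all 3! = 6 node orders must be exhausted before the number drops from 3 to 2.

```python
from itertools import permutations
from math import factorial

def try_allocation(disks_per_node, stripes, copies, node_order):
    """Prior-art greedy pass: exhaust one node's disks before moving to the next,
    skipping a node only if it already holds another copy of the same stripe."""
    free = dict(disks_per_node)                  # node -> remaining free disks
    placed = {}                                  # (stripe, copy) -> node
    for c in range(copies):
        for s in range(stripes):
            node = next((n for n in node_order
                         if free[n] > 0
                         and all(placed.get((s, k)) != n for k in range(c))), None)
            if node is None:
                return False                     # this permutation cannot place (s, c)
            free[node] -= 1
            placed[(s, c)] = node
    return True

disks = {"node1": 4, "node2": 1, "node3": 1}
copies = 2
stripes = sum(disks.values()) // copies          # starts at 3
failed_attempts = 0
while stripes > 0:
    if any(try_allocation(disks, stripes, copies, order) for order in permutations(disks)):
        break
    failed_attempts += factorial(len(disks))     # all 3! = 6 orders failed for this count
    stripes -= 1
print(stripes, failed_attempts)                  # -> 2 6
```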
In view of the above, there is a need to improve the prior art methods for determining the number of data stripe fragments in a distributed storage system, so as to solve the above technical problems.
Disclosure of Invention
The invention aims to disclose a method and a system for determining the number of available data stripe fragments, which can quickly determine that number across the nodes in a short time even when the numbers of available disks on the nodes of a distributed storage system are extremely unbalanced, so as to improve the responsiveness of the distributed storage system to data read-write requests.
To achieve one of the above objects, the present invention provides a method for determining the number of available data stripe fragments, applied to a distributed storage system,
the method comprising the following steps:
determining the number of available disks on each node, the number of copies and the number of nodes in the current state, so as to calculate the number of data stripe fragments in the current state;
and performing at least one loop iteration on the nodes constituting the distributed storage system until the number of data stripe fragments in the current state is not lower than the number of available disks on any node in the distributed storage system, and taking the number of data stripe fragments determined by the last loop iteration as the number of available data stripe fragments.
As a further improvement of the invention, when the loop iteration over the nodes of the distributed storage system finds that the number of available disks on any node is higher than the number of data stripe fragments in the current state, the number of data stripe fragments in the current state is gradually decreased, wherein the number of data stripe fragments in the current state is the quotient of the sum of the numbers of available disks in the current state over all nodes contained in the distributed storage system and the number of copies.
As a further improvement of the present invention, when the number of available disks on any node is higher than the number of data stripe fragments in the current state, the number of data stripe fragments is gradually decreased according to the number of copies in the current state, and the number of available disks of that node is set to the number of data stripe fragments in the current state and used as its number of available disks in the next loop iteration, so as to re-determine the number of data stripe fragments in the next loop iteration.
As a further improvement of the invention, the number of data stripe fragments is rounded down when it is gradually decreased according to the number of copies in the current state.
As a further improvement of the present invention, the number of available disks included in the nodes included in the distributed storage system is non-uniformly distributed, and the nodes are composed of mechanical hard disks, solid state hard disks, or physical machines including mechanical hard disks and/or solid state hard disks.
As a further improvement of the invention, the method also comprises the following steps: and determining the maximum data stripe fragmentation number which can be formed in each node and the distribution position of the data stripe fragmentation in each node of the distributed storage system according to the available data stripe fragmentation number.
As a further improvement of the invention, the method also comprises the following steps: and generating a stripe data fragmentation strategy containing the available data stripe fragmentation number, and segmenting the data storage object stored in the distributed storage system based on the stripe data fragmentation strategy to form a plurality of data fragments.
As a further improvement of the present invention, after the number of data stripe fragments determined by the last loop iteration is taken as the number of available data stripe fragments, the method further comprises: randomly generating a node sequence using the available storage space of each node in the distributed storage system as a weight, and determining the node for the next iterative computation according to the node sequence.
As a further improvement of the present invention, after randomly generating the node sequence, the method further includes:
initializing the fragment serial number and the copy serial number of the data stripe on the current node; judging whether a next available node exists; if so, judging whether the current node has a next available disk to allocate to the data fragment stripe copy in the current state; and if not, determining that the allocation of available disks has failed and ending.
As a further improvement of the present invention, the judging whether the current node has a next available disk to allocate to the data fragment stripe copy in the current state comprises the following logic:
if the current node has a next available disk to allocate to the data fragment stripe copy in the current state, judging whether the current copies of all the data fragments of the current data stripe have been allocated;
and if the current node does not have a next available disk to allocate to the data fragment stripe copy in the current state, re-executing the judgment of whether a next available node exists.
As a further improvement of the present invention, the determining whether current copies of all data slices of the current data stripe are allocated comprises the following logic:
if the current copies of all the data fragments of the current data stripe are distributed, judging whether the number of the data stripe fragments in the current state is not less than the number of available disks formed on any node in the distributed storage system;
and if the current copies of all the data fragments of the current data stripe have not all been allocated, adding 1 to the fragment serial number while keeping the copy serial number unchanged, and re-executing the judgment of whether the current node has a next available disk to allocate to the data fragment stripe copy in the current state.
As a further improvement of the present invention, the determining whether the number of data stripe fragments in the current state is not lower than the number of available disks formed on any node in the distributed storage system includes the following logic:
if yes, determining that the available disk is successfully distributed and finishing;
if not, resetting the fragment serial number to 0, adding 1 to the copy serial number, and re-executing the logic for judging whether the current node has the next available disk to be allocated to the data fragment stripe copy in the current state.
Based on the same inventive idea, the invention also discloses a system for determining the number of available data stripe fragments, which is deployed in a distributed storage system and comprises: a storage management module and a data fragment management module;
the storage management module receives a request for creating a data storage object and sends the data fragmentation policy generation request it generates to the data fragment management module;
the data fragment management module executes the steps of the method for determining the number of available data stripe fragments according to any one of the above, and returns to the storage management module a data fragmentation policy containing the information that determines the number of available data stripe fragments.
As a further improvement of the invention, the method also comprises the following steps: the metadata server is connected with the storage management module; and the metadata server deploys a metadata storage service, and the metadata storage service stores the data slicing strategy.
Compared with the prior art, the invention has the following beneficial effects:
in the present application, the calculated number of data stripe fragments is compared with the number of available disks on each node, and several loop iterations over the per-node available disk counts converge to a value that is just sufficient to guarantee a usable data distribution. This reduces the trial-and-error time caused by an extremely unbalanced distribution of available disks among the nodes, avoids the large and meaningless resource overhead imposed on the distributed storage system, and improves the responsiveness of the distributed storage system to data read-write requests.
Drawings
Fig. 1 is a schematic diagram of the prior-art data fragment distribution formed by dividing root data into six stripe fragments (stripes) and allocating the two copies of each stripe fragment (i.e., mirror0 and mirror1) to three nodes (i.e., node1 to node3), in a distributed storage system scenario built from one cluster server containing three nodes, each with four disks;
Fig. 2 is a schematic diagram of the first arrangement tried with a data stripe fragment number of 3, for which no reasonable data distribution can be found and the data fragment stripe2 mirror1 cannot find a usable disk;
Fig. 3 is a schematic diagram of the second arrangement tried with a data stripe fragment number of 3, for which no reasonable data distribution can be found and the data fragment stripe2 mirror1 cannot find a usable disk;
Fig. 4 is a schematic diagram of the third arrangement tried with a data stripe fragment number of 3, for which no reasonable data distribution can be found and the data fragment stripe2 mirror1 cannot find a usable disk;
Fig. 5 is a schematic diagram of the fourth arrangement tried with a data stripe fragment number of 3, for which no reasonable data distribution can be found and the data fragment stripe2 mirror1 cannot find a usable disk;
Fig. 6 is a schematic diagram of the fifth arrangement tried with a data stripe fragment number of 3, for which no reasonable data distribution can be found and the data fragment stripe2 mirror1 cannot find a usable disk;
Fig. 7 is a schematic diagram of the sixth arrangement tried with a data stripe fragment number of 3, for which no reasonable data distribution can be found and the data fragment stripe2 mirror1 cannot find a usable disk;
FIG. 8 is a general flow chart of a method for determining the number of available data stripe slices according to the present invention;
FIG. 9 is a detailed flowchart of a method for determining the number of available data stripe slices according to the present invention;
Fig. 10 is a detailed flowchart of allocating disks to data fragments within the same node of the distributed storage system, according to the method for determining the number of available data stripe fragments of the present invention;
FIG. 11 is a diagram illustrating four loop iterations of the calculation to determine the final number of usable data stripe slices in one embodiment;
fig. 12 is a topology diagram of a usable data stripe fragmentation number determination system of the present invention.
Detailed Description
The present invention is described in detail with reference to the embodiments shown in the drawings, but it should be understood that these embodiments do not limit the present invention; functional, methodological or structural equivalents or substitutions made by those skilled in the art based on these embodiments fall within the scope of the present invention.
It should be noted that when an element or logic unit is referred to as being "connected" to another element or logic unit, it can be directly connected to that element, intervening elements may be present, or the connection may take the form of interactions and/or sessions between logic units formed by computer-executable program packages. Before the technical schemes and inventive ideas of the present application are explained in detail, the technical meanings of some of the terms or abbreviations used in the present application are briefly described or defined. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
Before describing the embodiments of the present application in detail, the meanings of the main technical terms and acronyms referred to in the embodiments are explained or defined as necessary. The term "node" (i.e., node1, node2, and node3 in fig. 1 to 7) has an equivalent meaning to the term "node server" or the term "HOST". Unless otherwise specified, the term "disk" in the various embodiments of the present application refers to a physical disk, which includes but is not limited to a mechanical hard disk, a magnetic tape, a RAID, a solid state disk, or an NVMe (Non-Volatile Memory express). The term "cluster" (or "cluster server") broadly refers to a distributed storage system 200 composed of a plurality of physical disks (i.e., disk 201, disk 202 to disk 20i in fig. 12, where parameter i is a positive integer greater than or equal to 1) or a distributed storage system composed of a plurality of physical servers, each of which contains one or more physical disks. The applicant exemplifies the implementation procedures of the available data stripe slice number determining method and the available data stripe slice number determining system disclosed in the present invention through the following embodiments.
Referring to fig. 8 to fig. 11, a method for determining a sharding number of an available data stripe according to an embodiment of the present invention is disclosed. The available data stripe fragmentation number determination method (hereinafter, referred to as "determination method") is used for quickly and accurately determining the available data stripe fragmentation number in the distributed storage system 200, so as to quickly determine the available data stripe fragmentation number in each node in a short time, thereby improving the response capability of the distributed storage system 200 to data read/write requests (i.e., IOPS and TPS).
Since data fragmentation is transparent to the user (application layer), the user does not know on which of the nodes constituting distributed storage system 200 a data operation request it initiates (i.e., a computer operation such as writing, modifying, migrating or deleting data) is actually executed. The distributed storage architecture allows the computer system deploying the distributed storage system 200 (e.g., a cloud platform or data center) to overcome scalability limits by breaking through the I/O performance limit of a single node. The stripe fragment number formed by data fragmentation refers to the number of fragments into which one copy of the data to be stored is divided. When fragment data is stored, the main data and the backup data to be stored are fragmented in the same way, so that identical data fragments are obtained. In general, the higher the stripe fragment number, the more fragments are stored in the storage nodes and the finer the fragmentation granularity, so that data is more easily stored evenly across the nodes of the cluster, achieving load balancing of the nodes in the distributed storage system 200 and reliable disaster-recovery backup of the data.
A method for determining the number of available data stripe fragments is applied to a distributed storage system 200 (shown in FIG. 12), wherein the number of available disks contained in nodes contained in the distributed storage system 200 is non-uniformly distributed, and the nodes (nodes) are composed of mechanical hard disks, solid state hard disks or physical machines containing the mechanical hard disks and/or the solid state hard disks.
The method for determining the number of available data stripe fragments comprises the following steps S1 and S2 and is particularly suitable for application scenarios in which the available disks of the nodes constituting the distributed storage system 200 are extremely unbalanced; it converges quickly, at very low computational cost, to a reasonable number of available data stripe fragments, which then determines the stripe width (i.e., the number of available data stripe fragments) and the number of copies into which a piece of root data is split during data striping. For example, mirror0 and mirror1 indicate a copy count of 2, and stripe0 and stripe1 indicate the stripe width. Data striping itself is the process of dividing continuous data (e.g., root data) into data blocks of the same size and writing each block to a different disk in the array.
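The striping operation itself can be pictured with a short generic sketch (not the patent's implementation; the 4 KB chunk size is an arbitrary assumption): the object is cut into equal-sized chunks and chunk i is assigned to stripe slice i mod stripe width.

```python
def stripe_object(data: bytes, stripe_width: int, chunk_size: int = 4096):
    """Cut an object into equal-sized chunks; chunk i belongs to slice i % stripe_width."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    slices = [[] for _ in range(stripe_width)]
    for i, chunk in enumerate(chunks):
        slices[i % stripe_width].append(chunk)
    return slices

parts = stripe_object(b"x" * 20000, stripe_width=2)
print([sum(len(c) for c in s) for s in parts])   # roughly equal bytes per stripe slice
```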
S1, determining the number of available disks on each node, the number of copies and the number of nodes in the current state, so as to calculate the number of data stripe fragments in the current state.
When the loop iteration over the nodes constituting the distributed storage system 200 finds that the number of available disks on some node is greater than the number of data stripe fragments in the current state, the number of data stripe fragments in the current state is gradually decreased; the number of data stripe fragments in the current state is the quotient of the sum of the numbers of available disks in the current state over all nodes of the distributed storage system 200 and the number of copies. Assuming the available disk counts of all nodes in the current state sum to 12 and the copy count is 2, the quotient formed in this loop iteration is 12 divided by 2, i.e., 6, and the available disk count of the node with the larger number of disks is set to 6 for the next loop iteration; although that node still physically contains 10 disks, the remaining 4 disks can no longer be used as available disks for storing fragment data.
When the number of available disks on some node is higher than the number of data stripe fragments in the current state, the number of data stripe fragments is gradually decreased according to the copy count of the current state, and the available disk count of that node is set to the number of data stripe fragments in the current state and used as its available disk count in the next loop iteration, so that the number of data stripe fragments determined by the next loop iteration is re-calculated. Specifically, if the node targeted by the repeated loop iterations is node1, the node whose available disk count is capped for the next loop iteration is still node1. In this way, the amount by which the number of data stripe fragments decreases shrinks from one loop iteration to the next and tends toward convergence, which further reduces the computational resources and overhead consumed by calculating the number of available data stripe fragments. Preferably, in this embodiment, the number of data stripe fragments is rounded down when it is gradually decreased according to the copy count of the current state.
S2, performing at least one loop iteration over the nodes constituting the distributed storage system until the number of data stripe fragments in the current state is not lower than the number of available disks on any node in the distributed storage system, and taking the number of data stripe fragments determined by the last loop iteration as the number of available data stripe fragments. The number of available data stripe fragments determined by the last loop iteration is used to generate a stripe data fragmentation policy; the data storage object stored in the distributed storage system (i.e., the generalization of root data) is segmented based on this policy to form a plurality of data fragments, which are finally stored on the three nodes. In this embodiment, the root data is ultimately sliced into four data fragments (two stripe fragments with two copies each), and these fragments are stored across node1 to node3.
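A minimal Python rendering of steps S1 and S2 follows (the function and variable names are assumptions made for illustration; the patent prescribes only the procedure, not this code): the stripe fragment number starts at the floor of the total available disk count divided by the copy count, any node holding more available disks than the current fragment number is capped to that number, and the loop repeats until no node exceeds it.

```python
def available_stripe_count(disks_per_node: list[int], copies: int) -> int:
    """Converge to the usable stripe fragment number described in steps S1 and S2."""
    avail = list(disks_per_node)                 # available disks per node (current state)
    count = sum(avail) // copies                 # S1: initial fragment number, rounded down
    while max(avail) > count:                    # S2: some node still exceeds the count
        avail = [min(d, count) for d in avail]   # cap the oversized node(s) for the next pass
        count = sum(avail) // copies             # recompute the fragment number
    return count
```

On the example of fig. 11 (available disks 10, 1 and 1 and a copy count of 2), available_stripe_count([10, 1, 1], 2) lets the count fall from 6 to 4, 3 and finally 2 over four passes of the loop, matching the four iterations described below.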
Each loop iteration corresponds to steps 110, 111 and 112 contained in dashed box 11 in fig. 9. After the steps included in the dashed box 11 are executed in a loop, the steps 121, 122 and 123 included in the dashed box 12 are further executed.
As shown in fig. 9, the method for determining the number of available data stripe slices further includes: and determining the maximum data stripe fragmentation number which can be formed in each node and the distribution position of the data stripe fragmentation in each node of the distributed storage system according to the available data stripe fragmentation number.
Specifically, in this embodiment, step 121 is executed first: the number of available data stripe fragments is calculated and a usable stripe data fragmentation policy is generated. After the policy is generated, the data storage objects (e.g., root data) stored in the distributed storage system 200 are segmented based on it to form a plurality of data fragments. The number of available data stripe fragments is determined through several loop iterations, and once the calculation has finished, the usable stripe data fragmentation policy is generated from it. After the number of data stripe fragments determined by the last loop iteration is taken as the number of available data stripe fragments, execution jumps to step 122: a node sequence is randomly generated using the available storage space of each node in the distributed storage system 200 as a weight. Illustratively, the node sequence is the order in which data fragments are stored across the three nodes of the distributed storage system 200. Finally, step 123 is executed: the node targeted by the next iterative computation is determined from the node sequence. Assuming the node targeted by the loop iterations indicated by dashed box 11 is node1, the node for the next iterative computation is determined from the generated node sequence; for example, if the previously generated node sequence is [1,2,3], the next node is node2.
Referring to fig. 11, the applicant illustrates an example scenario of a distributed storage system 200 comprising three nodes (i.e., node1, node2 and node3); the three nodes may also be deployed as a cluster server. Each loop iteration is computed against one current state. In the example disclosed in the present application, four loop iterations are required, giving five current states: the available disk counts of the three nodes change from the current state [10,1,1] to [6,1,1] after the first loop iteration, to [4,1,1] after the second, to [3,1,1] after the third, and finally to [2,1,1] after the fourth, at which point the process ends. The number of data stripe fragments determined through these loop iterations is the number of available data stripe fragments. In this embodiment, therefore, after four loop iterations the finally determined number of data stripe fragments is 2, and 2 is taken as the number of available data stripe fragments.
To further illustrate the technical idea of the foregoing solution, the applicant analyses fig. 11 in detail. In the first current state, assume node1 has 10 available disks, node2 has 1 available disk, and node3 has 1 available disk; the available disks in the first current state are thus extremely unevenly distributed among the three nodes. Node1's 10 disks all count as available disks in this state, node2's single disk counts as 1 available disk, and node3's single disk likewise counts as 1 available disk. It should be noted that during striping the root data is divided into several pieces of fragment data, and when the fragment data is stored, the fragmentation granularity and the storage locations are determined only by the number of data stripe fragments derived from the first current state. It should also be noted that node1 to node3 may objectively contain further disks, but those disks are not used to store other data, nor the root data targeted by the read-write request the user issued to the distributed storage system 200. During the first loop iteration, the number of available disks on node1 to node3 for storing the fragment data produced by segmenting a specific piece of root data is 12 (i.e., 10 + 1 + 1), the copy count is 2 (forming mirror0 and mirror1), and the node count is 3. The number of data stripe fragments in the first current state is therefore the quotient of the sum of available disks in the first current state over all nodes of the distributed storage system 200 and the copy count, i.e., 6 (12 ÷ 2 = 6). The first loop iteration thus determines a data stripe fragment number of 6, producing 6 pieces of fragment data (i.e., stripe0 to stripe5), each with 2 copies numbered mirror0 and mirror1, i.e., 12 descriptors in total. Saving 7 or more of these fragment copies on the node that originally had 10 disks would cause 2 (or more) different copies of the same data fragment to be written to the same node, which is obviously not allowed; so even though the node with 10 available disks still has 4 unallocated disks, they cannot receive fragment data, and a second loop iteration is required. For example, the fragment data stripe1 mirror0 and stripe1 mirror1 share fragment serial number 1 and are two different copies of that fragment. The technical solution disclosed in the present application therefore ensures, in every loop iteration, that two (or more) pieces of fragment data formed from different copies with the same fragment serial number are never written to the same node.
Similarly, the second loop iteration builds on the result of the first. In the second current state node1 has 6 available disks, node2 has 1 and node3 has 1, giving 8 available disks across the three nodes and the second current state [6,1,1]. Since the copy count is still 2, the number of data stripe fragments in the second current state is the quotient of the sum of available disks and the copy count, i.e., 4 (8 ÷ 2 = 4); the fragment number has thus decreased by 2, and the second loop iteration yields 4, producing 4 pieces of fragment data. Even though the current node (i.e., node1) still physically contains 10 disks, it can hold at most 4 fragment copies, and the remaining fragment copies must be stored on the other two nodes (i.e., node2 and node3), so that the same node never stores two fragment copies with the same fragment serial number and different copy serial numbers; since node2 and node3 together provide only 2 more disks, a fragment number of 4 still cannot be realized and a further iteration is needed.
A third loop iteration is then performed on the basis of the second. The three nodes now provide 6 available disks, forming the third current state [4,1,1]. Since the copy count is still 2, the number of data stripe fragments in the third current state is the quotient of the sum of available disks in that state and the copy count, i.e., 3 (6 ÷ 2 = 3); the fragment number determined by this iteration is therefore 3, producing the fourth current state [3,1,1]. The current node can now hold at most 3 fragment copies: two fragment copies with the same fragment serial number and different copy serial numbers may not be stored on the same node, so even though the current node (i.e., node1) has 4 usable disks, the fourth disk may not be written, and the remaining fragment copies must go to the other two nodes (i.e., node2 and node3).
Finally, the loop iteration step is repeated once more and the fourth loop iteration is executed. The three nodes now provide 5 usable disks, in the fourth current state [3,1,1]. In the present application, the number of data stripe fragments is rounded down when it is decreased according to the copy count of the fourth current state. Since the copy count is still 2, the quotient of the sum of available disks in the fourth current state and the copy count is 2.5 (5 ÷ 2 = 2.5); after rounding down, the finally determined number of data stripe fragments is 2, forming the fifth current state [2,1,1]. At this point, for the distributed storage system 200 with a copy count of 2, a data stripe fragment number of 2 and 3 nodes, the number of available disks on the current node (i.e., node1), which originally had 10 disks usable for saving data fragments, has been reduced to 2, while node2 and node3 each still provide 1 available disk. Only 2 fragment copies can now be written to node1, the other two fragment copies are written to the single available disks of node2 and node3, and it is no longer possible for two fragment copies with the same fragment serial number to be saved on the same node. The repeated loop iteration over the current node (i.e., node1) therefore ends with a final available data stripe fragment number of 2, which is also the finally determined stripe width; only two disks of node1 may be used as available disks for writing fragment data. It should be noted that the number of loop iterations may be 1, 2 or more.
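The four iterations just described can be reproduced by instrumenting the convergence sketch given after step S2 (again an illustration with assumed names, not the patent's code); run on the available disk counts 10, 1 and 1 with a copy count of 2, it should print exactly the five current states of fig. 11.

```python
def trace_stripe_count(disks_per_node, copies):
    """Same convergence loop as above, but recording every current state for inspection."""
    avail, history = list(disks_per_node), []
    count = sum(avail) // copies
    history.append((list(avail), count))
    while max(avail) > count:
        avail = [min(d, count) for d in avail]
        count = sum(avail) // copies
        history.append((list(avail), count))
    return count, history

count, history = trace_stripe_count([10, 1, 1], copies=2)
for state, n in history:
    print(state, "->", n)
# [10, 1, 1] -> 6
# [6, 1, 1]  -> 4
# [4, 1, 1]  -> 3
# [3, 1, 1]  -> 2
# [2, 1, 1]  -> 2   (no node exceeds 2, so the available stripe fragment number is 2)
```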
Finally, the operation of allocating disks to the data fragments within the same node of the distributed storage system 200 is performed through the steps shown in fig. 10. Referring to fig. 10, in this embodiment, after the node sequence has been randomly generated (specifically, after step 123 ends), the method for determining the number of available data stripe fragments further comprises the following steps 301 to 309.
Step 301: initialize the fragment serial number and the copy serial number of the data stripe on the current node and jump to step 302, which judges whether a next available node exists; if so, jump to step 303 and judge whether the current node has a next available disk to allocate to the data fragment stripe copy in the current state; if not, determine that the allocation of available disks has failed and end (see step 309). Referring to fig. 2, the "0" in stripe0 is the fragment serial number; adding 1 turns it into stripe1. The "0" in mirror0 is the copy serial number; adding 1 turns it into mirror1, and so on. A fragment serial number and a copy serial number together form a descriptor that uniquely identifies a specific data fragment, and the storage positions of the data fragments on all nodes are recorded through these unique descriptors.
Meanwhile, as shown in figs. 2 to 7, the method for determining the number of available data stripe fragments disclosed in the present application aims to prevent two data fragments with the same fragment serial number and different copy serial numbers from being stored on the same node. For example, stripe2 mirror0 and stripe2 mirror1 in fig. 2 cannot both be stored on node1; the node sequence of the three nodes of distributed storage system 200 is [1,2,3] in that figure. Likewise, in fig. 6 stripe2 mirror0 and stripe2 mirror1 cannot both be stored on node1; there the node sequence is [2,3,1]. In this way two data fragments corresponding to two different copies of the same fragment serial number (e.g., stripe2 mirror0 and stripe2 mirror1) are prevented from being stored on the same node.
In step 303, judging whether the current node has a next available disk to allocate to the data fragment stripe copy in the current state comprises the following logic: if the current node has such a disk, jump to step 304 and judge whether the current copies of all data fragments of the current data stripe have been allocated; if the current node does not have such a disk, re-execute the judgment of whether a next available node exists and loop through the judgment logic of steps 302 and 303 again. In this implementation, a negative result in step 303 proves either that the current node has no available disk left or that continuing to allocate a copy of the stripe to this node would place two copies of the same stripe fragment on the same node.
Judging in step 304 whether the current copies of all data fragments of the current data stripe have been allocated comprises the following logic: if they have all been allocated, judge whether the number of data stripe fragments in the current state is not lower than the number of available disks on any node of the distributed storage system 200; if they have not all been allocated, jump to step 305, add 1 to the fragment serial number while keeping the copy serial number unchanged, and re-execute the judgment of whether the current node has a next available disk to allocate to the data fragment stripe copy in the current state. After step 305 ends, execution jumps back to step 303. The aforementioned current copies are the copies of the data fragment stripe in any one current state; for example, stripe0 mirror0 and stripe1 mirror0 are determined and written to node1 in the last loop iteration, stripe0 mirror1 is written to node2 and stripe1 mirror1 to node3: stripe0 mirror0 and stripe0 mirror1 are the two current copies of stripe0 in the last current state (i.e., the fifth current state), and stripe1 mirror0 and stripe1 mirror1 are the two current copies of stripe1 in that state.
Judging in step 306 whether the number of data stripe fragments in the current state is not lower than the number of available disks on any node of the distributed storage system 200 comprises the following logic: if yes, go to step 307, determine that the allocation of available disks has succeeded, and end; if not, execute step 308, reset the fragment serial number to 0, add 1 to the copy serial number, and re-execute the logic of judging whether the current node has a next available disk to allocate to the data fragment stripe copy in the current state. After step 308 is executed, step 303 is executed again.
Step 309: determine that the allocation of available disks has failed and end.
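Steps 301 to 309 of fig. 10 can be rendered as the following hedged sketch (the bookkeeping, the names and the capacity-ordered stand-in for the weighted random node sequence of step 122 are simplifying assumptions, not the patent's code): every (fragment serial number, copy serial number) descriptor is handed the next free disk on a node that does not already hold another copy of the same fragment, the fragment serial number advances within a copy, then the copy serial number advances, and the allocation fails if no node can serve the current descriptor.

```python
import random

def allocate_disks(free_disks, stripe_count, copies, seed=None):
    """Hand one disk to every (fragment serial number, copy serial number) descriptor,
    never placing two copies of the same fragment on one node."""
    rng = random.Random(seed)
    # Simplified stand-in for step 122: order nodes by free capacity (random tie-break)
    # instead of drawing a weighted random node sequence.
    order = sorted(free_disks, key=lambda n: (-free_disks[n], rng.random()))
    free = dict(free_disks)
    placement = {}                                   # (slice_no, copy_no) -> node
    for copy_no in range(copies):                    # steps 306/308: advance the copy number
        for slice_no in range(stripe_count):         # steps 304/305: advance the slice number
            node = next((n for n in order
                         if free[n] > 0
                         and all(placement.get((slice_no, k)) != n for k in range(copy_no))),
                        None)
            if node is None:
                return None                          # step 309: allocation failed
            free[node] -= 1                          # step 303: consume one available disk
            placement[(slice_no, copy_no)] = node
    return placement                                 # step 307: allocation succeeded

plan = allocate_disks({"node1": 2, "node2": 1, "node3": 1}, stripe_count=2, copies=2)
print(plan)   # e.g. {(0, 0): 'node1', (1, 0): 'node1', (0, 1): 'node2', (1, 1): 'node3'}
```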
As mentioned above, when the finally determined number of available data stripe fragments is 2, 4 available disks remain across the three nodes of the distributed storage system 200. The 2 usable disks on node1 are used to store stripe0 mirror0 and stripe1 mirror0, the usable disk on node2 stores stripe0 mirror1, and the usable disk on node3 stores stripe1 mirror1; the allocation procedure that stores the data fragments with their unique descriptors on the usable disks of the three nodes then ends.
However, the allocation [2,1,1] across the node sequence is not fixed and can be adjusted adaptively according to the actual operating condition and capacity of the nodes. For example, the allocation may also be [1,2,1]: 1 available disk on node1 stores stripe0 mirror0, 2 available disks on node2 store stripe0 mirror1 and stripe1 mirror0, and 1 available disk on node3 stores stripe1 mirror1. The allocation may likewise be [1,1,2]: 1 disk on node1 stores stripe0 mirror0, 1 disk on node2 stores stripe0 mirror1, and 2 disks on node3 store stripe1 mirror0 and stripe1 mirror1. All three allocations guarantee that two copies of the fragment with the same fragment serial number are never stored on the same node, so that two or more data fragments formed from different copies of the same fragment serial number are never stored on multiple disks of the same node, and collisions between data fragments are avoided. It should be noted that, unless otherwise indicated, "data fragment" and "fragment data" have the same meaning in the present application.
In summary, the method for determining the number of available data stripe fragments disclosed in the present application significantly improves the IOPS (i.e., the number of I/O operations per second) and TPS (i.e., the data transmission rate) of the distributed storage system 200, and thereby improves its responsiveness to the data read-write requests initiated by users. In particular, in this embodiment the number of usable data stripe fragments finally obtained through one or more loop iterations converges to a value that just guarantees a usable placement of fragment data, which significantly reduces the trial-and-error time caused by the objectively very unbalanced distribution of available disks in the first current state and avoids the large and meaningless resource overhead imposed on the distributed storage system 200. Finally, compared with prior-art solutions that must repeatedly try different fragment numbers to determine the stripe width, the method disclosed in this embodiment for a distributed storage system 200 with 3 nodes does not need to search for a reasonable fragment number among 24 different combinations; it replaces the permutation-based trial and error over the 3 nodes with a convergent, gradually decreasing data stripe fragment number (i.e., a decreasing stripe width), which greatly reduces the computational overhead and time of calculating the number of available data stripe fragments and effectively avoids the technical problem of two or more copies of the same data stripe being stored on the same node.
Based on the technical solutions included in the method for determining the number of available data stripe fragments disclosed in the foregoing embodiments, the present application also discloses a system 100 for determining the number of available data stripe fragments.
Referring to fig. 12, the system 100 for determining the number of available data stripe fragments is deployed in a distributed storage system 200. The system 100 comprises: a storage management module 20, a data fragment management module 30 and a metadata server 40 connected to the storage management module 20. A user establishes a session with the system 100 via the graphical interface 10 and the network 11 to initiate a request to create a data storage object. Requests to create data storage objects include, but are not limited to, computer events that write data to the distributed storage system, modify data, migrate data or create files. The type of the distributed storage system 200 is not particularly limited in the present application; it may be based on the Hadoop Distributed File System (HDFS), Ceph, DAS, NAS, SAN or the like. The distributed storage system 200 may deploy i disks, such as disk 201 to disk 20i in fig. 12, and the i disks form a plurality of nodes (for example node1 to node3, though not limited to these three) by virtualization. Since virtualizing i physical disks into a plurality of nodes is a mature prior art and not the point of the present application, its description is omitted in this specification.
The storage management module 20 receives the request for creating the data storage object and sends the data fragmentation policy generation request that it generates to the data fragment management module 30. The data fragment management module 30 performs the steps of the method for determining the number of available data stripe fragments disclosed in the above embodiments and returns to the storage management module 20 a data fragmentation policy containing the information that determines the number of available data stripe fragments. The number of available data stripe fragments is sent to the network 11 in the form of a notification by the storage management module 20 and is eventually learned by the user. The metadata server 40 deploys a metadata storage service 41, and the metadata storage service 41 stores the data fragmentation policy, so that the storage positions of the data fragments formed by fragmenting the data storage object (e.g., a certain piece of root data) targeted by a user-initiated read-write request are determined according to the policy and recorded; the complete data storage object can then be reassembled from these fragments and returned to the user. The metadata storage service 41 stores metadata such as the number of nodes, the node names, the disks into which each node is logically divided, the copy count and the storage positions of the data fragments.
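The interaction between the modules of fig. 12 can be sketched as follows (class and method names are illustrative assumptions rather than the patent's API, and the sketch reuses the available_stripe_count and allocate_disks helpers drafted earlier in this description): the storage management module forwards a create-object request to the data fragment management module, which computes the stripe width and layout and returns a fragmentation policy that the metadata storage service records.

```python
class MetadataStorageService:
    """Stands in for metadata storage service 41: it records fragmentation policies."""
    def __init__(self):
        self._policies = {}

    def record(self, object_name, policy):
        self._policies[object_name] = policy

class DataShardManager:
    """Stands in for data fragment management module 30."""
    def build_policy(self, disks_per_node, copies):
        width = available_stripe_count(list(disks_per_node.values()), copies)
        layout = allocate_disks(disks_per_node, width, copies)
        return {"stripe_width": width, "copies": copies, "layout": layout}

class StorageManager:
    """Stands in for storage management module 20."""
    def __init__(self, shard_manager, metadata):
        self.shard_manager, self.metadata = shard_manager, metadata

    def create_object(self, name, disks_per_node, copies):
        policy = self.shard_manager.build_policy(disks_per_node, copies)
        self.metadata.record(name, policy)           # persist the policy for later reads
        return policy

mgr = StorageManager(DataShardManager(), MetadataStorageService())
print(mgr.create_object("root-data", {"node1": 10, "node2": 1, "node3": 1}, copies=2))
```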
The system 100 for determining the number of available data stripe fragments disclosed in this embodiment shares the technical solution of the method disclosed in the foregoing embodiments; reference is made to the foregoing description, which is not repeated here.
The above-listed detailed description is only a specific description of a possible embodiment of the present invention, and they are not intended to limit the scope of the present invention, and equivalent embodiments or modifications made without departing from the technical spirit of the present invention should be included in the scope of the present invention.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although this specification is described in terms of embodiments, each embodiment does not necessarily contain only a single independent technical solution; this manner of description is adopted for clarity only. Those skilled in the art should regard the specification as a whole, and the technical solutions in the embodiments may be combined as appropriate to form other embodiments that can be understood by those skilled in the art.

Claims (14)

1. A method for determining the number of available data stripe fragments, applied to a distributed storage system, characterized by comprising:
determining the number of available disks formed on each node, the number of copies and the number of nodes in the current state, so as to calculate the number of data stripe fragments in the current state;
and performing at least one loop iteration calculation over the nodes constituting the distributed storage system until the number of data stripe fragments in the current state is not less than the number of available disks formed on any node in the distributed storage system, and taking the number of data stripe fragments determined by the last loop iteration calculation as the number of available data stripe fragments.
2. The method for determining the number of available data stripe fragments according to claim 1, characterized in that, in the loop iteration calculation over the nodes constituting the distributed storage system, the number of data stripe fragments is gradually decreased when the number of available disks formed on any node is higher than the number of data stripe fragments in the current state, wherein the number of data stripe fragments in the current state is the quotient of the sum of the numbers of available disks in the current state over all nodes included in the distributed storage system divided by the number of copies.
3. The method for determining the number of available data stripe fragments according to claim 2, characterized in that, when the number of available disks formed on any node is higher than the number of data stripe fragments in the current state, the number of data stripe fragments is gradually decreased according to the number of copies in the current state, and the number of available disks of each such node used in the next loop iteration calculation is set to the number of data stripe fragments in the current state, so that the number of data stripe fragments is re-determined by the next loop iteration calculation.
4. The method for determining the number of available data stripe fragments according to claim 3, characterized in that the number of data stripe fragments is rounded down when it is gradually decreased according to the number of copies in the current state.
5. The method for determining the number of available data stripe fragments according to claim 1, characterized in that the numbers of available disks contained in the nodes included in the distributed storage system are non-uniformly distributed, and the nodes are composed of mechanical hard disks, solid-state drives, or physical machines containing mechanical hard disks and/or solid-state drives.
6. The method for determining the number of available data stripe fragments according to claim 1, characterized by further comprising: determining, according to the number of available data stripe fragments, the maximum number of data stripe fragments that can be formed in each node and the distribution positions of the data stripe fragments in each node of the distributed storage system.
7. The method for determining the number of available data stripe fragments according to claim 1, characterized by further comprising: generating a stripe data fragmentation policy containing the number of available data stripe fragments, and segmenting a data storage object to be stored in the distributed storage system based on the stripe data fragmentation policy to form a plurality of data fragments.
8. The method for determining the number of available data stripe fragments according to any one of claims 1 to 7, characterized by further comprising, after taking the number of data stripe fragments determined by the last loop iteration calculation as the number of available data stripe fragments: randomly generating a node sequence with the available storage space contained in each node of the distributed storage system as a weight, and determining the node for the next iterative calculation according to the node sequence.
9. The method for determining the number of available data stripe fragments according to claim 8, characterized by further comprising, after randomly generating the node sequence:
initializing the fragment sequence number and the copy sequence number of the data stripe within the current node, and judging whether a next available node exists; if so, judging whether the current node has a next available disk to allocate to the data stripe fragment copy in the current state; if not, determining that the allocation of available disks fails and ending.
10. The method for determining the number of available data stripe fragments according to claim 9, characterized in that judging whether the current node has a next available disk to allocate to the data stripe fragment copy in the current state comprises the following logic:
if the current node has a next available disk to allocate to the data stripe fragment copy in the current state, judging whether the current copies of all data fragments of the current data stripe have been allocated;
and if the current node does not have a next available disk to allocate to the data stripe fragment copy in the current state, re-executing the judgment of whether a next available node exists.
11. The method for determining the number of available data stripe fragments according to claim 10, characterized in that judging whether the current copies of all data fragments of the current data stripe have been allocated comprises the following logic:
if the current copies of all data fragments of the current data stripe have been allocated, judging whether the number of data stripe fragments in the current state is not lower than the number of available disks formed on any node in the distributed storage system;
and if the current copies of all data fragments of the current data stripe have not been allocated, adding 1 to the fragment sequence number while keeping the copy sequence number unchanged, and re-executing the judgment of whether the current node has a next available disk to allocate to the data stripe fragment copy in the current state.
12. The method for determining the number of available data stripe fragments according to claim 11, characterized in that judging whether the number of data stripe fragments in the current state is not lower than the number of available disks formed on any node in the distributed storage system comprises:
if so, determining that the allocation of available disks succeeds and ending;
and if not, resetting the fragment sequence number to 0, adding 1 to the copy sequence number, and re-executing the logic of judging whether the current node has a next available disk to allocate to the data stripe fragment copy in the current state.
13. A system for determining the number of available data stripe fragments, deployed in a distributed storage system, characterized by comprising: a storage management module and a data fragment management module;
wherein the storage management module receives a request to create a data storage object and sends a request to generate a data fragmentation policy, produced by the storage management module, to the data fragment management module;
and the data fragment management module performs the steps of the method for determining the number of available data stripe fragments according to any one of claims 1 to 12, and returns a data fragmentation policy containing information for determining the number of available data stripe fragments to the storage management module.
14. The system for determining the number of available data stripe fragments according to claim 13, characterized by further comprising: a metadata server connected to the storage management module; the metadata server deploys a metadata storage service, and the metadata storage service stores the data fragmentation policy.
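Purely as an illustration of claims 8 to 12 above, the node-by-node allocation can be read as the following control flow. The Python sketch below is hypothetical in every identifier; it models only the claimed iteration over the weighted random node sequence, fragment sequence numbers and copy sequence numbers, assumes that allocation succeeds once every copy of every fragment has been placed, and does not attempt to enforce any replica-separation constraint beyond what the claims spell out.

import random

def allocate_fragment_copies(nodes, fragments, copies, seed=None):
    # nodes: name -> {"disks": free disk count, "space": available storage space}
    # Returns a mapping (fragment number, copy number) -> node name,
    # or None when the allocation of available disks fails.
    rng = random.Random(seed)
    # Claim 8: randomly generate a node sequence with the available
    # storage space of each node as its weight.
    names = list(nodes)
    weights = [nodes[n]["space"] for n in names]
    order = []
    while names:
        i = rng.choices(range(len(names)), weights=weights)[0]
        order.append(names.pop(i))
        weights.pop(i)
    free = {n: nodes[n]["disks"] for n in nodes}

    placement = {}
    node_iter = iter(order)
    current = next(node_iter, None)
    fragment, copy = 0, 0              # claim 9: initialise both sequence numbers
    while copy < copies:
        if current is None:            # no next available node: allocation fails
            return None
        if free[current] == 0:         # claim 10: current node has no next
            current = next(node_iter, None)   # available disk, try the next node
            continue
        free[current] -= 1             # allocate one disk to (fragment, copy)
        placement[(fragment, copy)] = current
        if fragment + 1 < fragments:
            fragment += 1              # claim 11: fragment + 1, copy unchanged
        else:
            fragment, copy = 0, copy + 1   # claim 12: reset fragment, next copy
    return placement

# Example call (hypothetical figures):
# allocate_fragment_copies({"node1": {"disks": 6, "space": 600},
#                           "node2": {"disks": 3, "space": 300},
#                           "node3": {"disks": 3, "space": 300}},
#                          fragments=6, copies=2, seed=0)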
CN202211421670.1A 2022-11-14 2022-11-14 Method and system for determining available data stripe fragmentation number Active CN115712390B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211421670.1A CN115712390B (en) 2022-11-14 2022-11-14 Method and system for determining available data stripe fragmentation number

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211421670.1A CN115712390B (en) 2022-11-14 2022-11-14 Method and system for determining available data stripe fragmentation number

Publications (2)

Publication Number Publication Date
CN115712390A (en) 2023-02-24
CN115712390B CN115712390B (en) 2023-05-09

Family

ID=85233061

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211421670.1A Active CN115712390B (en) 2022-11-14 2022-11-14 Method and system for determining available data stripe fragmentation number

Country Status (1)

Country Link
CN (1) CN115712390B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105162847A (en) * 2015-08-10 2015-12-16 电子科技大学 Distributed stream data storage system storage resource planning method
US20200250036A1 (en) * 2017-11-13 2020-08-06 Tsinghua University RAID-Based Globally Resource-shared Data Storage System
CN108287669A (en) * 2018-01-26 2018-07-17 平安科技(深圳)有限公司 Date storage method, device and storage medium
CN109726036A (en) * 2018-11-21 2019-05-07 华为技术有限公司 Data reconstruction method and device in a kind of storage system
US20200285401A1 (en) * 2019-03-07 2020-09-10 Vast Data Ltd. Resiliency schemes for distributed storage systems
CN110308875A (en) * 2019-06-27 2019-10-08 深信服科技股份有限公司 Data read-write method, device, equipment and computer readable storage medium
CN110515899A (en) * 2019-07-31 2019-11-29 济南浪潮数据技术有限公司 File positioning method and device
CN112764680A (en) * 2021-01-20 2021-05-07 华云数据控股集团有限公司 Disk stripe selection method and storage system
CN114816278A (en) * 2022-06-30 2022-07-29 苏州浪潮智能科技有限公司 Data migration method, system, equipment and storage medium of storage server

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Lai Pengli, "Research on a Pipeline-Based Multi-Node Reconstruction Scheme", China Master's Theses Full-text Database, Information Science and Technology *

Also Published As

Publication number Publication date
CN115712390B (en) 2023-05-09

Similar Documents

Publication Publication Date Title
CN107807794B (en) Data storage method and device
US11372544B2 (en) Write type based crediting for block level write throttling to control impact to read input/output operations
CA2717549C (en) Dynamically quantifying and improving the reliability of distributed data storage systems
US9846540B1 (en) Data durability using un-encoded copies and encoded combinations
WO2010086921A1 (en) Storage system
US10356150B1 (en) Automated repartitioning of streaming data
CN109582213B (en) Data reconstruction method and device and data storage system
WO2013148143A1 (en) Systems, methods, and computer program products for scheduling processing to achieve space savings
CN112328168A (en) Fragment management method and fragment management device
US20140075111A1 (en) Block Level Management with Service Level Agreement
CN112988065B (en) Data migration method, device, equipment and storage medium
CN107133228A (en) A kind of method and device of fast resampling
JP2021026659A (en) Storage system and resource allocation control method
US20220179555A1 (en) Managed placement of object components in an object-based datastore
US11188258B2 (en) Distributed storage system
US20180004430A1 (en) Chunk Monitoring
US9811280B2 (en) Efficient method of combining parity groups for uniform load distribution and maximizing parallelization in parity de-clustered and sliced disk raid architecture
CN107948229B (en) Distributed storage method, device and system
US10931750B1 (en) Selection from dedicated source volume pool for accelerated creation of block data volumes
US10241878B2 (en) System and method of data allocation providing increased reliability of storage
CN115712390B (en) Method and system for determining available data stripe fragmentation number
US20230179423A1 (en) Method and apparatus for storing blockchain transaction data and distributed storage system using the same
CN108959300B (en) File storage method and storage device
CN105224261B (en) The implementation method and device of a kind of piece of virtualization array
Dong Coop-u: a cooperative update scheme for erasure-coded storage systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant