WO2018028229A1

WO2018028229A1 - Data shard storage method, device and system

Info

Publication number: WO2018028229A1
Application number: PCT/CN2017/079971
Authority: WO
Inventors: 王华琼; 高超
Original assignee: 华为技术有限公司
Priority date: 2016-08-10
Filing date: 2017-04-10
Publication date: 2018-02-15
Also published as: CN106302702A; EP3487149B1; US10942828B2; EP3487149A1; CN106302702B; EP3487149A4; US20190171537A1

Abstract

The present application relates to the field of distributed storage, and in particular to distributed sharded storage technology. The data shard storage method in a distributed storage system comprises: determining M storage nodes required for storing data to be stored; acquiring N copies of the data to be stored; sharding each of the N copies into X data shards according to the same sharding mode; and storing the data to be stored in the M storage nodes, that is, storing the N copies of each of the X data shards in N of the M storage nodes, so that the number of data shards whose copies are all stored on the same N storage nodes is P or P+1, where P is the integer quotient of X divided by $C_M^N$ (the number of combinations of N nodes chosen from M). The method improves data availability and the efficiency of data recovery when a node fault occurs.

Description

Data storage method, device and system

The present application claims priority to Chinese Patent Application No. 201610659118.4, the entire contents of which are incorporated herein by reference.

Technical field

Embodiments of the present invention relate to the field of distributed storage, and in particular, to a method, an apparatus, and a system for storing fragments of data in a distributed storage system.

Background technique

With the rapid development of information technology, the amount of data in information system databases keeps growing. To meet the storage requirements of large data volumes, distributed storage systems running on multiple servers are widely used. In a distributed storage system, multiple database systems run on multiple servers. When data is stored, it is first sharded (fragmented), and the different data fragments are then stored on different servers. Sharding is a form of horizontal scaling: a large data set is spread across multiple data nodes, and all the data nodes together form one logical database that stores the data set. Sharding is transparent to the user (application layer), who does not know which shard server holds the data. Storing data in shards overcomes the I/O capacity limit of a single-node server and solves the problem of database scalability.

At the same time, to ensure high availability of data and services, a distributed database usually needs a fault tolerance mechanism that keeps redundant backups of each data fragment. By storing multiple copies of the same data fragment on different servers, the loss of a data fragment caused by a single server being unavailable can be avoided.

In the prior art, data in a distributed storage system is usually backed up by means of cross backup. The number of data fragments is generally the same as the number of storage nodes, so that each storage node stores the primary copy of one data fragment, and the backup copies of that fragment are stored on two other storage nodes different from the one holding the primary copy. For example, Table 1 lists a common storage strategy for storing one primary copy and two backup copies in six data nodes. The data is divided into six data fragments, A to F, and each data fragment has one primary copy and two backup copies.

Storage node	Data fragments
Node 1	A C D
Node 2	B A C
Node 3	C A F
Node 4	D B E
Node 5	E D F
Node 6	F B E

Table 1

In the prior art, storing the primary copy and the backup copies on different storage nodes guarantees that a data fragment is not lost when a single storage node fails; a data fragment is lost only when all the data nodes holding its copies fail. However, the primary and backup copies of two data fragments may be stored on the same set of nodes, so that a failure of those nodes loses both fragments. In the example of Table 1, data fragment A and data fragment C are both stored on nodes 1, 2, and 3; when these three data nodes fail, data fragments A and C are both lost. In addition, when a single node fails, its data must be recovered onto a new node, and the efficiency of concurrent recovery is not high. In the example of Table 1, when node 6 fails, recovery can run concurrently on at most three nodes, one for each of the data fragments F, B, and E (for example nodes 3, 4, and 5), while the other nodes cannot participate in the recovery.
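The two weaknesses described above can be checked mechanically against Table 1. The following is a minimal sketch, not part of the original disclosure; the function names `holders`, `lost_fragments`, and `recovery_candidates` are hypothetical, and only the placement data comes from Table 1.

```python
# Placement from Table 1: node number -> data fragments stored on that node.
TABLE_1 = {
    1: ["A", "C", "D"],
    2: ["B", "A", "C"],
    3: ["C", "A", "F"],
    4: ["D", "B", "E"],
    5: ["E", "D", "F"],
    6: ["F", "B", "E"],
}


def holders(placement):
    """Map each fragment to the set of nodes that hold one of its copies."""
    result = {}
    for node, fragments in placement.items():
        for fragment in fragments:
            result.setdefault(fragment, set()).add(node)
    return result


def lost_fragments(placement, failed_nodes):
    """Fragments whose every copy sits on a failed node."""
    failed = set(failed_nodes)
    return sorted(f for f, nodes in holders(placement).items() if nodes <= failed)


def recovery_candidates(placement, failed_node):
    """Surviving nodes that hold a copy of any fragment the failed node stored."""
    by_fragment = holders(placement)
    candidates = set()
    for fragment in placement[failed_node]:
        candidates |= by_fragment[fragment] - {failed_node}
    return sorted(candidates)


print(lost_fragments(TABLE_1, [1, 2, 3]))      # ['A', 'C']
candidates = recovery_candidates(TABLE_1, 6)
print(candidates)                              # [2, 3, 4, 5]
# At most one source node is needed per fragment, so concurrency is capped at 3.
print(min(len(TABLE_1[6]), len(candidates)))   # 3
```

Node 1 never appears among the candidates for node 6, which matches the observation that some nodes cannot take part in the recovery.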

Summary of the invention

In view of this, the embodiments of the present invention provide a method, a device, and a system for storing data in a distributed storage system, which improve the availability of data backup and the efficiency of data recovery.

In a first aspect, an embodiment of the present invention provides a fragment storage method for data. The method includes determining M storage nodes on which the data to be stored is to be stored, and acquiring N copies of the data to be stored, where the copy number N is the sum of the original data of the data to be stored and the number of copies of the backup data. Each of the N copies is fragmented into X data fragments according to the same fragmentation mode, so that each data fragment has N data fragment copies. The N copies of the data to be stored are stored in the M storage nodes, that is, the N data fragment copies of each of the X data fragments are stored on N of the M storage nodes. The number of data fragments whose copies are all stored on the same N storage nodes is kept to a minimum; specifically, that number is P or P+1, where P is the integer quotient of X divided by $C_M^N$, the number of combinations of N nodes chosen from M (an integer quotient is an incomplete or partial quotient; for example, when X is 10 and $C_M^N$ is 3, the integer quotient P is 3). Because the number of data fragments whose copies share the same N storage nodes is the smallest, the maximum data loss that can occur when any N storage nodes fail simultaneously is minimized, which reduces the data loss ratio compared with the prior art and improves the availability of data backup. At the same time, because the copies of each data fragment are evenly distributed over different nodes, when a node fails, the data fragments stored on it can be recovered from the corresponding copies held on a plurality of different nodes, which improves the efficiency of concurrent recovery.

In a possible design, when the copies are fragmented, the number of fragments X is determined according to the optimal fragment base Y, where Y = $C_M^N$. The number of fragments X may be equal to or smaller than the product of the optimal fragment base Y and a coefficient K, where K is an integer greater than or equal to one.

In an implementation of the design, the value of X is less than the product of Y and K. In this case, the closer the value of X is to the product of Y and K, the smaller the proportion of the total stored data that may be lost when N storage nodes fail at the same time.

In an implementation of the design, the value of X is the product of Y and K. In this case, when N storage nodes fail simultaneously, the maximum amount of data that may be lost is the smallest proportion of the total stored data.
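The relationship between X and the worst-case loss proportion can be illustrated numerically. The sketch below is not part of the original disclosure. It assumes that all data fragments have equal size, that the placement follows the P or P+1 rule of the first aspect, and that Y is the number of N-node combinations $C_M^N$; the function name `worst_case_loss_ratio` is hypothetical.

```python
from math import comb


def worst_case_loss_ratio(m, n, x):
    """Largest share of fragments lost when any n of the m nodes fail together.

    Under the P / P+1 rule, the fullest combination of n nodes holds
    ceil(x / y) fragments, where y is the number of n-node combinations.
    """
    y = comb(m, n)          # assumed optimal fragment base Y
    worst = -(-x // y)      # ceiling division: P if the remainder is 0, else P+1
    return worst / x


for x in (6, 10, 20, 30, 40):
    print(x, round(worst_case_loss_ratio(6, 3, x), 4))
```

With M = 6 and N = 3 the ratio falls as X approaches 20, reaches 1/20 at X = 20 and again at X = 40, and is higher for values in between such as X = 30.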

In an implementation manner of the design, when determining the number of fragments X of the data to be stored, the coefficient K may be determined according to the load balancing requirement of the distributed storage system. The larger the value of K, the higher the degree of balanced load of the data to be stored.

In a possible design, the number of fragments X of the data to be stored is determined according to the current load-balancing situation of the distributed storage system. When the degree of load balancing of the data to be stored in the distributed storage system needs to be increased, a larger number of fragments X may be used, which gives a smaller data granularity and a higher degree of load balancing.

In a possible design, the number N of copies of the data to be stored is determined according to the security requirement of the data to be stored, where the larger the number of copies N, the higher the security requirement that can be satisfied. That is, more nodes need to fail at the same time before a data fragment is lost.

In a possible design, the number N of copies of the data to be stored is determined according to the data type of the data to be stored and the correspondence between the data type and the number of copies. This provides more flexible data availability guarantees for different types of data.

In a possible design, when the data to be stored is stored in the storage nodes, the $C_M^N$ combinations of data nodes obtained by selecting N data nodes from the M data nodes are determined; the quotient P and the remainder Q of the number of fragments X divided by $C_M^N$ are determined; and Q of the combinations of data nodes are selected to each store P+1 data fragments, while the remaining $C_M^N - Q$ combinations of data nodes each store P data fragments, where the N copies of each data fragment are respectively stored on the N different data nodes of the combination in which the fragment is stored.

In a second aspect, an embodiment of the present invention provides a method for determining data fragmentation in a distributed storage system. The method comprises: determining M data nodes on which the data to be stored is to be stored, and obtaining the copy number N of the data to be stored, and thereby determining the number of fragments X of the data to be stored. The number of fragments X is equal to or smaller than the product of the optimal fragment base Y and the coefficient K, where Y = $C_M^N$ and K is an integer greater than or equal to 1. The closer the value of X is to the product of Y and K, the smaller the ratio of the largest possible data loss to the total amount of stored data when N storage nodes fail at the same time. When the value of X is the product of Y and K, the largest possible data loss is the smallest proportion of the total stored data.

In a possible design, the coefficient K is determined according to the load balancing requirement of the distributed storage system, where the coefficient K is an integer greater than or equal to 1, and the larger the value of K, the higher the degree of load balancing of the data to be stored. The number of fragments X is equal to or less than the product of Y and K.

In a possible design, after the number of fragments X is determined, a storage policy for storing the data to be stored in the M storage nodes is determined, where the N data fragment copies of each of the X data fragments are respectively stored on N of the M storage nodes, and the number of data fragments whose copies are all stored on the same N storage nodes is P or P+1, where P is the integer quotient of X divided by $C_M^N$.

In a possible implementation of the design, the storage policy is determined as follows: when the N copies of the data to be stored are to be stored in the M data nodes, the $C_M^N$ combinations of data nodes obtained by selecting N data nodes from the M data nodes are determined, and the quotient P and the remainder Q of the number of fragments X divided by $C_M^N$ are determined.

In a possible design, the number N of copies of the data to be stored may be determined according to the security requirement of the data to be stored. The larger the value of the number of copies N, the higher the security requirement of the data to be stored that can be satisfied.

In a possible design, the number N of copies of the data to be stored is determined according to the data type of the data to be stored and the correspondence between the data type and the number of copies. This provides different levels of availability protection for different data types.

In a third aspect, an embodiment of the present invention provides a distributed storage device, which can implement the functions in the foregoing method design of the first or second aspect. The functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions described above. The modules can be software and/or hardware.

In one possible design, the structure of the device includes a processor and a memory coupled to the processor. Wherein the processor invokes instructions stored in the memory for performing the method of the first or second aspect described above.

In a possible design, the device includes an obtaining unit and a storage unit. The obtaining unit is configured to determine M storage nodes on which the data to be stored is to be stored and to obtain N copies of the data to be stored, where the N copies include the original data of the data to be stored and N-1 backup copies of the original data, and each of the N copies is fragmented into X data fragments according to the same fragmentation manner, so that each data fragment has N data fragment copies, N being less than or equal to M. The storage unit is configured to store the N copies of the data to be stored on the M storage nodes, where the N data fragment copies of each of the X data fragments are respectively stored on N of the M storage nodes, and the number of data fragments whose copies are all stored on the same N storage nodes is P or P+1, where P is the integer quotient of X divided by $C_M^N$.

In a possible design, the device comprises an obtaining unit and a determining unit, wherein the acquiring unit is configured to acquire the M data nodes to which the data to be stored is to be stored and the number N of copies of the data to be stored. The determining unit determines the number of slices X according to the method of the aforementioned second aspect.

In a fourth aspect, an embodiment of the present invention provides a distributed storage system. The distributed storage system includes a client, a plurality of hard disks, and a distributed storage device, and the distributed storage device may be the device in the foregoing design of the third aspect, configured to perform the corresponding method of the foregoing first aspect or second aspect.

In a fifth aspect, an embodiment of the present invention provides a distributed storage system, where the system includes a client and a distributed storage server system, where the distributed storage server system may include: a control server, an operation and maintenance management (OAM) server, a business server, a storage resource pool, and a storage engine. Here, the storage engine may be used to perform the corresponding method of the foregoing first aspect or second aspect.

Compared with the prior art, the present invention provides a method for data slice storage in a distributed environment, which improves the availability of data and the efficiency of data recovery when a node fails.

DRAWINGS

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the following drawings show only some embodiments of the present invention, and those skilled in the art can obtain other embodiments of the present invention from these drawings without any inventive effort. All of these embodiments fall within the scope of protection of the invention.

FIG. 1 is a schematic diagram of a possible system architecture of the present invention;

FIG. 2 is a schematic flowchart of determining a fragment storage policy of a distributed storage system according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a possible fragment storage policy of a distributed storage system according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of node recovery in a possible distributed storage system fragment storage policy according to an embodiment of the present invention;

FIG. 5 is a diagram showing an example of the relationship between the number of fragments and the data loss ratio after a multi-node failure in a possible scenario according to an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of a distributed storage device according to an embodiment of the present disclosure;

FIG. 7 is a schematic structural diagram of still another distributed storage device according to an embodiment of the present disclosure;

FIG. 8 is a schematic structural diagram of a distributed storage system according to an embodiment of the present disclosure;

FIG. 9 is a schematic structural diagram of still another distributed storage system according to an embodiment of the present disclosure;

FIG. 10 is a schematic structural diagram of a distributed storage device according to an embodiment of the present invention.

FIG. 11 is a schematic structural diagram of still another distributed storage device according to an embodiment of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention are clearly and completely described in the following with reference to the accompanying drawings in the embodiments of the present invention. It is obvious that the described embodiments are a part of the embodiments of the present invention, but not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative efforts are within the scope of the present invention.

In order to facilitate the understanding of the embodiments of the present invention, the system architecture of the distributed storage system to which the present invention is applied is first introduced. Distributed storage systems distribute data across multiple independent devices. The traditional network storage system uses a centralized storage server to store all data. The storage server becomes a bottleneck of system performance, and is also the focus of reliability and security, and cannot meet the needs of large-scale storage applications. The distributed network storage system adopts a scalable system structure, uses multiple storage servers to share the storage load, and uses the location server to locate the storage information, which not only improves the reliability, availability and access efficiency of the system, but also is easy to expand.

FIG. 1 is a schematic diagram of a distributed storage system. It should be noted that the distributed storage system is only an example, and the scope of application of the present invention is not limited thereto. The distributed storage system shown includes a distributed database engine 102 and distributed data storage nodes 107. The distributed database engine 102 is the core of the system; it is responsible for data parsing, routing, distribution, merging, and so on, and manages the plurality of storage nodes at the bottom layer. The distributed storage nodes consist of a plurality of data nodes for storing data. Users can flexibly build clusters of data nodes of different sizes according to their needs.

The distributed database engine 102 includes an API (Application Programming Interface) module 103 that provides the client with an interface for calling the database. The resource application module 104 determines the number of nodes provided to the client for the current storage request according to the storage requirements of the client and the storage capacity that each node in the distributed data storage nodes can provide. Optionally, the resource application module may also determine the number of backup copies of the data to be stored according to the data reliability requirement submitted by the user. The data management module 105 determines a storage policy according to the storage resources applied for, that is, the number of data fragments and the correspondence between the data fragments and the storage nodes. The data routing module 106 routes requests from the client according to the storage policy determined by the data management module: it fragments the data and routes the fragments to the data nodes, or aggregates the data of each node and returns it to the client.

It should be understood that in a distributed storage system, the functions of the modules are implemented by servers. Usually, one functional module is implemented by a separate server, but in some cases a single server may implement a plurality of functional modules, or one functional module may be implemented by a cluster of multiple servers.

Referring to FIG. 2, it is a schematic flowchart of Embodiment 1 of the present invention. In the embodiment of the present invention, a method for determining a data fragmentation storage policy in a distributed storage system is provided. In conjunction with the foregoing description, the embodiment of the present invention mainly determines the storage strategy of the data fragment by improving the data management module 105.

In the embodiment of the present invention, the client initiates a data storage request to the distributed storage system, and stores the data to be stored into the distributed storage system. It should be understood that, in this embodiment, the order of execution between method steps is not limited. Referring to FIG. 2, those skilled in the art can understand that S101 and S102 are both pre-steps of step S103, that is, S101 and S102 may be executed in any order, or may be performed in parallel in one or more steps.

As shown in Figure 2, the method includes:

S101. Determine M data nodes to which the data to be stored is to be stored.

Specifically, the M data nodes are generally determined according to the size of the data to be stored and the amount of storage that each data node can provide. In some cases, the number of data nodes used to store the data may also be a preset fixed value or the total number of storage nodes. In addition, the number of data nodes M can also be determined from a value set by the user through the API.

In a design, when determining the data node to be stored, the M data nodes with lower load levels can be selected as the data nodes to be stored according to the load condition of each node, thereby improving the entire distributed storage system. The degree of balanced load.
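Selecting the M least-loaded nodes can be sketched as follows. This is an illustration only, not part of the original disclosure: the source does not say how load is measured, so the sketch assumes each node reports a single load figure between 0 and 1, and the function name `pick_least_loaded` and the node names are hypothetical.

```python
import heapq


def pick_least_loaded(node_loads, m):
    """Return the m node names with the lowest load, lowest first.

    node_loads maps a node name to an assumed load figure in [0, 1].
    """
    if m > len(node_loads):
        raise ValueError("cannot select more nodes than the system contains")
    return heapq.nsmallest(m, node_loads, key=node_loads.get)


loads = {"node1": 0.72, "node2": 0.15, "node3": 0.40, "node4": 0.91, "node5": 0.33}
print(pick_least_loaded(loads, 3))  # ['node2', 'node5', 'node3']
```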

S102. Obtain N copies of data to be stored.

For convenience of description, in the present invention the copy number N refers to the sum of the original data of the data to be stored and the number of backup copies, that is, the N copies include the original data of the data to be stored and N-1 backup copies of the original data. To ensure the availability of data, redundant backups of the stored data are required. The larger the number N of copies of the data to be stored, the higher the degree of redundancy and the better the reliability of the data to be stored, but the more storage space is occupied. In general, the number of copies of the data is a preset value, which can be set by the user in advance, or set for each storage request according to the data to be stored.

It can be understood that in order to ensure the isolation of the redundant backup of the copy, the copy of the same data should be stored on different storage nodes, so as to ensure that when a storage node fails, other copies are not lost. Therefore, for the same distributed system, the value of the copy number N should be less than or equal to the value of the number M of data nodes.

Optionally, the number N of copies of the data to be stored may be determined according to the security requirements of the data to be stored. The higher the security requirement of the data to be stored, the larger the determined number N of copies. The security requirement can be obtained directly from the user's storage request, that is, the user specifies different security requirements in different storage requests; it can also be determined by preset judgment logic, for example a correspondence between data types and security requirements, or a correspondence between user types and security requirements. On some server platforms, for example a platform-as-a-service platform, different applications of the users on the platform have different security requirements, and the security requirement of the data to be stored can also be determined according to the application or the application type.

Optionally, the number N of copies of the data to be stored may be determined according to the data type of the data to be stored and the correspondence between the data type and the number of copies, so as to protect different types of data with different degrees of availability.

Each of the N copies is fragmented into X data fragments according to the same fragmentation mode, so that each data fragment has N data fragment copies. The number of fragments X is the number of fragments into which one copy of the data to be stored is divided. For fragment storage, both the primary data and the backup data of the data to be stored are fragmented in the same manner, so that the fragments of the primary data correspond to the fragments of the backup data. After fragmentation, each copy of the data is divided into the same X data fragments, so each data fragment exists as N identical data fragment copies. It can be understood that, once the number of fragments X of the data to be stored is determined, a total of N×X data fragment copies need to be stored in the M nodes.
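The N×X layout can be sketched in a few lines. This is an illustration only, not part of the original disclosure: the source does not specify the fragmentation mode, so the sketch assumes the data is a byte string cut into contiguous ranges, and the function names `split_into_fragments` and `fragment_copies` are hypothetical.

```python
def split_into_fragments(data, x):
    """Cut data into x contiguous fragments whose sizes differ by at most one byte."""
    base, extra = divmod(len(data), x)
    fragments, start = [], 0
    for i in range(x):
        size = base + (1 if i < extra else 0)
        fragments.append(data[start:start + size])
        start += size
    return fragments


def fragment_copies(data, n, x):
    """Fragment each of the n copies the same way; result[c][f] is copy c of fragment f."""
    copies = [data] * n  # the original data plus n - 1 backup copies
    return [split_into_fragments(copy, x) for copy in copies]


grid = fragment_copies(b"abcdefghij", n=3, x=4)
print(len(grid) * len(grid[0]))  # 12 fragment copies in total, i.e. N x X
print(grid[0])                   # [b'abc', b'def', b'gh', b'ij']
```

Because every copy is cut at the same boundaries, fragment f of any copy can stand in for fragment f of any other copy during recovery.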

In some cases, the number of fragments X may be preset by the user, that is, the same number of fragments may be used for any data to be stored, or the data may be set by the user when performing a storage request according to different data to be stored. The number of shards.

Optionally, the higher the number of fragments, the more data fragments are stored on the storage nodes and the smaller the granularity of each data fragment, which makes it easier to store the data uniformly across the nodes and to balance the load as far as possible. Therefore, different numbers of fragments can be set for the stored data according to the current load balancing situation of the distributed storage system. For example, when the distributed system has a high demand for load balancing, the number of fragments is dynamically increased to improve the load balance of the distributed system.

S103. Store N copies of the to-be-stored data to the M storage nodes.

Specifically, the data is stored as follows: the N data fragment copies of each of the X data fragments are stored on N of the M storage nodes, and the number of data fragments whose copies are all stored on the same N storage nodes is P or P+1, where P is the integer quotient of X divided by $C_M^N$.

The number of data fragments on the same N storage nodes means, for any N data nodes in the distributed system, the number of data fragments all N copies of which are stored on those N data nodes. Making the number of data fragments whose copies are stored on the same N storage nodes equal to P or P+1, where P is the integer quotient of X divided by $C_M^N$, essentially means keeping the number of data fragments stored on the same N nodes as small as possible. That is, the data fragments need to be distributed evenly over the possible combinations of N data nodes, so that the number of data fragments stored on each combination of N data nodes is relatively uniform; then, over all possible selections of N data nodes, the maximum number of data fragments whose N copies are all stored on the selected nodes is the smallest. It can be understood that each data fragment is stored on N nodes, that is, on one combination of N nodes. For a distributed system that includes M storage nodes, there are $C_M^N$ different combinations of N nodes. Therefore, when the number of fragments X is smaller than $C_M^N$, each data fragment can be assigned a different combination of N data nodes, that is, the number of data fragments stored on the same N nodes is at most 1; when the number of fragments X is greater than $C_M^N$, multiple data fragments are stored on the same N nodes. Specifically, let the integer quotient of the number of fragments X divided by $C_M^N$ be P and the remainder be Q; then Q of the $C_M^N$ combinations of data nodes each store P+1 data fragments, and the remaining $C_M^N - Q$ combinations of data nodes each store P data fragments.

For example, when 40 data fragments need to be stored, each with 3 copies, and the distributed system has 20 different combinations of 3 data nodes, the number of data fragments stored on the same 3 storage nodes is minimized when each combination stores all the copies of 2 different data fragments. When 50 data fragments need to be stored, 10 combinations of data nodes each store all the copies of 3 different data fragments, and the other 10 combinations each store all the copies of 2 different data fragments.
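The arithmetic in this example can be reproduced with a short sketch, which is not part of the original disclosure. It assumes M = 6, since six nodes give exactly the 20 combinations of 3 nodes mentioned above; the function name `fragments_per_combination` is hypothetical.

```python
from math import comb


def fragments_per_combination(m, n, x):
    """Split x fragments over the n-node combinations of m nodes as evenly as possible."""
    total = comb(m, n)       # number of distinct n-node combinations
    p, q = divmod(x, total)  # integer quotient P and remainder Q
    return {
        "combinations": total,
        "P": p,
        "Q": q,
        "combinations_storing_P_plus_1": q,
        "combinations_storing_P": total - q,
    }


print(fragments_per_combination(6, 3, 40))
print(fragments_per_combination(6, 3, 50))
```

For X = 40 the result is P = 2 and Q = 0, so all 20 combinations store 2 fragments; for X = 50 it is P = 2 and Q = 10, so 10 combinations store 3 fragments and 10 store 2.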

The following example shows a specific storage algorithm to obtain a storage strategy that satisfies the foregoing storage mode. It should be understood that the algorithm is merely one design for a storage strategy for storing data to be stored in a storage node in accordance with the foregoing principles. For those skilled in the art, on the basis of understanding the foregoing allocation principles, the foregoing storage strategies can be implemented by a plurality of different specific algorithms, which are not enumerated here.

The algorithm includes the following steps:

1. Number the X data fragments to be stored 1, 2, 3, ... X;

2. Number the data nodes 1, 2, 3, ... M;

3. Establish a storage allocation table with $C_M^N$ rows and N columns. Each row contains N data nodes, and each row contains a different combination of data nodes, that is, the data nodes in each row are one of the $C_M^N$ combinations of data nodes;

4. Establish a correspondence between the data fragments and the data node combination of each row in the storage allocation table. Let P be the quotient of the number of fragments X divided by $C_M^N$; the n-th row in the storage allocation table corresponds to the K-th data fragments, where K is determined by the row number n and an index i (1 ≤ i ≤ P), and K ≤ X;

5. According to the established correspondence, store the N copies of each data fragment on the N data nodes of the corresponding row in the storage allocation table. The resulting storage policy conforms to the foregoing principles.
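The steps above can be sketched as follows. This is one possible reading rather than the algorithm of the original disclosure: the exact mapping between row numbers and fragment numbers is not legible in this text, so the sketch assumes a simple round-robin assignment (fragment K goes to row (K - 1) mod $C_M^N$), which satisfies the P or P+1 rule. The function name `build_storage_policy` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_storage_policy(m, n, x):
    """Return {fragment number: tuple of n node numbers holding its n copies}."""
    if not 1 <= n <= m:
        raise ValueError("the copy number n must satisfy 1 <= n <= m")
    # Storage allocation table: one row per distinct n-node combination.
    table = list(combinations(range(1, m + 1), n))
    # Assumed mapping: deal the fragments out to the rows in round-robin order.
    return {k: table[(k - 1) % len(table)] for k in range(1, x + 1)}


policy = build_storage_policy(m=6, n=3, x=50)
per_row = Counter(policy.values())
print(policy[1], policy[2], policy[21])  # (1, 2, 3) (1, 2, 4) (1, 2, 3)
print(sorted(set(per_row.values())))     # [2, 3]
```

With 50 fragments over 20 rows, every row ends up with either 2 or 3 fragments, so no combination of 3 nodes holds more than P+1 complete fragments.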

To facilitate understanding of this embodiment, a specific example of a storage strategy obtained according to this method embodiment is given below. As shown in FIG. 3, data to be stored with a fragment number X of 20 is stored, with a copy number N of 3, in a distributed system with a data node number M of 6; the figure shows one of the storage strategies obtained according to the embodiment of the present invention. In this strategy, because $C_6^3 = 20$ is exactly equal to the number of fragments, the number of data fragments stored in the same three storage nodes is the smallest, that is, each combination of three storage nodes stores the three copies of exactly one data fragment. Therefore, in this example, any three arbitrarily selected storage nodes together hold the complete set of three copies of only one data fragment. For example, nodes 1, 2, and 3 completely store all three copies of data fragment A only, while nodes 1, 2, and 4 completely store all three copies of data fragment B only. Since each data fragment is stored in three different data nodes, at least three data nodes must fail simultaneously for a data fragment to be completely lost. In the storage strategy of this example, when any three data nodes fail at the same time, only one data fragment is lost. For example, when nodes 1, 2, and 3 fail, only data fragment A is lost, while every other data fragment retains at least one copy in the remaining nodes.
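The loss behaviour described for this example can be checked by brute force. The sketch below assumes one fragment per three-node combination (M = 6, N = 3, X = 20); the numeric fragment labels and the helper name `lost_fragments` are hypothetical and do not correspond to the letters used in FIG. 3.

```python
from itertools import combinations

NODES = range(1, 7)  # M = 6
# One fragment per 3-node combination, so X = 20 (labels 0..19 are hypothetical).
placement = dict(enumerate(combinations(NODES, 3)))


def lost_fragments(failed):
    """Fragments whose every copy sits on a failed node."""
    failed = set(failed)
    return [frag for frag, nodes in placement.items() if set(nodes) <= failed]


# Worst case over every possible set of simultaneously failing nodes.
worst_three = max(len(lost_fragments(f)) for f in combinations(NODES, 3))
worst_two = max(len(lost_fragments(f)) for f in combinations(NODES, 2))
print(len(placement), worst_three, worst_two)  # 20 1 0
```

Any three failing nodes lose exactly one fragment, and no pair of failing nodes loses any.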

Meanwhile, in this example, when any one data node fails, because the copies of its data fragments are evenly dispersed over the remaining data nodes, the remaining nodes can perform data recovery for that node concurrently. For example, FIG. 4 shows a possible data recovery mode when node 5 fails: each data fragment copy stored on that node can be restored from one of the copies in the gray portion of the other 5 data nodes.
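A possible way to pick recovery sources for a failed node is sketched below. It assumes the same placement as the example (one fragment per three-node combination, M = 6, N = 3); the greedy least-loaded choice and the function names are assumptions made for illustration and are not necessarily the recovery mode drawn in FIG. 4.

```python
from itertools import combinations

placement = dict(enumerate(combinations(range(1, 7), 3)))  # M = 6, N = 3, X = 20


def recovery_sources(failed_node):
    """For each fragment copy lost with failed_node, list the surviving nodes holding a copy."""
    return {
        frag: [n for n in nodes if n != failed_node]
        for frag, nodes in placement.items()
        if failed_node in nodes
    }


def plan_recovery(failed_node):
    """Greedy plan: read each fragment from the currently least-loaded surviving holder."""
    load = {n: 0 for n in range(1, 7) if n != failed_node}
    plan = {}
    for frag, holders in recovery_sources(failed_node).items():
        source = min(holders, key=lambda n: load[n])
        load[source] += 1
        plan[frag] = source
    return plan, load


plan, load = plan_recovery(5)
print(len(plan), load)
```

When node 5 fails, ten fragment copies must be rebuilt, and in this sketch all five surviving nodes serve as sources, which is the concurrency benefit the paragraph describes.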

According to the embodiment of the present invention, data with different numbers of fragments and copies can be stored in a distributed system, and the number of fragments and the number of copies can be flexibly adjusted according to the data to be stored. Since the number of data fragments stored in the same N nodes is guaranteed to be the smallest, the number of data fragments that may be lost when N nodes fail at the same time is minimized, thereby improving the availability of data backup. At the same time, because this scheme supports storage strategies in which the number of fragments is higher than the number of nodes, the number of data fragments stored by a single node is increased. When a single node fails, because the copies of its stored data fragments are evenly distributed among the other data nodes, the number of data nodes that can participate in data recovery increases; that is, the recovery concurrency at the time of node failure is improved, and so is the data recovery efficiency.

Next, a second method embodiment of the present invention is described. This embodiment provides a method for determining the number of fragments of the data to be stored; using the number of fragments determined in this way achieves better data availability. In this embodiment, how to determine the data nodes in which the data is to be stored and the number of copies of the data to be stored is similar to the method described in S101 and S102 of the foregoing embodiment, and details are not described herein again. In addition, the number of fragments determined according to this method embodiment may be used to fragment the data copies in step S102 of the foregoing embodiment; this is likewise not described again in this embodiment.

In this embodiment, determining the number of fragments X of the data to be stored includes:

S201. Determine, according to the number of copies N and the number of storage nodes M, an optimal fragment base Y of the data to be stored, where $Y = C_M^N$.

S202. Obtain the number of fragments X of the data to be stored according to the optimal fragment base Y. The number of fragments X of the data to be stored is equal to or smaller than the product of the optimal fragment base Y and a coefficient K, where K is an integer greater than or equal to 1.
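Steps S201 and S202 can be sketched as below, assuming that the optimal fragment base Y is the number of N-node combinations $C_M^N$ (which gives 20 for M = 6 and N = 3, matching the earlier example). The function names and the `shortfall` parameter, which models choosing an X somewhat below Y·K, are hypothetical.

```python
from math import comb


def optimal_fragment_base(num_nodes, num_copies):
    """S201: Y is the number of ways to choose N of the M storage nodes."""
    return comb(num_nodes, num_copies)


def fragment_count(num_nodes, num_copies, k=1, shortfall=0):
    """S202: X = Y*K, optionally reduced by a hypothetical shortfall so that X <= Y*K."""
    if k < 1 or shortfall < 0:
        raise ValueError("k must be >= 1 and shortfall >= 0")
    x = optimal_fragment_base(num_nodes, num_copies) * k - shortfall
    if x < 1:
        raise ValueError("shortfall leaves no fragments")
    return x


print(optimal_fragment_base(6, 3))  # 20
print(fragment_count(6, 3, k=2))    # 40
```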

As can be seen from the foregoing, when the number of copies is N and the number of storage nodes is M, storing the data to be stored in the M nodes requires each data fragment to be stored in N nodes. There are $C_M^N$ ways to select N data nodes from the M data nodes. To improve data availability, that is, to minimize the number of data fragments that may be lost when any N nodes fail, each data fragment should be stored, as far as possible, in a different combination of data nodes. It follows that when the number of fragments does not exceed $C_M^N$, the larger the number of fragments, the smaller the maximum amount of data that can be lost when N nodes fail. Specifically, if the number of fragments is X, with $X \le C_M^N$, and all data fragments are equal in size, then the maximum amount of data that can be lost when N nodes fail is 1/X of the total data amount. Therefore, when $X = C_M^N$, the maximum amount of data that may be lost is the smallest, namely $1/C_M^N$ of the total data amount.

When the number of fragments X is greater than $C_M^N$, two or more data fragments are stored on the same N nodes, so two or more data fragments may be lost when those N nodes fail. Let P be the integer quotient of X divided by $C_M^N$. If all data fragments are equal in size, the maximum amount of data that may be lost when N nodes fail is P/X of the total data amount when X is an integer multiple of $C_M^N$, and (P+1)/X otherwise. It can be seen that when X is an integer multiple of $C_M^N$, the value of P/X is equal to $1/C_M^N$, and the maximum amount of data that can be lost is again minimal.
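The two cases above can be combined into a single expression. The following is a short derivation sketch under the same assumptions as the text (equal-sized fragments, spread as evenly as possible over the $C_M^N$ node combinations); the name $L(X)$ for the maximum loss ratio is introduced here only for illustration.

```latex
\[
  L(X) \;=\; \frac{\lceil X / C_M^N \rceil}{X}
  \;\ge\; \frac{X / C_M^N}{X}
  \;=\; \frac{1}{C_M^N},
\]
\[
  \text{with equality if and only if } X \text{ is an integer multiple of } C_M^N .
\]
```

The ceiling counts the fragments on the most heavily used node combination, which is why integer multiples of $C_M^N$ reach the lower bound.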

For ease of understanding, take the case where the number of data nodes is 6, the number of copies is 3, and all data fragments are equal in size. FIG. 5 shows how the maximum possible data loss, as a proportion of the total data amount, changes with the number of fragments when data nodes fail. In FIG. 5, the abscissa is the number of data fragments X, and the ordinate is the maximum proportion of data that may be lost when any three data nodes fail. Specifically:

If the number of data fragments is 6, the maximum data lost in a three-node failure is 1/6 of the total data;

If the number of data fragments is 7, the maximum data lost in a three-node failure is 1/7 of the total data;

If the number of data fragments is 20, the maximum data lost in a three-node failure is 1/20 of the total data;

If the number of data fragments is 21, the maximum data lost in a three-node failure is 2/21 of the total data;

If the number of data fragments is 40, the maximum data lost in a three-node failure is 2/40 of the total data;

If the number of data fragments is 41, the maximum data lost in a three-node failure is 3/41 of the total data.
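The values listed above can be reproduced with a few lines of Python. This sketch assumes equal-sized fragments distributed as evenly as possible over the node combinations, so that the most heavily used combination holds the ceiling of X divided by the number of combinations; the function name `max_loss_ratio` is hypothetical.

```python
from fractions import Fraction
from math import comb


def max_loss_ratio(num_fragments, num_nodes=6, num_copies=3):
    """Worst-case share of data lost when num_copies nodes fail together."""
    combos = comb(num_nodes, num_copies)
    # Ceiling division: the fullest combination holds ceil(X / combos) fragments.
    worst = -(-num_fragments // combos)
    return Fraction(worst, num_fragments)


for x in (6, 7, 20, 21, 40, 41):
    print(x, max_loss_ratio(x))
```

Note that `Fraction` reduces 2/40 to 1/20, the same minimum reached at X = 20.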

It can be seen that, taking $C_M^N$ as the base, when the number of fragments X of the data to be stored is equal to the product of the optimal fragment base Y and the coefficient K, the maximum amount of data that may be lost when N nodes fail accounts for the smallest proportion of the total data amount; that is, the availability of the data is the highest. At the same time, when the number of data fragments X is smaller than $C_M^N$ or an integer multiple of $C_M^N$, the closer X is to $C_M^N$ or to that integer multiple, the smaller the ratio of the maximum amount of data that may be lost to the total amount of data, and the higher the availability of the data. Therefore, when the number of fragments X of the data to be stored is smaller than and close to the product of the optimal fragment base Y and the coefficient K, relatively high data availability is also obtained.

It can be seen that when the number of data fragments X is $C_M^N$ or an integer multiple of $C_M^N$ (with a multiplier greater than 1), data availability reaches its optimal value. When the optimal value cannot be taken, the closer X is, from below, to $C_M^N$ or to an integer multiple of $C_M^N$ (with a multiplier greater than 1), the higher the availability of the data. Therefore, in determining the number of fragments X, to achieve the best data availability, $C_M^N$ or an integer multiple of $C_M^N$ (with a multiplier greater than 1) should be selected as X; and when, in view of other factors, neither $C_M^N$ nor such an integer multiple is used as X, the closer X is, from below, to $C_M^N$ or to such an integer multiple, the more the data availability can be improved.

Specifically, when the number of fragments X is taken to be smaller than $C_M^N$ or an integer multiple of $C_M^N$ (with a multiplier greater than 1), the final value of X can be determined according to the effect to be achieved in the specific scenario in which the present invention is applied. Suppose the desired technical effect is that, when N nodes fail, the maximum possible data loss ratio is less than Q. It can be understood from the foregoing that, for an integer K greater than or equal to 1, within the interval $((K-1) \cdot C_M^N,\ K \cdot C_M^N]$ the maximum possible data loss ratio when N nodes fail decreases monotonically, and the maximum data loss ratio for a value of X in the interval is K/X. Therefore, to make K/X smaller than Q, the value of X should be greater than K/Q. Correspondingly, when the value of K/Q is less than or equal to $(K-1) \cdot C_M^N$, any value of X in the interval $((K-1) \cdot C_M^N,\ K \cdot C_M^N]$ satisfies the requirement that the maximum possible data loss ratio be less than Q; when the value of K/Q is greater than $(K-1) \cdot C_M^N$, any value of X in the interval $(K/Q,\ K \cdot C_M^N]$ satisfies it.
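The selection rule above can be illustrated with a small helper that lists the admissible values of X for a given K and target ratio Q. This is a sketch under stated assumptions: the base is taken to be the number of N-node combinations $C_M^N$, the interval for a given K is taken to be the integers X with (K-1)·base < X ≤ K·base (where the maximum loss ratio is K/X), and the function name is hypothetical.

```python
from fractions import Fraction
from math import comb, floor


def fragment_range(num_nodes, num_copies, k, q):
    """Integers X with (K-1)*base < X <= K*base whose worst-case loss ratio K/X is below q."""
    base = comb(num_nodes, num_copies)
    q = Fraction(q)
    lower = (k - 1) * base + 1              # smallest X in the interval for this K
    threshold = floor(Fraction(k) / q) + 1  # smallest integer X with K/X < q
    start = max(lower, threshold)
    stop = k * base
    return range(start, stop + 1)           # empty when no X qualifies


# M = 6, N = 3, K = 2: K/Q = 30 exceeds (K-1)*20, so X must lie in (30, 40].
print(list(fragment_range(6, 3, 2, Fraction(1, 15))))
# K/Q = 20 does not exceed (K-1)*20, so the whole interval (20, 40] qualifies.
print(list(fragment_range(6, 3, 2, Fraction(1, 10))))
```

Passing Q as a `Fraction` avoids floating-point rounding at the boundary X = K/Q, where the ratio equals Q exactly and must be excluded.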

Optionally, the larger the number of fragments, the better the load balance achieved when the data to be stored is stored in the data nodes. Therefore, in an implementation, the coefficient K can be determined according to the load balancing requirement of the distributed storage system. The coefficient K is an integer greater than or equal to 1 and determines the multiple of the optimal fragment base. The higher the load balancing requirement of the data to be stored, the larger the value of the coefficient K. The number of fragments X is equal to or smaller than the product of the optimal fragment base Y and the coefficient K. That is, when X is equal to the product of Y and K, the optimal data availability is obtained and the load balancing requirement of the distributed storage system is satisfied; when, with other factors taken into account, X is not the product of Y and K, the closer X is to that product, the higher the data availability and the better the load balancing requirement of the distributed storage system is met.

In this embodiment, by determining the optimal fragment base Y and determining the number of fragments according to it, data availability can be further improved on the basis of the advantages of the foregoing embodiments: for the same number of data nodes and number of copies, the number of fragments determined from the optimal fragment base achieves optimal or relatively optimal data availability. At the same time, since the determined number of fragments is often larger than the number of nodes, both the load balancing of the distributed system and the concurrent recovery efficiency when a node fails are improved.

Referring to FIG. 6, FIG. 6 shows a distributed storage device 600 according to an embodiment of the present application. The device 600 may be a node deployed in a distributed storage system, or may be an independent data management device in a distributed storage system. The device 600 includes, but is not limited to, a computer, a server, and the like. As shown in FIG. 6, the device 600 includes a processor 601, a memory 602, a transceiver 603, and a bus 604. The transceiver 603 is configured to send data to and receive data from external devices, such as other nodes in the distributed system or network devices outside the distributed system. The number of processors 601 in the device 600 can be one or more. In some embodiments of the present application, the processor 601, the memory 602, and the transceiver 603 may be connected by a bus system or by other means. For the meanings and examples of the terms involved in this embodiment, reference may be made to the foregoing embodiments, and details are not described herein again.

The program code can be stored in the memory 602. The processor 601 is configured to call the program code stored in the memory 602 to perform the operations of S101, S102, and S103 in the foregoing embodiment.

For the understanding of the above operations, reference may be made to the description in the foregoing first method embodiment, and details are not described herein again.

Optionally, the processor 601 is further configured to perform a refinement or an alternative of the foregoing steps in the first embodiment.

Optionally, in this embodiment, when determining the number of fragments X, the processor 601 may further perform operations S201 and S202: determining, according to the number of copies N and the number of storage nodes M, the optimal fragment base Y of the data to be stored, where $Y = C_M^N$; and obtaining the number of fragments X of the data to be stored according to the optimal fragment base Y, where X is equal to or smaller than the optimal fragment base Y, or equal to or smaller than a natural-number multiple of the optimal fragment base Y.

For the understanding of performing the above steps, reference may be made to the introduction in the foregoing second embodiment, and the above steps may also be extended or refined with reference to the foregoing second embodiment.

It should be noted that the processor 601 herein may be a single processing component or a collective name for multiple processing components. For example, the processing component may be a central processing unit (CPU), an application specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application, for example, one or more digital signal processors (DSPs) or one or more field programmable gate arrays (FPGAs).

The memory 602 may be a single storage device or a collective name for a plurality of storage elements, and is used to store executable program code or the parameters, data, and the like required for the operation of the device. The memory 602 may include random access memory (RAM), and may also include non-volatile memory such as a magnetic disk memory or a flash memory.

The bus 604 may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus. The bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is shown in FIG. 6, but this does not mean that there is only one bus or one type of bus.

The device may also include input and output devices coupled to the bus 604 so as to connect to other parts, such as the processor 601, via the bus. Through the input device, the user may carry out the steps in this embodiment that require manual configuration or preset parameters. The input and output device can provide an input interface for the operator, so that the operator can select control items through the input interface; it can also be another interface through which other devices can be externally connected.

Referring to FIG. 7, FIG. 7 shows a distributed storage device 700 according to an embodiment of the present disclosure. The device 700 may be a node deployed in a distributed storage system, or may be an independent data management device in a distributed storage system. The device 700 includes, but is not limited to, a computer, a server, and the like. As shown in FIG. 7, the device 700 includes a processor 701, a memory 702, a transceiver 703, and a bus 704. The transceiver 703 is configured to send data to and receive data from external devices, such as other nodes in the distributed system or network devices outside the distributed system. The number of processors 701 in the device 700 can be one or more. In some embodiments of the present application, the processor 701, the memory 702, and the transceiver 703 may be connected by a bus system or by other means. For the meanings and examples of the terms involved in this embodiment, reference may be made to the foregoing embodiments, and details are not described herein again.

The program code can be stored in the memory 702. The processor 701 is configured to call the program code stored in the memory 702 to perform the operations S201 and S202, thereby determining the number of fragments used when fragment storage is performed in the distributed storage system.

For the understanding of the foregoing operations, refer to the introduction in the foregoing second method embodiment, and details are not described herein again.

Optionally, the processor 701 is further configured to perform the refinement or the optional solution of the foregoing steps in the second embodiment.

It should be noted that the processor 701 herein may be a single processing component or a collective name for multiple processing components. For example, the processing component may be a central processing unit (CPU), an application specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application, for example, one or more digital signal processors (DSPs) or one or more field programmable gate arrays (FPGAs).

The memory 702 may be a single storage device or a collective name for a plurality of storage elements, and is used to store executable program code or the parameters, data, and the like required for the operation of the device. The memory 702 may include random access memory (RAM), and may also include non-volatile memory such as a magnetic disk memory or a flash memory.

The bus 704 may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus. The bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is shown in FIG. 7, but this does not mean that there is only one bus or one type of bus.

The device may also include input and output devices coupled to the bus 704 so as to connect to other parts, such as the processor 701, via the bus. Through the input device, the user may carry out the steps in this embodiment that require manual configuration or preset parameters. The input/output device can provide an input interface for the operator, so that the operator can select control items through the input interface; it can also be another interface through which other devices can be externally connected.

FIG. 8 is a schematic block diagram of a distributed storage system 800 in accordance with an embodiment of the present invention. The distributed storage system 800 includes a client 810, a plurality of hard disks 820, and a distributed storage device 830. The distributed storage device 830 may be the distributed storage device 600 shown in FIG. 6 or the distributed storage device 700 shown in FIG. 7, and details are not described herein again.

The hardware entity of the distributed system provided in this embodiment can be understood by referring to the foregoing distributed system architecture in FIG. In the embodiment of the present invention, the distributed database engine 102 is not a storage device 830 as a hardware entity. Therefore, for the data management module 105 improved in the embodiment of the present invention, the hardware entity that carries it in this embodiment is the distributed storage device 830.

The distributed storage device 830 stores/reads the user's data file on the plurality of hard disks 820 according to the storage/read request transmitted by the user through the client 810.

FIG. 9 is a schematic block diagram of another distributed storage system 900 according to an embodiment of the present invention. Distributed storage system 900 includes a client 910 and a distributed storage server system 920.

Client 910 can connect to storage server system 920 via the Internet.

The client 910 can run a client agent of the distributed storage system to support various types of distributed storage applications in accessing the distributed storage system. For example, the client agent can implement personal online storage and backup, enterprise online storage and backup, application online storage, or other emerging storage and backup services.

The distributed storage server system 920 can include a control server 930, an operation and maintenance management (OAM) server 940, a service server 950, a storage resource pool 970, and a storage engine 980. Here, storage engine 980 may be an example of the distributed storage device of FIG. 6 or 7.

The hardware device in this embodiment can be understood in accordance with the distributed architecture in FIG. 1 described above. The storage engine 980 in this embodiment implements the functions of the distributed database engine 102, and the distributed storage server system 920 also includes other functional servers related to the distributed system, such as the control server 930, the operation and maintenance management server 940, and the service server 950.

The control server 930 is mainly used to control the distributed storage system to perform various types of storage services, such as organizing the relocation, migration, and backup of data, and eliminating storage hotspots.

The operation and maintenance management server 940 can provide a configuration interface and an operation and maintenance interface of the storage system, and provides functions such as logs and alarms.

The service server 950 can provide functions such as service identification and authentication to complete the service delivery function.

The storage resource pool 970 may be a storage resource pool composed of physical storage nodes. For example, the storage resource pool may be composed of storage servers/storage boards 960. The virtual nodes in each physical storage node form a storage logical ring, and the user's data files may be stored on the virtual nodes in the storage resource pool.

The storage engine 980 can provide the logic for the main functions of the distributed storage system. The logic can be deployed on one of the control server 930, the service server 950, and the operation and maintenance management server 940, or can be deployed in a distributed manner across the control server 930, the service server 950, the operation and maintenance management server 940, and the storage resource pool 970. Therefore, the corresponding improvements of the present invention can also be implemented in the above hardware.

FIG. 10 is a schematic structural diagram of a distributed storage device 1000 according to an embodiment of the present invention. The distributed storage device 1000 includes an obtaining unit 1001 and a storage unit 1002.

The obtaining unit 1001 is configured to determine the M storage nodes in which the data to be stored is to be stored, and to obtain N copies of the data to be stored, where the N copies include the original data of the data to be stored and N-1 backup data of the original data. Each of the N copies is fragmented into X data fragments according to the same fragmentation manner, so that each data fragment has N data fragment copies, where N is less than or equal to M. For the specific manner or optional implementations in which the obtaining unit 1001 determines the data nodes and obtains the copies of the data to be stored, refer to the method described in the foregoing first embodiment; details are not described again in this embodiment.

In conjunction with the foregoing apparatus embodiments, the obtaining unit 1001 may obtain the data from an external network or from another device inside the distributed storage system through the transceiver 603 of the distributed storage device of FIG. 6. Alternatively, the obtaining unit 1001 may further include an input and output device so that the data can be obtained through user settings. In addition, the obtaining unit 1001 can also read a preset value stored in the distributed storage device, thereby obtaining a preset value of the data.

Optionally, in this embodiment, when fragmenting the copies of the data to be stored, the obtaining unit 1001 may also determine the number of fragments X by having the processor 601 of the foregoing distributed storage device call the program code stored in the memory 602 to perform the following operations: determining the optimal fragment base Y of the data to be stored according to the number of copies N and the number M of storage nodes, where $Y = C_M^N$; and obtaining, according to the optimal fragment base Y, the number of fragments X of the data to be stored, where X is equal to or smaller than the product of the optimal fragment base Y and a coefficient K, and K is an integer greater than or equal to 1.

Optionally, the coefficient K is determined according to the load balancing requirement of the distributed storage system, where the coefficient K is a natural number, and the larger the value of K, the higher the load balancing degree of the data to be stored.

Optionally, the number of fragments X of the data to be stored is determined according to the current load balancing situation of the distributed storage system.

The storage unit 1002 is configured to store the data to be stored in the M storage nodes of the distributed system. Specifically, the storage strategy follows this principle: the N data fragment copies of each of the X data fragments are respectively stored in N storage nodes of the M storage nodes, and the number of data fragments whose data fragment copies are stored in the same N storage nodes is P or P+1, where P is the integer quotient of X divided by $C_M^N$.
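The storage principle can be expressed as a check on a candidate placement. The sketch below assumes the divisor is the number of N-node combinations $C_M^N$ and represents a placement as a mapping from a fragment label to the node numbers holding its copies; the function name and this representation are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb


def satisfies_principle(placement, num_nodes, num_copies):
    """Check that every N-node combination holds P or P+1 fragments, P = X // C(M, N)."""
    p = len(placement) // comb(num_nodes, num_copies)
    # Each fragment must sit on N distinct, valid node numbers.
    valid_nodes = all(
        len(set(nodes)) == num_copies and all(1 <= n <= num_nodes for n in nodes)
        for nodes in placement.values()
    )
    # Count fragments per node combination; unused combinations count as 0.
    counts = Counter(tuple(sorted(nodes)) for nodes in placement.values())
    all_combos = combinations(range(1, num_nodes + 1), num_copies)
    return valid_nodes and all(counts[c] in (p, p + 1) for c in all_combos)


even = dict(enumerate(combinations(range(1, 7), 3)))  # X = 20, one fragment per combination
skewed = {0: (1, 2, 3), 1: (1, 2, 3)}                 # X = 2, both on the same three nodes
print(satisfies_principle(even, 6, 3), satisfies_principle(skewed, 6, 3))  # True False
```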

In conjunction with the foregoing apparatus embodiments, the storage unit 1002 can be implemented by the processor 601 of the foregoing distributed storage device calling the program code stored in the memory 602.

For the understanding of the above steps, reference may be made to the introduction in the foregoing first or second embodiment, and the above steps may also be extended or refined with reference to the foregoing embodiments.

FIG. 11 is a schematic structural diagram of a distributed storage device 1100 according to an embodiment of the present invention. The distributed storage device 1100 includes an acquisition unit 1101 and a determination unit 1102.

The obtaining unit 1101 is configured to obtain the M data nodes in which the data is to be stored and the number of copies N of the data to be stored. For the specific manner or optional implementations in which the obtaining unit 1101 obtains these two items, refer to the method described in the foregoing second embodiment; details are not described again in this embodiment.

In conjunction with the foregoing apparatus embodiments, the obtaining unit 1101 can obtain the data from an external network or from another device inside the distributed storage system through the transceiver 703 of the foregoing distributed storage device. Alternatively, the obtaining unit 1101 may further include an input and output device so that the data can be obtained through user settings. In addition, the obtaining unit 1101 can also read a preset value stored in the distributed storage device, thereby obtaining a preset value of the data.

The determining unit 1102 is configured to determine the number of fragments X of the data to be stored, where the number of fragments is the number of fragments obtained after the data to be stored is fragmented, and X is equal to or smaller than $C_M^N$, or equal to or smaller than a positive integer multiple of $C_M^N$.

Optionally, the determining unit 1102 is further configured to determine a coefficient K according to the load balancing requirement of the distributed storage system, where the coefficient K is a positive integer, and the larger the value of K, the higher the load balancing degree of the data to be stored; the number of fragments X is equal to or smaller than $K \cdot C_M^N$.

In conjunction with the foregoing apparatus embodiments, the determining unit 1102 can be implemented by the processor 701 of the foregoing distributed storage device calling the program code stored in the memory 702 to perform the foregoing operations and thereby determine the number of fragments X.

Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the various examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the solution. A person skilled in the art can use different methods for implementing the described functions for each particular application, but such implementation should not be considered to be beyond the scope of the present invention.

A person skilled in the art can clearly understand that for the convenience and brevity of the description, the specific working process of the system, the device and the unit described above can refer to the corresponding process in the foregoing method embodiment, and details are not described herein again.

In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function; in actual implementation there may be other division manners, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.

If the functions are implemented in the form of a software functional unit and sold or used as a standalone product, they may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention essentially, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention. The foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc.

The above are only specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive of within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be determined by the scope of the claims.

Claims

1. A fragment storage method for data, characterized in that the method comprises:

determining M storage nodes in which the data to be stored is to be stored;

obtaining N copies of the data to be stored, where the N copies include the original data of the data to be stored and N-1 backup copies of the original data, and fragmenting each of the N copies into X data fragments according to the same fragmentation method, so that each data fragment has N data fragment copies, N being less than or equal to M;

and storing the N copies of the data to be stored in the M storage nodes, where the N data fragment copies of each of the X data fragments are respectively stored in N storage nodes among the M storage nodes, and the number of data fragments whose data fragment copies are stored in the same N storage nodes is P or P+1, where P is the integer quotient of X divided by $C_M^N$.
2. The method according to claim 1, wherein the fragmenting of each of the N copies into X data fragments according to the same fragmentation method comprises:

determining, according to the number of copies N and the number M of storage nodes, an optimal fragment base Y of the data to be stored, the optimal fragment base Y being equal to $C_M^N$;

obtaining, according to the optimal fragment base Y, the number of fragments X of the data to be stored, the number of fragments X being equal to the product of the optimal fragment base Y and a coefficient K, where K is an integer greater than or equal to 1;

and fragmenting each of the N copies into X data fragments according to the same fragmentation method.
3. The method according to claim 1, wherein the fragmenting of each of the N copies into X data fragments according to the same fragmentation method comprises:

determining, according to the number of copies N and the number M of storage nodes, an optimal fragment base Y of the data to be stored, the optimal fragment base Y being equal to $C_M^N$;

obtaining, according to the optimal fragment base Y, the number of fragments X of the data to be stored, the number of fragments X being smaller than the product of the optimal fragment base Y and a coefficient K, where K is an integer greater than or equal to 1;

and fragmenting each of the N copies into X data fragments according to the same fragmentation method.
4. The method according to claim 2 or 3, wherein the coefficient K is determined according to a load balancing requirement of the distributed storage system, and the larger the value of K, the higher the degree of load balancing of the data to be stored.
5. The method according to any one of claims 1-4, wherein the obtaining of the N copies of the data to be stored comprises:

determining the number N of copies of the data to be stored according to a security requirement of the data to be stored, wherein the larger the value of the number of copies N, the higher the security requirement of the data to be stored that can be satisfied.
6. The method according to claim 1, wherein the number of data fragments X is determined according to a load balancing requirement of the distributed storage system, and the larger the value of X, the higher the degree of load balancing of the data to be stored.
7. The method according to any one of claims 1-6, wherein the storing of the N copies of the data to be stored in the M storage nodes specifically comprises:

determining the $C_M^N$ combinations of data nodes obtained by selecting N data nodes from the M data nodes when the N copies of the data to be stored are stored in the M data nodes;

determining the quotient P and the remainder Q obtained by dividing the number of fragments X by $C_M^N$;

and selecting Q combinations of data nodes from the $C_M^N$ combinations of data nodes to store P+1 data fragments each, the remaining $C_M^N - Q$ combinations of data nodes storing P data fragments each, wherein the N copies of each data fragment are respectively stored on the N different data nodes in the combination of data nodes in which the data fragment is stored.
8. A distributed storage device, configured to determine a fragment storage policy for data to be stored in a distributed storage system that includes at least two storage nodes, the device comprising:

a processor, and a memory coupled to the processor;

wherein the processor invokes instructions stored in the memory to perform the method of any one of claims 1-7.
9. A distributed storage system, the system comprising at least two storage nodes and at least one management device, the management device being configured to determine a fragment storage policy for data to be stored, and the management device comprising:

a processor, and a memory coupled to the processor;

wherein the processor invokes instructions stored in the memory to perform the method of any one of claims 1-7.
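
For illustration only, and not as part of the claims, the placement recited in claim 7 can be sketched in Python. The function names `optimal_fragment_base` and `plan_fragment_placement`, the use of integer node indices 0..M-1, and the choice of the first Q combinations in enumeration order to receive the extra fragment are assumptions made for this sketch; the text only requires that some Q combinations store P+1 fragments and the rest store P.

```python
from itertools import combinations
from math import comb


def optimal_fragment_base(m: int, n: int) -> int:
    """Number of ways to choose n storage nodes out of m (the base Y)."""
    return comb(m, n)


def plan_fragment_placement(m: int, n: int, x: int) -> dict:
    """Map each fragment index to the n-node combination holding its n copies.

    Hypothetical helper: nodes are identified by the integers 0..m-1.
    """
    if not 1 <= n <= m:
        raise ValueError("n must satisfy 1 <= n <= m")
    if x < 1:
        raise ValueError("x must be at least 1")

    # Enumerate every combination of n distinct nodes chosen from m nodes.
    node_combos = list(combinations(range(m), n))

    # Quotient P and remainder Q of X divided by the number of combinations.
    p, q = divmod(x, len(node_combos))

    placement = {}
    fragment = 0
    for index, combo in enumerate(node_combos):
        # Assumption: the first q combinations take p + 1 fragments each,
        # the remaining combinations take p fragments each.
        count = p + 1 if index < q else p
        for _ in range(count):
            placement[fragment] = combo
            fragment += 1
    return placement
```

With M = 4, N = 2 and X = 8, there are 6 node combinations, so P = 1 and Q = 2: two combinations hold two fragments each and the other four hold one fragment each. When X is a multiple of the number of combinations (for example X = 12), Q is 0 and every combination holds the same number of fragments.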