WO2018028229A1 - Data shard storage method, device and system - Google Patents

Data shard storage method, device and system

Info

Publication number
WO2018028229A1
WO2018028229A1 PCT/CN2017/079971 CN2017079971W WO2018028229A1 WO 2018028229 A1 WO2018028229 A1 WO 2018028229A1 CN 2017079971 W CN2017079971 W CN 2017079971W WO 2018028229 A1 WO2018028229 A1 WO 2018028229A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
stored
nodes
copies
storage
Prior art date
Application number
PCT/CN2017/079971
Other languages
French (fr)
Chinese (zh)
Inventor
王华琼
高超
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to EP17838361.8A (patent EP3487149B1)
Publication of WO2018028229A1
Priority to US16/270,048 (patent US10942828B2)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2094 Redundant storage or storage space
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2056 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring
    • G06F11/2082 Data synchronisation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/40 Support for services or applications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Definitions

  • Embodiments of the present invention relate to the field of distributed storage, and in particular, to a method, an apparatus, and a system for storing fragments of data in a distributed storage system.
  • In a distributed storage system, data is usually backed up by means of cross backup.
  • The number of data fragments is generally the same as the number of storage nodes, so that the primary fragment of each data fragment is stored on one storage node, and its backup fragments are stored on other storage nodes, different from the node holding the primary fragment.
  • Table 1 lists a common storage strategy for storing primary data and two backup copies in six data nodes. The data is divided into six data fragments, A to F, and each data fragment has one primary copy and two backup copies.
  • the embodiments of the present invention provide a method, a device, and a system for storing data in a distributed storage system, which improve the availability of data backup and the efficiency of data recovery.
  • an embodiment of the present invention provides a fragment storage method for data.
  • the method includes determining M data nodes in which the data to be stored is to be stored, and acquiring N copies of the data to be stored, where the copy number N is the sum of the number of copies of the original data and of the backup data of the data to be stored. Each of the N copies is fragmented into X data fragments according to the same fragmentation mode, so that each data fragment has N data slice copies.
  • the N copies of the data to be stored are stored in the M storage nodes, that is, the N data slice copies of each of the X data fragments are respectively stored in N storage nodes of the M storage nodes.
  • the number of data fragments whose data slice copies are all stored in the same N storage nodes is kept as small as possible; specifically, that number is P or P+1, where P is the integer quotient of X divided by $C_M^N$, the number of ways of choosing N storage nodes from the M storage nodes (the integer quotient is the incomplete, or partial, quotient; for example, when X is 10 and $C_M^N$ is 3, the integer quotient P is 3).
  • In this way, when any N nodes fail at the same time, the maximum possible data loss is minimized, which, compared with the prior art, reduces the data loss ratio and improves the availability of data backup.
  • Because the copies of each data slice are evenly distributed on different nodes, when a node fails, the data fragments stored on that node can be recovered from the corresponding copies stored on a plurality of different nodes, thereby improving the efficiency of concurrent recovery.
  • when a copy is fragmented, the number of slices X is determined based on the optimal slice base Y, where Y is $C_M^N$, the number of combinations of N nodes chosen from the M nodes. The number of slices X may be equal to or smaller than the product of the optimal slice base Y and the coefficient K, where K is an integer greater than or equal to one.
  • the value of X is less than the product of Y and K. In this case, the closer the value of X is to the product of Y and K, the smaller the proportion of the total stored data that may be lost when N storage nodes fail at the same time.
  • the value of X is the product of Y and K.
  • In this case, when N storage nodes fail at the same time, the maximum amount of data loss that may be caused accounts for the smallest proportion of the total stored data.
  • the coefficient K when determining the number of fragments X of the data to be stored, the coefficient K may be determined according to the load balancing requirement of the distributed storage system. The larger the value of K, the higher the degree of balanced load of the data to be stored.
  • the number of fragments X of the data to be stored is determined according to the load-balancing situation of the current distributed storage system.
  • a larger number of slices X may be taken, thereby obtaining a smaller data granularity and increasing the degree of balanced load.
  • the number N of copies of the data to be stored is determined according to the security requirement of the data to be stored, wherein the larger the value of the number of copies N, the higher the security requirement of the data to be stored that can be satisfied. That is, more nodes need to fail at the same time for a data fragment to be lost.
  • the number N of copies of the data to be stored is determined according to the data type of the data to be stored and the correspondence between the data type and the number of copies. This provides more flexible data availability guarantees for different types of data.
  • storing the N copies in the M data nodes includes: determining the $C_M^N$ combinations of data nodes obtained by selecting N data nodes from the M data nodes; determining the quotient P and the remainder Q obtained by dividing the number of fragments X by $C_M^N$; and selecting Q of the $C_M^N$ combinations of data nodes to store P+1 data fragments each, with the remaining $C_M^N - Q$ combinations of data nodes each storing P data slices, wherein the N copies of each data slice are respectively stored on the N different data nodes of the combination of data nodes assigned to it.
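  • The division described above can be checked with a short calculation. The following is a minimal Python sketch, not the implementation claimed in the patent; the function name `fragments_per_combination` and the layout of its return value are illustrative assumptions. It computes the number of N-node combinations $C_M^N$ and then the quotient P and remainder Q of X divided by that number.

```python
from math import comb


def fragments_per_combination(m: int, n: int, x: int) -> tuple[int, int, int]:
    """Return (P, Q, C) for M nodes, N copies and X fragments.

    C is the number of ways to choose N nodes out of M.
    Q combinations hold P + 1 fragments each; the other C - Q hold P each.
    """
    if not 1 <= n <= m:
        raise ValueError("the copy number N must satisfy 1 <= N <= M")
    if x < 1:
        raise ValueError("the number of fragments X must be positive")
    c = comb(m, n)
    p, q = divmod(x, c)
    return p, q, c


if __name__ == "__main__":
    p, q, c = fragments_per_combination(m=6, n=3, x=50)
    # Sanity check: the per-combination counts add up to X.
    assert q * (p + 1) + (c - q) * p == 50
    print(p, q, c)
```

  • For M = 6, N = 3 and X = 50 there are 20 combinations, so P = 2 and Q = 10: ten combinations hold three fragments each and the other ten hold two each.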
  • an embodiment of the present invention provides a method for determining data fragmentation in a distributed storage system, the method comprising: determining the M data nodes in which the data to be stored is to be stored, and obtaining the copy number N of the data to be stored, thereby determining the number of fragments X of the data to be stored.
  • the number of fragments X is equal to or smaller than the product of the optimal fragment base Y and the coefficient K, wherein Y is $C_M^N$, the number of combinations of N nodes chosen from the M nodes, and K is an integer greater than or equal to 1.
  • the coefficient K is determined according to the load balancing requirement of the distributed storage system, wherein the coefficient K is an integer greater than or equal to 1; the larger the value of K, the higher the degree of load balancing of the data to be stored.
  • the number of fragments X is equal to or less than
  • a storage policy for storing the data to be stored in the M storage nodes is determined, wherein the N data slice copies of each of the X data fragments are respectively stored in N storage nodes of the M storage nodes, and the number of data fragments whose data slice copies are all stored in the same N storage nodes is P or P+1, where P is the integer quotient of X divided by $C_M^N$, the number of combinations of N nodes chosen from the M nodes.
  • the specific manner of determining the storage policy is: determining the $C_M^N$ combinations of data nodes obtained by selecting N data nodes from the M data nodes in which the N copies of the data to be stored are stored; determining the quotient P and the remainder Q obtained by dividing the number of fragments X by $C_M^N$; and selecting Q of the combinations of data nodes to store P+1 data fragments each, with the remaining $C_M^N - Q$ combinations of data nodes each storing P data slices, wherein the N copies of each data slice are respectively stored on the N different data nodes of the combination of data nodes assigned to it.
  • the number N of copies of the data to be stored may be determined according to the security requirement of the data to be stored. The larger the value of the number of copies N, the higher the security requirement of the data to be stored that can be satisfied.
  • the number N of copies of the data to be stored is determined according to the data type of the data to be stored and the correspondence between the data type and the number of copies. This allows different degrees of availability protection to be provided for different data types.
  • an embodiment of the present invention provides a distributed storage device, which can implement the functions in the foregoing method design of the first or second aspect.
  • the functions may be implemented by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the functions described above.
  • the modules can be software and/or hardware.
  • the structure of the device includes a processor and a memory coupled to the processor.
  • the processor invokes instructions stored in the memory for performing the method of the first or second aspect described above.
  • the device includes an obtaining unit and a storage unit, wherein the obtaining unit is configured to determine the M storage nodes in which the data to be stored is to be stored and to obtain N copies of the data to be stored, wherein the N copies include the original data of the data to be stored and N-1 backup data of the original data, and each of the N copies is sliced into X data fragments according to the same fragmentation manner, so that each data slice has N data slice copies; N is less than or equal to M.
  • the storage unit is configured to store the N copies of the data to be stored in the M storage nodes, wherein the N data slice copies of each of the X data fragments are respectively stored in N storage nodes of the M storage nodes, and the number of data fragments whose data slice copies are all stored in the same N storage nodes is P or P+1, where P is the integer quotient of X divided by $C_M^N$, the number of combinations of N nodes chosen from the M nodes.
  • the device comprises an obtaining unit and a determining unit, wherein the obtaining unit is configured to acquire the M data nodes in which the data to be stored is to be stored and the number N of copies of the data to be stored.
  • the determining unit determines the number of slices X according to the method of the aforementioned second aspect.
  • an embodiment of the present invention provides a distributed storage system.
  • the distributed storage system includes a client, a plurality of hard disks, and a distributed storage device, and the distributed storage device may be the device in the foregoing design of the third aspect, for performing the corresponding method of the foregoing first aspect or second aspect.
  • an embodiment of the present invention provides a distributed storage system, where the system includes a client and a distributed storage server system, where the distributed storage server system may include: a control server, an operation and maintenance management (OAM) server, business servers, storage resource pools, and storage engines.
  • the storage engine may be used to perform the corresponding method of the aforementioned first aspect or second aspect.
  • the present invention provides a method for data slice storage in a distributed environment, which improves the availability of data and the efficiency of data recovery when a node fails.
  • FIG. 1 is a schematic diagram of a possible system architecture of the present invention
  • FIG. 2 is a schematic flowchart of determining a fragment storage policy of a distributed storage system according to an embodiment of the present invention
  • FIG. 3 is a schematic diagram of a possible fragment storage policy of a distributed storage system according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of node recovery in a possible distributed storage system fragment storage policy according to an embodiment of the present invention
  • FIG. 5 is a diagram showing an example of relationship between the number of fragments after a multi-node failure and a data loss ratio in a possible scenario according to an embodiment of the present invention
  • FIG. 6 is a schematic structural diagram of a distributed storage device according to an embodiment of the present disclosure.
  • FIG. 7 is a schematic structural diagram of still another distributed storage device according to an embodiment of the present disclosure.
  • FIG. 8 is a schematic structural diagram of a distributed storage system according to an embodiment of the present disclosure.
  • FIG. 9 is a schematic structural diagram of still another distributed storage system according to an embodiment of the present disclosure.
  • FIG. 10 is a schematic structural diagram of a distributed storage device according to an embodiment of the present invention.
  • FIG. 11 is a schematic structural diagram of still another distributed storage device according to an embodiment of the present invention.
  • the system architecture of the distributed storage system to which the present invention is applied is first introduced.
  • Distributed storage systems distribute data across multiple independent devices.
  • the traditional network storage system uses a centralized storage server to store all data.
  • the storage server becomes a bottleneck of system performance, and is also the focus of reliability and security, and cannot meet the needs of large-scale storage applications.
  • the distributed network storage system adopts a scalable system structure, uses multiple storage servers to share the storage load, and uses the location server to locate the storage information, which not only improves the reliability, availability and access efficiency of the system, but also is easy to expand.
  • FIG. 1 is a schematic diagram of a distributed storage system. It should be noted that the distributed storage system is only an example, and the scope of application of the present invention is not limited thereto.
  • the system includes a distributed database engine 102 and a distributed data storage node 107.
  • the distributed database engine 102 is the core of the system, and is responsible for data parsing, routing, distribution, merging, etc., and manages a plurality of storage nodes at the bottom; the distributed storage node is composed of a plurality of data nodes for storing data. Users can flexibly build clusters of data nodes of different sizes according to their needs.
  • the distributed database engine 102 includes an API (Application Programming Interface) module 103 that provides an interface to the client to invoke the database.
  • the resource application module 104 determines the number of nodes that are provided to the client for the current storage requirement according to the storage requirements of the client and the storage capacity of each node provided in the distributed data storage node. Optionally, the resource application module may also determine the number of backup copies of the data to be stored according to the data reliability requirement submitted by the user.
  • the data management module 105 determines a storage policy according to the applied storage resource, that is, the number of data fragments and the correspondence between the data fragments and the storage nodes.
  • the data routing module 106 routes the request from the client according to the storage policy determined by the data management module, fragments the data and routes the data to the data node, or aggregates the data of each node and returns the data to the client.
  • the function of each module is implemented by a server. A given functional module can be implemented by a separate server, but in some cases a plurality of functional modules may be implemented by one server, or one functional module may be implemented by a cluster of multiple servers.
  • FIG. 2 is a schematic flowchart of Embodiment 1 of the present invention.
  • a method for determining a data fragmentation storage policy in a distributed storage system is provided.
  • the embodiment of the present invention mainly determines the storage strategy of the data fragment by improving the data management module 105.
  • the client initiates a data storage request to the distributed storage system, and stores the data to be stored into the distributed storage system.
  • the order of execution between method steps is not limited. Referring to FIG. 2, those skilled in the art can understand that S101 and S102 are both pre-steps of step S103, that is, S101 and S102 may be executed in any order, or may be performed in parallel in one or more steps.
  • the method includes:
  • the M data nodes to be stored can be determined according to the data size of the data to be stored and the amount of storage that each data node can provide.
  • the number of data nodes in which the data is stored may also be a preset fixed value or the total number of storage nodes.
  • the number of data nodes M can also be determined according to a value set by the user through the API.
  • when determining the data nodes to be stored, the M data nodes with lower load levels can be selected as the data nodes to be stored according to the load condition of each node, thereby improving the degree of balanced load of the entire distributed storage system.
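  • Selecting the M data nodes with the lowest load can be sketched as a simple sort. This is an illustrative Python sketch only: the node names, the single numeric load value per node, and the function name `pick_least_loaded` are assumptions, since the source does not say how load is measured.

```python
def pick_least_loaded(loads: dict[str, float], m: int) -> list[str]:
    """Return the names of the M nodes with the lowest load.

    `loads` maps a node name to a load figure (hypothetical metric, lower is
    better). Ties are broken by node name so the result is deterministic.
    """
    if not 1 <= m <= len(loads):
        raise ValueError("M must be between 1 and the number of known nodes")
    ranked = sorted(loads, key=lambda node: (loads[node], node))
    return ranked[:m]


if __name__ == "__main__":
    example_loads = {"node1": 0.9, "node2": 0.1, "node3": 0.5, "node4": 0.3}
    print(pick_least_loaded(example_loads, 2))
```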
  • the copy number N refers to the sum of the number of copies of the original data and of the backup data of the data to be stored, that is, the N copies include the original data of the data to be stored and N-1 backup copies of the original data. In order to ensure the availability of data, redundant backups of stored data are required.
  • the number of copies of the data may be a preset value set by the user in advance, or may be set for each storage request according to the data to be stored.
  • the copy of the same data should be stored on different storage nodes, so as to ensure that when a storage node fails, other copies are not lost. Therefore, for the same distributed system, the value of the copy number N should be less than or equal to the value of the number M of data nodes.
  • the number N of copies of the data to be stored may be determined according to security requirements of the data to be stored.
  • the higher the security requirement of the data to be stored, the larger the determined value of the number N of copies of the data to be stored.
  • the security requirements of the data can be obtained directly from the user's storage request, that is, the user specifies different security requirements for the data in different storage requests; they can also be determined by preset judgment logic, for example, the correspondence between different data types and security requirements, or the correspondence between different user types and security requirements.
  • the security requirements of different applications of the users on the platform are different, so the security requirements of the data to be stored can also be determined according to different applications or application types.
  • the number N of copies of the data to be stored may be determined according to the data type of the data to be stored and the correspondence between the data type and the number of copies, so as to protect different types of data with different degrees of availability.
  • Each of the N copies is sliced into X data slices according to the same slice mode such that each data slice has N data slice copies.
  • the number of slices X refers to the number of slices in which a copy of the data to be stored is sliced.
  • both the primary data and the backup data of the data to be stored need to be sliced in the same fragmentation manner, so that the primary data and the corresponding backup data are divided into identical fragments.
  • each copy of the data is divided into the same X data fragments, so each data shard exists in N identical copies. It can be understood that, once the number of fragments X of the data to be stored is determined, a total of N × X data fragment copies need to be stored in the M nodes.
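  • The count of N × X fragment copies can be illustrated with a small sketch. The Python below is an assumption-laden example, not the patent's fragmentation mode: it splits a byte string into X contiguous, nearly equal pieces, and the names `split_into_fragments` and `make_fragment_copies` are hypothetical. The only property taken from the text is that every copy is cut in the same way.

```python
def split_into_fragments(data: bytes, x: int) -> list[bytes]:
    """Cut `data` into X contiguous fragments whose sizes differ by at most 1."""
    if x < 1:
        raise ValueError("the number of fragments X must be positive")
    base, extra = divmod(len(data), x)
    fragments = []
    start = 0
    for index in range(x):
        size = base + (1 if index < extra else 0)
        fragments.append(data[start:start + size])
        start += size
    return fragments


def make_fragment_copies(data: bytes, n: int, x: int) -> list[list[bytes]]:
    """Return N copies, each cut into the same X fragments."""
    fragments = split_into_fragments(data, x)
    return [list(fragments) for _ in range(n)]


if __name__ == "__main__":
    copies = make_fragment_copies(b"abcdefghij", n=3, x=4)
    total = sum(len(copy) for copy in copies)
    print(total)  # N * X fragment copies in total
```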
  • the number of fragments X may be preset by the user, that is, the same number of fragments is used for any data to be stored, or it may be set by the user for each storage request according to the data to be stored.
  • the higher the number of fragments, the greater the number of data fragments stored in each storage node and the finer the granularity of the data fragmentation, which makes it easier to store the data evenly across the nodes and so achieve load balancing as far as possible. Therefore, different numbers of fragments can be set for the stored data according to the load balancing situation of the current distributed storage system. For example, when the distributed system has a high demand for load balancing, the number of fragments is dynamically increased to improve the balanced load of the distributed system.
  • the following method is performed when storing: storing the N data slice copies of each of the X data fragments in N storage nodes of the M storage nodes, such that the number of data fragments whose data slice copies are all stored in the same N storage nodes is P or P+1, where P is the integer quotient of X divided by $C_M^N$, the number of combinations of N nodes chosen from the M nodes.
  • the number of data fragments on the same N storage nodes means, for any N data nodes in the distributed system, the number of data fragments all N copies of which are stored on those N data nodes.
  • making the number of data fragments whose data slice copies are stored in the same N storage nodes equal to P or P+1 essentially means making the number of data fragments stored on the same N nodes as small as possible. That is, the data fragments need to be stored evenly across the possible combinations of N data nodes, so that the number of data fragments stored on each combination of N data nodes is relatively uniform, whichever N data nodes are selected.
  • each data slice is stored on N nodes, that is, on a combination of N nodes.
  • selecting N nodes from the M nodes gives a total of $C_M^N$ different combinations. Therefore, when the number of fragments X is smaller than $C_M^N$, each data fragment can be given a different combination of N nodes, that is, the number of data fragments stored on the same N nodes is at most 1; when the number of fragments X is greater than $C_M^N$, some combinations of N nodes store multiple data fragments.
  • each data slice has 3 copies
  • the following example shows a specific storage algorithm to obtain a storage strategy that satisfies the foregoing storage mode. It should be understood that the algorithm is merely one design for a storage strategy for storing data to be stored in a storage node in accordance with the foregoing principles. For those skilled in the art, on the basis of understanding the foregoing allocation principles, the foregoing storage strategies can be implemented by a plurality of different specific algorithms, which are not enumerated here.
  • the algorithm includes the following steps:
  • the X pieces of data to be stored are numbered 1, 2, 3, ... X;
  • the storage allocation table contains $C_M^N$ rows and N columns; one row contains N data nodes, and each row contains a different combination of data nodes, that is, the data nodes in each row are one of the $C_M^N$ combinations of N data nodes;
  • the N copies of each data fragment are stored in the N data nodes of the row corresponding to that fragment in the storage allocation table; the resulting storage policy conforms to the foregoing principles.
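  • The steps above can be sketched as follows. This is one possible reading rather than the patent's exact algorithm: the rule that fragment number i goes to row (i - 1) mod $C_M^N$ is an assumption, because the text at this point does not state how a fragment number is mapped to a row, and the names `build_allocation_table` and `assign_fragments` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_allocation_table(m: int, n: int) -> list[tuple[int, ...]]:
    """Return one row per combination of N nodes out of nodes 1..M."""
    return list(combinations(range(1, m + 1), n))


def assign_fragments(m: int, n: int, x: int) -> dict[int, tuple[int, ...]]:
    """Map fragment numbers 1..X to the N nodes that hold their copies.

    Assumed rule: fragment i uses row (i - 1) mod C, so the first Q rows
    receive P + 1 fragments and the remaining rows receive P.
    """
    table = build_allocation_table(m, n)
    return {i: table[(i - 1) % len(table)] for i in range(1, x + 1)}


if __name__ == "__main__":
    policy = assign_fragments(m=6, n=3, x=25)
    per_row = Counter(policy.values())
    # Every used combination holds either P or P + 1 fragments (here 1 or 2).
    print(sorted(set(per_row.values())))
```

  • Walking the rows in lexicographic order keeps every combination at P or P+1 fragments, but when X is not a multiple of $C_M^N$ it may place more fragments on low-numbered nodes; a different row order could be chosen to even out per-node load.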
  • Referring to FIG. 3, a specific example of the storage strategy obtained in accordance with the embodiment of the present method is as follows.
  • the data to be stored, with a fragment number X of 20, is stored in three copies (N = 3) in a distributed system having a data node number M of 6; one of the storage strategies obtained according to the embodiment of the present invention is as shown in the figure.
  • the number of data shards stored in the same three storage nodes is the smallest, that is, each of the three storage node combinations stores three copies of one data shard. Therefore, in this example, three storage nodes are arbitrarily selected, all of which store only all three copies of one data slice.
  • nodes 1, 2, 3 only completely store all 3 copies of data slice A, while nodes 1, 2, 4 only completely store all 3 copies of data slice B. Since each data fragment is stored in three different data nodes, at least three data nodes need to be simultaneously failed to completely lose the data fragment. In the storage strategy of this example, when any three data nodes fail at the same time, only one data fragment is lost. For example, when nodes 1, 2, and 3 fail, only data fragment A is lost, while other data fragments retain at least one data fragment replica in the remaining nodes.
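  • The claim that any three simultaneous failures lose exactly one fragment in this example can be checked by brute force. The sketch below is illustrative: it assumes the 20 fragments are labelled A to T and assigned to the 20 three-node combinations in lexicographic order (consistent with A on nodes 1, 2, 3 and B on nodes 1, 2, 4), and the function name `lost_fragments` is hypothetical.

```python
from itertools import combinations
from string import ascii_uppercase


def lost_fragments(policy: dict[str, tuple[int, ...]], failed: tuple[int, ...]) -> list[str]:
    """Return the fragments whose copies all sit on failed nodes."""
    failed_set = set(failed)
    return [name for name, nodes in policy.items() if set(nodes) <= failed_set]


# Assumed layout: fragments A..T, one per combination of 3 nodes out of 6.
POLICY = dict(zip(ascii_uppercase, combinations(range(1, 7), 3)))

if __name__ == "__main__":
    worst = max(
        len(lost_fragments(POLICY, failed))
        for failed in combinations(range(1, 7), 3)
    )
    print(lost_fragments(POLICY, (1, 2, 3)), worst)
```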
  • when a node fails, the remaining nodes can simultaneously perform data recovery for that node.
  • a possible data recovery mode when node 5 fails is listed: each copy of a data slice stored in that node can be restored from a copy in the gray portion of the other 5 data nodes.
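  • How the surviving nodes share the recovery work can be illustrated with a small planner. This is a hedged sketch, not the recovery procedure of the patent: it assumes the same A to T layout over six nodes as the example, picks for each lost copy the surviving holder that has been used least so far (a simple greedy rule that is not guaranteed to be optimal), and the name `plan_recovery` is hypothetical.

```python
from collections import Counter
from itertools import combinations
from string import ascii_uppercase


def plan_recovery(policy: dict[str, tuple[int, ...]], failed_node: int) -> dict[str, int]:
    """Choose, for each fragment copy lost with `failed_node`, a source node.

    Greedy rule (an assumption): take the surviving holder used least so far,
    breaking ties by the lower node number.
    """
    used = Counter()
    plan = {}
    for fragment, nodes in policy.items():
        if failed_node not in nodes:
            continue
        survivors = [node for node in nodes if node != failed_node]
        source = min(survivors, key=lambda node: (used[node], node))
        used[source] += 1
        plan[fragment] = source
    return plan


# Assumed layout: fragments A..T, one per combination of 3 nodes out of 6.
POLICY = dict(zip(ascii_uppercase, combinations(range(1, 7), 3)))

if __name__ == "__main__":
    plan = plan_recovery(POLICY, failed_node=5)
    print(len(plan), sorted(set(plan.values())))
```

  • In this layout node 5 holds copies of ten fragments, and the plan draws on all five surviving nodes, which is the source of the recovery concurrency described in the text.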
  • data with different numbers of fragments and copies can be stored in a distributed system, and the number of fragments and the number of copies can be flexibly adjusted according to the data to be stored. Since the number of data fragments stored in the same N nodes is guaranteed to be the smallest, when N nodes fail at the same time, the number of data fragments that may be lost is minimized, thereby improving the availability of data backup. At the same time, because this scheme allows storage strategies in which the number of fragments is higher than the number of nodes, the number of data fragments stored by a single node is increased. When a single node fails, because the copies of the data fragments it stored are evenly distributed among the other data nodes, more data nodes can take part in data recovery; that is, the data recovery concurrency at the time of node failure is increased, and the data recovery efficiency is improved.
  • a method for determining the number of fragments of the data to be stored is given. Using this number of fragments achieves better data availability.
  • how to determine the number of data nodes used for storage and the number of copies of the data to be stored is similar to the method described in S101 and S102 of the foregoing embodiment, and details are not described herein again.
  • the number of fragments of the data to be stored determined according to this method embodiment may be used to fragment the data copies in step S102 of the foregoing embodiment; the details are similar and are not described again in this embodiment.
  • each data fragment of the data to be stored needs to be stored in N nodes.
  • There are $C_M^N$ ways in total to select N data nodes from the M data nodes.
  • To minimize the number of data fragments that may be lost, each data fragment should, as far as possible, be stored on a different combination of data nodes. It follows that, when the number of fragments is smaller than $C_M^N$, the larger the number of fragments, the smaller the maximum amount of data that can be lost when N nodes fail.
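  • As a hedged illustration, the sketch below assumes that the quantity missing from the text is the binomial coefficient (the number of ways to choose N of the M nodes) and computes how X fragments spread over the node combinations: some combinations hold P fragments and the rest hold P+1. The function name `combo_occupancy` is hypothetical.

```python
from math import comb

def combo_occupancy(num_nodes, num_copies, num_fragments):
    """Return (P, combinations holding P fragments, combinations holding P + 1)."""
    total_combos = comb(num_nodes, num_copies)
    p, remainder = divmod(num_fragments, total_combos)
    return p, total_combos - remainder, remainder

print(combo_occupancy(6, 3, 20))  # (1, 20, 0)
print(combo_occupancy(6, 3, 21))  # (1, 19, 1)
```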
  • the following takes as an example the case in which the number of data nodes is 6, the number of copies is 3, and all data fragments are of equal size.
  • the figure shows how the maximum data loss that may result, as a proportion of the total data, changes with the number of fragments.
  • the abscissa is the number of data fragments X.
  • the ordinate is the ratio of the maximum amount of data that may be lost when any three data nodes fail to the total amount of data.
  • the function image is as shown in the figure, where:
  • when X = 20, the maximum data lost in a three-node failure is 1/20 of the total data;
  • when X = 21, the maximum data lost in a three-node failure is 2/21 of the total data;
  • when X = 40, the maximum data lost in a three-node failure is 2/40 of the total data.
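  • The three ratios above can be reproduced with a small Python sketch. It assumes equal-size fragments spread as evenly as possible over the 3-node combinations of 6 nodes, so that the worst-case loss is the ceiling of X divided by the number of combinations; `max_loss_ratio` is an illustrative name.

```python
from fractions import Fraction
from math import comb

def max_loss_ratio(num_nodes, num_copies, num_fragments):
    """Worst-case share of data lost when num_copies nodes fail together."""
    total_combos = comb(num_nodes, num_copies)
    worst_case_lost = -(-num_fragments // total_combos)  # ceiling division
    return Fraction(worst_case_lost, num_fragments)

for x in (20, 21, 40):
    print(x, max_loss_ratio(6, 3, x))
```

  • `Fraction` reduces 2/40 to 1/20, which makes it visible that doubling the fragment count from 20 to 40 leaves the worst-case loss ratio unchanged.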
  • the number of fragments X of the data to be stored is equal to the product of the optimal fragment base Y and the coefficient K.
  • the maximum amount of data that may be lost accounts for the smallest proportion of the total data amount. That is, the availability of data is the highest.
  • the number of data fragments X is smaller than $C_M^N$ or an integer multiple of $C_M^N$.
  • the closer the number of fragments X is to $C_M^N$ or to an integer multiple of $C_M^N$, the smaller the ratio of the maximum amount of data that may be lost to the total amount of data, and the higher the availability of the data. Therefore, when the number of fragments X of the data to be stored is smaller than, but close to, the product of the optimal fragment base Y and the coefficient K, relatively high data availability is also obtained.
  • from the perspective of data availability, the optimal value is achieved when X is $C_M^N$ or an integer multiple of $C_M^N$ greater than 1. When the optimal value cannot be taken, the closer X is, from below, to $C_M^N$ or to an integer multiple of $C_M^N$ greater than 1, the higher the availability of the data. Therefore, to achieve the best data availability when determining the number of fragments X, $C_M^N$ or an integer multiple of $C_M^N$ greater than 1 should be selected as X; and when, for other reasons, neither can be used as X, the closer X is, from below, to $C_M^N$ or to an integer multiple of $C_M^N$ greater than 1, the more the data availability is improved.
  • the final value of X can be determined according to the effect to be achieved in the specific scenario in which the present invention is applied.
  • the desired technical effect is that, when N nodes fail, the maximum possible data loss ratio is less than Q.
  • K takes an integer greater than or equal to 1.
  • the maximum data loss ratio that may result is monotonically decreasing, and the maximum data loss ratio corresponding to a value of X in the interval is K/X. Therefore, to make K/X smaller than Q, the value of X should be greater than K/Q.
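  • The condition K/X < Q can be turned into a small search, sketched below in Python. Two points are assumptions because the text does not state them: that the interval for a given K is (K-1)·Y < X ≤ K·Y, and that the base Y is the number of N-node combinations of the M nodes. The function name and the `min_fragments` parameter are illustrative.

```python
from fractions import Fraction
from math import comb

def smallest_fragment_count(num_nodes, num_copies, max_ratio, min_fragments=1):
    """Smallest X >= min_fragments whose worst-case loss ratio K/X is below max_ratio."""
    base = comb(num_nodes, num_copies)          # assumed optimal fragment base Y
    if max_ratio * base <= 1:
        return None                             # K/X can never drop below 1/Y
    x = min_fragments
    while True:
        k = -(-x // base)                       # K with (K-1)*Y < X <= K*Y
        if k < max_ratio * x:                   # equivalent to K/X < Q
            return x
        x += 1

print(smallest_fragment_count(6, 3, Fraction(1, 10)))      # 11
print(smallest_fragment_count(6, 3, Fraction(2, 25), 21))  # 26
```

  • The early return covers targets that cannot be met: under these assumptions the loss ratio is never lower than 1/Y, however many fragments are used.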
  • the coefficient can be determined according to the load balancing requirement of the distributed storage system.
  • K is an integer greater than or equal to 1, used to determine a multiple of the optimal slice base.
  • the larger the value of the coefficient K, the higher the degree of load balancing of the data to be stored.
  • the value of the number of slices X is equal to or smaller than the product of the optimal slice base Y and the coefficient K.
  • the optimal data availability can be obtained and the load balancing requirement of the distributed storage system can be satisfied; and when, after other factors are considered together, the number of fragments X is not the product of the optimal fragment base Y and the coefficient K, the closer X is to that product, the higher the data availability and the better the load balancing requirement of the distributed storage system is met.
  • the availability of the data can be further improved on the basis of realizing the advantages of the foregoing embodiments, so that the same is
  • the optimal or relatively optimal data availability can be achieved based on the number of slices determined by the optimal slice base.
  • the load balancing of the distributed system is improved, as is the efficiency of concurrent recovery when a node fails.
  • FIG. 6 shows a distributed storage device 600 according to an embodiment of the present application.
  • the device 600 may be a node deployed in a distributed storage system, or may be an independent data management device in a distributed storage system.
  • The device includes, but is not limited to, a computer, a server, and the like. As shown in FIG. 6, the device 600 includes a processor 601, a memory 602, a transceiver 603, and a bus 604.
  • the transceiver 603 is configured to send data to and receive data from external devices, such as other nodes in the distributed system or network devices outside the distributed system.
  • the number of processors 601 in device 600 can be one or more.
  • processor 601, memory 602, and transceiver 603 may be connected by a bus system or other means.
  • the program code can be stored in the memory 602.
  • the processor 601 is configured to call the program code stored in the memory 602 for performing the operations of S101, S102, and S103 in the foregoing embodiment:
  • the processor 601 is further configured to perform a refinement or an alternative of the foregoing steps in the first embodiment.
  • the processor 601 may further determine the number of fragments X by performing operations S201 and S202: determining, according to the number of copies N and the number of storage nodes M, the optimal fragment base Y of the data to be stored, where the optimal fragment base $Y = C_M^N$; and acquiring, according to the optimal fragment base Y, the number of fragments X of the data to be stored, where X is equal to or smaller than the optimal fragment base Y, or equal to or smaller than a natural-number multiple of the optimal fragment base Y.
  • the processor 601 herein may be a single processing component or a collective name for multiple processing components.
  • the processing component may be a central processing unit (CPU), an application specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application, for example, one or more digital signal processors (DSPs) or one or more field programmable gate arrays (FPGAs).
  • the memory 602 may be a single storage device or a collective name for a plurality of storage elements, and is used to store executable program code, or the parameters, data, and the like required for the device to operate. The memory 602 may include random access memory (RAM), and may also include non-volatile memory such as a magnetic disk memory or a flash memory.
  • the bus 604 may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus.
  • the bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is shown in Figure 6, but it does not mean that there is only one bus or one type of bus.
  • the device may also include input and output devices coupled to the bus 604 so as to connect to other parts, such as the processor 601, via the bus.
  • the user may, through the input device, carry out the steps of this embodiment that require manually configuring or presetting parameters.
  • the input and output device can provide an input interface for the operator, so that the operator can select a control item through the input interface; it can also be another interface through which other devices can be externally connected.
  • FIG. 7 shows a distributed storage device 700 according to an embodiment of the present disclosure.
  • the device 700 may be a node deployed in a distributed storage system, or may be an independent data management device in a distributed storage system.
  • The device includes, but is not limited to, a computer, a server, and the like. As shown in FIG. 7, the device 700 includes a processor 701, a memory 702, a transceiver 703, and a bus 704.
  • the transceiver 703 is configured to send data to and receive data from external devices, such as other nodes in the distributed system or network devices outside the distributed system.
  • the number of processors 701 in device 700 can be one or more.
  • processor 701, memory 702, and transceiver 703 may be connected by a bus system or other means.
  • the program code can be stored in the memory 702.
  • the processor 701 is configured to call the program code stored in the memory 702 to perform operations S201 and S202, thereby determining the number of fragments used when fragmented storage is performed in the distributed storage system.
  • the processor 701 is further configured to perform the refinement or the optional solution of the foregoing steps in the second embodiment.
  • the processor 701 herein may be a single processing component or a collective name for multiple processing components.
  • the processing component may be a central processing unit (CPU), an application specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application, for example, one or more digital signal processors (DSPs) or one or more field programmable gate arrays (FPGAs).
  • the memory 702 may be a single storage device or a collective name for a plurality of storage elements, and is used to store executable program code, or the parameters, data, and the like required for the device to operate. The memory 702 may include random access memory (RAM), and may also include non-volatile memory such as a magnetic disk memory or a flash memory.
  • the bus 704 may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus.
  • the bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is shown in Figure 7, but it does not mean that there is only one bus or one type of bus.
  • the device may also include input and output devices coupled to the bus 704 so as to connect to other parts, such as the processor 701, via the bus.
  • the user may, through the input device, carry out the steps of this embodiment that require manually configuring or presetting parameters.
  • the input/output device can provide an input interface for the operator, so that the operator can select a control item through the input interface; it can also be another interface through which other devices can be externally connected.
  • FIG. 8 is a schematic block diagram of a distributed storage system 800 in accordance with an embodiment of the present invention.
  • the distributed storage system 800 includes a client 810, a plurality of hard disks 820, and a distributed storage device 830.
  • the distributed storage device 830 may be the distributed storage device 600 shown in FIG. 6 or the distributed storage device 700 shown in FIG. 7, and details are not described herein again.
  • the hardware entities of the distributed system provided in this embodiment can be understood by referring to the distributed system architecture described above.
  • the distributed database engine 102 is not a storage device 830 as a hardware entity. Therefore, the hardware entity that carries the data management module 105 improved in this embodiment of the present invention is the distributed storage device 830.
  • the distributed storage device 830 stores/reads the user's data file on the plurality of hard disks 820 according to the storage/read request transmitted by the user through the client 810.
  • FIG. 9 is a schematic block diagram of another distributed storage system 900 according to an embodiment of the present invention.
  • Distributed storage system 900 includes a client 910 and a distributed storage server system 920.
  • Client 910 can connect to storage server system 920 via the Internet.
  • the client 910 can run a client agent of the distributed storage system to support various types of distributed storage applications to access the distributed storage system.
  • the client agent can implement personal online storage and backup, enterprise online storage and backup, application online storage, or other emerging storage and backup services.
  • the distributed storage server system 920 can include a control server 930, an operation and maintenance management (OAM) server 940, a service server 950, a storage resource pool 970, and a storage engine 980.
  • storage engine 980 may be an example of the distributed storage device of FIG. 6 or 7.
  • the hardware device in this embodiment can be understood in accordance with the distributed architecture in FIG. 1 described above.
  • the storage engine 980 in this embodiment implements the functions of the distributed database engine 102, and the distributed storage server system 920 also includes other functional servers related to the distributed system, such as the control server 930, the operation and maintenance management server 940, and the service server 950.
  • the control server 930 is mainly used to control the distributed storage system to perform various types of storage services, such as relocation, moving and backup of organizational data, and elimination of storage hotspots.
  • the operation and maintenance management server 940 can provide a configuration interface and an operation and maintenance interface of the storage system, and provides functions such as logs and alarms.
  • the service server 950 can provide functions such as service identification and authentication to complete the service delivery function.
  • the storage resource pool 970 may include a storage resource pool composed of physical storage nodes.
  • the storage resource pool may be composed of a storage server/storage board 960.
  • the virtual nodes in each physical storage node form a storage logical ring, and the user's data files may be stored in the storage resource pool.
  • the storage engine 980 can provide logic for the main functions of the distributed storage system.
  • the logic can be deployed on one of the control server 930, the service server 950, and the operation and maintenance management server 940, or can be deployed in a distributed manner across the control server 930, the service server 950, the operation and maintenance management server 940, and the storage resource pool 970. Therefore, the corresponding improvement of the present invention can also be implemented in the above hardware.
  • FIG. 10 is a schematic structural diagram of a distributed storage device 1000 according to an embodiment of the present invention.
  • the distributed storage device 1000 includes an obtaining unit 1001 and a storage unit 1002.
  • the obtaining unit 1001 is configured to determine M storage nodes on which data to be stored is to be stored, and obtain N copies of the data to be stored, where the N copies include the original data of the data to be stored and N-1 backup copies of the original data, and each of the N copies is fragmented into X data fragments in the same fragmentation manner, so that each data fragment has N data fragment copies, N being less than or equal to M.
  • the specific manner or optional implementation manner in which the acquiring unit 1001 obtains the number of data nodes and obtains a copy of the data to be stored is not described in this embodiment.
  • the obtaining unit 1001 may obtain the data from an external network or from other devices inside the distributed storage system through the transceiver 603 of the distributed storage device of FIG. 6.
  • the obtaining unit 1001 may further include an input and output device so that the data can be acquired by means set by a user.
  • the obtaining unit 1001 can also read a preset value stored in the distributed storage device, thereby acquiring a preset value of the data.
  • when fragmenting the copies as part of acquiring the data to be stored, the acquiring unit 1001 may also determine the number of fragments X by having the processor 601 of the distributed storage device in FIG. 6 invoke the program code stored in the memory 602 to perform the following steps: determining the optimal fragment base Y of the data to be stored according to the number of copies N and the number M of storage nodes, where the optimal fragment base $Y = C_M^N$; and obtaining, according to the optimal fragment base Y, the number of fragments X of the data to be stored, where X is equal to or smaller than the product of the optimal fragment base Y and the coefficient K, and K is an integer greater than or equal to 1.
  • the coefficient K is determined according to the load balancing requirement of the distributed storage system, where the coefficient K is a natural number, and the larger the value of K, the higher the degree of load balancing of the data to be stored.
  • the storage unit 1002 is configured to store the to-be-stored data into the M storage nodes of the distributed system.
  • the storage follows this principle: the N data fragment copies of each of the X data fragments are stored on N storage nodes among the M storage nodes, and the number of data fragments whose data fragment copies are stored on the same N storage nodes is P or P+1, where P is the integer quotient of X divided by $C_M^N$.
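  • A hedged Python sketch of a check for this principle follows. It assumes the divisor is the number of N-node combinations of the M nodes, and it takes a placement as a mapping from fragment name to the nodes holding its copies; the function name and both sample layouts are illustrative.

```python
from collections import Counter
from itertools import combinations
from math import comb

def satisfies_storage_principle(placement, num_nodes, num_copies):
    """Check that every N-node combination holds either P or P + 1 fragments."""
    if any(len(set(nodes)) != num_copies for nodes in placement.values()):
        return False                      # each fragment needs N distinct nodes
    p = len(placement) // comb(num_nodes, num_copies)
    usage = Counter(frozenset(nodes) for nodes in placement.values())
    all_combos = combinations(range(1, num_nodes + 1), num_copies)
    return all(usage[frozenset(c)] in (p, p + 1) for c in all_combos)

# Cross-backup layout in which fragments A and C share nodes 1, 2, and 3.
cross_backup = {"A": (1, 2, 3), "B": (2, 4, 6), "C": (1, 2, 3),
                "D": (1, 4, 5), "E": (4, 5, 6), "F": (3, 5, 6)}
# Same fragments, each placed on a different combination of nodes.
spread = dict(zip("ABCDEF", combinations(range(1, 7), 3)))

print(satisfies_storage_principle(cross_backup, 6, 3))  # False
print(satisfies_storage_principle(spread, 6, 3))        # True
```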
  • the storage unit 1002 can be implemented by the processor 601 of the distributed storage device of FIG. 6 calling the program code stored in the memory 602.
  • FIG. 11 is a schematic structural diagram of a distributed storage device 1100 according to an embodiment of the present invention.
  • the distributed storage device 1100 includes an acquisition unit 1101 and a determination unit 1102.
  • the obtaining unit 1101 is configured to acquire M data nodes to be stored, and a copy number N of data to be stored. With reference to the method described in the foregoing second embodiment, the specific manner or optional implementation manner in which the acquiring unit 1101 acquires the two pieces of data is not described in this embodiment.
  • the obtaining unit 1101 can obtain the data from an external network or from other devices inside the distributed storage system through the transceiver 703 of the distributed storage device of FIG. 7.
  • the obtaining unit 1101 may further include an input and output device so that the data can be acquired by a user setting.
  • the obtaining unit 1101 can also read a preset value stored in the distributed storage device, thereby acquiring a preset value of the data.
  • the determining unit 1102 is configured to determine the number of fragments X of the data to be stored, where the number of fragments is the number of fragments obtained after the data to be stored is fragmented, and X is equal to or smaller than $C_M^N$, or equal to or smaller than a positive integer multiple of $C_M^N$.
  • the determining unit 1102 is further configured to determine a coefficient K according to a load balancing requirement of the distributed storage system, where the coefficient K is a positive integer, and the larger the value of K, the higher the degree of load balancing of the data to be stored; the number of fragments X is equal to or smaller than $K \cdot C_M^N$.
  • the determining unit 1102 can be implemented by the processor 701 of the distributed storage device of FIG. 7 calling the program code stored in the memory 702 to perform the above operational steps and thereby determine the number of fragments X.
  • the disclosed systems, devices, and methods may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division into units is only a division by logical function.
  • in actual implementation there may be other division manners; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the functions may be stored in a computer readable storage medium if implemented in the form of a software functional unit and sold or used as a standalone product.
  • the technical solution of the present invention, in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes a number of instructions.
  • the instructions are used to cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the various embodiments of the present invention.
  • the foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application relates to the field of distributed storage, and in particular to distributed sharding storage technology. The data shard storage method in a distributed storage system comprises: determining M data nodes required for storing data to be stored, acquiring N copies of the data to be stored, and sharding each of the N copies into X data shards according to the same sharding mode; and then storing the data to be stored in the M storage nodes, that is, storing the N copies of each of the X data shards in N storage nodes respectively, so that the number of data shards whose data shard copies are stored in the same N storage nodes is P or P+1, wherein P is the integer quotient of X divided by $C_M^N$. Provided is a distributed data shard storage method that improves data availability and the efficiency of data recovery when a node fault occurs.

Description

Data shard storage method, device and system
This application claims priority to Chinese Patent Application No. 201610659118.4, filed with the Chinese Patent Office on August 10, 2016 and entitled "Data shard storage method, device and system", which is incorporated herein by reference in its entirety.
Technical Field
Embodiments of the present invention relate to the field of distributed storage, and more specifically, to a method, an apparatus, and a system for fragmented storage of data in a distributed storage system.
Background
With the rapid development of information technology, the amount of data in information system databases keeps growing. To meet the storage requirements of large data volumes, distributed storage systems running on multiple servers have been widely adopted. In a distributed storage system, multiple database systems run on multiple servers. When data is stored, it first needs to be fragmented (sharding), and the different data fragments are then handed to different servers for storage. Sharding is a form of horizontal scaling: a large data set is spread across multiple data nodes, and all the data nodes together form one logical database that stores the data set. Sharding is transparent to the user (the application layer); the user does not know which shard server the data is stored on. Storing data in fragments breaks through the I/O capacity limit of a single-node server and solves the problem of database scalability.
Meanwhile, to ensure high availability of data and services, a distributed database usually needs to be provided with the necessary fault-tolerance mechanism, in which each data fragment is redundantly backed up. Storing multiple copies of the same data fragment on different servers avoids the loss of a data fragment when a single server becomes unavailable.
In the prior art, data fragments in a distributed storage system are usually backed up by means of cross backup. The number of data fragments is generally the same as the number of storage nodes, so that the primary copy of each data fragment is stored on its own storage node, while the backup copies are stored on any two other storage nodes different from the one holding the primary copy. For example, Table 1 lists a common storage strategy for storing primary data and two copies of backup data on six data nodes, in which the data is divided into six data fragments, A to F, and each data fragment comprises one primary copy and two backup copies.
Storage node    Data fragments
Node 1          A C D
Node 2          B A C
Node 3          C A F
Node 4          D B E
Node 5          E D F
Node 6          F B E
Table 1
In the prior art, storing the primary fragment and the backup fragments on different storage nodes ensures that a data fragment is not lost when a single storage node fails; the data fragment is lost only when all the data nodes holding it fail. However, the primary and backup copies of two data fragments may be stored on the same set of nodes, so that when all of those nodes fail, both data fragments are lost. For example, in Table 1, data fragment A and data fragment C are both stored on nodes 1, 2, and 3; when these three data nodes fail, both data fragments A and C are lost. In addition, after a single node fails, a new node has to be formed through data recovery, and the efficiency of concurrent recovery is low. For example, in Table 1, after node 6 fails, data recovery can be performed concurrently through at most one node for each of data fragments F, B, and E, for example nodes 3, 4, and 5, while the other nodes cannot participate in the recovery.
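The weakness of the Table 1 layout can be checked with a few lines of Python. The sketch below encodes Table 1 as a dictionary and lists the fragments that have no surviving copy after a given set of nodes fails; the names `TABLE_1` and `lost_on_failure` are illustrative.

```python
from itertools import combinations

# Table 1: fragments held by each storage node.
TABLE_1 = {1: "ACD", 2: "BAC", 3: "CAF", 4: "DBE", 5: "EDF", 6: "FBE"}

def lost_on_failure(layout, failed_nodes):
    """Return the fragments with no copy left on a surviving node."""
    everything = set().union(*(set(frags) for frags in layout.values()))
    surviving = set().union(*(set(frags) for node, frags in layout.items()
                              if node not in failed_nodes))
    return sorted(everything - surviving)

print(lost_on_failure(TABLE_1, {1, 2, 3}))  # ['A', 'C']
worst = max(len(lost_on_failure(TABLE_1, set(trio)))
            for trio in combinations(TABLE_1, 3))
print(worst)  # 2
```

A single three-node failure can therefore cost two of the six fragments under this layout.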
Summary of the Invention
In view of this, embodiments of the present invention provide a method, an apparatus, and a system for fragmented storage of data in a distributed storage system, which improve the availability of data backup and the efficiency of data recovery.
In a first aspect, an embodiment of the present invention provides a method for fragmented storage of data. The method includes: determining M data nodes on which data to be stored is to be stored, and acquiring N copies of the data to be stored, where the number of copies N is the total number of copies of the original data and the backup data of the data to be stored; and fragmenting each of the N copies into X data fragments in the same fragmentation manner, so that each data fragment has N data fragment copies. The N copies of the data to be stored are stored on the M storage nodes; that is, the N data fragment copies of each of the X data fragments are stored on N storage nodes respectively. The storage is arranged so that the number of data fragments whose copies are stored on the same N storage nodes is minimized; specifically, the number of data fragments whose data fragment copies are stored on the same N storage nodes is P or P+1, where P is the integer quotient of X divided by $C_M^N$ (the integer quotient is the incomplete, or partial, quotient; for example, when X is 10 and $C_M^N$ is 3, the integer quotient P is 3). Because the number of data fragments stored on the same N storage nodes is minimized, the maximum data loss that may result when any N storage nodes fail simultaneously is minimized; compared with the prior art, the proportion of data lost is reduced and the availability of data backup is improved. Moreover, because the copies of each data fragment are evenly distributed across different nodes, when a node fails, the data fragments stored on that node can be recovered from the corresponding copies stored on multiple different nodes, which improves the efficiency of concurrent recovery.
In a possible design, when the copies are sharded, the shard count X is chosen on the basis of an optimal shard base Y, where $Y = C_M^N$. The shard count X may be equal to or less than the product of the optimal shard base Y and a coefficient K, where K is an integer greater than or equal to 1.
In one implementation of this design, X is less than the product of Y and K. In this case, the closer X is to the product of Y and K, the smaller the ratio of the maximum possible data loss to the total amount of stored data when N storage nodes fail simultaneously.
In one implementation of this design, X equals the product of Y and K. In this case, the ratio of the maximum possible data loss to the total amount of stored data when N storage nodes fail simultaneously is at its minimum.
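The relationship between the shard count X and the worst-case loss ratio can be illustrated with a minimal sketch. It assumes the balanced placement described in the first aspect, so that any N failed nodes hold all copies of at most P or P+1 shards, and it takes the optimal shard base to be $C_M^N$. The function name `worst_case_loss_ratio` is illustrative and does not come from the source.

```python
from math import comb


def worst_case_loss_ratio(m: int, n: int, x: int) -> float:
    """Largest fraction of shards lost when any N of M nodes fail together.

    Assumes every N-node combination holds all copies of P or P+1 shards.
    """
    if not 1 <= n <= m or x < 1:
        raise ValueError("need 1 <= N <= M and X >= 1")
    y = comb(m, n)              # optimal shard base Y = C(M, N)
    p, q = divmod(x, y)         # integer quotient P and remainder Q
    worst = p + 1 if q else p   # most shards sharing one N-node combination
    return worst / x


# Compare shard counts below, at, and between multiples of Y for M=6, N=3.
for x in (10, 20, 30, 40):
    print(x, round(worst_case_loss_ratio(6, 3, x), 4))
```

With M = 6 and N = 3 the base is 20, and the ratio is lowest at X = 20 and X = 40, the two multiples of the base in the loop.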
In one implementation of this design, when the shard count X of the data to be stored is determined, the coefficient K may be determined according to the load balancing requirement of the distributed storage system. The larger the value of K, the better the load of the data to be stored is balanced.
In a possible design, the shard count X of the data to be stored is determined according to the current load balancing state of the distributed storage system. When the load of the data to be stored needs to be better balanced across the distributed storage system, a larger shard count X may be chosen, which yields a finer data granularity and improves load balancing.
In a possible design, the copy count N of the data to be stored is determined according to the security requirement of the data to be stored. The larger the value of N, the higher the security requirement that can be met, because more nodes must fail simultaneously before a data shard is lost.
In a possible design, the copy count N of the data to be stored is determined according to the data type of the data to be stored and a correspondence between data types and copy counts. This provides more flexible data availability guarantees for different types of data.
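A correspondence between data types and copy counts could be as simple as a lookup table. The sketch below is only an illustration: the type names, the copy counts, the default, and the decision to cap N at the node count M are all assumptions, not details given in the source (which states only that N must not exceed M).

```python
# Hypothetical correspondence between data type and copy count N.
# The type names and values are illustrative assumptions.
COPIES_BY_DATA_TYPE = {"log": 2, "user_profile": 3, "billing": 5}
DEFAULT_COPIES = 3  # assumed fallback for unlisted data types


def copy_count(data_type: str, node_count: int) -> int:
    """Return the copy count N for a data type, never exceeding M."""
    n = COPIES_BY_DATA_TYPE.get(data_type, DEFAULT_COPIES)
    # Copies of one shard must sit on distinct nodes, so N <= M.
    return min(n, node_count)
```

Capping silently is one possible design choice; rejecting the request when the configured N exceeds M would be an equally reasonable alternative.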
In a possible design, when the data to be stored is stored on the storage nodes, the $C_M^N$ data node combinations obtained by selecting N data nodes from the M data nodes are determined; the quotient P and the remainder Q of dividing the shard count X by $C_M^N$ are determined; Q of the $C_M^N$ data node combinations are selected to store P+1 data shards each, and the remaining $C_M^N - Q$ data node combinations store P data shards each. The N copies of each data shard are stored respectively on the N different data nodes of the combination to which the shard is assigned.
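The quotient and remainder split described above can be sketched in a few lines. The function name `shards_per_combination` is illustrative; which Q combinations receive the extra shard is left open by the text, so this sketch simply lists the larger counts first.

```python
from math import comb


def shards_per_combination(m: int, n: int, x: int) -> list[int]:
    """Return how many shards each of the C(M, N) node combinations stores."""
    if not 1 <= n <= m or x < 0:
        raise ValueError("need 1 <= N <= M and X >= 0")
    total = comb(m, n)        # number of N-node combinations
    p, q = divmod(x, total)   # quotient P and remainder Q
    # Q combinations take P+1 shards; the remaining total - Q take P.
    return [p + 1] * q + [p] * (total - q)
```

For M = 6, N = 3 and X = 50, the result contains ten entries of 3 and ten entries of 2, and the entries sum to X.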
In a second aspect, an embodiment of the present invention provides a method for determining data shards in a distributed storage system. The method includes determining M data nodes on which data to be stored is to be stored and obtaining a copy count N of the data to be stored, and thereby determining a shard count X of the data to be stored. X is equal to or less than the product of the optimal shard base Y and a coefficient K, where K is an integer greater than or equal to 1. The closer X is to the product of Y and K, the smaller the ratio of the maximum possible data loss to the total amount of stored data when N storage nodes fail simultaneously. When X equals the product of Y and K, that ratio is at its minimum.
In a possible design, the coefficient K is determined according to the load balancing requirement of the distributed storage system, where K is an integer greater than or equal to 1; the larger the value of K, the better the load of the data to be stored is balanced. The shard count X is equal to or less than $K \times C_M^N$.
In a possible design, after the shard count X is determined, a storage policy for storing the data to be stored on the M storage nodes is determined. Under this policy, the N data shard copies of each of the X data shards are stored respectively on N of the M storage nodes, such that the number of data shards whose copies are stored on the same N storage nodes is P or P+1, where P is the integer quotient of X divided by $C_M^N$.
In a possible implementation of this design, the storage policy is determined as follows: for storing the N copies of the data to be stored on the M data nodes, the $C_M^N$ data node combinations obtained by selecting N data nodes from the M data nodes are determined; the quotient P and the remainder Q of dividing the shard count X by $C_M^N$ are determined; Q of the $C_M^N$ data node combinations are selected to store P+1 data shards each, and the remaining $C_M^N - Q$ data node combinations store P data shards each. The N copies of each data shard are stored respectively on the N different data nodes of the combination to which the shard is assigned.
In a possible design, the copy count N of the data to be stored may be determined according to the security requirement of the data to be stored. The larger the value of N, the higher the security requirement that can be met.
In a possible design, the copy count N of the data to be stored is determined according to the data type of the data to be stored and a correspondence between data types and copy counts. Different data types can thus be given different degrees of data availability protection.
In a third aspect, an embodiment of the present invention provides a distributed storage device that can implement the functions in the method designs of the first or second aspect. The functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the foregoing functions. The modules may be software and/or hardware.
In a possible design, the structure of the device includes a processor and a memory connected to the processor. The processor invokes instructions stored in the memory to perform the method of the first or second aspect.
In a possible design, the device includes an obtaining unit and a storage unit. The obtaining unit is configured to determine M storage nodes on which data to be stored is to be stored and to obtain N copies of the data to be stored, where the N copies comprise the original data of the data to be stored and N-1 backups of the original data, each of the N copies is divided into X data shards in the same sharding manner so that each data shard has N data shard copies, and N is less than or equal to M. The storage unit stores the N copies of the data to be stored on the M storage nodes: the N data shard copies of each of the X data shards are stored respectively on N of the M storage nodes, such that the number of data shards whose copies are stored on the same N storage nodes is P or P+1, where P is the integer quotient of X divided by $C_M^N$.
In a possible design, the device includes an obtaining unit and a determining unit. The obtaining unit is configured to obtain the M data nodes on which data to be stored is to be stored and the copy count N of the data to be stored. The determining unit determines the shard count X according to the method of the second aspect.
In a fourth aspect, an embodiment of the present invention provides a distributed storage system. The distributed storage system includes a client, a plurality of hard disks, and a distributed storage device. The distributed storage device may be the device in the designs of the third aspect, and is configured to perform the corresponding method of the first or second aspect.
In a fifth aspect, an embodiment of the present invention provides another distributed storage system. The system includes a client and a distributed storage server system, where the distributed storage server system may include a control server, an operation, administration and maintenance (OAM) server, a service server, a storage resource pool, and a storage engine. The storage engine may be configured to perform the corresponding method of the first or second aspect.
Compared with the prior art, the present invention provides a data shard storage method for a distributed environment, which improves data availability and the efficiency of data recovery when a node fails.
BRIEF DESCRIPTION OF DRAWINGS
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Apparently, the accompanying drawings below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other implementations of the present invention from these drawings without creative effort. All such embodiments or implementations fall within the protection scope of the present invention.
FIG. 1 is a schematic diagram of a possible system architecture of the present invention;
FIG. 2 is a schematic flowchart of determining a shard storage policy of a distributed storage system according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a possible shard storage policy of a distributed storage system according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of node recovery under a possible shard storage policy of a distributed storage system according to an embodiment of the present invention;
FIG. 5 is an example diagram of the relationship between the shard count and the data loss ratio after a multi-node failure in a possible scenario according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a distributed storage device according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of another distributed storage device according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a distributed storage system according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of another distributed storage system according to an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of a distributed storage apparatus according to an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of another distributed storage apparatus according to an embodiment of the present invention.
DETAILED DESCRIPTION
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments. Apparently, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
To facilitate understanding of the embodiments of the present invention, the system architecture of a distributed storage system to which the present invention applies is described first. A distributed storage system stores data in a distributed manner on multiple independent devices. A conventional network storage system uses a centralized storage server to hold all data; the storage server becomes a bottleneck for system performance and a focal point for reliability and security, and cannot meet the needs of large-scale storage applications. A distributed network storage system uses a scalable architecture in which multiple storage servers share the storage load and a location server locates stored information. This improves system reliability, availability, and access efficiency, and makes the system easy to scale.
FIG. 1 is a schematic architectural diagram of a distributed storage system. It should be noted that this distributed storage system is merely an example, and the application scope of the present invention is not limited to it. The distributed storage system shown in the figure includes a distributed database engine 102 and distributed data storage nodes 107. The distributed database engine 102 is the core of the system. It is responsible for operations such as parsing, routing, distributing, and merging data, and manages the many underlying storage nodes. The distributed data storage nodes consist of multiple data nodes used for storing data, and a user can flexibly build data node clusters of different scales as required.
The distributed database engine 102 includes an application programming interface (API) module 103, which provides an interface through which a client invokes the database. A resource application module 104 determines, based on the client's storage requirement and the storage capacity offered by each node among the distributed data storage nodes, the number of nodes to allocate for the client's current storage requirement. Optionally, it may also determine the number of backup copies of the data to be stored based on a data reliability requirement submitted by the user. A data management module 105 determines a storage policy based on the storage resources applied for, that is, the number of data shards and the correspondence between data shards and storage nodes. A data routing module 106 routes requests from the client according to the storage policy determined by the data management module: it shards the data and routes the shards to data nodes, or aggregates data from the nodes and returns it to the client.
It should be understood that in a distributed storage system, the functions of the modules are implemented by servers. Typically, one functional module is implemented by an independent server. In some cases, however, one server may implement multiple functional modules, or one functional module may be implemented by a cluster of multiple servers.
FIG. 2 is a schematic flowchart of Embodiment 1 of the present invention. This embodiment provides a method for determining a data shard storage policy in a distributed storage system. With reference to the foregoing description, this embodiment determines the storage policy for data shards mainly by improving the data management module 105.
In this embodiment, a client sends a data storage request to the distributed storage system in order to store the data to be stored in the distributed storage system. It should be understood that this embodiment does not limit the execution order of the method steps. As a person skilled in the art will understand from FIG. 2, S101 and S102 are both prerequisites of step S103; that is, S101 and S102 may be performed in any order, or performed concurrently.
As shown in FIG. 2, the method includes:
S101. Determine the M data nodes on which the data to be stored is to be stored.
Specifically, the M data nodes can generally be determined from the size of the data to be stored and the storage capacity that each data node can provide. In some cases, the number of data nodes may instead be a preset fixed value, or the total number of storage nodes. The number M of data nodes may also be determined from a value set by the user through the API.
In one design, when the target data nodes are determined, the M data nodes with the lowest load may be selected according to the load of each node, thereby improving load balancing across the whole distributed storage system.
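Selecting the M least-loaded nodes can be sketched as follows. The representation of load as a single number per node and the function name `pick_least_loaded` are assumptions made for illustration; the source does not specify how load is measured.

```python
import heapq


def pick_least_loaded(load_by_node: dict[str, float], m: int) -> list[str]:
    """Return the IDs of the M nodes with the lowest reported load."""
    if m > len(load_by_node):
        raise ValueError("fewer than M nodes available")
    # Iterating a dict yields its keys; each key is ranked by its load value.
    return heapq.nsmallest(m, load_by_node, key=load_by_node.get)
```

For example, with loads `{"a": 0.9, "b": 0.2, "c": 0.5, "d": 0.1}` and M = 2, the nodes returned are `d` and `b`, in ascending order of load.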
S102. Obtain N copies of the data to be stored.
For ease of description, in the present invention the copy count N is the total number of copies of the original data and of the backup data of the data to be stored; that is, the N copies comprise the original data of the data to be stored and N-1 backups of the original data. To ensure data availability, the data to be stored needs to be backed up redundantly. The larger the copy count N, the higher the redundancy and hence the better the reliability of the data to be stored, but the more storage space it occupies. Generally, the copy count is a preset value, which may be set by the user in advance or set for each storage request according to the data to be stored.
It can be understood that, to keep the redundant copies isolated from one another, copies of the same data should be stored on different storage nodes, so that when one storage node fails the other copies are not lost. Therefore, within one distributed system, the copy count N should be less than or equal to the number M of data nodes.
Optionally, the copy count N of the data to be stored may be determined according to the security requirement of the data to be stored. The higher the security requirement, the larger the value of N. The security requirement may be obtained directly from the user's storage request, that is, the user specifies different security requirements for the data in different storage requests. It may also be determined by preset logic, for example a correspondence between data types and security requirements, or between user types and security requirements. On some server platforms, for example a platform as a service (PaaS) platform, the different applications that a user deploys on the platform have different data security requirements, and the security requirement of the data to be stored may also be determined according to the application or application type.
Optionally, the copy count N of the data to be stored may be determined according to the data type of the data to be stored and a correspondence between data types and copy counts, so that different types of data receive different degrees of availability protection.
Each of the N copies is divided into X data shards in the same sharding manner, so that each data shard has N data shard copies. The shard count X is the number of shards into which one copy of the data to be stored is divided. For sharded storage, the primary data and the backup data of the data to be stored must be sharded in the same manner, yielding primary data shards and the corresponding backup data shards. After sharding, every copy is divided into the same X data shards, so each data shard exists as N identical shard copies, counting the shard itself. It can be understood that once the shard count X is determined, a total of N×X data shard copies need to be stored on the M nodes.
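The requirement that every copy be sharded in the same manner can be sketched as below. The source does not say how a copy is divided, so splitting a byte string into X contiguous, near-equal pieces is purely an assumption for illustration (a database would more likely shard by key); the function names are also illustrative.

```python
def shard(data: bytes, x: int) -> list[bytes]:
    """Split data into X contiguous shards whose sizes differ by at most 1 byte."""
    if x < 1:
        raise ValueError("X must be >= 1")
    base, extra = divmod(len(data), x)
    shards, start = [], 0
    for i in range(x):
        # The first `extra` shards carry one additional byte.
        end = start + base + (1 if i < extra else 0)
        shards.append(data[start:end])
        start = end
    return shards


def shard_copies(data: bytes, n: int, x: int) -> list[list[bytes]]:
    """Shard each of the N copies identically: result[c][s] is copy c of shard s."""
    return [shard(data, x) for _ in range(n)]
```

With N = 3 and X = 4 this produces 3 × 4 = 12 shard copies, and shard s is byte-identical across all three copies.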
In some cases, the shard count X may be preset by the user; that is, the same shard count may be used for all data to be stored, or the user may set the shard count for particular data when making the storage request.
Optionally, a higher shard count means more, and therefore smaller, data shards stored on the storage nodes; the finer shard granularity makes it easier to spread the data evenly over the nodes and so to balance load as far as possible. Therefore, different shard counts may be set for the data to be stored according to the current load balancing state of the distributed storage system. For example, when the distributed system has a high demand for load balancing, the shard count is increased dynamically to improve load balancing across the distributed system.
S103. Store the N copies of the data to be stored on the M storage nodes.
Specifically, storage follows this rule: the N data shard copies of each of the X data shards are stored respectively on N of the M storage nodes, such that the number of data shards whose copies are stored on the same N storage nodes is P or P+1, where P is the integer quotient of X divided by $C_M^N$.
The number of data shards on the same N storage nodes means, for any N data nodes in the distributed system, the number of data shards for which all N copies are stored on those N data nodes. Making the number of data shards whose copies are stored on the same N storage nodes equal to P or P+1, where P is the integer quotient of X divided by $C_M^N$, in effect minimizes the number of data shards stored on the same N nodes. That is, the data shards need to be distributed evenly over the possible combinations of N data nodes, so that every N-node combination stores roughly the same number of data shards. Then, over all possible choices of N data nodes, the maximum number of data shards that have all N copies on the chosen nodes is as small as possible. It can be understood that each data shard is stored on N nodes, that is, on one combination of N nodes. For a distributed system containing M storage nodes, there are $C_M^N$ different combinations of N nodes. Therefore, when the shard count X is less than $C_M^N$, each data shard can be assigned a different set of N data nodes, and the number of data shards stored on the same N nodes is at most 1; when the shard count X is greater than $C_M^N$, multiple data shards are stored on the same N nodes. Specifically, let P be the integer quotient and Q the remainder of dividing the shard count X by $C_M^N$. Then Q of the $C_M^N$ data node combinations are selected to store P+1 data shards each, and the remaining $C_M^N - Q$ data node combinations store P data shards each.
For example, suppose 40 data shards need to be stored, each with 3 copies, and the distributed system has 20 different combinations of 3 data nodes. To minimize the number of data shards stored on the same 3 storage nodes, each combination stores all copies of 2 different data shards. If instead 50 data shards need to be stored, 10 of the data node combinations each store all copies of 3 different data shards, and the other 10 data node combinations each store all copies of 2 different data shards.
下面举例给出一种具体的进行存储的算法,从而获得满足前述存储方式的存储策略。应当理解的是,该算法仅仅是对于根据前述原则将待存储数据存储于存储节点的存储策略的一种设计。对于本领域技术人员而言,在理解前述分配原则的基础上,可以通过多种不同的具体算法来实现上述的存储策略,在此不再一一列举。The following example shows a specific storage algorithm to obtain a storage strategy that satisfies the foregoing storage mode. It should be understood that the algorithm is merely one design for a storage strategy for storing data to be stored in a storage node in accordance with the foregoing principles. For those skilled in the art, on the basis of understanding the foregoing allocation principles, the foregoing storage strategies can be implemented by a plurality of different specific algorithms, which are not enumerated here.
该算法包括如下步骤:The algorithm includes the following steps:
1、为X个待存储数据分片编号为1,2,3……X;1. The X pieces of data to be stored are numbered 1, 2, 3, ... X;
2. Number the data nodes 1, 2, 3, ..., M;
3. Establish a storage allocation table containing $C_M^N$ rows and N columns. Each row contains N data nodes, and the combinations of data nodes in different rows differ from one another; that is, the data nodes in each row are one of the $C_M^N$ combinations of data nodes;
4. Establish the correspondence between the data fragments and the data node combination in each row of the storage allocation table. Let P be the integer quotient obtained by dividing the number of fragments X by $C_M^N$. The Nth row of the storage allocation table corresponds to the Kth data fragments, where
Figure PCTCN2017079971-appb-000026
(1 ≤ i ≤ P), and K ≤ X;
5. According to the established correspondence, store the N copies of each data fragment on the N data nodes of the corresponding row of the storage allocation table. The resulting storage policy conforms to the foregoing principles.
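The five steps above can be sketched as follows. This is a minimal illustration, not the algorithm of the embodiment itself: the exact formula that maps a row to its fragment numbers is given only as a figure, so the sketch assumes a simple round-robin rule (fragment k goes to row (k - 1) mod $C_M^N$). The function name is hypothetical.

```python
from itertools import combinations


def build_storage_policy(x, m, n):
    """Map each fragment number (1..x) to the tuple of n node numbers holding its copies."""
    # Step 3: one table row per n-node combination of nodes 1..m.
    table = list(combinations(range(1, m + 1), n))
    # Step 4 (assumed round-robin rule): fragment k is assigned to row (k - 1) mod len(table).
    # Step 5: the n copies of fragment k are stored on the n nodes of that row.
    return {k: table[(k - 1) % len(table)] for k in range(1, x + 1)}


policy = build_storage_policy(x=50, m=6, n=3)
print(policy[1])   # (1, 2, 3)
print(policy[21])  # (1, 2, 3): with 20 rows, fragment 21 wraps around to the first row
```

With X = 50, M = 6, and N = 3, this rule places 3 fragments on each of the first 10 combinations and 2 on each of the remaining 10, so no combination holds more than P + 1 fragments.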
To facilitate understanding of this embodiment, a specific example of a storage policy obtained according to this method embodiment is given below. FIG. 3 shows one of the storage policies obtained according to the embodiment of the present invention when data to be stored with a fragment number X of 20 is stored, with a copy number N of 3, in a distributed system with a data node number M of 6. In this policy, since $C_6^3 = 20$ is exactly equal to the number of fragments, minimizing the number of data fragments stored on the same 3 storage nodes means that each combination of 3 storage nodes stores the 3 copies of exactly one data fragment. Therefore, in this example, any 3 storage nodes taken together completely store all 3 copies of only one data fragment. For example, nodes 1, 2, and 3 together completely store all 3 copies of data fragment A only, while nodes 1, 2, and 4 completely store all 3 copies of data fragment B only. Since each data fragment is stored on 3 different data nodes, at least 3 data nodes must fail at the same time for a data fragment to be lost completely. Under the storage policy of this example, when any 3 data nodes fail at the same time, only 1 data fragment is lost. For example, when nodes 1, 2, and 3 fail, only data fragment A is lost, while every other data fragment still has at least 1 copy on the remaining nodes.
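The claim that any simultaneous failure of 3 nodes loses exactly one fragment in this example can be checked by brute force. The sketch below assumes fragments are numbered 1 to 20 in the lexicographic order of the node combinations; the actual labeling in FIG. 3 (A, B, ...) may differ, and the helper name is hypothetical.

```python
from itertools import combinations

M, N, X = 6, 3, 20
nodes = range(1, M + 1)
# One fragment per 3-node combination (the numbering order is an assumption).
placement = dict(enumerate(combinations(nodes, N), start=1))
assert len(placement) == X


def lost_fragments(failed):
    """Return the fragments whose copies all sit on failed nodes."""
    failed = set(failed)
    return [frag for frag, holders in placement.items() if set(holders) <= failed]


worst = max(len(lost_fragments(f)) for f in combinations(nodes, N))
print(worst)                      # 1
print(lost_fragments((1, 2, 3)))  # [1]
```

Every one of the 20 possible 3-node failures loses exactly one fragment, and a failure of only 2 nodes loses none.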
Meanwhile, in this example, when any one data node fails, because the copies of its data fragments are evenly distributed over the remaining data nodes, all of the remaining nodes can take part in recovering the data of the failed node at the same time. For example, FIG. 4 shows one possible data recovery mode when node 5 fails: the copies of the data fragments stored on that node can be recovered from the copies shown in gray on the other 5 data nodes.
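The even spread of recovery sources can be illustrated with a short count, again assuming the one-fragment-per-combination placement for M = 6 and N = 3. The sketch only counts how many of the failed node's fragments each surviving node could supply; it does not model the particular recovery assignment drawn in FIG. 4.

```python
from collections import Counter
from itertools import combinations

M, N = 6, 3
# One fragment per 3-node combination (the numbering order is an assumption).
placement = dict(enumerate(combinations(range(1, M + 1), N), start=1))

failed = 5
# For every fragment held by the failed node, count each surviving holder once.
sources = Counter(
    node
    for holders in placement.values() if failed in holders
    for node in holders if node != failed
)
print(sorted(sources.items()))  # [(1, 4), (2, 4), (3, 4), (4, 4), (6, 4)]
```

Node 5 holds copies of 10 fragments, and each of the other 5 nodes holds copies of 4 of them, so all 5 surviving nodes can contribute to the recovery in parallel.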
This embodiment of the present invention is applicable to storing data with different numbers of fragments and copies in a distributed system, and the number of fragments and the number of copies can be adjusted flexibly according to the data to be stored. Because the number of data fragments stored on the same N nodes is kept to a minimum, the number of data fragments that may be lost when N nodes fail at the same time is also minimized, which improves the availability of the data backup. In addition, because this solution allows a storage policy in which the number of fragments exceeds the number of nodes, the number of data fragments stored on a single node increases. When a single node fails, the copies of its data fragments are evenly distributed over the other data nodes, so more data nodes can participate in data recovery; that is, the concurrency of data recovery after a node failure increases, which improves data recovery efficiency.
The second method embodiment of the present invention is described below. This embodiment provides a method for determining the number of fragments of the data to be stored, with which better data availability can be achieved. In this embodiment, the methods for determining the number of data nodes on which the data is to be stored and for obtaining the number of copies of the data to be stored are similar to those described for S101 and S102 in the foregoing embodiment and are not repeated here. In addition, the number of fragments determined according to this method embodiment can be used to fragment the data copies according to step S102 of the foregoing embodiment; the corresponding description is likewise not repeated in this embodiment.
In this embodiment, determining the number of fragments X of the data to be stored includes:
S201. Determine the optimal fragment base Y of the data to be stored according to the number of copies N and the number of storage nodes M, where $Y = C_M^N$;
S202. Obtain the number of fragments X of the data to be stored according to the optimal fragment base Y, where X is equal to or less than the product of the optimal fragment base Y and a coefficient K, and K is an integer greater than or equal to 1.
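Steps S201 and S202 can be sketched as two small functions. The sketch assumes that the optimal fragment base Y is the number of ways of choosing N of the M storage nodes, as the later discussion of the node combinations indicates; the function names are hypothetical.

```python
from math import comb


def optimal_fragment_base(m: int, n: int) -> int:
    """S201: Y is the number of ways to choose n of the m storage nodes."""
    return comb(m, n)


def fragment_count_upper_bound(m: int, n: int, k: int = 1) -> int:
    """S202: the number of fragments X is chosen to be at most Y * k."""
    if k < 1:
        raise ValueError("k must be an integer greater than or equal to 1")
    return optimal_fragment_base(m, n) * k


print(optimal_fragment_base(6, 3))          # 20
print(fragment_count_upper_bound(6, 3, 2))  # 40
```

For 6 storage nodes and 3 copies, Y is 20, so X would be chosen as 20 (K = 1), 40 (K = 2), or a value at or below one of these products.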
As described above, when the number of copies is N and the number of storage nodes is M, each data fragment of the data to be stored must be stored on N of the M nodes. There are $C_M^N$ ways to select N data nodes from M data nodes. To improve data availability, so that as few data fragments as possible are lost when any N nodes fail, each data fragment should as far as possible be stored on a different combination of data nodes. It follows that, when the number of fragments is less than $C_M^N$, the larger the number of fragments, the smaller the maximum amount of data that may be lost when N nodes fail. Specifically, if the number of fragments is X and all data fragments are of equal size, then when $X \le C_M^N$, the maximum amount of data that may be lost when N nodes fail is 1/X of the total data. Therefore, when $X = C_M^N$, the total amount of data that may be lost is smallest, namely $1/C_M^N$ of the total data.
When the number of fragments X is greater than $C_M^N$, 2 or more data fragments are stored on the same N nodes, and 2 or more data fragments may be lost when those N nodes fail. Let P be the integer quotient of X divided by $C_M^N$. When N nodes fail, and all data fragments are of equal size, the maximum amount of data that may be lost is P/X of the total data. It follows that when X is an integer multiple of $C_M^N$, the value of P/X equals $1/C_M^N$, and the total amount of data that may be lost is again at its minimum.
For ease of understanding, take the case in which the number of data nodes is 6, the number of copies is 3, and all data fragments are of equal size, and consider how the maximum possible data loss, as a proportion of all data, changes with the number of fragments when any 3 data nodes fail. In FIG. 5, the abscissa is the number of data fragments X, and the ordinate is the maximum proportion of all data that may be lost when any 3 data nodes fail. Specifically:
If the number of data fragments is 6, the maximum data lost when 3 nodes fail is 1/6 of all data;
If the number of data fragments is 7, the maximum data lost when 3 nodes fail is 1/7 of all data;
......
If the number of data fragments is 20, the maximum data lost when 3 nodes fail is 1/20 of all data;
If the number of data fragments is 21, the maximum data lost when 3 nodes fail is 2/21 of all data;
......
If the number of data fragments is 40, the maximum data lost when 3 nodes fail is 2/40 of all data;
If the number of data fragments is 41, the maximum data lost when 3 nodes fail is 3/41 of all data;
......
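The values listed above can be reproduced with a short sketch. It assumes that fragments are spread as evenly as possible over the node combinations, so that the most heavily loaded combination holds the quotient of X divided by the number of combinations, rounded up; this rounding is what the listed values such as 2/21 and 3/41 imply. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def worst_case_loss(x, m, n):
    """Largest share of all data that can be lost when any n of the m nodes fail."""
    combos = comb(m, n)
    # Fragments on the fullest combination: ceiling of x / combos.
    most_on_one_combination = -(-x // combos)
    return Fraction(most_on_one_combination, x)


for x in (6, 7, 20, 21, 40, 41):
    print(x, worst_case_loss(x, m=6, n=3))
```

Note that `Fraction` reduces its result, so the value for 40 fragments is printed as 1/20 rather than 2/40; the ratio reaches its minimum of 1/20 at both 20 and 40 fragments.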
It follows that, taking $C_M^N$ as the base, when the number of fragments X of the data to be stored equals the product of the optimal fragment base Y and the coefficient K, the maximum amount of data that may be lost when N nodes fail is the smallest proportion of the total data; that is, data availability is highest. Moreover, when the number of data fragments X is less than $C_M^N$ or an integer multiple of $C_M^N$, the closer X is to $C_M^N$ or to that integer multiple of $C_M^N$, the smaller the maximum proportion of the total data that may be lost, and the higher the data availability. Relatively high data availability can therefore also be obtained when X is less than, but close to, the product of the optimal fragment base Y and the coefficient K.
It can be seen that data availability is optimal when the number of data fragments X is $C_M^N$ or an integer multiple (greater than 1) of $C_M^N$. When this optimal value cannot be used, the closer X is, from below, to $C_M^N$ or to an integer multiple (greater than 1) of $C_M^N$, the higher the data availability. Therefore, when determining the number of fragments X, to achieve optimal data availability, $C_M^N$ or an integer multiple (greater than 1) of $C_M^N$ should be selected as X. When, for other reasons, neither $C_M^N$ nor an integer multiple (greater than 1) of $C_M^N$ is selected as X, the closer X is, from below, to $C_M^N$ or to an integer multiple (greater than 1) of $C_M^N$, the more data availability is improved.
Specifically, when considering a number of fragments X that is less than $C_M^N$ or an integer multiple (greater than 1) of $C_M^N$, the final value of X can be determined according to the effect required in the specific scenario in which the present invention is applied. Suppose the desired technical effect is that the maximum proportion of data that may be lost when N nodes fail is less than Q. As explained above, for an integer K greater than or equal to 1, within the interval $((K-1)C_M^N, KC_M^N]$ the maximum proportion of data that may be lost when N nodes fail decreases monotonically, and for a value of X in this interval it equals K/X. For K/X to be less than Q, X must therefore be greater than K/Q. Accordingly, when K/Q is less than or equal to $(K-1)C_M^N$, any X in the interval $((K-1)C_M^N, KC_M^N]$ satisfies the requirement that the maximum proportion of data lost be less than Q; when K/Q is greater than $(K-1)C_M^N$, any X in the interval $(K/Q, KC_M^N]$ satisfies it.
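The selection rule just described can be sketched as a filter over one interval. The sketch assumes that the interval for a given K runs from just above (K - 1) times the number of node combinations up to and including K times that number, and that the worst-case loss ratio inside it is K/X; the function name is hypothetical, and the target Q is passed as an exact fraction to avoid rounding effects.

```python
from fractions import Fraction
from math import comb


def fragment_counts_meeting_target(m, n, k, q):
    """Return the X values in the K-th interval whose worst-case loss ratio K/X is below q."""
    combos = comb(m, n)
    low = (k - 1) * combos + 1   # first X for which some combination holds k fragments
    high = k * combos            # last X of the interval
    return [x for x in range(low, high + 1) if Fraction(k, x) < q]


target = Fraction(1, 10)
print(fragment_counts_meeting_target(6, 3, k=1, q=target))  # 11 through 20
print(fragment_counts_meeting_target(6, 3, k=2, q=target))  # 21 through 40
```

With 6 nodes, 3 copies, and a target of 1/10, K/Q is 10 for K = 1, which lies inside the first interval, so only X from 11 to 20 qualify; for K = 2, K/Q is 20, which does not exceed (K - 1) times 20, so every X from 21 to 40 qualifies.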
Optionally, because a larger number of fragments allows better load balancing when the data to be stored is placed on the data nodes, in one implementation the coefficient K may be determined according to the load balancing requirement of the distributed storage system. The coefficient K is an integer greater than or equal to 1 and determines the multiple of the optimal fragment base. The higher the load balancing requirement for the data to be stored, the larger the value of K. The number of fragments X is equal to or less than the product of the optimal fragment base Y and the coefficient K. When X equals the product of Y and K, the best data availability is obtained and the load balancing requirement of the distributed storage system is met. When, for other reasons, X is not equal to that product, the closer X is to the product of Y and K, the higher the data availability and the better the load balancing requirement is met.
In this embodiment, determining the optimal fragment base Y and then determining the number of fragments from it further improves data availability on top of the advantages of the foregoing embodiment: for the same number of data nodes and the same number of copies, the number of fragments determined from the optimal fragment base achieves optimal or near-optimal data availability. In addition, because the number of fragments so determined is usually greater than the number of nodes, both the load balancing of the distributed system and the efficiency of concurrent recovery when a node fails are improved.
Referring to FIG. 6, a further embodiment of this application provides a distributed storage device 600. The device 600 may be a node deployed in a distributed storage system, or an independent data management apparatus in a distributed storage system. The device 600 includes, but is not limited to, a computer, a server, or a similar device. As shown in FIG. 6, the device 600 includes a processor 601, a memory 602, a transceiver 603, and a bus 604. The transceiver 603 is configured to send data to and receive data from external devices (for example, other nodes in the distributed system or network devices outside the distributed system). The device 600 may contain one or more processors 601. In some embodiments of this application, the processor 601, the memory 602, and the transceiver 603 may be connected by a bus system or in another manner. For the meanings of the terms used in this embodiment and for examples, refer to the foregoing embodiments; details are not repeated here.
The memory 602 may store program code. The processor 601 is configured to call the program code stored in the memory 602 to perform operations S101, S102, and S103 of the foregoing embodiment.
For an understanding of the above operations, refer to the description in the foregoing first method embodiment; details are not repeated here.
Optionally, the processor 601 may further be configured to perform the refinements of, or optional solutions for, the steps in the foregoing first embodiment.
Optionally, in this embodiment, when determining the number of fragments X, the processor 601 may further determine X by performing operations S201 and S202: determine the optimal fragment base Y of the data to be stored according to the number of copies N and the number of storage nodes M, where the optimal fragment base $Y = C_M^N$; and obtain the number of fragments X of the data to be stored according to the optimal fragment base Y, where X is equal to or less than the optimal fragment base Y, or equal to or less than a natural-number multiple of the optimal fragment base Y.
For an understanding of the above steps, refer to the description in the foregoing second embodiment; the above steps may also be extended or refined with reference to the foregoing second embodiment.
It should be noted that the processor 601 may be a single processing element or a collective term for multiple processing elements. For example, the processing element may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of this application, such as one or more digital signal processors (DSP) or one or more field programmable gate arrays (FPGA).
The memory 602 may be a single storage device or a collective term for multiple storage elements, and is used to store executable program code or the parameters, data, and so on required for the operation of the application running apparatus. The memory 602 may include random access memory (RAM) and may also include non-volatile memory, such as disk storage or flash memory (Flash).
The bus 604 may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, the bus is shown as a single thick line in FIG. 6, but this does not mean that there is only one bus or only one type of bus.
The device may further include an input/output apparatus connected to the bus 604, so as to be connected to the processor 601 and other components through the bus. A user can use the input device to carry out the steps of this embodiment that require manual configuration or preset parameters. The input/output apparatus may provide an input interface through which an operator selects control items, or it may be another interface through which other devices can be attached.
Referring to FIG. 7, a further embodiment of this application provides a distributed storage device 700. The device 700 may be a node deployed in a distributed storage system, or an independent data management apparatus in a distributed storage system. The device 700 includes, but is not limited to, a computer, a server, or a similar device. As shown in FIG. 7, the device 700 includes a processor 701, a memory 702, a transceiver 703, and a bus 704. The transceiver 703 is configured to send data to and receive data from external devices (for example, other nodes in the distributed system or network devices outside the distributed system). The device 700 may contain one or more processors 701. In some embodiments of this application, the processor 701, the memory 702, and the transceiver 703 may be connected by a bus system or in another manner. For the meanings of the terms used in this embodiment and for examples, refer to the foregoing embodiments; details are not repeated here.
The memory 702 may store program code. The processor 701 is configured to call the program code stored in the memory 702 to perform operations S201 and S202, thereby determining the number of fragments to use for fragmented storage in the distributed storage system.
For an understanding of the above operations, refer to the description in the foregoing second method embodiment; details are not repeated here.
Optionally, the processor 701 may further be configured to perform the refinements of, or optional solutions for, the steps in the foregoing second embodiment.
It should be noted that the processor 701 may be a single processing element or a collective term for multiple processing elements. For example, the processing element may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of this application, such as one or more digital signal processors (DSP) or one or more field programmable gate arrays (FPGA).
The memory 702 may be a single storage device or a collective term for multiple storage elements, and is used to store executable program code or the parameters, data, and so on required for the operation of the application running apparatus. The memory 702 may include random access memory (RAM) and may also include non-volatile memory, such as disk storage or flash memory (Flash).
The bus 704 may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, the bus is shown as a single thick line in FIG. 7, but this does not mean that there is only one bus or only one type of bus.
The device may further include an input/output apparatus connected to the bus 704, so as to be connected to the processor 701 and other components through the bus. A user can use the input device to carry out the steps of this embodiment that require manual configuration or preset parameters. The input/output apparatus may provide an input interface through which an operator selects control items, or it may be another interface through which other devices can be attached.
FIG. 8 is a schematic structural diagram of a distributed storage system 800 according to an embodiment of the present invention. The distributed storage system 800 includes a client 810, a plurality of hard disks 820, and a distributed storage device 830. The distributed storage device 830 may be the distributed storage device 600 shown in FIG. 6 or the distributed storage device 700 shown in FIG. 7; details are not repeated here.
The hardware entities of the distributed system provided in this embodiment can be understood with reference to the distributed system architecture in FIG. 1 described above. In this embodiment of the present invention, the distributed database engine 102 in FIG. 1 takes the distributed storage device 830 as its hardware entity; accordingly, the hardware entity that carries the data management module 105 improved by the embodiments of the present invention is, in this embodiment, the distributed storage device 830.
The distributed storage device 830 stores or reads the user's data files on the plurality of hard disks 820 according to storage/read requests sent by the user through the client 810.
FIG. 9 is a schematic structural diagram of another distributed storage system 900 according to an embodiment of the present invention. The distributed storage system 900 includes a client 910 and a distributed storage server system 920.
The client 910 can connect to the storage server system 920 through the Internet.
The client 910 can run a client agent program of the distributed storage system, which supports access to the distributed storage system by various types of distributed storage applications. For example, the client agent program can implement personal online storage and backup, enterprise online storage and backup, application online storage, or other emerging storage and backup services.
The distributed storage server system 920 may include a control server 930, an operation and maintenance management (OAM) server 940, a service server 950, a storage resource pool 970, and a storage engine 980. Here, the storage engine 980 may be an example of the distributed storage device of FIG. 6 or FIG. 7.
The hardware apparatus in this embodiment can be understood with reference to the distributed architecture in FIG. 1 described above. The storage engine 980 in this embodiment implements the functions of the distributed database engine 102, and the distributed storage server system 920 further includes other functional servers related to the distributed system, such as the control server 930, the operation and maintenance management server 940, and the service server 950.
The control server 930 is mainly used to control the distributed storage system to perform various storage services, such as organizing the migration, relocation, and backup of data, and eliminating storage hotspots.
The OAM server 940 can provide a configuration interface and an operation and maintenance interface for the storage system, as well as functions such as logging and alarms.
The service server 950 can provide functions such as service identification and authentication, and completes service delivery.
The storage resource pool 970 may be a pool of physical storage nodes, for example, storage servers or storage boards 960. The virtual nodes in the physical storage nodes form a storage logical ring, and a user's data files can be stored on the virtual nodes in the storage resource pool.
The storage engine 980 can provide the logic for the main functions of the distributed storage system. This logic can be deployed on any one of the control server 930, the service server 950, and the OAM server 940, or deployed in a distributed manner across the control server 930, the service server 950, the OAM server 940, and the storage resource pool 970. Therefore, the improvements of the present invention can also be implemented in the foregoing hardware.
FIG. 10 is a schematic structural diagram of a distributed storage device 1000 according to an embodiment of the present invention. The distributed storage device 1000 includes an obtaining unit 1001 and a storage unit 1002.
The obtaining unit 1001 is configured to determine M storage nodes on which to-be-stored data is to be stored, and to obtain N copies of the to-be-stored data, where the N copies include the original data of the to-be-stored data and N-1 backup copies of the original data. Each of the N copies is split into X data shards in the same sharding manner, so that each data shard has N shard copies, and N is less than or equal to M. With reference to the method described in the foregoing first embodiment, the specific or optional ways in which the obtaining unit 1001 obtains the number of data nodes and the copies of the to-be-stored data are not described again in this embodiment.
With reference to the foregoing apparatus embodiments, the obtaining unit 1001 may obtain the data from an external network or from other devices inside the distributed storage system through the transceiver 603 of the distributed storage device shown in FIG. 6. Alternatively, the obtaining unit 1001 may include an input/output device, so that the data can be obtained from settings made by a user. In addition, the obtaining unit 1001 may read a preset value stored in the distributed storage device, thereby obtaining a preset value for the data.
Optionally, in this embodiment, when the obtaining unit 1001 obtains the to-be-stored data and shards the copies, it may determine the quantity of shards X by having the processor 601 of the distributed storage device shown in FIG. 6 invoke the program code stored in the memory 602 to perform the following steps: determine an optimal shard base Y of the to-be-stored data based on the quantity of copies N and the quantity of storage nodes M, where the optimal shard base $Y = C_M^N$; and obtain the quantity of shards X of the to-be-stored data based on the optimal shard base Y, where X is equal to or less than the product of the optimal shard base Y and a coefficient K, and K is an integer greater than or equal to 1.
Optionally, the coefficient K is determined based on the load balancing requirement of the distributed storage system, where K is a natural number, and a larger value of K yields a higher degree of load balancing for the to-be-stored data.
Optionally, the quantity of shards X of the to-be-stored data is determined based on the current load balancing status of the distributed storage system.
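The shard-count rule described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: it assumes that the optimal shard base Y is the binomial coefficient C(M, N), that is, the number of distinct N-node groups among the M storage nodes (as claim 7 suggests), and it returns the largest permitted value X = K × Y. The function name `shard_count` is hypothetical.

```python
from math import comb


def shard_count(m: int, n: int, k: int = 1) -> int:
    """Return X = K * Y, taking Y = C(M, N) as the optimal shard base (assumption)."""
    if not 1 <= n <= m:
        raise ValueError("the quantity of copies N must satisfy 1 <= N <= M")
    if k < 1:
        raise ValueError("the coefficient K must be an integer >= 1")
    y = comb(m, n)  # number of distinct N-node groups among M nodes
    return k * y


print(shard_count(4, 2))     # 6
print(shard_count(4, 2, 3))  # 18
```

Any X up to this value is permitted by the text. Choosing X as an exact multiple of Y lets every N-node group hold the same number of shards.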
The storage unit 1002 is configured to store the to-be-stored data on the M storage nodes of the distributed system. Specifically, the storage policy follows this principle: the N shard copies of each of the X data shards are stored on N respective storage nodes of the M storage nodes, such that the quantity of data shards whose shard copies are stored on the same N storage nodes is P or P+1, where P is the integer quotient of X divided by $C_M^N$.
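The storage policy above can be sketched as follows. This is a hedged illustration rather than the method of the embodiment: it assumes that the divisor is the number of N-node groups, C(M, N), and it assigns shards to groups in round-robin order, which is only one of several ways to choose the groups that hold P+1 shards. The name `place_shards` and the use of integer node indices are hypothetical.

```python
from collections import Counter
from itertools import combinations


def place_shards(x: int, m: int, n: int) -> dict:
    """Map each shard index to the tuple of N node indices that hold its N copies."""
    groups = list(combinations(range(m), n))  # all C(M, N) groups of N distinct nodes
    # Round-robin over the groups: with P, Q = divmod(x, len(groups)),
    # the first Q groups receive P + 1 shards and the remaining groups receive P.
    return {shard: groups[shard % len(groups)] for shard in range(x)}


placement = place_shards(8, 4, 2)
shards_per_group = Counter(placement.values())
copies_per_node = Counter(node for nodes in placement.values() for node in nodes)
print(shards_per_group)
print(copies_per_node)
```

Because every group consists of N distinct nodes, the N copies of a shard never share a node. When X is an exact multiple of C(M, N), every group holds the same number of shards.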
With reference to the foregoing apparatus embodiments, the storage unit 1002 may be implemented by the processor 601 of the distributed storage device shown in FIG. 6 invoking the program code stored in the memory 602.
For an understanding of the foregoing steps, refer to the description in the foregoing first or second embodiment; the steps may also be extended or refined with reference to the foregoing embodiments.
FIG. 11 is a schematic structural diagram of a distributed storage device 1100 according to an embodiment of the present invention. The distributed storage device 1100 includes an obtaining unit 1101 and a determining unit 1102.
The obtaining unit 1101 is configured to obtain the M data nodes on which to-be-stored data is to be stored and the quantity of copies N of the to-be-stored data. With reference to the method described in the foregoing second embodiment, the specific or optional ways in which the obtaining unit 1101 obtains these two values are not described again in this embodiment.
With reference to the foregoing apparatus embodiments, the obtaining unit 1101 may obtain the data from an external network or from other devices inside the distributed storage system through the transceiver 703 of the distributed storage device shown in FIG. 7. Alternatively, the obtaining unit 1101 may include an input/output device, so that the data can be obtained from settings made by a user. In addition, the obtaining unit 1101 may read a preset value stored in the distributed storage device, thereby obtaining a preset value for the data.
The determining unit 1102 is configured to determine the quantity of shards X of the to-be-stored data, where the quantity of shards is the number of shards into which one copy of the to-be-stored data is split. The quantity of shards X is equal to or less than $C_M^N$, or equal to or less than a positive integer multiple of $C_M^N$.
Optionally, the determining unit 1102 is further configured to determine a coefficient K based on the load balancing requirement of the distributed storage system, where K is a positive integer, and a larger value of K yields a higher degree of load balancing for the to-be-stored data; the quantity of shards X is equal to or less than $K \cdot C_M^N$.
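As a worked illustration of this bound, take the hypothetical values M = 4, N = 2, and K = 2, and assume that the bound on X is K times the number of N-node groups, $C_M^N$. The quotient P and remainder Q are those used by the storage policy of the earlier embodiment.

```latex
\begin{align*}
C_M^N &= C_4^2 = \frac{4!}{2!\,2!} = 6, \\
X &\le K \cdot C_M^N = 2 \cdot 6 = 12, \\
X = 12 &:\quad P = 12 \div 6 = 2,\; Q = 0, \\
X = 8 &:\quad P = \lfloor 8 / 6 \rfloor = 1,\; Q = 8 - 1 \cdot 6 = 2.
\end{align*}
```

With X = 12, each of the six node groups holds exactly two shards, so each node holds X·N/M = 6 shard copies. With X = 8, two groups hold two shards and four groups hold one, so the per-node load can differ slightly.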
The determining unit 1102 may determine the quantity of shards X by having the processor 701 of the distributed storage device shown in FIG. 7 invoke the program code stored in the memory 702 to perform the foregoing steps.
With reference to the foregoing apparatus embodiments, the determining unit 1102 may be implemented by the processor 701 of the distributed storage device shown in FIG. 7 invoking the program code stored in the memory 702.
For an understanding of the foregoing steps, refer to the description in the foregoing first or second embodiment; the steps may also be extended or refined with reference to the foregoing embodiments.
A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the particular application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present invention.
A person skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above can be found in the corresponding processes of the foregoing method embodiments, and are not described again here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is merely a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or of another form.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected as actually needed to achieve the purpose of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented as software functional units and sold or used as a standalone product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing is merely specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement readily conceived by a person skilled in the art within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (9)

  1. A data shard storage method, wherein the method comprises:
    determining M storage nodes on which to-be-stored data is to be stored;
    obtaining N copies of the to-be-stored data, wherein the N copies comprise original data of the to-be-stored data and N-1 backup copies of the original data, and splitting each of the N copies into X data shards in the same sharding manner, so that each data shard has N shard copies, wherein N is less than or equal to M; and
    storing the N copies of the to-be-stored data on the M storage nodes, wherein the N shard copies of each of the X data shards are stored on N respective storage nodes of the M storage nodes, such that the quantity of data shards whose shard copies are stored on the same N storage nodes is P or P+1, wherein P is the integer quotient of X divided by $C_M^N$.
  2. The method according to claim 1, wherein the splitting each of the N copies into X data shards in the same sharding manner specifically comprises:
    determining an optimal shard base Y of the to-be-stored data based on the quantity of copies N and the quantity of storage nodes M, wherein the optimal shard base $Y = C_M^N$;
    wherein the quantity of shards X of the to-be-stored data is equal to the product of the optimal shard base Y and a coefficient K, and K is an integer greater than or equal to 1; and
    splitting each of the N copies into X data shards in the same sharding manner.
  3. The method according to claim 1, wherein the splitting each of the N copies into X data shards in the same sharding manner specifically comprises:
    determining an optimal shard base Y of the to-be-stored data based on the quantity of copies N and the quantity of storage nodes M, wherein the optimal shard base $Y = C_M^N$;
    obtaining the quantity of shards X of the to-be-stored data based on the optimal shard base Y, wherein the quantity of shards X is less than the product of the optimal shard base Y and a coefficient K, and K is an integer greater than or equal to 1; and
    splitting each of the N copies into X data shards in the same sharding manner.
  4. The method according to claim 2 or 3, wherein the coefficient K is determined based on a load balancing requirement of the distributed storage system, and a larger value of K yields a higher degree of load balancing for the to-be-stored data.
  5. The method according to any one of claims 1 to 4, wherein obtaining the quantity of copies N of the to-be-stored data specifically comprises:
    determining the quantity of copies N of the to-be-stored data based on a security requirement of the to-be-stored data, wherein a larger value of N satisfies a higher security requirement for the to-be-stored data.
  6. The method according to claim 1, wherein the quantity of data shards X is determined based on a load balancing requirement of the distributed storage system, and a larger value of X yields a higher degree of load balancing for the to-be-stored data.
  7. The method according to any one of claims 1 to 6, wherein the storing the N copies of the to-be-stored data on the M storage nodes specifically comprises:
    determining the $C_M^N$ combinations of data nodes obtained by selecting N data nodes from the M data nodes when the N copies of the to-be-stored data are stored on the M data nodes;
    determining the quotient P and the remainder Q obtained by dividing the quantity of shards X by $C_M^N$; and
    selecting Q of the $C_M^N$ combinations of data nodes to each store P+1 data shards, and using the remaining $C_M^N - Q$ combinations of data nodes to each store P data shards, wherein the N copies of each data shard are stored on the N different data nodes of the combination on which that data shard is to be stored.
  8. A distributed storage device, used in a distributed storage system comprising at least two storage nodes to determine a shard storage policy for to-be-stored data, wherein the device comprises:
    a processor, and a memory connected to the processor;
    wherein the processor invokes instructions stored in the memory to perform the method according to any one of claims 1 to 7.
  9. A distributed storage system, comprising at least two storage nodes and at least one management device, wherein the management device is configured to determine a shard storage policy for to-be-stored data, and the management device comprises:
    a processor, and a memory connected to the processor;
    wherein the processor invokes instructions stored in the memory to perform the method according to any one of claims 1 to 7.
PCT/CN2017/079971 2016-08-10 2017-04-10 Data shard storage method, device and system WO2018028229A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP17838361.8A EP3487149B1 (en) 2016-08-10 2017-04-10 Data shard storage method, device and system
US16/270,048 US10942828B2 (en) 2016-08-10 2019-02-07 Method for storing data shards, apparatus, and system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610659118.4A CN106302702B (en) 2016-08-10 2016-08-10 Data fragment storage method, device and system
CN201610659118.4 2016-08-10

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/270,048 Continuation US10942828B2 (en) 2016-08-10 2019-02-07 Method for storing data shards, apparatus, and system

Publications (1)

Publication Number Publication Date
WO2018028229A1 true WO2018028229A1 (en) 2018-02-15

Family

ID=57670023

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/079971 WO2018028229A1 (en) 2016-08-10 2017-04-10 Data shard storage method, device and system

Country Status (4)

Country Link
US (1) US10942828B2 (en)
EP (1) EP3487149B1 (en)
CN (1) CN106302702B (en)
WO (1) WO2018028229A1 (en)

Also Published As

Publication number Publication date
CN106302702A (en) 2017-01-04
EP3487149B1 (en) 2020-04-29
US10942828B2 (en) 2021-03-09
EP3487149A1 (en) 2019-05-22
CN106302702B (en) 2020-03-20
EP3487149A4 (en) 2019-05-22
US20190171537A1 (en) 2019-06-06

Similar Documents

Publication Publication Date Title
WO2018028229A1 (en) Data shard storage method, device and system
TWI710915B (en) Resource processing method based on internet data center, related devices and communication system
US10735509B2 (en) Systems and methods for synchronizing microservice data stores
US10169163B2 (en) Managing backup operations from a client system to a primary server and secondary server
WO2016197994A1 (en) Capacity expansion method and device
WO2019001017A1 (en) Inter-cluster data migration method and system, server, and computer storage medium
JP7442466B2 (en) Data verification methods and devices, and storage media
US11953997B2 (en) Systems and methods for cross-regional back up of distributed databases on a cloud service
CN108804465B (en) Method and system for data migration of distributed cache database
US10235249B1 (en) System and method for PaaS replication
US20220232073A1 (en) Multichannel virtual internet protocol address affinity
US11194501B2 (en) Standby copies withstand cascading fails
US11086542B1 (en) Network-configurable snapshot load order properties
US10725971B2 (en) Consistent hashing configurations supporting multi-site replication
US11216204B2 (en) Degraded redundant metadata, DRuM, technique
CN116954863A (en) Database scheduling method, device, equipment and storage medium
US11853785B1 (en) Virtual machine cloning and resource configuration
US11567905B2 (en) Method and apparatus for replicating a concurrently accessed shared filesystem between storage clusters
US10712959B2 (en) Method, device and computer program product for storing data
US20180262565A1 (en) Replicating containers in object storage using intents
US8849763B1 (en) Using multiple clients for data backup
US20240201887A1 (en) Storage array aware dynamic slicing of a file system
CN108153614B (en) Database backup and recovery method
CN117093357A (en) Resource scheduling method, device and system of elastic search cluster
CN117851040A (en) Resource integration method for realizing cloud platform computing nodes based on dynamic resource load

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 17838361; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
ENP Entry into the national phase. Ref document number: 2017838361; Country of ref document: EP; Effective date: 20190218