CN107689983B

CN107689983B - Cloud storage system and method based on low repair bandwidth

Info

Publication number: CN107689983B
Application number: CN201710544567.9A
Authority: CN
Inventors: 骆源; 徐亚宁
Original assignee: Shanghai Jiaotong University
Current assignee: Shanghai Jiaotong University
Priority date: 2017-07-05
Filing date: 2017-07-05
Publication date: 2021-02-12
Anticipated expiration: 2037-07-05
Also published as: CN107689983A

Abstract

The invention provides a cloud storage system based on low repair bandwidth, which comprises a data inserting and reading module, a coding preprocessing module, a data file coding module, a data file decoding module, a data file management module, a data storage module and a data repair module, wherein the data inserting and reading module is used for inserting data. A cloud storage method based on low repair bandwidth is also provided, which comprises a system initialization stage, a user data insertion stage, a data file encoding stage, a user data reading stage, a data file decoding stage and a failed node repair stage. By introducing erasure code technology, the redundancy of data storage is reduced while the reliability of data storage is ensured. To address the network congestion caused by the repair process of a failed node, network coding and interference techniques are adopted, and a solution based on mutual information is introduced. The invention makes the reliability of data storage easy to guarantee and, at the same time, relieves network congestion during failed node repair.

Description

Cloud storage system and method based on low repair bandwidth

Technical Field

The invention belongs to the field of data storage, and particularly relates to a cloud storage system and a cloud storage method based on low repair bandwidth, which are used for enhancing storage reliability through data coding and are used for reducing repair bandwidth generated when a node fails.

Background

In recent years, with the rapid development of internet technology and of the information industry as a whole, both personal information and enterprise data have grown explosively. As a result, more and more vendors are launching cloud storage services.

Cloud storage services allow users to store data remotely and share this information conveniently. Although cloud storage brings great convenience to users, a key problem is how to reduce the repair bandwidth of a failed node and relieve the network congestion condition in the node repair process.

In order to reduce the data storage cost, cloud service providers adopt erasure code technology to reduce the redundancy of data storage. The working principle of erasure codes is as follows: the original data of a user is divided into k file blocks, linear coding is applied to generate n blocks of coded data, the n coded blocks are stored in n different nodes, and a receiving end reconstructs the original data by acquiring k' (k' ≥ k) available coded blocks. For MDS codes, k' = k, so MDS codes are a very storage-efficient class of coding schemes and are optimal in terms of the tradeoff between redundancy and reliability. By introducing erasure code technology, storage cost is effectively controlled. When a node fails, the conventional repair scheme is to send the contents of k nodes to the new node. The new node can then reconstruct all the original data and use it to construct the contents of the failed node. This repair scheme is simple, but the traffic it generates in the network is k times the amount of data stored on the failed node, which results in network congestion. The traffic generated during the repair of a failed node is called the repair bandwidth, and the question of how to reduce the repair bandwidth is called the repair problem. In a large-scale distributed storage system, nodes fail very frequently. In order to effectively reduce the network congestion caused by failed nodes, a repair mechanism based on network coding is necessary.
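The cost of the conventional repair scheme can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names and the example parameters (a 600 MB file, n = 9, k = 6) are assumptions chosen for the example, not values from the invention.

```python
def mds_storage_overhead(n: int, k: int) -> float:
    """Stored bytes per byte of user data for an (n, k) MDS code."""
    return n / k


def block_size(file_size: float, k: int) -> float:
    """Amount of data held by one node when the file is split into k blocks."""
    return file_size / k


def conventional_repair_traffic(file_size: float, k: int) -> float:
    """Traffic when k surviving nodes each send their whole block to the new node."""
    return k * block_size(file_size, k)


if __name__ == "__main__":
    file_size = 600.0  # MB, hypothetical
    n, k = 9, 6        # hypothetical code parameters
    lost = block_size(file_size, k)
    traffic = conventional_repair_traffic(file_size, k)
    print("storage overhead:", mds_storage_overhead(n, k))
    print("data lost with one node (MB):", lost)
    print("conventional repair traffic (MB):", traffic)
    print("traffic / lost data:", traffic / lost)  # equals k
```

With these assumed numbers, repairing one 100 MB block moves the whole 600 MB file across the network, which is the factor of k described above.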

According to whether the data of the new node is identical to the data of the failed node, the repair strategies can be divided into 3 types:

Functional repair: the data of the new node is not necessarily identical to the data of the failed node; it only needs to form an MDS code together with the data of the surviving nodes.

Exact repair: the data of the new node is identical to the data of the failed node.

Partially exact repair: a compromise between the two schemes above, in which systematic nodes are repaired exactly and check nodes are repaired functionally.

The erasure code adopted by a storage system is generally an (n, k) systematic erasure code, that is, the data of the first k nodes is not encoded, and the data of the remaining (n-k) nodes is linearly encoded. During system maintenance, it is very important to keep the systematic encoding scheme unchanged, since systematic codes reduce both the delay of the data reconstruction process and the workload of maintaining the coding scheme. Therefore, exact repair has been the focus of academic research.

Network coding is an extension of conventional routing (store and forward) methods. In conventional routing, each intermediate node in the network simply stores and forwards the received information. Instead, network coding allows an intermediate node to generate output data by encoding previously received input data. Currently, there are many coding schemes based on network coding techniques to construct low repair bandwidth. These coding schemes are mostly impractical. In addition to the repair problem, other problems, such as how to optimize the coding scheme to make the codec more efficient, are also considered when actually selecting the coding scheme. It is more practical to design an optimal repair scheme that is applicable to all linear erasure codes than to design a coding scheme directly.

Therefore, how to ensure reliable storage of data, how to select a coding scheme, and how to design an efficient repair mechanism are all important questions. A search of the prior art shows that existing cloud storage systems such as HDFS guarantee data storage reliability through multi-copy technology, which brings high storage cost because of the high data redundancy. Other cloud storage systems that adopt erasure code technology, such as Azure, repair failed nodes by reconstructing the original file; the drawback is that, when the system scale is large, the repair of failed nodes can cause network congestion.

At present, no explanation or report of the similar technology of the invention is found, and similar data at home and abroad are not collected.

Disclosure of Invention

In view of the foregoing defects in the prior art, an object of the present invention is to provide a cloud storage system and method based on low repair bandwidth with high reliability and low repair bandwidth. The cloud storage system and the method have the advantages that on one hand, the erasure code technology is adopted, the data are coded after being blocked, the high reliability and the low storage cost of data storage are guaranteed, and on the other hand, the network coding algorithm is adopted to realize the repair scheme with the low repair bandwidth.

The invention is realized by the following technical scheme.

According to one aspect of the invention, a cloud storage system based on low repair bandwidth is provided, which comprises a data inserting and reading module, an encoding preprocessing module, a data file encoding module, a data file decoding module, a data file management module, a data storage module and a data repair module; wherein:

the data inserting and reading module works on the client and is used for providing an access interface for a user so that the user can send data inserting, reading and deleting commands to the main server;

the coding preprocessing module works on the main server and is used for performing data preprocessing on the original data;

the data file coding module works on the main server and is used for coding the preprocessed data to carry out data insertion;

the data file decoding module works on the main server and is used for decoding and reconstructing the inserted data to carry out data reading;

the data file management module works on the main server and is used for storing metadata of the original data;

the data storage module works in a storage server and comprises a plurality of storage nodes, wherein one part of the storage nodes are used for storing uncoded original data, and the other part of the storage nodes are used for storing coded redundant data;

the data restoration module works on the candidate storage server and is used for restoring the data of the fault node on the data storage module.

Preferably, the data preprocessing comprises: slicing the original data to be inserted into equal pieces, and inserting redundant data for alignment when the data cannot be sliced equally.

Preferably, the data insertion comprises: and performing high-speed encoding operation on the preprocessed data through exclusive OR (XOR) operation, and then distributing the encoded data to each storage node of the data storage module for storage.

Preferably, the data reading comprises: the data read from each storage node of the data storage module is decoded to obtain data before encoding, redundant data used for alignment are removed from the data before encoding to obtain reconstructed original data, and the reconstructed original data are sent to a client.

Preferably, the data storage module adopts an erasure code technology to ensure data storage reliability.

Preferably, the data repair module includes:

a repair strategy making module for determining a repair strategy of the failed node according to the failed node and the surviving node set of the data storage module so as to minimize the repair bandwidth;

a repair operation module for downloading data from the surviving node set according to the repair strategy, computing and restoring the data of the failed node, and storing the restored data on a storage node newly added to the data storage module.

Preferably, the repair strategy employs a mutual information quantity-based repair mechanism.

According to another aspect of the present invention, there is provided a cloud storage method based on low repair bandwidth, including the following steps:

-a system initialization phase: coding parameters are set on the main server, and the data storage module is set to contain n data storage nodes A = {A₁, A₂, …, A_n}, each of which stores its local data independently, wherein the first k data storage nodes {A₁, A₂, …, A_k} are used for storing uncoded original data and the remaining (n-k) data storage nodes are used for storing the encoded redundant data;

-a user insertion data phase: a user sends a data inserting command to a main server through a data inserting and reading module of a client, and sends coded data to a data storage module on a storage server through a data file coding module on the main server;

-a data file encoding stage: the encoding preprocessing module on the main server divides the original data to be inserted into equal pieces, inserting redundant data for alignment when the data cannot be divided equally; the divided data is stored in the first k data storage nodes {A₁, A₂, …, A_k}, (n-k) check data blocks are then generated through linear coding, and the check data blocks are sent to the remaining (n-k) data storage nodes;

-a user read data phase: a user sends a data reading command to a main server through a data inserting and reading module of a client, and sends decoded original data to the client through a data file decoding module on the main server;

-a data file decoding stage: data is first read from each storage node of the data storage module and decoded to obtain the data before encoding; the redundant data used for alignment is then removed from the data before encoding to obtain the reconstructed original data, which is finally sent to the client;

-repair phase of failed node: when a certain storage node of the data storage module fails, the main server actively reads data from other surviving nodes of the data storage module through the data restoration module and restores the data, and stores the restored data in a newly added storage node of the data storage module.

Preferably, the repair stage of the failed node adopts a repair mechanism based on mutual information quantity to repair the failed node.

Compared with the prior art, the invention has the following beneficial effects:

1. in the encoding process, the operation on the finite field is simplified into XOR operation through the representation characteristics and the operational properties of elements on the finite field;

2. the core idea of relative generalized Hamming weight is drawn upon;

3. a repair mechanism based on mutual information quantity is adopted on the basis of network coding, and the repair bandwidth of a fault node is optimal;

4. the repair mechanism of the fault node has wide applicability, and is suitable for any linear erasure code coding scheme;

5. the invention realizes the following functions:

(1) Data insertion: the original data is encoded and then stored on different physical nodes of the system.

(2) Data reconstruction: data is downloaded from different storage nodes of the system and the original data is then restored.

(3) Data maintenance: the data of a failed node is repaired on a new node.

Drawings

Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:

FIG. 1 is a general block diagram of the present invention;

FIG. 2 is an architecture diagram of a host server of the present invention;

FIG. 3 is a diagram of the file encoding process of the present invention;

FIG. 4 is a diagram of the file decoding process of the present invention;

FIG. 5 is a diagram of the failed node repair process of the present invention.

Detailed Description

The following examples illustrate the invention in detail: the embodiment is implemented on the premise of the technical scheme of the invention, and a detailed implementation mode and a specific operation process are given. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention.

Examples

The embodiment provides a cloud storage system based on low repair bandwidth, which includes:

a data insertion and reading module working on a user client;

the system comprises a code preprocessing module, a data file coding module, a data file decoding module and a data file management module which work on a main server;

a data storage module operating in the storage server cluster;

and the fault node repairing module works on the candidate storage server.

Wherein:

the data inserting and reading module is used for sending an inserting and reading data command to the main storage server by a user so as to realize the inserting and reading of the file;

the coding preprocessing module is used for fragmenting and aligning the data;

the data file coding module is used for generating data insertion;

the data file decoding module is used for reconstructing data;

the data file management module is used for managing file storage of a user and providing a related operation interface;

the data repair module is used for repairing the failed node on the new node.

The data inserting and reading module of the client provides an access interface for a user, and the user can insert, read and delete data into the cloud storage system through the module.

The encoding preprocessing module of the main server slices the original data to be inserted into equal pieces, and inserts redundant data for alignment when the data cannot be sliced equally.

The data file coding module of the main server performs high-speed coding operation on the preprocessed data through exclusive OR (XOR) operation, and then distributes the coded data to each storage node of the data storage module for storage.

The data file decoding module of the main server reads data from each storage node, decodes it to obtain the data before encoding, removes the redundant data used for alignment to obtain the reconstructed original data, and finally sends the reconstructed original data to the client.

The data file management module of the main server is used for storing metadata inserted into the data file, such as fileId and fileSize.

When the node fails, the main server actively reads data from other surviving nodes of the data storage module through the data restoration module and restores the data, and the restored data is stored in the storage node newly added to the data storage module.
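The slicing and alignment performed by the encoding preprocessing module, and the removal of the alignment data by the decoding module, can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, zero bytes are used as the redundant data, and the original length (which could be kept alongside metadata such as fileSize) is assumed to be what the decoder uses to strip the padding.

```python
def split_with_padding(data: bytes, k: int) -> tuple[list[bytes], int]:
    """Split data into k equal-length blocks, zero-padding the tail if needed.

    Returns the blocks and the original length (assumed to be stored as metadata).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    block_len = -(-len(data) // k)  # ceiling division
    padded = data + b"\x00" * (block_len * k - len(data))
    blocks = [padded[i * block_len:(i + 1) * block_len] for i in range(k)]
    return blocks, len(data)


def join_and_strip(blocks: list[bytes], original_len: int) -> bytes:
    """Concatenate the blocks and drop the padding added for alignment."""
    return b"".join(blocks)[:original_len]


if __name__ == "__main__":
    blocks, size = split_with_padding(b"hello world", 4)
    print(blocks)                        # four blocks of equal length
    print(join_and_strip(blocks, size))  # the original bytes
```

Keeping the original length separately avoids ambiguity when the user data itself ends in zero bytes.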

The embodiment relates to an erasure code based cloud storage system with low repair bandwidth in a distributed environment, and the working mode of the erasure code based cloud storage system comprises any one or more of the following stages:

In the system initialization stage, coding parameters are set on the main server, and the data storage module is assumed to contain a total of n data storage nodes A = {A₁, A₂, …, A_n}, each of which stores its local data independently, wherein the first k data storage nodes {A₁, A₂, …, A_k} are responsible for storing uncoded original data and the remaining (n-k) data storage nodes are responsible for storing the coded redundant data. An encoding scheme that follows this rule is a systematic code, which simplifies the decoding process.

In the user data insertion stage, when the user sends data to be inserted to the main server through the data inserting and reading module of the client, the data file coding module on the main server sends the coded data to the data storage module on the storage server.

In the data file coding stage, the coding preprocessing module of the main server divides the original data to be inserted into equal pieces, inserting redundant data for alignment when the data cannot be divided equally. The divided data is stored in the first k data storage nodes {A₁, A₂, …, A_k}; (n-k) check data blocks are then generated through linear coding and sent to the remaining (n-k) data storage nodes.

In the user data reading stage, when the user sends a read request to the main server through the data inserting and reading module of the client, the data file decoding module on the main server sends the decoded original data to the client.

In the data file decoding stage, data is first read from the storage nodes and decoded to obtain the data before encoding; the redundant data used for alignment is then removed to obtain the reconstructed original data, which is sent to the client.

In the failed node repair stage, when a storage node of the data storage module fails, the main server actively reads data from the other surviving nodes of the data storage module through the data repair module, repairs the data, and stores the repaired data in the storage node newly added to the data storage module.
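The insertion, reading and repair stages can be illustrated end to end with the simplest possible systematic code, a single XOR check block (n = k + 1). This is a sketch only: the single-parity code and all function names are assumptions made for illustration, and the repair shown is the conventional one that reads every surviving block, not the repair mechanism based on mutual information.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks: list[bytes]) -> list[bytes]:
    """Systematic (k+1, k) code: k uncoded blocks followed by one XOR check block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def repair(stored: list[bytes], failed: int) -> bytes:
    """Rebuild the block of the failed node from all surviving blocks."""
    survivors = [block for i, block in enumerate(stored) if i != failed]
    return xor_blocks(survivors)


if __name__ == "__main__":
    stored = encode([b"ab", b"cd", b"ef"])  # contents of nodes 0..3
    rebuilt = repair(stored, failed=1)      # node 1 fails
    print(rebuilt == b"cd")                 # the lost block is recovered
```

Because every block is the XOR of all the others in this code, the same routine repairs a data node or the check node.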

The following describes the embodiments and specific operation procedures of the present embodiment in detail with reference to specific examples.

I. As shown in fig. 1, the present embodiment includes four parts: the storage server continuously provides data storage and access services, and a server cluster of the storage server comprises a large number of cheap machines; a main server providing encoding and decoding services and a fault node repair service; the client is used for inserting and reading data by a user; and the candidate server is used for replacing the storage server with the failure.

As shown in FIG. 2, the work performed by the main server comprises initialization and encoding/decoding.

II.1. Initialization: this step is completed by the main server in the cloud storage system, and the related algorithm parameters are selected by the system user.

1) The erasure code can be represented by a quadruple (n, k, α, r), where k represents the number of file blocks before encoding, α represents the number of bits contained in each file block, r (r ≥ k) represents the number of file blocks needed to restore the original file, and n represents the number of file blocks after encoding.

2) The coding scheme, i.e. the composition of the generator matrix, is determined. At present, the erasure codes mainly studied for distributed storage systems are array erasure codes, RS-type erasure codes and LDPC codes. Array erasure codes can only correct one-bit data errors, LDPC codes cannot guarantee 100% recovery of the original data, and RS-type erasure codes are MDS codes with strong fault tolerance.

II.2. Insert data: first, a user sends the file to be stored to the main server through the client, and the file data of the user is divided into k file blocks, represented by the set X = (x₁, x₂, ..., x_k), where x_i (1 ≤ i ≤ k) is a file block of α bits (blocks shorter than α bits are padded with 0). By equation (1), the original data set X = (x₁, x₂, ..., x_k) is encoded to generate Y = (y₁, y₂, ..., y_n), which is then sent to the n storage nodes:

$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} g_{1,1} & \cdots & g_{1,k} \\ \vdots & \ddots & \vdots \\ g_{n,1} & \cdots & g_{n,k} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{pmatrix} \qquad (1)$$

The first k rows of the matrix in equation (1) form an identity matrix, and g_{i,j} represents the element in the i-th row and j-th column of the generator matrix.
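As an illustration of systematic encoding with XOR, the sketch below multiplies a binary generator matrix by the file blocks, where addition is bytewise XOR. The 5 × 3 matrix and the function names are hypothetical, and a binary matrix of this kind is not claimed to be MDS or to be the generator matrix used by the invention.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    return bytes(p ^ q for p, q in zip(a, b))


def encode(G: list[list[int]], X: list[bytes]) -> list[bytes]:
    """Compute Y = G * X over GF(2): y_i is the XOR of every x_j with g_ij = 1."""
    alpha = len(X[0])
    Y = []
    for row in G:
        acc = bytes(alpha)  # all-zero block
        for g, x in zip(row, X):
            if g:
                acc = xor_bytes(acc, x)
        Y.append(acc)
    return Y


# Hypothetical systematic generator matrix: identity on top, two check rows below.
G = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],  # y4 = x1 XOR x2
    [0, 1, 1],  # y5 = x2 XOR x3
]

if __name__ == "__main__":
    X = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    Y = encode(G, X)
    print(Y[:3] == X)  # the first k blocks are stored uncoded
    print(Y[3:])       # the two check blocks
```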

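II.3. Read data: first, a user sends a request to read file X to the main server through the client. The main server reads the data (y_{w_1}, y_{w_2}, ..., y_{w_k}) from k storage servers that are working normally, selects the corresponding k rows of the generator matrix, computes the inverse of the resulting matrix, and then obtains the original data file X = (x₁, x₂, ..., x_k) by equation (2):

$$\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{pmatrix} = \begin{pmatrix} g_{w_1,1} & \cdots & g_{w_1,k} \\ \vdots & \ddots & \vdots \\ g_{w_k,1} & \cdots & g_{w_k,k} \end{pmatrix}^{-1} \begin{pmatrix} y_{w_1} \\ y_{w_2} \\ \vdots \\ y_{w_k} \end{pmatrix} \qquad (2)$$

where g_{i,j} represents the element in the i-th row and j-th column of the generator matrix.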
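Reading from any k working nodes amounts to solving a linear system built from the selected rows of the generator matrix. The following is a minimal sketch over GF(2), assuming binary coefficients and using Gauss-Jordan elimination instead of forming the inverse explicitly; the function names and the example rows are hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    return bytes(p ^ q for p, q in zip(a, b))


def decode(G_rows: list[list[int]], Y_rows: list[bytes]) -> list[bytes]:
    """Solve G_w * X = Y_w over GF(2) for X, given k selected rows and blocks."""
    k = len(G_rows[0])
    A = [list(row) for row in G_rows]  # work on copies
    B = list(Y_rows)
    for col in range(k):
        # Find a row at or below the diagonal with a 1 in this column.
        pivot = next((r for r in range(col, len(A)) if A[r][col]), None)
        if pivot is None:
            raise ValueError("selected rows are not invertible")
        A[col], A[pivot] = A[pivot], A[col]
        B[col], B[pivot] = B[pivot], B[col]
        # Clear this column in every other row (row addition is XOR).
        for r in range(len(A)):
            if r != col and A[r][col]:
                A[r] = [a ^ b for a, b in zip(A[r], A[col])]
                B[r] = xor_bytes(B[r], B[col])
    return B[:k]


if __name__ == "__main__":
    # Hypothetical case: the nodes holding x1, x1^x2 and x2^x3 are available.
    G_w = [[1, 0, 0], [1, 1, 0], [0, 1, 1]]
    Y_w = [b"\x01\x02", b"\x11\x22", b"\xef\x20"]
    print(decode(G_w, Y_w))  # recovers x1, x2, x3
```

Elimination is used here because it applies the same row operations to the data blocks directly, so the inverse matrix never has to be stored.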

As shown in FIG. 2, the work of the candidate storage nodes includes repairing the failed node.

Compared with replication, erasure codes provide higher reliability at the same redundancy. At the same time, however, erasure codes suffer from the repair problem. Regenerating codes apply network coding to distributed storage systems, so that the system retains the low redundancy and high reliability of erasure codes while the repair bandwidth is also reduced.

Because network coding allows each data storage node to operate on the data stored on that node, the data stored on each node is expanded into a linear subspace by multiplying it by a (2^α-1) × α matrix G_sub:

[formula]

Therefore, n linear subspaces [formula] can be obtained from the coding scheme of the erasure code, where G_i denotes the generator matrix of node i. To simplify the proof, we define

N = n(2^α-1), K = kα.

The relationship between the encoded data [formula] and the original data X is as follows:

[formula]

where [formula] denotes the transformed generator matrix, Q denotes a data set downloaded from the existing nodes, and the mutual information I(X₁; Q) denotes the amount of information about x₁ that can be derived from Q. The data set [formula] downloaded from the existing nodes represents a scheme for repairing node 1, where [formula] indicates the positions from which data is downloaded. In fact, [formula] is the projection of [formula] onto [formula], denoted [formula]. Similarly, [formula] denotes the projection of [formula] onto [formula], denoted [formula], and can be encoded as [formula], where [formula] and [formula] denotes the i-th column of [formula].

Suppose x is uniformly distributed over the finite field [formula]. Then [formula] is also uniformly distributed over its sample space, so the probability of each value is [formula]. Therefore:

1) for given vectors [formula] and [formula], the probability is [formula];

2) for given X₁ and [formula], let Γ denote the solution set of the system of equations [formula]; it can be derived that [formula], where [formula];

3) from the above, we can deduce [formula].

Theorem 1 (mutual information theorem): [formula], where

1) D₁ denotes the linear space spanned by the column vectors [formula];

2) P_J(D₁) denotes the projection of D₁ onto J;

3) D₂ denotes the linear space spanned by the column vectors [formula];

4) P_J(D₂) denotes the projection of D₂ onto J.

Theorem 2 (minimum repair bandwidth theorem): [formula]

Based on the mutual information theorem, the embodiment designs a backtracking algorithm based on mutual information. In the algorithm, the embodiment constructs a decision tree to describe this breadth-first search. In this search tree, a node represents a candidate index set J and a leaf represents a candidate repair solution. To find the minimum repair bandwidth, the shortest path from the root to a leaf must be found, which is done with a breadth-first search. During the search, the nodes of the tree are visited level by level, starting from the top, and the algorithm can be implemented using a queue and iteration.
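A queue-based breadth-first search of this kind can be sketched as follows. This is an illustrative sketch, not the algorithm listing of the embodiment: it assumes stored symbols are GF(2) linear combinations of the source bits (written as integer bitmasks), it uses the standard fact that for uniformly distributed source bits the mutual information between two sets of linear combinations equals rank(T) + rank(Q) - rank(T ∪ Q), and the node layout and all names are hypothetical. The search is exponential in the worst case.

```python
from collections import deque


def rank(vectors):
    """Rank over GF(2) of vectors given as integer bitmasks."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)  # clears b's leading bit from v when it is set
        if v:
            basis.append(v)
    return len(basis)


def mutual_information(target, downloaded):
    """I(target; downloaded) in bits, assuming uniformly distributed source bits."""
    return rank(target) + rank(downloaded) - rank(list(target) + list(downloaded))


def min_repair_set(target, survivors):
    """Breadth-first search for a smallest index set J that fully determines target."""
    need = rank(target)
    queue = deque([()])  # root of the search tree: nothing downloaded yet
    while queue:
        J = queue.popleft()
        Q = [survivors[i] for i in J]
        if mutual_information(target, Q) == need:
            return J  # first hit is a smallest set, since levels grow by one index
        start = J[-1] + 1 if J else 0
        for i in range(start, len(survivors)):
            queue.append(J + (i,))
    return None


if __name__ == "__main__":
    # Source bits x1..x4 are bit positions 0..3 (hypothetical layout).
    failed = [0b0001, 0b0010]  # failed node held x1 and x2
    survivors = [0b0100, 0b1000,   # node 2: x3, x4
                 0b0101, 0b1010,   # node 3: x1^x3, x2^x4
                 0b0110, 0b1001]   # node 4: x2^x3, x1^x4
    print(min_repair_set(failed, survivors))  # (0, 2, 4)
```

In this assumed layout three downloaded symbols (x3, x1^x3 and x2^x3) suffice, whereas reading two whole surviving nodes would transfer four.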

In this embodiment:

The user uploads data files to the cloud system, downloads previously uploaded data from the cloud system, and deletes uploaded data.

The user interacts with the cloud storage system through the client; the interaction comprises data insertion, data reading and data deletion.

The encoding preprocessing module of the main server first slices the inserted data into equal pieces, and inserts redundant data for alignment when the data cannot be sliced equally; the data file coding module performs high-speed coding on the preprocessed data through XOR operations and then distributes the coded data to the storage nodes for storage; the data file decoding module reads data from the storage nodes, decodes it to obtain the data before encoding, removes the redundant data used for alignment and finally sends the original data to the client; and the data file management module stores the metadata of inserted data files, such as fileId and fileSize.

The data storage server is used for storing all data and provides a data sharing platform from which data can be downloaded and to which data can be uploaded.

The candidate storage server is used for replacing a failed server and repairing the data of the failed node.

The data repair module formulates the failed node repair mechanism through the repair strategy making module and, based on the choice of the system designer, determines whether the conventional repair strategy or the repair strategy based on mutual information is used when repairing a failed node.

The storage server (cluster) comprises the data storage module, provides secure data storage, and ensures the reliability of data storage through erasure code technology.

The main server comprises an encoding preprocessing module, a data file encoding module, a data file decoding module, a data file management module and a data file repair module, and has the functions of initializing system parameters, generating keys and distributing the keys to users. Specifically:

The encoding preprocessing module first slices the inserted data into equal pieces, and inserts redundant data for alignment when the data cannot be sliced equally.

The data file coding module performs high-speed coding on the preprocessed data through XOR operations and then distributes the coded data to the storage nodes for storage.

The data file decoding module reads data from each storage node, decodes it to obtain the data before encoding, removes the redundant data used for alignment, and sends the original data to the client.

The data file management module stores the metadata of inserted data files, such as fileId and fileSize.

The data file repair module actively reads data from the surviving nodes when a node fails, and then repairs the data of the failed node on the node newly added to the system.

The candidate storage server replaces the failed node. Specifically, it comprises:

The repair strategy making module, which determines a repair scheme for the failed node according to the failed node and the set of surviving nodes so as to minimize the repair bandwidth.

The repair operation module, which downloads data from the surviving node set according to the repair strategy and then computes and restores the data of the failed node.

In order to minimize the repair bandwidth in the repair process of the failed node, the embodiment adopts a repair mechanism based on mutual information quantity on the basis of network coding, thereby ensuring that the repair bandwidth is optimal.

The cloud storage system and method based on low repair bandwidth provided by the embodiment are a scheme for introducing a mutual information theory on the basis of network coding. Firstly, by introducing an erasure code technology, the redundancy rate of data storage is reduced while the reliability of the data storage is ensured; meanwhile, aiming at the problem of network congestion caused by the fault node repairing process, a network coding and interference technology is adopted, and a solution of mutual information quantity is introduced on the basis. The reliability of data storage is easy to guarantee in the embodiment, and meanwhile, the congestion degree of the network in the fault node repairing process is relieved.

Compared with the prior art, the embodiment has the following advantages and innovations:

2. the core idea of relative generalized Hamming weight is drawn upon;

The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes and modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention.

Claims

1. A cloud storage system based on low repair bandwidth is characterized by comprising a data inserting and reading module, a coding preprocessing module, a data file coding module, a data file decoding module, a data file management module, a data storage module and a data repair module; wherein:

the data insertion includes: carrying out high-speed coding operation on the preprocessed data through XOR operation, and then distributing the coded data to each storage node of the data storage module for storage;

the data reading includes: decoding data read from each storage node of the data storage module to obtain data before encoding, removing redundant data for alignment from the data before encoding to obtain reconstructed original data, and sending the reconstructed original data to a client;

the data storage module works in a storage server and comprises a plurality of storage nodes, wherein one part of the storage nodes are used for storing uncoded original data, the other part of the storage nodes are used for storing coded redundant data, and the data storage module adopts an erasure code technology to ensure the reliability of data storage;

the data repair module works on a candidate storage server and is used for repairing the data of a failed node of the data storage module, and the data repair module comprises:

a repair operation module for downloading data from the surviving node set according to a repair strategy, computing and restoring the data of the failed node, and storing the restored data on a storage node newly added to the data storage module;

the repair strategy adopts a repair mechanism based on mutual information quantity.

2. The low repair bandwidth based cloud storage system of claim 1, wherein said data preprocessing comprises: slicing the original data to be inserted into equal pieces, and inserting redundant data for alignment when the data cannot be sliced equally.

3. A cloud storage method based on low repair bandwidth, characterized in that, the cloud storage system based on low repair bandwidth of claim 1 is used for cloud storage, and the method includes any one or more of the following steps:

4. The cloud storage method based on low repair bandwidth as claimed in claim 3, wherein the repair phase of the failed node adopts a repair mechanism based on mutual information amount to repair the failed node.