WO2015180038A1

WO2015180038A1 - Partial replica code construction method and device, and data recovery method therefor

Info

Publication number: WO2015180038A1
Application number: PCT/CN2014/078539
Authority: WO
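Inventors: 李挥 (Li Hui); 朱兵 (Zhu Bing); 陈俊 (Chen Jun); 侯韩旭 (Hou Hanxu); 周泰 (Zhou Tai)
Original assignee: 北京大学深圳研究生院 (Peking University Shenzhen Graduate School); 深圳赛思鹏科技发展有限公司 (Shenzhen Saisipeng Technology Development Co., Ltd.)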
Priority date: 2014-05-27
Filing date: 2014-05-27
Publication date: 2015-12-03
Also published as: CN107003933A; CN107003933B

Abstract

A method for constructing a partial replica code. MDS coding is performed on data to obtain β coding blocks (S11), and the coding blocks are numbered sequentially to obtain a set V (S12); the elements of the set V are grouped to obtain β/t groups (S13); according to the grouping of the set V, all blocks satisfying a condition are obtained (S14); the coding blocks corresponding to the obtained blocks are stored on storage nodes, each storage node storing the coding blocks corresponding to one block, and a partial replica code is obtained (S15). The method for constructing a partial replica code, the device implementing the method, and the method for repairing data stored with the partial replica code have the beneficial effect that parameter setting is convenient and flexible.

Description

Method and device for constructing a partial replica code, and method for repairing data thereof

Technical field

The present invention relates to the field of network storage, and more particularly to a method and a device for constructing a partial replica code, and to a method for repairing data stored with such a code.

Background art

With the rapid development of computer technology and the Internet, the amount of network data is exploding. Big data poses a serious challenge to existing storage systems, and systems that store massive amounts of data efficiently have become increasingly important. Distributed storage systems, with their high scalability and high availability, are currently an effective way to store massive data. In large-scale distributed storage systems, however, storage nodes are unreliable because of sudden power outages and similar events. To provide reliable storage services on unreliable storage nodes, redundancy is usually introduced into the storage system. The most direct way to introduce redundancy is to back up (replicate) the original data. Although replication is simple, its storage efficiency is low; for the same redundancy, coding techniques can greatly improve storage efficiency. Current storage systems generally use MDS (Maximum Distance Separable) codes, which achieve optimal storage efficiency. An MDS code with parameters (n, k) divides the original file into k equal-sized modules and encodes them into n mutually independent coded modules, which are stored on n different nodes such that the data stored on any k nodes suffices to reconstruct the original file. This property is referred to as the MDS property. Coding of this kind plays an important role in providing efficient redundancy for network storage and is especially suitable for large-file storage and archival backup applications.
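The MDS property can be illustrated with the simplest MDS code, a single-parity code with (n, k) = (k + 1, k), in which any k of the k + 1 coded symbols rebuild the data. The sketch below is illustrative only: the prime field GF(257), the function names and the parity construction are assumptions chosen for brevity; practical systems use RS codes, which tolerate more than one loss.

```python
# Minimal single-parity MDS sketch over the prime field GF(P); P = 257 is an arbitrary choice.
P = 257


def encode(data):
    """Return k + 1 coded symbols: the k data symbols followed by their sum mod P."""
    return list(data) + [sum(data) % P]


def reconstruct(received, k):
    """Recover the k data symbols from any k of the k + 1 coded symbols.

    `received` maps a coded-symbol index (0..k, index k is the parity) to its value.
    """
    if len(received) < k:
        raise ValueError("need at least k coded symbols")
    missing = [i for i in range(k) if i not in received]
    if not missing:
        return [received[i] for i in range(k)]  # all systematic symbols present
    # With at least k of k + 1 symbols, at most one data symbol is missing and the
    # parity symbol is present, so the lost symbol is parity minus the other symbols.
    (lost,) = missing
    data = [received.get(i, 0) for i in range(k)]
    data[lost] = (received[k] - sum(data)) % P
    return data


if __name__ == "__main__":
    coded = encode([10, 20, 30, 40, 50])
    for lost in range(6):
        subset = {i: c for i, c in enumerate(coded) if i != lost}
        assert reconstruct(subset, k=5) == [10, 20, 30, 40, 50]
```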

In a distributed storage system, a file of size B is usually stored on n storage nodes, each node storing data of size α. A data receiver only needs to connect to any k of the n storage nodes and download their data to recover the original data of size B; this is called the data reconstruction process. The RS (Reed-Solomon) code is a typical code that satisfies the MDS property. When a storage node fails, the data it stored must be recovered and stored on a new node in order to maintain the redundancy of the storage system; this is called the repair process. In the repair process, however, an RS code must first download the data of k storage nodes and recover the original file, and then encode the data of the failed node for the newly introduced node. Decoding the entire original data just to recover the data of one storage node clearly wastes network bandwidth. Moreover, because of node failures and file loss, the redundancy of the system gradually decreases over time, so a mechanism is needed to maintain system redundancy. On this basis, erasure codes (EC) were proposed, which effectively reduce system storage overhead; however, the communication overhead required to restore redundancy is still large. With an EC code, repair first downloads data from k storage nodes and reconstructs the original file, then re-encodes the file and stores the new module on the new node. This process shows that the network load required to repair any failed node is at least the content stored on k nodes.

To reduce the bandwidth used in the repair process, regenerating codes (RGC) were proposed based on ideas from network coding theory; RGC codes also satisfy the MDS property. In the traditional regenerating-code repair process, the replacement node connects to x of the remaining available storage nodes and downloads data of size y from each of them, so the repair bandwidth of an RGC code is xy. For functional repair of RGC codes, two classes of optimal codes have been identified: minimum storage regenerating (MSR) codes and minimum bandwidth regenerating (MBR) codes. An RGC code does not need to reconstruct the source file during repair, and its repair bandwidth is lower than that of an RS code.

However, the repair process of regenerating codes is computationally complex and usually involves a large number of finite field operations: the nodes participating in repair must perform random linear network coding on the data they store. Specifically, a participating node reads its stored data blocks, performs a specific linear operation, and passes the combined data block to the replacement node. To ensure that all coded packets are mutually independent, RGC operations must be carried out over a large finite field. Since node read/write bandwidth is lower than network bandwidth in practical systems, read/write bandwidth can easily become the system performance bottleneck. To reduce the computational complexity of repair, the fractional repetition (FR) code, referred to in this document as a partial replica code, was proposed on the basis of the MBR code; FR codes provide exact and efficient repair. In general, an FR code consists of two parts: an outer MDS code and an inner replica code. After the data blocks are MDS-encoded, each output coding block is replicated an integer number of times and the copies are distributed to the storage nodes. When a node fails, repair is done by downloading data directly from other nodes and storing it on the replacement node, with no additional computation. Compared with traditional RS codes and regenerating codes, FR codes greatly speed up node failure repair and correspondingly reduce repair time. Since the construction of MDS codes is a relatively mature technology, the difficulty of constructing partial replica codes lies in designing the inner replica code. Existing partial replica codes are generally constructed from finite geometry and related structures, such as regular graphs, finite projective planes and orthogonal Latin squares. These abstract geometric constructions are relatively complicated and restrict the choice of parameters, which increases the design complexity of partial replica codes.

Summary of the invention

The technical problem to be solved by the present invention is, in view of the defects of the prior art (long construction time, inconvenient parameter setting and large system overhead), to provide a method and a device for constructing a partial replica code with short construction time, convenient parameter setting and low system overhead, and a method for repairing the data.

The technical solution adopted by the present invention to solve this technical problem is as follows: a method for constructing a partial replica code is provided, comprising the following steps:

A) dividing the data to be stored equally into α parts and performing MDS coding with parameters (β, α) to obtain β coding blocks;

B) obtaining setting parameters, including the number t of elements in each group and the number s of coding blocks stored in each storage node, and numbering the β coding blocks sequentially as the elements of a set V, obtaining the set V;

C) grouping the elements of the set V to obtain β/t groups, the element numbers within a group being distinct;

D) obtaining all blocks of the set V from the grouping obtained above, and selecting among them according to the setting parameters to obtain n selected blocks, where a block is a set whose elements all come from different groups, and the n blocks together contain the β coding blocks, each replicated f times;

E) storing the coding blocks corresponding to the selected blocks on storage nodes, each storage node storing the coding blocks corresponding to one selected block;

where α, β, t, s and f are all positive integers, and β is divisible by t.

Further, in step C), all the obtained groups constitute a set G, and the set G is a partition of the set V.

Further, the blocks obtained in step D) are such that every element of the set V appears in exactly f blocks.

Further, all blocks have the same size, and all groups have the same capacity.

Further, in step B), the replication factor f of the coding blocks is obtained according to $f = (\beta - t)/(s - 1)$, and the number n of storage nodes is obtained according to $n = \beta(\beta - t)/(s(s - 1))$.
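The parameter relations of step B) can be checked mechanically. The sketch below (the function name `gd_parameters` is hypothetical) computes f and n from β, t and s and rejects inputs for which β is not divisible by t or the formulas do not give integers. Integral values are only a necessary condition; they do not by themselves guarantee that a suitable design exists.

```python
def gd_parameters(beta, t, s):
    """Return (f, n) from step B): f = (beta - t)/(s - 1), n = beta(beta - t)/(s(s - 1)).

    beta: number of coding blocks, t: elements per group, s: coding blocks per node.
    """
    if t < 1 or beta % t != 0:
        raise ValueError("beta must be divisible by t")
    if s < 2:
        raise ValueError("s must be at least 2")
    f, f_rem = divmod(beta - t, s - 1)
    n, n_rem = divmod(beta * (beta - t), s * (s - 1))
    if f_rem or n_rem:
        raise ValueError("parameters do not give integer f and n")
    return f, n


# Parameter sets taken from the examples in this embodiment.
print(gd_parameters(6, 2, 3))  # (2, 4)
print(gd_parameters(8, 2, 3))  # (3, 8)
print(gd_parameters(9, 3, 3))  # (3, 9)
print(gd_parameters(4, 2, 2))  # (2, 4)
```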

Further, step D) further includes the following steps:

D1) dividing all obtained blocks into p parallel classes, where several blocks form a parallel class if their elements are together exactly all elements of the set V and no two of the blocks share an element;

D2) arbitrarily selecting f of the p parallel classes to obtain the selected blocks, where f is less than or equal to p, and the selected parallel classes contain the n blocks.

The invention also relates to an apparatus for implementing the above method, comprising:

The coding block acquisition module: configured to divide the data to be stored into α parts and perform MDS coding with parameters (β, α) to obtain β coding blocks;

The set V construction module: configured to obtain setting parameters, including the number t of elements in each group and the number s of coding blocks stored in each storage node, and to number the β coding blocks sequentially as the elements of the set V, obtaining the set V;

The grouping module: configured to group the elements of the set V to obtain β/t groups, the element numbers within a group being distinct;

The block construction module: configured to obtain all blocks of the set V from the obtained grouping and select among them according to the setting parameters, obtaining n selected blocks, where a block is a set whose elements all come from different groups, and the n blocks together contain the β coding blocks, each replicated f times;

The data storage module: configured to store the coding blocks corresponding to the selected blocks on the storage nodes, each storage node storing the coding blocks corresponding to one selected block.

Further, the block construction module further includes:

A parallel class division unit: configured to divide all obtained blocks into p parallel classes, where several blocks form a parallel class if their elements are together exactly all elements of the set V and no two of the blocks share an element;

A parallel class selection unit: configured to arbitrarily select f of the p parallel classes to obtain the selected blocks, where f is less than or equal to p, and the selected parallel classes contain the n blocks.

The present invention also relates to a method for repairing data stored using the above method, including the following steps:

M) obtaining the repair table and looking up the repair scheme using the number of the failed node as an index;

N) downloading the data of the nodes indicated by the repair table to obtain the data of the replacement node, thereby generating the replacement node.

Further, the repair table is stored in the system metadata of a tracker server in the storage system, and the repair table contains at least one repair scheme for each node.
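A repair table of the kind used in steps M) and N) could be built offline from the node layout. The sketch below is one possible realisation, not the format prescribed here: the helper names, the 0-based node indexing, and the layout (assumed to be the 8-point example listed later in this description, with node N_i storing the i-th listed block) are all assumptions. A scheme names, for each block of the failed node, one surviving node that holds a copy of it.

```python
from itertools import product

# Assumed layout: index i holds the coding-block numbers of node N_(i+1).
NODES = [{1, 3, 5}, {2, 4, 6}, {1, 4, 7}, {2, 3, 8},
         {1, 6, 8}, {2, 5, 7}, {3, 6, 7}, {4, 5, 8}]


def build_repair_table(nodes):
    """Map each node index to its repair schemes (one helper-node index per stored block)."""
    table = {}
    for failed, blocks in enumerate(nodes):
        # For each block on the failed node (sorted order), the other nodes holding a copy.
        holders = [[j for j, other in enumerate(nodes) if j != failed and b in other]
                   for b in sorted(blocks)]
        table[failed] = [tuple(choice) for choice in product(*holders)]
    return table


def repair_plan(nodes, table, failed, scheme=0):
    """Step M): look up a scheme by failed-node index; step N): which block to copy from where."""
    helpers = table[failed][scheme]
    return dict(zip(sorted(nodes[failed]), helpers))


if __name__ == "__main__":
    table = build_repair_table(NODES)
    # Node N_8 (index 7) stores blocks 4, 5, 8; helpers N_2, N_6, N_4 are indices 1, 5, 3.
    idx = table[7].index((1, 5, 3))
    print(repair_plan(NODES, table, 7, idx))  # {4: 1, 5: 5, 8: 3}
```

Building the table is a one-off cost; each lookup at failure time is then a dictionary access, which matches the idea of keeping the table in tracker-server metadata.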

The method and device for constructing a partial replica code and the data repair method of the present invention have the following beneficial effects: because in this embodiment the inner replica code of the partial replica code is constructed by group divisible design, parameter setting is more convenient and flexible while construction time remains short and system overhead small, which gives great flexibility when the code is applied to different storage systems.

DRAWINGS

Figure 1 is a flow chart of constructing a partial replica code in an embodiment of the method, device and data repair method for a partial replica code according to the present invention;

Figure 2 is a schematic diagram of one construction of a partial replica code in the embodiment;

Figure 3 is a schematic diagram of another construction in the embodiment;

Figure 4 is a schematic diagram of a device for constructing a partial replica code in the embodiment;

Figure 5 is a schematic diagram of the relationship between data repair selectivity and node storage capacity in the embodiment;

Figure 6 compares the repair times of various codes under one set of storage parameters in the embodiment;

Figure 7 compares the repair times of various codes under another set of storage parameters in the embodiment.

Detailed description

The embodiments of the present invention will be further described below in conjunction with the accompanying drawings.

As shown in FIG. 1, in the method, device and data repair method for a partial replica code according to the present invention, constructing the partial replica code includes the following steps:

Step S11, performing MDS encoding on the data to obtain coding blocks: in this step, the data to be stored on the network, usually a file, is equally divided into α parts, and MDS encoding with parameters (β, α) is performed to obtain β coding blocks. Since MDS coding is a mature technology, it is not described in detail here.

Step S12, processing the obtained coding blocks according to the setting parameters to obtain the set V: in this step, the previously set parameters are obtained. These parameters concern not only this step but also the grouping, block construction and block selection in subsequent steps. They include the number t of elements in each group, the number s of coding blocks stored in each storage node, the number n of storage nodes, the replication factor f of the coding blocks, and so on. In this embodiment, α, β, t, s and f are all positive integers, and β is divisible by t. After these parameters are obtained, the coding blocks obtained in the previous step are numbered sequentially and used as the elements of the set V. Thus every element of V has a unique number, and V contains β coding blocks in total. In this embodiment, whenever a set is discussed, its elements are represented by their numbers, without regard to the specific content of the elements.

Step S13, grouping the set V: in this step, the set V obtained above is grouped; each group contains t elements, and no element (that is, no element number) appears in more than one group. In other words, the β elements (i.e. coding blocks) are divided into groups of t elements each, giving β/t groups. All the obtained groups constitute a set G; every group has the same capacity, and G is a partition of the set V.
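Step S13 can be sketched as partitioning the numbers 1..β into consecutive runs of t. Consecutive grouping is only one convenient choice assumed here (any partition into β/t groups of size t gives an isomorphic structure), and `make_groups` is a hypothetical name.

```python
def make_groups(beta, t):
    """Partition coding-block numbers 1..beta into beta // t groups of t consecutive numbers."""
    if t < 1 or beta % t != 0:
        raise ValueError("beta must be divisible by t")
    return [list(range(g * t + 1, (g + 1) * t + 1)) for g in range(beta // t)]


groups = make_groups(8, 2)
print(groups)  # [[1, 2], [3, 4], [5, 6], [7, 8]]
# The groups form a partition of V: every number appears in exactly one group.
assert sorted(x for g in groups for x in g) == list(range(1, 9))
```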

Step S14, obtaining the blocks and selecting among them according to the setting parameters: since the groups were obtained in the previous step, the blocks are constructed on the basis of these groups. A block is a set whose elements all come from the set V; each block contains s elements, and no two elements of a block belong to the same group. In other words, in this embodiment the set V is first divided once by grouping, and on the basis of this division the blocks are then formed according to the definition above, so that the elements of a block always come from different groups of V. For example, let the set V contain 4 coding blocks numbered 1, 2, 3 and 4, divided into the two groups (1, 2) and (3, 4); the blocks may then be (1, 3), (2, 4), (1, 4) and (2, 3). If all of them are selected in this step, they can be stored on 4 storage nodes, each storing 2 coding blocks, and the replication factor of every element of V is 2, because these blocks contain two copies each of coding blocks 1, 2, 3 and 4. As this example shows, the setting parameters in this embodiment can be chosen from a relatively wide range. When the parameters are chosen appropriately, the resulting blocks may form parallel classes: in general, if the elements of several pairwise disjoint blocks are together exactly all elements of the set V, these blocks form a parallel class. All blocks obtained for a set may then be divided into p parallel classes. In this case, this step can first obtain all blocks of the p parallel classes of the set V and then select f of the p parallel classes to realize the block selection. For example, if each parallel class contains 3 blocks and all blocks form 3 parallel classes, selecting two parallel classes selects 6 of the 9 blocks. Here f is not greater than p, and f can either be set directly or calculated.
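The block condition of step S14 (s elements, no two from the same group) can be enumerated directly. The sketch below, with the hypothetical helper `candidate_blocks`, lists every such s-element set; for the 4-element example it returns exactly the four blocks above. For larger parameters the candidates usually outnumber the blocks actually used (for three groups of size 2 and s = 3 there are 8 candidates, whereas the GD(3, 1, 2; 6) design used later in this description has 4 blocks), so the selection step is still needed.

```python
from collections import Counter
from itertools import combinations, product


def candidate_blocks(groups, s):
    """All s-element blocks that take at most one element from each group."""
    blocks = []
    for chosen in combinations(groups, s):  # pick s distinct groups
        for elems in product(*chosen):      # one element from each chosen group
            blocks.append(tuple(sorted(elems)))
    return blocks


blocks = candidate_blocks([[1, 2], [3, 4]], s=2)
print(blocks)  # [(1, 3), (1, 4), (2, 3), (2, 4)]
# Selecting all four blocks stores every coding block twice (replication factor 2).
print(Counter(x for b in blocks for x in b))
```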

It is worth mentioning that in this embodiment the parameters used may all be given or set, or some of them may be calculated from the others. For example, the replication factor of the coding blocks can be obtained as $f = (\beta - t)/(s - 1)$, and the number of storage nodes as $n = \beta(\beta - t)/(s(s - 1))$.

Step S15, assigning the selected blocks to the storage nodes: in this step, the selected blocks are stored on the storage nodes, one block per node. For example, if two parallel classes with a total of six blocks were selected in the previous step, the coding blocks represented by these six blocks are stored on six storage nodes. The amount of data stored on a storage node is the amount of data included in (or pointed to by) one block: for example, if a block contains two coding-block numbers, it includes (points to) two coding blocks, so the storage node holds the data of those two coding blocks.
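Step S15 amounts to a mapping from selected blocks to node contents. In the sketch below, `place_blocks` is a hypothetical helper and the payloads are placeholder byte strings standing in for real coding blocks; each node ends up holding exactly the s coding blocks named by its block.

```python
def place_blocks(selected_blocks, coded):
    """Node i stores {block number: payload} for every number in selected_blocks[i]."""
    return [{b: coded[b] for b in block} for block in selected_blocks]


# Placeholder payloads for coding blocks 1..4 of the four-block example.
coded = {i: f"Y{i}".encode() for i in range(1, 5)}
nodes = place_blocks([(1, 3), (2, 4), (1, 4), (2, 3)], coded)
print(nodes[0])  # {1: b'Y1', 3: b'Y3'}
# Total stored blocks: n * s = 4 * 2 equals f * beta = 2 * 4.
print(sum(len(node) for node in nodes))  # 8
```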

In the above steps, α, β, t, s and f are all positive integers, and β is divisible by t. In this embodiment, a distributed storage system is generally described by (n, k, d), where n is the total number of nodes in the storage system, k is the minimum number of nodes needed to reconstruct the original file, and d is the number of available nodes needed to repair a failed node, with $1 \le k \le n$ and $1 \le d \le n - 1$. Research on MDS codes is relatively mature, and MDS codes exist for almost any admissible parameters, so the difficulty of constructing a partial replica code lies in designing the inner replica code. In essence, an FR code is an arrangement in which every data block is replicated f times across the nodes, with the copies of each data block stored on different nodes.

A partial replica code $C = (\Omega, M)$ with replication factor f for a distributed storage system with parameters (n, k, d) is a collection of n subsets $M = \{M_1, \ldots, M_n\}$ whose elements are all drawn from the symbol set $\Omega = \{1, \ldots, \theta\}$, such that:

(1) each subset has size d;

(2) each element of Ω belongs to exactly f subsets.

In the above definition, the elements of each subset are the subscripts of the MDS-encoded data blocks stored on the corresponding node $N_i$ ($i = 1, \ldots, n$). Thus each subset corresponds to one storage node, all data blocks are distributed over n different nodes, and each node has a storage capacity of α.
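The two defining conditions of an FR code can be verified directly on a layout. The sketch below (hypothetical helper `is_fr_code`) checks them for the Figure 2 layout, which is assumed to consist of the blocks of the GD(3, 1, 2; 6) example given later in this description, with node N_i storing the i-th listed block. The identity nd = θf also holds, because both sides count the stored block copies.

```python
from collections import Counter


def is_fr_code(subsets, theta, d, f):
    """Check: (1) every subset has d symbols from {1..theta}; (2) every symbol is in exactly f subsets."""
    symbols = set(range(1, theta + 1))
    if any(len(m) != d or not set(m) <= symbols for m in subsets):
        return False
    counts = Counter(x for m in subsets for x in m)
    return all(counts[x] == f for x in symbols)


figure2 = [{1, 3, 5}, {2, 3, 6}, {1, 4, 6}, {2, 4, 5}]
print(is_fr_code(figure2, theta=6, d=3, f=2))  # True
print(len(figure2) * 3 == 6 * 2)                # n*d == theta*f, prints True
```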

Assume that $X = (x_1, \ldots, x_5) \in \mathbb{F}_q^5$ represents a file containing 5 data blocks, where $\mathbb{F}_q$ denotes a finite field of size q. After MDS encoding with parameters (6, 5), 6 coding blocks $Y_1, \ldots, Y_6$ are output, where $Y_i = x_i$ for $i = 1, \ldots, 5$ and $Y_6 = \sum_{i=1}^{5} x_i$. Each output coding block is replicated twice and the resulting blocks are stored on 4 nodes, as shown in Figure 2, where the number in each box is the subscript of a coding block. For example, the three data blocks stored by node $N_1$ are, in turn, $Y_1$, $Y_3$ and $Y_5$. The data stored by any two nodes suffices to reconstruct the original file, so k = 2. When a node fails, its data can be downloaded from the other three nodes, so d = 3.

Let v and λ be given positive integers, and let S and T be given sets of positive integers. Let $D = (V, \mathcal{G}, A)$ be a finite incidence structure, where V is a set of v elements and $\mathcal{G}$ is a partition of V. The elements of V are called points, the elements of A blocks, and the elements of $\mathcal{G}$ groups. Suppose the following conditions hold:

(1) $|B| \in S$ for every $B \in A$;

(2) $|G| \in T$ for every $G \in \mathcal{G}$;

(3) $|B \cap G| \le 1$ for every $B \in A$ and $G \in \mathcal{G}$;

(4) every pair of elements of V belonging to different groups is contained in exactly λ blocks.

Then D is called a group divisible design (GD design), denoted GD(S, λ, T; v). If all groups have the same capacity and all blocks have the same size, i.e. S = {s} and T = {t}, then GD({s}, λ, {t}; v) is abbreviated as GD(s, λ, t; v) and called a uniform group divisible design. If $\mathcal{G}$ contains $u_i$ groups of capacity $t_i$ and $v = \sum_i u_i t_i$, then D is called a GD design of type $t_1^{u_1} t_2^{u_2} \cdots$.
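Conditions (1) to (4) translate directly into a brute-force checker. The sketch below, with the hypothetical function `is_gd_design`, is meant only for small examples such as the GD(3, 1, 2; 6) and GD(3, 1, 2; 8) designs in this embodiment; its running time grows with the number of point pairs times the number of blocks.

```python
from itertools import combinations


def is_gd_design(points, groups, blocks, S, lam, T):
    """Check conditions (1)-(4) of GD(S, lam, T; v) by brute force.

    points: set V; groups: list of sets partitioning V; blocks: list of sets;
    S, T: allowed block sizes and group sizes; lam: required pair multiplicity.
    """
    # The groups must partition V and every block must lie inside V.
    if sorted(x for g in groups for x in g) != sorted(points):
        return False
    if any(not b <= points for b in blocks):
        return False
    group_of = {x: i for i, g in enumerate(groups) for x in g}
    if any(len(b) not in S for b in blocks):                 # condition (1)
        return False
    if any(len(g) not in T for g in groups):                 # condition (2)
        return False
    if any(len(b & g) > 1 for b in blocks for g in groups):  # condition (3)
        return False
    for x, y in combinations(sorted(points), 2):             # condition (4)
        if group_of[x] != group_of[y]:
            if sum(1 for b in blocks if x in b and y in b) != lam:
                return False
    return True


V6 = set(range(1, 7))
G6 = [{1, 2}, {3, 4}, {5, 6}]
A6 = [{1, 3, 5}, {2, 3, 6}, {1, 4, 6}, {2, 4, 5}]
print(is_gd_design(V6, G6, A6, S={3}, lam=1, T={2}))  # True
```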

Let $D = (V, \mathcal{G}, A)$ be a GD design and $P \subseteq A$. If every point of V is incident with exactly one block of P, P is called a parallel class. If all blocks of a GD(s, λ, t; v) can be partitioned into parallel classes, the design is called resolvable. When v = st, GD(s, λ, t; st) is called a λ-fold transversal design, denoted $TD_\lambda(s, t)$ and referred to as a TD design. For λ = 1, the existence of a transversal design TD(s, t) is equivalent to the existence of s − 2 mutually orthogonal Latin squares of order t. If each group contains only one point, i.e. t = 1, a GD design with λ = 1 is equivalent to a Steiner system. Thus, although a Steiner system is a special GD design, not every GD design is a Steiner system. In a uniform GD(s, λ, t; v), each point of V belongs to the same number r of blocks (called the replication number of the design), which satisfies $r = \lambda(v - t)/(s - 1)$. Letting b denote the total number of blocks of the design, $b = \lambda v(v - t)/(s(s - 1))$. For example, let V = {1, 2, ..., 6} with the three equal-sized groups {1, 2}, {3, 4} and {5, 6}; the blocks generated are {1, 3, 5}, {2, 3, 6}, {1, 4, 6} and {2, 4, 5}. Then $(V, \mathcal{G}, A)$ is a uniform GD(3, 1, 2; 6) in which every point belongs to two different blocks, so r = 2 and b = 4. To construct FR codes in which every node has the same storage capacity, λ = 1 is taken in the GD design, so that any pair of points from different groups is contained in exactly one block, and a uniform GD design is used. A GD design may have several isomorphic forms; this embodiment considers only one specific design (corresponding to one specific grouping), and the construction applies equally to all isomorphic designs.
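The two parameter relations follow from standard double counting. Fix a point x: each of the v − t points outside its group lies with x in exactly λ blocks, and by condition (3) each of the r blocks through x contributes s − 1 such points. Counting (point, block) incidences in two ways gives the second relation. Applied to the GD(3, 1, 2; 6) example:

```latex
\begin{aligned}
r\,(s-1) &= \lambda\,(v-t) &&\Longrightarrow\quad r = \frac{\lambda\,(v-t)}{s-1},\\
b\,s &= v\,r &&\Longrightarrow\quad b = \frac{v\,r}{s} = \frac{\lambda\,v\,(v-t)}{s\,(s-1)},\\
r &= \frac{1\cdot(6-2)}{3-1} = 2, && b = \frac{6\cdot 2}{3} = 4 .
\end{aligned}
```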

The GDDFRC construction takes a given uniform GD(s, 1, t; v) with t ≥ 2, whose blocks are $A = \{B_1, \ldots, B_b\}$. An FR code C = (V, A) can then be generated, with parameters $\theta = v$ and $f = (v - t)/(s - 1)$. The corresponding storage system has $n = v(v - t)/(s(s - 1))$ nodes, and each node stores α = s data blocks; the replication factor f and the number of nodes n follow from the equations above. Using the uniform GD(3, 1, 2; 6) design above, the constructed FR code is the one shown in Figure 2. This system can tolerate one node failure with exact repair that requires no encoding. If nodes $N_1$ and $N_2$ fail simultaneously, however, the original file must be reconstructed to obtain coding block $Y_3$. In general, an FR code with replication factor f can tolerate f − 1 node failures without losing the exact, encoding-free repair property, because every data block then still has at least one copy in the system.
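The claims about the Figure 2 code (k = 2, d = 3, tolerance of one failure, and the loss of Y_3 when N_1 and N_2 fail together) can be checked by brute force. The sketch assumes that node N_i stores the i-th block of the GD(3, 1, 2; 6) example (list index i − 1) and that the outer (6, 5) MDS code needs any 5 distinct coding blocks; the helper names are hypothetical.

```python
from itertools import combinations

NODES = [{1, 3, 5}, {2, 3, 6}, {1, 4, 6}, {2, 4, 5}]  # index i = node N_(i+1)
MDS_K = 5  # any 5 distinct coding blocks of the (6, 5) MDS code rebuild the file


def min_nodes_to_reconstruct(nodes, mds_k):
    """Smallest k such that ANY k nodes together hold at least mds_k distinct blocks."""
    for k in range(1, len(nodes) + 1):
        if all(len(set().union(*combo)) >= mds_k for combo in combinations(nodes, k)):
            return k
    return None


def helper_nodes(nodes, failed):
    """Indices of surviving nodes sharing at least one block with the failed node."""
    return {j for j, n in enumerate(nodes) if j != failed and n & nodes[failed]}


def blocks_lost_for_copy_repair(nodes, failed):
    """Blocks of the failed nodes held by no surviving node (empty set: copy repair works)."""
    lost = set().union(*(nodes[i] for i in failed))
    survivors = set().union(*(n for i, n in enumerate(nodes) if i not in failed))
    return lost - survivors


print(min_nodes_to_reconstruct(NODES, MDS_K))     # 2
print(len(helper_nodes(NODES, 0)))                 # 3
print(blocks_lost_for_copy_repair(NODES, {0}))     # set()
print(blocks_lost_for_copy_repair(NODES, {0, 1}))  # {3}
```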

For the case where multiple node failures are allowed, let the element set be V = {1, 2, ..., 8} with the groups {1, 2}, {3, 4}, {5, 6} and {7, 8}; the following 8 blocks are then obtained:

{1, 3, 5}, {2, 4, 6}, {1, 4, 7}, {2, 3, 8},

{1, 6, 8}, {2, 5, 7}, {3, 6, 7}, {4, 5, 8}.

Take a file containing 6 data blocks, denoted $x_1, \ldots, x_6$. After MDS encoding with parameters (8, 6), 8 coding blocks $Y_1, \ldots, Y_8$ are output. Using this GD(3, 1, 2; 8) design, the data blocks are stored in the system as shown in Figure 3. If the design used in the construction can be decomposed into p parallel classes, an FR code with replication factor f can be generated by selecting any f (≤ p) of them. Each parallel class contains all elements of the symbol set, so node repair can proceed normally as long as one complete parallel class survives in the system. Accordingly, if the GD design used to construct the GDDFRC code is resolvable, the replication factor of the coding blocks and the number of nodes in the system can be chosen flexibly. As another example, consider a GD(3, 1, 3; 9) whose three groups are {1, 2, 3}, {4, 5, 6} and {7, 8, 9}. The nine blocks of this design can be divided into three parallel classes (the blocks in each row form a parallel class):

{1, 4, 7}, {2, 5, 9}, {3, 6, 8};

{1, 6, 9}, {2, 4, 8}, {3, 5, 7};

{1, 5, 8}, {2, 6, 7}, {3, 4, 9}.
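The resolvable GD(3, 1, 3; 9) example can be checked and turned into node layouts as follows. `is_parallel_class` and `select_classes` are hypothetical helpers, and taking the first f classes is an arbitrary choice, since any f of the p classes may be used.

```python
from collections import Counter

V = set(range(1, 10))
CLASSES = [  # one parallel class per row, as listed above
    [{1, 4, 7}, {2, 5, 9}, {3, 6, 8}],
    [{1, 6, 9}, {2, 4, 8}, {3, 5, 7}],
    [{1, 5, 8}, {2, 6, 7}, {3, 4, 9}],
]


def is_parallel_class(blocks, points):
    """True if the blocks are pairwise disjoint and together cover every point."""
    counts = Counter(x for b in blocks for x in b)
    return set(counts) == points and all(c == 1 for c in counts.values())


def select_classes(classes, f):
    """Use f of the p parallel classes; each selected block becomes one storage node."""
    if not 1 <= f <= len(classes):
        raise ValueError("need 1 <= f <= p")
    return [block for cls in classes[:f] for block in cls]


print(all(is_parallel_class(c, V) for c in CLASSES))  # True
for f in (2, 3):
    nodes = select_classes(CLASSES, f)
    copies = Counter(x for b in nodes for x in b)
    print(f, len(nodes), set(copies.values()))  # "2 6 {2}", then "3 9 {3}"
```

With f = 2 this gives 6 nodes of 3 blocks each, and with f = 3 it gives 9 nodes, matching the two storage systems described next.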

If any two parallel classes are selected, the construction yields a GDDFRC code with replication factor f = 2, suitable for a distributed storage system with parameters (6, 3, 3); if all three parallel classes are used, it yields a GDDFRC code with replication factor f = 3, corresponding to storage system parameters (9, 3, 3). This flexibility in parameter selection greatly simplifies system design.

In this embodiment, referring to FIG. 4, a device implementing the foregoing method includes a coding block acquisition module 1, a set V construction module 2, a grouping module 3, a block construction module 4 and a data storage module 5. The coding block acquisition module 1 is configured to divide the data to be stored into α parts and perform MDS encoding with parameters (β, α) to obtain β coding blocks. The set V construction module 2 is configured to obtain setting parameters, including the number t of elements in each group and the number s of coding blocks stored in each storage node, and to number the β coding blocks sequentially as the elements of the set V, obtaining the set V. The grouping module 3 is configured to group the elements of the set V into β/t groups, the element numbers within a group being distinct. The block construction module 4 is configured to obtain all blocks of the set V from the obtained grouping and to select among them according to the setting parameters, obtaining n selected blocks; a block is a set whose elements all come from different groups, and the n blocks together contain the β coding blocks, each replicated f times. The data storage module 5 is configured to store the coding blocks corresponding to the selected blocks on the storage nodes, each storage node storing the coding blocks corresponding to one selected block. Here α, β, t, s and f are all positive integers, and β is divisible by t.

In this embodiment, the block construction module 4 further includes a parallel class division unit 41 and a parallel class selection unit 42. The parallel class division unit 41 is configured to divide all obtained blocks into p parallel classes, where several blocks form a parallel class if their elements are together exactly all elements of the set V and no two of them share an element. The parallel class selection unit 42 is configured to select any f of the p parallel classes to obtain the selected blocks, where f is not greater than p and the selected parallel classes contain the n blocks.

In this embodiment, a method for repairing data stored with the partial replica code obtained by the foregoing method is also provided. The GDDFRC code retains all the characteristics of the FR code: every data block has the same replication factor, and every node of the system has the same storage capacity. Notably, unlike the traditional random-access mode, the GDDFRC code uses table-based repair: the repair table lists the repair schemes available for each particular failed node. In Figure 3, for example, if node $N_8$ fails, it can be repaired from nodes $N_2$, $N_4$ and $N_6$, but not from an arbitrary choice of nodes such as $N_2$ and $N_3$, which together hold only $Y_1$, $Y_2$, $Y_4$, $Y_6$ and $Y_7$. A storage system deployment usually includes a tracker server that records system metadata, so the repair table can be written into the metadata and read quickly when a failure occurs. Given the reduced complexity of the repair process, the cost of building and maintaining the node repair table is worthwhile.

In addition, the partial replica code obtained with the method of this embodiment offers a large degree of choice in data repair. For an MDS code, when a node fails, the data to download can be taken from any k of the other n − 1 available nodes; the original file is reconstructed and then re-encoded for repair. An MDS code therefore offers $\binom{n-1}{k}$ repair schemes for any failed node. The number of alternative schemes for repairing a failed node is called the repair selectivity of the node. In this embodiment, unlike the random-access mode, the GDDFRC code uses table-based repair, where the table gives the repair schemes for each node. Because the copies of each data block are distributed over different nodes and any pair of distinct data blocks is stored together on at most one node, when a node fails the replacement node can be regenerated by connecting to other nodes that store copies of the same data blocks and downloading one copy of each lost block. Consequently, for a storage node with capacity α, the system has $(f - 1)^{\alpha}$ repair schemes for its failure. Figure 5 shows the relationship between node repair selectivity and storage capacity for the GDDFRC code with replication factor 3. As the figure shows, although GDDFRC repair is table-based, the node repair selectivity can still reach a very high level: for a GDDFRC code with a fixed replication factor, the node repair selectivity grows exponentially with the node storage capacity.
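The two repair-selectivity counts can be compared numerically. The sketch below evaluates C(n − 1, k) for an MDS code and (f − 1)^α for a GDDFRC code, and cross-checks the latter by brute force on the Figure 3 layout, assuming node N_i stores the i-th listed block of the 8-point example; helper names are hypothetical.

```python
from math import comb

# Assumed Figure 3 layout (f = 3, alpha = 3); index i = node N_(i+1).
NODES = [{1, 3, 5}, {2, 4, 6}, {1, 4, 7}, {2, 3, 8},
         {1, 6, 8}, {2, 5, 7}, {3, 6, 7}, {4, 5, 8}]


def mds_repair_choices(n, k):
    """Ways to pick k helpers among the n - 1 surviving nodes."""
    return comb(n - 1, k)


def gddfrc_repair_choices(f, alpha):
    """Each of the alpha lost blocks can be copied from any of its f - 1 other holders."""
    return (f - 1) ** alpha


def brute_force_choices(nodes, failed):
    """Product over the failed node's blocks of the number of other nodes holding that block."""
    total = 1
    for b in nodes[failed]:
        total *= sum(1 for j, n in enumerate(nodes) if j != failed and b in n)
    return total


print([brute_force_choices(NODES, i) for i in range(8)])  # [8, 8, 8, 8, 8, 8, 8, 8]
print(gddfrc_repair_choices(3, 3))                         # 8
print([gddfrc_repair_choices(3, a) for a in range(2, 7)])  # [4, 8, 16, 32, 64]
print(mds_repair_choices(9, 6))                            # 28
```

Because λ = 1, no surviving node holds two blocks of the failed node, so each of these products counts distinct sets of α helper nodes.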

In one example of this embodiment, the proposed GDDFRC code was implemented on the Hadoop Distributed File System (HDFS), which is widely used in industry, including file encoding, decoding and failure recovery. In the experiment, the system server had an Intel(R) Xeon(R) E5-2609 2.40 GHz CPU and 24 GB of memory. Ordinary PCs with identical configurations (AMD A8-5600K 3.0 GHz CPU, 4 GB memory) served as data storage nodes, and no other workload ran on any node during the experiment. With equal node storage capacity, the repair times of the GDDFRC code, the classic RS code and the MBR code were compared under different parameter settings. First, the number of nodes was set to n = 9, with the data stored on any six nodes (k = 6) sufficient to reconstruct the original file, and a GDDFRC code with replication factor 2 was used. The single-node failure repair time of the three codes was measured with node storage capacities of 100 MB, 200 MB and 300 MB. Each test was run multiple times under the same conditions, and the averages are shown in FIG. 6. As the figure shows, the GDDFRC code repairs a failed node significantly faster than the RS code and the MBR code. Next, a second parameter setting (values 6 and 10) was tested with a GDDFRC code of replication factor 3, and the results are shown in FIG. 7. As node storage capacity increases, the repair-time advantage of the GDDFRC code becomes more pronounced. In the traditional RS code repair process, the original file must first be restored and then re-encoded to regenerate the lost coding block, which is stored on the replacement node, so repair takes relatively long. For the minimum-bandwidth regenerating (MBR) code, each node participating in the repair computes a linear combination of its stored data and transfers the combined data block to the replacement node.
The replacement node then combines all the received data blocks to recover the lost data. This process involves a large number of finite-field operations, which increases repair time. With the GDDFRC code, by contrast, when a node failure is detected the system first identifies the failed node and determines the repair scheme from the GDDFRC repair table stored in the system metadata. It then connects to the available nodes specified in the scheme, downloads the corresponding data blocks and stores them directly on the replacement node. The entire repair process therefore involves only file reads and introduces no other complex operations. Although system redundancy increases to some extent, the results show that the GDDFRC code greatly reduces failure repair time. Compared with traditional regenerating codes (RGC), the main advantage of the partial replica code based on group divisible designs (GDDFRC) is a markedly lower computational complexity in encoding and decoding: complex finite-field operations are replaced by simple, easy-to-implement data replication. Traditional RGC constructions are based on finite fields, and encoding and decoding involve finite-field addition, subtraction and multiplication. Although the theory is mature, these operations are cumbersome and time-consuming in practice and do not meet the speed and reliability requirements of today's distributed storage systems. The GDDFRC code is different: a failed node is repaired simply by downloading data directly from other nodes and storing it on the replacement node, with no additional computation. This greatly speeds up node repair and data block regeneration, giving the code high application value and development potential in practical distributed storage systems.
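The copy-only repair path described above can be sketched in a few lines of Python. This is a sketch under stated assumptions, not the HDFS implementation used in the experiments. It reuses the same hypothetical AG(2, 3) layout, random bytes stand in for the MDS-coded blocks, and `build_layout` and `repair_node` are illustrative names.

```python
import os


def build_layout(q=3, f=3):
    """Hypothetical AG(2, q) layout (assumption): node m*q + c stores the
    blocks on the line y = m*x + c (mod q); block id = q*x + y."""
    return [sorted(q * x + (m * x + c) % q for x in range(q))
            for m in range(f) for c in range(q)]


def repair_node(storage, layout, failed):
    """Copy-only repair: fetch each lost block from another node holding it.

    A real system would take the helper from the repair table in the
    tracker's metadata; here the first surviving holder is used. Returns the
    rebuilt node contents and the number of bytes downloaded."""
    rebuilt, downloaded = {}, 0
    for b in layout[failed]:
        helper = next(j for j, blocks in enumerate(layout)
                      if j != failed and storage[j] is not None and b in blocks)
        rebuilt[b] = storage[helper][b]      # plain read, no finite-field arithmetic
        downloaded += len(rebuilt[b])
    return rebuilt, downloaded


BLOCK_SIZE = 1024
layout = build_layout()
# Random bytes stand in for the 9 MDS-coded blocks (assumption).
coded = {b: os.urandom(BLOCK_SIZE) for b in range(9)}
storage = [{b: coded[b] for b in blocks} for blocks in layout]

lost = storage[0]
storage[0] = None                            # node 0 fails
rebuilt, downloaded = repair_node(storage, layout, failed=0)
print(rebuilt == lost)                       # True: exact repair
print(downloaded == 3 * BLOCK_SIZE)          # True: s blocks downloaded
```

The only operations on the data are reads and a dictionary copy, which is the property the repair-time comparison relies on.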

The partial replica code based on group divisible designs not only reduces the computational complexity of node repair but also minimizes the bandwidth consumed during repair. The amount of data downloaded equals the amount stored on the failed node, with no excess. As bandwidth becomes an increasingly valuable resource, this benefit of the GDDFRC code is significant. In this embodiment, the GDDFRC code guarantees that a lost coding block can be repaired by directly downloading a copy from a subset of the other nodes, and that lost coding blocks are repaired from a fixed number of coding blocks according to a table-based repair scheme. Moreover, after repair the replacement node stores exactly the same data as the failed node (exact repair), which greatly reduces system operational complexity (for example, metadata updates and broadcasting of updated data).
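A short worked derivation of the bandwidth claim, assuming (as stated above) that any pair of distinct coding blocks is stored together on at most one node. Here $N$ is the failed node holding blocks $b_1,\dots,b_s$, $M_i$ is the helper chosen for $b_i$, and $B_c$ (the size of one coding block), $\gamma$ (repair bandwidth), $d$ (number of helpers) and $\alpha$ (node capacity) are symbols introduced only for this illustration.

```latex
\begin{aligned}
&M_i = M_j \text{ for } i \neq j \;\Rightarrow\; \{b_i, b_j\} \subseteq N \cap M_i
  \text{, so the pair would be stored on two nodes (contradiction)} \\
&\Rightarrow\; d = s \text{ distinct helpers, each sending exactly one block} \\
&\gamma = \sum_{i=1}^{s} |b_i| = s\,B_c = \alpha,
  \qquad \#\text{repair schemes} = (f-1)^{s}
\end{aligned}
```

Each helper sends a stored block unchanged, so the replacement node ends up with exactly $b_1,\dots,b_s$. This is the exact-repair property.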

The detailed description is not to be construed as limiting the scope of the invention. It should be noted that a number of variations and modifications may be made by those skilled in the art without departing from the spirit and scope of the invention. Therefore, the scope of the invention should be determined by the appended claims.

Claims


1. A method for constructing a partial replica code, comprising the steps of:

B) obtaining setting parameters, the parameters including the number t of elements in each group and the number s of coding blocks stored in each storage node; and numbering the β coding blocks sequentially as the elements of a set V to obtain the set V;

2. The method for constructing a partial replica code according to claim 1, wherein in step C), all the obtained groups constitute a set G, and the set G is a partition of the set V.

3. The method for constructing a partial replica code according to claim 2, wherein all the blocks obtained in step D) satisfy the condition that each element of the set V is contained in exactly f blocks.

4. The method for constructing a partial replica code according to claim 3, wherein all the groups have the same size and all the blocks have the same size.

5. The method for constructing a partial replica code according to claim 4, wherein in step B), the coding block replication factor f is obtained as $f = \frac{\beta - t}{s - 1}$, and the number n of storage nodes is obtained as $n = \frac{\beta(\beta - t)}{s(s - 1)}$.

6. The method for constructing a partial replica code according to claim 5, wherein step D) further comprises the following steps: D1) dividing all the obtained blocks into p parallel classes, wherein a set of blocks constitutes a parallel class if the blocks are pairwise disjoint and together contain all the elements of the set V;

D2) arbitrarily selecting f of the p parallel classes to obtain the selected blocks, wherein f is less than or equal to p, and the p parallel classes include n blocks.

7. An apparatus for implementing the partial replica code construction method according to claim 1, comprising:

8. The apparatus according to claim 7, wherein the block building module further comprises:

9. A method for repairing data of a partial replica code obtained by the partial replica code construction method of claim 1, characterized in that it comprises the following steps:

10. The data repair method according to claim 9, wherein the repair table is stored in the system metadata of a tracking server in the storage system, and the repair table includes at least one repair scheme for each node.
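The construction in claims 1 to 6 can be sketched in Python with one hypothetical resolvable design: the affine plane AG(2, q), q prime, with its vertical lines as the groups. This is an illustrative instance under that assumption, not the only design the claims cover, and `construct`, `is_gdd` and `select_nodes` are illustrative names. When all p classes are kept, each element appears in (β − t)/(s − 1) blocks, matching the formula of claim 5. Keeping only f ≤ p classes (claim 6) gives replication factor f.

```python
from itertools import combinations


def construct(q):
    """Hypothetical instance of claims 1-6 with AG(2, q), q prime (assumption).

    V = {0, ..., q*q - 1} numbers the beta = q*q coding blocks (step B).
    Group i = {q*i, ..., q*i + q - 1}, so t = q and G partitions V (step C).
    Blocks are the lines y = m*x + c; slope m indexes the parallel class (D1)."""
    V = list(range(q * q))
    groups = [set(range(q * i, q * i + q)) for i in range(q)]
    classes = [[frozenset(q * x + (m * x + c) % q for x in range(q)) for c in range(q)]
               for m in range(q)]
    return V, groups, classes


def is_gdd(V, groups, blocks):
    """Pairs in one group share no block; pairs across groups share exactly one."""
    group_of = {v: i for i, g in enumerate(groups) for v in g}
    for a, b in combinations(V, 2):
        shared = sum(1 for blk in blocks if a in blk and b in blk)
        if shared != (0 if group_of[a] == group_of[b] else 1):
            return False
    return True


def select_nodes(classes, f):
    """Step D2: keep f of the p parallel classes; each block is one node."""
    return [blk for cls in classes[:f] for blk in cls]


q = 3
V, groups, classes = construct(q)
beta, t, s = len(V), q, q
blocks = [blk for cls in classes for blk in cls]

print(is_gdd(V, groups, blocks))                                           # True
print(all(sorted(v for blk in cls for v in blk) == V for cls in classes))  # True: D1
print(len(classes) == (beta - t) // (s - 1))                               # True: p = 3
print(len(blocks) == beta * (beta - t) // (s * (s - 1)))                   # True: n = 9
nodes = select_nodes(classes, f=2)
print(all(sum(v in blk for blk in nodes) == 2 for v in V))                 # True: f = 2 copies
```

Selecting whole parallel classes, rather than arbitrary blocks, keeps every coding block at the same replication factor. This is the uniformity the description attributes to the GDDFRC code.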