CN109067852A

CN109067852A - Cross-center cooperative repair method based on erasure codes

Info

Publication number: CN109067852A
Application number: CN201810773745.XA
Authority: CN
Inventors: 王意洁; 许方亮; 包涵; 马行空
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2018-07-15
Filing date: 2018-07-15
Publication date: 2018-12-21

Abstract

The invention discloses a cross-center cooperative repair method based on erasure codes. First, a distributed storage system consisting of M data centers and a control node is constructed, each data center comprising N storage nodes. The control node selects a new storage node LN for a failed data block D_i and sends a network delay test request and a two-dimensional storage node array NSet to LN. LN tests the network delay between itself and each storage node in NSet and sends the network delay test result array NL to the control node. According to NL, the control node selects k available data blocks and the storage nodes on which they are located. These storage nodes perform the decoding calculation; inside each data center involved, the decoding results are merged, and the merged result is sent to LN, which merges the received results to obtain the recovered D_i. Finally, the control node returns repair success information to the user. The invention can improve the data repair efficiency and fault tolerance of large-scale distributed storage systems.

Description

A cross-center cooperative repair method based on erasure codes

Technical field

The present invention relates to fault-tolerance techniques for distributed storage systems, and in particular to a cross-center cooperative repair method based on erasure codes.

Background art

Data fault tolerance is one of the key technologies for the distributed storage of massive data and is essential for improving the availability and reliability of a system. At present, fault-tolerance techniques mainly fall into replication-based techniques and erasure-code-based techniques.

Replication-based fault tolerance is simple and intuitive and easy to implement and deploy, but it must create several copies of the same size for every data object, so its storage overhead is large. Erasure-code-based fault tolerance divides a data object into multiple data blocks and encodes the original data blocks with an erasure-coding algorithm to obtain redundant data blocks; the information of multiple data blocks is fused into fewer redundant blocks, which effectively saves storage space. Compared with replication, erasure-code-based fault tolerance offers low redundancy and high storage-space utilization, and it has gradually become the mainstream data fault-tolerance technology of the big-data era.

The basic idea of erasure-code-based fault tolerance is to encode k original data blocks to compute m redundant data blocks; any k of the k+m data blocks are sufficient to recover the k original data blocks by a decoding calculation. k and m are positive integers.

Specifically, each redundant data block C_j (1≤j≤m) is a linear combination of the k original data blocks D_1, D_2, ..., D_i, ..., D_k (1≤i≤k). The encoding calculation can be expressed as formula (1), where G_ji is the encoding coefficient of data block D_i. That is, for redundant data block C_j, the encoding coefficient G_ji is first multiplied with data block D_i element by element to obtain a new data block T_ji = G_ji × D_i; then the k new data blocks are added element by element to obtain the redundant data block C_j:

$$G_{j1} \times D_1 + \cdots + G_{ji} \times D_i + \cdots + G_{jk} \times D_k = C_j \tag{1}$$

Each original data block D_i is a linear combination of any other k available data blocks E_r (1≤r≤k), each of which may be an original or a redundant data block. The decoding calculation can be expressed as formula (2), where H_ir is the decoding coefficient of available data block E_r. That is, for original data block D_i, the decoding coefficient H_ir is first multiplied with available data block E_r element by element to obtain a new data block S_ir = H_ir × E_r; then the k new data blocks are added element by element to obtain the original data block D_i:

$$H_{i1} \times E_1 + \cdots + H_{ir} \times E_r + \cdots + H_{ik} \times E_k = D_i \tag{2}$$
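To make formulas (1) and (2) concrete, here is a minimal Python sketch. It assumes arithmetic over GF(2^8) with the reducing polynomial 0x11D, a common choice for Reed-Solomon-style codes; the patent names neither the field nor the coefficients, so G_11 = 1, G_12 = 2 and H_11 = 2, H_12 = 1 are illustrative values only. "Element-by-element multiplication" is a carry-less product reduced modulo the polynomial, and "element-by-element addition" is XOR. The helper names `gf_mul` and `linear_combination` are hypothetical.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8): carry-less product reduced modulo 0x11D."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # add (XOR) the current shifted copy of a
        a <<= 1
        if a & 0x100:            # degree 8 reached: reduce by the field polynomial
            a ^= 0x11D
        b >>= 1
    return result


def linear_combination(coeffs: list[int], blocks: list[bytes]) -> bytes:
    """Return sum_r coeffs[r] * blocks[r], computed byte by byte (addition is XOR)."""
    out = bytearray(len(blocks[0]))
    for c, block in zip(coeffs, blocks):
        for pos, byte in enumerate(block):
            out[pos] ^= gf_mul(c, byte)
    return bytes(out)


# Formula (1) with k = 2, m = 1 and illustrative coefficients G_11 = 1, G_12 = 2.
D1, D2 = b"\x10\x20\x30", b"\x01\x02\x03"
C1 = linear_combination([1, 2], [D1, D2])

# Formula (2): if D1 fails, the available blocks are E_1 = D2 and E_2 = C1.
# Subtraction equals addition (XOR), so D1 = 2*D2 + 1*C1, i.e. H_11 = 2, H_12 = 1.
recovered = linear_combination([2, 1], [D2, C1])
print(C1.hex(), recovered == D1)  # 122436 True
```

In this tiny case the decoding coefficients can be read off by hand; in general they are obtained by inverting the k×k submatrix of the encoding matrix that corresponds to the available blocks.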

Although technical progress has significantly improved the reliability of individual software and hardware resources, the probability that some resource fails rises markedly as storage systems keep growing; in large-scale distributed storage systems, resource failure is a normal occurrence. When a data block fails, erasure-code-based fault tolerance needs multiple data blocks to decode and recover the failed original data block, which consumes considerable network resources.

With the rapid development of cloud computing, distributed storage systems spanning multiple data centers are becoming increasingly popular. The network delay between two storage nodes inside the same data center differs greatly from that between two storage nodes in different data centers, and the network delay between different pairs of data centers can also vary considerably. Existing erasure-code data repair techniques mainly select available data blocks at random to repair failed data, without fully considering how differences in network delay between the storage nodes holding those blocks affect repair efficiency.

Therefore, designing an erasure-code-based cross-center cooperative repair method that accounts for differences in network delay between storage nodes, so as to effectively improve the data repair efficiency and thereby the fault tolerance of large-scale distributed storage systems, has become a hot research problem in the field of distributed storage.

Summary of the invention

The technical problem to be solved by the present invention is to propose, in view of the differences in network delay between storage nodes, a cross-center cooperative repair method based on erasure codes that effectively improves the data repair efficiency of large-scale distributed storage systems and thereby their fault tolerance.

The technical solution of the present invention comprises the following steps:

In the first step, a distributed storage system is constructed. It consists of M data centers and one control node; each data center contains N storage nodes. Each node (the control node or a storage node) is a computer that can run independently, and the nodes are interconnected through network equipment. M and N are positive integers.

An operating system and TCP/IP protocol software are installed on the control node and the storage nodes, and the network environment is configured.

The control node is responsible for interacting with users and receiving the failed-data-block repair requests they submit; for selecting a new storage node for a failed data block and sending a network delay test request to that new storage node; for selecting available data blocks, and the storage nodes holding them, according to network delay; and for storing the decoding coefficients, distributing failed-data-block repair tasks to the storage nodes, receiving repair success information, and returning repair success information to the user.

A storage node is responsible for storing the original and redundant data blocks of data objects; for executing network delay test tasks and returning the results to the control node; and for executing failed-data-block repair tasks and returning repair success information to the control node. Suppose a data object DO is divided into k original data blocks D_1, D_2, ..., D_i, ..., D_k (1≤i≤k), k being a positive integer, and the k original data blocks are encoded to compute m redundant data blocks, m being a positive integer. These k+m data blocks (k+m < N) are stored on k+m different storage nodes. In the distributed storage system, each data block has a unique data block number.

A task management program and a result recovery program are installed on the control node. The task management program receives the failed-data-block repair request submitted by the user, selects a new storage node for the failed data block, and sends a network delay test request to that node; according to the network delay test results, it selects k available data blocks, and the storage nodes holding them, from the k+m-1 available data blocks of data object DO (all blocks except the failed one); and it sends the failed-data-block repair request, together with the information needed for the repair, to the storage nodes holding the k selected blocks. The result recovery program receives repair success information from the storage nodes and returns it to the user.

A network delay test program and a repair program are installed on each storage node. The network delay test program receives network delay test requests from the control node, performs the network delay test, and sends the results to the control node. The repair program receives failed-data-block repair requests from the control node, performs the decoding calculation on the available data blocks stored on the storage node, and sends repair success information to the control node once the repair is complete.

In the second step, the control node selects a new storage node LN for the failed data block D_i to be repaired, as follows:

2.1 The task management program of the control node receives the repair request for failed data block D_i submitted by the user.

2.2 The task management program of the control node selects one of the available storage nodes as the new storage node LN for failed data block D_i; the selection principle is that LN must not store any data block of data object DO.
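The selection principle of step 2.2 can be sketched as a simple filter. This is a minimal sketch under assumptions: the block placement of DO is represented as a hypothetical dictionary from data block number to node identifier, and the function `select_new_node` takes the first eligible node in list order, since the patent states only the eligibility rule, not which eligible node to prefer.

```python
def select_new_node(available_nodes: list[str], placement: dict[int, str]) -> str:
    """Return the first available storage node that holds no block of data object DO.

    placement maps each data block number of DO to the node that stores it
    (a hypothetical representation; the patent does not fix one).
    """
    holders = set(placement.values())
    for node in available_nodes:
        if node not in holders:  # principle of step 2.2
            return node
    raise LookupError("no available storage node is free of DO's blocks")


placement = {1: "dc1-n1", 2: "dc1-n2", 3: "dc2-n1", 4: "dc2-n2"}
ln = select_new_node(["dc1-n1", "dc1-n2", "dc1-n3", "dc2-n3"], placement)
print(ln)  # dc1-n3
```

Because the node that held the failed block still appears in the placement, it is excluded automatically as well.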

In the third step, the control node sends a network delay test request and a two-dimensional storage node array NSet to storage node LN, as follows:

3.1 The task management program of the control node sends a network delay test request to storage node LN.

3.2 The task management program of the control node forms the two-dimensional storage node array NSet from the storage nodes holding the k+m-1 available data blocks of data object DO other than the failed data block D_i. NSet[s][1] denotes the storage node holding the s-th available data block, and NSet[s][2] denotes the data center containing that storage node, 1≤s≤k+m-1. The task management program then sends NSet to storage node LN.
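Step 3.2 amounts to projecting the block placement of DO onto (storage node, data center) pairs while leaving out the failed block. The sketch below is illustrative only: the dictionaries `placement` and `node_dc` and the function `build_nset` are hypothetical, and listing blocks in ascending block-number order is an assumption (the patent does not fix an order). Python indices start at 0, so list entry s-1 plays the role of NSet[s].

```python
def build_nset(placement: dict[int, str], node_dc: dict[str, str],
               failed_block: int) -> list[tuple[str, str]]:
    """Step 3.2: one (storage node, data center) pair per available block of DO.

    Entry s-1 of the returned list corresponds to (NSet[s][1], NSet[s][2]).
    Blocks are listed in ascending block-number order (an assumption).
    """
    return [(node, node_dc[node])
            for block, node in sorted(placement.items())
            if block != failed_block]


placement = {1: "a", 2: "b", 3: "c", 4: "d"}           # k + m = 4 blocks of DO
node_dc = {"a": "DC1", "b": "DC1", "c": "DC2", "d": "DC2"}
print(build_nset(placement, node_dc, failed_block=1))
# [('b', 'DC1'), ('c', 'DC2'), ('d', 'DC2')]
```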

In the fourth step, storage node LN receives the network delay test request and the two-dimensional storage node array NSet from the control node, tests in turn the network delay between LN and each storage node in NSet, and sends the network delay test result array NL to the control node, as follows:

4.1 The network delay test program of storage node LN receives the network delay test request and the two-dimensional storage node array NSet from the control node.

4.2 The network delay test program of storage node LN tests in turn the network delay between LN and each storage node in NSet and stores the results in the network delay test result array NL, where NL[s] denotes the network delay between storage node LN and storage node NSet[s][1]. The specific steps are:

4.2.1 Initialize the storage node index s = 1.

4.2.2 Test the network delay Lat between storage node LN and storage node NSet[s][1], and set NL[s] = Lat.

4.2.3 Update the storage node index: s = s+1.

4.2.4 If s = k+m, the network delay test ends; go to step 4.3. Otherwise, go to step 4.2.2 and continue testing.

4.3 The network delay test program of storage node LN sends the network delay test result array NL to the control node.
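The loop of steps 4.2.1 to 4.2.4 visits NSet in order and records one delay per entry. The patent does not say how delay is measured, so the sketch below takes the probe as a parameter. `tcp_connect_delay` is one hypothetical probe (the time to complete a TCP handshake from LN, using the standard `socket.create_connection`); the port number it needs is deployment-specific and not given in the source. The usage line substitutes a dictionary lookup for real network traffic.

```python
import socket
import time
from collections.abc import Callable


def tcp_connect_delay(host: str, port: int, timeout: float = 1.0) -> float:
    """One possible probe (an assumption): TCP handshake time from LN, in ms."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0


def measure_delays(nset: list[tuple[str, str]],
                   probe: Callable[[str], float]) -> list[float]:
    """Steps 4.2.1-4.2.4: probe NSet[s][1] for s = 1..k+m-1, in order."""
    nl = []
    for node, _dc in nset:
        nl.append(probe(node))  # NL[s] = Lat
    return nl


# Offline usage with a stand-in probe instead of real network traffic.
fake_delays_ms = {"b": 0.4, "c": 12.0, "d": 11.5}
nl = measure_delays([("b", "DC1"), ("c", "DC2"), ("d", "DC2")], fake_delays_ms.__getitem__)
print(nl)  # [0.4, 12.0, 11.5]
```

In a live deployment one might pass `lambda host: tcp_connect_delay(host, PORT)` with whatever service port the storage nodes expose. Any probe used here should return strictly positive values, because the fifth step uses 0 in NL as a "selected" marker.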

In the fifth step, the control node receives the network delay test result array NL from storage node LN and, according to NL, selects k available data blocks, and the storage nodes holding them, from the k+m-1 available data blocks of data object DO (excluding the failed data block), as follows:

5.1 The task management program of the control node receives the network delay test result array NL from storage node LN.

5.2 According to NL, the task management program of the control node selects k available data blocks from the k+m-1 available data blocks of data object DO (excluding the failed data block) and uses the two-dimensional selected-storage-node array RNSet to record the storage nodes holding the k selected blocks and their data centers: RNSet[t][1] denotes the storage node holding the t-th selected available data block, and RNSet[t][2] denotes the data center containing that storage node, 1≤t≤k. The specific steps are:

5.2.1 Initialize the selected-storage-node index t = 1.

5.2.2 According to NL, select the t-th available data block and the storage node holding it from the k+m-1 available data blocks (excluding the failed data block), as follows:

5.2.2.1 Initialize the index of the storage node holding the available data block: s = k+m-1.

5.2.2.2 If NL[s] = 0, set s = s-1 and repeat step 5.2.2.2; otherwise, set the minimum network delay MinL = NL[s] and the index of the corresponding storage node ms = s.

5.2.2.3 If MinL > NL[s-1] and NL[s-1] > 0, set MinL = NL[s-1] and ms = s-1.

5.2.2.4 Set s = s-1.

5.2.2.5 If s = 0, the t-th available data block and its storage node have been found: set RNSet[t][1] = NSet[ms][1], RNSet[t][2] = NSet[ms][2] and NL[ms] = 0, and go to step 5.2.3. Otherwise, go to step 5.2.2.3 and continue comparing network delays.

5.2.3 Update the selected-storage-node index: t = t+1.

5.2.4 If t > k, all k available data blocks and their storage nodes have been found; go to the sixth step. Otherwise, go to step 5.2.2 and continue selecting available data blocks.
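The selection loop of step 5.2 can be sketched in Python as follows. It keeps the back-to-front scan and the convention that a zero in NL marks an already selected entry, so measured delays are assumed to be strictly positive. Indices start at 0 and the backward scan stops at the first entry of NL, so no entry before the first is ever read. `select_blocks` is a hypothetical name, and NL is copied so the caller's array is not modified.

```python
def select_blocks(nset: list[tuple[str, str]], nl: list[float],
                  k: int) -> list[tuple[str, str]]:
    """Steps 5.2.1-5.2.4 with 0-based indices; returns RNSet as a list of pairs.

    A zero in NL marks an entry that has already been selected, so every measured
    delay must be strictly positive. NL is copied so the caller's array is kept.
    """
    nl = list(nl)
    rnset = []
    for _t in range(k):                       # t = 1..k (5.2.1, 5.2.3, 5.2.4)
        s = len(nl) - 1                       # 5.2.2.1: start from the last entry
        while nl[s] == 0:                     # 5.2.2.2: skip selected entries
            s -= 1
        min_l, ms = nl[s], s
        while s > 0:                          # 5.2.2.3-5.2.2.5: scan toward entry 0
            if min_l > nl[s - 1] > 0:
                min_l, ms = nl[s - 1], s - 1
            s -= 1
        rnset.append(nset[ms])                # RNSet[t] = NSet[ms]
        nl[ms] = 0                            # mark NSet[ms] as selected
    return rnset


nset = [("b", "DC1"), ("c", "DC2"), ("d", "DC2"), ("e", "DC3")]
nl = [0.4, 12.0, 11.5, 30.0]
print(select_blocks(nset, nl, k=3))  # [('b', 'DC1'), ('d', 'DC2'), ('c', 'DC2')]
```

The net effect is to choose the k entries with the smallest delays, in increasing order of delay. `heapq.nsmallest(k, range(len(nl)), key=nl.__getitem__)` would pick the same entries except on ties: the backward scan keeps the later entry, whereas `nsmallest` keeps the earlier one.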

In the sixth step, the control node executes the task management program and sends the following to the storage nodes holding the k available data blocks selected in the fifth step: the repair request for failed data block D_i, the available data block numbers and their decoding coefficients H_ir (1≤i≤k, 1≤r≤k), the two-dimensional selected-storage-node array RNSet, and the number of the new storage node LN of failed data block D_i.
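The sixth step describes a message whose exact format the patent leaves open. The sketch below assumes one message per selected node, carrying that node's own block number and coefficient H_ir (consistent with step 7.1.2, where node RNSet[r][1] receives "the available data block number and its decoding coefficient H_ir"), plus the whole of RNSet, which every node needs for the per-data-center merge of step 7.2. `RepairRequest`, `build_requests` and all field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepairRequest:
    """Hypothetical per-node message for the sixth step (field names are illustrative)."""
    failed_block: int                    # data block number of D_i
    block_number: int                    # number of the available block E_r held by the receiver
    coefficient: int                     # decoding coefficient H_ir for E_r
    rnset: tuple[tuple[str, str], ...]   # (storage node, data center) of all k selected blocks
    new_node: str                        # identifier of LN


def build_requests(failed_block: int, block_numbers: list[int], coefficients: list[int],
                   rnset: list[tuple[str, str]], new_node: str) -> dict[str, RepairRequest]:
    """Return one request per selected storage node RNSet[r][1], r = 1..k."""
    shared_rnset = tuple(rnset)
    return {
        node: RepairRequest(failed_block, block_numbers[r], coefficients[r],
                            shared_rnset, new_node)
        for r, (node, _dc) in enumerate(rnset)
    }


requests = build_requests(1, [2, 4, 3], [7, 9, 5],
                          [("b", "DC1"), ("d", "DC2"), ("c", "DC2")], "dc1-n3")
print(requests["d"].block_number, requests["d"].coefficient)  # 4 9
```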

In the seventh step, the storage nodes holding the k available data blocks selected in the fifth step execute the repair program. They receive from the control node the repair request for failed data block D_i (1≤i≤k), the available data block numbers and their decoding coefficients H_ir (1≤i≤k, 1≤r≤k), the two-dimensional selected-storage-node array RNSet, and the number of the new storage node LN of failed data block D_i; they perform the decoding calculation based on the available data blocks and their decoding coefficients H_ir. Inside each data center in RNSet, the decoding results are first merged, and the merged intermediate decoding results are then sent to LN.

7.1 The repair programs of the storage nodes holding the k selected available data blocks perform the decoding calculation and obtain k new data blocks S_i1, ..., S_ir, ..., S_ik. The specific steps are:

7.1.1 Initialize the selected-storage-node index r = 1.

7.1.2 The repair program of storage node RNSet[r][1] receives from the control node the repair request for failed data block D_i, the available data block number and its decoding coefficient H_ir, the two-dimensional selected-storage-node array RNSet, and the number of the new storage node LN of failed data block D_i.

7.1.3 The repair program of storage node RNSet[r][1] obtains the locally stored available data block E_r according to its data block number and performs the local decoding calculation, multiplying the decoding coefficient H_ir with E_r element by element to obtain the new data block S_ir, i.e. S_ir = H_ir × E_r.

7.1.4 If r < k, set r = r+1 and go to step 7.1.2; otherwise, the decoding calculation ends; go to step 7.2.
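Step 7.1.3 multiplies a whole block by one scalar, so implementations commonly precompute exponent/logarithm tables instead of performing a carry-less multiplication for every byte. The sketch below takes that approach under stated assumptions: GF(2^8) with polynomial 0x11D and generator 2 (neither is specified in the patent), and hypothetical function names.

```python
def build_gf256_tables(poly: int = 0x11D) -> tuple[list[int], list[int]]:
    """Build exp/log tables for GF(2^8) with generator 2 (primitive for 0x11D)."""
    exp = [0] * 512
    log = [0] * 256
    x = 1
    for i in range(255):
        exp[i] = x
        log[x] = i
        x <<= 1                   # multiply by the generator 2
        if x & 0x100:
            x ^= poly             # reduce modulo the field polynomial
    for i in range(255, 512):     # duplicate so log[a] + log[b] never needs a modulo
        exp[i] = exp[i - 255]
    return exp, log


EXP, LOG = build_gf256_tables()


def gf_mul(a: int, b: int) -> int:
    """Table-driven multiplication in GF(2^8)."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def local_decode(h_ir: int, e_r: bytes) -> bytes:
    """Step 7.1.3: S_ir = H_ir x E_r, multiplying every byte of E_r by H_ir."""
    return bytes(gf_mul(h_ir, byte) for byte in e_r)


print(local_decode(2, b"\x01\x02\x80").hex())  # 02041d
```

The tables are built once per node and reused for every block it decodes; only the default polynomial is guaranteed to have 2 as a generator.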

7.2 The repair programs of the storage nodes holding the k selected available data blocks merge the decoding results inside each data center to obtain merged intermediate decoding results. The specific steps are:

7.2.1 Use the data block status array CSet to record whether the decoding result of each of the k selected available data blocks has been transmitted and merged, and initialize every element of CSet to 0, i.e. CSet[1] = ... = CSet[r] = ... = CSet[k] = 0 (1≤r≤k).

7.2.2 Initialize the selected-storage-node index r = 1.

7.2.3 Initialize the subsequent storage node index x = r+1.

7.2.4 If CSet[x] = 0 and RNSet[r][2] = RNSet[x][2], the repair program of storage node RNSet[x][1] sends its decoding result S_ix to storage node RNSet[r][1]; the repair program of storage node RNSet[r][1] receives S_ix and merges it with its local decoding result S_ir, i.e. S_ir = S_ir + S_ix (S_ir and S_ix are added element by element); set CSet[x] = 1 and go to step 7.2.5. Otherwise, go directly to step 7.2.5.

7.2.5 If x < k, set x = x+1 and go to step 7.2.4; otherwise, set r = r+1 and go to step 7.2.6.

7.2.6 If r < k, go to step 7.2.3; otherwise, the merging of decoding results inside the data centers ends; go to step 7.3.
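The merging loop of steps 7.2.1 to 7.2.6 can be simulated in a single process. This is a sketch under assumptions: indices start at 0, the transfer from RNSet[x][1] to RNSet[r][1] is modeled as a local operation, element-by-element addition is taken to be byte-wise XOR as in GF(2^8)-based codes, and the function names are hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Element-wise addition of two equally sized blocks (XOR in GF(2^8))."""
    return bytes(x ^ y for x, y in zip(a, b))


def merge_within_dc(rnset: list[tuple[str, str]],
                    s_blocks: list[bytes]) -> tuple[list[bytes], list[int]]:
    """Simulate steps 7.2.1-7.2.6 in one process (0-based indices).

    rnset[r] = (storage node, data center); s_blocks[r] = S_ir computed in step 7.1.
    Returns the blocks after merging and the status array CSet.
    """
    k = len(rnset)
    merged = list(s_blocks)
    cset = [0] * k                                   # 7.2.1: nothing merged yet
    for r in range(k - 1):                           # 7.2.2 / 7.2.6
        for x in range(r + 1, k):                    # 7.2.3 / 7.2.5
            if cset[x] == 0 and rnset[r][1] == rnset[x][1]:   # 7.2.4
                # RNSet[x][1] "sends" S_ix to RNSet[r][1], which adds it locally.
                merged[r] = xor_bytes(merged[r], merged[x])
                cset[x] = 1
    return merged, cset


rnset = [("b", "DC1"), ("d", "DC2"), ("c", "DC2")]
merged, cset = merge_within_dc(rnset, [b"\x01", b"\x02", b"\x04"])
print(merged, cset)  # [b'\x01', b'\x06', b'\x04'] [0, 0, 1]
```

After the loop, the first node of each data center (in RNSet order) holds that data center's sum and keeps CSet = 0; every other node has CSet = 1. A node that was already merged into an earlier one never acts as a receiver, because all of its same-center successors were merged by that earlier node.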

7.3 Each data center sends its merged intermediate decoding result to the new storage node LN of failed data block D_i. The specific steps are:

7.3.1 Initialize the selected-storage-node index r = 1.

7.3.2 If CSet[r] = 0, the repair program of storage node RNSet[r][1] sends data block S_ir to the new storage node LN of failed data block D_i; go to step 7.3.3. If CSet[r] ≠ 0, go directly to step 7.3.3.

7.3.3 If r < k, set r = r+1 and go to step 7.3.2; otherwise, transmission of the intermediate decoding results ends; go to the eighth step.
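Step 7.3 reduces to a filter on CSet: only nodes whose results were not merged into another node send to LN, so LN receives exactly one intermediate result per data center that appears in RNSet. A minimal sketch with hypothetical names, 0-based lists and (storage node, data center) pairs:

```python
def send_to_ln(rnset: list[tuple[str, str]], merged: list[bytes],
               cset: list[int]) -> list[tuple[str, bytes]]:
    """Steps 7.3.1-7.3.3: every node with CSet[r] = 0 sends its S_ir to LN."""
    return [(rnset[r][0], merged[r]) for r in range(len(rnset)) if cset[r] == 0]


rnset = [("b", "DC1"), ("d", "DC2"), ("c", "DC2")]
outgoing = send_to_ln(rnset, [b"\x01", b"\x06", b"\x04"], [0, 0, 1])
print(outgoing)  # [('b', b'\x01'), ('d', b'\x06')]
```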

In the eighth step, the new storage node LN of failed data block D_i receives the intermediate decoding results sent by each data center and merges them to obtain the final decoding result, i.e. the recovered D_i, and sends repair success information to the control node.

In the ninth step, the result recovery program of the control node receives the repair success information from storage node LN and returns repair success information to the user.
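The eighth step is again element-by-element addition. Assuming, as in common GF(2^8)-based erasure codes, that this addition is XOR, it is associative and commutative, so pre-merging inside data centers (step 7.2) cannot change the D_i that LN obtains from formula (2). The sketch below checks this on toy data; `ln_merge` and `xor_bytes` are hypothetical names.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Element-wise addition of two equally sized blocks (XOR)."""
    return bytes(x ^ y for x, y in zip(a, b))


def ln_merge(intermediate_results: list[bytes]) -> bytes:
    """Step 8: LN adds the intermediate results element by element to obtain D_i."""
    return reduce(xor_bytes, intermediate_results)


s = [b"\x01\x10", b"\x02\x20", b"\x04\x40"]          # S_i1, S_i2, S_i3
direct = ln_merge(s)                                 # all k results sent straight to LN
grouped = ln_merge([s[0], xor_bytes(s[1], s[2])])    # S_i2 + S_i3 pre-merged in one DC
print(direct.hex(), direct == grouped)  # 0770 True
```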

Compared with the prior art, the present invention has the following technical advantages:

1. In step 4.2 of the present invention, after receiving the network delay test request from the control node, the new storage node LN of failed data block D_i itself tests in turn the network delay to each storage node holding an available data block, which effectively ensures the accuracy of the network delay test.

2. In step 5.2 of the present invention, the control node selects available data blocks and their storage nodes according to the network delay between storage nodes, which effectively reduces the network transmission overhead of repair data and lays a foundation for improving the repair efficiency of failed data.

3. In step 7.2 of the present invention, the storage nodes holding the k selected available data blocks first merge their decoding results inside each data center, and the merged intermediate decoding results are then sent to the new storage node LN of failed data block D_i. This makes full use of the bandwidth inside data centers, reduces transmission overhead between data centers, and effectively improves the repair efficiency of failed data.
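Advantage 3 can be quantified with simple counting. Each decoding result S_ir has the same size as one data block, so the cross-data-center repair traffic into LN is proportional to the number of results that leave their data center. Without intra-center merging, every selected node outside LN's data center sends one block across; with the merging of step 7.2, each such data center sends only one. The function below is a hypothetical illustration of that count, not part of the patent.

```python
def cross_dc_transfers(rnset_dcs: list[str], ln_dc: str) -> tuple[int, int]:
    """Count block-sized transfers into LN that cross a data-center boundary.

    Returns (without intra-DC merging, with intra-DC merging as in step 7.2).
    """
    without_merge = sum(1 for dc in rnset_dcs if dc != ln_dc)
    with_merge = len({dc for dc in rnset_dcs if dc != ln_dc})
    return without_merge, with_merge


# k = 6 selected blocks spread over three data centers, LN located in DC1.
print(cross_dc_transfers(["DC1", "DC1", "DC2", "DC2", "DC2", "DC3"], "DC1"))  # (4, 2)
```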

Description of the drawings

Fig. 1 shows the physical structure of the distributed storage system constructed in the first step of the present invention.

Fig. 2 is the software deployment diagram of the distributed storage system of the present invention.

Fig. 3 is the overall flowchart of the present invention.

Specific embodiment

Fig. 3 is the overall flowchart of the present invention. The detailed process is as follows:

In the first step, a distributed storage system as shown in Fig. 1 is constructed. It consists of M data centers and one control node; each data center contains N storage nodes. Each node (the control node or a storage node) is a computer that can run independently, and the nodes are interconnected through network equipment. M and N are positive integers. The control node and the storage nodes are computers containing a processor, memory, disk and network interface, and they are connected through an interconnection network. An operating system and TCP/IP protocol software are installed on the control node and the storage nodes, and the network environment is configured. The control node is responsible for interacting with users and receiving the failed-data-block repair requests they submit; for selecting a new storage node for a failed data block and sending a network delay test request to it; for selecting available data blocks, and the storage nodes holding them, according to network delay; and for storing the decoding coefficients, distributing failed-data-block repair tasks to the storage nodes, receiving repair success information, and returning repair success information to the user. A storage node is responsible for storing the original and redundant data blocks of data objects; for executing network delay test tasks and returning the results to the control node; and for executing failed-data-block repair tasks and returning repair success information to the control node.

As shown in Fig. 2, a task management program and a result recovery program are installed on the control node. The task management program receives the failed-data-block repair request submitted by the user, selects a new storage node for the failed data block, and sends a network delay test request to that node; according to the network delay test results, it selects k available data blocks, and the storage nodes holding them, from the k+m-1 available data blocks of data object DO (excluding the failed data block); and it sends the failed-data-block repair request, together with the information needed for the repair, to the storage nodes holding the k selected blocks. The result recovery program receives repair success information from the storage nodes and returns it to the user. A network delay test program and a repair program are installed on each storage node. The network delay test program receives network delay test requests from the control node, performs the network delay test, and sends the results to the control node. The repair program receives failed-data-block repair requests from the control node, performs the decoding calculation on the available data blocks stored on the storage node, and sends repair success information to the control node once the repair is complete.

In the second step, the control node executes the task management program and selects a new storage node LN for the failed data block D_i to be repaired.

In the third step, the control node executes the task management program and sends a network delay test request and the two-dimensional storage node array NSet to storage node LN.

In the fourth step, storage node LN executes the network delay test program: it receives the network delay test request and NSet from the control node, tests in turn the network delay between LN and each storage node in NSet, and sends the network delay test result array NL to the control node.

In the fifth step, the control node executes the task management program: it receives the network delay test result array NL from storage node LN and, according to the network delay, selects k available data blocks and the storage nodes holding them from the k+m-1 available data blocks of data object DO (excluding the failed data block).

In the sixth step, the control node executes the task management program and sends the following to the storage nodes holding the k available data blocks selected in the fifth step: the repair request for failed data block D_i, the available data block numbers and their decoding coefficients H_ir, the two-dimensional selected-storage-node array RNSet, and the number of the new storage node LN of failed data block D_i.

In the seventh step, the storage nodes holding the k available data blocks selected in the fifth step execute the repair program. They receive from the control node the repair request for failed data block D_i, the available data block numbers and their decoding coefficients H_ir, the two-dimensional selected-storage-node array RNSet, and the number of the new storage node LN of failed data block D_i; they perform the decoding calculation based on the available data blocks and their decoding coefficients H_ir. The decoding results are first merged inside each data center in RNSet, and the merged intermediate decoding results are then sent to LN.

Claims

1. A cross-center cooperative repair method based on erasure codes, characterized by comprising the following steps:

In the first step, a distributed storage system is constructed, consisting of M data centers and one control node; each data center includes N storage nodes; each node is a computer that can run independently, and the nodes are interconnected through network equipment; M and N are positive integers;

Operating systems, TCP/IP protocol software are installed on the control nodes and storage nodes, and the network environment is configured;

The control node is responsible for interacting with users and receiving failed-data-block repair requests submitted by users; for selecting a new storage node for a failed data block and sending a network delay test request to the new storage node; for selecting available data blocks, and the storage nodes holding them, according to network delay; and for storing the decoding coefficients, distributing failed-data-block repair tasks to the storage nodes, receiving repair success information, and returning repair success information to the user;

The storage node is responsible for storing the original data blocks and redundant data blocks of data objects, executing the network delay test task and returning the network delay test result to the control node, and executing the failed-data-block repair task and returning repair success information to the control node; a data object DO is divided into k original data blocks D_1, D_2, ..., D_i, ..., D_k, 1≤i≤k, k being a positive integer; the k original data blocks are encoded to compute m redundant data blocks, m being a positive integer; these k+m data blocks are stored on k+m storage nodes, k+m<N; in the distributed storage system, each data block has a unique data block number;

The task management program and the result recovery program are installed on the control node; the task management program receives the failed-data-block repair request submitted by the user, selects a new storage node for the failed data block, and sends a network delay test request to the new storage node; according to the network delay test results, it selects k available data blocks, and the storage nodes holding them, from the k+m-1 available data blocks of data object DO other than the failed data block; and it sends the failed-data-block repair request and the information required for the repair to the storage nodes holding the k available data blocks; the result recovery program is responsible for receiving the repair success information of the storage nodes and returning it to the user;

A network delay test program and a repair program are installed on the storage node; the network delay test program is responsible for receiving the network delay test request from the control node, performing the network delay test, and sending the network delay test result to the control node; the repair program is responsible for receiving the failed-data-block repair request from the control node, performing the decoding calculation on the available data blocks stored on the storage node, and sending repair success information to the control node after the repair is completed;

In the second step, the control node selects a new storage node LN for the failed data block D_i to be repaired;

In the third step, the control node sends the network delay test request and the two-dimensional storage node array NSet to storage node LN, specifically:

3.1 The task management program of the control node sends a network delay test request to the storage node LN;

3.2 The task management program of the control node forms the two-dimensional storage node array NSet from the storage nodes holding the k+m-1 available data blocks of data object DO other than the failed data block D_i, where NSet[s][1] denotes the storage node holding the s-th available data block and NSet[s][2] denotes the data center containing that storage node, 1≤s≤k+m-1, and sends NSet to storage node LN;

In the fourth step, storage node LN receives the network delay test request and the two-dimensional storage node array NSet from the control node, tests in turn the network delay between LN and each storage node in NSet, and sends the network delay test result array NL to the control node, specifically:

4.1 The network delay test program of storage node LN receives the network delay test request and the two-dimensional storage node array NSet from the control node;

4.2 The network delay test program of storage node LN tests in turn the network delay between LN and each storage node in NSet and stores the test results in the network delay test result array NL, where NL[s] denotes the network delay between storage node LN and storage node NSet[s][1];

4.3 The network delay test program of the storage node LN sends the network delay test result array NL to the control node;

In the fifth step, the control node receives the network delay test result array NL from storage node LN and, according to NL, selects k available data blocks, and the storage nodes holding them, from the k+m-1 available data blocks of data object DO other than the failed data block, specifically:

5.1 The task management program of the control node receives the network delay test result array NL from the storage node LN;

5.2 The task management program of the control node selects k available data blocks from the k+m-1 available data blocks of the data object DO according to NL, and uses the two-dimensional array RNSet of the selected storage node to store the selected k The storage node where the available data block is located and the data center where it is located, RNSet[t][1] indicates the storage node where the selected t-th available data block is located, and RNSet[t][2] indicates the selected t-th available data block The data center of the storage node where the data block is located, 1≤t≤k;

In the sixth step, the control node executes the task management program, and sends the invalid data block D _i repair request, the number of the available data block and its decoding coefficient H _ir , the selected storage node Two-dimensional array RNSet, number of new storage node LN of invalid data block D _i , 1≤r≤k;

In the seventh step, the storage node where the k available data blocks selected in the fifth step are located executes the repair program, and receives the repair request of the invalid data block D _i from the control node, the number of the available data block and its decoding coefficient H _ir , and the selected storage node The two-dimensional array RNSet, the number of the new storage node LN of the invalid data block D _i ; the decoding calculation is performed based on the available data block and its decoding coefficient H _ir , each data center in the RNSet first merges the decoding calculation results, and then merges After the intermediate decoding result is sent to LN:

7.1 The repair programs of the storage nodes where the k selected available data blocks are located perform the decoding calculation and obtain the new data blocks S_i1, ..., S_ir, ..., S_ik;

7.2 The repair programs of the storage nodes where the k selected available data blocks are located merge the decoding results within each data center to obtain the merged intermediate decoding results. The method is:

7.2.1 Use the data block state array CSet to record whether the decoding result of each of the k selected available data blocks has been transmitted and merged, and initialize every element of CSet to 0, that is, CSet[1] = ... = CSet[r] = ... = CSet[k] = 0;

7.2.2 Initialize the serial number of the selected storage node r=1;

7.2.3 Initialize the subsequent storage node serial number x=r+1;

7.2.4 If CSet[x]=0 and RNSet[r][2]=RNSet[x][2], the repair program of storage node RNSet[x][1] sends its decoding result, data block S_ix, to storage node RNSet[r][1]; the repair program of RNSet[r][1] receives S_ix and merges it with its local decoding result S_ir, that is, S_ir = S_ir + S_ix; set CSet[x]=1 and go to step 7.2.5. Otherwise, go directly to step 7.2.5;

7.2.5 If x<k, set x=x+1 and go to step 7.2.4; otherwise, set r=r+1 and go to step 7.2.6;

7.2.6 If r<k, go to step 7.2.3; otherwise, merging of the decoding results within the data centers ends;

7.3 Each data center sends its merged intermediate decoding result to the new storage node LN of the failed data block D_i;

In the eighth step, the new storage node LN of the failed data block D_i receives the intermediate decoding results sent by the data centers, combines them to obtain the final decoding result, that is, the recovered D_i, and sends repair success information to the control node;

In the ninth step, the result recovery program of the control node receives the repair success information from the storage node LN and returns it to the user.
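Steps 7.2 to 8 rely on the decoding being linear: LN recovers D_i as the sum of all S_ir, so partial sums formed inside a data center give the same result while only one block per data center has to be sent to LN. The Python sketch below transcribes the CSet loop of steps 7.2.1 to 7.2.6 and the final combination at LN under one stated assumption: the code works over GF(2^w), so "+" on blocks is bytewise XOR. The names merge_in_centers, combine_at_ln and xor_blocks are hypothetical; RNSet is held as a 0-based list of (storage node, data center) pairs, while CSet keeps the claim's 1-based indexing.

```python
from functools import reduce
from typing import List, Sequence, Tuple

NodeEntry = Tuple[str, str]  # RNSet[t] = (storage node, data center)


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Add two equal-length blocks, assuming a GF(2^w) code (addition = XOR)."""
    return bytes(x ^ y for x, y in zip(a, b))


def merge_in_centers(rnset: Sequence[NodeEntry], s: List[bytes]) -> List[int]:
    """Steps 7.2.1-7.2.6: fold S_ix into S_ir when both nodes share a data center.

    s is modified in place (s[r - 1] holds S_ir). Returns CSet as a 1-indexed
    list (cset[0] unused); CSet[x] = 1 means S_ix was merged into another node.
    """
    k = len(rnset)
    cset = [0] * (k + 1)                                   # 7.2.1
    r = 1                                                  # 7.2.2
    while r < k:                                           # 7.2.6
        for x in range(r + 1, k + 1):                      # 7.2.3 and 7.2.5
            if cset[x] == 0 and rnset[r - 1][1] == rnset[x - 1][1]:  # 7.2.4
                s[r - 1] = xor_blocks(s[r - 1], s[x - 1])  # S_ir = S_ir + S_ix
                cset[x] = 1
        r += 1
    return cset


def combine_at_ln(s: List[bytes], cset: List[int]) -> bytes:
    """Steps 7.3 and 8: unmerged results reach LN, which adds them up."""
    received = [s[r - 1] for r in range(1, len(s) + 1) if cset[r] == 0]
    return reduce(xor_blocks, received)


if __name__ == "__main__":
    rnset = [("a", "dc1"), ("b", "dc2"), ("c", "dc1"), ("d", "dc2")]
    s = [b"\x01", b"\x02", b"\x04", b"\x08"]
    cset = merge_in_centers(rnset, s)
    print(cset)                    # [0, 0, 0, 1, 1]
    print(combine_at_ln(s, cset))  # b'\x0f'
```

In this example four decoding results come from two data centers; after the in-center merge only two blocks travel to LN, and their sum equals the sum of all four original results.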

2. The erasure code-based cross-center collaborative repair method according to claim 1, characterized in that the specific method by which the control node in the second step selects a new storage node LN for the failed data block D_i to be repaired is:

2.1 The task management program of the control node receives the repair request for the failed data block D_i submitted by the user;

2.2 The task management program of the control node selects one of the available storage nodes as the new storage node LN of the failed data block D_i, subject to the principle that LN stores no data block of the data object DO.
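The claim fixes only the exclusion rule of step 2.2, not how to choose among eligible nodes. A minimal Python sketch, assuming the control node simply takes the first eligible node in the order it lists them; select_new_node is a hypothetical name.

```python
from typing import Iterable, Optional, Set


def select_new_node(available_nodes: Iterable[str],
                    nodes_holding_do: Set[str]) -> Optional[str]:
    """Step 2.2: return an available node that stores no block of DO.

    Taking the first eligible node is an illustrative tie-break only.
    """
    for node in available_nodes:
        if node not in nodes_holding_do:
            return node
    return None  # no eligible node, so D_i cannot be placed


if __name__ == "__main__":
    # Blocks of DO live on n1..n4, so n5 is the first eligible node.
    print(select_new_node(["n1", "n3", "n5", "n6"], {"n1", "n2", "n3", "n4"}))
```

Keeping LN off every node of DO matters because a node that already holds a block of DO would then hold two of its k+m blocks, so one node failure could cost two blocks.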

3. The erasure code-based cross-center collaborative repair method according to claim 1, characterized in that the method by which the network delay test program of the storage node LN in step 4.2 of the fourth step sequentially tests the network delay between LN and each storage node in NSet is:

4.2.1 Initialize the storage node serial number s=1;

4.2.2 Test the network delay Lat between the storage node LN and the storage node NSet[s][1], let NL[s]=Lat;

4.2.3 Update the serial number of the storage node, s=s+1;

4.2.4 If s=k+m, end; otherwise, go to step 4.2.2.
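A Python sketch of the 4.2.x loop. The patent does not say how the delay is measured, so the probe is passed in as a callable; tcp_connect_latency (which times a TCP connect with socket.create_connection) and its default port 22 are hypothetical stand-ins, not the patent's method. NL is returned 1-indexed with an unused NL[0] = 0 so that NL[s] lines up with NSet[s]; NSet itself is a 0-based Python list, so NSet[s] is nset[s - 1].

```python
import socket
import time
from typing import Callable, List, Sequence, Tuple

NodeEntry = Tuple[str, str]  # NSet[s] = (storage node, data center)


def measure_delays(nset: Sequence[NodeEntry],
                   probe: Callable[[str], float]) -> List[float]:
    """Steps 4.2.1-4.2.4: fill NL[s] with the delay from LN to NSet[s][1]."""
    nl = [0.0] * (len(nset) + 1)        # nl[0] is an unused placeholder
    s = 1                               # 4.2.1
    while s != len(nset) + 1:           # 4.2.4: stop when s = k+m
        nl[s] = probe(nset[s - 1][0])   # 4.2.2: NL[s] = Lat
        s += 1                          # 4.2.3
    return nl


def tcp_connect_latency(host: str, port: int = 22, timeout: float = 2.0) -> float:
    """Hypothetical probe: seconds needed to open a TCP connection to host."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return time.perf_counter() - start


if __name__ == "__main__":
    fake_delays = {"a": 0.004, "b": 0.03}  # stand-in for real measurements
    print(measure_delays([("a", "dc1"), ("b", "dc2")], fake_delays.__getitem__))
```

Injecting the probe keeps the loop testable offline; on a real cluster the caller would pass tcp_connect_latency or whatever measurement the deployment actually uses.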

4. The erasure code-based cross-center collaborative repair method according to claim 1, characterized in that the specific method by which the task management program of the control node in step 5.2 of the fifth step selects, according to NL, k available data blocks from the k+m-1 available data blocks of the data object DO other than the failed data block is:

5.2.1 Initialize the serial number of the selected storage node t=1;

5.2.2 According to the network delay test result array NL, select the t-th available data block and its storage node from the k+m-1 available data blocks other than the failed data block, as follows:

5.2.2.1 Initialize the serial number of the storage node holding the available data block, s=k+m-1;

5.2.2.2 If NL[s]=0, set s=s-1 and go to step 5.2.2.2; otherwise, set the minimum network delay MinL=NL[s] and the serial number of the corresponding storage node ms=s;

5.2.2.3 If MinL>NL[s-1] and NL[s-1]>0, set MinL=NL[s-1], ms=s-1;

5.2.2.4 Let s=s-1;

5.2.2.5 If s=0, the t-th available data block and its storage node have been found: set RNSet[t][1]=NSet[ms][1], RNSet[t][2]=NSet[ms][2], NL[ms]=0, and go to step 5.2.3; otherwise, go to step 5.2.2.3;

5.2.3 Update the serial number of the selected storage node, t=t+1;

5.2.4 If t>k, end; otherwise, go to step 5.2.2 and continue to select available data blocks.
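The 5.2.x loop is a repeated linear scan for the smallest non-zero entry of NL, zeroing each winner so it is not chosen twice; for distinct delays it picks the same k nodes, in the same order, as sorting NSet by delay and taking the first k. The Python transcription below keeps the claim's index arithmetic and assumes NL is stored 1-indexed with NL[0] = 0, which makes the NL[s-1] comparison at s = 1 harmless. select_k_blocks is a hypothetical name, and NSet is a 0-based Python list, so NSet[ms] is nset[ms - 1].

```python
from typing import List, Sequence, Tuple

NodeEntry = Tuple[str, str]  # (storage node, data center)


def select_k_blocks(nl: List[float], nset: Sequence[NodeEntry],
                    k: int) -> List[NodeEntry]:
    """Steps 5.2.1-5.2.4: pick the k storage nodes with the smallest delays.

    nl is 1-indexed (nl[0] == 0 is a sentinel) and is modified in place,
    as the claim sets NL[ms] = 0 for every selected node.
    """
    if k > sum(1 for d in nl[1:] if d > 0):
        raise ValueError("fewer than k candidate nodes with a positive delay")
    rnset: List[NodeEntry] = []
    t = 1                                              # 5.2.1
    while t <= k:                                      # 5.2.4
        s = len(nl) - 1                                # 5.2.2.1: s = k+m-1
        while nl[s] == 0:                              # 5.2.2.2: skip used entries
            s -= 1
        min_l, ms = nl[s], s
        while True:
            if min_l > nl[s - 1] and nl[s - 1] > 0:    # 5.2.2.3
                min_l, ms = nl[s - 1], s - 1
            s -= 1                                     # 5.2.2.4
            if s == 0:                                 # 5.2.2.5
                break
        rnset.append(nset[ms - 1])                     # RNSet[t] = NSet[ms]
        nl[ms] = 0
        t += 1                                         # 5.2.3
    return rnset


if __name__ == "__main__":
    nl = [0.0, 0.03, 0.01, 0.02]
    nset = [("a", "dc1"), ("b", "dc2"), ("c", "dc1")]
    print(select_k_blocks(nl, nset, 2))  # [('b', 'dc2'), ('c', 'dc1')]
```

Each pass is O(k+m), so the whole selection costs O(k(k+m)); with the small k and m typical of erasure-coded objects a sort-based selection would be an equivalent, equally cheap alternative.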

5. The erasure code-based cross-center collaborative repair method according to claim 1, characterized in that the specific method by which, in step 7.1 of the seventh step, the repair programs of the storage nodes where the k selected available data blocks are located perform the decoding calculation is:

7.1.1 Initialize the serial number of the selected storage node r=1;

7.1.2 The repair program of the storage node RNSet[r][1] receives from the control node the repair request for the failed data block D_i, the numbers of the available data blocks and their decoding coefficients H_ir, the two-dimensional array RNSet of selected storage nodes, and the number of the new storage node LN of D_i;

7.1.3 The repair program of the storage node RNSet[r][1] retrieves the locally stored available data block E_r by its number and performs the local decoding calculation, multiplying the decoding coefficient H_ir bit by bit with E_r to obtain the new data block S_ir, that is, S_ir = H_ir × E_r;

7.1.4 If r<k, set r=r+1 and go to step 7.1.2; otherwise, end the decoding calculation, having obtained S_i1, ..., S_ir, ..., S_ik.
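The claim writes the local step as S_ir = H_ir × E_r but does not name the underlying field. The Python sketch below assumes the common Reed-Solomon choice, GF(2^8) with reduction polynomial 0x11d, and reads "bit by bit" as multiplying every byte of E_r by H_ir; gf256_mul, local_decode and decode_all are hypothetical names, and a production code would normally use log/antilog tables instead of this shift-and-add loop.

```python
from typing import List, Sequence


def gf256_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reducing by the assumed polynomial 0x11d."""
    product = 0
    while b:
        if b & 1:
            product ^= a      # field addition is XOR
        a <<= 1
        if a & 0x100:         # degree-8 overflow: reduce modulo 0x11d
            a ^= 0x11D
        b >>= 1
    return product


def local_decode(h_ir: int, e_r: bytes) -> bytes:
    """Step 7.1.3: S_ir = H_ir x E_r, applied to every byte of E_r."""
    return bytes(gf256_mul(h_ir, byte) for byte in e_r)


def decode_all(coeffs: Sequence[int], blocks: Sequence[bytes]) -> List[bytes]:
    """Steps 7.1.1-7.1.4: each selected node r computes S_ir from its own E_r."""
    return [local_decode(h, e) for h, e in zip(coeffs, blocks)]


if __name__ == "__main__":
    print(decode_all([1, 3], [b"\x07", b"\x07"]))  # [b'\x07', b'\t']
```

In the real protocol each S_ir is computed on a different storage node; decode_all only gathers the k local calls in one place for illustration.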

6. The erasure code-based cross-center collaborative repair method according to claim 1, characterized in that the specific method by which, in step 7.3 of the seventh step, each data center sends its merged intermediate decoding result to the new storage node LN of the failed data block D_i is:

7.3.1 Initialize the serial number of the selected storage node r=1;

7.3.2 If CSet[r]=0, the repair program of the storage node RNSet[r][1] sends the data block S_ir to the new storage node LN of the failed data block D_i; then go to step 7.3.3. If CSet[r]≠0, go directly to step 7.3.3;

7.3.3 If r<k, set r=r+1 and go to step 7.3.2; otherwise, the transmission of intermediate decoding results ends.
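After the merge of step 7.2, the lowest-numbered selected node in each data center still has CSet[r] = 0 and every other node in that center has been merged into it, so the 7.3.x loop sends exactly one block per data center. A Python sketch that only lists the resulting messages; transfers_to_ln is a hypothetical name, CSet keeps the claim's 1-based indexing with an unused entry 0, and RNSet and the S_ir blocks are 0-based Python lists.

```python
from typing import List, Sequence, Tuple

NodeEntry = Tuple[str, str]  # RNSet[r] = (storage node, data center)
Message = Tuple[str, str, bytes]  # (source node, destination node, block)


def transfers_to_ln(rnset: Sequence[NodeEntry], s: Sequence[bytes],
                    cset: Sequence[int], ln: str) -> List[Message]:
    """Steps 7.3.1-7.3.3: nodes whose S_ir was not merged away send it to LN."""
    messages: List[Message] = []
    for r in range(1, len(rnset) + 1):       # 7.3.1 and 7.3.3
        if cset[r] == 0:                     # 7.3.2
            messages.append((rnset[r - 1][0], ln, s[r - 1]))
    return messages


if __name__ == "__main__":
    rnset = [("a", "dc1"), ("b", "dc2"), ("c", "dc1"), ("d", "dc2")]
    merged = [b"\x05", b"\x0a", b"\x04", b"\x08"]  # S_ir after step 7.2
    for msg in transfers_to_ln(rnset, merged, [0, 0, 0, 1, 1], "n9"):
        print(msg)
```

With two data centers in RNSet, LN receives two messages instead of four, which is the cross-center traffic saving the method aims at.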