CN113553212B - Hybrid regeneration coding repair method and system for satellite cluster storage network - Google Patents

Info

Publication number
CN113553212B
Authority
CN
China
Prior art keywords
cluster
repair
data
node
gamma
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110856458.7A
Other languages
Chinese (zh)
Other versions
CN113553212A (en)
Inventor
顾术实
王福刚
张智凯
张钦宇
孙新毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Graduate School Harbin Institute of Technology
Original Assignee
Shenzhen Graduate School Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Graduate School Harbin Institute of Technology
Priority to CN202110856458.7A
Publication of CN113553212A
Application granted
Publication of CN113553212B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08: Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/085: Error detection or correction by redundancy in data representation, e.g. by using checking codes using codes with inherent redundancy, e.g. n-out-of-m codes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14: Error detection or correction of the data by redundancy in operation
    • G06F11/1402: Saving, restoring, recovering or retrying
    • G06F11/1446: Point-in-time backing up or restoration of persistent data
    • G06F11/1458: Management of the backup or restore process
    • G06F11/1464: Management of the backup or restore process for networked environments
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/70: Reducing energy consumption in communication networks in wireless communication networks

Abstract

The invention discloses a hybrid regeneration coding repair method and system for a satellite cluster storage network. The method comprises: performing primary encoding and secondary encoding on a file; performing intra-cluster repair, in which each local helper node in the main cluster of the failed node transmits repair data to the new node, each helper node in the auxiliary clusters of the failed node transmits repair data to the new node, and the amount of data required for cross-cluster repair is calculated from the intra-cluster repair data amounts; and performing cross-cluster repair, in which each cluster of the regeneration code stripe transmits the required amount of data to the new node to complete the repair of the failed node's data. The invention considers not only the heterogeneity of the satellite cluster storage network but also the differences in transmission bandwidth cost caused by link differences among clusters, and, on the basis of the hybrid regeneration code repair principle, realizes data repair over asymmetric cross-cluster links according to the fault-tolerance theory of cross-cluster data repair.

Description

Hybrid regeneration coding repair method and system for satellite cluster storage network
Technical Field
The invention relates to the field of network data storage, in particular to a hybrid regeneration coding repair method and system for a satellite cluster storage network.
Background
In a satellite cluster storage network, high-energy particles and cosmic rays can cause single event upsets (SEUs) in satellite storage devices during operation, producing soft errors that corrupt data. In addition, the data on some satellites become unavailable because of factors such as excessive device temperature during repeated data computation and transmission. To keep data available when storage nodes are unavailable, the cluster storage network must adopt a data fault-tolerance strategy that improves fault tolerance by adding redundancy to the system.
Current methods for improving fault tolerance in distributed storage systems fall into two main types: data replication strategies and erasure coding strategies. Under replication, the system divides the stored file into data blocks, makes several copies of each block, and stores the copies on different nodes; when a node is damaged, the data can be restored from the other nodes. Because the stored data are copied several times, this strategy has low storage efficiency and greatly increases storage overhead, especially in systems that store large volumes of data.
Among traditional erasure coding strategies, Reed-Solomon (RS) codes are the most commonly used. An RS code with parameters (n, k) divides a file of size M into k parts, called data blocks, and obtains n-k additional check blocks through a coding matrix. The n blocks are then stored on nodes of different clusters. If any block is lost, it can be recovered by connecting to the corresponding nodes of any k of the remaining normally working clusters, and the loss of at most n-k blocks can be tolerated. Such erasure coding provides high fault tolerance and greatly improves the storage efficiency of the system, but repair requires connecting to k clusters, and the total amount of data transferred across clusters equals the file size; that is, repairing data of size $M_0$ takes a repair bandwidth of $kM_0$, which greatly increases repair overhead. To remedy this defect of RS codes, Dimakis et al. proposed, on the basis of network coding theory, the concept of regeneration codes (Regenerating Codes, RC), a new MDS coding scheme. Its repair principle is similar to that of RS codes: when any data block is damaged, the new node connects to the corresponding nodes in any d (d ≥ k) of the remaining clusters, and the node in each cluster transmits an amount β of data, so the total amount of data needed for repair is dβ, called the repair bandwidth. The repair bandwidth is smallest when d = n-1, and the total bandwidth is then smaller than the file size M. Regeneration codes trade off storage overhead against repair bandwidth: the code that achieves the minimum storage overhead is generally called the minimum storage regenerating code (Minimum Storage Regenerating Codes, MSR), and the code that achieves the minimum repair bandwidth is generally called the minimum bandwidth regenerating code (Minimum Bandwidth Regenerating Codes, MBR).
Although a regeneration code has slightly lower storage efficiency than an ordinary RS code, it needs much less transmission bandwidth when data corruption occurs, so it offers a better storage-bandwidth tradeoff than the RS code.
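The bandwidth comparison above can be made concrete with a short calculation. The sketch below is illustrative only: it uses the standard closed-form repair bandwidths of classical (non-clustered) regenerating codes at the MSR and MBR points, which this document does not state explicitly, and the function names and the example values of M, k and d are assumptions chosen for the illustration.

```python
from fractions import Fraction


def rs_repair_bandwidth(M, k):
    """Conventional (n, k) RS repair: fetch k blocks of size M/k, i.e. the whole file."""
    return k * Fraction(M, k)


def msr_repair_bandwidth(M, k, d):
    """Total repair bandwidth d*beta at the MSR point, beta = M / (k (d - k + 1))."""
    if d < k:
        raise ValueError("a regenerating code needs d >= k helpers")
    return Fraction(d * M, k * (d - k + 1))


def mbr_repair_bandwidth(M, k, d):
    """Total repair bandwidth d*beta at the MBR point, beta = 2M / (k (2d - k + 1))."""
    if d < k:
        raise ValueError("a regenerating code needs d >= k helpers")
    return Fraction(2 * d * M, k * (2 * d - k + 1))


# Hypothetical example: file of 120 symbols, n = 10, k = 5, d = n - 1 = 9.
M, k, d = 120, 5, 9
print("RS :", rs_repair_bandwidth(M, k))      # 120
print("MSR:", msr_repair_bandwidth(M, k, d))  # 216/5
print("MBR:", mbr_repair_bandwidth(M, k, d))  # 216/7
```

With these example values the RS repair downloads the whole file, while both regenerating-code operating points download well under half of it, which is the tradeoff the paragraph describes.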
In a clustered storage system, communication links between clusters are long and shared by many nodes, so inter-cluster bandwidth is generally considered expensive and scarce. A fault-tolerance strategy for a clustered distributed storage system therefore needs to account for the difference between intra-cluster and inter-cluster bandwidth costs and to reduce, as far as possible, the inter-cluster bandwidth needed to repair a node. General fault-tolerance strategies often consume a large amount of cross-cluster bandwidth, which greatly increases data transmission cost. Generalized regeneration codes were therefore proposed for cluster storage systems: they reduce cross-cluster repair bandwidth by increasing the number of helper nodes within the cluster, and achieve a better tradeoff between storage overhead and cross-cluster repair bandwidth. At present, generalized regeneration codes are a well-performing coding scheme for data storage in cluster storage systems.
However, in a satellite cluster storage network, the communication links between different clusters are heterogeneous because of differences in distance, power, and available bandwidth resources between satellites of different clusters; that is, the cost of transmitting data differs from one cross-cluster link to another. Generalized regeneration codes consider only the difference between intra-cluster and inter-cluster bandwidth costs, not the differences in transmission cost among links between different clusters, and they repair symmetrically across clusters, which may be suboptimal when those link costs differ. In addition, existing coding schemes focus only on reducing repair bandwidth and ignore cost as a metric; reducing repair bandwidth indiscriminately can increase repair cost. Furthermore, study of the coding scheme and repair principle of generalized regeneration codes shows that they achieve fault tolerance by adding redundancy and must satisfy the MDS property (the original data can be recovered from the corresponding nodes of any k clusters), so the cross-cluster repair data of a generalized regeneration code still contains redundancy.
Disclosure of Invention
In view of the defects of the prior art and the need for improvement, the invention provides a hybrid regeneration coding repair method and system for a satellite cluster storage network. Given that cross-cluster links are heterogeneous, the aim is to let different cross-cluster links carry different amounts of data according to their heterogeneity so as to balance transmission cost, and to reduce the total transmission cost during node repair by solving a linear programming problem.
In a first aspect of the present invention, a hybrid regenerative code repair method for a satellite cluster storage network is provided, including:
S1, performing primary encoding on a file, wherein the primary encoding comprises performing RS encoding on one part of the file and regeneration code encoding on the other part, and storing the primary-encoded data on the corresponding nodes of each cluster to form cross-cluster data stripes comprising RS code stripes and regeneration code stripes;
S2, performing secondary encoding, in which the primary-encoded regeneration code data and the primary-encoded RS code data are superposed by multiplying by an invertible generator matrix and stored into the regeneration code stripe;
S3, performing intra-cluster repair, wherein in the main cluster of the failed node each local helper node transmits repair data γ to the new node, where γ denotes the local repair bandwidth of the main cluster; in each auxiliary cluster of the failed node each helper node transmits repair data γ' to the new node, where γ' denotes the local repair bandwidth of the auxiliary cluster; and the amount of data required for cross-cluster repair is calculated from γ and γ';
S4, performing cross-cluster repair, in which each cluster of the regeneration code stripe transmits the required amount of data to the new node to complete the data repair of the failed node.
Further, the constraint conditions satisfied when each local helper node transmits repair data γ to the new node and when each helper node transmits repair data γ' to the new node are respectively:
where α denotes the number of symbols stored by each node, $\beta_{i,j}$ denotes the amount of cross-cluster data transmitted from cluster j when repairing a failed node of cluster i, d denotes the number of clusters, k denotes any k of the d clusters, m denotes the number of nodes in each cluster, l denotes the number of local helper nodes, γ and γ' cannot attain their lower bounds simultaneously, and $\gamma^*$ denotes the optimal solution for γ.
Further, the hybrid regeneration coding repair cost C comprises data transmission cost inside the cluster and data transmission cost across the cluster, and the specific expression is as follows:
where l' denotes the number of helper nodes in an auxiliary cluster of the failed node, $\beta_{i,h_i}$ denotes the amount of cross-cluster data transmitted from cluster $h_i$ when repairing a failed node of cluster i, $\rho_{i,h_i}$ denotes the transmission cost coefficient between clusters i and $h_i$, and $h_i$ ranges over the index set of external clusters used when a node of cluster i is repaired;
when the local repair bandwidth γ' of the auxiliary cluster satisfies γ' = α, the local repair bandwidth γ of the main cluster satisfies the lower-bound condition $\gamma \ge \gamma(\beta)$, with $\gamma(\beta) = \alpha - \min\{\alpha,\ e_{[k+1,d+1]}\,\beta_{[k+1,d+1]}\}$, where β denotes the n-dimensional cross-cluster repair bandwidth column vector $\beta = [\beta_{i,1}, \beta_{i,2}, \ldots, \beta_{i,n}]^T$ with $\beta_{i,i} = 0$, $\beta_{[k+1,d+1]}$ denotes $[\beta_{i,k+1}, \ldots, \beta_{i,d+1}]^T$, and $e_{[k+1,d+1]}$ denotes the all-ones row vector of the same dimension as $\beta_{[k+1,d+1]}$; the linear programming problem with the cross-cluster repair bandwidth vector as the optimization variable is:
s.t.β≥0
γ(β)≤γ
where the set of external helper clusters is the set of clusters connected when a node of cluster i fails, $c_i(\beta)$ denotes the repair cost of a node in cluster i, and B denotes the size of the storable file;
$\rho_i$ denotes the row vector of transmission cost coefficients from cluster i to its external helper clusters;
$\beta_i$ denotes the column vector of cross-cluster repair bandwidths used when repairing a node in cluster i;
when the local repair bandwidth γ of the main cluster satisfies γ = α, the local repair bandwidth γ' of the auxiliary cluster satisfies the lower-bound constraint $\gamma' \ge \gamma'(\beta)$, in which $\xi_i$ appears, and the linear programming problem with the cross-cluster repair bandwidth vector as the optimization variable is:
s.t.β≥0
γ'(β)≤γ'
Further, when the local repair bandwidth of the main cluster of the satellite cluster storage network is limited, repair cost is reduced by increasing the number of local helper nodes, and when the local repair bandwidth of the auxiliary cluster is limited, repair cost is reduced by reducing the number of local helper nodes.
In a second aspect of the present invention, a hybrid regenerative code repair system for a satellite cluster storage network, comprises:
a primary encoding unit: configured to perform primary encoding on a file, wherein the primary encoding comprises performing RS encoding on one part of the file and regeneration code encoding on the other part, and storing the primary-encoded data on the corresponding nodes of each cluster to form cross-cluster data stripes comprising RS code stripes and regeneration code stripes;
a secondary encoding unit: configured to perform secondary encoding, in which the primary-encoded regeneration code data and the primary-encoded RS code data are superposed by multiplying by an invertible generator matrix and stored into the regeneration code stripe;
an intra-cluster repair unit: configured to perform intra-cluster repair, in which in the main cluster of the failed node each local helper node transmits repair data γ to the new node, where γ denotes the local repair bandwidth of the main cluster; in each auxiliary cluster of the failed node each helper node transmits repair data γ' to the new node, where γ' denotes the local repair bandwidth of the auxiliary cluster; and the amount of data required for cross-cluster repair is calculated from γ and γ';
a cross-cluster repair unit: configured to perform cross-cluster repair, in which each cluster of the regeneration code stripe transmits the required amount of data to the new node to complete the data repair of the failed node.
Further, the constraint conditions satisfied in the intra-cluster repair unit when each local helper node transmits repair data γ to the new node and when each helper node transmits repair data γ' to the new node are respectively:
where α denotes the number of symbols stored by each node, $\beta_{i,j}$ denotes the amount of cross-cluster data transmitted from cluster j when repairing a failed node of cluster i, d denotes the number of clusters, k denotes any k of the d clusters, m denotes the number of nodes in each cluster, l denotes the number of local helper nodes, γ and γ' cannot attain their lower bounds simultaneously, and $\gamma^*$ denotes the optimal solution for γ.
Further, in the intra-cluster repair unit and the cross-cluster repair unit, when the local repair bandwidth of the main cluster of the satellite cluster storage network is limited, repair cost is reduced by increasing the number of local helper nodes, and when the local repair bandwidth of the auxiliary cluster is limited, repair cost is reduced by reducing the number of local helper nodes.
In a third aspect of the present invention, there is provided a hybrid regenerative code repair system for a satellite cluster storage network, comprising: a processor; and a memory, wherein the memory stores a computer executable program that, when executed by the processor, performs the hybrid regenerative code repair method for a satellite cluster storage network described above.
The invention provides a hybrid regeneration coding repair method and system for a satellite cluster storage network. It studies the data repair process when a storage device in the satellite cluster storage network fails, considers both the hierarchical heterogeneity of the storage network and the cost differences caused by link differences among clusters, and, on the basis of the hybrid regeneration code repair principle, realizes asymmetric repair over cross-cluster links by exploiting the redundancy of cross-cluster data. The beneficial effects are as follows:
1. By studying the hierarchical repair principle of the hybrid regeneration code and taking into account the differing costs of cross-cluster links in the satellite cluster, an asymmetric repair mode for the hybrid regeneration code is proposed that better matches the heterogeneity of the satellite cluster storage network.
2. A node repair cost is defined for the hybrid regeneration code to characterize its data transmission cost during node repair; a transmission cost coefficient is introduced to quantify the differences in communication link cost among clusters; the function expressing the node repair cost of the hybrid regeneration code is derived and optimized; and the effectiveness of the asymmetric repair mode in reducing repair cost is verified.
Drawings
FIG. 1 is a schematic diagram of a hybrid regenerative code repair method for a satellite cluster storage network according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a data stripe after one-time encoding in an embodiment of the present invention;
FIG. 3 is a schematic diagram of hybrid regenerative coding after secondary coding in an embodiment of the present invention;
FIG. 4 is a schematic diagram of a hybrid regenerative code repair model in an embodiment of the invention;
FIG. 5 is a schematic diagram of a transmission cost model of a clustered storage system in an embodiment of the invention;
FIG. 6 is a graph showing cost comparisons of two repair modes for different codeword parameters in an embodiment of the present invention;
FIG. 7 is a graph showing the cost of two repair methods for different cost factors in accordance with an embodiment of the present invention;
FIG. 8 is a cost comparison diagram of two repair modes under the limitation of the local bandwidth of the main cluster in the embodiment of the invention;
FIG. 9 is a cost comparison diagram of two repair modes under the limitation of local bandwidth of an auxiliary cluster in an embodiment of the invention;
FIG. 10 is a schematic diagram of a hybrid regenerative code repair system for a satellite cluster storage network in accordance with an embodiment of the present invention;
FIG. 11 is a schematic diagram of a computer device in an embodiment of the invention;
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
To better match the heterogeneity of the satellite cluster storage network, the invention provides a hybrid regeneration coding repair method for the satellite cluster storage network, as shown in fig. 1. A hybrid regeneration code with parameter set {(n, k, d)(α, β)(m, l)} is stored across n different clusters, each consisting of m nodes. A file of B symbols is encoded into nmα symbols and distributed over the nm nodes, each node storing α symbols, all from a finite field of size q, and the entire contents of any k clusters suffice to recover the original data file. The first l nodes in each cluster are RS-encoded to preserve the high storage efficiency of the generalized regeneration code, and the last m-l nodes in each cluster are encoded with the regeneration code to preserve its low cross-cluster transmission bandwidth; after encoding, the data are multiplied by a coding matrix that links the data within each cluster. Suppose one node in cluster i is damaged. Under asymmetric repair of a hybrid regeneration code with parameters (n, k, d, m, l), for intra-cluster repair the new node connects to l surviving nodes in the cluster of the failed node and obtains repair data from each of them; for cross-cluster repair the new node connects to d other clusters and obtains $\beta_{i,j}$ symbols from helper cluster j, where $\beta_{i,j}$ denotes the amount of cross-cluster data transmitted from cluster j when repairing a failed node of cluster i and j ranges over the index set of external clusters used when a node of cluster i is repaired. In each external helper cluster, l' nodes each transmit γ' symbols to the control node, which computes the required $\beta_{i,j}$ symbols.
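The parameter bookkeeping in this paragraph can be sketched as a small data structure. This is a minimal illustration, not part of the patented method: the class name `HybridCodeParams`, its methods, the range checks (taken here as 1 ≤ k ≤ d ≤ n-1 and 1 ≤ l ≤ m-1, inferred from the text) and the example values are all assumptions.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class HybridCodeParams:
    n: int      # number of clusters
    k: int      # contents of any k clusters recover the file
    d: int      # external helper clusters contacted during repair
    m: int      # nodes per cluster
    l: int      # RS-coded nodes per cluster (also the local helper count)
    alpha: int  # symbols stored per node

    def __post_init__(self):
        # Assumed validity ranges, inferred from the description above.
        if not 1 <= self.k <= self.d <= self.n - 1:
            raise ValueError("expected 1 <= k <= d <= n - 1")
        if not 1 <= self.l <= self.m - 1:
            raise ValueError("expected 1 <= l <= m - 1")
        if self.alpha < 1:
            raise ValueError("alpha must be positive")

    @property
    def total_nodes(self):
        return self.n * self.m

    @property
    def total_symbols(self):
        # A file of B symbols is encoded into n*m*alpha stored symbols.
        return self.total_nodes * self.alpha

    def cluster_layout(self):
        # First l nodes hold RS-coded data, last m - l hold regeneration-coded data.
        return ["RS"] * self.l + ["RC"] * (self.m - self.l)

    def storage_efficiency(self, file_symbols):
        return Fraction(file_symbols, self.total_symbols)


params = HybridCodeParams(n=5, k=3, d=4, m=4, l=2, alpha=6)
print(params.total_nodes, params.total_symbols, params.cluster_layout())
```

Keeping the parameters in one immutable object makes it harder to mix up l (local helpers in the main cluster) with the per-cluster node count m when computing repair traffic.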
To ensure reliable repair, $\beta_{i,j}$ and the storage overhead α must satisfy a relation determined by the upper bound on the file size. Subject to this inequality, different links transmit different amounts of data according to the cost differences among cross-cluster links, so as to optimize the data repair cost. The specific method comprises the following steps:
S1, performing primary encoding on a file, wherein the primary encoding comprises performing RS encoding on one part of the file and regeneration code encoding on the other part, and storing the primary-encoded data on the corresponding nodes of each cluster to form cross-cluster data stripes comprising RS code stripes and regeneration code stripes;
The specific implementation process is as follows: a file of size M is divided into m sub-files, each of size M/m. In this embodiment, the first l sub-files are RS-encoded and the last m-l sub-files are encoded with the regeneration code, so that each sub-file is encoded into nα symbols and stored on the corresponding node of each cluster. Every α symbols form the content of one node, so each encoded sub-file can be regarded as a stripe, as shown in fig. 2.
S2, performing secondary encoding, in which the primary-encoded regeneration code data and the primary-encoded RS code data are superposed by multiplying by an invertible generator matrix and stored into the regeneration code stripe;
The specific implementation process is as follows: after the primary encoding is finished, the data are multiplied by an m×m invertible generator matrix defined in terms of a unit-type matrix, in which each row has exactly one element equal to 1 and all other elements equal to 0, and a Vandermonde matrix, so that the primary-encoded regeneration code data and the primary-encoded RS code data are superposed and stored into the original regeneration code stripe; encoding is then complete, as shown in fig. 3. After the two rounds of encoding, cross-cluster data repair can be completed through the regeneration code with small bandwidth. In this embodiment, the damaged node is the last node in the first cluster, and the asymmetric repair of the hybrid regeneration code is divided into two parts: repair within the cluster and repair across clusters.
S3, performing intra-cluster repair, wherein in the main cluster of the failed node each local helper node transmits repair data γ to the new node, where γ denotes the local repair bandwidth of the main cluster; in each auxiliary cluster of the failed node each helper node transmits repair data γ' to the new node, where γ' denotes the local repair bandwidth of the auxiliary cluster; and the amount of data required for cross-cluster repair is calculated from γ and γ';
The specific implementation process is as follows: in the main cluster where the failed node is located, l local helper nodes each transmit γ repair data to the new node; in each of the d auxiliary clusters, l' helper nodes each transmit γ' repair data in order to compute the amount of data required for cross-cluster repair.
S4, performing cross-cluster repair, in which each cluster of the regeneration code stripe transmits the required amount of data to the new node to complete the data repair of the failed node.
The specific implementation process is as follows: after the repair inside the cluster is finished, each helper cluster j of the regeneration code stripe transmits $\beta_{1,j}$ symbols to the new node, where j ranges over the set of helper clusters participating in repairing the node in cluster 1. The repair process then ends.
When repairing a failed node in cluster i, the cross-cluster repair data amounts $\beta_{i,j}$ are adjusted according to the different cost coefficients of the links between clusters: the amount of data sent over expensive cross-cluster links is reduced by increasing the amount sent over cheap cross-cluster links, which lowers the total repair cost. As in symmetric repair, $\beta_{i,j}$ can be regarded as a function of l' and γ'. A schematic diagram of the hybrid regeneration code repair model is shown in FIG. 4.
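The effect of shifting traffic from expensive to cheap links can be illustrated numerically. The sketch below assumes a particular cost form, namely intra-cluster traffic weighted by 1 plus cross-cluster traffic weighted by a per-link cost coefficient ρ; this form, the function name `repair_cost`, and all numbers are assumptions for illustration, and the feasibility condition that couples the β values to the file size is not modeled.

```python
def repair_cost(l, gamma, l_prime, gamma_prime, beta, rho):
    """Assumed cost model for repairing one node.

    l * gamma              : traffic inside the main cluster (unit cost assumed)
    l_prime * gamma_prime  : traffic inside each helper cluster (unit cost assumed)
    rho[j] * beta[j]       : cross-cluster traffic from helper cluster j
    beta and rho are dicts keyed by helper cluster index.
    """
    if beta.keys() != rho.keys():
        raise ValueError("beta and rho must cover the same helper clusters")
    intra_main = l * gamma
    intra_helpers = len(beta) * l_prime * gamma_prime
    cross = sum(rho[j] * beta[j] for j in beta)
    return intra_main + intra_helpers + cross


# Hypothetical link costs from the failed node's cluster to helper clusters 2, 3, 4.
rho = {2: 2, 3: 5, 4: 10}

# Same total cross-cluster volume (9 symbols), split evenly or skewed to cheap links.
symmetric = {2: 3, 3: 3, 4: 3}
asymmetric = {2: 5, 3: 3, 4: 1}

print(repair_cost(2, 4, 2, 4, symmetric, rho))   # 83
print(repair_cost(2, 4, 2, 4, asymmetric, rho))  # 67
```

Under this assumed model the intra-cluster terms are identical in both cases, so the entire saving comes from moving two symbols off the most expensive link.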
According to the min-cut theorem for the information flow graph, an upper bound on the storable file size B can be obtained for a functional-repair hybrid regeneration code with parameter set (n, k, d, m, l) under asymmetric repair, in which the cross-cluster repair bandwidths are indexed by the set of external clusters used when a node of cluster i is repaired. It follows that the file storage capacity of the hybrid regeneration code under the asymmetric repair strategy is no more than that of the generalized regeneration code.
In an alternative embodiment, to reach the upper bound on the storable file size, the local repair bandwidth γ of the main cluster and the local repair bandwidth γ' of the auxiliary cluster each have a lower bound. Preferably, when one of the two local bandwidths attains its lower bound, the other defaults to α; that is, γ and γ' cannot attain their lower bounds simultaneously.
By analyzing the transmission cost of the links, a transmission cost coefficient is defined: $\rho_{ij}$ denotes the cost of transmitting one bit of data between cluster i and cluster j. In a cluster storage system, the communication links inside a cluster and between clusters differ, and the communication distances of the links differ, so the cost of transmitting data across clusters differs from link to link. Inside a cluster, communication is convenient, so the cost coefficients of the intra-cluster links are normalized, namely $\rho_{ii} = 1$. In addition, because clusters i and j use the same communication link when transmitting data to each other, $\rho_{ij} = \rho_{ji}$. The cluster storage system transmission cost model shown in fig. 5 can thus be built, and the cluster storage system is quantified by an n×n transmission cost coefficient matrix $\psi = [\rho_{ij}]_{n \times n}$.
Write $\psi = [\psi_1, \psi_2, \ldots, \psi_n]^T$, where $\psi_i$ (1 ≤ i ≤ n) is the n-dimensional vector $\psi_i = [\rho_{i,1}, \rho_{i,2}, \ldots, \rho_{i,n}]$. When any node of the i-th cluster fails, any d elements of $\psi_i$ other than $\rho_{ii}$ are taken as the transmission cost coefficient vector involved in a single repair, indexed by the index set of external clusters used when a node of cluster i is repaired.
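The cost matrix described above can be sketched in a few lines. The helper names `build_cost_matrix` and `cheapest_helpers`, the 1-based cluster numbering, and the example link costs are assumptions for illustration. The text allows any d off-diagonal elements to be taken; picking the d cheapest links is shown here as one natural choice, not as the only one.

```python
def build_cost_matrix(n, link_costs):
    """Build a symmetric n x n cost matrix with a normalized diagonal.

    link_costs maps a pair (i, j) of 1-based cluster indices to rho_ij;
    each unordered pair needs to be given only once.
    """
    psi = [[None] * n for _ in range(n)]
    for i in range(n):
        psi[i][i] = 1  # intra-cluster cost, normalized
    for (i, j), rho in link_costs.items():
        if i == j or not (1 <= i <= n and 1 <= j <= n):
            raise ValueError(f"invalid cluster pair {(i, j)}")
        psi[i - 1][j - 1] = rho
        psi[j - 1][i - 1] = rho  # rho_ij = rho_ji: same link in both directions
    if any(value is None for row in psi for value in row):
        raise ValueError("every pair of clusters needs a cost coefficient")
    return psi


def cheapest_helpers(psi, i, d):
    """Return the d external clusters with the smallest rho_ij, cheapest first."""
    others = [j for j in range(1, len(psi) + 1) if j != i]
    if d > len(others):
        raise ValueError("d cannot exceed n - 1")
    return sorted(others, key=lambda j: psi[i - 1][j - 1])[:d]


# Hypothetical 4-cluster network.
costs = {(1, 2): 2, (1, 3): 5, (1, 4): 3, (2, 3): 4, (2, 4): 6, (3, 4): 1}
psi = build_cost_matrix(4, costs)
print(psi[0])                       # [1, 2, 5, 3]
print(cheapest_helpers(psi, 1, 2))  # [2, 4]
```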
The data repair cost of the hybrid regeneration code is divided into two parts, one part is the data transmission cost inside the cluster, and the other part is the data transmission cost across the clusters. The method can obtain the following steps: c=c intra-cluster +C inter-cluster Further, the repair cost formula for the asymmetric repair of the mixed regenerated code can be expressed as follows:assuming that the transmission cost coefficient matrix ψ of a cluster distributed storage system is known, the hybrid regeneration code of the asymmetric repair model has known parameter sets (n, k, d, m, l), the storage overhead is a, and the external help cluster set connected with the failure of the cluster inode is +.>Meaning that d are taken as external help clusters from any of the remaining n-1 clusters, assuming that the set of subscripts of the external help clusters taken is +.>The cross-cluster repair bandwidth of a cluster i to external help cluster request is denoted +.>The corresponding transmission cost coefficient is +.>The repair cost of any node in the repair cluster i can be obtained as follows: />As can be seen from the above analysis,
when the local repair bandwidth γ' of the auxiliary cluster satisfies $\gamma' = \alpha$, the local repair bandwidth γ of the primary cluster satisfies the lower bound $\gamma \ge \gamma(\beta)$ with $\gamma(\beta) = \alpha - \min\{\alpha,\ e_{[k+1,d+1]}\beta_{[k+1,d+1]}\}$, where β is the n-dimensional cross-cluster repair bandwidth column vector $\beta = [\beta_{i,1}, \beta_{i,2}, \ldots, \beta_{i,n}]^T$ with $\beta_{i,i} = 0$, $\beta_{[k+1,d+1]}$ denotes $[\beta_{i,k+1}, \ldots, \beta_{i,d+1}]^T$, and $e_{[k+1,d+1]}$ denotes the all-ones row vector of the same dimension as $\beta_{[k+1,d+1]}$. Minimizing the global repair bandwidth cost in this case can be expressed as a linear programming problem whose objective function is the global repair bandwidth cost and whose optimization variable is the cross-cluster repair bandwidth vector, subject to the constraints:
s.t. $\beta \ge 0$
$\gamma(\beta) \le \gamma$
where the external help cluster set is the set of clusters connected when a node of cluster i fails, $c_i(\beta)$ is the repair cost of repairing a node in cluster i, and B is the size of the storable file;
$\rho_i$ is the row vector of transmission cost coefficients between cluster i and its external help clusters;
$\beta_i$ is the column vector of the cross-cluster repair bandwidths used when repairing a node in cluster i.
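The lower bound $\gamma(\beta) = \alpha - \min\{\alpha,\ e_{[k+1,d+1]}\beta_{[k+1,d+1]}\}$ can be evaluated directly. The sketch below is an illustration under stated assumptions: the function names are hypothetical, `beta` is a plain list whose entry `beta[j-1]` holds $\beta_{i,j}$ (so the source's 1-based indices k+1..d+1 become the slice `beta[k:d+1]`), and the numeric values are made up. For $0 \le \gamma \le \alpha$, the constraint $\gamma(\beta) \le \gamma$ is equivalent to requiring that the entries $\beta_{i,k+1}, \ldots, \beta_{i,d+1}$ sum to at least $\alpha - \gamma$, which the second function checks.

```python
from typing import Sequence


def gamma_lower_bound(alpha: float, beta: Sequence[float], k: int, d: int) -> float:
    """Evaluate gamma(beta) = alpha - min(alpha, sum of beta_{i,k+1..d+1})."""
    if len(beta) < d + 1:
        raise ValueError("beta must have at least d + 1 entries")
    tail = sum(beta[k:d + 1])  # 1-based entries k+1 .. d+1
    return alpha - min(alpha, tail)


def primary_constraint_holds(alpha: float, beta: Sequence[float], k: int, d: int, gamma: float) -> bool:
    """Check the constraint gamma(beta) <= gamma of the linear program."""
    return gamma_lower_bound(alpha, beta, k, d) <= gamma


# Illustrative values: n = 5, k = 3, d = 4, failed cluster i = 1 (beta_{1,1} = 0).
alpha = 6
beta = [0, 2, 2, 1, 1]
bound = gamma_lower_bound(alpha, beta, k=3, d=4)  # 6 - min(6, 1 + 1) = 4
```

With these values a primary-cluster bandwidth of γ = 4 satisfies the constraint and γ = 3 does not, matching the equivalent test 1 + 1 ≥ α − γ.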
When the local repair bandwidth γ of the primary cluster satisfies $\gamma = \alpha$, the local repair bandwidth γ' of the auxiliary cluster satisfies the lower bound constraint $\gamma' \ge \gamma'(\beta) = \min\{e_{[i+1,k]}\beta_{[i+1,k]},\ \alpha - e_{[k+1,d+1]}\beta_{[k+1,d+1]}\}/\xi_i$. The corresponding linear programming problem, with the cross-cluster repair bandwidth vector as optimization variable, has the constraints:
s.t. $\beta \ge 0$
$\gamma'(\beta) \le \gamma'$
Preferably, assume that the transmission cost coefficient $\rho_{ij}$ between cluster i and cluster j in the cluster distributed storage system is non-decreasing as j goes from 1 to n with $j \ne i$, i.e. $\rho_{i1} \le \rho_{i2} \le \ldots \le \rho_{i,i-1} \le \rho_{i,i+1} \le \ldots \le \rho_{in}$. To minimize the transmission cost of the helper data, the new node chooses the links with the smaller transmission cost coefficients when selecting external help clusters, which determines the index set of the external help clusters used to repair a node in cluster i. It follows that $\beta_{i1} \ge \ldots \ge \beta_{i,i-1} \ge \beta_{i,i+1} \ge \ldots \ge \beta_{in} \ge 0$, i.e. the cross-cluster repair bandwidth between different clusters varies inversely with the transmission cost coefficient. This is shown by contradiction as follows.
Assume that $\beta^*$ is an optimal solution that satisfies the constraints of the linear programming problem, and suppose that, apart from $\beta \ge 0$, its elements satisfy $\beta^*_{i,i_1} < \beta^*_{i,i_2}$ for some $i_1 < i_2$. Exchanging the values of $\beta^*_{i,i_1}$ and $\beta^*_{i,i_2}$ leaves the feasible region of the capacity upper bound unchanged, i.e. all constraints remain satisfied. However, the objective function, i.e. the global repair bandwidth cost of the system, is reduced, because the larger of the two values is now multiplied by the smaller transmission cost coefficient. This contradicts the assumption that $\beta^*$ is an optimal solution, and thus the elements of the optimal solution $\beta^*$ must satisfy $\beta_{i1} \ge \ldots \ge \beta_{i,i-1} \ge \beta_{i,i+1} \ge \ldots \ge \beta_{in} \ge 0$.
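The exchange step in this argument can be checked numerically on a small instance. The sketch below only looks at the cost-weighted sum of cross-cluster bandwidths; it assumes, as the text states, that exchanging two entries keeps the solution feasible, and it does not model the capacity constraint itself. The function names and numbers are illustrative assumptions.

```python
from itertools import permutations
from typing import Sequence, Tuple


def weighted_cost(rho: Sequence[float], beta: Sequence[float]) -> float:
    """Cost-weighted cross-cluster traffic: sum over j of rho_j * beta_j."""
    return sum(r * b for r, b in zip(rho, beta))


def best_assignment(rho: Sequence[float], amounts: Sequence[float]) -> Tuple[float, ...]:
    """Brute-force the cheapest way to assign the given amounts to the links."""
    return min(permutations(amounts), key=lambda p: weighted_cost(rho, p))


rho = [2, 5, 10, 20]      # non-decreasing cost coefficients (illustrative)
before = [1, 4, 2, 3]     # violates the ordering: 1 < 4 on the two cheapest links
after = [4, 1, 2, 3]      # the first two entries exchanged

cost_before = weighted_cost(rho, before)   # 2 + 20 + 20 + 60 = 102
cost_after = weighted_cost(rho, after)     # 8 + 5 + 20 + 60 = 93
optimum = best_assignment(rho, before)     # amounts sorted in decreasing order
```

The single exchange lowers the cost, and the exhaustive search returns the amounts in decreasing order against the increasing coefficients, which is the ordering the proof derives.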
The cross-cluster bandwidths can therefore be chosen so that the cross-cluster repair bandwidth between different clusters varies inversely with the transmission cost coefficient, i.e. more expensive cross-cluster communication links transmit less data, which effectively reduces the repair cost of the nodes.
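To make the effect concrete, the following sketch compares a uniform split of cross-cluster traffic with a cheapest-first split on a deliberately simplified toy problem. It is not the patent's linear program: the per-link capacity `caps`, the total `demand`, the function names, and all numbers are assumptions introduced only for illustration.

```python
from typing import List, Sequence


def cross_cluster_cost(rho: Sequence[float], beta: Sequence[float]) -> float:
    """Cross-cluster part of the repair cost: sum over j of rho_j * beta_j."""
    return sum(r * b for r, b in zip(rho, beta))


def cheapest_first(rho: Sequence[float], caps: Sequence[float], demand: float) -> List[float]:
    """Fill the cheapest links first until the demand is met (toy model)."""
    if demand > sum(caps):
        raise ValueError("demand exceeds total link capacity")
    beta = [0.0] * len(rho)
    remaining = demand
    for j in sorted(range(len(rho)), key=lambda idx: rho[idx]):
        take = min(caps[j], remaining)
        beta[j] = take
        remaining -= take
        if remaining <= 0:
            break
    return beta


rho = [10.0, 2.0, 5.0]           # illustrative cost coefficients
caps = [4.0, 4.0, 4.0]           # hypothetical per-link limits
skewed = cheapest_first(rho, caps, demand=6.0)   # [0.0, 4.0, 2.0]
uniform = [2.0, 2.0, 2.0]                        # same total, split evenly

skewed_cost = cross_cluster_cost(rho, skewed)    # 8 + 10 = 18
uniform_cost = cross_cluster_cost(rho, uniform)  # 20 + 4 + 10 = 34
```

Both allocations move the same total amount of data, but the skewed one sends nothing over the most expensive link and costs 18 instead of 34.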
Preferably, the repair cost is reduced by increasing the number of local helper nodes when the local repair bandwidth of the primary cluster of the satellite cluster storage network is limited, and by reducing the number of local helper nodes when the local repair bandwidth of the auxiliary cluster is limited.
Application example:
the hybrid regenerative code repair method for satellite cluster storage networks described above is further explained below in conjunction with a specific example.
Example 1: The performance of the two repair modes is compared under different codeword parameters, as shown in Fig. 6. The parameter settings are (n=5, k=3, d=4, m=4), (n=7, k=4, d=5, m=6) and (n=9, k=4, d=5, m=7), with B=36, l=3, l'=m, and ρ_i = [1 2 5 10 20 30 50 100 200].
Example 2: The performance of the two repair modes is compared under different cross-cluster transmission cost coefficients, as shown in Fig. 7. ρ_i takes the values [1 2 5 10 20 50 100], [1 3 15 20 30 100 200] and [1 5 20 50 100 200 500]; as before, B=36, l=3, l'=m, and the other coding parameters are (n=7, k=4, d=5, m=6).
Preferably, for different codeword parameters and cost coefficients, the asymmetric repair mode of the hybrid regeneration code can effectively reduce the repair cost of data, and the larger the cost coefficient difference between different clusters is, the better the performance of the asymmetric repair mode is.
Fig. 8 compares the cost of the two repair modes under the local bandwidth limitation of the primary cluster (Example 3), and Fig. 9 compares them under the local bandwidth limitation of the auxiliary cluster (Example 4). They further illustrate that it is preferable to reduce the repair cost by increasing the number of local helper nodes when the local repair bandwidth of the primary cluster of the satellite cluster storage network is limited, and by reducing the number of local helper nodes when the local repair bandwidth of the auxiliary cluster is limited.
A system corresponding to the method shown in Fig. 1, namely a hybrid regenerating code repair system 100 for a satellite cluster storage network, is described below with reference to Fig. 10, which is a schematic structural diagram of the system according to an embodiment of the present disclosure. Since the functions of the system 100 are the same as the details of the method described above, a detailed description is omitted here for brevity. As shown in Fig. 10, the system 100 includes:
first encoding unit 101: used to perform the first encoding of a file, which comprises RS encoding one part of the file and regenerating-code encoding the other part, and to store the encoded data on the corresponding nodes of each cluster, forming a data stripe that contains an RS code stripe and a regenerating code stripe;
second encoding unit 102: used to perform the second encoding, in which the first-encoded regenerating code data and the first-encoded RS code data are superposed through multiplication by an invertible generator matrix and stored in the regenerating code stripe;
intra-cluster repair unit 103: used to perform intra-cluster repair, in which each local helper node in the primary cluster of the failed node transmits repair data γ to the new node, where γ is the local repair bandwidth of the primary cluster, each helper node in an auxiliary cluster of the failed node transmits repair data γ' to the new node, where γ' is the local repair bandwidth of the auxiliary cluster, and the amount of data needed for cross-cluster repair is calculated from γ and γ';
cross-cluster repair unit 104: used to perform cross-cluster repair, in which each cluster of the regenerating code stripe transmits the required amount of data to the new node to complete the data repair of the failed node.
In addition to these four units, the system 100 may include other components; since they are not relevant to the embodiments of the present disclosure, their illustration and description are omitted here.
The repair data γ that each local helper node transmits to the new node in the intra-cluster repair unit 103 and the repair data γ' that each helper node transmits to the new node satisfy the following constraints, respectively:
where α represents the number of symbols stored by each node, $\beta_{i,j}$ represents the amount of cross-cluster data transmitted from cluster j when repairing a failed node of cluster i, d represents the number of clusters, k represents any k clusters, m represents the number of nodes in each cluster, and l represents the number of local helper nodes; γ and γ' cannot attain their lower bounds simultaneously, and $\gamma^*$ represents the optimal solution for γ.
In the intra-cluster repair unit 103 and the cross-cluster repair unit 104, the repair cost is reduced by increasing the number of local helper nodes when the local repair bandwidth of the primary cluster of the satellite cluster storage network is limited, and by reducing the number of local helper nodes when the local repair bandwidth of the auxiliary cluster is limited.
The specific working process of the hybrid regenerative code repair system 100 for a satellite cluster storage network refers to the description of the hybrid regenerative code repair method for a satellite cluster storage network, and is not repeated.
Furthermore, a system according to an embodiment of the present invention may also be implemented by means of the architecture of the computing device shown in Fig. 11. As shown in Fig. 11, the architecture includes a computer system 201, a system bus 203, one or more CPUs 204, input/output components 202, a memory 205, and the like. The memory 205 may store various data or files used for computer processing and/or communication, as well as program instructions executed by the CPU. The architecture shown in Fig. 11 is merely exemplary, and one or more of its components may be adapted as needed to implement different devices.
The invention studies the data repair process when a storage device in a satellite cluster storage network fails. It considers both the hierarchical heterogeneity of the storage network and the cost differences caused by link differences between clusters, and, based on the hybrid regenerating code repair principle, realizes asymmetric repair over the cross-cluster links by exploiting the redundancy of the cross-cluster data.
In this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, or apparatus.
The foregoing is a further detailed description of the invention in connection with the preferred embodiments, and it is not intended that the invention be limited to the specific embodiments described. It will be apparent to those skilled in the art that several simple deductions or substitutions may be made without departing from the spirit of the invention, and these should be considered to be within the scope of the invention.

Claims (7)

1. A hybrid regenerative code repair method for a satellite cluster storage network, the method comprising:
S1, performing primary coding on a file, wherein the primary coding comprises performing RS coding on one part of the file and performing regeneration code coding on the other part of the file, and storing the data after primary coding to corresponding nodes of each cluster to form a cross-cluster data strip comprising an RS code strip and a regeneration code strip;
S2, superposing the first-coded regeneration code data and the first-coded RS code data by multiplying by a reversible generation matrix, and storing the superposed data into the regeneration code strip;
S3, performing intra-cluster repair, wherein in the main cluster of a failed node each local helper node transmits repair data γ to a new node, γ representing the local repair bandwidth of the main cluster, in an auxiliary cluster of the failed node each helper node transmits repair data γ' to the new node, γ' representing the local repair bandwidth of the auxiliary cluster, and the amount of data required for cross-cluster repair is calculated from γ and γ';
S4, performing cross-cluster repair, wherein each cluster of the regeneration code strip transmits the required amount of data to the new node to complete the data repair of the failed node;
wherein the repair data γ that each local helper node transmits to the new node and the repair data γ' that each helper node transmits to the new node satisfy the following constraints, respectively:
where α represents the number of symbols stored by each node, $\beta_{i,j}$ represents the amount of cross-cluster data transmitted from cluster j when repairing a failed node of cluster i, d represents the number of clusters, s = k + 1, k represents any k of the d clusters, m represents the number of nodes in each cluster, and l represents the number of local helper nodes; γ and γ' cannot attain their lower bounds simultaneously, and $\gamma^*$ represents the optimal solution for γ.
2. The hybrid regenerative code repair method of claim 1, wherein the hybrid regenerative code repair cost C includes a data transmission cost inside the cluster and a data transmission cost across the cluster, and the specific expression is:
where d represents the number of clusters, l' represents the number of helper nodes in an auxiliary cluster of the failed node, $\beta_{i,h_i}$ represents the amount of cross-cluster data transmitted from cluster $h_i$ when repairing a failed node of cluster i, $\rho_{i,h_i}$ represents the transmission cost coefficient between clusters i and $h_i$, and $h_i$ ranges over the index set of the external clusters used when a node of cluster i is repaired;
when the local repair bandwidth γ' of the auxiliary cluster satisfies $\gamma' = \alpha$, the local repair bandwidth γ of the main cluster satisfies the lower bound condition $\gamma \ge \gamma(\beta)$, $\gamma(\beta) = \alpha - \min\{\alpha,\ e_{[k+1,d+1]}\beta_{[k+1,d+1]}\}$, where β represents an n-dimensional cross-cluster repair bandwidth column vector, $\beta = [\beta_{i,1}, \beta_{i,2}, \ldots, \beta_{i,n}]^T$, $\beta_{i,i} = 0$, $\beta_{[k+1,d+1]}$ represents $[\beta_{i,k+1}, \ldots, \beta_{i,d+1}]^T$, and $e_{[k+1,d+1]}$ represents the all-ones row vector of the same dimension as $\beta_{[k+1,d+1]}$; the linear programming problem with the cross-cluster repair bandwidth vector as optimization variable is subject to:
s.t. $\beta \ge 0$
$\gamma(\beta) \le \gamma$
where the external help cluster set represents the clusters connected when a node of cluster i fails, its elements being the external help clusters connected when a node of cluster i fails, n represents the number of clusters, $c_i(\beta)$ represents the repair cost of repairing a node in cluster i, B represents the size of the storable file, and $e_i$ represents a unit vector;
$\rho_i$ represents the vector of transmission cost coefficients between cluster i and its external help clusters;
$\beta_i$ represents the column vector composed of the cross-cluster repair bandwidths used when repairing a node in cluster i;
when the local repair bandwidth γ of the main cluster satisfies $\gamma = \alpha$, the local repair bandwidth γ' of the auxiliary cluster satisfies the lower bound constraint condition $\gamma' \ge \gamma'(\beta) = \min\{e_{[i+1,k]}\beta_{[i+1,k]},\ \alpha - e_{[k+1,d+1]}\beta_{[k+1,d+1]}\}/\xi_i$, and the corresponding linear programming problem takes the cross-cluster repair bandwidth vector as its optimization variable.
3. The hybrid regenerative code repair method of claim 1, wherein repair costs are reduced by increasing the number of local helper nodes when the local repair bandwidth of the main cluster of the satellite cluster storage network is limited, and repair costs are reduced by reducing the number of local helper nodes when the local repair bandwidth of the auxiliary cluster is limited.
4. A hybrid regenerative code repair system for a satellite cluster storage network, the system comprising:
a first encoding unit: used to perform primary coding on a file, wherein the primary coding comprises performing RS coding on one part of the file and performing regeneration code coding on the other part of the file, and to store the data after primary coding to corresponding nodes of each cluster to form a cross-cluster data strip containing an RS code strip and a regeneration code strip;
a second encoding unit: used to perform secondary coding, in which the first-coded regeneration code data and the first-coded RS code data are superposed by multiplying by a reversible generation matrix and stored into the regeneration code strip;
an intra-cluster repair unit: used to perform intra-cluster repair, in which each local helper node in the main cluster of a failed node transmits repair data γ to a new node, γ representing the local repair bandwidth of the main cluster, each helper node in an auxiliary cluster of the failed node transmits repair data γ' to the new node, γ' representing the local repair bandwidth of the auxiliary cluster, and the amount of data needed for cross-cluster repair is calculated from γ and γ';
a cross-cluster repair unit: used to perform cross-cluster repair, in which each cluster of the regeneration code strip transmits the required amount of data to the new node to complete the data repair of the failed node;
wherein the repair data γ that each local helper node transmits to the new node and the repair data γ' that each helper node transmits to the new node satisfy the following constraints, respectively:
where α represents the number of symbols stored by each node, $\beta_{i,j}$ represents the amount of cross-cluster data transmitted from cluster j when repairing a failed node of cluster i, d represents the number of clusters, s = k + 1, k represents any k of the d clusters, m represents the number of nodes in each cluster, and l represents the number of local helper nodes; γ and γ' cannot attain their lower bounds simultaneously, and $\gamma^*$ represents the optimal solution for γ.
5. The hybrid regenerative code repair system of claim 4, wherein, in the intra-cluster repair unit, the repair data γ that each local helper node transmits to the new node and the repair data γ' that each helper node transmits to the new node satisfy the following constraints, respectively:
where α represents the number of symbols stored by each node, $\beta_{i,j}$ represents the amount of cross-cluster data transmitted from cluster j when repairing a failed node of cluster i, d represents the number of clusters, k represents any k of the d clusters, m represents the number of nodes in each cluster, and l represents the number of local helper nodes; γ and γ' cannot attain their lower bounds simultaneously, and $\gamma^*$ represents the optimal solution for γ.
6. The hybrid regenerative code repair system of claim 4, wherein, in the intra-cluster repair unit and the cross-cluster repair unit, repair costs are reduced by increasing the number of local helper nodes when the local repair bandwidth of the main cluster of the satellite cluster storage network is limited, and repair costs are reduced by reducing the number of local helper nodes when the local repair bandwidth of the auxiliary cluster is limited.
7. A hybrid regenerative code repair system for a satellite cluster storage network, comprising: a processor; and a memory, wherein the memory has stored therein a computer executable program which, when executed by the processor, performs the method of any of claims 1-3.
CN202110856458.7A 2021-07-28 2021-07-28 Hybrid regeneration coding repair method and system for satellite cluster storage network Active CN113553212B (en)

Publications (2)

Publication Number Publication Date
CN113553212A CN113553212A (en) 2021-10-26
CN113553212B true CN113553212B (en) 2023-07-18


