CN113708780B - Partial repetition code construction method based on shadow - Google Patents

Partial repetition code construction method based on shadow

Publication number
CN113708780B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110931106.3A
Other languages
Chinese (zh)
Other versions
CN113708780A (en)
Inventor
王静
孙伟
何亚锦
沈克勤
张鑫楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Yingsheng Network Technology Co ltd
Original Assignee
Shanghai Yingsheng Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-08-13
Filing date
2021-08-13
Publication date
2024-02-02
Application filed by Shanghai Yingsheng Network Technology Co ltd
Priority to CN202110931106.3A
Publication of CN113708780A
Application granted
Publication of CN113708780B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00 Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/47 Error detection, forward error correction or error protection, not provided for in groups H03M13/01 - H03M13/37

Abstract

The invention discloses a shadow-based construction method for partial repetition (fractional repetition, FR) codes, comprising the following steps. Step 1: divide an original file M into k original data blocks and apply (n, k) MDS coding to the k original data blocks to obtain n coded data blocks. Step 2: according to the number n of coded data blocks, construct a set X and a set ψ, where X contains n distinct elements, ψ contains t subsets φ, each subset φ is a (d+1)-element subset of X, and the subsets φ share no elements. Step 3: obtain the shadow set ∂ψ of the set ψ, where ∂ψ comprises t sub-shadow groups, each group comprises (d+1) sets φ′, and each set φ′ contains d elements, namely the elements that remain after one element of the subset φ is deleted. Step 4: construct an FR code from the shadow set ∂ψ. The FR code constructed by the invention has low repair locality that does not grow as the system parameters grow, and a suitable node storage capacity and data repetition degree can be selected according to system requirements.

Description

Partial repetition code construction method based on shadow
Technical Field
The invention belongs to the field of computers, and particularly relates to a partial repetition code construction method based on shadow.
Background
With the progress of technology, ever more data must be stored. Traditional storage systems cannot meet the demands of mass data storage, which gave rise to distributed storage systems capable of storing large amounts of data. Data loss is common in distributed storage systems, so measures are needed to ensure data reliability; replication and erasure coding are typically used. However, replication incurs a large storage overhead, while erasure-code repair is complex: the whole file must be downloaded during repair, which incurs a large repair bandwidth overhead. In 2010, El Rouayheb and Ramchandran therefore proposed the fractional repetition (FR) code, also rendered as partial repetition code, for exact repair. FR codes tolerate multiple node failures and support exact, uncoded repair with small repair bandwidth overhead and computational complexity, which greatly improves the repair performance for failed nodes. Many methods exist for constructing FR codes, for example constructions based on Steiner systems and on pairwise balanced designs.
In existing FR code constructions, the system parameters cannot be adjusted. For example, FR codes constructed from regular graphs have a fixed repetition degree and can repair only single-node failures. Prajapati proposed an FR code with a ring structure, whose parameters cannot be adjusted promptly according to system requirements. FR codes based on group divisible designs allow a suitable storage capacity or repetition degree to be selected according to system requirements, but their repair locality grows as the parameters grow.
Disclosure of Invention
The invention aims to provide a shadow-based construction method for partial repetition codes, in order to solve the problems in the prior art that the storage capacity and repetition degree cannot be changed according to system requirements and that the repair locality is large.
To accomplish this task, the invention adopts the following technical solution:
A shadow-based construction method for partial repetition codes comprises the following steps:
Step 1: divide an original file M into k original data blocks and apply (n, k) MDS coding to the k original data blocks to obtain n coded data blocks, where k ≥ 2 and n ≥ k;
Step 2: according to the number n of coded data blocks, construct a set X and a set ψ, where X contains n distinct elements, ψ contains t subsets φ, each subset φ is a (d+1)-element subset of X, the subsets φ share no elements, d is a positive integer, and (d+1) < n;
Step 3: obtain the shadow set ∂ψ of the set ψ, where ∂ψ comprises t sub-shadow groups, each group comprises (d+1) sets φ′, and each set φ′ contains d elements, namely the elements that remain after one element of the subset φ is deleted;
Step 4: construct an FR code from the shadow set ∂ψ, in one of three cases:
Case one: if a homogeneous (isomorphic) FR code is constructed, each node of the code corresponds to one set φ′ of the shadow set ∂ψ; the number of nodes is t × (d+1), the node storage capacity is d, the repetition degree is d, and the data blocks stored by each node are the elements of the corresponding set φ′;
Case two: if a repetition-degree heterogeneous FR code is constructed, delete any one set φ′ from each sub-shadow group of ∂ψ to obtain a pruned shadow set ∂ψ′, which comprises t sub-shadow groups of d sets φ′ each;
each node of the repetition-degree heterogeneous FR code corresponds to one set φ′ of the pruned shadow set ∂ψ′; the number of nodes is t × d, the node storage capacity is d, the repetition degree is d or (d−1), and the data blocks stored by each node are the elements of the corresponding set φ′;
Case three: if a storage-capacity heterogeneous FR code is constructed, construct the pruned shadow set ∂ψ′ as in case two, form its incidence matrix A, and exchange the rows and columns of A to obtain a matrix A′;
each node of the storage-capacity heterogeneous FR code corresponds to one row of the matrix A′; the number of nodes is the number of rows of A′, the node storage capacity is d or d−1, the repetition degree is d, and the data blocks stored by each node are the column indices at which the corresponding row has entry 1.
Compared with the prior art, the invention has the following technical features:
1. The shadow-based construction of partial repetition codes is a new algorithm. Constructing FR codes with it is simpler, more intuitive, and more efficient, and the resulting FR codes have low repair locality that does not grow as the system parameters grow.
2. With the shadow-based construction, a suitable node storage capacity and data repetition degree can be selected according to system requirements.
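The repair-locality claim in feature 1 can be checked by brute force on small inputs. The sketch below assumes that repair locality means the smallest number of surviving nodes whose stored blocks cover a failed node, and it uses two hypothetical sets ψ (d = 3 and d = 5) that do not come from the patent. The function names are illustrative only.

```python
from itertools import combinations

def shadow_groups(psi):
    """For each subset phi, list the sets obtained by deleting one element."""
    return [[frozenset(phi) - {x} for x in sorted(phi)] for phi in psi]

def min_helpers(failed, survivors):
    """Smallest number of surviving nodes whose union covers the failed node."""
    for size in range(1, len(survivors) + 1):
        for helpers in combinations(survivors, size):
            if failed <= frozenset().union(*helpers):
                return size
    return None  # not repairable from the survivors

def repair_locality(psi):
    """Worst case of min_helpers over all single-node failures (case one code)."""
    nodes = [node for group in shadow_groups(psi) for node in group]
    worst = 0
    for i, failed in enumerate(nodes):
        survivors = nodes[:i] + nodes[i + 1:]
        worst = max(worst, min_helpers(failed, survivors))
    return worst

# Hypothetical inputs: d = 3 (4-element subsets) and d = 5 (6-element subsets).
psi_d3 = [{1, 2, 3, 4}, {5, 6, 7, 8}]
psi_d5 = [{1, 2, 3, 4, 5, 6}, {7, 8, 9, 10, 11, 12}]
print(repair_locality(psi_d3), repair_locality(psi_d5))  # prints: 2 2
```

Under these assumptions two helper nodes suffice in both runs: for a failed node φ \ {a}, the nodes φ \ {b} and φ \ {c} from the same sub-shadow group together contain all of φ.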
Drawings
FIG. 1 shows the structure of a homogeneous (isomorphic) FR code based on the shadow construction;
FIG. 2 shows a repetition-degree heterogeneous FR code based on the shadow construction;
FIG. 3 shows a storage-capacity heterogeneous FR code based on the shadow construction.
Detailed Description
The present invention is described in further detail below with reference to the drawings and examples. The following embodiments are provided only to make the objects, technical solutions, and advantages of the invention clearer to those skilled in the art; the invention is not limited to these embodiments.
Shadow construction: let X be a set of n elements, let $\binom{X}{k}$ denote the set of all k-element subsets of X, and let $\delta \subseteq \binom{X}{k}$, where 0 ≤ k ≤ n. The set

$$\partial\delta = \left\{ E \in \binom{X}{k-1} : E \subset F \text{ for some } F \in \delta \right\} \qquad (1)$$

is called the shadow of δ, where $\binom{X}{k-1}$ denotes the set of all (k−1)-element subsets of X, E denotes a member of $\binom{X}{k-1}$, and F denotes a member of δ. In other words, the shadow set ∂δ consists of the sets obtained by deleting one element from a member of δ.
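As an illustration of the shadow definition, the following minimal sketch computes the shadow of a small family of sets. The function name `shadow` and the sample family are assumptions made for this example, and each member of the family is assumed to be non-empty.

```python
from itertools import combinations

def shadow(delta):
    """Return the shadow of a family of k-element sets as a set of frozensets."""
    result = set()
    for F in delta:
        k = len(F)
        # Every (k-1)-element subset of F arises by deleting one element of F.
        for E in combinations(sorted(F), k - 1):
            result.add(frozenset(E))
    return result

delta = [{1, 2, 3}, {2, 3, 4}]  # hypothetical family of 3-element subsets
for E in sorted(shadow(delta), key=sorted):
    print(sorted(E))
```

The set {2, 3} is produced by both members of the family but appears only once in the result, because the shadow is itself a set.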
This embodiment discloses a construction method for partial repetition codes based on the shadow construction, comprising the following steps:
Step 1: divide the original file M into k original data blocks and apply (n, k) MDS coding to the k original data blocks to obtain n coded data blocks $c_1, \ldots, c_{k-1}, c_k, c_{k+1}, \ldots, c_n$; the n coded data blocks comprise k original data blocks and n−k check data blocks, where k ≥ 2 and n ≥ k;
Step 2: according to the number n of coded data blocks, construct a set X and a set ψ, where X contains n distinct elements, ψ contains t subsets φ, each subset φ is a (d+1)-element subset of X, the subsets φ share no elements, d is a positive integer, and (d+1) < n;
Step 3: delete each element of each subset φ once to obtain the shadow set ∂ψ of the set ψ, where ∂ψ comprises t sub-shadow groups, each group comprises (d+1) sets φ′, and each set φ′ contains d elements;
Step 4: construct an FR code from the shadow set ∂ψ, in one of three cases:
Case one: if a homogeneous (isomorphic) FR code is constructed, each node of the code corresponds to one set φ′ of the shadow set ∂ψ; the number of nodes is t × (d+1), the node storage capacity is d, the repetition degree is d, and the data blocks stored by each node are the elements of the corresponding set φ′;
Case two: if a repetition-degree heterogeneous FR code is constructed, delete any one set φ′ from each sub-shadow group of ∂ψ to obtain a pruned shadow set ∂ψ′, which comprises t sub-shadow groups of d sets φ′ each;
each node of the repetition-degree heterogeneous FR code corresponds to one set φ′ of the pruned shadow set ∂ψ′; the number of nodes is t × d, the node storage capacity is d, the repetition degree is d or (d−1), and the data blocks stored by each node are the elements of the corresponding set φ′;
Case three: if a storage-capacity heterogeneous FR code is constructed, construct the pruned shadow set ∂ψ′ as in case two, form its incidence matrix A, and exchange the rows and columns of A to obtain a matrix A′;
each node of the storage-capacity heterogeneous FR code corresponds to one row of the matrix A′; the number of nodes is the number of rows of A′, the node storage capacity is d or d−1, the repetition degree is d, and the data blocks stored by each node are the column indices at which the corresponding row has entry 1.
Specifically, in case one, the i-th node of the FR code corresponds to the i-th set of the shadow set ∂ψ; that is, the elements of the i-th set of ∂ψ are the data blocks stored by the i-th node of the FR code. Following the set ψ, the shadow set ∂ψ is divided into t sub-shadow groups: the s-th sub-shadow group of the FR code (0 < s ≤ t) consists of the shadow sets φ_s′ generated from the s-th subset φ_s of ψ.
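The node-to-set correspondence of case one can be sketched as follows. The toy input (n = 6, d = 2, t = 2) and the function names are assumptions for illustration and are not taken from the patent.

```python
from collections import Counter

def sub_shadow_groups(psi):
    """Group s holds the d+1 sets obtained by deleting one element of the s-th subset."""
    return [[sorted(set(phi) - {x}) for x in sorted(phi)] for phi in psi]

def homogeneous_fr_code(psi):
    """Map node index i (1-based) to the data blocks stored by node i."""
    nodes = {}
    for group in sub_shadow_groups(psi):
        for blocks in group:
            nodes[len(nodes) + 1] = blocks
    return nodes

psi = [{1, 2, 3}, {4, 5, 6}]  # toy input: n = 6, d = 2, t = 2
code = homogeneous_fr_code(psi)
# Repetition degree of a block = number of nodes that store it.
repetition = Counter(b for blocks in code.values() for b in blocks)
for i, blocks in code.items():
    print(f"N{i}: {blocks}")
```

For this input the code has t × (d+1) = 6 nodes, each storing d = 2 blocks, and every block is stored on d = 2 nodes, in line with the parameters stated for case one.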
Specifically, in case three, each row of the matrix A′ represents a storage node: the i-th row of A′ represents the i-th storage node $N_i$ of the distributed storage system, i = 1, 2, …, n. The FR code is constructed by the following formula:

$$N_i = \{\, j : a_{ij} = 1 \,\} \qquad (2)$$

where j = 1, 2, …, n, i indexes the storage nodes, and $a_{ij}$ is the entry in row i and column j of the matrix. $N_i$ denotes a storage node of the FR code, and the data blocks it contains are the columns in which the i-th row of A′ has entry 1; extracting these columns gives the data blocks stored in that node. In this way a heterogeneous FR code is constructed in which each node has storage capacity d or d−1 and the repetition degree is ρ = d.
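Formula (2) reads the stored blocks directly off the rows of a 0/1 matrix. The sketch below applies it to a small hypothetical matrix, which is not the matrix A′ of the embodiment; the function name is illustrative.

```python
def nodes_from_matrix(a_prime):
    """Apply N_i = {j : a_ij = 1} to every row, with 1-based row and column indices."""
    return {
        i: [j for j, entry in enumerate(row, start=1) if entry == 1]
        for i, row in enumerate(a_prime, start=1)
    }

# Hypothetical 4 x 3 matrix, chosen only to illustrate the formula.
A_prime = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [1, 0, 0],
]
nodes = nodes_from_matrix(A_prime)
capacities = {i: len(blocks) for i, blocks in nodes.items()}  # row sums
repetitions = [sum(col) for col in zip(*A_prime)]             # column sums
print(nodes, capacities, repetitions)
```

Row sums give the storage capacity of each node and column sums give the repetition degree of each block, which is why exchanging rows and columns swaps the two parameters.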
Examples
This embodiment provides a construction method for an infinitely expandable fractional repetition (FR) code and, on the basis of the above embodiment, further discloses the following technical features:
This embodiment constructs a (12, 9) MDS code. $M = (m_1, m_2, m_3, m_4, m_5, m_6, m_7, m_8, m_9)$ denotes an original file stored in the distributed storage system, and $C = (m_1, m_2, m_3, m_4, m_5, m_6, m_7, m_8, m_9, p_{10}, p_{11}, p_{12})$ denotes the systematic MDS codeword, where $m_1, \ldots, m_9$ are the original data blocks and $p_{10}, p_{11}, p_{12}$ are the check data blocks.
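The patent does not say which MDS code is used. As one possible instance, the sketch below builds a systematic (12, 9) Reed-Solomon-style code over the prime field GF(257) by polynomial interpolation. The field size, the evaluation points 1 to 12, and all function names are assumptions made for this example.

```python
from itertools import combinations

P = 257  # prime field size (assumption); symbols are integers in [0, P)

def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through points, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P  # modular inverse, Python 3.8+
    return total

def mds_encode(message, n):
    """Systematic (n, k) encoding: block i is the polynomial value at point i."""
    k = len(message)
    points = list(zip(range(1, k + 1), message))
    return list(message) + [lagrange_eval(points, x) for x in range(k + 1, n + 1)]

def mds_decode(known, k):
    """Recover the k message symbols from any k (position, value) pairs."""
    return [lagrange_eval(known, x) for x in range(1, k + 1)]

message = [10, 20, 30, 40, 50, 60, 70, 80, 90]  # stands in for m_1 .. m_9
codeword = mds_encode(message, 12)               # m_1 .. m_9 followed by p_10 .. p_12
survivors = [(pos, codeword[pos - 1]) for pos in (1, 2, 3, 5, 6, 8, 10, 11, 12)]
print(mds_decode(survivors, 9) == message)
```

The MDS property is what lets any 9 of the 12 coded blocks reconstruct the file; in this sketch that follows because a polynomial of degree below 9 is determined by any 9 of its values.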
In this embodiment, the node storage capacity of the partial repetition code in the distributed storage system is set to d = 3. The set X = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12} with 12 elements is therefore selected, and a set ψ of 4-element subsets satisfying the conditions is constructed as follows:

$$\psi = \{\{1,4,8,12\},\ \{2,5,9,11\},\ \{3,6,7,10\}\} \qquad (3)$$
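The conditions of step 2 can be checked mechanically for this ψ. The sketch below is a minimal validation helper; the name `check_psi` and the reading of the condition as "no element appears in two subsets" are assumptions.

```python
def check_psi(X, psi, d):
    """Check the step 2 conditions on psi and return the number t of subsets."""
    if d < 1 or d + 1 >= len(X):
        raise ValueError("d must be a positive integer with d + 1 < n")
    seen = set()
    for phi in psi:
        if len(phi) != d + 1:
            raise ValueError("each subset needs d + 1 elements")
        if not phi <= X:
            raise ValueError("each subset must lie inside X")
        if phi & seen:
            raise ValueError("subsets must not share elements")
        seen |= phi
    return len(psi)

X = set(range(1, 13))
psi = [{1, 4, 8, 12}, {2, 5, 9, 11}, {3, 6, 7, 10}]  # equation (3)
t = check_psi(X, psi, d=3)
print(t, set().union(*psi) == X)  # prints: 3 True
```

Here the three subsets also cover all of X, so every coded block index from 1 to 12 ends up in exactly one sub-shadow group.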
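In this embodiment, ψ contains 3 subsets, so the shadow set is divided into 3 sub-shadow groups. Deleting each element of each subset once gives the shadow set ∂ψ, listed here with one sub-shadow group per row:

$$\partial\psi = \left\{ \begin{array}{l} \{4,8,12\},\ \{1,8,12\},\ \{1,4,12\},\ \{1,4,8\}, \\ \{5,9,11\},\ \{2,9,11\},\ \{2,5,11\},\ \{2,5,9\}, \\ \{6,7,10\},\ \{3,7,10\},\ \{3,6,10\},\ \{3,6,7\} \end{array} \right\}$$

In this embodiment, according to case one of step 4, each set of the shadow set corresponds to a node of the distributed storage system, and the homogeneous (isomorphic) FR code is constructed as shown in FIG. 1. Each node has storage capacity d = 3, the repetition degree is ρ = 3, and the nodes are divided into three sub-shadow groups. Depending on the storage capacity of the system, shadow sets φ′ can be deleted to meet the repetition degree requirement.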
In this embodiment, one set φ′ is deleted from each sub-shadow group of the shadow set, which gives the pruned shadow set ∂ψ′ (its listing is not reproduced in this text).
From the set ∂ψ′, the repetition-degree heterogeneous FR code is obtained according to case two of step 4, as shown in FIG. 2. The constructed FR code has repetition degree ρ = 2 or 3 and storage capacity d = 3.
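Case two can be reproduced for the ψ of equation (3). Which set is deleted from each sub-shadow group is not recoverable from this text, so the sketch below arbitrarily deletes the first set of each group; that choice and the function names are assumptions.

```python
from collections import Counter

psi = [[1, 4, 8, 12], [2, 5, 9, 11], [3, 6, 7, 10]]  # equation (3)

def sub_shadow_groups(psi):
    """One group per subset phi; each set in a group omits one element of phi."""
    return [[[b for b in phi if b != x] for x in phi] for phi in psi]

def prune(groups, drop_index=0):
    """Delete one set from every sub-shadow group (here the set at drop_index)."""
    return [group[:drop_index] + group[drop_index + 1:] for group in groups]

pruned = prune(sub_shadow_groups(psi))
nodes = [blocks for group in pruned for blocks in group]
# Repetition degree of a block = number of nodes that store it.
repetition = Counter(b for blocks in nodes for b in blocks)
print(len(nodes), sorted(set(repetition.values())))  # prints: 9 [2, 3]
```

With this choice, blocks 1, 2, and 3 keep repetition degree 3 while the other blocks drop to 2, and every node still stores d = 3 blocks, consistent with the parameters stated for FIG. 2.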
In this embodiment, the incidence matrix A corresponding to the pruned shadow set ∂ψ′ is formed, and its rows and columns are exchanged to obtain the matrix A′ (neither matrix is reproduced in this text).
In this embodiment, the FR code with heterogeneous storage capacity is obtained from the matrix A′ according to case three of step 4, as shown in FIG. 3. The result is a heterogeneous partial repetition code with node storage capacity 2 or 3 and repetition degree ρ = 3.
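The case three parameters can be reproduced under two assumptions: that exchanging rows and columns means transposing the incidence matrix, and that the first set of each sub-shadow group is the one deleted. Neither choice is confirmed by the matrices of the embodiment, which are not available in this text.

```python
psi = [[1, 4, 8, 12], [2, 5, 9, 11], [3, 6, 7, 10]]  # equation (3)
n = 12

# Pruned shadow set: skipping x = phi[0] drops the first set of each group (arbitrary choice).
pruned = [[b for b in phi if b != x] for phi in psi for x in phi[1:]]

# Incidence matrix A: one row per set of the pruned shadow set, one column per element of X.
A = [[1 if j in s else 0 for j in range(1, n + 1)] for s in pruned]

# Exchange rows and columns (transpose) to obtain A'.
A_prime = [list(col) for col in zip(*A)]

# Formula (2): node i stores the column indices where row i of A' equals 1.
nodes = {i: [j for j, a in enumerate(row, 1) if a == 1] for i, row in enumerate(A_prime, 1)}
capacities = sorted({len(blocks) for blocks in nodes.values()})
repetitions = sorted({sum(col) for col in zip(*A_prime)})
print(len(nodes), capacities, repetitions)  # prints: 12 [2, 3] [3]
```

In this sketch the 12 rows of A′ become the storage nodes and its 9 columns become the block indices, so the roles of capacity and repetition degree from case two are swapped.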
This embodiment shows that in the homogeneous (isomorphic) FR code every storage node has the same storage capacity and every data block has the same repetition degree. By simply deleting sets, heterogeneous FR codes with different repetition degrees can be constructed, and by exchanging the rows and columns of the incidence matrix, heterogeneous FR codes whose nodes have different storage capacities can be constructed. A suitable shadow set can therefore be chosen according to the system's requirements on node storage capacity and data repetition degree. Such FR codes are better suited to practical distributed storage systems than ordinary FR codes and have a lower storage cost.

Claims (1)

1. A shadow-based construction method for partial repetition codes, comprising the following steps:
Step 1: dividing an original file M into k original data blocks and applying (n, k) MDS coding to the k original data blocks to obtain n coded data blocks, where k ≥ 2 and n ≥ k;
Step 2: according to the number n of coded data blocks, constructing a set X and a set ψ, where X contains n distinct elements, ψ contains t subsets φ, each subset φ is a (d+1)-element subset of X, the subsets φ share no elements, d is a positive integer, and (d+1) < n;
Step 3: obtaining the shadow set ∂ψ of the set ψ, where ∂ψ comprises t sub-shadow groups, each group comprises (d+1) sets φ′, and each set φ′ contains d elements, namely the elements that remain after one element of the subset φ is deleted;
Step 4: constructing an FR code from the shadow set ∂ψ, in one of three cases:
Case one: if a homogeneous (isomorphic) FR code is constructed, each node of the code corresponds to one set φ′ of the shadow set ∂ψ; the number of nodes is t × (d+1), the node storage capacity is d, the repetition degree is d, and the data blocks stored by each node are the elements of the corresponding set φ′;
Case two: if a repetition-degree heterogeneous FR code is constructed, any one set φ′ is deleted from each sub-shadow group of ∂ψ to obtain a pruned shadow set ∂ψ′, which comprises t sub-shadow groups of d sets φ′ each;
each node of the repetition-degree heterogeneous FR code corresponds to one set φ′ of the pruned shadow set ∂ψ′; the number of nodes is t × d, the node storage capacity is d, the repetition degree is d or (d−1), and the data blocks stored by each node are the elements of the corresponding set φ′;
Case three: if a storage-capacity heterogeneous FR code is constructed, the pruned shadow set ∂ψ′ is constructed as in case two, its incidence matrix A is formed, and the rows and columns of A are exchanged to obtain a matrix A′;
each node of the storage-capacity heterogeneous FR code corresponds to one row of the matrix A′; the number of nodes is the number of rows of A′, the node storage capacity is d or d−1, the repetition degree is d, and the data blocks stored by each node are the column indices at which the corresponding row has entry 1.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110931106.3A CN113708780B (en) 2021-08-13 2021-08-13 Partial repetition code construction method based on shadow

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110931106.3A CN113708780B (en) 2021-08-13 2021-08-13 Partial repetition code construction method based on shadow

Publications (2)

Publication Number Publication Date
CN113708780A (en) 2021-11-26
CN113708780B (en) 2024-02-02

Family

ID=78652670

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110931106.3A Active CN113708780B (en) 2021-08-13 2021-08-13 Partial repetition code construction method based on shadow

Country Status (1)

Country Link
CN (1) CN113708780B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2020268A1 (en) * 1989-06-30 1990-12-31 Scott H. Davis Digital data management system
US7043415B1 (en) * 2001-01-31 2006-05-09 Pharsight Corporation Interactive graphical environment for drug model generation
JP2011242818A (en) * 2010-04-21 2011-12-01 Allied Engineering Corp Parallel finite element calculation system
WO2014153716A1 (en) * 2013-03-26 2014-10-02 北京大学深圳研究生院 Methods for encoding minimum bandwidth regenerating code and repairing storage node
CN110990375A (en) * 2019-11-19 2020-04-10 长安大学 Method for constructing heterogeneous partial repeat codes based on adjusting matrix
CN110990188A (en) * 2019-11-19 2020-04-10 长安大学 Construction method of partial repetition code based on Hadamard matrix
CN111125014A (en) * 2019-11-19 2020-05-08 长安大学 Construction method of flexible partial repeat code based on U-shaped design

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
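Optimal Fractional Repetition Codes Based on Graphs and Designs; Natalia Silberstein et al.; IEEE Transactions on Information Theory; Vol. 61 (No. 8); full text *
Construction of heterogeneous fractional repetition codes (异构部分重复码的构造); Sun Wei (孙伟) et al.; Computer Systems & Applications (计算机系统应用); Vol. 30 (No. 2); full text *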

Also Published As

Publication number Publication date
CN113708780A (en) 2021-11-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
    Effective date of registration: 20231226
    Address after: Room A1-955, No. 58 Fumin Branch Road, Chongming District, Shanghai, 202150
    Applicant after: Shanghai Yingsheng Network Technology Co.,Ltd.
    Address before: 710064 No. 126 central section of South Ring Road, Yanta District, Xi'an, Shaanxi
    Applicant before: CHANG'AN University
GR01 Patent grant