WO2018209541A1 - Coding structure and construction method of a partial repetition code based on t-design - Google Patents

Coding structure and construction method of a partial repetition code based on t-design

Info

Publication number
WO2018209541A1
Authority
WO
WIPO (PCT)
Prior art keywords
code
node
design
partial repetition
repetition code
Prior art date
Application number
PCT/CN2017/084430
Other languages
English (en)
French (fr)
Inventor
朱兵
李挥
王菡
杨昕
Original Assignee
北京大学深圳研究生院
深圳赛思鹏科技发展有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京大学深圳研究生院, 深圳赛思鹏科技发展有限公司
Priority to PCT/CN2017/084430
Publication of WO2018209541A1

Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00 Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes

Definitions

  • the present invention relates to the field of distributed storage, and in particular, to a coding structure and a construction method of a partial repetition code based on a t-design.
  • the actual file system usually uses a cheap commercial computer as a storage node, which has low storage overhead and good scalability.
  • the ever-expanding system scale increases the probability of failures, such as offline nodes and sudden power outages, which makes the system reliability face a severe test.
  • large-scale distributed file systems need to introduce data redundancy mechanisms.
  • Traditional data copy-based solutions are simple and easy to manage, and support efficient data recovery.
  • the disadvantage of the backup mechanism is that the storage overhead is large and the storage efficiency is low. Especially when storing big data files, the overhead caused by the copy is not negligible.
  • the encoding method usually uses the MDS code because the MDS code can achieve the best storage space efficiency.
  • an MDS code with a parameter of (n, k) divides the original file of size M into k equal-sized data blocks, and generates n coding blocks by encoding, and stores them on n different nodes respectively.
  • the original file can be reconstructed from the data stored on any k of these nodes, as shown in Figure 1(a).
  • This process is called a data reconstruction process, and this data reconstruction feature is called an MDS attribute.
  • This coding technology plays an important role in providing effective network storage redundancy, especially for large file storage and archive data backup applications.
  • the RS code is a typical codeword that satisfies the characteristics of the MDS code.
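The MDS property can be illustrated with the smallest possible case. The sketch below assumes a (3, 2) single-parity XOR code, which is MDS but is not the RS code named above; the function names `encode_3_2` and `decode_3_2` are hypothetical.

```python
from itertools import combinations


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_3_2(data: bytes) -> list:
    """Split the file into k = 2 equal blocks and add one XOR parity block (n = 3)."""
    if len(data) % 2:
        raise ValueError("pad the file to an even length first")
    half = len(data) // 2
    a, b = data[:half], data[half:]
    return [a, b, xor_bytes(a, b)]


def decode_3_2(available: dict) -> bytes:
    """Rebuild the file from any 2 of the 3 coded blocks (keys are node indices 0..2)."""
    if len(available) < 2:
        raise ValueError("need at least k = 2 blocks")
    if 0 in available and 1 in available:
        a, b = available[0], available[1]
    elif 0 in available:
        a = available[0]
        b = xor_bytes(a, available[2])  # b = a XOR parity
    else:
        b = available[1]
        a = xor_bytes(b, available[2])  # a = b XOR parity
    return a + b


blocks = encode_3_2(b"fractional")
recovered = [decode_3_2({i: blocks[i] for i in pair})
             for pair in combinations(range(3), 2)]
```

Every pair of nodes yields the same file, which is exactly the "any k nodes" condition for n = 3, k = 2.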
  • the replacement node needs to connect to d randomly chosen nodes among the remaining available storage nodes and download data of size β from each of them, so the repair bandwidth is dβ.
  • the original file is not reconstructed during the node repair process of the RGC code, so the repair bandwidth is better than the RS code.
  • Dimakis et al. also presented a functional repair model of the RGC code and proposed two types of optimal codes for the RGC code: minimum storage regeneration (MSR) code and minimum repair bandwidth regeneration (MBR) code.
  • the repair process of the regenerating code is computationally complex and usually involves a large number of finite field operations, that is, the repair node needs to perform a random linear network coding operation on the data it stores. Specifically, each node participating in the repair reads out its stored data blocks, performs a specific linear operation on them, and then passes the combined data block to the replacement node. To keep all coded packets mutually independent, the operations of the RGC code must be carried out over a large finite field. Considering that node read and write bandwidth is less than network bandwidth in actual systems, the read and write bandwidth can easily become a system performance bottleneck. To reduce the computational complexity of the repair process, [S. El Rouayheb and K. Ramchandran, “Fractional repetition codes for repair in distributed storage systems,” Annual Allerton Conference on Communication, Control, and Computing, Oct. 2010] proposed the concept of the FR code on the basis of the MBR code and showed that the FR code can provide exact and efficient repair.
  • the FR code contains two parts: an external MDS code and an internal repeat code. After the data block is encoded by the MDS, the output code block is copied by f times and then distributed to each storage node. When a node failure occurs in the system, the repair can be completed by directly downloading data from other nodes and storing it to the replacement node, without additional operations. Compared with the traditional RS code and RGC code, the FR code greatly improves the node failure repair speed, thus reducing the repair time.
  • Patent PCT/CN2012/071177 proposes an RGC code construction method in which repairing a lost coding module requires only a small amount of data, without the need to reconstruct the entire file.
  • the RGC code uses the linear network coding idea, combined with the maximum flow minimum cut theory to improve the bandwidth overhead required to repair an encoding module.
  • in particular, node repair with the RGC code can be completed by downloading from the surviving nodes the same amount of data as the lost module held.
  • the main idea of the RGC code is still to exploit the MDS property. When some storage nodes in the network fail, the data they stored is lost, and information must be downloaded from the remaining working nodes to repair the lost data modules, which are then stored on new nodes.
  • over time, many of the original nodes may fail, and some regenerated nodes may in turn run the regeneration process themselves and so produce further new nodes. The regeneration process therefore needs to ensure two points: 1) the failed nodes are independent of each other, so that the regeneration process can be applied recursively; 2) any k nodes are sufficient to recover the original file.
  • Figure 2 depicts the regeneration process for a node failure.
  • the system contains n storage nodes, and each node stores an amount of data α.
  • each storage node i can be represented by a pair of nodes X_in^i, X_out^i, which are connected by an edge whose capacity is the storage of the node (i.e., α).
  • the regeneration process is described by an information flow graph, in which X_in downloads β data from each of any d available nodes.
  • any receiver can access X_out.
  • the maximum information flow from the source to the sink is determined by the minimum cut set in the graph.
  • the size of the stream cannot be smaller than the size of the original file.
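The min-cut condition can be checked numerically. The sketch below assumes the standard bound from the regenerating-code literature, namely that a collector reading any k nodes sees a cut of at least the sum over i = 0..k−1 of min(α, (d − i)β) when d ≥ k; the function names are hypothetical.

```python
from fractions import Fraction


def min_cut_capacity(k: int, d: int, alpha: Fraction, beta: Fraction) -> Fraction:
    """Worst-case cut between the source and a collector that reads k nodes (d >= k)."""
    if d < k:
        raise ValueError("this sketch assumes d >= k")
    return sum((min(alpha, (d - i) * beta) for i in range(k)), Fraction(0))


def can_store(file_size: Fraction, k: int, d: int,
              alpha: Fraction, beta: Fraction) -> bool:
    """The file is recoverable only if the flow is at least the file size."""
    return min_cut_capacity(k, d, alpha, beta) >= file_size


# Example with k = 2, d = 3 and a file of size 1.
ok = can_store(Fraction(1), 2, 3, Fraction(1, 2), Fraction(1, 4))
too_little = can_store(Fraction(1), 2, 3, Fraction(1, 2), Fraction(1, 5))
```

With α = 1/2 the first setting sits exactly on the boundary, while lowering β to 1/5 pushes the flow below the file size.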
  • MSR: minimum storage regenerating
  • MBR: minimum bandwidth regenerating
  • for minimum storage regeneration, each node stores at least M/k bits, from which the storage and repair bandwidth of the MSR code can be derived.
  • when d takes its maximum value, that is, when a newcomer communicates with all n−1 surviving nodes at the same time, the repair bandwidth γ_MSR is at its minimum.
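The two end points of the storage/bandwidth tradeoff can be computed directly. The sketch below assumes the closed forms usually quoted for regenerating codes, (α, γ) = (M/k, Md/(k(d − k + 1))) at the MSR point and α = γ = 2Md/(k(2d − k + 1)) at the MBR point; `msr_point` and `mbr_point` are hypothetical helper names.

```python
from fractions import Fraction


def msr_point(M: int, k: int, d: int):
    """(alpha, gamma) at the minimum-storage end of the tradeoff (assumed closed form)."""
    alpha = Fraction(M, k)
    gamma = Fraction(M * d, k * (d - k + 1))
    return alpha, gamma


def mbr_point(M: int, k: int, d: int):
    """(alpha, gamma) at the minimum-bandwidth end, where alpha equals gamma."""
    gamma = Fraction(2 * M * d, k * (2 * d - k + 1))
    return gamma, gamma


# n = 4, k = 2, and the largest possible d = n - 1 = 3, for a file of size 1.
msr = msr_point(1, 2, 3)
mbr = mbr_point(1, 2, 3)
```

In this example the MSR point stores less per node (1/2 against 3/5) but spends more repair bandwidth (3/4 against 3/5), which is the tradeoff in miniature.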
  • functional repair: the newly generated module may contain data different from that of the lost node, as long as the repaired system retains the MDS property (the core technique is network coding);
  • exact repair of the systematic part: a hybrid repair model between exact repair and functional repair.
  • in this hybrid model, exact repair is used for uncoded data, i.e. the recovered information is the same as the information stored by the failed node; coded data blocks do not need exact repair, and functional repair alone suffices to make the recovered information satisfy the MDS property (the core techniques are interference alignment and network coding).
  • the RGC code scheme repair process is computationally complex, usually involving a large number of finite field operations, which is one order higher than the traditional erasure code, which reduces the speed of node failure repair.
  • the node involved in the repair needs to read out the stored data block and perform a specific linear operation, and then pass the combined data block to the replacement node.
  • considering that node read and write bandwidth is less than network bandwidth in actual systems, the read and write bandwidth can easily become a system performance bottleneck.
  • a method for constructing a partial repetition code is proposed in the patent PCT/CN2014/078539, which adopts a group design theory to design a specific construction method of the FRC.
  • the technique used can select the construction parameters within a certain range and construct different FRC codes by adjusting the grouping of the design. If the groupable design used in the construction process is decomposable, the system node size can be flexibly selected. Further analysis shows that the constructed codeword can reach the system storage capacity in random access mode and achieve theoretical optimization. Although the constructed codewords use a table-based node repair method, the analysis shows that the nodes in the system still have a large number of repair options.
  • the FRC code construction method based on the group design can complete the codeword construction within a certain parameter range, but the actual optional construction parameters are very limited. Due to the particularity of group design, the design parameters currently known internationally are limited to a certain range. Therefore, given a groupable design, the constructed codewords can only be deployed to a specific parameter storage system. Considering the diversity of the actual storage system environment, the codeword design cannot be widely applied to actual storage systems.
  • the present invention provides a coding structure and a construction method of a partial repetition code based on a t-design, which address the problems of the coding schemes used in the prior art: high bandwidth consumption for node repair, large communication overhead during repair, and high computational complexity.
  • a coding structure of a partial repetition code based on t-design is designed and manufactured, which is composed of an external MDS code and a partial repetition code, and the coding structure identifier is a TFRC code structure;
  • the partial repetition code is an arrangement, on the storage nodes, of multiple coding blocks each replicated a finite number of times, which ensures that the copies of each coding block are stored on different nodes; after the data blocks are encoded by the external MDS code, each output coding block is replicated a finite number of times and then distributed to the storage nodes.
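  • the partial repetition code is intended for a distributed storage system with parameters (n, k, d): it is a pair C = (U, M) with copy multiple f, where M = {M_1, …, M_n} is a collection of n subsets of the symbol set U = {1, …, θ} and each subset has size d.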
  • the partial repetition code is represented by an incidence matrix in which the row sum of every row and the column sum of every column are constant, and any two rows differ in at least one position.
  • the partial repetition code adopts a t-design construction, specifically: take a given simple t-(v, s, λ) design (X, B) and let its r-th order matrix be W_r, 1 ≤ r ≤ t; then the matrix W_r generates a partial repetition code with parameters (n, d, θ, f) = (b, C(s, r), C(v, r), λ_r), where b is the number of blocks of the design and C(·, ·) denotes the binomial coefficient.
  • the partial repetition code replicates C(v, r) coding blocks evenly λ_r times each and stores them in a storage system of b nodes, where each node can store C(s, r) data blocks.
  • the invention also provides a method for constructing a t-design based partial repetition code, comprising the following steps: the code is built from an external MDS code and a partial repetition (TFRC) code; after the data blocks are encoded by the external MDS code, each output coding block is replicated a finite number of times and then distributed over the storage nodes.
  • the partial repetition code is an arrangement, on the storage nodes, of multiple coding blocks each replicated a finite number of times, which ensures that the copies of each coding block are stored on different nodes.
  • each element of U belongs to f subsets of M, where f is called the copy multiple of each data block; each subset corresponds to one storage node, and each node has a storage capacity of d.
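The set-system view of a partial repetition code translates into a short validity check. The following is a minimal sketch: `is_frc` is a hypothetical helper, and the four-node layout is an illustrative example chosen here, not one taken from the figures.

```python
from collections import Counter


def is_frc(nodes, theta: int, d: int, f: int) -> bool:
    """Check that `nodes` (a list of sets over {1..theta}) is a partial repetition code."""
    symbols = set(range(1, theta + 1))
    # Every node must hold exactly d blocks drawn from the symbol set U.
    if any(len(held) != d or not held <= symbols for held in nodes):
        return False
    # Every block must appear on exactly f different nodes.
    counts = Counter(x for held in nodes for x in held)
    return all(counts[x] == f for x in symbols)


# Illustrative layout: n = 4 nodes, theta = 6 blocks, d = 3, f = 2.
example = [{1, 2, 3}, {1, 4, 5}, {2, 4, 6}, {3, 5, 6}]
valid = is_frc(example, theta=6, d=3, f=2)
```

Because each node is a set, the two copies of a block necessarily land on different nodes, and counting incidences both ways gives n·d = θ·f (here 4·3 = 6·2).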
  • the partial repetition code is represented by an incidence matrix in which the row sum of every row and the column sum of every column are constant, and any two rows differ in at least one position.
  • the partial repetition code adopts a t-design construction, specifically: take a given simple t-(v, s, λ) design (X, B) and let its r-th order matrix be W_r, 1 ≤ r ≤ t; then the matrix W_r generates a partial repetition code with parameters (n, d, θ, f) = (b, C(s, r), C(v, r), λ_r); the partial repetition code replicates C(v, r) coding blocks evenly λ_r times each and stores them in a storage system of b nodes, where each node can store C(s, r) data blocks.
  • the TFRC code adopts a table-based repair method, and neither the repair process nor the reconstruction process involves complicated finite field operations.
  • TFRC: t-design based partial repetition code
  • the beneficial effects of the present invention are that the t-design based partial repetition code (TFRC) significantly reduces the computational complexity in the node repair process, replacing complex finite field operations with simple and easy to implement data copies.
  • the construction of traditional RGC codes is based on finite fields, and data recovery involves finite field addition, subtraction and multiplication. Although the theory of finite field arithmetic is mature, it is cumbersome and time-consuming in practice and clearly cannot meet the fast and reliable design targets of today's distributed storage systems.
  • the TFRC code is different: a failed node can be repaired by directly downloading data from other nodes and storing it on the replacement node, without additional operations, which greatly increases the speed of node repair and data block regeneration and gives the code high application value and development potential in practical distributed storage systems.
  • the partial repetition code based on t-design not only reduces the computational complexity of node repair, but also ensures that the bandwidth consumed during node repair is minimal, with no redundant bandwidth consumed; the TFRC code guarantees that: 1) a lost coding block can be repaired by directly downloading subsets of other coding modules; 2) a lost coding block can be repaired from a fixed number of coding modules, and the repair mode is table based. In addition, the data stored on a node repaired with the TFRC code is exactly the same as on the failed node, that is, the repair is exact, which greatly reduces the system operation complexity (such as metadata updates).
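Copy-only exact repair can be sketched in a few lines. The layout below is an illustrative four-node example and `repair_by_copy` is a hypothetical helper; a real system would read the helpers from its repair table rather than search for them.

```python
def repair_by_copy(nodes: dict, failed: str) -> dict:
    """For each block lost with `failed`, name one surviving node to copy it from."""
    plan = {}
    for block in sorted(nodes[failed]):
        donors = sorted(name for name, held in nodes.items()
                        if name != failed and block in held)
        if not donors:
            raise RuntimeError(f"block {block} has no surviving replica")
        plan[block] = donors[0]  # plain copy, no finite field arithmetic
    return plan


# Illustrative layout with copy multiple f = 2.
layout = {"N1": {1, 2, 3}, "N2": {1, 4, 5}, "N3": {2, 4, 6}, "N4": {3, 5, 6}}
plan = repair_by_copy(layout, "N1")
rebuilt = set(plan)  # the replacement node ends up with exactly the lost blocks
```

The number of blocks transferred equals the number of blocks lost, and the rebuilt node is identical to the failed one, which is what exact repair means here.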
  • FIG. 1 is a schematic diagram of a source file recovery process in the prior art
  • FIG. 2 is a schematic diagram of a recovery process of a failed node in the prior art
  • FIG. 3 is a schematic diagram of a RGC code regeneration process in the prior art
  • FIG. 4 is a schematic diagram of a (4, 2, 3) distributed storage system using an FRC code according to the present invention
  • FIG. 5 is a schematic diagram showing an example of a TFRC code configuration of the present invention.
  • FIG. 6 is a schematic diagram of another TFRC code configuration example of the present invention.
  • a coding structure of a partial repetition code based on t-design is composed of an external MDS code and a partial repetition code, and is identified as the TFRC code structure; the partial repetition code is an arrangement, on the storage nodes, of multiple coding blocks each replicated a finite number of times.
  • this arrangement ensures that the copies of each coding block are stored on different nodes; after the data blocks are encoded by the external MDS code, each output coding block is replicated a finite number of times and then distributed to the storage nodes.
  • the partial repetition code is represented by an incidence matrix in which the row sum of every row and the column sum of every column are constant, and any two rows differ in at least one position.
  • the partial repetition code adopts a t-design construction, specifically: take a given simple t-(v, s, λ) design (X, B) and let its r-th order matrix be W_r, 1 ≤ r ≤ t; then the matrix W_r generates a partial repetition code with parameters (n, d, θ, f) = (b, C(s, r), C(v, r), λ_r), where b is the number of blocks of the design and C(·, ·) denotes the binomial coefficient.
  • the partial repetition code replicates C(v, r) coding blocks evenly λ_r times each and stores them in a storage system of b nodes, where each node can store C(s, r) data blocks.
  • the invention also provides a method for constructing a t-design based partial repetition code, comprising the following steps: the code is built from an external MDS code and a partial repetition (TFRC) code; after the data blocks are encoded by the external MDS code, each output coding block is replicated a finite number of times and then distributed to the storage nodes; the partial repetition code is an arrangement, on the storage nodes, of multiple coding blocks each replicated a finite number of times, which ensures that the copies of each coding block are stored on different nodes.
  • the partial repetition code is represented by an incidence matrix in which the row sum of every row and the column sum of every column are constant, and any two rows differ in at least one position.
  • the partial repetition code adopts a t-design construction, specifically: take a given simple t-(v, s, λ) design (X, B) and let its r-th order matrix be W_r, 1 ≤ r ≤ t; then the matrix W_r generates a partial repetition code with parameters (n, d, θ, f) = (b, C(s, r), C(v, r), λ_r); the partial repetition code replicates C(v, r) coding blocks evenly λ_r times each and stores them in a storage system of b nodes, where each node can store C(s, r) data blocks.
  • the TFRC code adopts a table-based repair method, and neither the repair process nor the reconstruction process involves complex finite field operations.
  • the t-design based partial repetition code construction method of the present invention mainly targets the problems of conventional storage systems: a relatively complicated system structure, high bandwidth consumption for node repair under the coding schemes in use, large communication overhead during repair, and high computational complexity.
  • to reduce the computational complexity of encoding and decoding, a new t-design based FRC code, called the TFRC code, is proposed.
  • the codeword effectively reduces the system repair bandwidth, ensures accurate regeneration of lost data, and improves the effectiveness of the repair process after node failure (including repair bandwidth, computational overhead, and repair time).
  • the solution of the invention greatly expands the construction parameters of the FRC code, and thus can construct the codeword within a larger parameter range.
  • the TFRC code satisfies the basic property of the MDS code, and repairing a lost coding module requires only a small amount of data, without reconstructing the entire file.
  • the TFRC encoding process consists of two parts: an external MDS code and an internal repetition code.
  • after the data blocks are encoded by the MDS code, each output coding block is replicated f times and then distributed to the storage nodes. When a node in the system fails, the repair is completed by downloading data directly from other nodes and storing it on the replacement node; no additional operations are required.
  • the TFRC code can select construction parameters over a wide range, and for a given t-design, 4t different FRC code words can be constructed. In addition, the TFRC code can reduce the computational complexity of the node repair process and reduce the system repair time.
  • the traditional regenerative code repair process has a relatively high computational complexity and usually involves a large number of finite field operations.
  • the node participating in the repair reads out the stored data block and performs a specific linear operation, and then passes the combined data block to the replacement node. Considering that the node read and write bandwidth is less than the network bandwidth in the actual system, the read and write bandwidth can easily become a system performance bottleneck.
  • coding structure of the t-design based partial repetition code (TFRC code)
  • in a distributed storage system with parameters (n, k, d), n is the number of storage nodes, k is the minimum number of nodes from which the original file can be reconstructed, and d is the number of available nodes required to repair a failed node, with k ≤ d ≤ n−1.
  • each output coding block (their number may be set to θ) is replicated f times and then distributed to the storage nodes. System users can download data from any k nodes and reconstruct the original file by the MDS property.
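How large a file the external MDS code can carry depends on how many distinct coding blocks any k nodes hold together. The sketch below computes that minimum by brute force; `min_distinct_blocks` is a hypothetical helper and the layout is an illustrative four-node example.

```python
from itertools import combinations


def min_distinct_blocks(nodes, k: int) -> int:
    """Smallest number of distinct coding blocks found on any k of the nodes."""
    return min(len(set().union(*group)) for group in combinations(nodes, k))


# Illustrative layout: theta = 6 coding blocks, each stored on f = 2 nodes.
layout = [{1, 2, 3}, {1, 4, 5}, {2, 4, 6}, {3, 5, 6}]
guaranteed = min_distinct_blocks(layout, k=2)
```

Here any two nodes share exactly one block, so a user contacting k = 2 nodes always sees 5 distinct coding blocks; an external MDS code producing the θ = 6 coding blocks from 5 data blocks would then let any two nodes rebuild the file.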
  • the partial repetition code is essentially an arrangement, on the storage nodes, of θ coding blocks with copy multiple f, which ensures that the copies of each coding block are stored on different nodes.
  • the above FR code is referred to as a codeword with parameters (n, d, θ, f), where f is called the copy multiple of each data block.
  • a partial repetition code can equivalently be represented by an incidence matrix.
  • each row of the incidence matrix represents one storage node, and each column corresponds to one coding block. Therefore, in the incidence matrix of an FRC codeword with parameters (n, d, θ, f), every row has row sum d and every column has column sum f.
  • the incidence matrix corresponding to the FRC code in FIG. 3 is
  • constructing an FRC codeword is equivalent to constructing a (0,1) matrix in which the row sum of every row and the column sum of every column are constant and any two rows differ in at least one position.
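The matrix conditions can be verified mechanically. The sketch below builds the incidence matrix of an illustrative four-node layout (a hypothetical example, not the matrix of the figure) and checks the three requirements; both function names are assumptions.

```python
def incidence_matrix(nodes, theta: int):
    """Rows are storage nodes, columns are coding blocks 1..theta."""
    return [[1 if x in held else 0 for x in range(1, theta + 1)] for held in nodes]


def is_frc_matrix(mat) -> bool:
    """Constant row sums, constant column sums, and pairwise distinct rows."""
    row_sums = {sum(row) for row in mat}
    col_sums = {sum(col) for col in zip(*mat)}
    rows_distinct = len({tuple(row) for row in mat}) == len(mat)
    return len(row_sums) == 1 and len(col_sums) == 1 and rows_distinct


layout = [{1, 2, 3}, {1, 4, 5}, {2, 4, 6}, {3, 5, 6}]
mat = incidence_matrix(layout, theta=6)
ok = is_frc_matrix(mat)
```

For this layout every row sums to d = 3 and every column to f = 2, so the matrix describes a codeword with parameters (4, 3, 6, 2).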
  • a combinatorial design is a pair (X, B), where X is a set of points, and the number of points in X is called the order of the design; B is a collection of subsets of X, each of which is called a block. Usually v denotes the cardinality of the set X and b denotes the cardinality of the collection B, i.e. v = |X| and b = |B|.
  • a combinatorial design can also be viewed as an incidence structure: if a point of X is contained in a block, the point is said to be incident with the block; otherwise it is not.
  • any t-design is also a 1-design, that is, any point appears in the same number of blocks.
  • the number of blocks b contained in a t-(v, s, λ) design is b = λ·C(v, t) / C(s, t), where C(·, ·) denotes the binomial coefficient.
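A t-design can be checked by brute force on a small case. The sketch below assumes the usual definition (every t-subset of the points lies in exactly λ blocks of size s) and uses the Fano plane, a 2-(7, 3, 1) design, as an example chosen here; the helper names are hypothetical.

```python
from itertools import combinations
from math import comb


def is_t_design(points, blocks, t: int, s: int, lam: int) -> bool:
    """True if all blocks have size s and every t-subset lies in exactly lam blocks."""
    if any(len(blk) != s for blk in blocks):
        return False
    return all(sum(1 for blk in blocks if set(sub) <= blk) == lam
               for sub in combinations(points, t))


def block_count(t: int, v: int, s: int, lam: int) -> int:
    """b = lam * C(v, t) / C(s, t), the standard counting formula."""
    num, den = lam * comb(v, t), comb(s, t)
    if num % den:
        raise ValueError("parameters are not admissible")
    return num // den


fano = [{1, 2, 3}, {1, 4, 5}, {1, 6, 7}, {2, 4, 6}, {2, 5, 7}, {3, 4, 7}, {3, 5, 6}]
is_2_design = is_t_design(range(1, 8), fano, t=2, s=3, lam=1)
is_1_design = is_t_design(range(1, 8), fano, t=1, s=3, lam=3)
```

The second check illustrates the remark that a t-design is also a 1-design: every point of the Fano plane lies in the same number of blocks, namely 3.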
  • each row of the matrix W_r corresponds to a block of B, and each column corresponds to an r-element subset of X.
  • since every block in B has size s and r ≤ t ≤ s, the number of r-element subsets of the point set X covered by each block is C(s, r); that is, the row sum of every row is C(s, r).
  • a t-(v, s, λ) design is also an r-(v, s, λ_r) design, i.e. any r-element subset is contained in λ_r blocks, so the column sum of every column is λ_r.
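The value of λ_r follows from a short double count, sketched here under the usual definition of a t-design. Fix an r-element subset R and count the pairs (T, B) in which T is a t-element subset with R ⊆ T ⊆ B: choosing T first gives C(v − r, t − r) subsets, each in λ blocks, while choosing B first gives λ_r blocks, each containing C(s − r, t − r) such subsets.

```latex
\lambda_r \binom{s-r}{t-r} \;=\; \lambda \binom{v-r}{t-r}
\quad\Longrightarrow\quad
\lambda_r \;=\; \lambda\,\frac{\binom{v-r}{t-r}}{\binom{s-r}{t-r}},
\qquad 0 \le r \le t .
```

Setting r = 0 recovers the block count b = λ·C(v, t)/C(s, t), and r = 1 gives the number of blocks through each point.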
  • codeword construction: take a given simple t-(v, s, λ) design (X, B) and let its r-th order matrix be W_r, 1 ≤ r ≤ t; then the matrix W_r generates an FRC codeword with parameters (n, d, θ, f) = (b, C(s, r), C(v, r), λ_r).
  • the FRC code constructed from the t-design is referred to as a TFRC code.
  • the constructed codeword replicates C(v, r) coding blocks evenly λ_r times each and stores them in a storage system of b nodes, where each node can store C(s, r) data blocks.
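The construction can be tried on a concrete design. The sketch below assumes the 3-(8, 4, 1) design formed by the 14 planes of the affine space AG(3, 2), an example chosen here and not taken from the figures; `w_matrix` and `frc_parameters` are hypothetical helper names.

```python
from itertools import combinations


def ag32_planes():
    """Blocks of a 3-(8,4,1) design: solution sets of a.x = c over GF(2), a != 0."""
    return [frozenset(x for x in range(8) if bin(a & x).count("1") % 2 == c)
            for a in range(1, 8) for c in (0, 1)]


def w_matrix(blocks, v: int, r: int):
    """Rows are blocks, columns are r-subsets of the points; 1 marks containment."""
    cols = list(combinations(range(v), r))
    return [[int(set(col) <= blk) for col in cols] for blk in blocks]


def frc_parameters(mat):
    """Read (n, d, theta, f) off a (0,1) matrix with constant row and column sums."""
    row_sums = {sum(row) for row in mat}
    col_sums = {sum(col) for col in zip(*mat)}
    if len(row_sums) != 1 or len(col_sums) != 1:
        raise ValueError("row or column sums are not constant")
    return len(mat), row_sums.pop(), len(mat[0]), col_sums.pop()


blocks = ag32_planes()
params = {r: frc_parameters(w_matrix(blocks, 8, r)) for r in (1, 2, 3)}
```

For r = 2 this gives a code on 14 nodes in which 28 coding blocks are each stored 3 times and every node holds 6 blocks; r = 3 gives copy multiple 1, so that matrix carries no redundancy of its own.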
  • the TFRC code in FIG. 4 can be constructed by the matrix, and each coding block has a copying multiple of 5.
  • the TFRC code in FIG. 5 can be constructed from the matrix, and the copying multiple of each coding block is 2.
  • let J be the b × C(v, r) all-one matrix, that is, every element of the matrix is 1. Then the matrix J − W_r generates an FRC code with parameters (b, C(v, r) − C(s, r), C(v, r), b − λ_r). Similarly, the matrix (J − W_r)^T generates an FRC code with parameters (C(v, r), b − λ_r, b, C(v, r) − C(s, r)).
  • a total of t TFRC codewords can be generated from the r-th order matrices W_r, 1 ≤ r ≤ t, of a t-design.
  • for each matrix W_r, matrix operations (transposition, subtraction, etc.) further generate 3 different FRC codes. Therefore, a t-design can construct up to 4t TFRC codewords, greatly extending the construction parameters of existing FRC codewords.
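The four codes obtainable from one matrix can be listed side by side. The sketch below reads "transpose" and "subtraction" as W_r^T, J − W_r and (J − W_r)^T, which is an interpretation rather than something the text spells out, and reuses the AG(3, 2) planes as an assumed example design with r = 2.

```python
from itertools import combinations


def ag32_planes():
    """Blocks of a 3-(8,4,1) design: the 14 planes of AG(3,2)."""
    return [frozenset(x for x in range(8) if bin(a & x).count("1") % 2 == c)
            for a in range(1, 8) for c in (0, 1)]


def w_matrix(blocks, v: int, r: int):
    cols = list(combinations(range(v), r))
    return [[int(set(col) <= blk) for col in cols] for blk in blocks]


def transpose(mat):
    return [list(col) for col in zip(*mat)]


def complement(mat):
    """J - W for a (0,1) matrix W, with J the all-one matrix of the same shape."""
    return [[1 - x for x in row] for row in mat]


def frc_parameters(mat):
    row_sums = {sum(row) for row in mat}
    col_sums = {sum(col) for col in zip(*mat)}
    if len(row_sums) != 1 or len(col_sums) != 1:
        raise ValueError("row or column sums are not constant")
    return len(mat), row_sums.pop(), len(mat[0]), col_sums.pop()


W = w_matrix(ag32_planes(), 8, 2)
codes = {
    "W_r": frc_parameters(W),
    "W_r^T": frc_parameters(transpose(W)),
    "J - W_r": frc_parameters(complement(W)),
    "(J - W_r)^T": frc_parameters(transpose(complement(W))),
}
```

Transposition swaps the roles of nodes and coding blocks, and complementation trades a sparse layout for a dense one, so a single design yields four systems of quite different sizes.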
  • the TFRC code covers all the features of the FR code.
  • the copying multiple of each data block is the same, and the storage capacity of each node of the system is the same.
  • the TFRC code uses a table-based repair method.
  • the repair table specifies the repair options that can be selected for each particular failed node. For example, considering the TFRC code in FIG. 4, if node N1 fails, it can be repaired through nodes N2 and N3 but not through nodes N3 and N4.
  • the following table gives the node failure repair scheme for the TFRC codeword in Figure 5.

      Failed node    Nodes that can participate in the repair
      N1             N2, N3 and N8
      N2             N1, N5 and N7
      N3             N1, N4 and N9
      N4             N3, N5 and N6
      N5             N2, N4 and N10
      N6             N4, N7 and N8
      N7             N2, N6 and N9
      N8             N1, N6 and N10
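A table-based repair lookup amounts to a dictionary read. The sketch below encodes only the eight rows given above (no entries are assumed for N9 and N10); `helpers_for` is a hypothetical function name.

```python
# Repair table copied from the rows listed above.
REPAIR_TABLE = {
    "N1": ("N2", "N3", "N8"),
    "N2": ("N1", "N5", "N7"),
    "N3": ("N1", "N4", "N9"),
    "N4": ("N3", "N5", "N6"),
    "N5": ("N2", "N4", "N10"),
    "N6": ("N4", "N7", "N8"),
    "N7": ("N2", "N6", "N9"),
    "N8": ("N1", "N6", "N10"),
}


def helpers_for(failed: str, alive: set) -> tuple:
    """Return the nodes to contact for `failed`, checking that they are all alive."""
    if failed not in REPAIR_TABLE:
        raise KeyError(f"no repair entry for {failed}")
    helpers = REPAIR_TABLE[failed]
    down = [h for h in helpers if h not in alive]
    if down:
        raise RuntimeError(f"helpers unavailable: {down}")
    return helpers


alive = {f"N{i}" for i in range(2, 11)}  # every node except the failed N1
chosen = helpers_for("N1", alive)
```

The lookup involves no arithmetic at all; the replacement node simply copies one block from each listed helper.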
  • an actual storage system deployment usually includes a tracker server that records system metadata. The repair table can therefore be written into the metadata so that it can be read quickly when a failure is repaired. Given how much it reduces the complexity of the repair process, the cost of building and maintaining a node repair table is worthwhile.
  • the present invention proposes a novel t-design based FR code construction - TFRC code.
  • the t-design based scheme is more concise and intuitive, and can support more construction parameters.
  • the TFRC code simplifies the computational complexity of the node repair process and is simpler and easier to implement.
  • the construction of the TFRC code is more concise and intuitive, and the node repair efficiency is higher.
  • because the TFRC code uses a table-based repair method, its repair and reconstruction processes do not involve complex finite field operations, so the computational complexity is low, the computational overhead is small, and the system repair delay is greatly reduced in actual storage systems; the TFRC code ensures that no additional bandwidth is needed during repair (the bandwidth consumed equals the size of the lost data), and it achieves exact node repair, that is, the data after repair is identical to the data lost by the node.
  • the TFRC code is easy to implement and low in repair cost, so it has considerable application prospects in practical large-scale storage systems.

Landscapes

  • Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Error Detection And Correction (AREA)

Abstract

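This application provides a coding structure of a partial repetition code based on t-design and a method for constructing it. The coding structure consists of an external MDS code and a partial repetition code: after the data blocks are encoded by the external MDS code, each output coding block is replicated a finite number of times and then distributed to the storage nodes. The partial repetition code describes an arrangement, on the storage nodes, of multiple coding blocks replicated a finite number of times, while ensuring that the copies of each coding block are stored on different nodes. The technical solution provided by this application reduces the computational complexity of node repair while ensuring that the bandwidth consumed during node repair is minimal, with no redundant bandwidth consumed.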

Description

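Coding structure and construction method of a partial repetition code based on t-design
[Technical Field]
The present invention relates to the field of distributed storage, and in particular to a coding structure and a construction method of a partial repetition code based on t-design.
[Background]
With the rapid development of computer technology and the spread of Internet applications, the volume of network information is growing explosively. In the big-data era, massive data poses a severe challenge to storage systems. Traditional centralized file systems can no longer meet the storage and processing demands of big data, and building new file systems that support massive data storage has become an important research topic in the big-data field. In recent years the continuing development of cloud computing has made distributed cloud storage an effective storage solution. A distributed storage system adopts the cloud-computing concept: using cluster and grid technology and a distributed file system, it joins independent storage devices located in different regions over a network so that they work together to provide data storage and access to users. With efficient storage properties such as high availability and high scalability, distributed storage systems are increasingly becoming the mainstream of modern storage.
Practical file systems usually use inexpensive commodity computers as storage nodes, which keeps storage cost low and scales well. However, the ever-growing system scale raises the probability of failures such as nodes going offline or sudden power loss, putting system reliability to a severe test. To guarantee data availability, large-scale distributed file systems need a data redundancy mechanism. Traditional replication-based schemes are simple, easy to manage, and support efficient data recovery. The drawback of replication is its large storage overhead and low storage efficiency; especially when storing big data files, the overhead caused by the copies is not negligible.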
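Studies show that, for the same redundancy, erasure coding can greatly improve the storage efficiency of the system. In current storage systems the coding method is usually an MDS code, because MDS codes achieve optimal storage-space efficiency. Specifically, an MDS code with parameters (n, k) divides an original file of size M into k data blocks of equal size, generates n coding blocks by encoding, and stores them on n different nodes, such that the data stored on any k nodes suffice to reconstruct the original file, as shown in Figure 1(a). This process is called the data reconstruction process, and this reconstruction property is called the MDS property. This coding technique plays an important role in providing efficient network storage redundancy and is particularly suited to large-file storage and archival backup applications. In particular, the RS code is a typical code that satisfies the MDS property.
When a node in the storage system fails, the data stored on the failed node must be recovered and stored on a new node in order to maintain the redundancy of the system; this is called the node repair process. For a traditional RS code, repair first requires downloading data from k storage nodes and recovering the original file, and then re-encoding to regenerate the lost data and store it on the newly introduced node, as shown in Figure 1(b). Decoding the whole original file in order to recover the data of a single storage node is clearly a waste of network bandwidth.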
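To reduce the bandwidth used during repair, [A. G. Dimakis, P. B. Godfrey, Y. Wu, M. Wainwright, and K. Ramchandran, “Network coding for distributed storage systems,” IEEE Trans. Inf. Theory, vol. 56, no. 9, pp. 4539–4551, Sep. 2010] used ideas from network coding theory to propose the concept of regenerating codes (RGC). RGC codes also satisfy the MDS property, i.e. the data stored on any k of the n nodes can recover the original data file. In the repair process of a traditional regenerating code, the replacement node connects to d randomly chosen nodes among the remaining available storage nodes and downloads data of size β from each of these d nodes, so the repair bandwidth is dβ. Node repair with an RGC code does not need to reconstruct the original file, so its repair bandwidth is better than that of the RS code. In addition, Dimakis et al. gave a functional repair model for RGC codes and proposed two classes of optimal RGC codes: minimum storage regenerating (MSR) codes and minimum bandwidth regenerating (MBR) codes.
However, the repair process of regenerating codes has relatively high computational complexity and usually involves a large number of finite field operations, i.e. the repair nodes must perform random linear network coding on the data they store. Specifically, each node participating in the repair reads out its stored data blocks, performs a specific linear operation on them, and then passes the combined data block to the replacement node. To keep all coded packets mutually independent, the operations of the RGC code must be carried out over a fairly large finite field. Since node read and write bandwidth is lower than network bandwidth in real systems, read and write bandwidth easily becomes the system performance bottleneck. To reduce the computational complexity of repair, [S. El Rouayheb and K. Ramchandran, “Fractional repetition codes for repair in distributed storage systems,” Annual Allerton Conference on Communication, Control, and Computing, Oct. 2010] proposed the concept of the FR code on the basis of the MBR code and showed that FR codes can provide exact and efficient repair. In general, an FR code contains two parts: an external MDS code and an internal repetition code. After the data blocks are encoded by the MDS code, each output coding block is replicated f times and then distributed to the storage nodes. When a node in the system fails, the repair can be completed by downloading data directly from other nodes and storing it on the replacement node, with no additional computation. Compared with traditional RS and RGC codes, the FR code greatly speeds up the repair of failed nodes and thus reduces repair time.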
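Patent PCT/CN2012/071177 proposes a construction method for RGC codes in which repairing a lost coding module requires only a small fraction of the data, without reconstructing the whole file. The RGC code uses the idea of linear network coding, combined with the max-flow min-cut theorem, to improve the bandwidth overhead of repairing one coding module. In particular, node repair with an RGC code can be completed by downloading from the surviving nodes the same amount of data as the lost module held.
The main idea of the RGC code is still to exploit the MDS property. When some storage nodes in the network fail, the data they stored is lost, and information must be downloaded from the remaining working nodes to repair the lost data modules, which are then stored on new nodes. Over time many of the original nodes may fail, and some regenerated nodes may in turn run the regeneration process themselves and produce further new nodes. The regeneration process therefore has to ensure two points: 1) failed nodes are independent of each other, so the regeneration process can be applied recursively; 2) any k nodes are sufficient to recover the original file.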
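Figure 2 depicts the regeneration process for one node failure. The system contains n storage nodes, each storing an amount of data α. When a node fails, the new node recovers the data by downloading from d ≥ k available nodes, each connected node transmitting an amount β, so the repair bandwidth is γ = dβ. Each storage node i can be represented by a pair of nodes X_in^i, X_out^i connected by an edge whose capacity is the storage of the node (i.e., α). The regeneration process is described by an information flow graph: X_in downloads β data from each of any d available nodes and, through the edge X_in → X_out of capacity α, stores α data in X_out, which any receiver can access. The maximum information flow from the source to the sink is determined by the minimum cut of the graph; for the sink to reconstruct the original file, the size of this flow must not be smaller than the size of the original file.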
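Studies show that there is a tradeoff between the storage α per node and the bandwidth γ required to regenerate a node; the two extreme points of the tradeoff curve correspond to the minimum storage regenerating (MSR) code and the minimum bandwidth regenerating (MBR) code. For minimum storage regeneration, each node stores at least M/k bits, from which it follows that for the MSR code
(α_MSR, γ_MSR) = (M/k, Md / (k(d − k + 1))).
When d takes its maximum value, i.e. a newcomer communicates with all n−1 surviving nodes at the same time, the repair bandwidth γ_MSR is minimal:
γ_MSR^min = M(n − 1) / (k(n − k)).
The MBR code has the minimum repair bandwidth; it can be shown that for d = n−1 the minimum repair bandwidth is
(α_MBR, γ_MBR^min) = (2M(n − 1) / (k(2n − k − 1)), 2M(n − 1) / (k(2n − k − 1))).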
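For the node failure repair problem, that patent considers the following three repair models:
Exact repair: the failed module must be reconstructed exactly, so that the recovered information is identical to what was lost (the core techniques are interference alignment and network coding);
Functional repair: the newly generated module may contain data different from that of the lost node, as long as the repaired system retains the MDS property (the core technique is network coding);
Exact repair of the systematic part: a hybrid repair model between exact repair and functional repair. In this hybrid model, exact repair is used for uncoded data, i.e. the recovered information is the same as the information stored by the failed node; coded data blocks do not need exact repair, and functional repair alone suffices to make the recovered information satisfy the MDS property (the core techniques are interference alignment and network coding).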
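To deploy RGC codes in a real distributed storage system, data must be downloaded from at least k nodes to repair a lost module even in the non-optimal case; so although the amount of data transferred during repair is fairly low, RGC codes need a high protocol overhead and system design (network coding) complexity to implement. In addition, RGC codes do not consider engineering solutions such as lazy repair, and therefore cannot avoid the repair load caused by transient failures. Finally, the repair process of the RGC scheme is computationally expensive and usually involves a large number of finite field operations, an order higher than traditional erasure codes, which slows the repair of failed nodes. Typically, each node participating in the repair needs to read out its stored data blocks, perform a specific linear operation on them, and then pass the combined data block to the replacement node. Since node read and write bandwidth is lower than network bandwidth in real systems, read and write bandwidth easily becomes the system performance bottleneck.
Patent PCT/CN2014/078539 proposes a method for constructing partial repetition codes (FRC) that uses the theory of groupable (group divisible) designs to give a specific FRC construction. The technique allows the construction parameters to be chosen within a certain range, and different FRC codes can be constructed by adjusting the groups of the design. If the groupable design used in the construction is resolvable, the number of nodes in the system can be chosen flexibly. Further analysis shows that the constructed codewords reach the system storage capacity under the random access model and are thus theoretically optimal. Although the constructed codewords use table-based node repair, the analysis shows that nodes in the system still have a large number of repair options. The FRC construction based on groupable designs can build codewords within a certain parameter range, but the construction parameters actually available are very limited. Because of the special nature of groupable designs, the design parameters currently known are confined to a certain range. Therefore, given a groupable design, the constructed codeword can only be deployed in a storage system with specific parameters. Considering the diversity of real storage environments, this codeword design cannot be widely applied to actual storage systems.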
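[Summary of the Invention]
To solve the problems in the prior art, the present invention provides a coding structure and a construction method of a partial repetition code based on t-design, which address the problems of the coding schemes used in the prior art: high bandwidth consumption for node repair, large communication overhead during repair, and high computational complexity.
The present invention is realized through the following technical solution: a coding structure of a partial repetition code based on t-design is designed and made, consisting of an external MDS code and a partial repetition code, the coding structure being identified as the TFRC code structure; the partial repetition code is an arrangement, on the storage nodes, of multiple coding blocks each replicated a finite number of times, which ensures that the copies of each coding block are stored on different nodes; after the data blocks are encoded by the external MDS code, each output coding block is replicated a finite number of times and then distributed to the storage nodes.
As a further improvement of the present invention: the partial repetition code is a partial repetition code for a distributed storage system with parameters (n, k, d); it is C = (U, M) with copy multiple f, namely a collection M = {M_1, …, M_n} of n specific subsets, where the elements of every subset come from the symbol set U = {1, …, θ}, every subset has size d, and every element of U belongs to f subsets of M; f is called the copy multiple of each data block; each subset corresponds to one storage node, and each node has storage capacity d.
As a further improvement of the present invention: the partial repetition code is represented by an incidence matrix in which the row sum of every row and the column sum of every column are constant, and any two rows differ in at least one position.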
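As a further improvement of the present invention: the partial repetition code adopts a t-design construction, specifically: take a given simple t-(v, s, λ) design (X, B) and let its r-th order matrix be W_r, 1 ≤ r ≤ t; then the matrix W_r generates a partial repetition code with parameters (n, d, θ, f) = (b, C(s, r), C(v, r), λ_r), where b is the number of blocks of the design and C(·, ·) denotes the binomial coefficient.
As a further improvement of the present invention: the partial repetition code replicates C(v, r) coding blocks evenly λ_r times each and stores them in a storage system of b nodes, where each node can store C(s, r) data blocks.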
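The present invention also provides a method for constructing a fractional repetition code based on a t-design, comprising the following steps: the code is constructed from an outer MDS code and a fractional repetition (TFRC) code; after the data blocks are encoded with the outer MDS code, each output coded block is replicated a finite number of times and then distributed to the storage nodes; the fractional repetition code is an arrangement, on the storage nodes, of multiple coded blocks each replicated a finite number of times, such that the replicas of each coded block are stored on different nodes.
As a further improvement of the present invention: the fractional repetition code is a fractional repetition code for a distributed storage system with parameters (n,k,d), denoted C=(U,M) with replication factor f; it is a collection M={M_1,…,M_n} of n specific subsets, the elements of each subset being drawn from the symbol set U={1,…,θ}, where every subset has size d and every element of U belongs to exactly f subsets in M; f is called the replication factor of each data block; each subset corresponds to one storage node, and each node has storage capacity d.
As a further improvement of the present invention: the fractional repetition code is represented by an incidence matrix in which every row has the same row sum and every column has the same column sum, and any two rows differ in at least one position.
As a further improvement of the present invention: the fractional repetition code is constructed from a t-design, specifically: take a given simple t-(v,s,λ) design
$$(X,\ \mathcal{B}),$$
and let its r-th order matrix be W_r, 1≤r≤t; then the matrix W_r generates a fractional repetition code with parameters
$$\left(b,\ \binom{s}{r},\ \binom{v}{r},\ \lambda_r\right);$$
the fractional repetition code replicates each of the $\binom{v}{r}$ coded blocks λ_r times and stores them in a storage system comprising $b$ nodes, each of which can store $\binom{s}{r}$ data blocks.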
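As a further improvement of the present invention: the TFRC code uses table-based repair, and neither its repair process nor its reconstruction process involves complex finite-field operations.
The beneficial effects of the present invention are as follows. The t-design-based fractional repetition code (TFRC) significantly reduces the computational complexity of node repair, replacing complex finite-field operations with simple, easily implemented data copying. Conventional RGC codes are constructed over finite fields, and data recovery involves finite-field addition, subtraction, and multiplication. Although the theory of finite-field arithmetic is mature, it is cumbersome and time-consuming in practice and does not meet the speed and reliability targets of today's distributed storage systems. With TFRC codes, by contrast, a failed node is repaired by downloading data directly from other nodes and storing it on the replacement node, with no additional computation. This greatly increases the speed of node repair and data block regeneration, giving the code high application value and development potential in practical distributed storage systems.
The t-design-based fractional repetition code not only reduces the computational complexity of node repair but also ensures that the bandwidth consumed by repair is minimal, with no excess bandwidth. TFRC codes guarantee that: 1) a lost coded block can be repaired by directly downloading subsets of other coded blocks; 2) a lost coded block can be repaired from a fixed number of coded blocks, with a table-based repair pattern. Moreover, the data stored on a node repaired with TFRC are identical to those of the failed node, i.e. the repair is exact, which greatly reduces the operational complexity of the system (e.g. metadata updates).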
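【Brief Description of the Drawings】
Fig. 1 is a schematic diagram of the source-file recovery process in the prior art;
Fig. 2 is a schematic diagram of the failed-node recovery process in the prior art;
Fig. 3 is a schematic diagram of the RGC code regeneration process in the prior art;
Fig. 4 is a schematic diagram of a (4,2,3) distributed storage system using an FRC code according to the present invention;
Fig. 5 is a schematic diagram of one TFRC code construction example of the present invention;
Fig. 6 is a schematic diagram of another TFRC code construction example of the present invention.
【Detailed Description】
The present invention is further described below with reference to the drawings and specific embodiments.
Abbreviations and definitions of key terms
RGC     Regenerating Codes
MDS     Maximum Distance Separable
RS      Reed-Solomon
EC      Erasure Codes
MSR     Minimum-Storage Regenerating
MBR     Minimum-Bandwidth Regenerating
FRC     Fractional Repetition Codes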
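A coding structure for a fractional repetition code based on a t-design consists of an outer MDS code and a fractional repetition code, the coding structure being designated the TFRC code structure; the fractional repetition code is an arrangement, on the storage nodes, of multiple coded blocks each replicated a finite number of times, such that the replicas of each coded block are stored on different nodes; after the data blocks are encoded with the outer MDS code, each output coded block is replicated a finite number of times and then distributed to the storage nodes.
The fractional repetition code is a fractional repetition code for a distributed storage system with parameters (n,k,d), denoted C=(U,M) with replication factor f; it is a collection M={M_1,…,M_n} of n specific subsets, the elements of each subset being drawn from the symbol set U={1,…,θ}, where every subset has size d and every element of U belongs to exactly f subsets in M; f is called the replication factor of each data block; each subset corresponds to one storage node, and each node has storage capacity d.
The fractional repetition code is represented by an incidence matrix in which every row has the same row sum and every column has the same column sum, and any two rows differ in at least one position.
The fractional repetition code is constructed from a t-design, specifically: take a given simple t-(v,s,λ) design
$$(X,\ \mathcal{B}),$$
and let its r-th order matrix be W_r, 1≤r≤t; then the matrix W_r generates a fractional repetition code with parameters
$$\left(b,\ \binom{s}{r},\ \binom{v}{r},\ \lambda_r\right).$$
The fractional repetition code replicates each of the $\binom{v}{r}$ coded blocks λ_r times and stores them in a storage system comprising $b$ nodes, each of which can store $\binom{s}{r}$ data blocks.
The present invention also provides a method for constructing a fractional repetition code based on a t-design, comprising the following steps: the code is constructed from an outer MDS code and a fractional repetition (TFRC) code; after the data blocks are encoded with the outer MDS code, each output coded block is replicated a finite number of times and then distributed to the storage nodes; the fractional repetition code is an arrangement, on the storage nodes, of multiple coded blocks each replicated a finite number of times, such that the replicas of each coded block are stored on different nodes.
The fractional repetition code is a fractional repetition code for a distributed storage system with parameters (n,k,d), denoted C=(U,M) with replication factor f; it is a collection M={M_1,…,M_n} of n specific subsets, the elements of each subset being drawn from the symbol set U={1,…,θ}, where every subset has size d and every element of U belongs to exactly f subsets in M; f is called the replication factor of each data block; each subset corresponds to one storage node, and each node has storage capacity d.
The fractional repetition code is represented by an incidence matrix in which every row has the same row sum and every column has the same column sum, and any two rows differ in at least one position.
The fractional repetition code is constructed from a t-design, specifically: take a given simple t-(v,s,λ) design
$$(X,\ \mathcal{B}),$$
and let its r-th order matrix be W_r, 1≤r≤t; then the matrix W_r generates a fractional repetition code with parameters
$$\left(b,\ \binom{s}{r},\ \binom{v}{r},\ \lambda_r\right);$$
the fractional repetition code replicates each of the $\binom{v}{r}$ coded blocks λ_r times and stores them in a storage system comprising $b$ nodes, each of which can store $\binom{s}{r}$ data blocks.
The TFRC code uses table-based repair, and neither its repair process nor its reconstruction process involves complex finite-field operations.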
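In one embodiment, the t-design-based fractional repetition code construction method of the present invention targets the problems that conventional storage systems have complex architectures and that their coding schemes incur high node-repair bandwidth, high communication overhead during repair, and high computational complexity. A new t-design-based FRC code, called the TFRC code, is proposed to reduce the computational complexity of encoding and decoding. The code effectively reduces the system repair bandwidth, guarantees exact regeneration of lost data, and improves the efficiency of the repair process after a node failure (in terms of repair bandwidth, computational overhead, and repair time). Compared with existing code constructions, the scheme of the present invention greatly extends the construction parameters of FRC codes and can therefore construct codes over a wider parameter range.
Specifically, the TFRC code satisfies the basic MDS property, namely that repairing a lost coded block requires only a small amount of data rather than reconstruction of the whole file. TFRC encoding has two components: an outer MDS code and an inner repetition code. After the data blocks are MDS-encoded, each output coded block is replicated f times and then distributed to the storage nodes. When a node fails, it is repaired by downloading data directly from other nodes and storing it on the replacement node, with no additional computation.
TFRC codes allow construction parameters to be chosen over a wide range: a given t-design yields 4t different FRC codes. In addition, TFRC codes reduce the computational complexity of node repair and shorten the system repair time.
The repair procedure of conventional regenerating codes has high computational complexity and typically involves a large number of finite-field operations. Each node participating in a repair reads its stored data blocks, performs specific linear operations on them, and sends the combined blocks to the replacement node. Since the read/write bandwidth of nodes in practical systems is smaller than the network bandwidth, read/write bandwidth easily becomes the system performance bottleneck. To reduce the computational complexity of repair, a fractional repetition code construction based on t-designs, called the TFRC code, is proposed.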
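Fractional repetition codes:
A distributed storage system is usually described by (n,k,d), where n is the total number of nodes in the system (denoted N_i, i=1,...,n), k is the minimum number of nodes needed to reconstruct the original file, and d is the number of available nodes needed to repair one failed node, with k≤d≤n-1. After the original file is MDS-encoded, the output coded blocks (say θ of them) are each replicated f times and then distributed to the storage nodes. A system user can download data from any k nodes and reconstruct the original file using the MDS property. Research on MDS codes is relatively mature, and MDS codes exist for almost any admissible parameters. The difficulty in constructing FRC codes therefore lies in the design of the inner repetition code. In essence, a fractional repetition code is an arrangement on the storage nodes of θ coded blocks, each replicated f times, such that the replicas of each coded block are stored on different nodes.
Definition 1. A fractional repetition code C=(U,M) with replication factor f for a distributed storage system with parameters (n,k,d) is a collection M={M_1,…,M_n} of n specific subsets, the elements of each subset being drawn from the symbol set U={1,…,θ}, that satisfies the following two conditions:
(1) every subset has size d;
(2) every element of U belongs to exactly f subsets in M.
In this definition, the elements of each subset M_i are the indices of MDS-coded data blocks, and these blocks are stored on node N_i (i=1,...,n). Each subset thus corresponds to one storage node, and each node has storage capacity d. For convenience, such an FR code is called a code with parameters (n,d,θ,f), where f is the replication factor of each data block.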
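The two conditions of Definition 1 are easy to check mechanically. The following is a minimal sketch, assuming node contents are given as Python sets of block indices; the function name `frc_parameters` and the sample placement are hypothetical and are not taken from the patent figures.

```python
from collections import Counter


def frc_parameters(node_sets, theta):
    """Check Definition 1 and return (n, d, theta, f), or raise ValueError."""
    symbols = set(range(1, theta + 1))
    if not node_sets or any(not set(s) <= symbols for s in node_sets):
        raise ValueError("node contents must be subsets of U = {1..theta}")
    sizes = {len(set(s)) for s in node_sets}
    if len(sizes) != 1:
        raise ValueError("condition (1) fails: node sizes differ")
    counts = Counter(x for s in node_sets for x in set(s))
    reps = {counts[x] for x in symbols}
    if len(reps) != 1:
        raise ValueError("condition (2) fails: replication factors differ")
    return len(node_sets), sizes.pop(), theta, reps.pop()


# Hypothetical placement of 6 coded blocks on 4 nodes (not taken from the figures).
placement = [{1, 2, 3}, {1, 4, 5}, {2, 4, 6}, {3, 5, 6}]
params = frc_parameters(placement, theta=6)
print(params)
```

For this placement the returned tuple is read as (n, d, θ, f): four nodes, three blocks per node, six coded blocks, each stored twice.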
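For example, let
$$X = (X_1, X_2, X_3, X_4, X_5) \in \mathbb{F}_q^5$$
denote a file consisting of 5 data blocks, where $\mathbb{F}_q$ is the finite field of size q. Encoding with a (6,5) MDS code outputs 6 coded blocks
$$Y_1, Y_2, \dots, Y_6,$$
where Y_i=X_i, i=1,…,5, and
$$Y_6 = \sum_{i=1}^{5} X_i.$$
Each output coded block is replicated twice, and the resulting blocks are stored on 4 nodes, as shown in Fig. 4. The numbers in the boxes are the indices of the coded blocks; for example, the three data blocks stored on node N1 are Y1, Y3, Y2 in that order. The data stored on any two nodes suffice to reconstruct the original file, so k=2. When a node fails, the system repairs it by downloading data from the other three nodes, so d=3.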
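The outer (6,5) code in this example can be sketched with a single parity block. The code below assumes that the parity block is the field sum of the five data blocks and that the field has characteristic 2, so that addition is a bytewise XOR; both are illustrative assumptions, and the helper names are hypothetical.

```python
from functools import reduce


def xor_blocks(a, b):
    """Add two equal-length blocks over GF(2^8), i.e. bytewise XOR."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_6_5(data_blocks):
    """Systematic (6,5) code: Y_i = X_i for i = 1..5, Y_6 = X_1 + ... + X_5."""
    if len(data_blocks) != 5:
        raise ValueError("expected exactly 5 data blocks")
    return list(data_blocks) + [reduce(xor_blocks, data_blocks)]


def rebuild(coded, missing):
    """Recover coded[missing] as the sum of the other five coded blocks."""
    return reduce(xor_blocks, (blk for i, blk in enumerate(coded) if i != missing))


data = [bytes([i] * 4) for i in range(1, 6)]  # toy 4-byte data blocks
coded = encode_6_5(data)
print(all(rebuild(coded, i) == coded[i] for i in range(6)))
```

Because any five of the six coded blocks determine the sixth, two nodes that together hold five distinct coded blocks are enough to recover the file, which matches k=2 in the example.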
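In addition, a fractional repetition code can be represented by an incidence matrix. Specifically, given an FRC code C=(U,M) with U={1,…,θ} and M={M_1,…,M_n}, for 1≤i≤n and 1≤j≤θ let
$$a_{ij} = \begin{cases} 1, & j \in M_i, \\ 0, & \text{otherwise.} \end{cases}$$
The (0,1)-matrix
$$A = (a_{ij})_{n \times \theta}$$
is called the incidence matrix of C. By definition, each row of the incidence matrix represents a storage node and each column corresponds to a coded block. Hence the incidence matrix of an FRC code with parameters (n,d,θ,f) has every row sum equal to d and every column sum equal to f. For example, the incidence matrix of the FRC code in Fig. 4 is
Figure PCTCN2017084430-appb-000031
In summary, constructing an FRC code amounts to constructing a (0,1)-matrix in which all row sums are equal to one constant, all column sums are equal to one constant, and any two rows differ in at least one position.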
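The matrix view can be illustrated as follows. This is a sketch under the assumption that node contents are given as sets of block indices; the placement shown is a hypothetical (4,3,6,2) example and is not necessarily the layout drawn in the figures.

```python
def incidence_matrix(node_sets, theta):
    """Row i is node N_i; column j is coded block j; entry is 1 iff j is in M_i."""
    return [[1 if j in s else 0 for j in range(1, theta + 1)] for s in node_sets]


def is_frc_matrix(A):
    """Constant row sums, constant column sums, and pairwise distinct rows."""
    row_sums = {sum(row) for row in A}
    col_sums = {sum(col) for col in zip(*A)}
    distinct = len({tuple(row) for row in A}) == len(A)
    return len(row_sums) == 1 and len(col_sums) == 1 and distinct


# Hypothetical (4, 3, 6, 2) placement, not necessarily the layout in the figures.
placement = [{1, 2, 3}, {1, 4, 5}, {2, 4, 6}, {3, 5, 6}]
A = incidence_matrix(placement, theta=6)
for row in A:
    print(row)
print(is_frc_matrix(A))
```

Each printed row has sum d = 3 and each column has sum f = 2, which are the two constants required of an FRC incidence matrix.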
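t-designs
A combinatorial design is a pair
$$(X,\ \mathcal{B}),$$
where X is a set of points, and the number of points in X is called the order of the design; $\mathcal{B}$ is a family of subsets of X, each of which is called a block. The cardinality of X is usually denoted v and the cardinality of $\mathcal{B}$ is denoted b, i.e. $|X|=v$ and $|\mathcal{B}|=b$. A combinatorial design can also be viewed as an incidence structure: if a point of X is contained in a block, the point is said to be incident with that block; otherwise the two are not incident.
A design $(X,\mathcal{B})$ is called a t-(v,s,λ) design (or t-design) if it satisfies the following conditions: 1) $|X|=v$; 2) there is a constant s such that $|B|=s$ for every $B\in\mathcal{B}$; 3) for a given positive integer t, there is a constant λ>0 such that every t-element subset of X occurs in exactly λ blocks of $\mathcal{B}$. By this definition, $\mathcal{B}$ may contain repeated blocks; only t-designs without repeated blocks are considered here, and such designs are called simple t-designs. For example, with X={1,…,7}, the block family $\mathcal{B}$ given in
Figure PCTCN2017084430-appb-000040
Figure PCTCN2017084430-appb-000041
forms a 2-(7,3,1) design.
Lemma 1. Let 1≤i≤t. A t-(v,s,λ) design is also an i-(v,s,λ_i) design, where
$$\lambda_i = \lambda \binom{v-i}{t-i} \Big/ \binom{s-i}{t-i}.$$
By this lemma, when t≥2 every t-design is also a 1-design, i.e. every point occurs in the same number of blocks. In particular, the number of blocks b of a t-(v,s,λ) design is
$$b = \lambda \binom{v}{t} \Big/ \binom{s}{t}.$$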
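The defining condition of a t-design and the counting formula of Lemma 1 can be checked by brute force on a small design. The sketch below uses the 2-(6,3,2) design whose ten blocks are listed in this description; the function names are hypothetical.

```python
from itertools import combinations
from math import comb


def is_t_design(v, s, blocks, t, lam):
    """True if every block has size s and every t-subset of {1..v} lies in exactly lam blocks."""
    if any(len(B) != s for B in blocks):
        return False
    return all(
        sum(1 for B in blocks if set(T) <= B) == lam
        for T in combinations(range(1, v + 1), t)
    )


def lambda_i(v, s, lam, t, i):
    """Lemma 1: number of blocks containing a fixed i-subset (i = 0 gives b)."""
    num, den = lam * comb(v - i, t - i), comb(s - i, t - i)
    if num % den:
        raise ValueError("parameters are not admissible for a t-design")
    return num // den


# The 2-(6,3,2) design listed in this description.
BLOCKS = [{1, 2, 4}, {1, 2, 6}, {1, 3, 4}, {1, 3, 5}, {1, 5, 6},
          {2, 3, 5}, {2, 3, 6}, {2, 4, 5}, {3, 4, 6}, {4, 5, 6}]

print(is_t_design(6, 3, BLOCKS, 2, 2))
print(lambda_i(6, 3, 2, 2, 1), lambda_i(6, 3, 2, 2, 0))
```

Here `lambda_i(..., i=1)` gives the number of blocks through each point and `lambda_i(..., i=0)` gives the block count b, so the same helper covers both formulas.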
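TFRC code construction
Take a t-(v,s,λ) design
$$(X,\ \mathcal{B}),$$
where the blocks of $\mathcal{B}$ are, in order, B_1,B_2,…,B_b. Let r be a positive integer satisfying r≤t. Consider all r-element subsets of the point set X, denoted
$$T_1, T_2, \dots, T_{\binom{v}{r}}.$$
For 1≤i≤b and $1 \le j \le \binom{v}{r}$, construct the following r-th order matrix W_r:
$$W_r = (w_{ij})_{b \times \binom{v}{r}},$$
where
$$w_{ij} = \begin{cases} 1, & T_j \subseteq B_i, \\ 0, & \text{otherwise.} \end{cases}$$
Lemma 2. Every row sum of the matrix W_r is $\binom{s}{r}$, and every column sum is λ_r.
Proof. By definition, each row of W_r corresponds to a block of $\mathcal{B}$ and each column corresponds to an r-element subset of X. Since every block in $\mathcal{B}$ has size s and r≤t≤s, each block contains exactly $\binom{s}{r}$ r-element subsets of X, so every row sum is $\binom{s}{r}$. Moreover, by Lemma 1, a t-(v,s,λ) design is also an r-(v,s,λ_r) design, i.e. every r-element subset is contained in exactly λ_r blocks, so every column sum is λ_r.
Furthermore, since only simple t-designs are considered, any two rows of W_r are distinct. This gives the following code construction:
Code construction. Take a given simple t-(v,s,λ) design $(X,\mathcal{B})$ and let W_r be its r-th order matrix, 1≤r≤t. Then W_r generates an FRC code with parameters
$$\left(b,\ \binom{s}{r},\ \binom{v}{r},\ \lambda_r\right).$$
In particular, FRC codes constructed from t-designs are called TFRC codes. The constructed code replicates each of the $\binom{v}{r}$ coded blocks λ_r times and stores them in a storage system comprising $b$ nodes, each of which can store $\binom{s}{r}$ data blocks.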
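For example, consider a 2-(6,3,2) design, which contains the following 10 blocks:
{1,2,4},{1,2,6},{1,3,4},{1,3,5},{1,5,6},
{2,3,5},{2,3,6},{2,4,5},{3,4,6},{4,5,6}.
With r=1, the corresponding first-order incidence matrix (rows in the block order above, columns indexed by the points 1,…,6) is
$$W_1 = \begin{pmatrix}
1 & 1 & 0 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 1 \\
1 & 0 & 1 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 & 1 & 0 \\
1 & 0 & 0 & 0 & 1 & 1 \\
0 & 1 & 1 & 0 & 1 & 0 \\
0 & 1 & 1 & 0 & 0 & 1 \\
0 & 1 & 0 & 1 & 1 & 0 \\
0 & 0 & 1 & 1 & 0 & 1 \\
0 & 0 & 0 & 1 & 1 & 1
\end{pmatrix}$$
This matrix yields the TFRC code in Fig. 5, in which each coded block has replication factor 5.
With r=2, the corresponding second-order incidence matrix is
Figure PCTCN2017084430-appb-000061
This matrix yields the TFRC code in Fig. 6, in which each coded block has replication factor 2.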
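The construction of the r-th order matrix and the resulting code parameters can be reproduced for this 2-(6,3,2) design. In the sketch below the columns are ordered lexicographically by r-subset, which is an assumption (the ordering in the original figures is not known), and the function names are hypothetical.

```python
from itertools import combinations
from math import comb

BLOCKS = [{1, 2, 4}, {1, 2, 6}, {1, 3, 4}, {1, 3, 5}, {1, 5, 6},
          {2, 3, 5}, {2, 3, 6}, {2, 4, 5}, {3, 4, 6}, {4, 5, 6}]


def r_order_matrix(v, blocks, r):
    """Rows follow the block order; columns are the r-subsets of {1..v} in lexicographic order."""
    subsets = list(combinations(range(1, v + 1), r))
    return [[1 if set(T) <= B else 0 for T in subsets] for B in blocks]


def frc_params(W):
    """Read (n, d, theta, f) off an incidence matrix with constant row and column sums."""
    row_sums = {sum(row) for row in W}
    col_sums = {sum(col) for col in zip(*W)}
    if len(row_sums) != 1 or len(col_sums) != 1:
        raise ValueError("row or column sums are not constant")
    return len(W), row_sums.pop(), len(W[0]), col_sums.pop()


for r in (1, 2):
    print(r, frc_params(r_order_matrix(6, BLOCKS, r)))
```

For r = 1 and r = 2 the parameters agree with $(b, \binom{s}{r}, \binom{v}{r}, \lambda_r)$, including the replication factors 5 and 2 stated in the example.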
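Performance evaluation of TFRC codes
Extension of construction parameters
Consider a simple t-(v,s,λ) design
$$(X,\ \mathcal{B})$$
and its r-th order matrix W_r, 1≤r≤t. From the analysis above, every row sum of W_r is $\binom{s}{r}$ and every column sum is λ_r. Applying matrix transposition, $W_r^T$ generates an FRC code with parameters
$$\left(\binom{v}{r},\ \lambda_r,\ b,\ \binom{s}{r}\right),$$
where $W_r^T$ is the transpose of W_r. Transposition interchanges the rows and columns of W_r, so the code above follows from the correspondence between an FRC code and its incidence matrix.
Let J be the $b \times \binom{v}{r}$ all-ones matrix, i.e. every entry of J is 1. Then the matrix $J-W_r$ generates an FRC code with parameters
$$\left(b,\ \binom{v}{r}-\binom{s}{r},\ \binom{v}{r},\ b-\lambda_r\right).$$
Similarly, the matrix $(J-W_r)^T$ generates an FRC code with parameters
$$\left(\binom{v}{r},\ b-\lambda_r,\ b,\ \binom{v}{r}-\binom{s}{r}\right).$$
In summary, given a t-(v,s,λ) design, its r-th order matrices W_r, 1≤r≤t, generate t TFRC codes in total. For each matrix W_r, matrix operations (transposition, subtraction, etc.) generate 3 further FRC codes. A t-design can therefore yield as many as 4t TFRC codes, which greatly extends the construction parameters of existing FRC codes.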
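The four codes obtained from one matrix W_r can be enumerated directly. The sketch below does this for r = 2 and the 2-(6,3,2) design from the example; the helper names and the lexicographic column order are assumptions made for illustration.

```python
from itertools import combinations

BLOCKS = [{1, 2, 4}, {1, 2, 6}, {1, 3, 4}, {1, 3, 5}, {1, 5, 6},
          {2, 3, 5}, {2, 3, 6}, {2, 4, 5}, {3, 4, 6}, {4, 5, 6}]


def r_order_matrix(v, blocks, r):
    """b x C(v, r) matrix; entry is 1 iff the r-subset lies inside the block."""
    subsets = list(combinations(range(1, v + 1), r))
    return [[1 if set(T) <= B else 0 for T in subsets] for B in blocks]


def frc_params(W):
    """(n, d, theta, f) read from constant row and column sums."""
    row_sums = {sum(row) for row in W}
    col_sums = {sum(col) for col in zip(*W)}
    if len(row_sums) != 1 or len(col_sums) != 1:
        raise ValueError("row or column sums are not constant")
    return len(W), row_sums.pop(), len(W[0]), col_sums.pop()


def transpose(W):
    return [list(col) for col in zip(*W)]


def complement(W):
    """J - W, where J is the all-ones matrix of the same shape."""
    return [[1 - x for x in row] for row in W]


W2 = r_order_matrix(6, BLOCKS, 2)
derived = {
    "W": frc_params(W2),
    "W^T": frc_params(transpose(W2)),
    "J-W": frc_params(complement(W2)),
    "(J-W)^T": frc_params(transpose(complement(W2))),
}
for name, params in derived.items():
    print(name, params)
```

For r = 2 the four parameter sets are pairwise different. For r = 1 this particular design gives coinciding parameters for W_1 and J - W_1, so 4t is an upper bound on the number of distinct parameter sets.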
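Table-based repair
TFRC codes have all the properties of FR codes: every data block has the same replication factor, and every node in the system has the same storage capacity. Notably, unlike the conventional random access model, TFRC codes use table-based repair. Specifically, a repair table specifies the repair options available for each failed node. For example, for the TFRC code in Fig. 5, if node N1 fails, it can be repaired from nodes N2 and N3, but not from nodes N3 and N4. The table below gives the repair scheme for each node failure of the TFRC code in Fig. 6.
Failed node   Nodes that can participate in repair
N1   N2, N3 and N8
N2   N1, N5 and N7
N3   N1, N4 and N9
N4   N3, N5 and N6
N5   N2, N4 and N10
N6   N4, N7 and N8
N7   N2, N6 and N9
N8   N1, N6 and N10
N9   N3, N7 and N10
N10   N5, N8 and N9
Practical storage deployments usually include a tracker server that records system metadata. The repair table can therefore be written into the metadata for fast lookup during failure repair. Given the reduction in repair complexity, the cost of building and maintaining the node repair table is worthwhile.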
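A repair table of this kind can be derived from the design itself. The sketch below assumes that node N_i stores the coded blocks indexed by the r-subsets of design block B_i, in the block order listed for the 2-(6,3,2) design; the function names are hypothetical. For r = 2 every coded block has exactly one other replica, so the table is unique; for r = 1 the code only checks whether a given helper set is sufficient.

```python
from itertools import combinations

BLOCKS = [{1, 2, 4}, {1, 2, 6}, {1, 3, 4}, {1, 3, 5}, {1, 5, 6},
          {2, 3, 5}, {2, 3, 6}, {2, 4, 5}, {3, 4, 6}, {4, 5, 6}]


def node_contents(blocks, r):
    """Node N_i stores the coded blocks indexed by the r-subsets of design block B_i."""
    return [set(combinations(sorted(B), r)) for B in blocks]


def can_repair(stored, failed, helpers):
    """True if the helpers jointly hold every coded block of the failed node (1-based ids)."""
    available = set().union(*(stored[h - 1] for h in helpers))
    return stored[failed - 1] <= available


def repair_table(stored):
    """For each node, the nodes holding the first other replica of each of its coded blocks."""
    table = {}
    for i, mine in enumerate(stored, start=1):
        helpers = set()
        for sym in mine:
            helpers.add(next(j for j, other in enumerate(stored, start=1)
                             if j != i and sym in other))
        table[i] = sorted(helpers)
    return table


table_r2 = repair_table(node_contents(BLOCKS, 2))
stored_r1 = node_contents(BLOCKS, 1)
for node, helpers in table_r2.items():
    print(f"N{node}: " + ", ".join(f"N{h}" for h in helpers))
```

A tracker could store `table_r2` as plain metadata; repair then consists of copying the listed blocks, with no field arithmetic.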
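The present invention proposes a new t-design-based construction of FR codes, the TFRC code. Compared with existing code constructions, the t-design-based scheme is simpler and more intuitive, and it supports more construction parameters. TFRC codes reduce the computational complexity of node repair, are simpler to implement, and repair nodes more efficiently than existing coding schemes.
Although TFRC codes use table-based repair, neither the repair process nor the reconstruction process involves complex finite-field operations, so the computational complexity and overhead are low and the system repair latency is greatly reduced, which suits practical storage systems. TFRC codes need no extra bandwidth during repair (the bandwidth consumed equals the size of the lost data) and achieve exact node repair, i.e. the repaired data are identical to the data lost by the node. TFRC codes are therefore easy to implement and cheap to repair, and they have considerable application prospects in practical large-scale storage systems.
The above is a further detailed description of the present invention in connection with specific preferred embodiments, and the specific implementation of the present invention shall not be regarded as limited to these descriptions. For a person of ordinary skill in the art to which the present invention belongs, simple deductions or substitutions made without departing from the concept of the present invention shall all be regarded as falling within the scope of protection of the present invention.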

Claims (10)

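  1. A coding structure for a fractional repetition code based on a t-design, characterized in that: it consists of an outer MDS code and a fractional repetition code, the coding structure being designated the TFRC code structure; the fractional repetition code is an arrangement, on the storage nodes, of multiple coded blocks each replicated a finite number of times, such that the replicas of each coded block are stored on different nodes; after the data blocks are encoded with the outer MDS code, each output coded block is replicated a finite number of times and then distributed to the storage nodes.
  2. The coding structure for a fractional repetition code based on a t-design according to claim 1, characterized in that: the fractional repetition code is a fractional repetition code for a distributed storage system with parameters (n,k,d), denoted C=(U,M) with replication factor f; it is a collection M={M_1,…,M_n} of n specific subsets, the elements of each subset being drawn from the symbol set U={1,…,θ}, where every subset has size d and every element of U belongs to exactly f subsets in M; f is called the replication factor of each data block; each subset corresponds to one storage node, and each node has storage capacity d.
  3. The coding structure for a fractional repetition code based on a t-design according to claim 1, characterized in that: the fractional repetition code is represented by an incidence matrix in which every row has the same row sum and every column has the same column sum, and any two rows differ in at least one position.
  4. The coding structure for a fractional repetition code based on a t-design according to claim 1, characterized in that: the fractional repetition code is constructed from a t-design, specifically: take a given simple t-(v,s,λ) design
    $$(X,\ \mathcal{B}),$$
    and let its r-th order matrix be W_r, 1≤r≤t; then the matrix W_r generates a fractional repetition code with parameters
    $$\left(b,\ \binom{s}{r},\ \binom{v}{r},\ \lambda_r\right).$$
  5. The coding structure for a fractional repetition code based on a t-design according to claim 4, characterized in that: the fractional repetition code replicates each of the $\binom{v}{r}$ coded blocks λ_r times and stores them in a storage system comprising $b$ nodes, each of which can store $\binom{s}{r}$ data blocks.
  6. A method for constructing a fractional repetition code based on a t-design, characterized in that it comprises the following steps: the code is constructed from an outer MDS code and a fractional repetition (TFRC) code; after the data blocks are encoded with the outer MDS code, each output coded block is replicated a finite number of times and then distributed to the storage nodes; the fractional repetition code is an arrangement, on the storage nodes, of multiple coded blocks each replicated a finite number of times, such that the replicas of each coded block are stored on different nodes.
  7. The method for constructing a fractional repetition code based on a t-design according to claim 6, characterized in that: the fractional repetition code is a fractional repetition code for a distributed storage system with parameters (n,k,d), denoted C=(U,M) with replication factor f; it is a collection M={M_1,…,M_n} of n specific subsets, the elements of each subset being drawn from the symbol set U={1,…,θ}, where every subset has size d and every element of U belongs to exactly f subsets in M; f is called the replication factor of each data block; each subset corresponds to one storage node, and each node has storage capacity d.
  8. The method for constructing a fractional repetition code based on a t-design according to claim 6, characterized in that: the fractional repetition code is represented by an incidence matrix in which every row has the same row sum and every column has the same column sum, and any two rows differ in at least one position.
  9. The method for constructing a fractional repetition code based on a t-design according to claim 6, characterized in that: the fractional repetition code is constructed from a t-design, specifically: take a given simple t-(v,s,λ) design
    $$(X,\ \mathcal{B}),$$
    and let its r-th order matrix be W_r, 1≤r≤t; then the matrix W_r generates a fractional repetition code with parameters
    $$\left(b,\ \binom{s}{r},\ \binom{v}{r},\ \lambda_r\right);$$
    the fractional repetition code replicates each of the $\binom{v}{r}$ coded blocks λ_r times and stores them in a storage system comprising $b$ nodes, each of which can store $\binom{s}{r}$ data blocks.
  10. The method for constructing a fractional repetition code based on a t-design according to claim 6, characterized in that: the TFRC code uses table-based repair, and neither its repair process nor its reconstruction process involves complex finite-field operations.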
Application Number: PCT/CN2017/084430
Priority Date: 2017-05-16
Filing Date: 2017-05-16
Title: Coding structure and construction method of fractional repetition codes based on t-designs
Publication Number: WO2018209541A1 (zh)
Publication Date: 2018-11-22
Family ID: 64273001
Family Applications (1): PCT/CN2017/084430
Country Status (1): WO, WO2018209541A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100138717A1 (en) * 2008-12-02 2010-06-03 Microsoft Corporation Fork codes for erasure coding of data blocks
CN102624866A (zh) * 2012-01-13 2012-08-01 北京大学深圳研究生院 一种存储数据的方法、装置及分布式网络存储系统
WO2015180038A1 (zh) * 2014-05-27 2015-12-03 北京大学深圳研究生院 部分复制码的构建方法、装置及其数据修复的方法
KR101621752B1 (ko) * 2015-09-10 2016-05-17 연세대학교 산학협력단 부분접속 복구 가능한 반복분할 부호를 이용한 분산 저장 장치 및 그 방법
CN105721611A (zh) * 2016-04-15 2016-06-29 西南交通大学 一种由极大距离可分存储码生成最小存储再生码的一般方法

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ZHU, BING ET AL.: "Adaptive Fractional Repetition Codes for Dynamic Storage Systems", IEEE COMMUNICATIONS LETTERS, vol. 19, no. 12, 31 December 2015 (2015-12-31), XP055545097 *
ZHU, BING ET AL.: "Exploring Node Repair Locality in Fractional Repetition Codes", IEEE COMMUNICATIONS LETTERS, vol. 20, no. 12, 31 December 2016 (2016-12-31), pages 2350 - 2353, XP011636329 *
ZHU, BING ET AL.: "Rethinking Fractional Repetition Codes: New Construction and Code Distance", IEEE COMMUNICATIONS LETTERS, vol. 20, no. 2, 29 February 2016 (2016-02-29), pages 220 - 223, XP055545093, ISSN: 1089-7798 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113157485A (zh) * 2021-05-06 2021-07-23 中南大学 一种部分重复码的扩张构造方法
CN113157485B (zh) * 2021-05-06 2022-07-15 中南大学 一种部分重复码的扩张构造方法
CN113347026A (zh) * 2021-05-21 2021-09-03 长安大学 基于立方体网络的部分重复码构造和故障节点修复方法
CN113347026B (zh) * 2021-05-21 2022-06-28 长安大学 基于立方体网络的部分重复码构造和故障节点修复方法

Legal Events

Code   Title   Description
121   Ep: the epo has been informed by wipo that ep was designated in this application   Ref document number: 17910165; Country of ref document: EP; Kind code of ref document: A1
NENP   Non-entry into the national phase   Ref country code: DE
122   Ep: pct application non-entry in european phase   Ref document number: 17910165; Country of ref document: EP; Kind code of ref document: A1