WO2017061891A1 - Coding for distributed storage system - Google Patents

Coding for distributed storage system

Info

Publication number
WO2017061891A1
Authority
WO
WIPO (PCT)
Prior art keywords
symbols
codeword
unknown
marked
node
Prior art date
Application number
PCT/RU2015/000655
Other languages
English (en)
Other versions
WO2017061891A9 (fr)
Inventor
Peter Vladimirovich Trifonov
Yunfeng Shao
Yuangang WANG
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to PCT/RU2015/000655
Priority to CN201580083657.1A (CN108141228A)
Publication of WO2017061891A1
Publication of WO2017061891A9

Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00 Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/03 Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words
    • H03M13/05 Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits
    • H03M13/13 Linear codes
    • H03M13/37 Decoding methods or techniques, not specific to the particular type of coding provided for in groups H03M13/03 - H03M13/35
    • H03M13/373 Decoding methods or techniques, not specific to the particular type of coding provided for in groups H03M13/03 - H03M13/35 with erasure correction and erasure determination, e.g. for packet loss recovery or setting of erasures for the decoding of Reed-Solomon codes

Definitions

  • the present invention relates to a method for encoding input data in a codeword and to a method for updating a codeword.
  • the present invention also relates to a storage controller and to a computer-readable storage medium.
  • the present invention further relates to a computer-readable storage medium storing program code, the program code comprising instructions for carrying out one of the above methods.
  • each server is equipped with N_d storage devices; servers, disks and blocks on them may fail or go temporarily offline at any time for many different reasons.
  • In order to ensure that the information stored in the system is continuously available, one may store multiple copies of data on different servers/disks (this approach is adopted in Google File System and Hadoop Distributed File System).
  • Another solution is to employ some kind of erasure coding, i.e. partition a chunk of data (stripe) into k information blocks (symbols), compute for them n − k parity (check) blocks (symbols), and store these blocks on different disks and servers. If any of them fails, one can consider the corresponding symbols as erased, and try to recover the missing blocks by means of erasure decoding of the corresponding code.
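The stripe/parity idea in the preceding paragraph can be illustrated with the simplest possible case. The sketch below is not the code proposed in this application: it assumes a single XOR parity block (n − k = 1), and the function names `xor_blocks`, `encode_stripe` and `recover` are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def encode_stripe(info_blocks):
    """Append one parity block (n - k = 1) to the k information blocks."""
    return list(info_blocks) + [xor_blocks(info_blocks)]

def recover(stripe, erased_index):
    """Rebuild a single erased block from the n - 1 surviving blocks."""
    survivors = [b for i, b in enumerate(stripe) if i != erased_index]
    return xor_blocks(survivors)

# k = 3 information blocks, n = 4 stored blocks.
stripe = encode_stripe([b"abcd", b"efgh", b"ijkl"])
rebuilt = recover(stripe, 1)  # pretend the block at index 1 was lost
```

A single parity block tolerates only one erasure per stripe; codes with larger n − k, such as those discussed in this document, are needed to survive several simultaneous failures.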
  • Numerous erasure correcting codes for network storage systems have been suggested, including Reed-Solomon codes, Pyramid codes, EvenOdd and RDP codes, Parity splitting codes, and Zigzag codes.
  • Reed-Solomon codes provide protection against any combination of up to n − k erasures, and therefore have the lowest possible redundancy.
  • recovering any erased symbol requires one to access at least k surviving symbols.
  • Pyramid and parity splitting codes provide the ability to recover a number of erasures by accessing at most l < k non-erased symbols. This is achieved at the expense of higher redundancy of the code.
  • these constructions are obtained from some maximum distance separable code (e.g. Reed-Solomon) by introducing into codewords additional check symbols, which depend only on some subsets of information symbols.
  • Array codes such as EvenOdd, RDP and zigzag codes are defined over a vector alphabet. This enables one to design efficient encoding algorithms, as well as to reduce the amount of data to be transmitted over the network in case of erasure recovery.
  • there are still no explicit and efficient methods for construction of these codes for arbitrary values of code dimension k and redundancy n − k.
  • the performance of a network storage system depends on the amount of traffic generated and the number of servers contacted during encode and rebuild operations, disk access rate and computational complexity of the associated algorithms.
  • An important problem arising in such systems is that applications tend to write data in relatively small chunks consisting of less than k blocks. This requires one to implement buffering, i.e. accumulate the data somewhere until a sufficient amount of it is collected.
  • it is an objective of the present invention to provide a method for encoding data and a method for updating a codeword, wherein the methods overcome one or more of the above-mentioned problems of the prior art.
  • an objective of the present invention can include ensuring data availability in a distributed storage system, which may suffer from block, device and server failures.
  • the set 7 comprises integers s with 0 ⁇ s ⁇ r h >
  • G 2 J1
  • is a natural number.
  • the method is a method for recovering erased data, wherein the method is based on an encoding scheme based on nodes F_i that correspond to the matrices F_i, and the method comprises:
  • if a node has t known input symbols, t unknown and l − t known output symbols, recover the unknown output symbols by local decoding at the node and mark all output symbols as known; and if all output symbols of a node are known, compute the unknown input symbols and mark them as known.
  • the method of the first implementation provides an efficient method for erasure recovery.
  • the task of erasure recovery can be reduced to a plurality of local decoding tasks, for which efficient implementations exist, as further outlined below.
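The way erasure recovery reduces to a sequence of local decoding tasks can be sketched as a scheduling loop that only tracks which symbols are known. This is a minimal sketch under stated assumptions, not the claimed algorithm: the wire numbering and the four-node, two-layer graph are hypothetical, and it is assumed that a node can always decode locally when it has at least as many known inputs as unknown outputs (which depends on the kernel matrix). No symbol values are computed.

```python
def schedule_recovery(nodes, known):
    """Propagate 'known' marks through the node graph until nothing changes.

    nodes: list of (input_wires, output_wires) pairs, one per node.
    known: set of wire ids whose values are known; updated in place.
    Returns the list of (node_index, action) steps that were applied.
    """
    steps = []
    changed = True
    while changed:
        changed = False
        for idx, (ins, outs) in enumerate(nodes):
            known_in = [w for w in ins if w in known]
            unknown_out = [w for w in outs if w not in known]
            # Local decoding: enough known inputs to fill in the unknown outputs.
            if unknown_out and len(known_in) >= len(unknown_out):
                known.update(unknown_out)
                steps.append((idx, "decode outputs"))
                changed = True
            # All outputs known: the node's inputs can be computed.
            unknown_in = [w for w in ins if w not in known]
            if unknown_in and all(w in known for w in outs):
                known.update(unknown_in)
                steps.append((idx, "compute inputs"))
                changed = True
    return steps

# Hypothetical layout: wires 0-3 are inputs, 4-7 intermediate, 8-11 codeword.
nodes = [([0, 1], [4, 5]), ([2, 3], [6, 7]),   # first layer
         ([4, 6], [8, 9]), ([5, 7], [10, 11])]  # second layer
known = {0, 9, 10, 11}  # wire 0 is frozen; codeword symbol on wire 8 is erased
steps = schedule_recovery(nodes, known)
```

In this toy graph the erased codeword symbol on wire 8 becomes recoverable only after the first-layer node with the frozen input has been processed, which is the kind of dependency the marking rules are meant to resolve.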
  • the method is a method for systematic encoding and obtaining the codeword comprises initial steps of:
  • the method further comprises a step of recovering the symbols in the remaining codeword positions using the method of the first implementation of the first aspect.
  • the proposed method for selecting data positions within a codeword ensures that the described erasure recovery method always recovers all the check symbols of the codeword.
  • the local decoding comprises:
  • a Fast Fourier Transform is utilized for determining unknown values from known values.
  • the FFT can be used for evaluating the polynomials, as required by the above described local decoding method, thus providing a particularly efficient way of performing the local decoding.
  • the method further comprises a step of partially encoding input data with a length that is less than code dimension
  • encoding can be performed partially until unknown symbols can be recovered. Encoding can be resumed as soon as additional data arrives. Long codes are needed in order to maximize the payload capacity of a storage system given some target data loss probability. However, the dimension of such codes may be too high compared to the amount of data which can be produced at once by an application. Therefore the eighth implementation provides a delayed encoding method, which can be used in order to generate a few check symbols for small pieces of data as soon as they arrive, until a sufficient amount of data is accumulated in order to produce the whole codeword.
  • the method of the eighth implementation can make use of the idea to designate initially all codeword symbols as unknown, and put the information symbols into appropriate positions, designating them as known, as soon as they arrive. Then one can execute the above described systematic encoding algorithm, which may stop at some points due to lack of known symbols, and resume as soon as they appear. Observe that it is not likely that many devices fail within a short time span which is needed to accumulate t data blocks. Therefore, a few check symbols obtained during incomplete execution of the encoding algorithm may be sufficient to cope with such failures. As soon as the encoding algorithm completes its execution, the whole set of check symbols can be obtained, which ensures protection against many device failures, as required for long-term data storage.
  • the method of the eighth implementation provides an efficient way of dealing with large codes and small information chunks provided by an application.
  • the method further comprises a step of storing an i-th element of the codeword on an i-th node, wherein the node is a server or a disk within a server.
  • a second aspect of the invention refers to a method for updating a codeword encoded according to the method of the first aspect or one of its implementations, the method comprising:
  • An implementation of the second aspect can provide a method for updating the data encoded according to the method of the first aspect, where the symbol to be updated is declared obsolete, a new value is stored on the device, the old check symbols are declared obsolete, and new check symbols are allocated and declared initially as unknown. Then the values of unknown symbols are recursively determined as in the case of the encoding procedure.
  • the block storing its previous value is declared obsolete.
  • a third aspect of the invention refers to a storage controller, configured to carry out the method of one of the previous claims.
  • the controller can be implemented either in software or in hardware (e.g. ASIC, FPGA).
  • the controller can be directly connected to the storage devices or it can be connected to the storage devices through a network connection, wherein e.g. the storage devices are connected to the network through a further controller.
  • a fourth aspect of the invention refers to a computer-readable storage medium storing program code, the program code comprising instructions for carrying out the method of the first or second aspect or one of the implementations of the first or second aspect.
  • FIG. 1 is a block diagram illustrating an encoder structure in accordance with an embodiment of the present invention.
  • FIGs. 2A to 2D are block diagrams illustrating processing steps of an erasure recovery method in accordance with a further embodiment of the present invention.
  • FIG. 3 is a schematic diagram illustrating a method for delayed encoding in accordance with a further embodiment of the present invention.
  • FIG. 4 is a schematic diagram illustrating a method for updating a codeword in accordance with a further embodiment of the present invention.
  • u_i and c_i are elements of GF(2^µ), although in practice these may be column vectors (blocks) of GF(2^µ) values.
  • Set ℱ will be referred to as the set of frozen symbols.
  • j F, although the proposed construction is generic and can be used for any combination of ⁇ values.
  • FIG. 1 illustrates an encoding scheme in accordance with an embodiment of the present invention.
  • a system 100 comprises a set of nodes 102 to 112 wherein each of the nodes, denoted by "F", implements multiplication of a vector of input values (left-hand side inputs) by matrix F, and the result is passed via right-hand side terminals.
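The layered node structure described for FIG. 1 can be sketched in a few lines. This is a simplified illustration, not the encoder of the embodiment: it works over GF(2) instead of GF(2^µ), uses the 2x2 Arikan kernel as a stand-in for F, and assumes a wiring in which node j of the second layer takes the j-th output of every first-layer node. The names `node` and `encode_two_layer` are hypothetical.

```python
def node(F, x):
    """One 'F' node: multiply the row vector x by the matrix F over GF(2)."""
    size = len(F)
    return [sum(x[i] & F[i][j] for i in range(size)) % 2 for j in range(size)]

def encode_two_layer(u, F):
    """Encode l*l input symbols with two layers of l nodes each."""
    l = len(F)
    # Layer 1: each node processes l consecutive input symbols.
    mid = []
    for g in range(l):
        mid += node(F, u[g * l:(g + 1) * l])
    # Layer 2: node j takes the j-th output of every layer-1 node (assumed wiring).
    out = [0] * (l * l)
    for j in range(l):
        y = node(F, [mid[g * l + j] for g in range(l)])
        for g in range(l):
            out[g * l + j] = y[g]
    return out

F = [[1, 0],
     [1, 1]]          # Arikan kernel, used here only as an example of F
u = [0, 1, 0, 1]      # position 0 plays the role of a frozen symbol set to 0
c = encode_two_layer(u, F)
```

With this wiring the two layers together compute the product of u with the Kronecker square of F, which is the usual way such layered structures are described for polar codes.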
  • the encoded symbols can be stored in the following ways:
  • Each symbol is stored on its own device, where the output of each "F" node in the right-hand side layer is stored in a single group of devices (e.g. within one server).
  • Each symbol is stored in its own block, the output of each node is stored on the same device, and the outputs of l adjacent nodes are stored within a single server.
  • a set of 4 channels are frozen, indicated with reference number 120.
  • Input data are provided as a set of symbols u_0 to u_4, indicated with reference number 122.
  • Output symbols c_0 to c_8 are indicated with reference number 124.
  • An embodiment of the invention presents the following algorithm for correction of erasures in a codeword of a polar code:
  • FIGs. 2A to 2D present an example of application of the above-described erasure recovery method.
  • the example system comprises five nodes, indicated with reference numbers 202, 204, 206, 208 and 210. Unknown values are shown with dashed lines, and known values are shown with uninterrupted lines.
  • FIG. 2A shows the initial situation.
  • the symbols c_0, c_1, c_3, and c_6 have been erased and are marked as unknown.
  • the first input of the first node 202 corresponds to a frozen channel and is therefore also marked as known.
  • the first inputs of the third, fourth and fifth nodes 206, 208, 210 are also frozen channels and are therefore also marked as known.
  • the third node 206 has two unknown output symbols; therefore, according to rule 3) above, the second input is also marked as unknown.
  • in a first processing step, according to rule 5) a) above, local decoding is performed at the fourth node 208 and the fifth node 210.
  • their outputs c_3 and c_6 can be computed and marked as known. Consequently, all outputs of the fourth node 208 and the fifth node 210 are known and their inputs can be marked as known, according to rule 5) b) above.
  • the situation after applying rules 5) a) and 5) b) in the first processing step is shown in FIG. 2B. Subsequently, the topmost output symbol of the first node 202 can be computed according to rule 5) a).
  • Local decoding at a node can be performed as follows. Let s_i and c_i denote input and output symbols, respectively.
  • syndrome vector evaluation can be performed using the cyclotomic fast Fourier transform.
  • the above described erasure decoding algorithm can be used to implement systematic encoding. To do this, one can place the information symbols in positions ⁇ TM ⁇ 1 W; i m-1_i within the codeword, where ⁇ ⁇ 1 w l T, 0 ⁇ Wi ⁇ I, mark the remaining ones as erased, and execute the above described encoding method.
  • FIGs 3 and 4 illustrate embodiments of methods for delayed encoding and for updating encoded codewords.
  • check symbols are located in positions 3, 7, 11, 14, and 15.
  • FIG. 3 illustrates a method for delayed encoding.
  • Positions 0, 1 and 2 comprise previously encoded data symbols x_0, x_1, and x_2.
  • Position 3, indicated with reference number 300, comprises a parity symbol p_0 that is marked as unknown.
  • a step of the encoding algorithm is performed, the parity symbol p_0 is computed, stored in position 3, and marked as known, as indicated with reference number 310.
  • new data symbols x_3, x_4, and x_5 arrive. They are stored in positions 4, 5 and 6, and marked as known, as indicated with reference numbers 320, 322 and 324.
  • a further step of the encoding algorithm is performed and the parity symbol p_1 is computed and stored in position 7, as indicated with reference number 330.
  • new data symbols x_6, x_7, x_8, x_9 and x_10 arrive. They are stored in positions 8, 9, 10, 12 and 13, and marked as known, as indicated with reference numbers 340, 342, 344, 346 and 348.
  • a further step of the encoding algorithm is performed and the parity symbols p_2 and p_3 are computed and stored in positions 11 and 14, as indicated with reference number 330.
  • a further step of the encoding algorithm is performed and the global parity symbol p 4 is computed.
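The stop-and-resume behaviour of delayed encoding walked through for FIG. 3 can be sketched as follows. This is only an illustration under stated assumptions: plain XOR parities stand in for the polar-code check symbols, and the dependency sets in `CHECKS` (which positions feed each check position) are hypothetical; only the check positions 3, 7, 11, 14 and 15 are taken from the text. The class name `DelayedEncoder` is also hypothetical.

```python
# Hypothetical dependencies: check position -> positions it is computed from.
CHECKS = {3: [0, 1, 2], 7: [4, 5, 6], 11: [8, 9, 10],
          14: [12, 13], 15: list(range(15))}

class DelayedEncoder:
    def __init__(self, n, checks):
        self.symbols = [None] * n  # None means "unknown"
        self.checks = checks

    def put(self, pos, value):
        """Store an arriving data symbol, then resume encoding."""
        self.symbols[pos] = value
        return self.resume()

    def resume(self):
        """Compute every check symbol whose dependencies are all known."""
        computed = []
        progress = True
        while progress:
            progress = False
            for pos, deps in self.checks.items():
                if self.symbols[pos] is None and all(
                        self.symbols[d] is not None for d in deps):
                    value = 0
                    for d in deps:
                        value ^= self.symbols[d]
                    self.symbols[pos] = value
                    computed.append(pos)
                    progress = True
        return computed

enc = DelayedEncoder(16, CHECKS)
log = []
data_positions = [0, 1, 2, 4, 5, 6, 8, 9, 10, 12, 13]
for value, pos in enumerate(data_positions, start=1):
    log.append((pos, enc.put(pos, value)))
```

Each call to `put` returns the check positions that became computable at that moment, so early check symbols are available long before the whole codeword is complete.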
  • the above approach can be extended in order to implement partial update of the information symbols. This is illustrated in FIG. 4.
  • symbol x_0 is updated, i.e., it is marked as obsolete, indicated with reference number 410, and the new symbol x_0' is stored, indicated with reference number 412.
  • a corresponding check symbol p_0' is marked as unknown, indicated with reference number 414.
  • in a second processing step S21, the previous check symbol p_0 is marked as obsolete, indicated with reference number 420, and the new check symbol p_0' is marked as known, indicated with reference number 422.
  • in a third processing step S22, the check symbols are updated and stored in place of the old check symbols.
  • This is indicated with reference numbers 430, 432, and 434.
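The update flow of FIG. 4 (mark the old symbol and its check symbols obsolete, store the new value, recompute the affected checks) can be sketched as below. This is a simplified stand-in, not the method of the embodiment: XOR parities replace the polar-code check symbols, the layout in `checks` is hypothetical, and every check is assumed to depend on data positions only so that no recursion is needed. The function name `update_symbol` is hypothetical.

```python
def update_symbol(symbols, checks, pos, new_value):
    """Replace one data symbol and recompute the check symbols that cover it.

    Returns the obsolete values (old data symbol and old check symbols).
    """
    obsolete = {pos: symbols[pos]}      # old data symbol is declared obsolete
    symbols[pos] = new_value            # new value is stored
    affected = [c for c, deps in checks.items() if pos in deps]
    for c in affected:
        obsolete[c] = symbols[c]        # old check symbol is declared obsolete
        symbols[c] = None               # new check symbol starts as unknown
    for c in affected:
        value = 0
        for d in checks[c]:             # assumed: checks depend on data only
            value ^= symbols[d]
        symbols[c] = value              # new check symbol becomes known
    return obsolete

# Hypothetical layout: data in 0-2 and 4-6, local checks in 3 and 7, global in 8.
checks = {3: [0, 1, 2], 7: [4, 5, 6], 8: [0, 1, 2, 4, 5, 6]}
symbols = [5, 6, 7, 5 ^ 6 ^ 7, 1, 2, 3, 1 ^ 2 ^ 3, 5 ^ 6 ^ 7 ^ 1 ^ 2 ^ 3]
obsolete = update_symbol(symbols, checks, 0, 9)
```

Only the check symbols that cover the updated position are touched; the check in position 7 is left as it was, which is what makes a partial update cheaper than re-encoding the whole codeword.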
  • the invention provides a method for encoding the data in a storage system with a polar code, which includes a method for finding parameters of the polar code, a systematic encoding algorithm, a delayed encoding method, and a method for partial updating of the encoded data.
  • Embodiments of the present invention employ polar codes for encoding the data in a distributed storage system, and provide a method for their construction, which enables local data recovery, protection against a given number of block, disk and server failures, as well as a fast algorithm for their encoding and erasure decoding. Furthermore, the invention presents techniques which enable the data to be written to the system in small blocks, and provide an efficient implementation of the partial update operation. This can provide one or more of the following advantages compared to existing approaches such as the Reed-Solomon code used in HDFS-RAID:
  • the proposed method enables one to construct codes over field GF(q), q ≥ max_i l_i, while the constructions based on parity splitting and pyramid codes require field size q ≥ n. This results in reduced complexity of arithmetic operations.

Landscapes

  • Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Error Detection And Correction (AREA)

Abstract

The present invention relates to a method for encoding input data in a codeword, wherein the codeword is obtained as the product of a vector u' and a matrix A, wherein the vector u' comprises symbols (I), the remaining positions (II) comprise the input data, and the matrix A can be represented as (III), where B is a permutation matrix and F_0, F_1, ..., F_{m-1} are l_i x l_i matrices over GF(2^µ) that are not permutation-equivalent to diagonal matrices, and wherein the set ℱ comprises integers s with (IV), the set ℱ comprises integers (V) such that (VI), or the set ℱ comprises integers (VII).
PCT/RU2015/000655 2015-10-09 2015-10-09 Coding for distributed storage system WO2017061891A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/RU2015/000655 WO2017061891A1 (fr) 2015-10-09 2015-10-09 Coding for distributed storage system
CN201580083657.1A CN108141228A (zh) 2015-10-09 2015-10-09 Coding for distributed storage system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/RU2015/000655 WO2017061891A1 (fr) 2015-10-09 2015-10-09 Coding for distributed storage system

Publications (2)

Publication Number Publication Date
WO2017061891A1 true WO2017061891A1 (fr) 2017-04-13
WO2017061891A9 WO2017061891A9 (fr) 2017-06-15

Family

ID=55967384

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/RU2015/000655 WO2017061891A1 (fr) 2015-10-09 2015-10-09 Coding for distributed storage system

Country Status (2)

Country Link
CN (1) CN108141228A (fr)
WO (1) WO2017061891A1 (fr)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110208996A1 (en) * 2010-02-22 2011-08-25 International Business Machines Corporation Read-other protocol for maintaining parity coherency in a write-back distributed redundancy data storage system
US20140331083A1 (en) * 2012-12-29 2014-11-06 Emc Corporation Polar codes for efficient encoding and decoding in redundant disk arrays

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7676735B2 (en) * 2005-06-10 2010-03-09 Digital Fountain Inc. Forward error-correcting (FEC) coding and streaming
CN101834898B (zh) * 2010-04-29 2013-01-30 中科院成都信息技术有限公司 Network distributed coding storage method
CN102624866B (zh) * 2012-01-13 2014-08-20 北京大学深圳研究生院 Method and device for storing data, and distributed network storage system
US9203901B2 (en) * 2012-01-31 2015-12-01 Cleversafe, Inc. Efficiently storing data in a dispersed storage network
US8996950B2 (en) * 2012-02-23 2015-03-31 Sandisk Technologies Inc. Erasure correction using single error detection parity
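CN103336785B (zh) * 2013-06-04 2016-12-28 华中科技大学 Distributed storage method based on network coding and device thereof
CN107844268B (zh) * 2015-06-04 2021-09-14 华为技术有限公司 Data distribution method, data storage method, related apparatus and system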

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ERDAL ARIKAN: "A survey of reed-muller codes from polar coding perspective", INFORMATION THEORY WORKSHOP (ITW), 2010 IEEE, IEEE, PISCATAWAY, NJ, USA, 6 January 2010 (2010-01-06), pages 1 - 5, XP031703947, ISBN: 978-1-4244-6372-5 *
ESMAILI KYUMARS SHEYKH ET AL: "CORE: Cross-object redundancy for efficient data repair in storage systems", 2013 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, IEEE, 6 October 2013 (2013-10-06), pages 246 - 254, XP032535096, DOI: 10.1109/BIGDATA.2013.6691581 *
ESMAILI KYUMARS SHEYKH ET AL: "Efficient updates in cross-object erasure-coded storage systems", 2013 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, IEEE, 6 October 2013 (2013-10-06), pages 28 - 32, XP032535038, DOI: 10.1109/BIGDATA.2013.6691658 *
HUANG PENGFEI ET AL: "Cyclic linear binary locally repairable codes", 2015 IEEE INFORMATION THEORY WORKSHOP (ITW), IEEE, 26 April 2015 (2015-04-26), pages 1 - 5, XP032788757, ISBN: 978-1-4799-5524-4, [retrieved on 20150624], DOI: 10.1109/ITW.2015.7133128 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018228357A1 (fr) * 2017-06-15 2018-12-20 Huawei Technologies Co., Ltd. Methods and apparatus for encoding and decoding based on layered polar code
US10505566B2 (en) 2017-06-15 2019-12-10 Huawei Technologies Co., Ltd. Methods and apparatus for encoding and decoding based on layered polar code
CN111066250A (zh) * 2017-06-15 2020-04-24 华为技术有限公司 Methods and apparatus for encoding and decoding based on layered polar code
CN111066250B (zh) * 2017-06-15 2021-11-19 华为技术有限公司 Methods and apparatus for encoding and decoding based on layered polar code

Also Published As

Publication number Publication date
CN108141228A (zh) 2018-06-08
WO2017061891A9 (fr) 2017-06-15

Similar Documents

Publication Publication Date Title
US10146618B2 (en) Distributed data storage with reduced storage overhead using reduced-dependency erasure codes
US9600365B2 (en) Local erasure codes for data storage
Cadambe et al. Permutation code: Optimal exact-repair of a single failed node in MDS code based distributed storage systems
Sasidharan et al. A high-rate MSR code with polynomial sub-packetization level
US9354975B2 (en) Load balancing on disks in raid based on linear block codes
US9465692B2 (en) High reliability erasure code distribution
US20140006850A1 (en) Redundant disk encoding via erasure decoding
Sung et al. A ZigZag-decodable code with the MDS property for distributed storage systems
CN106201764B (zh) Data storage method and apparatus, and data recovery method and apparatus
KR20120058556A (ko) Methods and apparatus employing FEC codes with permanent inactivation of symbols for encoding and decoding processes
WO2012008921A1 (fr) Data encoding methods, data decoding methods, data reconstruction methods, data encoding devices, data decoding devices and data reconstruction devices
Shahabinejad et al. A class of binary locally repairable codes
CN114153651A (zh) Data encoding method, apparatus, device and medium
CN114116297B (zh) Data encoding method, apparatus, device and medium
WO2020029418A1 (fr) Method for constructing a repair binary code generator matrix and repair method
Balaji et al. On partial maximally-recoverable and maximally-recoverable codes
US20180246679A1 (en) Hierarchical data recovery processing for extended product codes
EP3408956B1 (fr) Apparatus and method for multi-code distributed storage
US10110258B2 (en) Accelerated erasure coding for storage systems
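WO2017061891A1 (fr) Coding for distributed storage system
CN109257049B (zh) Construction method and repair method for a repair binary array code check matrix
CN104932836B (zh) Three-disk fault-tolerant encoding and decoding method for improving single-write performance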
Chen et al. A new Zigzag MDS code with optimal encoding and efficient decoding
CN108352845B (zh) Method and apparatus for encoding stored data
WO2020029417A1 (fr) Method for encoding and framing a binary MDS array code

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15860015

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15860015

Country of ref document: EP

Kind code of ref document: A1