WO2018028107A1 - Coding fault-tolerant method for array-type storage system - Google Patents

Coding fault-tolerant method for array-type storage system

Info

Publication number
WO2018028107A1
Authority
WO
WIPO (PCT)
Prior art keywords
check
elements
data
array
err
Prior art date
Application number
PCT/CN2016/110614
Other languages
French (fr)
Chinese (zh)
Inventor
唐聃
舒红平
王亚强
Original Assignee
成都信息工程大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 成都信息工程大学
Publication of WO2018028107A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 Arrangements for detecting or preventing errors in the information received
    • H04L1/0001 Systems modifying transmission characteristics according to link quality, e.g. power backoff
    • H04L1/0014 Systems modifying transmission characteristics according to link quality, e.g. power backoff by adapting the source coding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1076 Parity data used in redundant arrays of independent storages, e.g. in RAID systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/0674 Disk device
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0683 Plurality of storage devices
    • G06F3/0689 Disk arrays, e.g. RAID, JBOD
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 Arrangements for detecting or preventing errors in the information received
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 Arrangements for detecting or preventing errors in the information received
    • H04L1/004 Arrangements for detecting or preventing errors in the information received by using forward error control
    • H04L1/0056 Systems characterized by the type of code used
    • H04L1/0061 Error detection codes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Storage Device Security (AREA)
  • Techniques For Improving Reliability Of Storages (AREA)

Abstract

The present invention belongs to the technical field of computer information storage and specifically relates to a coding fault-tolerant method for array-type storage systems with an arbitrary number of nodes. The method targets node faults: once any fault occurs at a node, all data on that node is considered unreliable or lost. The method applies to array-type storage with an arbitrary number of nodes, improves the reliability of the storage system, suits companies and institutions that hold large volumes of data and require high data stability, and can be widely applied in server systems.

Description

Coding fault-tolerant method for an array storage system
Technical Field
The invention belongs to the technical field of computer information storage and specifically relates to a coding fault-tolerant method for array storage systems with an arbitrary number of nodes.
Background
With the rapid growth of networks and servers, data volumes keep increasing, and the importance and security of data receive ever more attention. To address the storage reliability problems brought by this growth, and also to improve the concurrency of data access and reduce cost, a common and effective practice is to build a storage system jointly from multiple storage nodes. Such a system is usually a network-based distributed storage system, and its prototype can be traced back to the centralized RAID (Redundant Array of Inexpensive Disks) system. RAID was proposed to combine multiple small disks into one large-capacity logical disk. Data and check data are distributed across the disks according to rules such as striping, blocking, and interleaved access, and the redundant data improves reliability, so that the failure of a single disk does not affect normal data access and important data remains safe. A further goal is to replace an expensive single large-capacity disk with a logical disk assembled from several small disks, thereby reducing storage cost.
Although individual hardware devices are now fairly stable, node failures still occur frequently in distributed storage systems composed of many nodes. In particular, when data volumes are huge and the average failure probability of a single node is fixed, a larger total number of nodes means that more nodes may fail simultaneously within any given period. Sudden events such as hacker attacks or operator errors may also cause several nodes of a storage system to fail at once. The common reliability strategy based on mirrored backups does improve reliability, but it clearly wastes a great deal of storage space: as fault tolerance rises, storage efficiency keeps falling and update cost keeps growing.
Summary of the Invention
The purpose of the invention is to address the shortcomings of the prior art by providing a coding fault-tolerant method for array storage systems with an arbitrary number of nodes, which effectively improves fault tolerance.
To achieve this purpose, the technical solution is as follows. The array storage system consists of n data nodes and k check nodes. Within one stripe, the strip on each node contains m elements arranged vertically, where m is a positive integer greater than 2 and each element is one bit, so the stripe forms an m*(n+k) array. The elements stored on the data nodes form the data element array, and the elements stored on the check nodes form the check element array. The method comprises the following steps:
S1. Let err be the maximum number of node failures the storage system must tolerate. Then err, m, and n must satisfy: n >= m*err-err+1;
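The S1 condition can be checked mechanically. The following Python sketch is illustrative only: the function names `min_data_nodes` and `satisfies_s1` are hypothetical and not part of the method, and the lower bound err >= 1 is an assumption drawn from the range 1<=x<=err used in step S2.

```python
def min_data_nodes(m, err):
    """Smallest n allowed by step S1: n >= m*err - err + 1."""
    if m <= 2:
        raise ValueError("m must be a positive integer greater than 2")
    if err < 1:
        # Assumption: at least one failure is to be tolerated.
        raise ValueError("err must be at least 1")
    return m * err - err + 1


def satisfies_s1(n, m, err):
    """Return True when (n, m, err) meets the S1 condition."""
    return n >= min_data_nodes(m, err)
```

For m = 3 and err = 3 the bound evaluates to 3*3-3+1 = 7, so n = 7 is the smallest admissible number of data nodes.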
S2. Number every element of the data element array in two dimensions, so that D(i,j) denotes the j-th element of row i of the data element array. Number every element of the check element array in one dimension, in column-major order, so that C(h) denotes the h-th element of the check element array. Starting from the known data elements, carry out err rounds of check element computation; each round performs n computations and produces n check elements. The check elements are computed by formula (1). (The formula image PCTCN2016110614-appb-000001 is not reproduced in this text; the form given here is reconstructed from the embodiment, in which computation y of round x produces C((x-1)n+y) as the XOR of one data element from each row.)
$$C\big((x-1)\,n + y\big) = \bigoplus_{i=1}^{m} D(i, j) \tag{1}$$
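In the formula above, x and y denote the y-th chain deployment of round x, with 1<=x<=err and 1<=y<=n, and j is given by formula (2). (The formula image PCTCN2016110614-appb-000002 is not reproduced in this text; the form given here is reconstructed from the check chains listed in the embodiment, whose column index advances by +1, -1, and +2 per row in rounds 1, 2, and 3. It takes % as a non-negative remainder and may differ in notation from the published formula.)
$$j = \Big(\big(y - 1 + (-1)^{x+1}\,\lceil x/2 \rceil\,(i-1)\big) \mathbin{\%} n\Big) + 1 \tag{2}$$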
In formula (2), % is the modulo operation and ⌈ ⌉ denotes rounding up to the nearest integer;
All data elements and the check element involved in one computation form a check chain;
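As an illustration of step S2, the following Python sketch builds the check chains and computes the check elements. It assumes a column rule inferred from the check chains listed in the embodiment (per-row column steps of +1, -1, +2 for rounds 1, 2, 3, extended here as +k, -k for later rounds); the published formula (2) may be written differently. The names `column_index`, `build_chains`, and `encode` are hypothetical.

```python
def column_index(x, y, i, n):
    """Return the 1-based column j of the row-i element on chain (x, y).

    Assumed rule, inferred from the worked example: the column moves by
    +1, -1, +2, -2, ... per row for rounds x = 1, 2, 3, 4, ...
    """
    step = (x + 1) // 2          # integer form of ceil(x / 2)
    if x % 2 == 0:               # even rounds step to the left
        step = -step
    # Python's % is non-negative for positive n, so the result is in 1..n.
    return (y - 1 + step * (i - 1)) % n + 1


def build_chains(m, n, err):
    """Map each check index h = (x-1)*n + y to its data coordinates (i, j)."""
    chains = {}
    for x in range(1, err + 1):
        for y in range(1, n + 1):
            h = (x - 1) * n + y
            chains[h] = [(i, column_index(x, y, i, n)) for i in range(1, m + 1)]
    return chains


def encode(data, err):
    """Compute [C(1), ..., C(err*n)] from an m-row, n-column bit array."""
    m, n = len(data), len(data[0])
    chains = build_chains(m, n, err)
    check = []
    for h in range(1, err * n + 1):
        bit = 0
        for i, j in chains[h]:
            bit ^= data[i - 1][j - 1]   # XOR of one element per row
        check.append(bit)
    return check
```

With m = 3, n = 7, and err = 3, `build_chains` yields the chains named in the embodiment, for example D(1,1), D(2,7), D(3,6) for C(8) and D(1,7), D(2,2), D(3,4) for C(21).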
S3. When no more than err nodes fail, all elements on the failed nodes are called failed elements. Traverse all check chains and look for a chain containing exactly one failed element; XOR all the non-failed elements on that chain, and the result is the value of the failed element. Repeat the traversal, recovering failed elements, until all failed elements have been recovered.
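The traversal in step S3 can be sketched in Python as follows. This is a minimal illustration rather than a definitive implementation: the function name `recover`, the representation of a chain as a list of element keys, and the use of a dictionary for the surviving values are all assumptions made for the example. Each chain is taken to contain its data elements and its check element, so the XOR over a complete chain is 0.

```python
def recover(chains, values):
    """Recover failed elements by repeated traversal of the check chains.

    chains: iterable of lists of element keys; each list holds the data
            elements and the check element of one check chain.
    values: dict mapping surviving element keys to their bit values;
            failed elements are simply absent.
    Returns a new dict that also contains the recovered elements.
    """
    known = dict(values)
    pending = [list(chain) for chain in chains]
    progress = True
    while progress:
        progress = False
        for chain in pending:
            missing = [e for e in chain if e not in known]
            if len(missing) == 1:
                bit = 0
                for e in chain:
                    if e in known:
                        bit ^= known[e]   # XOR of all non-failed elements
                known[missing[0]] = bit
                progress = True
    unresolved = {e for chain in pending for e in chain if e not in known}
    if unresolved:
        # No chain with exactly one failed element remains.
        raise ValueError("cannot recover: %s" % sorted(map(str, unresolved)))
    return known
```

Keys can be any hashable labels, for example tuples such as ("D", 1, 3) and ("C", 3). The function raises an error instead of looping forever when the traversal stalls.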
The nodes include, but are not limited to, PCs, servers, or disks.
As step S3 shows, the method targets node errors: once any error occurs on a node, all data on that node is considered unreliable or lost. The method applies to array storage with an arbitrary number of nodes.
The beneficial effect of the invention is a coding fault-tolerant method for array storage systems that improves storage reliability. It suits companies and institutions that hold large volumes of data and require high data stability, and it can be widely applied in server systems.
Brief Description of the Drawings
The drawing provides a further understanding of the invention and does not limit it.
FIG. 1 is a schematic diagram of the logical structure of a multi-node storage system to which the invention applies.
Detailed Description
The implementation of the invention is further described below with reference to an embodiment.
As shown in FIG. 1, an array storage system consists of n data nodes and k check nodes; each node may be, but is not limited to, a PC, a server, or a disk. Each node is divided into the same number of storage areas, called strips, and the corresponding strips on all nodes form a stripe. Different stripes may use different storage schemes.
Suppose that within a stripe each strip contains m elements, with m = 3, and each element is one bit. Arranging the 3 elements of each strip vertically gives a 3*(n+k) array.
S1. Let err be the maximum number of node failures the storage system must tolerate; in this embodiment err = 3. err, m, and n must satisfy n >= m*err-err+1 = 3*3-3+1 = 7, so n is taken as 7 and the data element array has size 3*7. Assume the data element array holds the following values:
1 1 0 0 1 0 1
0 1 0 1 0 0 0
1 1 1 0 1 1 1
S2. Number every element in the strips of the data nodes in two dimensions, so that D(i,j) denotes the j-th element of row i of the data element array. Number every element of the check element array in one dimension, in column-major order, so that C(h) denotes the h-th element of the check element array. Starting from the known data elements, carry out 3 rounds of check element computation; each round performs 7 computations and produces 7 check elements, using formula (1). (The formula images PCTCN2016110614-appb-000004 and PCTCN2016110614-appb-000006 are not reproduced in this text; formulas (1) and (2) are given here in a form reconstructed from the check chains listed below, with % taken as a non-negative remainder, and may differ in notation from the published formulas.)
$$C\big((x-1)\,n + y\big) = \bigoplus_{i=1}^{m} D(i, j) \tag{1}$$
In the formula above, x and y denote the y-th chain deployment of round x, with 1<=x<=err and 1<=y<=n, and $\bigoplus_{i=1}^{m} D(i,j)$ is the XOR of the elements D(i,j), where j is given by formula (2):
$$j = \Big(\big(y - 1 + (-1)^{x+1}\,\lceil x/2 \rceil\,(i-1)\big) \mathbin{\%} n\Big) + 1 \tag{2}$$
In formula (2), % is the modulo operation and ⌈ ⌉ denotes rounding up to the nearest integer;
In detail, the check element values are computed as follows (^ denotes XOR):
Round x=1, computation y=1:
C(1) = D(1,1) ^ D(2,2) ^ D(3,3) = 1^1^1 = 1
That is, C(1)=1. All data elements and the check element involved in this computation form a check chain: D(1,1), D(2,2), D(3,3), C(1) form one check chain;
Computation y=2:
C(2) = D(1,2) ^ D(2,3) ^ D(3,4) = 1^0^0 = 1
That is, C(2)=1; D(1,2), D(2,3), D(3,4), C(2) form one check chain;
The remaining check elements of this round follow in the same way, each with its check chain: C(3)=0^1^1=0; C(4)=0^0^1=1; C(5)=1^0^1=0; C(6)=0^0^1=1; C(7)=1^0^1=0.
Round x=2, computation y=1:
C(8) = D(1,1) ^ D(2,7) ^ D(3,6) = 1^0^1 = 0
That is, C(8)=0; D(1,1), D(2,7), D(3,6), C(8) form one check chain;
Computation y=2:
C(9) = D(1,2) ^ D(2,1) ^ D(3,7) = 1^0^1 = 0
That is, C(9)=0; D(1,2), D(2,1), D(3,7), C(9) form one check chain;
The remaining check elements of this round follow in the same way, each with its check chain: C(10)=0^1^1=0; C(11)=0^0^1=1; C(12)=1^1^1=1; C(13)=0^0^0=0; C(14)=1^0^1=0;
Round x=3, computation y=1:
C(15) = D(1,1) ^ D(2,3) ^ D(3,5) = 1^0^1 = 0
That is, C(15)=0; D(1,1), D(2,3), D(3,5), C(15) form one check chain;
Computation y=2:
C(16) = D(1,2) ^ D(2,4) ^ D(3,6) = 1^1^1 = 1
That is, C(16)=1; D(1,2), D(2,4), D(3,6), C(16) form one check chain;
The following check elements are obtained in the same way, each with its check chain: C(17)=0^0^1=1; C(18)=0^0^1=1; C(19)=1^0^1=0; C(20)=0^0^1=1;
Finally, round x=err=3, computation y=n=7:
C(21) = D(1,7) ^ D(2,2) ^ D(3,4) = 1^1^0 = 0
That is, C(21)=0; D(1,7), D(2,2), D(3,4), C(21) form one check chain.
Arranged in column-major order, the check elements C(h) form the following array:
C(1) C(4) C(7) C(10) C(13) C(16) C(19)
C(2) C(5) C(8) C(11) C(14) C(17) C(20)
C(3) C(6) C(9) C(12) C(15) C(18) C(21)
That is:
1 1 0 0 0 1 0
1 0 0 1 0 1 1
0 1 0 1 0 1 0
The complete storage array is then:
1 1 0 0 1 0 1 1 1 0 0 0 1 0
0 1 0 1 0 0 0 1 0 0 1 0 1 1
1 1 1 0 1 1 1 0 1 0 1 0 1 0
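The column-major placement of the check elements and the assembly of the full stripe can be sketched in Python as follows. The function names `layout_check_elements` and `assemble_stripe` are hypothetical. The sketch also assumes that the number of check elements is a multiple of m, as it is in this embodiment (21 elements, m = 3, giving 7 check columns); the text does not say how other cases are handled.

```python
def layout_check_elements(check, m):
    """Place [C(1), C(2), ...] column-major into an m-row array.

    Column c (0-based) holds C(c*m + 1) through C(c*m + m), top to bottom.
    """
    if len(check) % m != 0:
        # Assumption: the check elements fill whole columns.
        raise ValueError("number of check elements must be a multiple of m")
    k = len(check) // m                      # number of check columns
    return [[check[c * m + r] for c in range(k)] for r in range(m)]


def assemble_stripe(data_rows, check_rows):
    """Append each check row to the matching data row."""
    return [d + c for d, c in zip(data_rows, check_rows)]
```

Applied to the 21 check elements of this embodiment, `layout_check_elements` gives the 3*7 check array, and `assemble_stripe` joins it to the data array to form the 3*14 storage array.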
S3. When no more than err nodes fail, all elements on the failed nodes are called failed elements. In this embodiment, suppose 3 nodes fail, namely nodes 1, 3, and 9. All data in columns 1, 3, and 9 is then lost: elements D(1,1), D(2,1), D(3,1), elements D(1,3), D(2,3), D(3,3), and elements C(4), C(5), C(6) fail, as shown below (x marks a failed element):
x 1 x 0 1 0 1 1 x 0 0 0 1 0
x 1 x 1 0 0 0 1 x 0 1 0 1 1
x 1 x 0 1 1 1 0 x 0 1 0 1 0
Data recovery: traverse all check chains, look for a chain with exactly one failed element, and XOR all non-failed elements on that chain; the result is the value of the failed element. Repeat the traversal until all data is recovered. The process is as follows:
First, the check chain D(1,3), D(2,4), D(3,5), C(3) has exactly one failed element, so D(1,3)=1^1^0=0. After this assignment the array is:
x 1 0 0 1 0 1 1 x 0 0 0 1 0
x 1 x 1 0 0 0 1 x 0 1 0 1 1
x 1 x 0 1 1 1 0 x 0 1 0 1 0
Traversing the check chains again finds the chain D(1,2), D(2,3), D(3,4), C(2) with exactly one failed element, so D(2,3)=1^0^1=0;
Continuing the traversal finds the chain D(1,1), D(2,7), D(3,6), C(8) with exactly one failed element, so D(1,1)=0^1^0=1;
Further traversals recover, in turn, the failed elements D(3,3)=1, D(2,1)=0, D(3,1)=1, C(4)=1, C(5)=0, C(6)=1, at which point all failed data is fully restored.
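The three recovery steps written out above can be recomputed directly from the complete storage array of this embodiment. The Python sketch below is illustrative; the helper names `D` and `C` are hypothetical accessors that mirror the numbering of the text, with C(h) read from the check columns in column-major order.

```python
# Complete stripe of the embodiment: 7 data columns, then 7 check columns.
STRIPE = [
    [1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0],
    [0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1],
    [1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0],
]
N, M = 7, 3  # data columns and rows


def D(i, j):
    """Data element in row i, column j (both 1-based)."""
    return STRIPE[i - 1][j - 1]


def C(h):
    """Check element h (1-based), stored column-major after the data."""
    col, row = divmod(h - 1, M)
    return STRIPE[row][N + col]


# Each failed element is the XOR of the other elements on its chain.
d13 = D(2, 4) ^ D(3, 5) ^ C(3)   # chain D(1,3), D(2,4), D(3,5), C(3)
d23 = D(1, 2) ^ D(3, 4) ^ C(2)   # chain D(1,2), D(2,3), D(3,4), C(2)
d11 = D(2, 7) ^ D(3, 6) ^ C(8)   # chain D(1,1), D(2,7), D(3,6), C(8)
```

The three results equal the original values D(1,3)=0, D(2,3)=0, and D(1,1)=1 in the data element array.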
When no more than err nodes fail, the invention can recover all failed elements, effectively improving the reliability of the storage system.
The invention has been described above by way of example. Its implementation is clearly not limited to the manner described: any insubstantial improvement made using the method concept and technical solution of the invention, and any direct application of that concept and solution to other situations without improvement, falls within the scope of protection of the invention.

Claims (2)

  1. A coding fault-tolerant method for an array storage system, the array storage system consisting of n data nodes and k check nodes, wherein within one stripe the strip on each node contains m elements arranged vertically, m is a positive integer greater than 2, each element is one bit, the stripe forms an m*(n+k) array, the elements stored on the data nodes form a data element array, and the elements stored on the check nodes form a check element array; characterized in that the method comprises the following steps:
    S1. Let err be the maximum number of node failures the storage system must tolerate; then err, m, and n must satisfy: n >= m*err-err+1;
    S2. Number every element of the data element array in two dimensions, so that D(i,j) denotes the j-th element of row i of the data element array; number every element of the check element array in one dimension, in column-major order, so that C(h) denotes the h-th element of the check element array; starting from the known data elements, carry out err rounds of check element computation, each round performing n computations and producing n check elements, the check elements being computed by formula (1). (The formula images PCTCN2016110614-appb-100001 and PCTCN2016110614-appb-100003 are not reproduced in this text; formulas (1) and (2) are given here in a form reconstructed from the check chains of the embodiment in the description, with % taken as a non-negative remainder, and may differ in notation from the published formulas.)
    $$C\big((x-1)\,n + y\big) = \bigoplus_{i=1}^{m} D(i, j) \tag{1}$$
    In the formula above, x and y denote the y-th chain deployment of round x, with 1<=x<=err and 1<=y<=n, and $\bigoplus_{i=1}^{m} D(i,j)$ is the XOR of the elements D(i,j), where j is given by formula (2):
    $$j = \Big(\big(y - 1 + (-1)^{x+1}\,\lceil x/2 \rceil\,(i-1)\big) \mathbin{\%} n\Big) + 1 \tag{2}$$
    In formula (2), % is the modulo operation and ⌈ ⌉ denotes rounding up to the nearest integer;
    All data elements and the check element involved in one computation form a check chain;
    S3. When no more than err nodes fail, all elements on the failed nodes are called failed elements; traverse all check chains and look for a chain containing exactly one failed element; XOR all the non-failed elements on that chain, the result being the value of the failed element; repeat the traversal, recovering failed elements, until all data has been recovered.
  2. The coding fault-tolerant method for an array storage system according to claim 1, characterized in that the nodes include, but are not limited to, PCs, servers, or disks.
PCT/CN2016/110614 2016-08-07 2016-12-18 Coding fault-tolerant method for array-type storage system WO2018028107A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610640565.5A CN106254033B (en) 2016-08-07 2016-08-07 Coding fault-tolerant method for array storage system
CN201610640565.5 2016-08-07

Publications (1)

Publication Number Publication Date
WO2018028107A1 true WO2018028107A1 (en) 2018-02-15

Family

ID=58078062

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/110614 WO2018028107A1 (en) 2016-08-07 2016-12-18 Coding fault-tolerant method for array-type storage system

Country Status (2)

Country Link
CN (1) CN106254033B (en)
WO (1) WO2018028107A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023279906A1 (en) * 2021-07-09 2023-01-12 深圳大普微电子科技有限公司 Data processing method and apparatus, device, and readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060074995A1 (en) * 2004-09-30 2006-04-06 International Business Machines Corporation System and method for tolerating multiple storage device failures in a storage system with constrained parity in-degree
CN102103533A (en) * 2011-02-25 2011-06-22 华中科技大学 Method for reconstructing single disk in double-disk fault-tolerance disk array
CN103593253A (en) * 2013-11-22 2014-02-19 华中科技大学 Vertical RAID-6 coding method based on exclusive or

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7020803B2 (en) * 2002-03-11 2006-03-28 Hewlett-Packard Development Company, Lp. System and methods for fault path testing through automated error injection
CN104111880B (en) * 2013-04-16 2016-03-02 华中科技大学 A kind of forms data dish inefficacy fast reconstructing method holding three dish inefficacy correcting and eleting codes
CN105812448A (en) * 2016-06-13 2016-07-27 青海师范大学 Erasure coding method of cloud storage system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TANG, DAN ET AL.: "A Class of Array Erasure Codes with High Fault Tolerance", SCIENTIA SINICA INFORMATIONIS, vol. 46, no. 4, 13 April 2016 (2016-04-13), pages 523 *

Also Published As

Publication number Publication date
CN106254033B (en) 2019-07-19
CN106254033A (en) 2016-12-21

Similar Documents

Publication Publication Date Title
Xiang et al. Optimal recovery of single disk failure in RDP code storage systems
US11303302B2 (en) Erasure code calculation method
Greenan et al. Flat XOR-based erasure codes in storage systems: Constructions, efficient recovery, and tradeoffs
CN101868785B (en) Generating a parallel recovery plan for a data storage system
WO2018072294A1 (en) Method for constructing check matrix and method for constructing horizontal array erasure code
CN104461781B (en) A kind of data block method for reconstructing based on correcting and eleting codes
US20140310571A1 (en) Local Erasure Codes for Data Storage
US10521304B1 (en) Multidimensional RAID
US20120017140A1 (en) Non-mds erasure codes for storage systems
CN101888398A (en) Data storage method based on network storage structure of (d, k) mole diagram
CN105353974B (en) A kind of two fault-tolerant coding methods for being applied to disk array and distributed memory system
CN109086000B (en) Three-fault-tolerant data layout method in RAID storage system
US10025512B2 (en) Distributed storage data recovery
US20150089328A1 (en) Flex Erasure Coding of Controllers of Primary Hard Disk Drives Controller
CN102843212B (en) Coding and decoding processing method and device
CN105808170A (en) RAID6 (Redundant Array of Independent Disks 6) encoding method capable of repairing single-disk error by minimum disk accessing
CN104866243A (en) RAID-6 transverse and oblique check encoding and decoding method for optimizing input/output load
WO2018028107A1 (en) Coding fault-tolerant method for array-type storage system
CN111125014B (en) Construction method of flexible partial repeat code based on U-shaped design
US6792391B1 (en) Method and system for three disk fault tolerance in a disk array
CN108614749A (en) A kind of data processing method and device
WO2018119976A1 (en) Efficient data layout optimization method for data warehouse system
US7103716B1 (en) RAID 6 disk array with prime number minus one disks
US20070006019A1 (en) Data storage system
DE112019001968T5 (en) COMMON CORRECTION LOGIC FOR HIGH AND LOW RANDOM BIT ERROR RATES

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16912561

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16912561

Country of ref document: EP

Kind code of ref document: A1