KR101731832B1

KR101731832B1 - Method and Apparatus of Encoding and Decoding for Data Recovery in Storage System

Info

Publication number: KR101731832B1
Application number: KR1020160013964A
Authority: KR
Inventors: 송홍엽; 김정현
Original assignee: 연세대학교 산학협력단
Priority date: 2016-02-04
Filing date: 2016-02-04
Publication date: 2017-05-02

Abstract

The present invention relates to an encoding and decoding method, and an apparatus therefor, for recovering lost data in a storage system, and to a storage system that supports recovering nodes that store data when those nodes are lost in a distributed storage environment. More specifically, the present invention relates to an encoding and decoding method for recovering lost data in a distributed storage system, and to an apparatus therefor. The encoding method for recovering lost data in a storage system according to one embodiment of the present invention comprises: a division step of dividing data to be stored into a predetermined number of data blocks; an encoding step of generating information symbols from the divided data blocks, selecting two different data blocks from among the divided data blocks, and generating parity symbols using the two selected data blocks; and a storage step of storing each generated symbol in a respective node of the storage system. The encoding step selects two different data blocks from among the divided data blocks and encodes each parity symbol using the two selected data blocks, so as to keep within predetermined limits both the number of nodes required to recover the symbols stored in lost nodes and the number of parity symbols generated in the encoding step.

Description

[0001] The present invention relates to an encoding and decoding method and apparatus for recovering lost data in a storage system.

The present invention relates to an encoding and decoding method and apparatus for recovering lost data in a storage system, and to a storage system that supports recovering a node that stores data when that node is lost in a distributed storage environment. In particular, the present invention relates to an encoding and decoding method for data loss recovery in a distributed storage system, and to an apparatus therefor.

Cloud storage systems for the safe storage and recovery of big data have been introduced and are in active use. For example, companies such as Google, Facebook, Amazon, and Microsoft, which provide search, content delivery, and similar services worldwide, use cloud storage systems, that is, distributed storage systems, to store enormous amounts of data.

The amount of information stored and circulated in such distributed storage systems is growing rapidly because of factors such as the increasing number of service users and the rising quality of content. In a distributed storage system, however, data is lost because of factors such as equipment faults within distributed nodes or software and hardware updates. Accordingly, various kinds of coding techniques have been developed and used to recover data lost in this way.

For example, the most basic existing encoding method for recovering lost data in a distributed storage system is repetition coding. This method makes several identical copies of the data and stores them on several distributed nodes, so that even if any one of the nodes holding a copy is lost, the lost node can be recovered using the remaining nodes.
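As a minimal sketch of repetition coding, each node holds a full copy and any surviving copy repairs a lost one. The function names `replicate` and `recover` are hypothetical and introduced only for illustration; they do not come from the patent.

```python
from typing import List, Optional


def replicate(data: bytes, copies: int) -> List[bytes]:
    """Return `copies` identical replicas, one per storage node."""
    return [data for _ in range(copies)]


def recover(nodes: List[Optional[bytes]]) -> bytes:
    """Return the data from any surviving replica; None marks a lost node."""
    for replica in nodes:
        if replica is not None:
            return replica
    raise ValueError("all replicas are lost")


nodes: List[Optional[bytes]] = list(replicate(b"block", copies=3))
nodes[1] = None            # simulate the loss of one node
nodes[1] = recover(nodes)  # repair it from a surviving replica
```

Repairing a node here reads only one other node, but three copies occupy three times the space of the original data, which is the storage cost that coded schemes try to reduce.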

An important metric for encoding techniques for data loss recovery is locality, which indicates the minimum number of other nodes needed to recover the lost nodes when some of the nodes of a distributed storage system are lost. The volume of the data encoded by the loss-recovery coding that supports a given locality, or equivalently the code rate, is also an important metric.

However, existing encoding and decoding techniques for data loss recovery only propose ways of guaranteeing locality, the number of nodes needed for recovery, for a single specified number of lost nodes; they do not guarantee a jointly optimized locality across two or more specified numbers of lost nodes. Existing techniques also fail to consider optimizing the code rate together with two or more specified numbers of node losses.

Korean Laid-Open Patent Publication No. 10-2011-0051527 (May 18, 2011)

The problem to be solved by the present invention is to provide an encoding and decoding technique, and a distributed storage system based on it, that keeps the number of nodes that must be used to recover lost nodes within a predetermined number for two or more specified numbers of node losses, that is, guarantees a predetermined joint locality, while at the same time optimizing the code rate.

To solve the above problem, an encoding method for data loss recovery in a storage system according to one aspect of the present invention may include: a division step of dividing data to be stored into a predetermined number of data blocks; an encoding step of generating information symbols from the divided data blocks, selecting two different data blocks from among the divided data blocks, and generating a parity symbol using the two selected data blocks; and a storage step of storing each generated symbol in a respective node of the storage system.

Here, the encoding step may generate an output matrix by multiplying an input matrix, whose elements are input symbols representing the respective divided data blocks, by a generator matrix whose elements are each either 1 or 0 and whose column vectors each contain at most two 1s. An element of the output matrix that corresponds to a single input symbol is generated as an information symbol, and an element of the output matrix that corresponds to the sum of two input symbols is generated as a parity symbol.
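The matrix multiplication described above can be sketched as follows. This is an illustration under stated assumptions rather than the patent's implementation: the "sum" of two input symbols is taken to be a byte-wise XOR (addition over GF(2)), and the function name `encode` and the example matrix `G` are hypothetical.

```python
from functools import reduce


def encode(blocks, generator):
    """Multiply the 1 x k input row by a k x n binary generator matrix.

    Column j selects the input symbols whose rows hold a 1. The selected
    blocks are XOR-ed byte by byte (assumption: the sum is over GF(2)).
    """
    k = len(blocks)
    n = len(generator[0])
    out = []
    for j in range(n):
        chosen = [blocks[i] for i in range(k) if generator[i][j] == 1]
        if not 1 <= len(chosen) <= 2:
            raise ValueError("each column must contain one or two 1s")
        out.append(reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), chosen))
    return out


# k = 3 blocks. Columns 0-2 form the identity (information symbols);
# columns 3-4 each hold two 1s (parity symbols). Illustrative matrix only.
G = [
    [1, 0, 0, 1, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 1, 0, 1],
]
blocks = [b"\x01", b"\x02", b"\x04"]
symbols = encode(blocks, G)
```

A column with a single 1 copies one block unchanged, so the identity part of the generator matrix yields the information symbols, while each two-1 column yields one parity symbol.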

Here, each column vector of the generator matrix has a length determined by the number of elements of the input matrix, and the generator matrix may be a matrix formed by concatenating an identity matrix whose size is the number of elements of the input matrix with all column vectors in which two elements adjacent in row order are 1.

Here, each column vector of the generator matrix has a length determined by the number of elements of the input matrix, and the generator matrix may be a matrix formed by concatenating an identity matrix whose size is the number of elements of the input matrix, five column vectors in each of which two elements adjacent in row order are 1, and the column vectors of every combination in which one element included in any one of the five column vectors and one element not included in the five column vectors are each 1.

Here, each column vector of the generator matrix has a length determined by the number of elements of the input matrix, and the generator matrix may be a matrix formed by concatenating an identity matrix whose size is the number of elements of the input matrix, one column vector in which two elements adjacent in row order are 1, and the column vectors of every combination in which one element included in that one column vector and one element not included in it are each 1.

Here, when a node of the storage system is lost, it can be recovered using the remaining nodes other than the lost node. In order to keep within preset limits both the number of nodes needed to recover the symbol stored in the lost node and the number of parity symbols generated in the encoding step, the encoding step may select two different data blocks from among the divided data blocks and encode the parity symbol using the two selected data blocks.

Here, the encoding step may encode each parity symbol by selecting and using two data blocks such that, over all the data blocks, each data block is used exactly twice, to generate exactly two parity symbols.

Here, when one node corresponding to any one information symbol among the nodes of the storage system is lost, the one lost node can be recovered using two nodes other than the lost node, and when two nodes corresponding to any two information symbols among the nodes of the storage system are lost, the two lost nodes can be recovered using four nodes other than the lost nodes.

Here, when the data to be stored is divided into k data blocks, the encoding step may generate k information symbols and generate k parity symbols by using each of the k data blocks twice.
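One pairing that satisfies the description above (k parity symbols, each data block used exactly twice) is a cyclic one, in which block i is paired with block i+1 and the last block wraps around to the first. The sketch below assumes that cyclic reading, assumes XOR as the symbol sum, and assumes k is at least 3 so that the pairs are distinct; the names `ring_pairs`, `xor`, and `ring_encode` are hypothetical.

```python
def ring_pairs(k):
    """Parity pairs (i, i+1 mod k): every block appears in exactly two pairs."""
    return [(i, (i + 1) % k) for i in range(k)]


def xor(a, b):
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def ring_encode(blocks):
    """Return the k information symbols followed by the k parity symbols."""
    parity = [xor(blocks[a], blocks[b]) for a, b in ring_pairs(len(blocks))]
    return list(blocks) + parity


blocks = [bytes([v]) for v in (10, 20, 30, 40, 50)]
k = len(blocks)
stored = ring_encode(blocks)

# Repair information node 2 from two helpers: information node 3 and
# the parity node that holds blocks[2] XOR blocks[3].
repaired = xor(stored[3], stored[k + 2])
```

The repair at the end reads two surviving nodes to rebuild one lost information symbol, which matches the single-loss figure stated for this scheme.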

Here, the encoding step may include: selecting a first data block, a second data block, a third data block, and a fourth data block from among the data blocks; encoding a parity symbol for each pair using the pair of the first and second data blocks, the pair of the second and third data blocks, and the pair of the third and fourth data blocks; and, for the remaining data blocks other than the selected data blocks, encoding a parity symbol for each pair using, for each remaining data block, the pair of that remaining data block and the first data block and the pair of that remaining data block and the second data block.

Here, when one node corresponding to any one information symbol among the nodes of the storage system is lost, the one lost node can be recovered using two nodes other than the lost node; when two nodes corresponding to any two information symbols among the nodes of the storage system are lost, the two lost nodes can be recovered using three nodes other than the lost nodes; when any one node among the nodes of the storage system is lost, the one lost node can be recovered using two nodes other than the lost node; and when any two nodes among the nodes of the storage system are lost, the two lost nodes can be recovered using four nodes other than the lost nodes.

Here, when the data to be stored is divided into k data blocks, the encoding step may generate k information symbols and 2k - 5 parity symbols.
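The pairing steps described for this scheme can be enumerated to check the stated parity count. The sketch below follows the wording of the selection steps literally and only counts pairs; it does not attempt to verify the recovery figures. The function name `crown_pairs` and the choice of blocks 0 to 3 as the four designated blocks are assumptions made for illustration.

```python
def crown_pairs(k):
    """Pairs of 0-based block indices, one pair per parity symbol.

    Blocks 0..3 stand in for the first to fourth data blocks; which
    blocks are designated is an arbitrary choice made for this sketch.
    """
    if k < 4:
        raise ValueError("four blocks are designated, so k must be at least 4")
    pairs = [(0, 1), (1, 2), (2, 3)]   # chain over the designated blocks
    for j in range(4, k):              # every remaining block
        pairs += [(j, 0), (j, 1)]      # pairs with the first and second block
    return pairs


counts = {k: len(crown_pairs(k)) for k in range(4, 10)}
```

The chain contributes 3 pairs and each of the k - 4 remaining blocks contributes 2, giving 3 + 2(k - 4) = 2k - 5 parity symbols.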

Here, the encoding step may include: selecting a first data block and a second data block from among the data blocks; encoding a parity symbol using the pair of the first data block and the second data block; and, for the remaining data blocks other than the selected data blocks, encoding a parity symbol for each pair using, for each remaining data block, the pair of that remaining data block and the first data block and the pair of that remaining data block and the second data block.

When one node corresponding to any one information symbol among the nodes of the storage system is lost, the one lost node can be recovered using two nodes other than the lost node; when two nodes corresponding to any two information symbols among the nodes of the storage system are lost, the two lost nodes can be recovered using three nodes other than the lost nodes; when any one node among the nodes of the storage system is lost, the one lost node can be recovered using two nodes other than the lost node; and when any two nodes among the nodes of the storage system are lost, the two lost nodes can be recovered using three nodes other than the lost nodes.

Here, when the data to be stored is divided into k data blocks, the encoding step may generate k information symbols and 2k - 3 parity symbols.
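The same kind of enumeration applies to this scheme, and the resulting parity count gives the code rate. The sketch assumes the usual definition of code rate as k divided by the total number of stored symbols, which the text does not spell out; the names `tiara_pairs` and `code_rate` and the choice of blocks 0 and 1 as the two designated blocks are hypothetical.

```python
from fractions import Fraction


def tiara_pairs(k):
    """Pairs of 0-based block indices, one pair per parity symbol.

    Blocks 0 and 1 stand in for the first and second data blocks
    (an arbitrary choice made for this sketch).
    """
    if k < 2:
        raise ValueError("two blocks are designated, so k must be at least 2")
    pairs = [(0, 1)]                   # the designated pair
    for j in range(2, k):              # every remaining block
        pairs += [(j, 0), (j, 1)]      # pairs with the first and second block
    return pairs


def code_rate(k, parity_count):
    """Assumed definition: information symbols over all stored symbols."""
    return Fraction(k, k + parity_count)


k = 6
rate = code_rate(k, len(tiara_pairs(k)))   # 6 information + 9 parity symbols
```

The designated pair contributes 1 parity symbol and each of the k - 2 remaining blocks contributes 2, giving 1 + 2(k - 2) = 2k - 3. Under the assumed definition the rate is k / (3k - 3), compared with k / (3k - 5) for 2k - 5 parity symbols and 1/2 for k parity symbols.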

To solve the above problem, a decoding method for data loss recovery in a storage system according to one aspect of the present invention is a decoding method for recovering the symbol stored in a lost node when any node is lost among the nodes of a storage system in which each node stores either an information symbol generated for a data block, obtained by dividing data to be stored into a predetermined number of data blocks, or a parity symbol generated using two different data blocks from among the divided data blocks. The method may include: a node selection step of selecting, from the remaining nodes other than the lost node, at least one node whose stored symbols can be operated on to obtain the symbol stored in the lost node, so as to keep within preset limits both the number of nodes needed to recover the symbol stored in the lost node and the number of parity symbols generated by the encoder; and a recovery step of recovering the symbol stored in the lost node using the selected nodes.
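The node selection step can be illustrated with an exhaustive search for the smallest set of surviving nodes whose symbols combine into every lost symbol. This is a sketch under assumptions, not the patent's selection procedure: symbols are combined by XOR, each node is modeled as a bitmask of the data blocks summed into its symbol, the example layout uses a cyclic pairing with k = 6, and the names `in_span` and `select_helpers` are hypothetical.

```python
from itertools import combinations


def in_span(target, vectors):
    """True if bitmask `target` is the XOR of some of the bitmasks in `vectors`."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)      # clears b's leading bit from v if it is set
        if v:
            basis.append(v)
    for b in basis:
        target = min(target, target ^ b)
    return target == 0


def select_helpers(nodes, lost):
    """Smallest tuple of surviving node indices that determines every lost symbol.

    `nodes[i]` is a bitmask of the data blocks summed into node i's symbol.
    """
    alive = [i for i in range(len(nodes)) if i not in lost]
    for size in range(1, len(alive) + 1):
        for helpers in combinations(alive, size):
            masks = [nodes[i] for i in helpers]
            if all(in_span(nodes[i], masks) for i in lost):
                return helpers
    return None


# Layout for k = 6: six information nodes, then six parity nodes that pair
# block i with block (i + 1) mod k (the cyclic pairing is an assumption).
k = 6
nodes = [1 << i for i in range(k)]
nodes += [(1 << i) | (1 << ((i + 1) % k)) for i in range(k)]

one = select_helpers(nodes, {0})      # one lost information node
two = select_helpers(nodes, {0, 3})   # two lost information nodes
```

For this layout the search needs two helpers for a single lost information node and four helpers for the pair of information nodes 0 and 3. A practical decoder would look up helper sets from the code structure instead of searching all subsets.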

Here, the storage system stores in each node the parity symbols encoded by selecting and using two data blocks such that, over all the data blocks, each data block is used exactly twice, to generate exactly two parity symbols. When one node corresponding to any one information symbol among the nodes of the storage system is lost, the node selection step selects, from the remaining nodes other than the lost node, two nodes whose stored symbols can be operated on to obtain the symbol stored in the lost node, and the recovery step recovers the one lost node using the selected nodes. When two nodes corresponding to any two information symbols among the nodes of the storage system are lost, the node selection step selects four such nodes from the remaining nodes other than the lost nodes, and the recovery step recovers the lost nodes using the selected nodes.

Here, the storage system stores in each node the parity symbols encoded by: selecting a first data block, a second data block, a third data block, and a fourth data block from among the data blocks; encoding a parity symbol for each pair using the pair of the first and second data blocks, the pair of the second and third data blocks, and the pair of the third and fourth data blocks; and, for the remaining data blocks other than the selected data blocks, encoding a parity symbol for each pair using, for each remaining data block, the pair of that remaining data block and the first data block and the pair of that remaining data block and the second data block. When one node corresponding to any one information symbol among the nodes of the storage system is lost, the node selection step selects, from the remaining nodes other than the lost node, two nodes whose stored symbols can be operated on to obtain the symbol stored in the lost node, and the recovery step recovers the one lost node using the selected nodes. When two nodes corresponding to any two information symbols among the nodes of the storage system are lost, the node selection step selects three such nodes from the remaining nodes other than the lost nodes, and the recovery step recovers the lost nodes using the selected nodes.

Here, the storage system stores in each node the parity symbols encoded by: selecting a first data block and a second data block from among the data blocks; encoding a parity symbol using the pair of the first data block and the second data block; and, for the remaining data blocks other than the selected data blocks, encoding a parity symbol for each pair using, for each remaining data block, the pair of that remaining data block and the first data block and the pair of that remaining data block and the second data block. When one node corresponding to any one information symbol among the nodes of the storage system is lost, the node selection step selects, from the remaining nodes other than the lost node, two nodes whose stored symbols can be operated on to obtain the symbol stored in the lost node, and the recovery step recovers the one lost node using the selected nodes. When two nodes corresponding to any two information symbols among the nodes of the storage system are lost, the node selection step selects three such nodes from the remaining nodes other than the lost nodes, and the recovery step recovers the lost nodes using the selected nodes.

To solve the above problem, an encoding apparatus for data loss recovery in a storage system according to one aspect of the present invention may include: an encoding unit that generates information symbols from data blocks obtained by dividing the data to be stored in the storage system into a predetermined number of data blocks, selects two different data blocks from among the divided data blocks, and generates a parity symbol using the two selected data blocks; and a data processing unit that stores each generated symbol in a respective node of the storage system. In order to keep within preset limits both the number of nodes needed to recover the symbol stored in a lost node when a node of the storage system is lost and the number of parity symbols generated by the encoder, the encoding unit selects two different data blocks from among the divided data blocks and encodes the parity symbol using the two selected data blocks.

To solve the above problem, a decoding apparatus for data loss recovery in a storage system according to one aspect of the present invention is a decoding apparatus for recovering the symbol stored in a lost node when any node is lost among the nodes of a storage system in which each node stores either an information symbol generated for a data block, obtained by dividing data to be stored into a predetermined number of data blocks, or a parity symbol generated using two different data blocks from among the divided data blocks. The apparatus may include: a node selection unit that selects, from the remaining nodes other than the lost node, at least one node whose stored symbols can be operated on to obtain the symbol stored in the lost node, so as to keep within preset limits both the number of nodes needed to recover the symbol stored in the lost node and the number of parity symbols generated by the encoder; and a recovery unit that recovers the symbol stored in the lost node using the selected nodes.

With the encoding and decoding method for data loss recovery in a distributed storage system according to the present invention, the number of nodes that must be used to recover lost nodes is kept within a predetermined number for two or more specified numbers of node losses, that is, a predetermined joint locality is guaranteed, while the code rate is optimized at the same time.

FIGS. 1a to 1c are reference diagrams for explaining an existing distributed storage system.
FIG. 2 is a reference diagram showing how the existing repetition coding method operates.
FIG. 3 is a flowchart of an encoding method for data loss recovery in a distributed storage system according to an embodiment of the present invention.
FIG. 4 is a reference diagram for explaining the encoding method of the ring code technique according to the present invention.
FIG. 5 is a reference diagram showing the generator matrix of the ring code technique according to the present invention.
FIG. 6 is a reference diagram for explaining the characteristics of the ring code technique, which optimizes the code rate.
FIG. 7 is a reference diagram for explaining the encoding method of the crown code technique according to the present invention.
FIG. 8 is a reference diagram showing the generator matrix of the crown code technique according to the present invention.
FIG. 9 is a reference diagram for explaining the characteristics of the crown code technique, which optimizes the code rate.
FIG. 10 is a reference diagram for explaining the encoding method of the tiara code technique according to the present invention.
FIG. 11 is a reference diagram showing the generator matrix of the tiara code technique according to the present invention.
FIG. 12 is a flowchart of a decoding method for data loss recovery in a distributed storage system according to an embodiment of the present invention.
FIG. 13 is a block diagram of an encoding apparatus for data loss recovery in a distributed storage system according to an embodiment of the present invention.
FIG. 14 is a block diagram of a decoding apparatus for data loss recovery in a distributed storage system according to an embodiment of the present invention.
FIG. 15 is a reference diagram showing a distributed storage system that includes an encoding apparatus and a decoding apparatus according to an embodiment of the present invention.
FIG. 16 is a reference diagram showing a distributed storage system that includes a node management server according to an embodiment of the present invention.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. In assigning reference numerals to the components of each drawing, note that the same components are given the same numerals wherever possible, even when they appear in different drawings. In describing the present invention, detailed descriptions of related known configurations or functions are omitted where they are judged likely to obscure the gist of the invention. Preferred embodiments of the present invention are described below, but the technical idea of the invention is not limited or restricted to them and may be modified and practiced in various ways by those skilled in the art.

With the arrival of the big data era, distributed storage systems that use multiple nodes on a network to store large volumes of big data have been developed and are in use. A distributed storage system is a storage system that divides original data into several blocks, encodes them, and stores them on the nodes of a distributed cloud, and that, when the original data is needed, gathers the data stored on the nodes to obtain the original data.

What matters in such a distributed storage system is storing data distributed across a plurality of nodes and later restoring it exactly as the original data. However, as the total amount of storage space used in distributed storage systems has grown very large, some storage nodes come to be lost because of storage hardware errors, equipment faults within the storage system, software updates, network errors, and the like. In the case of Facebook, for example, the space for storing photo files reaches tens of petabytes (a petabyte being 1024 terabytes), and dozens of storage nodes out of thousands are reportedly lost on a regular basis.

Therefore, a distributed storage system cannot cope with such storage node losses merely by dividing the original data, storing it across a plurality of nodes, and reading it back. Various encoding methods have accordingly been proposed for dividing original data and storing it across nodes in a way that tolerates node loss. The existing encoding methods basically encode additional data for recovery when the original data is divided and store it in the nodes together with the original data. When a node is lost, they use the other nodes either to recover the lost node or to recover its contents and store them in a replacement node.

FIGS. 1A to 1C are reference diagrams showing how a conventional distributed storage system divides and encodes original data and stores it in nodes distributed over a network, and how it then collects the data again to obtain the original data or recovers a lost node when a specific node is lost.

FIG. 1A is a reference diagram showing a process in which original data is divided into a fixed number of data blocks, encoded, and stored in the distributed nodes. FIG. 1B is a reference diagram showing a process of selecting some (k) of the n distributed nodes of FIG. 1A and collecting data from the selected nodes to obtain the original data. FIG. 1C is a reference diagram showing a process in which, when one of the nodes (for example, D1) is lost, some (d) of the remaining nodes D2 to Dn are selected, data is collected from the selected nodes, and the collected data is used to recover the data of the lost node (D1) and store it in a replacement node. With an encoding method that recovers data loss in this way, the lost nodes can be recovered using the remaining nodes even when some of the nodes holding the divided original data are lost, and a user can obtain the required original data from a distributed storage system in which lost nodes are periodically recovered and managed.

For example, the most basic existing encoding method for recovering lost data in a distributed storage system is repetition coding. In this method, several identical copies are made when the original data is divided and are stored in several distributed nodes, so that if any one of the copied nodes is lost, the lost node is recovered using the remaining nodes.

FIG. 2 is a reference diagram showing how such a repetition coding method operates.

Indices for evaluating the performance of an encoding method that recovers lost data in such a distributed storage system include the success rate of collecting the original data, the success rate of recovering lost data, the storage capacity of the distributed nodes, the repair bandwidth or code rate, and the number of nodes accessed for recovery. Here, the repair bandwidth is an index indicating the amount of data that must be downloaded to recover a lost node.

In particular, the number of nodes accessed for recovery means the minimum number of nodes that must be used to recover an arbitrary lost node, and is expressed as locality. In a distributed storage system, locality is the minimum number of nodes that must be contacted in the process of recovering an arbitrary lost node. Here, the minimum number of nodes means the smallest number of nodes from which a lost node can be recovered when any node stored in the storage system is lost. Because this must hold for every node, the locality value is the maximum, over all nodes stored in the storage system, of the minimum number of nodes needed to recover that node when it is lost. More specifically, if K_i is the minimum number of nodes needed to recover node N_i when node N_i is lost, i is the node index, and the storage system stores M nodes in total, then the locality of the encoded code can be defined as max{K_i}, i = 1, ..., M.
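The definition max{K_i} can be illustrated with a minimal Python sketch. The function name `locality` and the sample K_i values are hypothetical and chosen only for illustration; they do not come from the specification.

```python
def locality(min_helpers_per_node):
    """Return the locality: the maximum over all nodes of K_i,
    where K_i is the minimum number of nodes needed to repair node i."""
    if not min_helpers_per_node:
        raise ValueError("at least one node is required")
    return max(min_helpers_per_node)


# Hypothetical K_i values for a system of M = 6 nodes.
k_values = [2, 2, 2, 2, 2, 2]
print(locality(k_values))
```

Because the maximum is taken, a single node that is expensive to repair determines the locality of the whole code.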

Further, when any l nodes are lost, the minimum number of nodes that must be used to recover the l lost nodes may be defined as the l-locality (r_l).

For example, if any one of n nodes is lost and at least three nodes are needed to recover it, the (l = 1)-locality is 3. Because locality determines the number of nodes needed to recover a lost node, it is an important index of the performance of a distributed storage system. An encoding method that guarantees a lower locality needs fewer nodes to recover a lost node, so the lost node can be recovered faster and network traffic can be reduced.

However, existing encoding methods for distributed storage systems have only proposed ways to guarantee the locality (or l-locality), that is, the number of nodes needed for recovery, for one specified number of lost nodes. Because nodes can be lost in various ways in an actual distributed storage system, an encoding method that guarantees a jointly optimized locality needs to be devised.

Accordingly, the present invention defines joint locality as an index that expresses together the minimum numbers of nodes needed for recovery for two or more specified numbers of node losses, and discloses an encoding method for data loss recovery that can optimize the code rate while guaranteeing a specific joint locality.

For example, when the ring code technique or the crown code technique described below is used, the encoding method for data loss recovery according to the present invention guarantees localities of 2 and 4 for the loss of one node and the loss of two nodes, respectively. In this case the joint locality is written as (r₁, r₂) = (2, 4). When the tiara code technique described below is used, the encoding method guarantees localities of 2 and 3 for the loss of one node and the loss of two nodes, respectively. In this case the joint locality is written as (r₁, r₂) = (2, 3).

The present invention also defines the joint locality for the case where nodes storing information symbols are lost as the joint information locality. That is, the symbols consist of information symbols generated for the data blocks into which the data to be stored is divided and parity symbols generated by encoding those data blocks, and each generated symbol is stored in a node of the storage system. The joint locality for the case where the lost nodes are nodes storing information symbols is defined as the joint information locality. The joint information locality is therefore the minimum number of nodes needed for recovery for two or more specified numbers of lost information nodes (nodes storing information symbols).

For example, when the ring code technique described below is used, the encoding method for data loss recovery according to the present invention guarantees localities of 2 and 4 for the loss of one information node and the loss of two information nodes, respectively. In this case the joint information locality is written as (r₁, r₂)_info = (2, 4). When the crown code technique or the tiara code technique described below is used, the encoding method guarantees localities of 2 and 3 for the loss of one node and the loss of two nodes, respectively. In this case the joint information locality is written as (r₁, r₂)_info = (2, 3).

The encoding method and apparatus and the decoding method and apparatus for data loss recovery in a distributed storage system according to the present invention are described in more detail below. The description of the distributed storage system applies equally to any storage system that stores data in at least one node. In what follows, a storage system that stores data in at least one node is therefore referred to as a distributed storage system, and the encoding and decoding methods for data loss recovery in it, together with the related apparatus, are described.

FIG. 3 is a flowchart of an encoding method for data loss recovery in a distributed storage system according to an embodiment of the present invention.

The encoding method for data loss recovery in a storage system according to an embodiment of the present invention may include a storage target data division step (S100), an encoding step (S200), and a storage step (S300).

The operations of the storage target data division step (S100), the encoding step (S200), and the storage step (S300), described in detail below, may be performed by an encoding device (10) connected to the storage system. The encoding device (10) may be a server device or computer device connected to the storage system, or an embedded system for performing the encoding function.

The storage target data division step (S100) divides the data to be stored into a predetermined number of data blocks.

The encoding step (S200) generates an information symbol for each of the divided data blocks, selects two different data blocks from among the divided data blocks, and generates a parity symbol using the two selected data blocks.

The information symbols and parity symbols generated in this way are collectively referred to as symbols.

In the encoding step (S200), the encoding device (10) generates an information symbol from each data block and encodes a parity symbol for each combination of two data blocks selected from the data blocks. For example, when the data to be stored is divided into three data blocks A, B, and C, the encoding device (10) generates information symbols S_A, S_B, and S_C for blocks A, B, and C, respectively, and, for the two-block combinations (A, B), (B, C), and (A, C), encodes the parity symbols S_(A, B), S_(B, C), and S_(A, C). Hereinafter, where X and Y are arbitrary data blocks, S_X denotes the information symbol generated for data block X, and S_(X, Y) denotes the parity symbol encoded by selecting and using the two data blocks X and Y.
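The A, B, C example above can be sketched in Python as follows. This is a minimal illustration, not the encoding device itself: it assumes that "adding" two blocks means a bytewise XOR (addition over GF(2)), which the specification does not fix, and the names `xor_blocks` and `encode_all_pairs` are hypothetical.

```python
from itertools import combinations


def xor_blocks(x: bytes, y: bytes) -> bytes:
    """Combine two equal-length blocks bytewise with XOR (assumed addition)."""
    if len(x) != len(y):
        raise ValueError("blocks must have equal length")
    return bytes(a ^ b for a, b in zip(x, y))


def encode_all_pairs(blocks: dict) -> dict:
    """Return information symbols keyed by (X,) and parity symbols keyed by (X, Y)."""
    symbols = {(name,): data for name, data in blocks.items()}
    for x, y in combinations(sorted(blocks), 2):
        symbols[(x, y)] = xor_blocks(blocks[x], blocks[y])
    return symbols


blocks = {"A": b"\x01\x02", "B": b"\x10\x20", "C": b"\xff\x00"}
symbols = encode_all_pairs(blocks)
for key in sorted(symbols, key=lambda k: (len(k), k)):
    print(key, symbols[key].hex())
```

Three blocks yield three information symbols and three parity symbols, one per pair, for six symbols in total.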

The storage step (S300) stores each of the generated symbols in a respective node of the storage system.

That is, the symbols generated in the encoding step (S200), comprising the information symbols and the parity symbols, may be stored in the respective nodes of the distributed storage system. Each node of the distributed storage system may be a node formed on an information storage device within the distributed storage system, and the nodes may be connected to one another over a network.

A node in which an information symbol is stored is an information node, and a node in which a parity symbol is stored is a parity node. Returning to the example above, the symbols S_A, S_B, S_C, S_(A, B), S_(B, C), and S_(A, C) generated in the encoding step (S200) may each be stored in a node of the distributed storage system.

The nodes of the distributed storage system have the property that, when a specific node is lost, it can be recovered using the nodes other than the lost node. This is equivalent to saying that, when a specific symbol among the generated symbols is lost, it can be recovered using the remaining symbols. Accordingly, in the following description of how to recover a lost symbol or a lost node, what is said of a symbol applies equally to the node storing that symbol, and what is said of a node applies equally to the symbol stored in that node. Recovering a lost node can thus mean recovering the symbol stored in the lost node. The recovered data may be stored in the original storage node or in another storage location serving as a replacement node.

The encoding step (S200) is described in more detail below. In the encoding step (S200), the encoding device (10) generates an information symbol for each of the divided data blocks, selects two different data blocks from among the divided data blocks, and generates a parity symbol using the two selected data blocks. To achieve a given code rate while keeping both the number of nodes needed to recover the symbol stored in a lost node and the number of parity symbols generated in the encoding step (S200) within preset limits, the encoding step (S200) may select only some of the possible pairs of data blocks and encode the parity symbols using the selected pairs.

As described in detail below, the encoding step (S200) generates the parity symbols so that, for every loss of two or fewer of the nodes stored in the storage system, the lost nodes are guaranteed to be recoverable by accessing at most four nodes.

The encoding step (S200) may generate the parity symbols using any one of the ring code technique, the crown code technique, and the tiara code technique, which are defined and described in more detail below. When encoding is performed with these techniques, the storage system that stores the resulting symbols and nodes maintains a predetermined joint locality or joint information locality value and, at the same time, achieves the optimal code rate for that value.

The code rate may be defined as the number of information symbols divided by the total number of symbols. Because the number of information symbols is determined by the number of blocks into which the data to be stored is divided, the code rate ultimately varies with the number of parity symbols generated by encoding the information symbols, and the fewer parity symbols are generated, the higher the code rate. Improving the code rate by reducing the number of generated parity symbols means reducing the total number of parity nodes that may have to be contacted when recovering a lost node, and therefore reducing the number of nodes stored in the distributed storage system. That is, the encoding step (S200) according to the present invention can keep the number of nodes needed to recover the symbol stored in a lost node at a predetermined level while reducing the number of parity symbols generated in the encoding step (S200), thereby optimizing the code rate.
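The dependence of the code rate on the number of parity symbols can be checked with a short Python sketch. The helper name `code_rate` is hypothetical; the parity counts used below (one parity per pair of blocks, or one parity per ring edge) follow the examples in this description.

```python
from fractions import Fraction


def code_rate(num_info: int, num_parity: int) -> Fraction:
    """Code rate = information symbols / total symbols."""
    if num_info <= 0 or num_parity < 0:
        raise ValueError("need at least one information symbol and no negative counts")
    return Fraction(num_info, num_info + num_parity)


# k = 4 data blocks, one parity symbol for every pair of blocks: 6 parities.
print(code_rate(4, 6))
# k = 4 data blocks, one parity symbol per ring edge: 4 parities.
print(code_rate(4, 4))
```

With the same four information symbols, using four parity symbols instead of six raises the rate from 2/5 to 1/2.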

An information symbol is generated in correspondence with a data block. That is, an information symbol is generated for each data block so that it contains the entire contents of that data block and so that the contents of the data block can be obtained from the information symbol. A parity symbol is generated by encoding a plurality of data blocks as described above.

When encoding a parity symbol from two data blocks, the encoding step (S200) may encode the parity symbol by adding the two data blocks. The method of encoding a parity symbol is not limited to adding two data blocks, and various methods of generating a parity symbol from a plurality of data blocks in a distributed storage system may be used.

When generating the parity symbols, the encoding step (S200) selects some of the data blocks into which the data to be stored is divided and generates the parity symbols using the selected data blocks. The data blocks may be selected, and the parity symbols generated accordingly, using any one of the ring code technique, the crown code technique, and the tiara code technique described below.

First, the method by which the encoding step (S200) encodes parity symbols using the ring code technique is described.

Here, the encoding step (S200) may encode each parity symbol by selecting and using two data blocks in such a way that, across all the data blocks, each data block is used exactly twice and thus contributes to exactly two parity symbols.

FIG. 4 is a reference diagram for explaining the encoding method according to the ring code technique, whose operation can be represented as the graphs shown in FIG. 4. Let a graph node represent an information symbol/node (a graph node is a concept distinct from a node of the distributed storage system), and let a graph edge represent a parity symbol/node (a parity symbol is generated from the two graph nodes joined by the edge). Then, when the number k of divided data blocks is 3, the symbols generated in the encoding step (S200) can be represented as the graph of FIG. 4(a). The case k = 4 is represented as in FIG. 4(b), and the case k = 8 as in FIG. 4(c). For any k, the graph nodes can likewise be connected by edges in the form of a ring. In the ring code technique, the information symbols/nodes are thus connected by edges in the form of a ring, and a parity symbol is encoded for each edge created by this connection.
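The ring-shaped graph can be sketched in Python as a list of edges, each edge standing for one parity symbol. The function name `ring_edges`, the 0-based block indices, and the restriction to k of at least 3 are assumptions made for this illustration only.

```python
def ring_edges(k: int) -> list:
    """Return the k edges (i, i+1 mod k) that join k graph nodes into a ring."""
    if k < 3:
        raise ValueError("this sketch assumes a ring of at least three graph nodes")
    return [(i, (i + 1) % k) for i in range(k)]


for k in (3, 4, 8):
    edges = ring_edges(k)
    # Count how many parity symbols each data block takes part in.
    usage = [sum(i in edge for edge in edges) for i in range(k)]
    print(k, edges, usage)
```

Every block index appears in exactly two edges, which matches the condition that each data block is used for exactly two parity symbols.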

In the present invention, the encoding method represented by the graphs of FIG. 4 is referred to as an encoding method using the ring code technique.

The operation of the encoding step (S200) using the ring code technique can be expressed with a generator matrix as follows.

Here, the encoding step (S200) may generate an output matrix by multiplying an input matrix, whose elements are the input symbols representing the divided data blocks, by a generator matrix whose elements are each 1 or 0 and in which every column vector contains at most two 1s. This condition on the generator matrix is common to the crown code technique and the tiara code technique described below. An input symbol is a symbol representing a data block; the terms "input" symbol and "input" matrix are used because the input matrix made up of these input symbols is multiplied by the generator matrix to produce the output matrix. Each column vector of the generator matrix has a length equal to the number of elements of the input matrix.

For example, the input matrix [A, B, C], whose elements A, B, and C represent the divided data blocks, may be multiplied by the following generator matrix, whose elements are each 1 or 0 and in which every column vector contains at most two 1s:

$$
G = \begin{bmatrix}
1 & 0 & 0 & 1 & 1 & 0 \\
0 & 1 & 0 & 1 & 0 & 1 \\
0 & 0 & 1 & 0 & 1 & 1
\end{bmatrix}
$$

to generate the output matrix [A, B, C, A+B, A+C, B+C]. This is the generator matrix used in the case of FIG. 4(a).
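The multiplication of the input matrix by the generator matrix can be sketched in Python. This is an illustration under two assumptions: the symbols are single bits with addition taken modulo 2, and the matrix `G` is the one implied by the output [A, B, C, A+B, A+C, B+C]. The function name `encode_bits` is hypothetical.

```python
def encode_bits(message, generator):
    """Multiply a row vector by a generator matrix over GF(2)."""
    if len(message) != len(generator):
        raise ValueError("message length must equal the number of matrix rows")
    n = len(generator[0])
    return [
        sum(message[i] & generator[i][j] for i in range(len(message))) % 2
        for j in range(n)
    ]


# Generator matrix implied by the output [A, B, C, A+B, A+C, B+C].
G = [
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]

# Example input [A, B, C] = [1, 0, 1].
print(encode_bits([1, 0, 1], G))
```

The first three output entries repeat the input (the information symbols), and the last three are the pairwise sums (the parity symbols).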

The encoding step (S200) may generate the information symbols and parity symbols from the output matrix produced with the generator matrix as above. An element of the output matrix that corresponds to a single input symbol is generated as an information symbol, and an element that corresponds to the sum of two input symbols is generated as a parity symbol. In the example above, the elements A, B, and C, each corresponding to a single input symbol, are generated as the information symbols S_A, S_B, and S_C, and the elements A+B, A+C, and B+C, each corresponding to the sum of two input symbols, are generated as the parity symbols S_(A, B), S_(A, C), and S_(B, C).

The generator matrix also includes any other matrix obtained by reordering its columns. In other words, the generator matrix may be any matrix that can be produced by changing the order of its column vectors, that is, any matrix obtained by a column permutation, and any one of these may be used as the generator matrix. The generation of information symbols and parity symbols from the output matrix and the remarks on column permutation apply equally to the crown code technique and the tiara code technique described below.

When the ring code technique is used, the generator matrix may be a matrix formed by concatenating an identity matrix whose size equals the number of elements of the input matrix with all column vectors in which two elements adjacent in row order are 1. Specifically, the concatenated part may consist of all column vectors in which two elements adjacent in row order are 1, together with the column vector in which the first and last elements in row order are 1.

FIG. 5 is a reference diagram showing the generator matrix according to the ring code technique described above. The generator matrix may consist of an identity matrix of size k, equal to the number of divided data blocks, and a remaining part whose column vectors are all the column vectors in which two adjacent elements are 1. As in the rightmost column vector of FIG. 5, the first and last elements in row order are also regarded as adjacent.
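The structure of the ring code generator matrix described above can be sketched in Python. The function name `ring_generator_matrix` and the particular column order are assumptions of this illustration; since column permutations are allowed, the result may differ from FIG. 5 or from the A, B, C example only in the order of its parity columns.

```python
def ring_generator_matrix(k: int) -> list:
    """Build [I_k | R], where column j of R has 1s in rows j and (j + 1) mod k."""
    if k < 3:
        raise ValueError("this sketch assumes at least three data blocks")
    rows = []
    for i in range(k):
        identity_part = [1 if j == i else 0 for j in range(k)]
        # Row i takes part in the parity columns for edges (i-1, i) and (i, i+1).
        parity_part = [1 if i in (j, (j + 1) % k) else 0 for j in range(k)]
        rows.append(identity_part + parity_part)
    return rows


for row in ring_generator_matrix(4):
    print(row)
```

The last parity column has 1s in the first and last rows, which is the wrap-around column that closes the ring.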

With the ring code technique, when one node corresponding to any one information symbol is lost, the lost node can be recovered using two nodes other than the lost node. When two nodes corresponding to any two information symbols are lost, the two lost nodes can be recovered using four nodes other than the lost nodes. That is, when encoding is performed with the ring code technique, the joint information locality is (r₁, r₂)_info = (2, 4).
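The repair of one and of two lost information nodes can be sketched in Python for a ring over four blocks. This is a minimal illustration that assumes bytewise XOR as the addition; the block contents and the helper name `xor_blocks` are hypothetical.

```python
def xor_blocks(x: bytes, y: bytes) -> bytes:
    """Combine two equal-length blocks bytewise with XOR (assumed addition)."""
    if len(x) != len(y):
        raise ValueError("blocks must have equal length")
    return bytes(a ^ b for a, b in zip(x, y))


names = ["A", "B", "C", "D"]
blocks = {"A": b"\x0a", "B": b"\x1b", "C": b"\x2c", "D": b"\x3d"}
# Ring parities S_(A,B), S_(B,C), S_(C,D), S_(D,A).
parity = {
    (names[i], names[(i + 1) % 4]): xor_blocks(blocks[names[i]], blocks[names[(i + 1) % 4]])
    for i in range(4)
}

# One lost information node (A): repaired from 2 nodes, B and S_(A,B).
repaired_a = xor_blocks(blocks["B"], parity[("A", "B")])

# Two lost information nodes (A and B): repaired from 4 nodes,
# D and S_(D,A) for A, and C and S_(B,C) for B.
repaired_a2 = xor_blocks(blocks["D"], parity[("D", "A")])
repaired_b2 = xor_blocks(blocks["C"], parity[("B", "C")])

print(repaired_a == blocks["A"], repaired_a2 == blocks["A"], repaired_b2 == blocks["B"])
```

When both A and B are lost, the parity S_(A,B) is of no direct use, so each lost block is repaired through its other ring neighbour, which is why four helper nodes are contacted.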

Likewise, with the ring code technique, when any one node of the storage system is lost, the lost node can be recovered using two nodes other than the lost node, and when any two nodes of the storage system are lost, the two lost nodes can be recovered using four nodes other than the lost nodes. That is, when encoding is performed with the ring code technique, the joint locality is also (r₁, r₂) = (2, 4).

Among codes whose joint information locality is (r₁, r₂)_info = (2, 4) and whose joint locality is (r₁, r₂) = (2, 4), the code encoded using the ring code scheme is a code with the optimal code rate.

FIG. 6 is a reference diagram for explaining how the ring code scheme optimizes the code rate. When parity symbols are encoded according to the edges connecting the information symbols as in FIG. 6(a), the joint information locality is (r₁, r₂)_info = (2, 4), but the code rate is not optimized. In contrast, when parity symbols are encoded according to the edges connecting the information symbols under the ring code scheme as in FIG. 6(b), the joint information locality is (r₁, r₂)_info = (2, 4) and the code rate is optimized at the same time.

Here, the minimum distance can be defined as an index of the performance of the encoded code. If the lost nodes (symbols) can be recovered from the other nodes (symbols) whenever at most d − 1 nodes (symbols) are lost, the minimum distance, which indicates the maximum number of recoverable lost nodes (symbols), is defined as d. In other words, even if one fewer than d nodes are lost, the lost nodes can be recovered using the other nodes.

Accordingly, for a code encoded using the ring code scheme, even if up to two nodes (symbols) are lost, the lost nodes (symbols) can be recovered using the other nodes (symbols). In this case the minimum distance d, the index indicating the maximum number of recoverable lost nodes (symbols), is 3.

If a code that generates n symbols (information symbols and parity symbols together) from k divided data blocks with the minimum distance d described above is written as an [n, k, d]₂ code, the code according to the ring code scheme is a [2k, k, 3]₂ code. That is, when the data to be stored is divided into k data blocks, the encoding step (S200) generates k information symbols and generates k parity symbols by using each of the k data blocks exactly twice. The code rate can be defined as k/n, the ratio of the number of data blocks to the number of symbols, and for a code encoded using the ring code scheme the code rate is 1/2.
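The [2k, k, 3]₂ parameters can be checked by brute force for small k. The sketch below assumes each data block is a single bit and that a parity symbol is the XOR of two cyclically adjacent blocks; the helper names are hypothetical.

```python
from itertools import product


def ring_codeword(msg):
    """Information bits followed by one parity bit per adjacent pair."""
    k = len(msg)
    parity = [msg[i] ^ msg[(i + 1) % k] for i in range(k)]
    return list(msg) + parity


def minimum_distance(k):
    """Smallest Hamming weight over all nonzero codewords (linear code)."""
    best = None
    for msg in product((0, 1), repeat=k):
        if any(msg):
            weight = sum(ring_codeword(msg))
            best = weight if best is None else min(best, weight)
    return best


for k in (3, 4, 5, 6):
    n = 2 * k
    print(f"k={k}: [n, k, d] = [{n}, {k}, {minimum_distance(k)}], rate = {k}/{n}")
```

Because the code is linear, the minimum distance equals the minimum weight of a nonzero codeword, which is why a search over the 2^k − 1 nonzero messages suffices.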

Next, the method by which the encoding step (S200) encodes parity symbols using the crown code scheme is described.

FIG. 7 is a reference diagram for explaining the encoding method according to the crown code scheme. The operation of the encoding method using the crown code scheme can be represented as a graph, as in FIG. 7. With graph nodes representing information symbols (nodes) and graph edges representing parity symbols (nodes), the symbols generated in the encoding step (S200) when the number of divided data blocks k is 5 can be represented by the graph of FIG. 7(a). The case k = 6 is shown in FIG. 7(b) and the case k = 9 in FIG. 7(c). Likewise, for every k greater than or equal to 5, edges are first connected so that five graph nodes form a pentagon as in FIG. 7, two of those graph nodes are then selected, and each remaining graph node is connected by edges to the two selected graph nodes. In the crown code scheme, the information symbols (nodes) are connected by edges in this crown shape, and a parity symbol is encoded for each edge created by these connections.

In the present invention, the encoding method represented by the graph of FIG. 7 described above is referred to as the encoding method using the crown code scheme.

When encoding is performed using the crown code scheme, the encoding step (S200) may operate according to the following process.

The encoding step (S200) may include a process of selecting a first data block, a second data block, a third data block, and a fourth data block from among the data blocks. Referring to FIG. 7(b), the information symbols (S_1, S_2, S_3, S_4) corresponding to these data blocks can be selected.

The encoding step (S200) may include a process of encoding a parity symbol for each of the following pairs: the first and second data blocks, the second and third data blocks, and the third and fourth data blocks. Referring to FIG. 7(b), the parity symbol S_(1, 2) is generated from the pair of information symbols S_1 and S_2, the parity symbol S_(2, 3) from the pair S_2 and S_3, and the parity symbol S_(3, 4) from the pair S_3 and S_4.

The encoding step (S200) may include a process of encoding parity symbols for the remaining data blocks, that is, the data blocks other than the selected ones: for each remaining data block, one parity symbol is encoded from the pair of that block and the first data block, and another from the pair of that block and the second data block. Referring to FIG. 7(b), for the information symbol S_5 corresponding to one of the remaining data blocks, the parity symbol S_(1, 5) is generated from the pair S_5 and S_1, and the parity symbol S_(2, 5) from the pair S_5 and S_2. For the information symbol S_6 corresponding to another remaining data block, the parity symbol S_(1, 6) is generated from the pair S_6 and S_1, and the parity symbol S_(2, 6) from the pair S_6 and S_2.

The operation of the encoding step (S200) using the crown code scheme can be expressed with a generator matrix as follows.

To this end, the encoding step (S200) may generate an output matrix by multiplying an input matrix, whose elements are the input symbols representing the divided data blocks, by a generator matrix whose elements are each 0 or 1 and in which every column vector contains at most two 1s. An element of the output matrix that corresponds to a single input symbol is generated as an information symbol, and an element that corresponds to the sum of two input symbols is generated as a parity symbol. Each column vector of the generator matrix has a length equal to the number of elements of the input matrix.
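The multiplication described above can be sketched over GF(2), where the sum of two input symbols is their XOR. The 3 × 5 matrix below is a hypothetical toy example and is not taken from any figure; the function names are likewise illustrative.

```python
def encode(message, G):
    """Multiply a length-k bit vector by a k x n binary matrix over GF(2)."""
    k, n = len(G), len(G[0])
    out = []
    for col in range(n):
        acc = 0
        for row in range(k):
            acc ^= message[row] & G[row][col]
        out.append(acc)
    return out


def classify_columns(G):
    """Label each output position, assuming every column has one or two 1s."""
    kinds = []
    for col in range(len(G[0])):
        weight = sum(G[row][col] for row in range(len(G)))
        kinds.append("information" if weight == 1 else "parity")
    return kinds


# Hypothetical example: identity part plus two columns with two 1s each.
G = [
    [1, 0, 0, 1, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 1, 0, 1],
]
print(encode([1, 0, 1], G))
print(classify_columns(G))
```

For data blocks longer than one bit the same computation applies position by position, so a parity symbol is the bytewise XOR of its two blocks.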

Here, the generator matrix for the crown code scheme may be the matrix formed by concatenating an identity matrix whose size equals the number of elements of the input matrix, five column vectors each having 1s in two elements whose row positions are adjacent, and the column vectors of every combination in which one element contained in any one of the five column vectors and one element not contained in the five column vectors are each 1.

FIG. 8 is a reference diagram showing the generator matrix for the crown code scheme described above. The generator matrix consists of a k × k identity matrix, where k is the number of divided data blocks, and a remaining part. When k is greater than 5, the remaining part is further divided into a first matrix part and a second matrix part; when k is 5, it consists of the first matrix part only. As in FIG. 8, the first matrix part may be formed by selecting five of the k rows and taking five column vectors, in each of which two elements adjacent in the column direction within the five selected rows are 1. Here too, the first and last rows are treated as adjacent. The second matrix part, which is included in the remaining part when k is greater than 5, is formed as in FIG. 8 by selecting two of the five selected rows; it contains every column vector that has a 1 in one of the two selected rows and a 1 in one of the rows not included in the five selected rows.
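The matrix layout described for FIG. 8 can be sketched as follows. This is an illustrative sketch only: it assumes 0-based row indices, that rows 0 to 4 are the five selected rows, and that rows 0 and 1 are the two rows selected again for the second matrix part. The function names are hypothetical.

```python
def crown_parity_pairs(k, cycle=(0, 1, 2, 3, 4), hubs=(0, 1)):
    """Row pairs that carry the two 1s of each parity column."""
    if k < 5:
        raise ValueError("the crown construction is described for k >= 5")
    # First matrix part: five columns joining cyclically adjacent selected rows.
    pairs = [(cycle[i], cycle[(i + 1) % 5]) for i in range(5)]
    # Second matrix part: every non-selected row paired with each hub row.
    for r in range(k):
        if r not in cycle:
            for h in hubs:
                pairs.append((h, r))
    return pairs


def crown_generator_matrix(k):
    """Return the k x (3k - 5) binary matrix [I_k | first part | second part]."""
    pairs = crown_parity_pairs(k)
    n = k + len(pairs)
    G = [[0] * n for _ in range(k)]
    for i in range(k):
        G[i][i] = 1
    for j, (a, b) in enumerate(pairs):
        G[a][k + j] = 1
        G[b][k + j] = 1
    return G


for k in (5, 6, 9):
    G = crown_generator_matrix(k)
    print(k, len(G[0]), len(G[0]) - k)
```

For k = 5 the second part is empty, so only the five cyclic columns are appended to the identity matrix.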

When the crown code scheme is used and one node corresponding to any one information symbol is lost among the nodes of the storage system, the lost node can be recovered using two nodes other than the lost node. When two nodes corresponding to any two information symbols are lost, the two lost nodes can be recovered using three nodes other than the lost nodes. That is, when encoding is performed using the crown code scheme, the joint information locality is (r₁, r₂)_info = (2, 3).

Furthermore, when any one node of the storage system is lost, the lost node can be recovered using two nodes other than the lost node, and when any two nodes of the storage system are lost, the two lost nodes can be recovered using four nodes other than the lost nodes. That is, when encoding is performed using the crown code scheme, the joint locality is (r₁, r₂) = (2, 4).

Among codes whose joint information locality is (r₁, r₂)_info = (2, 3) and whose joint locality is (r₁, r₂) = (2, 4), the code encoded using the crown code scheme is a code with the optimal code rate.

FIG. 9 is a reference diagram for explaining how the crown code scheme optimizes the code rate. When parity symbols are encoded according to the edges connecting the information symbols as in FIG. 9(a), the joint information locality is (r₁, r₂)_info = (2, 3), but the code rate is not optimized. In contrast, when parity symbols are encoded according to the edges connecting the information symbols under the crown code scheme as in FIG. 9(b), the joint information locality is (r₁, r₂)_info = (2, 3) and the code rate is optimized at the same time.

In addition, by encoding the parity symbols in this way, even if up to two nodes (symbols) are lost, the lost nodes (symbols) can be recovered using the other nodes (symbols). In this case the minimum distance d, the index indicating the maximum number of recoverable lost nodes (symbols), is 3.

The code according to the crown code scheme is a [3k − 5, k, 3]₂ code. That is, when the data to be stored is divided into k data blocks, the encoding step (S200) generates k information symbols and 2k − 5 parity symbols. The minimum distance d is 3, and the code rate is k / (3k − 5).
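The parity count can be written out as a worked check, counting the five cyclic columns of the first matrix part and the two columns contributed by each of the k − 5 non-selected rows in the second matrix part (the example values of k are the ones used in FIG. 7):

```latex
\begin{aligned}
n - k &= \underbrace{5}_{\text{first matrix part}} + \underbrace{2(k-5)}_{\text{second matrix part}} = 2k - 5,\\
n &= k + (2k - 5) = 3k - 5, \qquad R = \frac{k}{n} = \frac{k}{3k-5},\\
k = 5 &: \; n = 10,\ R = \tfrac{1}{2}; \qquad k = 9 : \; n = 22,\ R = \tfrac{9}{22}.
\end{aligned}
```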

Next, the method by which the encoding step (S200) encodes parity symbols using the tiara code scheme is described.

FIG. 10 is a reference diagram for explaining the encoding method according to the tiara code scheme. The operation of the encoding method using the tiara code scheme can be represented as a graph, as in FIG. 10. With graph nodes representing information symbols (nodes) and graph edges representing parity symbols (nodes), the symbols generated in the encoding step (S200) when the number of divided data blocks k is 3 can be represented by the graph of FIG. 10(a). The case k = 4 is shown in FIG. 10(b) and the case k = 7 in FIG. 10(c). Likewise, for other values of k, edges are first connected so that three graph nodes form a triangle as in FIG. 10, two of those graph nodes are then selected, and each remaining graph node is connected by edges to the two selected graph nodes. In the tiara code scheme, the information symbols (nodes) are connected by edges in this tiara shape, and a parity symbol is encoded for each edge created by these connections.

In the present invention, the encoding method represented by the graph of FIG. 10 described above is referred to as the encoding method using the tiara code scheme.

When the tiara code scheme is used, the encoding step (S200) may encode the parity symbols through the following process.

The encoding step (S200) may include a process of selecting a first data block and a second data block from among the data blocks. Referring to FIG. 10(b), the information symbols (S_1, S_2) corresponding to these data blocks can be selected.

The encoding step (S200) may include a process of encoding a parity symbol using the pair of the first data block and the second data block. Referring to FIG. 10(b), the parity symbol S_(1, 2) can be generated from the pair of information symbols S_1 and S_2.

The encoding step (S200) may include a process of encoding parity symbols for the remaining data blocks, that is, the data blocks other than the selected ones: for each remaining data block, one parity symbol is encoded from the pair of that block and the first data block, and another from the pair of that block and the second data block. Referring to FIG. 10(b), for the information symbol S_3 corresponding to one of the remaining data blocks, the parity symbol S_(1, 3) is generated from the pair S_3 and S_1, and the parity symbol S_(2, 3) from the pair S_3 and S_2. For the information symbol S_4 corresponding to another remaining data block, the parity symbol S_(1, 4) is generated from the pair S_4 and S_1, and the parity symbol S_(2, 4) from the pair S_4 and S_2.
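The three processes above can be sketched as a small encoder. This sketch assumes the data blocks are equal-length byte strings, that a parity symbol is the bytewise XOR of its two blocks, and that blocks 1 and 2 are the selected ones; the names `tiara_encode` and `xor_bytes` are hypothetical.

```python
def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def tiara_encode(blocks):
    """Return a dict of symbols keyed by ("S", i) and ("P", a, b), 1-based."""
    k = len(blocks)
    if k < 3:
        raise ValueError("this sketch assumes k >= 3")
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all data blocks must have the same length")
    symbols = {("S", i + 1): blocks[i] for i in range(k)}
    # Pair of the two selected blocks, then every remaining block with each.
    pairs = [(1, 2)] + [(h, j) for j in range(3, k + 1) for h in (1, 2)]
    for a, b in pairs:
        symbols[("P", a, b)] = xor_bytes(blocks[a - 1], blocks[b - 1])
    return symbols


blocks = [b"\x01", b"\x02", b"\x04", b"\x08"]
symbols = tiara_encode(blocks)
for name in sorted(symbols, key=str):
    print(name, symbols[name].hex())
```

With k = 4 this yields the four information symbols and the five parity symbols S_(1, 2), S_(1, 3), S_(2, 3), S_(1, 4), and S_(2, 4) named in the text.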

The operation of the encoding step (S200) using the tiara code scheme can be expressed with a generator matrix as follows.

Here, the generator matrix for the tiara code scheme may be the matrix formed by concatenating an identity matrix whose size equals the number of elements of the input matrix, one column vector having 1s in two elements whose row positions are adjacent, and the column vectors of every combination in which one element contained in that column vector and one element not contained in that column vector are each 1.

FIG. 11 is a reference diagram showing the generator matrix for the tiara code scheme described above. The generator matrix consists of a k × k identity matrix, where k is the number of divided data blocks, and a remaining part. When k is greater than 3, the remaining part is further divided into a first matrix part and a second matrix part; when k is 3, it is represented by the first matrix part only. As in FIG. 11, the first matrix part may be a column vector obtained by selecting two of the k rows and setting the elements in the two selected rows to 1. The second matrix part, which is included in the remaining part when k is greater than 3, contains, as in FIG. 11, every column vector that has a 1 in one of the two selected rows and a 1 in one of the rows not included in the two selected rows.

When encoding is performed using the tiara code scheme and one node corresponding to any one information symbol is lost among the nodes of the storage system, the lost node can be recovered using two nodes other than the lost node. When two nodes corresponding to any two information symbols are lost, the two lost nodes can be recovered using three nodes other than the lost nodes. That is, when encoding is performed using the tiara code scheme, the joint information locality is (r₁, r₂)_info = (2, 3).

Furthermore, when any one node of the storage system is lost, the lost node can be recovered using two nodes other than the lost node, and when any two nodes of the storage system are lost, the two lost nodes can be recovered using three nodes other than the lost nodes. That is, when encoding is performed using the tiara code scheme, the joint locality is (r₁, r₂) = (2, 3).

Among codes whose joint information locality is (r₁, r₂)_info = (2, 3) and whose joint locality is (r₁, r₂) = (2, 3), the code encoded using the tiara code scheme is a code with the optimal code rate.

The code according to the tiara code scheme is a [3k − 3, k, 3]₂ code. That is, when the data to be stored is divided into k data blocks, the encoding step (S200) generates k information symbols and 2k − 3 parity symbols. The code rate is k / (3k − 3).
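The stated code lengths make it easy to compare the code rates of the three schemes for a given k. The sketch below simply evaluates the formulas [2k, k, 3]₂, [3k − 5, k, 3]₂, and [3k − 3, k, 3]₂ from the description; the function names are hypothetical.

```python
from fractions import Fraction


def code_length(scheme, k):
    """Number of stored symbols n for each scheme, per the stated parameters."""
    if scheme == "ring":
        return 2 * k
    if scheme == "crown":
        return 3 * k - 5
    if scheme == "tiara":
        return 3 * k - 3
    raise ValueError(f"unknown scheme: {scheme}")


def code_rate(scheme, k):
    """Code rate k / n as an exact fraction."""
    return Fraction(k, code_length(scheme, k))


for k in (5, 6, 9):
    rates = {s: str(code_rate(s, k)) for s in ("ring", "crown", "tiara")}
    print(k, rates)
```

The comparison reflects the trade-off in the description: the schemes with smaller locality for two lost nodes store more parity symbols per data block.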

When the symbols encoded in the encoding step (S200) as described above are stored in the nodes of the distributed storage system in the storage step (S300), a lost node can be recovered using the remaining nodes in the same way that a lost symbol is recovered using the remaining symbols. In other words, what has been described above for symbols applies equally to the nodes of the distributed storage system in which the symbols are stored in the storage step (S300).

Another embodiment of the present invention may be a decoding method for recovering lost data in a storage system that includes the symbols and nodes generated and stored through the encoding process described above. In this decoding method, when any of the nodes of the distributed storage system is lost, at least one node is selected from among the nodes other than the lost node, and the lost node is recovered using the selected node or nodes. The decoding method according to the present invention can recover the lost node using nodes encoded in the manner described in detail for the encoding method according to the present invention.

The decoding method according to the present invention may be performed by a decoding apparatus (20) connected to the storage system. The decoding apparatus (20) may be a server device or computer device connected to the storage system, or an embedded system for performing a data recovery function.

The encoding apparatus (10) and the decoding apparatus (20) may, if necessary, be included in a single server device or computer device. In that case the single server device may be a node management server (1).

FIG. 15 is a reference diagram showing a distributed storage system that includes the encoding apparatus (10) and the decoding apparatus (20), and FIG. 16 is a reference diagram showing a distributed storage system that includes a node management server (1) performing the operations of both the encoding apparatus (10) and the decoding apparatus (20).

The decoding method for data loss recovery according to the present invention may include a node selection step (S1000) and a recovery step (S2000).

FIG. 12 is a flowchart of the decoding method for data loss recovery in a distributed storage system according to an embodiment of the present invention.

The decoding method applies to a storage system whose nodes each store either an information symbol, generated for a data block obtained by dividing the data to be stored into a predetermined number of data blocks, or a parity symbol, generated using two different data blocks among the divided data blocks. When any of the nodes of this storage system is lost, the decoding method recovers the symbol stored in the lost node.

In the node selection step (S1000), in order to keep within preset limits the number of nodes required to recover the symbol stored in the lost node and the number of parity symbols generated by the encoder, at least one node is selected from among the nodes other than the lost node such that the symbol stored in the lost node can be obtained by operating on the symbols stored in the selected nodes.

In the recovery step (S2000), the symbol stored in the lost node is recovered using the selected nodes.
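The two steps can be illustrated for a single lost node. The sketch below assumes each parity symbol is the bytewise XOR of its two data blocks, so that any one member of a triple (block a, block b, parity of a and b) equals the XOR of the other two. The naming of nodes as ("S", i) and ("P", a, b) and the function names are hypothetical.

```python
def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def select_helpers(lost, survivors, parity_pairs):
    """Node selection: find two surviving nodes whose XOR gives the lost symbol."""
    for a, b in parity_pairs:
        triple = [("S", a), ("S", b), ("P", a, b)]
        if lost in triple:
            helpers = [name for name in triple if name != lost]
            if all(h in survivors for h in helpers):
                return helpers
    return None


def recover(lost, survivors, parity_pairs):
    """Recovery: XOR the symbols stored in the selected helper nodes."""
    helpers = select_helpers(lost, survivors, parity_pairs)
    if helpers is None:
        raise RuntimeError("no two-node repair set is available")
    return xor_bytes(survivors[helpers[0]], survivors[helpers[1]])


# Example with a ring-style pairing of k = 4 blocks (1-based indices).
blocks = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
pairs = [(1, 2), (2, 3), (3, 4), (4, 1)]
stored = {("S", i + 1): blk for i, blk in enumerate(blocks)}
for a, b in pairs:
    stored[("P", a, b)] = xor_bytes(blocks[a - 1], blocks[b - 1])

lost = ("S", 3)
survivors = {name: sym for name, sym in stored.items() if name != lost}
repaired = recover(lost, survivors, pairs)
print(repaired == blocks[2])
```

The same two functions work for the crown and tiara pairings, since only the list of parity pairs changes.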

First, the decoding method for data loss recovery is described for a storage system composed of nodes that store symbols encoded using the ring code scheme according to the present invention.

Here, the storage system stores in its nodes the parity symbols encoded by selecting and using pairs of data blocks such that, over all the data blocks, each data block is used exactly twice and thus contributes to exactly two parity symbols.

When one node corresponding to any one information symbol is lost among the nodes of the storage system, the node selection step (S1000) selects, from among the nodes other than the lost node, two nodes whose stored symbols can be operated on to obtain the symbol stored in the lost node, and the recovery step (S2000) recovers the one lost node using the selected nodes.

When two nodes corresponding to any two information symbols are lost among the nodes of the storage system, the node selection step (S1000) selects, from among the nodes other than the lost nodes, four nodes whose stored symbols can be operated on to obtain the symbols stored in the lost nodes, and the recovery step (S2000) recovers the two lost nodes using the selected nodes.

When any one node of the storage system is lost, the node selection step (S1000) selects, from among the nodes other than the lost node, two nodes whose stored symbols can be operated on to obtain the symbol stored in the lost node, and the recovery step (S2000) recovers the one lost node using the selected nodes.

When any two nodes of the storage system are lost, the node selection step (S1000) selects, from among the nodes other than the lost nodes, four nodes whose stored symbols can be operated on to obtain the symbols stored in the lost nodes, and the recovery step (S2000) recovers the two lost nodes using the selected nodes.
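The two-node case for the ring code can be illustrated with a simple iterative repair that repeatedly solves any relation with exactly one missing member. This is a sketch under stated assumptions, not the claimed procedure: it assumes XOR parities, 0-based indices, and k ≥ 3, and it does not try to minimize the helper set. The names are hypothetical.

```python
from itertools import combinations


def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def ring_triples(k):
    """Relations S_i xor S_(i+1 mod k) = P_i, one per parity node."""
    return [(("S", i), ("S", (i + 1) % k), ("P", i)) for i in range(k)]


def ring_encode(blocks):
    """Store k information symbols and k ring parity symbols."""
    k = len(blocks)
    stored = {("S", i): blocks[i] for i in range(k)}
    for i in range(k):
        stored[("P", i)] = xor_bytes(blocks[i], blocks[(i + 1) % k])
    return stored


def repair(survivors, lost, k):
    """Return (recovered symbols, set of surviving helper nodes that were read)."""
    known = dict(survivors)
    missing = set(lost)
    helpers, recovered = set(), {}
    progress = True
    while missing and progress:
        progress = False
        for triple in ring_triples(k):
            unknown = [name for name in triple if name in missing]
            if len(unknown) == 1:
                target = unknown[0]
                a, b = [name for name in triple if name != target]
                known[target] = xor_bytes(known[a], known[b])
                recovered[target] = known[target]
                # Count only surviving nodes, not nodes repaired earlier.
                helpers.update(x for x in (a, b) if x not in lost)
                missing.discard(target)
                progress = True
    if missing:
        raise RuntimeError("repair failed for: %r" % sorted(missing))
    return recovered, helpers


k = 5
stored = ring_encode([bytes([i + 1] * 4) for i in range(k)])
lost = (("S", 0), ("S", 1))
survivors = {n: s for n, s in stored.items() if n not in lost}
recovered, helpers = repair(survivors, lost, k)
print(sorted(helpers), all(recovered[n] == stored[n] for n in lost))
```

Each repaired node reads at most two other nodes, so at most four surviving nodes are read when two nodes are lost, in line with the locality stated for the ring code.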

Next, a decoding method for data loss recovery is described for a storage system composed of nodes that store symbols encoded using the crown code technique according to the present invention.

Here, the storage system stores in each node the parity symbols encoded by the following processes: a process of selecting a first data block, a second data block, a third data block, and a fourth data block from among the data blocks; a process of encoding one parity symbol per pair using the pair of the first and second data blocks, the pair of the second and third data blocks, and the pair of the third and fourth data blocks; and a process of, for the remaining data blocks other than the selected data blocks, encoding one parity symbol per pair using, for each remaining data block, the pair of that remaining data block and the first data block and the pair of that remaining data block and the second data block.

In addition, when two nodes corresponding to any two of the information symbols are lost among the nodes of the storage system, the node selection step (S1000) selects, from the remaining nodes other than the lost nodes, three nodes whose stored symbols can be combined to obtain the symbols stored in the lost nodes, and the recovery step (S2000) may recover the lost nodes using the selected nodes.
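
The crown code pairing described above can be sketched as follows. This is only an illustration under stated assumptions: data blocks are indexed from 0, the first four indices play the role of the first to fourth data blocks, and the parity of a pair is taken to be the bitwise XOR of the two blocks. The names `crown_pairs` and `crown_encode` are hypothetical.

```python
def crown_pairs(k):
    """Index pairs of data blocks that feed each crown-code parity symbol.

    Blocks 0..3 stand for the first to fourth data blocks; every remaining
    block r is paired once with block 0 and once with block 1.
    """
    if k < 4:
        raise ValueError("this sketch assumes at least four data blocks")
    pairs = [(0, 1), (1, 2), (2, 3)]
    for r in range(4, k):
        pairs.append((r, 0))
        pairs.append((r, 1))
    return pairs


def crown_encode(blocks):
    """Return the parity symbols, one per pair (XOR is an assumption of this sketch)."""
    return [blocks[a] ^ blocks[b] for a, b in crown_pairs(len(blocks))]
```

Under this pairing the number of parity symbols is 3 + 2(k - 4) = 2k - 5, for example 7 parity symbols when k = 6.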

Next, a decoding method for data loss recovery is described for a storage system composed of nodes that store symbols encoded using the tiara code technique according to the present invention.

Here, the storage system stores in each node the parity symbols encoded by the following processes: a process of selecting a first data block and a second data block from among the data blocks; a process of encoding a parity symbol using the pair of the first data block and the second data block; and a process of, for the remaining data blocks other than the selected data blocks, encoding one parity symbol per pair using, for each remaining data block, the pair of that remaining data block and the first data block and the pair of that remaining data block and the second data block.

In addition, when any two of the nodes of the storage system are lost, the node selection step (S1000) selects, from the remaining nodes other than the lost nodes, three nodes whose stored symbols can be combined to obtain the symbols stored in the lost nodes, and the recovery step (S2000) may recover the lost nodes using the selected nodes.
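
As an illustration of the tiara code pairing and of a three-node repair of two lost nodes, consider the minimal sketch below. It assumes 0-based block indices (blocks 0 and 1 stand for the first and second data blocks), XOR as the sum of two symbols, and at least three data blocks for the repair; `tiara_pairs`, `tiara_encode`, and `repair_first_two` are hypothetical names.

```python
def tiara_pairs(k):
    """Index pairs of data blocks that feed each tiara-code parity symbol."""
    if k < 2:
        raise ValueError("this sketch assumes at least two data blocks")
    pairs = [(0, 1)]
    for r in range(2, k):
        pairs.append((r, 0))
        pairs.append((r, 1))
    return pairs


def tiara_encode(blocks):
    """Map each pair to its parity symbol (XOR is an assumption of this sketch)."""
    return {(a, b): blocks[a] ^ blocks[b] for a, b in tiara_pairs(len(blocks))}


def repair_first_two(info, parity):
    """Recover lost blocks 0 and 1 by reading three nodes.

    Nodes read: info[2], parity[(2, 0)], parity[(0, 1)]. Requires k >= 3.
    """
    x0 = parity[(2, 0)] ^ info[2]  # parity (2, 0) = x2 ^ x0
    x1 = parity[(0, 1)] ^ x0       # parity (0, 1) = x0 ^ x1
    return x0, x1
```

The pairing yields 1 + 2(k - 2) = 2k - 3 parity symbols. Other pairs of lost nodes would need a different choice of three helper nodes; only one case is shown here.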

Here, the decoding method according to the present invention can be expressed using a generator matrix, in the same way as described for the encoding method. Accordingly, for each of the ring code, crown code, and tiara code techniques, the decoding method according to the present invention may decode the information symbols of the data to be stored by multiplying by the parity check matrix that corresponds to the inverse of the generator matrix described in the encoding step (S200).
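
To make the generator-matrix view concrete, the following sketch builds a generator matrix for a ring-style code and multiplies a vector of data blocks by it. It is an illustration only: arithmetic is assumed to be over GF(2) with symbols combined by XOR, the parity columns are assumed to pair cyclically adjacent blocks, and `ring_generator` and `encode` are hypothetical names.

```python
def ring_generator(k):
    """Return the generator matrix as a list of columns of length k.

    The first k columns form the identity (information symbols); the next k
    columns each contain exactly two 1s at cyclically adjacent positions.
    """
    columns = []
    for i in range(k):
        columns.append([1 if r == i else 0 for r in range(k)])
    for i in range(k):
        columns.append([1 if r in (i, (i + 1) % k) else 0 for r in range(k)])
    return columns


def encode(blocks, columns):
    """Multiply the row vector of blocks by the matrix, using XOR as addition."""
    output = []
    for column in columns:
        acc = 0
        for bit, block in zip(column, blocks):
            if bit:
                acc ^= block
        output.append(acc)
    return output
```

Every column has at most two 1s, so each output element is either a single input symbol (an information symbol) or the sum of two input symbols (a parity symbol).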

Here, a recovery set, that is, the set of nodes that can recover a particular node when it is lost, may be set in advance. Such a recovery set may be determined by the generator matrix. Accordingly, in the decoding method according to the present invention, the symbols needed for recovery are selected according to the symbol stored in the lost node, taking the generator matrix into account, and the lost node can be recovered by accessing the nodes that store those symbols.

That is, in the encoding step (S200) according to the present invention, a recovery set may be set and stored in advance for every recoverable node loss. When a node loss occurs, the decoding method according to the present invention accesses the nodes included in the recovery set that was set and stored in advance for the lost node, and recovers the lost node using the accessed nodes.
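
One possible way to precompute such recovery sets from a generator matrix is sketched below. It is a brute-force illustration rather than the procedure of the specification: each node is represented by the set of row indices where its generator column is 1, columns are combined over GF(2) by symmetric difference, and the smallest combination of other nodes that reproduces the lost column is stored. The names `xor_columns` and `recovery_sets` and the `max_size` parameter are assumptions.

```python
from itertools import combinations


def xor_columns(columns, nodes):
    """Combine the columns of the given nodes over GF(2) (symmetric difference)."""
    acc = frozenset()
    for n in nodes:
        acc = acc ^ columns[n]
    return acc


def recovery_sets(columns, max_size=2):
    """Map each node to a smallest set of other nodes that can rebuild it.

    Nodes with no recovery set of at most max_size helpers are left out.
    """
    table = {}
    for lost, target in enumerate(columns):
        others = [n for n in range(len(columns)) if n != lost]
        for size in range(1, max_size + 1):
            found = next(
                (c for c in combinations(others, size)
                 if xor_columns(columns, c) == target),
                None,
            )
            if found is not None:
                table[lost] = found
                break
    return table


# Ring-style example with k = 4: nodes 0..3 hold information symbols,
# nodes 4..7 hold parities of cyclically adjacent blocks (an assumption).
ring_columns = [frozenset({i}) for i in range(4)] + [
    frozenset({i, (i + 1) % 4}) for i in range(4)
]
ring_table = recovery_sets(ring_columns)
```

At repair time a decoder would only need to look up `ring_table[lost]` and read those nodes, which mirrors the idea of storing the recovery sets in advance.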

Yet another embodiment of the present invention may be an encoding apparatus (10) for data loss recovery in a storage system. The encoding apparatus (10) may include an encoding unit (100) and a data processing unit (200). The encoding apparatus (10) may operate in the same manner as the encoding method for data loss recovery according to the present invention described in detail above. Overlapping details are therefore omitted and the apparatus is described only briefly.

FIG. 13 is a block diagram of an encoding apparatus (10) for data loss recovery in a distributed storage system according to an embodiment of the present invention.

The encoding unit (100) generates information symbols according to the data blocks obtained by dividing the data to be stored in the storage system into a predetermined number of blocks, selects two different data blocks from among the divided data blocks, and generates a parity symbol using the two selected data blocks.

The data processing unit (200) stores each of the generated symbols in a respective node of the storage system.

Here, the encoding unit (100) selects two different data blocks from among the divided data blocks and encodes the parity symbol using the two selected data blocks so that, when a node of the storage system is lost, the number of nodes needed to recover the symbol stored in the lost node and the number of parity symbols generated by the encoder are kept within predetermined limits.

Yet another embodiment of the present invention may be a decoding apparatus (20) for data loss recovery in a storage system. The decoding apparatus (20) may include a node selection unit (1000) and a recovery unit (2000). The decoding apparatus (20) may operate in the same manner as the decoding method for data loss recovery according to the present invention described in detail above. Overlapping details are therefore omitted and the apparatus is described only briefly.

FIG. 14 is a block diagram of a decoding apparatus (20) for data loss recovery in a distributed storage system according to an embodiment of the present invention.

Here, the decoding apparatus (20) recovers the symbol stored in a lost node when any node is lost in a storage system in which each node stores either an information symbol generated for a data block obtained by dividing the data to be stored into a predetermined number of blocks, or a parity symbol generated using two different data blocks among the divided data blocks.

Here, the node selection unit (1000) selects, from the remaining nodes other than the lost node, at least one node whose stored symbol can be used in a computation to obtain the symbol stored in the lost node, so that the number of nodes needed to recover the symbol stored in the lost node and the number of parity symbols generated by the encoder are kept within predetermined limits.

Here, the recovery unit (2000) recovers the symbol stored in the lost node using the selected nodes.

Yet another embodiment of the present invention may be a computer program stored on a medium for performing the encoding method or the decoding method for data loss recovery in a distributed storage system described above.

Although all the components constituting the embodiments of the present invention described above are described as being combined into one or as operating in combination, the present invention is not necessarily limited to such embodiments. That is, within the scope of the object of the present invention, all of the components may operate by being selectively combined into one or more.

In addition, although each of the components may be implemented as a single independent piece of hardware, some or all of the components may be selectively combined and implemented as a computer program having program modules that perform some or all of the combined functions on one or more pieces of hardware. Such a computer program may be stored on a computer-readable recording medium such as a USB memory, a CD, or a flash memory, and may be read and executed by a computer to implement the embodiments of the present invention. Recording media for the computer program may include magnetic recording media, optical recording media, and carrier wave media.

In addition, all terms, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs, unless defined otherwise in the detailed description. Commonly used terms, such as those defined in dictionaries, should be interpreted as consistent with their meaning in the context of the related art and, unless expressly defined in the present invention, are not to be interpreted in an idealized or overly formal sense.

The above description merely illustrates the technical idea of the present invention, and a person of ordinary skill in the art to which the present invention belongs may make various modifications, changes, and substitutions without departing from the essential characteristics of the present invention. Accordingly, the embodiments disclosed in the present invention and the accompanying drawings are intended to explain, not to limit, the technical idea of the present invention, and the scope of the technical idea is not limited by these embodiments or drawings. The scope of protection of the present invention should be construed according to the claims below, and all technical ideas within an equivalent scope should be construed as falling within the scope of rights of the present invention.

S100: Step of dividing the data to be stored
S200: Encoding step
S300: Storage step
S1000: Node selection step
S2000: Recovery step
100: Encoding unit
200: Data processing unit
1000: Node selection unit
2000: Recovery unit
10: Encoding apparatus
20: Decoding apparatus
1: Node management server

Claims

Dividing data to be stored into a predetermined number of data blocks;
Generating an information symbol according to the divided data block, selecting two different data blocks from the divided data blocks, and using the selected two data blocks to generate a parity symbol (Parity Symbol) Encoding step; And
And storing the generated symbols in respective nodes of the storage system,
Wherein the encoding step generates the parity symbols such that, in every case in which two or fewer of the nodes of the storage system are lost, the lost nodes can be recovered by accessing four or fewer of the nodes,
Wherein the encoding step comprises the steps of: inputting an input matrix having input symbols representing the divided data blocks as elements, generating matrixes having element values of 1 or 0 and the number of 1s included in each column vector is 2 or less Generator Matrix) to generate an output matrix,
Wherein an element corresponding to one input symbol in the output matrix is the information symbol and an element corresponding to a sum of two input symbols in the output matrix is the parity symbol; the method being an encoding method for recovering data loss in a storage system.

delete

The method according to claim 1,
Wherein the column vector of the generator matrix has a length according to the number of elements of the input matrix,
Wherein the generator matrix is a matrix formed by concatenating an identity matrix whose size is the number of elements of the input matrix with all column vectors in which two elements adjacent to each other in the row direction are 1.

The method according to claim 1,
Wherein the column vector of the generator matrix has a length according to the number of elements of the input matrix,
Wherein the generation matrix includes a unit matrix of the number of elements of the input matrix, five column vectors of two elements adjacent to each other in the row direction, and one element of one of the five column vectors And a matrix of all combinations of column vectors of which one element not included in the five column vectors is 1 is connected.

The method according to claim 1,
Wherein the column vector of the generator matrix has a length according to the number of elements of the input matrix,
Wherein the generation matrix includes a unit matrix of the number of elements of the input matrix, one column vector of two elements adjacent to each other in the row direction, one column vector of one column vector, Is a matrix obtained by concatenating all combinations of column vectors, each of which is an element not included in the first column.

The method according to claim 1,
If a node of the storage system is lost, it can be recovered using the remaining nodes other than the lost node,
Wherein the coding step comprises the steps of: storing the number of nodes required to recover a symbol stored in the lost node and the number of parity symbols generated in the encoding step to a predetermined limit; Wherein the parity symbol generator generates the parity symbols using the selected two data blocks and selects the different data blocks.

The method according to claim 1,
Wherein the encoding step generates each parity symbol by selecting and using two data blocks such that every data block is used exactly twice, so that each data block contributes to two parity symbols.

8. The method of claim 7,
When one node corresponding to any one of the information symbols of the storage system is lost, the lost node can be recovered using two nodes other than the lost node,
When two nodes corresponding to any two information symbols of the nodes of the storage system are lost, the lost two nodes can be recovered using four nodes other than the lost node , A coding method for recovering data loss in a storage system.

8. The method of claim 7,
When the storage target data is divided into k data blocks,
Wherein the coding step generates k pieces of the information symbols and generates k pieces of the parity symbols using all k pieces of the data blocks twice.

2. The method according to claim 1,
Selecting a first data block, a second data block, a third data block, and a fourth data block among the data blocks;
A pair of the first data block and the second data block, a pair of the second data block and the third data block, and a pair of the third data block and the fourth data block, The process of generating a symbol,
A pair of each of the remaining data blocks and the first data block, a pair of each of the remaining data blocks and the second data block for each of the remaining data blocks excluding the selected data block among the data blocks, And generating the parity symbol for each pair by using the parity symbols, respectively.

11. The method of claim 10,
When one node corresponding to any one of the information symbols of the storage system is lost, the lost node can be recovered using two nodes other than the lost node,
If two nodes corresponding to any two information symbols of the nodes of the storage system are lost, the lost two nodes can be recovered using three nodes other than the lost node,
If one of the nodes of the storage system is lost, the lost node can be recovered using two nodes other than the lost node,
And if any two nodes of the storage system are lost, the two lost nodes can be recovered using four nodes other than the lost nodes.

11. The method of claim 10,
When the storage target data is divided into k data blocks,
Wherein the encoding step generates k of the information symbols and generates 2k - 5 of the parity symbols.

2. The method according to claim 1,
Selecting a first data block and a second data block among the data blocks,
Generating the parity symbol using a pair of the first data block and the second data block,
A pair of each of the remaining data blocks and the first data block, a pair of each of the remaining data blocks and the second data block for each of the remaining data blocks excluding the selected data block among the data blocks, And generating the parity symbol for each pair by using the parity symbols, respectively.

14. The method of claim 13,
When one node corresponding to any one of the information symbols of the storage system is lost, the lost node can be recovered using two nodes other than the lost node,
If two nodes corresponding to any two information symbols of the nodes of the storage system are lost, the lost two nodes can be recovered using three nodes other than the lost node,
If one of the nodes of the storage system is lost, the lost node can be recovered using two nodes other than the lost node,
And if any two of the nodes of the storage system are lost, the two lost nodes can be recovered using three nodes other than the lost nodes.

14. The method of claim 13,
When the storage target data is divided into k data blocks,
Wherein the encoding step generates k of the information symbols and generates 2k - 3 of the parity symbols.

An information symbol (Information Symbol) generated for a data block obtained by dividing data to be stored into a predetermined number and a parity symbol generated using two different data blocks among the divided data blocks A decoding method for recovering a symbol stored in a lost node when a node of a storage system stored in each node is lost,
And for storing the number of nodes and the number of parity symbols required for restoring symbols stored in the lost node to a predetermined limit, calculating the symbol stored in the node among remaining nodes other than the lost node A node selecting step of selecting at least one or more nodes capable of obtaining the symbol stored in the lost node; And
And recovering symbols stored in the lost node using the selected node,
The generated information symbol and the generated parity symbol are each composed of an input matrix in which each element is an input symbol representing each divided data block, and an input matrix in which the value of the element is either 1 or 0, Wherein an element corresponding to one of the input symbols in the output matrix is multiplied by the information symbol and a sum of two input symbols in the output matrix is multiplied by a generator matrix having a number of 2 or less, Are generated by the parity symbol, respectively, in the storage system.

17. The method of claim 16,
The storage system stores the parity symbols encoded by selecting and using two data blocks so that each data block is used only twice to generate two parity symbols for all the data blocks, and,
When one node corresponding to any one of the information symbols among the nodes of the storage system is lost, the node selecting step calculates the symbol stored in the node among the remaining nodes other than the lost node, Selecting the two nodes capable of acquiring the symbol stored in the lost node, the recovering step restoring the lost node using the selected node,
When two nodes corresponding to any two information symbols among the nodes of the storage system are lost, the node selecting step calculates the symbol stored in the node among the remaining nodes other than the lost node, Selecting the four nodes capable of obtaining the symbol stored in the lost node and recovering the lost two nodes using the selected node. Decryption method for recovery.

17. The method of claim 16,
Wherein the storage system is configured to select a first data block, a second data block, a third data block, and a fourth data block among the data blocks, a pair of the first data block and the second data block, Generating the parity symbol for each pair using a pair of the block, the third data block, and the third data block and the fourth data block, The parity symbol is generated for each pair by using the pair of the remaining data block and the first data block and the pair of the remaining data block and the second data block for each of the remaining data blocks, And storing the parity symbols encoded by the parity symbol encoding unit in each node,
When one node corresponding to any one of the information symbols among the nodes of the storage system is lost, the node selecting step calculates the symbol stored in the node among the remaining nodes other than the lost node, Selecting the two nodes capable of acquiring the symbol stored in the lost node, the recovering step restoring the lost node using the selected node,
When two nodes corresponding to any two information symbols among the nodes of the storage system are lost, the node selecting step calculates the symbol stored in the node among the remaining nodes other than the lost node, Selecting the three nodes capable of obtaining the symbol stored in the lost node and recovering the lost node using the selected node. Decryption method for recovery.

17. The method of claim 16,
Wherein the storage system selects a first data block and a second data block among the data blocks,
Generating the parity symbol by using the pair of the first data block and the second data block; calculating, for each of the remaining data blocks excluding the selected data block, The parity symbol encoded by generating the parity symbol for each pair using the pair of the block and the first data block and the pair of the remaining data block and the second data block, Store,
When one node corresponding to any one of the information symbols among the nodes of the storage system is lost, the node selecting step calculates the symbol stored in the node among the remaining nodes other than the lost node, Selecting the two nodes capable of acquiring the symbol stored in the lost node, the recovering step restoring the lost node using the selected node,
When two nodes corresponding to any two information symbols among the nodes of the storage system are lost, the node selecting step calculates the symbol stored in the node among the remaining nodes other than the lost node, Selecting the three nodes capable of obtaining the symbol stored in the lost node and recovering the lost node using the selected node. Decryption method for recovery.

A storage system comprising: an information symbol generating unit configured to generate an information symbol according to a predetermined number of data blocks to be stored in a storage system, select two different data blocks from the divided data blocks, A coding unit for generating a parity symbol using a data block; And
And a data processor for storing the generated symbols in each node of the storage system,
Wherein the encoding unit is configured to divide the number of nodes required to recover a symbol stored in the lost node and the number of parity symbols generated by the encoding unit into a predetermined limit when the node of the storage system is lost, Selects two different data blocks from among one data block, generates the parity symbols using the selected two data blocks,
Wherein the encoding unit comprises: an input matrix having input symbols each representing an input symbol representing the divided data blocks; and a generator matrix having a value of an element of 1 or 0 and the number of 1's included in each column vector is 2 or less Matrix) to generate an output matrix,
And an element corresponding to one input symbol in the output matrix is the information symbol and an element corresponding to a sum of two input symbols in the output matrix is the parity symbol. Wherein the storage device is a storage device.

An information symbol (Information Symbol) generated for a data block obtained by dividing data to be stored into a predetermined number and a parity symbol generated using two different data blocks among the divided data blocks A decoding apparatus for recovering a symbol stored in a lost node when a node of a storage system stored in each node is lost,
And for storing the number of nodes and the number of parity symbols required for restoring symbols stored in the lost node to a predetermined limit, calculating the symbol stored in the node among remaining nodes other than the lost node A node selector for selecting at least one or more nodes capable of obtaining the symbol stored in the lost node; And
And a recovering unit for recovering symbols stored in the lost node using the selected node,
The generated information symbol and the generated parity symbol are each composed of an input matrix in which each element is an input symbol representing each divided data block, and an input matrix in which the value of the element is either 1 or 0, Wherein an element corresponding to one of the input symbols in the output matrix is multiplied by the information symbol and a sum of two input symbols in the output matrix is multiplied by a generator matrix having a number of 2 or less, Are generated by the parity symbol, respectively, in the storage system.