KR100529278B1

KR100529278B1 - Data Mirroring System to improve the performance of read operation for large data

Info

Publication number: KR100529278B1
Application number: KR10-2003-0092526A
Authority: KR
Inventors: 강동재; 김경배; 정성인; 김명준
Original assignee: 한국전자통신연구원
Priority date: 2003-12-17
Filing date: 2003-12-17
Publication date: 2005-11-17
Also published as: KR20050060804A

Abstract

The present invention relates to a redundant data storage system for large data. In a read-intensive system that stores data redundantly, it modifies the arrangement of data blocks stored on disk and provides a corresponding new mapping scheme, thereby maintaining the data reliability that is the advantage of the conventional redundant storage scheme (RAID1) while offering the fast read performance of the distributed storage scheme (RAID0).

The redundant data storage system of the present invention comprises an original disk that stores original data and a plurality of redundant disks that store copies of the original data. Consecutive data blocks, as many as the number of disks, are grouped to create SMUs for the original disk and the redundant disks. Each SMU is then assigned an SMUno value according to its order within its disk, and each data block is assigned an SMUidx value according to its order within its SMU. The same data blocks in the SMUs having the same SMUno value have a different arrangement order on each disk.

Description

Data mirroring system to improve the performance of read operation for large data

The present invention relates to a redundant data storage system and, more particularly, to a redundant data storage system for improving the speed of read operations on large data in a read-intensive system that stores data redundantly for the reliability of the stored data.

The rapid growth in stored data caused by the recent popularization of multimedia and the Internet calls for storage systems that provide more than a terabyte of capacity and keep large volumes of information safe. In such an environment, redundant data storage is essential for the reliability of large volumes of information.

The prior art related to the present invention includes RAID0 (striping) and RAID1 (mirroring) among the RAID levels, each of which has advantages and disadvantages.

RAID0 (striping) distributes data across several disks. Because it keeps no redundant data, it incurs no additional disk cost and offers excellent read and write performance. On the other hand, because there is no redundant data, if an error occurs on one or more of the disks constituting the RAID0, data availability cannot be maintained and the data cannot be recovered.

RAID1 (mirroring) stores the same data redundantly on several storage devices. The disk cost for the redundant data is high, but the scheme can keep the stored data available and recoverable even if some of the constituent storage devices are damaged.

As mentioned above, each of the six Berkeley RAID levels, including RAID0 and RAID1, has both advantages and disadvantages, so a level must be chosen to suit the target application. An ideal RAID system would combine the data availability and recoverability of RAID1 with the excellent input/output performance of RAID0. RAID1+0, which combines RAID1 and RAID0, is therefore widely used, but it requires as many additional disks as there are logical disks constituting the RAID0, so its disk cost is high.

The present invention has been made to solve the above problems of the prior art. Its object is to provide a redundant data storage system for large data that, in a read-intensive system storing data redundantly, modifies the arrangement of data blocks stored on disk and provides a corresponding new mapping scheme, thereby maintaining the data reliability that is the advantage of the conventional redundant storage scheme (RAID1) while offering the fast read performance of the distributed storage scheme (RAID0).

The scheme proposed in the present invention is intended to guarantee fast read operations for large data accesses or in multi-host environments, and is suited to read-intensive systems with few update operations, as is the case for most multimedia data.

To achieve the above object, the redundant data storage system for large data comprises an original disk that stores original data and a plurality of redundant disks that store copies of the original data. Consecutive data blocks, as many as the number of disks, are grouped to create SMUs for the original disk and the redundant disks. Each SMU is then assigned an SMUno value according to its order within its disk, and each data block is assigned an SMUidx value according to its order within its SMU. The system is characterized in that the same data blocks in the SMUs having the same SMUno value have a different arrangement order on each disk.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

The redundant data storage scheme proposed in the present invention is hereinafter referred to as RAID-SM (Stripped Mirroring).

First, the arrangement of data blocks proposed in the present invention will be described. FIG. 1 shows the data block arrangement of the general redundant data storage scheme (RAID1), and FIG. 2 shows the data block arrangement proposed in the present invention.

In FIGS. 1 and 2, d denotes data on the original disk (110; 210), m denotes data on the redundant disks (120, 130; 220, 230), and the number following the lowercase letter is the data block number. Blocks with the same number are therefore copies of the same data. FIGS. 1 and 2 show an embodiment in which data is stored redundantly on three disks.

In the conventional redundant storage scheme, as shown in FIG. 1, the redundant data blocks occupy the same block position on every storage device constituting the RAID1.

In the present invention, however, as shown in FIG. 2, the blocks of redundant data have a different block position and arrangement on each disk, and there are groups of blocks similar to the striping unit of conventional RAID0.

Hereinafter, these block groups are referred to as SMUs (Stripped-Mirroring Units) (211, 221, 231), and an SMU is defined as a set of consecutive data blocks.

The number of data blocks constituting an SMU (211, 221, 231) equals the number of disks storing the redundant data. Each SMU also has an order value within its disk, similar to the striping unit number in RAID0, which is called SMUno (260).

SMUs with the same SMUno (260) consist of the same data blocks, but the blocks are arranged differently within each SMU. Each block constituting an SMU also has an order within the SMU, which is called SMUidx (250).

On the disks constituting the RAID-SM of the present invention, accessing the same SMUidx of the same SMUno gives access to a group of consecutive blocks as large as the number of blocks in an SMU. For example, accessing SMUno = 1 and SMUidx = 2 in FIG. 2 reaches blocks d5, m3, and m4, so the consecutive blocks contained in SMUno 1 can be accessed in parallel.

Among the disks constituting the RAID-SM, the original disk keeps the same block arrangement as a conventional disk, and the new block arrangement is applied to the redundant disks. Because blocks under the new arrangement are placed in units of SMUs, the redundant copies of a block share the same SMU number (SMUno) but have different SMUidx values. As a result, accessing the same SMUidx of the same SMUno on each disk constituting the RAID-SM reaches sequential disk blocks, which allows parallel access to RAID1 data in units of the SMU size, as in RAID0.
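The arrangement described above can be illustrated with a small sketch. It is not taken from the patent: the function name `build_layout` is hypothetical, the original disk is assumed to have order 0 and the i-th redundant disk order i, and the rotation within each SMU follows the mapping rule given later in this description. Under those assumptions the sketch reproduces the FIG. 2 example (SMUno = 1, SMUidx = 2 reaches blocks 5, 3, and 4).

```python
def build_layout(num_disks, num_smus):
    """Return layout[disk][address] = logical block number stored there.

    Assumption: disk 0 is the original disk (order 0) and disk i has order i.
    """
    n = num_disks  # SMU size equals the number of disks
    layout = []
    for order in range(n):
        disk = [None] * (n * num_smus)
        for k in range(n * num_smus):
            smu_no, smu_idx = divmod(k, n)
            # Rotate the block inside its SMU by the disk's order value.
            position = (smu_idx - order) % n
            disk[smu_no * n + position] = k
        layout.append(disk)
    return layout


layout = build_layout(num_disks=3, num_smus=2)
for order, disk in enumerate(layout):
    print(f"disk {order}: {disk}")

# Same SMUno (1) and same SMUidx (2) on every disk.
row = [disk[1 * 3 + 2] for disk in layout]
print(row)  # [5, 3, 4]
```

Each disk still holds every block exactly once, so the mirror property of RAID1 is preserved while one row across the disks covers a whole SMU.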

The address mapping scheme for this arrangement of data blocks is as follows.

As shown in FIG. 2, the RAID-SM of the present invention uses a data arrangement different from that of conventional RAID1, so a new address mapping scheme suited to it is required.

Conventional RAID1 accesses the same data block position on each disk to reach redundant data. In the RAID-SM of the present invention, however, the position of a data block differs from disk to disk, so the address used to reach the redundant data has a different value on each disk. A new address mapping formula must therefore be defined to find the position of a given block on a specific disk of the RAID-SM.

FIG. 3 generalizes the on-disk arrangement of original data and redundant data in the RAID-SM according to the present invention. With reference to FIG. 3, the following derives the formula for the actual address of the redundant data block on each disk of the RAID-SM when an arbitrary block number k (360) on the original disk (310) is given.

Since the size of an SMU equals the number of disks constituting the RAID-SM, it is N. Let block(k) (360) denote the block with address k on the original disk (310), let SMUidx(k) (350) denote the SMUidx of block k, and let order(i) (331) denote the order value of the disk with sequence value i in FIG. 3.

Then, if order(i) is less than or equal to SMUidx(k), the copy of block(k) of the original disk resides on disk i at block (SMUidx(k) - order(i)) of the SMU whose SMUno is ⌊k/N⌋. If order(i) is greater than SMUidx(k), the copy is placed, in the SMU whose SMUno is ⌊k/N⌋, at the position reduced from the SMU size by (order(i) - SMUidx(k)), that is, at N - (order(i) - SMUidx(k)).
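The two cases above can be written compactly as follows. This is a restatement under the assumption that blocks on the original disk are numbered from 0, so that SMUidx(k) = k mod N; that identity is inferred from the definition of an SMU as N consecutive blocks and is not stated explicitly in the text.

```latex
\mathrm{SMUno}(k) = \left\lfloor \frac{k}{N} \right\rfloor, \qquad
\mathrm{SMUidx}(k) = k \bmod N

\mathrm{SMUidx}(k(i)) =
\begin{cases}
\mathrm{SMUidx}(k) - \mathrm{order}(i), & \mathrm{order}(i) \le \mathrm{SMUidx}(k) \\
N - \bigl(\mathrm{order}(i) - \mathrm{SMUidx}(k)\bigr), & \mathrm{order}(i) > \mathrm{SMUidx}(k)
\end{cases}
\;=\; \bigl(\mathrm{SMUidx}(k) - \mathrm{order}(i)\bigr) \bmod N
```

For the FIG. 2 example with N = 3, block 3 has SMUidx(3) = 0, so on the disk with order 1 it sits at index 3 - (1 - 0) = 2 of SMU 1, which matches m3 being reached at SMUidx = 2.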

Further, let the size of an SMU be expressed as its number of data blocks N, and let SMUfirst be the address of the first block of a given SMU. If SMUidx(k(i)) denotes the SMUidx of the redundant copy of block k on disk i, and addr(k(i)) denotes the address of block k sought on disk i, then addr(k(i)) can be found by determining the SMUno and the index within the SMU. This mapping rule can be expressed as an algorithm.
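A minimal sketch of such a mapping algorithm is given here for illustration. It is not the patent's own listing: the function name `raid_sm_address` is hypothetical, block addresses are assumed to start at 0, and the original disk is assumed to have order 0. The variable names follow SMUno, SMUidx, and SMUfirst as defined above.

```python
def raid_sm_address(k, order, n):
    """Address on the disk with the given order of the copy of original block k.

    k     -- block address on the original disk (assumed 0-based)
    order -- order value of the target disk (assumed 0 for the original disk)
    n     -- number of disks, which is also the SMU size N
    """
    smu_no = k // n          # SMUno = floor(k / N)
    smu_idx = k % n          # SMUidx(k) on the original disk
    smu_first = smu_no * n   # SMUfirst: address of the first block of the SMU
    if order <= smu_idx:
        idx_on_disk = smu_idx - order
    else:
        # Position reduced from the SMU size by (order - SMUidx(k)).
        idx_on_disk = n - (order - smu_idx)
    return smu_first + idx_on_disk


# FIG. 2 example: blocks 5, 3 and 4 all sit at SMUno 1, SMUidx 2 (address 5).
print(raid_sm_address(5, 0, 3), raid_sm_address(3, 1, 3), raid_sm_address(4, 2, 3))
```

On the original disk (order 0) the function returns k unchanged, which agrees with the statement that the original disk keeps the conventional block arrangement.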

The input/output scheme of a RAID-SM to which the data block arrangement and address mapping defined above are applied is as follows.

In the conventional redundant storage scheme, when hosts (computers using the service) request access to data, each request is assigned to an arbitrary storage device, among those constituting the RAID1, that is not currently in use. With this conventional approach, however, a host must read all of the requested data from the single storage device assigned to it, so read performance depends on the performance of that disk.

For example, when a read of five blocks is requested under the general redundant storage scheme, as in FIG. 4, and the assigned storage device is the original disk (410), all of the requested data d0 to d4 (440) is read from the original disk.

In the RAID-SM of the present invention, however, the requested data is accessed simultaneously from all the disks (410, 420, 430) constituting the RAID-SM, reaching the data blocks (450), so the read performance of the distributed storage scheme (RAID0) can be provided.
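The five-block read can be sketched as a read plan that spreads the request over all disks. This is an illustrative sketch only: `plan_read` is a hypothetical helper, disks are identified by an assumed order value (0 for the original disk), and the reader is assumed to use a single SMUidx (0 by default) on every disk.

```python
from collections import Counter


def plan_read(first_block, count, n, smu_idx=0):
    """Return (block, disk_order, address) triples for a RAID-SM style read.

    Every block is fetched from the disk that stores it at position smu_idx
    of its SMU, so one SMU is covered by one access per disk.
    """
    plan = []
    for k in range(first_block, first_block + count):
        order = (k % n - smu_idx) % n    # disk holding block k at smu_idx
        address = (k // n) * n + smu_idx  # same address on every disk per SMU
        plan.append((k, order, address))
    return plan


plan = plan_read(first_block=0, count=5, n=3)
for block, order, address in plan:
    print(f"block {block}: disk {order}, address {address}")

per_disk = Counter(order for _, order, _ in plan)
print(dict(per_disk))  # {0: 2, 1: 2, 2: 1}
```

In this sketch no disk serves more than two of the five blocks, whereas the RAID1 case described above reads all five from one disk.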

Also, when several hosts access the same data at the same time, RAID1 improves input/output performance by assigning one storage device to each host. The RAID-SM of the present invention, as in FIG. 4, instead assigns each host an SMUidx of the SMU, the unit constituting the RAID-SM, and can thus deliver RAID0 performance for requests from multiple hosts as well. Because the number of assignable SMUidx values equals the number of disks in the mirror, the RAID-SM supports the same number of simultaneous hosts as RAID1. That is, with three disks (410, 420, 430) as in FIG. 4, RAID1 mirroring can assign an arbitrary disk to each host requesting access, while RAID-SM assigns an SMUidx, and in both cases each host can access the same data.
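The effect of giving each host its own SMUidx can be checked with a short sketch. The helper `blocks_seen` is hypothetical, and the disk order values (0 for the original disk) are an assumption; the sketch only shows that every SMUidx yields the complete SMU, read from different positions on the disks.

```python
def blocks_seen(smu_no, smu_idx, n):
    """Blocks read when position smu_idx of SMU smu_no is accessed on every disk.

    The disk with order `order` stores at position smu_idx the block whose
    SMUidx on the original disk is (smu_idx + order) mod n.
    """
    return [smu_no * n + (smu_idx + order) % n for order in range(n)]


# Three hosts, each assigned a different SMUidx, all reading SMU 1.
for host, smu_idx in enumerate(range(3)):
    print(f"host {host} (SMUidx {smu_idx}): {blocks_seen(1, smu_idx, 3)}")
```

With three disks, the three hosts receive [3, 4, 5], [4, 5, 3], and [5, 3, 4] respectively: the same data, fetched from different block positions on each disk.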

FIG. 5 shows an embodiment of a write operation in the redundant data storage scheme according to the present invention.

In the RAID-SM of the present invention, update and write operations access blocks using the mapping algorithm defined above.

That is, in a data input/output request, the data address for an update or write is given as an address (541) on the original disk (510). The update or write (550) of the data on the redundant disks (520, 530), which must take place atomically at the same time, is therefore performed after the target data addresses (542, 543) have been found with the mapping formula defined above.

The overall processing of such a write operation is the same as in RAID1. However, as shown in FIG. 5, the disk addresses (542, 543) of the same redundant data differ on each of the disks (510, 520, 530) constituting the RAID-SM, so the exact position is computed with the mapping formula defined above before the access.
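The write path can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: `write_block` is a hypothetical helper, disks are modeled as in-memory Python lists, the original disk is assumed to have order 0, and the atomicity required across the disks is not modeled.

```python
def write_block(disks, k, value):
    """Write value for original-disk block k to every disk; return the addresses.

    disks -- list of lists, one per disk, indexed by disk order (assumed)
    """
    n = len(disks)             # SMU size equals the number of disks
    smu_first = (k // n) * n   # first address of the SMU containing block k
    smu_idx = k % n
    targets = []
    for order, disk in enumerate(disks):
        # Same mapping rule as for reads: rotate by the disk's order value.
        address = smu_first + (smu_idx - order) % n
        disk[address] = value
        targets.append(address)
    return targets


disks = [[None] * 6 for _ in range(3)]
targets = write_block(disks, 4, "payload")
print(targets)  # [4, 3, 5]
```

The request names only the original-disk address (4 here); the addresses on the redundant disks (3 and 5) are derived from it, which mirrors the relationship between 541 and 542, 543 in FIG. 5.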

As described above, the redundant data storage system for large data according to the present invention maintains data reliability through redundant storage while providing the fast read performance of the distributed storage scheme. In read-intensive systems handling large data, it can therefore efficiently support data availability and recovery together with excellent input/output performance.

What has been described above is merely one embodiment for implementing the redundant data storage system for large data according to the present invention. The present invention is not limited to this embodiment, and its technical spirit extends to the range within which anyone with ordinary knowledge in the field to which the invention belongs could make various modifications without departing from the gist of the invention as claimed in the following claims.

FIG. 1 shows the data block arrangement of the general redundant data storage scheme (RAID1).

FIG. 2 shows the data block arrangement of the redundant data storage scheme (RAID-SM) according to the present invention.

FIG. 3 shows a generalized on-disk arrangement of original data and redundant data in the redundant data storage scheme according to the present invention.

FIG. 4 shows an embodiment of a read operation in the redundant data storage scheme according to the present invention.

FIG. 5 shows an embodiment of a write operation in the redundant data storage scheme according to the present invention.

<Description of Symbols for Main Parts of Drawings>

110, 210, 310, 410, 510: original disk

111: original data block

120; 130, 220; 230, 320; 330; 340, 420; 430, 520; 530: redundant disk

121, 131: redundant data block

211, 221, 231: SMU

250: SMUidx

260: SMUno

311: original disk constituting the RAID-SM

331: (i-1)-th redundant disk constituting the RAID-SM

350: SMUidx(k) of the original disk

360: data block (k) of the original disk

380: index of the SMU

390: SMUno(k) in the RAID-SM

440: data blocks accessed in a read operation under the general redundant storage scheme

450: data blocks accessed in a read operation under the RAID-SM scheme

540: target data block of a write operation under the RAID-SM scheme

541, 542, 543: target blocks to be updated by the write operation

Claims

A redundant data storage system for large data, comprising an original disk storing original data and a plurality of redundant disks storing copies of the original data,

wherein SMUs are created for the original disk and the redundant disks by grouping as many consecutive data blocks as the number of disks, an SMUno value is assigned to each SMU according to the order of the SMUs within each disk, an SMUidx value is assigned to each data block according to the order of the data blocks within each SMU, and the same data blocks in the SMUs having the same SMUno value have a different arrangement order on each disk.

The redundant data storage system of claim 1,

wherein, when a read operation is performed, each disk is accessed with the same SMUno value and the same SMUidx value, allowing parallel access to as many data blocks as the SMU size.

The redundant data storage system of claim 1, wherein, in mapping the same data block within the SMUs having the same SMUno value on each disk,

if order(i) is less than or equal to SMUidx(k), the redundant data block identical to block(k) of the original disk is placed at block (SMUidx(k) - order(i)) of disk i, and if order(i) is greater than SMUidx(k), it is placed at the position reduced from the SMU size by (order(i) - SMUidx(k)),

where block(k) is the k-th block of the original disk,

SMUidx(k) is the SMUidx value of the k-th block, and

order(i) is the order value of disk i.

The redundant data storage system for large data, wherein, upon a data access request from a host, an SMUno value and an SMUidx value are allocated so that the requested data blocks are accessed simultaneously from all of the disks, thereby providing the read performance of the data distributed storage scheme.