
KR20120090320A - Method for effective data recovery in distributed file system

Info

Publication number: KR20120090320A
Application number: KR1020110010688A
Authority: KR
Inventors: 김홍모; 황지수; 박수호; 이대우; 홍윤정; 이숙영; 임승현; 한준희; 하용호; 양정수; 조해공
Original assignee: 케이티하이텔 주식회사
Priority date: 2011-02-07
Filing date: 2011-02-07
Publication date: 2012-08-17
Also published as: KR101254179B1

Abstract

PURPOSE: To provide an efficient data recovery method for a distributed file system that reduces recovery time by arranging the participating data servers as a pipeline. CONSTITUTION: When a failed first data server is detected through a metadata server, the location information of the data servers that store the recovery data needed to rebuild the first data server is detected (S200). Recovery data is requested from those data servers and received, and a recovery operation is performed on it (S300). When the recovery operation has completed the recovery of the failed data server, the metadata server is notified that the recovery is complete (S400).

Description

Method for effective data recovery in distributed file system

The present invention relates to a data recovery method in a distributed file system, and more particularly to a method for efficient data recovery in a distributed file system in which data is divided into a plurality of slices and the divided data is stored on at least two servers.

Most of the data stored in conventional storage environments was business data generated by companies or institutions. With the recent rapid development of Internet technology, however, the share of stored multimedia data such as blogs, photos, and videos is growing quickly. In particular, large portal operators that provide Internet services at home and abroad generate, store, and manage several terabytes (TB) to several tens of terabytes of new data every month. Existing storage architectures have many problems in storage scalability and ease of management, and are therefore insufficient to cope with an ever-changing service environment.

Recent fundamental advances in storage systems and file systems stem from improvements in scalability and performance. Specifically, in terms of file system architecture, some systems separate the data input/output path of a file from its metadata management path in order to improve the scalability and performance of the distributed storage system. This architecture lets client systems access the storage devices directly and distributes the metadata, which removes the bottleneck caused by frequent file metadata access and increases storage scalability.

Enterprise-class storage solutions developed on this architecture include IBM's StorageTank, Panasas' ActiveScale Storage Cluster, Cluster File Systems' Lustre, and Google's Google File System. The Google File System, in particular, further improves availability by replicating the block data of a file to multiple data servers.

In such a network-based distributed file system environment, the client file system, the metadata server, and the data servers communicate over the network to provide data input and output. To access a particular file, a client obtains from the metadata server the location information of the blocks in which the actual data of the file is stored, then accesses the data server on which each block resides and reads the data of the block.

When a failure such as data corruption or the breakdown of a server (data server or metadata server) or a disk occurs in such a distributed file system, detecting it and recovering from it quickly and accurately is a very important issue that strongly affects the performance of the distributed file system.

To prepare for such failures, a distributed file system environment that supports multiple replication generally recovers failed blocks by one of the following methods, within a range that guarantees minimum availability.

The first method loads all block information into the memory of the metadata server and, when a failure occurs, collects the information on the failed blocks from memory and then recovers the blocks.

The second method stores all block information in a separate database and, when a failure occurs, collects the block information from the database and then recovers the blocks. Specifically, a dedicated database for storing block information is built on the metadata server and is updated whenever a block changes.

The third method keeps no separate block information. Whenever a failure occurs, it searches all metadata to collect the information on the failed blocks and then recovers the blocks.

As described, various recovery methods are being studied to prepare distributed file systems for failures. The table structure of a newly researched and designed distributed file system likewise needs a recovery method suited to that new structure if failure recovery is to become more efficient.

Accordingly, the present invention has been devised to solve the above problems, and its object is to provide a method for efficiently recovering data in a distributed file system having a new structure in which data is divided into a plurality of slices and the divided data is stored on at least two servers.

Another object of the present invention is to provide an efficient data recovery method in a distributed file system in which the data servers participating in data recovery form a single pipeline, and each data server performs the recovery operation on the data received from the preceding data server and outputs the result to the following data server, so that the work proceeds concurrently and the recovery time is reduced.

To achieve the above objects, an efficient data recovery method in a distributed file system according to the present invention comprises: (A) when a failed first data server is detected through a metadata server, detecting the location information of the data servers that store the recovery data to be reconstructed in order to recover from the failure of the detected first data server; (B) requesting recovery data from the data servers whose location information has been detected, receiving the recovery data, and performing recovery work through a recovery operation using the received recovery data; and (C) completing the recovery work for the failed data server through the recovery operation and notifying the metadata server that the recovery of the data server is complete.

Preferably, step (B) comprises: providing the location information of the detected recovery data servers to an arbitrary data server that takes charge of the failure recovery; transmitting, based on the provided location information, a recovery data request command to the recovery data servers that hold recovery data, and receiving the recovery data held in the local storage of each recovery data server; and receiving the recovery data until failure recovery of the first data server becomes possible, reconstructing the received recovery data, and then performing the recovery operation.

Preferably, step (B) comprises: transmitting a recovery data request command to the recovery data servers whose location information has been detected; receiving the recovery data held in the local storage of each recovery data server to which the recovery data request command has been transmitted; and receiving the recovery data until failure recovery of the first data server becomes possible, reconstructing the received recovery data, and then performing the recovery operation.

Preferably, step (B) comprises: providing the location information of the detected recovery data servers to the failed first data server; configuring, based on the provided location information, a single pipeline made up of the recovery data servers, and transmitting pipeline information that includes a recovery data request command to the next-stage recovery data server of the configured pipeline; reading, based on the transmitted pipeline information, the recovery data held in that server's own local storage, and transmitting the recovery data together with the pipeline information to the recovery data server at the next stage of the pipeline; reconstructing the recovery data transmitted from the preceding recovery data server with the data the receiving server itself holds, performing the recovery operation, and transmitting the resulting recovery operation data together with the pipeline information to the data server at the next stage of the pipeline through pipeline processing; reconstructing the transmitted recovery operation data with the data the receiving server itself holds, performing the recovery operation, and transmitting the resulting recovery operation data together with the pipeline information to the data server at the next stage of the pipeline through pipeline processing; and repeating this until, according to the configured pipeline information, no next-stage data server remains.

Preferably, step (B) comprises: providing the location information of the detected recovery data servers to any one of the recovery data servers whose location information has been detected; configuring, based on the provided location information, a single pipeline made up of the recovery data servers; reading, based on the configured pipeline information, the recovery data held in that server's own local storage, and transmitting the recovery data together with the pipeline information to the recovery data server at the next stage of the pipeline; reconstructing the transmitted recovery data with the data the receiving server itself holds, performing the recovery operation, and transmitting the resulting recovery operation data together with the pipeline information to the data server at the next stage of the pipeline through pipeline processing; reconstructing the transmitted recovery operation data with the data the receiving server itself holds, performing the recovery operation, and transmitting the resulting recovery operation data together with the pipeline information to the data server at the next stage of the pipeline through pipeline processing; and repeating this until, according to the configured pipeline information, no next-stage data server remains.
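The pipelined variant of step (B) can be illustrated with a minimal Python sketch. It is not the patented implementation: single XOR parity stands in for the recovery operation, and the names `pipeline_stage`, `recover_through_pipeline`, and `chunked` are hypothetical. Each stage combines the partial result from the preceding stage with its own local chunks and passes the new partial result on, so the output of the last stage is the rebuilt slice.

```python
from typing import Iterable, Iterator, List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks (stand-in recovery operation)."""
    return bytes(x ^ y for x, y in zip(a, b))


def pipeline_stage(local_chunks: Iterable[bytes],
                   upstream: Optional[Iterator[bytes]] = None) -> Iterator[bytes]:
    """One recovery data server in the pipeline (hypothetical model).

    The first stage only forwards its own chunks. Every later stage combines
    each chunk from the preceding stage with its own chunk and forwards it.
    """
    if upstream is None:
        yield from local_chunks
        return
    for mine, partial in zip(local_chunks, upstream):
        yield xor_bytes(mine, partial)


def recover_through_pipeline(stages: List[List[bytes]]) -> bytes:
    """Chain the stages in pipeline order and collect the final output."""
    stream: Optional[Iterator[bytes]] = None
    for local_chunks in stages:
        stream = pipeline_stage(local_chunks, stream)
    return b"".join(stream) if stream is not None else b""


def chunked(data: bytes, size: int) -> List[bytes]:
    """Split a slice into fixed-size chunks for transfer."""
    return [data[i:i + size] for i in range(0, len(data), size)]
```

Because the stages are chained generators, a chunk travels through every stage before the next chunk is read. This only models the data flow; on real servers the stages would work on different chunks at the same time, which is where the time saving described above comes from.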

As described above, the efficient data recovery method according to the present invention can efficiently recover data in a distributed file system having a new structure in which data is divided into a plurality of slices and the divided data is stored on at least two servers. In particular, by arranging the data servers that participate in data recovery into a single pipeline so that they work concurrently, the method effectively reduces the recovery time.

In addition to saving recovery time, the method uses network resources efficiently, which reduces the impact of the recovery load on the service.

FIG. 1 is a configuration diagram showing the overall structure of a distributed file system according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating a distributed storage method for data using parity data according to the present invention.
FIG. 3 is a diagram illustrating a distributed storage method for data according to an embodiment of the present invention.
FIGS. 4A, 4B, 5A, and 5B are configuration diagrams showing an efficient data recovery apparatus of a distributed file system according to the present invention.
FIGS. 4C and 5C are timing diagrams showing the recovery time of the data recovery operation in the embodiments.
FIG. 6 is a configuration diagram showing the structure of a data server in a distributed file system according to the present invention.
FIG. 7 is a flowchart illustrating an efficient data recovery method in a distributed file system according to an embodiment of the present invention.
FIGS. 8 and 9 are flowcharts illustrating methods of receiving recovery data and performing the recovery operation in the data recovery method of the present invention.
FIGS. 10 and 11 are diagrams illustrating a recovery operation method that applies Cauchy Reed-Solomon coding in the data recovery method of the present invention.

Other objects, features, and advantages of the present invention will become apparent from the detailed description of the embodiments given with reference to the accompanying drawings.

A preferred embodiment of the efficient data recovery method in a distributed file system according to the present invention is described below with reference to the accompanying drawings. The present invention is not limited to the embodiments disclosed below and may be implemented in various different forms; the embodiments are provided only to make the disclosure complete and to fully convey the scope of the invention to those of ordinary skill in the art. The embodiments described in this specification and the configurations shown in the drawings are therefore merely the most preferred embodiment of the present invention and do not represent all of its technical ideas, and it should be understood that various equivalents and modifications that could replace them may exist at the time of filing.

FIG. 1 is a configuration diagram showing the overall structure of a distributed file system according to an embodiment of the present invention.

As shown in FIG. 1, the distributed file system 100 comprises a plurality of data servers 110a to 110n and local storages 120a to 120n, each of which provides a separate storage space for storing data in the corresponding data server 110a to 110n.

For example, each of the data servers 110a to 110n constituting the distributed file system 100 can detect events that occur on the data stored in its own local storage 120a to 120n. Here, data may mean a file or a directory, and an event means the modification, deletion, or creation of a file or directory.

Accordingly, when an event occurs on data stored in the second local storage 120b, the second data server 110b transmits the data changed by the event to the other servers of the distributed file system 100, so that the same data state is maintained across the data servers as a whole.

When an event is detected through data input/output in the local storages 120a to 120n, which are hardware, the content of the data changed by the event is reported to an application. Here, the application may mean a program that runs on the data servers 110a to 110n constituting the distributed file system 100. That is, the distributed file system 100 detects events through data input/output by using a dedicated application implemented in the programming language of the service that each of the data servers 110a to 110n provides.

The distributed file system 100 can then use data input/output that is already implemented, based on the data format of the local storages 120a to 120n, which is determined by the operating system.

In this way, the distributed file system 100 can transmit changed data to other servers through the application so that the data change caused by an event is reflected on the other data servers. Conversely, the distributed file system 100 can receive changed data from another data server through the application, and the application can then apply the received changes to the local storage of its own data server. Each of the data servers 110a to 110n constituting the distributed file system 100 performs this process of transmitting and receiving data in the same way.

As for the data distribution method, as shown in FIG. 2, the original file input through a client is divided into units of the codeword size, which corresponds to one frame of the local storage in each data server.

FIG. 3 is a diagram illustrating a distributed storage method for data according to an embodiment of the present invention.

As shown in FIG. 3, one original file is divided into d pieces, and the divided data d_1 to d_d are arranged in sequence. The divided data is accompanied by p pieces of parity data (for ease of explanation, three pieces of parity data, p1 to p3, are shown here). The sum d + p is preferably equal to or less than the number of data servers.

The d + p pieces of divided data (sb1, sb2, ..., sbn), consisting of the divided data and the parity data, are transmitted to the data servers 110a to 110n respectively. The first divided data sb1 is stored in the first local storage 120a of the first data server 110a, the second divided data sb2 in the second local storage 120b of the second data server 110b, and the third divided data sb3 in the third local storage 120c of the third data server 110c. Continuing in this manner, the n-th divided data sbn is finally stored in the n-th local storage 120n of the n-th data server 110n.

The distributed file system 100 stores meta information, including the location information of the divided data stored in the local storages 120a to 120n of the data servers 110a to 110n, in a separate metadata server (MDS) 130.

As described above, in the distributed storage method for data using parity data according to the present invention, a single original file is divided into d pieces, and the pieces are stored separately in the local storages of the respective data servers.
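The slicing and placement described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the scheme of the specification: a single XOR parity slice (p = 1) stands in for the p parity pieces, slices are zero-padded to equal length, and the function names and server names are hypothetical.

```python
from typing import Dict, List


def split_into_slices(data: bytes, d: int) -> List[bytes]:
    """Divide the original file into d equal-length slices (zero-padded)."""
    if d <= 0:
        raise ValueError("d must be positive")
    slice_len = -(-len(data) // d)  # ceiling division
    padded = data.ljust(slice_len * d, b"\0")
    return [padded[i * slice_len:(i + 1) * slice_len] for i in range(d)]


def xor_parity(slices: List[bytes]) -> bytes:
    """Single XOR parity slice; a simplified stand-in for the p parity pieces."""
    parity = bytearray(len(slices[0]))
    for s in slices:
        for i, byte in enumerate(s):
            parity[i] ^= byte
    return bytes(parity)


def place_slices(slices: List[bytes], servers: List[str]) -> Dict[str, bytes]:
    """Store one slice per data server; d + p must not exceed the server count."""
    if len(slices) > len(servers):
        raise ValueError("more slices than data servers")
    return dict(zip(servers, slices))


def location_map(placement: Dict[str, bytes]) -> Dict[int, str]:
    """Slice index -> server name, the kind of record a metadata server keeps."""
    return {index: server for index, server in enumerate(placement)}
```

For example, `split_into_slices(b"hello world!", 3)` gives three 4-byte slices; appending `xor_parity(...)` and calling `place_slices` with four server names puts one slice on each server.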

The efficient data recovery method in the distributed file system according to the present invention, configured as above, is described below with reference to the accompanying drawings.

FIG. 7 is a flowchart illustrating an efficient data recovery method in a distributed file system according to an embodiment of the present invention.

Referring to FIG. 7, the failed data server 110 is first detected through the metadata server 130 (S100). A failure of the data server 110 can be detected when corrupted data is found during a read operation, when the data server 110 or the disk (local storage) 120 itself breaks down, or when the periodic checks that the metadata server 130 performs against the data server 110 reveal data whose checksum does not match, a lost network connection, abnormal termination of the data server process, a power fault, or the like.

When the failed data server 110 has been detected, the metadata server 130 detects the locations of the data servers that store the recovery data to be reconstructed in order to recover from the failure of the detected data server 110 (S200). The metadata server 130 manages the file namespace tree: it represents the hierarchy of directories and files, and it stores and manages file attributes and information such as the name, size, permissions, and location of each file. The metadata server 130 therefore manages the information on all data handled by all data servers and can collect the recovery information for the failure of a particular data server.

The data servers that the metadata server 130 has detected for the recovery transmit the intact recovery data to be used for the recovery, and the data server to be recovered receives it. The recovery work is then performed through a recovery operation using the received recovery data (S300). The recovery data can be received and the recovery operation performed in various ways, which are described in detail below with reference to the drawings.

The recovery operation performed in this way completes the recovery work for the failed data server 110, and the metadata server 130 is notified that the recovery of the data server 110 is complete (S400).
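The S100 to S400 flow can be summarized in a short Python sketch. It is only an outline under assumptions: the `MetadataServer` class, its method names, and the `fetch` and `decode` callables are hypothetical, and the actual failure detection and recovery operation are left to the caller.

```python
from typing import Callable, Dict, Iterable, List


class MetadataServer:
    """Hypothetical metadata server that records which server holds each slice."""

    def __init__(self, locations: Dict[int, str]) -> None:
        self.locations = dict(locations)  # slice index -> server name
        self.recovered: List[str] = []

    def detect_failed(self, alive: Iterable[str]) -> List[str]:
        """S100: servers that did not answer the periodic check."""
        alive_set = set(alive)
        return [s for s in self.locations.values() if s not in alive_set]

    def locate_recovery_servers(self, failed: str) -> List[str]:
        """S200: servers that hold the recovery data for the failed server."""
        return [s for s in self.locations.values() if s != failed]

    def report_recovered(self, server: str) -> None:
        """S400: record that recovery of the server is complete."""
        self.recovered.append(server)


def run_recovery(mds: MetadataServer, failed: str,
                 fetch: Callable[[str], bytes],
                 decode: Callable[[List[bytes]], bytes]) -> bytes:
    """Run S200 to S400 for one failed server and return the rebuilt slice."""
    sources = mds.locate_recovery_servers(failed)   # S200
    recovery_data = [fetch(s) for s in sources]     # S300: request and receive
    rebuilt = decode(recovery_data)                 # S300: recovery operation
    mds.report_recovered(failed)                    # S400
    return rebuilt
```

Passing `fetch` and `decode` in as parameters keeps the flow independent of how the recovery data is transferred and which recovery operation is used.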

The methods of receiving the recovery data and performing the recovery operation are described below through various embodiments, with reference to the drawings.

First Embodiment

FIG. 8 is a flowchart illustrating the method of receiving recovery data and performing the recovery operation according to the first embodiment of the data recovery method of the present invention, and FIG. 4A shows the first embodiment of that method.

Referring to the drawings, when the metadata server 130 has detected the locations of the recovery data servers (the second, fourth, fifth, ..., m-th, ..., n-th data servers) 110b, 110d, 110e, 110f, and 110h that store the recovery data for recovering from the failure of the failed data server (the first data server) 110a (S200), it provides the location information of the detected recovery data servers to the failed first data server 110a. The location information may be provided when the first data server 110a requests it from the metadata server 130, or it may be provided to the first data server 110a as soon as the failure is detected, without a separate request.

The first data server 110a then transmits a recovery data request command to the recovery data servers 110b, 110d, 110e, 110f, and 110h that hold the recovery data, based on the location information provided by the metadata server 130.

In response to the recovery data request command from the first data server 110a, each recovery data server reads the recovery data held in its local storage and transmits it to the first data server 110a (S301). The recovery data may be transmitted step by step, taking the system state of each data server into account. That is, once the second data server 110b has finished transmitting all the recovery data it holds to the first data server 110a, the fourth data server 110d transmits all the recovery data it holds to the first data server 110a, and so on: when one data server has finished transmitting its recovery data, the next data server starts transmitting in turn. The transmissions may also proceed in parallel. Because the network transfer capacity is the bottleneck, however, the difference between sequential and parallel transmission is not large.

Meanwhile, as recovery data arrives sequentially from each recovery data server, the first data server 110a checks whether enough recovery data to recover from its failure has been received, and continues receiving until all of the recovery data needed for failure recovery has arrived (S302).

The first data server then reassembles the received recovery data and performs the recovery operation (S303). The recovery operation is performed using the Cauchy Reed-Solomon method. Alternatively, it may be performed using the Reed-Solomon method.

Describing the recovery operation in more detail, as shown in FIG. 10, the first data server 110a generates a Cauchy bit matrix for recovery based on the received recovery data, reads the generated Cauchy bit matrix and the input data into a decoder to perform the recovery operation, and then divides the data into chunks of an appropriate size. The lost data is then recovered using all of the chunks. This is the well-known Cauchy Reed-Solomon recovery operation. The distinguishing feature here is that a known recovery operation such as Cauchy Reed-Solomon is used for efficient data recovery in a distributed file system with a new structure, in which data is divided into slices and the slices are stored on at least two servers.

Second Embodiment

FIG. 8 is a flowchart illustrating the method of receiving recovery data and performing the recovery operation according to the second embodiment of the data recovery method of the present invention, and FIG. 4B illustrates the second embodiment of that method.

Referring to the drawings, the metadata server 130 first detects the locations of the recovery data servers (the second data server, fourth data server, fifth data server, ..., m-th data server, ..., n-th data server) 110b, 110d, 110e, 110f, 110h that store the recovery data needed to recover the failed data server (the first data server) 110a (S200), and then transmits a recovery data request command to the detected recovery data servers 110b, 110d, 110e, 110f, 110h.

In response to the recovery data request command sent by the metadata server 130, each recovery data server reads the recovery data held in its local storage and transmits it to the first data server 110a (S301). The transmission of recovery data may proceed in stages, taking the system state of each data server into account. That is, once the second data server 110b has finished transmitting all of the recovery data it holds to the first data server 110a, the fourth data server 110d transmits all of the recovery data it holds to the first data server 110a. In this way, when one recovery data server completes its transmission, the next data server begins transmitting the recovery data it holds, in sequence. The transmissions may also proceed in parallel. However, because network transmission capacity is the bottleneck, there is little difference between proceeding sequentially and proceeding in parallel.

The first data server then reassembles the received recovery data and performs the recovery operation (S303). The recovery operation is performed using the Cauchy Reed-Solomon method described with reference to FIG. 10. Alternatively, it may be performed using the Reed-Solomon method.

FIG. 4C is a timing diagram showing the recovery work time of the data recovery operation in the first and second embodiments.

Because the first data server 110a receives the recovery data sequentially from each recovery data server, the recovery work time, as shown in FIG. 4C, can be expressed by Equation 1.

[Equation 1]

Recovery work time = (recovery data transfer time per recovery data server) × (number of participating recovery data servers) + (recovery operation time)

As Equation 1 shows, the total recovery work time is the total data transfer time, which is the product of the number of participating recovery data servers and the per-server recovery data transfer time, plus the recovery operation processing time that the first data server 110a needs to recover from the failure using the recovery data received from the recovery data servers.
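
As a rough numerical illustration of Equation 1, the sketch below evaluates the formula in Python. The function name, the parameter names, and the sample figures (five recovery data servers, 10 seconds per transfer, 4 seconds of recovery computation) are assumptions chosen for illustration and do not come from the specification.

```python
def sequential_recovery_time(transfer_time_per_server: float,
                             num_recovery_servers: int,
                             recovery_compute_time: float) -> float:
    """Equation 1: transfers happen one after another, then the decode runs."""
    total_transfer_time = transfer_time_per_server * num_recovery_servers
    return total_transfer_time + recovery_compute_time


# Hypothetical figures: 5 recovery data servers, 10 s per transfer, 4 s decode.
example_time = sequential_recovery_time(10.0, 5, 4.0)
print(example_time)  # 54.0
```

With these assumed figures the transfer term dominates, and it grows linearly with the number of participating servers.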

The third and fourth embodiments below describe a way to further reduce the recovery operation processing time of the failure recovery methods described in the first and second embodiments.

Third Embodiment

FIG. 9 is a flowchart illustrating the method of receiving recovery data and performing the recovery operation according to the third embodiment of the data recovery method of the present invention, and FIG. 5A illustrates the third embodiment of that method.

Referring to the drawings, the metadata server 130 first detects the locations of the recovery data servers (the second data server, fourth data server, fifth data server, ..., m-th data server, ..., n-th data server) 110b, 110d, 110e, 110f, 110h that store the recovery data needed to recover the failed data server (the first data server) 110a (S200), and then provides the location information of the detected recovery data servers to the failed first data server 110a. The location information may be provided when the first data server 110a requests it from the metadata server 130, or it may be provided to the first data server 110a as soon as the failure is detected, without any separate request.

The first data server 110a then configures a single pipeline made up of the recovery data servers, based on the location information provided by the metadata server 130. Using the configured pipeline, the first data server 110a transmits pipeline information, which includes the recovery data request command, to the recovery data server at the next stage (the second data server) 110b (S310).

Using the pipeline information sent by the first data server 110a, the second data server 110b reads the recovery data held in its local storage and transmits that recovery data together with the pipeline information to the fourth data server 110d, which is the next stage of the configured pipeline (S320).

The fourth data server 110d reassembles the recovery data sent by the second data server 110b with the data it holds itself and then performs the recovery operation (S330). The recovery operation uses a method that can maintain intermediate results in an erasure code, such as the Cauchy Reed-Solomon method. In the third embodiment, however, data transfer and the recovery operation take place simultaneously through pipeline processing, so bit operations suited to pipelined recovery must be performed.

That is, the bit operation method for pipelined recovery relies on the fact that XOR is associative: the partial result computed over the data stored at each node is accumulated, so that the pipeline processing is computed stage by stage and the results are then combined.
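
The role of XOR associativity can be seen in a minimal sketch. The example below deliberately uses a single XOR parity block rather than a Cauchy Reed-Solomon code, which is a simplifying assumption, and the block contents and function names are hypothetical. It shows that an accumulator passed from node to node yields the lost block regardless of the order in which the surviving nodes are chained.

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def pipelined_xor(blocks):
    """Pass an accumulator down the chain; each node folds in its own block."""
    acc = bytes(len(blocks[0]))       # all-zero accumulator
    for block in blocks:              # one iteration per pipeline stage
        acc = xor_blocks(acc, block)
    return acc


# Hypothetical stripe: three data blocks plus one XOR parity block.
d0, d1, d2 = b"slice-A!", b"slice-B!", b"slice-C!"
parity = reduce(xor_blocks, [d0, d1, d2])

# Suppose d1 is lost; the survivors are chained in an arbitrary order.
recovered = pipelined_xor([d0, parity, d2])
print(recovered == d1)  # True
```

Because each stage only needs the incoming accumulator and its own block, no single server has to collect all of the surviving blocks.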

In more detail, as shown in FIG. 11, each data server generates a Cauchy bit matrix for recovery based on the received recovery data, reads the generated Cauchy bit matrix and the input data, performs the recovery operation, and divides the data into chunks of an appropriate size. The first chunks are gathered and passed through the decoder to generate temporary parity data of the same size. The data server then runs the decoder on the chunk used at each stage together with the temporary parity data generated at the previous stage, producing new temporary parity data. This process is repeated at the data server located at the next stage of the pipeline. Buffering may be used during transfer to improve efficiency.
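
One way to picture the temporary parity update is as a bit-matrix multiplication over GF(2) followed by an XOR into the running parity. The sketch below is a minimal model under stated assumptions: the 2x2 bit matrices and packet values are arbitrary placeholders rather than a real Cauchy decoding matrix, and the function names are hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def apply_bit_matrix(matrix, packets):
    """Multiply a bit matrix by a column of packets over GF(2).

    Output packet i is the XOR of every input packet j with matrix[i][j] == 1.
    """
    size = len(packets[0])
    result = []
    for row in matrix:
        acc = bytes(size)
        for bit, packet in zip(row, packets):
            if bit:
                acc = xor_bytes(acc, packet)
        result.append(acc)
    return result


def pipeline_stage(temp_parity, matrix, local_packets):
    """Fold this server's partial result into the incoming temporary parity."""
    partial = apply_bit_matrix(matrix, local_packets)
    return [xor_bytes(t, p) for t, p in zip(temp_parity, partial)]


# Placeholder bit matrices and two-packet chunks for three servers.
# These are arbitrary values, NOT a real Cauchy decoding matrix.
stages = [
    ([[1, 0], [1, 1]], [b"\x0f\x01", b"\xf0\x02"]),
    ([[0, 1], [1, 0]], [b"\x33\x04", b"\xcc\x08"]),
    ([[1, 1], [0, 1]], [b"\x55\x10", b"\xaa\x20"]),
]

temp_parity = [bytes(2), bytes(2)]        # all-zero start of the pipeline
for matrix, local_packets in stages:      # one iteration per pipeline stage
    temp_parity = pipeline_stage(temp_parity, matrix, local_packets)
```

Each stage touches only its own chunk and the temporary parity handed to it, so the amount of data forwarded between stages stays constant along the pipeline.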

Next, using the pipeline information received from the preceding second data server 110b, the recovery operation data resulting from the recovery operation is transmitted together with the pipeline information to the fifth data server 110e, which is the next stage of the pipeline (S350).

A recovery data server participating in the recovery can receive data, perform the recovery operation, and send data out at the same time. In particular, receiving and sending can take full advantage of the full-duplex property of the network, so network resources are used efficiently. The recovery operation, for its part, can make efficient use of modern multi-core CPU resources. As a result, within a single data server the recovery operation data is transmitted at the same time as the recovery operation is performed.

The fifth data server 110e reassembles the recovery operation data sent by the fourth data server 110d with the data it holds itself and then, again as shown in FIG. 11, performs the recovery operation using a method that can maintain intermediate results in an erasure code, such as the Cauchy Reed-Solomon method (S330). Alternatively, the recovery operation may be performed using the Reed-Solomon method.

This recovery operation continues along the configured pipeline until no data server remains at the next stage (S340). When no further data server exists ahead, the server in question is the failed data server itself, which corresponds to the start of the pipeline.
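
The forwarding rule in steps S320 to S350 can be pictured as a loop over an ordered list of servers. The sketch below is a simplified single-process model: the `PipelineInfo` class, its fields, and the use of plain XOR as the per-stage recovery operation are illustrative assumptions and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PipelineInfo:
    """Hypothetical pipeline descriptor passed along with the data."""
    order: List[str]    # recovery data servers in pipeline order
    position: int = 0   # index of the server currently holding the data

    def next_server(self) -> Optional[str]:
        nxt = self.position + 1
        return self.order[nxt] if nxt < len(self.order) else None


def run_pipeline(info: PipelineInfo, local_data: Dict[str, bytes]) -> bytes:
    """Accumulate each server's contribution until no next stage exists."""
    accumulated = local_data[info.order[info.position]]    # S320: first read
    while info.next_server() is not None:                  # S340: stop check
        info.position += 1                                 # S350: forward
        held = local_data[info.order[info.position]]
        # S330: stand-in recovery operation (plain XOR, an assumption).
        accumulated = bytes(a ^ b for a, b in zip(accumulated, held))
    return accumulated  # handed to the failed server by the last stage


info = PipelineInfo(order=["110b", "110d", "110e"])
data = {"110b": b"\x01\x02", "110d": b"\x04\x08", "110e": b"\x10\x20"}
result = run_pipeline(info, data)
```

In a real deployment each loop iteration would run on a different server and the accumulated data would travel over the network together with the pipeline information.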

In this way, the failed first data server 110a does not receive recovery data sequentially from each of the recovery data servers holding it. Instead, it receives only the recovery operation result accumulated through the pipeline, delivered by the n-th data server 110h located at the last stage.

Fourth Embodiment

FIG. 9 is a flowchart illustrating the method of receiving recovery data and performing the recovery operation according to the third embodiment of the data recovery method of the present invention, and FIG. 5B illustrates the fourth embodiment of that method.

Referring to the drawings, the metadata server 130 first detects the locations of the recovery data servers (the second data server, fourth data server, fifth data server, ..., m-th data server, ..., n-th data server) 110b, 110d, 110e, 110f, 110h that store the recovery data needed to recover the failed data server (the first data server) 110a (S200), and then passes the location information of the detected recovery data servers to one of the detected recovery data servers 110b, 110d, 110e, 110f, 110h (here, the second data server 110b).

The second data server 110b then configures a single pipeline made up of the recovery data servers, based on the location information provided by the metadata server 130 (S310). The last stage of this pipeline is the failed first data server 110a.

The second data server 110b reads the recovery data held in its own local storage and transmits that recovery data together with the pipeline information to the fourth data server 110d, which is the next stage of the configured pipeline (S320).

The fourth data server 110d reassembles the recovery data sent by the second data server 110b with the data it holds itself and then, as shown in FIG. 11, performs the recovery operation using a method that can maintain intermediate results in an erasure code, such as the Cauchy Reed-Solomon method (S330). Alternatively, the recovery operation may be performed using the Reed-Solomon method.

Next, using the pipeline information received from the preceding second data server 110b, the fourth data server 110d transmits the recovery operation data resulting from its recovery operation together with the pipeline information to the fifth data server 110e, which is the next stage of the pipeline (S350).

In this way, the failed first data server 110a does not receive recovery data sequentially from each of the recovery data servers holding it. Instead, only the recovery operation result accumulated through the pipeline is delivered to it, by the n-th data server 110h located at the last stage.

FIG. 5C is a timing diagram showing the recovery work time of the data recovery operation in the third and fourth embodiments.

Because the transfer of recovery operation data and the recovery operation itself are thus carried out simultaneously along the pipeline on behalf of the first data server 110a, the recovery work time, as shown in FIG. 5C, can be expressed by Equation 2.

[Equation 2]

Recovery work time = (time for one data transfer) + (pipeline overhead) × (number of participating data nodes)

As Equation 2 shows, because receiving data, performing the recovery operation, and sending data are carried out simultaneously, the data transfers, the participating recovery data servers, and the recovery operation time are pipelined, which reduces the recovery work time. Here, the pipeline overhead means the time that does not overlap. Most of the recovery operation time overlaps, because it runs in parallel within the data transfer time.
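
As a rough numerical illustration of Equation 2, the sketch below evaluates the formula in Python for several pipeline lengths. The function name, the parameter names, and the sample figures (10 seconds for one transfer, 0.5 seconds of non-overlapped overhead per node) are assumptions chosen for illustration and do not come from the specification.

```python
def pipelined_recovery_time(single_transfer_time: float,
                            pipeline_overhead: float,
                            num_data_nodes: int) -> float:
    """Equation 2: one full transfer plus a non-overlapped overhead per node."""
    return single_transfer_time + pipeline_overhead * num_data_nodes


# Hypothetical figures: 10 s for one transfer, 0.5 s overhead per node.
for nodes in (5, 10, 20):
    print(nodes, pipelined_recovery_time(10.0, 0.5, nodes))
```

Under these assumed figures, adding a server increases the total only by the per-node overhead, whereas the transfer term of Equation 1 grows by a full transfer time for every added server.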

Although the technical idea of the present invention has been described in detail with reference to the preferred embodiments above, it should be noted that these embodiments are intended to illustrate the invention and not to limit it. Those of ordinary skill in the art will also understand that various embodiments are possible within the scope of the technical idea of the present invention. The true scope of technical protection of the present invention should therefore be determined by the technical idea of the appended claims.

Claims

(A) when a failed first data server is detected through the metadata server, detecting location information of the data servers that store the recovery data to be reassembled for recovering the failed first data server;
(B) requesting recovery data from the data servers whose location information has been detected, receiving the recovery data, and performing a recovery operation using the received recovery data; and
(C) when recovery of the failed data server is completed through the recovery operation, notifying the metadata server that recovery of the data server is complete.

The method of claim 1, wherein step (B) comprises:
providing location information of the detected recovery data servers to any data server responsible for the failure recovery;
transmitting a recovery data request command, based on the provided location information, to the recovery data servers holding the recovery data, and receiving the recovery data stored in the local storage of each recovery data server; and
receiving the recovery data until failure recovery of the first data server becomes possible, reassembling the received recovery data, and performing the recovery operation.

The method of claim 1, wherein step (B) comprises:
transmitting a recovery data request command to the recovery data servers whose location information has been detected;
receiving the recovery data held in the local storage of each recovery data server to which the recovery data request command was transmitted; and
receiving the recovery data until failure recovery of the first data server becomes possible, reassembling the received recovery data, and performing the recovery operation.

The method according to claim 2 or 3,
wherein the recovery data is received sequentially, such that when transmission of the recovery data held in one data server is completed, transmission of the recovery data held in the next data server takes place.

The method according to claim 2 or 3,
wherein the recovery data is received from the detected recovery data servers in parallel, based on the network transmission capacity.

The method according to claim 2 or 3,
wherein the recovery operation uses either the Reed-Solomon method or the Cauchy Reed-Solomon method.

The method of claim 1, wherein step (B) comprises:
providing location information of the detected recovery data servers to the failed first data server;
configuring a single pipeline composed of the recovery data servers based on the provided location information, and transmitting pipeline information including a recovery data request command to the recovery data server at the next stage of the configured pipeline;
reading the recovery data held in local storage based on the transmitted pipeline information, and transmitting the recovery data together with the pipeline information to the recovery data server at the next stage of the pipeline;
reassembling the recovery data transmitted from the recovery data server of the preceding stage with the locally held data, performing a recovery operation, and transmitting the resulting recovery operation data together with the pipeline information to the data server at the next stage of the pipeline;
reassembling the transmitted recovery operation data with the locally held data and, while performing the recovery operation, transmitting the resulting recovery operation data together with the pipeline information to the data server at the next stage of the pipeline through pipeline processing; and
continuing until, according to the configured pipeline information, no data server exists at the next stage.

The method of claim 1, wherein step (B) comprises:
providing location information of the detected recovery data servers to any one of the recovery data servers whose location information has been detected;
configuring a single pipeline composed of the recovery data servers based on the provided location information;
reading the recovery data held in local storage based on the configured pipeline information, and transmitting the recovery data together with the pipeline information to the recovery data server at the next stage of the pipeline;
reassembling the transmitted recovery data with the locally held data and, while performing the recovery operation, transmitting the resulting recovery operation data together with the pipeline information to the data server at the next stage of the pipeline through pipeline processing;
reassembling the transmitted recovery operation data with the locally held data and, while performing the recovery operation, transmitting the resulting recovery operation data together with the pipeline information to the data server at the next stage of the pipeline through pipeline processing; and
continuing until, according to the configured pipeline information, no data server exists at the next stage.

9. The method according to claim 7 or 8,
wherein the recovery operation uses either the Reed-Solomon method or the Cauchy Reed-Solomon method.