KR20200072128A

KR20200072128A - Distributed file system and file managing method for live service

Info

Publication number: KR20200072128A
Application number: KR1020180159946A
Authority: KR
Inventors: 오재원; 양승관; 윤병조; 김종주; 강진구; 노혜성; 황주비; 박기영
Original assignee: 네이버 주식회사
Priority date: 2018-12-12
Filing date: 2018-12-12
Publication date: 2020-06-22

Abstract

Provided are a distributed file system and a file management method for a live service. According to one embodiment of the present invention, a computing device included in the distributed file system for a live service may set an expiration time for each of the files requested to be stored in association with a live service, group the files requested to be stored into a plurality of directories identified by a deletion time and store the files using the set expiration time, and delete all files stored in a directory selected based on a current time and the deletion time.

Description

DISTRIBUTED FILE SYSTEM AND FILE MANAGING METHOD FOR LIVE SERVICE

The following description relates to a distributed file system and a file management method for a live service.

Distributed file systems are generally used to store files persistently over a network. They offer different functions and features depending on their purpose. For example, the Hadoop Distributed File System (HDFS) splits large files into multiple pieces and stores them in a distributed manner for big data processing, and provides a file append function. Ceph supports erasure coding, which is effective for reducing storage capacity, and SeaweedFS is characterized by fast handling of small files.

In contrast, HLS and MPEG-DASH, the protocols most widely used for live services, split video data into small TS segments. When a 2 Mbps video is served live for 4 hours, about 4,800 TS files of 750 KB each are generated. These files must be deleted because they are no longer needed once the live service ends.
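The segment count quoted above can be checked with a short calculation. The sketch below assumes decimal units (1 Mbps = 1,000,000 bits per second, 1 KB = 1,000 bytes); the variable names are illustrative only.

```python
# Worked check of the figures in the preceding paragraph.
BITRATE_BPS = 2_000_000        # 2 Mbps video stream
SEGMENT_BYTES = 750_000        # 750 KB per TS segment (decimal KB assumed)
DURATION_S = 4 * 60 * 60       # 4 hours of live service

bytes_per_second = BITRATE_BPS // 8                  # 250,000 bytes per second
segment_seconds = SEGMENT_BYTES / bytes_per_second   # playback time of one segment
segment_count = DURATION_S / segment_seconds         # segments in the whole broadcast

print(f"{segment_seconds:.0f} s per segment, {segment_count:.0f} segments")
```

Under these assumptions each segment holds 3 seconds of video, which yields the roughly 4,800 files mentioned in the text.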

Accordingly, there is a need for a live-dedicated distributed file system optimized for these characteristics of a live service, namely real-time processing of a large number of small files and automatic deletion.

Provided are a distributed file system that, in consideration of the volatility of files stored in a live service, can limit the files to be moved during scaling of the distributed file system to the latest files; a computer device included in the distributed file system; a file management method performed by the computer device; a computer program stored in a computer-readable recording medium to be combined with a computer device and cause the computer device to execute the file management method; and the recording medium.

Provided are a distributed file system that, in consideration of the volatility of files stored in a live service, can optimize file deletion through a directory structure in which files are grouped and stored by survival time; a computer device included in the distributed file system; a file management method performed by the computer device; a computer program stored in a computer-readable recording medium to be combined with a computer device and cause the computer device to execute the file management method; and the recording medium.

Provided are a distributed file system that can process the many small files of a live service more quickly by storing them through relatively large parallel volume files; a computer device included in the distributed file system; a file management method performed by the computer device; a computer program stored in a computer-readable recording medium to be combined with a computer device and cause the computer device to execute the file management method; and the recording medium.

Provided is a computer device included in a distributed file system for a live service, the computer device comprising at least one processor implemented to execute computer-readable instructions, wherein the at least one processor sets an expiration time for each of the files requested to be stored in association with the live service, uses the set expiration times to group and store the files requested to be stored into a plurality of directories identified by deletion times, and collectively deletes the files stored in a directory selected based on the current time and the deletion times.

Provided is a file management method performed by a computer device included in a distributed file system for a live service, the method comprising: setting, by at least one processor included in the computer device, an expiration time for each of the files requested to be stored in association with the live service; grouping and storing, by the at least one processor, the files requested to be stored into a plurality of directories identified by deletion times, using the set expiration times; and collectively deleting, by the at least one processor, the files stored in a directory selected based on the current time and the deletion times.
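The three steps of the method (set an expiration time, group files into directories identified by a deletion time, delete a whole directory at once) can be sketched as follows. This is a minimal illustration rather than the claimed implementation: the one-hour bucket size, the use of the deletion time as the directory name, and the function names `deletion_time`, `store`, and `purge` are all assumptions made for the example.

```python
import os
import shutil

BUCKET_SECONDS = 3600  # hypothetical directory granularity (assumption)

def deletion_time(now: int, ttl: int, bucket: int = BUCKET_SECONDS) -> int:
    """Round the expiry up to the next bucket boundary so no file is deleted early."""
    expires_at = now + ttl
    return -(-expires_at // bucket) * bucket  # ceiling division

def store(root: str, name: str, data: bytes, now: int, ttl: int) -> str:
    """Write the file into the directory named after its deletion time."""
    directory = os.path.join(root, str(deletion_time(now, ttl)))
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, name)
    with open(path, "wb") as f:
        f.write(data)
    return path

def purge(root: str, now: int) -> list:
    """Remove every directory whose deletion time has passed, one rmtree per group."""
    removed = []
    for entry in sorted(os.listdir(root)):
        if entry.isdigit() and int(entry) <= now:
            shutil.rmtree(os.path.join(root, entry))
            removed.append(entry)
    return removed
```

Because expired files share a directory, cleanup costs one directory removal per group rather than a scan over every file's individual expiry.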

Provided is a computer program stored in a computer-readable recording medium to be combined with a computer device and cause the computer device to execute the file management method.

Provided is a computer-readable recording medium on which a program for causing a computer device to execute the file management method is recorded.

In consideration of the volatility of files stored in a live service, the files to be moved during scaling of the distributed file system can be limited to the latest files.

In consideration of the volatility of files stored in a live service, file deletion can be optimized through a directory structure in which files are grouped and stored by survival time.

By storing the many small files of a live service through relatively large parallel volume files, the many small files of the live service can be processed more quickly.

FIG. 1 is a diagram showing an example configuration of a distributed file system according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of the roles of servers in an embodiment of the present invention.
FIG. 3 is a flowchart showing an example of a file storage process in an embodiment of the present invention.
FIG. 4 is an example for explaining a warm-up process in an embodiment of the present invention.
FIG. 5 is a diagram for explaining a direct method, a relay method, and a redirect method in an embodiment of the present invention.
FIG. 6 is a diagram showing an example of information stored and shared by ZooKeeper in an embodiment of the present invention.
FIG. 7 is a diagram showing an example of storing a plurality of files in one volume file in an embodiment of the present invention.
FIG. 8 is a diagram showing an example of deleting a group whose expiration time has passed in an embodiment of the present invention.
FIG. 9 is a diagram showing an example of a group-dedicated Redis in an embodiment of the present invention.
FIG. 10 is a diagram showing an example of a health check session between servers in an embodiment of the present invention.
FIG. 12 is a diagram showing an example of information retrieved through CLUSTER in an embodiment of the present invention.
FIG. 13 is a diagram showing an example of components of a distributed file system in an embodiment of the present invention.
FIG. 14 is a diagram showing an example of a thread model in an embodiment of the present invention.
FIG. 15 is a block diagram showing an example of a computer device according to an embodiment of the present invention.
FIG. 16 is a flowchart showing an example of a file management method according to an embodiment of the present invention.

Hereinafter, embodiments will be described in detail with reference to the accompanying drawings.

The distributed file system according to embodiments of the present invention is specialized for live services, but it faces the same problems as a general distributed file system. That is, it must consider the server model for file lookup (centralized or decentralized), the file replication scheme (replication or erasure coding), failure handling, scalability, and so on, and it must additionally deliver high performance for real-time processing of the live service.

The basic design philosophy of the distributed file system according to the embodiments is simplicity and performance. It is based on a fully decentralized model for scalability as a distributed file system, and for the various design choices the simpler option was taken so that the overall structure and concepts remain simple. Network IO (Input/Output), which can be a performance bottleneck on a machine, may be implemented with a multi-threaded Proactor pattern, and disk IO with parallel volume files, which are effective for handling many small files.

The distributed file system according to embodiments of the present invention may have the following features.

A. Decentralized

The distributed file system according to embodiments of the present invention may be implemented as a fully decentralized distributed file system in which no central server exists. Therefore, it is in theory infinitely expandable, expansion does not degrade performance because there is no bottleneck, and the failure of some servers does not lead to a failure of the whole system (No SPOF: No Single Point of Failure).

B. Simple & seamless scaling

System expansion in a fully decentralized model can trigger large-scale file movement, and file movement has a significant impact on system performance. For this reason, expansion is commonly performed only after the system has been completely stopped. System expansion and the resulting file movement can also complicate the system structure.

The distributed file system according to embodiments of the present invention may be used exclusively for live services, and because most accesses in a live service are concentrated on the latest files, the files to be moved can be limited to the latest files. By applying a warm-up scheme to scaling, the embodiments can perform predictable and uninterrupted expansion.

C. Practical & effective failover

In general, a distributed file system replicates files in preparation for failures, so the system remains available almost all the time even when some servers fail (High Availability). Failures occur very rarely, but they must be handled. However, failure handling is difficult and complex, and it can sometimes affect operation and performance in the normal state.

The distributed file system according to embodiments of the present invention may be implemented so that it does not recover from failures. Instead, it provides a simple way to access a file at the time the file is requested, and this method can be implemented simply without affecting operation and performance in the normal state.

D. Optimized for small, many & volatile files

The distributed file system according to embodiments of the present invention may introduce parallel volume files to process a large number of small files quickly, and may have a directory structure optimized for file deletion.

E. Minimal RESTful APIs (Application Program Interfaces)

The distributed file system according to embodiments of the present invention may not support a client SDK (Software Development Kit), which brings deployment and compatibility issues. Instead, it may support a RESTful API that is easy to use and independent of the development language. Four minimal APIs (PUT, GET, DEL, CLUSTER) are sufficient to use the distributed file system according to embodiments of the present invention efficiently.

1. System Architecture

A. Cluster

FIG. 1 is a diagram showing an example configuration of a distributed file system according to an embodiment of the present invention. FIG. 1 shows that the distributed file system 100 may include N groups. Each of the groups may include a plurality of machines (for example, three or more). For example, FIG. 1 shows group 1 (110) including three machines (111, 112, 113) and group N (120) likewise including three machines (121, 122, 123). The other groups included in the distributed file system 100 may be implemented similarly. Each machine may implement a server. FIG. 1 shows an example in which, within group 1 (110), machine 1-1 (111) implements server 1-1 (111-2), machine 1-2 (112) implements server 1-2 (112-2), and machine 1-3 (113) implements server 1-3 (113-2). Similarly, within group N (120), machine N-1 (121) implements server N-1 (121-2), machine N-2 (122) implements server N-2 (122-2), and machine N-3 (123) implements server N-3 (123-2). Depending on the embodiment, each machine may further implement a cache server. FIG. 1 shows an example in which the machines (111, 112, 113, 121, 122, 123) implement the cache servers (111-1, 112-1, 113-1, 121-1, 122-1, 123-1).

In addition, within one group, a single Redis may be implemented across the three machines. FIG. 1 shows an example in which Redis Sentinel 1 (114) is implemented for the three machines (111, 112, 113) of group 1 (110) and Redis Sentinel N (124) is implemented for the three machines of group N (120). A Redis Sentinel may likewise be implemented in each of the other groups included in the distributed file system 100.

In addition, ZooKeeper (130) may be implemented across the N groups.

The components of the distributed file system 100 and their operation are described in more detail below.

1) Group

To implement a fully decentralized model, the distributed file system according to embodiments of the present invention does not store the location of a file; instead it computes the location at runtime using the "jump-consistent-hash (jc-hash)" algorithm. The jc-hash algorithm is very useful for keeping key movement to a minimum and maintaining balance across servers when the number of servers changes.

A group represents the logical location at which a file is stored in the fully decentralized model, and the servers within the same group are the targets of replication.

To store the files under the same directory in the same group, a hash value is obtained from the directory name, excluding the file name, using the "fnv1a" hash algorithm (a lightweight hash algorithm that produces a 64-bit hash value). The jc-hash algorithm is then run with this hash value and the number of groups as inputs to obtain the index of the group in which the file will be stored. The process of obtaining the group index can be expressed as "Jc-hash(fnv1a(filedir), N) => 0~N-1". Here, "filedir" denotes the directory name excluding the file name, "fnv1a()" denotes the "fnv1a" hash algorithm, and "Jc-hash()" denotes the jc-hash algorithm; one group index is obtained from the group indices 0 to N-1 for the N groups.
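The expression "Jc-hash(fnv1a(filedir), N)" can be sketched in a few lines. The block below uses the standard 64-bit FNV-1a constants and the published jump consistent hash loop; the function name `group_index` and the use of `posixpath.dirname` to strip the file name are assumptions made for illustration.

```python
import posixpath

MASK64 = (1 << 64) - 1

def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: XOR each byte into the hash, then multiply by the FNV prime."""
    h = 0xCBF29CE484222325            # FNV offset basis
    for byte in data:
        h ^= byte
        h = (h * 0x100000001B3) & MASK64
    return h

def jump_consistent_hash(key: int, num_buckets: int) -> int:
    """Jump consistent hash: maps a 64-bit key to a bucket in [0, num_buckets)."""
    b, j = -1, 0
    while j < num_buckets:
        b = j
        key = (key * 2862933555777941757 + 1) & MASK64
        j = int((b + 1) * (float(1 << 31) / float((key >> 33) + 1)))
    return b

def group_index(path: str, num_groups: int) -> int:
    """Hash only the directory part so all files in a directory share a group."""
    filedir = posixpath.dirname(path)   # hypothetical way of dropping the file name
    return jump_consistent_hash(fnv1a_64(filedir.encode("utf-8")), num_groups)
```

Calling `group_index("/live/ch1/seg1.ts", 8)` and `group_index("/live/ch1/seg2.ts", 8)` returns the same index because both paths share the directory "/live/ch1". When the number of groups grows from N to N+1, a directory either keeps its group or moves to the new group N, which is the minimal key movement described above.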

2) Role

A group may contain three servers, and these are the targets of replication. Having more than three replicas was judged to be of little benefit, so the number is fixed at three for simplicity, but the embodiments are not limited thereto.

FIG. 2 is a diagram showing an example of the roles of servers in an embodiment of the present invention. In FIG. 2, M denotes a master server, S denotes a slave server, and F denotes a slave server set as a feeder. Here, "up" denotes a file upload, "down" denotes a file download, and "change" denotes a role that can change between servers. For example, a feeder may change its role to master server, and a slave server may change its role to master server or feeder. A master server may also change its role to feeder. At any given time the three servers perform different roles: one is the master server (Master, M), which is responsible for writes in the group, and the other two are slave servers (Slave, S), which store replicas. The slave server that performs data migration during resharding (or rebalancing) is defined as the feeder (Feeder, F).

The determination and switching of server roles may be performed safely using ZooKeeper.

1) Replication

When a file storage request arrives at a group, the master of the group may process the request. The file is stored on the local disk, and metadata such as the storage location and size of the file may be managed in memory for fast access.

FIG. 3 is a flowchart showing an example of a file storage process in an embodiment of the present invention. In response to a file storage request from a client, the master server stores the file in its local storage and then requests the slaves (the feeder and the slave server) to store replicas (copies of the file). If at least one replica is stored successfully, the master confirms the client's file storage request as successful by storing the metadata. If not even one replica can be stored, the file storage request is treated as a failure; in that case the metadata is not stored and the file may be discarded.
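The storage flow of FIG. 3 can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the patented server: the class names `Replica` and `Master`, the `healthy` flag that simulates a failed slave, and the dictionary-based storage are all hypothetical.

```python
class Replica:
    """Stand-in for a slave or feeder server (hypothetical in-memory model)."""
    def __init__(self, healthy: bool = True):
        self.healthy = healthy
        self.files = {}

    def store(self, name: str, data: bytes) -> bool:
        if not self.healthy:
            return False              # simulate a failed replica write
        self.files[name] = data
        return True

class Master:
    """Stand-in for the master server of a group."""
    def __init__(self, replicas):
        self.replicas = replicas
        self.files = {}               # local storage
        self.metadata = {}            # in-memory metadata, written only on success

    def put(self, name: str, data: bytes) -> bool:
        self.files[name] = data                                   # 1. store locally
        stored = sum(1 for r in self.replicas if r.store(name, data))  # 2. replicate
        if stored >= 1:                                           # 3. one replica suffices
            self.metadata[name] = {"size": len(data), "replicas": stored}
            return True
        del self.files[name]                                      # 4. failure: discard file
        return False
```

Writing the metadata last means a file becomes visible only after at least one replica exists, so a failed request leaves no trace in the master's index.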

1) Resharding (or Rebalancing)

In a distributed file system, the relocation of keys that follows the expansion or contraction of a cluster is called resharding (or rebalancing). Resharding can affect both the performance and the complexity of a distributed file system. In the distributed file system according to embodiments of the present invention, resharding is designed, in consideration of the characteristics of a live service, to remain simple while maintaining high performance.

Because a live service is a real-time service, its live data is no longer needed after a certain time and must be deleted (the volatility of live files). Considering the live time machine function (the ability to move back to a past time and play during a live broadcast, also called the DVR (Digital Video Recorder) function), the retention time of a live file is about 1 to 4 hours, so the files subject to migration during resharding can be regarded as those from roughly the last 4 hours.

FIG. 4 is an example for explaining a warm-up process in an embodiment of the present invention. FIG. 4 shows a situation in which new groups (G5, G6) are to be added to the service groups (G1, G2, G3, G4) of a distributed file system according to an embodiment of the present invention. In this case, the distributed file system may perform the migration first and change the cluster afterwards. The migration process is defined as warm-up, and the feeder in each group may perform feeding in response to replication requests. Feeding refers to the process of transferring to a new group the files that will be relocated after the warm-up. The warm-up may be performed for a period configured in the distributed file system; as described above, this period may be set to about 4 hours depending on the embodiment. When the warm-up ends and the cluster is changed, all the files that are needed will already have been relocated.
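The decision a feeder makes during warm-up can be sketched as follows: for each replicated file, compare the group computed for the current cluster size with the group computed for the planned size, and forward the file only when they differ. The function names `group_of` and `feed_targets` are hypothetical, and the hash helpers simply repeat the fnv1a and jc-hash computation described earlier so that the block is self-contained.

```python
import posixpath

MASK64 = (1 << 64) - 1

def fnv1a_64(data: bytes) -> int:
    h = 0xCBF29CE484222325
    for byte in data:
        h = ((h ^ byte) * 0x100000001B3) & MASK64
    return h

def jump_consistent_hash(key: int, num_buckets: int) -> int:
    b, j = -1, 0
    while j < num_buckets:
        b = j
        key = (key * 2862933555777941757 + 1) & MASK64
        j = int((b + 1) * (float(1 << 31) / float((key >> 33) + 1)))
    return b

def group_of(path: str, num_groups: int) -> int:
    """Group index (0-based) for the directory containing path."""
    return jump_consistent_hash(fnv1a_64(posixpath.dirname(path).encode()), num_groups)

def feed_targets(paths, old_groups: int, new_groups: int) -> dict:
    """Map each file that changes group after the resize to its future group.

    A feeder would forward only these files to the new groups during warm-up;
    files whose group is unchanged need no transfer.
    """
    targets = {}
    for path in paths:
        future = group_of(path, new_groups)
        if future != group_of(path, old_groups):
            targets[path] = future
    return targets
```

For the FIG. 4 scenario, `feed_targets(paths, 4, 6)` returns only files destined for the zero-based indices 4 and 5, corresponding to the new groups G5 and G6, because jump consistent hash never moves a key between existing groups when the cluster grows.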

Warm-up can be applied when the load of the cluster is monitored and the need for expansion can be predicted. If the cluster unavoidably has to be expanded in a hurry, the warm-up time can be shortened or the warm-up skipped. In that case the relocation of files is incomplete, so some files may be inaccessible. However, because a live service mostly accesses the latest files, the access problem can be regarded as affecting only the time machine service. In short, when cluster expansion is more urgent than the live time machine service, warm-up may be omitted.

2) Relay & Redirect

FIG. 5 is a diagram for explaining a direct method, a relay method, and a redirect method in an embodiment of the present invention. In the direct method, the client and a group (G1) exchange requests and files directly. In the relay method, the client and a group (G2) exchange requests and files through the relay of another group (G1). In the redirect method, a group (G1) tells the client which other group (G2) stores the required file, and the client then exchanges requests and files directly with that group (G2).

If a user request (PUT, GET, and so on) is delivered directly to the target group (the group that manages the file), it is processed immediately (direct). If the request arrives at another group, it must be passed on to the target group, and either the relay method or the redirect method may be used for this. The distributed file system according to embodiments of the present invention may support both methods.

The redirect method can be used only when the user can connect directly to the changed server address. The relay method, by contrast, can be used when the user cannot connect directly to a specific server.
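The three request paths can be contrasted with a toy model. Everything here is an illustrative assumption: the `Group` and `Response` classes, the `mode` switch, and the fact that the caller passes the target group index instead of computing it from the file path.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class Response:
    status: str                       # "ok" or "redirect"
    body: bytes = b""
    location: Optional[int] = None    # target group index for a redirect

class Group:
    """Hypothetical in-memory stand-in for one group of the cluster."""
    def __init__(self, index: int, cluster: Dict[int, "Group"], mode: str = "relay"):
        self.index = index
        self.cluster = cluster
        self.mode = mode              # how this group handles requests for other groups
        self.files: Dict[str, bytes] = {}
        cluster[index] = self

    def get(self, path: str, target: int) -> Response:
        if target == self.index:                  # direct: this group owns the file
            return Response("ok", self.files[path])
        if self.mode == "redirect":               # redirect: tell the client where to go
            return Response("redirect", location=target)
        return self.cluster[target].get(path, target)  # relay: fetch on the client's behalf

def client_get(cluster: Dict[int, Group], entry: int, path: str, target: int) -> bytes:
    """Client that follows at most one redirect."""
    resp = cluster[entry].get(path, target)
    if resp.status == "redirect":
        resp = cluster[resp.location].get(path, target)
    return resp.body
```

Relay keeps the client unaware of the cluster topology at the cost of an extra hop through the entry group, while redirect removes that hop but requires the client to reach the target group directly.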

Commercial servers are usually placed on a private network for security and accessed from outside through an L4 switch. General users can therefore use the relay method. If servers of other services use the distributed file system, those servers can be placed on the same network as the distributed file system, in which case the redirect method may be the better choice in terms of performance.

2) 주키퍼(ZooKeeper)2) ZooKeeper

클러스터 구성, 그룹별 역할, 서버의 실행 상태, 설정 등의 정보 공유를 위해 주키퍼가 사용될 수 있다. 주키퍼는 자체는 리더/팔로워(Leader/Followers) 패턴으로 구현된 중앙 서버 모델인데, 본 발명의 실시예들에 따른 분산 파일 시스템에서는 매우 작은 양의 고정된 정보만을 공유하므로, 주키퍼가 병목이 되지는 않는다.The main keeper can be used to share information such as cluster configuration, group-specific roles, server execution status, and settings. The main keeper itself is a central server model implemented in a leader/follower pattern. In the distributed file system according to embodiments of the present invention, only a very small amount of fixed information is shared, so the main keeper is the bottleneck. Does not work.

도 6은 본 발명의 일실시예에 있어서, 주키퍼가 저장 및 공유하는 정보의 예를 도시한 도면이다. 도 6에 도시된 바와 같이, 주키퍼는 클러스터(cluster), 설정(config), 머신(machine) 및 런타임(runtime)에 대한 정보를 저장할 수 있다.6 is a diagram showing an example of information stored and shared by the main keeper in an embodiment of the present invention. As illustrated in FIG. 6, the main keeper may store information about a cluster, a configuration, a machine, and a runtime.

(a) Cluster

ZooKeeper may store information on the groups that make up the cluster, information on the machines that make up each group, and whether resharding is in progress.

(b) Config

ZooKeeper may store the settings of each module (common, coordinator, fileManager, group).

(c) Machine

ZooKeeper may use the machine entry to record whether a server for the distributed file system is running and to prevent duplicate execution.

(d) Runtime

ZooKeeper may be used to determine the master, feeder, and slaves within a group at runtime and to share that information between groups.

2. File Management

As noted earlier, the files of a live service are small, numerous, and volatile. The distributed file system according to embodiments of the present invention may introduce parallel volume files to handle many small files efficiently, and may introduce a notion of time into the directory structure so that expired files are deleted automatically.

1) Volume File

When the number of files exceeds a certain level, the performance of the operating system and of the distributed file system may degrade. The distributed file system according to embodiments of the present invention stores multiple individual files in a single volume file, which reduces the file count and prevents the performance degradation caused by a large number of files.

FIG. 7 is a diagram illustrating an example of storing a plurality of files in one volume file in an embodiment of the present invention. As described above, embodiments of the present invention may introduce parallel volume files to process small files quickly. For example, with the HLS (HTTP Live Streaming) protocol, the one most widely used for live services, a 720p live stream (2 Mbps) with a TS duration of 3 seconds produces TS files of roughly 750 KB each, and with the default volume file size of 1 GB, one volume file can hold about 1,300 individual TS files. FIG. 7 shows an example of a volume file in which a plurality of files are stored. Data may be moved between the servers of the distributed file system in units of such volume files. For example, because the data migration for resharding described above is performed in units of volume files, the performance degradation caused by a large number of files can be prevented.
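The figures quoted above follow from simple arithmetic. The sketch below reproduces them under the assumption of decimal units (1 KB = 1,000 bytes, 1 GB = 10^9 bytes) and no per-file overhead inside the volume file; the function names are illustrative and do not come from the specification.

```python
import math


def segment_size_bytes(bitrate_bps: int, duration_s: float) -> float:
    """Size of one TS segment: bitrate (bits/s) x duration (s) / 8 bits per byte."""
    return bitrate_bps * duration_s / 8


def segments_per_volume(volume_bytes: int, segment_bytes: float) -> int:
    """Whole segments that fit in one volume file (per-file overhead ignored)."""
    return int(volume_bytes // segment_bytes)


def segments_in_broadcast(broadcast_s: int, duration_s: float) -> int:
    """Number of segments produced by a broadcast of the given length."""
    return int(broadcast_s // duration_s)


seg = segment_size_bytes(2_000_000, 3)          # 2 Mbps, 3-second segments
per_volume = segments_per_volume(10**9, seg)    # 1 GB volume file
total = segments_in_broadcast(4 * 3600, 3)      # 4-hour broadcast
volumes_needed = math.ceil(total / per_volume)

print(seg, per_volume, total, volumes_needed)
```

Under these assumptions a 4-hour broadcast that would otherwise create several thousand files on disk fits in a handful of volume files.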

2) Expiration

The distributed file system according to embodiments of the present invention may set and use a maximum expiration time, and every file may be deleted automatically within the maximum expiration time of its creation. If an expiration time (TTL) is set in the file storage request, the file may be deleted after the TTL; if no TTL is set, the file may be deleted once the maximum expiration time has passed. The deletion time of each file is thus determined by the maximum expiration time and the file's own expiration time. To perform automatic deletion efficiently, files may be grouped and stored in directories by time to live (deletion time), and the files of a group whose expiration time has passed may be deleted all at once.

FIG. 8 is a diagram illustrating an example of deleting a group whose expiration time has passed in an embodiment of the present invention. FIG. 8 shows files grouped by directory being deleted one directory at a time, that is, all files in a directory whose expiration time is earlier than the current time are deleted together.

3) Metadata

The directory of a file may be determined by its expiration time, and the file may be stored at a specific offset of the volume file allocated at storage time. Accessing a file requires managing its physical storage location, and this information is called metadata. Metadata may be managed in memory for fast file lookup. To prepare for failover, however, the master stores the metadata in Redis so that the servers in the group can refer to the metadata stored in Redis as needed.

4) Failover

FIG. 9 is a diagram illustrating an example of a group-dedicated Redis in an embodiment of the present invention. Servers belonging to the same group must keep identical file information at all times, both in normal operation and during failures. File information (metadata) is stored in the Redis dedicated to the group, and during failure recovery the file information is brought up to date through Redis. Storing file information in Redis may be performed by the master of the group. FIG. 9 shows an example in which server 1-1 (111-2), the master server, stores the metadata, that is, the file information of group 1 (110), in Redis Sentinel 1 (114).

The distributed file system according to embodiments of the present invention does not proactively recover files and metadata at failover. Instead, when a file read request arrives, it may query Redis to find the master that created the file and recover the file and its metadata from that server. This policy reflects the fact that in a live service most requests concentrate on new files. FIG. 9 shows that the slave servers, server 1-2 (112-2) and server 1-3 (113-2), can use the metadata stored in Redis Sentinel 1 (114). For example, in the event of a failure, server 1-2 (112-2) and server 1-3 (113-2) may use the metadata stored in Redis Sentinel 1 (114) to find server 1-1 (111-2), the master server, and recover files and metadata from server 1-1 (111-2).
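The recover-on-read policy described above can be sketched as follows. This is a minimal illustration, not the patented implementation: a plain dictionary stands in for the group's Redis, `fetch_from` is a hypothetical transport callback, and the `FileMeta` fields are assumed for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class FileMeta:
    volume: str    # volume file holding the data (assumed field)
    offset: int    # byte offset inside the volume file
    length: int    # size of the stored file in bytes
    creator: str   # server that originally stored the file


class LazyRecoveringServer:
    """Recovers a file only when a read for it arrives."""

    def __init__(self, shared_store: Dict[str, FileMeta],
                 fetch_from: Callable[[str, str], bytes]) -> None:
        self.shared_store = shared_store   # stand-in for the group's Redis
        self.fetch_from = fetch_from       # hypothetical server-to-server fetch
        self.local_meta: Dict[str, FileMeta] = {}
        self.local_data: Dict[str, bytes] = {}

    def read(self, name: str) -> Optional[bytes]:
        if name in self.local_data:            # already present locally
            return self.local_data[name]
        meta = self.shared_store.get(name)     # ask the shared store who created it
        if meta is None:
            return None                        # unknown file
        data = self.fetch_from(meta.creator, name)
        self.local_meta[name] = meta           # recover metadata and data together
        self.local_data[name] = data
        return data


# Demo: the shared store knows that "server-1-1" created the segment.
store = {"seg-001.ts": FileMeta("volume-0", 0, 4, "server-1-1")}
calls: List[Tuple[str, str]] = []


def fetch(server: str, name: str) -> bytes:
    calls.append((server, name))
    return b"DATA"


slave = LazyRecoveringServer(store, fetch)
first = slave.read("seg-001.ts")    # triggers recovery from the creator
second = slave.read("seg-001.ts")   # served locally, no second fetch
missing = slave.read("nope.ts")
```

Because old segments are rarely requested in a live service, deferring recovery to the first read avoids copying data that will expire before anyone asks for it.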

Because Redis is dedicated to a group, a separate Redis instance may exist for each group. Redis may store the metadata of the group, and since that data is far too small to warrant a Redis Cluster, Redis Sentinel, which performs better than Redis Cluster at small scale, may be applied. Redis Sentinel applies the Leader/Followers pattern: the leader handles all write and read requests, and when the leader fails, the sentinels may agree to elect one of the followers as the new leader.

The master may monitor the health of the slaves to select replication targets. A slave, in turn, may monitor the health of the master to determine whether it has become isolated; if it has, it may discard all of the metadata it manages in its own memory to prevent a metadata mismatch with the master. The slave can then read the latest information recorded in Redis, which prevents data inconsistency between servers. To detect each other's health as quickly as possible, the master and slaves may maintain health check sessions over TCP (Transmission Control Protocol). FIG. 10 is a diagram illustrating an example of health check sessions between servers in an embodiment of the present invention. As shown in FIG. 10, the master server may establish a health check session with each slave server and feeder of the same group, exchanging pings over TCP to monitor the health of the other servers.

5) Chunked Transfer

The distributed file system according to embodiments of the present invention may support HTTP chunked transfer. Chunked transfer makes it possible to send a file that is still being created, that is, a file whose size is not yet known, piece by piece as it is produced. The file can therefore be delivered sooner than if it were sent only after being completely created. Because latency is important in a live service, chunked transfer can reduce latency to some extent.
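For reference, HTTP/1.1 chunked transfer coding frames each piece of the body as its size in hexadecimal, CRLF, the data, and CRLF, and ends the body with a zero-size chunk. The sketch below shows only that framing; it is not the specification's implementation, and `encode_chunked` is an illustrative name.

```python
from typing import Iterable, Iterator


def encode_chunked(parts: Iterable[bytes]) -> Iterator[bytes]:
    """Frame each part as an HTTP/1.1 chunk as soon as it becomes available."""
    for part in parts:
        if not part:
            # A zero-length chunk would terminate the body early, so skip it.
            continue
        yield b"%x\r\n" % len(part) + part + b"\r\n"
    # Last chunk: size 0, followed by an empty trailer section.
    yield b"0\r\n\r\n"


# The sender never needs to know the total size in advance.
body = b"".join(encode_chunked([b"abc", b"", b"defgh"]))
print(body)
```

Because each chunk carries its own length, a segment can be forwarded while the encoder is still producing it.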

FIG. 11 is a diagram illustrating an example of volume files for chunked transfer in an embodiment of the present invention. When a volume file is allocated for storing a file, the start and end offsets of a fixed-size file within the volume file are known in advance, so any one of the available volume files may be allocated. When several fixed-size files are requested to be stored, they may be written to a single volume file concurrently. The size of a file sent by chunked transfer, by contrast, is unknown, so no other file can be stored in its allocated volume file until the chunked transfer completes. If another storage request arrives in the meantime, a different volume file with no chunked transfer in progress, or a new volume file, may be allocated.
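The allocation rule above can be sketched as a small allocator. This is one possible reading under stated assumptions: the class and method names are hypothetical, volumes are picked first-fit, and a chunked upload is assumed not to outgrow the volume's capacity.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Volume:
    name: str
    capacity: int
    used: int = 0                  # next free offset
    chunked_writer: bool = False   # True while a chunked upload owns the volume


class VolumeAllocator:
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.volumes: List[Volume] = []

    def _new_volume(self) -> Volume:
        vol = Volume(name=f"vol-{len(self.volumes)}", capacity=self.capacity)
        self.volumes.append(vol)
        return vol

    def allocate_fixed(self, size: int) -> Tuple[Volume, int]:
        """Reserve [offset, offset + size) up front; several fixed-size
        writes can then proceed in the same volume at the same time."""
        if size > self.capacity:
            raise ValueError("file larger than a volume")
        for vol in self.volumes:
            if not vol.chunked_writer and vol.capacity - vol.used >= size:
                break
        else:
            vol = self._new_volume()
        offset = vol.used
        vol.used += size
        return vol, offset

    def begin_chunked(self) -> Tuple[Volume, int]:
        """Size unknown: take exclusive ownership of a volume until done."""
        for vol in self.volumes:
            if not vol.chunked_writer and vol.used < vol.capacity:
                break
        else:
            vol = self._new_volume()
        vol.chunked_writer = True
        return vol, vol.used

    def end_chunked(self, vol: Volume, written: int) -> None:
        """Record the final size and release the volume to other writers."""
        vol.used += written
        vol.chunked_writer = False


alloc = VolumeAllocator(capacity=1000)
v1, o1 = alloc.allocate_fixed(100)   # vol-0, offset 0
v2, o2 = alloc.allocate_fixed(200)   # vol-0, offset 100
vc, oc = alloc.begin_chunked()       # vol-0 is now owned by the chunked upload
v3, o3 = alloc.allocate_fixed(50)    # must go to a different volume
alloc.end_chunked(vc, 400)           # upload finished after 400 bytes
v4, o4 = alloc.allocate_fixed(50)    # vol-0 is available again
```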

3. Interface

Most distributed file systems provide a client SDK (or library) for each programming language, and users develop programs that read and write files with that SDK. Some distributed file systems also provide a RESTful API, typically through a separate server that handles the RESTful API and accesses the distributed file system with the client SDK. A client SDK is useful because it handles the complex parts of the distributed file system on the user's behalf. Using a client SDK, however, raises deployment and compatibility issues, and these sometimes become the bigger problem. The deployment and compatibility issue here is that the user, not the system, decides when to apply an updated SDK, so the system cannot assume a particular SDK version and must therefore support every version that has been released.

The distributed file system according to embodiments of the present invention can be used easily through the RESTful API alone. General users can make full use of the distributed file system with the PUT, GET, and DELETE APIs. In addition, the CLUSTER API can be used to query the cluster information of the distributed file system and to access the server that manages a file directly, which makes use of the distributed file system more efficient.

1) APIs

(a) PUT / PUT2

PUT / PUT2 may be APIs for storing a file in the distributed file system. As already described, if a TTL option is specified for a file to be stored, the file may be deleted automatically from the distributed file system once the specified TTL has elapsed. PUT may operate in relay mode and PUT2 in redirect mode.

(b) GET / GET2

GET / GET2 may be APIs for reading a file stored in the distributed file system. GET may operate in relay mode and GET2 in redirect mode.

(c) DEL

DEL may be an API for deleting a file stored in the distributed file system. For example, a user may delete an individual file with DEL.

(d) CLUSTER

CLUSTER may be an API for querying the cluster information of the distributed file system. Using the jc-hash algorithm and the "fnv1a" hash algorithm, a user can obtain the location (group index) of a file and connect directly to the corresponding server. For example, in the "groups" field returned by CLUSTER, "m1" may hold the master server information, "m2" the feeder server information, and "m3" the slave server information. FIG. 12 is a diagram illustrating an example of the information returned by CLUSTER in an embodiment of the present invention.
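A client-side computation of the group index might look like the sketch below. It rests on several assumptions that the text does not confirm: "jc-hash" is read here as jump consistent hash, FNV-1a is used in its 64-bit variant, and the hash key is the UTF-8 file name. The function names are illustrative.

```python
FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = (1 << 64) - 1


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: XOR each byte into the hash, then multiply by the prime."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64
    return h


def jump_consistent_hash(key: int, num_buckets: int) -> int:
    """Jump consistent hash: maps a 64-bit key to a bucket in [0, num_buckets)."""
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    b, j = -1, 0
    while j < num_buckets:
        b = j
        key = (key * 2862933555777941757 + 1) & MASK64
        j = int((b + 1) * (float(1 << 31) / float((key >> 33) + 1)))
    return b


def group_index(file_name: str, num_groups: int) -> int:
    """Assumed composition: FNV-1a of the file name, then jump hash over groups."""
    return jump_consistent_hash(fnv1a_64(file_name.encode("utf-8")), num_groups)


idx = group_index("live/stream-1/seg-001.ts", 8)
print(idx)
```

A property worth noting if this reading is right: when a group is added, a key either stays in its group or moves to the new one, which keeps the amount of data moved during resharding small.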

2) HTTP Communication

The distributed file system according to embodiments of the present invention may communicate with external clients over the HTTP protocol (for example, v1.0 or v1.1). The distributed file system may also use HTTP between servers, and may use separate private commands (replication, feeding, health check, and so on).

3. Components

1) Overall View

FIG. 13 is a diagram illustrating an example of the components of the distributed file system in an embodiment of the present invention.

The Coordinator may analyze a user request (or an internal command of the distributed file system) delivered over HTTP and perform the defined operation in light of the cluster information and its role in the group, delegating file input/output commands to the FileManager.

The FileManager may process low-level file store and read commands. Actual file input/output may be performed through the VolumeManager, and metadata may be managed by the MetadataManager, which uses Redis.

2) Thread Model

FIG. 14 is a diagram illustrating an example of the thread model in an embodiment of the present invention. Getting the full performance out of a 10G NIC requires using every core of the processor and minimizing the time spent waiting on network IO and disk IO. The distributed file system according to embodiments of the present invention may apply the Multi-threaded Proactor Pattern, giving each module (HTTP, Coordinator, FileManager) its own thread pool and performing IO asynchronously so that no section blocks.

FIG. 15 is a block diagram illustrating an example of a computer device according to an embodiment of the present invention. Each of the machines included in the distributed file system for a live service according to this embodiment (for example, machines 111, 112, 113, 121, 122, and 123) may correspond to the computer device 1500 illustrated in FIG. 15, and the servers may be implemented through such a computer device 1500. The file management method according to an embodiment may likewise be performed by the computer device 1500 illustrated in FIG. 15.

As illustrated in FIG. 15, the computer device 1500 may include a memory 1510, a processor 1520, a communication interface 1530, and an input/output interface 1540. The memory 1510 is a computer-readable recording medium and may include random access memory (RAM), read only memory (ROM), and a permanent mass storage device such as a disk drive. A permanent mass storage device such as a ROM or disk drive may also be included in the computer device 1500 as a permanent storage device separate from the memory 1510. An operating system and at least one program code may be stored in the memory 1510. These software components may be loaded into the memory 1510 from a computer-readable recording medium separate from the memory 1510. Such a separate computer-readable recording medium may include a floppy drive, disk, tape, DVD/CD-ROM drive, memory card, or similar medium. In other embodiments, the software components may be loaded into the memory 1510 through the communication interface 1530 rather than from a computer-readable recording medium. For example, the software components may be loaded into the memory 1510 of the computer device 1500 based on a computer program installed from files received over the network 1560.

The processor 1520 may be configured to process the instructions of a computer program by performing basic arithmetic, logic, and input/output operations. The instructions may be provided to the processor 1520 by the memory 1510 or the communication interface 1530. For example, the processor 1520 may be configured to execute instructions received according to program code stored in a recording device such as the memory 1510.

The communication interface 1530 may provide a function for the computer device 1500 to communicate with other devices (for example, the storage devices described above) over the network 1560. For example, a request, command, data, or file generated by the processor 1520 of the computer device 1500 according to program code stored in a recording device such as the memory 1510 may be delivered to other devices over the network 1560 under the control of the communication interface 1530. Conversely, a signal, command, data, or file from another device may be received by the computer device 1500 through its communication interface 1530 via the network 1560. Signals, commands, and data received through the communication interface 1530 may be passed to the processor 1520 or the memory 1510, and files may be stored in a storage medium (the permanent storage device described above) that the computer device 1500 may further include.

The input/output interface 1540 may be a means of interfacing with an input/output device 1550. For example, an input device may include a microphone, keyboard, or mouse, and an output device may include a display or speaker. As another example, the input/output interface 1540 may be a means of interfacing with a device that integrates input and output functions, such as a touch screen. The input/output device 1550 may also be integrated with the computer device 1500 as a single device.

In other embodiments, the computer device 1500 may include fewer or more components than those shown in FIG. 15. There is, however, no need to illustrate most conventional components explicitly. For example, the computer device 1500 may be implemented to include at least some of the input/output devices 1550 described above, or may further include other components such as a transceiver or a database.

The communication method is not limited and may include not only methods that use the communication networks the network 1560 may include (for example, a mobile communication network, wired Internet, wireless Internet, or broadcast network) but also short-range wireless communication such as Bluetooth or NFC (Near Field Communication). For example, the network 1560 may include any one or more of a personal area network (PAN), a local area network (LAN), a campus area network (CAN), a metropolitan area network (MAN), a wide area network (WAN), a broadband network (BBN), and the Internet. The network 1560 may also include any one or more network topologies, including a bus network, a star network, a ring network, a mesh network, a star-bus network, and a tree or hierarchical network, but is not limited to these.

FIG. 16 is a flowchart illustrating an example of a file management method according to an embodiment of the present invention. The file management method according to this embodiment may be performed, for example, by the computer device 1500 described above. For example, the processor 1520 of the computer device 1500 may be implemented to execute control instructions according to the code of the operating system or of at least one program contained in the memory 1510. The processor 1520 may control the computer device 1500 so that it performs steps 1610 to 1630 of the method of FIG. 16 according to the control instructions provided by the code stored in the computer device 1500. The computer device 1500 may be included in the distributed file system.

In step 1610, the computer device 1500 may set an expiration time for each of the files requested to be stored in connection with the live service. In one embodiment, the computer device 1500 may set the expiration time of a first file requested to be stored either to the expiration time specified in the storage request for the first file or to the maximum expiration time preset in the distributed file system. For example, if the storage request for the first file specifies an expiration time, that value may be set as the expiration time of the first file. If the storage request specifies no expiration time, the expiration time of the first file may be set to the maximum expiration time preset in the distributed file system. As described above, the expiration time may be used to determine the deletion time of the file. For example, if a file is stored at 08:00 and its expiration time is 3 hours 30 minutes, its deletion time may be 11:30.
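Step 1610 can be sketched as a small function that picks the effective expiration time and derives the deletion time. The 24-hour maximum is a made-up value, the function names are illustrative, and capping a requested TTL at the maximum is an assumption drawn from the statement that every file is deleted within the maximum expiration time.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical system-wide maximum; the specification gives no concrete value.
MAX_EXPIRATION = timedelta(hours=24)


def effective_expiration(requested_ttl: Optional[timedelta],
                         max_expiration: timedelta = MAX_EXPIRATION) -> timedelta:
    """Use the TTL from the storage request if present, else the maximum."""
    if requested_ttl is None:
        return max_expiration
    # Assumption: a requested TTL longer than the maximum is capped.
    return min(requested_ttl, max_expiration)


def deletion_time(stored_at: datetime,
                  requested_ttl: Optional[timedelta] = None) -> datetime:
    """Deletion time = storage time + effective expiration time."""
    return stored_at + effective_expiration(requested_ttl)


stored_at = datetime(2018, 12, 12, 8, 0)
with_ttl = deletion_time(stored_at, timedelta(hours=3, minutes=30))
without_ttl = deletion_time(stored_at)
print(with_ttl, without_ttl)
```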

In step 1620, the computer device 1500 may use the set expiration times to group the files requested to be stored into a plurality of directories identified by deletion time and store them. In one embodiment, the computer device 1500 may determine the deletion time of each file from the expiration time set for it, and store files that have the same deletion time, or deletion times within a preset time range, in the same directory. For example, files whose deletion time is 11:00 may be stored in the same directory, which may then be identified by the deletion time "11:00". As another example, files whose deletion times fall within the range "11:01 to 11:30" may be stored in the same directory, which may then be identified by that time range.
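One way to realize the time-range grouping of step 1620 is to round each deletion time up to the end of its range and name the directory after that instant. The 30-minute range matches the example above; the directory name format and function names are assumptions made for this sketch.

```python
from datetime import datetime, timedelta

BUCKET = timedelta(minutes=30)   # range width taken from the 11:01-11:30 example
DIR_FORMAT = "%Y%m%d-%H%M"       # hypothetical directory naming scheme


def bucket_end(deletion_time: datetime, bucket: timedelta = BUCKET) -> datetime:
    """Round a deletion time up to the end of the range that contains it."""
    day_start = deletion_time.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = deletion_time - day_start
    buckets = -(-elapsed // bucket)   # ceiling division on timedeltas
    return day_start + buckets * bucket


def directory_for(deletion_time: datetime) -> str:
    """Directory shared by every file whose deletion time falls in the same range."""
    return bucket_end(deletion_time).strftime(DIR_FORMAT)


a = directory_for(datetime(2018, 12, 12, 11, 1))    # start of the 11:01-11:30 range
b = directory_for(datetime(2018, 12, 12, 11, 30))   # end of the same range
c = directory_for(datetime(2018, 12, 12, 11, 0))    # belongs to the previous range
print(a, b, c)
```

Rounding up means a file may live slightly longer than its TTL but is never removed early when the whole directory is deleted.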

이때, 파일들을 디렉토리로 그룹핑하여 저장하는 것은 파일들의 논리적인 저장을 의미할 수 있으며, 파일들의 물리적인 저장은 이후 설명되는 단계들(1621 내지 1624)를 통해 이루어질 수 있다. 단계(1625)는 물리적으로 저장된 파일들의 검색을 위한 과정을 설명한다.At this time, grouping and storing files into a directory may mean logical storage of files, and physical storage of files may be performed through steps 1621 to 1624 described later. Step 1625 describes a process for retrieving physically stored files.

단계(1621)에서 컴퓨터 장치(1500)는 복수의 볼륨 파일을 생성 및 저장할 수 있다. 예를 들어, 약 1300개의 개별 TS 파일들이 저장되는 하나의 볼륨 파일을 이미 설명한 바 있다. 이러한 컴퓨터 장치(1500)는 이처럼 각각 다수의 파일들을 저장할 수 있는 복수의 볼륨 파일들을 생성 및 저장할 수 있다.In operation 1621, the computer device 1500 may generate and store a plurality of volume files. For example, a volume file in which about 1300 individual TS files are stored has already been described. The computer device 1500 may generate and store a plurality of volume files, each of which can store multiple files.

단계(1622)에서 컴퓨터 장치(1500)는 저장 요청되는 파일들 각각을 위한 볼륨 파일을 할당할 수 있다.In step 1622, the computer device 1500 may allocate a volume file for each of the files requested to be stored.

단계(1623)에서 컴퓨터 장치(1500)는 저장 요청되는 파일들 각각을 할당된 볼륨 파일의 특정 오프셋에 저장할 수 있다.In operation 1623, the computer device 1500 may store each of the files requested to be stored at a specific offset of the assigned volume file.

파일의 논리적인 저장을 위해, 파일은 만료 시간에 따라 디렉토리가 결정될 수 있는 반면, 파일의 물리적인 저장을 위한 볼륨 파일의 할당은 다양한 방식으로 이루어질 수 있다. 예를 들어, 저장 요청되는 파일들은 하나의 볼륨 파일에 순차적으로 저장되거나 또는 랜덤하게 선택되는 볼륨 파일에 저장될 수 있다. 다른 예로, 저장 요청되는 파일들에 할당될 볼륨 파일들이 순차적으로 선택될 수도 있다.For logical storage of files, a directory may be determined according to an expiration time, while volume file allocation for physical storage of files may be performed in various ways. For example, files requested to be stored may be sequentially stored in one volume file or may be stored in a randomly selected volume file. As another example, volume files to be allocated to files to be stored may be sequentially selected.

In step 1624, the computer device 1500 may store metadata including the specific offset in the memory 1510.

In step 1625, the computer device 1500 may retrieve a file stored in the plurality of volume files by using the metadata stored in the memory 1510.

In this way, the metadata stored in the memory 1510 may enable fast lookup of the files stored in the volume files.
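Steps 1621 to 1625 can be sketched as a small Python class that appends files to volume files and keeps an in-memory index of their offsets. The class name `VolumeStore`, the `Location` record, the round-robin allocation, and the volume file naming are all assumptions for illustration; the description only states that the metadata includes the offset and is kept in memory.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Location:
    """In-memory metadata for one stored file (fields beyond offset are assumed)."""
    volume: str
    offset: int
    length: int


class VolumeStore:
    def __init__(self, root: str, volume_count: int = 4) -> None:
        # Step 1621: create the volume files up front.
        self.volumes: List[str] = [
            os.path.join(root, f"volume-{i}.dat") for i in range(volume_count)
        ]
        for path in self.volumes:
            open(path, "ab").close()
        self.index: Dict[str, Location] = {}
        self._next = 0

    def put(self, name: str, data: bytes) -> Location:
        # Step 1622: allocate a volume file (round-robin is an assumed policy).
        volume = self.volumes[self._next % len(self.volumes)]
        self._next += 1
        # Step 1623: the file is written at the current end of the volume file.
        offset = os.path.getsize(volume)
        with open(volume, "ab") as f:
            f.write(data)
        # Step 1624: record the offset in memory.
        location = Location(volume, offset, len(data))
        self.index[name] = location
        return location

    def get(self, name: str) -> bytes:
        # Step 1625: look up the offset in memory, then read directly.
        location = self.index[name]
        with open(location.volume, "rb") as f:
            f.seek(location.offset)
            return f.read(location.length)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        store = VolumeStore(root, volume_count=2)
        print(store.put("seg-1.ts", b"first segment"))
        print(store.get("seg-1.ts"))
```

A lookup needs one dictionary access and one seek, regardless of how many files a volume file holds, which is the reason the text gives for keeping the metadata in memory.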

Meanwhile, data movement between the servers included in the distributed file system may be performed in units of these volume files. As already described, the files of a live service are relatively small, numerous, and volatile. Also as already described, when the number of files in a distributed file system exceeds a certain level, the performance of the operating system and of the distributed file system may degrade. Therefore, by storing multiple individual files in a single volume file as in the present embodiment, the number of files can be reduced, and by moving data in units of volume files, performance degradation caused by the number of files can be prevented.
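As a rough worked example of the reduction in file count, the sketch below combines two figures from this description: about 4800 TS files for a 4-hour live stream at 2 Mbps, and about 1300 TS files per volume file. The function name `volume_files_needed` is hypothetical, and treating 1300 as a fixed per-volume capacity is an assumption.

```python
import math


def volume_files_needed(file_count: int, files_per_volume: int) -> int:
    """Number of volume files needed if each holds files_per_volume files."""
    return math.ceil(file_count / files_per_volume)


if __name__ == "__main__":
    ts_files = 4800      # TS files for a 4-hour, 2 Mbps live stream (from the description)
    per_volume = 1300    # approximate TS files per volume file (from the description)
    print(volume_files_needed(ts_files, per_volume))
```

Under these assumptions, the file system and any server-to-server transfer handle a handful of volume files rather than thousands of individual segments.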

In step 1630, the computer device 1500 may batch-delete the files stored in a directory selected based on the current time and the deletion time. For example, the computer device 1500 may batch-delete the files stored in a directory whose deletion time is the current time or after the current time.
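The batch deletion in step 1630 can be sketched as a periodic sweep over the deletion-time directories. This sketch assumes that a directory is selected once its deletion time has been reached (deletion time at or before the current time), and that directories are named with the hypothetical format `YYYYMMDD-HHMM`; neither detail is fixed by the description.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path
from typing import List

NAME_FORMAT = "%Y%m%d-%H%M"  # assumed directory naming convention


def purge_expired(root: Path, now: datetime) -> List[str]:
    """Remove every deletion-time directory whose time has been reached.

    Returns the names of the removed directories in sorted order.
    """
    removed: List[str] = []
    for directory in sorted(p for p in root.iterdir() if p.is_dir()):
        try:
            delete_at = datetime.strptime(directory.name, NAME_FORMAT)
        except ValueError:
            continue  # not a deletion-time directory; leave it alone
        if delete_at <= now:  # assumed selection rule
            shutil.rmtree(directory)  # one call removes all files in the group
            removed.append(directory.name)
    return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ("20181212-1100", "20181212-1130", "20181212-1200"):
            (root / name).mkdir()
        print(purge_expired(root, datetime(2018, 12, 12, 11, 30)))
```

The sweep inspects only directory names, so its cost depends on the number of directories rather than the number of files stored in them.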

As described above, according to embodiments of the present invention, the files to be moved when scaling the distributed file system can be limited to the most recent files, in consideration of the volatility of the files stored for a live service. In addition, in consideration of that volatility, file deletion can be optimized through a directory structure in which files are grouped and stored by survival time. Furthermore, by storing the relatively small and numerous files of a live service in relatively large parallel volume files, those small and numerous files can be processed more quickly.

The system or device described above may be implemented as hardware components or as a combination of hardware components and software components. For example, the devices and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and one or more software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, a single processing device is sometimes described, but a person of ordinary skill in the art will appreciate that a processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, or computer storage medium or device, in order to be interpreted by the processing device or to provide instructions or data to the processing device. Software may be distributed over computer systems connected by a network, and may be stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to an embodiment may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The medium may continuously store a computer-executable program, or may temporarily store it for execution or download. In addition, the medium may be any of various recording or storage means in the form of a single piece of hardware or several pieces of hardware combined; it is not limited to a medium directly connected to a particular computer system, and may be distributed over a network. Examples of the medium include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and ROM, RAM, flash memory, and the like, configured to store program instructions. Other examples of the medium include recording or storage media managed by app stores that distribute applications, and by sites or servers that supply or distribute various other software. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

Although the embodiments have been described above with reference to a limited number of embodiments and drawings, a person of ordinary skill in the art can make various modifications and variations from the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from that described, and/or the components of the described systems, structures, devices, circuits, and the like are combined in a form different from that described, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims that follow.

Claims

A computer device included in a distributed file system for a live service, the computer device comprising:
at least one processor implemented to execute instructions readable by the computer device,
wherein the at least one processor is configured to:
set an expiration time for each of the files requested to be stored in association with the live service;
group the files requested to be stored into a plurality of directories identified by a deletion time, using the set expiration time, and store the files; and
batch-delete the files stored in a directory selected based on a current time and the deletion time.

The computer device of claim 1,
wherein the at least one processor is configured to
set the expiration time of a first file requested to be stored to an expiration time specified in the storage request corresponding to the first file, or to a maximum expiration time preset in the distributed file system.

The computer device of claim 1,
wherein the at least one processor is configured to
determine the deletion time of each of the files according to the expiration time set for each of the files requested to be stored, and store files having the same deletion time, or deletion times within a preset time range, in the same directory.

The computer device of claim 1,
wherein the at least one processor is configured to
batch-delete the files stored in a directory whose deletion time is the current time or after the current time.

The computer device of claim 1,
wherein the at least one processor is configured to:
create and store a plurality of volume files;
allocate a volume file to each of the files requested to be stored;
store each of the files requested to be stored at a specific offset of the allocated volume file; and
store metadata including the specific offset in a memory further included in the computer device.

The computer device of claim 5,
wherein the at least one processor is configured to
retrieve a file stored in the plurality of volume files using the metadata stored in the memory.

The computer device of claim 5,
wherein data movement between servers included in the distributed file system is performed in units of the volume files.

A file management method performed by a computer device included in a distributed file system for a live service, the method comprising:
setting, by at least one processor included in the computer device, an expiration time for each of the files requested to be stored in association with the live service;
grouping, by the at least one processor, the files requested to be stored into a plurality of directories identified by a deletion time, using the set expiration time, and storing the files; and
batch-deleting, by the at least one processor, the files stored in a directory selected based on a current time and the deletion time.

The file management method of claim 8,
wherein the setting of the expiration time comprises
setting the expiration time of a first file requested to be stored to an expiration time specified in the storage request corresponding to the first file, or to a maximum expiration time preset in the distributed file system.

The file management method of claim 8,
wherein the grouping and storing into the plurality of directories comprises
determining the deletion time of each of the files according to the expiration time set for each of the files requested to be stored, and storing files having the same deletion time, or deletion times within a preset time range, in the same directory.

The file management method of claim 8, further comprising:
creating and storing a plurality of volume files;
allocating a volume file to each of the files requested to be stored;
storing each of the files requested to be stored at a specific offset of the allocated volume file; and
storing metadata including the specific offset in a memory further included in the computer device.

The file management method of claim 11,
wherein data movement between servers included in the distributed file system is performed in units of the volume files.

A computer program stored in a computer-readable recording medium, in combination with a computer device, for causing the computer device to execute the method of claim 8.

A computer-readable recording medium on which is recorded a computer program for causing a computer device to execute the method of any one of claims 8 to 12.