KR102377726B1

KR102377726B1 - Apparatus for controlling reproduction of file in distributed file system and method

Info

Publication number: KR102377726B1
Application number: KR1020150054305A
Authority: KR
Inventors: 차명훈; 김영균; 김홍연; 최완
Original assignee: 한국전자통신연구원
Priority date: 2015-04-17
Filing date: 2015-04-17
Publication date: 2022-03-24
Also published as: KR20160123748A

Abstract

An apparatus and method for controlling file replication in a distributed file system are disclosed. A file replication control apparatus in a distributed file system according to an embodiment of the present invention includes: a replication control unit that controls the number of replication threads based on the number of servers constituting the distributed file system and inserts file identification information into each of the replication threads; and a replication execution unit in which the replication threads replicate the file based on the file identification information.

Description

APPARATUS FOR CONTROLLING REPRODUCTION OF FILE IN DISTRIBUTED FILE SYSTEM AND METHOD

The present invention relates to a method of replicating files in a distributed file system and, more particularly, to a technique for redistributing the workload, when a large volume of internal tasks for managing the distributed file system is processed, according to dynamic increases or decreases in the number of worker threads that process those tasks.

A distributed file system is a structure in which, in an environment where many servers are connected by a network, metadata is managed by a small number of separately managed servers and data is distributed and stored across many data servers. A distributed file system operated with many servers is generally run on the assumption that servers fail frequently. Therefore, when a file is stored in the distributed file system, a replica of the original file is created automatically, so that even if the data server holding the file fails, the file's data can still be accessed normally as long as a server holding a spare replica is alive.

In a distributed file system that supports such automatic replication, where a replication queue exists in the metadata server and information on files newly stored in the distributed file system is inserted into that queue, work is processed as follows.

Each of the replication threads that carry out the actual replication work fetches, from the file information inserted into the replication queue, the information on the files that the thread must process; file information moved to a replication thread in this way is removed from the replication queue. A replication thread creates a replica of each file based on the file information fetched from the replication queue, and when creation of the replica is complete, that file information is removed from the replication thread.

Because replication efficiency rises as the number of replication threads increases, the administrator of a distributed file system can dynamically increase or decrease the number of replication threads depending on the situation. When the number of replication threads changes dynamically in this way, the file information that the replication threads previously fetched from the replication queue must be redistributed. This redistribution is similar to the work, in a typical distributed system environment, of relocating the information placed on each node according to the changed node configuration when nodes are added or removed.
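To see why a change in thread count forces redistribution, consider a minimal sketch in which each file is assigned to a thread by `file_id % thread_count`. The modulo rule, the integer file IDs, and the function names are assumptions made for illustration only; the source does not say how files are mapped to threads.

```python
def owner(file_id: int, thread_count: int) -> int:
    """Hypothetical assignment rule: index of the thread that handles file_id."""
    return file_id % thread_count


def moved_files(file_ids, old_count: int, new_count: int):
    """Return the file IDs whose owning thread changes with the thread count."""
    return [f for f in file_ids if owner(f, old_count) != owner(f, new_count)]


file_ids = list(range(12))
# Growing from 3 to 4 threads changes the owner of every file from ID 3 upward.
print(moved_files(file_ids, 3, 4))  # [3, 4, 5, 6, 7, 8, 9, 10, 11]
```

Under this assumed rule, adding a single thread reassigns nine of the twelve files, which is why information already held by the threads cannot simply stay where it is.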

Korean Patent Application Publication No. 2011-0057125 allows load balancers to be added in order to distribute increasing client requests in a data center to suitable hosts, and presents a method of using an artificial intelligence component to facilitate load distribution as load balancers are incorporated. US Patent Application Publication No. 2012-0290582, aiming to reduce bottlenecks in the distributed storage and management of a large number of key values, provides a method of storing a node list in two or more nodes, providing a load balancer, and selecting a node list in a round-robin manner. However, Korean Patent Application Publication No. 2011-0056125 and US Patent Application Publication No. 2012-0290582 do not mention a method of improving the replication efficiency of a distributed file system by managing, returning, and redistributing replication target resources in the distributed file system.

Accordingly, given the current trend of using distributed file systems in cloud systems and the like, there is a growing need for a technique that can greatly increase replication efficiency even when the number of servers constituting the distributed file system changes dynamically.

An object of the present invention is to perform file replication efficiently when the number of replication threads changes dynamically because the number of servers constituting the distributed file system changes.

Another object of the present invention is to enable efficient redistribution of work even when the number of servers constituting the distributed file system changes.

A further object of the present invention is to manage replication target resources efficiently even when the replication threads change dynamically.

To achieve the above objects, a file replication control apparatus in a distributed file system according to the present invention includes: a replication control unit that controls the number of replication threads based on the number of servers constituting the distributed file system and inserts file identification information into each of the replication threads; and a replication execution unit in which the replication threads replicate the file based on the file identification information.

Here, the replication control unit may include: a sensing unit that detects a change in the number of replication threads; a return unit that, when a change in the number of replication threads is detected, returns the file identification information stored in the existing replication threads to the replication queue; and an insertion unit that inserts the file identification information stored in the replication queue into the replication threads.

Here, the return unit may delete the file identification information stored in the existing replication threads after returning the file identification information to the replication queue.

Here, the insertion unit may delete the file identification information stored in the replication queue after inserting the file identification information into the replication threads.

In addition, a file replication control method in a distributed file system according to an embodiment of the present invention includes: controlling the number of replication threads based on the number of servers constituting the distributed file system; inserting a portion of the file identification information into each of the replication threads; and performing, by the replication threads, replication of the file based on the file identification information.

Here, the inserting of a portion of the file identification information may include: detecting a change in the number of replication threads; returning the file identification information stored in the existing replication threads to the replication queue; and inserting the file identification information stored in the replication queue into the replication threads.

Here, the returning to the replication queue may include deleting the file identification information stored in the existing replication threads after the file identification information is returned to the replication queue.

Here, the inserting into the replication threads may include deleting the file identification information stored in the replication queue after the file identification information is inserted into the replication threads.

According to the present invention, file replication can be performed efficiently when the number of replication threads changes dynamically because the number of servers constituting the distributed file system changes.

In addition, the present invention enables efficient redistribution of work even when the number of servers constituting the distributed file system changes, so the distributed file system can be used more effectively.

In addition, the present invention can manage replication target resources efficiently even when the replication threads change dynamically, so replication efficiency can be improved.

FIG. 1 is a block diagram illustrating a file replication control apparatus according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating an embodiment of the replication control unit shown in FIG. 1.
FIG. 3 is a diagram illustrating a replication queue and replication threads according to an embodiment of the present invention.
FIG. 4 is an operation flowchart illustrating the interaction between the replication queue and a replication thread in a file replication control method according to an embodiment of the present invention.
FIG. 5 is an operation flowchart illustrating a file replication control method according to an embodiment of the present invention.
FIG. 6 is an operation flowchart illustrating a file replication control method according to another embodiment of the present invention.

The present invention will now be described in detail with reference to the accompanying drawings. Repeated descriptions, and detailed descriptions of well-known functions and configurations that could unnecessarily obscure the gist of the present invention, are omitted. The embodiments of the present invention are provided to explain the present invention more completely to those with ordinary knowledge in the art. Accordingly, the shapes and sizes of elements in the drawings may be exaggerated for clarity.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram illustrating a file replication control apparatus according to an embodiment of the present invention.

Referring to FIG. 1, a file replication control apparatus according to an embodiment of the present invention includes a replication control unit 110 and a replication execution unit 120.

The replication control unit 110 controls the number of replication threads based on the number of servers constituting the distributed file system, and inserts file identification information into each of the replication threads.

Here, the number of replication threads is controlled based on the number of servers in the distributed file system. This is because, as the number of servers constituting the distributed file system grows, the number of files to be replicated can grow, and the number of replication threads performing replication grows accordingly. Conversely, as the number of servers constituting the distributed file system shrinks, the number of files to be replicated shrinks, and the number of replication threads performing replication shrinks accordingly.
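The text ties the thread count to the server count but gives no formula. The sketch below assumes a simple proportional policy with lower and upper bounds; `target_thread_count`, `threads_per_server`, `min_threads`, and `max_threads` are hypothetical names and values, not taken from the source.

```python
def target_thread_count(server_count: int,
                        threads_per_server: int = 1,
                        min_threads: int = 1,
                        max_threads: int = 64) -> int:
    """Hypothetical policy: scale replication threads with the server count."""
    if server_count < 0:
        raise ValueError("server_count must be non-negative")
    wanted = server_count * threads_per_server
    # Clamp so the system always has at least one thread and never too many.
    return max(min_threads, min(max_threads, wanted))


print(target_thread_count(4))    # 4
print(target_thread_count(0))    # 1  (never drops below min_threads)
print(target_thread_count(500))  # 64 (capped at max_threads)
```

Any monotonic mapping from server count to thread count would fit the description equally well; the clamp is only there to keep the illustration well behaved at the extremes.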

The location of the replication control unit 110 is not limited. It may reside on an independent server in the distributed file system, or on each of a plurality of servers.

The replication control unit 110 may detect a change in the number of replication threads.

When a change in the number of replication threads is detected, the replication control unit 110 may return the file identification information stored in the existing replication threads to the replication queue.

The returned file identification information may be stored in the replication queue.

When replication threads have been added to the existing replication threads, the replication queue may insert into each replication thread the file identification information that the thread can process. That is, when the number of replication threads increases, the work can be redistributed using the replication queue.

In addition, after the replication threads return the file identification information to the replication queue, the file identification information they previously held may be deleted.

In addition, after the replication queue inserts the file identification information back into the replication threads, the file identification information previously stored in the replication queue may be deleted.

When the number of existing replication threads has decreased, the replication queue may likewise insert into each replication thread the file identification information that the thread can process. That is, when the number of replication threads decreases, the work may also be redistributed using the replication queue.

The replication execution unit 120 replicates the file based on the file identification information inserted into the replication threads.

FIG. 2 is a block diagram illustrating an embodiment of the replication control unit shown in FIG. 1.

Referring to FIG. 2, the replication control unit 110 includes a sensing unit 210, a return unit 220, and an insertion unit 230.

The sensing unit 210 detects a change in the number of replication threads.

Here, the change in the number of replication threads may be influenced by the number of servers in the distributed file system. This is because, as the number of servers constituting the distributed file system grows, the number of files to be replicated can grow, and the number of replication threads performing replication grows accordingly. Conversely, as the number of servers constituting the distributed file system shrinks, the number of files to be replicated shrinks, and the number of replication threads performing replication shrinks accordingly.

When the sensing unit 210 detects a change in the number, the return unit 220 returns the file identification information stored in the existing replication threads to the replication queue.

The file identification information stored in the existing replication threads may be deleted after it has been returned to the replication queue.

At that point, no file identification information may remain in any of the replication threads.

At that point, all of the file identification information that was stored in the existing replication threads may be present in the replication queue.

The insertion unit 230 may insert the file identification information that the return unit 220 returned to the replication queue into the replication threads.

When file identification information is inserted into a replication thread, only the identification information of files that the thread can process may be inserted.

After the file identification information returned to the replication queue has been inserted into the replication threads, all of the file identification information present in the replication queue may be deleted.

The number of replication threads may change dynamically again. In that case, the sensing unit 210 shown in FIG. 2 recognizes the change in the number of replication threads again, and the return unit 220 and the insertion unit 230 may repeat this process.

When the number of existing replication threads has decreased, the replication queue may insert into each replication thread the file identification information that the thread can process. That is, when the number of replication threads decreases, the work may also be redistributed using the replication queue.
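The cooperation of the sensing unit, return unit, and insertion unit can be sketched as follows. This is a minimal single-process illustration, not the patented implementation: the class and method names are hypothetical, file identification information is reduced to an integer ID, and "files that a thread can process" is assumed to mean `file_id % thread_count`.

```python
from collections import deque


class ReplicationController:
    """Sketch of the sensing, return, and insertion units (210, 220, 230)."""

    def __init__(self, thread_count: int):
        self.queue = deque()                               # replication queue
        self.threads = [[] for _ in range(thread_count)]   # IDs held per thread
        self._known_count = thread_count

    def sense_change(self, new_count: int) -> bool:
        """Sensing unit: report whether the thread count differs."""
        return new_count != self._known_count

    def return_all(self) -> None:
        """Return unit: hand every held ID back to the queue, then clear threads."""
        for held in self.threads:
            self.queue.extend(held)
            held.clear()

    def insert_all(self) -> None:
        """Insertion unit: move queued IDs into the thread that can process them."""
        n = len(self.threads)
        while self.queue:
            file_id = self.queue.popleft()              # leaves the queue...
            self.threads[file_id % n].append(file_id)   # ...and enters a thread

    def resize(self, new_count: int) -> None:
        """Run the redistribution only when a change is sensed."""
        if not self.sense_change(new_count):
            return
        self.return_all()
        self.threads = [[] for _ in range(new_count)]
        self._known_count = new_count
        self.insert_all()


ctl = ReplicationController(thread_count=2)
ctl.queue.extend(range(6))
ctl.insert_all()
print(ctl.threads)                       # [[0, 2, 4], [1, 3, 5]]
ctl.resize(3)
print([sorted(t) for t in ctl.threads])  # [[0, 3], [1, 4], [2, 5]]
```

The same `resize` call covers both growth and shrinkage, and every ID is held in exactly one place (a thread or the queue) at each step, so nothing is lost or duplicated during redistribution.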

FIG. 3 is a diagram illustrating a replication queue and replication threads according to an embodiment of the present invention.

Referring to FIG. 3, a replication queue 301, replication threads 302 to 305, and file identification information 306 to 312 are shown.

Here, the replication queue 301 means a set of file identification information.

As FIG. 3 shows, the replication queue 301 is data that contains the file identification information 306 to 312.

The replication threads 302 to 305 are the units of execution flow that perform replication. In the present invention, each unit of a process that replicates a newly added file is referred to as a replication thread.

When the replication queue 301 inserts file identification information into the replication threads 302 to 305, it may insert only the identification information of files that the given replication thread can process.

Referring to FIG. 3, if there is file identification information 306 and 307 that the replication thread 302 can process, the replication queue 301 inserts the file identification information 306 and 307 into the replication thread 302.

If there is file identification information 308 that the replication thread 303 can process, the replication queue 301 inserts the file identification information 308 into the replication thread 303.

If there is file identification information 309 and 310 that the replication thread 304 can process, the replication queue 301 inserts the file identification information 309 and 310 into the replication thread 304.
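The FIG. 3 assignments can be written out as a small sketch that uses the reference numerals as plain identifiers. The `can_process` table is an assumption that encodes only the pairings stated in the text; the text does not say which thread handles items 311 and 312, so they stay in the queue here.

```python
# Reference numerals from FIG. 3 used as plain identifiers.
replication_queue = [306, 307, 308, 309, 310, 311, 312]
threads = {302: [], 303: [], 304: [], 305: []}

# Hypothetical eligibility table: only the pairings stated in the text.
can_process = {302: {306, 307}, 303: {308}, 304: {309, 310}}

for thread_id, held in threads.items():
    eligible = can_process.get(thread_id, set())
    for file_info in list(replication_queue):   # iterate over a copy
        if file_info in eligible:
            held.append(file_info)               # insert into the thread
            replication_queue.remove(file_info)  # and remove from the queue

print(threads)            # {302: [306, 307], 303: [308], 304: [309, 310], 305: []}
print(replication_queue)  # [311, 312]
```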

Here, the file identification information 306 to 312 means information on the files to be replicated.

The kinds of file information included in the file identification information 306 to 312 are not limited. For example, the file's extension and, where the file has tags, its tag information may be included in the file identification information.

The number of replication threads 302 to 305 may decrease or increase dynamically.

Each of the replication threads 302 to 305 may hold, on a per-thread basis, the file identification information fetched from the replication queue 301.

FIG. 4 is an operation flowchart illustrating the interaction between the replication queue and a replication thread in a file replication control method according to an embodiment of the present invention.

Referring to FIG. 4, a change in the number of replication threads is first detected (S401).

Here, the change in the number of replication threads is based on the number of servers in the distributed file system. This is because, as the number of servers constituting the distributed file system grows, the number of files to be replicated can grow, and the number of replication threads performing replication grows accordingly. Conversely, as the number of servers constituting the distributed file system shrinks, the number of files to be replicated shrinks, and the number of replication threads performing replication shrinks accordingly.

Next, the replication thread requests the replication queue to re-insert all of the file identification information stored in the thread (S402).

Next, the file identification information whose re-insertion was requested is inserted into the replication queue (S403).

Next, the file identification information stored in the replication thread is deleted (S404).

The file identification information stored in the replication thread is deleted because it will be inserted again from the replication queue into the replication threads.

Next, the file identification information is switched to a state in which it can be re-inserted into the replication threads (S405).

Finally, the replication threads retrieve file identification information from the replication queue (S406).
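The ordering of steps S402 to S406 can be sketched as below. The `ready` flag is one possible reading of the "state in which it can be re-inserted" at S405, and the `Entry` and `Worker` types, the `redistribute` function, and the modulo eligibility rule are all hypothetical; the source specifies only the order of the steps.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    file_id: int
    ready: bool = True   # assumed flag: True when a thread may retrieve it


@dataclass
class Worker:
    held: list = field(default_factory=list)


def redistribute(queue: list, workers: list, new_count: int) -> list:
    """Walk steps S402-S406 after a thread-count change was detected (S401)."""
    # S402-S403: each thread asks the queue to take back everything it holds.
    for w in workers:
        queue.extend(Entry(fid, ready=False) for fid in w.held)
    # S404: the threads drop their own copies.
    for w in workers:
        w.held.clear()
    # S405: returned entries become eligible for re-insertion.
    for e in queue:
        e.ready = True
    # S406: the resized set of threads retrieves entries from the queue.
    new_workers = [Worker() for _ in range(new_count)]
    for e in list(queue):
        if e.ready:
            new_workers[e.file_id % new_count].held.append(e.file_id)
            queue.remove(e)
    return new_workers


workers = [Worker([1, 2]), Worker([3])]
queue = [Entry(4)]
workers = redistribute(queue, workers, new_count=3)
print([w.held for w in workers])  # [[3], [4, 1], [2]]
```

Marking returned entries as not ready until the threads have cleared their own copies keeps an ID from being handed out again while a thread still holds it, which is one plausible motivation for placing S405 after S404.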

FIG. 5 is an operation flowchart illustrating a file replication control method according to an embodiment of the present invention.

Referring to FIG. 5, identification information for a stored file is first inserted into the replication queue (S501).

Next, the replication thread retrieves the file identification information that it can process (S502).

Next, the retrieved file identification information is deleted from the replication queue (S503).

Next, the file identification information is inserted into the replication thread (S504).

Next, a file replication command is transmitted based on the file identification information (S505).

Next, the file replication command is received (S506), and after the file is replicated, the replica is stored on a server other than the server where the original file exists (S507).

Next, the identification information of the file whose replication is complete is deleted from the replication thread (S508).

At that point, no file identification information may remain in any of the replication threads.
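Steps S501 to S508 can be sketched for a single replication thread as follows. The in-memory `servers` dictionary, the `(file_name, source_server)` tuples, the `replicate_once` function, and the choice of the alphabetically first other server as the target are all assumptions for illustration; the source requires only that the replica be stored on a server other than the one holding the original.

```python
def replicate_once(queue, thread_held, servers, can_process):
    """Run steps S502-S508 for one replication thread.

    queue       -- list of (file_name, source_server) inserted at S501
    thread_held -- list owned by the replication thread
    servers     -- dict mapping server name to the set of files it stores
    can_process -- predicate choosing which queue entries this thread takes
    """
    # S502-S504: retrieve eligible entries, delete them from the queue,
    # and insert them into the thread.
    taken = [item for item in queue if can_process(item)]
    for item in taken:
        queue.remove(item)
        thread_held.append(item)

    # S505-S507: replicate each file to a server other than the source.
    for file_name, source in list(thread_held):
        targets = sorted(s for s in servers if s != source)
        if not targets:
            continue  # no other server available; keep the entry for a retry
        servers[targets[0]].add(file_name)
        # S508: drop the entry once its replica exists.
        thread_held.remove((file_name, source))


servers = {"ds1": {"a.txt"}, "ds2": set(), "ds3": set()}
queue = [("a.txt", "ds1")]          # S501: info on a newly stored file
held = []
replicate_once(queue, held, servers, can_process=lambda item: True)
print(servers["ds2"])  # {'a.txt'}
print(queue, held)     # [] []
```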

FIG. 6 is an operation flowchart illustrating a file replication control method according to another embodiment of the present invention.

Referring to FIG. 6, the number of replication threads is first controlled (S601).

Next, file identification information is inserted into each of the replication threads (S602).

Here, when a change in the number of replication threads is detected, the file identification information stored in the existing replication threads may be returned to the replication queue.

When replication threads have been added to the existing replication threads, the replication queue may insert into each replication thread the file identification information that the thread can process. That is, when the number of replication threads increases, the work can be redistributed using the replication queue.

Next, file replication is performed based on the file identification information (S603).

As described above, the apparatus and method for controlling file replication in a distributed file system according to the present invention are not limited to the configurations and methods of the embodiments described above; rather, all or part of the embodiments may be selectively combined so that various modifications can be made.

301: replication queue
302, 303, 304, 305: replication thread
306, 307, 308, 309, 310, 311, 312: file identification information

Claims

A file replication control apparatus comprising:
a replication control unit that controls the number of replication threads based on the number of servers constituting a distributed file system, and inserts file identification information into each of the replication threads using a replication queue; and
a replication execution unit in which the replication threads replicate the file based on the file identification information,
wherein the replication control unit detects a change in the number of replication threads based on an increase or decrease in the number of servers of the distributed file system in proportion to the number of pieces of file identification information corresponding to the files to be replicated, and, when the change in the number of replication threads is detected, performs a task redistribution process of returning the file identification information stored in the existing replication threads to the replication queue and then reinserting the file identification information stored in the replication queue into the changed replication threads.

The apparatus according to claim 1, wherein the replication control unit comprises:
a sensing unit that detects a change in the number of replication threads;
a return unit that, when a change in the number of replication threads is detected, returns the file identification information stored in the existing replication threads to the replication queue; and
an insertion unit that inserts the file identification information stored in the replication queue into the changed replication threads.

The apparatus according to claim 2, wherein the return unit deletes the file identification information stored in the existing replication threads after returning the file identification information to the replication queue.

The apparatus according to claim 2, wherein the insertion unit deletes the file identification information stored in the replication queue after reinserting the file identification information into the changed replication threads.

A file replication control method comprising:
controlling the number of replication threads based on the number of servers constituting a distributed file system;
inserting a portion of file identification information into each of the replication threads; and
performing, by the replication threads, replication of the file based on the file identification information,
wherein the inserting of the portion of the file identification information comprises detecting a change in the number of replication threads based on an increase or decrease in the number of servers of the distributed file system in proportion to the number of pieces of file identification information corresponding to the files to be replicated, and, when the change in the number of replication threads is detected, performing a task redistribution process of returning the file identification information stored in the existing replication threads to the replication queue and then reinserting the file identification information stored in the replication queue into the changed replication threads.

(Deleted)

The method of claim 5, wherein, in the task redistribution process, the file identification information stored in the existing replication threads is deleted after the file identification information is returned to the replication queue.

The method of claim 5, wherein, in the task redistribution process, the file identification information stored in the replication queue is deleted after the file identification information is reinserted into the changed replication threads.