KR20130133989A

KR20130133989A - System and method for parallel file transfer between file storage clusters

Info

Publication number: KR20130133989A
Application number: KR1020120057168A
Authority: KR
Inventors: 허제민; 김광진
Original assignee: 삼성에스디에스 주식회사
Priority date: 2012-05-30
Filing date: 2012-05-30
Publication date: 2013-12-10
Also published as: KR101901266B1

Abstract

A system and method for parallel file transfer between file storage clusters are disclosed. The parallel file transfer system according to an embodiment of the present invention transfers files stored in a first file storage cluster to a second file storage cluster. The system includes a file transfer manager that receives information on the transfer target files stored in the first file storage cluster from the master server of the first file storage cluster and assigns the transfer target files to a plurality of file transfer units based on that information, and the plurality of file transfer units, each of which receives its assigned transfer target files from the first file storage cluster and stores them in the second file storage cluster. [Reference numerals] (110) File transfer manager; (112) First DFS client; (114, D1, D2) File transfer unit; (116, E1, E2) First DFS client; (118, F1, F2) Second DFS client; (120) Network; (AA) Relay server 1; (BB) Relay server 2; (CC) Relay server N; (GG) First DFS network; (II, RR) Slave 2; (JJ, QQ) Slave 1; (KK, PP) Slave N; (L1, L2, L3, U1, U2, U3) Data node; (M1, M2, M3, V1, V2, V3) Disk; (NN, TT) Metadata node; (OO) Second DFS network

Description

System and method for parallel file transfer between file storage clusters {SYSTEM AND METHOD FOR PARALLEL FILE TRANSFER BETWEEN FILE STORAGE CLUSTERS}

The present invention relates to technology for efficiently transferring files between file storage clusters.

A file storage cluster is a system in which multiple servers are grouped into a cluster to provide a file storage service. A file storage cluster is characterized in that, when a client requests to read or write a file, the servers in the cluster process the request in parallel. File storage clusters include distributed file systems (DFS) and object storage systems; representative examples are Hadoop HDFS, GlusterFS, Lustre, and Swift. Because file storage clusters can readily store and manage large volumes of data on the scale of tens of petabytes, they are widely used in the big data field.

In the enterprise market where such file storage clusters are used, stored data must sometimes be transferred to another file storage cluster, for reasons such as backing up the stored data, replacing the file storage cluster, or analyzing the stored data. However, because the data held in a file storage cluster is by nature very large, moving it with a general-purpose data migration method takes too much time. A technique is therefore needed to efficiently support data migration between file storage clusters.

Embodiments of the present invention aim to provide a means for effectively transferring data between file storage clusters by exploiting the characteristics of file storage clusters.

A parallel file transfer system between file storage clusters according to an embodiment of the present invention is a system for transferring files stored in a first file storage cluster to a second file storage cluster, and includes: a file transfer manager that receives, from a master server of the first file storage cluster, information on transfer target files stored in the first file storage cluster and assigns the transfer target files to a plurality of file transfer units based on that information; and the plurality of file transfer units, which receive the transfer target files assigned by the file transfer manager from the first file storage cluster and store the received transfer target files in the second file storage cluster.

In addition, a parallel file transfer method between file storage clusters according to an embodiment of the present invention is a method for transferring files stored in a first file storage cluster to a second file storage cluster, and includes: receiving, at a file transfer manager, information on transfer target files stored in the first file storage cluster from a master server of the first file storage cluster; assigning, at the file transfer manager, the transfer target files to a plurality of file transfer units based on the received information; and, at the plurality of file transfer units, receiving the transfer target files assigned by the file transfer manager from the first file storage cluster and storing the received transfer target files in the second file storage cluster.

According to embodiments of the present invention, bottlenecks can be prevented by increasing the number of nodes used for file transfer between file storage clusters. In addition, file transfer time can be shortened by distributing files efficiently across the transfer nodes, taking the file storage characteristics of the distributed file system into account, and processing the transfer jobs in parallel.

FIG. 1 is a diagram illustrating a parallel file transfer system 100 between file storage clusters according to a first embodiment of the present invention.
FIG. 2 is a flowchart illustrating a file transfer method 200 in the parallel file transfer system 100 between file storage clusters according to the first embodiment of the present invention.
FIG. 3 is a diagram illustrating a parallel file transfer system 300 between file storage clusters according to a second embodiment of the present invention.
FIG. 4 is a diagram illustrating a parallel file transfer system 400 between file storage clusters according to a third embodiment of the present invention.
FIG. 5 is a flowchart illustrating a file transfer method 500 in the parallel file transfer systems 300 and 400 between file storage clusters according to the second and third embodiments of the present invention.

Hereinafter, specific embodiments of the present invention are described with reference to the drawings. These are merely examples, however, and the present invention is not limited to them.

In describing the present invention, detailed descriptions of related known techniques are omitted where they would unnecessarily obscure the gist of the invention. The terms used below are defined in consideration of their functions in the present invention and may vary according to the intention or custom of users and operators. Their definitions should therefore be based on the content of this specification as a whole.

The technical idea of the present invention is determined by the claims, and the following embodiments are merely a means of efficiently explaining that idea to a person having ordinary skill in the art to which the invention belongs.

FIG. 1 is a diagram illustrating a parallel file transfer system 100 between file storage clusters according to a first embodiment of the present invention. As shown, the system 100 transfers files stored in a first file storage cluster 102 to a second file storage cluster 104 and includes a file relay device 106 and a plurality of relay servers 108. These components are connected to one another through a network 120 and exchange data over it. In this embodiment, the network 120 is configured to guarantee sufficient bandwidth between the first file storage cluster 102 and the second file storage cluster 104 for smooth data transfer. For example, if the first file storage cluster 102 in the illustrated embodiment has 10 slave servers, each with 1 Gbps of network performance, then at least 10 Gbps (shown as bold lines) must be guaranteed between the first DFS network and the second DFS network to achieve maximum performance.

The first file storage cluster 102 is the file storage cluster that holds the data to be transferred to the second file storage cluster 104. It is a file system in which a plurality of servers are grouped into a cluster through the first DFS network and files are stored distributed across multiple nodes; Hadoop's HDFS is one example.

As shown, the first file storage cluster 102 includes one master server and one or more slave servers. The master server includes a metadata node, which manages information on the files stored on each slave server. That is, the metadata node of the master server decides which slave server a file is stored on, and a client that wants to store a file writes it to disk through the data node of the slave server chosen by the metadata node. Conversely, when a client reads a file, it first obtains information about the file from the metadata node and then, according to that information, reads the file from disk through the data node of the corresponding slave server. Although the illustrated embodiment shows the metadata node as part of the master server, a file storage cluster without a master server, such as GlusterFS, has no metadata node as a distinct component. Even then, other components in the system are configured to perform the same function as the metadata node, so a metadata node should be regarded as logically present in that case as well.

The second file storage cluster 104 is the file storage cluster in which the data read from the first file storage cluster 102 is stored. Like the first file storage cluster 102, the second file storage cluster 104 includes one master server and one or more slave servers. The second file storage cluster 104 may be the same type of file storage cluster as the first file storage cluster 102 or may be implemented with a different type. In other words, the present invention is not limited to any particular type of file storage cluster.

The file relay device 106 provides overall control of the inter-cluster file transfers performed by the plurality of relay servers 108 described below, and includes a file transfer manager 110 and a first DFS client 112.

The file transfer manager 110 receives information on the transfer target files stored in the first file storage cluster 102 from the metadata node of the master server of the first file storage cluster 102 and, based on that information, assigns the transfer target files to the file transfer units 114 in the plurality of relay servers 108. After assigning the files, the file transfer manager 110 continues to schedule and monitor the file transfer jobs of the relay servers 108.

The information on a transfer target file includes the file's URL in the first file storage cluster 102 and its file size. Using this information, the file transfer manager 110 assigns the transfer target files to the plurality of file transfer units 114 so that the total size of the files assigned to each file transfer unit 114 is equal across units.

File allocation in the file transfer manager 110 is illustrated by the following example. First, assume that the files stored in the first file storage cluster 102 are all the same size or, if not exactly the same, sufficiently similar in size. In this case, the total capacity is computed from the size information of the individual files, and each file transfer unit 114 is assigned files amounting to "total capacity / number of file transfer units". For example, if there are 100 files of 1 GB each and 10 file transfer units 114, the total size is 100 GB, so 10 files are assigned to each file transfer unit 114. As an allocation method, the list of 100 file URLs may be divided into groups of 10, with each sublist delivered to a file transfer unit 114.
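The equal-size case above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `split_evenly` and the `dfs1://` URLs are hypothetical, and the sketch assumes that splitting the URL list into contiguous sublists of near-equal length is an acceptable way to give each file transfer unit its share.

```python
def split_evenly(urls, num_units):
    """Split a URL list into num_units contiguous sublists of near-equal length."""
    if num_units <= 0:
        raise ValueError("num_units must be positive")
    base, extra = divmod(len(urls), num_units)
    batches = []
    start = 0
    for i in range(num_units):
        # The first `extra` units take one additional file when the
        # list length is not an exact multiple of num_units.
        size = base + (1 if i < extra else 0)
        batches.append(urls[start:start + size])
        start += size
    return batches


# Hypothetical URLs standing in for 100 files of 1 GB each.
urls = [f"dfs1://cluster-a/data/file{i:03d}" for i in range(100)]
batches = split_evenly(urls, 10)
```

Each element of `batches` is the URL list that would be handed to one file transfer unit.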

If, on the other hand, the sizes of the files stored in the first file storage cluster 102 vary widely, the file transfer manager 110 sorts the transfer target files by size and assigns the sorted files to the file transfer units 114 in sequence, such that the assignment order in any given round is the reverse of the order in the previous round. This is explained more concretely below.

For example, assume that the first file storage cluster 102 holds 100 files whose sizes increase from 1 MB to 100 MB in 1 MB steps. To assign them to 10 file transfer units 114 named a through j, the files are first sorted by size in descending (or ascending) order. The sorted files are then placed on the file transfer units 114 in sequence: first in the order a to j, next in the order j to a, then again in the order a to j, and so on, so that the assignment order in any given round is the reverse of the order in the previous round. The result is shown in the following table.

Table 1. File sizes (MB) assigned to each file transfer unit

File transfer unit      a    b    c    d    e    f    g    h    i    j
                        1    2    3    4    5    6    7    8    9   10
                       20   19   18   17   16   15   14   13   12   11
                       21   22   23   24   25   26   27   28   29   30
                       40   39   38   37   36   35   34   33   32   31
                       41   42   43   44   45   46   47   48   49   50
                       60   59   58   57   56   55   54   53   52   51
                       61   62   63   64   65   66   67   68   69   70
                       80   79   78   77   76   75   74   73   72   71
                       81   82   83   84   85   86   87   88   89   90
                      100   99   98   97   96   95   94   93   92   91
Sum of sizes          505  505  505  505  505  505  505  505  505  505

When files are assigned in this way, as Table 1 shows, the total size of the files assigned to each file transfer unit 114 is evened out, so no file transfer unit 114 finishes its work much earlier or much later than the others, and the file transfer job can be carried out efficiently.
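The alternating-order assignment described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name `snake_allocate` is hypothetical, files are represented only by their sizes in MB, and the sort is descending (the text allows either direction).

```python
def snake_allocate(sizes, num_units):
    """Assign sizes to units round by round, reversing direction each round."""
    ordered = sorted(sizes, reverse=True)  # descending by file size
    units = [[] for _ in range(num_units)]
    for start in range(0, len(ordered), num_units):
        round_items = ordered[start:start + num_units]
        round_no = start // num_units
        if round_no % 2 == 0:
            targets = range(num_units)              # a -> j
        else:
            targets = range(num_units - 1, -1, -1)  # j -> a
        for unit_index, size in zip(targets, round_items):
            units[unit_index].append(size)
    return units


# The example from the text: 100 files of 1 MB .. 100 MB, 10 units.
units = snake_allocate(list(range(1, 101)), 10)
totals = [sum(u) for u in units]
```

Here `totals` holds the per-unit size sums, which can be compared to check how evenly the load is spread. Reversing direction every round pairs each unit's larger file in one round with a smaller file in the next, which is what keeps the sums close.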

The first DFS client 112 is the agent that the file transfer manager 110 uses to connect to the first file storage cluster 102 in order to receive the information on the transfer target files. That is, the file transfer manager 110 connects to the master server of the first file storage cluster 102 through the first DFS client 112 and receives the information on the transfer target files.

Next, each relay server 108 includes a file transfer unit 114, a first DFS client 116, and a second DFS client 118. The embodiments place no limit on the number of relay servers 108, but it is preferable to provide as many relay servers 108 as there are slave servers in the file storage cluster from which the files are transferred.

The file transfer unit 114 receives the transfer target files assigned by the file transfer manager 110 from the first file storage cluster 102 using the first DFS client 116, stores the received files in the second file storage cluster 104 using the second DFS client 118, and reports to the file transfer manager 110 when the file transfer is complete.

The first DFS client 116 is the agent that the relay server 108 uses to connect to the first file storage cluster 102 in order to read transfer target files, and the second DFS client 118 is the agent that the relay server 108 uses to connect to the second file storage cluster 104 in order to store transfer target files.

FIG. 2 is a flowchart illustrating a file transfer method 200 in the parallel file transfer system 100 between file storage clusters according to the first embodiment of the present invention.

First, the file transfer manager 110 requests and receives, through the first DFS client 112, information on the transfer target files stored in the first file storage cluster 102 from the metadata node of the master server of the first file storage cluster 102 (202). As described above, the information on a transfer target file includes the file's URL in the first file storage cluster 102 and its file size.

If the first file storage cluster 102 is HDFS-based, the file transfer manager 110 can obtain the file information through the API provided by the namenode. If, as with GlusterFS, the client supports POSIX (Portable Operating System Interface) and the cluster is mounted, the information on the transfer target files can be obtained with a command that returns file information, such as ls.
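For the POSIX-mounted case, gathering the path and size of each file can be sketched with ordinary file system calls. This is an assumption-laden illustration: the function name `list_transfer_targets` is hypothetical, a temporary directory stands in for the cluster's mount point, and the HDFS namenode API path is not shown.

```python
import os
import tempfile


def list_transfer_targets(mount_root):
    """Return sorted (path, size_in_bytes) pairs for every file under mount_root."""
    targets = []
    for dirpath, _dirnames, filenames in os.walk(mount_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            targets.append((path, os.path.getsize(path)))
    return sorted(targets)


# Demo on a temporary directory standing in for a mounted cluster.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "a.bin"), "wb") as f:
        f.write(b"x" * 10)
    os.mkdir(os.path.join(root, "sub"))
    with open(os.path.join(root, "sub", "b.bin"), "wb") as f:
        f.write(b"y" * 3)
    demo = list_transfer_targets(root)
    demo_rel = [(os.path.relpath(p, root), s) for p, s in demo]
```

The resulting path and size pairs are the same two pieces of information (location and file size) that the file transfer manager needs for allocation.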

Next, the file transfer manager 110 assigns the transfer target files to the plurality of file transfer units 114 based on the received information (204). Here, the file transfer manager 110 may assign the files so that the total size of the transfer target files assigned to each file transfer unit 114 is equal across units. Since the file allocation method was described in detail with reference to FIG. 1, its detailed description is omitted here.

Next, each file transfer unit 114 receives, through the first DFS client 116, the transfer target files assigned to it by the file transfer manager 110 from the first file storage cluster 102 (206). Here, the first DFS client 116 obtains from the metadata node the information on the slave server where the actual file data is stored, and then reads the transfer target file through the data node of that slave server. Step 206 and steps 208 and 210 described below are performed in parallel by the plurality of file transfer units 114.

Next, each file transfer unit 114 requests the metadata node of the second file storage cluster 104 to allocate a URL at which the received transfer target file will be stored in the second file storage cluster 104 (208). If the first file storage cluster 102 and the second file storage cluster 104 are the same type of file storage cluster, so that the URL from the first file storage cluster 102 can be used as is, this step may be omitted. Between heterogeneous file storage clusters, however, the URL notation may differ, and in that case a new URL allocation must be requested from the second file storage cluster 104.

Next, each file transfer unit 114 determines whether the received URL is usable and, if so, uses that URL to store the received transfer target file in the second file storage cluster 104 through the second DFS client 118 (210). If the second file storage cluster 104 stores files by striping, a single file is divided into chunks and stored across a plurality of slave servers; if it stores files in a distributed manner, each file is stored whole on a designated slave server.

Next, each file transfer unit 114 determines whether any files remain to be transferred (212) and, if so, repeats steps 206 to 210.
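The per-unit loop of steps 206 to 212, run in parallel across units, can be sketched as follows. This is a simplified model rather than the patent's implementation: in-memory dictionaries stand in for the two clusters, the names `run_transfer_unit` and `run_unit` are hypothetical, threads stand in for separate relay servers, and URL allocation (step 208) is reduced to reusing the source URL.

```python
from concurrent.futures import ThreadPoolExecutor


def run_transfer_unit(assigned_urls, read_source, write_dest):
    """Copy each assigned file in turn and record success or failure per URL."""
    results = {}
    for url in assigned_urls:          # loop until no files remain (212)
        try:
            data = read_source(url)    # read from the first cluster (206)
            write_dest(url, data)      # store in the second cluster (210)
            results[url] = True
        except Exception:
            results[url] = False       # remembered for the final report
    return results


# In-memory stand-ins for the first and second file storage clusters.
source = {f"/data/f{i}": bytes([i]) * 4 for i in range(4)}
dest = {}
assignments = [
    ["/data/f0", "/data/f3"],
    ["/data/f1", "/data/missing"],  # one URL that cannot be read
    ["/data/f2"],
]


def run_unit(urls):
    return run_transfer_unit(urls, source.__getitem__, dest.__setitem__)


# One worker per file transfer unit, all running concurrently.
with ThreadPoolExecutor(max_workers=len(assignments)) as pool:
    reports = list(pool.map(run_unit, assignments))
```

Each entry in `reports` plays the role of the result a file transfer unit would send back to the file transfer manager once its list is exhausted.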

If no files remain to be transferred, each file transfer unit 114 reports its file transfer results to the file transfer manager 110 (214). The file transfer manager 110 then determines from the results whether any files failed to transfer (216) and, if so, returns to step 204 and attempts to retransmit those files. To prevent an infinite loop, however, it is preferable to stop assigning work for a file once its transfer has failed a certain number of times at this step.
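The bounded retry described above can be sketched as follows. This is an illustrative sketch only: the function name `transfer_with_retries`, the `transfer_batch` callback (which returns the URLs that failed), and the default limit of three attempts are all assumptions, since the text specifies only "a certain number of times".

```python
def transfer_with_retries(urls, transfer_batch, max_attempts=3):
    """Re-submit failed files until they succeed or reach max_attempts."""
    attempts = {url: 0 for url in urls}
    pending = list(urls)
    abandoned = []
    while pending:
        for url in pending:
            attempts[url] += 1
        failed = transfer_batch(pending)  # allocate and transfer (204 to 214)
        # Files under the limit are reassigned; the rest are dropped (216).
        pending = [u for u in failed if attempts[u] < max_attempts]
        abandoned.extend(u for u in failed if attempts[u] >= max_attempts)
    return abandoned


# Demo: "b" fails once and then succeeds, "c" always fails.
calls = []
seen = {}


def flaky_batch(batch):
    calls.append(list(batch))
    failed = []
    for url in batch:
        seen[url] = seen.get(url, 0) + 1
        if url == "c" or (url == "b" and seen[url] == 1):
            failed.append(url)
    return failed


abandoned = transfer_with_retries(["a", "b", "c"], flaky_batch, max_attempts=3)
```

Capping attempts per file, rather than per run, lets one persistently failing file be given up without blocking retries for the others.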

Finally, when no failed files remain, or when the transfer of a file has been cancelled because it failed a certain number of times, the file transfer ends (218).

FIG. 3 is a diagram illustrating a parallel file transfer system 300 between file storage clusters according to a second embodiment of the present invention. As the drawing shows, this embodiment differs from the first in that there are no separate relay servers 108; instead, the components of a relay server 108, namely the file transfer unit 114, the first DFS client 116, and the second DFS client 118, are provided on the same hardware as each slave server of the first file storage cluster 102. That is, in this embodiment each slave server of the first file storage cluster 102 serves as a relay server. With this configuration, the actual transfer target files pass directly from the first DFS network to the second DFS network, and only small amounts of data, such as the information on each transfer target file, are sent to the file transfer manager 110. Sufficient bandwidth (shown as bold lines in the drawing) therefore needs to be secured only between the first DFS network and the second DFS network.

In addition, because the relay server and the slave server reside on the same hardware, when the file transfer unit 114 receives a file from a data node on its own hardware, it uses communication internal to the server rather than the network, and can therefore read the file much faster than it could over the network.

FIG. 4 is a diagram illustrating a parallel file transfer system 400 between file storage clusters according to a third embodiment of the present invention. As the drawing shows, in this embodiment the nodes of the first file storage cluster 102, the nodes of the second file storage cluster 104, and the plurality of relay servers 108 are all provided on the same hardware. That is, when the hardware resources of the first file storage cluster are sufficient, the second file storage cluster 104 and the relay servers are set up on the same hardware, and the files of the first file storage cluster 102 are moved to the second file storage cluster 104 over the internal communication network.

This embodiment has the following advantages. First, no new hardware resources need to be provided to configure the second file storage cluster 104; the hardware of the existing file storage cluster can be used as it is.

In addition, when the data node of the first file storage cluster 102 from which a file is read and the data node of the second file storage cluster 104 in which it is stored reside in the same hardware, the data is transferred entirely through internal communication without using the network at all, so data can be transferred much faster than in the preceding embodiments.

FIG. 5 is a flowchart illustrating a file transfer method 500 in the parallel file transfer systems 300 and 400 between file storage clusters according to the second and third embodiments of the present invention.

First, the file transfer manager 110 requests, through the first DFS client 112, information on the transfer target files stored in the first file storage cluster 102 from the metadata node of the master server of the first file storage cluster 102, and receives that information (502).

The information on the transfer target files falls into one of two cases, depending on how the first file storage cluster 102 stores files.

The first case is where files are stored in a distributed manner. In the distributed method, a file is not split but is stored whole on a single slave server. In this case, the information on a transfer target file that the file transfer manager 110 receives from the first file storage cluster 102 includes, in addition to the file's URL in the first file storage cluster 102 and its file size, identification information of the slave server on which the file is stored.

The second case is where files are stored by striping. In the striping method, a file is divided into a plurality of chunks, which are stored across a plurality of slave servers. In this case, the information on a transfer target file includes the file's URL in the first file storage cluster 102, its file size, and identification information of the slave server on which each chunk of the file is stored.
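The two forms of transfer target file information can be pictured as a single record type. The following Python sketch is only an illustration: the class name `TransferTargetInfo`, its field names, and the sample `dfs1://` URLs are assumptions made here and do not appear in the specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TransferTargetInfo:
    """Metadata for one transfer target file, as received in step 502."""
    url: str                        # URL in the first file storage cluster
    size: int                       # file size in bytes
    slave_id: Optional[str] = None  # distributed storage: the one slave holding the file
    chunk_slaves: List[str] = field(default_factory=list)  # striping: slave of each chunk

    def is_striped(self) -> bool:
        """True when per-chunk slave information is present."""
        return bool(self.chunk_slaves)

    def units_on(self, slave_id: str) -> int:
        """Chunks (striping) or whole files (distributed) stored on slave_id."""
        if self.is_striped():
            return self.chunk_slaves.count(slave_id)
        return 1 if self.slave_id == slave_id else 0


# Hypothetical sample records for the two storage methods.
whole = TransferTargetInfo(url="dfs1://data/a.log", size=300, slave_id="slave1")
striped = TransferTargetInfo(
    url="dfs1://data/b.log", size=300, chunk_slaves=["slave1", "slave2", "slave1"]
)
```

The `units_on` helper is included because the later allocation step compares how many files or chunks are stored on a given slave server.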

Next, the file transfer manager 110 allocates the transfer target files to the plurality of file transfer units 114 based on the received information on the transfer target files (504).

First, when the first file storage cluster 102 stores files in a distributed manner, the file transfer manager 110 may divide the transfer target files into as many groups as there are file transfer units 114 and allocate one group to each file transfer unit 114, dividing them so that the total size of the transfer target files is equal across groups. That is, the total capacity is computed from the file sizes given in the information, and the groups are formed so that each file transfer unit 114 receives files amounting to "total capacity / number of file transfer units".
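This passage does not say how the groups are balanced. One possible approach, sketched below in Python, borrows the ordering recited in the claims: sort the files by size and deal them out round by round, reversing the direction in each round. The function name `split_into_balanced_groups` and the sample sizes are hypothetical, and the result is only approximately balanced in general.

```python
from typing import List, Tuple

FileEntry = Tuple[str, int]  # (url, size in bytes)


def split_into_balanced_groups(files: List[FileEntry], num_units: int) -> List[List[FileEntry]]:
    """Split files into num_units groups whose size sums are close to each other."""
    if num_units <= 0:
        raise ValueError("num_units must be positive")
    groups: List[List[FileEntry]] = [[] for _ in range(num_units)]
    ordered = sorted(files, key=lambda f: f[1], reverse=True)  # largest first
    for start in range(0, len(ordered), num_units):
        round_no = start // num_units
        # Even rounds deal to units 0..N-1, odd rounds deal to units N-1..0.
        if round_no % 2 == 0:
            targets = range(num_units)
        else:
            targets = range(num_units - 1, -1, -1)
        for target, item in zip(targets, ordered[start:start + num_units]):
            groups[target].append(item)
    return groups


files = [("a", 90), ("b", 70), ("c", 60), ("d", 40), ("e", 30), ("f", 10)]
groups = split_into_balanced_groups(files, 3)
totals = [sum(size for _, size in group) for group in groups]
print(totals)  # [100, 100, 100], i.e. total capacity 300 / 3 units
```

With these sample sizes every group reaches exactly the "total capacity / number of file transfer units" target; with arbitrary sizes the sums would only be close to it.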

In this case, the file transfer manager 110 may also be configured to assign to a specific file transfer unit 114 the group that contains the most transfer target files stored on the same slave server as that file transfer unit 114. That is, the group with the most files stored on the slave server where a file transfer unit 114 resides is assigned to that file transfer unit 114 with the highest priority.
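The locality rule can be sketched as a simple greedy pass. The Python below assumes one file transfer unit per slave server and that the groups have already been formed; the function name `assign_groups_by_locality`, the tuple layout, and the tie-breaking behavior (the first best group wins) are choices made for this illustration, not details from the specification.

```python
from typing import Dict, List, Tuple

FileEntry = Tuple[str, int, str]  # (url, size, id of the slave server storing the file)


def assign_groups_by_locality(
    groups: List[List[FileEntry]], unit_slaves: List[str]
) -> Dict[str, List[FileEntry]]:
    """Give each unit the still-unassigned group with the most files on its own slave."""
    if len(groups) != len(unit_slaves):
        raise ValueError("expected exactly one group per file transfer unit")
    remaining = list(range(len(groups)))
    assignment: Dict[str, List[FileEntry]] = {}
    for slave in unit_slaves:
        # Count the files in each remaining group that live on this unit's slave server.
        best = max(remaining, key=lambda g: sum(1 for f in groups[g] if f[2] == slave))
        assignment[slave] = groups[best]
        remaining.remove(best)
    return assignment


groups = [
    [("a", 100, "s2"), ("b", 50, "s2"), ("c", 50, "s1")],
    [("d", 120, "s1"), ("e", 80, "s1")],
]
assignment = assign_groups_by_locality(groups, ["s1", "s2"])
```

Here the unit on `s1` receives the second group, both of whose files it can read locally, and the unit on `s2` receives the first group.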

On the other hand, when the first file storage cluster 102 stores files by striping, the file transfer manager 110 may, as in the distributed case, divide the transfer target files into as many groups as there are file transfer units 114 and allocate one group to each file transfer unit 114, dividing them so that the total size of the transfer target files is equal across groups. In this case, however, the file transfer manager 110 assigns to a specific file transfer unit 114 the group that contains the most chunks of transfer target files stored on the same slave server as that file transfer unit 114. That is, the group with the most chunks stored on the slave server where a file transfer unit 114 resides is assigned to that file transfer unit 114 with the highest priority.
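For the striping case, the locality score counts chunks rather than whole files. A minimal Python sketch, assuming each group is represented as a mapping from file URL to the list of slave servers holding its chunks (the names `chunk_counts` and `best_group_for` and the sample URLs are hypothetical):

```python
from collections import Counter
from typing import Dict, List

Group = Dict[str, List[str]]  # file URL -> slave server of each chunk, in chunk order


def chunk_counts(group: Group) -> Counter:
    """Count, per slave server, how many chunks of the group's files it stores."""
    counts: Counter = Counter()
    for chunk_slaves in group.values():
        counts.update(chunk_slaves)
    return counts


def best_group_for(slave: str, groups: List[Group]) -> int:
    """Index of the group with the most chunks stored on the given slave server."""
    return max(range(len(groups)), key=lambda i: chunk_counts(groups[i])[slave])


groups = [
    {"dfs1://x": ["s1", "s2", "s1"], "dfs1://y": ["s2", "s2"]},
    {"dfs1://z": ["s1", "s1", "s3"], "dfs1://w": ["s1"]},
]
print(best_group_for("s1", groups))  # 1: three chunks on s1 versus two in group 0
print(best_group_for("s2", groups))  # 0: three chunks on s2 versus none in group 1
```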

Next, each file transfer unit 114 receives, through the first DFS client 116, the transfer target files allocated to it by the file transfer manager 110 from the first file storage cluster 102 (506). Here, the first DFS client 116 obtains from the metadata node the information on the slave server where the actual file data is stored, and then reads the transfer target file through the data node of that slave server. Step 506 and steps 508 and 510, described below, are performed in parallel by the plurality of file transfer units 114.
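The parallel execution of the per-unit steps can be imitated in a single process with a thread pool. In the described system the file transfer units run on separate servers, so the Python below is only a stand-in: `run_transfer_unit` is a hypothetical placeholder that marks every file as transferred instead of calling real DFS clients.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def run_transfer_unit(unit_id: str, urls: List[str]) -> Dict[str, bool]:
    """Placeholder for one file transfer unit working through its allocated files."""
    results: Dict[str, bool] = {}
    for url in urls:
        # A real unit would read via the first DFS client and store via the second.
        results[url] = True
    return results


def run_all_units(assignments: Dict[str, List[str]]) -> Dict[str, Dict[str, bool]]:
    """Run every unit's work list concurrently and collect the per-unit results."""
    with ThreadPoolExecutor(max_workers=max(1, len(assignments))) as pool:
        futures = {
            unit: pool.submit(run_transfer_unit, unit, urls)
            for unit, urls in assignments.items()
        }
        return {unit: future.result() for unit, future in futures.items()}


report = run_all_units({"unit1": ["a", "b"], "unit2": ["c"]})
```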

Next, each file transfer unit 114 requests the metadata node of the second file storage cluster 104 to allocate a URL under which the received transfer target file will be stored in the second file storage cluster 104 (508). If the first file storage cluster 102 and the second file storage cluster 104 are file storage clusters of the same type, so that the URL used in the first file storage cluster 102 can be used as it is, this step may be omitted. Between heterogeneous file storage clusters, however, the URL notation may differ, and in that case a new URL allocation must be requested from the second file storage cluster 104.
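The decision in step 508 reduces to a small branch. In the Python sketch below, the function `resolve_destination_url`, the `same_cluster_type` flag, and the stub `fake_metadata_node` with its `dfs2://migrated/` prefix are all hypothetical; the specification does not define the interface of the metadata node.

```python
from typing import Callable


def resolve_destination_url(
    source_url: str,
    same_cluster_type: bool,
    request_url: Callable[[str], str],
) -> str:
    """Return the URL under which the file will be stored in the second cluster."""
    if same_cluster_type:
        # Same kind of cluster: the source URL can be reused and step 508 is skipped.
        return source_url
    # Heterogeneous clusters: ask the second cluster's metadata node for a new URL.
    return request_url(source_url)


def fake_metadata_node(source_url: str) -> str:
    """Stub standing in for the second cluster's metadata node."""
    path = source_url.split("://", 1)[1]
    return "dfs2://migrated/" + path


reused = resolve_destination_url("dfs1://data/a.log", True, fake_metadata_node)
allocated = resolve_destination_url("dfs1://data/a.log", False, fake_metadata_node)
```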

Next, each file transfer unit 114 determines whether the received URL is usable and, if so, uses that URL to store the received transfer target file in the second file storage cluster 104 through the second DFS client 118 (510). If the second file storage cluster 104 stores files by striping, a single file is divided into chunks and stored across a plurality of slave servers; if it stores files in a distributed manner, the file is stored whole on a designated slave server.
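The difference between the two storage methods on the receiving side can be illustrated as follows. The specification does not state how chunks are placed, so the round-robin placement, the chunk size, and the function names `place_striped` and `place_whole` in this Python sketch are assumptions.

```python
from typing import Dict, List


def place_striped(data: bytes, slaves: List[str], chunk_size: int) -> Dict[str, List[bytes]]:
    """Cut data into chunk_size pieces and spread them over the slaves round-robin."""
    if chunk_size <= 0 or not slaves:
        raise ValueError("need a positive chunk size and at least one slave server")
    placement: Dict[str, List[bytes]] = {slave: [] for slave in slaves}
    for offset in range(0, len(data), chunk_size):
        slave = slaves[(offset // chunk_size) % len(slaves)]
        placement[slave].append(data[offset:offset + chunk_size])
    return placement


def place_whole(data: bytes, slave: str) -> Dict[str, List[bytes]]:
    """Distributed storage: the unsplit file goes to one designated slave server."""
    return {slave: [data]}


striped = place_striped(b"abcdefghij", ["s1", "s2"], 4)
whole = place_whole(b"abcdefghij", "s1")
```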

Next, each file transfer unit 114 determines whether any files remain to be transferred (512) and, if so, repeats steps 506 to 510.

If no files remain to be transferred, each file transfer unit 114 reports its file transfer results to the file transfer manager 110 (514). The file transfer manager 110 then determines from the results whether any files failed to transfer (516) and, if so, returns to step 504 and attempts to retransmit those files. To prevent an infinite loop, however, it is preferable to stop assigning work for a file once its transfer has failed a predetermined number of times or more.

Finally, the file transfer ends when no files have failed to transfer, or when the transfer of the remaining files has been canceled because it failed a predetermined number of times or more (518).
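The report, retry, and termination logic of steps 514 to 518 can be sketched as a bounded loop. In the Python below, `transfer_once` is a hypothetical callback standing for one pass of allocation and transfer that returns a success flag per file, and the limit of three attempts is an arbitrary choice for the "predetermined number of times".

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple


def transfer_with_retries(
    urls: Iterable[str],
    transfer_once: Callable[[List[str]], Dict[str, bool]],
    max_attempts: int = 3,
) -> Tuple[Set[str], Set[str]]:
    """Return (succeeded, abandoned) after retrying failed files up to max_attempts."""
    pending = list(urls)
    attempts: Dict[str, int] = {url: 0 for url in pending}
    succeeded: Set[str] = set()
    abandoned: Set[str] = set()
    while pending:
        results = transfer_once(pending)  # one pass of allocation, transfer, and report
        retry: List[str] = []
        for url in pending:
            attempts[url] += 1
            if results.get(url, False):
                succeeded.add(url)
            elif attempts[url] >= max_attempts:
                abandoned.add(url)  # stop assigning work for this file
            else:
                retry.append(url)
        pending = retry
    return succeeded, abandoned


calls = {"n": 0}


def flaky(batch: List[str]) -> Dict[str, bool]:
    """Test double: 'a' always succeeds, 'b' succeeds from the second pass, 'c' never."""
    calls["n"] += 1
    return {u: u == "a" or (u == "b" and calls["n"] >= 2) for u in batch}


ok, dropped = transfer_with_retries(["a", "b", "c"], flaky, max_attempts=3)
```

Because every file is either marked successful or abandoned after a bounded number of attempts, the loop always terminates, which is the purpose of the retry limit described above.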

Meanwhile, embodiments of the present invention may include a computer-readable recording medium containing a program for performing the methods described herein on a computer. The computer-readable recording medium may include program instructions, local data files, local data structures, and the like, alone or in combination. The medium may be specially designed and constructed for the present invention, or may be known and available to those of ordinary skill in the field of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

Although the present invention has been described in detail above through representative embodiments, those of ordinary skill in the art to which the present invention pertains will understand that various modifications can be made to the embodiments described above without departing from the scope of the present invention.

Therefore, the scope of the present invention should not be limited to the described embodiments, but should be determined by the appended claims and their equivalents.

100: parallel file transfer system between file storage clusters
102: first file storage cluster
104: second file storage cluster
106: file relay device
108: relay server
110: file transfer manager
112: first DFS client
114: file transfer unit
116: first DFS client
118: second DFS client
120: network

Claims

A system for transferring a file stored in a first file storage cluster to a second file storage cluster, the system comprising:
a file transfer manager that receives information on transfer target files stored in the first file storage cluster from a master server of the first file storage cluster, and allocates the transfer target files to a plurality of file transfer units based on the received information; and
the plurality of file transfer units, each of which receives the transfer target files allocated to it by the file transfer manager from the first file storage cluster and stores the received transfer target files in the second file storage cluster.

The system according to claim 1,
wherein the information on the transfer target files includes, for each transfer target file, its URL in the first file storage cluster and its file size.

The system according to claim 2,
wherein the file transfer manager allocates the transfer target files to the plurality of file transfer units such that the sum of the sizes of the transfer target files allocated to each file transfer unit is equal across the file transfer units.

The system according to claim 3,
wherein the file transfer manager sorts the transfer target files by size and sequentially allocates the sorted transfer target files to the file transfer units, such that the allocation order in a given round is the reverse of the allocation order in the previous round.

The system according to claim 2,
wherein each of the plurality of file transfer units requests, from a master server of the second file storage cluster, allocation of a URL for the transfer target file in the second file storage cluster, and stores the transfer target file received from the first file storage cluster in the second file storage cluster according to the allocated URL.

The system according to claim 1,
wherein the plurality of file transfer units are respectively disposed in a plurality of slave servers included in the first file storage cluster.

The system of claim 6,
wherein the information on the transfer target files includes, for each transfer target file, its URL in the first file storage cluster, its file size, and identification information of the slave server in which the file is stored.

The system of claim 7,
wherein the file transfer manager divides the transfer target files into as many groups as there are file transfer units and allocates each group to a respective file transfer unit, the transfer target files being divided such that the sum of the sizes of the transfer target files in each group is equal across the groups.

The system according to claim 8,
wherein the file transfer manager allocates, to a specific file transfer unit, the group containing the most transfer target files stored in the same slave server as the specific file transfer unit.

The system of claim 6,
wherein the information on the transfer target files includes, for each transfer target file, its URL in the first file storage cluster, its file size, and identification information of the slave server in which each chunk constituting the transfer target file is stored.

The system of claim 10,
wherein the file transfer manager divides the transfer target files into as many groups as there are file transfer units and allocates each group to a respective file transfer unit, the transfer target files being divided such that the sum of the sizes of the transfer target files in each group is equal across the groups.

The system of claim 11,
wherein the file transfer manager allocates, to a specific file transfer unit, the group containing the most chunks of transfer target files stored in the same slave server as the specific file transfer unit.

A method for transferring a file stored in a first file storage cluster to a second file storage cluster, the method comprising:
receiving, at a file transfer manager, information on transfer target files stored in the first file storage cluster from a master server of the first file storage cluster;
allocating, by the file transfer manager, the transfer target files to a plurality of file transfer units based on the received information; and
receiving, at the plurality of file transfer units, the transfer target files allocated by the file transfer manager from the first file storage cluster, and storing the received transfer target files in the second file storage cluster.

The method according to claim 13,
wherein the information on the transfer target files comprises, for each transfer target file, its URL in the first file storage cluster and its file size.

The method according to claim 14,
wherein the allocating of the transfer target files to the plurality of file transfer units comprises allocating the transfer target files such that the sum of the sizes of the transfer target files allocated to each file transfer unit is equal across the file transfer units.

16. The method of claim 15,
wherein the allocating of the transfer target files to the plurality of file transfer units comprises:
sorting the transfer target files by size; and
sequentially allocating the sorted transfer target files to the file transfer units such that the allocation order in a given round is the reverse of the allocation order in the previous round.

The method according to claim 14,
wherein the storing of the transfer target files in the second file storage cluster comprises:
requesting, from a master server of the second file storage cluster, allocation of a URL for the transfer target file in the second file storage cluster; and
storing the transfer target file received from the first file storage cluster in the second file storage cluster according to the allocated URL.

The method according to claim 14,
wherein the information on the transfer target files includes, for each transfer target file, its URL in the first file storage cluster, its file size, and identification information of the slave server in which the file is stored.

19. The method of claim 18,
wherein the allocating of the transfer target files to the plurality of file transfer units comprises dividing the transfer target files into as many groups as there are file transfer units and allocating each group to a respective file transfer unit, the transfer target files being divided such that the sum of the sizes of the transfer target files in each group is equal across the groups.

The method of claim 19,
wherein the allocating of the transfer target files to the plurality of file transfer units comprises allocating, to a specific file transfer unit, the group containing the most transfer target files stored in the same slave server as the specific file transfer unit.

The method according to claim 14,
wherein the information on the transfer target files includes, for each transfer target file, its URL in the first file storage cluster, its file size, and identification information of the slave server in which each chunk constituting the transfer target file is stored.

The method of claim 10,
wherein the allocating of the transfer target files to the plurality of file transfer units comprises dividing the transfer target files into as many groups as there are file transfer units and allocating each group to a respective file transfer unit, the transfer target files being divided such that the sum of the sizes of the transfer target files in each group is equal across the groups.

23. The method of claim 22,
wherein the allocating of the transfer target files to the plurality of file transfer units comprises allocating, to a specific file transfer unit, the group containing the most chunks of transfer target files stored in the same slave server as the specific file transfer unit.

A computer-readable recording medium having recorded thereon a program for performing the method according to any one of claims 13 to 23 on a computer.