KR101497669B1

KR101497669B1 - File management apparatus and method for recovering original file with at least predetermined number of file fragment

Info

Publication number: KR101497669B1
Application number: KR1020130047409A
Authority: KR
Inventors: 박영훈; 서승우
Original assignee: 서울대학교산학협력단
Priority date: 2013-04-29
Filing date: 2013-04-29
Publication date: 2015-03-11
Also published as: KR20140128685A

Abstract

The present invention relates to a file management apparatus and method for restoring an original file from file fragments received from a plurality of distributed storage devices. The file management apparatus or method according to the present invention includes: determining file fragment lists respectively corresponding to a plurality of distributed storage devices across which an original file is stored; modifying at least some of the file fragment lists according to the communication speeds between the file management apparatus and the plurality of distributed storage devices; receiving file fragments from each of the plurality of distributed storage devices with reference to the modified file fragment lists; and restoring the original file from the received file fragments.
According to the present invention, the file management apparatus does not receive the same file block from more than one distributed storage device, and it receives more file blocks from distributed storage devices with faster communication speeds. The time and communication cost required to receive the file fragments can therefore be reduced.

Description

File management apparatus and method for recovering an original file with a predetermined number or more of file fragments

The present invention relates to a file management apparatus and method for recovering an original file from file fragments distributed across a plurality of distributed storage devices, and more particularly to a file management apparatus and method that recover the original file only when at least a certain number of the file fragments stored in the plurality of distributed storage devices are obtained.

When secret information is stored, there is always a risk that it will be lost or destroyed, and at the same time a risk that it will be stolen. The risk of loss or destruction can be reduced by keeping the secret information in several places, but doing so increases the risk of theft. Secret sharing has been proposed as one way to address both risks together.
Secret sharing generates a plurality of shares (for example, SH(1), …, SH(N)) from secret information MSK, distributes them to a plurality of distributed storage devices (for example, PA(1), …, PA(N)) for management, and allows the secret information MSK to be restored only when at least a predetermined number of the shares SH(1), …, SH(N) can be obtained.
Because secret sharing can guarantee the confidentiality, availability, and integrity of the stored values, it has been applied to many distributed management devices. A stored file F is split into small values (for example, F[1], F[2], …, F[s], where s is the number of values making up the file); secret sharing is applied to each value F[i] to produce a plurality of shares f[i,1], f[i,2], …, f[i,n] (where n is the number of storage devices); and the shares are concatenated as f(m) = f[1,m] || f[2,m] || … || f[s,m] (for m = 1, 2, …, n) to produce n file fragments f(1), f(2), …, f(n).
In file management methods based on conventional secret sharing, the contents of the whole file cannot be learned from fewer than a certain number of file fragments, which provides confidentiality for the stored file, and the file can still be recovered from the remaining fragments even if fewer than a certain number of fragments are lost or damaged, which provides availability. However, as stored files have grown larger in recent years, conventional secret sharing, which is computationally expensive, has become difficult to apply, and because each file fragment is the same size as the original file, storage space and communication cost are likely to be wasted.
To solve these problems, the related prior Korean Patent Application No. 10-2013-0016390 (PCT/KR2013/002084) was proposed. However, when that prior application is applied to the file distribution management system described above, duplicate file blocks may be transmitted from several distributed storage devices. As a result, downloading the file fragments to recover the original file requires receiving several times the size of the original file, so communication cost is still largely wasted and recovery time increases.
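As a rough illustration of the threshold property described in this background, the sketch below splits a single value F[i] into n shares so that any k of them recover it. It assumes Shamir's polynomial scheme over a prime field, which the text does not name; `split_secret`, `recover_secret`, and the modulus `PRIME` are hypothetical names chosen for the example.

```python
import secrets

PRIME = 2**61 - 1  # a Mersenne prime; the field size is an assumption of this sketch


def split_secret(secret, n, k):
    """Return n shares (x, y) of `secret`; any k of them are enough to recover it."""
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret % PRIME] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * (-xj)) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

Each share here is as large as the value it protects, which is the storage and communication overhead the background attributes to conventional secret sharing.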


Korean Patent Application No. 10-2013-0016390

An object of the present invention is to provide a file management apparatus and method that minimize recovery time by taking the communication speed with each distributed storage device into account when recovering an original file from distributed file fragments.
Another object of the present invention is to provide a file management apparatus and method that prevent duplicate data from being downloaded when recovering an original file from distributed file fragments, thereby minimizing recovery time and required traffic and reducing communication cost.
Another object of the present invention is to improve the security and confidentiality of the file management apparatus and method by allowing the original file to be recovered only when at least a certain number of the distributed file fragments are obtained.

A file management method according to the present invention includes: determining file fragment lists respectively corresponding to a plurality of distributed storage devices across which an original file is stored; modifying at least some of the file fragment lists according to communication speeds between a file management apparatus and the plurality of distributed storage devices; receiving the file fragments from each of the plurality of distributed storage devices with reference to the modified file fragment lists; and restoring the original file from the received file fragments.
In an embodiment, each of the file fragment lists is determined so as to include at least some of the file blocks of the original file stored in the corresponding distributed storage device.
In an embodiment, the file fragment lists are determined so that they contain no block in common.
In an embodiment, determining the file fragment lists includes: assigning, to one of the file fragment lists, all of the file blocks of the original file stored in the distributed storage device corresponding to that list; and assigning, to each of the remaining file fragment lists in turn, at least some of the file blocks of the original file stored in the corresponding distributed storage device, where those file blocks do not duplicate any file block included in a previously assigned file fragment list.
In an embodiment, the one file fragment list is the list corresponding to the distributed storage device that has the fastest communication speed with the file management apparatus among the plurality of distributed storage devices.
In an embodiment, modifying at least some of the file fragment lists includes making each file fragment list contain more, or the same number of, file blocks the faster the communication speed between the corresponding distributed storage device and the file management apparatus.
In an embodiment, the number of file blocks in each file fragment list is proportional to the communication speed between the corresponding distributed storage device and the file management apparatus.
In an embodiment, making the lists contain more or the same number of file blocks includes: determining, for at least one of the file fragment lists, whether the number of file blocks in that list is excessive or insufficient relative to the communication speed between the corresponding distributed storage device and the file management apparatus; and moving at least some file blocks according to that determination.
In an embodiment, moving at least some file blocks includes selecting the file block to be moved by using a popularity function that indicates how many of the plurality of distributed storage devices hold that file block in common.
In an embodiment, the popularity function is defined by an equation (not reproduced in this text) in which k is the number of the plurality of distributed storage devices, r_j is the number of file blocks determined in proportion to the communication speed between the j-th distributed storage device and the file management apparatus, A_j is the size, or number of elements, of the file fragment list corresponding to the j-th distributed storage device, and S_j contains the file blocks of the original file stored in the j-th distributed storage device.
In an embodiment, the method includes: determining whether at least some of the file fragments were received within a reference time; depending on the result, deleting at least some of the file fragments from the file management apparatus, or retransmitting them from a distributed storage device other than the one that had been transmitting them; and re-modifying at least some of the file fragment lists according to updated communication speeds between the file management apparatus and the plurality of distributed storage devices.
In an embodiment, the file fragments are received by the file management apparatus divided into units of a predetermined number of file blocks.
In an embodiment, the re-modified file fragment lists do not include file blocks that the file management apparatus has already received from the plurality of distributed storage devices.
In an embodiment, re-modifying at least some of the file fragment lists includes making each file fragment list contain more, or the same number of, file blocks the faster the updated communication speed of the corresponding distributed storage device.
In an embodiment, the original file is restored successfully only when the number of the plurality of distributed storage devices is at least a predetermined number.
A file management apparatus according to the present invention, which receives file fragments from a plurality of distributed storage devices, includes: a scheduler that determines file fragment lists respectively corresponding to the plurality of distributed storage devices and modifies at least some of the file fragment lists according to communication speeds between the file management apparatus and the plurality of distributed storage devices; a communication unit that communicates with the plurality of distributed storage devices or provides an interface for communicating with them; and a controller that controls the file management apparatus to receive the file fragments from each of the plurality of distributed storage devices through the communication unit with reference to the modified file fragment lists and to restore the original file from the received file fragments, where the original file is restored successfully only when the number of received file fragments is at least a predetermined number.
In an embodiment, the scheduler determines the file fragment lists so that each list includes at least some of the file blocks stored in the corresponding distributed storage device and the lists contain no block in common, and modifies the file fragment lists so that each list contains more, or the same number of, file blocks the faster the communication speed of the corresponding distributed storage device.
A computer-readable recording medium according to the present invention records a computer program for executing a file distribution management method that includes: determining file fragment lists respectively corresponding to a plurality of distributed storage devices across which an original file is stored; modifying at least some of the file fragment lists according to communication speeds between a file management apparatus and the plurality of distributed storage devices; receiving the file fragments from each of the plurality of distributed storage devices with reference to the modified file fragment lists; and restoring the original file from the received file fragments, where the original file is restored successfully only when the number of received file fragments is at least a predetermined number.
In an embodiment, the recording medium records a computer program for executing the file distribution management method in which determining the file fragment lists determines the lists so that each list includes at least some of the file blocks stored in the corresponding distributed storage device and the lists contain no block in common, and modifying the file fragment lists modifies the lists so that each list contains more, or the same number of, file blocks the faster the communication speed of the corresponding distributed storage device.

According to the present invention, recovery time can be minimized by taking the communication speed with each distributed storage device into account when recovering an original file from distributed file fragments.
In addition, because duplicate data is not downloaded when the original file is recovered from distributed file fragments, recovery time and required traffic are minimized and communication cost can be reduced.
In addition, because the original file is recovered only when at least a certain number of the distributed file fragments are obtained, the security and confidentiality of the file management apparatus and method can be improved.

FIG. 1 is a diagram showing a file distribution management system 1000 and its internal configuration according to an embodiment of the present invention.
FIG. 2 is a diagram showing the internal configuration of a file management apparatus 1100 according to an embodiment of the present invention.
FIG. 3 is a flowchart showing a file management method of a file management apparatus according to an embodiment of the present invention.
FIGS. 4A to 4D are diagrams illustrating, with a concrete example, a method of determining file fragment lists according to an embodiment of the present invention.
FIG. 5 is a flowchart showing a file management method that reschedules file fragment lists according to an embodiment of the present invention.
FIG. 6 is a flowchart showing a file management method that reschedules file fragment lists according to another embodiment of the present invention.

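The following detailed description of the present invention refers to the accompanying drawings, which show, by way of illustration, specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. It should be understood that the various embodiments of the invention, although different, are not necessarily mutually exclusive. For example, a particular shape, structure, or characteristic described herein in connection with one embodiment may be implemented in other embodiments without departing from the spirit and scope of the invention. It should also be understood that the position or arrangement of individual components within each described embodiment may be changed without departing from the spirit and scope of the invention. The following detailed description is therefore not to be taken in a limiting sense, and the scope of the invention, if properly described, is limited only by the appended claims together with the full range of equivalents to which those claims are entitled. In the drawings, like reference numerals refer to the same or similar functions throughout the several aspects.
Korean Patent Application No. 10-2013-0016390, the prior application to the present invention, may be incorporated herein by reference in its entirety, and its contents may be relied on for matters concerning the distributed storage and restoration of files that are not described here. For example, the file fragments stored in the distributed storage devices according to the present invention may have been generated by the method described in Korean Patent Application No. 10-2013-0016390. Likewise, the file management apparatus of the present invention may receive file fragments from the distributed storage devices and restore the original file from the received fragments by the restoration method described in that application.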
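FIG. 1 shows a file distribution management system 1000 and its internal configuration according to an embodiment of the present invention. The file distribution management system 1000 comprises a file management apparatus 1100 and n distributed storage devices 1100a, 1100b, …, 1100n.
The file distribution management system 1000 distributes a plurality of file fragments generated from an original file across the n distributed storage devices 1100a, 1100b, …, 1100n and stores them there. To restore the original file from the distributed file fragments, the system must obtain the k file fragments stored in any k of the distributed storage devices (k is an integer greater than 0 and no greater than n). That is, when k file fragments are obtained the original file can be restored completely, but when fewer than k file fragments are obtained it cannot. The underlying distribution and restoration principles are described in detail in Korean Patent Application No. 10-2013-0016390.
Each of the distributed storage devices 1100a, 1100b, …, 1100n stores a file fragment generated from the original file and, under the control of the file management apparatus 1100 or in response to a request from it, provides some or all of the stored file fragment to the file management apparatus 1100. For example, file fragments generated by the method described in Korean Patent Application No. 10-2013-0016390 may be stored across the distributed storage devices 1100a, 1100b, …, 1100n.
The distributed storage devices 1100a, 1100b, …, 1100n need no special functions; it is enough that they can store the file fragments separately. Although FIG. 1 shows the distributed storage devices as physically separate, this is not a limitation, and they may also be logically separate parts of a single physical device.
In an embodiment, each of the distributed storage devices 1100a, 1100b, …, 1100n may have its own communication unit or communication interface for communicating with the file management apparatus 1100.
The file management apparatus 1100 receives file fragments from the distributed storage devices and restores the original file from the received fragments. The file management apparatus 1100 is described in more detail below with reference to FIG. 2.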
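FIG. 2 shows the internal configuration of the file management apparatus 1100 according to an embodiment of the present invention. The file management apparatus 1100 may include, without being limited to, a communication unit 1110, a controller 1120, a scheduler 1130, and a storage unit 1140.
The communication unit 1110 handles communication between the file management apparatus 1100 and external objects. For example, the file management apparatus 1100 can communicate with the distributed storage devices (1100a, 1100b, …, 1100n; see FIG. 1) through the communication unit 1110. The communication unit 1110 may include any means capable of networking or communicating with the outside and may have the communication interfaces that accompany such means. The communication interfaces or methods used by the communication unit 1110 may include wired communication, wireless communication, 3G, 4G, and various others.
In an embodiment, the communication unit 1110 may include a communication state manager 1111 that manages communication state information between the file management apparatus 1100 and external objects. The communication state information may include information about the channel state or communication speed between the file management apparatus 1100 and an external object. Besides simply managing and storing communication state information, the communication state manager 1111 may also detect the channel state or communication speed with an external object and generate new communication state information or update existing information.
Although the communication state manager 1111 is described here as part of the communication unit 1110, this is not a limitation; it may be placed anywhere in the file management apparatus 1100, or even arranged as a separate module outside the file management apparatus 1100.
The controller 1120 controls the overall operation of the file management apparatus 1100 and of the other modules that make it up (1110, 1130, 1140). The controller 1120 may also serve as a central processing unit that performs the computations needed to run the file management apparatus 1100.
The scheduler 1130 generates and manages the lists of file blocks or file fragments that the file management apparatus 1100 is to receive, and supplies the controller 1120 or the communication unit 1110 with the information needed to receive file blocks or file fragments from the distributed storage devices 1100a, 1100b, …, 1100n according to the generated lists. The specific way in which the scheduler 1130 generates and manages the file fragment lists is described in more detail from FIG. 3 onward.
The storage unit 1140 stores data received by the file management apparatus 1100 or processed by it. For example, the storage unit 1140 may store data obtained while the communication state manager 1111, the controller 1120, or the scheduler 1130 performs its functions.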
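The present invention relates to a file management apparatus and method that receive distributed file fragments and restore the original file, and in particular to a technique that minimizes the time and communication cost of restoring the original file by taking into account the communication speeds between the file management apparatus 1100 and the distributed storage devices 1100a, 1100b, …, 1100n when receiving file fragments.
For example, if all of the file blocks stored in a certain number (k) of distributed storage devices are received, duplicate file blocks may be received and the communication cost rises accordingly. The present invention therefore schedules the file fragment lists so that duplicate file blocks are not received. Here, a file fragment list is a list of information about the file blocks to be received from a distributed storage device, and the set of file blocks received from one distributed storage device can be defined as one file fragment. For example, when one file fragment is received from a distributed storage device, the fragment may contain a plurality of file blocks, and the file fragment list may contain the addresses or sizes of the file blocks in the fragment.
Furthermore, if file fragments are simply received from k distributed storage devices, a large number of file blocks may be received from a slow distributed storage device and a small number from a fast one. In that case the time needed to receive all of the file blocks can become very long. The present invention therefore schedules the file fragment lists according to the communication speed of each distributed storage device, so that relatively many file blocks are received from fast distributed storage devices and relatively few from slow ones.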

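From FIG. 3 onward, a method of scheduling the lists of file fragments to be received according to the present invention, and of receiving the file fragments according to the scheduled lists, is described.
For the description that follows, several terms are defined as below.
n: the number of distributed storage devices
M₁, M₂, …, M_n: the n distributed storage devices
File management apparatus: the apparatus that receives file fragments from the distributed storage devices
Stored file fragments: the original file fragments stored in the distributed storage devices. For convenience of description, each distributed storage device is assumed to store one stored file fragment.
Received file fragments: the file fragments that the file management apparatus receives from the distributed storage devices. For convenience of description, one received file fragment is assumed to be received from each distributed storage device. The received file fragment from a given distributed storage device may be generated from the stored file fragment held by that device.
k: the minimum number of stored file fragments needed to recover the original file successfully, where k is an integer greater than 0 and no greater than n.
S_i: the stored file fragment list indicating the file blocks contained in the stored file fragment of the i-th distributed storage device M_i (the i-th stored file fragment)
r₁, r₂, …, r_k: the speed ratio between the file management apparatus and the k distributed storage devices M₁, M₂, …, M_k, where r₁ ≥ r₂ ≥ … ≥ r_k and r₁ + r₂ + … + r_k = _nC_{k-1}.
A_i: the received file fragment list indicating the file blocks contained in the received file fragment that the file management apparatus receives from the i-th distributed storage device M_i (the i-th received file fragment)
W: {1, 2, 3, …, _nC_{k-1}}, where _nC_{k-1} is the number of ways to choose k-1 items out of n
For convenience of description, the file blocks in the stored and received file fragments are all assumed to have the same size (for example, 4 bytes). Accordingly, for a fixed file fragment size, the number of file blocks increases as the file grows.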

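In the technique proposed by the present invention, the file management apparatus receives received file fragments from k of the n distributed storage devices (M₁, M₂, …, M_k) in such a way that the received file fragments contain no file block in common. In addition, by adjusting the number of file blocks received from each distributed storage device (that is, the size of each received file fragment) according to the communication speed between the file management apparatus and that device, the time the file management apparatus needs to receive all of the received file fragments is minimized.
For the received file fragments to contain no file block in common, A_i ∩ A_j = ∅ must hold for any i and j with i ≠ j.
As for communication speed, the more file blocks the file management apparatus receives from fast distributed storage devices, the less time it takes to receive all of the received file fragments. In theory, the time to receive all of the received file fragments is smallest when the ratio of the communication speeds between the file management apparatus and the distributed storage devices equals the ratio of the numbers of file blocks (or received file fragment sizes) to be received from those devices.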
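The effect of matching block counts to speeds can be checked with a small calculation. The sketch below assumes a simplified model in which every device transfers at a constant rate and all transfers run in parallel, so the total time is set by the slowest transfer; `completion_time` and the sample speeds are hypothetical.

```python
def completion_time(block_counts, speeds):
    """Time until the slowest of the parallel transfers finishes."""
    return max(count / speed for count, speed in zip(block_counts, speeds))


speeds = [4.0, 3.0, 2.0, 1.0]   # assumed blocks per second for four devices
equal_split = [5, 5, 5, 5]      # 20 blocks split without regard to speed
proportional = [8, 6, 4, 2]     # 20 blocks split in the ratio 4:3:2:1

t_equal = completion_time(equal_split, speeds)
t_proportional = completion_time(proportional, speeds)
```

Under this model the equal split is held back by the slowest device, while the proportional split lets all four transfers finish at the same moment.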
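The following describes how the file management apparatus determines, or schedules, the file fragment lists (received file fragment lists A_i) that indicate which file fragments to receive from each distributed storage device.
First, a popularity function P: W → Z is defined as in Equation 1.

[Equation 1 is not reproduced in this text.]

The popularity function indicates how many of the distributed storage devices hold a given file block in common.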
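Because the equation itself is missing from this text, the sketch below only illustrates the stated idea of counting how many devices hold a block. The counting rule and the name `popularity` are assumptions made for illustration and should not be read as the patent's Equation 1.

```python
def popularity(block, stored_lists):
    """Count how many of the given stored lists S_j contain `block`.

    This plain count is an assumed stand-in, not the patent's Equation 1.
    """
    return sum(1 for blocks in stored_lists if block in blocks)


# Made-up stored lists for three devices.
S = [{1, 2, 3}, {2, 3, 4}, {3, 5}]
```

With these sample lists, block 3 is held by all three devices and block 1 by only one.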
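In the present invention, the file management apparatus receives a total of k received file fragments, one from each of k distributed storage devices (M_p1, M_p2, M_p3, …, M_pk) among the n distributed storage devices (M₁, M₂, M₃, …, M_n), where r_p1 ≥ r_p2 ≥ … ≥ r_pk. The k received file fragments are generated from the k stored file fragments held by the k distributed storage devices (M_p1, M_p2, …, M_pk).
On this premise, the file management apparatus determines, or schedules, the received file fragment lists by the following algorithm.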
1: Assign S_p1 to A_p1 (A_p1 = S_p1).
2: (loop start) Repeat for j = 2 to k
3: Assign S_pj \ (A_p1 ∪ A_p2 ∪ … ∪ A_p{j-1}) to A_pj.
4: (loop end)
5: (loop start)
6: Let {q1, q2, …, qk} be a reordering of the elements of {p1, p2, …, pk} that satisfies the following expression.
|A_q1| - r_q1 ≥ |A_q2| - r_q2 ≥ … ≥ |A_qk| - r_qk
Here |X| denotes the size of the set X, that is, its number of elements.
7: (if start) If |A_q1| = r_q1,
8: output (A_p1, A_p2, …, A_pk) as the received file fragment lists and end the process
9: (if end)
10: Assign k to t (t = k) and |A_q1| - r_q1 to g (g = |A_q1| - r_q1)
11: (loop start) Repeat while r_qt > |A_qt|
12: Let {π₁, π₂, …, π_μ} be a reordering of the elements of the set A_q1 ∩ S_qt that satisfies the following expression.
μ = |A_q1 ∩ S_qt|, P(π₁) ≤ P(π₂) ≤ … ≤ P(π_μ)
13: Assign min{μ, |A_q1| - r_q1, r_qt - |A_qt|} to υ
(υ = min{μ, |A_q1| - r_q1, r_qt - |A_qt|})
14: Assign A_q1 \ {π₁, π₂, …, π_υ} to A_q1
(A_q1 = A_q1 \ {π₁, π₂, …, π_υ})
15: Assign A_qt ∪ {π₁, π₂, …, π_υ} to A_qt
(A_qt = A_qt ∪ {π₁, π₂, …, π_υ})
16: Assign t-1 to t (t = t-1)
17: (loop end)
18: (if start) If g = |A_q1| - r_q1,
19: output (A_p1, A_p2, …, A_pk) as the received file fragment lists and end the process
20: (if end)
21: (loop end)
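Here the sum of r_p1, r_p2, …, r_pk is assumed to equal the total number of file blocks contained in the received file fragments. If it does not, then r_p1, r_p2, …, r_pk in the algorithm above should be understood as the values obtained by dividing the total number of file blocks in the received file fragments in proportion to the speeds of the respective distributed storage devices (here the ratio of the speeds between the file management apparatus and the p₁-th, p₂-th, …, p_k-th distributed storage devices is r_p1 : r_p2 : … : r_pk, and r_p1 + r_p2 + … + r_pk = _nC_{k-1}).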
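One way to turn measured speeds into integer values r_p1, …, r_pk that sum to the block count is sketched below. The largest-remainder rounding and the name `block_quotas` are assumptions; the text only requires that the values be proportional to the speeds.

```python
def block_quotas(speeds, total_blocks):
    """Split total_blocks into integer quotas roughly proportional to speeds.

    Rounding uses the largest-remainder method, chosen here for illustration only.
    """
    total_speed = sum(speeds)
    exact = [total_blocks * s / total_speed for s in speeds]
    quotas = [int(e) for e in exact]  # floor, since every value is non-negative
    leftover = total_blocks - sum(quotas)
    # Hand the remaining blocks to the devices with the largest fractional parts.
    by_remainder = sorted(range(len(speeds)),
                          key=lambda i: exact[i] - quotas[i],
                          reverse=True)
    for i in by_remainder[:leftover]:
        quotas[i] += 1
    return quotas
```

For example, `block_quotas([4, 3, 2, 1], 20)` splits 20 blocks as 8, 6, 4, and 2.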
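The file management apparatus according to the present invention (1100; see FIG. 1) generates, or schedules, the received file fragment lists by the algorithm described above and receives the received file fragments from the k distributed storage devices (1100a, 1100b, …, 1100k; see FIG. 1) according to those lists.
The algorithm outputs k received file fragment lists (A_p1, A_p2, …, A_pk), expressed as sets and corresponding respectively to the distributed storage devices (1100a, 1100b, …, 1100k), and the ratio of the sizes of these lists approaches or equals the ratio of the communication speeds between the file management apparatus and the distributed storage devices (1100a, 1100b, …, 1100n).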
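The 21 steps can be sketched in Python as follows. This is an illustrative reading of the listed steps, not a reference implementation: the inner `popularity` count stands in for the missing Equation 1, ties are broken by block number, and the name `schedule_receive_lists` and the sample data are hypothetical.

```python
def schedule_receive_lists(stored_lists, quotas):
    """Return receive lists A_i (sets of block ids), one per device.

    stored_lists[i] is S_i and quotas[i] is r_i, with devices ordered from
    fastest to slowest; sum(quotas) must equal the number of distinct blocks.
    """
    k = len(stored_lists)
    stored = [set(s) for s in stored_lists]
    if sum(quotas) != len(set().union(*stored)):
        raise ValueError("quotas must sum to the number of distinct blocks")

    def popularity(block):
        # Assumed stand-in for Equation 1: how many selected devices hold the block.
        return sum(1 for s in stored if block in s)

    # Steps 1-4: non-overlapping initial lists, fastest device first.
    receive, claimed = [], set()
    for s in stored:
        receive.append(s - claimed)
        claimed |= receive[-1]

    def surplus(i):
        return len(receive[i]) - quotas[i]

    while True:                                                # steps 5-21
        order = sorted(range(k), key=surplus, reverse=True)    # step 6: q1 ... qk
        q1 = order[0]
        if surplus(q1) == 0:                                   # steps 7-9: all quotas met
            return receive
        g = surplus(q1)                                        # step 10
        t = k - 1                                              # zero-based index of q_k
        while t > 0 and quotas[order[t]] > len(receive[order[t]]):  # step 11
            qt = order[t]
            candidates = sorted(receive[q1] & stored[qt],
                                key=lambda b: (popularity(b), b))   # step 12
            count = min(len(candidates), surplus(q1), -surplus(qt))  # step 13
            moved = set(candidates[:count])
            receive[q1] -= moved                               # step 14
            receive[qt] |= moved                               # step 15
            t -= 1                                             # step 16
        if g == surplus(q1):                                   # steps 18-20: no progress
            return receive


# Made-up stored lists for four selected devices holding 20 blocks in total.
S = [
    {1, 2, 3, 4, 5, 6, 7, 8, 9, 10},
    {1, 2, 3, 4, 11, 12, 13, 14, 15, 16},
    {1, 5, 6, 7, 11, 12, 13, 17, 18, 19},
    {2, 5, 8, 9, 11, 14, 15, 17, 18, 20},
]
A = schedule_receive_lists(S, quotas=[5, 5, 5, 5])
```

With these sample lists and equal quotas, the initial lists of sizes 10, 6, 3, and 1 are rebalanced to five blocks each, and every list remains a subset of what its device stores.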

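FIG. 3 is a flowchart showing a file management method in which a file management apparatus according to an embodiment of the present invention receives file fragments (received file fragments) from distributed storage devices and restores the original file.
The file management method of FIG. 3 includes steps S110 to S140. Of these, steps S110 and S120 form the scheduling stage, which determines or schedules the file fragment lists (received file fragment lists) by the algorithm described above.
In step S110, the file management apparatus (1100; see FIG. 1) determines the received file fragment list corresponding to each of the distributed storage devices (1100a, 1100b, …, 1100k; see FIG. 1). Each received file fragment list may be a subset of the corresponding stored file fragment list.
In step S110, the file management apparatus 1100 determines the received file fragment lists so that no two lists indicate the same file block. Step S110 corresponds to steps 1 to 4 of the algorithm described above.
Specifically, the file management apparatus 1100 assigns the whole of the p₁-th stored file fragment list (S_p1) to the p₁-th received file fragment list (A_p1); assigns to the p₂-th received file fragment list (A_p2) what remains of the p₂-th stored file fragment list (S_p2) after removing its overlap with the p₁-th received file fragment list (S_p2 \ A_p1); and continues in the same way until it assigns to the p_k-th received file fragment list (A_pk) what remains of the p_k-th stored file fragment list (S_pk) after removing its overlap with the p₁-th through p_{k-1}-th received file fragment lists (S_pk \ (A_p1 ∪ A_p2 ∪ … ∪ A_p{k-1})).
In this way, every file block stored in the k distributed storage devices (that is, contained in the stored file fragments) is assigned to one of the k received file fragment lists, and no two received file fragment lists contain the same file block.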
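Step S110 (steps 1 to 4 of the algorithm) can be sketched as a single pass that keeps track of the blocks already claimed. The function name `initial_receive_lists` and the sample data are hypothetical.

```python
def initial_receive_lists(stored_lists):
    """Give each device the blocks not already claimed by an earlier device.

    stored_lists must already be ordered from the fastest to the slowest device.
    """
    receive_lists = []
    claimed = set()
    for stored in stored_lists:
        new_blocks = set(stored) - claimed
        receive_lists.append(new_blocks)
        claimed |= new_blocks
    return receive_lists


# Made-up stored lists for three devices.
S = [{1, 2, 3, 4}, {3, 4, 5, 6}, {1, 5, 6, 7}]
A = initial_receive_lists(S)
```

The first list takes everything its device stores, so later lists shrink; this is why step S120 then rebalances the lists according to speed.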
여기서 할당된다는 것은 어떤 수신 파일 조각 목록이 파일 블록들을 포함하도록 되는 것을 의미하고, 어떤 수신 파일 조각 목록(또는, 저장 파일 조각 목록)에 어떤 파일 블록이 포함된다는 것은 해당 수신 파일 조각 목록(또는, 저장 파일 조각 목록)에 해당 파일 블록을 지시하거나 나타내는 정보가 포함되어 있음을 의미한다.
S120 단계에서, 파일 관리 장치(1100)는 분산 저장 장치들(1100a, 1100b, …, 1100k)과의 통신 속도에 따라 파일 조각 목록(수신 파일 조각 목록)들을 수정 또는 다시 스케쥴링한다. S120 단계는 위에서 설명한 알고리즘의 5 단계 내지 21 단계에 해당한다.
구체적으로, 파일 관리 장치(1100)는 분산 저장 장치와의 통신 속도가 빠를수록 대응되는 수신 파일 조각 목록에 더 많은 파일 블록들이 포함되도록 수신 파일 조각 목록들을 수정 또는 스케쥴링하며, 바람직하게는 분산 저장 장치와의 통신 속도의 비에 수신 파일 조각 목록에 포함된 파일 블록 수의 비가 근접 또는 일치하도록 수신 파일 조각 목록들을 수정 또는 스케쥴링한다. 예를 들어, 첫 번째 분산 저장 장치(1100a)와 두 번째 분산 저장 장치(1100b)의 통신 속도의 비가 1:2 이면, 첫 번째 수신 파일 조각 목록(A₁)과 두 번째 수신 파일 조각 목록(A₂)에 포함된 파일 블록들의 비도 1:2가 되도록 수신 파일 조각 목록들을 수정한다. 만약, 수신 파일 조각 목록들에 포함되는 파일 블록들의 비가 수학적으로 정확히 통신 속도의 비에 일치할 수 없는 경우, 파일 블록들의 비가 통신 속도의 비에 최대한 근접한 값이 되도록 수신 파일 조각 목록들이 수정된다.
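The specific way in which the file management apparatus (1100) modifies or reschedules the received file fragment lists in step S120 is the same as steps 5 to 21 of the algorithm above, so a detailed description is omitted here.
In step S130, the file management apparatus (1100) receives the file fragments (received file fragments) from the distributed storage devices by referring to the file fragment lists (received file fragment lists). As described above, the file management apparatus (1100) refers to the received file fragment lists and receives the file blocks needed to restore the original file in parts from the several distributed storage devices.
For example, suppose the first received file fragment list contains file blocks 1, 2, 4, 6, 14, and 20, and the second contains file blocks 3, 5, 8, 11, 12, and 13. The file management apparatus (1100) then refers to these lists and receives file blocks 1, 2, 4, 6, 14, and 20 (or a received file fragment containing them) from the first distributed storage device (1100a), and file blocks 3, 5, 8, 11, 12, and 13 (or a received file fragment containing them) from the second distributed storage device (1100b).
In the same way, the file management apparatus (1100) receives file blocks or received file fragments from each of the k distributed storage devices (1100a, 1100b, …, 1100k). As described above, the file blocks or received file fragments obtained in this way include every file block needed to restore the original file.
In step S140, the file management apparatus (1100) restores the original file by concatenating or combining the file blocks or received file fragments received in step S130. There can be various specific ways of doing so. For example, the file management apparatus (1100) may restore the original file from the received file blocks or received file fragments by the restoration method described in Korean Patent Application No. 10-2013-0016390.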

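The method of determining or scheduling the received file fragment lists according to the present invention is described below in detail through a concrete example.
FIGS. 4a to 4d illustrate, as a concrete example, a method of determining file fragment lists according to an embodiment of the present invention. In FIGS. 4a to 4d, the file management apparatus (for example, 1100 in FIG. 1) determines the file fragment lists (received file fragment lists) according to the algorithm described above or the method shown in the flowchart of FIG. 3.
First, in FIG. 4a, assume that there are six distributed storage devices in total and that the original file can be restored if the file fragments stored in any four of them can be obtained. That is, n=6 and k=4. The original file consists of 20 blocks in total.
The first box (100) shows the stored file lists (S₁, S₂, S₃, S₄, S₅, S₆) that describe the stored file fragments held in each of the distributed storage devices (hereinafter M₁, M₂, M₃, M₄, M₅, M₆). According to these lists, the first distributed storage device (M₁) holds file blocks 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10 out of the original file's file blocks (blocks 1 to 20), and the sixth distributed storage device (M₆) holds file blocks 4, 7, 9, 10, 13, 15, 16, 18, 19, and 20.
As described above, the file management apparatus can restore the original file once it obtains received file fragments from any four of the distributed storage devices. Here, the first, second, third, and fourth distributed storage devices (M₁, M₂, M₃, M₄) are chosen as the four devices from which received file fragments are obtained.
In the present invention, the file management apparatus composes the received file fragments differently depending on its communication speed with each distributed storage device. To illustrate this, the communication speeds between the file management apparatus and the first, second, third, and fourth distributed storage devices (M₁, M₂, M₃, M₄) are defined as r₁, r₂, r₃, and r₄, and are assumed to be in the ratio 8:5:5:2.
These facts and assumptions are shown in the first box (100) of FIG. 4a.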
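To determine the initial received file fragment lists, the 20 file blocks that make up the original file are allocated to the four received file fragment lists (A₁, A₂, A₃, A₄) corresponding to the first, second, third, and fourth distributed storage devices (M₁, M₂, M₃, M₄).
In one embodiment, as many file blocks as possible are allocated to the received file fragment list of the fastest distributed storage device, and the remaining file blocks are then allocated in the same way, device by device, in order of decreasing speed.
For example, the fastest distributed storage device is the first one (M₁), so the first received file fragment list (A₁) is given every allocatable file block, that is, every file block in S₁. The first received file fragment list (A₁) therefore receives all the file blocks in the first stored file list (S₁): blocks 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10.
The second received file fragment list (A₂), which corresponds to the next fastest device, the second distributed storage device (M₂), is given every allocatable file block (S₂) except those already in the first received file fragment list (A₁), so that no block is duplicated. As a result, A₂ receives the file blocks of the second stored file list (S₂) that do not overlap with A₁: blocks 11, 12, 13, 14, 15, and 16.
The third received file fragment list (A₃), which corresponds to the next fastest device, the third distributed storage device (M₃), is given every allocatable file block (S₃) except those already in the first and second received file fragment lists (A₁, A₂). As a result, A₃ receives the file blocks of the third stored file list (S₃) that do not overlap with A₁ or A₂: blocks 17, 18, and 19.
The fourth received file fragment list (A₄), which corresponds to the slowest device, the fourth distributed storage device (M₄), is given every allocatable file block (S₄) except those already in the first, second, and third received file fragment lists (A₁, A₂, A₃). As a result, A₄ receives only the one block not contained in the other lists: block 20.
The received file fragment lists determined in this way are shown in the second box (110).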

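FIGS. 4b and 4c show how the received file fragment lists (A₁, A₂, A₃, A₄) are scheduled to match the ratio of the communication speeds of the distributed storage devices. In FIGS. 4b and 4c, the loop of steps 5 to 21 of the algorithm described above is repeated twice, after which the numbers of file blocks in the received file fragment lists (A₁, A₂, A₃, A₄) match the speed ratio of the corresponding distributed storage devices. This depends on the specific case, however; the number of iterations needed to finish scheduling the lists can differ from case to case.
The scheduling method of FIGS. 4b and 4c can be outlined as follows. First, taking the communication speeds into account, the distributed storage device whose received file fragment list holds the most excess file blocks is selected (hereinafter the providing storage device, and its list the providing list). Conversely, the distributed storage device whose received file fragment list is most deficient in file blocks relative to its communication speed is selected (hereinafter the counterpart storage device, and its list the counterpart list).
Next, the popularity of each file block that is stored in the counterpart storage device and is also in the providing list is evaluated, using the popularity function P(w) of Equation 1. The shared file blocks are then moved from the providing list to the counterpart list in order of increasing popularity, starting with the least popular block.
This process is repeated until the loop condition or the conditional-step condition is satisfied, which completes the scheduling of all the received file fragment lists. The method is described concretely below using the example of FIGS. 4b and 4c.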
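The rebalancing outlined above can be sketched as follows. This is a hedged illustration, not the algorithm as filed: Equation 1 is not reproduced in this text, so popularity is assumed here to be the number of devices in `stored` that hold a block; the names `rebalance`, `provider`, and `counterpart` are hypothetical; and the targets are assumed to sum to the total number of blocks. The sketch also stops when the largest surplus is zero or negative, whereas the source tests for equality.

```python
from typing import Dict, Set


def rebalance(assigned: Dict[str, Set[int]],
              stored: Dict[str, Set[int]],
              target: Dict[str, int]) -> Dict[str, Set[int]]:
    """Move blocks from the most over-full list to under-full lists (cf. steps 5-21)."""
    lists = {d: set(blocks) for d, blocks in assigned.items()}

    def popularity(block: int) -> int:
        # Assumed reading of Equation 1: how many devices store this block.
        return sum(block in blocks for blocks in stored.values())

    def surplus(device: str) -> int:
        return len(lists[device]) - target[device]

    while True:
        order = sorted(lists, key=surplus, reverse=True)   # step 6
        provider = order[0]
        before = surplus(provider)
        if before <= 0:                                    # step 7 (source: equality)
            return lists
        t = len(order) - 1                                 # step 10
        while t > 0 and surplus(order[t]) < 0:             # step 11
            counterpart = order[t]
            shared = sorted(lists[provider] & stored[counterpart],
                            key=lambda b: (popularity(b), b))          # step 12
            count = min(len(shared), surplus(provider), -surplus(counterpart))  # step 13
            moved = set(shared[:count])
            lists[provider] -= moved                       # step 14
            lists[counterpart] |= moved                    # step 15
            t -= 1                                         # step 16
        if surplus(provider) == before:                    # step 18: nothing moved
            return lists


# Hypothetical data: eight blocks, three devices, targets 4 / 2 / 2.
stored = {"M1": {1, 2, 3, 4, 5, 6}, "M2": {1, 2, 5, 6, 7, 8}, "M3": {3, 4, 7, 8}}
initial = {"M1": {1, 2, 3, 4, 5, 6}, "M2": {7, 8}, "M3": set()}
balanced = rebalance(initial, stored, {"M1": 4, "M2": 2, "M3": 2})
```

In the example, the first list starts two blocks over its target and the third two blocks under, and the only blocks they share are 3 and 4, so those two blocks move and the lists end at sizes 4, 2, and 2.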
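The third box (120) shows one pass through steps 6 to 21 of the algorithm. In what follows, step n of the algorithm is referred to simply as step n unless stated otherwise.
Step 6 determines which received file lists hold the most excess file blocks and which are the most deficient. Here the original file has 20 file blocks in total and the communication speeds of the four distributed storage devices (M₁, M₂, M₃, M₄) are in the ratio 8:5:5:2, so receiving 8, 5, 5, and 2 file blocks from them respectively means receiving file blocks in proportion to communication speed (hereinafter these numbers, 8, 5, 5, and 2, are called the reference numbers of file blocks). In this example the speed ratio and the reference numbers of file blocks are identical, so the two are used interchangeably. This varies from case to case, however, and in other cases the speed ratio and the reference numbers of file blocks may have to be distinguished.
Rearranging {1, 2, 3, 4} to satisfy the condition of step 6 gives {q1, q2, q3, q4} = {1, 2, 4, 3}. That is, the first received file fragment list (A₁) contains 10 file blocks and its reference number is 8, so it has 2 excess file blocks. In the same way, the second list (A₂) has 1 excess file block, the third list (A₃) is short by 2 file blocks, and the fourth list (A₄) is short by 1 file block. Listing the received file lists in order of decreasing excess therefore gives {1, 2, 4, 3}, and step 6 defines these as {q1, q2, q3, q4}.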
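The ordering produced by step 6 for this example can be checked with a short calculation. The list sizes (10, 6, 3, 1) and reference numbers (8, 5, 5, 2) are taken from the example; the variable names are illustrative only.

```python
# Sizes of the initial lists A1..A4 and the reference numbers derived from 8:5:5:2.
sizes = {1: 10, 2: 6, 3: 3, 4: 1}
reference = {1: 8, 2: 5, 3: 5, 4: 2}

# Excess (positive) or shortfall (negative) of each list against its reference number.
excess = {i: sizes[i] - reference[i] for i in sizes}

# Step 6: indices ordered from largest excess to largest shortfall.
q = sorted(excess, key=lambda i: excess[i], reverse=True)
```

The excess values come out as +2, +1, -2, and -1 for lists 1 to 4, which gives the order 1, 2, 4, 3 stated in the text.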
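Steps 7 to 9 check whether the received file fragment lists are already scheduled to match the speed ratio of the distributed storage devices. In the example of FIG. 4b there is a received file fragment list with excess file blocks, so the condition is not satisfied and steps 7 to 9 are passed over.
In step 10, k is 4, so t is assigned 4 (the source text writes this variable as l), and g is assigned 10-8=2.
In the loop of steps 11 to 16, file blocks are moved from the most over-full received file fragment list (A₁) to other received file fragment lists until its excess is resolved. The priority with which the other lists receive file blocks is determined by how deficient they are.
In the example of FIG. 4b, once the loop of steps 11 to 16 has run, the first received file list (A₁) is modified to contain blocks 2, 3, 4, 5, 7, 8, 9, and 10, and the third received file list (A₃) is modified to contain blocks 1, 6, 17, 18, and 19. This result is shown in the fourth box (130).
The termination condition is then checked in the conditional step of steps 18 to 20. The condition holds when no file block at all was moved in steps 11 to 16; in that case scheduling is judged complete and the process ends.
In the embodiment of FIG. 4b two file blocks were moved, so steps 18 to 20 are passed over.
Then, in FIG. 4c, step 21 returns control to step 5 and the loop from step 5 onward is repeated. The state of each received file fragment list at the point where the loop is repeated is shown in the fifth box (140). The process of modifying the received file fragment lists in this pass is shown in the sixth box (150). The method is the same as described for the third box (120) of FIG. 4b, so the details are omitted here. The result of modifying the second and fourth received file lists (A₂, A₄) by the method shown in the sixth box (150) is shown in the seventh box (160).
FIG. 4d shows the result of the modification or scheduling of the received file fragment lists carried out in FIGS. 4a to 4c. Referring to FIG. 4d, the first to fourth received file lists (A₁, A₂, A₃, A₄) contain 8, 5, 5, and 2 file blocks respectively, so the ratio of their sizes matches the speed ratio of the corresponding distributed storage devices. The first to fourth received file lists (A₁, A₂, A₃, A₄) are also constructed so that they share no file block.
The received file lists (A₁, A₂, A₃, A₄) shown in FIG. 4d are referred to by the file management apparatus, which is controlled so as to receive from each distributed storage device only the file blocks contained in the corresponding received file list.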

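Building on the scheduling method described above, a file management apparatus and method that receive the received file fragments in even less time are proposed below. The apparatus and method described below include a configuration that, when the communication state between the file management apparatus and the distributed storage devices changes (for example, when a communication speed changes), reschedules received file fragment lists that have already been scheduled and modifies them accordingly.
Embodiments of this rescheduling are described with reference to FIGS. 5 and 6.
FIG. 5 is a flowchart of a file management method that reschedules the file fragment lists according to one embodiment of the present invention.
Here it is assumed that, when the file management apparatus receives received file fragments from the distributed storage devices, it receives them in units of a predetermined number of file blocks (for example, _nC_{k-1} blocks).
Referring to FIG. 5, the file management method includes steps S210 to S270.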
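In step S210, the file management apparatus schedules the received file fragment lists and thereby determines them. The lists are determined so that they share no file block and so that the faster the communication speed of the corresponding distributed storage device, the more (or at least as many) file blocks a list contains. Step S210 is substantially the same as the scheduling stage (S110 and S120) described for FIG. 3, so a description of the specific method is omitted here.
In step S220, the file management apparatus refers to the received file fragment lists and receives a predetermined number (for example, _nC_{k-1}) of file blocks from one distributed storage device, or simultaneously from each of the distributed storage devices.
In step S230, the file management apparatus determines whether reception of the predetermined number of file blocks was completed within a reference time. If it was, the method proceeds to step S260. Otherwise, the method proceeds to step S240.
If reception of the predetermined number of file blocks was not completed within the reference time, the file management apparatus deletes, in step S240, the predetermined number of file blocks that were being received. The method then proceeds to step S250.
In step S250, the file management apparatus reschedules the received file fragment lists according to the communication speeds between the file management apparatus and the distributed storage devices. Here the file management apparatus may refer to communication status information that indicates its communication speeds with the distributed storage devices, and this information may be newly generated or updated at a predetermined time or under a predetermined condition.
The rescheduling is performed by the same method as the scheduling of step S210, but only for the file blocks that remain after excluding those the file management apparatus has already finished receiving.
When rescheduling is complete, the method returns to step S220.
Returning to step S230, if reception of the predetermined number of file blocks was completed within the reference time, the method proceeds to step S260.
In step S260, the file management apparatus restores the original file using the received file fragments or received file blocks. For example, the file management apparatus may restore the original file by concatenating the received file blocks in order.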
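The simplest restoration mentioned above, joining the received blocks in order, might look like the sketch below. It assumes that each file block is a plain byte slice of the original file and that the blocks are keyed by their block number; the restoration method of the referenced application may work differently. The name `restore` is hypothetical.

```python
from typing import Dict


def restore(blocks: Dict[int, bytes]) -> bytes:
    """Concatenate received blocks in ascending block-number order."""
    return b"".join(blocks[number] for number in sorted(blocks))


# Blocks may arrive from different devices in any order.
original = restore({2: b"cd", 1: b"ab", 3: b"ef"})
```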
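In step S270, the file management apparatus determines whether restoration of the original file is complete. If it is, the method ends. If it is not, some file blocks of the original file have not yet been received, so the method returns to step S220 to continue receiving file blocks.
With this file management method, even if the communication speeds between the file management apparatus and the distributed storage devices change after the received file fragment lists have been determined, the time taken to receive the file fragments can be minimized by rescheduling the lists.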
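The control flow of FIG. 5 can be sketched as a loop around two caller-supplied functions. Everything named here is hypothetical: `schedule` stands in for the scheduling of S210/S250, `fetch_batch` stands in for a timed transfer that returns `None` when the reference time is exceeded, and `max_rounds` is a safety limit that the source does not mention. For simplicity the sketch polls the devices one after another instead of simultaneously and collects blocks instead of restoring the file incrementally.

```python
from typing import Callable, Dict, List, Optional, Set

Plan = Dict[str, List[int]]


def receive_with_rescheduling(
    all_blocks: Set[int],
    schedule: Callable[[Set[int]], Plan],
    fetch_batch: Callable[[str, List[int]], Optional[Dict[int, bytes]]],
    batch_size: int,
    max_rounds: int = 100,
) -> Dict[int, bytes]:
    received: Dict[int, bytes] = {}
    plan = schedule(set(all_blocks))                       # S210
    rounds = 0
    while len(received) < len(all_blocks):                 # S270
        if rounds == max_rounds:
            raise TimeoutError("gave up after max_rounds")
        rounds += 1
        timed_out = False
        for device, blocks in plan.items():                # S220
            pending = [b for b in blocks if b not in received][:batch_size]
            if not pending:
                continue
            batch = fetch_batch(device, pending)           # S230
            if batch is None:                              # S240: drop the partial batch
                timed_out = True
                break
            received.update(batch)
        if timed_out:                                      # S250: reschedule what is left
            plan = schedule(set(all_blocks) - set(received))
    return received


# Toy stand-ins: two devices, and the slow one misses the deadline once.
calls = {"slow": 0}

def fake_schedule(remaining: Set[int]) -> Plan:
    ordered = sorted(remaining)
    half = (len(ordered) + 1) // 2
    return {"fast": ordered[:half], "slow": ordered[half:]}

def fake_fetch(device: str, blocks: List[int]) -> Optional[Dict[int, bytes]]:
    if device == "slow":
        calls["slow"] += 1
        if calls["slow"] == 1:
            return None
    return {b: bytes([b]) for b in blocks}

result = receive_with_rescheduling({1, 2, 3, 4}, fake_schedule, fake_fetch, batch_size=2)
```

In the toy run the slow device times out on its first batch, the remaining blocks 3 and 4 are rescheduled across both devices, and the second round completes the transfer.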
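FIG. 6 is a flowchart of a file management method that reschedules the file fragment lists according to another embodiment of the present invention.
FIG. 6 is similar to the embodiment of FIG. 5 but differs in how it handles the file blocks being received when reception of the predetermined file blocks is not completed within the reference time.
Here too it is assumed that, when the file management apparatus receives received file fragments from the distributed storage devices, it receives them in units of a predetermined number of file blocks (for example, _nC_{k-1} blocks).
Referring to FIG. 6, the file management method includes steps S310 to S370.
In step S310, the file management apparatus schedules the received file fragment lists and thereby determines them. The lists are determined so that they share no file block and so that the faster the communication speed of the corresponding distributed storage device, the more (or at least as many) file blocks a list contains. Step S310 is substantially the same as the scheduling stage (S110 and S120) described for FIG. 3, so a description of the specific method is omitted here.
In step S320, the file management apparatus refers to the received file fragment lists and receives a predetermined number (for example, _nC_{k-1}) of file blocks from one distributed storage device, or simultaneously from each of the distributed storage devices.
In step S330, the file management apparatus determines whether reception of the predetermined number of file blocks was completed within a reference time. If it was, the method proceeds to step S360. Otherwise, the method proceeds to step S340.
If reception of the predetermined number of file blocks was not completed within the reference time, then in step S340 the file management apparatus receives from another distributed storage device those of the blocks being received whose reception is not yet complete. For example, if five file blocks are being received from a distributed storage device and the reference time expires when only two have arrived, the file management apparatus receives the remaining three blocks from another distributed storage device.
In one embodiment, the file management apparatus may receive the remaining three blocks from whichever of the other distributed storage devices has the fastest communication speed with the file management apparatus. After step S340, the method proceeds to step S350.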
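The choice of a fallback device in step S340 can be illustrated as below. The source only says that the fastest of the other devices may be used; the extra requirement that the fallback device actually stores all of the missing blocks is an assumption added here so that the sketch is well defined. The function name, device labels, speeds, and stored lists are hypothetical.

```python
from typing import Dict, Optional, Set


def pick_fallback(missing: Set[int], slow_device: str,
                  stored: Dict[str, Set[int]],
                  speeds: Dict[str, float]) -> Optional[str]:
    """Return the fastest other device that stores every missing block, or None."""
    candidates = [d for d in stored
                  if d != slow_device and missing <= stored[d]]
    if not candidates:
        return None
    return max(candidates, key=lambda d: speeds[d])


# M1 was asked for blocks 1..5 but delivered only 1 and 2 before the deadline.
stored = {"M1": {1, 2, 3, 4, 5}, "M2": {3, 4, 5, 6}, "M3": {3, 4, 5, 7}}
speeds = {"M1": 2.0, "M2": 5.0, "M3": 8.0}
fallback = pick_fallback({3, 4, 5}, "M1", stored, speeds)
no_fallback = pick_fallback({1}, "M1", stored, speeds)
```

Here both M2 and M3 hold the three missing blocks and M3 is faster, so M3 is chosen; block 1 is held only by M1, so no fallback exists for it.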
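In step S350, the file management apparatus reschedules the received file fragment lists according to the communication speeds between the file management apparatus and the distributed storage devices. Here the file management apparatus may refer to communication status information that indicates its communication speeds with the distributed storage devices, and this information may be newly generated or updated at a predetermined time or under a predetermined condition.
The rescheduling is performed by the same method as the scheduling of step S310, but only for the file blocks that remain after excluding those the file management apparatus has already finished receiving.
When rescheduling is complete, the method returns to step S320.
Returning to step S330, if reception of the predetermined number of file blocks was completed within the reference time, the method proceeds to step S360.
In step S360, the file management apparatus restores the original file using the received file fragments or received file blocks. For example, the file management apparatus may restore the original file by concatenating the received file blocks in order. There can be various ways of restoring the original file from the received file fragments or file blocks, one of which is the restoration method described in Korean Patent Application No. 10-2013-0016390.
In step S370, the file management apparatus determines whether restoration of the original file is complete. If it is, the method ends. If it is not, some file blocks of the original file have not yet been received, so the method returns to step S320 to continue receiving file blocks.
With this file management method, even if the communication speeds between the file management apparatus and the distributed storage devices change after the received file fragment lists have been determined, the time taken to receive the file fragments can be minimized by rescheduling the lists.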

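The embodiments described above may be implemented as program instructions executable by various computer components and recorded on a computer-readable recording medium. The computer-readable recording medium may contain program instructions, data files, data structures, or a combination of these. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROM and DVD; magneto-optical media such as floptical disks; and storage devices such as ROM, RAM, and flash memory. Such storage devices may be configured to operate as one or more software modules in order to perform the processing according to the present invention, and vice versa. The program instructions described here include not only machine code such as that produced by a compiler but also high-level language code that a computer can execute using an interpreter or the like.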

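Although the detailed description of the present invention refers to specific embodiments, each embodiment may be modified in various ways without departing from the scope of the present invention.
In addition, although specific terms are used herein, they are used only to describe the present invention and not to restrict meaning or to limit the scope of the invention set out in the claims. The scope of the present invention should therefore not be confined to the embodiments described above, but should be determined by the claims below and by their equivalents.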
The following detailed description refers to the accompanying drawings, which show, by way of illustration, specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. The various embodiments of the present invention differ from one another but need not be mutually exclusive. For example, particular features, structures, and characteristics described herein in connection with one embodiment may be implemented in other embodiments without departing from the spirit and scope of the invention. Likewise, the position or arrangement of individual components within each described embodiment may be changed without departing from the spirit and scope of the invention. The following detailed description is therefore not to be taken in a limiting sense, and the scope of the present invention is limited only by the appended claims, together with the full scope of equivalents to which those claims are entitled. In the drawings, like reference numerals refer to the same or similar functions throughout the several views.
Meanwhile, the earlier Korean Patent Application No. 10-2013-0016390 is incorporated herein by reference in its entirety, and matters concerning the distributed storage and restoration of files that are not described here may be taken from that application. For example, the file fragments stored in the distributed storage devices according to the present invention may be generated by the method described in Korean Patent Application No. 10-2013-0016390. In addition, the file management apparatus of the present invention receives file fragments from the distributed storage devices and may restore the original file from the received file fragments by the restoration method described in Korean Patent Application No. 10-2013-0016390.
FIG. 1 is a diagram showing a file distribution management system 1000 and its internal configuration according to an embodiment of the present invention. The file distribution management system 1000 includes a file management apparatus 1100 and n distributed storage devices 1100a, 1100b, ..., 1100n.
The file distribution management system 1000 distributes a plurality of file fragments generated from the original file across the n distributed storage devices 1100a, 1100b, ..., 1100n and stores them there. For the file distribution management system 1000 to restore the original file from the distributed file fragments, it must obtain k file fragments (k is an integer equal to or larger than 0) stored in any of the distributed storage devices. That is, the original file can be completely restored when k file fragments are obtained, but cannot be completely restored when fewer than k file fragments are obtained. The principle of distribution and restoration is described in detail in Korean Patent Application No. 10-2013-0016390.
Each of the distributed storage devices 1100a, 1100b, ..., 1100n stores a file fragment generated from the original file. Under the control of the file management apparatus 1100, or in response to a request from it, each distributed storage device provides some or all of its stored file fragment to the file management apparatus 1100. For example, file fragments generated by the method described in Korean Patent Application No. 10-2013-0016390 may be stored in a distributed manner in the distributed storage devices 1100a, 1100b, ..., 1100n.
It is sufficient for the distributed storage devices 1100a, 1100b, ..., 1100n to be devices that can store individual file fragments separately; no special function is required. In FIG. 1, the distributed storage devices 1100a, 1100b, ..., 1100n are shown as physically separate, but the present invention is not limited thereto. The distributed storage devices 1100a, 1100b, ..., 1100n may also be devices that are separated logically rather than physically.
In one embodiment, each of the distributed storage devices 1100a, 1100b, ..., 1100n may have a separate communication unit or communication interface for communicating with the file management apparatus 1100.
The file management apparatus 1100 receives the file fragments from the distributed storage devices and restores the original file using the received file fragments. The file management apparatus 1100 is described in more detail below with reference to FIG. 2.
FIG. 2 is a diagram showing the internal configuration of the file management apparatus 1100 according to an embodiment of the present invention. The file management apparatus 1100 may include a communication unit 1110, a controller 1120, a scheduler 1130, and a storage unit 1140, but is not limited thereto.
The communication unit 1110 handles communication between the file management apparatus 1100 and external entities. For example, the file management apparatus 1100 can communicate with the distributed storage devices 1100a, 1100b, ..., 1100n (see FIG. 1) via the communication unit 1110. The communication unit 1110 may include any communication means capable of networking or communicating with the outside, together with the communication interface that accompanies such means. The communication interface or method used by the communication unit 1110 may include wired communication, wireless communication, 3G, 4G, or various other communication interfaces or methods.
As an example, the communication unit 1110 may include a communication status manager 1111 that manages communication status information between the file management apparatus 1100 and external entities. The communication status information may include information on the channel state or communication speed between the file management apparatus 1100 and an external entity. Besides simply managing and storing communication status information, the communication status manager 1111 may detect the current channel state or communication speed with an external entity in order to generate new communication status information or update existing communication status information.
Although the communication status manager 1111 is described here as part of the communication unit 1110, it is not limited to this arrangement. The communication status manager 1111 may be located anywhere in the file management apparatus 1100, or may be provided as a separate module outside the file management apparatus 1100.
The controller 1120 controls the overall operation of the file management apparatus 1100 and of the other modules 1110, 1130, and 1140 that make up the file management apparatus. The controller 1120 can function as a central processing unit that performs the operations needed to run the file management apparatus 1100.
The scheduler 1130 generates and manages the lists of file blocks or file fragments to be received by the file management apparatus 1100, and provides the controller 1120 or the communication unit 1110 with the information needed so that file blocks or file fragments can be received from the distributed storage devices 1100a, 1100b, ..., 1100n according to the generated lists. The specific way in which the scheduler 1130 generates and manages a file fragment list is described in more detail later with reference to the subsequent figures.
The storage unit 1140 stores data received by the file management apparatus 1100 or data processed in the file management apparatus 1100. For example, the storage unit 1140 may store data obtained while the communication status manager 1111, the controller 1120, or the scheduler 1130 performs its functions.
The present invention relates to a file management apparatus and method for restoring an original file by receiving file fragments that have been stored in a distributed manner, and more particularly to a technique that minimizes the time and communication cost required to restore the original file by receiving the file fragments in accordance with the communication speeds between the file management apparatus 1100 and the distributed storage devices 1100a, 1100b, ..., 1100n.
For example, receiving all of the file blocks stored in a given number (k) of distributed storage devices may cause duplicate file blocks to be received, which increases communication cost. In the present invention, therefore, the file fragment lists are scheduled so that duplicate file blocks are not received. Here, a file fragment list is a list of information on the file blocks to be received from a distributed storage device, and the set of file blocks received from one distributed storage device can be defined as one file fragment. For example, a file fragment received from a distributed storage device may include a plurality of file blocks, and the file fragment list may include the address or size information of the file blocks included in that file fragment.
In addition, if file fragments are simply received from k distributed storage devices, a large number of file blocks may be received from a distributed storage device with a low communication speed and a small number from one with a high communication speed. In that case, the time taken to receive all of the file blocks can become very long. According to the present invention, therefore, the file fragment lists are scheduled according to the communication speed with each distributed storage device, so that relatively many file blocks are received from a distributed storage device with a high communication speed and relatively few from one with a low communication speed.

Hereinafter, a method of scheduling the file fragment lists to be received according to the present invention, and of receiving file fragments according to the scheduled file fragment lists, is described.
For the purpose of the following description, some terms are defined as follows.
n: the number of distributed storage devices
M₁, M₂, ..., M_n: the n distributed storage devices
File management apparatus: the apparatus that receives file fragments from the distributed storage devices
Stored file fragment: a fragment of the original file stored in a distributed storage device. For ease of explanation, one stored file fragment is assumed to be stored in each distributed storage device.
Received file fragment: a file fragment that the file management apparatus receives from a distributed storage device. For ease of explanation, one received file fragment is assumed to be received from each distributed storage device. A received file fragment received from a given distributed storage device can be generated from the stored file fragment stored in that device.
k: the minimum number of stored file fragments needed to restore the original file successfully, where k is an integer greater than 0 and less than or equal to n.
S_i: the stored file fragment list indicating the file blocks included in the stored file fragment of the i-th distributed storage device M_i (the i-th stored file fragment)
r₁, r₂, ..., r_k: the ratio of the communication speeds between the file management apparatus and the k distributed storage devices M₁, M₂, ..., M_k, where r₁ ≥ r₂ ≥ ... ≥ r_k and r₁ + r₂ + ... + r_k = _nC_{k-1}.
A_i: the received file fragment list indicating the file blocks included in the received file fragment that the file management apparatus receives from the i-th distributed storage device M_i (the i-th received file fragment)
W: {1, 2, 3, ..., _nC_{k-1}}
For convenience of explanation, all the file blocks included in the stored and received file fragments are assumed to have the same size (for example, 4 bytes). Therefore, the larger the file, the greater the number of file blocks.

The technique proposed by the present invention ensures that, when the file management apparatus receives received file fragments from k distributed storage devices M₁, M₂, ..., M_k, the received file fragments do not include file blocks that duplicate one another. Further, by adjusting the number of file blocks received from each distributed storage device (or the size of each received file fragment) according to the communication speed between the file management apparatus and that device, it minimizes the time the file management apparatus takes to receive the received file fragments.
For the received file fragments not to include duplicate file blocks, A_i ∩ A_j = ∅ must hold for every pair (i, j) with i ≠ j.
Considering communication speed, the more file blocks the file management apparatus receives from the distributed storage devices with faster communication speeds, the shorter the time taken to receive all of the received file fragments. In theory, this time is smallest when the ratio of the communication speeds between the file management apparatus and the distributed storage devices equals the ratio of the numbers of file blocks to be received from each device (or of the sizes of the received file fragments).
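The effect of matching block counts to speeds can be checked with a small calculation. The sketch assumes that all transfers run in parallel, that a speed is measured in blocks per unit time, and that the total time is therefore set by the slowest-finishing device; the function name is illustrative. The speeds 8:5:5:2 and the block counts are those of the FIG. 4 example (10, 6, 3, 1 before rebalancing and 8, 5, 5, 2 after).

```python
from typing import Sequence


def completion_time(block_counts: Sequence[int], speeds: Sequence[float]) -> float:
    """Time until the last parallel transfer finishes."""
    return max(count / speed for count, speed in zip(block_counts, speeds))


speeds = [8, 5, 5, 2]
before = completion_time([10, 6, 3, 1], speeds)   # initial duplicate-free lists
after = completion_time([8, 5, 5, 2], speeds)     # lists matched to the speed ratio
```

Under these assumptions the initial lists take 1.25 time units, limited by the first device, while the balanced lists finish in 1.0 time unit with every device finishing together.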
Hereinafter, the method by which the file fragment lists (received file fragment lists, A_i) are determined or scheduled is described.
First, a popularity function P: W → Z is defined as in Equation (1).

The popularity function indicates in how many of the distributed storage devices a given file block is stored.
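Equation (1) itself is not reproduced in this text, so the sketch below is only one plausible reading of the sentence above: the popularity of a block is taken to be the number of stored file fragment lists that contain it. The function name is hypothetical; the two lists used in the example are the stored lists that this description gives for M₁ and M₆ in the FIG. 4a example.

```python
from typing import Iterable, Set


def popularity(block: int, stored_lists: Iterable[Set[int]]) -> int:
    """Number of stored file fragment lists that contain `block` (assumed form of P)."""
    return sum(1 for blocks in stored_lists if block in blocks)


s1 = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
s6 = {4, 7, 9, 10, 13, 15, 16, 18, 19, 20}

p4 = popularity(4, [s1, s6])     # stored in both lists
p1 = popularity(1, [s1, s6])     # stored only in the first list
p11 = popularity(11, [s1, s6])   # stored in neither of these two lists
```

Moving the least popular shared blocks first, as the scheduling steps do, keeps the widely replicated blocks available for later adjustments between other pairs of lists.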
In the present invention, the file management apparatus receives k received file fragments from k distributed storage devices M_p1, M_p2, M_p3, ..., M_pk selected from among the n distributed storage devices M₁, M₂, M₃, ..., M_n (where r_p1 ≥ r_p2 ≥ ... ≥ r_pk). The k received file fragments are generated from the k stored file fragments stored in the k distributed storage devices M_p1, M_p2, ..., M_pk.
Under this premise, the file management apparatus determines or schedules the received file fragment lists by the following algorithm.
1: Assign S_p1 to A_p1 (A_p1 = S_p1).
2: (start of loop) Repeat for j = 2 to k
3: Assign S_pj \ (A_p1 ∪ A_p2 ∪ ... ∪ A_p{j-1}) to A_pj.
4: (end of loop)
5: (start of loop)
6: Rearrange {p1, p2, ..., pk} as {q1, q2, ..., qk}, where {q1, q2, ..., qk} is defined to satisfy the following expression.
|A_q1| - r_q1 ≥ |A_q2| - r_q2 ≥ ... ≥ |A_qk| - r_qk
Here, |X| denotes the size of the set X, that is, its number of elements.
7: (start of condition) If |A_q1| = r_q1,
8: output (A₁, A₂, ..., A_k) as the received file fragment lists and end the process.
9: (end of condition)
10: Assign k to t (t = k) and |A_q1| - r_q1 to g (g = |A_q1| - r_q1).
11: (start of loop) Repeat while r_qt > |A_qt|
12: Rearrange the elements of the set A_q1 ∩ S_qt as {π₁, π₂, ..., π_μ}, where {π₁, π₂, ..., π_μ} is defined to satisfy the following expression.
μ = |A_q1 ∩ S_qt|, P(π₁) ≤ P(π₂) ≤ ... ≤ P(π_μ)
13: Assign min{μ, |A_q1| - r_q1, r_qt - |A_qt|} to υ
(υ = min{μ, |A_q1| - r_q1, r_qt - |A_qt|})
14: Assign A_q1 \ {π₁, π₂, ..., π_υ} to A_q1
(A_q1 = A_q1 \ {π₁, π₂, ..., π_υ})
15: Assign A_qt ∪ {π₁, π₂, ..., π_υ} to A_qt
(A_qt = A_qt ∪ {π₁, π₂, ..., π_υ})
16: Assign t-1 to t (t = t-1)
17: (end of loop)
18: (start of condition) If g = |A_q1| - r_q1,
19: output (A_p1, A_p2, ..., A_pk) as the received file fragment lists and end the process.
20: (end of condition)
21: (end of loop)
Here, it is assumed that r_p1 + r_p2 + ... + r_pk equals the total number of file blocks included in the received file fragments. If it does not, r_p1, r_p2, ..., r_pk should be understood as the values obtained by dividing the total number of file blocks included in the received file fragments in proportion to the speed of each distributed storage device (that is, the speeds of the p₁-th, p₂-th, ..., p_k-th distributed storage devices are in the ratio r_p1 : r_p2 : ... : r_pk, and r_p1 + r_p2 + ... + r_pk = _nC_{k-1}).
The file management apparatus 1100 (see FIG. 1) according to the present invention generates or schedules the received file fragment lists according to the above-described algorithm and receives the file fragments from the k distributed storage devices (1100a, 1100b, ..., 1100k, see FIG. 1).
According to the above-described algorithm, k received file fragment lists (A_p1, A_p2, ..., A_pk), expressed as sets and corresponding respectively to the distributed storage devices 1100a, 1100b, ..., 1100k, are output. The ratio of the sizes of the received file fragment lists then approaches or equals the ratio of the communication speeds between the file management apparatus and the distributed storage devices 1100a, 1100b, ..., 1100k.

FIG. 3 is a flowchart of a file management method, according to an embodiment of the present invention, in which the file management apparatus receives file fragments (received file fragments) from the distributed storage devices and restores the original file.
The file management method of FIG. 3 includes steps S110 to S140. Of these, steps S110 and S120 form a scheduling stage that determines or schedules the file fragment lists (received file fragment lists) according to the above-described algorithm.
In step S110, the file management apparatus 1100 (see FIG. 1) determines the received file fragment list corresponding to each of the distributed storage devices 1100a, 1100b, ..., 1100k (see FIG. 1). Each received file fragment list may be a subset of the corresponding stored file fragment list.
In step S110, the file management apparatus 1100 determines the received file fragment lists so that no two lists indicate the same file block. Step S110 corresponds to steps 1 to 4 of the algorithm described above.
More specifically, the file management apparatus 1100 assigns the entire p₁-th stored file fragment list (S_p1) to the p₁-th received file fragment list (A_p1), and assigns to the p₂-th received file fragment list (A_p2) what remains of the p₂-th stored file fragment list (S_p2) after removing its overlap with A_p1, that is, S_p2 \ A_p1. Proceeding in the same way in sequence, it assigns to the p_k-th received file fragment list (A_pk) what remains of the p_k-th stored file fragment list (S_pk) after removing its overlap with the p₁-th through p_{k-1}-th received file fragment lists (A_p1 through A_p{k-1}), that is, S_pk \ (A_p1 ∪ A_p2 ∪ ... ∪ A_p{k-1}).
By this method, every file block stored in the k distributed storage devices (that is, every block included in the stored file fragments) is allocated to one of the k received file fragment lists, and no two received file fragment lists contain the same file block.
Here, "allocated" means that a received file fragment list is made to include the file blocks, and saying that a received file fragment list (or a stored file fragment list) includes a file block means that the list contains information that points to or identifies that block.
In step S120, the file management apparatus 1100 corrects or re-schedules the file fragment list (received file fragment lists) according to the communication speed with the distributed

storage devices

1100a, 1100b, ..., 1100k. Step S120 corresponds to steps 5 to 21 of the algorithm described above.
Specifically, the file management apparatus 1100 modifies or schedules the received file fragment lists so that the higher the communication speed with the distributed storage device, the more the file blocks are included in the corresponding received file fragment list, Or the ratio of the number of file blocks included in the received file fragment list is close to or coincides with the ratio of the communication speed of the received file fragment list. For example, if the ratio of the communication speed between the first distributed storage device 1100a and the second distributed storage device 1100b is 1: 2, the first received file fragment list A_One) And the second received file fragment list (A₂) Of the received file fragments to be 1: 2. If the ratio of the file blocks included in the received file fragment lists can not mathematically match the ratio of the communication speed accurately, the received file fragment lists are modified such that the ratio of the file blocks is as close as possible to the ratio of the communication speed.
The specific method by which the file management apparatus 1100 modifies or reschedules the received file fragment lists in step S120 is the same as the method described in steps 5 to 21 of the above algorithm.
In step S130, the file management apparatus 1100 receives the file fragments (received file fragments) from the distributed storage devices by referring to the file fragment lists (received file fragment lists). As described above, the file management apparatus 1100 receives the file blocks necessary for restoring the original file in divided portions from the plurality of distributed storage devices by referring to the received file fragment lists.
For example, assume that the first received file fragment list contains file blocks 1, 2, 4, 6, 14, and 20, and the second received file fragment list contains file blocks 3, 5, 8, 11, 12, and 13. In this case, the file management apparatus 1100 refers to the received file fragment lists, receives file blocks 1, 2, 4, 6, 14, and 20 (or the received file fragments including these file blocks) from the first distributed storage device 1100a, and receives file blocks 3, 5, 8, 11, 12, and 13 (or the received file fragments including these file blocks) from the second distributed storage device 1100b.
In the same way, the file management apparatus 1100 receives file blocks or received file fragments from each of the k distributed storage devices 1100a, 1100b, ..., 1100k. As described above, the received file blocks or received file fragments include all the file blocks necessary for restoring the original file.
In step S140, the file management apparatus 1100 restores the original file by adding or combining the file blocks or the received file fragments received in step S130. There are various concrete methods for restoring the original file by adding or combining file blocks or file fragments. For example, the file management apparatus 1100 can restore the original file from the received file block or received file fragments by the restoration method described in Korean Patent Application No. 10-2013-0016390.

Hereinafter, a method for determining or scheduling a received file fragment list according to the present invention will be described in detail with reference to specific examples.
FIGS. 4A to 4D are diagrams illustrating a method of determining a file fragment list according to an exemplary embodiment of the present invention. In FIGS. 4A to 4D, a file management apparatus (for example, 1100 in FIG. 1) determines the file fragment lists (received file fragment lists) according to the algorithm described above or the determination method shown in the flowchart described above.
First, in FIG. 4A, it is assumed that there are six distributed storage devices in total and that the original file can be restored if the file fragments stored in any four of them are acquired. That is, it is assumed that n = 6 and k = 4. The original file is composed of 20 file blocks in total.
The first box 100 shows the stored file lists S₁, S₂, S₃, S₄, S₅, S₆ of all the distributed storage devices (hereinafter referred to as M₁, M₂, M₃, M₄, M₅, M₆). According to the stored file lists, the first distributed storage device M₁ stores file blocks 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10 among the file blocks (file blocks 1 to 20) of the original file, and, in the same manner, the sixth distributed storage device M₆ stores file blocks 4, 7, 9, 10, 13, 15, 16, 18, 19, and 20.
As described above, the file management apparatus can restore the original file by acquiring the received file fragments from any four of the distributed storage devices. Here, the first, second, third, and fourth distributed storage devices M₁, M₂, M₃, M₄ are selected as the four distributed storage devices from which the received file fragments are acquired.
In the present invention, the file management apparatus changes the configuration of the received file fragments according to the communication speeds with the respective distributed storage devices. For the purpose of explanation, the communication speeds between the file management apparatus and the first, second, third, and fourth distributed storage devices M₁, M₂, M₃, M₄ are denoted r₁, r₂, r₃, r₄, and r₁, r₂, r₃, r₄ are assumed to have a ratio of 8:5:5:2.
The above contents and assumptions are shown in the first box 100 of FIG. 4A.
To determine the initial received file fragment lists, the 20 file blocks constituting the original file are allocated to the four received file fragment lists A₁, A₂, A₃, A₄ corresponding to the first, second, third, and fourth distributed storage devices M₁, M₂, M₃, M₄.
As an embodiment, when allocating file blocks, as many file blocks as possible may be allocated to the received file fragment list corresponding to the fastest distributed storage device, and the remaining file blocks may then be allocated sequentially in descending order of the speed of the distributed storage devices.
For example, since the fastest distributed storage device is the first distributed storage device M₁, the first received file fragment list A₁ is allocated all allocatable file blocks (that is, all of the file blocks included in S₁). Therefore, the first received file fragment list A₁ contains all file blocks of the first stored file list S₁ (blocks 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10).
Then, the second received file fragment list A₂ corresponding to the second distributed storage device M₂ is allocated the allocatable file blocks of S₂, but to avoid duplicates, the file blocks already included in the first received file fragment list A₁ are excluded. As a result, the second received file fragment list A₂ contains the file blocks of the second stored file list S₂ except those overlapping the first received file fragment list A₁ (blocks 11, 12, 13, 14, 15, and 16).
Then, the third received file fragment list A₃ corresponding to the third distributed storage device M₃ is allocated the allocatable file blocks of S₃, but to avoid duplicates, the file blocks already included in the first and second received file fragment lists A₁, A₂ are excluded. As a result, the third received file fragment list A₃ contains the file blocks of the third stored file list S₃ except those overlapping the first and second received file fragment lists A₁, A₂ (blocks 17, 18, and 19).
Then, the fourth received file fragment list A₄ corresponding to the fourth distributed storage device M₄ is allocated the allocatable file blocks of S₄, but the file blocks already included in the first, second, and third received file fragment lists A₁, A₂, A₃ are excluded. As a result, the fourth received file fragment list A₄ contains the file blocks of S₄ except those included in the other received file fragment lists A₁, A₂, A₃ (block 20).
The received file fragment lists determined by the above method are shown in the second box 110.

FIGS. 4B to 4C show the process of rescheduling the received file fragment lists A₁, A₂, A₃, A₄. In FIGS. 4B to 4C, the iterative steps 5 to 21 of the above-described algorithm are repeated twice, until the ratio of the sizes of the received file fragment lists A₁, A₂, A₃, A₄ equals the ratio of the communication speeds of the corresponding distributed storage devices. However, this depends on the specific case, and the number of repetitions needed for the received file fragment lists A₁, A₂, A₃, A₄ may differ from case to case.
The scheduling method illustrated in FIGS. 4B through 4C is briefly described as follows. First, the distributed storage device whose received file fragment list (hereinafter referred to as the provided list) contains the most excess file blocks relative to its communication speed is selected (hereinafter referred to as the providing storage device). Conversely, the distributed storage device whose received file fragment list (hereinafter referred to as the corresponding list) is shortest of file blocks relative to its communication speed is selected (hereinafter referred to as the corresponding storage device).
Then, the popularity of each file block that is stored in the corresponding storage device and is also included in the provided list is determined. The popularity of the file blocks is determined using the popularity function P(w) of Equation (1). Then, based on the obtained popularity, the common file blocks are moved from the provided list to the corresponding list in ascending order of popularity, starting with the lowest.
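The move described above can be illustrated with a toy sketch. Note that the `popularity` function here is a simplified stand-in (the number of devices that store a block), assumed only for this example; it is not the popularity function P(w) of Equation (1). All names and data are hypothetical.

```python
def popularity(block, stored):
    """Simplified stand-in: how many devices store `block` (an assumption)."""
    return sum(1 for blocks in stored.values() if block in blocks)


def move_least_popular(provided, corresponding, corresponding_store, stored, count):
    """Move up to `count` blocks from the provided list to the corresponding list.

    Only blocks that the corresponding storage device actually stores can be
    moved; among those, the least popular blocks are moved first.
    """
    common = set(provided) & set(corresponding_store)
    candidates = sorted(common, key=lambda b: (popularity(b, stored), b))
    moved = candidates[:count]
    new_provided = [b for b in provided if b not in moved]
    new_corresponding = sorted(corresponding + moved)
    return new_provided, new_corresponding, moved


# Toy data: device "A" has one block too many, device "B" one too few.
stored = {"A": {1, 2, 3, 4}, "B": {1, 2, 5}, "C": {2, 3, 6}}
new_a, new_b, moved = move_least_popular(
    provided=[1, 2, 3, 4], corresponding=[5],
    corresponding_store=stored["B"], stored=stored, count=1)
```

Here blocks 1 and 2 are the only candidates because device "B" stores no other block of the provided list; block 1 is held by fewer devices than block 2, so it is moved first.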
By repeating this process until the condition of the iteration step or the condition of the condition step is satisfied, the scheduling for all the received file fragment lists is completed. Hereinafter, this method is specifically described with reference to the examples of FIGS. 4B and 4C.
In the third box 120, steps 6 to 21 of the algorithm are repeated one time. Hereinafter, the n-th stage of the algorithm will be referred to simply as the n-th stage unless otherwise stated.
Step 6 determines which received file fragment list contains the most excess file blocks and which contains the fewest. Here, the original file has 20 file blocks in total, and the ratio of the communication speeds of the four distributed storage devices M₁, M₂, M₃, M₄ is 8:5:5:2. Therefore, receiving 8, 5, 5, and 2 file blocks from the four distributed storage devices M₁, M₂, M₃, M₄, respectively, means receiving file blocks in proportion to the communication speeds (hereinafter, this number is referred to as the number of reference file blocks). Here, since the communication speed ratio and the numbers of reference file blocks are equal, the two are used interchangeably. However, since this may vary from case to case, in other cases it may be necessary to distinguish the communication speed ratio from the number of reference file blocks.
If {1, 2, 3, 4} is rearranged according to the condition of the sixth step, {q1, q2, q3, q4} becomes {1, 2, 4, 3}. That is, since the first received file fragment list A₁ contains 10 file blocks and the first number of reference file blocks is 8, the first received file fragment list A₁ has two excess file blocks. In the same manner, the second received file fragment list A₂ has one excess file block, the third received file fragment list A₃ is two file blocks short, and the fourth received file fragment list A₄ is one file block short. Therefore, when the lists are arranged in descending order of excess, the order of the received file fragment lists is {1, 2, 4, 3}, and this is defined as {q1, q2, q3, q4}.
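The ordering of the sixth step can be reproduced with a few lines of code. The sketch below uses the list sizes (10, 6, 3, 1) and reference counts (8, 5, 5, 2) from the example; the function name `excess_order` is hypothetical.

```python
def excess_order(list_sizes, reference):
    """Return per-list excess and the list indices sorted by descending excess.

    Indices are 1-based to match the notation {q1, ..., qk} in the text.
    A negative excess means the list is short of file blocks.
    """
    excess = [size - ref for size, ref in zip(list_sizes, reference)]
    order = sorted(range(1, len(excess) + 1),
                   key=lambda i: excess[i - 1], reverse=True)
    return excess, order


excess, order = excess_order([10, 6, 3, 1], [8, 5, 5, 2])
# excess -> [2, 1, -2, -1], order -> [1, 2, 4, 3]
```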
The seventh through ninth steps judge whether the received file fragment lists in the current state are already scheduled to match the speed of each distributed storage device. In the example of FIG. 4B, there is a received file fragment list with excess file blocks, and therefore the condition is not satisfied. Therefore, the seventh to ninth steps are skipped.
In the tenth step, since k is 4, 4 is assigned to l. Then, 10 - 8 = 2 is assigned to g.
In the repetition of steps 11 to 16, file blocks are moved from the most excessive received file fragment list A₁ to other received file fragment lists until the excess of A₁ is resolved. At this time, the priority among the other received file fragment lists to which file blocks are moved follows the order of greatest shortage.
In the example of FIG. 4B, when steps 11 to 16 are repeated, the first received file fragment list A₁ is modified to include blocks 2, 3, 4, 5, 7, 8, 9, and 10, and the third received file fragment list A₃ is modified to include blocks 1, 6, 17, 18, and 19. This result is shown in the fourth box 130.
Then, the termination condition is judged by the conditional steps 18 to 20. In the conditional steps, if no file block was moved in steps 11 to 16, it is determined that the scheduling is completed and the process is terminated.
In the embodiment of FIG. 4B, since two file blocks have been moved, steps 18 to 20 are skipped.
Then, in FIG. 4C, the loop of the twenty-first step returns to the fifth step, and the iterative steps from the fifth step onward are repeated. The status of each received file fragment list at the point when the iteration restarts is shown in the fifth box 140. The process of modifying the received file fragment lists by performing the iterative steps is shown in the sixth box 150. The method of modifying the received file fragment lists by performing the iterative steps is the same as that described for the third box 120 of FIG. 4B, and therefore a detailed description is omitted here. The result of modifying the second and fourth received file fragment lists A₂, A₄ is shown in the seventh box 160.
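The state of the lists between the two passes can be checked mechanically. The sketch below assumes that A₂ and A₄ are unchanged by the first pass (the text only mentions changes to A₁ and A₃); the helper name `check_lists` is hypothetical.

```python
def check_lists(lists, total_blocks, reference):
    """Check disjointness and coverage, and report deviation from the targets.

    Returns (disjoint, complete, deviation), where deviation[i] is the number
    of blocks list i holds above (positive) or below (negative) its target.
    """
    all_blocks = [b for blocks in lists for b in blocks]
    disjoint = len(all_blocks) == len(set(all_blocks))
    complete = set(all_blocks) == set(range(1, total_blocks + 1))
    deviation = [len(blocks) - ref for blocks, ref in zip(lists, reference)]
    return disjoint, complete, deviation


after_first_pass = [
    [2, 3, 4, 5, 7, 8, 9, 10],      # A1 after giving up blocks 1 and 6
    [11, 12, 13, 14, 15, 16],       # A2, assumed unchanged
    [1, 6, 17, 18, 19],             # A3 after receiving blocks 1 and 6
    [20],                           # A4, assumed unchanged
]
disjoint, complete, deviation = check_lists(after_first_pass, 20, [8, 5, 5, 2])
```

Under that assumption only the second and fourth lists still deviate from their targets, which is consistent with the second pass modifying A₂ and A₄.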
FIG. 4D shows the result of the modification or scheduling of the received file fragment lists performed in FIGS. 4A to 4C. Referring to FIG. 4D, the first to fourth received file fragment lists A₁, A₂, A₃, A₄ include 8, 5, 5, and 2 file blocks, respectively, so that the ratio of their sizes matches the ratio of the speeds of the corresponding distributed storage devices. Also, the first to fourth received file fragment lists A₁, A₂, A₃, A₄ do not include file blocks that overlap with each other.
The received file fragment lists A₁, A₂, A₃, A₄ are referred to by the file management apparatus, and the file management apparatus is controlled to receive from each distributed storage device only the file blocks included in the corresponding received file fragment list.

Hereinafter, a file management apparatus and method for receiving the received file fragments within a shorter time, based on the above-described scheduling method, are proposed. When the communication status between the file management apparatus and the distributed storage devices changes, the file management apparatus and method described below perform rescheduling to modify received file fragment lists that have already been through the scheduling process.
Embodiments of such re-scheduling are described in Figures 5 and 6.
FIG. 5 is a flowchart illustrating a file management method for rescheduling file fragment lists according to an embodiment of the present invention.
Here, it is assumed that when the file management apparatus receives the received file fragments from the distributed storage devices, it receives them in units of a predetermined number (for example, ₙCₖ₋₁) of file blocks.
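As a small worked example, the predetermined number can be evaluated for the values n = 6 and k = 4 used earlier. Reading the notation as the binomial coefficient C(n, k-1) is an assumption made here, and `batch_size` is a hypothetical helper name.

```python
import math


def batch_size(n, k):
    """Number of file blocks per reception unit, read as C(n, k - 1).

    n: total number of distributed storage devices
    k: number of devices whose fragments suffice to restore the file
    """
    return math.comb(n, k - 1)


unit = batch_size(6, 4)   # C(6, 3) = 20
```

Under this reading the unit equals the 20 file blocks of the original file in the earlier example.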
Referring to FIG. 5, the file management method includes steps S210 to S270.
In step S210, the file management apparatus performs scheduling to determine the received file fragment lists. At this time, the received file fragment lists are determined so that they do not include file blocks overlapping each other and so that a list includes more, or an equal number of, file blocks as the communication speed of the corresponding distributed storage device increases. Step S210 of determining the received file fragment lists is substantially the same as the scheduling steps S110 and S120 described above. Therefore, a description of the specific method for determining the received file fragment lists is omitted here.
In step S220, the file management apparatus refers to the received file fragment lists and receives a predetermined number (for example, ₙCₖ₋₁) of file blocks from one distributed storage device, or simultaneously from each of the distributed storage devices.
In step S230, the file management apparatus determines whether the reception of the predetermined number of file blocks is completed within the reference time. If it is completed within the reference time, the file management method proceeds to step S260. Otherwise, the file management method proceeds to step S240.
If the reception of the predetermined number of file blocks is not completed within the reference time, the file management apparatus deletes, in step S240, the predetermined number of file blocks that were being received. After step S240, the file management method proceeds to step S250.
In step S250, the file management apparatus reschedules the received file fragment lists according to the communication speed between the file management apparatus and the distributed storage apparatuses. Here, the file management apparatus may refer to the communication status information indicating the communication speed between the distributed storage devices, and the communication status information may be newly generated or updated according to a predetermined time or predetermined condition.
The rescheduling performed here is performed in the same manner as the scheduling in step S210, but is performed only for the remaining file blocks except for the file blocks for which the file management apparatus has completed the reception.
When the rescheduling is completed, the file management method returns to step S220.
Returning to step S230, if the reception of a predetermined number of file blocks is completed within the reference time, the file management method proceeds to step S260.
In step S260, the file management apparatus restores the original file using the received file fragments or the received file blocks. For example, the file management apparatus can restore the original file by sequentially connecting the received file blocks.
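Restoring by sequential connection can be sketched as plain concatenation in block order. This covers only the simple case named in the sentence above, not the restoration method of the cited application; `restore_by_concatenation` is a hypothetical name and blocks are assumed to be numbered from 1.

```python
def restore_by_concatenation(received_blocks, total_blocks):
    """Join received blocks in block-number order.

    received_blocks: dict mapping block number (1-based) -> bytes
    Raises ValueError if any block of the original file is still missing.
    """
    missing = [i for i in range(1, total_blocks + 1) if i not in received_blocks]
    if missing:
        raise ValueError(f"missing file blocks: {missing}")
    return b"".join(received_blocks[i] for i in range(1, total_blocks + 1))


# Blocks may arrive in any order and from different devices.
original = restore_by_concatenation({2: b"lo", 1: b"hel"}, total_blocks=2)
```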
In step S270, the file management apparatus determines whether restoration of the original file is completed. When restoration of the original file is completed, the file management method ends. If the restoration of the original file is not completed, this means that some file blocks of the original file have not yet been received, so the process returns to step S220 to continue receiving file blocks.
According to the above-described file management method, even if the communication speeds between the file management apparatus and the distributed storage devices change after the received file fragment lists are determined, the time required to receive the file fragments can be minimized by rescheduling the received file fragment lists.
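The control flow of steps S220 to S250 can be sketched as a sequential simulation. Everything here is illustrative: `receive_all`, `fetch`, `reschedule`, and the `max_rounds` guard are hypothetical, a timeout is modeled by `fetch` returning `None`, and devices are polled one after another rather than simultaneously.

```python
def receive_all(lists, fetch, reschedule, batch_size, max_rounds=100):
    """Receive every scheduled block, rescheduling after a timeout.

    lists:      dict device -> list of block numbers still to receive
    fetch:      callable(device, blocks) -> dict block -> bytes, or None
                when the batch missed the reference time
    reschedule: callable(remaining_blocks) -> new `lists` dict
    """
    received = {}
    for _ in range(max_rounds):
        if not any(lists.values()):
            return received                      # nothing left to receive
        timed_out = False
        for device, blocks in lists.items():
            batch = blocks[:batch_size]          # S220: one unit of blocks
            if not batch:
                continue
            result = fetch(device, batch)        # S230: within reference time?
            if result is None:
                timed_out = True                 # S240: partial batch is dropped
            else:
                received.update(result)
        remaining = [b for blocks in lists.values()
                     for b in blocks if b not in received]
        if timed_out:
            lists = reschedule(remaining)        # S250: only unreceived blocks
        else:
            lists = {d: [b for b in blocks if b not in received]
                     for d, blocks in lists.items()}
    raise RuntimeError("gave up after max_rounds")


# Toy stand-ins: the "slow" device always misses the reference time.
def fetch(device, blocks):
    return None if device == "slow" else {b: bytes([b]) for b in blocks}


def reschedule(remaining):
    return {"fast": sorted(remaining), "slow": []}


blocks = receive_all({"fast": [1, 2], "slow": [3, 4]}, fetch, reschedule, batch_size=2)
```

In this toy run the first round receives blocks 1 and 2, the timed-out batch from the slow device is discarded, and rescheduling hands blocks 3 and 4 to the fast device for the second round.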
FIG. 6 is a flowchart illustrating a file management method for rescheduling file fragment lists according to another embodiment of the present invention.
The embodiment of FIG. 6 is similar to the embodiment of FIG. 5 but differs in how the file blocks being received are handled when the reception of certain file blocks is not completed within the reference time.
Here again, it is assumed that when the file management apparatus receives the received file fragments from the distributed storage devices, it receives them in units of a predetermined number (for example, ₙCₖ₋₁) of file blocks.
Referring to FIG. 6, the file management method includes steps S310 to S370.
In step S310, the file management apparatus performs scheduling to determine the received file fragment lists. At this time, the received file fragment lists are determined so that they do not include file blocks overlapping each other and so that a list includes more, or an equal number of, file blocks as the communication speed of the corresponding distributed storage device increases. Step S310 of determining the received file fragment lists is substantially the same as the scheduling steps S110 and S120 described above. Therefore, a description of the specific method for determining the received file fragment lists is omitted here.
In step S320, the file management apparatus refers to the received file fragment lists and receives a predetermined number (for example, ₙCₖ₋₁) of file blocks from one distributed storage device, or simultaneously from each of the distributed storage devices.
In step S330, the file management apparatus determines whether the reception of the predetermined number of file blocks is completed within the reference time. If it is completed within the reference time, the file management method proceeds to step S360. Otherwise, the file management method proceeds to step S340.
If the reception of the predetermined number of file blocks is not completed within the reference time, the file management apparatus receives, in step S340, the file blocks that have not yet been received among the predetermined number of file blocks from another distributed storage device. For example, when five file blocks are being received from a certain distributed storage device and the reference time is exceeded with only two file blocks received, the file management apparatus receives the remaining three file blocks, which have not yet been received, from another distributed storage device.
As an embodiment, the file management apparatus can receive the remaining three blocks from the distributed storage device that has the highest communication speed with the file management apparatus among the other distributed storage devices. After step S340, the file management method proceeds to step S350.
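Choosing the fallback device for step S340 can be sketched as follows. The requirement that the fallback device store all of the missing blocks is an assumption added for this example; the function name `pick_fallback` and the stored lists are hypothetical.

```python
def pick_fallback(speeds, stored, failed_device, missing_blocks):
    """Pick the fastest other device that stores every missing block.

    speeds: dict device -> communication speed with the file management apparatus
    stored: dict device -> collection of stored block numbers
    Returns the chosen device name, or None if no other device qualifies.
    """
    needed = set(missing_blocks)
    candidates = [d for d in speeds
                  if d != failed_device and needed <= set(stored[d])]
    if not candidates:
        return None
    return max(candidates, key=lambda d: speeds[d])


speeds = {"M1": 8, "M2": 5, "M3": 5, "M4": 2}
stored = {                                   # hypothetical stored lists
    "M1": {1, 2, 3, 4, 5, 6, 7, 8, 9, 10},
    "M2": {1, 2, 3, 4, 11, 12, 13, 14, 15, 16},
    "M3": {1, 5, 6, 8, 11, 12, 14, 17, 18, 19},
    "M4": {2, 3, 6, 9, 11, 13, 15, 17, 18, 20},
}
# M4 timed out with blocks 11, 13 and 15 still outstanding.
fallback = pick_fallback(speeds, stored, "M4", [11, 13, 15])
```

M1 is the fastest device overall but does not hold the missing blocks in this toy data, so the sketch falls back to M2.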
In step S350, the file management apparatus reschedules the received file fragment lists according to the communication speed between the file management apparatus and the distributed storage apparatuses. Here, the file management apparatus may refer to the communication status information indicating the communication speed between the distributed storage devices, and the communication status information may be newly generated or updated according to a predetermined time or predetermined condition.
The rescheduling performed here is performed in the same manner as the scheduling in step S310, but is performed only for the remaining file blocks excluding the file blocks for which the file management apparatus has completed the reception.
When the rescheduling is completed, the file management method returns to step S320.
Returning to step S330, if the reception of a predetermined number of file blocks is completed within the reference time, the file management method proceeds to step S360.
In step S360, the file management apparatus restores the original file using the received file fragments or the received file blocks. For example, the file management apparatus can restore the original file by sequentially connecting the received file blocks. There are various methods by which the file management apparatus can restore an original file using received file fragments or received file blocks. One of them is the restoration method described in Korean Patent Application No. 10-2013-0016390.
In step S370, the file management apparatus determines whether restoration of the original file is completed. When restoration of the original file is completed, the file management method ends. If the restoration of the original file is not completed, this means that some file blocks of the original file have not yet been received, so the process returns to step S320 to continue receiving file blocks.
According to the above-described file management method, even if the communication speeds between the file management apparatus and the distributed storage devices change after the received file fragment lists are determined, the time required to receive the file fragments can be minimized by rescheduling the received file fragment lists.

The embodiments described above can be implemented in the form of program instructions that can be executed through various computer components and recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, or a combination thereof. Examples of the computer-readable recording medium include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and storage devices such as ROM, RAM, and flash memory. Such a storage device may be configured to operate as one or more software modules to perform processing in accordance with the present invention, and vice versa. The program instructions described herein include machine language code, such as that generated by a compiler, as well as high-level language code that may be executed by a computer using an interpreter or the like.

While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it is to be understood that the invention is not limited to the disclosed exemplary embodiments.
In addition, although specific terms are used herein, they are used only for the purpose of describing the present invention and are not used to limit the scope of the present invention described in the claims. Therefore, the scope of the present invention should not be limited to the above-described embodiments, but should be determined by the following claims and their equivalents.

1000: File distribution management system
1100: File management apparatus
1100a, 1100b, ..., 1100n: Distributed storage devices
1110: Communication unit
1111: Communication status manager
1120: Controller
1130: Scheduler
1140: Storage unit

Claims

Determining file fragment lists corresponding respectively to a plurality of distributed storage devices in which an original file is distributed;
Modifying at least some of the file fragment lists according to communication speeds between the file management device and the plurality of distributed storage devices;
Receiving the file fragments from each of the plurality of distributed storage devices with reference to the modified file fragment lists; and
Restoring the original file from the received file fragments,
Wherein each of the determined file fragment lists includes at least some of the file blocks of the original file stored in the corresponding distributed storage device.

delete

The method according to claim 1,
Wherein the determined file fragment lists do not include overlapping blocks.

The method of claim 3,
Wherein determining the file fragment lists comprises:
Allocating, to one file fragment list of the file fragment lists, all of the file blocks of the original file stored in the distributed storage device corresponding to the one file fragment list; and
Sequentially allocating, to each of the other file fragment lists except the one file fragment list, at least some file blocks among the file blocks of the original file stored in the corresponding distributed storage device,
Wherein the at least some file blocks do not overlap with the file blocks included in the previously allocated file fragment lists.

5. The method of claim 4,
Wherein the one file fragment list is a file fragment list corresponding to a distributed storage apparatus having the highest communication speed with the file management apparatus among the plurality of distributed storage apparatuses.

The method according to claim 1 or 3,
Wherein modifying at least some of the file fragment lists comprises:
Wherein the file fragment lists include more or equal number of file blocks as the communication speed between the corresponding distributed storage device and the file management device is faster.

The method according to claim 6,
Wherein the number of file blocks of each of the file fragment lists is proportional to the communication speed between the corresponding distributed storage device and the file management device.

The method according to claim 6,
Wherein the step of including more or equal number of file blocks comprises:
Determining, for at least one file fragment list of the file fragment lists, whether the number of file blocks included in the at least one file fragment list is excessive or insufficient in comparison with the communication speed between the distributed storage device corresponding to the at least one file fragment list and the file management device; and
Moving at least some of the file blocks in accordance with the result of the determination of excess or insufficiency.

9. The method of claim 8,
Wherein moving at least some of the file blocks comprises:
Determining the file block to be moved using a popularity function indicating how many of the plurality of distributed storage devices store the file block to be moved in common.

10. The method of claim 9,
Wherein the popularity function is defined by the following equation:
[Equation not reproduced in the source text]
wherein K is the number of the plurality of distributed storage devices,
r_j is the number of file blocks determined to be proportional to the communication speed between the j-th distributed storage device among the plurality of distributed storage devices and the file management apparatus,
A_j is the size or number of elements of the file fragment list corresponding to the j-th distributed storage device among the file fragment lists, and
S_j is a list including the file blocks of the original file stored in the j-th distributed storage device.

The method according to claim 6,
Determining whether at least a portion of the file fragments has been received within a reference time;
Deleting the at least a portion of the file fragments from the file management device, or receiving the at least a portion of the file fragments from a distributed storage device different from the distributed storage device that was transmitting them; and
Rescheduling at least some of the file fragment lists according to updated communication speeds between the file management apparatus and the plurality of distributed storage devices.

delete

12. The method of claim 11,
Wherein the rescheduled file fragment lists do not include a file block already received from the plurality of distributed storage devices by the file management apparatus.

14. The method of claim 13,
Wherein rescheduling at least some of the file fragment lists comprises:
Rescheduling the file fragment lists so that a file fragment list includes more, or an equal number of, file blocks as the updated communication speed of the corresponding distributed storage device is faster.

delete

A file management apparatus for receiving file fragments from a plurality of distributed storage devices,
A scheduler for determining file fragment lists corresponding respectively to the plurality of distributed storage devices and for modifying at least some of the file fragment lists according to communication speeds between the file management apparatus and the plurality of distributed storage devices;
A communication unit for performing communication with the plurality of distributed storage devices or providing an interface for communication with the plurality of distributed storage devices; and
A controller for controlling the file management apparatus to receive the file fragments from each of the plurality of distributed storage devices through the communication unit and to restore an original file from the received file fragments,
Wherein the scheduler determines the file fragment lists such that each of the file fragment lists includes at least some of the file blocks stored in the corresponding distributed storage device and the file fragment lists do not include blocks that overlap with each other.

17. The apparatus of claim 16,
Wherein the scheduler modifies the file fragment lists such that a file fragment list includes more, or an equal number of, file blocks as the communication speed of the corresponding distributed storage device is faster.

A computer-readable recording medium recording a computer program for executing a method comprising: determining file fragment lists corresponding respectively to a plurality of distributed storage devices in which an original file is distributed; modifying at least some of the file fragment lists according to communication speeds between the file management device and the plurality of distributed storage devices; receiving the file fragments from each of the plurality of distributed storage devices with reference to the modified file fragment lists; and restoring the original file from the received file fragments, wherein each of the determined file fragment lists includes at least some of the file blocks of the original file stored in the corresponding distributed storage device.

19. The recording medium of claim 18,
Wherein determining the file fragment lists comprises determining the file fragment lists so that they do not include blocks that overlap with each other, and
Wherein modifying the file fragment lists comprises modifying the file fragment lists such that a file fragment list includes more, or an equal number of, file blocks as the communication speed of the corresponding distributed storage device is faster.