KR20020096027A

KR20020096027A - Grid filesystem that uses the network and storage resource of personal computer that resident remotely in the world

Info

Publication number: KR20020096027A
Application number: KR1020020064401A
Authority: KR
Inventors: 장용표
Original assignee: (주)컴파인
Priority date: 2002-10-22
Filing date: 2002-10-22
Publication date: 2002-12-28

Abstract

PURPOSE: To provide a grid file system that stores files and serves them using the storage and network resources of personal computers distributed worldwide. Because the bandwidth required for a file transfer is spread across the personal computers in the grid area, large volumes of data can be transmitted without depending on a single high-bandwidth link. CONSTITUTION: A grid file system server fragments a large file and replicates the fragments. The replicas are distributed among the personal computers participating in the grid file system service. When specific data is requested, a grid file system agent sends a file search and lookup request to a central server that holds the file distribution information of the grid file system. The fragments stored in the various locations are then downloaded sequentially and merged into the complete file. Downloads can be processed in parallel and the server does not need a high-bandwidth network environment, so a high-quality file service can be realized.

Description

Grid filesystem that uses the network and storage resource of personal computer that resident remotely in the world

Conventionally, technologies such as clustering, distributed database systems, and distributed file systems have been used to serve large volumes of text or multimedia data. These technologies, however, work by expanding server resources, and such expansion has limits. In addition, transmitting large multimedia data to clients requires a high-bandwidth network for the cluster that makes up the server. To solve this problem, grid computing technology is applied: files are distributed across the storage resources of computers around the world that are connected to the Internet and participate in the grid service. When a file is sent to a client that requests it, the file is not fetched from the centralized server. Instead, the fragmented data is downloaded directly from the individual personal computers participating in the grid service and merged to recover the original file. As a result, the centralized server does not need to provision separate network bandwidth for file transfer.

Conventionally, technologies such as clustering, distributed database systems, and distributed file systems have been used to serve large volumes of text or multimedia data. These technologies, however, work by expanding server resources, and such expansion has limits. In addition, transmitting large multimedia data to clients requires a high-bandwidth network for the cluster that makes up the server. To solve this problem, the present invention applies grid computing technology.

FIG. 1 is an overall system configuration diagram of the grid file system.

FIG. 2 is a configuration diagram of the centralized data server that makes up the grid file system area and the personal computers participating in the grid file system.

1. Multiple binary files are compressed to reduce their size and then split into fragments of a specified size (1 MB, 5 MB, 10 MB, etc.).

2. Each fragment is encrypted.

3. Several copies (replicas) of each encrypted fragment are made.

4. The replicas of the encrypted fragments are distributed across the storage space of the personal computers participating in the grid service area.

5. Information about the distributed files is stored in the centralized data server.
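
The five steps above could be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names, the round-robin placement, the replica count of 3, the catalog layout, and the use of zlib are all assumptions, and `toy_cipher` is a deliberately insecure placeholder because the description does not name a cipher.

```python
import hashlib
import zlib
from itertools import cycle

FRAGMENT_SIZE = 1024 * 1024  # 1 MB, one of the sizes listed in step 1
REPLICAS = 3                 # assumed; the description only says "several"


def toy_cipher(data: bytes, key: bytes) -> bytes:
    """Placeholder XOR stream, NOT secure. Stands in for the unspecified cipher."""
    stream = hashlib.sha256(key).digest()
    return bytes(b ^ k for b, k in zip(data, cycle(stream)))


def distribute(name, payload, peers, key,
               fragment_size=FRAGMENT_SIZE, replicas=REPLICAS):
    """Run steps 1-5 and return the catalog entry for the central data server.

    peers: dict mapping a peer name to a dict that models that PC's storage.
    """
    compressed = zlib.compress(payload)                      # step 1: compress
    pieces = [compressed[i:i + fragment_size]                # step 1: split
              for i in range(0, len(compressed), fragment_size)]
    names = sorted(peers)
    copies = min(replicas, len(names))
    entry = {"file": name, "fragments": []}
    for index, piece in enumerate(pieces):
        encrypted = toy_cipher(piece, key)                   # step 2: encrypt
        fragment_id = f"{name}.{index}"
        # Steps 3-4: place each replica on a different peer (round-robin, assumed).
        holders = [names[(index + r) % len(names)] for r in range(copies)]
        for holder in holders:
            peers[holder][fragment_id] = encrypted
        entry["fragments"].append({
            "id": fragment_id,
            "sha256": hashlib.sha256(encrypted).hexdigest(),
            "holders": holders,
        })
    return entry                                             # step 5: catalog entry
```

The checksum stored per fragment is an addition for illustration; it would let an agent verify a downloaded fragment before merging, which the description does not discuss.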

FIG. 3 is a configuration diagram showing the interaction between the grid file system and the grid file system agent.

1. Through the grid file system agent, the client either submits a search query for the desired file or requests an index of the stored files, organized by category or by folder.

2. At the client's request, the grid file system agent searches the file information stored in the centralized server and obtains the fragmentation information and distribution information for the desired file, together with status information on the personal computers participating in the grid service that are currently reachable.

3. Using this information, the grid file system agent connects to multiple personal computers participating in the grid service, downloads the fragments sequentially, and merges them to reconstruct the original file.

4. During the download, in case the current connection is lost, the agent builds a candidate list of other currently reachable computers that hold the same fragment. If the connection is lost, the agent connects instead to the candidate with the largest available network bandwidth and continues the download.
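
Steps 3 and 4 could be sketched as below. The names `PeerOffline`, `fetch_with_failover`, `download`, and the `fetch` callback are hypothetical, and the sketch simplifies the description in two ways: it tries holders in descending bandwidth order from the start, and it restarts a fragment from the beginning on failure rather than resuming mid-transfer. Decryption and decompression are omitted.

```python
class PeerOffline(Exception):
    """Raised by the hypothetical transport when a peer cannot be reached."""


def fetch_with_failover(fragment_id, candidates, fetch):
    """Fetch one fragment, falling back to the next-best holder on failure.

    candidates: list of (peer_name, bandwidth_estimate) pairs from the server.
    fetch: callable(peer_name, fragment_id) -> bytes, raising PeerOffline.
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    for peer, _bandwidth in ranked:
        try:
            return fetch(peer, fragment_id)
        except PeerOffline:
            continue  # connection lost: try the candidate with the next-largest bandwidth
    raise PeerOffline(f"no reachable holder for {fragment_id}")


def download(layout, fetch):
    """Fetch the fragments in order and concatenate them (step 3).

    layout: ordered list of (fragment_id, candidates) pairs.
    """
    return b"".join(fetch_with_failover(fid, cands, fetch)
                    for fid, cands in layout)
```

Keeping the transport behind a `fetch` callback leaves the wire protocol open, since the description does not specify one.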

The grid file system server shown at the top of FIG. 2 fragments a large binary file, makes several replicas, and distributes them among the personal computers participating in the grid file system service, as shown at the bottom of FIG. 2. When specific data is requested, the Grid Filesystem Agent application shown in FIG. 3 sends a file search and lookup request to the central server, which holds the file placement information of the grid file system. Using the result, the agent sequentially downloads the previously fragmented files from the locations where they are stored and merges them into the complete file. Downloads can also be processed in parallel, and the server does not need the high-bandwidth network environment otherwise required to transfer large files. Helped by the spread of high-speed Internet access, this makes it possible to offer at low cost a high-quality file service of the kind previously available only from costly clustered file system services.
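
The parallel download mentioned above could look like the following sketch. The function name, the `fetch` callback, and the worker count are assumptions; the description does not say how parallelism is implemented.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_download(layout, fetch, max_workers=4):
    """Fetch all fragments concurrently, then join them in their original order.

    layout: ordered list of (fragment_id, peer_name) pairs.
    fetch: callable(peer_name, fragment_id) -> bytes (hypothetical transport).
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, whatever order they finish in.
        parts = pool.map(lambda item: fetch(item[1], item[0]), layout)
        return b"".join(parts)
```

Because each fragment comes from a different personal computer, the concurrent requests draw on several uplinks at once instead of a single server link.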

The grid file system agent program can be developed and offered in various forms, such as a standalone application or a virtual drive device. In the latter form, users can use files stored in the grid storage service as if they were on a storage device attached directly to their own computer.

The invention solves two problems of centralized file servers: the need for high bandwidth and the need for large server resources. A centralized file server requires high bandwidth for data transmission. With this grid file system, the bandwidth required for a file transfer is spread across the personal computers participating in the grid area, so large volumes of data can be transmitted without being significantly constrained by bandwidth. The approach suits data that is modified infrequently but read or appended to frequently and that has relatively low security requirements. Examples are large multimedia data services of various kinds, such as VOD services of broadcasters and video production companies, multimedia data containing information on domestic and foreign cultural assets, and multimedia data linked to encyclopedias. If security is considered more thoroughly in the future, for example by strengthening the encryption of the fragments, the system could be used in a wider range of fields. It would be especially useful if the storage space of personal computers at public bodies, including government and local government agencies, were used for public-interest data services.
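
As a rough illustration of the bandwidth claim, assume an idealized case that the description does not state explicitly: a client streams at an aggregate rate $B$, and the load is split evenly across $N$ personal computers that each serve different fragments at the same time. The symbols and the numbers below are illustrative only.

```latex
\[
  b_{\text{peer}} \approx \frac{B}{N},
  \qquad
  \text{e.g. } B = 8~\text{Mbit/s},\; N = 16
  \;\Rightarrow\; b_{\text{peer}} \approx 0.5~\text{Mbit/s}.
\]
```

Under these assumptions each participating computer needs only a fraction of the rate a single central server would have to sustain, while the client's own downlink remains the upper bound on $B$.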

Claims

A method of fragmenting data and distributing it across the storage resources of remotely distributed personal computers (FIG. 2).

A method of requesting data retrieval from the central server through the grid file system agent, receiving the results, downloading the fragmented data stored on the remotely distributed personal computers, and merging it (FIG. 3).