KR101638727B1

KR101638727B1 - Cluster system

Info

Publication number: KR101638727B1
Application number: KR1020150000581A
Authority: KR
Inventors: 고건환; 김판규; 윤종철
Original assignee: 한국생명공학연구원 (Korea Research Institute of Bioscience and Biotechnology)
Priority date: 2015-01-05
Filing date: 2015-01-05
Publication date: 2016-07-20

Abstract

Disclosed is a cluster system. The cluster system includes a plurality of nodes, allows a Unix file system and a distributed file system to be used together, and comprises a data management unit. When the Unix file system requests data input/output, the data management unit accesses each of the nodes and retrieves and provides the required data from a disk cache. The cluster system can reduce the cost of a storage system.

Description

CLUSTER SYSTEM

The present invention relates to a cluster system and, more particularly, to a cluster system in which a Unix system and a distributed file system are used together.

With the recent development of cloud computing technology, users can perform high-performance computing tasks simply by subscribing to a service, without having to directly access and control a high-performance cluster system. Hadoop is one of the most representative cloud computing technologies: it supports parallel distributed processing and makes it possible to build a high-performance cluster environment from relatively inexpensive servers. However, to use MapReduce, the parallel distributed processing model supported by Hadoop, programs and tools that run on existing Linux/Unix systems must be converted into Hadoop programs that can run as MapReduce jobs, which requires additional development time and effort.

Consequently, the ideal form of a Hadoop-based cloud system today is one in which existing Linux/Unix programs continue to run as they do now without modification, while newly developed programs are written as MapReduce-based Hadoop programs. To achieve this, HDFS, Hadoop's file storage system, must be shared as a file storage system that ordinary Linux/Unix systems can use. In that case, however, the local storage of multiple individual nodes is recognized and shared as a single combined storage, which significantly lowers data input/output speed.

SUMMARY OF THE INVENTION
An object of the present invention is to provide a cluster system that allows a general cluster system such as Linux/Unix and a cluster system using a distributed file system such as the Hadoop Distributed File System (HDFS) to be used together.

Another object of the present invention is to provide a cluster system capable of preventing the slowdown in data input/output that occurs when a general cluster system accesses a distributed file system.

Another object of the present invention is to provide a cluster system capable of reducing storage costs by integrating the storage systems of a general cluster system and a distributed file system.

Still another object of the present invention is to provide a cluster system suitable for big data analysis by using a general cluster system and a distributed file system together.

According to an aspect of the present invention, there is provided a cluster system that includes a plurality of nodes and allows a Unix file system and a distributed file system to be used together, the cluster system including a data management unit that, when the Unix file system requests data input/output, accesses each of the plurality of nodes and retrieves and provides the required data from a disk cache.

The data management unit may include: a search unit that searches the disk caches of the plurality of nodes for the required data in response to a data input/output request from the Unix file system; a data request unit that requests the data from the distributed file system when the required data is not found in the disk caches of the plurality of nodes; an update unit that receives the data from the distributed file system and updates the disk cache with it; and a data processing unit that collects the required data from the disk cache and provides it to the Unix file system.
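Taken together, the four units behave like a read-through cache placed in front of the distributed file system. The following is a minimal Python sketch of that read path, under the assumption that each node's disk cache can be modeled as an in-memory dict and the distributed file system as a callable that returns a file's bytes. The class and method names (`DataManager`, `handle_request`, and so on) are hypothetical, and the sketch searches the caches sequentially for simplicity, whereas claim 1 describes accessing the nodes in parallel.

```python
import zlib
from typing import Callable, Dict, List, Optional


class DataManager:
    """Hypothetical model of data management unit 100 (read path only)."""

    def __init__(self, node_caches: List[Dict[str, bytes]],
                 dfs_read: Callable[[str], bytes]) -> None:
        self.node_caches = node_caches  # one dict per node, standing in for its disk cache
        self.dfs_read = dfs_read        # stand-in for a read from the distributed file system

    def search(self, path: str) -> Optional[bytes]:
        # Search unit 110: look for the required data in each node's disk cache.
        for cache in self.node_caches:
            if path in cache:
                return cache[path]
        return None

    def request(self, path: str) -> bytes:
        # Data request unit 120: ask the distributed file system on a cache miss.
        return self.dfs_read(path)

    def update(self, path: str, data: bytes) -> None:
        # Update unit 130: copy the fetched data into one node's disk cache.
        # Picking the node by a CRC32 of the path is an assumption for illustration.
        node = zlib.crc32(path.encode()) % len(self.node_caches)
        self.node_caches[node][path] = data

    def handle_request(self, path: str) -> bytes:
        # Data processing unit 140: deliver the required data to the Unix side.
        data = self.search(path)
        if data is None:
            data = self.request(path)
            self.update(path, data)
        return data
```

On the first request for a path, the data comes from the distributed file system and is copied into a disk cache; later requests for the same path are served from that cache without touching the distributed file system.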

The search unit may search for the required data using index information provided by at least one of the plurality of nodes.

The update unit may create a data list according to priority, compare it with the data information of the disk cache, and update the missing data.

The update unit may create the data list according to priority using at least one of a file use date, a file use frequency, a file creation date, and a file modification date.

The cluster system of the present invention allows a general cluster system such as Linux/Unix to be used together with a distributed file system such as the Hadoop Distributed File System (HDFS), and prevents the slowdown in data input/output that occurs when the general cluster system accesses the Hadoop distributed file system.

In addition, integrating the storage systems of the general cluster system and the Hadoop distributed file system reduces storage costs, and using the general cluster system together with the Hadoop distributed file system makes the system suitable for big data analysis.

FIG. 1 is a conceptual diagram of a cluster system according to an embodiment of the present invention;
FIG. 2 is a block diagram of a data management unit according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating the operation of the data management unit according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating the operation of an update unit according to an embodiment of the present invention; and
FIGS. 5 to 7 are diagrams comparing the performance of a cluster system according to an embodiment of the present invention with that of an existing cluster system.

The present invention is capable of various modifications and may have various embodiments, and specific embodiments are illustrated in the drawings and described herein. It should be understood, however, that this is not intended to limit the invention to the particular embodiments, and that the invention includes all modifications, equivalents, and alternatives falling within its spirit and scope.

Terms including ordinals, such as "second" and "first", may be used to describe various elements, but the elements are not limited by these terms. The terms are used only to distinguish one element from another. For example, without departing from the scope of the present invention, a second element may be referred to as a first element, and similarly, a first element may be referred to as a second element. The term "and/or" includes any combination of a plurality of related listed items or any one of the plurality of related listed items.

It should be understood that when an element is referred to as being "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or intervening elements may be present. In contrast, when an element is referred to as being "directly connected" or "directly coupled" to another element, it should be understood that no intervening elements are present.

The terminology used in this application is used only to describe specific embodiments and is not intended to limit the invention. Singular expressions include plural expressions unless the context clearly dictates otherwise. In this application, terms such as "comprise" or "have" are intended to specify the presence of the features, numbers, steps, operations, elements, components, or combinations thereof described in the specification, and should be understood not to preclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, elements, components, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined in this application.

Hereinafter, embodiments will be described in detail with reference to the accompanying drawings. The same or corresponding elements are given the same reference numerals regardless of the figure, and redundant descriptions thereof are omitted.

FIG. 1 is a conceptual diagram of a cluster system according to an embodiment of the present invention, and FIG. 2 is a block diagram of the data management unit 100 according to an embodiment of the present invention.

Referring to FIGS. 1 and 2, the cluster system according to an embodiment of the present invention may include a distributed file system 10 that distributes and stores data across a plurality of nodes 31 to 35, each including a disk cache, and a data management unit 100 that, when the Unix file system 20 requests data input/output, retrieves the required data from the disk caches of the nodes 31 to 35 and provides it.

First, in the cluster system according to an embodiment of the present invention, a Unix file system 20 based on a shared general-purpose file system such as Linux/Unix is used together with a distributed file system 10 such as a Hadoop system, and each operates through the programs or tools of its own operating environment.

The plurality of nodes 31 to 35 store data in a distributed manner. Each of the nodes 31 to 35 includes a storage unit, which serves as the general storage system, and a data cache, and one of the nodes 31 to 35 may operate as a master node. A solid state drive (SSD), a hard disk drive (HDD), or the like may be used as the data cache. In an embodiment of the present invention, the data cache may serve as a buffer that preloads data from the storage unit.
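The node structure described above can be modeled with a small data class. This is a minimal sketch under stated assumptions: the storage unit and the data cache are both represented as in-memory dicts, and the names `Node`, `preload`, and `elect_master` are hypothetical. It only shows the two ideas in the paragraph: a per-node cache acting as a preload buffer in front of the storage unit, and exactly one node being marked as master. The patent does not say how the master is chosen, so selecting the first node is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Node:
    """Hypothetical model of one of nodes 31-35."""
    name: str
    storage: Dict[str, bytes] = field(default_factory=dict)  # storage unit (general storage)
    cache: Dict[str, bytes] = field(default_factory=dict)    # data cache, e.g. on an SSD or HDD
    is_master: bool = False

    def preload(self, paths: Iterable[str]) -> int:
        """Copy the given files from the storage unit into the data cache.

        Paths missing from storage or already cached are skipped.
        Returns the number of files newly loaded.
        """
        loaded = 0
        for path in paths:
            if path in self.storage and path not in self.cache:
                self.cache[path] = self.storage[path]
                loaded += 1
        return loaded


def elect_master(nodes: List[Node]) -> Node:
    """Mark exactly one node as master; choosing the first node is an assumption."""
    for node in nodes:
        node.is_master = False
    nodes[0].is_master = True
    return nodes[0]
```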

A distributed file system 10 such as the Hadoop Distributed File System (HDFS) stores data redundantly across the plurality of nodes 31 to 35 and, upon receiving a data input/output request from the cluster system, accesses the nodes 31 to 35 to process the requested operation. Because each piece of data is processed within the node where it is stored, file input/output is minimized and the effect of distributed processing is increased.
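The patent does not describe HDFS internals, so the sketch below only illustrates the two properties named in the paragraph: each block is stored on several nodes, and work on a block is scheduled on a node that already holds it. The function names are hypothetical. Round-robin placement is a simplifying assumption; a replication factor of 3 matches the HDFS default for `dfs.replication`, but real HDFS placement is rack-aware and considerably more involved.

```python
from typing import Dict, List, Set


def place_replicas(blocks: List[str], nodes: List[str],
                   replication: int = 3) -> Dict[str, List[str]]:
    """Store each block on `replication` distinct nodes using simple round-robin."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    return {
        block: [nodes[(i + k) % len(nodes)] for k in range(replication)]
        for i, block in enumerate(blocks)
    }


def pick_local_node(block: str, placement: Dict[str, List[str]],
                    busy: Set[str]) -> str:
    """Choose a node that already holds the block, so the block need not move."""
    for node in placement[block]:
        if node not in busy:
            return node
    # Every holder is busy: wait for the first holder rather than copy the block.
    return placement[block][0]
```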

The data management unit 100 may include a search unit 110, a data request unit 120, an update unit 130, and a data processing unit 140.

The search unit 110 searches the disk caches of the plurality of nodes 31 to 35 for the required data in response to a data input/output request from the Unix file system 20. The search unit 110 may do so using index information provided by at least one of the nodes 31 to 35. The index information is a list of the data present in the disk caches and may be provided, for example, by the node selected as the master node among the nodes 31 to 35.
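Using index information means the search unit does not have to probe every node's disk cache: one lookup on the master node tells it which node holds the file. The following is a minimal sketch under the assumption that the master's index maps each cached file path to the name of the node whose disk cache holds it; the patent only says the index is a list of data in the disk caches, so the path-to-node mapping and the names `MasterIndex` and `search` are hypothetical.

```python
from typing import Dict, Optional, Set


class MasterIndex:
    """Hypothetical index kept by the master node: which node caches each file."""

    def __init__(self) -> None:
        self._where: Dict[str, str] = {}

    def add(self, path: str, node: str) -> None:
        self._where[path] = node

    def remove(self, path: str) -> None:
        self._where.pop(path, None)

    def lookup(self, path: str) -> Optional[str]:
        return self._where.get(path)

    def cached_paths(self) -> Set[str]:
        # The data list of the disk caches, as later compared by the update unit.
        return set(self._where)


def search(index: MasterIndex, caches: Dict[str, Dict[str, bytes]],
           path: str) -> Optional[bytes]:
    """Search unit: one index lookup, then a read from the node that holds the data."""
    node = index.lookup(path)
    if node is None:
        return None  # cache miss: the data request unit would now ask the DFS
    return caches[node].get(path)
```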

The data request unit 120 may request data from the distributed file system 10 when the required data is not found in the disk caches of the plurality of nodes 31 to 35.

The update unit 130 may receive data from the distributed file system 10 and update the disk cache with it. When the update unit 130 receives data from the distributed file system 10 at the request of the data request unit 120, it copies the data into the disk cache and modifies the data list of the index information.

The update unit 130 may also update the data in the disk cache periodically. The update unit 130 may create a data list according to priority, compare it with the data information of the disk cache, and update the missing data. In doing so, the update unit 130 may identify the missing data by comparing the list with the index information of the master node.

The update unit 130 may create the data list according to priority using at least one of a file use date, a file use frequency, a file creation date, and a file modification date. Using these factors, the update unit 130 may build the priority list so that recently used files, frequently used files, earlier-created files, recently modified files, and the like come first; it may also build the priority list randomly.
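Each of the ordering rules listed above reduces to sorting file metadata by one key. The following is a minimal sketch, assuming the metadata is available as a small record per file; the `FileMeta` record and the criterion strings are hypothetical names. The sketch applies a single criterion at a time, which is what "at least one of" minimally requires; how the patent would combine several factors is not specified, so combination is not shown.

```python
import random
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class FileMeta:
    """Hypothetical per-file metadata used to rank files for caching."""
    path: str
    last_used: datetime
    use_count: int
    created: datetime
    modified: datetime


def priority_list(files: List[FileMeta], criterion: str,
                  seed: Optional[int] = None) -> List[str]:
    """Return file paths ordered from highest to lowest caching priority."""
    if criterion == "recently_used":
        ordered = sorted(files, key=lambda f: f.last_used, reverse=True)
    elif criterion == "frequently_used":
        ordered = sorted(files, key=lambda f: f.use_count, reverse=True)
    elif criterion == "created_first":
        ordered = sorted(files, key=lambda f: f.created)  # earliest-created first
    elif criterion == "recently_modified":
        ordered = sorted(files, key=lambda f: f.modified, reverse=True)
    elif criterion == "random":
        ordered = list(files)
        random.Random(seed).shuffle(ordered)  # seeded so a run can be reproduced
    else:
        raise ValueError(f"unknown criterion: {criterion}")
    return [f.path for f in ordered]
```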

FIG. 3 is a flowchart illustrating the operation of the data management unit 100 according to an embodiment of the present invention.

First, the data management unit 100 checks whether there is a data input/output request from the Unix file system 20.

When a data input/output request from the Unix file system 20 is confirmed, the data management unit 100 accesses the disk cache of each of the plurality of nodes to check whether the required data exists. At this time, the data management unit 100 may use the index information provided by the master node to check whether the required data exists in the disk cache.

If the required data exists in the disk cache, the data management unit 100 extracts it and provides it to the Unix file system 20.

If the required data does not exist in the disk cache, the data management unit 100 requests it from the distributed file system 10.

The data management unit 100 copies the requested data received from the distributed file system 10 into the disk cache and modifies the data list of the disk cache.

FIG. 4 is a flowchart illustrating the operation of the update unit 130 according to an embodiment of the present invention.

First, the update unit 130 checks whether the update period, defined by a preset time, has arrived.

When it determines that the update period has arrived, the update unit 130 generates a priority file list.

The update unit 130 compares the priority file list with the data information of the disk cache to check whether missing data exists.

If missing data exists, the update unit 130 copies the data from the distributed file system 10 into the disk cache according to the priority file list. After updating the disk cache, the update unit 130 modifies the data list of the disk cache.
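The periodic cycle of FIG. 4 (check the period, build the priority list, find missing entries, copy them, update the data list) can be sketched as a single `tick` method driven by the caller's clock. This is a minimal sketch under stated assumptions: the disk cache is a dict whose keys double as its data list, so inserting into the cache also updates the list; `top_n` is a hypothetical parameter that limits how far down the priority list one cycle goes, since the patent gives no limit; eviction of old entries is not modeled; and `UpdateUnit` and `tick` are hypothetical names.

```python
from typing import Callable, Dict, List


class UpdateUnit:
    """Hypothetical model of the periodic cycle of update unit 130 (FIG. 4)."""

    def __init__(self, period_s: float, cache: Dict[str, bytes],
                 dfs_read: Callable[[str], bytes],
                 make_priority_list: Callable[[], List[str]],
                 top_n: int) -> None:
        self.period_s = period_s                      # preset update period, in seconds
        self.cache = cache                            # disk cache; its keys act as the data list
        self.dfs_read = dfs_read                      # read from the distributed file system
        self.make_priority_list = make_priority_list  # e.g. built from file metadata
        self.top_n = top_n                            # assumption: entries considered per cycle
        self.last_run = float("-inf")

    def tick(self, now: float) -> List[str]:
        """Run one check; return the paths copied into the cache (empty if not due)."""
        if now - self.last_run < self.period_s:
            return []                                 # update period has not arrived yet
        self.last_run = now
        wanted = self.make_priority_list()[: self.top_n]
        missing = [p for p in wanted if p not in self.cache]  # compare with the data list
        for path in missing:
            self.cache[path] = self.dfs_read(path)    # copy from the distributed file system
        return missing
```

Keeping the clock outside the class (the caller passes `now`) makes the cycle easy to test and lets a real deployment drive it from whatever scheduler it already uses.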

FIGS. 5 to 7 are diagrams comparing the performance of the cluster system according to an embodiment of the present invention with that of an existing cluster system.

In FIGS. 5 to 7, NFS3 GATEWAY denotes a cluster system that uses the distributed file system in NFS form so that the Unix file system 20 and the distributed file system 10 can be used together, and OpenBio denotes the cluster system according to an embodiment of the present invention. The hit rate of OpenBio is the proportion of the required data that is present in the disk cache: a hit rate of 0% means that none of the required data is in the disk cache, and a hit rate of 100% means that all of it is.
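The hit rate defined above can be computed directly from the set of required files and the contents of the disk cache. The patent does not say whether the proportion is counted by files or by bytes; the sketch below assumes a byte-weighted measure, which seems natural for test data measured in gigabytes, and the function name `hit_rate` is hypothetical.

```python
from typing import Dict, Set


def hit_rate(required: Dict[str, int], cached: Set[str]) -> float:
    """Percentage of the required bytes already present in the disk cache.

    `required` maps each required file path to its size in bytes;
    `cached` is the set of paths currently in the disk cache.
    """
    total = sum(required.values())
    if total == 0:
        raise ValueError("no required data")
    hit = sum(size for path, size in required.items() if path in cached)
    return 100.0 * hit / total
```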

Referring to FIG. 5, the time consumed for data output is reduced across all data sizes from 5 GB to 50 GB compared with the existing cluster system.

FIG. 6 shows the performance improvement rate resulting from the reduced time. At a hit rate of 0%, the improvement rate is approximately 200% to 380%, and at a hit rate of 100%, it ranges from approximately 480% to 810%.

FIG. 7 is a graph showing, by data size, the time and performance improvement rate of the existing cluster system and of the cluster system according to an embodiment of the present invention.

The term "~unit" as used in this embodiment refers to software or to a hardware component such as a field-programmable gate array (FPGA) or an ASIC, and a "~unit" performs certain roles. However, a "~unit" is not limited to software or hardware. A "~unit" may be configured to reside on an addressable storage medium or configured to run on one or more processors. Thus, as an example, a "~unit" includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuits, data, databases, data structures, tables, arrays, and variables. The functions provided in the components and "~units" may be combined into fewer components and "~units" or further separated into additional components and "~units". In addition, the components and "~units" may be implemented to run on one or more CPUs in a device or a secure multimedia card.

Although the present invention has been described above with reference to preferred embodiments, those skilled in the art will understand that various modifications and changes can be made to the present invention without departing from the spirit and scope of the present invention as set forth in the following claims.

10: distributed file system
20: Unix file system
30: node
100: data management unit

Claims

What is claimed is:
1. A cluster system that includes a plurality of nodes and allows a Unix file system and a distributed file system to be used together, the cluster system comprising:
a data management unit that, upon a data input/output request from the Unix file system, accesses the plurality of nodes in parallel and retrieves and provides required data from a disk cache,
wherein the data management unit comprises:
a search unit that searches the disk caches of the plurality of nodes for the required data in response to the data input/output request from the Unix file system;
a data request unit that requests data from the distributed file system when the required data is not found in the disk caches of the plurality of nodes;
an update unit that receives the data from the distributed file system and updates the disk cache with it; and
a data processing unit that collects the required data from the disk cache and provides it to the Unix file system.

2. (Deleted)

3. The cluster system according to claim 1,
wherein the search unit searches for the required data using index information provided by at least one of the plurality of nodes.

4. The cluster system according to claim 1,
wherein the update unit creates a data list according to priority and compares the data list with the required data in the disk cache to update missing data.

5. The cluster system according to claim 4,
wherein the update unit creates the data list according to priority using at least one of a file use date, a file use frequency, a file creation date, and a file modification date.