KR101432745B1

KR101432745B1 - Distributed file system of cluster in the virtualized cloud environment and data replication method

Info

Publication number: KR101432745B1
Application number: KR1020120122978A
Authority: KR
Inventors: 박성용; 이권용; 김준영
Original assignee: 서강대학교산학협력단 (Sogang University Industry-University Cooperation Foundation)
Priority date: 2012-11-01
Filing date: 2012-11-01
Publication date: 2014-08-22
Also published as: KR20140056838A

Abstract

The present invention relates to a distributed file system and a data replication method for a cluster in a virtual cloud environment. More specifically, it includes a driver domain that resides in the physical node of each virtual machine in a cluster including at least one virtual machine. The driver domain receives replicated data from a node performing data replication for the distributed file system and then transmits the received replicated data to the virtual machine based on the resource availability of the physical node hosting that virtual machine.
With this configuration, the driver domain receives the replicated data on behalf of the virtual machine and then forwards it to the virtual machine according to the resource availability of the physical node hosting that virtual machine. This keeps the degree of physical-level fault tolerance constant and reduces replication overhead, thereby improving the performance of the distributed file system.

Description

[0001] The present invention relates to a distributed file system and a data replication method for a cluster in a virtual cloud environment, and more particularly to a distributed file system and data replication method that can improve the performance of cloud-based clusters whose distributed file systems perform many write operations.

As the amount of data grows rapidly with the computerization of many fields, cluster computing for parallel or distributed data processing has become widespread, and such cluster computing systems connect many physical machines.

Recently, with the emergence of virtual-machine-based cloud services, existing cluster systems are increasingly moving to the cloud.

Cluster systems operating in the cloud offer advantages such as ease of management, reliability, availability, and ease of cluster configuration.

The distributed file system, one of the core technologies of such cluster systems, replicates data to appropriate nodes for fast data access and fault tolerance. Replication is performed by sending the replicated data over the network to other nodes in the cluster. In a cloud environment, replication consists of transferring data from one virtual machine (the replication-origin node) to another virtual machine (the replication-target node), and the transfer passes through the driver domains of the virtualized nodes. Because all replicated data must pass through the driver domains, write operations in the distributed file system suffer performance degradation.

Prior art relating to clusters that perform data replication in a virtual cloud environment, and to their data replication methods, is as follows.

Prior Art 1, Korean Registered Patent Publication No. 10-108551 (2011.01.10), relates to a cloud-based file copying and disk replication system over a wide area network, and a method thereof.
In Prior Art 1, the server includes a Cloud Cast server module, which receives an image source drive from the server administrator, designates it as a cloud disk, and generates an interrupt requesting the operating system (OS) to hand disk management control to an IVP server module; and an IVP (Internet Virtual Partition) server module, which is loaded into RAM by the OS in response to that interrupt, takes over disk management control from the OS, replicates the server's file system including the cloud disk, receives cloud-disk mount requests from at least one client, transmits the replicated file system over the wide area network (WAN) to the requesting client, waits for file requests from the client, and sends a requested file to the client when a request arrives. Each client includes a Cloud Cast client module, which receives from the user a drive name under which to mount the server's cloud disk together with a mount request, asks the IVP server module to mount the cloud disk, receives the server's replicated file system over the WAN and loads it into memory, creates a virtual disk by renaming the cloud disk drive in the received file system to the requested drive name, and creates the client's copy file system by adding the virtual disk to the client's original file system.

Prior Art 2, Korean Registered Patent Publication No. 10-1008557 (2011.01.10), relates to a system and method for instant disk recovery using cloud computing.

Prior Art 2 includes a server and at least one client. The server includes a Cloud Cast server module, which receives from the server administrator an image source drive to be designated as a cloud disk shared with clients, and an IVP (Internet Virtual Partition) server module, which receives cloud-disk mount requests from at least one client, replicates the server's file system including the cloud disk, transmits the replicated file system to the client that requested the mount, waits for file requests and sends a requested file to the client, waits for a remount request for the cloud disk from the client, and retransmits the replicated file system to the client when a remount request arrives. Each client includes a Cloud Cast client module, which receives from the user a drive name under which to mount the server's cloud disk together with a mount request, or receives a remount request from the user or from the client's own operating system (OS); connects to the server through an agent program and asks the IVP server module to mount or remount the cloud disk; receives the replicated file system from the IVP server module and loads it into memory; creates a virtual disk by renaming the cloud disk drive in the received file system to the requested drive name; and creates the client's copy file system by adding the virtual disk to the client's original file system.

To solve the problems of the prior art described above, the present invention provides a distributed file system and data replication method for a cluster in a virtual cloud environment in which a driver domain in a physical node of the cluster receives replicated data in place of the virtual machine and then transmits it to the virtual machine according to the resource availability of the physical node hosting that virtual machine. This improves the performance of distributed file systems in which data replication has caused a performance bottleneck.

According to one embodiment of the present invention for achieving the above objective, a distributed file system of a cluster that performs data replication in a virtual cloud environment includes a driver domain that resides in a physical node of a cluster including at least one virtual machine, receives replicated data from a node performing data replication for the distributed file system, and then transmits the received replicated data to the virtual machine based on the resource availability of the physical node hosting the virtual machine.

More preferably, the driver domain may transmit the replicated data received from the replication-origin virtual machine of the distributed file system to the replication-target virtual machine when the availability of at least one resource of the physical node hosting the virtual machine, CPU or memory, is higher than a preset threshold.
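The forwarding condition above can be illustrated with a small predicate. This is a minimal sketch, not the patented implementation: the function name `should_forward`, the representation of availability as the free fraction of a resource, and the choice to require every monitored resource to exceed its threshold when both CPU and memory are monitored are all assumptions.

```python
from typing import Mapping


def should_forward(availability: Mapping[str, float],
                   thresholds: Mapping[str, float]) -> bool:
    """Return True if the driver domain may forward queued replicas to the VM.

    Hypothetical sketch: `availability` maps a resource name ("cpu", "memory")
    to the fraction of that resource currently free on the physical node, and
    `thresholds` holds the preset threshold for each monitored resource.
    At least one of CPU or memory must be monitored; every monitored resource
    must be strictly above its threshold (an assumed interpretation).
    """
    monitored = [r for r in ("cpu", "memory") if r in thresholds]
    if not monitored:
        raise ValueError("monitor at least one of CPU or memory")
    return all(availability[r] > thresholds[r] for r in monitored)
```

For example, a node with 60% free CPU passes a CPU-only threshold of 0.5, but fails once a memory threshold of 0.4 is added while only 30% of memory is free.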

More preferably, the driver domain may either intercept the replicated data that the replication-origin virtual machine of the distributed file system sends to the replication-target virtual machine, or receive the replicated data directly from the replication-origin virtual machine.

More preferably, the driver domain may include: a packet monitoring module that monitors network packets delivered to the virtual machines hosted on the physical node; a data proxy receiving module that intercepts and receives, in place of the target, network packets carrying replicated data destined for the replication-target virtual machine; a storage control module that stores the received replicated data in a replication queue; and a response packet transmission module that sends response packets for those network packets to the distributed file system.

In particular, the data proxy receiving module may receive the replicated data directly when the driver domain is set as the final destination in the distributed file system.

According to one embodiment of the present invention for achieving the above objective, a data replication method performed by the distributed file system of a cluster in a virtual cloud environment includes: the cluster requesting a write operation for data replication from the distributed file system; selecting a replication-target virtual machine among the virtual machines in the cluster; the driver domain of the physical node hosting a virtual machine in the cluster receiving the replicated data and storing it in a replication queue; the driver domain transmitting the replicated data to the virtual machine according to the resource availability of the physical node hosting the virtual machine; and, when the driver domain has finished transmitting the replicated data to the virtual machine, notifying the distributed file system.
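The five steps of the method can be walked through for a single replica in a short in-memory model. This is an illustrative sketch only: `PhysicalNode`, `write_with_replication`, the use of free CPU fraction as the only availability metric, and the rule of choosing the VM whose host has the most free CPU are hypothetical choices, not details given in the patent.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class PhysicalNode:
    """Hypothetical physical node: free CPU fraction plus a driver-domain queue."""
    name: str
    free_cpu: float
    queue: deque = field(default_factory=deque)


def write_with_replication(block, vm_hosts, nodes, threshold):
    """Walk through the claimed steps for one replica (illustrative only).

    vm_hosts maps VM name -> physical node name; nodes maps node name -> PhysicalNode.
    Returns the notifications sent to the distributed file system.
    """
    notifications = []
    # Step 1: the cluster issues a write request (modelled by this call).
    # Step 2: select the target VM whose host has the most free CPU (assumed rule).
    target_vm = max(vm_hosts, key=lambda vm: nodes[vm_hosts[vm]].free_cpu)
    host = nodes[vm_hosts[target_vm]]
    # Step 3: the host's driver domain receives the replica and queues it.
    host.queue.append((target_vm, block))
    # Step 4: forward from the queue only while availability exceeds the threshold.
    while host.queue and host.free_cpu > threshold:
        vm, data = host.queue.popleft()
        # Step 5: report completion of the transfer to the distributed file system.
        notifications.append(("delivered", vm, len(data)))
    return notifications
```

If the selected host is busy, the replica simply stays in the driver-domain queue and no notification is produced yet.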

More preferably, in the step of the driver domain receiving the replicated data and storing it in the replication queue, the driver domain may perform at least one of: intercepting the replicated data that the distributed file system sends to the replication-target virtual machine; and receiving the replicated data directly from the distributed file system.

More preferably, the process in which the driver domain intercepts the replicated data may include: monitoring network packets delivered to the virtual machines hosted on the physical node; intercepting and receiving, in place of the target, network packets carrying replicated data destined for the replication-target virtual machine; storing the received replicated data in the replication queue; and sending response packets for those network packets to the distributed file system.

In particular, the process of receiving the replicated data directly from the distributed file system may include setting the driver domain as the final destination in the distributed file system, so that the driver domain receives the replicated data directly from the distributed file system.

More preferably, the step in which the driver domain transmits the replicated data to the virtual machine may include: checking the availability of at least one resource, CPU or memory, of the physical node hosting the virtual machine; comparing that availability with a preset threshold; and, if the availability is higher than the threshold, transmitting the replicated data received from the distributed file system to the virtual machine.

In the distributed file system and data replication method of the present invention, the driver domain receives replicated data on behalf of the replication-target virtual machine and then forwards it to the virtual machine according to the resource availability of the physical node hosting that virtual machine. This keeps the degree of physical-level fault tolerance constant and reduces replication overhead, thereby improving the performance of the distributed file system.

Figure 1 is a diagram illustrating a data replication mechanism.
Figure 2 is a graph showing the write performance of a distributed file system as a function of the number of replicas, for a cluster composed of virtual machines.
Figure 3 is a graph showing the write performance of a distributed file system as a function of the number of replicas, for a cluster composed of virtual machines and multiple driver domains.
Figure 4 is a diagram illustrating a cluster that performs data replication in a virtual cloud environment, and its data replication method, according to an embodiment of the present invention.

Hereinafter, the present invention will be described in detail with reference to preferred embodiments and the accompanying drawings so that those of ordinary skill in the art can readily practice it. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described herein.

With the existing data replication technique of distributed file systems used in cloud-based cluster systems, when a write operation in the distributed file system triggers replication between virtual machines, the next operation cannot proceed until the replicated data reaches the replication-target virtual machine, even though physical-level fault tolerance is already guaranteed once the data arrives at the physical node.

Figure 1 is a diagram illustrating a data replication mechanism.

As shown in Figure 1, when an HDFS client creates a file (steps 1-2) and writes to it (steps 3-4), the replicated data is transmitted and stored through a pipeline spanning several data nodes (steps 5-6). Only after this replication process finishes is the write to the file completed (step 7), and only then can the distributed file system proceed to the next operation.
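The conventional behaviour in Figure 1 can be modelled as a recursive pipeline in which each data node stores the block, forwards it downstream, and acknowledges only after the downstream node has acknowledged. This is a toy model written for illustration; it is not HDFS's actual data-transfer protocol, and `pipeline_write`, `stores` and `failed` are hypothetical names.

```python
from typing import AbstractSet, Dict, List


def pipeline_write(block: bytes, pipeline: List[str],
                   stores: Dict[str, List[bytes]],
                   failed: AbstractSet[str] = frozenset()) -> bool:
    """Toy replication pipeline: True only if every node in the chain stored the block.

    Acks flow back upstream, so the client sees success (step 7) only after
    the last data node has stored its copy.
    """
    if not pipeline:
        return True                     # end of chain: the ack starts here
    head, rest = pipeline[0], pipeline[1:]
    if head in failed:
        return False                    # no ack, so the write cannot complete
    stores.setdefault(head, []).append(block)          # store locally (steps 5-6)
    return pipeline_write(block, rest, stores, failed)  # wait for downstream ack
```

The point the model captures is that the client's completion depends on the slowest or last hop in the chain; in a virtualized cluster each hop also includes the driver-domain-to-VM delivery.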

When a write operation in the distributed file system requires data replication, the replicated data is transmitted in parallel through the pipeline, but overhead still arises during transmission. In cloud-based cluster systems, the overhead of the virtualization technology used to build the cloud is added on top of this, so performance drops sharply when write operations are frequent. This happens because, even though the replicated data has already arrived at the physical node and satisfied physical-level fault tolerance, additional time and resources are needed to deliver it to the virtual machine.
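The argument can be made concrete with a back-of-the-envelope model. It is an assumption, not taken from the patent: a write is treated as a chain of per-replica costs, each replica costing one network hop to its physical node and, in the conventional scheme, an extra driver-domain-to-VM delivery. The parameter names and cost values are hypothetical, and real pipelines overlap transfers, so only the structural difference is meaningful.

```python
def write_latency(replicas: int, net_hop: float, vm_delivery: float,
                  ack_at_driver_domain: bool) -> float:
    """Toy chained-pipeline latency in arbitrary units (hypothetical model).

    Conventional: each hop waits for delivery into the guest VM before acking.
    Proposed: the driver domain acks once the replica is queued on the host.
    """
    per_replica = net_hop if ack_at_driver_domain else net_hop + vm_delivery
    return replicas * per_replica
```

With 3 replicas, a 2.0-unit hop and a 1.5-unit VM delivery, the model gives 10.5 units for the conventional path versus 6.0 units when the driver domain acknowledges; the gap grows linearly with the number of replicas, which matches the trend the document describes for Figure 2.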

Figure 2 is a graph showing the write performance of a distributed file system as a function of the number of replicas, for a cluster composed of virtual machines.

As shown in Figure 2, the time required to write data in the distributed file system increases as the number of replicas per data item increases.

Figure 3 is a graph showing the write performance of a distributed file system as a function of the number of replicas, for a cluster composed of virtual machines and multiple driver domains.

Because physical-level fault tolerance is already guaranteed when the replicated data arrives at the physical node hosting the replication-target virtual machine, the performance improvement shown for the system in Figure 3 can be obtained even without delivering the replicated data all the way to the replication-target virtual machine.

In conventional data replication between virtual machines, the distributed file system cannot move on to the next step until the replicated data reaches the replication-target virtual machine, even if the data has already arrived at the physical node and physical-level fault tolerance is maintained. As a result, write operations are slow.

Accordingly, the present invention guarantees physical-level fault tolerance by sending the data replicated between virtual machines to the physical node hosting the replication-target virtual machine rather than to the virtual machine itself, and then delivers the replicated data to the replication-target virtual machine once the physical node has sufficient system resources such as CPU and memory.

In addition, because the target virtual machine for data replication is selected based on the resource availability of the physical nodes, the distributed file system running on a cloud-based cluster system can complete write operations faster.

Hereinafter, a cluster that performs data replication in a virtual cloud environment according to an embodiment of the present invention will be described in detail with reference to Figure 4.

Figure 4 is a diagram illustrating the distributed file system and data replication method of a cluster in a virtual cloud environment according to an embodiment of the present invention.

As shown in Figure 4, to perform data replication in the virtual cloud environment according to the present invention, each physical node hosting the virtual machines (at least one) that make up the cluster includes a driver domain. The driver domain receives replicated data from the distributed file system or from a node performing data replication, and then transmits the received replicated data to the virtual machine based on the availability of the resources, such as CPU or memory, of the physical node hosting the virtual machine. In particular, when the availability of at least one resource of that physical node, CPU or memory, is higher than a preset threshold, the driver domain judges that the node has spare system resources and forwards the replicated data received from the distributed file system to the virtual machine.
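The receive-then-forward behaviour of the driver domain can be sketched as a queue that is drained under a resource gate. This is an illustrative Python model only: the class `DriverDomain`, the `read_availability` callback (assumed to return the host's current free-resource fraction), and the `deliver` callback standing in for the actual transfer into the guest VM are hypothetical names.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

Replica = Tuple[str, bytes]  # (target VM name, payload)


class DriverDomain:
    """Hypothetical driver-domain replica buffer gated on host availability."""

    def __init__(self, read_availability: Callable[[], float], threshold: float):
        self._read_availability = read_availability  # e.g. free-CPU fraction
        self._threshold = threshold
        self.queue: Deque[Replica] = deque()

    def receive(self, vm: str, payload: bytes) -> None:
        """Accept a replica on the VM's behalf and hold it on the host."""
        self.queue.append((vm, payload))

    def drain(self, deliver: Callable[[str, bytes], None]) -> List[str]:
        """Forward queued replicas while the host still has spare resources.

        Availability is re-read before every replica, so draining stops as
        soon as the host becomes busy; the rest stays queued for later.
        """
        delivered = []
        while self.queue and self._read_availability() > self._threshold:
            vm, payload = self.queue.popleft()
            deliver(vm, payload)
            delivered.append(vm)
        return delivered
```

Re-checking availability per replica, rather than once per drain, is a design choice of this sketch; it keeps a long backlog from starving co-located VMs.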

The driver domain includes a packet monitoring module, a data proxy receiving module, a storage control module, and a response packet transmission module.

The packet monitoring module monitors network packets delivered to the virtual machines hosted on the physical node.

The data proxy receiving module either intercepts and receives, in place of the target, network packets carrying replicated data destined for the replication-target virtual machine, or, when the driver domain is set as the final destination in the distributed file system, receives the replicated data directly.

The storage control module stores the received replicated data in a replication queue.

The response packet transmission module sends response packets for those network packets to the distributed file system.
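The four modules described above can be grouped into one packet handler, sketched below. The `Packet` type, the class `ReplicaInterceptor`, and the pluggable `is_replication_traffic` predicate (which in practice might be a fixed-port check) are hypothetical; acknowledgements are recorded in a list instead of being sent on a real network.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Packet:
    """Simplified packet with only the fields this sketch needs (hypothetical)."""
    src: str
    dst_vm: str
    payload: bytes


class ReplicaInterceptor:
    """Hypothetical grouping of the four driver-domain modules."""

    def __init__(self, local_vms: Set[str],
                 is_replication_traffic: Callable[[Packet], bool]):
        self.local_vms = local_vms
        self.is_replication_traffic = is_replication_traffic
        self.replication_queue: Deque[Packet] = deque()
        self.sent_acks: List[Tuple[str, str, int]] = []  # (to, on behalf of, bytes)

    def on_packet(self, pkt: Packet) -> Optional[Packet]:
        """Return the packet to pass through to the VM, or None if taken over."""
        # Packet monitoring module: examine only traffic for VMs on this host.
        if pkt.dst_vm not in self.local_vms or not self.is_replication_traffic(pkt):
            return pkt
        # Data proxy receiving module + storage control module: queue the replica.
        self.replication_queue.append(pkt)
        # Response packet transmission module: acknowledge on the VM's behalf.
        self.sent_acks.append((pkt.src, pkt.dst_vm, len(pkt.payload)))
        return None
```

Ordinary traffic is returned unchanged, so only replication packets are diverted into the queue and acknowledged early.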

Next, a distributed file system and data replication method of a cluster in a virtual cloud environment according to another embodiment of the present invention will be described in more detail.

As shown in Figure 4, when a client of the cluster requests a write operation from the distributed file system and writes specific data (1), the data is first replicated (2) into the replication queue in the driver domain of the physical node hosting the client's virtual machine. Then, when a replication-target virtual machine is selected, a physical node that can receive the replicated data quickly is chosen from among the other physical nodes and the data is sent to it, and the driver domain of that physical node stores the received data in its replication queue (3). That driver domain then selects the physical node to receive the next replica and transfers the data to it (4). Finally, the driver domain of the last physical node to receive the replicated data reports that the replication operation is complete (5).
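Choosing "a physical node that can receive the replicated data quickly" can be approximated by ranking candidate hosts by free resources. In this sketch the single free-resource fraction per node, the function name `choose_replica_nodes`, and the rule that the origin host keeps the first copy in its driver-domain queue (step 2) are assumptions made for illustration.

```python
from typing import Dict, List


def choose_replica_nodes(origin: str, free_ratio: Dict[str, float],
                         replicas: int) -> List[str]:
    """Return the physical nodes, in pipeline order, that will hold the replicas.

    Hypothetical policy: the origin node holds the first copy (step 2); the
    remaining copies go to distinct other nodes, most-available first (steps 3-4).
    """
    candidates = [n for n in free_ratio if n != origin]
    if len(candidates) < replicas - 1:
        raise ValueError("not enough physical nodes for distinct replicas")
    ranked = sorted(candidates, key=lambda n: free_ratio[n], reverse=True)
    return [origin] + ranked[: replicas - 1]
```

For instance, with origin `p1` and free ratios `p2=0.2`, `p3=0.6`, `p4=0.4`, a three-replica write would be pipelined through `p1`, `p3`, `p4`. Ranking hosts rather than VMs reflects the document's point that the whole physical node's resources, not only the target VM's, determine how fast a replica is absorbed.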

The distributed file system can therefore proceed to the next operation, so operations are processed faster while physical-level fault tolerance remains the same as with the existing technique. Later, once the physical node has enough system resources that the driver domain holding the replicated data in its replication queue can deliver it to the replication-target virtual machine without affecting the performance of other virtual machines, the driver domain delivers the replicated data to that virtual machine (6 or 8) and notifies the distributed file system that the data has been written to the replication-target virtual machine (7 or 9).
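The two kinds of notification in Figure 4, completion once every replica is held in a driver-domain queue (5) and later confirmation that each replica reached its VM (7 or 9), can be tracked with simple per-replica state. `ReplicaState` and `WriteTracker` are hypothetical names for a sketch of the bookkeeping, not structures defined in the patent.

```python
from enum import Enum
from typing import Dict, Tuple


class ReplicaState(Enum):
    QUEUED = "queued"        # held in a driver-domain queue: physical-level fault tolerance
    DELIVERED = "delivered"  # written into the replication-target VM


class WriteTracker:
    """Hypothetical DFS-side bookkeeping for the two notifications in Figure 4."""

    def __init__(self) -> None:
        self.state: Dict[Tuple[str, str], ReplicaState] = {}

    def on_queued(self, block_id: str, node: str) -> None:
        self.state[(block_id, node)] = ReplicaState.QUEUED

    def on_delivered(self, block_id: str, node: str) -> None:  # steps 7 / 9
        self.state[(block_id, node)] = ReplicaState.DELIVERED

    def write_complete(self, block_id: str, replicas: int) -> bool:
        """Step 5: the write may finish once every replica is at least queued."""
        held = [s for (b, _), s in self.state.items() if b == block_id]
        return len(held) >= replicas

    def fully_delivered(self, block_id: str, replicas: int) -> bool:
        """True once every replica has also reached its target VM."""
        done = [s for (b, _), s in self.state.items()
                if b == block_id and s is ReplicaState.DELIVERED]
        return len(done) >= replicas
```

The write is reported complete as soon as all copies are queued on distinct physical nodes, while full delivery into the VMs is tracked separately and may lag behind.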

The driver domain of each physical node can receive replicated data on behalf of a virtual machine in two ways. In the first, the distributed file system of the cluster of virtual machines is unaware of the driver domain: it sends the replicated data to the replication-target virtual machine, and the driver domain intercepts the data in transit. In the second, the distributed file system is aware of the driver domain and sends the replicated data to the driver domain of the physical node hosting the replication-target virtual machine.

First, consider how the driver domain intercepts and receives the replicated data. When the port used for data replication is fixed, the driver domain of each node monitors network packets delivered to the virtual machines hosted on its physical node, intercepts packets arriving on the data replication port, stores them in the replication queue, and then performs data replication or sends response packets on the virtual machine's behalf. As a result, the data replication performed by the existing distributed file system does not need to be modified.
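A fixed-port interception test can be sketched by parsing raw IPv4/TCP headers. Several assumptions apply: the distributed file system is taken to be HDFS, whose DataNode data-transfer port defaulted to 50010 in Hadoop 1.x/2.x (9866 in Hadoop 3.x); the function name `should_intercept` and the set of local VM IP addresses are hypothetical; and how a real driver domain hooks this check into its network backend is not shown.

```python
import ipaddress
import struct
from typing import Set

TCP = 6
REPLICATION_PORT = 50010  # assumed: HDFS DataNode transfer port (Hadoop 1.x/2.x default)


def should_intercept(ip_packet: bytes, local_vm_ips: Set[str],
                     port: int = REPLICATION_PORT) -> bool:
    """Return True if a raw IPv4 packet is replication traffic for a local VM."""
    if len(ip_packet) < 20 or ip_packet[0] >> 4 != 4:
        return False                            # not an IPv4 packet
    ihl = (ip_packet[0] & 0x0F) * 4             # IPv4 header length in bytes
    if ip_packet[9] != TCP or len(ip_packet) < ihl + 4:
        return False                            # only TCP carries the data stream
    dst_ip = str(ipaddress.IPv4Address(ip_packet[16:20]))
    (dst_port,) = struct.unpack_from("!H", ip_packet, ihl + 2)  # TCP dst port
    return dst_ip in local_vm_ips and dst_port == port
```

Everything that fails the check is passed to the VM untouched, which is why this mode requires no change to the distributed file system but does depend on the replication port being fixed and dedicated.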

Alternatively, consider how the driver domain receives the replicated data directly from the distributed file system.

After the distributed file system selects the replication-target virtual machine, it sets the driver domain of the physical node hosting that virtual machine as the destination of data replication and then performs the replication. Data replication can therefore be performed even when the port used for data replication is not fixed or is simultaneously used for other purposes.
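In the direct mode, the distributed file system still chooses target virtual machines but resolves each one to the driver domain of its host before sending. The sketch below shows only that resolution step; `plan_direct_transfers`, the VM-to-host map, and the host-to-driver-domain address map are hypothetical inputs that a cluster manager would have to supply.

```python
from typing import Dict, List, Tuple


def plan_direct_transfers(targets: List[str], vm_to_host: Dict[str, str],
                          host_to_driver_domain: Dict[str, str]) -> List[Tuple[str, str]]:
    """Map each chosen target VM to (driver-domain address, intended VM).

    The replica is sent to the host's driver domain and tagged with the VM it
    is ultimately meant for, so no fixed replication port has to be watched.
    """
    plan = []
    for vm in targets:
        host = vm_to_host[vm]                            # where the target VM runs
        plan.append((host_to_driver_domain[host], vm))   # send there, tag with VM
    return plan
```

Because the driver domain is addressed explicitly, two target VMs on the same host resolve to the same driver-domain address and are distinguished only by the VM tag.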

Next, the driver domain transmits the replicated data to the virtual machine according to the resource availability of the physical node hosting the virtual machine. In this transfer process, the driver domain first checks the availability of at least one resource, CPU or memory, of that physical node.

The driver domain then compares the availability with a preset threshold. If the availability is higher than the threshold, it transmits the replicated data received from the distributed file system to the virtual machine.
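The check-and-compare step can be written as a small predicate. This sketch rests on several assumptions that the text does not specify. Availability is taken as one minus utilization. The threshold value `0.3` is hypothetical. When both CPU and memory are checked, every checked resource must exceed the threshold. `NodeUsage`, `availability` and `should_forward` are illustrative names.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class NodeUsage:
    cpu_util: float   # fraction of the physical node's CPU in use, 0.0-1.0
    mem_util: float   # fraction of the physical node's memory in use, 0.0-1.0


# Hypothetical value; the text only calls the threshold "preset".
AVAILABILITY_THRESHOLD = 0.3


def availability(usage: NodeUsage, resource: str) -> float:
    """Availability of a resource, taken here as 1 - utilization."""
    if resource == "cpu":
        return 1.0 - usage.cpu_util
    if resource == "memory":
        return 1.0 - usage.mem_util
    raise ValueError(f"unknown resource: {resource}")


def should_forward(usage: NodeUsage,
                   resources: Iterable[str] = ("cpu", "memory"),
                   threshold: float = AVAILABILITY_THRESHOLD) -> bool:
    """Forward the replica only if every checked resource exceeds the threshold."""
    return all(availability(usage, r) > threshold for r in resources)
```

Passing `resources=("memory",)` checks memory only. This corresponds to the "at least one of CPU or memory" wording.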

Once the driver domain has finished transferring the replicated data to the virtual machine, it notifies the distributed file system.
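The deferred delivery and the completion notice can be combined in a single drain pass over the replication queue. In the sketch below, replicas are delivered only while the node reports spare capacity, and the distributed file system is notified after each delivery. The `(block_id, target_vm, payload)` queue layout and the three callbacks are hypothetical stand-ins for the hypervisor and file-system interfaces, which the text does not describe.

```python
from collections import deque
from typing import Callable, Deque, Tuple

# A queued replica: (block_id, target_vm, payload). Hypothetical layout.
Replica = Tuple[str, str, bytes]


def drain_queue(queue: Deque[Replica],
                has_capacity: Callable[[], bool],
                deliver: Callable[[str, bytes], None],
                notify_dfs: Callable[[str, str], None]) -> int:
    """Deliver queued replicas while the physical node has spare resources.

    After each replica is handed to its VM, the distributed file system is
    notified. Returns the number of replicas delivered in this pass.
    """
    delivered = 0
    while queue and has_capacity():
        block_id, vm, payload = queue.popleft()
        deliver(vm, payload)        # copy the replica into the target VM
        notify_dfs(block_id, vm)    # report completion to the file system
        delivered += 1
    return delivered
```

Replicas that remain in the queue when capacity runs out are left there for a later pass. This reflects the idea of forwarding data "when sufficient system resources become available".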

The present invention improves performance by redesigning, with the cloud environment in mind, the data replication technique that the distributed file system of a cloud-hosted virtual machine cluster uses for purposes such as fault tolerance. To let distributed file system operations complete faster while preserving physical-level fault tolerance, the driver domain of each node receives the replicated data on behalf of the virtual machine. It forwards the data to the replication target virtual machine later, once sufficient system resources are available. When the replication target virtual machine is selected, the system resources of the entire physical node are considered, not only those of the virtual machine, so that the replicated data can be received more quickly. Experimental results show that the virtual-machine-aware data replication technique speeds up write operations processed by the distributed file system of a cloud-hosted cluster while guaranteeing the same physical-level fault tolerance.
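Target selection based on whole-node resources might look like the following. This is a hypothetical sketch, not the scoring method used in the invention, which the text does not specify. It scores each candidate VM by the scarcer of its host's free CPU and free memory. It also skips hosts that already hold a replica, which is our assumption about how physical-level fault tolerance is preserved.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set


@dataclass
class HostStats:
    cpu_avail: float   # fraction of the physical node's CPU that is free
    mem_avail: float   # fraction of the physical node's memory that is free


def pick_target_vm(candidates: Iterable[str],
                   vm_to_host: Dict[str, str],
                   host_stats: Dict[str, HostStats],
                   hosts_with_replica: Set[str]) -> Optional[str]:
    """Pick the candidate VM whose physical node has the most headroom."""
    best_vm, best_score = None, -1.0
    for vm in candidates:
        host = vm_to_host[vm]
        if host in hosts_with_replica:
            continue  # keep replicas on distinct physical nodes (assumption)
        stats = host_stats[host]
        # Score by the bottleneck resource of the whole physical node.
        score = min(stats.cpu_avail, stats.mem_avail)
        if score > best_score:
            best_vm, best_score = vm, score
    return best_vm
```

Using the minimum of the two availabilities is one simple choice. It prevents a node with plenty of free CPU but almost no free memory from being ranked first.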

In the distributed file system and data replication method of a cluster in a virtual cloud environment according to the present invention, the driver domain receives the replicated data on behalf of the virtual machine. It then forwards the data to the virtual machine according to the resource availability of the physical node hosting that virtual machine. This maintains physical-level fault tolerance and reduces replication overhead, which improves the performance of the distributed file system.

Preferred embodiments of the present invention have been described above. However, the present invention is not limited to these embodiments and may be modified and practiced in various ways within the scope of its technical idea. Such modifications naturally fall within the scope of the appended claims.

Claims

A distributed file system of a cluster in a virtual cloud environment, comprising:
a driver domain residing in the physical node of each virtual machine in a cluster that includes at least one virtual machine, wherein the driver domain receives replicated data from a node performing data replication of the distributed file system and then transmits the received replicated data to the virtual machine based on the availability of resources of the physical node hosting the virtual machine;
wherein the driver domain
transmits the replicated data received from the distributed file system to the virtual machine when the availability of at least one resource of the physical node hosting the virtual machine, either CPU or memory, is higher than a preset threshold.

2. (Deleted)

The distributed file system according to claim 1,
wherein the driver domain
either intercepts the replicated data that the distributed file system sends to the replication target virtual machine and receives it on that machine's behalf, or receives the replicated data directly from the distributed file system.

The distributed file system according to claim 3,
wherein the driver domain comprises:
a packet monitoring module that monitors network packets transmitted to the virtual machines hosted on the physical node;
a data proxy receiving module that intercepts network packets transmitted to a replication target virtual machine and receives the replicated data in its place;
a storage control module that stores the received replicated data in a replication queue; and
a response packet transmission module that transmits a response packet for the network packets to the distributed file system.

5. The distributed file system according to claim 4,
wherein the data proxy receiving module
receives the replicated data directly, the driver domain having been set as the final destination in the distributed file system.

A data replication method for a distributed file system of a cluster in a virtual cloud environment, the method comprising:
the cluster requesting a write operation for data replication from the distributed file system;
a driver domain of the physical node hosting a virtual machine included in the cluster receiving replicated data and storing it in a replication queue;
the driver domain transferring the replicated data to the virtual machine according to the availability of resources of the physical node hosting the virtual machine; and
the driver domain notifying the distributed file system when it has finished transferring the replicated data to the virtual machine.

The method according to claim 6,
wherein the step of the driver domain receiving the replicated data and storing it in the replication queue performs at least one of:
a process of intercepting and receiving replicated data transmitted from the distributed file system to a replication target virtual machine; and
a process of receiving replicated data directly from the distributed file system.

8. The method according to claim 7,
wherein the process of the driver domain intercepting and receiving replicated data comprises:
monitoring network packets transmitted to the virtual machines hosted on the physical node;
intercepting the network packets of the replicated data transferred to the replication target virtual machine and receiving the replicated data;
storing the received replicated data in the replication queue; and
transmitting a response packet for the network packets to the distributed file system.

9. The method according to claim 7,
wherein, in the process of receiving the replicated data directly from the distributed file system,
the driver domain is set as the final destination in the distributed file system and receives the replicated data directly from the distributed file system.

The method according to claim 6,
wherein the step of the driver domain transferring the replicated data to the virtual machine comprises:
determining the availability of at least one resource of the physical node hosting the virtual machine, either CPU or memory;
comparing the availability with a preset threshold; and
transmitting the replicated data received from the distributed file system to the virtual machine when the availability is higher than the preset threshold.

A computer-readable recording medium on which a program for executing the method according to any one of claims 6 to 10 is recorded.