KR20210019751A

KR20210019751A - Method for Real time backup from distributed and replicated storage system to public cloud and backup apparatus for the same

Info

Publication number: KR20210019751A
Application number: KR1020190098760A
Authority: KR
Inventors: 오인영; 김동갑
Original assignee: 주식회사 블루밴드 (Blueband Co., Ltd.)
Priority date: 2019-08-13
Filing date: 2019-08-13
Publication date: 2021-02-23

Abstract

The present invention relates to a method for backing up data in real time from a distributed and replicated storage system to a public cloud, and to a backup apparatus therefor. The distributed and replicated storage system has a plurality of nodes, each with a plurality of bricks (a brick being the minimum unit of storage space), so that a distribution function and a replication function can be applied individually or in combination. The method includes: creating a distributed replication volume from bricks in the plurality of nodes by applying a distributed file system; selecting, in an event management block that manages local file events, one of the nodes on which the distributed replication volume was created as a leader node; extracting, in the event management block, local disk or folder information of the leader node and registering a file event; and receiving, in a synchronization block, the file event from the event management block and sending the file of that event to a predefined cloud.

Description

Method for real-time backup from distributed and replicated storage system to public cloud and backup apparatus for the same

The present invention relates to a method and an apparatus for real-time backup from a distributed/replicated storage system to a public cloud and, more specifically, to a method and an apparatus that can back up a distributed/replicated storage system to a public cloud efficiently and in real time while reducing network load.

In general, a distributed/replicated storage system connects multiple storage devices over a network so that they can be used as one virtual storage device. A user can mount one disk of the storage system on his or her computer and read and write as if the remote disk were directly connected to that computer, but the files actually being used are distributed and replicated across several storage devices. This is handled by the distributed file system within the storage system.

One example of a distributed file system is GlusterFS from Red Hat, the company that supplies Red Hat Enterprise Linux (RHEL), the most widely used commercial Linux distribution in the world. Gluster uses the disk of each storage device as a basic unit called a brick, and it can group bricks on multiple storage devices into one virtual volume in a distributed, replicated, or distributed-replicated configuration.

FIG. 1 illustrates a distributed volume constituting a general distributed file system.

As shown in FIG. 1, distributed volume #1 (20) appears to the user 10 as a single disk, but it actually consists of bricks b21, b22, and b23 located on three different storage devices 22, 24, and 26. A hash algorithm whose result is uniquely determined for each file places each file on exactly one of the three bricks. For example, file f21 is stored on brick b21 of the first storage device 22, file f22 on brick b22 of the second storage device 24, and file f23 on brick b23 of the third storage device 26.
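The placement rule of a distributed volume can be sketched as follows. This is a minimal illustration only: it assumes an MD5 digest of the file name taken modulo the number of bricks, which is not GlusterFS's actual elastic-hashing scheme (GlusterFS assigns hash ranges to bricks per directory). Because the hash is assumed, the exact file-to-brick mapping drawn in FIG. 1 is not reproduced; what carries over is that every client computes the same single brick for a given file.

```python
import hashlib


def place_file(file_name, bricks):
    """Pick exactly one brick for a file in a distributed volume.

    Assumption: MD5 of the file name modulo the brick count. Any deterministic
    hash works, because every client must compute the same answer.
    """
    digest = hashlib.md5(file_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(bricks)
    return bricks[index]


# One brick on each of storage devices 22, 24 and 26 (names from FIG. 1).
bricks = ["b21", "b22", "b23"]
placement = {f: place_file(f, bricks) for f in ["f21", "f22", "f23", "f24"]}
```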

FIG. 2 shows an example of a replicated volume constituting a general distributed file system.

As shown in FIG. 2, the user 10 uses only replicated volume #2 (30), but bricks b31, b32, and b33 on three different storage devices 32, 34, and 36 are actually used, and each file (for example, f31) is always replicated to all three bricks at the same time. Therefore, even if one of the storage devices fails, the user 10 can continue to use replicated volume #2 (30) normally.
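A toy model of the replicated volume makes the failure-tolerance argument concrete. It is a hypothetical in-memory sketch: each brick is a dictionary, a write goes to every healthy brick, and a read is served by any surviving replica. Re-synchronizing a repaired brick is outside the scope of the sketch.

```python
class ReplicatedVolume:
    """Toy replicated volume: replica count equals the number of bricks."""

    def __init__(self, bricks):
        self.bricks = {b: {} for b in bricks}  # brick name -> {file name: data}
        self.failed = set()                    # bricks whose storage device is down

    def write(self, name, data):
        # Every healthy brick receives a full copy of the file.
        for brick, files in self.bricks.items():
            if brick not in self.failed:
                files[name] = data

    def read(self, name):
        # Any surviving replica can serve the read.
        for brick, files in self.bricks.items():
            if brick not in self.failed and name in files:
                return files[name]
        raise FileNotFoundError(name)


vol = ReplicatedVolume(["b31", "b32", "b33"])
vol.write("f31", b"payload")
vol.failed.add("b31")        # storage device 32 fails
recovered = vol.read("f31")  # still served by b32 or b33
```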

FIG. 3 shows a distributed-replicated volume constituting a general distributed file system.

As shown in FIG. 3, distributed-replicated volume #3 (40) is configured with a distribution factor of 2 and a replication factor of 2. The user 10 uses only the single volume #3 (40), but bricks b41, b42, b43, and b44 on four different storage devices 42, 44, 46, and 48 are actually used, and each file (f41, f42) is stored in only one of the two distribution sets (a or b), each set holding two replicas. This hybrid configuration increases storage speed through distribution while improving data safety through replication.
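The distributed-replicated layout combines the two previous rules: a hash picks one replica set, and the file is written to every brick of that set only. The sketch assumes that consecutive bricks form a set (so set a is b41 and b42, set b is b43 and b44), which the text does not specify, and reuses an assumed MD5-modulo hash.

```python
import hashlib


def replica_sets(bricks, replica):
    """Group an ordered brick list into replica sets of size `replica`.

    Assumption: consecutive bricks form one set.
    """
    if len(bricks) % replica:
        raise ValueError("brick count must be a multiple of the replica count")
    return [bricks[i:i + replica] for i in range(0, len(bricks), replica)]


def place(file_name, sets):
    """Hash the file name to one replica set; the file is stored on all of its bricks."""
    h = int.from_bytes(hashlib.md5(file_name.encode("utf-8")).digest()[:8], "big")
    return sets[h % len(sets)]


sets = replica_sets(["b41", "b42", "b43", "b44"], replica=2)
targets = {f: place(f, sets) for f in ["f41", "f42"]}
```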

Meanwhile, cloud technology has developed rapidly over the past few years, and public clouds, represented by Amazon, Microsoft, and Google, are now becoming commonplace.

In addition, with the continuing development of the Internet and the rapid growth of video services such as YouTube, more and more people not only stream videos but also store them on personal devices and even produce their own.

Accordingly, the market for personal storage devices, as well as for traditional large-capacity enterprise storage, is growing rapidly. Personal storage devices are protected by technologies such as RAID but remain vulnerable to failure of the device itself or to power outages. Enterprise storage devices provide various high-availability functions such as inter-device backup, redundancy, and distributed storage, but remain vulnerable to site-wide incidents such as a power outage or fire affecting the entire site.

As a result, distributed replication is being introduced even in small storage devices, and periodically backing up local storage data to the cloud is becoming widespread.

However, periodic backup to the cloud has the weakness of lacking real-time capability. In particular, real-time backup of a distributed replication storage system must synchronize changes on several storage devices separated by a network in real time, which generates heavy synchronization traffic, increasing device load and network utilization.

Accordingly, efforts have been made to improve the real-time performance of cloud backup from distributed replication storage systems while reducing network load.

Republic of Korea Registered Patent Publication No. 10-1891425 (2018.08.17.)

Accordingly, an object of the present invention is to provide a method and an apparatus for real-time backup from a distributed replication storage system to a public cloud that overcome the conventional problems described above.

Another object of the present invention is to provide a method and an apparatus for real-time backup from a distributed replication storage system to a public cloud that back up efficiently to a public cloud in real time while reducing network load.

According to an embodiment of the present invention for achieving some of the above technical objects, there is provided a method for real-time data backup to a public cloud from a distributed replication storage system that includes a plurality of nodes, each having a plurality of bricks (the minimum unit of storage space), and that applies a distribution function and a replication function individually or in combination. The method includes: creating a distributed replication volume from bricks in the plurality of nodes by applying a distributed file system; selecting, in an event management block that manages local file events, one of the plurality of nodes on which the distributed replication volume was created as a leader node; extracting, in the event management block, local disk or folder information of the leader node and registering a file event; and receiving, in a synchronization block, the file event from the event management block and transmitting the file of that event to a predetermined cloud.

According to another embodiment of the present invention for achieving some of the above technical objects, there is provided an apparatus for real-time data backup to a public cloud from a distributed replication storage system that includes a plurality of nodes, each having a plurality of bricks (the minimum unit of storage space), and that applies a distribution function and a replication function individually or in combination. The apparatus includes: a distributed file system that creates a distributed replication volume from bricks in the plurality of nodes; an event management block that manages local file events, selects one of the plurality of nodes on which the distributed replication volume was created as a leader node, and registers a file event by extracting local disk or folder information of the leader node; and a synchronization block that receives the file event from the event management block and transmits the file of that event to a predetermined cloud.

According to the present invention, when file events of file systems accessed over a network are processed, nodes no longer need to share file events with one another, so each leader node can process the file events of the entire distributed replication file system using only local file events. Both the computing load and the network load of each node are therefore reduced: all file events of the distributed replication file system are detected efficiently and delivered to the synchronization block, which replicates the files to the cloud, achieving real-time cloud backup. This allows lower-specification storage hardware to be used and further improves data safety through better real-time performance. In addition, a distributed replication storage system can be designed with any distributed file system, even one that does not support file events.

FIG. 1 illustrates an embodiment of a distributed volume constituting a general distributed file system;
FIG. 2 illustrates a replicated volume constituting a general distributed file system;
FIG. 3 illustrates a distributed-replicated volume constituting a general distributed file system;
FIG. 4 is a schematic block diagram of an apparatus for real-time data backup from a distributed replication storage system to a public cloud according to an embodiment of the present invention;
FIG. 5 is a flowchart of the method for real-time data backup from the distributed replication storage system of FIG. 4 to a public cloud;
FIG. 6 shows an example of volumes for distributed replication created in the distributed replication storage system 100 of FIG. 4;
FIG. 7 shows an example of file events in the Linux operating system;
FIG. 8 shows the file event process of FIG. 7; and
FIG. 9 illustrates the operation between blocks when several distributed replication volumes exist at the same time.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings, solely to give those of ordinary skill in the art to which the present invention pertains a thorough understanding of the invention.

FIG. 4 is a schematic block diagram of an apparatus for real-time data backup from a distributed replication storage system to a public cloud according to an embodiment of the present invention.

As shown in FIG. 4, the distributed replication storage system 100 included in the real-time backup apparatus according to an embodiment of the present invention includes a plurality of nodes 110, 120, 130, and 140, which are distributed replication storage devices, and each node includes a plurality of bricks b, a brick being the minimum unit of storage space (the minimum disk unit).

Here, each node may be a storage device to which the distribution function and the replication function are applied individually or in combination. Within the distributed replication storage system, the nodes are interconnected through a network switch ns; the system may have a plurality of network interfaces and may be connected to a plurality of network switches.

FIG. 5 is a flowchart of the method for real-time data backup from the distributed replication storage system of FIG. 4 to a public cloud.

As shown in FIG. 5, a distributed replication volume is first created for real-time data backup from the distributed replication storage system to the public cloud (S110). An example of creating volumes is shown in FIG. 6. As shown in FIG. 6, a distributed file system is applied in the distributed replication storage system 100 to create volume 1 with a configuration of distribution 2 and replication 2, volume 2 with a configuration of distribution 2, and volume 3 with a configuration of replication 3. That is, volume 1 is a distributed-replicated volume, volume 2 is a distributed volume, and volume 3 is a replicated volume; the distribution and replication functions can be applied to the nodes individually or in combination. Volumes with distribution and replication configurations other than those of FIG. 6 can also be created. Each volume created in this way is propagated to all nodes by the distributed replication manager.
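The three volume types of FIG. 6 differ only in brick count and replica count, which a small data model captures. This is a hypothetical sketch: the brick names ("node1:/b1" and so on) are invented because the text does not say which nodes host which bricks, and grouping consecutive bricks into replica sets is an assumption (GlusterFS, for instance, forms replica sets from bricks in the order they are listed when a volume is created).

```python
from dataclasses import dataclass


@dataclass
class Volume:
    name: str
    bricks: list      # hypothetical "node:path" strings, in creation order
    replica: int = 1  # number of copies of each file

    @property
    def distribute(self):
        return len(self.bricks) // self.replica

    @property
    def kind(self):
        if self.replica > 1 and self.distribute > 1:
            return "distributed-replicated"
        if self.replica > 1:
            return "replicated"
        return "distributed"

    def replica_sets(self):
        # Assumption: consecutive bricks form one replica set.
        r = self.replica
        return [self.bricks[i:i + r] for i in range(0, len(self.bricks), r)]


volumes = [
    Volume("volume1", ["node1:/b1", "node2:/b1", "node3:/b1", "node4:/b1"], replica=2),
    Volume("volume2", ["node1:/b2", "node2:/b2"]),
    Volume("volume3", ["node2:/b3", "node3:/b3", "node4:/b3"], replica=3),
]
```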

Next, the event management block, which manages the local file events of each node, elects (selects) one of the plurality of nodes on which the distributed replication volume was created as the leader node (S112). When a distributed replication volume is created, each node computes the leader node with a hash function over the meta information of the bricks of each node belonging to that volume. Because the value produced by the same hash function is uniquely determined by the brick meta information, the nodes do not need to exchange or negotiate leader node information.

Here, the event management blocks of all nodes holding the same replica in the created distributed replication volume elect the synchronization leader node simultaneously. Every event management block elects the leader node by applying the hash function to meta information shared by all participating nodes. Because the hash function yields the same leader node on every node, the nodes need not exchange the leader nodes they elected and then reach agreement. The meta information used in the hash function includes the brick (disk) information of each node participating in the distributed replication volume.
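The consensus-free election can be sketched as a pure function that every node evaluates on the same shared metadata. This is a minimal sketch under stated assumptions: the metadata fields (node, path, brick_id) are hypothetical, and SHA-256 over a canonical JSON encoding is an assumed choice; the text says only that a hash function over the participating bricks' information is used.

```python
import hashlib
import json


def elect_leader(replica_set):
    """Deterministically pick the leader node of one replica set.

    `replica_set` is a list of brick metadata dicts shared by all participating
    nodes (hypothetical fields: node, path, brick_id). Each node runs this
    locally and gets the same answer, so no leader information is exchanged.
    """
    # Canonical form: sort bricks and keys so every node hashes identical bytes.
    canonical = json.dumps(sorted(replica_set, key=lambda b: b["brick_id"]), sort_keys=True)
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()
    members = sorted(b["node"] for b in replica_set)
    return members[int.from_bytes(digest[:8], "big") % len(members)]


replica_set = [
    {"node": "node1", "path": "/bricks/v3", "brick_id": "v3-0"},
    {"node": "node2", "path": "/bricks/v3", "brick_id": "v3-1"},
    {"node": "node3", "path": "/bricks/v3", "brick_id": "v3-2"},
]
# Two nodes that happen to list the bricks in different orders still agree.
views = [elect_leader(replica_set), elect_leader(list(reversed(replica_set)))]
```

A stable digest matters here: Python's built-in `hash()` for strings is salted per process, so two nodes using it would generally disagree on the leader.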

The event management block of the elected leader node extracts the local folder of its local volume from the distributed replication volume information and registers file events for it with the kernel for monitoring. That is, the event management block of the leader node extracts the local disk or folder information of the leader node and registers file events (S114). Here, the local volume means the local disk or folder constituting a brick, and the event management block is a block that manages local file events for real-time synchronization of distributed replication volumes.

Taking the replication-3 (3*1) volume of FIG. 6 as an example, when a file is written and replicated, three copies exist because the replication factor is 3, but only the leader node registers local file events in its event management block. The remaining nodes are not leaders, so they wait without registering local file events.
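The leader-only registration rule for the replication-3 volume can be sketched as the decision each node's event management block makes before touching the kernel. The election function here is a hypothetical stand-in (a SHA-256 hash of the sorted member names), and the brick path is invented for illustration.

```python
import hashlib


def elect_leader(nodes):
    """Hypothetical stand-in for the hash-based election: stable hash of sorted members."""
    members = sorted(nodes)
    digest = hashlib.sha256(",".join(members).encode("utf-8")).digest()
    return members[int.from_bytes(digest[:8], "big") % len(members)]


def local_watch_paths(my_node, replica_sets):
    """Return the local brick directories this node's event management block registers.

    replica_sets: list of lists of (node, local_brick_path) tuples. Only the
    leader of a set registers a watch; the other holders of the same copy
    register nothing and stay idle.
    """
    paths = []
    for rs in replica_sets:
        if elect_leader([node for node, _ in rs]) == my_node:
            paths.extend(path for node, path in rs if node == my_node)
    return paths


replica3 = [[("node1", "/bricks/v3"), ("node2", "/bricks/v3"), ("node3", "/bricks/v3")]]
watched = {n: local_watch_paths(n, replica3) for n in ("node1", "node2", "node3")}
```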

File events are briefly described below with reference to FIGS. 7 and 8.

A file event is a notification, issued by a file system under a computer's operating system, that a file managed by that file system has been accessed or changed. Without file events, an operating system or application program must scan the entire file system to detect changes, which is very inefficient.

One example of a file event mechanism is inotify in the Linux operating system, shown in FIG. 7.

As shown in FIG. 7, inotify detects changes to file systems under Linux, and the Linux kernel reports these changes directly to the application program. This removes the need to scan the file system repeatedly to find changes. inotify watches are registered per directory and do not work recursively. After the user creates a file descriptor through initialization, adds a watch, and calls a read function on the event queue, the caller blocks until an event occurs, so no CPU time is wasted detecting changes. FIG. 8 shows this file event process of FIG. 7.

As shown in FIGS. 7 and 8, inotify_init is called as the file event initialization function to initialize file event handling (S10), and inotify_add_watch is called as the watch function to register a watch in the watch list (S12). Based on this watch list, the kernel generates file system events and places them in an event queue, from which the read function delivers them to application programs in user space (S14). The application program then processes the files reported by the events (S16).
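Steps S10 to S16 map directly onto the Linux system calls named above. Python's standard library has no inotify module, so this sketch calls glibc's `inotify_init` and `inotify_add_watch` through `ctypes` and decodes `struct inotify_event` records by hand; it assumes Linux with glibc, and the watched directory and file name are temporary test values.

```python
import ctypes
import ctypes.util
import os
import struct
import tempfile

# Event masks from <sys/inotify.h>.
IN_CLOSE_WRITE = 0x00000008
IN_CREATE = 0x00000100

_libc = ctypes.CDLL(ctypes.util.find_library("c") or "libc.so.6", use_errno=True)
_libc.inotify_init.restype = ctypes.c_int
_libc.inotify_add_watch.argtypes = [ctypes.c_int, ctypes.c_char_p, ctypes.c_uint32]
_libc.inotify_add_watch.restype = ctypes.c_int

# struct inotify_event header: int wd; uint32_t mask, cookie, len; then a NUL-padded name.
_HEADER = struct.Struct("iIII")


def _check(ret):
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ret


def read_events(fd):
    """S14: read() blocks until the kernel has queued at least one event."""
    buf = os.read(fd, 4096)
    events, offset = [], 0
    while offset < len(buf):
        wd, mask, _cookie, length = _HEADER.unpack_from(buf, offset)
        offset += _HEADER.size
        name = buf[offset:offset + length].rstrip(b"\0").decode()
        offset += length
        events.append((wd, mask, name))
    return events


def demo():
    with tempfile.TemporaryDirectory() as directory:
        fd = _check(_libc.inotify_init())  # S10: initialize
        try:
            # S12: register one directory (watches are not recursive).
            _check(_libc.inotify_add_watch(fd, os.fsencode(directory),
                                           IN_CREATE | IN_CLOSE_WRITE))
            with open(os.path.join(directory, "f1.txt"), "w") as f:
                f.write("hello")
            return read_events(fd)  # S16: the caller processes these events
        finally:
            os.close(fd)
```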

Note that inotify works only on local disks, that is, disks directly attached to the user's computer, and does not work on disks attached through a network file system. The same applies to Gluster, a distributed file system: Gluster has an event delivery system between servers and clients, but it carries configuration and status events, not per-file events.

Distributed file systems have difficulty supporting per-file events because, by design, their disks are located at remote sites separated by a network; if these events were managed centrally in one place, change events from every storage device would converge there over the network, increasing both network load and CPU load.

Therefore, for a distributed replication storage system that uses a distributed file system to synchronize or back up a volume to the cloud in real time, it must either use a distributed file system that supports file events or collect changes through periodic polling.
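For contrast, the polling alternative can be sketched as a snapshot-and-diff scan. It is a generic illustration, not part of the invention: each poll walks the entire tree and stats every file, so its cost grows with the total number of files rather than with the number of changes, and changes are seen only at the next poll.

```python
import os
import tempfile


def snapshot(root):
    """Walk the whole tree and record (mtime_ns, size) for every file."""
    state = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            state[path] = (st.st_mtime_ns, st.st_size)
    return state


def diff(old, new):
    """Compare two snapshots and return (created, deleted, modified) paths."""
    created = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    modified = sorted(p for p in new.keys() & old.keys() if new[p] != old[p])
    return created, deleted, modified


def demo():
    with tempfile.TemporaryDirectory() as root:
        a = os.path.join(root, "a.txt")
        with open(a, "w") as f:
            f.write("x")
        before = snapshot(root)
        with open(os.path.join(root, "b.txt"), "w") as f:
            f.write("y")
        with open(a, "a") as f:
            f.write("more")  # size changes, so the edit is caught even if mtime does not
        after = snapshot(root)
        created, deleted, modified = diff(before, after)
        names = lambda paths: [os.path.basename(p) for p in paths]
        return names(created), names(deleted), names(modified)
```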

Next, as shown in FIG. 5, after the file events have been registered (S114), each event that occurs is delivered to the synchronization block (S116). The synchronization block receives the file event from the event management block and copies the files of the event to the predetermined cloud (S118). That is, the synchronization block of the leader node receives the file event from the event management block of the leader node and transmits the file of that event to the predetermined cloud, where the data is stored. Through this process, file events of a distributed file system volume can be delivered using only local information, without going through the network.

FIG. 9 illustrates the operation between blocks when several distributed replication volumes exist at the same time.

As shown in FIG. 9, consider the case where several distributed replication volumes V1, V2, and V3 exist. The leader node of V1 is node #1 (110), and the event management block of node #1 (110) registers file events for V1 and activates the synchronization block. The leader node of V2 is node #2 (120), and the event management block of node #2 (120) registers file events for V2 and activates the synchronization block. The leader node of V3 is node #1 (110), and the event management block of node #1 (110) registers file events for V3 and activates the synchronization block.

Thus, node #1 (110) is the leader node for V1 and V3; its event management block registers file events on the local volumes of V1 and V3 and activates the synchronization block. Node #2 (120) is the leader node for V2; its event management block registers file events on the local volume of V2 and activates the synchronization block. Node #3 (130) is not a leader node; its event management block registers no file events, and its synchronization block is inactive.
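The per-node roles of FIG. 9 follow mechanically from the leader of each volume. This sketch takes the leader assignment from the figure as given (in practice it would come from the hash-based election) and derives which volumes each node watches and whether its synchronization block is active.

```python
def node_roles(leaders, nodes):
    """Derive, for each node, the sorted list of volumes it leads.

    An empty list means the node's event management block registers no
    watches and its synchronization block stays inactive.
    """
    roles = {n: [] for n in nodes}
    for volume, leader in leaders.items():
        roles[leader].append(volume)
    return {n: sorted(v) for n, v in roles.items()}


# Leader assignment as shown in FIG. 9.
fig9 = {"V1": "node1", "V2": "node2", "V3": "node1"}
roles = node_roles(fig9, ["node1", "node2", "node3"])
active_sync = {n: bool(v) for n, v in roles.items()}
```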

As described above, according to the present invention, the file events of all distributed replication volumes, or of selected ones, are registered only locally, which removes the network load and distributes the synchronization work. For a volume with N replicas, the file event load can be reduced to 1/N, and for a distributed volume, the load is spread across the nodes, reducing the CPU utilization of each node.
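The load figures can be made concrete with a short count. It assumes, beyond what the text states, that each write produces one local file event on every brick storing the file, and that the hash spreads files evenly over the distribution sets. Here N is the replica count of a set, E the number of file writes, and D the number of distribution sets.

```latex
\begin{aligned}
\text{events if every replica registers watches} &= N E \\
\text{events with leader-only registration} &= E \\
\text{ratio} &= \frac{E}{N E} = \frac{1}{N} \\
\text{events per leader, volume with } D \text{ sets} &\approx \frac{E}{D}
\end{aligned}
```

For example, with N = 3 and E = 300 writes, 900 local events shrink to 300.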

That is, when file events of file systems accessed over a network are processed, nodes no longer need to share file events with one another, so each leader node can process the file events of the entire distributed replication file system using only local file events. Both the computing load and the network load of each node are therefore reduced: all file events of the distributed replication file system are detected efficiently and delivered to the synchronization block, which replicates the files to the cloud, achieving real-time cloud backup. This allows lower-specification storage hardware to be used and further improves data safety through better real-time performance. In addition, a distributed replication storage system can be designed with any distributed file system, even one that does not support file events.

The above description of the embodiments is merely an example given with reference to the drawings for a more thorough understanding of the present invention and should not be construed as limiting the invention. It will also be apparent to those of ordinary skill in the art to which the present invention pertains that various changes and modifications can be made without departing from the basic principles of the present invention.

110, 120, 130: node; b: brick
ns: network switch

Claims

A method for real-time data backup to a public cloud from a distributed replication storage system that includes a plurality of nodes, each having a plurality of bricks (the minimum unit of storage space), and that applies a distribution function and a replication function individually or in combination, the method comprising:
generating a distributed replication volume from bricks in the plurality of nodes by applying a distributed file system;
selecting, in an event management block that manages local file events of each node, one of the plurality of nodes on which the distributed replication volume was generated as a leader node;
registering a file event by extracting, in the event management block of the leader node, local disk or folder information of the leader node; and
receiving, using a synchronization block of the leader node, the file event from the event management block of the leader node, and transmitting the file of that event to a pre-designated cloud.

An apparatus for real-time data backup to a public cloud from a distributed replication storage system that includes a plurality of nodes, each having a plurality of bricks (the minimum unit of storage space), and that applies a distribution function and a replication function individually or in combination, the apparatus comprising:
a distributed file system that generates a distributed replication volume from bricks in the plurality of nodes;
an event management block that manages local file events, selects one of the plurality of nodes on which the distributed replication volume was generated as a leader node, and registers a file event by extracting local disk or folder information of the leader node; and
a synchronization block that receives the file event from the event management block and transmits the file of that event to a predetermined cloud.