KR101374533B1

KR101374533B1 - High performance replication system and backup system for mass storage data, method of the same

Info

Publication number: KR101374533B1
Application number: KR1020130042259A
Authority: KR
Inventors: 박성원
Original assignee: 주식회사 엔써티
Priority date: 2013-04-17
Filing date: 2013-04-17
Publication date: 2014-03-14

Abstract

An objective of the present invention is to replicate data in real time using a multithreaded scheme while copying the data of a replication target system. To this end, the high-performance replication system for mass storage data according to the present invention includes: a replication target server having a replication target disk, in which the data to be replicated are stored, and a first agent that divides the data blocks stored on the replication target disk into a plurality of predetermined unit data, computes a predetermined interval, and captures the unit data of that interval using multiple threads, where each thread that has finished capturing transmits its unit data to a queue while computing the next interval and capturing the unit data of that interval; and a replication server having a second agent, which analyzes each piece of unit data received from the first agent and generates replicated data using multiple threads, and a replica disk in which the replicated data generated by the second agent are stored. [Reference numerals] (100) Replication target server; (110) First agent; (120) Replication target disk; (200) Replication server; (210) Second agent; (220) Replica disk

Description

HIGH PERFORMANCE REPLICATION SYSTEM AND BACKUP SYSTEM FOR MASS STORAGE DATA, METHOD OF THE SAME

The present invention relates to a high-performance replication and backup system for large volumes of data and to a high-performance replication method. In particular, for high-performance replication of a large DBMS (data), the invention enables distributed parallel processing based on the primary key (PK) of a data table: changes are automatically distributed by PK and replicated in parallel by the thread responsible for each partition, while transactions with the same PK are routed to the same thread. This guarantees data consistency even under parallel processing and maximizes performance through thread-based parallelism. Notably, even a single table can be distributed and processed in parallel without violating consistency.

According to the U.S. Emergency Planning Institute, the average industry loss caused by data lost in computer failures has been reported to reach several million dollars. Not only for companies but also for government offices that manage national data resources in countries pursuing e-government, data backup and recovery is emphasized as a paramount issue directly tied to national competitiveness and security, beyond its economic cost.

Today, as every sector of industry moves to an Internet environment and both personal data and corporate information grow exponentially, the construction and expansion of advanced storage-based enterprise computing environments, such as data warehouses, enterprise resource planning, customer relationship management, and knowledge management, are increasing sharply.

The storage being deployed across these industries requires hundreds of megabytes (MB) to tens of gigabytes (GB) of additional capacity every day. Against this backdrop, maintaining and protecting this ever-growing data against natural disasters such as floods or fires, terrorism and other catastrophes, failures, and accidents has become important enough to determine whether a company survives.

In response to these changes, many companies have developed replication solutions that provide software for copying the data stored on a replication target disk, which is the primary storage device connected to the main system, to a replica disk.

However, conventional replication solutions slow down as the number of files or the amount of data in the primary storage device to be replicated grows, so minimizing data replication time is a key challenge. Another key challenge is storing more data more efficiently within the limited space available for the replicated data.

The present invention has been devised to solve the problems described above. An object of the present invention is to enable real-time data replication using a multithreaded method when replicating the data of a replication target system.

Another object of the present invention is to enable real-time data backup using a multithreaded method.

To achieve the above objects, a high-performance replication system for large volumes of data according to the present invention comprises: a replication target server having a replication target disk in which the data to be replicated are stored, and a first agent that divides the blocks of data on the replication target disk into a plurality of predetermined unit data, computes a predetermined interval, and captures the unit data of that interval with a plurality of threads, where each thread that has finished capturing transmits its unit data to a queue and computes another interval following the current one in order to capture the unit data of that interval; and a replication server having a second agent that analyzes each piece of unit data received from the first agent and generates replicated data with a plurality of threads, and a replica disk that stores the replicated data generated by the second agent.

The high-performance replication system according to the present invention is further characterized in that the first agent transmits the unit data to the second agent in a queue structure.

The high-performance replication system according to the present invention is further characterized in that the second agent distributes the unit data received from the queue in parallel, based on a PK column or a unique index column, and processes them accordingly.

The high-performance replication system according to the present invention is further characterized in that, for the queue-structured unit data received from the first agent, the second agent creates as many data connections as there are threads, and that transactions having the same PK column value or the same unique index column value are replicated to the replica disk using only one of those connections.

The high-performance replication system according to the present invention is further characterized in that, if the value of the PK column or the unique index column changes during real-time multithreaded replication, the second agent switches from multiple threads to a single thread and replicates the unit data to the replica disk sequentially using only one data connection.

A high-performance backup system for large volumes of data according to the present invention comprises: a backup target server having a backup target disk in which the data to be backed up are stored; an input/output unit that receives commands, including backup operation commands, and outputs the results of given commands; a central control unit that processes the backup operation commands applied to the input/output unit and controls the backup so that it is performed; a backup master module that receives the backup operation commands applied through the input/output unit and the central control unit and transmits them to a backup manager module; a backup manager module that receives the backup operation commands needed for backup operation from the backup master module, manages per-volume backup reservation information, collects and manages per-volume backup status and backup history information, and transmits backup commands for disk volumes to the backup agent module according to the backup schedule; and a third agent that captures predetermined unit data from the backup target disk; and a backup client server having a fourth agent that analyzes each piece of unit data received from the third agent and generates backup data with a plurality of threads, and a backup disk that stores the backup data generated by the fourth agent. The third agent receives a backup command from the backup manager module, divides the volume of data on the backup target disk into unit data of a predetermined size, creates N threads that run multiple flows within a single process, and sequentially compresses the divided unit data and transmits them to the fourth agent.
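The third agent's work (split a volume into fixed-size unit data, compress the units with N threads, then send them in order) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: an in-memory `bytes` object stands in for the disk volume, `zlib` stands in for the unspecified compression scheme, and the function names `split_volume`, `compress_units`, and `restore_volume` are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def split_volume(volume: bytes, unit_size: int) -> list:
    """Divide the volume into unit data of a fixed size (the last unit may be shorter)."""
    return [volume[i:i + unit_size] for i in range(0, len(volume), unit_size)]


def compress_units(volume: bytes, unit_size: int, n_threads: int) -> list:
    """Compress every unit with N worker threads; results come back in unit order."""
    units = split_volume(volume, unit_size)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # Executor.map yields results in input order, so the compressed units can
        # be sent to the receiving agent sequentially even though they were
        # compressed concurrently.
        return list(pool.map(zlib.compress, units))


def restore_volume(compressed_units: list) -> bytes:
    """What a receiving agent could do to verify or restore the backup."""
    return b"".join(zlib.decompress(c) for c in compressed_units)
```

Keeping the units in order lets the receiver append them to the backup disk without any reordering step; a real agent would stream each compressed unit over the network as soon as its turn comes rather than collecting the whole list first.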

The high-performance backup system according to the present invention is further characterized in that a backup master server including the backup master module and a plurality of backup target servers, each including the backup manager module and the third agent and equipped with a backup target disk, are configured separately. When the backup master server receives commands, including backup operation commands, and transmits them to the backup manager server side, the backup manager module manages per-volume backup reservation information, collects and manages per-volume backup status and backup history information, and transmits backup commands for disk volumes to the third agent module according to the backup schedule. In accordance with the backup command from the backup manager module, the third agent module divides the volume of data on the backup target disk into unit data of a predetermined size, creates N threads that run multiple flows within a single process, and sequentially compresses the divided unit data and transmits them to the fourth agent.

The high-performance backup system according to the present invention is further characterized in that a backup master server including the backup master module; a plurality of backup manager servers, each including the backup manager module and equipped with a backup target disk; and a dedicated backup server including the third agent and equipped with a backup disk are configured separately. When the backup master server receives commands, including backup operation commands, and transmits them to the backup manager server side, the backup manager module in the backup manager server manages per-volume backup reservation information, reads the volume of data on the backup target disk divided into unit data of a predetermined size and transmits it to the backup server side, collects and manages per-volume backup status and backup history information as the backup proceeds on the backup server side, and transmits backup commands for disk volumes to the dedicated backup server according to the backup schedule. The fourth agent module in the backup server creates N threads in response to the backup command from the backup manager module and receives the unit data of the predetermined size in turn, and the N threads sequentially compress the divided unit data and store them on the backup disk.

A high-performance replication method for large volumes of data according to the present invention uses a high-performance replication system comprising: a replication target server having a replication target disk in which the data to be replicated are stored, and a first agent that captures predetermined unit data from the replication target disk; and a replication server having a second agent that analyzes each piece of unit data received from the first agent and generates replicated data with a plurality of threads, and a replica disk that stores the replicated data generated by the second agent. The method comprises: initiating real-time data replication for the replication target disk of the replication target server (S100); creating as many queues as the number of threads arbitrarily set by the second agent (S200); storing the data received from the first agent in the created queues (S300); parsing the data stored in the queues to extract the column value to be used as a key (S400); branching the column values to the respective threads and storing each column value in the queue of the thread to which it was branched (S500); and extracting the data from the queues in which the column values are stored and applying the extracted data to the replication server (S600).
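Steps S400 and S500 (extract the key column value, then branch it to one thread's queue) can be sketched as two small functions. This is a hedged illustration, not the patent's code: it assumes each change has already been parsed into a row image (a `dict` of column name to value), and the names `extract_key` and `branch` are hypothetical. It falls back from PK columns to unique-index columns, the two key options named in the claims.

```python
import zlib


def extract_key(row: dict, pk_cols: tuple, unique_cols: tuple) -> tuple:
    """S400: take the routing key from a parsed row image.

    Uses the primary-key columns when the table has them, otherwise the
    unique-index columns.
    """
    cols = pk_cols or unique_cols
    if not cols:
        raise ValueError("table has neither a primary key nor a unique index")
    return tuple(row[c] for c in cols)


def branch(key: tuple, n_threads: int) -> int:
    """S500: pick the per-thread queue index for a key.

    crc32 is deterministic across processes (unlike Python's salted str hash),
    so equal keys always land in the same queue.
    """
    return zlib.crc32(repr(key).encode()) % n_threads
```

For example, `branch(extract_key({"id": 100, "email": "a@example.com"}, ("id",), ("email",)), 4)` returns the same queue index every time it is called for `id = 100`, which is what keeps all changes to that row on one thread.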

The high-performance replication method according to the present invention is further characterized in that step (S200) further comprises creating a data connection in each thread and connecting the queue objects to the created data connections (S210).

Another high-performance replication method for large volumes of data according to the present invention uses the same high-performance replication system, comprising: a replication target server having a replication target disk in which the data to be replicated are stored, and a first agent that captures predetermined unit data from the replication target disk; and a replication server having a second agent that analyzes each piece of unit data received from the first agent and generates replicated data with a plurality of threads, and a replica disk that stores the replicated data generated by the second agent. The method comprises: initiating real-time data capture for the replication target disk of the replication target server (S1000); creating as many queues as the number of threads arbitrarily set by the first agent (S2000); determining whether the number of running threads is smaller than the arbitrarily set maximum number of threads (S3000); if the number of running threads is greater than or equal to the maximum, waiting until the data capture cycle of a running thread completes (S4000), and if it is less than the maximum, starting a thread for data capture and performing the capture (S5000); and, when the capture is complete, determining whether more data remain to be captured on the replication target disk (S6000), returning to step (S3000) if more data remain and proceeding to the waiting step (S4000) if no more data remain.
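The control loop of steps S3000 to S6000 amounts to bounded concurrency: never run more than the maximum number of capture threads, and block until a slot frees up. A minimal Python sketch, assuming a `threading.BoundedSemaphore` as the running-thread counter; `capture_all` and the `capture_fn` callback (which stands in for the actual capture work) are hypothetical names.

```python
import threading


def capture_all(segments: list, max_threads: int, capture_fn) -> list:
    """Run capture_fn over every segment with at most max_threads threads at once.

    Returns the capture results in segment order.
    """
    slots = threading.BoundedSemaphore(max_threads)
    results: dict = {}
    lock = threading.Lock()
    threads = []

    def run(idx: int, segment) -> None:
        try:
            out = capture_fn(segment)
            with lock:
                results[idx] = out
        finally:
            slots.release()  # this capture cycle is done; the waiting loop may continue

    for idx, segment in enumerate(segments):  # S6000: more data to capture?
        slots.acquire()                        # S3000/S4000: wait while running >= max
        t = threading.Thread(target=run, args=(idx, segment))  # S5000: start a capture
        t.start()
        threads.append(t)
    for t in threads:                          # no more data: wait for the last cycles
        t.join()
    return [results[i] for i in range(len(segments))]
```

The semaphore replaces an explicit "count running threads, compare, sleep" check with a single blocking call, which avoids busy-waiting in step S4000.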

According to the present invention, data can be replicated in real time using a multithreaded method when replicating the data of a replication target system.

In addition, data can be backed up in real time using a multithreaded method.

FIG. 1 is a block diagram showing the configuration of a high-performance replication system for large volumes of data according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating replication of large volumes of data by multiple threads in the high-performance replication system according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating replication of large volumes of data by a single thread in the high-performance replication system according to an embodiment of the present invention.
FIG. 4 is a flowchart showing a high-performance replication method for large volumes of data according to an embodiment of the present invention.
FIG. 5 is a flowchart showing a method of capturing the data to be replicated in the high-performance replication method according to an embodiment of the present invention.
FIG. 6 is a block diagram showing the configuration of a high-performance backup system for large volumes of data according to an embodiment of the present invention.
FIG. 7 is a block diagram showing the configuration of a high-performance backup system for large volumes of data according to another embodiment of the present invention.
FIG. 8 is a block diagram showing the configuration of a high-performance backup system for large volumes of data according to yet another embodiment of the present invention.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail in the detailed description. However, this is not intended to limit the present invention to the specific embodiments, and it should be understood to include all modifications, equivalents, and substitutes falling within the spirit and technical scope of the present invention. In describing the present invention, detailed descriptions of related known technologies are omitted where they could obscure the gist of the present invention.

Terms such as "first" and "second" may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another.

The terms used in this specification describe specific embodiments only and are not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "comprise" or "have" specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and do not preclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In addition, terms such as "...unit" and "...module" in the specification refer to a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of hardware and software.

In addition, in describing the present invention, detailed descriptions of related known technologies are omitted where they could unnecessarily obscure the gist of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram showing the configuration of a high-performance replication system for large volumes of data according to an embodiment of the present invention.

Referring to FIG. 1, the high-performance replication system for large volumes of data according to an embodiment of the present invention includes: a replication target server 100 having a replication target disk 120, in which the data to be replicated are stored, and a first agent 110 that divides the blocks of data on the replication target disk 120 into a plurality of predetermined unit data, computes a predetermined interval, and captures the unit data of that interval with a plurality of threads, where each thread that has finished capturing transmits its unit data to a queue and computes another interval following the current one in order to capture the unit data of that interval; and a replication server 200 having a second agent 210 that analyzes each piece of unit data received from the first agent 110 and generates replicated data with a plurality of threads, and a replica disk 220 that stores the replicated data generated by the second agent 210.

First, the first agent 110 divides the blocks of data on the replication target disk 120 into a plurality of predetermined unit data, arbitrarily computes a predetermined interval, and captures the unit data of that interval with a plurality of threads. A thread that has finished capturing transmits its unit data to the queue. When capture of the unit data for the computed interval is complete, the first agent arbitrarily computes the next interval and proceeds to capture the unit data of that interval.
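The interval-by-interval capture performed by the first agent can be sketched as a shared "next window" counter: each thread claims a window of units, captures them into the queue, and, once done, claims the next window. This is a minimal Python sketch under assumptions not in the source: an in-memory `bytes` object stands in for the replication target disk, "capture" is modelled as slicing that buffer, and `run_first_agent` and its parameters are hypothetical names.

```python
import queue
import threading


def run_first_agent(source: bytes, unit_size: int, window_units: int,
                    num_threads: int) -> queue.Queue:
    """Capture `source` in fixed-size units, window by window, with several threads.

    Each queued item is (unit_index, unit_bytes) so the receiver can restore order.
    """
    out_q: queue.Queue = queue.Queue()
    total_units = (len(source) + unit_size - 1) // unit_size  # ceiling division
    next_unit = 0  # first unit of the next unclaimed window
    lock = threading.Lock()

    def claim_window():
        # A thread that has finished its capture computes the next window here.
        nonlocal next_unit
        with lock:
            if next_unit >= total_units:
                return None
            start = next_unit
            next_unit = min(start + window_units, total_units)
            return start, next_unit

    def worker() -> None:
        while (window := claim_window()) is not None:
            start, end = window
            for idx in range(start, end):
                unit = source[idx * unit_size:(idx + 1) * unit_size]  # "capture"
                out_q.put((idx, unit))  # hand the unit to the queue

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out_q
```

Because threads finish windows in no fixed order, the sketch tags each unit with its index; a receiver that needs the original byte order can sort on that index.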

Next, the replication target disk 120 holds the actual data that the user wants to replicate, which is preferably file data of a general file system. However, the data to be replicated may also be transaction information from the online redo log of a database. That is, when the data on the replication target disk 120 is a database, the replication target server 100 includes an operational DB, which generates archive log files from the information in the online redo log. In this case, the first agent 110 captures the transaction information of the online redo log and replicates it in real time, as a transaction log file, to the second agent described below.

Next, the second agent 210 analyzes each piece of unit data received from the first agent 110 and generates replicated data with a plurality of threads. This is described in more detail below with reference to FIGS. 2 and 3.

Next, the replica disk 220 stores the replicated data generated by the second agent 210. The first agent 110 preferably transmits the unit data to the second agent 210 in a queue structure.

FIG. 2 is a diagram illustrating replication of large volumes of data by multiple threads in the high-performance replication system according to an embodiment of the present invention.

Referring to FIG. 2, the structure in which data is replicated by the multithreaded method is described.

As described above, the unit data received by the second agent 210 from the first agent 110 are replicated to the replica disk 220 by the multithreaded method.

As described above, the first agent 110 of the replication target server 100 captures the transaction information of the replication target disk 120 and transmits it to the second agent 210, where it is written as a transaction log file.

The multithreaded replication of such transaction log files proceeds as follows.

The transaction log file received by the second agent 210 from the first agent 110 has the structure of a queue 300. The queue 300 is distributed and processed in parallel based on a PK column 310 or a unique index column 320. For example, when the queue 300 is organized by the PK column 310, it consists of PK = 100, PK = 200, PK = 300, PK = 400, and so on. When the queue 300 is organized by the unique index column 320, it consists of (TX#1), (TX#2), (TX#3), (TX#4), (TX#5), (TX#6), (TX#7), and so on. For ease of explanation, seven transaction log files are used as an example, as shown in FIG. 2, but the present invention is not limited thereto.

Next, the second agent 210 that has received the queue 300 creates as many data connections 330 as there are threads in the multithread pool 340. Queue 300 entries that share the same multi-key (the same PK column 310 value or the same unique index column 320 value) are replicated sequentially to the replica disk 220 over a single data connection 330. The number of threads 340 can be adjusted according to the volume of transaction log data generated.
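The distribution rule above, where every transaction carrying the same PK or unique index value goes to the same thread and connection so that per-key order survives parallel apply, can be sketched as a deterministic hash partition. This is a minimal illustration under assumptions: each transaction is reduced to a `(tx_id, key)` pair, and the hypothetical helpers `route` and `partition` pick a thread with CRC32 modulo the thread count. The patent does not specify how keys are mapped to threads.

```python
import zlib
from collections import defaultdict


def route(key, num_threads: int) -> int:
    """Map a PK or unique-index value to a worker index (hypothetical helper).

    CRC32 of the key's text form is stable across processes, unlike the
    built-in hash() for str, which is randomized per process, so a given
    key always maps to the same worker.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_threads


def partition(transactions, num_threads: int) -> dict:
    """Split an ordered (tx_id, key) stream into per-worker lists.

    Each list keeps the original relative order of its transactions, so
    every change to one key is applied in sequence by one worker.
    """
    queues = defaultdict(list)
    for tx_id, key in transactions:
        queues[route(key, num_threads)].append(tx_id)
    return dict(queues)


# Seven transactions touching four primary-key values (illustrative data).
log = [("TX#1", 100), ("TX#2", 200), ("TX#3", 100), ("TX#4", 300),
       ("TX#5", 400), ("TX#6", 200), ("TX#7", 300)]
print(partition(log, num_threads=3))
```

Hash partitioning keeps the threads independent: no locking between workers is needed because no two workers ever touch the same key.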

FIG. 3 is a diagram illustrating large-volume data being replicated by a single thread in the high-performance replication system for large-volume data according to an embodiment of the present invention.

Referring to FIG. 3, when the value of the multi-key column changes during multithreaded replication, data consistency is guaranteed only if single-threaded processing is temporarily applied so that the transaction log is applied as one strictly ordered unit.

For example, as shown in FIG. 3, if analysis of the queue 300 (the collection of transaction log files) shows that the multi-key column value changes from PK = 400 to PK = 300, the entries are replicated sequentially to the replica disk 220 over only one data connection 330, regardless of the multi-key column value.

Once this processing is complete, transactions are again applied in the multithreaded manner described with reference to FIG. 2.
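The fallback of FIG. 3 can be pictured as cutting the log into stages. When an update changes the key column itself (PK = 400 becomes PK = 300), the old and new values may map to different threads, so neither thread alone sees the full history of the row. The sketch therefore ends the current parallel stage, applies the key-changing transaction on one connection, and then resumes parallel apply. Assumptions: each log record is a dict with hypothetical `tx`, `old_key` and `new_key` fields (inserts and deletes carry equal old and new keys), and the CRC32 routing is illustrative only.

```python
import zlib


def route(key, num_threads: int) -> int:
    """Hypothetical stable key-to-worker mapping (CRC32 of the key text)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_threads


def plan_stages(records, num_threads: int):
    """Split a transaction log into alternating parallel and serial stages.

    A record whose key column changes (old_key != new_key) closes the current
    parallel stage and is applied on a single connection; consecutive
    key-changing records share one serial stage. Multithreaded apply resumes
    with the next ordinary record.
    """
    stages, parallel = [], {}
    for rec in records:
        if rec["old_key"] != rec["new_key"]:
            if parallel:                                  # flush pending parallel work
                stages.append(("parallel", parallel))
                parallel = {}
            if stages and stages[-1][0] == "serial":
                stages[-1][1].append(rec["tx"])
            else:
                stages.append(("serial", [rec["tx"]]))
        else:
            worker = route(rec["new_key"], num_threads)
            parallel.setdefault(worker, []).append(rec["tx"])
    if parallel:
        stages.append(("parallel", parallel))
    return stages


records = [
    {"tx": "TX#1", "old_key": 100, "new_key": 100},
    {"tx": "TX#2", "old_key": 200, "new_key": 200},
    {"tx": "TX#3", "old_key": 400, "new_key": 300},   # key change: PK 400 -> 300
    {"tx": "TX#4", "old_key": 300, "new_key": 300},
]
stages = plan_stages(records, num_threads=3)
print([kind for kind, _ in stages])  # ['parallel', 'serial', 'parallel']
```

An executor would run every worker list of a parallel stage concurrently, wait for all of them, and only then start the next stage.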

FIG. 4 is a flowchart illustrating the high-performance replication method for large-volume data according to an embodiment of the present invention.

Referring to FIG. 4, the high-performance replication method for large-volume data according to an embodiment of the present invention uses a high-performance replication system comprising: a replication target server 100 having a replication target disk 120 that stores the data to be replicated and a first agent 110 that captures predetermined unit data from the replication target disk 120; and a replication server 200 having a second agent 210 that analyzes each predetermined unit data received from the first agent 110 and generates replica data with a plurality of threads, and a replica disk 220 that stores the replica data generated by the second agent 210. The method includes: starting real-time data replication for the replication target disk 120 of the replication target server 100 (S100); creating as many queues as the number of threads set by the second agent 210 (S200); storing the data received from the first agent 110 in the created queues (S300); parsing the data stored in the queues to extract the column value to be used as a key (S400); branching the column values to the respective threads and storing them in each thread's queue (S500); and extracting data from the queues in which the column values are stored and applying the extracted data to the replication server 200 (S600).

Step S200, in which queues are created for the number of threads set by the second agent 210, further includes a step (S210) in which each thread 340 creates its own data connection and the queue object is bound to the created data connection.
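Steps S200 to S600, including S210, can be sketched with Python's standard `threading` and `queue` modules: one queue and one connection per thread, a dispatcher that parses each record, extracts the key and pushes the statement to the owning thread's queue, and workers that apply statements in arrival order. This is an illustrative sketch, not the patented implementation; `FakeConnection` is a hypothetical stand-in for a DBMS driver connection, the `table|key|statement` record format is assumed, and the mapping of code to step numbers is approximate.

```python
import queue
import threading
import zlib


class FakeConnection:
    """Hypothetical stand-in for a DBMS connection; records what it applies."""

    def __init__(self, name):
        self.name = name
        self.applied = []

    def execute(self, statement):
        self.applied.append(statement)


def parse_key(record):
    """S400: extract the key column value. The 'table|key|statement' format is assumed."""
    table, key, statement = record.split("|", 2)
    return f"{table}:{key}", statement


def replicate(records, num_threads=4):
    queues = [queue.Queue() for _ in range(num_threads)]               # S200
    conns = [FakeConnection(f"conn-{i}") for i in range(num_threads)]  # S210: one per thread

    def worker(i):
        while True:
            statement = queues[i].get()
            if statement is None:        # sentinel: no more work for this thread
                break
            conns[i].execute(statement)  # S600: apply in arrival order

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for record in records:                                   # S300: data from the first agent
        key, statement = parse_key(record)                   # S400
        idx = zlib.crc32(key.encode("utf-8")) % num_threads  # S500: branch by key value
        queues[idx].put(statement)
    for q in queues:
        q.put(None)
    for t in threads:
        t.join()
    return conns


records = [
    "orders|100|UPDATE orders SET qty=1 WHERE id=100",
    "orders|200|UPDATE orders SET qty=5 WHERE id=200",
    "orders|100|UPDATE orders SET qty=2 WHERE id=100",
]
for conn in replicate(records, num_threads=2):
    print(conn.name, conn.applied)
```

Because each connection is touched only by its own thread, no lock is needed around `execute`; ordering per key comes entirely from the routing in S500.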

FIG. 5 is a flowchart illustrating the data capture method for replication target data in the high-performance replication method for large-volume data according to an embodiment of the present invention.

Referring to FIG. 5, the data capture method for replication target data in the high-performance replication method for large-volume data according to the present invention proceeds as follows.

The method uses a high-performance replication system comprising a replication target server 100, which has a replication target disk 120 storing the data to be replicated and a first agent 110 that captures predetermined unit data from the replication target disk 120, and a replication server 200, which has a second agent 210 that analyzes each predetermined unit data received from the first agent 110 and generates replica data with a plurality of threads, and a replica disk 220 that stores the replica data generated by the second agent 210. The method includes: starting real-time data capture for the replication target disk 120 of the replication target server 100 (S1000); creating as many queues as the number of threads set by the first agent 110 (S2000); determining whether the number of running threads is smaller than the configured maximum number of threads (S3000); if the number of running threads is equal to or greater than the maximum, waiting until the data capture cycle of a running thread completes (S4000), and if it is less than the maximum, starting a thread to capture data and performing the capture (S5000); and, when the capture is complete, determining whether more data remains to be captured on the replication target disk 120 (S6000). If more data remains, the method returns to step S3000; if not, it proceeds to the waiting step S4000.
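The capture loop of FIG. 5 starts a new capture thread only while fewer than the configured maximum are running, and otherwise waits for a running capture to finish. A bounded semaphore expresses exactly that check-and-wait. In this minimal sketch the replication target disk is assumed to be an in-memory `bytes` object split into fixed-size sections, and each captured section is put on a `queue.Queue` together with its index so the receiving side can restore the original order; `capture_sections` and `unit_size` are hypothetical names.

```python
import queue
import threading


def capture_sections(disk: bytes, unit_size: int, max_threads: int) -> queue.Queue:
    """Capture `disk` in fixed-size sections with at most `max_threads` running at once."""
    out = queue.Queue()                                  # queue toward the second agent
    slots = threading.BoundedSemaphore(max_threads)      # S3000: running < maximum?
    workers = []

    def capture(index, start):
        try:
            out.put((index, disk[start:start + unit_size]))  # S5000: capture one section
        finally:
            slots.release()                                  # this capture cycle is done

    offset, index = 0, 0
    while offset < len(disk):                            # S6000: more data to capture?
        slots.acquire()                                  # S4000: block while all slots are busy
        t = threading.Thread(target=capture, args=(index, offset))
        t.start()
        workers.append(t)
        offset += unit_size                              # next section
        index += 1
    for t in workers:                                    # nothing left: wait for running threads
        t.join()
    return out


disk = bytes(range(256)) * 4                             # 1,024 bytes of sample data
captured = capture_sections(disk, unit_size=100, max_threads=3)
units = sorted(captured.get() for _ in range(captured.qsize()))
print(len(units))  # 11
```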

FIG. 6 is a block diagram illustrating the configuration of a high-performance backup system for large-volume data according to an embodiment of the present invention.

Referring to FIG. 6, the high-performance backup system for large-volume data according to an embodiment of the present invention includes a backup target server 400 and a backup server 500. The backup target server 400 has: a backup target disk 440 that stores the data to be backed up; an input/output unit 460 that receives commands, including backup operation commands, and outputs the results of given commands; a central control unit 450 that processes backup operation commands applied through the input/output unit 460 and controls the backup; a backup master module 410 that receives the backup operation commands applied through the input/output unit 460 and the central control unit 450 and forwards them to a backup manager module 420; the backup manager module 420, which receives from the backup master module 410 the backup operation commands needed for the backup, manages per-volume backup reservation information, collects and manages per-volume backup status and backup history information, and sends backup commands for disk volumes to a third agent 430 according to the backup schedule; and the third agent 430, which captures predetermined unit data from the backup target disk 440. The backup server 500 has a fourth agent 510 that analyzes each predetermined unit data received from the third agent 430 and generates backup data with a plurality of threads, and a backup disk 520 that stores the backup data generated by the fourth agent 510.

Although FIG. 6 shows the backup target server 400 and the backup server 500 as separate systems, they may also be integrated into a single computer system. Parts of the computer system not directly related to the present invention are omitted.

Specifically, the configuration of FIG. 6 consists of: modules, which are units that each perform one or more distinct functions, such as the backup master module 410, the backup manager module 420, and the third agent 430; the input/output unit 460, through which commands including backup operation commands are input from outside; the backup target disk 440, which stores the data to be backed up; and the central control unit 450, which controls the modules 410, 420, and 430 according to the commands applied through the input/output unit 460.

The backup master module 410 manages and operates the backup target server 400. When there are multiple backup manager modules 420, it groups them to operate and manage backups per group. It manages backup reservation information for each volume and issues backup commands to the backup manager module 420 according to the backup schedule.

Here, backup reservation information refers to the settings a backup operator defines for automatic backup: which disk's data is backed up, to which backup disk, in what time window, and at what interval in days. The backup master module 410 runs automatically according to the reserved backup schedule so that the backup is carried out by the backup manager module 420 and the third agent 430.
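Backup reservation information, as defined above (which disk, which backup disk, what time, how many days apart), maps naturally onto a small record with a next-run calculation. The field names and the `next_run` helper below are hypothetical; the patent does not define a data format.

```python
from dataclasses import dataclass
from datetime import date, datetime, time, timedelta


@dataclass
class BackupReservation:
    """Hypothetical record of one backup reservation."""
    volume: str        # which disk/volume to back up
    backup_disk: str   # which backup disk receives the data
    start_time: time   # time of day the backup starts
    every_days: int    # run every N days
    first_date: date   # first day the schedule applies

    def next_run(self, now: datetime) -> datetime:
        """Return the first scheduled start at or after `now`."""
        first = datetime.combine(self.first_date, self.start_time)
        if now <= first:
            return first
        periods = -(-(now - first).days // self.every_days)   # ceiling division
        candidate = first + timedelta(days=periods * self.every_days)
        if candidate < now:   # scheduled day reached but its start time has passed
            candidate += timedelta(days=self.every_days)
        return candidate


nightly = BackupReservation("vol1", "backup_disk_1", time(2, 0), 3, date(2024, 1, 1))
print(nightly.next_run(datetime(2024, 1, 5, 10, 0)))  # 2024-01-07 02:00:00
```

A scheduler in the backup master module could poll `next_run` for each reservation and issue the backup command when the time arrives.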

The backup manager module 420 receives from the backup master module 410 the backup operation commands needed for the backup and forwards them to the third agent 430. It also receives all information on the backup progress performed by the third agent 430, collects per-volume backup status and backup history information, and reports it to the backup master module 410.

The third agent 430 receives backup commands from the backup manager module 420 and performs the backup accordingly. When commanded to back up the backup target disk 440, it captures predetermined unit data from the backup target disk 440.

The fourth agent 510 analyzes each predetermined unit data received from the third agent 430, generates backup data with a plurality of threads, and stores the generated backup data on the backup disk 520.

While performing the backup, the third agent module 430 collects and manages per-volume backup information and notifies the backup manager module 420 of the backup progress.
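The per-volume status and history that the backup manager module collects from the third agent's progress reports could be held in a structure like the one below. This is a hypothetical sketch: the class name and the status strings are assumptions, and a lock is used because reports may arrive from several threads.

```python
import threading
from collections import defaultdict
from datetime import datetime


class VolumeBackupTracker:
    """Hypothetical per-volume backup status and history store for a backup manager."""

    def __init__(self):
        self._lock = threading.Lock()        # agents may report from several threads
        self._status = {}
        self._history = defaultdict(list)

    def report(self, volume, status, when=None):
        """Record one progress report, e.g. 'running', 'done' or 'failed'."""
        when = when or datetime.now()
        with self._lock:
            self._status[volume] = status
            self._history[volume].append((when, status))

    def status(self, volume):
        with self._lock:
            return self._status.get(volume, "unknown")

    def history(self, volume):
        with self._lock:
            return list(self._history.get(volume, []))


tracker = VolumeBackupTracker()
tracker.report("vol1", "running", datetime(2024, 1, 7, 2, 0))
tracker.report("vol1", "done", datetime(2024, 1, 7, 2, 45))
print(tracker.status("vol1"), len(tracker.history("vol1")))  # done 2
```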

As described above, the high-performance backup system for large-volume data according to the present invention reads the data on the backup target disk 440 by dividing it into unit data, and a plurality of threads compress the read unit data concurrently and store it on the backup disk 520. These features greatly reduce the time required for backup, and the increased compression ratio makes it possible to store more data on the same backup disk 520.
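The split-then-compress-in-parallel backup described above can be sketched with `concurrent.futures.ThreadPoolExecutor`. `Executor.map` returns results in input order, so units are written to the backup image sequentially even though they are compressed concurrently. Assumptions: the volume is an in-memory `bytes` object, `zlib` stands in for the unspecified compressor, and each unit is stored with a 4-byte length prefix (a hypothetical container format) so that `restore_volume` can read it back.

```python
import io
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor


def backup_volume(data: bytes, unit_size: int, n_threads: int) -> bytes:
    """Split `data` into units, compress them on N threads, store them in order."""
    units = [data[i:i + unit_size] for i in range(0, len(data), unit_size)]
    out = io.BytesIO()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # Executor.map yields results in input order even though units
        # are compressed concurrently.
        for blob in pool.map(zlib.compress, units):
            out.write(struct.pack(">I", len(blob)))   # 4-byte length prefix per unit
            out.write(blob)
    return out.getvalue()


def restore_volume(backup: bytes) -> bytes:
    """Inverse of backup_volume: read each length-prefixed unit and decompress it."""
    pos, parts = 0, []
    while pos < len(backup):
        (length,) = struct.unpack_from(">I", backup, pos)
        pos += 4
        parts.append(zlib.decompress(backup[pos:pos + length]))
        pos += length
    return b"".join(parts)


volume = b"replicated block " * 20_000
image = backup_volume(volume, unit_size=64 * 1024, n_threads=4)
print(len(volume), len(image))
```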

FIG. 7 is a block diagram illustrating the configuration of a high-performance backup system for large-volume data according to another embodiment of the present invention.

Referring to FIG. 7, in the high-performance backup system for large-volume data according to this embodiment, the backup manager module 420, the third agent 430, and the backup target disk 440 form a separate backup manager server 700, and a backup master server 600 has the backup master module 410, which receives backup commands and applies them to the backup manager server 700. The backup master server 600 and the backup manager server 700 may be connected through an interface or over a network, and a plurality of backup manager servers 700 configured in this way may be arranged in a tree so that a single backup master server 600 manages them.

The configuration and operation shown in FIG. 7 do not differ substantially from those of FIG. 6. When connected over an open network such as the Internet, the backup manager servers 700 act as clients of the backup master server 600, and the single backup master server 600 manages them by issuing backup operation commands according to the reserved backup information. In each backup manager server 700, the backup manager module 420 forwards the backup command it receives to the third agent 430. The third agent 430 divides the data volume of the backup target disk 440 into unit data of a predetermined size and reads it, creates N threads, and sequentially compresses the divided unit data and stores it on the backup disk 520.
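The tree arrangement of FIG. 7, in which one backup master server issues commands to several backup manager servers according to the reserved backup information, reduces to a simple fan-out. In the sketch below the managers are in-process objects rather than networked servers, and the class names, command dictionary and reservation fields are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ManagerServer:
    """Hypothetical stand-in for a backup manager server (700)."""
    name: str
    received: list = field(default_factory=list)

    def handle(self, command: dict) -> str:
        # A real manager would pass this to its backup manager module (420),
        # which forwards it to the third agent (430).
        self.received.append(command)
        return f"{self.name}: accepted {command['volume']}"


class MasterServer:
    """Hypothetical backup master server (600) at the root of the tree."""

    def __init__(self):
        self.managers = {}

    def register(self, manager: ManagerServer) -> None:
        self.managers[manager.name] = manager

    def dispatch(self, reservations: list) -> list:
        """Send one backup command per reservation to the owning manager server."""
        acks = []
        for r in reservations:
            manager = self.managers[r["manager"]]
            acks.append(manager.handle({"op": "backup", "volume": r["volume"]}))
        return acks


master = MasterServer()
for name in ("mgr-a", "mgr-b"):
    master.register(ManagerServer(name))
print(master.dispatch([
    {"manager": "mgr-a", "volume": "vol1"},
    {"manager": "mgr-b", "volume": "vol7"},
    {"manager": "mgr-a", "volume": "vol2"},
]))  # ['mgr-a: accepted vol1', 'mgr-b: accepted vol7', 'mgr-a: accepted vol2']
```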

Accordingly, the embodiment of FIG. 7 likewise reads the data on the backup target disk 440 by dividing it into unit data and has a plurality of threads compress the read unit data concurrently and store it on the backup disk 520. This greatly reduces backup time and, through the increased compression ratio, allows more data to be stored on the same backup disk. In addition, clients connected over an open network such as the Internet, that is, temporary backup manager servers 700, can be grouped so that backups are operated and managed per group.

FIG. 8 is a block diagram illustrating the configuration of a high-performance backup system for large-volume data according to yet another embodiment of the present invention.

Referring to FIG. 8, in this embodiment the backup master server 600, the backup manager server 700, and the backup server 500 are each configured as separate servers, connected through interfaces or a network to perform the backup. A plurality of backup manager servers 700 are connected to one backup master server 600, and each backup manager server 700 is connected to its own backup server 500.

Here, a backup target disk 440 storing data is provided in each backup manager server 700, and a backup disk 520 is provided in each backup server 500 to store the compressed data of the backup target disk 440.

In the configuration of FIG. 8, when the backup master server 600 receives commands, including backup operation commands, and transmits them to the backup manager server 700, the backup manager module 420 in the backup manager server 700 manages per-volume backup reservation information, divides the data volume on the backup target disk into unit data of a predetermined size, reads it, and transmits it to the backup server 500.

The backup server 500 creates N threads in response to a backup command applied from the backup manager server 700. The N threads sequentially receive the unit data sent from the backup manager server 700, compress it, and store it on the backup disk.
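The FIG. 8 data path, in which the backup manager server sends unit data and N threads on the backup server take the units in turn and compress them, is a classic producer/consumer pipeline. In this sketch an in-process bounded `queue.Queue` stands in for the network link, `zlib` stands in for the unspecified compressor, and results are keyed by unit index so they can be written to the backup disk in the original order; `run_backup_server` is a hypothetical name.

```python
import queue
import threading
import zlib


def run_backup_server(units, n_threads):
    """Producer/consumer sketch: N threads take units from a shared queue and compress them."""
    inbox = queue.Queue(maxsize=2 * n_threads)   # bounded: the sender waits if workers lag
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = inbox.get()
            if item is None:                     # stop sentinel
                break
            index, chunk = item
            blob = zlib.compress(chunk)
            with lock:
                results[index] = blob

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    # Manager side: send units in order (the queue stands in for the network).
    for index, chunk in enumerate(units):
        inbox.put((index, chunk))
    for _ in threads:
        inbox.put(None)                          # one sentinel per worker
    for t in threads:
        t.join()
    # Write to the "backup disk" in the original unit order.
    return [results[i] for i in range(len(units))]


data = bytes(range(256)) * 1000
units = [data[i:i + 4096] for i in range(0, len(data), 4096)]
blobs = run_backup_server(units, n_threads=4)
print(len(units), len(blobs))
```

The bounded queue gives natural back-pressure: if compression falls behind, the sending side blocks instead of buffering the whole volume in memory.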

Although embodiments of the present invention have been described above by way of example, various modifications may be made by those skilled in the art. Therefore, the present invention should not be construed as limited to the above embodiments but should be interpreted according to the claims set forth below.

100: Replication target server
110: First agent
120: Replication target disk
200: Replication server
210: Second agent
220: Replica disk
400: Backup target server
410: Backup master module
420: Backup manager module
430: Third agent
440: Backup target disk
450: Central control unit
460: Input/output unit
500: Backup server
510: Fourth agent
520: Backup disk
600: Backup master server
700: Backup manager server

Claims

A replication target disk on which replication target data is stored;
A replication target server having a first agent that divides the data blocks of the replication target disk into a plurality of predetermined unit data, calculates a predetermined section, and captures the predetermined unit data of that section with a plurality of threads, wherein each thread that has finished capturing transmits its predetermined unit data to a queue and calculates another section following the predetermined section so as to capture the predetermined unit data of that other section; and
A second agent which analyzes each of the predetermined unit data received from the first agent and generates duplicate data in a plurality of threads;
A replication server having a replication disk for storing the replication data generated from the second agent;
The first agent transmits the predetermined unit data to the second agent in a queue structure;
The second agent creates as many data connections as there are threads for the predetermined unit data of the queue structure received from the first agent;
Transactions having the same PK column value or the same unique index column value are replicated to the replica disk using only one of the data connections;
A high-performance replication system for large-volume data, characterized in that, when the value of the PK column or the unique index column changes during real-time multithreaded data replication, the second agent switches from multithreaded to single-threaded processing and sequentially replicates the predetermined unit data to the replica disk over the single data connection.

(Deleted)

The system of claim 1,
Wherein the second agent distributes the predetermined unit data received through the queue according to a PK column or a unique index column and processes it in parallel.

(Deleted)

A backup target disk on which the backup target data is stored;
An input / output unit for inputting a command including a backup operation command and outputting a result of the predetermined command;
A central control unit which controls a backup to be performed by processing a backup operation command applied to the input / output unit;
A backup master module which receives the backup operation command applied through the input / output unit and the central control unit and transmits the backup operation command to a backup manager module;
A backup manager module that receives from the backup master module the backup operation commands required for backup operation, manages per-volume backup reservation information, collects and manages per-volume backup status and backup history information, and sends backup commands for disk volumes to a backup agent module according to a backup schedule;
A backup target server having a third agent for capturing predetermined unit data from the backup target disk; And
A backup server having a fourth agent that analyzes the predetermined unit data received from the third agent and generates backup data with a plurality of threads, and a backup disk that stores the backup data generated by the fourth agent;
A high-performance backup system for large-volume data, characterized in that the third agent receives a backup command from the backup manager module, divides the data volume of the backup target disk into unit data of a predetermined size and reads it, creates N threads that perform several flows within one process, and sequentially compresses the divided unit data and transmits it to the fourth agent.

The system according to claim 6,
Wherein a backup master server including the backup master module, and a plurality of backup target servers each including the backup manager module, the third agent, and the backup target disk, are configured separately; and
When a command including a backup operation command is received by the backup master server and transmitted to the backup manager server, the backup manager module manages per-volume backup reservation information, collects and manages per-volume backup status and backup history information, and transmits backup commands for disk volumes to the third agent module according to the backup schedule, and the third agent module, according to the backup command applied from the backup manager module, divides the data volume of the backup target disk into unit data of a predetermined size and reads it, creates N threads that perform several flows within one process, and sequentially compresses the divided unit data and transmits it to the fourth agent.

The system according to claim 6,
Wherein a backup master server including the backup master module;
A plurality of backup manager servers, each including the backup manager module, the backup target disk, and the third agent; and a dedicated backup server having the backup disk are configured separately; and
When the backup master server receives a command including a backup operation command and transmits it to the backup manager server, the backup manager module in the backup manager server manages per-volume backup reservation information, divides the data volume on the backup target disk into unit data of a predetermined size, reads it, and transmits it to the backup server, and collects and manages per-volume backup status and backup history information according to the backup progress on the backup server; and the fourth agent module in the backup server creates N threads in response to the backup command applied from the backup manager module, and the N threads sequentially receive the divided unit data of the predetermined size, compress it, and store it on the backup disk.

A high-performance replication method for large-volume data using a high-performance replication system for large-volume data that includes: a replication target server having a replication target disk storing replication target data and a first agent capturing predetermined unit data from the replication target disk; and a replication server having a second agent that analyzes the predetermined unit data received from the first agent to generate replica data with a plurality of threads, and a replica disk storing the replica data generated by the second agent, the method comprising:
Starting real-time data replication for the replication target disk of the replication target server (S100);
Creating as many queues as the number of threads set by the second agent (S200);
Storing the data received from the first agent in the created queues (S300);
Parsing the data stored in the queues and extracting a column value to be used as a key (S400);
Branching the column values to the respective threads and storing them in each thread's queue (S500); and
Extracting the data from the queues in which the column values are stored and applying the extracted data to the replication server (S600).

The method of claim 9, wherein step S200 further comprises:
Creating a data connection in each thread and connecting the queue object to the created data connection (S210).

A high-performance replication method for large-volume data using a high-performance replication system for large-volume data that includes: a replication target server having a replication target disk storing replication target data and a first agent capturing predetermined unit data from the replication target disk; and a replication server having a second agent that analyzes the predetermined unit data received from the first agent to generate replica data with a plurality of threads, and a replica disk storing the replica data generated by the second agent, the method comprising:
Starting real-time data capture for the replication target disk of the replication target server (S1000);
Creating as many queues as the number of threads set by the first agent (S2000);
Determining whether the number of running threads is smaller than a preset maximum number of threads (S3000);
When the number of running threads is equal to or greater than the maximum, waiting until the data capture cycle of the running threads is completed (S4000), and when the number of running threads is less than the maximum, starting a thread to perform data capture and capturing the data (S5000); and
When the data capture is completed, determining whether more data remains to be captured on the replication target disk (S6000);
Wherein, if more data remains to be captured, the method returns to step S3000, and if no more data remains, the method proceeds to the waiting step S4000.