KR20190113366A

KR20190113366A - Distributed cluster management system and method for thereof

Info

Publication number: KR20190113366A
Application number: KR1020180035943A
Authority: KR
Inventors: 김학철; 강조현; 박정도; 진홍석; 한혁; 진성일
Original assignee: 주식회사 리얼타임테크
Priority date: 2018-03-28
Filing date: 2018-03-28
Publication date: 2019-10-08
Also published as: KR102038527B1; WO2019189963A1

Abstract

The present invention relates to a distributed cluster management system and a method thereof. The system comprises: at least one work node that processes in parallel a task requested by a client, using meta information required for distributed task processing; and at least one cluster management node that manages the work nodes in group units and, by performing a cluster operation process and synchronizing the meta information, manages the work nodes associated with it. The system can therefore improve task processing performance, because the work nodes process the task requested by the client using only meta information. Moreover, when an error occurs in a specific work node while the cluster is running, the entire cluster does not have to be restarted: after the information needed for task processing is synchronized, the work node whose error has been corrected can be reconnected during operation without stopping the distributed cluster system.

Description

DISTRIBUTED CLUSTER MANAGEMENT SYSTEM AND METHOD FOR THEREOF

The present invention relates to a distributed cluster management system and a method thereof, and more particularly to a distributed cluster management system and method in which the work nodes constituting a cluster use only meta information, so that multiple work nodes in a distributed environment can cooperate to process a task requested by a client simultaneously.

A cluster system is a set of individual computers connected over a network that jointly process a single task. Whereas a supercomputer connects multiple CPUs internally and is built so that each CPU accesses a common virtual memory, a cluster system implements those connections over a network; this is the main difference between the two.

Because a cluster system can be built from ordinary servers, it costs less to build than a supercomputer. The system can be sized to the user's needs, and performance can be raised simply by adding servers. On the other hand, a cluster is harder to maintain than a single system, and its performance depends heavily on network performance.

Such a cluster system expands a system by connecting homogeneous or heterogeneous nodes over a network, and each individual system in the cluster is called a server or a node. A load-balancing cluster system consists of a load balancer, which assigns client requests to other nodes in the cluster, and nodes that serve those requests. When a service request arrives, the load balancer selects an appropriate node according to a specific algorithm, and that node processes the request. In other words, a single task is not divided among several nodes; the node chosen by the distribution algorithm processes the entire task assigned to it. Unlike a parallel system, therefore, the incoming flood of requests is spread evenly over multiple nodes so that the load is distributed.

Two approaches are mainly used to manage such load-balancing cluster systems: master-slave and P2P. A master-slave distributed cluster management system consists of a master node that manages the cluster system and slave nodes that process tasks. A P2P distributed cluster management system has all work nodes process tasks as equals.

A conventional load-balancing cluster system that applies the master-slave approach in a distributed environment routes the task processing of the slave nodes through the master node, which degrades processing performance, and the entire cluster must be restarted when the master node fails.

Meanwhile, a conventional load-balancing cluster system that applies the P2P approach requires an additional step of selecting a master node from among the work nodes whenever a cluster management task is performed.

Korean Laid-open Patent Publication No. 10-2017-0102725, "Data Management System and Method"
Korean Registered Patent No. 10-1035857, "Data Management Method and System"

The present invention provides a distributed cluster management system and method in which a cluster is composed of work nodes that process tasks and cluster management nodes that manage the work nodes, and in which the information each work node needs for task processing is synchronized and managed through the cluster management nodes, so that multiple work nodes in a distributed environment can cooperate to process a task requested by a client simultaneously using only meta information.

In embodiments, a distributed cluster management system comprises: at least one work node that processes in parallel a task requested by a client, using meta information required for distributed task processing; and at least one cluster management node that manages the work nodes in group units and runs a cluster operation process to synchronize the meta information and manage the work nodes connected to it.

The cluster management node comprises: a cluster operation manager that uses a cluster configuration file to extract device connection information for the cluster management nodes and work nodes constituting the cluster, and that starts the cluster management nodes and work nodes through the cluster operation process using the device connection information; a cluster manager that obtains state information of the connected work nodes using the cluster configuration file and then performs an integrated management function for the work nodes; and a local repository that stores cluster management information including the cluster configuration file.

The meta information includes connection information of each node for access over a network, and the cluster configuration file includes device connection information for connecting the cluster management nodes and work nodes constituting the cluster.
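As an illustration of how device connection information might be pulled out of a cluster configuration file, the following is a minimal sketch. The patent does not specify a file format; the JSON layout, the field names, and the function name `extract_connection_info` are all assumptions made for this example.

```python
import json

# Hypothetical configuration layout; the source does not define one.
EXAMPLE_CONFIG = """
{
  "management_nodes": [
    {"id": "mgmt-1", "host": "10.0.0.1", "port": 7000,
     "work_nodes": [
       {"id": "work-1", "host": "10.0.0.11", "port": 7100},
       {"id": "work-2", "host": "10.0.0.12", "port": 7100}
     ]}
  ]
}
"""

def extract_connection_info(config_text):
    """Return {node_id: (host, port)} for every node named in the file."""
    config = json.loads(config_text)
    connections = {}
    for mgmt in config["management_nodes"]:
        connections[mgmt["id"]] = (mgmt["host"], mgmt["port"])
        # Each management node lists the work nodes of its management group.
        for work in mgmt.get("work_nodes", []):
            connections[work["id"]] = (work["host"], work["port"])
    return connections

connection_info = extract_connection_info(EXAMPLE_CONFIG)
```

Nesting the work nodes under their management node is one way to express the group-unit management described in the text; a flat list with a group field would serve equally well.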

The cluster operation manager transmits the cluster configuration file to the work nodes belonging to the management group managed by each cluster management node and, when the cluster configuration changes, modifies the cluster configuration file and distributes it to the work nodes belonging to the management group.

The cluster manager comprises: a work node connection manager that connects to or disconnects from a work node at the request of the client; a cluster information manager that extracts, from the cluster configuration file, the connection information of the cluster management nodes and work nodes constituting the cluster, and that transmits the cluster configuration file to the work nodes, depending on whether the file has changed, to start each node; a work node manager that collects state information of the connected work nodes in the management group managed by the cluster management node; and a synchronization manager that synchronizes and shares the meta information of the work nodes belonging to the management group.

The work node comprises: a cluster agent that checks the state of the work node with the cluster manager using a heartbeat protocol and then reflects the state information in the meta information; and a DBMS engine that, in response to a table creation request from the client, generates table information, transmits the table creation result to the cluster management node, shares the table information with the other work nodes belonging to the management group of the cluster management node, and stores state information including the meta information of the work node.

The cluster agent comprises: a node state information collector that collects the state information of the work node; and a node information synchronization manager that synchronizes the table information and the state information of the work nodes associated with the table information.

The DBMS engine comprises: a meta information management engine that manages meta information including the table information and the state information of the work nodes; a query processing engine that receives the client's table creation request, looks up the state information of the work nodes, and then handles the table creation result or table creation error information; and a storage engine that stores the client's table creation result or table creation error information.

In embodiments, a distributed cluster management method, performed by a distributed cluster management system in which multiple nodes in a distributed environment process a task requested by a client, comprises: a first step of configuring a cluster from at least one work node that processes the task requested by the client and at least one cluster management node that manages the work nodes in group units and runs a cluster operation process to manage the work nodes connected to it, and storing a cluster configuration file containing device connection information for each node of the cluster; a second step of, upon a task processing request from the client, obtaining through the cluster operation process the connection information of the cluster management node currently running and extracting local information including the connection information of the work nodes belonging to the management group managed by that cluster management node; a third step of transmitting the cluster configuration file to the work nodes connected to the cluster management node when the local information maps to the connection information in the cluster configuration file; a fourth step of obtaining state information of the connected work nodes using the cluster configuration file and then performing an integrated management function for the work nodes; and a fifth step of using the state information of the work nodes to select and start at least one work node to be connected to the client.

In the third step, the cluster operation process is terminated when the local information does not map to the connection information in the cluster configuration file.
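The check in the second and third steps can be sketched as follows. This is only an illustration: the source does not define what "maps to" means, so exact equality of addresses is assumed here, and the function name `nodes_to_configure` and the `work-` id prefix are hypothetical.

```python
def nodes_to_configure(local_info, config_info):
    """Return the work nodes to receive the configuration file, or None to terminate.

    Both arguments map node_id -> (host, port). local_info describes the running
    management node and its group; config_info comes from the configuration file.
    """
    for node_id, address in local_info.items():
        # Assumption: "mapping" means the address matches the configured one exactly.
        if config_info.get(node_id) != address:
            return None  # mismatch: the cluster operation process terminates
    # Assumption: work node ids carry a "work-" prefix in this example.
    return sorted(n for n in local_info if n.startswith("work-"))

config_info = {
    "mgmt-1": ("10.0.0.1", 7000),
    "work-1": ("10.0.0.11", 7100),
    "work-2": ("10.0.0.12", 7100),
}
matching = nodes_to_configure(dict(config_info), config_info)
stale = nodes_to_configure({"mgmt-1": ("10.0.0.9", 7000)}, config_info)
```

Returning `None` stands in for terminating the process; a real implementation would more likely raise an error or exit with a diagnostic.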

The second step comprises: the client requesting a connection to the cluster management node; the cluster management node looking up the state information of the work nodes, using it to select a work node to connect to the client, and transmitting information on the selected work node to the client; and the client using the work node information to connect to the DBMS engine of the work node and then requesting a task from the DBMS engine.

The state information of a work node is task state information that includes the number of clients connected to that work node, and the work node information includes device connection information for connecting to that work node.
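One way to use the connected-client count when choosing a work node is a least-connections policy, sketched below. The source says only that the state information is used for the selection; the policy itself, the `WorkNodeState` type, and the `has_error` flag are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class WorkNodeState:
    node_id: str
    host: str
    port: int
    client_count: int       # number of clients currently connected
    has_error: bool = False  # hypothetical flag for a failed node

def select_work_node(states):
    """Pick the healthy node with the fewest connected clients (assumed policy)."""
    healthy = [s for s in states if not s.has_error]
    if not healthy:
        raise RuntimeError("no healthy work node available")
    # Ties are broken by node id so the choice is deterministic.
    return min(healthy, key=lambda s: (s.client_count, s.node_id))

states = [
    WorkNodeState("work-1", "10.0.0.11", 7100, client_count=4),
    WorkNodeState("work-2", "10.0.0.12", 7100, client_count=1),
    WorkNodeState("work-3", "10.0.0.13", 7100, client_count=0, has_error=True),
]
chosen = select_work_node(states)
```

The host and port of the chosen node correspond to the work node information returned to the client, which then connects to that node's DBMS engine directly.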

The fourth step comprises: the cluster management node requesting a heartbeat message from the work nodes at a preset time interval; each work node transmitting a heartbeat message to the cluster management node, in response to the request, within a preset transmission time; the cluster management node determining that any work node from which no heartbeat message is received within the preset transmission time is a failed work node, recording this as error information, and reflecting the error information in real time in the meta information of the DBMS engines of the other work nodes in its management group; and the cluster management node collecting state information from the work nodes whose heartbeat messages were received within the preset transmission time, and synchronizing and sharing the state information of all work nodes in its management group.
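The heartbeat decision in the fourth step can be sketched without any networking by working on timestamps. The function names, the dictionary-based meta information, and the `"error"` marker below are hypothetical; only the rule (no reply within the preset time means a failed node, and the failure is recorded on the other nodes) comes from the text.

```python
def classify_by_heartbeat(request_time, replies, timeout):
    """Split nodes into (alive, failed) from heartbeat arrival times.

    replies maps node_id to the time its heartbeat arrived, or None if none arrived.
    """
    alive, failed = [], []
    for node_id, reply_time in sorted(replies.items()):
        if reply_time is not None and reply_time - request_time <= timeout:
            alive.append(node_id)
        else:
            failed.append(node_id)
    return alive, failed

def reflect_errors(meta_by_node, alive, failed):
    """Record each failed node in the meta information held by every live node."""
    for node_id in alive:
        for bad in failed:
            meta_by_node[node_id][bad] = "error"
    return meta_by_node

replies = {"work-1": 10.2, "work-2": None, "work-3": 13.5}
alive, failed = classify_by_heartbeat(request_time=10.0, replies=replies, timeout=1.0)
meta = reflect_errors({n: {} for n in alive}, alive, failed)
```

Here `work-2` never replies and `work-3` replies too late, so both are treated as failed and recorded in the meta information kept for `work-1`.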

The first step further comprises, when the cluster configuration is changed, modifying the cluster configuration file for the cluster management node and distributing it automatically.

When, in the fifth step, the client issues a table creation query for distributing the records of a table across multiple work nodes, the fifth step comprises: step 5-1, in which the cluster operation process receives the table creation query; step 5-2, in which the first work node, which first received the table creation query, looks up the state information of the work nodes named in the query; step 5-3, in which, when the state information shows no error, the DBMS engine of the first work node transmits the query reception status to all work nodes in the group and notifies the cluster management node that manages the work node of the query reception status; step 5-4, in which the cluster management node forwards the table creation query to all work nodes in its management group, so that they share the table information, and waits for a preset waiting time; step 5-5, in which each work node that received the table creation query generates the table information with its DBMS engine and then sends a table creation result message to the cluster management node; and step 5-6, in which the cluster management node performs table information synchronization once the table creation result messages have been received from all work nodes within the preset waiting time.

The table creation query includes information on the work nodes that are to store records, according to conditions on a preset column.
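To make the idea of per-column placement conditions concrete, the sketch below routes a record to a work node by the value of one column. The source does not give the query syntax or the form of the conditions; the column name `region_id`, the value ranges, and the `PLACEMENT` structure are invented for illustration.

```python
# Hypothetical placement rules derived from a table creation query:
# records whose "region_id" falls in a range are stored on the named node.
PLACEMENT = [
    (range(0, 100), "work-1"),
    (range(100, 200), "work-2"),
]

def node_for_record(record, column="region_id", placement=PLACEMENT):
    """Return the id of the work node that should store the record."""
    value = record[column]
    for value_range, node_id in placement:
        if value in value_range:
            return node_id
    raise ValueError(f"no work node configured for {column}={value}")

target = node_for_record({"region_id": 150, "name": "sensor-a"})
```

Because every work node shares the same table information, any node that receives a record can apply the same rules and forward it to the right place.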

Step 5-3 further comprises returning table creation error information when the state information of at least one of the work nodes that are to store the records is error information.

In step 5-6, the cluster management node stores, at the time the table creation error information is returned, reconnection synchronization information that includes the identification information of the failed work node, time information, and the table creation request information.

Step 5-6 further comprises: when there is a work node from which the table creation result message was not received within the preset time, the cluster management node determining a table creation request processing error, having all work nodes delete the result of the table creation request, and then delivering error information to the first work node.
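The all-or-nothing behavior of steps 5-4 through 5-6 can be sketched with in-memory stand-ins for the work nodes. The `WorkNode` class, its `responds` flag (which simulates a node that misses the preset waiting time), and the function name are hypothetical; real nodes would exchange messages over the network.

```python
class WorkNode:
    def __init__(self, node_id, responds=True):
        self.node_id = node_id
        self.responds = responds  # False simulates a missed deadline
        self.tables = set()

    def create_table(self, table_name):
        """Create the table locally; return True if a result message arrived in time."""
        if not self.responds:
            return False
        self.tables.add(table_name)
        return True

    def drop_table(self, table_name):
        self.tables.discard(table_name)

def create_table_on_group(nodes, table_name):
    """Return True if every node reported success; otherwise roll back and return False."""
    results = {node.node_id: node.create_table(table_name) for node in nodes}
    if all(results.values()):
        return True  # table information can now be synchronized
    # At least one result is missing: delete the table everywhere.
    for node in nodes:
        node.drop_table(table_name)
    return False

good_group = [WorkNode("work-1"), WorkNode("work-2")]
bad_group = [WorkNode("work-1"), WorkNode("work-2", responds=False)]
ok = create_table_on_group(good_group, "orders")
failed = create_table_on_group(bad_group, "orders")
```

Deleting the partial results keeps the table information identical on every work node, which is what lets each node answer client requests from its own meta information.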

Step 5-6 further comprises a reconnection step in which a work node that had failed at the time the first work node received the table creation query reconnects, after its error is corrected, by synchronizing the table information using the reconnection synchronization information.

After its error is corrected, the failed work node updates the meta information of its DBMS engine.

The reconnection step comprises: the failed work node sending a reconnection request message to the cluster management node; the cluster management node transmitting the table creation log information and the state information of the currently running work nodes to the failed work node; the failed work node updating its state information for the other work nodes, performing the table creation operations sequentially using the table creation log information to update its table information, and then sending a table creation result message to the cluster management node; the cluster management node requesting the other work nodes to update their state information for the reconnected work node; and the work nodes other than the reconnected work node updating the state information accordingly.
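The reconnection sequence can be sketched as a log replay. The `ManagementNode` class, its field names, and the `"running"` and `"error"` status strings are assumptions for this example; the source specifies only that the recovered node receives the table creation log and current node states, replays the log in order, and is then marked as reconnected.

```python
class ManagementNode:
    def __init__(self):
        self.creation_log = []  # [(timestamp, table_name)]
        self.node_status = {}   # node_id -> "running" or "error"

def reconnect(manager, node_id, local_tables):
    """Bring a recovered node up to date and mark it as running.

    Returns the node's updated table list and the status snapshot it received.
    """
    # The management node sends the log and the current node states.
    status_snapshot = dict(manager.node_status)
    # The recovered node replays table creations in time order.
    for _, table_name in sorted(manager.creation_log):
        if table_name not in local_tables:
            local_tables.append(table_name)
    # The other nodes are then told that this node is back.
    manager.node_status[node_id] = "running"
    return local_tables, status_snapshot

manager = ManagementNode()
manager.creation_log = [(2, "orders"), (1, "users")]
manager.node_status = {"work-1": "running", "work-2": "error"}
tables, snapshot = reconnect(manager, "work-2", ["users"])
```

Replaying the log on the recovered node alone is what allows it to rejoin without the rest of the cluster being stopped or restarted.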

With the distributed cluster management system and method of the present invention, the work nodes process tasks requested by a client using only meta information, which can improve task processing performance. In addition, when an error occurs in a particular work node during cluster operation, the entire cluster does not have to be restarted: once the information required for task processing has been synchronized, the work node whose error has been corrected can reconnect while the distributed cluster system keeps running.

FIG. 1 is a diagram illustrating a distributed cluster management system according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating the configuration of the cluster management node and the work node of FIG. 1.
FIG. 3 is a flowchart illustrating a distributed cluster management method according to an embodiment of the present invention.
FIG. 4 is a flowchart illustrating a client connection process according to an embodiment of the present invention.
FIG. 5 is a flowchart illustrating the process of synchronizing the state information of the work nodes of FIG. 4.
FIG. 6 is a flowchart illustrating the process of handling a client's table creation request according to an embodiment of the present invention.
FIG. 7 is a flowchart illustrating the process by which a failed work node reconnects according to an embodiment of the present invention.

The embodiments described in this specification and the configurations shown in the drawings are merely preferred embodiments of the present invention and do not represent all of its technical idea; the scope of the invention should therefore not be construed as limited by the embodiments and drawings described herein. That is, because the embodiments can be modified in various ways and can take various forms, the scope of the invention should be understood to include equivalents capable of realizing its technical idea. Likewise, the objects and effects presented herein do not mean that a particular embodiment must include all of them or only them, so the scope of the invention should not be understood as limited by them.

Unless otherwise defined, all terms used herein have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms defined in commonly used dictionaries should be interpreted as consistent with their meaning in the context of the related art and should not be interpreted in an idealized or excessively formal sense unless expressly so defined herein.

FIG. 1 is a diagram illustrating a distributed cluster management system according to an embodiment of the present invention.

Referring to FIG. 1, the distributed cluster system 100 forms a cluster from at least one cluster management node 200 and a plurality of work nodes 300 in order to process tasks requested by a client 400 simultaneously in a distributed environment.

The at least one cluster management node 200 manages the work nodes 300 in group units and runs a cluster operation process to synchronize the meta information of each work node 300 and manage the work nodes 300 connected to it.

The meta information includes the connection information of the work nodes 300. Meta information can be described as data about data: it is a general term for information about data that allows data or a data set to be accessed and managed efficiently. In a broad sense, meta information means a data set or information that broadly describes the basic content arising from the creation of the data, quality factors, document structure, implementation techniques, reference information, descriptive information about the data content, information about data access, acquisition, distribution, and use, and creator and administrator information.

The at least one work node 300 processes tasks requested by the client 400 simultaneously and in parallel, using the meta information required for task processing.

The cluster management node 200 performs the cluster management function for the work nodes 300 of the management group it manages, and each work node 300 works with neighboring work nodes 300 in the same group to process tasks requested by the client 400 simultaneously and in parallel.

In some cases, a work node 300 may perform a task in cooperation with work nodes 300 connected to a cluster management node 200 other than the one that manages its own group.

FIG. 2 is a diagram illustrating the configuration of the cluster management node and the work node of FIG. 1.

Referring to FIG. 2, the cluster management node 200 includes a cluster operation manager 210, a cluster manager 220, and a local repository 230.

The cluster operation manager 210 uses the cluster configuration file to extract the device connection information (for example, connection information such as IP addresses) of the devices constituting the cluster, and runs the cluster operation process with this device connection information to remotely start the cluster management nodes 200 and work nodes 300.

The cluster operation manager 210 transmits the cluster configuration file to the work nodes 300 belonging to the management group of the cluster management node 200 and, when the cluster configuration changes, modifies the cluster configuration file and automatically distributes it to all work nodes 300 belonging to the management group.
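A simple way to decide which work nodes need the modified configuration file is to compare content digests, as sketched below. The source does not say how a change is detected; the SHA-256 comparison, the function names, and the per-node digest table are assumptions, and the assignment inside the loop merely stands in for the actual file transfer.

```python
import hashlib

def digest(config_text):
    """Return a SHA-256 hex digest of the configuration file content."""
    return hashlib.sha256(config_text.encode("utf-8")).hexdigest()

def distribute_if_changed(new_config, node_digests):
    """Send the file to every node holding a different version.

    node_digests maps node_id -> digest of the file that node currently holds.
    Returns the ids of the nodes that were sent the new file.
    """
    new_digest = digest(new_config)
    sent = []
    for node_id in sorted(node_digests):
        if node_digests[node_id] != new_digest:
            node_digests[node_id] = new_digest  # stands in for the network transfer
            sent.append(node_id)
    return sent

node_digests = {"work-1": digest("config v1"), "work-2": digest("config v2")}
first_round = distribute_if_changed("config v2", node_digests)
second_round = distribute_if_changed("config v2", node_digests)
```

The second call sends nothing because every node already holds the current version, so repeated distribution runs are harmless.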

The cluster manager 220 performs an integrated management function for the work nodes 300 of the group it manages. To do so, the cluster manager 220 uses the cluster configuration file to obtain the state information of the connected work nodes 300.

The cluster manager 220 includes a work node connection manager 221, a cluster information manager 222, a work node manager 223, and a synchronization manager 224.

The work node connection manager 221 connects to or disconnects from a work node 300 at the request of the client 400.

The cluster information manager 222 extracts, from the cluster configuration file, the connection information of the cluster management nodes 200 and work nodes 300 constituting the cluster and, depending on whether the cluster configuration file has changed, transmits the file to the work nodes 300 to start them.

The work node manager 223 collects state information of the work nodes 300 connected to the management group.

The synchronization manager 224 synchronizes and shares the meta information of the work nodes 300 belonging to the management group of the cluster management node 200.

Meanwhile, the work node 300 includes a cluster agent 310 and a DBMS engine 320.

The cluster agent 310 checks the state of the work node 300 by exchanging heartbeat messages with the cluster manager 220 and reflects the state information in the meta information in real time. The cluster agent 310 includes a node state information collector 311, which collects state information of the work node, and a node information synchronization manager 312, which synchronizes table information and the state information of the work nodes associated with that table information.

The DBMS engine 320 receives a table creation request from the client 400, generates the table information, and transmits the table creation result to the cluster management node 200. It shares the table information with the other work nodes 300 in the management group of the cluster management node 200 and stores meta information, work node state information, error information, and the like.

The DBMS engine 320 includes a meta information management engine 321 that manages meta information including table information and work node state information; a query processing engine 322 that, on receiving a table creation query from the client 400, looks up the state information of the work nodes named in the query and produces either a table creation result or table creation error information; and a storage engine 323 that stores the table creation result for the client 400.

Meanwhile, the client 400 connects to the DBMS engine 320 of a work node using the device connection information (for example, a network address) received from the cluster manager 220. The application program 420 requests work from the DBMS engine 320 through the cluster connection manager 410, and in the distributed cluster management system 100 a plurality of work nodes 300 cooperate to process the work requested by the client 400.

FIG. 3 is a flowchart illustrating a distributed cluster management method according to an embodiment of the present invention.

Referring to FIG. 3, in the distributed cluster management method, when the cluster operation manager 210 starts the cluster operation process on the cluster management node 200, it extracts the connection information of the work nodes 300 and obtains the connection information of the currently running cluster management node 200 as local information (S310). Here, connection information means the address information needed to connect over a network, such as an IP address or an InfiniBand address.

The cluster operation manager 210 extracts, from the cluster configuration file, the address information of the cluster management nodes 200 and of the work nodes 300 constituting the cluster. If the local information of the currently running device does not map to the connection information of a cluster management node 200 in the cluster configuration file, startup of the distributed cluster system 100 is terminated. If, however, the local information of the currently running device matches the connection information of a cluster management node 200 in the cluster configuration file, the cluster configuration file is transmitted to all connected work nodes 300 (S320, S330).
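The check in S320 and S330 can be illustrated with a minimal sketch. The `ClusterConfig` structure, the `NotManagementNodeError` exception, and the function name are hypothetical; the patent does not specify the layout of the cluster configuration file or any programming interface.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClusterConfig:
    # Hypothetical in-memory form of the cluster configuration file.
    management_nodes: List[str]
    work_nodes: List[str] = field(default_factory=list)


class NotManagementNodeError(RuntimeError):
    """Raised when the local address is not listed as a management node."""


def distribution_targets(local_address: str, config: ClusterConfig) -> List[str]:
    """Return the work nodes that should receive the configuration file.

    Mirrors S320/S330: startup stops if this machine's address does not
    match any cluster management node entry in the configuration.
    """
    if local_address not in config.management_nodes:
        raise NotManagementNodeError(local_address)
    return list(config.work_nodes)
```

Raising an exception stands in for terminating startup; a real implementation would go on to send the file to each returned address.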

The cluster operation manager 210 starts the cluster manager 220 and the local storage 230 of the currently running cluster management node 200, which work together with the cluster agents 310 of the connected work nodes 300 to synchronize the information needed to process work requested by the client 400, thereby performing cluster management (S340). The cluster operation manager 210 then remotely starts the cluster agent 310 and DBMS engine 320 processes of the work nodes connected to the currently running cluster management node 200, so that work requested by the client 400 is processed concurrently in the distributed environment (S350).

Accordingly, when the cluster configuration changes, the cluster configuration file needs to be modified only on the cluster management node 200, after which the cluster operation process distributes it automatically, making distributed cluster management more convenient.

As described above, the distributed cluster management system 100 starts the plurality of cluster management nodes 200 and work nodes 300, and the client 400 connects to the work node connection manager 221 to have its work processed. In this case, the client 400 is connected using only the meta information of the cluster management node 200.

FIG. 4 is a flowchart illustrating a client connection process according to an embodiment of the present invention.

Referring to FIG. 4, the client 400 requests a connection to the cluster management node 200 (S410). The connection request is handled by the cluster manager 220 of the cluster management node 200, which, on receiving the request, asks the cluster agents 310 of the connected work nodes 300 for the state information of their work nodes 300.

Each cluster agent 310 works with the DBMS engine 320 to look up the state information of its work node and returns the number of clients 400 currently connected to that work node 300 (S420). Using the state information collected from the work nodes 300, the cluster manager 220 returns to the client 400 the information of the work node 300 to connect to (S430). By default, the cluster manager 220 selects the work node with the fewest currently connected clients, but the selection criterion can be configured in various ways.
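The default selection rule in S430 amounts to a least-connections choice. The following is a minimal sketch under the assumption that the collected state information is available as a mapping from a work node address to its connected-client count; the function name and the data shape are illustrative only.

```python
from typing import Dict


def select_work_node(client_counts: Dict[str, int]) -> str:
    """Pick the work node with the fewest connected clients.

    client_counts maps a work node address to the client count its
    cluster agent reported (S420). Ties go to the node listed first.
    """
    if not client_counts:
        raise ValueError("no work nodes available")
    return min(client_counts, key=client_counts.get)
```

For example, `select_work_node({"10.0.0.11": 3, "10.0.0.12": 1})` returns `"10.0.0.12"`. A different selection criterion could be supplied by replacing the `key` function.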

The cluster connection manager 410 of the client 400 connects to the DBMS engine 320 of the work node 300 using the device connection information received from the cluster manager 220, and the application program 420 of the client 400 requests work from the DBMS engine 320 through the cluster connection manager 410. In the distributed cluster system 100, a plurality of work nodes 300 then cooperate to process the requested work (S440, S450).

To this end, the DBMS engines 320 running on the work nodes 300 synchronize and share the information needed to process the work requested by the client 400. The main synchronized information is the table information of the DBMS engine 320 and the state information of the work nodes 300. The state information of the work nodes 300 is collected by the work node manager 223 of the cluster management node 200 and reflected during synchronization.

FIG. 5 is a flowchart illustrating the process of synchronizing the state information of the work nodes of FIG. 4.

Referring to FIG. 5, the cluster manager 220 periodically requests a heartbeat message from the connected work nodes 300 (S510). In response, the cluster agent 310 sends a heartbeat message to the cluster manager 220. In the heartbeat protocol, heartbeat messages are sent to physically connected nodes at fixed intervals, and a node that does not respond within a certain time is judged to have failed.

That is, if the cluster manager 220 receives a heartbeat message from the cluster agent 310 of a given work node 300 within the preset transmission time, it judges that work node to be in a normal state. If it does not receive a heartbeat message within the preset transmission time, it judges that the work node 300 has failed, reflects the related information in the global meta information of the DBMS engines 320 of the other work nodes 300, and at the same time records the error information in the local storage 230 of the cluster management node 200 (S520, S530).
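The timeout decision in S520 and S530 can be sketched as follows. The class and method names are hypothetical, timestamps are passed in explicitly so the logic is deterministic, and a plain list stands in for the local storage; propagating the change to the other work nodes' meta information is left out.

```python
from typing import Dict, List, Tuple


class HeartbeatMonitor:
    """Tracks the last heartbeat per work node and flags timeouts."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout                  # preset transmission time, seconds
        self.last_seen: Dict[str, float] = {}
        self.states: Dict[str, str] = {}
        self.error_log: List[Tuple[str, float]] = []  # stand-in for local storage

    def record_heartbeat(self, node_id: str, now: float) -> None:
        self.last_seen[node_id] = now

    def check(self, now: float) -> Dict[str, str]:
        """Classify every known node as 'normal' or 'error' at time now."""
        for node_id, seen in self.last_seen.items():
            state = "error" if now - seen > self.timeout else "normal"
            # Log only the transition into the error state, not every check.
            if state == "error" and self.states.get(node_id) != "error":
                self.error_log.append((node_id, now))
            self.states[node_id] = state
        return dict(self.states)
```

A node returns to the normal state as soon as a later heartbeat arrives within the timeout; the reconnection procedure described for FIG. 7 is a separate step.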

When an individual work node 300 processes work in cooperation with the other connected work nodes 300, it uses only the meta information. Therefore, when the meta information changes, the cluster manager 220 of the cluster management node 200 and the cluster agent 310 of each work node 300 cooperate to reflect the change in real time in the global meta information 321 of the DBMS engine 320 of each work node 300.

FIG. 6 is a flowchart illustrating a process of handling a client's table creation request according to an embodiment of the present invention.

Records belonging to the same DBMS table may be distributed across a plurality of work nodes, and any of the various existing methods may be used to distribute them across the work nodes 300. Of the information needed for distributed processing, the most important is the connection information of the work node 300 on which a given record is stored.

Referring to FIG. 6, the DBMS engine 320 of a particular work node 300 receives the table creation query requested by the client 400 (S610).

The table creation query from the client 400 may include, for a particular column, information specifying which work node stores a record depending on a condition. For example, for the column 'AGE', values 0 to 20 may be assigned to work node 1, 21 to 40 to work node 2, 41 to 60 to work node 3, and 61 and above to work node 5. Accordingly, the first work node 300, which received the table creation query, looks up the state information of the work nodes named in the query. If one or more of the work nodes that are to store records is in an error state, table creation error information for the failed node is returned (S620).
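The 'AGE' example and the state check in S620 can be sketched as below. The rule format `(low, high, node_id)`, the function names, and the state strings are assumptions made for illustration; the patent does not define the query syntax for placement rules.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical placement rules for column AGE: (low, high, work node number).
# high=None means the range has no upper bound.
Rule = Tuple[int, Optional[int], int]
AGE_RULES: List[Rule] = [(0, 20, 1), (21, 40, 2), (41, 60, 3), (61, None, 5)]


def node_for_value(rules: List[Rule], value: int) -> int:
    """Return the work node that stores a record with this column value."""
    for low, high, node_id in rules:
        if value >= low and (high is None or value <= high):
            return node_id
    raise ValueError(f"no placement rule covers {value}")


def failed_target_nodes(rules: List[Rule], node_states: Dict[int, str]) -> List[int]:
    """List target nodes that are not 'normal'; a node with no known state counts as failed.

    A non-empty result corresponds to returning table creation error
    information instead of creating the table (S620).
    """
    return sorted({n for _, _, n in rules if node_states.get(n) != "normal"})
```

With these rules, a record with AGE 35 maps to work node 2, and table creation would be rejected if work node 2 were reported in an error state.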

For a plurality of work nodes 300 to cooperate and concurrently process work requested by a client, the DBMS engines 320 of all work nodes 300 must share the same table information. To this end, the DBMS engine 320 that received the table creation request informs the cluster agent 310 of its work node 300 that a table creation request has been received, so that the request can be passed on to the other work nodes (S630).

The cluster agent 310 notifies the cluster manager 220 of the cluster management node 200 that manages the work nodes 300 of the table creation request. The cluster manager 220 of the cluster management node 200 that received the request then carries out the process of synchronizing the table information of all work nodes 300.

The cluster manager 220 forwards the table creation request to the cluster agents 310 of the connected work nodes 300 and waits for the results. The cluster agent 310 of each work node 300, on receiving the table creation request from the cluster manager 220 of the cluster management node 200, works with the DBMS engine 320 to generate the table information and returns the table creation result to the cluster manager 220 (S640). The cluster manager 220 performs error handling for the table creation request of the client 400 so that the work nodes 300 in the distributed cluster system 100 share the same table information (S650).

After sending the table creation request to the currently operating work nodes 300, the cluster manager 220 waits for table creation result messages for a certain period. The length of this waiting period is defined by the user. If any work node 300 fails to send a result message within the waiting period, the cluster manager 220 treats this as a table creation request processing error, has all work nodes 300 delete the result of processing the table creation request, and then passes the error information to the first work node 300, which originally received the request.
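The all-or-nothing handling described above can be sketched as follows. The two callables are hypothetical stand-ins for the messages exchanged with each cluster agent: `send_create` is assumed to return `True` when a success message arrives within the user-defined waiting period and `None` when no reply arrives in time, and `send_delete` asks a node to discard its table creation result. Networking and the actual timer are omitted.

```python
from typing import Callable, List, Optional, Tuple


def create_table_on_all(
    nodes: List[str],
    send_create: Callable[[str], Optional[bool]],
    send_delete: Callable[[str], None],
) -> Tuple[bool, List[str]]:
    """Create the table on every node, or roll back everywhere.

    Returns (True, []) if every node confirmed, otherwise
    (False, nodes_without_a_success_reply) after asking all nodes
    to delete their table creation result.
    """
    unconfirmed = [n for n in nodes if send_create(n) is not True]
    if unconfirmed:
        for n in nodes:
            send_delete(n)  # undo on every node so table information stays identical
        return False, unconfirmed
    return True, []
```

Rolling back on every node, including those that succeeded, keeps the table information identical across the cluster, which is the stated purpose of the error handling.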

If all work nodes 300 returned a success message for the table creation request, but some work node 300 was in an error state at the time the request was received, reconnection synchronization information consisting of the ID of that work node 300, time information, and the table creation request information is stored in the local storage 230 so that the table information can be synchronized when the failed work node reconnects (S660).

FIG. 7 is a flowchart illustrating the process by which a failed work node reconnects according to an embodiment of the present invention.

FIG. 7 shows how, after the cluster management nodes 200 and work nodes 300 constituting the distributed cluster system 100 have been started, a failed work node 300 whose error has been corrected can reconnect during operation without shutting down the entire distributed cluster system 100.

The failed work node is added back only after its error has been corrected and the global meta information 321 of its DBMS engine 320 has been brought up to date. After the error is corrected, the failed work node 300 sends a reconnection request message to the cluster manager 220 (S710). The cluster manager 220 sends the table creation log information in the local storage 230 and the state information of the currently running work nodes 300 to the reconnecting work node 300 so that it can update the related information (S720).

The reconnected work node 300 updates the state information of the other work nodes, performs the table creation operations in order using the table creation log information to update its table information, and then returns the result to the cluster manager 220 (S730). The cluster manager 220 asks the cluster agents 310 of the other work nodes 300 to update the state information of the work node 300 that has reconnected after error correction, and each cluster agent 310 updates the work node information in its DBMS engine 320.
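The sequential replay in S730 can be sketched as follows, assuming each table creation log entry is a `(timestamp, table_name, create_statement)` tuple. That layout, the function name, and the choice to skip tables that already exist locally are assumptions; the patent states only that the stored information includes time information and the table creation request.

```python
from typing import Dict, List, Tuple

LogEntry = Tuple[float, str, str]  # (timestamp, table name, create statement)


def replay_table_log(local_tables: Dict[str, str], log: List[LogEntry]) -> Dict[str, str]:
    """Return the table information after applying missed creations in time order."""
    tables = dict(local_tables)  # leave the caller's mapping untouched
    for _, name, statement in sorted(log, key=lambda entry: entry[0]):
        # Assumption: a table the node already has is left as it is.
        tables.setdefault(name, statement)
    return tables
```

Sorting by timestamp reproduces the order in which the rest of the cluster created the tables, so the reconnected node ends up with the same table information as its peers.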

By operating in this way, the distributed cluster management system 100 according to the present invention allows a work node whose error has been corrected to reconnect without stopping the entire system, even when an error occurs in a particular work node 300 after startup. After the above processes are completed, the work node that first received the table creation query returns the result to the client.

According to the present invention, the client 400 first connects to the cluster management node 200, and the cluster management node 200 designates which of the work nodes 300 the client 400 should connect to. The client 400 can then connect to the DBMS engine 320 of that work node 300 and perform its work.

Although the present invention has been described above with reference to preferred embodiments, those skilled in the art will understand that various modifications and changes can be made without departing from the spirit and scope of the invention as set forth in the claims below.

100: distributed cluster management system
200: cluster management node
210: cluster operation manager
220: cluster manager
230: local storage
300: work node
310: cluster agent
320: DBMS engine
400: client

Claims

A distributed cluster management system comprising:
at least one work node that processes, in parallel, work requested by a client using meta information required for distributed processing; and
at least one cluster management node that manages the work nodes in group units and runs a cluster operation process to manage the work nodes connected to it and synchronize the meta information.

The system of claim 1,
wherein the cluster management node comprises:
a cluster operation manager that extracts, from a cluster configuration file, device connection information of the cluster management nodes and work nodes constituting the cluster, and starts the cluster management nodes and work nodes using the device connection information through the cluster operation process;
a cluster manager that obtains state information of the connected work nodes using the cluster configuration file and then performs an integrated management function for the work nodes; and
a local storage that stores cluster management information including the cluster configuration file.

The system of claim 1,
wherein the meta information includes connection information of each node for connecting over a network, and
the cluster configuration file includes device connection information for connecting to the cluster management nodes and work nodes constituting the cluster.

The system of claim 2,
wherein the cluster operation manager
transmits the cluster configuration file to the work nodes belonging to the management group managed by each cluster management node, and
distributes the cluster configuration file to the work nodes belonging to the management group when the cluster configuration is changed.

The system of claim 2,
wherein the cluster manager comprises:
a work node connection manager that connects to or disconnects from a work node at the request of the client;
a cluster information manager that extracts, from the cluster configuration file, connection information of the cluster management nodes and work nodes constituting the cluster, and transmits the cluster configuration file to the work nodes depending on whether the cluster configuration file has changed;
a work node manager that collects state information of the connected work nodes of the management group managed by the cluster management node; and
a synchronization manager that synchronizes and shares the meta information of the work nodes belonging to the management group.

The system of claim 1,
wherein the work node comprises:
a cluster agent that checks the state of the work node by means of a heartbeat protocol with the cluster manager and reflects the state information in the meta information; and
a DBMS engine that, in response to a table creation request from the client, generates table information, transmits the table creation result to the cluster management node, shares the table information with the other work nodes in the management group of the cluster management node, and stores meta information including state information of the work nodes.

The system of claim 6,
wherein the cluster agent comprises:
a node state information collector that collects state information of the work node; and
a node information synchronization manager that synchronizes the table information and the state information of the work nodes associated with the table information.

The system of claim 6,
wherein the DBMS engine comprises:
a meta information management engine that manages meta information including the table information and state information of the work nodes;
a query processing engine that receives a table creation request from the client, looks up state information of the work nodes, and produces a table creation result or table creation error information; and
a storage engine that stores the table creation result or table creation error information for the client.

A distributed cluster management method performed by a distributed cluster management system in which a plurality of nodes in a distributed environment process work requested by a client, the method comprising:
a first step of configuring a cluster consisting of at least one work node that processes the work requested by the client and at least one cluster management node that manages the work nodes in group units and runs a cluster operation process to manage the work nodes connected to it, and storing a cluster configuration file including device connection information for each node of the cluster;
a second step of, when the client requests work processing, obtaining connection information of the cluster management node currently run by the cluster operation process and extracting local information including connection information of the work nodes belonging to the management group managed by that cluster management node;
a third step of transmitting the cluster configuration file to the plurality of work nodes connected to the cluster management node when the local information maps to the connection information in the cluster configuration file;
a fourth step of obtaining state information of the connected work nodes using the cluster configuration file and then performing an integrated management function for the work nodes; and
a fifth step of selecting and starting at least one work node to be connected to the client using the state information of the work nodes.

The method of claim 9,
wherein the third step
further comprises terminating the cluster operation process if the local information does not map to the connection information in the cluster configuration file.

The method of claim 9,
wherein the second step comprises:
the client requesting a connection to the cluster management node;
the cluster management node retrieving state information of the work nodes, selecting a work node to connect to the client using the state information, and then transmitting the selected work node information to the client; and
the client connecting to the DBMS engine of the work node using the work node information and requesting work from the DBMS engine.

The method of claim 11,
wherein the state information of the work node is work state information including the number of clients connected to each work node, and
the work node information includes device connection information for connecting to the work node.

The method of claim 9,
wherein the fourth step comprises:
the cluster management node requesting a heartbeat message from the work nodes at a predetermined time interval;
the work node transmitting a heartbeat message to the cluster management node within a predetermined transmission time in response to the heartbeat message request;
the cluster management node determining that a work node from which no heartbeat message is received within the predetermined transmission time is a failed work node, recording this as error information, and reflecting that information in real time; and
the cluster management node collecting the state information of the work nodes from which a heartbeat message was received within the predetermined transmission time, and synchronizing and sharing the state information of all work nodes belonging to its management group.

The method of claim 9,
wherein the first step
further comprises, if the cluster configuration is changed, modifying the cluster configuration file on the cluster management node and distributing it automatically.

The method of claim 9,
In a fifth step, when the table generation query of the client for distributing and storing records belonging to a table to a plurality of work nodes is requested, the cluster operation process may include receiving the table generation query statement;
Step 5-2, in which the first work node that first received the table creation query retrieves state information of the work node included in the table creation query;
When there is no error in the state information of the work node, the DBMS engine of the first work node transmits the table creation query reception state to all work nodes in the group, and generates the table in the cluster management node managing the work node. Step 5-3 of notifying a query reception state;
Step 5-4 of the cluster management node forwarding a table generation query statement to all work nodes to share table information of all work nodes in its management group and waiting for a predetermined waiting time;
Step 5-5 of receiving the table generation query statement to generate table information in association with a DBMS engine and then transmitting a table generation result message to the cluster management node; And
And (c) 5-6, wherein the cluster management node performs table information synchronization when the table generation result message is transmitted from all work nodes within a predetermined waiting time.

The method of claim 15,
wherein the table creation query includes work node information specifying where records are stored according to a condition on a predetermined column.

The method of claim 15,
wherein step 5-3 further comprises returning table creation error information when the state information of at least one of the work nodes that are to store records indicates an error.

The method of claim 17,
wherein, in step 5-6, the cluster management node stores reconnection synchronization information including identification information of the error-producing work node, time information, and table creation request information as of the time the table creation processing error information is returned.

The method of claim 15,
wherein, in step 5-6, when there is a work node from which the table creation result message is not received within the preset time, the cluster management node determines that a table creation request processing error has occurred, deletes the table creation request processing results on all work nodes, and then transmits error information to the first work node.
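
The timeout handling in this claim can be sketched as follows. This is a simplified model under stated assumptions: `finish_create`, `tables_by_node`, and `reply_delays` are hypothetical names, and the arrival of each result message is represented by a delay in seconds (`None` for a message that never arrives) instead of a real network wait.

```python
def finish_create(tables_by_node, table, reply_delays, wait_seconds):
    """Decide the outcome of a table creation request at the manager.

    tables_by_node: node name -> dict of tables currently created there.
    reply_delays:   node name -> seconds until its result message arrives,
                    or None if it never arrives (hypothetical model).
    """
    # Nodes whose result message did not arrive within the waiting time.
    missing = sorted(
        name
        for name, delay in reply_delays.items()
        if delay is None or delay > wait_seconds
    )
    if missing:
        # Delete the request processing result on all work nodes.
        for tables in tables_by_node.values():
            tables.pop(table, None)
        # Error information to be sent back to the first work node.
        return {"status": "ERROR", "missing": missing}
    return {"status": "OK", "missing": []}
```

Removing the table on every node, including those that answered in time, keeps the table information identical across the group after a failed request.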

The method of claim 18,
wherein step 5-6 further comprises, for an error-producing work node in which an error had occurred at the time the first work node received the table creation query, performing reconnection of that work node once its error is corrected, by synchronizing the table information using the reconnection synchronization information.

The method of claim 20,
wherein the error-producing work node updates the meta information of its DBMS engine after the error is corrected.

The method of claim 20,
wherein the reconnecting step comprises:
sending, by the error-producing work node, a reconnection request message to the cluster management node;
transmitting, by the cluster management node, table creation log information and state information of the currently running work nodes to the error-producing work node;
updating, by the error-producing work node, the state information of the other work nodes, sequentially performing table creation using the table creation log information, updating its table information, and then sending a table creation result message to the cluster management node;
requesting, by the cluster management node, the other work nodes to update the state information of the work node reconnected after error correction; and
updating, by the other work nodes except the reconnected work node, the state information of the reconnected work node.
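
The reconnection steps can be sketched as a replay of logged table creation requests. The sketch is illustrative only: `SyncRecord` mirrors the reconnection synchronization information of claim 18 (identification, time, and table creation request information), while the field names, the `reconnect` function, and the `"RUNNING"` state label are assumptions, and the status update sent to the other work nodes is modeled as a write to one shared dictionary.

```python
from dataclasses import dataclass


@dataclass
class SyncRecord:
    """Hypothetical shape of one reconnection synchronization entry."""
    node_id: str      # identification information of the failed work node
    timestamp: float  # time information
    table: str        # table named in the creation request
    ddl: str          # table creation request information


def reconnect(node_id, local_tables, sync_log, cluster_states):
    """Replay missed table creations on a repaired work node."""
    # The manager hands over the log entries recorded for this node.
    missed = sorted(
        (rec for rec in sync_log if rec.node_id == node_id),
        key=lambda rec: rec.timestamp,
    )
    # The repaired node performs the table creations sequentially.
    for rec in missed:
        local_tables[rec.table] = rec.ddl
    # The manager asks the other nodes to record the node as running again.
    cluster_states[node_id] = "RUNNING"
    # Result reported back to the manager: tables created, in replay order.
    return [rec.table for rec in missed]
```

Replaying the entries in timestamp order reflects the sequential table creation required by the claim, so the repaired node can rejoin without restarting the rest of the cluster.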