KR100363523B1

KR100363523B1 - Method for controlling distributed processing in cluster servers

Info

Publication number: KR100363523B1
Application number: KR1019990060750A
Authority: KR
Inventors: 이재혁 (Lee Jae-hyeok)
Original assignee: 주식회사 아라기술 (Ara Technology Co., Ltd.); 정보통신연구진흥원 (Institute of Information Technology Assessment)
Priority date: 1999-12-23
Filing date: 1999-12-23
Publication date: 2002-12-05
Also published as: KR20000012756A

Abstract

The present invention relates to a distributed processing control technique for clustering servers in which the servers automatically determine one main server for the distributed processing of Internet traffic and detect errors on any particular server through mutual monitoring, thereby achieving efficient distributed processing of traffic and stable cluster operation. Unlike the conventional approach, in which a single selected master server monitors and controls the activity of all slave servers participating in the cluster, no master server is selected in advance. Instead, every server broadcasts on the network, at regular intervals, a status check packet announcing that it is operating normally, and every server analyzes the broadcast packets to check the operating state of the other servers, so that the servers monitor one another. When an error occurs on any server, one of the servers is designated, according to a preset rule, as a temporary master server for error recovery control, and that server controls the configuration change and the error recovery. This achieves efficient distributed processing of traffic, error recovery, and stable cluster operation in a clustering server environment.

Description

Distributed processing control method in clustering server {METHOD FOR CONTROLLING DISTRIBUTED PROCESSING IN CLUSTER SERVERS}

The present invention relates to a method of controlling clustering servers in a distributed processing environment, and more particularly to a distributed processing control method for clustering servers that ensures that distributed processing and failure handling among a plurality of clustering servers proceed smoothly in an environment where Internet traffic is processed in a distributed manner.

In recent years, as Internet bandwidth has grown and the number of Internet users has increased exponentially, demand for distributed processing on Internet servers, especially web servers, has been rising. To meet this demand, server control methods have been proposed that distribute Internet traffic across several servers so that a large volume of user requests is handled efficiently, and that ensure stability by having another server take over when a server in the group malfunctions.

In particular, the biggest issue in a server cluster, where several servers (e.g., computers) share the workload, arises when one computer fails (i.e., the server goes down or its service process dies): the other computers must detect the failure and reconfigure themselves so that the packets that the dead server would have handled are accepted by the remaining servers still in operation.

This requires deciding two things: how each server will discover whether the other servers are currently up or have failed, and, because the response to a detected failure must be applied on every server at the same time (if the servers' configurations disagree, there will be intervals during which the cluster does not operate correctly), how the moment at which the new configuration takes effect will be synchronized.

In view of the above, a conventional method has been proposed, and is in practical use, in which one of the servers is selected as the master server and the others as slave servers, and Internet traffic is distributed accordingly.

FIG. 4 is a configuration diagram of a conventional clustering server to which the conventional method of controlling distributed processing is applied; it includes one master server 402 and a plurality of slave servers 404/1 to 404/n connected to it.

Referring to FIG. 4, the master server 402 distributes work to, and supervises, the slave servers connected to it: when an information request arrives at the server cluster from a user (not shown), the master server receives it and forwards it to one of the slave servers 404/1 to 404/n that it manages. That is, the master server 402 manages and controls the distribution of work among the slave servers while monitoring the load on each slave server 404/1 to 404/n.

The master server 402 also periodically checks whether each slave server 404/1 to 404/n it manages is operating normally. To do so, it sends a network request message to each slave server, and each slave server 404/1 to 404/n returns a response message to the master server 402 within a given time.

Accordingly, the master server 402 determines whether each slave server 404/1 to 404/n has failed based on whether a response message arrives from it. That is, if no response message arrives from a slave server within the given time, the master server 402 regards that slave server as failed (or as having an error) and ensures the stability of data processing by redistributing the failed slave server's work among the remaining slave servers that are operating normally.

In short, in the conventional method a single selected master server monitors the activity of the slave servers it manages (i.e., it queries each slave server and uses the responses to determine that server's current state) and, when a problem occurs (i.e., a particular slave server fails), handles it by redistributing the failed slave server's work to the remaining slave servers operating normally.
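For contrast with the invention, the conventional master-side loop can be sketched as below. This is an illustrative sketch only: `probe` stands in for the request/response exchange with a slave (a hypothetical callable), and round-robin reassignment of "jobs" is an assumption, since the text only says the failed slave's work goes to the remaining slaves.

```python
from typing import Callable, Dict, List

def poll_slaves(
    slaves: List[str],
    probe: Callable[[str, float], bool],  # hypothetical: True if the slave answered in time
    timeout_s: float = 1.0,
) -> List[str]:
    """Master side: return the slaves that answered the request within timeout_s."""
    return [s for s in slaves if probe(s, timeout_s)]

def redistribute(assignments: Dict[str, List[str]], alive: List[str]) -> Dict[str, List[str]]:
    """Move every non-responding slave's jobs onto live slaves, round-robin (assumed policy)."""
    if not alive:
        raise RuntimeError("no live slave left to take over the work")
    new = {s: list(assignments.get(s, [])) for s in alive}
    orphaned = [job for s, jobs in assignments.items() if s not in alive for job in jobs]
    for i, job in enumerate(orphaned):
        new[alive[i % len(alive)]].append(job)
    return new
```

Note that every decision runs through the one master; if it fails, nothing in this loop notices, which is the first problem described below.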

However, the conventional control method described above, which distributes Internet traffic using a master-slave structure fixed in advance among the servers, has the following problems.

First, because a single selected master server manages all the slave servers, if the master server itself fails there is no entity that can detect this in real time. Discovering the failure, selecting a new server as master, and resuming work is therefore very complicated and time-consuming.

Second, when one server fails and the other servers reconfigure, if they do not reconfigure at the same moment, parts of the Internet traffic range may be assigned to more than one server at once, causing malfunctions. The reconfiguration must therefore be synchronized (i.e., all participating servers must switch to the new execution environment together). Because a single master server must manage every server directly, it is hard to align the moment of action: the master has to communicate with each slave server in turn, which inevitably takes time, so the approach is difficult to apply to server clustering, an environment that changes in real time.

Third, if a server that was judged to have failed recovers on its own (for example, after a transient poor contact in a network line) while the master server is still confirming the error and carrying out recovery, handling this is complicated and inefficient: the server is excluded because of a momentary contact fault, its work is redistributed to the remaining servers, and the process must then be reversed when the server returns to normal.

Accordingly, the present invention is intended to solve the above problems of the prior art. Its object is to provide a distributed processing control method for clustering servers in which the servers automatically determine one main server for the distributed processing of Internet traffic and monitor one another for errors on any particular server, thereby achieving efficient distributed processing of traffic and stable cluster operation.

To achieve the above object, the present invention provides a method of controlling distributed processing, error monitoring, and recovery among servers in a clustering server environment having a plurality of servers interconnected through a network, the method comprising: a first step in which each of the servers participating in the cluster continuously broadcasts on the network, at a preset time interval, a status check packet carrying its current state; a second step in which each server checks the current operating state of the other servers by receiving and analyzing the status check packets; a third step in which, when an error is detected on any one of the servers, one of the remaining servers currently in operation is automatically designated as a temporary master server according to a preset rule; a fourth step in which the temporary master server generates a configuration change packet containing the start time of the configuration change and broadcasts it on the network; and a fifth step in which the temporary master server and the remaining servers currently in operation perform a synchronized configuration change based on the configuration change start time contained in the configuration change packet and on the state information of each server.

FIG. 1 is a configuration diagram of a clustering server suitable for applying the method of controlling distributed processing in a clustering server according to the present invention;

FIG. 2 is a flowchart illustrating the process of controlling distributed processing in a clustering server according to a preferred embodiment of the present invention;

FIG. 3 is a packet structure diagram showing an example of the status check or configuration change packet broadcast by each server; and

FIG. 4 is a configuration diagram of a conventional clustering server to which the conventional method of controlling distributed processing in a clustering server is applied.

<Description of reference numerals for the main parts of the drawings>

102: user; 104/1 - 104/3: servers

The above and other objects and advantages of the present invention will become more apparent to those skilled in the art from the preferred embodiments described below with reference to the accompanying drawings.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.

The key technical point of the present invention is as follows. Unlike the conventional method described above, in which one of the servers is selected as master and that master monitors and controls the activity of all slave servers, no master server is selected. Instead, every server broadcasts on the network, at a fixed interval (e.g., 1 second or 2 seconds), a packet (the status check packet) announcing that it is alive (i.e., operating normally). Every server analyzes these broadcast packets to check the operating state (normal or failed) of the other servers, so that the servers monitor one another. When an error occurs on any server, one of the servers (for example, the operational server with the lowest IP address) acts as a temporary master server and controls the configuration change needed for error recovery (i.e., redistributing the failed server's work to the other servers). These technical means readily achieve the object of the present invention.

FIG. 1 is a configuration diagram of a clustering server suitable for applying the method of controlling distributed processing in a clustering server according to the present invention.

Referring to FIG. 1, the clustering server consists of a plurality of servers interconnected through a network, for example three servers 104/1, 104/2 and 104/3. Unlike the conventional method described above, in which the master server is selected in advance, no master server is preselected here. Only when a special situation arises (i.e., an error occurs on some server) is one of the servers (for example, the operational server with the lowest IP address) automatically designated as the temporary master server, which then controls the configuration change, synchronization, and error recovery. The temporary master changes with the failure situation: if the temporary master server itself fails, the server with the lowest IP address among those still operating normally is designated as the new temporary master server.
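The "lowest IP address" rule can be sketched as a pure function that every surviving server evaluates on the same list of live peers, so all of them reach the same answer without exchanging extra messages. The sketch assumes IPv4 addresses and compares them numerically (string comparison would wrongly rank 10.0.0.10 below 10.0.0.9); the function name is hypothetical.

```python
import ipaddress
from typing import Iterable

def elect_temporary_master(live_servers: Iterable[str]) -> str:
    """Return the live server with the numerically smallest IPv4 address."""
    candidates = [ipaddress.IPv4Address(ip) for ip in live_servers]
    if not candidates:
        raise ValueError("no live server can act as temporary master")
    return str(min(candidates))
```

If the elected master later fails, calling the same function on the reduced live list yields the next candidate, which matches the re-election behaviour described above.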

Each server 104/1, 104/2, 104/3 therefore broadcasts its own status check packet on the network. When an error occurs on some server and a particular server is designated as the temporary master server, the servers perform the configuration change in synchronization with one another based on the configuration change packet broadcast by the temporary master server.

As shown by way of example in FIG. 3, the status check packet or configuration change packet that each server broadcasts on the network may include sending cluster number information 302, sending IP address information 304, receiving cluster number information 306, and control data information 308. The configuration change packet that the temporary master server broadcasts for error recovery additionally carries the time at which the configuration change starts. The control data information 308 field of a status check or configuration change packet carries the server's state information (operation check information or configuration change information) and other information related to running applications.
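The patent names the four fields of FIG. 3 but does not fix their sizes or encoding. The sketch below assumes one possible wire layout: the two cluster numbers as 4-byte virtual IPv4 addresses (the text equates cluster number with virtual IP address), the sender's real IPv4 address, a 2-byte length, and then the control data (field 308) as opaque bytes. All names and widths are assumptions.

```python
import ipaddress
import struct
from typing import NamedTuple

# Assumed layout, network byte order: fields 302, 304, 306 as IPv4 addresses,
# then a 2-byte length of the control data that follows (field 308).
HEADER = struct.Struct("!4s4s4sH")

class ClusterPacket(NamedTuple):
    src_cluster: str  # 302: sending cluster number (virtual IP)
    src_ip: str       # 304: sending server's own IP address
    dst_cluster: str  # 306: receiving cluster number (virtual IP)
    control: bytes    # 308: state info, or config change info incl. start time

def encode(p: ClusterPacket) -> bytes:
    return HEADER.pack(
        ipaddress.IPv4Address(p.src_cluster).packed,
        ipaddress.IPv4Address(p.src_ip).packed,
        ipaddress.IPv4Address(p.dst_cluster).packed,
        len(p.control),
    ) + p.control

def decode(data: bytes) -> ClusterPacket:
    src_c, src_ip, dst_c, n = HEADER.unpack_from(data)
    control = data[HEADER.size:HEADER.size + n]
    if len(control) != n:
        raise ValueError("truncated control data")
    return ClusterPacket(
        str(ipaddress.IPv4Address(src_c)),
        str(ipaddress.IPv4Address(src_ip)),
        str(ipaddress.IPv4Address(dst_c)),
        control,
    )
```

A variant with only one cluster number field, as allowed in the next paragraph, would simply drop one `4s` from the header.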

Although the status check packet shown by way of example in FIG. 3 has separate sending and receiving cluster numbers (i.e., virtual IP addresses), this is not required; depending on the state or structure of the cluster, the packet may carry only the sending cluster number or only the receiving cluster number.

In addition, in the present invention, once the clustering server is powered on, one of the participating servers is automatically determined as the responding server for each information request, based on the TCP/IP information of the requesting user, i.e., the user's IP address or port number. As shown in FIG. 1, an information request packet from the user 102 is delivered over the network to every server participating in the cluster, i.e., server 1 (104/1), server 2 (104/2), and server 3 (104/3). If, for example, server 3 (104/3) is determined to be the responding server, server 1 (104/1) and server 2 (104/2) decline to answer the request, and only server 3 (104/3) responds by sending the requested information to the user.
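The patent does not say how the user's IP address and port map to a responding server. One way to get the behaviour described, where every server sees every request but exactly one answers, is a deterministic hash over the client address, evaluated by each server against the same ordered list of live servers. The sketch below uses CRC-32 for that hash; the hash choice and function names are assumptions.

```python
import ipaddress
import zlib
from typing import Sequence

def responder_for(client_ip: str, client_port: int, live_servers: Sequence[str]) -> str:
    """Map a client (IP, port) to one live server; same inputs give same answer on every server."""
    ordered = sorted(live_servers, key=ipaddress.ip_address)  # common order on all servers
    if not ordered:
        raise ValueError("no live server")
    key = f"{client_ip}:{client_port}".encode()
    return ordered[zlib.crc32(key) % len(ordered)]  # crc32 is stable across processes

def should_answer(me: str, client_ip: str, client_port: int, live_servers: Sequence[str]) -> bool:
    """Each server runs this on every request it sees and ignores the ones it does not own."""
    return responder_for(client_ip, client_port, live_servers) == me
```

Because ownership is a function of the live-server list, every server must switch to the new list at the same moment after a failure; otherwise two servers could answer the same request, or none could, which is exactly the overlap problem the synchronized configuration change addresses.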

Next, the process of controlling distributed processing according to the present invention, using a clustering server configured as described above, is explained.

FIG. 2 is a flowchart illustrating the process of controlling distributed processing in a clustering server according to a preferred embodiment of the present invention.

Referring to FIG. 2, while the clustering server is powered on and operating, each server generates status check information about its own current state and broadcasts it on the network at a fixed interval, for example every second, and each server monitors the current state (i.e., operating or failed) of every other server based on the broadcast status check packets (step 202). As shown by way of example in FIG. 3, the broadcast status check packet may include sending cluster number information 302, sending IP address information 304, receiving cluster number information 306, and control data information 308.
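The sending half of step 202 can be sketched with a UDP socket that has `SO_BROADCAST` enabled. UDP broadcast, the port number, and the function names are assumptions; the patent only says the packet is broadcast on the network at a fixed interval. `build_packet` is called on every iteration so that each copy reflects the server's current state.

```python
import socket
import time
from typing import Callable, Optional, Tuple

STATUS_PORT = 40000  # hypothetical UDP port reserved for status check packets

def open_status_socket(port: int = STATUS_PORT, bind_addr: str = "") -> socket.socket:
    """UDP socket that may send broadcasts and is bound to receive status packets."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    sock.bind((bind_addr, port))
    return sock

def heartbeat_loop(
    sock: socket.socket,
    build_packet: Callable[[], bytes],
    dest: Tuple[str, int] = ("<broadcast>", STATUS_PORT),
    interval_s: float = 1.0,
    count: Optional[int] = None,
) -> None:
    """Step 202: send a fresh status check packet every interval_s seconds.

    count=None runs forever; a finite count is convenient for testing.
    """
    sent = 0
    while count is None or sent < count:
        sock.sendto(build_packet(), dest)
        sent += 1
        if count is None or sent < count:
            time.sleep(interval_s)
```

In practice the sender would run in its own thread while the same or another socket receives peers' packets for the monitoring half of step 202.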

Each server detects a failure on another server by checking whether that server has stopped broadcasting status check packets (step 204). If this check finds that a particular server, for example server 2 (104/2), has failed (i.e., has not broadcast a status check packet), the other servers currently operating normally, i.e., server 1 (104/1) and server 3 (104/3), increment a count value C, initially cleared to zero, by 1 (step 206) and then check whether C has reached a preset count value n (step 208). If C has not yet reached n, processing returns to step 202 and the subsequent steps are repeated.

The preset count value n may be set to, for example, 5. Setting n to 5 means checking whether a server has failed to broadcast its status check packet five times in a row. The purpose is to avoid declaring an error when a status check packet is not delivered properly because of traffic conditions, or when a server misses one or two broadcasts because of, for example, a poor contact.

In other words, if every gap of one or two status check broadcasts, or every reception failure, were treated as an error and triggered error recovery, the resulting large number of unnecessary recovery operations would degrade the servers' working efficiency; the threshold n prevents this.
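Steps 204 to 208 amount to a per-peer counter of consecutive missed status check packets with threshold n. The sketch below keeps one counter per peer and is driven once per interval with the set of peers heard from. Resetting a peer's counter whenever one of its packets arrives is an assumption, inferred from the "five times in a row" wording; class and method names are hypothetical.

```python
from typing import Dict, Iterable, List, Set

class HeartbeatMonitor:
    """Per-peer count of consecutive missed status check packets (steps 204-208)."""

    def __init__(self, peers: Iterable[str], n: int = 5) -> None:
        self.n = n
        self.missed: Dict[str, int] = {p: 0 for p in peers}  # C per peer, cleared to 0
        self.failed: Set[str] = set()

    def tick(self, heard_from: Iterable[str]) -> List[str]:
        """Call once per interval with the senders seen in that interval.

        Returns the peers that have just reached n consecutive misses.
        """
        heard = set(heard_from)
        newly_failed: List[str] = []
        for peer in self.missed:
            if peer in self.failed:
                continue                     # already reported; wait for reconfiguration
            if peer in heard:
                self.missed[peer] = 0        # assumption: any packet breaks the run
            else:
                self.missed[peer] += 1       # step 206: C = C + 1
                if self.missed[peer] >= self.n:  # step 208: has C reached n?
                    self.failed.add(peer)
                    newly_failed.append(peer)
        return newly_failed
```

Seen from server 1, a peer that misses four intervals, reappears once, and then goes silent is only reported after five further silent intervals, which is how transient faults such as a brief poor contact are absorbed.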

If the check in step 208 finds that C has reached n, for example because server 2 (104/2) has failed to broadcast a status check packet five times in a row, server 1 (104/1) and server 3 (104/3) conclude that server 2 (104/2) has failed and carry out configuration change control to recover from the error on server 2 (104/2). That is, either server 1 (104/1) or server 3 (104/3) is designated as the temporary master server that supervises error recovery; for example, according to a preset rule, the server with the lowest IP address (e.g., server 1 (104/1)) is designated as the temporary master server (step 210).

Server 1 (104/1), now the temporary master server, generates a configuration change packet containing the start time of the configuration change (i.e., time information announcing that the configuration change will begin in a certain number of seconds) and broadcasts it on the network (step 212). It then decrements a preset configuration change control count value C by 1 (step 214) and checks whether C has reached zero (step 216). If C has not yet reached zero, processing returns to step 212 and the subsequent steps are repeated.

The preset configuration change control count value C may be set to, for example, 5. Setting C to 5 means that the configuration change packet for recovering from the failed server is broadcast five times in succession at a fixed interval (e.g., 1 second). This guards against the case where operating servers other than the temporary master fail to receive a configuration change packet; broadcasting the packet repeatedly at a fixed interval makes the configuration change for error recovery more reliable.
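Steps 212 to 216 can be sketched as a burst of C copies of the configuration change packet. Because the text describes the start time as "in a certain number of seconds", each copy here carries the remaining delay to a common start moment rather than an absolute timestamp, so receivers need no synchronized wall clocks (transmission delay is ignored). The JSON payload, the 10-second lead time, and the injectable `send`/`now`/`sleep` callables are all assumptions made for illustration and testing.

```python
import json
import time
from typing import Callable, List

def broadcast_config_change(
    send: Callable[[bytes], None],
    failed_server: str,
    live_servers: List[str],
    repeats: int = 5,            # preset configuration change control count C
    interval_s: float = 1.0,     # gap between copies
    lead_time_s: float = 10.0,   # hypothetical margin before the switch
    now: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Temporary master: send `repeats` copies of the config change packet.

    Returns the start time on the sender's own clock.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    if lead_time_s <= (repeats - 1) * interval_s:
        raise ValueError("lead time must outlast the whole broadcast burst")
    start_at = now() + lead_time_s
    for i in range(repeats):
        payload = json.dumps({
            "type": "config_change",
            "failed": failed_server,
            "live": sorted(live_servers),
            "starts_in": max(0.0, start_at - now()),  # remaining delay for this copy
        }).encode()
        send(payload)                 # step 212: broadcast one copy
        if i < repeats - 1:           # steps 214-216: C = C - 1, stop when C reaches 0
            sleep(interval_s)
    return start_at
```

The lead time has to exceed the duration of the burst, (C - 1) times the interval, so that even the last copy arrives before the switch.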

If the check in step 216 finds that C has reached zero, each server currently in operation performs a synchronized configuration change using the configuration change start time contained in the configuration change packet and the state information of each server that it has previously received and stored. Each server thus performs the configuration change in synchronization with the temporary master server (step 218).

When the configuration change is complete, the temporary master server has the failed server's work redistributed among the other operating servers according to their workloads, thereby recovering from the error. At this point, the count value C incremented by each server and the count value C decremented by the temporary master server are both reset to their initial values (0 and 5, respectively).
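On the receiving side, step 218 can be sketched as: take the first copy of the configuration change packet, convert its remaining delay into a deadline on the local monotonic clock, ignore later duplicate copies, and switch to the new live-server list only when the deadline passes. This sketch assumes the same hypothetical JSON payload (`live`, `starts_in`) used for illustration above and handles only one pending change at a time; the class name is hypothetical.

```python
import json
import time
from typing import Callable, List, Optional

class ConfigApplier:
    """Receiver side of step 218: apply the new server set at the common start time."""

    def __init__(self, live: List[str], now: Callable[[], float] = time.monotonic) -> None:
        self.live = sorted(live)
        self.now = now
        self.pending: Optional[List[str]] = None
        self.deadline: Optional[float] = None

    def on_config_change(self, payload: bytes) -> None:
        msg = json.loads(payload)
        if self.pending is None:  # first copy fixes the deadline; duplicates are ignored
            self.pending = sorted(msg["live"])
            self.deadline = self.now() + float(msg["starts_in"])

    def poll(self) -> bool:
        """Switch to the pending server set once the deadline is reached; True if switched."""
        if self.pending is not None and self.deadline is not None and self.now() >= self.deadline:
            self.live, self.pending, self.deadline = self.pending, None, None
            return True
        return False
```

Since all servers switch at (approximately) the same moment, any per-request ownership rule derived from `live` moves the failed server's share of traffic to the survivors without a window in which two servers own the same traffic.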

Although the preferred embodiment above describes an error occurring on an arbitrary server participating in the cluster, the same process can of course be used to recover from an error on the server that has been designated as the temporary master server.

Furthermore, although the preferred embodiment describes checking for status check packets at 1-second intervals five times in a row and broadcasting the configuration change packet five times in succession when an error occurs, the present invention is not limited to these values; those skilled in the art will readily understand that they can be adjusted according to traffic conditions (or the probability of packet loss).

For example, in a network environment that requires faster failure detection, the reception check interval can be reduced below 1 second, and in a network environment where packet loss is relatively frequent, the number of status check packets and configuration change packets can be increased, improving adaptability to the network environment.
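The trade-off between these parameters can be estimated roughly. Assuming, as a simplification not stated in the patent, that each packet is lost independently with probability p, a check interval Δt and threshold n give an approximate detection time and an approximate probability that a healthy server is wrongly declared failed in a given window, and C repeated configuration change packets give the probability that a particular server misses all of them:

```latex
T_{\text{detect}} \approx n\,\Delta t,
\qquad
P_{\text{false}} \approx p^{\,n},
\qquad
P_{\text{cfg}} \approx p^{\,C}
```

With the embodiment's values Δt = 1 s and n = C = 5, detection takes about 5 s, and for an illustrative loss rate p = 0.1 both probabilities are about 10^{-5}. Halving Δt to 0.5 s halves the detection time without changing P_false, while raising n lowers P_false at the cost of slower detection. Real packet losses are often bursty, so these figures are optimistic.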

As described above, according to the present invention, unlike the conventional method in which a single selected master server monitors and controls the activity of all slave servers participating in the cluster, no master server is selected. Every server broadcasts, at fixed intervals, a status check packet announcing that it is operating normally, and every server analyzes these packets to check the operating state (normal or failed) of the other servers, so that the servers monitor one another. When an error occurs on any server, one of the servers is designated, according to a preset rule, as a temporary master server for error recovery control, and it controls the configuration change and error recovery. This achieves efficient distributed processing of traffic, error recovery, and stable cluster operation in a clustering server environment.

In addition, the distributed processing control method in a clustering server according to the present invention recognizes an error only after a status check packet has failed to arrive a preset number of times, n, and only then performs the error recovery process. This effectively prevents the servers' working efficiency from being degraded by unnecessary error recovery. In other words, it achieves distributed processing and control that adapt to the network environment.

Claims

1. A method for controlling distributed processing, and for monitoring and recovering from errors between servers, in a clustering server environment having a plurality of servers interconnected through a network, the method comprising:

A first step in which each of the plurality of servers participating in clustering continuously broadcasts a status check packet carrying its current status on the network at predetermined time intervals;

A second step in which each server receives and analyzes the status check packets to check the current operating state of every server other than itself;

A third step of automatically determining any one of the remaining servers currently operating as a temporary master server according to a predetermined protocol when an error is detected in one of the plurality of servers;

A fourth step in which the temporary master server generates a configuration change packet containing the start time of the configuration change and broadcasts it on the network; and

A fifth step in which the temporary master server and the remaining operating servers perform the configuration change in synchronization, based on the configuration change start time contained in the configuration change packet and on the state information of each server.
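The following is a minimal sketch of the fourth and fifth steps. It assumes the servers' clocks are synchronized (for example by NTP), so that an absolute start time refers to the same instant on every server. The packet layout, the lead time, and the names `ConfigChangePacket`, `make_config_change`, `delay_until_start`, and `apply_change` are hypothetical. The claim only requires that the packet carry the configuration change start time.

```python
import struct
import time
from dataclasses import dataclass
from typing import Optional

# Hypothetical wire layout (not given in the patent): an 8-byte float holding the
# start time in Unix seconds, then the 4-byte packed IPv4 address of the failed server.
CONFIG_FORMAT = "!d4s"


@dataclass(frozen=True)
class ConfigChangePacket:
    start_time: float  # moment at which every server applies the change
    failed_ip: bytes   # packed IPv4 address of the server being removed

    def encode(self) -> bytes:
        return struct.pack(CONFIG_FORMAT, self.start_time, self.failed_ip)

    @classmethod
    def decode(cls, data: bytes) -> "ConfigChangePacket":
        start_time, failed_ip = struct.unpack(CONFIG_FORMAT, data)
        return cls(start_time, failed_ip)


def make_config_change(failed_ip: bytes, lead_time_s: float,
                       now: Optional[float] = None) -> ConfigChangePacket:
    """Fourth step (temporary master): schedule the change lead_time_s from now."""
    now = time.time() if now is None else now
    return ConfigChangePacket(now + lead_time_s, failed_ip)


def delay_until_start(packet: ConfigChangePacket, now: float) -> float:
    """Fifth step: how long a server (master included) waits before applying."""
    return max(0.0, packet.start_time - now)


def apply_change(active: frozenset, packet: ConfigChangePacket) -> frozenset:
    """Fifth step: the new membership is the old one minus the failed server."""
    return active - {packet.failed_ip}
```

The lead time should exceed the time the packet needs to reach every server, including any rebroadcasts. In this sketch, a server that receives the packet after the start time applies the change immediately (a delay of zero).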

2. The distributed processing control method in a clustering server of claim 1, wherein, when an error is detected in any server, the server having the smallest IP address among the remaining operating servers is automatically determined as the temporary master server.
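The rule in claim 2 can be sketched in a few lines. Every server applies the same deterministic rule to the same list of operating servers, so all of them choose the same temporary master without exchanging extra messages. This holds provided their membership views agree, which is an assumption the sketch does not check. The function name `elect_temporary_master` is hypothetical.

```python
import ipaddress


def elect_temporary_master(all_ips, failed_ips):
    """Return the remaining operating server with the numerically smallest IPv4 address."""
    candidates = set(all_ips) - set(failed_ips)
    if not candidates:
        raise ValueError("no operating servers left")
    # Compare as addresses, not strings, so that 10.0.0.9 < 10.0.0.12.
    return str(min(ipaddress.IPv4Address(ip) for ip in candidates))
```

Comparing `ipaddress.IPv4Address` objects instead of strings matters here. As strings, `"10.0.0.12"` sorts before `"10.0.0.9"`, which would select the wrong server.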

3. The method of claim 1 or 2, wherein the second step comprises:

A twenty-first step of increasing the current count value by one if the status check packet from any server is not received;

A twenty-second step of checking whether the current count value reaches a preset count value;

A twenty-third step of repeating the first step if it is determined that the current count value has not reached the preset count value; and

A twenty-fourth step of proceeding to the third step when the current count value reaches the preset count value.

4. The method of claim 3, wherein the broadcast time interval of the status check packet can be changed according to the traffic environment of the network.

5. The distributed processing control method of claim 3, wherein the number of status check packet receptions that is checked (the preset count value) can be changed according to the relative packet loss probability of the network environment.

6. The method of claim 1 or 2, wherein the fourth step comprises:

A forty-first step of generating and broadcasting the configuration change packet and then decreasing the preset configuration change control count value by one;

A forty-second step of checking whether the decreased configuration change control count value has reached zero;

A forty-third step of repeating the forty-first and forty-second steps if it is determined that the decreased configuration change control count value has not reached zero; and

A forty-fourth step of proceeding to the fifth step when the decreased configuration change control count value reaches zero.
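Steps 41 to 44 form a countdown loop around the broadcast. In the sketch below, `send` is a hypothetical callback that stands in for whatever actually puts the packet on the network, and `repeat_count` is the preset configuration change control count.

```python
from typing import Callable


def broadcast_config_change(packet: bytes, repeat_count: int,
                            send: Callable[[bytes], None]) -> int:
    """Rebroadcast the configuration change packet, decreasing a control count
    until it reaches zero. Returns how many times the packet was sent."""
    if repeat_count < 1:
        raise ValueError("repeat_count must be at least 1")
    remaining = repeat_count
    sent = 0
    while True:
        send(packet)        # step 41: broadcast the packet ...
        sent += 1
        remaining -= 1      # ... then decrease the control count by one
        if remaining == 0:  # step 42: has the count reached zero?
            return sent     # step 44: proceed to the fifth step
        # step 43: not yet zero, so repeat steps 41 and 42
```

Because the same packet may arrive several times, receivers would likely need to discard duplicates, for example by remembering which start times they have already scheduled.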

7. The distributed processing control method according to claim 6, wherein the number of broadcasts of the configuration change packet can be changed according to a relative packet loss probability of the network environment.