WO2020158968A1

WO2020158968A1 - Method for extending task node in hybrid p2p cluster system

Info

Publication number: WO2020158968A1
Application number: PCT/KR2019/001245
Authority: WO
Inventors: 김학철; 강조현; 정태균; 한혁; 진성일
Original assignee: ㈜리얼타임테크
Priority date: 2019-01-29
Filing date: 2019-01-30
Publication date: 2020-08-06
Also published as: KR102112047B1

Abstract

The present invention relates to a technique which performs an operation of adding a new task node while maintaining a service for a client as much as possible in a hybrid P2P distributed cluster system, and automatically performs load balancing processing among task nodes including the new task node in consideration of a data loading state of an existing task node.

Description

A method of extending a work node in a hybrid P2P cluster system

The present invention relates to a technique that adds a new work node to a hybrid P2P distributed cluster system while maintaining service to clients as far as possible, and that automatically performs load balancing among the work nodes, including the new work node, in consideration of the data load state of the existing work nodes.

A cluster system processes a single task collectively through a set of networked computers. A supercomputer connects multiple CPUs internally, with each CPU designed to access a common virtual memory; the biggest difference is that a cluster system implements each connection through a network.

Since a cluster system can be built from general-purpose servers, it is less expensive to build than a supercomputer. The system can be sized to the needs of its users, and its performance can easily be improved by increasing the number of connected servers. On the other hand, it is harder to maintain than a single system, and its performance is strongly affected by network performance.

Such a cluster system extends a system by connecting homogeneous or heterogeneous nodes through a network, and the individual systems in a cluster are called servers or nodes. A load balancing cluster system is composed of a load balancer, which allocates client requests to the nodes in the cluster, and nodes that service those requests. When a service request arrives, the load balancer selects an appropriate node based on a specific algorithm, and that node processes the request. In other words, a plurality of nodes do not divide and process a single task; instead, the node selected by the distribution algorithm processes every task assigned to it. Unlike a parallel system, therefore, all incoming requests are allocated evenly across multiple nodes so that the load is distributed.

Two methods are mainly used to manage such a load balancing cluster system: a master-slave method and a P2P method. A master-slave distributed cluster management system is composed of a master node that manages the cluster system and slave nodes that process the work. In a P2P distributed cluster management system, all work nodes process work at an equal level.

In a conventional load balancing cluster system using the master-slave method, requests reach the slave nodes in the distributed environment only by way of the master node, which degrades processing performance, and when the master node fails, the entire cluster must be restarted.

In addition, the load balancing cluster system to which the conventional P2P method is applied has a problem in that additional work is required to select a master node among work nodes when performing a task for cluster management.

On the other hand, recently, a hybrid P2P cluster system has been proposed that is composed of a coordinator node and a plurality of work nodes, and the coordinator node performs only a role of managing work nodes and performs work through collaboration of only work nodes.

The hybrid P2P cluster system combines the advantages of the master-slave method and the P2P method. The client connects to the work node designated by the coordinator server and issues its work request, and the work node connected to the client distributes the data of a given table received from the client across multiple other work nodes for storage.

However, because the work nodes distribute and store table data among one another according to a predetermined rule, a situation arises in which a work node must be added, either because storage capacity is exceeded after a certain period of time or because data is stored unevenly across the work nodes.

In order to add a new work node, it is necessary to perform the procedure of registering the work node, and it is common to temporarily suspend the cluster system and perform such configuration processing for service stability.

However, in order to maintain quality of service for clients, a method of adding a new work node while stopping the cluster system as little as possible is required.

Accordingly, the present invention was created in view of the above circumstances, and its technical purpose is to provide a method for extending a work node in a hybrid P2P cluster system that can add a new work node while maintaining service to clients in a hybrid P2P distributed cluster system.

Another technical purpose of the present invention is to provide a method for extending a work node in a hybrid P2P cluster system that automatically performs load balancing among the work nodes, including the new work node, in consideration of the data load state of the existing work nodes, and then operates the cluster system, including the new work node, normally.

According to an aspect of the present invention for achieving the above object, there is provided a method for extending a work node in a hybrid P2P cluster system, in which a cluster is formed from a plurality of work nodes and a coordinator server that manages the work nodes so that work requested by a client is processed simultaneously in a distributed environment, the method comprising: a first step in which, when request information for adding a new work node is received from the outside, the coordinator server rejects new connection requests from clients, continues to provide ongoing services without interruption to clients connected to the existing work nodes, and updates the metadata to reflect the access information of the new work node; a second step in which the coordinator server remotely copies the metadata updated in the first step to the new work node, starts the new work node, and sets the state of the new work node to a standby state; a third step in which, when the coordinator server determines that all ongoing services other than the search service have been completed for the clients currently connected to all existing work nodes, a load balancing operation is performed among all work nodes including the new work node; a fourth step in which the coordinator server updates the metadata according to the result of the load balancing operation and synchronizes the updated metadata with all work nodes including the new work node; and a step in which the coordinator server changes the status information of the new work node to an operating state on all work nodes including the new work node, operates all work nodes including the new work node as a cluster, and resumes service to clients.

In addition, there is provided a method for extending a work node in a hybrid P2P cluster system in which the third step comprises: the existing work nodes rejecting newly requested data add/update/delete requests from currently connected clients while serving newly requested data search requests without interruption; and, once the data add/update/delete services in progress are completed, transmitting to the coordinator server ongoing-service completion information indicating that the work node is ready to perform the load balancing operation.

In addition, there is provided a method for extending a work node in a hybrid P2P cluster system in which the existing work nodes are configured to create a preset number of copies (K) and store them in different work nodes, and the first step comprises: the coordinator server managing only the access information of the new work nodes until the number of new work nodes reaches a multiple of (K+1), that is, one more than the preset number of copies (K); and, when the addition of new work node information corresponding to a multiple of (K+1) is complete, rejecting new connection requests from clients and requesting that service be provided without interruption to clients currently connected to the existing work nodes.

In addition, there is provided a method for extending a work node in a hybrid P2P cluster system in which the third step comprises: the coordinator server collecting data loading status information, including the per-table size, total memory size, and current memory usage of the existing work nodes; determining, based on the data loading status information of each existing work node, data partitioning information that includes the partition target work node from which data is to be moved, the data capacity of the data selected for partitioning, and the storage target work node that is to store that data; and moving the data based on the determined data partitioning information.

In addition, there is provided a method for extending a work node in a hybrid P2P cluster system in which, in the fourth step, the new work node replicates the data moved to it and stores the replica in another new work node.

In addition, there is provided a method for extending a work node in a hybrid P2P cluster system in which, in determining the data partitioning information, the coordinator server calculates an average data capacity by dividing the total data capacity stored in all existing work nodes by the sum of the number of existing work nodes and new work nodes, allows a margin of error of a predetermined value relative to the average data capacity, and sets the existing work nodes that exceed this as partition target work nodes.

Further, there is provided a method for extending a work node in a hybrid P2P cluster system in which, in determining the data partitioning information, the data capacity and the storage target work node are determined by dividing the key value range of the table selected for partitioning by the total number of storage target work nodes.

Further, there is provided a method for extending a work node in a hybrid P2P cluster system in which, in determining the data partitioning information, the data capacity and the storage target work node are determined by dividing the key value list into as many groups as there are storage target work nodes, in consideration of the number of data items corresponding to each key value of the table selected for partitioning.

Further, there is provided a method for extending a work node in a hybrid P2P cluster system in which, in determining the data partitioning information, the data capacity and the storage target work node are determined by dividing the range of hash values, calculated by applying a hash function to the key values of the table selected for partitioning, by the number of storage target work nodes.

In addition, there is provided a method for extending a work node in a hybrid P2P cluster system in which, in the fourth step, the coordinator server generates data partitioning log information, including the number of new work nodes, the partition target work node number, and the storage target work node number, in response to the load balancing operation, manages it as metadata, and synchronizes it with the work nodes. For a data search query from a client that includes a specific key value, the work node checks the data partitioning log information of the queried table for that key value. If the data being searched for is data from before the partition, the work node obtains a search target work node number as the remainder of dividing the hash value by the number of work nodes that existed before the new work node was added. If the obtained search target work node number is a partition target work node number, the work node propagates the data search query to the work node corresponding to the storage target work node number in the data partitioning log information, thereby providing the service requested by the client.
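The lookup for pre-partition data described above can be sketched as follows. This is only an illustration under stated assumptions: the record shape `SplitLogEntry`, the function names, the 0-based node numbering, and the use of `zlib.crc32` as the hash function are all hypothetical, since the text does not specify them.

```python
from __future__ import annotations

import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitLogEntry:
    """Hypothetical shape of one data partitioning log record."""
    new_node_count: int  # number of new work nodes that were added
    source_node: int     # partition target work node number (data moved away)
    target_node: int     # storage target work node number (data moved here)


def key_hash(key: str) -> int:
    # crc32 is a stand-in; the source does not name the hash function.
    return zlib.crc32(key.encode("utf-8"))


def route_pre_split_query(key: str, existing_node_count: int,
                          log: list[SplitLogEntry]) -> int:
    """Return the work node number that should serve a lookup for
    data written before the partition (0-based numbering assumed)."""
    # Remainder of the hash divided by the pre-extension node count.
    node = key_hash(key) % existing_node_count
    # If that node was a partition target, forward to the storage target.
    for entry in log:
        if entry.source_node == node:
            return entry.target_node
    return node
```

With an empty log the query stays on the node given by the remainder; once a log entry names that node as a partition target, the query is forwarded to the entry's storage target node.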

According to the present invention, the work node extension process can be performed by automatically recognizing a newly added work node, while service to clients continues as far as possible without stopping the running system.

Further, according to the present invention, the data stored in the existing work node is distributed to other work nodes to prevent overload generated in a specific work node, so that the cluster can be operated more efficiently.

FIG. 1 is a diagram illustrating a schematic configuration of a hybrid P2P cluster system to which the present invention is applied.

FIG. 2 is a view for explaining the configuration of the coordinator server 100 and the work node 200 shown in FIG. 1.

FIG. 3 is a flowchart illustrating a method for extending a work node in a hybrid P2P cluster system according to a first embodiment of the present invention.

FIGS. 4A to 4C are views schematically showing the operation of FIG. 3, in which the original 1 table stored in work node 1 is partitioned and stored in new work node 1.

FIG. 5 is a view for explaining in more detail the load balancing operation (ST500) of FIG. 3.

FIGS. 6A to 6C are diagrams for explaining a method (ST520) for determining partitioning information in FIG. 5.

FIG. 7 is a diagram illustrating data split log information managed as metadata for the load balancing operation ST500 of FIG. 3.

FIG. 8 is a diagram for explaining a query operation for a specific key value of a hash-based partition table using partition log information shown in FIG. 7.

The configurations shown in the embodiments and drawings described herein are only preferred embodiments of the present invention and do not represent all of its technical spirit, so the scope of the present invention should not be construed as limited by the embodiments and drawings described in the text. That is, since the embodiments can be changed in various ways and take various forms, the scope of the present invention should be understood to include equivalents capable of realizing its technical ideas. In addition, the purposes and effects presented herein do not mean that a specific embodiment must include all of them or only those effects, and the scope of the present invention should not be understood as being limited thereby.

All terms used herein have the same meaning as generally understood by a person skilled in the art to which the present invention pertains, unless otherwise defined. The terms defined in the commonly used dictionary should be interpreted to be consistent with meanings in the context of related technologies, and cannot be interpreted as having ideal or excessively formal meanings that are not explicitly defined in the present invention.

Referring to FIG. 1, a hybrid P2P cluster system to which the present invention is applied forms a cluster from a coordinator server 100 and a plurality of work nodes 200 in order to simultaneously process tasks requested by a client 300 in a distributed environment. The coordinator server 100 may additionally be configured with a replica server. In addition, although only one client 300 is shown, there may be a plurality of clients 300.

The coordinator server 100 manages the work nodes 200 by synchronizing the metadata of each work node 200 through a cluster operation process.

The work node 200 distributes and processes the work requested by the client 300 using metadata stored in the local system.

That is, in the hybrid P2P cluster system, the client 300 sends a connection request to the coordinator server 100, and the coordinator server 100 uses the information on the plurality of work nodes 200 to determine the work node 200 to connect to and provides it to the client 300. The client 300 connects to the work node 200 received from the coordinator server 100 and issues its work request, and the work node 200 connected to the client 300 distributes the data of a given table received from the client 300 across a number of other work nodes 200 and manages it there.

At this time, data may become concentrated in an arbitrary work node 200; in the present invention, a new work node can then be added and operated as part of the cluster.

FIG. 2 is a view for explaining the configuration of the coordinator server 100 and the work node 200 shown in FIG. 1.

Referring to FIG. 2, the coordinator server 100 includes a cluster manager 110, a setting information storage unit 120, and a DBMS engine 130, and the DBMS engine 130 includes a metadata table 131.

The cluster manager 110 drives the DBMS engine 130 of the local system, obtains the connection information of the work nodes 200 using the cluster setting information, copies the cluster setting information to the work nodes 200, and remotely drives the work nodes 200. At this time, the cluster manager 110 starts the cluster agent 210 of the work node 200, and the DBMS engine 220 of that work node's local system is driven together with the cluster agent 210, so that the work node 200 enters an operating state in which it can provide service. In this state, the DBMS engine 220 of the work node processes work in a P2P manner with the DBMS engines 220 of the other work nodes 200.

In addition, the cluster manager 110 transmits the cluster configuration file to the work nodes 200; when the cluster configuration is changed, the cluster configuration file is modified and is automatically distributed to all work nodes 200 when the cluster is stopped and restarted. The cluster manager 110 also connects or disconnects the client 300 and a work node 200 at the request of the client 300, and synchronizes and shares metadata with the work nodes 200.

In addition, the cluster manager 110 tracks the data distribution information of the work nodes 200 and, if necessary, performs the extension task for a new work node 200 and a load balancing task that uses the added work node.

The setting information storage unit 120 stores cluster setting information, including the coordinator server connection information, the connection information of the work nodes, and the number of copies.

The DBMS engine 130 manages metadata for processing a user request operation in a peer-to-peer manner using only information from a local system including status information of a work node.

Meanwhile, the work node 200 includes the cluster agent 210 and the DBMS engine 220.

The cluster agent 210 drives the DBMS engine 220 of the local system, checks the status information of the work node 200 by exchanging a heartbeat protocol with the cluster manager 110, and reflects the status information in the metadata in real time.

The DBMS engine 220 generates table information for the table creation request requested by the client 300 and transmits the result of the table creation processing to the coordinator server 100. Then, the table information is shared with the DBMS engine 220 of other work nodes 200.

In addition, the DBMS engine 220 includes a metadata storage 221 that stores metadata including table information and status information of a work node.

Hereinafter, a method for extending a work node according to the present invention will be described using the hybrid P2P type cluster system illustrated in FIGS. 1 and 2.

FIG. 3 is a flowchart illustrating a method of extending a work node in a hybrid P2P cluster system according to a first embodiment of the present invention. FIG. 3 describes an operation of distributing data stored in an existing work node to a new work node and storing it there.

First, the cluster manager 110 monitors the status of the existing work nodes 200 and notifies the administrator when a preset work node addition condition is satisfied, and the administrator prepares a new work node 200-1 (to distinguish it from the existing work node, "-1" is appended to the corresponding reference numerals). The administrator then requests the cluster manager 110 to add the new work node 200-1, either by direct input through an input means (not shown) of the cluster manager 110 or through a separate administrator terminal.

When the request information for adding a new work node is received from the outside (ST100), the cluster manager 110 sets a work node extension mode in which new connection requests from the client 300 are rejected and ongoing services through the existing work nodes 200 are provided without interruption (ST200).

In step ST200, the cluster manager 110 manages only the access information of the new work nodes 200-1 until the number of new work nodes 200-1 to be added reaches a multiple of (K+1), that is, one more than the preset number of copies K.

In addition, when new work node information amounting to a multiple of (K+1) has been added, the cluster manager 110 requests the currently connected cluster agents 210 to add the information on the newly added work nodes 200-1. The cluster agent 210 of each existing work node 200 passes this request to its DBMS engine 220, and the DBMS engine 220 rejects requests from the client 300 that would change the metadata, such as table creation requests, and adds only the new work node 200-1 information to the metadata. The cluster agent 210 then notifies the cluster manager 110 of the coordinator server 100 that the information on the new work node 200-1 has been added. The cluster manager 110 updates the metadata once all metadata change operations in progress, such as DDL operations and work node status changes, have completed.
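The batching rule in step ST200 can be illustrated with a minimal sketch. The class `PendingNodeQueue` and its method names are hypothetical; the sketch only shows that access information is held back until a full group of (K+1) new work nodes has been registered, so that each new node's data and its K copies can be placed on different new nodes.

```python
from __future__ import annotations


class PendingNodeQueue:
    """Holds access information of new work nodes until a group of
    (K+1) nodes is complete (names are illustrative, not from the source)."""

    def __init__(self, copies_k: int) -> None:
        self.group_size = copies_k + 1
        self.pending: list[str] = []

    def register(self, access_info: str) -> list[str]:
        """Record one new node; return the complete group when the
        pending count reaches (K+1), otherwise an empty list."""
        self.pending.append(access_info)
        if len(self.pending) == self.group_size:
            batch, self.pending = self.pending, []
            return batch
        return []
```

For K = 2, the first two registrations return an empty list and the third returns all three access entries, which is the point at which the text has the cluster manager propagate the new node information to the existing cluster agents.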

Subsequently, the cluster manager 110 drives the new work node 200-1 and sets the new work node 200-1 to a standby state (ST300). That is, the cluster manager 110 remotely copies the metadata to the DBMS engine 220-1 of the new work node 200-1 and drives the cluster agent 210-1 of the new work node 200-1. The cluster agent 210-1 of the new work node 200-1 starts the DBMS engine 220-1 and then transmits to the cluster manager 110 operation setting completion information indicating that it is ready to participate in the P2P cluster system.

Subsequently, when it is confirmed that all ongoing services other than the search service have been completed on all existing work nodes 200 (ST400), a load balancing operation is performed among all work nodes including the new work node 200-1 (ST500).

That is, step ST400 includes: the cluster manager 110 transmitting load balancing preparation request information to the cluster agents 210 of the existing work nodes 200; the DBMS engine 220 that receives this request through its cluster agent 210 rejecting newly requested data add/update/delete requests from the client 300 while serving newly requested data search requests without interruption; and the DBMS engine 220, once the data add/update/delete services currently in progress are complete, transmitting ongoing-service completion information to the cluster manager 110 to report that it is ready to perform the load balancing operation.
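The behavior of a work node's DBMS engine during step ST400 can be sketched as a small state holder. Everything named here (`Mode`, `WorkNodeEngine`, the operation strings) is an assumption made for illustration; the sketch only captures the rule that new writes are rejected, reads keep flowing, and readiness is reported once in-flight writes drain.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    LOAD_BALANCING_PREP = "load_balancing_prep"


WRITE_OPS = {"INSERT", "UPDATE", "DELETE"}


class WorkNodeEngine:
    """Illustrative stand-in for a work node's request gate."""

    def __init__(self) -> None:
        self.mode = Mode.NORMAL
        self.in_flight_writes = 0

    def prepare_for_load_balancing(self) -> None:
        # Called when the load balancing preparation request arrives.
        self.mode = Mode.LOAD_BALANCING_PREP

    def accept(self, op: str) -> bool:
        """Return True if the request is accepted for processing."""
        if op.upper() in WRITE_OPS:
            if self.mode is Mode.LOAD_BALANCING_PREP:
                return False  # new add/update/delete requests are rejected
            self.in_flight_writes += 1
        return True  # search requests are always served

    def finish_write(self) -> None:
        self.in_flight_writes -= 1

    def ready_for_load_balancing(self) -> bool:
        # True once every write accepted before the request has completed.
        return (self.mode is Mode.LOAD_BALANCING_PREP
                and self.in_flight_writes == 0)
```

In this sketch, the completion information sent to the cluster manager would correspond to `ready_for_load_balancing()` turning true.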

Meanwhile, the load balancing operation of the ST500 step will be described in detail with reference to FIGS. 3 to 6 below.

Subsequently, when the load balancing operation for all work nodes is completed in step ST500, the cluster manager 110 updates the metadata according to the result of the load balancing operation and synchronizes the updated metadata with all work nodes including the new work node (ST600).

That is, the cluster manager 110 receives data partitioning information from the cluster agent 210 of the existing work node 200 that moved data in step ST500 and from the cluster agent 210-1 of the new work node 200-1 to which the data was moved, and updates the metadata on that basis.

Then, the cluster manager 110 requests metadata update to all work nodes 200 and receives the update result for metadata from each work node 200.

At this time, the DBMS engine 220 of an existing work node 200 that has a currently connected client 300 and whose data is to be partitioned stops accepting SELECT queries from the client 300, completes the SELECT queries currently executing, then updates the metadata, deletes the moved data from the corresponding table, and transmits the processing result.

In addition, in the case of a work node that does not perform data movement among the existing work nodes 200, metadata is updated and the result is transmitted to the cluster manager 110.

Then, the DBMS engine 220-1 of the new work node 200-1 stores the data moved from the DBMS engine 220 of the existing work node 200, updates the metadata, and transmits the result to the cluster manager 110.

As described above, in the present invention, operations that add new data are suspended while the load balancing operation is in progress, but data search requests continue to be served without interruption.

When the above operations have been performed, the metadata reflects the data distribution information for all work nodes including the new work node.

In this state, the cluster manager 110 sets the new work node 200-1 to an operating state, updates the cluster setting information, forms a cluster that includes the new work node 200-1, and resumes service to the client (ST700).

That is, the cluster manager 110 registers the setting information for the new work node 200-1 in the setting information storage unit 120 and sets the operation mode of the coordinator server 100 to the normal mode, so that all work nodes including the new work node are organized into a cluster.

In addition, so that the new work node 200-1 operates normally as a member of the hybrid P2P cluster, the cluster manager 110 requests the cluster agents 210 of all work nodes 200, including the new work node 200-1, to change the status information of the new work node 200-1 to a normal state. The new work node 200-1 then operates as a normal member, and at the same time the DBMS engines 220 of all work nodes 200 switch from the load balancing mode to the operation mode and accept all requests from the client 300 from that point on.

FIGS. 4A to 4C schematically show the operation of FIG. 3, illustrating a process in which the original 1 table stored in work node 1 is partitioned and stored in new work node 1.

FIG. 4A illustrates a hybrid P2P cluster system composed of six existing work nodes (1 to 6) with the number of copies set to two, showing the data distribution state of each work node 200 and a configuration in which three (2+1) new work nodes (1 to 3) are added. FIG. 4B shows the process in which the original 1 data of work node 1 is distributed to the newly added new work nodes 1 to 3, and FIG. 4C shows the process in which replicas are copied from new work node 1 to new work node 2 and new work node 3.

That is, according to the present invention, service to clients is performed normally until reception of the access information for the new work nodes is complete. After that, service continues for the clients already connected to the existing work nodes, and load balancing for all work nodes including the new work node is performed once all services in progress on the existing work nodes, other than the search service, have ended. The ongoing search service continues until the metadata and cluster setting information are updated.

Next, the load balancing operation ST500 of FIG. 3 will be described in more detail with reference to FIG. 5.

First, the cluster manager 110 collects the data distribution status information of the existing work nodes 200 in order to perform the load balancing operation (ST510). The cluster manager 110 requests statistical information, that is, data loading status information, from the cluster agent 210 of each existing work node 200, and each cluster agent 210 collects the data loading status information of its DBMS engine 220 and transmits it to the cluster manager 110. Here, the data loading status information may include the per-table size of the data loaded on the work node 200, the total memory size, and the current memory usage, but the present invention is not limited thereto; further work node operating status information can be collected if necessary.

Subsequently, using the data loading status information of the existing work nodes 200, the cluster manager 110 determines data partitioning information that includes the partition target work node from which data is to be moved, the data capacity of the data selected for partitioning, and the storage target work node that is to store that data (ST520).

At this time, the cluster manager 110 calculates an average data capacity by dividing the total data capacity stored in all existing work nodes 200 by the sum of the number of existing work nodes 200 and new work nodes 200-1. Allowing a margin of error of a predetermined value relative to this average data capacity, it can set the existing work nodes 200 that exceed this as partition target work nodes.
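The selection rule in step ST520 can be worked through in a short sketch. The function name `pick_split_targets` and the reading of the "predetermined value" as a fractional tolerance above the average are assumptions; the source states only that some margin of error relative to the average is allowed.

```python
from __future__ import annotations


def pick_split_targets(stored: dict[int, int], new_node_count: int,
                       tolerance: float) -> tuple[float, list[int]]:
    """Return (average capacity, partition target node numbers).

    stored         -- data capacity currently held by each existing node
    new_node_count -- number of new work nodes being added
    tolerance      -- allowed excess over the average, as a fraction
                      (assumed interpretation of the margin of error)
    """
    total = sum(stored.values())
    # Average over existing AND new nodes, as described in the text.
    average = total / (len(stored) + new_node_count)
    limit = average * (1 + tolerance)
    targets = sorted(node for node, size in stored.items() if size > limit)
    return average, targets
```

For six existing nodes holding 900, 300, 300, 300, 300, and 300 units and three new nodes, the average is 2400 / 9, or about 266.67; with a 20% tolerance the limit is 320, so only the node holding 900 units becomes a partition target.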

In addition, the cluster manager 110 performs a series of data distribution processing for moving the partition decision data stored in the partition target work node to the storage target work node based on the data partition information (ST530). At this time, the cluster manager 110 performs a data partition request to the cluster agent 210 of the work node 1 to be partitioned, and the cluster agent 210 is transmitted to the DBMS engine 220. The DBMS engine 220 partitions the determined data table and moves it to the storage target work node. Here, the DBMS engine 220 performs a SELECT query on all the data to be moved by using the partitioning information determined for the corresponding table, and temporarily stores all records corresponding to the SELECT result in the temporary target storage node (not shown). To reduce the load caused by data movement, the INSERT operation is performed on all records stored in the temporary storage (not shown) of the work node to be partitioned by the storage target work node.
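The SELECT, buffer, and bulk INSERT sequence of step ST530 can be illustrated with Python's standard `sqlite3` module standing in for the work nodes' DBMS engines. This is a sketch only: the two-column table layout `(k, v)`, the key-range predicate, and the function name `copy_split_rows` are assumptions, and the source rows are deliberately left in place because the description deletes them later, during the metadata update in step ST600.

```python
import sqlite3


def copy_split_rows(source: sqlite3.Connection, target: sqlite3.Connection,
                    table: str, key_low: int, key_high: int) -> int:
    """Copy rows with key_low <= k < key_high from source to target.

    The table name is interpolated directly, so it must come from
    trusted metadata, never from user input.
    """
    # 1) SELECT every record to be moved and hold it in temporary
    #    storage (here simply a Python list).
    buffered = source.execute(
        f"SELECT k, v FROM {table} WHERE k >= ? AND k < ?",
        (key_low, key_high),
    ).fetchall()
    # 2) One bulk INSERT on the storage target, inside a transaction
    #    that commits on success and rolls back on error.
    with target:
        target.executemany(
            f"INSERT INTO {table} (k, v) VALUES (?, ?)", buffered
        )
    return len(buffered)
```

Buffering the whole SELECT result before inserting keeps the read on the partition target node to a single query, which matches the stated goal of reducing the load caused by data movement.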

In addition, when an INSERT query is processed at any one storage target work node, the DBMS engine 220 of the corresponding storage target work node automatically creates a replica of data that is inserted into the replica work node of the corresponding work node (ST540).

For example, as shown in FIG. 4C, the original 1 of the work node 1 is divided and moved to the new work node 7, and the DBMS engine 220 of the work node 7 automatically generates another new work node 8, 9) Create a replica.

Meanwhile, FIGS. 6A to 6C are diagrams for explaining the method of determining partitioning information (ST520) in FIG. 5, and illustrate different methods of dividing a specific table so that its data is distributed and stored across a plurality of work nodes 200 in the present invention. FIGS. 6A to 6C illustrate a case in which the work node to be divided is "6".

First, as illustrated in FIG. 6A, the data capacity and the storage target work node may be determined by dividing the key value range of the table selected for partitioning by the total number of storage target work nodes. For example, data corresponding to key values from 0 to v1 is stored in work node 1, and data corresponding to key values from v1 to v2 is stored in work node 2.
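A minimal sketch of this key range division is given below. The function names, the numeric key range, and the choice to assign a boundary value such as v1 to the higher-numbered node are assumptions; the original text does not specify how boundary values are handled.

```python
from bisect import bisect_right


def range_boundaries(key_min, key_max, node_count):
    """Split [key_min, key_max) into node_count equal ranges; return v1, v2, ..."""
    width = (key_max - key_min) / node_count
    return [key_min + width * i for i in range(1, node_count)]


def node_for_key(key, boundaries):
    """Return the 1-based storage target work node for a key value."""
    # Assumption: a key equal to a boundary goes to the higher-numbered node.
    return bisect_right(boundaries, key) + 1


boundaries = range_boundaries(0, 300, 3)  # v1 = 100.0, v2 = 200.0
first = node_for_key(50, boundaries)
second = node_for_key(100, boundaries)
third = node_for_key(250, boundaries)
```

Here a key of 50 falls in the 0 to v1 range and maps to work node 1, while 100 and 250 map to work nodes 2 and 3 respectively.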

In addition, as illustrated in FIG. 6B, the data capacity and the storage target work node can be determined by dividing the key values into key value lists, one for each storage target work node, in consideration of the amount of data corresponding to each key value of the table selected for partitioning. For example, data corresponding to the key values "abc1", "bcb1", and "cde1" is stored in work node 1, and data corresponding to "abc2", "bcb2", and "cde2" is stored in work node 2.
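One way to build such key value lists is sketched below. The original text only states that the amount of data per key value is taken into account; the greedy strategy of always assigning the next largest key to the least loaded node, the function name, and the record counts are assumptions chosen for illustration.

```python
def build_key_lists(key_counts, node_count):
    """Assign key values to nodes so that record counts stay balanced.

    key_counts maps a key value to the number of records that carry it.
    """
    lists = {node: [] for node in range(1, node_count + 1)}
    totals = {node: 0 for node in lists}
    # Place the largest keys first; ties are broken by key value, then node number.
    for key, count in sorted(key_counts.items(), key=lambda item: (-item[1], item[0])):
        node = min(totals, key=lambda n: (totals[n], n))
        lists[node].append(key)
        totals[node] += count
    return lists


# Hypothetical record counts per key value.
counts = {"abc1": 40, "abc2": 40, "bcb1": 30, "bcb2": 30, "cde1": 10, "cde2": 10}
key_lists = build_key_lists(counts, node_count=2)
```

With these counts each node ends up holding 80 records, and the resulting lists match the grouping given in the example above.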

In addition, as shown in FIG. 6C, the data capacity and the storage target work node can be determined by dividing the range of hash values, calculated by applying a hash function to the key values of the table selected for partitioning, by the number of storage target work nodes. For example, data corresponding to hash values from 0 to v1 is stored in work node 1, and data corresponding to hash values from v1 to v2 is stored in work node 2.
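The hash range division can be sketched in the same spirit. The original text does not name a hash function or a hash space; the 32-bit space, the SHA-256 based `stable_hash` helper, and the function names below are assumptions.

```python
import hashlib

HASH_SPACE = 2 ** 32  # assumed size of the hash value range


def stable_hash(key):
    """Map a key value to an integer in [0, HASH_SPACE) (assumed hash function)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def node_for_hash(hash_value, node_count):
    """Return the 1-based storage target work node for a hash value."""
    width = HASH_SPACE // node_count
    # Clamp so that leftover values at the top of the range go to the last node.
    return min(hash_value // width, node_count - 1) + 1


node = node_for_hash(stable_hash("abc1"), node_count=4)
```

A fixed digest is used instead of Python's built-in `hash` because the latter is randomized per process for strings, which would make node assignments unstable across restarts.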

On the other hand, in the present invention, in response to a load balancing operation on a hash-partitioned table, the cluster manager 110 generates data partitioning log information and manages it as metadata, so that it can be synchronized with the work nodes. This log information includes the number of new work nodes, the partition target work node, the storage target work node, a timestamp, and partitioning key information, as shown in FIG. Here, the "timestamp" is used to synchronize and manage a global time across all work nodes, and the system automatically assigns a partition key when data is input. The "partitioning key" is an array of the key values held in the partition target work node, and refers to the key values used in the partitioning.
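The fields listed above suggest a simple record structure. The sketch below is an assumption about how such log information might be held in memory: the class names, field names, numeric timestamps, and the lookup helper are all hypothetical and are not defined in the original text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PartitionLogEntry:
    new_node_count: int              # number of new work nodes
    partition_target_node: int       # work node whose data was divided
    storage_target_node: int         # work node that received the data
    timestamp: float                 # global time of the partitioning
    partition_keys: Tuple[str, ...]  # key values used in the partitioning


class PartitionLog:
    """Metadata kept by the cluster manager and synchronized to work nodes."""

    def __init__(self):
        self._entries = []

    def record(self, entry):
        self._entries.append(entry)

    def latest_for(self, node) -> Optional[PartitionLogEntry]:
        """Return the most recent entry whose partition target is node."""
        matches = [e for e in self._entries if e.partition_target_node == node]
        return max(matches, key=lambda e: e.timestamp) if matches else None


log = PartitionLog()
log.record(PartitionLogEntry(3, 1, 7, 1000.0, ("abc1", "bcb1")))
entry = log.latest_for(1)
```

Looking up a node that was never divided returns `None`, which is the condition a query router can use to decide that no redirection is needed.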

In the present invention, the data of all work nodes must be searched in order to process a range query on a hash-partitioned table. For a search on a specific key value, however, the partitioning log information makes it possible to avoid searching all work nodes.

Referring to FIG. 8, first, when a data search query corresponding to a specific key value is requested (ST910), the log information of the search target table is checked to determine whether the search target data is data after the split (ST920).

If it is determined in step ST920 that the search target data was stored after the table was partitioned, the search query is propagated to the work node corresponding to the remainder obtained by dividing the hash value by the total number of work nodes, including the new work nodes (ST930).

Meanwhile, if it is determined in step ST920 that the search target data was stored before the partitioning, the work node information corresponding to the remainder obtained by dividing the hash value by the number of existing work nodes, that is, the number before the new work nodes were added, is obtained (ST940).

Then, whether the work node obtained in step ST940 is a partition target node is checked based on whether a partitioning log exists for that work node (ST950).

In step ST950, when the work node obtained in step ST940 is not a target node to be split, the search query is propagated to the obtained work node (ST960).

On the other hand, if it is determined in step ST950 that the work node obtained in step ST940 is a partition target node, the storage target work node recorded in the corresponding partitioning log is obtained as the work node to which the query is to be propagated, and the search query is propagated to that work node (ST970).

That is, according to the present invention, the search query can be processed more quickly by propagating the search query only to the work node loaded with the search target data.
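The routing procedure of steps ST910 to ST970 can be summarized in a short sketch. Several details are assumptions made for illustration: work nodes are numbered from 0 so that a remainder maps directly to a node number, the partitioning log is reduced to a dictionary from partition target node to storage target node, and whether the data was stored after the partitioning is passed in as a flag rather than derived from timestamps.

```python
def route_query(hash_value, stored_after_split, old_node_count, new_node_count, split_log):
    """Return the work node number to which a key value search is propagated.

    split_log maps a partition target node to its storage target node.
    """
    if stored_after_split:
        # ST930: use the total number of work nodes, including the new ones.
        return hash_value % (old_node_count + new_node_count)
    # ST940: use the number of work nodes before the addition.
    node = hash_value % old_node_count
    # ST950: a log entry means this node was a partition target.
    if node not in split_log:
        return node  # ST960
    return split_log[node]  # ST970


# Hypothetical cluster: 6 existing nodes, 3 new nodes, node 1 divided into node 7.
split_log = {1: 7}
after = route_query(13, True, 6, 3, split_log)
moved = route_query(13, False, 6, 3, split_log)
kept = route_query(14, False, 6, 3, split_log)
```

In every case the query is sent to exactly one work node, which is the saving the description attributes to the partitioning log information.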

This operation applies when a work node is added and hash-based partitioning is performed while the hybrid P2P cluster system is running. In the present invention, for the sake of query processing performance, the data movement and metadata changes that reflect the changed number of work nodes can also be performed collectively for all hash-based tables before the cluster system is stopped and restarted.

Claims

In a method of extending a work node in a hybrid P2P cluster system in which a cluster is configured with a plurality of work nodes and a coordinator server that manages the work nodes, and in which work requested by clients is processed simultaneously in a distributed environment,

A first step in which, when the coordinator server receives new work node addition request information from the outside, new connection requests from clients are rejected, ongoing services continue to be provided without interruption to the clients connected to the existing work nodes, and the metadata is updated to reflect the new work node access information,

A second step in which the coordinator server remotely copies the metadata updated in the first step to the new work node, starts the new work node, and sets the status of the new work node to a standby state,

A third step in which, when the coordinator server determines that all ongoing services other than search services have been completed for all clients currently connected to all existing work nodes, load balancing is performed among all work nodes including the new work node,

A fourth step in which the coordinator server updates the metadata in accordance with the result of the load balancing work and synchronizes the updated metadata with all work nodes including the new work node, and

A step in which the coordinator server changes the status information of the new work node to an operating state on all work nodes including the new work node, operates all work nodes including the new work node as a cluster, and resumes the service for clients; a method of extending a work node in a hybrid P2P cluster system, characterized in that it is configured to include the above steps.
According to claim 1,

A method of extending a work node in a hybrid P2P cluster system, characterized in that the third step comprises a step in which the existing work nodes reject newly requested data add/update/delete requests from currently connected clients, continue to serve newly requested data search requests without interruption, and, when the data add/update/delete request services currently in progress are completed, transmit to the coordinator server in-progress service completion information indicating that they are ready to perform the distribution operation.
According to claim 1,

The existing work nodes are configured to create a preset number of replicas and store them in different work nodes,

A method of extending a work node in a hybrid P2P cluster system, characterized in that, in the first step, the coordinator server only manages the access information of the new work nodes until the number of new work nodes reaches a multiple of the number (K+1) that is "1" greater than the preset number of replicas (K), and, when the addition of new work node information corresponding to a multiple of (K+1) is completed, rejects new connection requests from clients and requests that the service for clients currently connected to the existing work nodes be provided without interruption.
According to claim 1,

A method of extending a work node in a hybrid P2P cluster system, characterized in that the third step comprises steps in which the coordinator server collects data load status information of the existing work nodes, including the size of each table, the total memory size, and the current memory usage; determines, based on the data load status information of each existing work node, data partitioning information including the partition target work node from which data is to be moved, the data capacity of the data selected for partitioning, and the storage target work node that is to store that data; and moves data between work nodes based on the determined data partitioning information.
The method of claim 3 or 4,

A method of extending a work node in a hybrid P2P cluster system, characterized in that, in the fourth step, the new work node replicates the data that has been moved to and stored in it, and stores the replica in another new work node.
According to claim 4,

A method of extending a work node in a hybrid P2P cluster system, characterized in that, in the step of determining the data partitioning information, the coordinator server calculates an average data capacity by dividing the total data capacity stored in all existing work nodes by the sum of the number of existing work nodes and the number of new work nodes, allows a tolerance of a predetermined value relative to the average data capacity, and sets an existing work node exceeding that tolerance as a partition target work node.
The method of claim 4 or 6,

A method of extending a work node in a hybrid P2P cluster system, characterized in that, in the step of determining the data partitioning information, the data capacity and the storage target work node are determined by dividing the key value range of the table selected for partitioning by the total number of storage target work nodes.
The method of claim 4 or 6,

A method of extending a work node in a hybrid P2P cluster system, characterized in that, in the step of determining the data partitioning information, the data capacity and the storage target work node are determined by dividing the key values into key value lists, one for each storage target work node, in consideration of the amount of data corresponding to each key value of the table selected for partitioning.
The method of claim 4 or 6,

A method of extending a work node in a hybrid P2P cluster system, characterized in that, in the step of determining the data partitioning information, the data capacity and the storage target work node are determined by dividing the range of hash values, calculated by applying a hash function to the key values of the table selected for partitioning, by the number of storage target work nodes.
According to claim 1,

In the fourth step, the coordinator server generates, in response to the load balancing work, data partitioning log information including the number of new work nodes, the partition target work node number, and the storage target work node number, manages it as metadata, and synchronizes it with the work nodes, and

A method of extending a work node in a hybrid P2P cluster system, characterized in that, for a data search query from a client that includes a specific key value, the work node checks the data partitioning log information of the table searched for that key value; if the search target data is data from before the partitioning, obtains the search target work node number corresponding to the remainder obtained by dividing the hash value by the number of existing work nodes before the new work nodes were added; and, if the obtained search target work node number is a partition target node number, propagates the data search query to the work node corresponding to the storage target work node number in the data partitioning log information, thereby providing the service corresponding to the client request.