WO2019189963A1 - Distributed cluster management system and method thereof - Google Patents

Distributed cluster management system and method thereof

Info

Publication number
WO2019189963A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
work
cluster
information
work node
Prior art date
Application number
PCT/KR2018/003697
Other languages
English (en)
Korean (ko)
Inventor
김학철
강조현
박정도
진홍석
한혁
진성일
Original Assignee
주식회사 리얼타임테크
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 주식회사 리얼타임테크
Publication of WO2019189963A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3885 Concurrent instruction execution using a plurality of independent parallel functional units
    • G06F9/3889 Concurrent instruction execution using a plurality of independent parallel functional units controlled by multiple instructions, e.g. MIMD, decoupled access or execute
    • G06F9/3891 Concurrent instruction execution using a plurality of independent parallel functional units organised in groups of units sharing resources, e.g. clusters
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/161 Computing infrastructure, e.g. computer clusters, blade chassis or hardware partitioning

Definitions

  • The present invention relates to a distributed cluster management system and a method thereof, and more particularly to a distributed cluster management system and method in which the work nodes constituting a cluster cooperatively process a client's request in parallel in a distributed environment using only meta information.
  • A cluster system refers to a set of networked computers that jointly process a task as if they were a single system. The key difference from a supercomputer is that, whereas a supercomputer connects multiple CPUs internally so that each CPU accesses a common virtual memory, a cluster system connects its members through a network.
  • Cluster systems have the advantage of being cheaper to build than supercomputers because they can be assembled from commodity servers. A system can be built to fit one's needs, and performance can be improved simply by adding servers; on the other hand, a cluster is harder to maintain than a single system, and its overall performance is affected by the performance of the network.
  • Such a cluster system is expanded by connecting homogeneous or heterogeneous nodes to a network, and the individual systems in the cluster are called servers or nodes.
  • A load balancing cluster system is composed of a load balancer that allocates requests from clients to the nodes in the cluster, and the nodes that service those requests.
  • The load balancer selects an appropriate node using a specific algorithm and forwards the request to it; a single request is not split across several nodes by the distribution algorithm. Therefore, unlike a parallel system, incoming requests are distributed evenly across multiple nodes, achieving load balancing.
  • A master-slave distributed cluster management system is composed of a master node that manages the cluster system and slave nodes that process jobs.
  • In a peer-to-peer (P2P) distributed cluster management system, all nodes perform job processing at an equal level.
  • In a conventional master-slave load balancing cluster system applied in a distributed environment, every job handled by a slave node passes through the master node, which degrades processing performance, and when the master node fails the entire cluster must be restarted.
  • A load balancing cluster system applying the conventional P2P scheme has the problem that additional work is required to elect a master node from among the work nodes when performing cluster management tasks.
  • In the present invention, a cluster consists of work nodes that process jobs and cluster management nodes that manage the work nodes; the work nodes synchronize the information each of them needs for job processing through the cluster management node, so that thereafter only this meta information is used.
  • The present invention thus provides a distributed cluster management system and method in which a plurality of work nodes cooperatively process a job in parallel in a distributed environment.
  • A distributed cluster management system according to the invention includes: at least one work node that processes a job requested by a client in parallel using the meta information required for distributed job processing; and at least one cluster management node that manages the work nodes in groups and runs a cluster operation process to synchronize the meta information across the connected work nodes.
  • The cluster management node includes: a cluster operation manager which extracts the device connection information of the cluster management nodes and work nodes constituting the cluster from a cluster configuration file and, through the cluster operation process, remotely starts the cluster management node and the work nodes using that device connection information; a cluster manager which acquires the state information of the connected work nodes using the cluster configuration file and performs an integrated management function for the work nodes; and a local repository which stores cluster management information including the cluster configuration file.
  • The meta information includes the connection information each node needs to connect to the others through the network.
  • The cluster configuration file includes the device connection information for connecting the cluster management nodes and work nodes constituting the cluster.
  • The cluster operation manager transmits the cluster configuration file to the work nodes belonging to the management group managed by each cluster management node and, when the cluster configuration is changed, modifies the cluster configuration file and distributes it to the work nodes belonging to the management group.
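The patent does not fix a format for the cluster configuration file, so the sketch below assumes a JSON layout with hypothetical field names purely to illustrate how device connection information and per-group distribution targets could be read out of it.

```python
import json

# Hypothetical cluster configuration file contents; the patent only says the
# file carries device connection information for the nodes of the cluster.
CLUSTER_CONF = """
{
  "management_nodes": [{"id": "mgmt-1", "address": "10.0.0.10"}],
  "work_nodes": [
    {"id": "work-1", "address": "10.0.0.21", "group": "mgmt-1"},
    {"id": "work-2", "address": "10.0.0.22", "group": "mgmt-1"}
  ]
}
"""

def extract_device_connection_info(conf_text):
    """Extract node addresses (device connection info) from the config file."""
    conf = json.loads(conf_text)
    nodes = conf["management_nodes"] + conf["work_nodes"]
    return {n["id"]: n["address"] for n in nodes}

def nodes_in_group(conf_text, mgmt_id):
    """Work nodes in a management node's group (distribution targets when the
    configuration changes)."""
    conf = json.loads(conf_text)
    return [n["id"] for n in conf["work_nodes"] if n["group"] == mgmt_id]
```

On a configuration change, the cluster operation manager would re-read the file and push it to every node returned by `nodes_in_group`.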
  • The cluster manager may include: a work node connection manager that connects or disconnects work nodes at the request of the client; a cluster information manager which extracts the connection information of the cluster management nodes and work nodes constituting the cluster from the cluster configuration file and transmits the cluster configuration file to the work nodes when it has changed; a work node manager which collects the state information of the connected work nodes of the management group managed by the cluster management node; and a synchronization manager which synchronizes and shares the meta information of the work nodes belonging to the management group.
  • The work node may include: a cluster agent which checks the state of the work node through a heartbeat protocol with the cluster manager and reflects the state information in the meta information; and a DBMS engine which generates table information in response to a table creation request from the client, transmits the table creation result to the cluster management node, shares the table information with the other work nodes belonging to the management group of the cluster management node, and stores state information including the meta information of the work node.
  • The cluster agent may include: a node state information collector for collecting the state information of the work node; and a node information synchronization manager for synchronizing the table information and the state information of the connected work nodes.
  • The DBMS engine may include: a meta information management engine managing the meta information, which includes the table information and the state information of the work nodes; a query processing engine which receives the client's table creation request, retrieves the state information of the work nodes, and produces a table creation result or table creation error information; and a storage engine which stores the table creation result or table creation error information for the client.
  • A distributed cluster management method according to the invention is performed by a distributed cluster management system in which a plurality of nodes in a distributed environment process jobs requested by a client, and comprises: a first step of composing a cluster from work nodes and at least one cluster management node that manages the work nodes in groups, managing the connected work nodes through a cluster operation process, and storing a cluster configuration file containing the device connection information of each node in the cluster; a second step of, when a client requests job processing, obtaining the connection information of the currently running cluster management node through the cluster operation process and extracting local information including the connection information of the work nodes belonging to the management group managed by that cluster management node; and a third step of transmitting the cluster configuration file to the plurality of work nodes connected to the cluster management node when the local information maps to the connection information in the cluster configuration file.
  • In the third step, the cluster operation process is terminated when the local information does not map to the connection information in the cluster configuration file.
  • The second step includes: the client requesting a connection to the cluster management node; the cluster management node retrieving the state information of the work nodes, selecting the work node the client should connect to using that state information, and transmitting the selected work node's information to the client; and the client connecting to the DBMS engine of the work node using the work node information and then requesting a job.
  • The state information of a work node is work state information including the number of clients connected to each work node, and the work node information includes the device connection information needed to connect to the work node.
  • The fourth step may include: the cluster management node requesting a heartbeat message from each work node at a predetermined time interval; each work node transmitting a heartbeat message to the cluster management node within a predetermined transmission time in response to the request; the cluster management node determining that a work node from which no heartbeat message arrives within the predetermined transmission time is an error-producing work node, recording this as error information, and reflecting the error information in real time in the meta information of the DBMS engines of the other work nodes in its management group; and the cluster management node collecting the state information of the work nodes whose heartbeat messages did arrive within the preset transmission time, and synchronizing and sharing the state information of all work nodes belonging to its management group.
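The fourth step above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the timeout value, the `last_heartbeat` timestamp map, and the in-memory `meta` dictionary are all assumptions.

```python
TRANSMISSION_TIMEOUT = 3.0  # seconds; stands in for the preset transmission time

def check_heartbeats(last_heartbeat, now, timeout=TRANSMISSION_TIMEOUT):
    """Classify work nodes by whether a heartbeat arrived within the timeout.

    last_heartbeat: node id -> timestamp of the last received heartbeat.
    Returns (healthy, failed) lists of node ids.
    """
    healthy, failed = [], []
    for node_id, ts in last_heartbeat.items():
        (healthy if now - ts <= timeout else failed).append(node_id)
    return healthy, failed

def reflect_errors(meta, failed):
    """Record failed nodes as error information in the shared meta information."""
    for node_id in failed:
        meta["node_state"][node_id] = "error"

meta = {"node_state": {"work-1": "normal", "work-2": "normal"}}
healthy, failed = check_heartbeats({"work-1": 10.0, "work-2": 5.0}, now=12.0)
reflect_errors(meta, failed)  # work-2 missed the window and is marked as error
```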
  • The first step may further include the cluster management node modifying the cluster configuration file and automatically distributing it when the cluster configuration is changed.
  • The cluster operation process may include: a step 5-1 of receiving a table creation query statement; a step 5-2 in which the first work node to receive the table creation query retrieves the state information of the work nodes named in the query; and a step 5-3 in which, when there is no error in the state information of those work nodes, the DBMS engine of the first work node transmits the table creation query to all work nodes in the group and the table is created in the work nodes managed by the cluster management node.
  • The table creation query statement may include work node information specifying where records are stored according to a condition on a predetermined column.
  • Step 5-3 further includes returning table creation error information when the state information of at least one of the work nodes designated to store records is error information.
  • At the time of returning the table creation error information, the cluster management node stores reconnection synchronization information including the identification information, time information, and table creation request information of the error-producing work node.
  • When there is a work node from which the table creation result message has not arrived within a preset time, the cluster management node determines that a table creation request processing error has occurred, instructs all work nodes to delete their table creation results, and then transmits the error information to the first work node.
  • The method may further include a reconnecting step in which an error-producing work node, in which an error occurred at the time the first work node received the table creation query, synchronizes its table information using the reconnection synchronization information after the error is corrected and then reconnects.
  • The error-producing work node may update the meta information of its DBMS engine after the error is corrected.
  • The reconnecting step may include: the error-producing work node transmitting a reconnection request message to the cluster management node; the cluster management node transmitting the table creation log information and the state information of the currently running work nodes to the failed work node; the error-producing work node updating the state information of the other work nodes, sequentially performing the logged table creations to update its table information, and then transmitting a table creation result message to the cluster management node; the cluster management node requesting the other work nodes to update the state information of the work node reconnected after error correction; and the work nodes other than the reconnected work node updating that state information.
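The replay performed by the reconnecting node can be sketched as below. The data structures (a table map and an ordered creation log) are illustrative assumptions; the patent only requires that missed table creations be applied sequentially from the log.

```python
def reconnect(node_tables, creation_log, live_node_state):
    """Apply missed table creations in order and adopt current node states.

    node_tables: the reconnecting node's table info (table name -> definition).
    creation_log: ordered list of (table_name, definition) pairs sent by the
                  cluster management node (the table creation log information).
    live_node_state: state info of the currently running work nodes.
    Returns the updated (tables, state) for the reconnecting node.
    """
    tables = dict(node_tables)
    for name, definition in creation_log:  # sequential replay preserves order
        tables[name] = definition
    return tables, dict(live_node_state)

tables, state = reconnect(
    {"t1": "cols-a"},
    [("t2", "cols-b"), ("t3", "cols-c")],
    {"work-1": "normal"},
)
```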
  • According to the present invention, the work nodes improve job processing performance by processing jobs requested by a client using only meta information, and when an error occurs in a specific work node during cluster operation, the error-corrected work node can be reconnected without restarting the entire cluster.
  • FIG. 1 is a diagram illustrating a distributed cluster management system according to an embodiment of the present invention.
  • FIG. 2 is a diagram illustrating the configuration of the cluster management node and the work node of FIG. 1.
  • FIG. 3 is a flowchart illustrating a distributed cluster management method according to an embodiment of the present invention.
  • FIG. 4 is a flowchart illustrating a client connection process according to an embodiment of the present invention.
  • FIG. 5 is a flowchart illustrating a process of synchronizing state information of a work node of FIG. 4.
  • FIG. 6 is a flowchart illustrating a process of processing a table creation request of a client according to an embodiment of the present invention.
  • FIG. 7 is a flowchart illustrating a process of reconnecting an error generating work node according to an embodiment of the present invention.
  • FIG. 1 is a diagram illustrating a distributed cluster management system according to an embodiment of the present invention.
  • The distributed cluster system 100 composes a cluster from at least one cluster management node 200 and a plurality of work nodes 300 in order to process jobs requested by a client 400 in parallel in a distributed environment.
  • The at least one cluster management node 200 manages the work nodes 300 as a group and runs a cluster operation process that synchronizes the meta information of each work node 300 in order to manage the connected work nodes 300.
  • the meta information includes connection information of the work node 300.
  • Meta information may be described as data about data: it generically refers to information about data that enables efficient access to and management of data or data sets.
  • Meta information broadly describes the basic contents, qualitative factors, document structure, implementation techniques, and reference information of data, narrative information about the data contents, information about data access, acquisition, distribution, and utilization, and creator and administrator information.
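In this system, the meta information must at least carry node connection information, table information, and node state. One possible in-memory shape is sketched below; the field names are assumptions, not taken from the patent.

```python
from dataclasses import dataclass, field

@dataclass
class MetaInfo:
    """Illustrative container for the meta information shared by work nodes."""
    connection_info: dict = field(default_factory=dict)  # node id -> address
    table_info: dict = field(default_factory=dict)       # table -> definition
    node_state: dict = field(default_factory=dict)       # node id -> state

meta = MetaInfo()
meta.connection_info["work-1"] = "10.0.0.21"
meta.node_state["work-1"] = "normal"
```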
  • The at least one work node 300 processes the job requested by the client 400 in parallel using the meta information necessary for job processing.
  • The cluster management node 200 performs cluster management functions for the work nodes 300 of the management group it manages, and a work node 300 processes the job requested by the client 400 in parallel together with the neighboring work nodes 300 belonging to the same group.
  • A work node 300 may also cooperate with work nodes 300 connected to cluster management nodes 200 other than the one managing its own group.
  • FIG. 2 is a diagram illustrating the configuration of the cluster management node and the work node of FIG. 1.
  • the cluster management node 200 includes a cluster operations manager 210, a cluster manager 220, and a local repository 230.
  • The cluster operation manager 210 extracts the device connection information (for example, connection information such as IP addresses) of the nodes constituting the cluster from the cluster configuration file and, through the cluster operation process, remotely starts the cluster management node 200 and the work nodes 300 using that device connection information.
  • The cluster operation manager 210 transmits the cluster configuration file to the work nodes 300 belonging to the management group of the cluster management node 200 and, when the cluster configuration is changed, modifies the cluster configuration file so that it is automatically distributed to all work nodes 300 belonging to the management group.
  • the cluster manager 220 performs an integrated management function for the work node 300 of the group managed by the cluster manager 220.
  • the cluster manager 220 uses the cluster configuration file to obtain status information of the connected work node 300.
  • the cluster manager 220 includes a work node connection manager 221, a cluster information manager 222, a work node manager 223, and a synchronization manager 224.
  • the work node connection manager 221 performs a connection or disconnection with the work node 300 at the request of the client 400.
  • The cluster information manager 222 extracts the connection information of the cluster management nodes 200 and work nodes 300 constituting the cluster from the cluster configuration file, and transmits the cluster configuration file to the work nodes 300 when it has changed.
  • the work node manager 223 collects state information of the work node 300 connected to the management group.
  • the synchronization manager 224 synchronizes and shares meta information of the work node 300 belonging to the management group of the cluster management node 200.
  • the work node 300 includes a cluster agent 310 and a DBMS engine 320.
  • The cluster agent 310 checks the state information of the work node 300 through the heartbeat protocol with the cluster manager 220 and reflects the state information in the meta information in real time.
  • the cluster agent 310 includes a node state information collector 311 for collecting state information of a work node and a node information synchronization manager 312 for synchronizing table information and state information of a work node connected with the table information.
  • The DBMS engine 320 receives the table creation request from the client 400, generates the table information, and transmits the table creation result to the cluster management node 200.
  • The table information is shared with the other work nodes 300 belonging to the management group of the cluster management node 200, and the DBMS engine 320 stores the meta information, the state information of the work node, error information, and the like.
  • The DBMS engine 320 includes a meta information management engine 321, which manages the meta information including the table information and the state information of the work nodes, and a query processing engine, which receives the table creation query of the client 400 and retrieves the state information of the work nodes included in the query.
  • The client 400 connects to the DBMS engine 320 of a work node using the device connection information (for example, a network address) received from the cluster manager 220.
  • The application 420 requests a job from the DBMS engine 320 through the cluster connection manager 410, and the distributed cluster system 100 processes the requested job by having a plurality of work nodes 300 cooperate.
  • FIG. 3 is a flowchart illustrating a distributed cluster management method according to an embodiment of the present invention.
  • The cluster management node 200 runs the cluster operation process through the cluster operation manager 210, extracts the connection information of the work nodes 300, and obtains the connection information of the currently running cluster management node 200 as local information (S310).
  • The connection information refers to the address information, such as an IP address or an InfiniBand address, necessary for access through a network.
  • The cluster operation manager 210 extracts the address information of the cluster management nodes 200 and the work nodes 300 constituting the cluster from the cluster configuration file. If the local information of the currently running device does not map to the connection information of any cluster management node 200 in the cluster configuration file, the distributed cluster system 100 stops running. If the local information matches the connection information of a cluster management node 200 in the cluster configuration file, the cluster configuration file is transmitted to all work nodes 300 connected to it (S320, S330).
  • The cluster operation manager 210 starts the cluster manager 220 and the local repository 230 of the currently running cluster management node 200, which interwork with the cluster agents 310 of the work nodes 300 connected to the client 400 to synchronize the information required for processing the requested jobs, thereby performing cluster management (S340).
  • The cluster operation manager 210 then remotely starts the DBMS engine 320 process of each work node connected to the currently running cluster management node 200, so that jobs requested by the client 400 are processed simultaneously in the distributed environment (S350).
  • The cluster operation process improves the convenience of distributed cluster management by automatically modifying and distributing the cluster configuration file only at the cluster management node 200 when the cluster configuration is changed.
  • The distributed cluster management system 100 runs the plurality of cluster management nodes 200 and work nodes 300, and the client 400 connects to the work node connection manager 221 to have its jobs processed.
  • The client 400 connects using only the meta information of the cluster management node 200.
  • FIG. 4 is a flowchart illustrating a client connection process according to an embodiment of the present invention.
  • The client 400 requests a connection to the cluster management node 200 (S410).
  • The connection request of the client is processed by the cluster manager 220 of the cluster management node 200; on receiving the connection request, the cluster manager 220 requests the state information of the work nodes 300 from the cluster agents 310 of the connected work nodes 300 (S420).
  • Each cluster agent 310 retrieves the state information of its work node in cooperation with the DBMS engine 320 and returns the number of clients 400 currently connected to the work node 300.
  • The cluster manager 220 returns to the client 400 the information of the work node 300 to connect to, selected using the state information collected from the plurality of work nodes 300 (S430).
  • The cluster manager 220 by default selects the work node with the smallest number of currently connected clients, but the selection condition for the work node may be set in various ways.
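The default selection rule can be sketched as below. The pluggable `condition` parameter is an assumption that reflects the statement that the selection condition may be set in various ways; it is not part of the patent.

```python
def select_work_node(state_info, condition=None):
    """Pick the work node a client should connect to.

    state_info: node id -> number of currently connected clients.
    condition: optional key function over (node_id, count) items; the default
               picks the node with the fewest connected clients.
    """
    if condition is None:
        condition = lambda item: item[1]  # fewest connected clients wins
    node_id, _ = min(state_info.items(), key=condition)
    return node_id

chosen = select_work_node({"work-1": 3, "work-2": 1, "work-3": 2})  # work-2
```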
  • The cluster connection manager 410 of the client 400 connects to the DBMS engine 320 of the work node 300 using the device connection information received from the cluster manager 220; the application program 420 of the client 400 then requests a job from the DBMS engine 320 through the cluster connection manager 410, and the distributed cluster system 100 processes the requested job through the cooperation of the plurality of work nodes 300 (S440, S450).
  • The DBMS engine 320 operating in each work node 300 synchronizes and shares the information necessary for processing the jobs requested by the client 400.
  • The main synchronization information is the table information of the DBMS engine 320 and the state information of the work nodes 300.
  • The state information of the work nodes 300 is collected by the work node manager 223 of the cluster management node 200 and reflected at synchronization time.
  • FIG. 5 is a flowchart illustrating a process of synchronizing state information of a work node of FIG. 4.
  • The cluster manager 220 periodically requests a heartbeat message from the plurality of connected work nodes 300 (S510).
  • The cluster agent 310 transmits a heartbeat message to the cluster manager 220 in response to the cluster manager's heartbeat message request.
  • The heartbeat protocol sends a heartbeat message to a physically connected node at a fixed time interval and determines that an error has occurred if there is no response within a certain time.
  • When the cluster manager 220 receives a heartbeat message from the cluster agent 310 of a specific work node 300 within the predetermined transmission time, that work node is in a normal state; when it does not, the cluster manager 220 determines that an error has occurred in the work node 300, reflects the relevant information in the global meta information of the DBMS engines 320 of the other work nodes 300, and at the same time records the error information in the local repository 230 of the cluster management node 200 (S520, S530).
  • An individual work node 300 uses only meta information when processing a job in cooperation with the other connected work nodes 300. Therefore, when the meta information changes, the cluster manager 220 of the cluster management node 200 and the cluster agent 310 of each work node 300 cooperate to reflect the changed meta information in real time in the global meta information 321 of the DBMS engine 320 of each work node 300.
  • FIG. 6 is a flowchart illustrating a process of processing a table creation request of a client according to an embodiment of the present invention.
  • Records belonging to the same table of the DBMS may be distributed and stored in a plurality of work nodes.
  • Any of the various previously proposed methods may be applied to distribute and store the records across the plurality of work nodes 300.
  • the most important information necessary for distributed job processing is connection information of the job node 300 in which a specific record is stored.
  • the DBMS engine 320 of a specific work node 300 receives a table generation query requested by the client 400 (S610).
  • the table generation query statement of the client 400 may include work node information for storing a record according to a condition for a specific column.
  • The work node information may specify, for example, that records whose column value falls in one range are stored in one work node, the next range in a second work node, and so on (for instance, values 0 to 20, 21 to 40, 41 to 60, and 61 and above each assigned to a different work node).
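Range-based record placement of this kind can be sketched as follows. The specific ranges and node names are illustrative assumptions; the patent only requires that the table creation query name which work node stores records for a given column condition.

```python
# Illustrative placement table: (inclusive lower bound, exclusive upper bound,
# work node that stores records in that column-value range).
PLACEMENT = [
    (0, 21, "work-1"),
    (21, 41, "work-2"),
    (41, 61, "work-3"),
]
DEFAULT_NODE = "work-4"  # values at or above the last range boundary

def node_for_value(value):
    """Return the work node that stores a record with this column value."""
    for lo, hi, node in PLACEMENT:
        if lo <= value < hi:
            return node
    return DEFAULT_NODE
```

This per-record connection information is exactly the meta information a work node needs to route a record without consulting the cluster management node.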
  • the first work node 300 receiving the table generation query retrieves state information of the work node included in the table generation query. If the status of one or more work nodes among the work nodes for storing records is an error, the table generation processing error information for the corresponding error generating node is returned (S620).
  • For a plurality of work nodes 300 to collaborate on a job requested by a client, the DBMS engines 320 of all work nodes 300 must share the same table information. To this end, the DBMS engine 320 that received the table creation request forwards it to the cluster agent 310 of its work node 300 for delivery to the other work nodes (S630).
  • The cluster agent 310 notifies the cluster manager 230 of the cluster management node 200, which manages the work nodes 300, of the table creation request, and the cluster manager 230 then carries out the process of synchronizing the table information of all work nodes 300 (S640).
  • The cluster manager 230 transmits the table creation request to the cluster agents 310 of the plurality of connected work nodes 300 and waits for their processing results.
  • The cluster agent 310 of each work node 300 receives the table creation request from the cluster manager 230 of the cluster management node 200, creates the table information in cooperation with the DBMS engine 320, and produces a table creation result.
  • When the client 400 requests table creation, the cluster manager 230 performs error handling so that all work nodes 300 included in the distributed cluster system 100 share the same table information (S650).
  • After transmitting the table creation request to the work nodes 300 currently in operation, the cluster manager 230 waits for the table creation result messages for a predetermined time; this waiting time is defined by the user. If any work node 300 fails to send a result message within the waiting time, the cluster manager 230 regards the table creation request as failed, instructs all work nodes 300 to delete the result of processing the request, and then transmits the error information to the first work node 300 that originally received the request.
  • At this time, reconnection synchronization information, comprising the time information and the table creation request information together with the ID of the work node 300, is stored in the local storage 240 (S660).
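The broadcast, wait, and roll-back-on-failure behavior described above resembles a simple all-or-nothing pattern, sketched below. The agent interface (create_table / drop_table) and the StubAgent used for demonstration are assumptions, not the patent's API:

```python
class StubAgent:
    """Stand-in for a cluster agent; records whether a rollback was issued."""

    def __init__(self, ok):
        self.ok = ok          # whether create_table will report success in time
        self.dropped = False  # whether a rollback (drop) was requested

    def create_table(self, ddl):
        return self.ok

    def drop_table(self, ddl):
        self.dropped = True

def create_table_on_cluster(nodes, ddl):
    """Broadcast a table-creation request; roll back everywhere on any failure.

    `nodes` maps node_id -> agent. A False result stands for a node that
    failed or did not answer within the waiting time.
    """
    results = {node_id: agent.create_table(ddl) for node_id, agent in nodes.items()}
    if all(results.values()):
        return True, results
    # At least one node failed or timed out: instruct every node to delete
    # the partial result, then report the error to the first work node.
    for agent in nodes.values():
        agent.drop_table(ddl)
    return False, results
```

The key design point is that a timeout is treated exactly like an explicit failure, so the cluster never ends up with a table that exists on some nodes but not others.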
  • FIG. 7 is a flowchart illustrating a process of reconnecting a failed work node according to an embodiment of the present invention.
  • When an error occurs in a work node, the error state is first reflected by updating the global meta information 321 of the DBMS engine 320; the failed work node is added back to the cluster afterwards.
  • After its error is corrected, the failed work node 300 transmits a reconnection request message to the cluster manager 230 (S710).
  • The cluster manager 230 transmits the table creation log information held in the local storage 240 and the status information of the currently running work nodes 300 to the reconnecting work node 300, which updates the relevant information accordingly (S720).
  • The reconnecting work node 300 updates the status information of the other work nodes, sequentially re-executes the table creations recorded in the table creation log to bring its table information up to date, and then transfers the processing result to the cluster manager 230.
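The catch-up step, in which the reconnecting node replays the table creation log it missed, can be sketched as follows; the log-entry shape (sequence number, table name) is an assumed representation:

```python
def replay_table_log(node_tables, table_log, last_applied_seq):
    """Apply, in sequence order, every table-creation log entry the
    reconnecting node missed while it was down.

    `table_log` is a list of (seq, table_name) entries kept by the cluster
    management node; `last_applied_seq` is the highest entry the node had
    applied before failing. Returns the new last-applied sequence number.
    """
    for seq, table_name in sorted(table_log):
        if seq > last_applied_seq:
            # Re-execute the missed table creation locally.
            node_tables[table_name] = {"created_by_seq": seq}
            last_applied_seq = seq
    return last_applied_seq
```

Replaying strictly in sequence order guarantees the node converges on the same table information as the rest of the cluster, regardless of how long it was offline.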
  • The cluster manager 230 requests the cluster agents 310 of the other work nodes 300 to update the status information of the reconnected work node 300, and each cluster agent 310 updates the work node information of its DBMS engine 320 accordingly.
  • Accordingly, even when an error occurs in a specific work node 300 while the distributed cluster management system 100 is running, the system can reconnect the work node once its error has been corrected, without stopping the entire system.
  • The work node that first received the table creation query delivers the processing result to the client.
  • The client 400 initially connects to the cluster management node 200, and the cluster management node 200 designates which of the plurality of work nodes 300 the client 400 will connect to. Thereafter, the client 400 connects to the DBMS engine 320 of that work node 300 to perform its tasks.
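This two-step connection procedure (the management node designates a work node; the client then works directly with that node's DBMS engine) can be sketched as follows. The round-robin selection policy is an assumption, since the text does not specify how the target is chosen:

```python
class ClusterManagementNode:
    """Assigns each connecting client a target work node.

    Round-robin is used here purely for illustration; the patent leaves
    the selection policy unspecified.
    """

    def __init__(self, work_nodes):
        self.work_nodes = list(work_nodes)
        self._next = 0

    def assign_work_node(self):
        node = self.work_nodes[self._next % len(self.work_nodes)]
        self._next += 1
        return node

def client_connect(management_node):
    # Step 1: the client first contacts the cluster management node,
    # which designates the work node to use.
    target = management_node.assign_work_node()
    # Step 2: the client would then talk to that work node's DBMS engine
    # directly for all subsequent queries.
    return target
```

After this bootstrap, the management node is out of the query path, so client traffic spreads across the work nodes.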


Abstract

The present invention relates to a distributed cluster management system and a method therefor, the system comprising: one or more work nodes for parallel processing of a job requested by a client, using meta information required for distributed processing of the job; and one or more cluster management nodes for managing the work nodes in group units and synchronizing items of meta information throughout cluster operation, so as to manage the work nodes connected to the cluster management node itself. Therefore, according to the present invention, work nodes can process a client-requested job using only meta information, thereby improving job-processing performance. Furthermore, when a failure occurs in a specific work node during cluster operation, the work node is allowed to reconnect, after the failure has been corrected and the information required for job processing has been synchronized, without restarting the entire cluster and while the distributed cluster system keeps operating without interruption.
PCT/KR2018/003697 2018-03-28 2018-03-29 Distributed cluster management system and method therefor WO2019189963A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020180035943A KR102038527B1 (ko) 2018-03-28 2018-03-28 Distributed cluster management system and method therefor
KR10-2018-0035943 2018-03-28

Publications (1)

Publication Number Publication Date
WO2019189963A1 true WO2019189963A1 (fr) 2019-10-03

Family

ID=68060567

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2018/003697 WO2019189963A1 (fr) 2018-03-28 2018-03-29 Distributed cluster management system and method therefor

Country Status (2)

Country Link
KR (1) KR102038527B1 (fr)
WO (1) WO2019189963A1 (fr)


Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20210046481A * 2019-10-18 2021-04-28 삼성전자주식회사 Electronic device and control method therefor
KR102518774B1 * 2019-12-16 2023-04-06 한국전력공사 Front-end processing apparatus and method for supervisory control of a power distribution system
CN112202746B * 2020-09-24 2023-04-21 北京百度网讯科技有限公司 RPC member information acquisition method, apparatus, electronic device, and storage medium
KR102517831B1 * 2022-11-30 2023-04-04 한화시스템 주식회사 Software management method and system in a mission-critical system environment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004519024A * 2000-09-08 2004-06-24 Go Ahead Software Inc. System and method for managing a cluster containing multiple nodes
KR20100048130A * 2008-10-30 2010-05-11 주식회사 케이티 Metadata-cluster-based distributed storage system and operating method thereof
KR20130048093A * 2011-11-01 2013-05-09 한국전자통신연구원 Distributed database apparatus and stream data processing method therein
KR20130133989A * 2012-05-30 2013-12-10 삼성에스디에스 주식회사 System and method for parallel file transfer between file storage clusters
KR20150093979A * 2014-02-10 2015-08-19 단국대학교 산학협력단 Method and apparatus for allocating a name node in a virtualized cluster environment

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100911058B1 * 2007-11-22 2009-08-06 한국전자통신연구원 Metadata server search method
KR101035857B1 2008-12-19 2011-05-19 한국전자통신연구원 Data management method and system therefor
US9081826B2 * 2013-01-07 2015-07-14 Facebook, Inc. System and method for distributed database query engines
KR102033383B1 * 2016-02-18 2019-10-17 한국전자통신연구원 Data management method and system in a distributed data environment
KR102098896B1 2016-03-02 2020-04-09 한국전자통신연구원 Data management system and method


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112367189A * 2020-10-21 2021-02-12 深圳前海微众银行股份有限公司 Distributed node management method, device, and storage medium
CN112367189B * 2020-10-21 2023-05-12 深圳前海微众银行股份有限公司 Distributed node management method, device, and storage medium
WO2022193622A1 * 2021-03-15 2022-09-22 中兴通讯股份有限公司 5G message processing method, 5G message center, terminal, and storage medium
CN114785780A * 2022-03-14 2022-07-22 广东利通科技投资有限公司 Lane service data processing method, apparatus, device, medium, and program product
CN115348185A * 2022-08-19 2022-11-15 招银云创信息技术有限公司 Control method and control device for a distributed query engine
CN115348185B * 2022-08-19 2023-12-05 招银云创信息技术有限公司 Control method and control device for a distributed query engine

Also Published As

Publication number Publication date
KR102038527B1 (ko) 2019-11-26
KR20190113366A (ko) 2019-10-08


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 18912029; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 18912029; Country of ref document: EP; Kind code of ref document: A1)