CN105306545A - Failover method and system for external service node of cluster - Google Patents

Publication number
CN105306545A
Authority
CN
Grant status
Application
Patent type
Prior art keywords
node
external
service
number
nodes
Prior art date
Application number
CN 201510627329
Other languages
Chinese (zh)
Inventor
陈莹昊
周龙飞
Original Assignee
浪潮(北京)电子信息产业有限公司

Abstract

The invention discloses a failover method and system for the external service node of a cluster. The method comprises: allocating an intranet address, a node number and a priority to each node in the cluster; each node sending a broadcast over the intranet to the intranet addresses of the nodes other than itself, wherein the broadcast sent by each non-external service node includes its own node number, and the broadcast sent by the external service node includes its own node number and external service identification information; each node determining, from the node number information it has received, the node numbers from which no broadcast was received; selecting as failed nodes the nodes whose numbers are reported as not received by all nodes alike; and, if the external service node is among the failed nodes, selecting the node with the highest priority among the valid nodes to serve as the external service node. The method and system achieve reasonable and effective failover when the external service node fails.

Description

Failover method and system for the external service node of a cluster

Technical Field

[0001] The present invention relates to the field of computers, and in particular to a failover method and system for the external service node of a cluster.

Background

[0002] A typical cluster can usually switch its external service only among a small number of nodes, and changing the configuration is very cumbersome. When the master node that serves external clients fails, only one of the few preconfigured standby nodes can be enabled to take over the external service. The master node and each standby node must be configured separately in advance, which becomes very tedious when there are many standby nodes. Adding a new standby node or deleting an old one requires complex configuration changes, which severely reduces the scalability of the cluster. In addition, when multiple standby nodes are configured in this way, the takeover order is easily confused.

[0003] Therefore, how to implement failover of a cluster's external service node quickly and simply is a technical problem that those skilled in the art need to solve.

Summary

[0004] The object of the present invention is to provide a failover method and system for the external service node of a cluster. The method and system can perform a reasonable and effective takeover when the external service node fails, greatly reduce the time spent modifying the configuration when cluster nodes are added or deleted, and also prevent human error caused by complex modification operations.

[0005] To solve the above technical problem, the present invention provides a failover method for the external service node of a cluster, in which each node in the cluster is allocated an intranet address, a node number and a priority, the method further comprising:

[0006] Each node sends a broadcast over the intranet to the intranet addresses of the nodes other than itself; wherein the broadcast sent by each non-external service node includes its own node number, and the broadcast sent by the external service node includes its own node number and external service identification information;

[0007] Each node determines, according to the node number information it has received, the node number information for which no broadcast was received;

[0008] The nodes corresponding to the not-received node number information that is the same across all nodes are selected as failed nodes;

[0009] If an external service node is among the failed nodes, the node with the highest priority is selected from the valid nodes as the external service node.

[0010] The method further comprises:

[0011] If the broadcasts received by the external service node include one from a node with a higher priority than the external service node, the node with the highest priority is selected from the nodes whose priority is higher than that of the external service node;

[0012] The external service node sends a takeover request to that highest-priority node and stops providing the external service;

[0013] The highest-priority node takes over the external service after receiving the takeover request.

[0014] Wherein each node sending a broadcast over the intranet to the intranet addresses of the nodes other than itself comprises:

[0015] Each node sends a broadcast over the intranet to the intranet address of every other node according to an intranet address correspondence table.

[0016] Wherein each node determining, according to the received node number information, the node number information for which no broadcast was received comprises:

[0017] Each node determines, according to the node number information received within a preset time, the node number information for which no broadcast was received.

[0018] The method further comprises:

[0019] Periodically updating the basic configuration information, the node number and the priority of each node in the cluster.

[0020] The present invention also provides a failover system for the external service node of a cluster, comprising:

[0021] a setting module, configured to allocate an intranet address, a node number and a priority to each node in the cluster;

[0022] a broadcast module, configured for each node to send a broadcast over the intranet to the intranet addresses of the nodes other than itself; wherein the broadcast sent by each non-external service node includes its own node number, and the broadcast sent by the external service node includes its own node number and external service identification information;

[0023] a determination module, configured for each node to determine, according to the node number information it has received, the node number information for which no broadcast was received;

[0024] a selection module, configured to select as failed nodes the nodes corresponding to the not-received node number information that is the same across all nodes;

[0025] a takeover module, configured to select, if an external service node is among the failed nodes, the node with the highest priority from the valid nodes as the external service node.

[0026] The system further comprises:

[0027] a comparison module, configured to select, if the broadcasts received by the external service node include one from a node with a higher priority than the external service node, the node with the highest priority from the nodes whose priority is higher than that of the external service node;

[0028] a replacement module, configured for the external service node to send a takeover request to that highest-priority node and stop providing the external service, and for the highest-priority node to take over the external service after receiving the takeover request.

[0029] Wherein the determination module comprises:

[0030] Each node sends a broadcast over the intranet to the intranet address of every other node according to an intranet address correspondence table.

[0031] Wherein the selection module comprises:

[0032] Each node determines, according to the node number information received within a preset time, the node number information for which no broadcast was received.

[0033] The system further comprises:

[0034] an update module, configured to periodically update the basic configuration information, the node number and the priority of each node in the cluster.

[0035] In the failover method and system for the external service node of a cluster provided by the present invention, each node in the cluster is allocated an intranet address, a node number and a priority, and the method further comprises: each node sends a broadcast over the intranet to the intranet addresses of the nodes other than itself, wherein the broadcast sent by each non-external service node includes its own node number, and the broadcast sent by the external service node includes its own node number and external service identification information; each node determines, according to the node number information it has received, the node number information for which no broadcast was received; the nodes corresponding to the not-received node number information that is the same across all nodes are selected as failed nodes; and if an external service node is among the failed nodes, the node with the highest priority is selected from the valid nodes as the external service node.

[0036] Because the method configures the nodes in advance, allocating each node an intranet address, a node number and a priority, every node has its own number and priority, and deleting or adding a node only requires modifying the corresponding parameters. Adding or deleting nodes when the cluster is later expanded is therefore very simple. Each node confirms its identity, and whether it has failed, by broadcasting, so a reasonable and effective takeover can be carried out accurately and quickly when the external service node fails. No human intervention is required, which prevents human error. The method and system can perform a reasonable and effective takeover when the external service node fails, greatly reduce the time spent modifying the configuration when cluster nodes are added or deleted, and also prevent human error caused by complex modification operations.

Brief Description of the Drawings

[0037] In order to explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are merely embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.

[0038] FIG. 1 is a flowchart of the failover method for the external service node of a cluster provided by an embodiment of the present invention;

[0039] FIG. 2 is a structural block diagram of the failover system for the external service node of a cluster provided by an embodiment of the present invention.

Detailed Description

[0040] The core of the present invention is to provide a failover method and system for the external service node of a cluster. The method and system can perform a reasonable and effective takeover when the external service node fails, greatly reduce the time spent modifying the configuration when cluster nodes are added or deleted, and also prevent human error caused by complex modification operations.

[0041] To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

[0042] Please refer to FIG. 1, which is a flowchart of the failover method for the external service node of a cluster provided by an embodiment of the present invention. The method first requires allocating an intranet address, a node number and a priority to each node in the cluster, with the basic configuration of every node kept consistent. After this setup, the method may comprise:

[0043] S100: Each node sends a broadcast over the intranet to the intranet addresses of the nodes other than itself; wherein the broadcast sent by each non-external service node includes its own node number, and the broadcast sent by the external service node includes its own node number and external service identification information;

[0044] The purpose of sending a broadcast is to let every node learn the status of the nodes other than itself. The content sent by each node may therefore include its own information, for example its own node number; an external service node must additionally include identification information indicating its role as the external service node. Information about other nodes is optional. For example, priority information may be sent in the broadcast, or a receiving node may take the node number obtained from the broadcast and look up the priority in the priority list stored on each node. These are only two of the possible approaches.
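
The broadcast content described in paragraph [0044] can be illustrated with a minimal Python sketch. The patent does not specify a wire format, so the JSON encoding, the `Heartbeat` class and the field names `node_id` and `external` are assumptions made purely for illustration.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Heartbeat:
    """Hypothetical broadcast payload; the field names are illustrative."""
    node_id: int            # the sender's own node number
    external: bool = False  # True only for the external service node


def encode(hb: Heartbeat) -> bytes:
    # Serialize to JSON bytes (the encoding is an assumption, not from the patent).
    return json.dumps(asdict(hb), sort_keys=True).encode("utf-8")


def decode(payload: bytes) -> Heartbeat:
    # Rebuild the payload; a missing "external" field means an ordinary node.
    fields = json.loads(payload.decode("utf-8"))
    return Heartbeat(int(fields["node_id"]), bool(fields.get("external", False)))
```

In this sketch the priority is not carried in the message; a receiver would look it up by node number, which corresponds to the second of the two approaches mentioned above.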

[0045] S110: Each node determines, according to the node number information it has received, the node number information for which no broadcast was received;

[0046] S120: The nodes corresponding to the not-received node number information that is the same across all nodes are selected as failed nodes;

[0047] The failed nodes are determined by comparing the node number information held by the individual nodes. When all nodes are working normally, every node should receive the node numbers of all other nodes, which together with its own number gives the complete set of nodes. A node that has failed cannot send a broadcast. A normal node can therefore compare the node numbers it has received against all the numbers in the number list and identify the missing numbers, which are the node numbers from which no broadcast was received. The not-received node numbers obtained by all normal nodes can then be collected and compared, and the nodes whose numbers appear in every normal node's not-received list are determined to be the failed nodes.
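
Steps S110 and S120 amount to a set difference followed by a set intersection. The following Python sketch shows one way to express them; the function names `missing_ids` and `failed_nodes` and the shape of the `reports` mapping are hypothetical, and the patent does not say how the per-node results are collected.

```python
def missing_ids(all_ids, own_id, heard_ids):
    """S110: numbers in the full list, other than our own, that we did not hear."""
    return set(all_ids) - {own_id} - set(heard_ids)


def failed_nodes(reports):
    """S120: numbers that appear in every reporting node's not-received set.

    `reports` maps a reporting node number to its not-received set.
    Only nodes that are still alive can contribute a report.
    """
    if not reports:
        return set()
    return set.intersection(*(set(m) for m in reports.values()))


# Example: node 3 is down, and node 4 also lost one broadcast from node 2.
all_ids = {1, 2, 3, 4}
reports = {
    1: missing_ids(all_ids, 1, heard_ids={2, 4}),
    2: missing_ids(all_ids, 2, heard_ids={1, 4}),
    4: missing_ids(all_ids, 4, heard_ids={1}),
}
result = failed_nodes(reports)  # only node 3 is missing from every report
```

Taking the intersection means a single lost packet between two healthy nodes does not mark either of them as failed, which matches the requirement that the not-received number be the same across all nodes.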

[0048] S130: If an external service node is among the failed nodes, the node with the highest priority is selected from the valid nodes as the external service node.

[0049] When an external service node is among the failed nodes, the node with the highest priority must be selected from all valid nodes to take over as the external service node. The priority of each node can be set and modified according to actual conditions. Setting priorities keeps the external service role on the best of the valid nodes.
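
Step S130 can be sketched in a few lines of Python. Two details are assumptions that the patent leaves open: a larger number is taken to mean a higher priority, and ties are broken in favor of the lower node number. The function name `elect_external` is hypothetical.

```python
def elect_external(priorities, failed, current_external):
    """Return the node number that should provide the external service.

    priorities: mapping of node number -> priority (larger = higher, by assumption).
    failed: set of node numbers judged to have failed.
    """
    if current_external not in failed:
        return current_external  # no failover needed
    live = {n: p for n, p in priorities.items() if n not in failed}
    if not live:
        return None  # no valid node is left to take over
    # Highest priority wins; ties go to the lower node number (assumption).
    return max(live, key=lambda n: (live[n], -n))
```

For example, with priorities `{1: 10, 2: 30, 3: 20, 4: 30}` and node 2 serving externally, a failure of node 2 hands the role to node 4, and a simultaneous failure of nodes 2 and 4 hands it to node 3.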

[0050] Based on the above technical solution, the failover method for the external service node of a cluster provided by the embodiment of the present invention configures the nodes in advance, allocating each node an intranet address, a node number and a priority. Every node has its own number and priority, and deleting or adding a node only requires modifying the corresponding parameters, so adding or deleting nodes when the cluster is later expanded is very simple. Each node confirms its identity, and whether it has failed, by broadcasting, so a reasonable and effective takeover can be carried out accurately and quickly when the external service node fails. No human intervention is required, which prevents human error. The method and system can perform a reasonable and effective takeover when the external service node fails, greatly reduce the time spent modifying the configuration when cluster nodes are added or deleted, and also prevent human error caused by complex modification operations.

[0051] Based on the above technical solution, the method may further comprise:

[0052] If the broadcasts received by the external service node include one from a node with a higher priority than the external service node, the node with the highest priority is selected from the nodes whose priority is higher than that of the external service node;

[0053] The external service node sends a takeover request to that highest-priority node and stops providing the external service;

[0054] The highest-priority node takes over the external service after receiving the takeover request.
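
The handover decision in paragraphs [0052] to [0054] can be sketched as a single function run by the current external service node. The name `handover_target`, the convention that a larger number means a higher priority, and the tie-break on the lower node number are all assumptions for illustration; sending the takeover request itself is left out.

```python
def handover_target(own_id, priorities, heard_ids):
    """Return the node that should take over, or None to keep serving.

    own_id: node number of the current external service node.
    priorities: mapping of node number -> priority (larger = higher, by assumption).
    heard_ids: node numbers whose broadcasts were received.
    """
    own = priorities[own_id]
    # Nodes unknown to the priority table are ignored.
    higher = [n for n in heard_ids if priorities.get(n, own) > own]
    if not higher:
        return None
    # Among the higher-priority nodes, pick the highest one.
    return max(higher, key=lambda n: (priorities[n], -n))
```

When the function returns a node number, the caller would send that node a takeover request and stop its own external service, as the steps above describe.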

[0055] The rule for assigning node numbers can be set according to usage; for example, all nodes may be numbered in order of priority from high to low.

[0056] The system performs takeover in real time according to the priorities of the valid nodes and of the external service node, so that the external service node is always the node with the highest priority among all valid nodes. A specific process may be as follows:

[0057] Using the network segment allocated to the intranet, a fixed correspondence table of intranet addresses, node numbers and priorities is prepared on all nodes. Only a node number then needs to be assigned to each node, and the node configures its own intranet address automatically from the table. Using intranet broadcasts, every node then periodically sends a broadcast containing its own node number to all intranet addresses in the table. The broadcast sent by the external service node carries an additional special field indicating that it is the node providing the external service.
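
A correspondence table of the kind described in paragraph [0057] could be generated as in the Python sketch below. The patent only says that a fixed table is prepared; deriving each address as the segment's base address plus the node number, the example segment `192.168.10.0/24`, and the names `build_table` and `peer_addresses` are assumptions for illustration.

```python
import ipaddress


def build_table(segment, priorities):
    """Map node number -> (intranet address, priority) within one segment."""
    net = ipaddress.ip_network(segment)
    usable = net.num_addresses - 2  # exclude network and broadcast addresses
    table = {}
    for node_id, priority in priorities.items():
        if not 1 <= node_id <= usable:
            raise ValueError(f"node number {node_id} does not fit in {segment}")
        # Address = base address + node number (an illustrative convention).
        table[node_id] = (str(net.network_address + node_id), priority)
    return table


def peer_addresses(table, own_id):
    """Addresses of every node except our own, in node-number order."""
    return [addr for n, (addr, _) in sorted(table.items()) if n != own_id]


table = build_table("192.168.10.0/24", {1: 30, 2: 20, 3: 10})
```

With such a table distributed to every node, adding a node comes down to adding one row and assigning the new node its number.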

[0058] If no node receives a given node's broadcast while listening, that node is regarded as a failed node; otherwise it is regarded as a valid node. If the node providing the external service fails, the node with the highest priority among the valid nodes is selected and starts providing the external service. If the node providing the external service receives a broadcast from a node with a higher priority than its own, it stops the external service and sends a takeover request to that node; when the higher-priority node receives the takeover request, it takes over and starts providing the external service.

[0059] Based on each of the above technical solutions, optionally, each node sending a broadcast over the intranet to the intranet addresses of the nodes other than itself may comprise:

[0060] Each node sends a broadcast over the intranet to the intranet address of every other node according to an intranet address correspondence table.

[0061] The intranet addresses of all nodes can be compiled into a table, which can be stored on every node; according to this intranet address correspondence table, each node sends a broadcast over the intranet to the intranet address of every node other than itself.

[0062] The broadcast may be sent at any time or at a predetermined interval, or several modes may coexist; this can be set by the user according to actual conditions.

[0063] Based on each of the above technical solutions, optionally, each node determining, according to the received node number information, the node number information for which no broadcast was received may comprise:

[0064] Each node determines, according to the node number information received within a preset time, the node number information for which no broadcast was received.

[0065] Because a node does not send broadcasts continuously, a time limit is required when determining failed nodes; otherwise the result would be confused. Each node therefore determines the not-received node numbers from the node number information received within a preset time. The preset time may be the preset interval at which nodes send broadcasts, or it may be shorter than that interval.
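
The preset-time check in paragraph [0065] can be sketched as a small tracker that remembers when each node was last heard. The class name `HeartbeatTracker` and its parameters are hypothetical, and timestamps are passed in explicitly so that the behavior does not depend on the system clock.

```python
class HeartbeatTracker:
    """Tracks the last time each other node's broadcast was received."""

    def __init__(self, all_ids, own_id, window):
        self.expected = set(all_ids) - {own_id}  # nodes we expect to hear from
        self.window = window                      # the preset time, in seconds
        self.last_seen = {}

    def record(self, node_id, now):
        # Remember the arrival time of a broadcast from a known node.
        if node_id in self.expected:
            self.last_seen[node_id] = now

    def missing(self, now):
        # A node is missing if nothing arrived from it within the window.
        return {
            n for n in self.expected
            if now - self.last_seen.get(n, float("-inf")) > self.window
        }
```

A node that has never been heard counts as missing immediately; whether a freshly started cluster should allow a grace period is a design choice the patent does not address.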

[0066] Based on any of the above specific technical solutions, the method may further comprise:

[0067] Periodically updating the basic configuration information, the node number and the priority of each node in the cluster.

[0068] As technology improves or as equipment usage changes, the user can update the basic configuration information, the node number and the priority of each node in the cluster at any time or periodically, so that the node information is always optimal, which safeguards the capability of the external service node.

[0069] The specific process is:

[0070] When the total number of nodes is not fixed and a cluster is built that must provide a highly available external service, the basic configuration of every node is kept consistent, each node is then numbered and assigned its own priority for taking over the external service, and the node with the highest priority obtains the right to provide the external service. All nodes broadcast to the other nodes; if no node receives a given node's broadcast within a certain time, that node is regarded as failed. If the external service node fails, the highest-priority node among the remaining valid nodes takes over the external service. If a higher-priority node appears, the node currently providing the external service stops the service and sends a takeover request to the higher-priority node, which takes over the external service after receiving the request.

[0071] Based on the above technical solution, the failover method for the external service node of a cluster provided by the embodiment of the present invention overcomes the limitations of the prior art, in which a cluster can usually switch its external service only among a few nodes and configuration changes are very cumbersome. The method can configure nodes quickly even when the number of cluster nodes is not fixed, so that a reasonable and effective takeover is performed when the external service node fails. Adding or deleting nodes is simple and quick, which greatly reduces the time spent modifying the configuration and also prevents human error caused by complex modification operations. Node configuration is simple: a common configuration plus an individually set node number is all that is required, and every node is capable of taking over the external service.

[0072] The embodiment of the present invention provides a failover method for the external service node of a cluster, by which a reasonable and effective takeover can be performed when the external service node fails.

[0073] The failover system for the external service node of a cluster provided by the embodiment of the present invention is described below. The system described below and the method described above may be referred to in correspondence with each other.

[0074] Please refer to FIG. 2, which is a structural block diagram of the failover system for the external service node of a cluster provided by an embodiment of the present invention; the system may comprise:

[0075] 设置模块100,用于为集群内每个节点分配内网地址,节点编号以及优先级; [0075] The setting module 100, is used to assign each node in the cluster network address, the node number and the priority;

[0076] 广播模块200,用于每个节点通过内网向除本节点之外的其他节点的内网地址发送广播;其中,每个非对外服务节点发送的广播包括自身的节点编号,对外服务节点发送的广播包括自身的节点编号及对外服务标识信息; [0076] The broadcasting module 200, for each node sends a broadcast to the other nodes of the network address other than the present node through the network; wherein each non-broadcast transmitted external service node comprises its own node number, external services including broadcasting node sends its node number and external service identification information;

[0077] 确定模块300,用于每个节点根据接收到节点编号信息,确定未接收到广播的节点编号息; [0077] module 300 determines, for each node according to the node number information received, determining the node number is not received broadcast information;

[0078] 选取模块400,用于选取所有节点中相同的未接收到广播的节点编号信息相对应的节点作为失效节点; [0078] The selection module 400, configured to select the node number information is not received corresponding to the broadcast with respect to all the nodes in the same node as the failed node;

[0079] 接管模块500,用于若所述失效节点中存在对外服务节点时,从有效节点中选出优先级最高的节点作为对外服务节点。 [0079] module 500 to take over, if the presence of a foreign service node failure node, the node with the highest priority is selected as a valid node from the external service node.

[0080] 可选的,该系统还可以包括: [0080] Optionally, the system may further comprise:

[0081] 比较模块,用于若对外服务节点接收到的广播中,存在比所述对外服务节点优先级高的节点时,从比所述对外服务节点优先级高的节点中选取优先级最高的节点; [0081] The comparison module for broadcasting if the received external service node, a node when there is a high priority than the external service node, and selecting the highest priority than the external service node from a higher priority node node;

[0082] 替换模块,用于所述对外服务节点向所述优先级最高的节点发送接管请求,并停止对外服务;所述优先级最高的节点接收到所述接管请求后接管对外服务。 [0082] Alternatively means for taking over the foreign service node sends a request to the node with the highest priority, and stop foreign service; the highest priority node, after receiving the takeover request to take over the external services.

[0083] 可选的,所述确定模块300可以包括: [0083] Optionally, the determination module 300 may include:

[0084] 所述每个节点根据内网地址对应表,通过内网向其余每个节点的内网地址发送广播。 [0084] The table corresponding to each node of the network addresses, sending a broadcast to the remaining network addresses of each node through the network.

[0085] Optionally, the selection module 400 may include:

[0086] each node determining, from the node number information received within a preset time, the node number information of the nodes whose broadcasts it did not receive.
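The preset-time check can be sketched in Python as a simple window filter. The timestamped tuple format, the half-open window, and the parameter names are assumptions; the source only states that reception is evaluated within a preset time.

```python
from typing import Iterable, Set, Tuple


def missing_in_window(
    all_numbers: Iterable[int],
    own_number: int,
    received: Iterable[Tuple[float, int]],
    window_start: float,
    window_length: float,
) -> Set[int]:
    """Return node numbers not heard from inside the preset time window.

    received holds (timestamp, node_number) pairs for broadcasts that arrived.
    The window is [window_start, window_start + window_length).
    """
    window_end = window_start + window_length
    heard = {
        number
        for timestamp, number in received
        if window_start <= timestamp < window_end
    }
    # A node never expects a broadcast from itself, so exclude its own number.
    return set(all_numbers) - heard - {own_number}
```

A broadcast that arrives after the window closes counts as not received for that round, so a slow node is reported as missing until a later round hears it in time.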

[0087] Based on any of the above technical solutions, the system may further include:

[0088] an update module, configured to periodically update the basic configuration information, the node number, and the priority of each node in the cluster.

[0089] The embodiments in this specification are described progressively; each embodiment focuses on its differences from the others, and identical or similar parts may be cross-referenced. Because the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is brief; see the method section for related details.

[0090] Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To illustrate clearly the interchangeability of hardware and software, the composition and steps of each example have been described above in general terms of their function. Whether these functions are performed in hardware or software depends on the particular application and design constraints of the technical solution. Skilled persons may implement the described functions in different ways for each particular application, but such implementations should not be regarded as going beyond the scope of the present invention.

[0091] The steps of a method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), main memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

[0092] The method and system for failover of a cluster's external service node provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention; the description of the above embodiments is only intended to help in understanding the method of the invention and its core idea. It should be noted that those of ordinary skill in the art may make various improvements and modifications to the invention without departing from its principle, and these improvements and modifications also fall within the scope of protection of the claims of the invention.

Claims (10)

  1. A failover method for an external service node of a cluster, characterized in that an intranet address, a node number, and a priority are allocated to each node in the cluster, the method further comprising: each node sending a broadcast over the intranet to the intranet addresses of all nodes other than itself, wherein the broadcast sent by each non-external-service node contains its own node number, and the broadcast sent by the external service node contains its own node number and external service identification information; each node determining, from the node number information it has received, the node number information of the nodes whose broadcasts it did not receive; selecting as failed nodes the nodes corresponding to the not-received node number information that is the same across all nodes; and, if the failed nodes include the external service node, selecting the highest-priority node among the valid nodes as the external service node.
  2. The method of claim 1, characterized by further comprising: if the broadcasts received by the external service node include nodes with a higher priority than the external service node, selecting the highest-priority node from among those higher-priority nodes; the external service node sending a takeover request to that highest-priority node and stopping external service; and the highest-priority node taking over external service after receiving the takeover request.
  3. The method of claim 1, characterized in that each node sending a broadcast over the intranet to the intranet addresses of all nodes other than itself comprises: each node sending, according to an intranet address mapping table, a broadcast over the intranet to the intranet address of every other node.
  4. The method of claim 3, characterized in that each node determining, from the node number information it has received, the node number information of the nodes whose broadcasts it did not receive comprises: each node determining, from the node number information received within a preset time, the node number information of the nodes whose broadcasts it did not receive.
  5. The method of any one of claims 1 to 4, characterized by further comprising: periodically updating the basic configuration information, the node number, and the priority of each node in the cluster.
  6. A failover system for an external service node of a cluster, characterized by comprising: a setting module, configured to allocate an intranet address, a node number, and a priority to each node in the cluster; a broadcast module, configured so that each node sends a broadcast over the intranet to the intranet addresses of all nodes other than itself, wherein the broadcast sent by each non-external-service node contains its own node number, and the broadcast sent by the external service node contains its own node number and external service identification information; a determination module, configured so that each node determines, from the node number information it has received, the node number information of the nodes whose broadcasts it did not receive; a selection module, configured to select as failed nodes the nodes corresponding to the not-received node number information that is the same across all nodes; and a takeover module, configured to select, when the failed nodes include the external service node, the highest-priority node among the valid nodes as the external service node.
  7. The system of claim 6, characterized by further comprising: a comparison module, configured to select, when the broadcasts received by the external service node include nodes with a higher priority than the external service node, the highest-priority node from among those higher-priority nodes; and a replacement module, configured so that the external service node sends a takeover request to that highest-priority node and stops external service, and the highest-priority node takes over external service after receiving the takeover request.
  8. The system of claim 6, characterized in that the determination module comprises: each node sending, according to an intranet address mapping table, a broadcast over the intranet to the intranet address of every other node.
  9. The system of claim 8, characterized in that the selection module comprises: each node determining, from the node number information received within a preset time, the node number information of the nodes whose broadcasts it did not receive.
  10. The system of any one of claims 6 to 9, characterized by further comprising: an update module, configured to periodically update the basic configuration information, the node number, and the priority of each node in the cluster.
CN 201510627329 2015-09-28 2015-09-28 Failover method and system for external service node of cluster CN105306545A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201510627329 CN105306545A (en) 2015-09-28 2015-09-28 Failover method and system for external service node of cluster

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201510627329 CN105306545A (en) 2015-09-28 2015-09-28 Failover method and system for external service node of cluster

Publications (1)

Publication Number Publication Date
CN105306545A (en) 2016-02-03

Family

ID=55203288

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201510627329 CN105306545A (en) 2015-09-28 2015-09-28 Failover method and system for external service node of cluster

Country Status (1)

Country Link
CN (1) CN105306545A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018032499A1 (en) * 2016-08-19 2018-02-22 华为技术有限公司 Load balancing method and associated device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1512374A (en) * 2002-12-31 2004-07-14 联想(北京)有限公司 Method for node load information transfer and node survival detection in machine group
CN1819583A (en) * 2005-10-20 2006-08-16 北京邮电大学 Hierarchical tolerant invading scheme based on threshold
US20120297243A1 (en) * 2009-09-30 2012-11-22 International Business Machines Corporation Svc cluster configuration node failover system and method
CN103995901A (en) * 2014-06-10 2014-08-20 北京京东尚科信息技术有限公司 Method for determining data node failure
CN104038366A (en) * 2014-05-05 2014-09-10 深圳市中博科创信息技术有限公司 Cluster node failure detection method and system

Legal Events

Date Code Title Description
C06 Publication
C10 Entry into substantive examination