CN105306545A - Failover method and system for external service node of cluster - Google Patents
- Publication number: CN105306545A
- Application number: CN201510627329.5A
- Authority: CN (China)
- Prior art keywords: node, external service, broadcast, priority, serial number
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/104—Peer-to-peer [P2P] networks
- H04L67/1044—Group management mechanisms
- H04L67/1051—Group master selection mechanisms
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0654—Management of faults, events, alarms or notifications using network fault recovery
- H04L41/0668—Management of faults, events, alarms or notifications using network fault recovery by dynamic selection of recovery network elements, e.g. replacement by the most appropriate element after failure
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Theoretical Computer Science (AREA)
- Hardware Redundancy (AREA)
- Computer And Data Communications (AREA)
Abstract
The invention discloses a method and system for failover of a cluster's external service node. Each node in the cluster is assigned an intranet address, a node number, and a priority. The method further comprises: each node sends a broadcast over the intranet to the intranet addresses of all nodes other than itself, where the broadcast sent by each non-external-service node includes its own node number, and the broadcast sent by the external service node includes its own node number and external service identification information; each node determines, from the node numbers it has received, the node numbers from which no broadcast was received; the nodes whose numbers appear among the unreceived node numbers at every node are selected as failed nodes; and if the failed nodes include the external service node, the node with the highest priority among the valid nodes is selected as the external service node. The method and system provide a reasonable and effective takeover when the external service node fails.
Description
Technical field
The present invention relates to the field of computers, and in particular to a method and system for failover of a cluster's external service node.
Background
A typical cluster can switch its external service among only a few nodes, and changing the configuration is cumbersome. When the master node that provides the cluster's external service fails, only one of the preconfigured standby nodes can be brought up to take over the external service. The master node and each standby node must be configured separately in advance, which becomes very tedious when there are many standby nodes. Adding a new standby node or removing an old one requires complicated configuration changes, which seriously reduces the scalability of the cluster. In addition, when multiple standby nodes are configured in this way, the takeover order is easily confused.
Therefore, how to implement failover of a cluster's external service node quickly and simply is a technical problem to be solved by those skilled in the art.
Summary of the invention
The object of the present invention is to provide a method and system for failover of a cluster's external service node. The method and system can perform a reasonable and effective takeover when the external service node fails, greatly reduce the time spent modifying configuration when cluster nodes are added or removed, and prevent human error caused by complicated modification operations.
To solve the above technical problem, the present invention provides a method for failover of a cluster's external service node, in which each node in the cluster is assigned an intranet address, a node number, and a priority, the method further comprising:
each node sends a broadcast over the intranet to the intranet addresses of all nodes other than itself, where the broadcast sent by each non-external-service node includes its own node number, and the broadcast sent by the external service node includes its own node number and external service identification information;
each node determines, from the node number information it has received, the node numbers from which no broadcast was received;
the nodes whose numbers appear among the unreceived node numbers at every node are selected as failed nodes;
if the failed nodes include the external service node, the node with the highest priority among the valid nodes is selected as the external service node.
The method further comprises:
if the broadcasts received by the external service node include one from a node with a higher priority than the external service node, the node with the highest priority is selected from among the nodes with a higher priority than the external service node;
the external service node sends a takeover request to the node with the highest priority and stops external service;
the node with the highest priority takes over the external service after receiving the takeover request.
The step in which each node sends a broadcast over the intranet to the intranet addresses of all nodes other than itself comprises:
each node sends a broadcast over the intranet to the intranet address of every other node according to an intranet address correspondence table.
The step in which each node determines, from the node number information it has received, the node numbers from which no broadcast was received comprises:
each node determines the node numbers from which no broadcast was received based on the node number information received within a preset time.
The method further comprises:
periodically updating the basic configuration information, the node number, and the priority of each node in the cluster.
The present invention also provides a system for failover of a cluster's external service node, comprising:
a setting module, configured to assign an intranet address, a node number, and a priority to each node in the cluster;
a broadcast module, by which each node sends a broadcast over the intranet to the intranet addresses of all nodes other than itself, where the broadcast sent by each non-external-service node includes its own node number, and the broadcast sent by the external service node includes its own node number and external service identification information;
a determination module, by which each node determines, from the node number information it has received, the node numbers from which no broadcast was received;
a selection module, configured to select as failed nodes the nodes whose numbers appear among the unreceived node numbers at every node;
a takeover module, configured to select the node with the highest priority among the valid nodes as the external service node if the failed nodes include the external service node.
The system further comprises:
a comparison module, configured to select, if the broadcasts received by the external service node include one from a node with a higher priority than the external service node, the node with the highest priority from among the nodes with a higher priority than the external service node;
a replacement module, by which the external service node sends a takeover request to the node with the highest priority and stops external service, and the node with the highest priority takes over the external service after receiving the takeover request.
The determination module comprises:
each node sends a broadcast over the intranet to the intranet address of every other node according to an intranet address correspondence table.
The selection module comprises:
each node determines the node numbers from which no broadcast was received based on the node number information received within a preset time.
The system further comprises:
an update module, configured to periodically update the basic configuration information, the node number, and the priority of each node in the cluster.
In the method and system provided by the present invention, each node in the cluster is assigned an intranet address, a node number, and a priority; each node sends a broadcast over the intranet to the intranet addresses of all nodes other than itself, where the broadcast sent by each non-external-service node includes its own node number, and the broadcast sent by the external service node includes its own node number and external service identification information; each node determines, from the node number information it has received, the node numbers from which no broadcast was received; the nodes whose numbers appear among the unreceived node numbers at every node are selected as failed nodes; and if the failed nodes include the external service node, the node with the highest priority among the valid nodes is selected as the external service node.
Because the method configures the nodes in advance, assigning each node an intranet address, a node number, and a priority, every node has its own number and priority, and adding or removing a node only requires modifying the corresponding parameters. Later expansion by adding or removing nodes is therefore simple. Each node also confirms its identity, and whether it has failed, through broadcasts, so that a reasonable and effective takeover can be carried out accurately and quickly when the external service node fails. No manual intervention is required, which prevents human error. The method and system can perform a reasonable and effective takeover when the external service node fails, greatly reduce the time spent modifying configuration when cluster nodes are added or removed, and prevent human error caused by complicated modification operations.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of the method for failover of a cluster's external service node provided by an embodiment of the present invention;
FIG. 2 is a structural block diagram of the system for failover of a cluster's external service node provided by an embodiment of the present invention.
Detailed description
The core of the present invention is to provide a method and system for failover of a cluster's external service node. The method and system can perform a reasonable and effective takeover when the external service node fails, greatly reduce the time spent modifying configuration when cluster nodes are added or removed, and prevent human error caused by complicated modification operations.
To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.
Please refer to FIG. 1, which is a flowchart of the method for failover of a cluster's external service node provided by an embodiment of the present invention. The method first assigns each node in the cluster an intranet address, a node number, and a priority, with the basic configuration of every node kept consistent. After this setup, the method may include:
S100. Each node sends a broadcast over the intranet to the intranet addresses of all nodes other than itself, where the broadcast sent by each non-external-service node includes its own node number, and the broadcast sent by the external service node includes its own node number and external service identification information.
The purpose of the broadcast is to let each node learn the state of the other nodes. The content each node sends may therefore include information about itself, such as its own node number, and an external service node must also include identification information indicating its role as the external service node. Other information is optional. For example, the priority may be carried in the broadcast, or a receiving node may take the node number from the broadcast and look up the priority in the priority list stored on each node. These are only two possible approaches.
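The broadcast content described for S100 can be sketched as a small message type. This is a minimal illustration, not the patent's wire format: the `Heartbeat` name, the JSON encoding, and the field names are all assumptions made here. The priority field is optional, matching the two approaches mentioned above (carry it in the broadcast, or look it up locally by node number).

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class Heartbeat:
    """Hypothetical broadcast payload; field names are illustrative."""
    node_id: int                     # the sender's own node number
    external_service: bool = False   # set only by the external service node
    priority: Optional[int] = None   # optional; receivers may look it up locally


def encode(hb: Heartbeat) -> bytes:
    """Serialize a heartbeat, leaving out the priority when it is not carried."""
    fields = {k: v for k, v in asdict(hb).items() if v is not None}
    return json.dumps(fields, sort_keys=True).encode("utf-8")


def decode(payload: bytes) -> Heartbeat:
    """Rebuild a heartbeat from its serialized form."""
    return Heartbeat(**json.loads(payload.decode("utf-8")))
```

A receiver that gets a heartbeat without a priority would fall back to its locally stored priority list, keyed by `node_id`.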
S110. Each node determines, from the node number information it has received, the node numbers from which no broadcast was received.
S120. The nodes whose numbers appear among the unreceived node numbers at every node are selected as failed nodes.
The failed nodes are determined by comparing the node number information held by the individual nodes. When all nodes are working normally, each node receives the node numbers of all other nodes, which together with its own number make up the full set of nodes. A failed node cannot send broadcasts, so a normal node can compare the node numbers it has received against all numbers in the number list and identify the missing ones; these are the node numbers from which no broadcast was received. The unreceived node numbers obtained by all normal nodes can then be collected and compared, and the nodes whose numbers are missing at every node are determined to be failed nodes.
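Steps S110 and S120 amount to a set difference followed by a set intersection. The sketch below assumes the per-node missing sets have already been collected in one place; how they are exchanged is not specified in the text, and the function names are hypothetical.

```python
from typing import Dict, Set


def missing_from(all_ids: Set[int], self_id: int, heard: Set[int]) -> Set[int]:
    """S110: node numbers in the number list that this node did not hear from."""
    return all_ids - heard - {self_id}


def failed_nodes(reports: Dict[int, Set[int]]) -> Set[int]:
    """S120: node numbers that are missing at every reporting node.

    `reports` maps each normal node's number to its missing set.
    """
    if not reports:
        return set()
    return set.intersection(*reports.values())
```

For example, with nodes 1 to 4 where node 2 has failed and node 3 also lost one packet from node 4, node 3 reports `{2, 4}` while nodes 1 and 4 report `{2}`; the intersection is `{2}`, so a single lost packet does not mark node 4 as failed.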
S130. If the failed nodes include the external service node, the node with the highest priority among the valid nodes is selected as the external service node.
When the failed nodes include the external service node, the node with the highest priority among all valid nodes is selected to take over as the external service node. The priority of each node can be set and modified according to actual conditions. Setting priorities in this way keeps the external service on the best available node among the valid nodes.
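The selection in S130 can be sketched as follows. Two things here are assumptions rather than statements from the text: a larger number is taken to mean a higher priority, and ties are broken in favor of the lower node number.

```python
from typing import Dict, Optional, Set


def elect_external(
    priorities: Dict[int, int],
    failed: Set[int],
    current: Optional[int],
) -> Optional[int]:
    """Return the node that should provide external service after S120."""
    if current is not None and current not in failed:
        return current  # the external service node is still valid; no takeover
    valid = [n for n in priorities if n not in failed]
    if not valid:
        return None  # no valid node is left to take over
    # Assumption: larger value = higher priority; ties go to the lower node number.
    return max(valid, key=lambda n: (priorities[n], -n))
```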
In the method provided by this embodiment, the nodes are configured in advance: each node is assigned an intranet address, a node number, and a priority, so every node has its own number and priority, and adding or removing a node only requires modifying the corresponding parameters. Later expansion by adding or removing nodes is therefore simple. Each node also confirms its identity, and whether it has failed, through broadcasts, so that a reasonable and effective takeover can be carried out accurately and quickly when the external service node fails. No manual intervention is required, which prevents human error. The method and system can perform a reasonable and effective takeover when the external service node fails, greatly reduce the time spent modifying configuration when cluster nodes are added or removed, and prevent human error caused by complicated modification operations.
Based on the above technical solution, the method may further include:
if the broadcasts received by the external service node include one from a node with a higher priority than the external service node, the node with the highest priority is selected from among the nodes with a higher priority than the external service node;
the external service node sends a takeover request to the node with the highest priority and stops external service;
the node with the highest priority takes over the external service after receiving the takeover request.
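The three handover steps above can be sketched as below. The `Node` class, the direct method call standing in for the takeover request, and the "larger value means higher priority" convention are all assumptions for illustration; a real deployment would send the request over the intranet.

```python
from typing import Dict, Iterable, Optional


def preemption_target(
    self_id: int, priorities: Dict[int, int], heard: Iterable[int]
) -> Optional[int]:
    """Run by the external service node: pick the node to hand over to, if any."""
    higher = [n for n in heard if priorities[n] > priorities[self_id]]
    if not higher:
        return None
    return max(higher, key=lambda n: priorities[n])


class Node:
    """Hypothetical in-memory stand-in for a cluster node."""

    def __init__(self, node_id: int, serving: bool = False) -> None:
        self.node_id = node_id
        self.serving = serving

    def on_takeover_request(self) -> None:
        # The higher-priority node starts external service on request.
        self.serving = True


def hand_over(current: Node, target: Node) -> None:
    """The current node stops serving and sends the takeover request."""
    current.serving = False
    target.on_takeover_request()  # stands in for a network message
```

Because the current node stops before the target starts, the sketch leaves a short window with no external service node; the text does not say how an implementation should handle that window.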
The node numbering scheme can be chosen to suit the deployment, for example by numbering all nodes in descending order of priority.
The system performs takeover in real time according to the priorities of the valid nodes and the external service node, so that the node with the highest priority among all valid nodes is always the external service node. A concrete process may be as follows:
Using the network segment allocated to the intranet, a fixed correspondence table of intranet addresses, node numbers, and priorities is prepared on all nodes. Only a node number then needs to be assigned to each node, and the node configures its own intranet address automatically from the table. All nodes then periodically send a broadcast containing their own node number over the intranet to every intranet address in the table. The broadcast sent by the node providing external service carries an additional piece of special information indicating that it is the node providing service externally.
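The correspondence table and the self-configuration step can be sketched as a lookup keyed by node number. The addresses, priorities, and names below are hypothetical examples, and larger priority values are assumed to mean higher priority.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class TableEntry:
    address: str   # fixed intranet address for this node number
    priority: int  # assumption: larger value means higher priority


# Hypothetical table; an identical copy would be placed on every node.
CORRESPONDENCE_TABLE: Dict[int, TableEntry] = {
    1: TableEntry("192.168.10.1", 30),
    2: TableEntry("192.168.10.2", 20),
    3: TableEntry("192.168.10.3", 10),
}


def configure_self(node_id: int, table: Dict[int, TableEntry]) -> TableEntry:
    """Derive this node's intranet address and priority from its node number."""
    if node_id not in table:
        raise KeyError(f"node number {node_id} is not in the correspondence table")
    return table[node_id]
```

With this layout the only per-node setting is the node number; everything else comes from the shared table.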
If no node receives a given node's broadcast while listening, that node is regarded as failed; otherwise it is regarded as valid. If the node providing external service fails, the valid node with the highest priority starts providing external service. If the node providing external service receives a broadcast from a node with a higher priority than its own, it stops external service and sends a takeover request to that node; when the higher-priority node receives the takeover request, it takes over and starts providing external service.
Based on the above technical solutions, optionally, the step in which each node sends a broadcast over the intranet to the intranet addresses of all nodes other than itself may include:
each node sends a broadcast over the intranet to the intranet address of every other node according to an intranet address correspondence table.
The intranet addresses of all nodes can be compiled into a table and stored on every node; each node uses the table to send its broadcast over the intranet to the intranet address of every node other than itself.
Broadcasts may be sent at any time or at a predetermined interval, or several modes may coexist; this is user-configurable and set according to actual conditions.
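Sending to every other address in the table can be sketched with UDP datagrams. The text does not name a transport, so UDP, the port number in the usage note, and the function names are assumptions; `sock` can be any object with a `sendto(data, address)` method, such as a standard `socket.socket`.

```python
import socket
from typing import Dict, Tuple

Address = Tuple[str, int]  # (intranet IP, UDP port); the port is a hypothetical choice


def open_udp_socket() -> socket.socket:
    """Create a plain UDP socket for sending broadcasts to peers."""
    return socket.socket(socket.AF_INET, socket.SOCK_DGRAM)


def send_to_peers(
    sock, self_id: int, table: Dict[int, Address], payload: bytes
) -> int:
    """Send the payload to every address in the table except this node's own.

    Returns the number of datagrams sent.
    """
    sent = 0
    for node_id, address in sorted(table.items()):
        if node_id == self_id:
            continue
        sock.sendto(payload, address)
        sent += 1
    return sent
```

A periodic sender would call `send_to_peers(open_udp_socket(), my_id, table, payload)` from a timer loop, with the table holding entries such as `("192.168.10.1", 9999)`.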
Based on the above technical solutions, optionally, the step in which each node determines, from the node number information it has received, the node numbers from which no broadcast was received may include:
each node determines the node numbers from which no broadcast was received based on the node number information received within a preset time.
Because nodes do not broadcast continuously, a time limit is needed when determining failed nodes; otherwise the result would be confused. Each node therefore determines the node numbers from which no broadcast was received based on the node number information received within a preset time. The preset time may equal the preset interval at which nodes send broadcasts, or may be shorter than that interval.
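The preset-time rule can be sketched by recording when each node number was last heard and treating anything older than the window as missing. The class name and the explicit `now` arguments are choices made here so the logic can be exercised without real clocks; a running node would pass `time.monotonic()`.

```python
from typing import Dict, Set


class HeartbeatTracker:
    """Tracks the last time each expected node number was heard."""

    def __init__(self, expected: Set[int], window: float) -> None:
        self.expected = set(expected)   # node numbers of all other nodes
        self.window = window            # the preset time, in seconds
        self.last_seen: Dict[int, float] = {}

    def record(self, node_id: int, now: float) -> None:
        """Note that a broadcast from node_id arrived at time `now`."""
        self.last_seen[node_id] = now

    def missing(self, now: float) -> Set[int]:
        """Node numbers with no broadcast received within the window."""
        return {
            n
            for n in self.expected
            if now - self.last_seen.get(n, float("-inf")) > self.window
        }
```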
Based on any of the above specific technical solutions, the method may further include:
periodically updating the basic configuration information, the node number, and the priority of each node in the cluster.
As the technology improves or the usage of the equipment changes, the user can update the basic configuration information, the node number, and the priority of each node in the cluster at any time or periodically, so that the node information stays optimal and the capability of the external service node is ensured.
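Under the table-based configuration, an update reduces to editing table entries. The helpers below are a hypothetical sketch of that idea; they return a new table rather than editing in place, and how the updated table is distributed to the nodes is not covered by the text.

```python
from typing import Dict, Tuple

Entry = Tuple[str, int]  # (intranet address, priority)


def add_node(
    table: Dict[int, Entry], node_id: int, address: str, priority: int
) -> Dict[int, Entry]:
    """Return a copy of the table with one new node entry."""
    if node_id in table:
        raise ValueError(f"node number {node_id} already exists")
    updated = dict(table)
    updated[node_id] = (address, priority)
    return updated


def remove_node(table: Dict[int, Entry], node_id: int) -> Dict[int, Entry]:
    """Return a copy of the table without the given node."""
    return {n: e for n, e in table.items() if n != node_id}


def set_priority(
    table: Dict[int, Entry], node_id: int, priority: int
) -> Dict[int, Entry]:
    """Return a copy of the table with one node's priority changed."""
    address, _ = table[node_id]
    updated = dict(table)
    updated[node_id] = (address, priority)
    return updated
```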
The specific process is as follows:
When the total number of nodes is not fixed and a cluster must be built to provide a highly available external service, the basic configuration of every node is kept consistent, each node is numbered and assigned its own priority for taking over the external service, and the node with the highest priority obtains the right to provide external service. All nodes broadcast to the other nodes. If no node receives a given node's broadcast within a certain time, that node is regarded as failed. If the external service node fails, the valid node with the highest priority among the remaining nodes takes over the external service. If a higher-priority node appears, the node currently providing external service stops doing so and sends a takeover request to the higher-priority node, which takes over the external service upon receiving the request.
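Taken together, failure takeover and higher-priority handover mean that after each round the external service node is the highest-priority valid node. The sketch below simulates only that end result over a sequence of rounds; it skips the broadcast and takeover-request messages, and it assumes larger values mean higher priority.

```python
from typing import Dict, List, Optional, Set


def external_node(priorities: Dict[int, int], alive: Set[int]) -> Optional[int]:
    """Highest-priority node among those currently valid, or None if none are."""
    candidates = [n for n in alive if n in priorities]
    if not candidates:
        return None
    return max(candidates, key=lambda n: priorities[n])


def run(priorities: Dict[int, int], rounds: List[Set[int]]) -> List[Optional[int]]:
    """Return the external service node after each round of broadcasts."""
    return [external_node(priorities, alive) for alive in rounds]
```

With priorities `{1: 30, 2: 20, 3: 10}`, the rounds `[{1, 2, 3}, {2, 3}, {3}, {1, 2, 3}]` give node 1, then node 2 after node 1 fails, then node 3, and node 1 again once it returns and the current node hands over.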
The method provided by this embodiment changes the situation in the prior art, where a cluster can usually switch external service among only a few nodes and configuration changes are cumbersome. The method allows nodes to be configured quickly when the number of cluster nodes is not fixed and provides a reasonable and effective takeover when the external service node fails. The operations needed to add or remove a node are simple and fast, which greatly reduces the time spent modifying configuration and prevents human error caused by complicated modification operations. Node configuration is also simple: a common configuration plus an individually set node number is sufficient, and every node is capable of taking over the external service.
The embodiment of the present invention thus provides a method for failover of a cluster's external service node that performs a reasonable and effective takeover when the external service node fails.
The system for failover of a cluster's external service node provided by an embodiment of the present invention is introduced below. The system described below and the method described above may be read with reference to each other.
Please refer to FIG. 2, which is a structural block diagram of the system for failover of a cluster's external service node provided by an embodiment of the present invention. The system may include:
a setting module 100, configured to assign an intranet address, a node number, and a priority to each node in the cluster;
a broadcast module 200, by which each node sends a broadcast over the intranet to the intranet addresses of all nodes other than itself, where the broadcast sent by each non-external-service node includes its own node number, and the broadcast sent by the external service node includes its own node number and external service identification information;
a determination module 300, by which each node determines, from the node number information it has received, the node numbers from which no broadcast was received;
a selection module 400, configured to select as failed nodes the nodes whose numbers appear among the unreceived node numbers at every node;
a takeover module 500, configured to select the node with the highest priority among the valid nodes as the external service node if the failed nodes include the external service node.
Optionally, the system may further include:
a comparison module, configured to select, when the broadcasts received by the external service node include nodes with a higher priority than the external service node, the node with the highest priority from among those higher-priority nodes;
a replacement module, by which the external service node sends a takeover request to the node with the highest priority and stops providing external service; the node with the highest priority takes over the external service after receiving the takeover request.
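The comparison and replacement steps above can be sketched as follows. The names `preempting_node` and `hand_over` are hypothetical, and the sketch again assumes that a larger integer means a higher priority; the actual takeover request message is reduced to a comment.

```python
from typing import Dict, Optional, Set


def preempting_node(external: int, heard: Set[int],
                    priorities: Dict[int, int]) -> Optional[int]:
    """Comparison step: among the nodes whose broadcasts were heard, return the
    highest-priority node that outranks the current external service node."""
    higher = [n for n in heard if priorities[n] > priorities[external]]
    if not higher:
        return None
    return max(higher, key=lambda n: priorities[n])


def hand_over(external: int, heard: Set[int], priorities: Dict[int, int]) -> int:
    """Replacement step: return the number of the node that serves externally
    after the check."""
    target = preempting_node(external, heard, priorities)
    if target is None:
        return external  # nobody outranks the current node; keep serving
    # In a real system the current node would send a takeover request to
    # `target` here and stop serving; `target` starts serving on receipt.
    return target
```

For example, if node 3 took over while node 1 was down and node 1 later rejoins with a higher priority, `hand_over` returns node 1, so the service moves back to it.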
Optionally, the determination module 300 may include:
each node sends a broadcast, according to an intranet address correspondence table, over the intranet to the intranet address of every other node.
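The address-table lookup described above can be sketched as follows. The source does not specify a message format or transport, so the JSON payload, the `external` flag, the port numbers, and the injected `sendto` callable are all assumptions made for illustration.

```python
import json
from typing import Callable, Dict, List, Tuple

Address = Tuple[str, int]  # (intranet IP, port); the port is an assumption


def build_broadcast(node_number: int, is_external: bool) -> bytes:
    """Encode one broadcast; only the external service node adds the flag."""
    message = {"node": node_number}
    if is_external:
        message["external"] = True  # hypothetical external service identification
    return json.dumps(message, sort_keys=True).encode("utf-8")


def send_to_peers(own: int, address_table: Dict[int, Address], payload: bytes,
                  sendto: Callable[[bytes, Address], object]) -> List[int]:
    """Send the payload to every node in the table except this one and
    return the node numbers that were addressed."""
    addressed = []
    for number, address in sorted(address_table.items()):
        if number == own:
            continue  # a node does not broadcast to itself
        sendto(payload, address)
        addressed.append(number)
    return addressed
```

In a deployment, `sendto` could be the `sendto` method of a UDP socket created with `socket.socket(socket.AF_INET, socket.SOCK_DGRAM)`; passing it in as a parameter keeps the sketch independent of any particular transport.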
Optionally, the selection module 400 may include:
each node determines, from the node number information received within a preset time, the node number information for which no broadcast was received.
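The preset-time check described above can be sketched as follows. The function name, the representation of arrivals as `(timestamp, node_number)` pairs, and the use of seconds as the time unit are assumptions; the source only states that a preset time bounds the collection of broadcasts.

```python
from typing import Iterable, Set, Tuple


def missing_within_window(expected: Set[int], own: int,
                          arrivals: Iterable[Tuple[float, int]],
                          window_start: float, preset_time: float) -> Set[int]:
    """Return the node numbers from which no broadcast arrived within
    [window_start, window_start + preset_time]."""
    deadline = window_start + preset_time
    heard = {number for stamp, number in arrivals
             if window_start <= stamp <= deadline}
    return expected - heard - {own}
```

A broadcast that arrives after the deadline is treated the same as one that never arrived, so the choice of preset time trades detection speed against false reports on a slow network.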
Based on any of the above technical solutions, the system may further include:
an update module, configured to periodically update the basic configuration information, the node number, and the priority of each node in the cluster.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and the parts that are the same or similar across embodiments may be read with reference to one another. Because the device disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is relatively brief; for related details, refer to the description of the method.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in hardware or in software depends on the specific application and design constraints of the technical solution. Those skilled in the art may implement the described functions in different ways for each particular application, but such implementations should not be considered to go beyond the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The method and system for failover of external service nodes of a cluster provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. It should be noted that those of ordinary skill in the art may make improvements and modifications to the present invention without departing from its principle, and such improvements and modifications also fall within the scope of protection of the claims of the present invention.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510627329.5A CN105306545B (en) | 2015-09-28 | 2015-09-28 | A kind of method and system of the external service node Takeover of cluster |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510627329.5A CN105306545B (en) | 2015-09-28 | 2015-09-28 | A kind of method and system of the external service node Takeover of cluster |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105306545A (en) | 2016-02-03 |
CN105306545B (en) | 2018-09-07 |
Family
ID=55203288
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510627329.5A Active CN105306545B (en) | 2015-09-28 | 2015-09-28 | A kind of method and system of the external service node Takeover of cluster |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105306545B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018032499A1 (en) * | 2016-08-19 | 2018-02-22 | 华为技术有限公司 | Load balancing method and associated device |
CN109542627A (en) * | 2018-11-30 | 2019-03-29 | 北京金山云网络技术有限公司 | Node switching method, device, supervisor, node device and distributed system |
CN110048898A (en) * | 2019-05-14 | 2019-07-23 | 威创集团股份有限公司 | A kind of distribution joined screen system method for realizing redundancy and device |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1512374A (en) * | 2002-12-31 | 2004-07-14 | 联想(北京)有限公司 | Method for node load information transfer and node survival detection in machine group |
CN1819583A (en) * | 2005-10-20 | 2006-08-16 | 北京邮电大学 | Hierarchical tolerant invading scheme based on threshold |
US20120297243A1 (en) * | 2009-09-30 | 2012-11-22 | International Business Machines Corporation | Svc cluster configuration node failover system and method |
CN103995901A (en) * | 2014-06-10 | 2014-08-20 | 北京京东尚科信息技术有限公司 | Method for determining data node failure |
CN104038366A (en) * | 2014-05-05 | 2014-09-10 | 深圳市中博科创信息技术有限公司 | Cluster node failure detection method and system |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1512374A (en) * | 2002-12-31 | 2004-07-14 | 联想(北京)有限公司 | Method for node load information transfer and node survival detection in machine group |
CN1819583A (en) * | 2005-10-20 | 2006-08-16 | 北京邮电大学 | Hierarchical tolerant invading scheme based on threshold |
US20120297243A1 (en) * | 2009-09-30 | 2012-11-22 | International Business Machines Corporation | Svc cluster configuration node failover system and method |
CN104038366A (en) * | 2014-05-05 | 2014-09-10 | 深圳市中博科创信息技术有限公司 | Cluster node failure detection method and system |
CN103995901A (en) * | 2014-06-10 | 2014-08-20 | 北京京东尚科信息技术有限公司 | Method for determining data node failure |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018032499A1 (en) * | 2016-08-19 | 2018-02-22 | 华为技术有限公司 | Load balancing method and associated device |
RU2716748C1 (en) * | 2016-08-19 | 2020-03-16 | Хуавей Текнолоджиз Ко., Лтд. | Load balancing method and associated device thereof |
US11070614B2 (en) | 2016-08-19 | 2021-07-20 | Huawei Technologies Co., Ltd. | Load balancing method and related apparatus |
CN109542627A (en) * | 2018-11-30 | 2019-03-29 | 北京金山云网络技术有限公司 | Node switching method, device, supervisor, node device and distributed system |
CN110048898A (en) * | 2019-05-14 | 2019-07-23 | 威创集团股份有限公司 | A kind of distribution joined screen system method for realizing redundancy and device |
Also Published As
Publication number | Publication date |
---|---|
CN105306545B (en) | 2018-09-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107295080B (en) | Data storage method applied to distributed server cluster and server | |
US8375001B2 (en) | Master monitoring mechanism for a geographical distributed database | |
EP3490224A1 (en) | Data synchronization method and system | |
CN111190736A (en) | Low-intrusion distributed timing task scheduling system and method based on microservice | |
EP3745678B1 (en) | Storage system, and method and apparatus for allocating storage resources | |
CN104604193B (en) | The automatic management method and device of network infrastructure with virtual unit environmental functional | |
CN106161272B (en) | Realize the method and routing device of VRRP load balancing | |
CN110597664A (en) | High-availability cluster resource deployment method, device and related components | |
CN107508694B (en) | Node management method and node equipment in cluster | |
WO2015074396A1 (en) | Automatic configuration method, device and system of software defined network | |
WO2019128670A1 (en) | Method and apparatus for enabling self-recovery of management capability in distributed system | |
CN108989476B (en) | Address allocation method and device | |
CN105959078B (en) | A kind of cluster method for synchronizing time, cluster and clock synchronization system | |
CN105847352B (en) | Expansion method, device and distributed cache system based on distributed cache system | |
CN110971662A (en) | Two-node high-availability implementation method and device based on Ceph | |
CN111355600B (en) | Main node determining method and device | |
CN105306545B (en) | A kind of method and system of the external service node Takeover of cluster | |
CN111404978A (en) | Data storage method and cloud storage system | |
CN107819556A (en) | A kind of service state switching method and device | |
JP5039975B2 (en) | Gateway device | |
CN106375210A (en) | Method for realizing VRRP (Virtual Router Redundancy Protocol) downlink load balancing and route devices | |
CN104935614B (en) | Data transmission method and device | |
CN101026613B (en) | Data link protection method and device | |
CN105007233B (en) | A kind of method that distribution address is loaded based on Dynamic Host Configuration Protocol server cluster | |
CN106534758B (en) | Conference backup method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||