CN114338370A - Ambari's high availability method, system, apparatus, electronic device and storage medium - Google Patents

Publication number
CN114338370A
Authority
CN
China
Prior art keywords
node
central node
central
distributed coordinator
device information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210021964.9A
Other languages
Chinese (zh)
Inventor
张世龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Cloud Network Technology Co Ltd
Original Assignee
Beijing Kingsoft Cloud Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Cloud Network Technology Co Ltd
Priority to CN202210021964.9A
Publication of CN114338370A

Landscapes

  • Hardware Redundancy (AREA)

Abstract

The application relates to a high-availability method, system, apparatus, electronic device, and storage medium for Ambari, applied in the technical field of service management. The method includes: when the current primary central node is detected to be abnormal, deleting the first device information written into the distributed coordinator by the current primary central node, where the current primary central node is one of at least two central nodes; sending a writable signal to at least one central node, so that the at least one central node writes its own second device information into the distributed coordinator according to the writable signal; determining, among the at least one central node, the target central node that successfully wrote the second device information; and sending a switch signal to the target central node, so that the target central node switches its primary/standby state from standby to primary according to the switch signal.

Description

Ambari high-availability method, system, apparatus, electronic device, and storage medium

Technical Field

The present application relates to the technical field of service management, and in particular to a high-availability method, system, apparatus, electronic device, and storage medium for Ambari.

Background

Ambari is a web-based tool that supports provisioning, managing, and monitoring Apache Hadoop clusters.

Ambari typically has two roles, master and slave. The master is the central node: it manages the slave nodes and controls the commands they execute. The master usually runs as a single instance, so fault tolerance is low and the master is a single point of failure. Once the master fails and becomes unavailable, the project cannot run normally, and high availability is not supported.

In the related art, high availability for Ambari is mainly achieved with a DNS-based cold-standby scheme. Specifically, when the master fails, operations staff modify the DNS configuration so that the slaves resolve the IP of a new master and connect to it. This approach requires manual intervention to recover from the failure, and the service is unavailable until that intervention takes place.

Summary of the Invention

The present application provides a high-availability method, system, apparatus, electronic device, and storage medium for Ambari, to solve the problem in the prior art that manual intervention is required to recover from a failure and that the service is unavailable until that intervention takes place.

In a first aspect, an embodiment of the present application provides a high-availability method for Ambari, applied to a distributed coordinator that is connected to at least two central nodes. The method includes:

after detecting that the current primary central node is abnormal, deleting the first device information written into the distributed coordinator by the current primary central node, where the current primary central node is one of the at least two central nodes;

sending a writable signal to at least one of the central nodes, so that the at least one central node writes its own second device information into the distributed coordinator according to the writable signal;

determining, among the at least one central node, the target central node that successfully wrote the second device information;

sending a switch signal to the target central node, so that the target central node switches its primary/standby state from standby to primary according to the switch signal.

Optionally, detecting that the current primary central node is abnormal includes:

detecting that the connection with the current primary central node has been abnormal for longer than a first preset duration.

Optionally, the first preset duration is greater than a second preset duration, and the second preset duration is greater than the heartbeat period of the connection between the distributed coordinator and the current primary central node.

Optionally, the distributed coordinator includes a master node and at least two first slave nodes, and determining, among the central nodes, the target central node that successfully wrote the second device information includes:

after the second device information of a central node is written into the master node, synchronizing the second device information from the master node to the at least two first slave nodes in turn;

determining the first central node for which the number of synchronized first slave nodes reaches a preset value as the target central node that successfully wrote the second device information.

Optionally, the preset value is half of the total number of first slave nodes.

Optionally, the distributed coordinator is ZooKeeper or etcd.

In a second aspect, an embodiment of the present application provides a high-availability method for Ambari, applied to a central node, including:

obtaining a writable signal sent by the distributed coordinator, where the writable signal is sent by the distributed coordinator after it deletes the first device information written by the current primary central node into a target node of the distributed coordinator;

writing its own second device information into the distributed coordinator according to the writable signal;

obtaining a switch signal sent by the distributed coordinator;

switching the primary/standby state from standby to primary according to the switch signal.

Optionally, the method further includes:

obtaining an access request sent by a second slave node;

if the primary/standby state is primary, responding to the access request and establishing a connection with the second slave node.

In a third aspect, an embodiment of the present application provides a high-availability system for Ambari, including a distributed coordinator and at least two central nodes, where the distributed coordinator is connected to the at least two central nodes;

the distributed coordinator is configured to, after detecting that the current primary central node is abnormal, delete the first device information written by the current primary central node into a target node of the distributed coordinator, and to send a writable signal to at least one of the central nodes, where the current primary central node is one of the at least two central nodes;

the central node is configured to send its own second device information to the distributed coordinator according to the writable signal;

the distributed coordinator is further configured to determine, among the central nodes, the target central node that successfully wrote the second device information;

the central node is further configured to switch its primary/standby state from standby to primary according to the switch signal.

Optionally, the system further includes at least one second slave node;

the second slave node is configured to send an access request to the central node;

the central node is further configured to obtain the access request sent by the second slave node and, when its primary/standby state is primary, to respond to the access request and establish a connection with the second slave node.

In a fourth aspect, an embodiment of the present application provides a high-availability apparatus for Ambari, including:

a deletion module, configured to delete, after detecting that the current primary central node is abnormal, the first device information written into the distributed coordinator by the current primary central node, where the distributed coordinator is connected to at least two central nodes and the current primary central node is one of the at least two central nodes;

a first sending module, configured to send a writable signal to at least one of the central nodes, so that the at least one central node writes its own second device information into the distributed coordinator according to the writable signal;

a determining module, configured to determine, among the at least one central node, the target central node that successfully wrote the second device information;

a second sending module, configured to send a switch signal to the target central node, so that the target central node switches its primary/standby state from standby to primary according to the switch signal.

In a fifth aspect, an embodiment of the present application provides a high-availability apparatus for Ambari, including:

a first obtaining module, configured to obtain a writable signal sent by the distributed coordinator, where the writable signal is sent by the distributed coordinator after it deletes the first device information written by the current primary central node into a target node of the distributed coordinator, the distributed coordinator is connected to at least two central nodes, and the current primary central node is one of the at least two central nodes;

a writing module, configured to write its own second device information into the distributed coordinator according to the writable signal;

a second obtaining module, configured to obtain a switch signal sent by the distributed coordinator;

a switching module, configured to switch the primary/standby state from standby to primary according to the switch signal.

In a sixth aspect, an embodiment of the present application provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus;

the memory is configured to store a computer program;

the processor is configured to execute the program stored in the memory to implement the high-availability method for Ambari according to the first aspect or the second aspect.

In a seventh aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the high-availability method for Ambari according to the first aspect or the second aspect.

Compared with the prior art, the technical solutions provided by the embodiments of the present application have the following advantages. In the method provided by the embodiments, after the current primary central node is detected to be abnormal, the first device information it wrote into the distributed coordinator is deleted, the current primary central node being one of at least two central nodes; a writable signal is sent to at least one central node so that it writes its own second device information into the distributed coordinator; the target central node that successfully wrote the second device information is determined; and a switch signal is sent to the target central node so that it switches its primary/standby state from standby to primary. The central nodes are thus managed through the distributed coordinator: once the current primary central node is detected to be abnormal, its first device information is deleted so that the other central nodes can write their own second device information into the distributed coordinator, and the central node that writes successfully switches to the primary state. No manual intervention is needed; on receiving the writable signal, the central nodes write their second device information on their own initiative and a new primary central node is determined, which makes the scheme more convenient to use.

Description of Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and, together with the description, serve to explain the principles of the invention.

To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, a person of ordinary skill in the art can obtain other drawings from these drawings without creative effort.

FIG. 1 is a structural diagram of a high-availability system for Ambari provided by an embodiment of the present application;

FIG. 2 is a flowchart of a high-availability method for Ambari provided by an embodiment of the present application;

FIG. 3 is a flowchart of a high-availability method for Ambari provided by another embodiment of the present application;

FIG. 4 is a structural diagram of a high-availability system for Ambari provided by another embodiment of the present application;

FIG. 5 is a structural diagram of a high-availability apparatus for Ambari provided by an embodiment of the present application;

FIG. 6 is a structural diagram of a high-availability apparatus for Ambari provided by another embodiment of the present application;

FIG. 7 is a structural diagram of an electronic device provided by an embodiment of the present application.

Detailed Description

To make the purposes, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art from the embodiments in the present application without creative effort fall within the protection scope of the present application.

Before the embodiments of the present invention are described in further detail, the terms used in them are explained. The following explanations apply to those terms.

High availability: HA (High Availability) is one of the factors that must be considered in the architecture design of a distributed system. It usually means reducing, by design, the time during which the system cannot provide service.

Ambari: a web-based tool that supports provisioning, managing, and monitoring Apache Hadoop clusters. Ambari supports most Hadoop components, including HDFS, MapReduce, Hive, Pig, HBase, ZooKeeper, Sqoop, and HCatalog.

ZooKeeper: a distributed, open-source coordination service for distributed applications.

LB: load-balancing service.

etcd: a distributed, highly available, consistent key-value store implemented in Go, mainly used for shared configuration and service discovery. As a highly available, strongly consistent key-value store, etcd is widely used in distributed system architectures, and its most classic use case is service discovery.

An embodiment of the present application provides a high-availability system for Ambari. FIG. 1 is a schematic structural diagram of an optional high-availability system for Ambari according to an embodiment of the present application. As shown in FIG. 1, the system may include a distributed coordinator and at least two central nodes (masters), where:

the distributed coordinator is configured to, after detecting that the current primary central node is abnormal, delete the first device information written by the current primary central node into a target node of the distributed coordinator, and to send a writable signal to at least one central node;

the central node is configured to send its own second device information to the distributed coordinator according to the writable signal;

the distributed coordinator is further configured to determine, among the central nodes, the target central node that successfully wrote the second device information;

the central node is further configured to switch its primary/standby state from standby to primary according to the switch signal.

Optionally, the high-availability system for Ambari further includes at least one second slave node (slave);

the second slave node is configured to send an access request to the central node;

the central node is further configured to obtain the access request sent by the second slave node and, when its primary/standby state is primary, to respond to the access request and establish a connection with the second slave node.

An embodiment of the present application provides a high-availability method for Ambari that can be applied in any form of electronic device, such as a terminal or a server. As shown in FIG. 2, the method is applied to a distributed coordinator connected to at least two central nodes, where one central node, the primary central node, is in the primary state and the other central nodes are in the standby state. The distributed coordinator may be of various kinds, for example but not limited to ZooKeeper or etcd.

Specifically, the high-availability method for Ambari includes:

Step 201: After detecting that the current primary central node is abnormal, delete the first device information written into the distributed coordinator by the current primary central node.

In some embodiments, the current primary central node is one of the at least two central nodes and is the central node (master) that most recently responded to an access request from a second slave node (slave). The primary master is in the primary state. During normal operation of the service, the current primary central node maintains its connection with the distributed coordinator through a heartbeat mechanism. An abnormality of the current primary central node may be an abnormal connection between it and the distributed coordinator.

In an optional embodiment, detecting that the current primary central node is abnormal may specifically include:

detecting that the connection with the current primary central node has been abnormal for longer than a first preset duration.

The connection between the distributed coordinator and the current primary central node is monitored, and when the connection has been abnormal for longer than the first preset duration, the current primary central node is determined to be abnormal.

It should be noted that once the connection between the current primary central node and the distributed coordinator has been abnormal for a duration greater than or equal to a second preset duration, the current primary central node switches its primary/standby state from primary to standby. The second preset duration is greater than the heartbeat period of the connection between the distributed coordinator and the current primary central node.

Further, to avoid two primary central nodes existing at the same time, the first preset duration is set to be greater than the second preset duration. The first device information written by the current primary central node is therefore deleted only after that node has switched to standby (that is, when all central nodes are in the standby state), after which every standby central node may write its own second device information into the distributed coordinator.
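The ordering of the three durations described above (heartbeat period, then second preset duration, then first preset duration) can be checked when the system is configured. The following is a minimal sketch under stated assumptions: the class `FailoverTimings`, its field names, and the function `master_action` are hypothetical names introduced for illustration and do not come from the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailoverTimings:
    """Hypothetical container for the three durations, in seconds."""
    heartbeat_period: float  # interval between heartbeats to the coordinator
    demote_after: float      # "second preset duration": primary demotes itself
    delete_after: float      # "first preset duration": coordinator deletes the entry

    def validate(self) -> None:
        # Required ordering: heartbeat period < second duration < first duration.
        if not (0 < self.heartbeat_period < self.demote_after < self.delete_after):
            raise ValueError(
                "expected heartbeat_period < demote_after < delete_after"
            )


def master_action(link_down_for: float, t: FailoverTimings) -> str:
    """Return what this sketch assumes has happened once the link has been down this long."""
    if link_down_for > t.delete_after:
        # Deletion happens only after the old primary has already demoted itself.
        return "coordinator deletes first device information"
    if link_down_for >= t.demote_after:
        return "primary switches to standby"
    return "no action"
```

Because `delete_after` is strictly larger than `demote_after`, the old primary is always in the standby state by the time its entry is removed, which is the property the text relies on to avoid two primaries.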

The first device information and the second device information may be, but are not limited to, the IP (Internet Protocol) address of the central node.

Step 202: Send a writable signal to at least one central node, so that the at least one central node writes its own second device information into the distributed coordinator according to the writable signal.

In some embodiments, after deleting the first device information, the distributed coordinator sends a writable signal to at least one central node, notifying the standby central nodes in the system that device information may now be written. On receiving the writable signal, a central node writes its own second device information into the distributed coordinator.

The device information may be written to a predetermined node in the distributed coordinator, for example /active/master. All device information written by central nodes is written to this node.

It can be understood that after the current primary central node has switched to the standby state, it may also write its own device information into the distributed coordinator if it re-establishes its connection with the distributed coordinator.

Step 203: Determine, among the at least one central node, the target central node that successfully wrote the second device information.

In some embodiments, because every central node in the system that is connected to the distributed coordinator attempts to write its own device information, the present application adopts a "first come, first served" mechanism: the central node that writes its device information first becomes the primary central node.
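The "first come, first served" write to a single node such as /active/master can be illustrated with a toy in-memory store. This is only a sketch: `InMemoryCoordinator` and its methods are hypothetical and are not the ZooKeeper or etcd API; a real deployment would rely on the coordinator's own atomic create operation.

```python
from typing import Dict, Optional

ACTIVE_PATH = "/active/master"  # node path taken from the example in the text


class InMemoryCoordinator:
    """Toy stand-in for a distributed coordinator (not a real client API)."""

    def __init__(self) -> None:
        self._nodes: Dict[str, str] = {}

    def create_if_absent(self, path: str, value: str) -> bool:
        # Only the first writer succeeds; later writers find the entry taken.
        if path in self._nodes:
            return False
        self._nodes[path] = value
        return True

    def delete(self, path: str) -> None:
        # Removing the entry is what makes the path writable again.
        self._nodes.pop(path, None)

    def get(self, path: str) -> Optional[str]:
        return self._nodes.get(path)
```

In this sketch, two standby central nodes that both call `create_if_absent(ACTIVE_PATH, own_ip)` after the old entry is deleted get `True` and `False` respectively, and the one that received `True` is the target central node.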

The distributed coordinator includes a master node and at least two first slave nodes. In an optional embodiment, determining, among the central nodes, the target central node that successfully wrote the second device information includes:

after the second device information of a central node is written into the master node, synchronizing the second device information from the master node to the at least two first slave nodes in turn; and determining the first central node for which the number of synchronized first slave nodes reaches a preset value as the target central node that successfully wrote the second device information.

In some embodiments, a central node first writes its second device information into the master node of the distributed coordinator, and the master node then synchronizes the second device information to the first slave nodes. Because there are multiple first slave nodes, the master node synchronizes the second device information to them one after another. The central node whose second device information is the first to have been synchronized to the preset number of first slave nodes is determined as the target central node.

The preset value can be set according to the actual situation, for example to a value smaller than the total number of first slave nodes. Preferably, it may be, but is not limited to, half of the total number.

Preferably, to avoid the case where the second device information of two central nodes reaches the preset number of first slave nodes at the same time during synchronization, the total number of first slave nodes is set to an odd number (2N+1, where N is a positive integer) and the preset value is set to N+1.
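The majority rule above (2N+1 first slave nodes, preset value N+1) can be sketched as follows. This is a minimal illustration under assumptions: the function names `quorum_threshold` and `first_to_quorum` and the event format, a stream of (central node IP, slave id) pairs in completion order, are hypothetical and not taken from the application.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


def quorum_threshold(total_slaves: int) -> int:
    """Return N + 1 for an odd total of 2N + 1 first slave nodes (N >= 1)."""
    if total_slaves < 3 or total_slaves % 2 == 0:
        raise ValueError("total_slaves must be odd and at least 3")
    return total_slaves // 2 + 1


def first_to_quorum(
    sync_events: Iterable[Tuple[str, str]], total_slaves: int
) -> Optional[str]:
    """Return the first central node whose info reached the threshold, else None.

    sync_events yields (central_node_ip, slave_id) pairs in the order in which
    the coordinator's master node finishes synchronizing each slave.
    """
    threshold = quorum_threshold(total_slaves)
    synced: Dict[str, Set[str]] = {}
    for central_ip, slave_id in sync_events:
        synced.setdefault(central_ip, set()).add(slave_id)
        if len(synced[central_ip]) >= threshold:
            return central_ip
    return None
```

With N+1 out of 2N+1 slaves required, two different central nodes cannot both hold a majority of the slaves at once, which is why an odd total is preferred.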

Step 204: Send a switch signal to the target central node, so that the target central node switches its primary/standby state from standby to primary according to the switch signal.

In some embodiments, after determining the target central node, the distributed coordinator sends a switch signal to it, so that the target central node switches its primary/standby state from standby to primary according to the switch signal and the service is restored.

With the high-availability method for Ambari of the present application, no manual intervention is needed after the current primary central node becomes abnormal. Once the connection between the current primary central node and the distributed coordinator has been abnormal for the second preset duration, the current primary central node switches its primary/standby state from primary to standby. Once the connection has been abnormal for longer than the first preset duration, the distributed coordinator deletes the first device information written by the current primary central node and sends a writable signal to the central nodes in the system. On receiving the writable signal, each central node writes its own second device information into the distributed coordinator. The distributed coordinator determines the target central node from the written second device information and sends it a switch signal, and the target central node switches to the primary state according to the switch signal, thereby restoring the service.

Another embodiment of the present application provides a further Ambari high availability method. For its specific implementation, refer to the description of the method embodiments above; repeated details are omitted. The method can be applied in any form of electronic device, such as a terminal or a server. As shown in FIG. 3, this Ambari high availability method is applied to a central node and includes:

Step 301: Obtain a writable signal sent by the distributed coordinator. The writable signal is sent by the distributed coordinator after it deletes the first device information that the current primary central node wrote in the target node of the distributed coordinator.

Step 302: According to the writable signal, write the node's own second device information to the distributed coordinator.

Step 303: Obtain the switching signal sent by the distributed coordinator.

Step 304: Switch the primary/standby state from the standby state to the primary state according to the switching signal.

In an optional embodiment, the Ambari high availability method further includes:

Obtaining an access request sent by a second slave node; if the primary/standby state is the primary state, responding to the access request and establishing a connection with the second slave node.

In some embodiments, the second slave node is a slave node corresponding to the central nodes. Each second slave node (slave) is configured with the addresses of all central nodes (masters), that is, with multiple master addresses.

The slave accesses the configured masters according to a preset access policy. If a master responds to the request normally, the slave treats that master as the primary and keeps accessing it afterwards. If a master does not respond normally, the slave moves on and tries the other masters. A master can fail to respond normally in several ways: for example, it does not respond at all, or it responds but reports that it is not the primary.

The access policy may be to access the masters one by one in their configured order, or to access them in random order.
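The slave-side lookup described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the function name `find_primary`, the `probe` callback, and the reply values `"ok"` and `"not_primary"` are all assumptions introduced here, and a reply of `None` stands for a master that does not respond.

```python
import random
from typing import Callable, Optional, Sequence

# Hypothetical reply values returned by a probe of one master address.
OK = "ok"                    # master responded normally, so it is the primary
NOT_PRIMARY = "not_primary"  # master responded but reported it is not the primary
NO_RESPONSE = None           # master did not respond at all


def find_primary(
    masters: Sequence[str],
    probe: Callable[[str], Optional[str]],
    policy: str = "sequential",
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Return the address of the first master that responds normally, or None."""
    order = list(masters)
    if policy == "random":
        # Random access policy: visit the configured masters in shuffled order.
        (rng or random).shuffle(order)
    for address in order:
        if probe(address) == OK:
            return address  # the slave keeps using this master afterwards
        # No response, or "not primary": keep trying the other masters.
    return None  # no primary is currently available
```

A caller would retry after a short delay when `None` is returned, since the text notes that the period without a primary is expected to be brief.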

In a specific embodiment of the present application, referring to FIG. 4, each master starts in the standby state. A master, say master1, is recognized as the primary only after it writes its own machine information (the device information described above) to a node (/active/master) in zookeeper, at which point it switches its own primary/standby state to primary. Once master1 is in the primary state, it can respond normally to slave requests. Each slave accesses the configured masters according to a random access policy. If a master responds to the request normally, the slave treats that master as the primary and keeps accessing it afterwards. Otherwise, the master must tell the slave that it is a standby central node and does not accept requests. After master1 becomes abnormal and its connection has been abnormal for longer than the second preset duration T2, master1 switches its own state to standby. At this point all masters are in the standby state, so multiple masters can never be primary at the same time. During this period no primary is available and the service is temporarily unavailable, but it recovers automatically within a short time.

The device information written to the zookeeper node is temporary information. After the connection has been abnormal for longer than the first preset duration T1 (T2 < T1), zookeeper deletes the temporary information. The other masters can then try to write their own information to the zookeeper node, and the master whose write succeeds becomes the primary.
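The role of the two durations can be checked with a short timing model. This is a sketch under assumptions: the names `Timeouts`, `old_primary_state`, `record_present`, and `max_primaries` are hypothetical, time is a plain number, and the ordering heartbeat period < T2 < T1 is taken from the text and claim 3. It shows that, for any outage length, the old primary has already demoted itself before the record is deleted and a new primary can be chosen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timeouts:
    heartbeat: float  # heartbeat period of the coordinator connection
    t2: float         # second preset duration: the primary demotes itself
    t1: float         # first preset duration: the coordinator deletes the record

    def __post_init__(self) -> None:
        if not (self.heartbeat < self.t2 < self.t1):
            raise ValueError("expected heartbeat < T2 < T1")


def old_primary_state(outage: float, t: Timeouts) -> str:
    """State of the old primary after its connection has been abnormal for `outage`."""
    return "standby" if outage > t.t2 else "primary"


def record_present(outage: float, t: Timeouts) -> bool:
    """Whether the old primary's temporary device information still exists."""
    return outage <= t.t1


def max_primaries(outage: float, t: Timeouts) -> int:
    """Upper bound on simultaneous primaries at this point of the outage."""
    old = 1 if old_primary_state(outage, t) == "primary" else 0
    new = 0 if record_present(outage, t) else 1  # a new primary needs the record gone
    return old + new
```

Between T2 and T1 the bound is zero, which corresponds to the short window in which no primary is available; it never exceeds one.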

Based on the same concept, an embodiment of the present application provides an Ambari high availability apparatus. For its specific implementation, refer to the description of the method embodiments; repeated details are omitted. As shown in FIG. 5, the apparatus mainly includes:

A deletion module 501, configured to delete, after detecting that the current primary central node is abnormal, the first device information written by the current primary central node in the distributed coordinator; the distributed coordinator is connected to at least two central nodes, and the current primary central node is one of the at least two central nodes;

A first sending module 502, configured to send a writable signal to at least one central node, so that the at least one central node writes its own second device information to the distributed coordinator according to the writable signal;

A determining module 503, configured to determine, among the at least one central node, the target central node that successfully wrote the second device information;

A second sending module 504, configured to send a switching signal to the target central node, so that the target central node switches its primary/standby state from the standby state to the primary state according to the switching signal.

Based on the same concept, an embodiment of the present application provides an Ambari high availability apparatus. For its specific implementation, refer to the description of the method embodiments; repeated details are omitted. As shown in FIG. 6, the apparatus mainly includes:

A first obtaining module 601, configured to obtain a writable signal sent by the distributed coordinator, where the writable signal is sent by the distributed coordinator after it deletes the first device information that the current primary central node wrote in the target node of the distributed coordinator; the distributed coordinator is connected to at least two central nodes, and the current primary central node is one of the at least two central nodes;

A writing module 602, configured to write the node's own second device information to the distributed coordinator according to the writable signal;

A second obtaining module 603, configured to obtain the switching signal sent by the distributed coordinator;

A switching module 604, configured to switch the primary/standby state from the standby state to the primary state according to the switching signal.

Based on the same concept, an embodiment of the present application further provides an electronic device. As shown in FIG. 7, the electronic device mainly includes a processor 701, a memory 702, and a communication bus 703, where the processor 701 and the memory 702 communicate with each other through the communication bus 703. The memory 702 stores a program executable by the processor 701, and the processor 701 executes the program stored in the memory 702 to implement the following steps:

After detecting that the current primary central node is abnormal, deleting the first device information written by the current primary central node in the distributed coordinator, where the current primary central node is one of at least two central nodes;

Sending a writable signal to at least one central node, so that the at least one central node writes its own second device information to the distributed coordinator according to the writable signal;

Determining, among the at least one central node, the target central node that successfully wrote the second device information;

Sending a switching signal to the target central node, so that the target central node switches its primary/standby state from the standby state to the primary state according to the switching signal. Or,

Obtaining a writable signal sent by the distributed coordinator, where the writable signal is sent by the distributed coordinator after it deletes the first device information that the current primary central node wrote in the target node of the distributed coordinator;

Writing the node's own second device information to the distributed coordinator according to the writable signal;

Obtaining the switching signal sent by the distributed coordinator;

Switching the primary/standby state from the standby state to the primary state according to the switching signal.

The communication bus 703 mentioned for the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 703 can be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, it is drawn as a single thick line in FIG. 7, but this does not mean that there is only one bus or one type of bus.

The memory 702 may include random access memory (RAM) and may also include non-volatile memory, such as at least one disk storage device. Optionally, the memory may also be at least one storage apparatus located remotely from the processor 701.

The processor 701 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

In yet another embodiment of the present application, a computer-readable storage medium is further provided. The computer-readable storage medium stores a computer program which, when run on a computer, causes the computer to execute the Ambari high availability method described in the above embodiments.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (such as a floppy disk, hard disk, or magnetic tape), an optical medium (such as a DVD), or a semiconductor medium (such as a solid-state drive).

It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any other variants are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes that element.

The above are only specific embodiments of the present invention, provided so that those skilled in the art can understand or implement the invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Accordingly, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (13)

1. A high availability method for Ambari, characterized in that the method is applied to a distributed coordinator, the distributed coordinator being connected to at least two central nodes, the method comprising:
after detecting that a current primary central node is abnormal, deleting first device information written by the current primary central node in the distributed coordinator, the current primary central node being one of the at least two central nodes;
sending a writable signal to at least one of the central nodes, so that the at least one central node writes its own second device information to the distributed coordinator according to the writable signal;
determining, among the at least one central node, a target central node that successfully wrote the second device information;
sending a switching signal to the target central node, so that the target central node switches its primary/standby state from the standby state to the primary state according to the switching signal.

2. The high availability method for Ambari according to claim 1, characterized in that detecting that the current primary central node is abnormal comprises:
detecting that the connection with the current primary central node has been abnormal for longer than a first preset duration.

3. The high availability method for Ambari according to claim 2, characterized in that the first preset duration is longer than a second preset duration, and the second preset duration is longer than the heartbeat period of the connection between the distributed coordinator and the current primary central node.

4. The high availability method for Ambari according to claim 1, characterized in that the distributed coordinator comprises a master node and at least two first slave nodes, and determining, among the at least one central node, the target central node that successfully wrote the second device information comprises:
after the second device information of a central node is written to the master node, synchronizing, by the master node, the second device information to the at least two first slave nodes in turn;
determining the first central node for which the number of synchronized first slave nodes reaches a preset value as the target central node that successfully wrote the second device information.

5. The high availability method for Ambari according to claim 4, characterized in that the preset value is half of the total number of the first slave nodes.

6. A high availability method for Ambari, characterized in that the method is applied to a central node and comprises:
obtaining a writable signal sent by a distributed coordinator, the writable signal being sent by the distributed coordinator after deleting first device information written by the current primary central node in a target node of the distributed coordinator;
writing the central node's own second device information to the distributed coordinator according to the writable signal;
obtaining a switching signal sent by the distributed coordinator;
switching the primary/standby state from the standby state to the primary state according to the switching signal.

7. The high availability method for Ambari according to claim 6, characterized by further comprising:
obtaining an access request sent by a second slave node;
if the primary/standby state is the primary state, responding to the access request and establishing a connection with the second slave node.

8. A high availability system for Ambari, characterized by comprising a distributed coordinator and at least two central nodes, the distributed coordinator being connected to the at least two central nodes;
the distributed coordinator is configured to, after detecting that a current primary central node is abnormal, delete first device information written by the current primary central node in a target node of the distributed coordinator, and to send a writable signal to at least one of the central nodes, the current primary central node being one of the at least two central nodes;
the central node is configured to send its own second device information to the distributed coordinator according to the writable signal;
the distributed coordinator is further configured to determine, among the central nodes, a target central node that successfully wrote the second device information;
the central node is further configured to switch the primary/standby state from the standby state to the primary state according to the switching signal.

9. The high availability system for Ambari according to claim 8, characterized by further comprising at least one second slave node;
the second slave node is configured to send an access request to the central node;
the central node is further configured to obtain the access request sent by the second slave node and, when the primary/standby state is the primary state, to respond to the access request and establish a connection with the second slave node.

10. A high availability apparatus for Ambari, characterized by comprising:
a deletion module, configured to delete, after detecting that a current primary central node is abnormal, first device information written by the current primary central node in a distributed coordinator, the distributed coordinator being connected to at least two central nodes and the current primary central node being one of the at least two central nodes;
a first sending module, configured to send a writable signal to at least one of the central nodes, so that the at least one central node writes its own second device information to the distributed coordinator according to the writable signal;
a determining module, configured to determine, among the at least one central node, a target central node that successfully wrote the second device information;
a second sending module, configured to send a switching signal to the target central node, so that the target central node switches its primary/standby state from the standby state to the primary state according to the switching signal.

11. A high availability apparatus for Ambari, characterized by comprising:
a first obtaining module, configured to obtain a writable signal sent by a distributed coordinator, the writable signal being sent by the distributed coordinator after deleting first device information written by the current primary central node in a target node of the distributed coordinator, the distributed coordinator being connected to at least two central nodes and the current primary central node being one of the at least two central nodes;
a writing module, configured to write the apparatus's own second device information to the distributed coordinator according to the writable signal;
a second obtaining module, configured to obtain a switching signal sent by the distributed coordinator;
a switching module, configured to switch the primary/standby state from the standby state to the primary state according to the switching signal.

12. An electronic device, characterized by comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
the processor is configured to execute the program stored in the memory to implement the high availability method for Ambari according to any one of claims 1-5 or 6-7.

13. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the high availability method for Ambari according to any one of claims 1-5 or 6-7.
CN202210021964.9A 2022-01-10 2022-01-10 Ambari's high availability method, system, apparatus, electronic device and storage medium Pending CN114338370A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210021964.9A CN114338370A (en) 2022-01-10 2022-01-10 Ambari's high availability method, system, apparatus, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210021964.9A CN114338370A (en) 2022-01-10 2022-01-10 Ambari's high availability method, system, apparatus, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN114338370A true CN114338370A (en) 2022-04-12

Family

ID=81026378

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210021964.9A Pending CN114338370A (en) 2022-01-10 2022-01-10 Ambari's high availability method, system, apparatus, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN114338370A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104811325A (en) * 2014-01-24 2015-07-29 华为技术有限公司 Cluster node controller monitoring method, related device and controller
CN105934929A (en) * 2014-12-31 2016-09-07 华为技术有限公司 Post-cluster brain split quorum processing method and quorum storage device and system
CN109101196A (en) * 2018-08-14 2018-12-28 北京奇虎科技有限公司 Host node switching method, device, electronic equipment and computer storage medium
US20190095293A1 (en) * 2016-07-27 2019-03-28 Tencent Technology (Shenzhen) Company Limited Data disaster recovery method, device and system
CN111787511A (en) * 2020-07-13 2020-10-16 重庆大学 A Zigbee network and its node switching method
CN112860787A (en) * 2019-11-27 2021-05-28 上海哔哩哔哩科技有限公司 Method for switching master nodes in distributed master-slave system, master node device and storage medium
CN112866314A (en) * 2019-11-27 2021-05-28 上海哔哩哔哩科技有限公司 Method for switching slave nodes in distributed master-slave system, master node device and storage medium
CN113760468A (en) * 2021-01-19 2021-12-07 北京沃东天骏信息技术有限公司 Distributed election method, device, system and medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
任乐乐;何灵敏;: "一种改进的主从节点选举算法用于实现集群负载均衡", 中国计量学院学报, no. 03 *
周晓垣: "《区块链时代 数字货币意味着什么》", 30 November 2018, 天津人民出版社, pages: 227 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220412