CN109101196A - Host node switching method, device, electronic equipment and computer storage medium - Google Patents

Host node switching method, device, electronic equipment and computer storage medium

Info

Publication number
CN109101196A
Authority
CN
China
Prior art keywords
node
host node
service center
coordination service
metadata
Prior art date
Legal status
Pending
Application number
CN201810925076.3A
Other languages
Chinese (zh)
Inventor
王�锋
Current Assignee
Beijing Qihoo Technology Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd filed Critical Beijing Qihoo Technology Co Ltd
Priority to CN201810925076.3A
Publication of CN109101196A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F 3/0614 Improving the reliability of storage systems
    • G06F 3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0629 Configuration or reconfiguration of storage systems
    • G06F 3/0631 Configuration or reconfiguration of storage systems by allocating resources to storage systems
    • G06F 3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F 3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The present application relates to the field of Internet technology and discloses a host node switching method and device, an electronic device, and a computer-readable storage medium. The host node switching method includes: when a first failure controller detects that the current host node has failed, sending a contention request to a coordination service center, the contention request being used to request the coordination service center to determine the metadata node corresponding to the first failure controller as the target host node; then, when a confirmation message from the coordination service center is received, switching the metadata node corresponding to the first failure controller to the target host node by means of a virtual IP address (VIP). The method of the embodiments of the present application achieves compatibility for low-version clients accessing the host node, so that even when an active/standby switchover of metadata nodes occurs, existing low-version clients can still access the host node after the switch normally.

Description

Host node switching method, device, electronic equipment and computer storage medium
Technical field
The present application relates to the field of Internet technology, and in particular to a host node switching method and device, an electronic device, and a computer storage medium.
Background
In current large-scale distributed storage systems, centralized metadata management is generally adopted in order to centralize permission authentication and quota control; that is, the metadata of all data in the entire system is stored centrally on a small number of metadata nodes (NameNodes).
In such an architecture, the metadata node acting as the host node (the active or master NameNode) provides services such as data query and update to the corresponding clients, so the availability of the metadata nodes is directly tied to the availability of the whole system, and distributed storage systems usually improve metadata node availability through redundancy. At present, the usual way to improve metadata node availability is high availability (HA): when the metadata node acting as the host node enters an abnormal state, a standby metadata node takes over, i.e., the standby metadata node is switched to become the new host node and provides services such as data query and update to the clients.
However, the above method is only applicable to high-version clients that have certain decision logic, i.e., a high-version client can judge whether the current metadata node is the host node; if not, it keeps checking the other metadata nodes until it finds the one acting as the host node, and then performs data queries, updates, and so on through that node. Low-version clients designed earlier have no such decision logic and can only access a host node at a fixed address; when the host node is switched because of a failure, a low-version client cannot obtain the address of the host node after the switch and therefore cannot perform data queries, updates, and so on through the new host node. In other words, the above method cannot be used by low-version clients that lack decision logic and offers them no compatibility. A sketch of the difference is given below.
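For illustration only, the following minimal Python sketch contrasts the two access behaviors; the node addresses, the port, and the reachability probe are hypothetical stand-ins introduced here (a real high-version client checks the HA state over the NameNode's RPC interface rather than mere reachability).

```python
import socket

# Hypothetical candidate NameNode addresses; a low-version client knows only one fixed address.
CANDIDATE_NAMENODES = [("nn1.example.com", 8020), ("nn2.example.com", 8020)]

def is_reachable(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """Stand-in health probe: a real client would issue an RPC and check the HA state."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False

def find_active_namenode():
    """High-version behavior: try each candidate until one responds."""
    for host, port in CANDIDATE_NAMENODES:
        if is_reachable(host, port):
            return host, port
    raise RuntimeError("no reachable NameNode found")

def low_version_access(fixed_host: str = "namenode.example.com", port: int = 8020):
    """Low-version behavior: only a fixed address is known; if the master moved, this fails."""
    if not is_reachable(fixed_host, port):
        raise RuntimeError("fixed-address host node unreachable after failover")
    return fixed_host, port
```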
Summary of the invention
The purpose of the present application is to solve at least one of the technical defects described above, in particular the inability to remain compatible with low-version clients that have no decision logic.
In a first aspect, a host node switching method is provided, comprising:
when a first failure controller detects that the current host node has failed, sending a contention request to a coordination service center, the contention request being used to request the coordination service center to determine the metadata node corresponding to the first failure controller as the target host node;
when a confirmation message from the coordination service center is received, switching the metadata node corresponding to the first failure controller to the target host node by means of a virtual IP address (VIP).
Specifically, monitoring whether the current host node has failed comprises:
sending a fault query request to the coordination service center at a preset time interval, the fault query request being used to request detection of whether the coordination service center holds fault information indicating that the current host node has failed;
if a confirmation message returned by the coordination service center is received, determining that the current host node has failed.
Further, the fault information is fault information sent by a second failure controller when it detects that the metadata node it corresponds to, which is currently the host node, has failed; the first failure controller and the second failure controller are managed uniformly by the coordination service center.
Further, switching the metadata node corresponding to the first failure controller to the target host node by means of the VIP comprises:
unbinding the VIP from the failed host node, and establishing a binding relationship between the VIP and the metadata node corresponding to the first failure controller, so as to switch the metadata node corresponding to the first failure controller to the target host node.
Further, before the binding relationship between the VIP and the metadata node corresponding to the first failure controller is established, the method further comprises:
isolating the failed host node from the target host node.
Further, after the binding relationship between the VIP and the metadata node corresponding to the first failure controller is established, the method further comprises:
switching the metadata node corresponding to the first failure controller from the inactive state to the active state, so that the metadata node corresponding to the first failure controller becomes the target host node.
In a second aspect, a host node switching device is provided, comprising:
a sending module, configured to send a contention request to the coordination service center (a coordination service for distributed applications) when the first failure controller detects that the current host node has failed, the contention request being used to request the coordination service center to determine the metadata node corresponding to the first failure controller as the target host node;
a switching module, configured to switch the metadata node corresponding to the first failure controller to the target host node by means of a virtual IP address (VIP) when a confirmation message from the coordination service center is received.
Specifically, the sending module includes a fault query submodule and a fault determination submodule;
the fault query submodule is configured to send a fault query request to the coordination service center at a preset time interval, the fault query request being used to request detection of whether the coordination service center holds fault information indicating that the current host node has failed;
the fault determination submodule is configured to determine that the current host node has failed when a confirmation message returned by the coordination service center is received.
Further, the fault information is fault information sent by a second failure controller when it detects that the metadata node it corresponds to, which is currently the host node, has failed; the first failure controller and the second failure controller are managed uniformly by the coordination service center.
Further, the switching module is specifically configured to unbind the VIP from the failed host node and to establish a binding relationship between the VIP and the metadata node corresponding to the first failure controller, so as to switch the metadata node corresponding to the first failure controller to the target host node.
Further, the device further includes an isolation module;
the isolation module is configured to isolate the failed host node from the target host node.
Further, the device further includes a processing module;
the processing module is configured to switch the metadata node corresponding to the first failure controller from the inactive state to the active state, so that the metadata node corresponding to the first failure controller becomes the target host node.
In a third aspect, an electronic device is provided, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the above host node switching method when executing the program.
In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored, wherein the program, when executed by a processor, implements the above host node switching method.
With the host node switching method provided by the embodiments of the present application, when the first failure controller detects that the current host node has failed, a contention request is sent to the coordination service center; the contention request asks the coordination service center to determine the metadata node corresponding to the first failure controller as the target host node, which lays the groundwork for subsequently switching that metadata node to the target host node via the VIP. When the confirmation message from the coordination service center is received, the metadata node corresponding to the first failure controller is switched to the target host node via the VIP. As a result, neither high-version clients nor low-version clients need to judge whether the current metadata node is the host node; it suffices to configure in the client a single fixed VIP that always points to the host node, and data can then be accessed through the host node. Even when an active/standby switchover of metadata nodes occurs, existing low-version clients can still access the host node after the switch normally, without a batch upgrade of low-version clients, thereby achieving compatibility for low-version clients accessing the host node.
Additional aspects and advantages of the present application will be set forth in part in the following description; they will become apparent from the description or be learned through practice of the application.
Brief description of the drawings
The above and/or additional aspects and advantages of the present application will become apparent and easy to understand from the following description of the embodiments with reference to the accompanying drawings, in which:
Fig. 1 is a schematic flowchart of the host node switching method of an embodiment of the present application;
Fig. 2 is a schematic diagram of the host node switching process of an embodiment of the present application;
Fig. 3 is a schematic diagram of the basic structure of the host node switching device of an embodiment of the present application;
Fig. 4 is a schematic diagram of the detailed structure of the host node switching device of an embodiment of the present application;
Fig. 5 is a schematic structural diagram of the electronic device of an embodiment of the present application.
Detailed description of the embodiments
Embodiments of the present application are described in detail below; examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary and are only intended to explain the present application; they should not be construed as limiting the present application.
Those skilled in the art will appreciate that, unless expressly stated otherwise, the singular forms "a", "an", "the" and "said" used herein may also include the plural forms. It should be further understood that the word "comprising" used in the description of the present application indicates the presence of the stated features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof. It should be understood that when an element is said to be "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or intermediate elements may be present; in addition, "connected" or "coupled" as used herein may include a wireless connection or wireless coupling. The word "and/or" as used herein includes any unit of, and all combinations of, one or more of the associated listed items.
To make the objectives, technical solutions, and advantages of the present application clearer, the embodiments of the present application are described in further detail below with reference to the accompanying drawings.
ZKFC (ZooKeeper FailoverController, the failure controller of the distributed application coordination service) is a component of the HA (High Availability) scheme introduced in HDFS (Hadoop Distributed File System) from version 2.0 onwards. ZKFC can monitor in real time whether the metadata node acting as the host node fails and, when a failure occurs, replace the failed host node with a standby metadata node; that is, the standby metadata node is switched to become the new host node and provides services such as data query and update to the clients matched with HDFS 2.0.
Here, a client matched with HDFS 2.0 is a high-version client (e.g., client 2.0), and a client matched with HDFS 1.0 is a low-version client (e.g., client 1.0). Since a high-version client is matched with HDFS 2.0, which introduces ZKFC, it has certain decision logic with respect to the host node: it can judge whether the current metadata node is the host node; if not, it continues to check the other metadata nodes until it finds the one acting as the host node, and then performs data queries, updates, and so on through that metadata node.
However, a low-version client (e.g., client 1.0) is matched with HDFS 1.0. Since the ZKFC component is not introduced in HDFS 1.0, a low-version client matched with HDFS 1.0 has no decision logic and can only access a host node at a fixed address; as a result, when the host node is switched because of a failure, the low-version client cannot obtain the address of the host node after the switch and therefore cannot perform data queries, updates, and so on through the new host node.
The host node switching method and device, electronic device, and computer-readable storage medium provided by the present application are intended to solve the above technical problems of the prior art.
The technical solution of the present application, and how it solves the above technical problems, are described in detail below through specific embodiments. The following specific embodiments may be combined with one another, and the same or similar concepts or processes may not be repeated in some of them. The embodiments of the present application are described below with reference to the accompanying drawings.
Embodiment one
An embodiment of the present application provides a host node switching method, as shown in Fig. 1, comprising:
Step S110: when the first failure controller detects that the current host node has failed, a contention request is sent to the coordination service center; the contention request is used to request the coordination service center to determine the metadata node corresponding to the first failure controller as the target host node.
Specifically, the failure controller in the embodiments of the present application may be ZKFC (ZooKeeper FailoverController, the failure controller of the distributed application coordination service), and the coordination service center may be ZooKeeper (the coordination service for distributed applications). In the following, the embodiments of the present application are described by taking ZKFC as the failure controller and ZooKeeper as the coordination service center:
With the continuous development and upgrading of the technology, a new HDFS system of version 2.0 (HDFS 2.0) has been released on the basis of the HDFS system of version 1.0 (HDFS 1.0); that is, the HDFS cluster on the server side contains several (e.g., two) HDFS 2.0 sub-systems. Each HDFS system contains one metadata node (NameNode), multiple data nodes (DataNodes), and multiple secondary metadata nodes (Secondary NameNodes). Each HDFS provides services such as data reading and writing to the corresponding clients through its metadata node and monitors the health of that metadata node in real time through the corresponding ZKFC; for example, the health of the metadata node in the first HDFS 2.0 is monitored in real time by the first ZKFC, and the health of the metadata node in the second HDFS 2.0 is monitored in real time by the second ZKFC. Of course, the metadata node in the first HDFS 2.0 may instead be monitored by the second ZKFC and that in the second HDFS 2.0 by the first ZKFC; the embodiments of the present application do not limit this. Meanwhile, among the several HDFS 2.0 sub-systems of the HDFS cluster, only the metadata node of one of them can act as the host node and provide services such as data reading and writing to the corresponding clients.
In the following, the embodiments of the present application are described by taking as an example the case in which the first ZKFC monitors in real time the health of the metadata node of the corresponding first HDFS 2.0 (hereinafter referred to as metadata node 1), and the second ZKFC monitors in real time the health of the metadata node of the corresponding second HDFS 2.0 (hereinafter referred to as metadata node 2):
Specifically, the second ZKFC monitors the health of metadata node 2 in real time, and metadata node 2 is the host node; that is, the VIP is currently bound to, and points at, metadata node 2. Since only the metadata node of one HDFS 2.0 can act as the host node, the metadata node of the first HDFS 2.0 (metadata node 1) is a non-host node (also called a standby host node), and the first ZKFC monitors the health of metadata node 1 (the non-host, i.e., standby, node) in real time.
Further, when the second ZKFC detects that the current host node (metadata node 2) has failed, it can report fault information to ZooKeeper, the distributed application coordination service; meanwhile, the first ZKFC can monitor through ZooKeeper whether the current host node has failed. When the first ZKFC detects that the current host node has failed, it can send to ZooKeeper a contention request for requesting ZooKeeper to determine metadata node 1, which corresponds to the first ZKFC, as the target host node; in other words, the first ZKFC initiates the contention for the target host node with ZooKeeper: when the first ZKFC detects that the current host node has failed, it sends a contention request to ZooKeeper, and the contention request is used to request ZooKeeper to determine the metadata node corresponding to the first ZKFC as the target host node. A minimal sketch of this contention step follows.
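The sketch below models the contention request as the creation of an ephemeral lock znode, using the Python kazoo ZooKeeper client; the ensemble address and the znode path are illustrative assumptions, and Hadoop's own ZKFC performs this election internally rather than through such a script.

```python
from kazoo.client import KazooClient
from kazoo.exceptions import NodeExistsError

# Illustrative values; real deployments use their own ZooKeeper ensemble and lock path.
ZK_HOSTS = "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181"
LOCK_PATH = "/hdfs-ha/active-lock"

def contend_for_target_host_node(zk: KazooClient, my_node_info: bytes) -> bool:
    """Send the 'contention request': try to create an ephemeral lock znode.

    Returns True when the creation succeeds (the confirmation that this
    controller's metadata node is to become the target host node), and False
    when another failure controller already holds the lock.
    """
    try:
        # Ephemeral: the lock vanishes automatically if this controller's session dies,
        # so a crashed winner never blocks later contention.
        zk.create(LOCK_PATH, my_node_info, ephemeral=True, makepath=True)
        return True
    except NodeExistsError:
        return False

# Usage sketch: the session must stay open for as long as the ephemeral lock is held.
# zk = KazooClient(hosts=ZK_HOSTS); zk.start()
# won = contend_for_target_host_node(zk, b"metadata-node-1:8020")
```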
Here, the first ZKFC initiates the contention for the target host node with ZooKeeper only when it detects that the current host node has failed while metadata node 1 is healthy; if the first ZKFC detects that metadata node 1 is also unhealthy, it will not initiate the contention for the target host node with ZooKeeper even though it detects that the current host node has failed.
Further, ZKFC periodically checks the health of the metadata node through a health monitor (HealthMonitor). When the health of the metadata node changes, the change is notified through a callback to the failover controller of the distributed application coordination service (ZooKeeperFailoverController, ZKFailoverController for short), which then reports it to ZooKeeper. HealthMonitor can check the health of the metadata node by periodically sending request packets to it; if no response to a request packet is received from the metadata node, or the time taken to receive the response exceeds a preset duration threshold, the metadata node is determined to be unhealthy. A simplified sketch of this monitoring loop follows.
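In Hadoop itself the HealthMonitor calls the NameNode's HAServiceProtocol RPC; the sketch below substitutes a plain TCP reachability probe with a hypothetical callback, and the 5-second interval and 3-second response threshold are assumed values for illustration only.

```python
import socket
import time

def probe_once(host: str, port: int, response_threshold_s: float) -> bool:
    """Stand-in for sending a request packet and waiting for a reply within the threshold."""
    try:
        with socket.create_connection((host, port), timeout=response_threshold_s):
            return True
    except OSError:
        return False

def health_monitor(host: str, port: int, on_state_change,
                   interval_s: float = 5.0, response_threshold_s: float = 3.0) -> None:
    """Periodically probe the metadata node; invoke the callback whenever its health changes."""
    healthy = True
    while True:
        ok = probe_once(host, port, response_threshold_s)
        if ok != healthy:
            healthy = ok
            # In the scheme described above, the failover controller would now
            # report the state change (e.g., a fault message) to ZooKeeper.
            on_state_change(healthy)
        time.sleep(interval_s)
```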
Step S120: when the confirmation message from the coordination service center is received, the metadata node corresponding to the first failure controller is switched to the target host node by means of a virtual IP address (VIP).
Specifically, when the first ZKFC detects that the current host node has failed, it initiates the contention for the host node with ZooKeeper; the first ZKFC may do so by sending to ZooKeeper a contention request for requesting ZooKeeper to determine the metadata node corresponding to the first ZKFC as the target host node. The first ZKFC may also carry in the contention request the health information of metadata node 1 that it monitors in real time, the address information of the metadata node, and so on.
Further, after receiving the contention request sent by the first ZKFC, ZooKeeper can check metadata node 1, which corresponds to the first ZKFC, against a predetermined rule; when metadata node 1 satisfies the predetermined rule, ZooKeeper determines that metadata node 1 monitored by the first ZKFC is to be promoted to host node and sends a corresponding confirmation message to the first ZKFC. When the first ZKFC receives the confirmation message returned by ZooKeeper, it has successfully contended for the host node; at this point, the first ZKFC can switch the metadata node corresponding to it to the target host node by means of the VIP (virtual IP address), i.e., the first ZKFC changes the VIP from pointing at metadata node 2 (the failed host node) to pointing at metadata node 1 (the target host node), thereby completing the host node switch. In this way, neither high-version clients nor low-version clients need to judge whether the current metadata node is the host node; it suffices to configure in the client a single fixed VIP that always points to the host node, and data can be accessed through the host node before or after the switch. A sketch of the VIP rebinding follows.
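How the VIP is rebound is an implementation detail of the host's networking stack; the sketch below assumes a Linux node managed with iproute2 and iputils-arping, and the address, prefix, and interface name are placeholders.

```python
import subprocess

VIP = "10.0.0.100/24"   # placeholder virtual IP address and prefix
IFACE = "eth0"          # placeholder network interface on the target host node

def take_over_vip() -> None:
    """Bind the VIP on the target host node and advertise it to the network."""
    # Add the VIP to the local interface (the failed host node has already been isolated).
    subprocess.run(["ip", "addr", "add", VIP, "dev", IFACE], check=True)
    # Send gratuitous ARP so that clients and switches refresh their ARP caches
    # and traffic to the fixed VIP now reaches the new host node.
    subprocess.run(["arping", "-U", "-c", "3", "-I", IFACE, VIP.split("/")[0]], check=True)

def release_vip() -> None:
    """Unbind the VIP on a node that must stop acting as the host node."""
    subprocess.run(["ip", "addr", "del", VIP, "dev", IFACE], check=False)
```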
Compared with the prior art, with the host node switching method provided by the embodiments of the present application, when the first failure controller detects that the current host node has failed, a contention request is sent to the coordination service center; the contention request asks the coordination service center to determine the metadata node corresponding to the first failure controller as the target host node, which lays the groundwork for subsequently switching that metadata node to the target host node via the VIP. When the confirmation message from the coordination service center is received, the metadata node corresponding to the first failure controller is switched to the target host node via the VIP. As a result, neither high-version clients nor low-version clients need to judge whether the current metadata node is the host node; it suffices to configure in the client a single fixed VIP that always points to the host node, and data can be accessed through the host node. Even when an active/standby switchover of metadata nodes occurs, existing low-version clients can still access the host node after the switch normally, without a batch upgrade of low-version clients, thereby achieving compatibility for low-version clients accessing the host node.
Embodiment two
The embodiment of the present application provides another possible implementation: on the basis of Embodiment one, the method shown in Embodiment two is further included, wherein
before step S110 the method further includes step S100 (not shown in the figures): the first failure controller monitors whether the current host node has failed; step S100 specifically includes step S1001 (not shown in the figures) and step S1002 (not shown in the figures), wherein
Step S1001: a fault query request is sent to the coordination service center at a preset time interval; the fault query request is used to request detection of whether the coordination service center holds fault information indicating that the current host node has failed.
Step S1002: if a confirmation message returned by the coordination service center is received, it is determined that the current host node has failed.
Here, the fault information is fault information sent by the second failure controller when it detects that the metadata node it corresponds to, which is currently the host node, has failed; the first failure controller and the second failure controller are managed uniformly by the coordination service center.
Specifically, the second ZKFC can monitor the health of the corresponding metadata node 2, which acts as the host node, at a preset time interval; the preset time interval may be 1 second, 3 seconds, 5 seconds, etc., and may of course be set to other values according to the actual situation. The second ZKFC can check the health of the host node by sending request packets to it at the preset time interval; if no response to a request packet is received from the host node, or the time taken to receive the response exceeds a preset duration threshold, it is determined that the host node has failed, and at this point the second ZKFC sends to ZooKeeper a fault message indicating that the host node has failed.
Further, after ZooKeeper receives the fault message sent by the second ZKFC indicating that the host node has failed, it can save the fault message. Meanwhile, the first ZKFC can also send fault query requests to ZooKeeper at a preset time interval; this interval may be 1 second, 2 seconds, 4 seconds, etc., and may of course be set to other values according to the actual situation. The fault query request is used to request detection of whether ZooKeeper holds fault information indicating that the current host node has failed; when ZooKeeper has saved such fault information, it can return a corresponding confirmation message to the first ZKFC, and when the first ZKFC receives the confirmation message returned by ZooKeeper it can determine that the current host node has failed. Both sides of this exchange are sketched below.
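The sketch below illustrates both sides with the kazoo client; the znode path and the 2-second polling interval are illustrative assumptions, not values prescribed by the present application.

```python
import time
from kazoo.client import KazooClient

FAULT_PATH = "/hdfs-ha/master-fault"   # illustrative znode that holds the fault information

def report_host_node_fault(zk: KazooClient, detail: bytes) -> None:
    """Second failure controller: record in ZooKeeper that the current host node has failed."""
    if not zk.exists(FAULT_PATH):
        zk.create(FAULT_PATH, detail, makepath=True)

def wait_for_host_node_fault(zk: KazooClient, interval_s: float = 2.0) -> bytes:
    """First failure controller: poll at a preset interval until fault information appears."""
    while True:
        if zk.exists(FAULT_PATH):
            data, _stat = zk.get(FAULT_PATH)
            return data   # treated as the confirmation that the current host node has failed
        time.sleep(interval_s)
```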
Further, the first ZKFC and the second ZKFC are components of different HDFS systems, each residing in a different HDFS; that is, the first ZKFC and the second ZKFC exist independently of each other, but both communicate with ZooKeeper and contend for the host node through ZooKeeper, i.e., they are managed uniformly by ZooKeeper.
For the embodiments of the present application, by sending fault query requests to ZooKeeper the first ZKFC can detect in real time whether the current host node has failed, ensuring that it can initiate the contention for the host node as soon as the host node fails, and effectively avoiding the situation in which clients cannot access the host node because the host node is not switched in time after a failure.
Embodiment three
The embodiment of the present application provides another possible implementation: on the basis of Embodiment two, the method shown in Embodiment three is further included, wherein
step S120 specifically includes: unbinding the VIP from the failed host node, and establishing a binding relationship between the VIP and the metadata node corresponding to the first failure controller, so as to switch the metadata node corresponding to the first failure controller to the target host node;
before step S120 the method further includes step S111 (not shown in the figures): isolating the failed host node from the target host node;
after step S120 the method further includes step S121 (not shown in the figures): switching the metadata node corresponding to the first failure controller from the inactive state to the active state, so that the metadata node corresponding to the first failure controller becomes the target host node.
Specifically, when the first ZKFC receives the confirmation message returned by ZooKeeper for its contention request, the first ZKFC has successfully contended for the host node, the contention request being used to request ZooKeeper to determine the metadata node corresponding to the first ZKFC as the target host node. After the first ZKFC has won the contention, it needs to switch the corresponding metadata node 1 to become the host node, i.e., the host node switching procedure is started.
Further, in the host node switching procedure, the first ZKFC first triggers the isolation process, i.e., the failed host node is isolated from the target host node, to ensure that the host node detected as failed is no longer active and no longer provides service to clients as the host node; in other words, it is ensured that only one unique metadata node acts as the host node at any time, preventing the split-brain situation in which two host nodes exist simultaneously. Then the first ZKFC starts the VIP switching procedure: the VIP that originally pointed at the failed host node is switched to point at metadata node 1, which corresponds to the first ZKFC, i.e., the VIP is unbound from the failed host node and a binding relationship is established between the VIP and the metadata node corresponding to the first ZKFC, so as to switch the metadata node corresponding to the first ZKFC to the target host node; in this way, clients of all versions can, according to the VIP, access data in the HDFS through the current host node in real time. Finally, the first ZKFC notifies the corresponding metadata node 1 to switch from the inactive state to the active state, i.e., the first ZKFC switches the corresponding metadata node from the inactive state to the active state, so that the metadata node corresponding to the first ZKFC becomes the target host node and provides service to clients of all versions. The three steps are sketched end to end below.
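The sketch below is one hedged end-to-end rendering of this flow; the fencing callable is a placeholder for whatever site-specific isolation mechanism is used, the VIP takeover callable stands for the rebinding shown in the earlier sketch, and the `hdfs haadmin -transitionToActive` call is merely one example of how a standby NameNode can be driven to the active state.

```python
import subprocess
from typing import Callable

def switch_host_node(fence_failed_master: Callable[[], None],
                     take_over_vip: Callable[[], None],
                     local_service_id: str = "nn1") -> None:
    """Orchestrate the host node switch after the contention has been won."""
    # Step 1: isolation - make sure the failed host node can no longer serve clients
    # (prevents two host nodes existing at the same time, i.e. split brain).
    fence_failed_master()
    # Step 2: VIP switch - the fixed VIP now points at the target host node, so
    # low-version clients with a fixed address keep working without any change.
    take_over_vip()
    # Step 3: promote the local metadata node from the inactive (standby) state
    # to the active state so that it serves as the target host node.
    subprocess.run(["hdfs", "haadmin", "-transitionToActive", local_service_id], check=True)
```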
Further, Fig. 2 is a schematic diagram of the host node switching process of Embodiments one to three above. In Fig. 2, the first ZKFC monitors the health of metadata node 1 in real time, and the second ZKFC monitors the health of metadata node 2, which acts as the host node, in real time. When the second ZKFC detects that the host node has failed, it reports to ZooKeeper in real time; at the same time, the first ZKFC keeps sending fault query requests to ZooKeeper asking whether the host node has failed. Only after the first ZKFC has queried or confirmed that the host node has failed does it try to grab the lock from ZooKeeper, i.e., contend for the host node through ZooKeeper. After the first ZKFC has won the contention for the host node, the host node switching procedure is started; the switching procedure comprises steps 1 to 3 in Fig. 2, and once steps 1 to 3 are completed the switching procedure is finished, whereby metadata node 1 corresponding to the first ZKFC has been switched to become the host node, i.e., the VIP now points at metadata node 1 and no longer points at metadata node 2.
For the embodiments of the present application, isolating the failed host node from the target host node ensures the uniqueness of the host node and effectively prevents split brain; unbinding the VIP from the failed host node and binding the VIP to the metadata node corresponding to the first failure controller keeps the VIP always pointing at the host node, ensuring that low-version clients can still correctly obtain the current host node after the host node is switched, that their access logic remains normal, and that clients access the host node transparently; and the first failure controller switches the corresponding metadata node from the inactive state to the active state, so that the metadata node corresponding to the first failure controller becomes the target host node and provides service to clients of all versions.
Embodiment four
Fig. 3 is a schematic structural diagram of a host node switching device provided by an embodiment of the present application. As shown in Fig. 3, the device 30 may include a sending module 31 and a switching module 32, wherein
the sending module 31 is configured to send a contention request to the coordination service center (a coordination service for distributed applications) when the first failure controller detects that the current host node has failed, the contention request being used to request the coordination service center to determine the metadata node corresponding to the first failure controller as the target host node;
the switching module 32 is configured to switch the metadata node corresponding to the first failure controller to the target host node by means of a virtual IP address (VIP) when a confirmation message from the coordination service center is received.
Specifically, the sending module 31 includes a fault query submodule 311 and a fault determination submodule 312, as shown in Fig. 4, wherein
the fault query submodule 311 is configured to send a fault query request to the coordination service center at a preset time interval, the fault query request being used to request detection of whether the coordination service center holds fault information indicating that the current host node has failed;
the fault determination submodule 312 is configured to determine that the current host node has failed when a confirmation message returned by the coordination service center is received.
Further, the fault information is fault information sent by a second failure controller when it detects that the metadata node it corresponds to, which is currently the host node, has failed; the first failure controller and the second failure controller are managed uniformly by the coordination service center.
Further, the switching module 32 is specifically configured to unbind the VIP from the failed host node and to establish a binding relationship between the VIP and the metadata node corresponding to the first failure controller, so as to switch the metadata node corresponding to the first failure controller to the target host node.
Further, the device further includes an isolation module 33, as shown in Fig. 4, wherein the isolation module 33 is configured to isolate the failed host node from the target host node.
Further, the device further includes a processing module 34, as shown in Fig. 4, wherein the processing module 34 is configured to switch the metadata node corresponding to the first failure controller from the inactive state to the active state, so that the metadata node corresponding to the first failure controller becomes the target host node. The module decomposition is summarized in the skeleton below.
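As a reading aid only, a possible skeleton is the following; the class and method names are invented for illustration and do not correspond to any published implementation.

```python
class FaultQuerySubmodule:          # submodule 311
    """Sends fault query requests to the coordination service center at a preset interval."""
    def query(self) -> bool: ...

class FaultDeterminationSubmodule:  # submodule 312
    """Determines that the current host node has failed once a confirmation is received."""
    def determine(self, confirmed: bool) -> bool:
        return confirmed

class SendingModule:                # module 31
    """Sends the contention request when the failure of the current host node is detected."""
    def __init__(self) -> None:
        self.fault_query = FaultQuerySubmodule()
        self.fault_determination = FaultDeterminationSubmodule()

class SwitchingModule:              # module 32
    """Unbinds the VIP from the failed host node and binds it to the target host node."""
    def switch(self) -> None: ...

class IsolationModule:              # module 33
    """Isolates the failed host node from the target host node (fencing)."""
    def isolate(self) -> None: ...

class ProcessingModule:             # module 34
    """Switches the target metadata node from the inactive state to the active state."""
    def activate(self) -> None: ...

class HostNodeSwitchingDevice:      # device 30
    def __init__(self) -> None:
        self.sending_module = SendingModule()
        self.switching_module = SwitchingModule()
        self.isolation_module = IsolationModule()
        self.processing_module = ProcessingModule()
```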
Compared with the prior art, with the device provided by the embodiments of the present application, when the first failure controller detects that the current host node has failed, a contention request is sent to the coordination service center; the contention request asks the coordination service center to determine the metadata node corresponding to the first failure controller as the target host node, which lays the groundwork for subsequently switching that metadata node to the target host node via the VIP. When the confirmation message from the coordination service center is received, the metadata node corresponding to the first failure controller is switched to the target host node via the VIP. As a result, neither high-version clients nor low-version clients need to judge whether the current metadata node is the host node; it suffices to configure in the client a single fixed VIP that always points to the host node, and data can be accessed through the host node. Even when an active/standby switchover of metadata nodes occurs, existing low-version clients can still access the host node after the switch normally, without a batch upgrade of low-version clients, thereby achieving compatibility for low-version clients accessing the host node.
Embodiment five
An embodiment of the present application provides an electronic device. As shown in Fig. 5, the electronic device 500 includes a processor 501 and a memory 503, the processor 501 being connected to the memory 503, for example via a bus 502. Further, the electronic device 500 may also include a transceiver 504. It should be noted that in practical applications the transceiver 504 is not limited to one, and the structure of the electronic device 500 does not constitute a limitation of the embodiments of the present application.
The processor 501 is applied in the embodiments of the present application to implement the functions of the sending module and the switching module shown in Fig. 3 or Fig. 4, and the functions of the isolation module and the processing module shown in Fig. 4.
The processor 501 may be a CPU, a general-purpose processor, a DSP, an ASIC, an FPGA or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various exemplary logical blocks, modules, and circuits described in connection with the present disclosure. The processor 501 may also be a combination implementing computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
The bus 502 may include a path for transferring information between the above components. The bus 502 may be a PCI bus, an EISA bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is shown in Fig. 5, but this does not mean that there is only one bus or only one type of bus.
The memory 503 may be a ROM or another type of static storage device capable of storing static information and instructions, a RAM or another type of dynamic storage device capable of storing information and instructions, an EEPROM, a CD-ROM or other optical disc storage, optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
The memory 503 is used to store the application program code for executing the solution of the present application, and execution is controlled by the processor 501. The processor 501 is used to execute the application program code stored in the memory 503, so as to realize the actions of the host node switching device provided by the embodiments shown in Fig. 3 or Fig. 4.
The electronic device provided by the embodiments of the present application includes a memory, a processor, and a computer program stored in the memory and executable on the processor. Compared with the prior art, when the processor executes the program the following can be achieved: when the first failure controller detects that the current host node has failed, a contention request is sent to the coordination service center (the distributed application coordination service); the contention request asks the coordination service center to determine the metadata node corresponding to the first failure controller as the target host node, which lays the groundwork for subsequently switching that metadata node to the target host node via the VIP; when the confirmation message from the coordination service center is received, the metadata node corresponding to the first failure controller is switched to the target host node via the VIP. As a result, neither high-version clients nor low-version clients need to judge whether the current metadata node is the host node; it suffices to configure in the client a single fixed VIP that always points to the host node, and data can be accessed through the host node, so that even when an active/standby switchover of metadata nodes occurs, existing low-version clients can still access the host node after the switch normally, without a batch upgrade of low-version clients, thereby achieving compatibility for low-version clients accessing the host node.
An embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program realizes the method shown in Embodiment one. Compared with the prior art, when the first failure controller detects that the current host node has failed, a contention request is sent to the coordination service center, which determines the metadata node corresponding to the first failure controller as the target host node; when the confirmation message from the coordination service center is received, that metadata node is switched to the target host node via the VIP. Neither high-version clients nor low-version clients therefore need to judge whether the current metadata node is the host node; a single fixed VIP that always points to the host node suffices, so that even when an active/standby switchover of metadata nodes occurs, existing low-version clients can still access the host node after the switch normally, without a batch upgrade of low-version clients, thereby achieving compatibility for low-version clients accessing the host node.
The computer-readable storage medium provided by the embodiments of the present application is applicable to any of the above embodiments of the method, and details are not repeated here.
It should be understood that although the steps in the flowchart of the drawings are shown successively as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless expressly stated herein, the execution of these steps is not subject to a strict order, and they may be executed in other orders. Moreover, at least some of the steps in the flowchart of the drawings may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different moments, and their execution order is not necessarily sequential; they may be executed in turn or alternately with at least part of the sub-steps or stages of other steps.
The above are only some embodiments of the present application. It should be noted that a person of ordinary skill in the art can make several improvements and modifications without departing from the principles of the present application, and these improvements and modifications should also be regarded as falling within the protection scope of the present application.

Claims (10)

1. A host node switching method, characterized by comprising:
when a first failure controller detects that the current host node has failed, sending a contention request to a coordination service center, the contention request being used to request the coordination service center to determine the metadata node corresponding to the first failure controller as the target host node;
when a confirmation message from the coordination service center is received, switching the metadata node corresponding to the first failure controller to the target host node by means of a virtual IP address (VIP).
2. The method according to claim 1, characterized in that monitoring whether the current host node has failed comprises:
sending a fault query request to the coordination service center at a preset time interval, the fault query request being used to request detection of whether the coordination service center holds fault information indicating that the current host node has failed;
if a confirmation message returned by the coordination service center is received, determining that the current host node has failed.
3. The method according to claim 2, characterized in that the fault information is fault information sent by a second failure controller when it detects that the metadata node it corresponds to, which is currently the host node, has failed; wherein the first failure controller and the second failure controller are managed uniformly by the coordination service center.
4. The method according to claim 1, characterized in that switching the metadata node corresponding to the first failure controller to the target host node by means of the VIP comprises:
unbinding the VIP from the failed host node, and establishing a binding relationship between the VIP and the metadata node corresponding to the first failure controller, so as to switch the metadata node corresponding to the first failure controller to the target host node.
5. The method according to claim 4, characterized in that, before the binding relationship between the VIP and the metadata node corresponding to the first failure controller is established, the method further comprises:
isolating the failed host node from the target host node.
6. The method according to claim 4, characterized in that, after the binding relationship between the VIP and the metadata node corresponding to the first failure controller is established, the method further comprises:
switching the metadata node corresponding to the first failure controller from the inactive state to the active state, so that the metadata node corresponding to the first failure controller becomes the target host node.
7. A host node switching device, characterized by comprising:
a sending module, configured to send a contention request to the coordination service center (a coordination service for distributed applications) when the first failure controller detects that the current host node has failed, the contention request being used to request the coordination service center to determine the metadata node corresponding to the first failure controller as the target host node;
a switching module, configured to switch the metadata node corresponding to the first failure controller to the target host node by means of a virtual IP address (VIP) when a confirmation message from the coordination service center is received.
8. The device according to claim 7, characterized in that the sending module includes a fault query submodule and a fault determination submodule;
the fault query submodule is configured to send a fault query request to the coordination service center at a preset time interval, the fault query request being used to request detection of whether the coordination service center holds fault information indicating that the current host node has failed;
the fault determination submodule is configured to determine that the current host node has failed when a confirmation message returned by the coordination service center is received.
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the host node switching method according to any one of claims 1-6.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the program, when executed by a processor, implements the host node switching method according to any one of claims 1-6.
CN201810925076.3A 2018-08-14 2018-08-14 Host node switching method, device, electronic equipment and computer storage medium Pending CN109101196A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810925076.3A CN109101196A (en) 2018-08-14 2018-08-14 Host node switching method, device, electronic equipment and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810925076.3A CN109101196A (en) 2018-08-14 2018-08-14 Host node switching method, device, electronic equipment and computer storage medium

Publications (1)

Publication Number Publication Date
CN109101196A true CN109101196A (en) 2018-12-28

Family

ID=64849677

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810925076.3A Pending CN109101196A (en) 2018-08-14 2018-08-14 Host node switching method, device, electronic equipment and computer storage medium

Country Status (1)

Country Link
CN (1) CN109101196A (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101729290A (en) * 2009-11-04 2010-06-09 中兴通讯股份有限公司 Method and device for realizing business system protection
CN103973424A (en) * 2014-05-22 2014-08-06 乐得科技有限公司 Method and device for removing faults in cache system
CN106911728A (en) * 2015-12-22 2017-06-30 华为技术服务有限公司 The choosing method and device of host node in distributed system
CN205901808U (en) * 2016-08-05 2017-01-18 国家电网公司 Accomplish distributed storage system of first data nodes automatic switch -over

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
邓鹏 (Deng Peng): "主从式云计算平台高可用性研究" [Research on High Availability of Master-Slave Cloud Computing Platforms], 《中国优秀硕士学位论文全文数据库》 [China Masters' Theses Full-text Database] *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111404647A (en) * 2019-01-02 2020-07-10 中兴通讯股份有限公司 Control method of node cooperative relationship and related equipment
CN111404647B (en) * 2019-01-02 2023-11-28 中兴通讯股份有限公司 Control method of node cooperative relationship and related equipment
CN110417600A (en) * 2019-08-02 2019-11-05 秒针信息技术有限公司 Node switching method, device and the computer storage medium of distributed system
CN110688148A (en) * 2019-10-08 2020-01-14 中国建设银行股份有限公司 Method, device, equipment and storage medium for equipment management
CN111444062B (en) * 2020-04-01 2023-09-19 山东汇贸电子口岸有限公司 Method and device for managing master node and slave node of cloud database
CN111444062A (en) * 2020-04-01 2020-07-24 山东汇贸电子口岸有限公司 Method and device for managing master node and slave node of cloud database
CN112087336A (en) * 2020-09-11 2020-12-15 杭州海康威视系统技术有限公司 Deployment and management method and device of virtual IP service system and electronic equipment
CN112087336B (en) * 2020-09-11 2022-09-02 杭州海康威视系统技术有限公司 Deployment and management method and device of virtual IP service system and electronic equipment
CN113852506A (en) * 2021-09-27 2021-12-28 深信服科技股份有限公司 Fault processing method and device, electronic equipment and storage medium
CN113852506B (en) * 2021-09-27 2024-04-09 深信服科技股份有限公司 Fault processing method and device, electronic equipment and storage medium
CN113949691A (en) * 2021-10-15 2022-01-18 湖南麒麟信安科技股份有限公司 ETCD-based virtual network address high-availability implementation method and system
CN114338370A (en) * 2022-01-10 2022-04-12 北京金山云网络技术有限公司 Highly available method, system, apparatus, electronic device and storage medium for Ambari
CN115396296A (en) * 2022-08-18 2022-11-25 中电金信软件有限公司 Service processing method and device, electronic equipment and computer readable storage medium
CN116781494A (en) * 2023-08-17 2023-09-19 天津南大通用数据技术股份有限公司 Main-standby switching judgment method based on existing network equipment
CN116781494B (en) * 2023-08-17 2024-03-26 天津南大通用数据技术股份有限公司 Main-standby switching judgment method based on existing network equipment


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181228