CN107046474B - service cluster - Google Patents

Service cluster

Info

Publication number
CN107046474B
CN107046474B (application CN201611187600.9A)
Authority
CN
China
Prior art keywords
name node
name
node
main
standby
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611187600.9A
Other languages
Chinese (zh)
Other versions
CN107046474A (en)
Inventor
赵影
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Neusoft Corp
Original Assignee
Neusoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Neusoft Corp filed Critical Neusoft Corp
Priority to CN201611187600.9A priority Critical patent/CN107046474B/en
Publication of CN107046474A publication Critical patent/CN107046474A/en
Application granted granted Critical
Publication of CN107046474B publication Critical patent/CN107046474B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0659 Management of faults, events, alarms or notifications using network fault recovery by isolating or reconfiguring faulty entities
    • H04L41/0668 Management of faults, events, alarms or notifications using network fault recovery by dynamic selection of recovery network elements, e.g. replacement by the most appropriate element after failure
    • H04L41/08 Configuration management of networks or network elements
    • H04L41/0803 Configuration setting
    • H04L41/0823 Configuration setting characterised by the purposes of a change of settings, e.g. optimising configuration for enhancing reliability
    • H04L41/0826 Configuration setting characterised by the purposes of a change of settings, e.g. optimising configuration for enhancing reliability, for reduction of network costs
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Hardware Redundancy (AREA)
  • Computer And Data Communications (AREA)

Abstract

The application provides a service cluster. When a plurality of main name nodes exist, a standby name node can learn whether a main name node has failed by sending test information to each main name node; when a main name node fails, the standby name node sends a main selection application to a target main name node to apply to become a main name node itself, and the target main name node adds the standby name node to the main name node list. When only one main name node exists, the standby name nodes can learn whether the main name node has failed by sending test information to it, and when the main name node fails a standby name node is added to the main name node list as a main name node. According to the method and the device, the failure information of the name nodes can be discovered automatically, and when a certain main name node fails, a standby name node becomes a main name node and is added to the main name node list, without relying on a Zookeeper cluster any more, so that the system architecture is simplified and the operation and maintenance cost is greatly reduced.

Description

Service cluster
Technical Field
The invention relates to the technical field of big data, in particular to a service cluster.
Background
Nowadays, Hadoop HDFS has become the most mainstream distributed file storage system in the big data field, and the name node (NameNode) is the most important management node in HDFS. Its role is mainly embodied in the following aspects: it is responsible for maintaining the file directory tree of the whole file system; it stores the metadata information of the file system; and it is responsible for receiving operation requests from users. It can be seen that stable operation of the name node is crucial for HDFS.
In Hadoop 1.x, HDFS has a name node Single Point Of Failure (SPOF) problem: when the single point fails, the failure affects the entire system or network and can bring it down. In Hadoop 2.x, Hadoop provides an HDFS HA scheme based on Zookeeper. In this scheme, a main name node and a standby name node are adopted, and a Zookeeper cluster and a main-standby switching controller are introduced; the main-standby switching controller detects the state information of the main name node and the standby name node, and when the main name node fails, main-standby switching is realized by means of the leader election function of the Zookeeper cluster.
In the Zookeeper-based HDFS HA scheme, although the single-point failure problem of HDFS is alleviated to some extent by adopting a main name node and a standby name node, a Zookeeper cluster needs to be introduced, and such a cluster needs at least 3 servers; for a scenario that only needs to use HDFS alone, this is very redundant and increases the operation and maintenance cost.
Disclosure of Invention
In view of this, the present invention provides a service cluster to solve the problem that, in the prior-art Zookeeper-based HDFS HA scheme, a Zookeeper cluster needs to be introduced, which is very redundant for a scenario that only needs to use HDFS alone and increases the operation and maintenance cost. The technical scheme is as follows:
a service cluster, the service cluster comprising: the system comprises a plurality of main name nodes and at least one standby name node, wherein each name node maintains a main name node list, a standby name node list and a failure name node list;
The standby name node sends test information to each main name node according to a preset period to test whether each main name node works normally; when one main name node fails, the nth normally working main name node in the main name node list is used as a target main name node, and a main selection application is sent to the target main name node, wherein the main selection application is used by the standby name node to apply to become a main name node, the value of n is 1, …, N, and N is the number of normally working main name nodes in the main name node list;
After receiving the main selection application sent by the standby name node, the target main name node triggers each main name node to no longer receive data operation requests of the client, adds the standby name node to its own main name node list, deletes the standby name node from its own standby name node list, deletes the failed main name node from its own main name node list, adds the failed main name node to its own failure name node list, and synchronizes the information that the main name node has failed and the standby name node has become a main name node to the other name nodes in the service cluster, so that the other name nodes update their own lists based on the information.
Wherein each primary name node also maintains metadata;
And after the target main name node adds the standby name node to the main name node list of the target main name node, the target main name node sends metadata maintained by the target main name node to the standby name node which becomes the main name node.
And after the other name nodes in the service cluster have finished updating their lists and the target main name node has sent the metadata maintained by itself to the standby name node that becomes the main name node, each main name node in the service cluster is triggered to receive the data operation request of the client again.
Wherein, the number of the standby name nodes is multiple;
And when a plurality of standby name nodes simultaneously send the selected main application, the target main name node takes the standby name node corresponding to the first received selected main application as the main name node and adds the standby name node to the main name node list of the target main name node.
Each main name node periodically sends test information to each standby name node in the service cluster to test whether each standby name node works normally; when a target standby name node fails, the target standby name node is deleted from the main name node's own standby name node list and added to its own failure name node list, and meanwhile the information that the target standby name node has failed is synchronized to other name nodes in the service cluster.
The service cluster provides an interface for expanding the name nodes, and receives an application of a newly added name node through the interface;
The target main name node receives the application of the newly added name node, adds the newly added name node to its own standby name node list, and synchronizes the information of the newly added name node serving as a standby name node to other name nodes in the service cluster;
Alternatively,
And the target main name node receives the application of the newly added name node, sends test information to the newly added name node to test whether the newly added name node works normally, and adds the newly added name node to the standby name node list of the target main name node when the newly added name node works normally, so that the information of the newly added name node serving as the standby name node is synchronized to other name nodes in the service cluster.
The target main name node sends test information to the fault name node in the fault name node list according to a preset time interval to test whether the fault name node is recovered to be normal or not, when the fault name node is recovered to be normal, the fault name node recovered to be normal is deleted from the fault name list of the target main name node, the fault name node recovered to be normal is added into the standby name node list of the target main name node, and the information that the fault name node is recovered to be normal is synchronized to other name nodes in the service cluster.
A service cluster, comprising: at least one main name node and a plurality of standby name nodes, wherein each name node stores metadata and has node information, and the node information comprises a node identification value, a metadata identification value and an election value;
A first standby name node among the standby name nodes sends test information to the at least one main name node according to a preset period to test whether the at least one main name node works normally; when the at least one main name node fails, the first standby name node sends its own node information and information for electing itself to become the main name node to the other standby name nodes;
When the election value of the first standby name node is greater than the election value of the second standby name node: if the metadata identification value of the first standby name node is greater than the metadata identification value of the second standby name node, the second standby name node sends information for electing the first standby name node to become the main name node to the other standby name nodes as an election result; and if the metadata identification value of the first standby name node is equal to the metadata identification value of the second standby name node, the second standby name node sends the election result for electing the first standby name node to become the main name node to the other standby name nodes when the node identification value of the first standby name node is greater than (or, depending on the pre-configuration, less than) the node identification value of the second standby name node;
And the first standby name node sets itself as the main name node when the received election results show that more than a preset number of name nodes have elected it to become the main name node.
And when the election value of the first standby name node is equal to the election value of the second standby name node, the second standby name node judges whether its own metadata identification value is greater than the metadata identification value of the first standby name node; if so, the second standby name node adds 1 to its election value and sends the information that it elects itself to become the main name node to the other standby name nodes as the election result.
And the second standby name node sends the node information of the second standby name node to the first standby name node when the election value of the first standby name node is smaller than the election value of the second standby name node.
The technical scheme has the following beneficial effects:
According to the service cluster provided by the invention, a standby name node can learn whether a main name node has failed by sending test information to each main name node; when a main name node fails, it sends a main selection application to the target main name node to apply to become a main name node, and the target main name node adds the standby name node to the main name node list, or a plurality of standby name nodes elect a main name node through an election strategy. That is, the service cluster and the HDFS cluster provided by the invention can autonomously discover the failure information of name nodes, and when a certain main name node fails, a standby name node becomes a main name node and is added to the main name node list without relying on a Zookeeper cluster, so that the system architecture is simplified and the operation and maintenance cost is greatly reduced.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained from the provided drawings without creative effort.
Fig. 1 is a schematic structural diagram of a service cluster according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of an embodiment of a service cluster according to the present invention;
FIG. 3 is a schematic diagram of the service cluster shown in FIG. 2 after a primary name node fails and a backup name node becomes the primary name node;
FIG. 4 is a schematic diagram of the service cluster shown in FIG. 3 after a name node is extended;
FIG. 5 is a schematic diagram of the service cluster shown in FIG. 4 after the failed name node in the service cluster recovers;
Fig. 6a to Fig. 6d are schematic diagrams illustrating a process of electing a primary name node from among the standby name nodes in a service cluster according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
An embodiment of the present invention provides a service cluster, and the service cluster is applied to an HDFS cluster. Please refer to fig. 1, which shows a schematic structural diagram of the service cluster. The service cluster may include a plurality of primary name nodes and at least one standby name node. Each name node maintains three lists: a primary name node list, a standby name node list, and a failed name node list.
The primary name node list is a list of all current primary name nodes in the service cluster, the standby name node list is a list of all current standby name nodes in the service cluster, and the failed name node list is a list of all current failed name nodes in the service cluster.
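For illustration only, the following Java sketch shows one possible way a name node could keep the three lists described above; the class name NameNodeMembership and its methods are hypothetical and are not taken from the HDFS code base.

    import java.util.List;
    import java.util.concurrent.CopyOnWriteArrayList;

    // Hypothetical sketch of the three membership lists every name node maintains.
    // Thread-safe lists are used because health checks and synchronization
    // messages from other name nodes may arrive concurrently.
    public class NameNodeMembership {
        private final List<String> primaryList = new CopyOnWriteArrayList<>();  // current primary name nodes
        private final List<String> standbyList = new CopyOnWriteArrayList<>();  // current standby name nodes
        private final List<String> failedList  = new CopyOnWriteArrayList<>();  // current failed name nodes

        // Apply a membership change (e.g. one received from another name node),
        // so that all name nodes converge on the same three lists.
        public synchronized void moveNode(String node, List<String> from, List<String> to) {
            from.remove(node);
            if (!to.contains(node)) {
                to.add(node);
            }
        }

        public List<String> primaryList() { return primaryList; }
        public List<String> standbyList() { return standbyList; }
        public List<String> failedList()  { return failedList; }
    }

Every state change described below (a standby node becoming primary, a node failing, a failed node recovering) reduces to moving a node identifier between these three lists and broadcasting the change.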
The standby name node periodically sends test information to each primary name node in the service cluster to test whether each primary name node works normally. When one of the plurality of primary name nodes fails, the nth normally working primary name node in the primary name node list is used as the target primary name node, and a primary-selection application is sent to the target primary name node.
Here the value of n is 1, …, N, where N is the number of normally working primary name nodes in the primary name node list. It should be noted that which normally working primary name node in the primary name node list serves as the target primary name node is designated in advance. For example, the 1st normally working primary name node in the primary name node list may be designated in advance as the target primary name node, that is, the value of n is 1; in that case, when one of the primary name nodes fails, the standby name node sends the primary-selection application to the 1st normally working primary name node in the primary name node list. The primary-selection application is used by the standby name node to apply to become a primary name node.
After receiving the primary-selection application sent by the standby name node, the target primary name node triggers each primary name node to no longer receive data operation requests of the client, adds the standby name node to its own primary name node list, deletes the standby name node from its own standby name node list, deletes the failed primary name node from its own primary name node list, adds the failed primary name node to its own failed name node list, and synchronizes the information that the primary name node has failed and the standby name node has become a primary name node to the other name nodes in the service cluster, so that the other name nodes update their own lists based on the information.
It should be noted that, when there are multiple standby name nodes in the service cluster, multiple standby name nodes may simultaneously find that a certain primary name node has failed and simultaneously send primary-selection applications to the target primary name node. In this case, the target primary name node adds the standby name node corresponding to the first received primary-selection application to its own primary name node list as a primary name node.
In this embodiment, there are multiple primary name nodes in the service cluster, and these primary name nodes can receive and process operation requests from users at the same time, so that when the HDFS cluster is large in scale, load balancing of the clients can be achieved.
It should be noted that each primary name node in the service cluster also maintains metadata, and a primary name node processes the data operation requests of the client based on the metadata. It can be appreciated that a standby name node, after becoming a primary name node, needs to obtain the metadata of the primary name nodes in order to process data operation requests; therefore, in this embodiment, the target primary name node sends the metadata maintained by itself to the standby name node that becomes a primary name node, after adding that standby name node to its own primary name node list.
It can be understood that, after receiving the primary-selection application from the standby name node, the target primary name node needs to update the primary name nodes in the service cluster, and therefore triggers each primary name node to no longer receive data operation requests of the client, that is, it locks the HDFS cluster. When every normally working name node in the service cluster has updated its own lists and the target primary name node has synchronized its metadata to the standby name node that becomes a primary name node, the target primary name node triggers each primary name node in the service cluster to receive and process data operation requests of the client again, that is, it unlocks the HDFS cluster.
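The lock–update–synchronize–unlock sequence just described can be pictured with the Java sketch below. It is a simplified illustration under assumed names (TargetPrimaryFailover, onSelectionApplication, and the placeholder send/broadcast methods), not the implementation claimed by the patent.

    import java.util.LinkedHashSet;
    import java.util.Set;

    // Hypothetical sketch of the failover steps the target primary name node performs
    // after receiving a primary-selection application from a standby name node:
    // lock the cluster, update the three lists, hand over metadata, broadcast the
    // change, then unlock.
    public class TargetPrimaryFailover {
        private final Set<String> primaries = new LinkedHashSet<>();
        private final Set<String> standbys  = new LinkedHashSet<>();
        private final Set<String> failed    = new LinkedHashSet<>();
        private volatile boolean acceptingClientRequests = true;

        public TargetPrimaryFailover(Set<String> primaries, Set<String> standbys) {
            this.primaries.addAll(primaries);
            this.standbys.addAll(standbys);
        }

        public synchronized void onSelectionApplication(String failedPrimary, String applicant) {
            acceptingClientRequests = false;          // lock: stop serving data operation requests
            primaries.remove(failedPrimary);          // failed primary leaves the primary list
            failed.add(failedPrimary);                // ...and is recorded as failed
            standbys.remove(applicant);               // the applicant leaves the standby list
            primaries.add(applicant);                 // ...and becomes a primary name node
            sendMetadataTo(applicant);                // the new primary needs the current metadata
            broadcastMembershipChange(failedPrimary, applicant);
            acceptingClientRequests = true;           // unlock: resume serving requests
        }

        private void sendMetadataTo(String node) {
            System.out.println("sync metadata to " + node);   // placeholder for an RPC / file transfer
        }

        private void broadcastMembershipChange(String failedPrimary, String newPrimary) {
            System.out.println(failedPrimary + " failed; " + newPrimary + " is now primary");
        }

        public boolean isAcceptingClientRequests() { return acceptingClientRequests; }
    }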
The service cluster provided by the invention comprises a plurality of primary name nodes and at least one standby name node. The standby name node can learn whether a primary name node has failed by sending test information to each primary name node, and when a primary name node fails it can send a primary-selection application to the target primary name node to apply to become a primary name node, and the target primary name node can add the standby name node to the primary name node list. That is, the service cluster provided in the embodiment of the present invention can autonomously discover the failure information of name nodes; when a certain primary name node fails, a standby name node becomes a primary name node and is added to the primary name node list, and the cluster no longer depends on a Zookeeper cluster, thereby simplifying the system architecture and greatly reducing the operation and maintenance cost.
In addition, because a plurality of primary name nodes exist in the cluster at the same time, a plurality of data operation requests from the clients can be received and processed at the same time, so that load balancing of the clients is realized.
In a possible implementation manner, in the service cluster provided by the foregoing embodiment, the standby name node may periodically send a PING command (or a heartbeat packet) to each primary name node in the service cluster. For each PING command sent, if a response from the primary name node is received within a preset time, the primary name node is working normally; if no response is received within the preset time, the primary name node has failed.
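As a rough illustration of this liveness test, the Java sketch below uses InetAddress.isReachable as a stand-in for the PING command or heartbeat packet; the host names, the 3-second deadline, and the method names are assumptions, and a real deployment would more likely probe the name node's RPC port.

    import java.io.IOException;
    import java.net.InetAddress;
    import java.util.List;

    // Illustrative periodic liveness test a standby name node could run against
    // every primary name node.
    public class PrimaryHealthChecker {
        private static final int TIMEOUT_MS = 3000;   // assumed response deadline

        // Returns the first primary that fails to answer within the deadline,
        // or null if all primaries answered in time.
        public static String findFailedPrimary(List<String> primaryHosts) {
            for (String host : primaryHosts) {
                try {
                    if (!InetAddress.getByName(host).isReachable(TIMEOUT_MS)) {
                        return host;   // no response within the preset time: treat as failed
                    }
                } catch (IOException e) {
                    return host;       // unreachable or unresolvable: treat as failed
                }
            }
            return null;
        }

        public static void main(String[] args) {
            String failed = findFailedPrimary(List.of("nn1.example", "nn2.example"));
            System.out.println(failed == null ? "all primaries healthy"
                                              : failed + " appears to have failed");
        }
    }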
In the service cluster provided by the above embodiment, each standby name node can send test information to each primary name node in the service cluster to test whether the primary name node is working normally. However, a standby name node may also fail at some time, and a failed standby name node will affect the performance of the whole service cluster. In order to be able to discover the failure of a standby name node, in the service cluster provided by the above embodiment, each primary name node periodically sends test information to each standby name node in the service cluster to test whether each standby name node is working normally. When a target standby name node fails, the target standby name node is deleted from the primary name node's own standby name node list and added to its own failed name node list, and at the same time the information that the target standby name node has failed is synchronized to the other name nodes in the service cluster.
In some cases, for example, when the HDFS cluster is enlarged in size, the number of name nodes in the current service cluster may not meet the requirement, or when a plurality of name nodes in the current service cluster fail, it may be necessary to expand the size of the service cluster, that is, add a new name node in the service cluster.
In one possible implementation manner, when receiving an application of a new name node, a target primary name node adds the new name node as a standby name node to its own standby name node list, and then synchronizes information of the new name node serving as the standby name node to other name nodes in the service cluster.
Considering that a newly added name node may also fail for some reason, in order to ensure that a name node newly added to the service cluster is a name node capable of working normally, in another possible implementation manner, when the target primary name node receives the application of a newly added name node, it first sends test information to the newly added name node to test whether the newly added name node works normally. When the newly added name node works normally, the target primary name node adds the newly added name node to its own standby name node list and, at the same time, synchronizes the information that the newly added name node has become a standby name node to the other name nodes in the service cluster.
It should be noted that, in the foregoing implementation manners, the target primary name node adds the new name node to its own standby name node list and then synchronizes the information that the new name node has become a standby name node to the other name nodes in the service cluster.
In some cases, a failed name node in the cluster may recover. In order to reuse a recovered name node, in the service cluster provided in the above embodiment, the target primary name node may send test information to the failed name nodes in the failed name node list at preset time intervals to test whether a failed name node has recovered. When a failed name node recovers, in one possible implementation manner, the recovered name node is added to the target primary name node's own standby name node list and deleted from its own failed name node list, and at the same time the information that the failed name node has recovered and become a standby name node is synchronized to the other name nodes in the service cluster. In another possible implementation manner, if the recovered name node was previously a primary name node, the target primary name node may add it to its own primary name node list and delete it from its own failed name node list, and then synchronize the information that the failed name node has recovered and become a primary name node to the other name nodes in the service cluster; in addition, the target primary name node also needs to synchronize the metadata maintained by itself to the recovered name node. If the recovered name node was previously a standby name node, the target primary name node may add it to its own standby name node list and remove it from its own failed name node list, and then synchronize the information that the failed name node has recovered and become a standby name node to the other name nodes in the service cluster.
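The periodic re-probing of failed name nodes could be scheduled as in the Java sketch below; the class name, the probe placeholder, and the reinstate-as-standby behavior correspond to the first implementation manner above and are illustrative assumptions.

    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // Illustrative sketch of the target primary name node re-probing failed name
    // nodes at a fixed interval and moving recovered nodes back to the standby list.
    public class FailedNodeReprober {
        private final Set<String> failedNodes  = ConcurrentHashMap.newKeySet();
        private final Set<String> standbyNodes = ConcurrentHashMap.newKeySet();
        private final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();

        public void start(long intervalSeconds) {
            scheduler.scheduleAtFixedRate(this::reprobeAll, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
        }

        private void reprobeAll() {
            for (String node : failedNodes) {
                if (probe(node)) {                    // the node answered the test information
                    failedNodes.remove(node);
                    standbyNodes.add(node);           // reinstate it as a standby name node
                    System.out.println(node + " recovered and rejoined as a standby name node");
                    // a real implementation would also broadcast this change to the
                    // other name nodes, as described above
                }
            }
        }

        private boolean probe(String node) {
            return false;   // placeholder: send test information and wait for a reply
        }
    }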
The service cluster provided in the foregoing embodiment may further include a synchronization node. A primary name node in the service cluster sends the information that the state of a name node has changed to the other name nodes through the synchronization node, so that the other name nodes update their own lists based on the information.
On the basis of the above-mentioned embodiments of the present invention, a specific example is given below for explanation:
Referring to fig. 2, a structural diagram of an embodiment of a service cluster is shown. The service cluster includes two primary name nodes (Active NameNodes, abbreviated Active NN), NN1 and NN2, and two standby name nodes (Standby NameNodes, abbreviated Standby NN), NN3 and NN4. The two primary name nodes NN1 and NN2 can receive and process data operation requests from clients simultaneously, achieving load balancing of the clients.
NN1, NN2, NN3, and NN4 each maintain a primary name node list, a standby name node list, and a failed name node list; that is, the primary name node list includes NN1 and NN2, the standby name node list includes NN3 and NN4, and the failed name node list is empty for the moment.
(1) The standby name node detects the failure of a primary name node:
The NN3 periodically sends test information (e.g., a ping command) to the NN1 and the NN2 to test whether the NN1 and the NN2 are working properly. Assume that after the NN3 sends a ping command to the NN2, no response to the ping command is received within a preset time, indicating that the NN2 has failed. At this time, the NN3 sends a primary-selection application to the first normally working primary name node NN1 in the primary name node list. After the NN1 receives the primary-selection application, it immediately triggers locking of the HDFS cluster (the HDFS cluster does not receive any data operation request), adds NN3 to the primary name node list as a primary name node, deletes NN3 from the standby name node list, removes the NN2 from the primary name node list, and adds the NN2 to the failed name node list. Then, the information that the NN2 has failed and the NN3 has become a primary name node is synchronized to the NN3 and the NN4, and the NN3 and the NN4 update their own lists with the information. In addition, each primary name node maintains metadata and processes data operation requests based on the metadata; since the NN3 has become a primary name node, in order to enable it to process data operation requests, the NN1 needs to synchronize its own metadata to the NN3. After the list and metadata updates are completed, the NN1 triggers unlocking of the HDFS cluster, so that the cluster receives data operation requests again.
It should be noted that the NN4 performs the same operations as the NN3, which are not described herein again. When the NN2 fails, the NN3 and the NN4 may simultaneously send primary-selection applications to the NN1; if the NN1 first receives the primary-selection application sent by the NN3, the NN3 is added to the primary name node list as a primary name node, whereas if the NN1 first receives the primary-selection application sent by the NN4, the NN4 is added to the primary name node list as a primary name node. Assuming that the NN3 becomes a primary name node, please refer to fig. 3: at this time, the primary name nodes in the service cluster are NN1 and NN3, the standby name node is NN4, and the failed name node is NN2, i.e., the primary name node list includes NN1 and NN3, the standby name node list includes NN4, and the failed name node list includes NN2.
(2) The primary name node detects the failure of a standby name node:
The primary name node NN1 in fig. 3 sends test information (e.g., a heartbeat packet) to the standby name node NN4; if no response to the test information is received within a preset time, the NN4 has failed, otherwise the NN4 is working normally. If the NN4 fails, the NN4 is deleted from the standby name node list and added to the failed name node list, and meanwhile the information that the NN4 has failed is synchronized to the NN3, which updates its own list based on the information. The NN3 performs the same operations as the NN1, which are not described herein again.
(3) Extension of name node:
The service cluster provides an interface (e.g., a REST API) for extending name nodes, and receives the application of a new name node through this interface. When the service cluster receives the application of a new name node, the first normally working primary name node in the primary name node list sends test information (e.g., a ping command) to the new name node to test whether the new name node can work normally. Taking fig. 3 as an example, the first normally working primary name node in the primary name node list is NN1, i.e., NN1 sends a ping command to the new name node NN5. If the new name node NN5 can work normally, NN5 is added to the standby name node list as a standby name node, and the information that the new name node NN5 has become a standby name node is synchronized to the NN3 and the NN4, which update their own lists based on the information. In addition, since the NN5 is a new name node, the NN1 needs to synchronize all current list information to the NN5. Please refer to fig. 4, which shows the service cluster after adding NN5: at this time, the primary name nodes in the service cluster are NN1 and NN3, the standby name nodes are NN4 and NN5, and the failed name node is NN2, that is, the primary name node list includes NN1 and NN3, the standby name node list includes NN4 and NN5, and the failed name node list includes NN2.
It should be noted that, in addition to the above implementation manner, the new name node NN5 may also be added to the primary name node list as a primary name node, and the information that the new name node NN5 has become a primary name node is synchronized to the NN3 and the NN4, which update their own lists based on the information. In addition, since the new name node NN5 becomes a primary name node, it needs to process data operation requests from clients; therefore, the primary name node NN1 needs to synchronize its own metadata to the NN5.
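As a minimal sketch of such an extension interface, the Java example below exposes a join endpoint using the JDK's built-in HTTP server. The /join path, the host query parameter, the port, and the probe placeholder are all assumptions for illustration; the text only requires an interface such as a REST API.

    import com.sun.net.httpserver.HttpServer;
    import java.io.IOException;
    import java.io.OutputStream;
    import java.net.InetSocketAddress;
    import java.nio.charset.StandardCharsets;
    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    // Illustrative join endpoint on the target primary name node: a new name node
    // applies to join, is probed once, and on success is recorded as a standby.
    public class JoinEndpoint {
        private static final Set<String> standbyNodes = ConcurrentHashMap.newKeySet();

        public static void main(String[] args) throws IOException {
            HttpServer server = HttpServer.create(new InetSocketAddress(8099), 0);
            server.createContext("/join", exchange -> {
                String query = exchange.getRequestURI().getQuery();   // e.g. "host=nn5.example"
                String host = (query != null && query.startsWith("host="))
                        ? query.substring("host=".length()) : null;
                String reply;
                if (host != null && probe(host)) {        // test the new node before accepting it
                    standbyNodes.add(host);               // added to the standby name node list
                    reply = host + " accepted as standby name node";
                    // a real implementation would now broadcast the change and send the
                    // current list information to the new node, as described above
                } else {
                    reply = "join rejected";
                }
                byte[] body = reply.getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream os = exchange.getResponseBody()) {
                    os.write(body);
                }
            });
            server.start();
            System.out.println("join endpoint listening on :8099");
        }

        private static boolean probe(String host) {
            return true;    // placeholder for the test information described above
        }
    }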
(4) Self-detection of failed name node:
Taking fig. 4 as an example, the first normally working primary name node NN1 in the primary name node list sends test information (e.g., a ping command) to the failed name node NN2 at preset time intervals to test whether the NN2 has recovered. If the NN2 has recovered, the NN2 is added to the standby name node list and removed from the failed name node list; then the information that the NN2 has recovered is synchronized to the NN3, the NN4, and the NN5, which update their own lists based on the information. In addition, the NN1 also needs to synchronize its current list information to the NN2. Fig. 5 shows the service cluster after the NN2 state is updated: at this time, the primary name nodes in the service cluster are NN1 and NN3 and the standby name nodes are NN4, NN5, and NN2, that is, the primary name node list includes NN1 and NN3, the standby name node list includes NN4, NN5, and NN2, and there is no name node in the failed name node list.
It should be noted that, since the NN2 was previously a primary name node, when the NN2 recovers it may also be added to the primary name node list and deleted from the failed name node list; then the information that the NN2 has recovered and become a primary name node is synchronized to the NN3, the NN4, and the NN5, which update their own lists based on the information. In addition, since the NN2 becomes a primary name node and needs to process data operation requests from clients, the NN1 needs to synchronize its own metadata to the NN2.
The embodiment of the present invention further provides another service cluster, and this service cluster may include: at least one primary name node and a plurality of standby name nodes, wherein each name node stores metadata and has node information, and the node information includes a node identification value (NNID), a metadata identification value (MetaID), and an election value (TransNumber).
The metadata identification values of the standby name nodes are obtained from the primary name node, and in theory the metadata identification values of the standby name nodes are the same. However, the primary name node may fail, and when it fails it may have synchronized the latest metadata identification value to only a part of the standby name nodes, while the other standby name nodes were not updated because they did not receive the metadata identification value sent by the primary name node; therefore the metadata identification values of the standby name nodes may differ. The larger the metadata identification value of a standby name node is, the newer its metadata is, so in this embodiment the primary name node is elected from among the standby name nodes based on the metadata identification value. The node identification value of each standby name node does not change; on the one hand it is used to identify and distinguish the standby name nodes, and on the other hand, when the metadata identification values of standby name nodes are the same, it is used as the basis for further electing the primary name node. The election value reflects the election round a standby name node is currently in. For example, when a standby name node receives the election information and node information sent by another standby name node and finds that its own metadata identification value is larger than the one in the received node information, it can initiate a new round of election and add 1 to its election value.
A first standby name node among the standby name nodes sends test information to the at least one primary name node according to a preset period to test whether the at least one primary name node works normally. When the at least one primary name node fails, the first standby name node sends its own node information, together with information electing itself to become the primary name node, to the other standby name nodes.
When the election value of the first standby name node is greater than the election value of the second standby name node, if the metadata identification value of the first standby name node is greater than the metadata identification value of the second standby name node, the second standby name node sends the information that the first standby name node is elected to become the primary name node to the other standby name nodes as the election result.
It should be noted that, when the election value of the first standby name node is greater than its own election value, the metadata identification value of the first standby name node may be equal to its own metadata identification value; at this time, whether to agree to elect the first standby name node as the primary name node may be further determined based on the node identification value. In one possible implementation, it may be pre-configured that the second standby name node agrees to elect the first standby name node as the primary name node when the node identification value of the first standby name node is greater than that of the second standby name node; in another possible implementation, it may be pre-configured that the second standby name node agrees to elect the first standby name node as the primary name node when the node identification value of the first standby name node is less than that of the second standby name node.
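The comparison order described above (election value first, then metadata identification value, then node identification value as tie-breaker) can be summarized in the Java sketch below; the field names and the "larger NNID wins" tie-break are assumptions corresponding to one of the two pre-configurable choices.

    // Illustrative comparison a standby name node could apply when it receives
    // another node's election message.
    public class ElectionComparator {
        public static final class NodeInfo {
            final long nnId;        // node identification value, fixed per node
            final long metaId;      // metadata identification value, larger = newer metadata
            final long transNumber; // election value, i.e. current election round

            public NodeInfo(long nnId, long metaId, long transNumber) {
                this.nnId = nnId;
                this.metaId = metaId;
                this.transNumber = transNumber;
            }
        }

        // true if the receiving node should agree to elect the candidate as primary
        public static boolean voteForCandidate(NodeInfo candidate, NodeInfo self) {
            if (candidate.transNumber < self.transNumber) {
                return false;                              // stale election round: ignore
            }
            if (candidate.metaId != self.metaId) {
                return candidate.metaId > self.metaId;     // newer metadata wins
            }
            return candidate.nnId > self.nnId;             // assumed tie-break: larger NNID wins
        }

        public static void main(String[] args) {
            NodeInfo nn1 = new NodeInfo(1, 3, 1);
            NodeInfo nn2 = new NodeInfo(2, 2, 1);
            // NN1 holds newer metadata, so NN2 would agree to elect NN1 rather than itself
            System.out.println("NN2 votes for NN1: " + voteForCandidate(nn1, nn2));
        }
    }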
The first standby name node sets itself as the primary name node when the received election results show that more than a preset number of name nodes have elected it to become the primary name node.
When the election value of the first standby name node is equal to its own election value, the second standby name node judges whether its own metadata identification value is larger than the metadata identification value of the first standby name node; if so, the second standby name node adds 1 to its election value and sends the information that it elects itself to become the primary name node to the other standby name nodes as the election result.
It should be noted that, when each standby name node sends an election result, it may send its own node information along with it, so that the other standby name nodes can update their own node information when they find that theirs is older.
And the second standby name node, when the election value of the first standby name node is smaller than its own election value, sends its own node information to the first standby name node so that the first standby name node can update its node information.
It should be noted that, in one possible implementation manner, the first standby name node may determine whether the election results of all standby name nodes have been received; if so, it directly sets its own role based on the election results. Specifically, if more than the preset number of name nodes in the election results elect it to become the primary name node, it sets its own role as a primary name node; otherwise, it sets its own role as a standby name node. In another possible implementation manner, the first standby name node does not need to wait until it has received the election results of all standby name nodes before setting its own role; instead, it counts in real time, based on the received election results, the number of standby name nodes that currently elect it to become the primary name node. If this number is greater than the preset number, it waits for a preset time length; if, within the preset time length, it does not receive node information sent by another standby name node that is better than its own (for example, node information whose metadata identification value is greater than its own metadata identification value), it sets itself as the primary name node; otherwise, it performs the next round of election. After setting its role as the primary name node, the first standby name node synchronizes the information that it has become the primary name node to the other name nodes.
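The quorum check can be illustrated with the Java sketch below; the class name, the more-than-half threshold, and the vote bookkeeping are assumptions consistent with the worked example later in this description.

    import java.util.HashMap;
    import java.util.Map;

    // Illustrative vote counter: a standby name node tallies the election results it
    // receives and may declare itself primary once more than the preset number of
    // name nodes (here, more than half of the participants) have elected it.
    public class VoteCounter {
        private final int totalNodes;
        private final Map<String, String> latestVoteByNode = new HashMap<>();

        public VoteCounter(int totalNodes) {
            this.totalNodes = totalNodes;
        }

        // Record the most recent election result received from a node;
        // a later round from the same voter overwrites the earlier one.
        public void recordVote(String voter, String votedFor) {
            latestVoteByNode.put(voter, votedFor);
        }

        // true once strictly more than half of the participating nodes elect the candidate
        public boolean hasQuorum(String candidate) {
            long votes = latestVoteByNode.values().stream()
                    .filter(candidate::equals)
                    .count();
            return votes > totalNodes / 2;
        }

        public static void main(String[] args) {
            VoteCounter counter = new VoteCounter(3);
            counter.recordVote("NN1", "NN1");   // NN1 votes for itself
            counter.recordVote("NN2", "NN1");   // NN2 accepts NN1 after comparing MetaIDs
            System.out.println("NN1 may become primary: " + counter.hasQuorum("NN1"));
        }
    }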
For each standby name node, there are three possible states in the current election process: LOOKING (electing state), FOLLOWING (standby node state), and LEADING (primary node state). The operations of the second standby name node described above are those performed when the first standby name node is in the LOOKING state.
When the first standby name node is in the FOLLOWING or LEADING state, for the second standby name node: if the election value of the second standby name node is the same as that of the first standby name node and the first standby name node is in the LEADING state, the second standby name node judges whether more than the preset number of standby name nodes have elected the first standby name node as the primary name node; if so, it exits the election; if not, it waits for a preset time, and if after the preset time the first standby name node has still not been elected as the primary name node by more than the preset number of standby name nodes, it initiates the next round of election. If the election value of the second standby name node is different from that of the first standby name node, there are two cases: in the first case, the election value of the second standby name node is larger than that of the first standby name node, and re-election needs to be performed based on the metadata identification value; in the second case, the election value of the second standby name node is smaller than that of the first standby name node, and the second standby name node needs to update its own node information.
The process by which the standby name nodes elect a primary name node is described below by way of an example:
(1) Assume that the standby name nodes include NN1, NN2, and NN3, and that NN2 is the first to discover that a primary name node is down. NN2 elects itself as the primary name node and broadcasts its node information, together with the information that it elects itself as the primary name node, to NN1 and NN3, as shown in fig. 6a.
(2) For NN1, after receiving the data of NN2, it finds that its own MetaID is 3 while that of NN2 is 2, i.e., its own metadata is newer; it therefore objects to electing NN2 as the primary name node, adds 1 to its TransNumber, and broadcasts its node information, together with the information that it elects itself as the primary name node, to NN2 and NN3. For NN3, after receiving the data of NN2, it finds that its own MetaID is older, updates its node information, and then broadcasts its node information, together with the information electing NN2 to become the primary name node, to NN1 and NN2. The data transmission process is shown in fig. 6b.
(3) As shown in fig. 6b, when NN2 receives the data from NN1, the TransNumber of NN2 is smaller than the TransNumber in the node information of NN1, so NN1 is in a newer round of election and NN2 sets its TransNumber to 2; and since the MetaID of NN2 is smaller than the MetaID in the node information of NN1, NN2 updates its MetaID, accepts NN1 as the primary name node, and broadcasts its updated node information, together with the information electing NN1 to become the primary name node, to NN1 and NN3.
At this time, when NN2 receives the data from NN3, the TransNumber of NN2 is 2 while the TransNumber of the received data is 1, indicating that NN3 is in an earlier round of election, so NN2 sends its own information to NN3. At the same time, NN1 also receives the data from NN3, and the TransNumber of NN1, which is 2, is greater than the TransNumber of the received data, so NN1 sends its own information to NN3.
Suppose that NN3 then receives the data from NN1. The TransNumber of the received data, 2, is greater than NN3's own TransNumber, so this is a newer round of election, and the MetaID of NN3, which is 2, is less than the MetaID of the received data, so NN3 updates its own information and sends information agreeing to elect NN1 as the primary name node to NN1 and NN2. The data transmission process of each node at this time is shown in fig. 6c.
(4) When NN1 receives the data from NN2, more than half of the nodes in its list have elected it as the primary name node, so it changes its role to the primary name node, broadcasts the information that it is the primary name node to NN2 and NN3, and exits the election process, as shown in fig. 6d. NN3 receives the election results of NN1 and NN2, both of which elect NN1 as the primary name node, so NN3 sets its own role as a standby name node, broadcasts this to the other nodes, and exits the election; NN2 likewise sets its own role as a standby name node, and the entire election process ends.
According to the service cluster provided by the invention, the standby name nodes can learn whether the primary name node has failed by sending test information to the primary name node, and when the primary name node fails, the standby name nodes can elect a primary name node through an election strategy. That is, the service cluster and the HDFS cluster provided by the invention can autonomously discover the failure information of the primary name node, and when a certain primary name node fails, a standby name node becomes the primary name node without relying on a Zookeeper cluster, so the system architecture is simplified and the operation and maintenance cost is greatly reduced.
The embodiment of the invention further provides an HDFS cluster, which may include the service cluster provided in any of the above embodiments.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
In the several embodiments provided in the present application, it should be understood that the disclosed method, apparatus, and device may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment. In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
the previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (7)

1. A service cluster, the service cluster comprising: the system comprises a plurality of main name nodes and at least one standby name node, wherein each name node maintains a main name node list, a standby name node list and a failure name node list;
The standby name node sends test information to each main name node according to a preset period to test whether each main name node works normally; when one main name node fails, the nth normally working main name node in the main name node list is used as a target main name node, and a main selection application is sent to the target main name node, wherein the main selection application is used by the standby name node to apply to become a main name node, the value of n is 1, …, N, and N is the number of normally working main name nodes in the main name node list;
After receiving the primary application of selection sent by the standby name node, the target primary name node triggers each primary name node not to receive the data operation request of the client any more, adds the standby name node to its primary name node list, deletes the standby name node from its standby name node list, deletes the failed primary name node from its primary name node list, adds the failed primary name node to its failed name node list, and synchronizes information that the primary name node fails and the standby name node becomes the primary name node to other name nodes in the service cluster, so that the other name nodes update their lists based on the information.
2. The service cluster of claim 1, wherein each primary name node further maintains metadata;
And after the target main name node adds the standby name node to the main name node list of the target main name node, the target main name node sends metadata maintained by the target main name node to the standby name node which becomes the main name node.
3. The service cluster of claim 2, wherein the target primary name node triggers each primary name node in the service cluster to receive the data operation request of the client again after the other name nodes in the service cluster have updated their lists and the target primary name node has sent the metadata maintained by itself to the standby name node that becomes the primary name node.
4. The service cluster of claim 1, wherein there are a plurality of the standby name nodes;
And when a plurality of standby name nodes simultaneously send the selected main application, the target main name node takes the standby name node corresponding to the first received selected main application as the main name node and adds the standby name node to the main name node list of the target main name node.
5. The service cluster of claim 1, wherein each primary name node periodically sends test information to each backup name node in the service cluster to test whether each backup name node is working properly, and when a target backup name node fails, the target backup name node is deleted from its own list of backup name nodes and added to its own list of failed name nodes, and simultaneously, the information that the target backup name node fails is synchronized to other name nodes in the service cluster.
6. The service cluster of claim 1, wherein the service cluster provides an interface for extending name nodes, and the service cluster receives a request for a new name node through the interface;
The target main name node receives the application of the new name node, adds the new name node to the standby name node list of the target main name node, and synchronizes information of the new name node serving as the standby name node to other name nodes in the service cluster;
Alternatively,
And the target main name node receives the application of the newly added name node, sends test information to the newly added name node to test whether the newly added name node works normally, and adds the newly added name node to the standby name node list of the target main name node when the newly added name node works normally, so that the information of the newly added name node serving as the standby name node is synchronized to other name nodes in the service cluster.
7. The service cluster of claim 1 or 5, wherein the target primary name node sends test information to the failed name node in the failed name node list at a preset time interval to test whether the failed name node recovers to normal, when the failed name node recovers to normal, the failed name node recovering to normal is deleted from the own failed name list, and the failed name node recovering to normal is added to the own backup name node list, and the information that the failed name node recovers to normal is synchronized to other name nodes in the service cluster.
CN201611187600.9A 2016-12-20 2016-12-20 service cluster Active CN107046474B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611187600.9A CN107046474B (en) 2016-12-20 2016-12-20 service cluster

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611187600.9A CN107046474B (en) 2016-12-20 2016-12-20 service cluster

Publications (2)

Publication Number Publication Date
CN107046474A CN107046474A (en) 2017-08-15
CN107046474B true CN107046474B (en) 2019-12-13

Family

ID=59542835

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611187600.9A Active CN107046474B (en) 2016-12-20 2016-12-20 service cluster

Country Status (1)

Country Link
CN (1) CN107046474B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107920131A (en) * 2017-12-08 2018-04-17 郑州云海信息技术有限公司 A kind of metadata management method and device of HDFS storage systems
CN109995835A (en) * 2017-12-29 2019-07-09 浙江宇视科技有限公司 Host node electoral machinery, device and distributed memory system
CN110290159B (en) * 2018-03-19 2022-06-28 中移(苏州)软件技术有限公司 Method and equipment for scheduling management
CN111897697A (en) * 2020-08-11 2020-11-06 腾讯科技(深圳)有限公司 Server hardware fault repairing method and device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103530200A (en) * 2012-07-04 2014-01-22 腾讯科技(深圳)有限公司 Server hot backup system and method
CN103580915A (en) * 2013-09-26 2014-02-12 东软集团股份有限公司 Method and device for determining main control node of trunking system
CN103581332A (en) * 2013-11-15 2014-02-12 武汉理工大学 HDFS framework and pressure decomposition method for NameNodes in HDFS framework
CN104320459A (en) * 2014-10-24 2015-01-28 杭州华三通信技术有限公司 Node management method and device
CN104461792A (en) * 2014-12-03 2015-03-25 浪潮集团有限公司 HA method for solving NAMENODE single-point fault of HADOOP distributed file system
CN104572357A (en) * 2014-12-30 2015-04-29 清华大学 Backup and recovery method for HDFS (Hadoop distributed filesystem)
CN104951475A (en) * 2014-03-31 2015-09-30 中国电信股份有限公司 Distributed file system and implementation method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9690671B2 (en) * 2013-11-01 2017-06-27 Cloudera, Inc. Manifest-based snapshots in distributed computing environments

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103530200A (en) * 2012-07-04 2014-01-22 腾讯科技(深圳)有限公司 Server hot backup system and method
CN103580915A (en) * 2013-09-26 2014-02-12 东软集团股份有限公司 Method and device for determining main control node of trunking system
CN103581332A (en) * 2013-11-15 2014-02-12 武汉理工大学 HDFS framework and pressure decomposition method for NameNodes in HDFS framework
CN104951475A (en) * 2014-03-31 2015-09-30 中国电信股份有限公司 Distributed file system and implementation method
CN104320459A (en) * 2014-10-24 2015-01-28 杭州华三通信技术有限公司 Node management method and device
CN104461792A (en) * 2014-12-03 2015-03-25 浪潮集团有限公司 HA method for solving NAMENODE single-point fault of HADOOP distributed file system
CN104572357A (en) * 2014-12-30 2015-04-29 清华大学 Backup and recovery method for HDFS (Hadoop distributed filesystem)

Also Published As

Publication number Publication date
CN107046474A (en) 2017-08-15

Similar Documents

Publication Publication Date Title
US10713134B2 (en) Distributed storage and replication system and method
CN107046474B (en) service cluster
US10728099B2 (en) Method for processing virtual machine cluster and computer system
US11271814B2 (en) Online capacity-expanding and online capacity-reducing methods and apparatuses for distributed consensus system
US10038593B2 (en) Method and system for recovering virtual network
CN107404509B (en) Distributed service configuration system and information management method
CN102394914A (en) Cluster brain-split processing method and device
CN103581276A (en) Cluster management device and system, service client side and corresponding method
CN109639773A (en) A kind of the distributed data cluster control system and its method of dynamic construction
CN108173971A (en) A kind of MooseFS high availability methods and system based on active-standby switch
CN107800551A (en) Redis group systems and its method, the client for improving reliability
CN111314117A (en) Fault transfer method, device, equipment and readable storage medium
EP3570169B1 (en) Method and system for processing device failure
CN113794765A (en) Gate load balancing method and device based on file transmission
CN105323271B (en) Cloud computing system and processing method and device thereof
CN112328685A (en) Full-peer distributed database data synchronization method
JP2003203018A (en) Pseudo cluster system using san
CN116346834A (en) Session synchronization method, device, computing equipment and computer storage medium
CN114301763A (en) Distributed cluster fault processing method and system, electronic device and storage medium
CN114363350A (en) Service management system and method
CN113890880A (en) Method, system, equipment and storage medium for data synchronization among multiple nodes
CN107153594B (en) HA component main selection method and system of distributed database system
WO2023169133A1 (en) Traffic disaster recovery backup method, traffic disaster recovery backup system and tof node
CN115242701B (en) Airport data platform cluster consumption processing method, device and storage medium
CN111866041B (en) Service equipment selection method, cloud storage cluster updating method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant