WO2018192533A1 - Node device operation method, working state switching device, node device, and medium - Google Patents

Info

Publication number
WO2018192533A1
Authority
WO
WIPO (PCT)
Prior art keywords
node device
state
node
running
voting request
Prior art date
Application number
PCT/CN2018/083594
Inventor
郭锐
李茂材
梁军
屠海涛
赵琦
王宗友
张建俊
朱大卫
刘斌华
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Priority to EP18787276.7A (patent EP3562123B1)
Publication of WO2018192533A1
Priority to US16/510,723 (patent US10833919B2)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751Error or fault detection not based on redundancy
    • G06F11/0754Error or fault detection not based on redundancy by exceeding limits
    • G06F11/0757Error or fault detection not based on redundancy by exceeding limits by exceeding a time limit, i.e. time-out, e.g. watchdogs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/004Error avoidance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0709Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0766Error or fault reporting or storing
    • G06F11/0772Means for error signaling, e.g. using interrupts, exception flags, dedicated error registers
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06Management of faults, events, alarms or notifications
    • H04L41/0654Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0668Management of faults, events, alarms or notifications using network fault recovery by dynamic selection of recovery network elements, e.g. replacement by the most appropriate element after failure
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/104Peer-to-peer [P2P] networks
    • H04L67/1044Group management mechanisms 
    • H04L67/1051Group master selection mechanisms
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/104Peer-to-peer [P2P] networks
    • H04L67/1059Inter-group management mechanisms, e.g. splitting, merging or interconnection of groups
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/32Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
    • H04L9/3236Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials using cryptographic hash functions
    • H04L9/3239Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials using cryptographic hash functions involving non-keyed hash functions, e.g. modification detection codes [MDCs], MD5, SHA or RIPEMD
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/50Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols using hash chains, e.g. blockchains or hash trees
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793Remedial or corrective actions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/83Indexing scheme relating to error detection, to error correction, and to monitoring the solution involving signatures

Definitions

  • The present application relates to the field of network technologies, and in particular to a node device operation method, a working state switching device, a node device, and a medium.
  • The BFT-Raft (Byzantine Fault Tolerance Raft) algorithm can be applied to node devices.
  • Under this algorithm, the working state of a node device is one of three types: the following state (follower), the candidate state (candidate), and the leader state (leader).
  • When a node device a is in the following state, it can determine from the heartbeat information broadcast by the node device b running in the leader state in the cluster that node device b is operating normally, and it replicates the log as instructed by node device b.
  • If node device a does not receive heartbeat information from node device b for a period of time, it may determine that node device b has failed. Node device a may then switch to the candidate state and broadcast a voting request to each node device in the cluster. Once it receives votes from more than half of the node devices in the cluster, it can switch to the leader state.
  • It should be noted that a node device running normally in the leader state automatically ignores any voting request or heartbeat information it receives.
  • The cluster may split into two sub-clusters whose networks are isolated from each other, such as sub-cluster A and sub-cluster B.
  • Suppose sub-cluster A includes the node device a running in the leader state in the cluster, and sub-cluster A has fewer node devices than sub-cluster B. The node devices in sub-cluster B can elect a new node device b in the leader state, and when node device b fails, the node devices in sub-cluster B that are in the candidate state broadcast voting requests again.
  • When node device a receives such a voting request, it ignores the request. Even if a node device c in sub-cluster B then switches to the leader state, node device a also ignores the heartbeat information of node device c. Node device a therefore cannot work together with sub-cluster B as one system, and the system has low operational reliability.
  • The embodiments of the present application provide a node device operation method, a working state switching device, a node device, and a medium, which can solve the problem of low operational reliability.
  • The technical solutions are as follows:
  • A node device operation method is provided, applied to a first node device, where the method includes:
  • the first node device receives voting requests from a plurality of second node devices, where the number of the plurality of second node devices is greater than half of the number of node devices in the system, and the plurality of second node devices are node devices in the system other than the first node device;
  • if the running cycle information in the voting requests of the plurality of second node devices is greater than the running cycle information of the first node device, and the latest log index in the voting requests of the plurality of second node devices is not less than the latest log index of the first node device, the working state of the first node device is switched from the leader state to the following state or the candidate state.
  • A working state switching device is provided, applied to a first node device, including:
  • a receiving module, configured to receive voting requests from a plurality of second node devices, where the number of the plurality of second node devices is greater than half of the number of node devices in the system;
  • an obtaining module, configured to obtain the running cycle information and the latest log index from the voting requests of the plurality of second node devices if the first node device is running in the leader state;
  • a running module, configured to switch the working state of the first node device from the leader state to the following state or the candidate state if the running cycle information in the voting requests of the plurality of second node devices is greater than the running cycle information of the first node device, and the latest log index in the voting requests of the plurality of second node devices is not less than the latest log index of the first node device.
  • A node device is provided, comprising:
  • one or more processors; and
  • one or more memories for storing instructions executed by the one or more processors;
  • where the one or more processors are configured to execute the instructions to perform the following steps of the node device operation method:
  • if the running cycle information in the voting requests of a plurality of node devices is greater than the running cycle information of the node device, and the latest log index in the voting requests of the plurality of node devices is not less than the latest log index of the node device, switching the working state of the node device from the leader state to the following state or the candidate state.
  • A computer-readable storage medium is provided, storing instructions executable by a processor in a device to perform the node device operation method.
  • In the embodiments of the present application, when the system recovers from a split to normal, a node device that receives multiple voting requests obtains the running cycle information and the latest log index from the voting requests. If the obtained running cycle information is greater than the running cycle information of the current node device, and the obtained latest log index is not less than the latest log index of the current node device, the node device runs in the following state or the candidate state.
  • The node device running in the leader state in the first sub-cluster can thus be downgraded to the following state or the candidate state, so that any node device in the first sub-cluster can participate in the election together with the node devices in the second sub-cluster until a node device in the new leader state appears.
  • The first sub-cluster can then work together with the second sub-cluster as one system, which improves the operational reliability of the system.
  • FIG. 1A is a schematic diagram of an implementation environment of node device operation according to an embodiment of the present application.
  • FIG. 1B is a schematic diagram of switching of the working state of a node device according to an embodiment of the present application.
  • FIG. 2 is a flowchart of a node device operation method according to an embodiment of the present application.
  • FIG. 3 is a flowchart of a node device operation method according to an embodiment of the present application.
  • FIG. 4 is a schematic block diagram of a working state switching device according to an embodiment of the present application.
  • FIG. 5 is a schematic block diagram of a working state switching device according to an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a node device according to an embodiment of the present application.
  • FIG. 1A is a schematic diagram of an implementation environment of a node device running according to an embodiment of the present application.
  • The implementation environment is a system composed of a plurality of node devices; the system is equivalent to a cluster.
  • Node device 1 is the node device running in the leader state in the system.
  • Node device 1 periodically broadcasts heartbeat information to each node device running in the following state, such as node device 2, node device 3, and node device 4. On receiving the heartbeat information, each node device running in the following state can determine that node device 1 is running normally, reset its timer, and wait for the next heartbeat information.
  • The timer duration is usually a random value between 0.5 and 1 second, which prevents the timers of different node devices from expiring at the same time and causing repeated elections.
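  • The randomized timer described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `FollowerTimer`, `new_election_timeout`, and the two bound constants are hypothetical, and only the 0.5 to 1 second range comes from the text.

```python
import random

# Bounds taken from the "0.5 to 1 second" range in the text (names are hypothetical).
MIN_TIMEOUT_S = 0.5
MAX_TIMEOUT_S = 1.0


def new_election_timeout(rng: random.Random) -> float:
    """Return a random timer duration in seconds within the configured range."""
    return rng.uniform(MIN_TIMEOUT_S, MAX_TIMEOUT_S)


class FollowerTimer:
    """Tracks the deadline by which a follower expects the next heartbeat."""

    def __init__(self, rng: random.Random, now: float) -> None:
        self._rng = rng
        self.deadline = 0.0
        self.reset(now)

    def reset(self, now: float) -> None:
        # Called on every heartbeat: pick a fresh random duration.
        self.deadline = now + new_election_timeout(self._rng)

    def expired(self, now: float) -> bool:
        return now >= self.deadline
```

  • Because each node device draws its own duration, two followers that lose the leader at the same moment are unlikely to become candidates simultaneously.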
  • The working state of each node device in the system can be switched dynamically.
  • Referring to FIG. 1B, the embodiment of the present application provides a schematic diagram of switching the working state of a node device.
  • If a node device running in the following state has not received heartbeat information when its timer expires, it can determine that the node device running in the leader state has failed, and it can switch to the candidate state. The node device can then reset its timer and broadcast a voting request. If it receives voting confirmation messages from more than half of the node devices in the system, it switches to the leader state; if it receives heartbeat information from a node device running in the leader state, it switches to the following state.
  • If neither happens, the candidate state is maintained and a new round of election starts. A node device running in the leader state switches to the following state when it discovers a node device with higher running cycle information than its own.
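  • The state switching described for FIG. 1B can be written as a small transition table. The sketch below is illustrative only; the `State` and `Event` names and the `next_state` helper are assumptions introduced here, not identifiers from the application.

```python
from enum import Enum


class State(Enum):
    FOLLOWER = "follower"
    CANDIDATE = "candidate"
    LEADER = "leader"


class Event(Enum):
    TIMER_EXPIRED = "timer_expired"          # no heartbeat before the timer ran out
    MAJORITY_VOTES = "majority_votes"        # confirmations from more than half the system
    LEADER_HEARTBEAT = "leader_heartbeat"    # heartbeat from a node in the leader state
    HIGHER_CYCLE_SEEN = "higher_cycle_seen"  # a node with higher running cycle info found


# (current state, event) -> next state; unlisted pairs leave the state unchanged.
TRANSITIONS = {
    (State.FOLLOWER, Event.TIMER_EXPIRED): State.CANDIDATE,
    (State.CANDIDATE, Event.MAJORITY_VOTES): State.LEADER,
    (State.CANDIDATE, Event.LEADER_HEARTBEAT): State.FOLLOWER,
    (State.CANDIDATE, Event.TIMER_EXPIRED): State.CANDIDATE,  # new election round
    (State.LEADER, Event.HIGHER_CYCLE_SEEN): State.FOLLOWER,
}


def next_state(state: State, event: Event) -> State:
    """Look up the next working state, defaulting to no change."""
    return TRANSITIONS.get((state, event), state)
```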
  • A service command from a client may be redirected to node device 1, and node device 1 broadcasts a log addition request to each node device. The log addition request is used to request that the service command be added to the log.
  • If node device 1 receives a confirmation message for the log addition request from each node device, it can add the service command to the log, respond to the client's service command, and broadcast a log copy instruction to each node device, so that each node device copies the service command into its log.
  • The system may be a transaction system based on blockchain technology.
  • In that case, the service command may be transaction information from the client, and the log stored by each node device may correspond to a blockchain: when transaction information is added to the log, it is actually stored in the block following the current block. Because data stored in a blockchain cannot be changed, the transaction information is effectively protected from tampering, which improves its security.
  • The node devices in the system may form two sub-clusters whose networks are isolated from each other, namely a first sub-cluster and a second sub-cluster. The first sub-cluster has fewer node devices than the second sub-cluster, and the first sub-cluster includes node device 1, which runs in the leader state in the system. The node devices running in the following state in the first sub-cluster can continue to work normally according to the heartbeat information periodically broadcast by node device 1. The second sub-cluster is cut off from node device 1, so its node devices running in the following state cannot receive the heartbeat information of node device 1 before their timers expire.
  • A node device running in the following state in the second sub-cluster then switches to the candidate state, increments its own running cycle information, and broadcasts a voting request.
  • A node device in the second sub-cluster that receives votes from more than half of the node devices in the system may switch to the leader state and broadcast its own heartbeat information, which carries the running cycle information of the node device running in the leader state. When a node device running in the candidate state receives the heartbeat information, it can switch to the following state and synchronize its own running cycle information to the running cycle information carried in the heartbeat information.
  • If the node device running in the leader state in the second sub-cluster fails, the node devices in the second sub-cluster switch to the candidate state and perform the election again. Suppose the first sub-cluster and the second sub-cluster restore their network connection while the second sub-cluster is performing this election.
  • In that case, each node device in the first sub-cluster ignores the voting requests from the node devices in the second sub-cluster. Even if the second sub-cluster elects a node device in the new leader state, and the node devices in the second sub-cluster work according to its heartbeat information, each node device in the first sub-cluster ignores that heartbeat information and continues to work according to the heartbeat information of node device 1. The first sub-cluster and the second sub-cluster therefore cannot be restored into one system that works together, and the system has poor operational reliability.
  • FIG. 2 is a flowchart of a node device operation method according to an embodiment of the present application. Referring to FIG. 2, the method can be applied to the first node device 1 of the embodiment shown in FIG. 1A and includes the following steps:
  • 201. The first node device 1 receives voting requests from a plurality of second node devices, where the number of the plurality of second node devices is greater than half of the number of node devices in the system.
  • The plurality of second node devices may be node devices in the second sub-cluster in the embodiment shown in FIG. 1A.
  • When the node device running in the leader state in the second sub-cluster fails, the second node devices do not receive heartbeat information before their timers expire. Some or all of the plurality of second node devices therefore run in the candidate state, generate a voting request based on information such as their own running cycle information, latest log index, and node device identifier, and broadcast the voting request to the system.
  • A node device that receives a voting request determines whether it has received heartbeat information while its timer has not expired. If so, it determines that the node device in the leader state is operating normally and ignores the voting request. If not, it extracts the running cycle information and the latest log index from the voting request and compares them with its own. If both are greater than or equal to its own, it sends a voting confirmation message to the node device corresponding to the node device identifier in the voting request; otherwise it ignores the voting request. Once a node device has voted, it does not vote for other node devices during the same running cycle.
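  • The ordinary vote-granting rule described above (ignore if a heartbeat was seen, otherwise compare both values and vote at most once per running cycle) might look like the sketch below. The class and field names are hypothetical, and tracking votes in a per-cycle dictionary is an assumption about how "does not vote for other node devices during the same running cycle" is enforced.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class VotingRequest:
    node_id: str           # node device identifier of the candidate
    running_cycle: int     # running cycle information
    latest_log_index: int  # latest log index


@dataclass
class Voter:
    running_cycle: int
    latest_log_index: int
    heartbeat_seen: bool = False  # heartbeat received before the timer expired
    voted_for: Dict[int, str] = field(default_factory=dict)  # cycle -> node_id

    def handle(self, req: VotingRequest) -> Optional[str]:
        """Return the node id to confirm a vote for, or None to ignore."""
        if self.heartbeat_seen:
            return None  # leader is considered healthy
        if req.running_cycle < self.running_cycle:
            return None
        if req.latest_log_index < self.latest_log_index:
            return None
        previous = self.voted_for.get(req.running_cycle)
        if previous is not None and previous != req.node_id:
            return None  # already voted for someone else in this cycle
        self.voted_for[req.running_cycle] = req.node_id
        return req.node_id
```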
  • In the embodiment of the present application, so that the node device running in the leader state in the first sub-cluster can work together with the second sub-cluster and the operational reliability of the cluster is improved, the following steps are performed.
  • This step may specifically be as follows: when the first node device 1 receives the first voting request, it starts a timer; while the timer is running, it continues to receive voting requests from other node devices; after the timer expires, it stops receiving voting requests.
  • The first node device 1 thus receives voting requests within the timing duration, which may be the duration of one round of election. If the number of voting requests received by the first node device 1 within the timing duration is greater than half of the number of node devices in the system, this indicates that the system has split, that the first sub-cluster and the second sub-cluster have restored their network connection, and that the second sub-cluster is performing an election; the method then proceeds to step 202. Otherwise, the above situation cannot be confirmed, and the first node device 1 can ignore the received voting requests and continue to broadcast heartbeat information.
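  • The windowed count in this step can be sketched as below. The function name and the `(timestamp, node_id)` input shape are hypothetical. Counting distinct senders rather than raw messages is an assumption made here so that a repeated request from one node device is not counted twice; the text itself only speaks of the number of voting requests.

```python
from typing import Iterable, Tuple


def split_recovery_suspected(
    arrivals: Iterable[Tuple[float, str]],
    cluster_size: int,
    window_s: float,
) -> bool:
    """Return True if more than half of the system sent voting requests
    within `window_s` seconds of the first request.

    arrivals: (timestamp in seconds, sender node id) pairs, in any order.
    """
    arrivals = list(arrivals)
    if not arrivals:
        return False
    start = min(ts for ts, _ in arrivals)  # the timer starts at the first request
    senders = {node for ts, node in arrivals if ts - start <= window_s}
    return len(senders) > cluster_size / 2
```

  • For a five-node system, three distinct senders inside the window are enough to proceed to step 202, while two are not.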
  • 202. The first node device 1 obtains the running cycle information and the latest log index from the voting requests of the plurality of second node devices.
  • The running cycle information refers to the running cycle number of the node device that sends the voting request.
  • The running cycle information of a node device that switches from the following state to the candidate state is incremented. The node device that eventually reaches the leader state can carry its running cycle information in the heartbeat information it broadcasts to other node devices.
  • A node device in the candidate state that receives the heartbeat information may switch to the following state and synchronize its own running cycle information to the running cycle information in the heartbeat information. It may also determine which logs it is missing by comparing its own latest log index with the latest log index in the heartbeat information, and request the node device in the leader state to return the missing logs. The running cycle information can therefore indicate whether a node device has stayed in sync with the node device running in the leader state and is operating normally.
  • The latest log index refers to the index of the log most recently stored by the node device that sends the voting request.
  • A node device in the leader state may broadcast a log copy instruction to other node devices, so that a node device receiving the log copy instruction can synchronize the log and the latest log index of the node device in the leader state. The latest log index can therefore represent the log integrity of a node device.
  • The node device running in the leader state is the node device with the best log integrity in its system.
  • The first node device 1 may extract the running cycle information and the latest log index from the voting request according to the protocol field positions that the two values occupy in the voting request.
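  • Extraction by protocol field position could look like the sketch below. The application does not specify a wire format, so the layout used here (two big-endian unsigned 64-bit integers followed by a UTF-8 node identifier) and the names `HEADER` and `parse_voting_request` are purely illustrative assumptions.

```python
import struct
from typing import Tuple

# Assumed layout: 8-byte running cycle, 8-byte latest log index, then the node id.
HEADER = struct.Struct(">QQ")


def parse_voting_request(payload: bytes) -> Tuple[int, int, str]:
    """Return (running_cycle, latest_log_index, node_id) from a raw request."""
    if len(payload) < HEADER.size:
        raise ValueError("voting request too short")
    running_cycle, latest_log_index = HEADER.unpack_from(payload, 0)
    node_id = payload[HEADER.size:].decode("utf-8")
    return running_cycle, latest_log_index, node_id
```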
  • 203. The first node device 1 determines whether the running cycle information in the voting requests of the plurality of second node devices is greater than the running cycle information of the first node device 1. If yes, it performs step 204; if not, it ignores the voting requests.
  • In the implementation scenario considered here, the second sub-cluster is in the process of performing an election. Because the second sub-cluster has already gone through an election after the split, the running cycle information of the second node devices in the second sub-cluster is greater than the running cycle information in the first sub-cluster. The running cycle information can therefore serve as one basis for verifying the above implementation scenario: if the running cycle information in the voting requests is greater than the running cycle information of the first node device 1, the scenario is verified.
  • Otherwise, the first node device 1 may ignore the voting requests and continue to broadcast heartbeat information.
  • 204. The first node device 1 determines whether the latest log index in the voting requests of the plurality of second node devices is not less than the latest log index of the first node device 1. If yes, it switches its current working state from the leader state to the following state; if not, it ignores the voting requests.
  • In the above implementation scenario, the network connection between the two sub-clusters is restored after the second sub-cluster has served clients for a period of time.
  • The logs stored by the node devices in the second sub-cluster should then be no fewer than the logs stored by the node devices in the first sub-cluster, so the latest log index can serve as another basis for verifying the scenario.
  • If the latest log index in the voting requests of the plurality of second node devices is not less than the latest log index of the first node device 1, the node devices corresponding to the voting requests store at least as many logs as the first node device 1. The first node device 1 then switches to the following state and stops broadcasting heartbeat information. If the latest log index in the voting requests is less than its own latest log index, the scenario is not confirmed, and the first node device 1 can ignore the voting requests and continue to broadcast heartbeat information.
  • After switching to the following state, the first node device 1 can stop broadcasting heartbeat information, reset its timer, and wait for the heartbeat information of the node device in the new leader state. If no heartbeat information has been received when the timer expires, it can switch to the candidate state and broadcast voting requests until it becomes the node device in the leader state, or switch to the following state when it receives heartbeat information from the node device in the new leader state.
  • In this step, the first node device 1 can alternatively switch its current working state to the candidate state, stop broadcasting heartbeat information, and broadcast voting requests until it becomes the node device in the leader state, or switch to the following state when it receives heartbeat information from the node device in the new leader state.
  • A node device running in the following state in the first sub-cluster can likewise actively switch to the candidate state after its timer expires, and remain there until it becomes the node device in the leader state in the system or switches to the following state on receiving heartbeat information from the node device running in the leader state in the system. The foregoing node device operation method can therefore restore the first sub-cluster and the second sub-cluster to working as the original system, which improves the operational reliability of the system.
  • The embodiment of the present application does not specifically limit the order in which the first node device 1 performs steps 203 and 204.
  • The first node device 1 may instead first judge the latest log index and then judge the running cycle information. Alternatively, to improve judgment efficiency and merge the first sub-cluster and the second sub-cluster into one working system as soon as possible, the first node device 1 can judge the latest log index and the running cycle information at the same time. As long as both satisfy their respective conditions, the first node device 1 can switch its current working state to the following state (or the candidate state).
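  • The combined check of steps 203 and 204 can be sketched as a single predicate. The function name is hypothetical, and requiring every received voting request to satisfy both conditions is an assumption; the text does not say whether all requests or only some must pass.

```python
from typing import Iterable, Tuple


def should_step_down(
    own_cycle: int,
    own_log_index: int,
    requests: Iterable[Tuple[int, int]],
) -> bool:
    """Decide whether a leader should leave the leader state.

    requests: (running_cycle, latest_log_index) pairs taken from the
    voting requests collected in step 201.
    """
    requests = list(requests)
    if not requests:
        return False
    return all(
        cycle > own_cycle and log_index >= own_log_index
        for cycle, log_index in requests
    )
```

  • Both comparisons are evaluated together, which corresponds to the variant in which the two judgments are made at the same time.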
  • In the embodiment of the present application, when multiple voting requests are received, the running cycle information and the latest log index are obtained from the voting requests. If the obtained running cycle information is greater than the running cycle information of the current node device, and the obtained latest log index is not less than the latest log index of the current node device, the node device runs in the following state or the candidate state. The node device running in the leader state in the first sub-cluster can thus be demoted to the following state or the candidate state.
  • The node devices in the first sub-cluster can then participate in the election together with the node devices in the second sub-cluster, and once a node device in the new leader state appears, the first sub-cluster can work together with the second sub-cluster as one system, which improves the operational reliability of the system.
  • 205. The first node device 1 determines a target node device according to the voting requests of the plurality of second node devices.
  • The target node device is the second node device for which the first node device 1 intends to vote.
  • The second node device corresponding to any of the voting requests qualifies to become the node device in the leader state. The first node device 1 may therefore, according to the order in which the voting requests were received, take the node device corresponding to the earliest-received voting request as the target node device.
  • The first node device 1 may also determine the target node device in other ways, which is not specifically limited in the embodiment of the present application.
  • 206. In response to the target node device's voting request, the first node device 1 sends a voting confirmation message to the target node device.
  • The first node device 1 can generate the voting confirmation message based on its own node device identifier and send it to the target node device according to the target node device's identifier.
  • To let the target node device verify the voter's identity and so improve system security, the first node device 1 can send a voting confirmation message that carries its signature.
  • Each node device in the system can be configured with its own private key and with the public keys of all node devices. When the target node device receives the voting confirmation message, it can extract the signature of the first node device 1 and verify it with the configured public key of the first node device 1.
  • Steps 205 and 206 are optional in the embodiments of this application.
  • Because the second sub-cluster contains more than half of the node devices in the system, the first node device 1 may choose not to respond to any voting request; the system can still elect a leader-state node device. On receiving that leader's heartbeat information, the first node device 1 synchronizes its own running cycle information to the value carried in the heartbeat and thereby rejoins the second sub-cluster as one system.
  • 207. The first node device 1 receives the heartbeat information broadcast by the node device running in the leader state.
  • Once any candidate-state node device in the system receives voting confirmation messages from more than half of the node devices in the system, it can switch to the leader state and broadcast its own heartbeat information, which the first node device 1 can then receive.
  • To prevent a node device from masquerading as the leader-state node device and so improve system security, the heartbeat information can carry the signatures that the node devices in the system produced when responding to the voting request of the node device that has switched to the leader state. When the first node device 1 receives the heartbeat information, it can extract each node device's signature and verify it with that node device's configured public key. If every signature passes verification and the number of verified signatures is greater than half the number of node devices in the system, the heartbeat information really does come from the node device running in the leader state; the first node device 1 can then reset its timer and wait for the next heartbeat.
  • To keep the system consistent, the node device running in the leader state can also broadcast a log copy instruction. The first node device 1 receives this instruction and copies the log accordingly, so that the service command most recently received by the system is added to its log.
  • BFT-Raft addresses not only consistency among node devices but also node device fraud and data that is tampered with, lost, or out of order. The log copy instruction must therefore carry the signatures that the node devices in the system produced when responding to the voting request of the leader-state node device, so that the first node device 1 can verify the instruction and copy the log only after verification passes.
  • When the first node device 1 switches to the follower state (or the candidate state), it stops broadcasting heartbeat information. The follower-state node devices in the first sub-cluster therefore receive no heartbeat before their timers expire and switch to the candidate state.
  • If the election in the second sub-cluster has not yet ended at this point, the node devices in the first sub-cluster that have switched to the candidate state effectively take part in the election together with the node devices in the second sub-cluster. When any of these node devices receives votes from more than half of the node devices in the system, it can switch to the leader state and broadcast heartbeat information. When the other node devices in the system receive the heartbeat, they can confirm that the election has ended, switch to the follower state, and synchronize their own running cycle information with that in the heartbeat. They can subsequently work on the basis of the leader-state node device's heartbeat information or log copy instructions.
  • If the election in the second sub-cluster has already ended by the time the follower-state node devices in the first sub-cluster switch to the candidate state, the node device that became the leader in the second sub-cluster broadcasts heartbeat information periodically. A node device in the second sub-cluster that had been running in the candidate state can switch to the follower state when it first receives the heartbeat and synchronize its running cycle information with that in the heartbeat. The first sub-cluster has no leader-state node device, so a node device there running in the follower state can remain in the follower state when it first receives the heartbeat, and a node device running in the candidate state can switch to the follower state; in both cases the node device synchronizes its running cycle information with that in the heartbeat.
  • The embodiment of FIG. 2 is described with the first node device 1, which runs in the leader state in the first sub-cluster, as the executing entity. After the first node device 1 stops broadcasting heartbeat information, a node device running in the follower state in the first sub-cluster (called first node device 5) can passively merge with the second sub-cluster into one system.
  • To let the first node device 5 merge with the second sub-cluster efficiently and improve the cluster's reliability, the first node device 5 can also apply the node device operation method provided by the embodiments of this application.
  • FIG. 3 is a flowchart of a node device operation method according to an embodiment of this application. Referring to FIG. 3, the method includes:
  • 301. The first node device 5 receives voting requests from multiple second node devices, where the number of the multiple second node devices is greater than half the number of node devices in the system.
  • This is the same as step 201 and is not described again here.
  • 302. If the first node device 5 is running in the follower state, it obtains the running cycle information and the latest log index from the voting requests of the multiple second node devices.
  • This is the same as step 202 and is not described again here.
  • 303. The first node device 5 determines whether the running cycle information in every voting request of the multiple second node devices is greater than the running cycle information of the first node device 5. If so, it performs step 304; if not, it ignores the voting requests.
  • This is the same as step 203 and is not described again here.
  • 304. The first node device 5 determines whether the latest log index in every voting request of the multiple second node devices is not less than the latest log index of the first node device 5. If so, it switches its current working state from the follower state to the candidate state; if not, it ignores the voting requests.
  • This is the same as step 204, except that the first node device 5 switches to the candidate state and broadcasts a voting request, until it either receives heartbeat information from the new leader-state node device and switches to the follower state, or receives votes from more than half of the node devices in the system and switches to the leader state.
  • Alternatively, the first node device 5 can remain in the follower state. When its timer expires, it automatically switches to the candidate state, until it either receives heartbeat information from the new leader-state node device and switches to the follower state, or receives votes from more than half of the node devices in the system and switches to the leader state.
  • In the embodiments of this application, when multiple voting requests are received, the running cycle information and the latest log index are obtained from them. If every obtained running cycle value is greater than that of the current node device, and every obtained latest log index is not less than the current node device's latest log index, the current node device runs in the follower state or the candidate state. The node device running in the leader state in the first sub-cluster is thereby demoted to the follower or candidate state, so the node devices in the first sub-cluster can take part in the election together with those in the second sub-cluster. Once a new leader-state node device appears, the first sub-cluster and the second sub-cluster can rejoin as one system, which improves the system's operational reliability.
  • 305. The first node device 5 determines a target node device according to the voting requests of the multiple second node devices.
  • This is the same as step 205 and is not described again here.
  • 306. In response to the target node device's voting request, the first node device 5 sends a voting confirmation message to the target node device.
  • This is the same as step 206 and is not described again here.
  • 307. The first node device 5 receives the heartbeat information broadcast by the node device running in the leader state.
  • This is the same as step 207 and is not described again here.
  • The node device can also continue to take part in the election, which keeps the system-wide election fair.
  • FIG. 4 is a block diagram of a working state switching device according to an embodiment of this application.
  • The working state switching device is applied to a first node device.
  • Referring to FIG. 4, the device includes:
  • a receiving module 401, configured to receive voting requests from multiple second node devices, where the number of the multiple second node devices is greater than half the number of node devices in the system;
  • an obtaining module 402, configured to obtain running cycle information and a latest log index from the voting requests of the multiple second node devices if the first node device is running in the leader state; and
  • a running module 403, configured to switch the working state of the first node device from the leader state to the follower state or the candidate state if the running cycle information in every voting request of the multiple second node devices is greater than the running cycle information of the first node device, and the latest log index in every voting request of the multiple second node devices is not less than the latest log index of the first node device.
  • In the embodiments of this application, when multiple voting requests are received, the running cycle information and the latest log index are obtained from them. If every obtained running cycle value is greater than that of the first node device, and every obtained latest log index is not less than the first node device's latest log index, the first node device runs in the follower state or the candidate state. The node device running in the leader state in the first sub-cluster is thereby demoted to the follower or candidate state, so the node devices in the first sub-cluster can take part in the election together with those in the second sub-cluster. Once a new leader-state node device appears, the first sub-cluster and the second sub-cluster can work together as one system, which improves the system's operational reliability.
  • In a possible implementation, the receiving module 401 is configured to: start a timer after the first voting request is received; and, while the timer runs, continue to receive voting requests, stopping once the timer expires.
  • In a possible implementation, the obtaining module 402 is further configured to obtain the running cycle information and the latest log index from the voting requests of the multiple second node devices if the first node device is running in the follower state;
  • the running module 403 is further configured to switch the working state of the first node device from the follower state to the candidate state, or keep it in the follower state, if the running cycle information in every voting request of the multiple second node devices is greater than the running cycle information of the first node device, and the latest log index in every voting request of the multiple second node devices is not less than the latest log index of the first node device.
  • In a possible implementation, based on the device composition of FIG. 4 and referring to FIG. 5, the node device further includes:
  • a determining module 404, configured to determine a target node device according to the voting requests of the multiple second node devices; and
  • a sending module 405, configured to send a voting confirmation message to the target node device in response to the voting request of the target node device.
  • In a possible implementation, the receiving module 401 is further configured to receive heartbeat information broadcast by the node device running in the leader state; or
  • the receiving module 401 is further configured to receive a log copy instruction broadcast by the node device running in the leader state, and to copy the log based on the log copy instruction.
  • It should be noted that when the device provided by the foregoing embodiment performs the node device operation method, the division into the functional modules above is only an example. In practice, the functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to perform all or part of the functions described above.
  • In addition, the device provided by the foregoing embodiment and the node device operation method embodiments share the same concept. The specific implementation process is described in detail in the method embodiments and is not repeated here.
  • FIG. 6 is a schematic structural diagram of a node device according to an embodiment of this application.
  • Referring to FIG. 6, the node device 600 can be provided as a server.
  • The node device 600 includes a processing component 622, which in turn includes one or more processors, and memory resources represented by a memory 632 for storing instructions executable by the processing component 622, such as an application.
  • An application stored in the memory 632 can include one or more modules, each corresponding to a set of instructions.
  • The processing component 622 is configured to execute the instructions to perform the following node device operation method:
  • receiving voting requests from multiple node devices, where the number of the multiple node devices is greater than half the number of node devices in the system;
  • if the node device is running in the leader state, obtaining running cycle information and a latest log index from the voting requests of the multiple node devices; and
  • if the running cycle information in every voting request of the multiple node devices is greater than the running cycle information of the node device, and the latest log index in every voting request of the multiple node devices is not less than the latest log index of the node device, switching the working state of the node device from the leader state to the follower state or the candidate state.
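  • In a possible implementation, the one or more processors are configured to execute the instructions to perform the following steps:
  • after the first voting request is received, starting a timer; and
  • while the timer runs, continuing to receive voting requests, and stopping once the timer expires.
  • In a possible implementation, the one or more processors are configured to execute the instructions to perform the following steps:
  • if the node device is running in the follower state, obtaining running cycle information and a latest log index from the voting requests of the multiple node devices; and
  • if the running cycle information in every voting request of the multiple node devices is greater than the running cycle information of the node device, and the latest log index in every voting request of the multiple node devices is not less than the latest log index of the node device, switching the working state of the node device from the follower state to the candidate state, or keeping it in the follower state.
  • In a possible implementation, the one or more processors are configured to execute the instructions to perform the following steps:
  • determining a target node device according to the voting requests of the multiple node devices; and
  • sending a voting confirmation message to the target node device in response to the voting request of the target node device.
  • In a possible implementation, the one or more processors are configured to execute the instructions to perform the following steps:
  • receiving heartbeat information broadcast by the node device running in the leader state; or
  • receiving a log copy instruction broadcast by the node device running in the leader state, and copying the log based on the log copy instruction.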
  • The node device 600 may also include a power component 626 configured to perform power management of the node device 600, a wired or wireless network interface 650 configured to connect the node device 600 to a network, and an input/output (I/O) interface 658.
  • The node device 600 may operate based on an operating system stored in the memory 632, for example Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
  • In an exemplary embodiment, a computer-readable storage medium is also provided. It stores instructions that can be executed by a processor in a device to perform the node device operation method of the foregoing embodiments.
  • For example, the computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
  • A person of ordinary skill in the art will understand that all or some of the steps of the foregoing embodiments may be carried out by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Hardware Redundancy (AREA)
  • Mobile Radio Communication Systems (AREA)
  • Small-Scale Networks (AREA)
  • Selective Calling Equipment (AREA)

Abstract

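This application discloses a node device operation method, a working state switching device, a node device, and a medium, and belongs to the field of network technologies. The method includes: receiving voting requests from multiple node devices, where the number of the multiple node devices is greater than half the number of node devices in the system; if the current node device is running in the leader state, obtaining running cycle information and a latest log index from the voting requests of the multiple node devices; and if the running cycle information in every one of those voting requests is greater than the running cycle information of the current node device, and the latest log index in every one of those voting requests is not less than the latest log index of the current node device, switching the current working state from the leader state to the follower state or the candidate state. This application enables the first sub-cluster and the second sub-cluster to work together as one system, improving the system's operational reliability.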

Description

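Node device operation method, working state switching device, node device, and medium
This application claims priority to Chinese Patent Application No. 2017102624639, entitled "Node device operation method and node device" and filed with the China National Intellectual Property Administration on April 20, 2017, which is incorporated herein by reference in its entirety.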
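Technical field
This application relates to the field of network technologies, and in particular to a node device operation method, a working state switching device, a node device, and a medium.
Background
With the development of network technologies, it is increasingly common to serve clients from a cluster. To keep the node devices in a cluster consistent, node devices generally run BFT-Raft (Byzantine Fault Tolerance algorithm-Raft).
Under BFT-Raft, a node device has three working states: follower, candidate, and leader. When any node device a is in the follower state, it can determine from the heartbeat information broadcast by node device b, which runs in the leader state in the cluster, that node device b is running normally, and it replicates logs as instructed by node device b. If node device a receives no heartbeat information from node device b for a period of time, it can conclude that node device b has failed. Node device a can then switch to the candidate state and broadcast a voting request to every node device in the cluster; once it receives votes from more than half of the node devices in the cluster, it can switch to the leader state. Note that a node device a that is running normally in the leader state automatically ignores any voting request or heartbeat information it receives.
In implementing this application, the inventors found that the prior art has at least the following problem:
A cluster may split into two sub-clusters that are isolated from each other on the network, for example sub-cluster A and sub-cluster B. Suppose sub-cluster A contains node device a, which runs in the leader state in the cluster, and sub-cluster A has fewer node devices than sub-cluster B. The node devices in sub-cluster B can then elect a new leader-state node device b by voting. If node device b later fails, the candidate-state node devices in sub-cluster B broadcast voting requests again. If sub-cluster A and sub-cluster B recover their network connection at this point, node device a, which is running normally, ignores the voting requests. Even if some node device c in sub-cluster B switches to the leader state, node device a also ignores node device c's heartbeat information. As a result, node device a cannot work together with sub-cluster B as one system, and the system's operational reliability is low.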
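Summary
The embodiments of this application provide a node device operation method, a working state switching device, a node device, and a medium, which can solve the problem of low operational reliability. The technical solutions are as follows:
In one aspect, a node device operation method is provided, applied to a first node device, the method including:
the first node device receiving voting requests from multiple second node devices, where the number of the multiple second node devices is greater than half the number of node devices in the system, and the multiple second node devices are node devices in the system other than the first node device;
if the first node device is running in the leader state, obtaining running cycle information and a latest log index from the voting requests of the multiple second node devices; and
if the running cycle information in every voting request of the multiple second node devices is greater than the running cycle information of the first node device, and the latest log index in every voting request of the multiple second node devices is not less than the latest log index of the first node device, switching the working state of the first node device from the leader state to the follower state or the candidate state.
In one aspect, a working state switching device is provided, applied to a first node device and including:
a receiving module, configured to receive voting requests from multiple second node devices, where the number of the multiple second node devices is greater than half the number of node devices in the system;
an obtaining module, configured to obtain running cycle information and a latest log index from the voting requests of the multiple second node devices if the first node device is running in the leader state; and
a running module, configured to switch the working state of the first node device from the leader state to the follower state or the candidate state if the running cycle information in every voting request of the multiple second node devices is greater than the running cycle information of the first node device, and the latest log index in every voting request of the multiple second node devices is not less than the latest log index of the first node device.
In one aspect, a node device is provided, including:
one or more processors; and
one or more memories configured to store instructions to be executed by the one or more processors;
the one or more processors being configured to execute the instructions to perform the following steps of a node device operation method:
receiving voting requests from multiple node devices, where the number of the multiple node devices is greater than half the number of node devices in the system;
if the node device is running in the leader state, obtaining running cycle information and a latest log index from the voting requests of the multiple node devices; and
if the running cycle information in every voting request of the multiple node devices is greater than the running cycle information of the node device, and the latest log index in every voting request of the multiple node devices is not less than the latest log index of the node device, switching the working state of the node device from the leader state to the follower state or the candidate state.
In one aspect, a computer-readable storage medium is provided, storing instructions that can be executed by a processor in a device to perform the foregoing node device operation method.
In the embodiments of this application, in the scenario where the system recovers from a split, a node device that receives multiple voting requests obtains the running cycle information and the latest log index from them. If every obtained running cycle value is greater than that of the current node device, and every obtained latest log index is not less than the current node device's latest log index, the node device runs in the follower state or the candidate state. The node device running in the leader state in the first sub-cluster is thereby demoted to the follower or candidate state, so any node device in the first sub-cluster can take part in the election together with the node devices in the second sub-cluster. Once a new leader-state node device appears, the first sub-cluster and the second sub-cluster can rejoin as one system, which improves the system's operational reliability.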
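Brief description of the drawings
To describe the technical solutions in the embodiments of this application more clearly, the accompanying drawings needed for describing the embodiments are briefly introduced below. The drawings described below are merely some embodiments of this application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1A is a schematic diagram of an implementation environment in which node devices run, according to an embodiment of this application;
FIG. 1B is a schematic diagram of switching between node device working states, according to an embodiment of this application;
FIG. 2 is a flowchart of a node device operation method according to an embodiment of this application;
FIG. 3 is a flowchart of a node device operation method according to an embodiment of this application;
FIG. 4 is a block diagram of a working state switching device according to an embodiment of this application;
FIG. 5 is a block diagram of a working state switching device according to an embodiment of this application;
FIG. 6 is a schematic structural diagram of a node device according to an embodiment of this application.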
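Detailed description
To make the objectives, technical solutions, and advantages of this application clearer, the implementations of this application are described in further detail below with reference to the accompanying drawings.
FIG. 1A is a schematic diagram of an implementation environment in which node devices run, according to an embodiment of this application. Referring to FIG. 1A, the implementation environment is a system made up of multiple node devices; the system is equivalent to a cluster. Node device 1 is the node device running in the leader state in the system. While node device 1 runs normally, it periodically broadcasts heartbeat information to the node devices running in the follower state, such as node device 2, node device 3, and node device 4. Each follower-state node device, on receiving the heartbeat information, can determine that node device 1 is running normally, reset its timer, and wait for the next heartbeat. The timer duration is generally a random value between 0.5 and 1 second, which avoids the repeated elections that could result if every node device's timer had the same duration.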
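The randomized timer duration can be illustrated with a minimal Python sketch. The function name `election_timeout` and the node names are hypothetical and are not taken from the application; only the 0.5 to 1 second range comes from the text.

```python
import random


def election_timeout(low=0.5, high=1.0, rng=random):
    """Return a timer duration in seconds drawn uniformly from [low, high]."""
    return rng.uniform(low, high)


# Each follower draws its own duration, so the timers rarely expire at the
# same moment and the nodes rarely start competing elections together.
timeouts = {node: election_timeout() for node in ("node-2", "node-3", "node-4")}
```

A follower would redraw the duration each time it resets its timer after a heartbeat; that loop is left out here to keep the sketch short.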
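In fact, the working states of the node devices in the system can switch dynamically. Referring to FIG. 1B, an embodiment of this application provides a schematic diagram of switching between node device working states. If a node device running in the follower state receives no heartbeat information before its timer expires, it can conclude that the leader-state node device has failed and can switch to the candidate state. It then resets its timer and broadcasts a voting request, after which one of three things happens: it receives voting confirmation messages from more than half of the system and switches to the leader state; it receives heartbeat information from a node device running in the leader state and switches to the follower state; or its timer expires, in which case it remains in the candidate state and starts a new round of election. A node device running in the leader state can switch to the follower state when it discovers a node device with higher running cycle information (term) than its own.
When the system serves clients and any node device in the system receives a service command from a client, it can redirect the service command to node device 1. Node device 1 broadcasts a log append request to the node devices, asking them to add the service command to the log. If node device 1 receives confirmation messages for the log append request from the node devices, it can respond to the client's service command, add the service command to its log, and broadcast a log copy instruction so that each node device copies the service command into its log. In a practical scenario, the system may be a transaction system built on blockchain technology, the service command may be a client's transaction information, and the log stored by each node device may correspond to a blockchain. Adding transaction information to the log then actually means storing it in the block that follows the current block. Because data already stored in a blockchain cannot be changed, this effectively prevents transaction information from being tampered with and improves its security.
Because of a network interruption or a similar cause, the system may split, and its node devices may form two sub-clusters separated on the network: a first sub-cluster and a second sub-cluster. The first sub-cluster has fewer node devices than the second and contains node device 1, which runs in the leader state in the system. The follower-state node devices in the first sub-cluster can keep working normally on the basis of the heartbeat information that node device 1 broadcasts periodically. The second sub-cluster is cut off from node device 1, so its follower-state node devices receive no heartbeat from node device 1 before their timers expire. Under the timeout election mechanism of BFT-Raft, some follower-state node device switches to the candidate state, increments its running cycle information by one, and broadcasts a voting request. A node device in the second sub-cluster that receives votes from more than half of the node devices in the system can switch to the leader state and broadcast its own heartbeat information, which carries that leader-state node device's running cycle information. A candidate-state node that receives the heartbeat can switch to the follower state and synchronize its own running cycle information to the value carried in the heartbeat. If the leader-state node device in the second sub-cluster then fails, the node devices in the second sub-cluster switch to the candidate state and hold another election. If the first and second sub-clusters recover their network connection while this election is in progress, every node device in the first sub-cluster ignores the voting requests from node devices in the second sub-cluster, because node device 1 in the first sub-cluster is running normally. Even if the second sub-cluster elects a new leader-state node device whose heartbeat its own node devices follow, the node devices in the first sub-cluster ignore that heartbeat and keep working according to node device 1's heartbeat. The two sub-clusters therefore cannot recover into one system, and the system's operational reliability is poor.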
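FIG. 2 is a flowchart of a node device operation method according to an embodiment of this application. Referring to FIG. 2, the method can be applied to the first node device 1 in the embodiment shown in FIG. 1A and includes the following steps:
201. The first node device 1 receives voting requests from multiple second node devices, where the number of the multiple second node devices is greater than half the number of node devices in the system.
The multiple second node devices may be node devices in the second sub-cluster in the embodiment shown in FIG. 1A. Because the original leader-state node device in the second sub-cluster has failed, the multiple second node devices received no heartbeat information before their own timers expired. Some or all of them are therefore running in the candidate state; each generates a voting request from information such as its own running cycle information, latest log index (last log index), and node device identifier, and broadcasts the request to every node device in the system. Normally, a node device that receives a voting request checks whether it has received heartbeat information before its timer expired. If so, it concludes that the leader-state node device is running normally and ignores the voting request. If not, it extracts the running cycle information and latest log index from the voting request and compares each with its own value. If both are greater than or equal to its own values, it sends a voting confirmation message to the node device identified in the voting request; otherwise it ignores the request. Once it has voted for a node device, it does not vote for any other node device in that running cycle. A node device running in the leader state that receives a voting request simply ignores it. To enable the leader-state node device in the first sub-cluster to work with the second sub-cluster as one system and so improve the cluster's operational reliability, the embodiments of this application perform the following steps.
Receiving only one or a few voting requests may mean that the requests come from a node device masquerading as a candidate, so this case needs to be ruled out. In addition, the node device needs a preliminary confirmation that the system is in the following situation: the sub-clusters have recovered their network connection after a split, and the node devices in the second sub-cluster are holding an election. The node device therefore places a threshold on the number of voting requests received; that is, it should receive voting requests from more than half of the node devices in the system.
In a practical scenario, each round of election has a time limit, so this step may specifically be as follows: after receiving the first voting request, the first node device 1 starts a timer; while the timer runs, it keeps receiving voting requests from other node devices; when the timer expires, it stops receiving voting requests. In other words, the first node device 1 accepts the voting requests that arrive within the timer duration, which may be the duration of one round of election. If the number of voting requests received within that duration is greater than half the number of node devices in the system, this indicates that the system has split, that the first and second sub-clusters have recovered their network connection, and that the second sub-cluster is holding an election; step 202 is then performed. Otherwise the situation is not confirmed, and the first node device 1 can ignore the received voting requests and continue broadcasting heartbeat information.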
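The timed collection in step 201 can be sketched as follows, assuming the requests are already available as timestamped pairs so that the example stays deterministic. The helper names `collect_vote_requests` and `has_majority`, the node names, and the five-node system size are illustrative assumptions, not details from the application.

```python
def collect_vote_requests(arrivals, window):
    """Keep the requests that arrive within `window` seconds of the first one.

    `arrivals` is a list of (timestamp, sender_id) pairs in arrival order.
    """
    if not arrivals:
        return []
    first_ts = arrivals[0][0]  # the timer starts at the first request
    return [sender for ts, sender in arrivals if ts - first_ts <= window]


def has_majority(senders, cluster_size):
    """True when the distinct senders number more than half of the system."""
    return len(set(senders)) > cluster_size / 2


# Hypothetical five-node system; the fourth request arrives after the timer
# (assumed here to last 1.0 s, one election round) has already expired.
arrivals = [(0.0, "node-5"), (0.3, "node-6"), (0.8, "node-7"), (1.4, "node-8")]
accepted = collect_vote_requests(arrivals, window=1.0)
proceed_to_step_202 = has_majority(accepted, cluster_size=5)
```

Counting distinct senders rather than raw messages is a design choice of this sketch: it keeps one node from reaching the threshold by repeating its request.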
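202. If the first node device 1 is running in the leader state, it obtains the running cycle information and the latest log index from the voting requests of the multiple second node devices.
The running cycle information is the number of the running cycle (term) that the node device sending the voting request is currently in. In each election, a node device that switches from the follower state to the candidate state increments its running cycle information by one. The node device that finally becomes the leader carries its running cycle information in its heartbeat information and broadcasts it to the other node devices. A candidate-state node device that receives the heartbeat can switch to the follower state and synchronize its own running cycle information to the value in the heartbeat; it can also compare its own latest log index with the latest log index in the heartbeat to determine which logs it lacks, and ask the leader-state node device to return them. The running cycle information therefore indicates whether a node device has stayed synchronized with the leader-state node device and is running normally.
The latest log index is the index of the log most recently stored by the node device sending the voting request. Each time the leader-state node device adds a new log, the latest log index increases by one, and the leader-state node device can broadcast a log copy instruction to the other node devices so that those receiving it synchronize the leader's log and latest log index. The latest log index therefore indicates how complete a node device's log is; the leader-state node device is evidently the node device with the most complete log in its system.
In this step, the first node device 1 can extract the running cycle information and the latest log index from each voting request according to the positions of their respective protocol fields in the voting request.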
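Extracting fields by protocol position might look like the sketch below. The application does not specify a wire format, so the layout used here (an 8-byte running cycle number, an 8-byte latest log index, and a 16-byte node identifier, all big-endian) is purely hypothetical, as are the function names.

```python
import struct

# Hypothetical fixed layout: term (8 bytes), last log index (8 bytes),
# node identifier (16 bytes, NUL-padded). Not taken from the application.
VOTE_REQUEST_FORMAT = ">QQ16s"


def encode_vote_request(term, last_log_index, node_id):
    """Pack a voting request into the assumed fixed-position layout."""
    return struct.pack(VOTE_REQUEST_FORMAT, term, last_log_index, node_id.encode())


def decode_vote_request(data):
    """Read each field back from its fixed position in the message."""
    term, last_log_index, raw_id = struct.unpack(VOTE_REQUEST_FORMAT, data)
    return term, last_log_index, raw_id.rstrip(b"\x00").decode()


message = encode_vote_request(3, 42, "node-5")
term, last_log_index, sender = decode_vote_request(message)
```

A real protocol would more likely use a tagged or length-prefixed encoding; a fixed layout is used here only because it makes "field position" concrete.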
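203. The first node device 1 determines whether the running cycle information in every voting request of the multiple second node devices is greater than the running cycle information of the first node device 1. If so, it performs step 204; if not, it ignores the voting requests.
This step further confirms the implementation scenario in which the system has split, the second sub-cluster elected a leader-state node device after the split and that leader has since failed, the first and second sub-clusters have recovered their network connection, and the second sub-cluster is holding an election. Because the second sub-cluster has gone through one election since the split, the running cycle information of the second node devices in it is already one greater than that of the first sub-cluster. The running cycle information can therefore serve as one piece of evidence for the scenario. If the running cycle information in every voting request is greater than that of the first node device 1, the scenario is confirmed and step 204 follows. If the running cycle information in a voting request is not greater than its own, the node device that sent the request has probably failed and the scenario does not hold, so the first node device 1 can ignore the voting requests and continue broadcasting heartbeat information.
204. The first node device 1 determines whether the latest log index in every voting request of the multiple second node devices is not less than the latest log index of the first node device 1. If so, it switches its current working state from the leader state to the follower state; if not, it ignores the voting requests.
Before the system split, the logs of all node devices in the system should have been synchronized with the log of the first node device 1. After the second sub-cluster has served clients for some time, its node devices should therefore hold no fewer logs than the node devices in the first sub-cluster by the time the two sub-clusters recover their network connection, so the latest log index can also serve as evidence for the scenario. If the latest log index in every voting request of the multiple second node devices is not less than that of the first node device 1, the node devices that sent the requests have stored at least as many logs as the first node device 1. The scenario is then fully confirmed, so the first node device 1 switches to the follower state and stops broadcasting heartbeat information. If the latest log index in a voting request is less than its own, the scenario is not confirmed, and the first node device 1 can ignore the voting requests and continue broadcasting heartbeat information.
When the first node device 1 switches to the follower state, it can stop broadcasting heartbeat information, reset its timer, and wait for heartbeat information from the new leader-state node device. If no heartbeat arrives before the timer expires, it can switch to the candidate state and broadcast a voting request, until it either becomes the leader-state node device itself or receives heartbeat information from the new leader-state node device and switches to the follower state.
Alternatively, the first node device 1 can switch its current working state directly to the candidate state, stop broadcasting heartbeat information, and broadcast a voting request, until it either becomes the leader-state node device itself or receives heartbeat information from the new leader-state node device and switches to the follower state.
Note that once the first node device 1 stops broadcasting heartbeat information, the follower-state node devices in the first sub-cluster can switch to the candidate state on their own after their timers expire, until one of them becomes the leader-state node device of the system or they receive heartbeat information from the system's leader-state node device and switch to the follower state. The foregoing node device operation method therefore also allows the first and second sub-clusters to recover into the original system, improving the system's operational reliability.
Note also that the embodiments of this application do not specifically limit the order in which the first node device 1 performs steps 203 and 204. The first node device 1 may instead check the latest log index first and the running cycle information second. Alternatively, to make the check more efficient and merge the first and second sub-clusters into one system as soon as possible, it may check both at the same time. As long as each satisfies its respective condition, the first node device 1 can switch its current working state to the follower state (or the candidate state).
In the embodiments of this application, when multiple voting requests are received, the running cycle information and the latest log index are obtained from them. If every obtained running cycle value is greater than that of the current node device, and every obtained latest log index is not less than the current node device's latest log index, the current node device runs in the follower state or the candidate state. The node device running in the leader state in the first sub-cluster is thereby demoted to the follower or candidate state, so the node devices in the first sub-cluster can take part in the election together with those in the second sub-cluster. Once a new leader-state node device appears, the first sub-cluster and the second sub-cluster can work together as one system, which improves the system's operational reliability.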
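Steps 201 to 204 amount to one decision made by a node in the leader state. The sketch below is one possible reading of that decision, not the applicant's implementation; the class and function names (`State`, `VoteRequest`, `scenario_confirmed`, `leader_next_state`) are hypothetical, and both checks are evaluated together, which the text explicitly allows.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    FOLLOWER = "follower"
    CANDIDATE = "candidate"
    LEADER = "leader"


@dataclass(frozen=True)
class VoteRequest:
    node_id: str
    term: int            # running cycle information
    last_log_index: int  # latest log index


def scenario_confirmed(requests, own_term, own_last_log_index, cluster_size):
    """Quorum check from step 201 plus the comparisons of steps 203 and 204."""
    senders = {r.node_id for r in requests}
    if len(senders) <= cluster_size / 2:
        return False  # requests do not come from more than half of the system
    return all(
        r.term > own_term and r.last_log_index >= own_last_log_index
        for r in requests
    )


def leader_next_state(requests, own_term, own_last_log_index, cluster_size,
                      demote_to=State.FOLLOWER):
    """Return the state a leader should run in after examining the requests."""
    if scenario_confirmed(requests, own_term, own_last_log_index, cluster_size):
        return demote_to  # follower by default, candidate as the alternative
    return State.LEADER   # ignore the requests and keep broadcasting heartbeats


requests = [VoteRequest("node-5", 3, 10), VoteRequest("node-6", 3, 11),
            VoteRequest("node-7", 3, 10)]
result = leader_next_state(requests, own_term=2, own_last_log_index=10,
                           cluster_size=5)
```

The `demote_to` parameter reflects the two alternatives in the text: the leader may step down to the follower state and wait for a heartbeat, or switch straight to the candidate state and broadcast its own voting request.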
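205. The first node device 1 determines a target node device according to the voting requests of the multiple second node devices.
The target node device is the second node device that the first node device 1 intends to vote for. After the checks in steps 203 and 204, from the point of view of the first node device 1, the second node device behind any of the voting requests is qualified to become the leader-state node device. The first node device 1 can therefore follow the order in which the voting requests were received and take the node device whose request arrived first as the target node device. The first node device 1 may also determine the target node device in other ways, which the embodiments of this application do not specifically limit.
206. In response to the target node device's voting request, the first node device 1 sends a voting confirmation message to the target node device.
In this step, the first node device 1 can generate the voting confirmation message based on its own node device identifier and send it to the target node device according to the target node device's identifier.
To let the target node device verify the voter's identity and so improve system security, the first node device 1 can send a voting confirmation message that carries its signature. Each node device in the system can be configured with its own private key and with the public keys of all node devices. When the target node device receives the voting confirmation message, it can extract the signature of the first node device 1 and verify it with the configured public key of the first node device 1.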
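Steps 205 and 206 can be sketched as below. One substitution must be stated plainly: the text describes signing with a private key and verifying with a public key, but Python's standard library has no public-key signatures, so this sketch uses an HMAC over a shared key purely as a stand-in. The message fields and all function names are hypothetical.

```python
import hashlib
import hmac


def pick_target(senders_in_arrival_order):
    """Step 205: vote for the node whose request was received first."""
    return senders_in_arrival_order[0]


def build_vote_confirmation(voter_id, target_id, term, key):
    """Step 206: build a confirmation carrying the voter's signature.

    HMAC-SHA256 is a stand-in for the private-key signature in the text.
    """
    payload = f"{voter_id}|{target_id}|{term}".encode()
    signature = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"voter": voter_id, "target": target_id, "term": term,
            "signature": signature}


def verify_vote_confirmation(message, key):
    """Recompute the signature on the target side and compare it."""
    payload = f"{message['voter']}|{message['target']}|{message['term']}".encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, message["signature"])


key = b"demo-key"  # placeholder secret for the stand-in scheme
target = pick_target(["node-5", "node-6", "node-7"])
confirmation = build_vote_confirmation("node-1", target, 3, key)
```

With real asymmetric signatures the target would verify using the voter's configured public key, so no secret would be shared between nodes; the control flow would otherwise be the same.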
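Note that steps 205 and 206 are optional in the embodiments of this application. Because the second sub-cluster contains more than half of the node devices in the system, the first node device 1 may choose not to respond to any voting request; the system can still elect a leader-state node device. On receiving that leader's heartbeat information, the first node device 1 synchronizes its own running cycle information to the value carried in the heartbeat and thereby rejoins the second sub-cluster as one system.
207. The first node device 1 receives the heartbeat information broadcast by the node device running in the leader state.
Once any candidate-state node device in the system receives voting confirmation messages from more than half of the node devices in the system, it can switch to the leader state and broadcast its own heartbeat information, which the first node device 1 can then receive.
To prevent a node device from masquerading as the leader-state node device and so improve system security, the heartbeat information can carry the signatures that the node devices in the system produced when responding to the voting request of the node device that has switched to the leader state. When the first node device 1 receives the heartbeat information, it can extract each node device's signature and verify it with that node device's configured public key. If every signature passes verification and the number of verified signatures is greater than half the number of node devices in the system, the heartbeat information really does come from the node device running in the leader state; the first node device 1 can then reset its timer and wait for the next heartbeat.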
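The heartbeat check in step 207 reduces to two conditions: every carried signature verifies, and the verified signatures come from more than half of the system. The sketch below assumes the signatures arrive as a mapping from node identifier to signature and takes the verification routine as a parameter; `heartbeat_is_valid`, `fake_verify`, and the signature strings are hypothetical placeholders.

```python
def heartbeat_is_valid(signatures, verify, cluster_size):
    """Accept a heartbeat only if all signatures verify and form a majority.

    `signatures` maps node_id -> signature; `verify(node_id, signature)` is
    whatever public-key check the deployment uses (assumed, not specified).
    """
    if not signatures:
        return False
    valid = sum(1 for node_id, sig in signatures.items() if verify(node_id, sig))
    return valid == len(signatures) and valid > cluster_size / 2


def fake_verify(node_id, signature):
    """Toy verifier for the example: a signature is 'valid' if it matches."""
    return signature == "sig-" + node_id


signatures = {"node-5": "sig-node-5", "node-6": "sig-node-6",
              "node-7": "sig-node-7"}
accepted = heartbeat_is_valid(signatures, fake_verify, cluster_size=5)
```

A node that accepts the heartbeat would then reset its timer and wait for the next one, as the paragraph above describes.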
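To keep the system consistent, the node device running in the leader state can also broadcast a log copy instruction. The first node device 1 receives this instruction and copies the log accordingly, so that the service command most recently received by the system is added to its log. Because BFT-Raft addresses not only consistency among node devices but also node device fraud and data that is tampered with, lost, or out of order, the log copy instruction must carry the signatures that the node devices in the system produced when responding to the voting request of the leader-state node device. The first node device 1 can thus verify the log copy instruction and copy the log only after verification passes.
The following describes how the system operates after the first node device 1 switches to the follower state (or the candidate state):
When the first node device 1 switches to the follower state (or the candidate state), it stops broadcasting heartbeat information. The follower-state node devices in the first sub-cluster therefore receive no heartbeat before their timers expire and switch to the candidate state.
If the election in the second sub-cluster has not yet ended at this point, the node devices in the first sub-cluster that have switched to the candidate state effectively take part in the election together with the node devices in the second sub-cluster. When any of these node devices receives votes from more than half of the node devices in the system, it can switch to the leader state and broadcast heartbeat information. When the other node devices in the system receive the heartbeat, they can confirm that the election has ended, switch to the follower state, and synchronize their own running cycle information with that in the heartbeat. They can subsequently work on the basis of the leader-state node device's heartbeat information, log copy instructions, and so on.
If the election in the second sub-cluster has already ended by the time the follower-state node devices in the first sub-cluster switch to the candidate state, the node device that became the leader in the second sub-cluster broadcasts heartbeat information periodically. A node device in the second sub-cluster that had been running in the candidate state can switch to the follower state when it first receives the heartbeat and synchronize its running cycle information with that in the heartbeat. The first sub-cluster has no leader-state node device, so a node device there running in the follower state can remain in the follower state when it first receives the heartbeat, and a node device running in the candidate state can switch to the follower state; in both cases the node device synchronizes its running cycle information with that in the heartbeat.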
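How each kind of node reacts to the new leader's heartbeat can be summarized in a few lines. The function `on_heartbeat` is a hypothetical name, and the heartbeat is assumed to have been verified already. The leader branch follows the earlier statement that a leader yields to a node with higher running cycle information; rejecting stale heartbeats is deliberately left out because the text does not discuss it.

```python
def on_heartbeat(state, own_term, heartbeat_term):
    """Return (new_state, new_term) after a verified leader heartbeat arrives."""
    if state in ("follower", "candidate"):
        # A follower stays a follower, a candidate steps back to follower;
        # both synchronize their running cycle information to the heartbeat.
        return "follower", heartbeat_term
    if state == "leader" and heartbeat_term > own_term:
        # A leader yields only to a heartbeat carrying a higher term.
        return "follower", heartbeat_term
    return state, own_term


after_candidate = on_heartbeat("candidate", 3, 4)
after_follower = on_heartbeat("follower", 2, 4)
```

Whichever branch is taken, every node that switches ends up in the follower state with the leader's running cycle information, which is what lets the two sub-clusters operate as one system again.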
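The embodiment of FIG. 2 above is described with the first node device 1, which runs in the leader state in the first sub-cluster, as the executing entity. After the first node device 1 stops broadcasting heartbeat information, a node device running in the follower state in the first sub-cluster (called first node device 5) can passively merge with the second sub-cluster into one system. To let the first node device 5 merge with the second sub-cluster efficiently and improve the cluster's reliability, the first node device 5 can also apply the node device operation method provided by the embodiments of this application. For example, FIG. 3 is a flowchart of a node device operation method according to an embodiment of this application. Referring to FIG. 3, the method includes:
301. The first node device 5 receives voting requests from multiple second node devices, where the number of the multiple second node devices is greater than half the number of node devices in the system.
This is the same as step 201 and is not described again here.
302. If the first node device 5 is running in the follower state, it obtains the running cycle information and the latest log index from the voting requests of the multiple second node devices.
This is the same as step 202 and is not described again here.
303. The first node device 5 determines whether the running cycle information in every voting request of the multiple second node devices is greater than the running cycle information of the first node device 5. If so, it performs step 304; if not, it ignores the voting requests.
This is the same as step 203 and is not described again here.
304. The first node device 5 determines whether the latest log index in every voting request of the multiple second node devices is not less than the latest log index of the first node device 5. If so, it switches its current working state from the follower state to the candidate state; if not, it ignores the voting requests.
This is the same as step 204, except that the first node device 5 switches to the candidate state and broadcasts a voting request, until it either receives heartbeat information from the new leader-state node device and switches to the follower state, or receives votes from more than half of the node devices in the system and switches to the leader state.
Alternatively, the first node device 5 can remain in the follower state. When its timer expires, it automatically switches to the candidate state, until it either receives heartbeat information from the new leader-state node device and switches to the follower state, or receives votes from more than half of the node devices in the system and switches to the leader state.
In the embodiments of this application, when multiple voting requests are received, the running cycle information and the latest log index are obtained from them. If every obtained running cycle value is greater than that of the current node device, and every obtained latest log index is not less than the current node device's latest log index, the current node device runs in the follower state or the candidate state. The node device running in the leader state in the first sub-cluster is thereby demoted to the follower or candidate state, so the node devices in the first sub-cluster can take part in the election together with those in the second sub-cluster. Once a new leader-state node device appears, the first sub-cluster and the second sub-cluster can rejoin as one system, which improves the system's operational reliability.
305. The first node device 5 determines a target node device according to the voting requests of the multiple second node devices.
This is the same as step 205 and is not described again here.
306. In response to the target node device's voting request, the first node device 5 sends a voting confirmation message to the target node device.
This is the same as step 206 and is not described again here.
307. The first node device 5 receives the heartbeat information broadcast by the node device running in the leader state.
This is the same as step 207 and is not described again here.
The node device can also continue to take part in the election, which keeps the system-wide election fair.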
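FIG. 4 is a block diagram of a working state switching device according to an embodiment of this application. The working state switching device is applied to a first node device. Referring to FIG. 4, the device includes:
a receiving module 401, configured to receive voting requests from multiple second node devices, where the number of the multiple second node devices is greater than half the number of node devices in the system;
an obtaining module 402, configured to obtain running cycle information and a latest log index from the voting requests of the multiple second node devices if the first node device is running in the leader state; and
a running module 403, configured to switch the working state of the first node device from the leader state to the follower state or the candidate state if the running cycle information in every voting request of the multiple second node devices is greater than the running cycle information of the first node device, and the latest log index in every voting request of the multiple second node devices is not less than the latest log index of the first node device.
In the embodiments of this application, when multiple voting requests are received, the running cycle information and the latest log index are obtained from them. If every obtained running cycle value is greater than that of the first node device, and every obtained latest log index is not less than the first node device's latest log index, the first node device runs in the follower state or the candidate state. The node device running in the leader state in the first sub-cluster is thereby demoted to the follower or candidate state, so the node devices in the first sub-cluster can take part in the election together with those in the second sub-cluster. Once a new leader-state node device appears, the first sub-cluster and the second sub-cluster can work together as one system, which improves the system's operational reliability.
In a possible implementation, the receiving module 401 is configured to: start a timer after the first voting request is received; and, while the timer runs, continue to receive voting requests, stopping once the timer expires.
In a possible implementation, the obtaining module 402 is further configured to obtain the running cycle information and the latest log index from the voting requests of the multiple second node devices if the first node device is running in the follower state;
the running module 403 is further configured to switch the working state of the first node device from the follower state to the candidate state, or keep it in the follower state, if the running cycle information in every voting request of the multiple second node devices is greater than the running cycle information of the first node device, and the latest log index in every voting request of the multiple second node devices is not less than the latest log index of the first node device.
In a possible implementation, based on the device composition of FIG. 4 and referring to FIG. 5, the node device further includes:
a determining module 404, configured to determine a target node device according to the voting requests of the multiple second node devices; and
a sending module 405, configured to send a voting confirmation message to the target node device in response to the voting request of the target node device.
In a possible implementation, the receiving module 401 is further configured to receive heartbeat information broadcast by the node device running in the leader state; or
the receiving module 401 is further configured to receive a log copy instruction broadcast by the node device running in the leader state, and to copy the log based on the log copy instruction.
All of the optional technical solutions above may be combined in any manner to form optional embodiments of this application, which are not described one by one here.
It should be noted that when the device provided by the foregoing embodiment performs the node device operation method, the division into the functional modules above is only an example. In practice, the functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to perform all or part of the functions described above. In addition, the device provided by the foregoing embodiment and the node device operation method embodiments share the same concept. The specific implementation process is described in detail in the method embodiments and is not repeated here.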
图6是本发明实施例提供的一种节点设备结构示意图。参见图6,该节点设备600可以被提供为一服务器。节点设备600包括处理组件622,其进一步包括一个或多个处理器,以及由存储器632所代表的存储器资源,用于存储可由处理部件622的执行的指令,例如应用程序。存储器632中存储的应用程序可以包括一个或一个以上的每一个对应于一组指令的模块。此外,处理组件622被配置为执行指令,以执行下述节点设备运行方法。
接收多个节点设备的投票请求,所述多个节点设备的数量大于系统中节点设备数量的半数;
如果所述节点设备运行于领导状态,则从所述多个节点设备的投票请求中获取运行周期信息和最新日志索引;
如果所述多个节点设备的投票请求中的运行周期信息均大于所述节点设备的运行周期信息,且所述多个节点设备的投票请求中的最新日志索引均不小于所述节点设备的最新日志索引,将所述节点设备的工作状态从所述领导状态切换至跟随状态或候选状态。
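In a possible implementation, the one or more processors are configured to execute the instructions to perform the following steps:
starting a timer after the first voting request is received; and
continuing to receive voting requests while the timer is running, and stopping receiving voting requests once the timer expires.
In a possible implementation, the one or more processors are configured to execute the instructions to perform the following steps:
if the node device is running in the follower state, obtaining running period information and a latest log index from the voting requests of the plurality of node devices; and
if the running period information in each of the voting requests of the plurality of node devices is greater than the running period information of the node device, and the latest log index in each of the voting requests of the plurality of node devices is not less than the latest log index of the node device, switching the working state of the node device from the follower state to the candidate state, or keeping it in the follower state.
In a possible implementation, the one or more processors are configured to execute the instructions to perform the following steps:
determining a target node device according to the voting requests of the plurality of node devices; and
sending, in response to the voting request of the target node device, a vote confirmation message to the target node device.
In a possible implementation, the one or more processors are configured to execute the instructions to perform the following steps:
receiving heartbeat information broadcast by a node device running in the leader state; or
receiving a log replication instruction broadcast by a node device running in the leader state, and replicating the log based on the log replication instruction.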
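The node device 600 may further include a power supply component 626 configured to perform power management of the node device 600, a wired or wireless network interface 650 configured to connect the node device 600 to a network, and an input/output (I/O) interface 658. The node device 600 may operate based on an operating system stored in the memory 632, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a computer-readable storage medium is further provided. The computer-readable storage medium stores instructions that can be executed by a processor in a device to perform the node device running method in the foregoing embodiments. For example, the computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
A person of ordinary skill in the art will understand that all or some of the steps of the foregoing embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing descriptions are merely preferred embodiments of this application and are not intended to limit this application. Any modification, equivalent replacement, or improvement made within the spirit and principles of this application shall fall within the protection scope of this application.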

Claims (16)

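  1. A node device running method, applied to a first node device, the method comprising:
    receiving, by the first node device, voting requests of a plurality of second node devices, the number of the plurality of second node devices being greater than half of the number of node devices in a system;
    if the first node device is running in a leader state, obtaining running period information and a latest log index from the voting requests of the plurality of second node devices; and
    if the running period information in each of the voting requests of the plurality of second node devices is greater than the running period information of the first node device, and the latest log index in each of the voting requests of the plurality of second node devices is not less than the latest log index of the first node device, switching the working state of the first node device from the leader state to a follower state or a candidate state.
  2. The method according to claim 1, wherein the receiving voting requests of a plurality of second node devices comprises:
    starting a timer after the first voting request is received; and
    continuing to receive voting requests while the timer is running, and stopping receiving voting requests once the timer expires.
  3. The method according to claim 1, wherein after the receiving voting requests of a plurality of second node devices, the method further comprises:
    if the first node device is running in the follower state, obtaining running period information and a latest log index from the voting requests of the plurality of second node devices; and
    if the running period information in each of the voting requests of the plurality of second node devices is greater than the running period information of the first node device, and the latest log index in each of the voting requests of the plurality of second node devices is not less than the latest log index of the first node device, switching the working state of the first node device from the follower state to the candidate state, or keeping it in the follower state.
  4. The method according to claim 1, wherein after the switching of the working state of the first node device from the leader state to the follower state or the candidate state if the running period information in each of the voting requests of the plurality of second node devices is greater than the running period information of the first node device and the latest log index in each of the voting requests of the plurality of second node devices is not less than the latest log index of the first node device, the method further comprises:
    determining a target node device according to the voting requests of the plurality of second node devices; and
    sending, in response to the voting request of the target node device, a vote confirmation message to the target node device.
  5. The method according to claim 1, wherein after the switching of the working state of the first node device from the leader state to the follower state or the candidate state if the running period information in each of the voting requests of the plurality of second node devices is greater than the running period information of the first node device and the latest log index in each of the voting requests of the plurality of second node devices is not less than the latest log index of the first node device, the method further comprises:
    receiving heartbeat information broadcast by a node device running in the leader state; or
    receiving a log replication instruction broadcast by a node device running in the leader state, and replicating the log based on the log replication instruction.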
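  6. A working state switching apparatus, applied to a first node device, the apparatus comprising:
    a receiving module, configured to receive voting requests of a plurality of second node devices, the number of the plurality of second node devices being greater than half of the number of node devices in a system;
    an obtaining module, configured to: if the first node device is running in a leader state, obtain running period information and a latest log index from the voting requests of the plurality of second node devices; and
    a running module, configured to: if the running period information in each of the voting requests of the plurality of second node devices is greater than the running period information of the first node device, and the latest log index in each of the voting requests of the plurality of second node devices is not less than the latest log index of the first node device, switch the working state of the first node device from the leader state to a follower state or a candidate state.
  7. The apparatus according to claim 6, wherein the receiving module is configured to:
    start a timer after the first voting request is received; and
    continue to receive voting requests while the timer is running, and stop receiving voting requests once the timer expires.
  8. The apparatus according to claim 6, wherein
    the obtaining module is further configured to: if the first node device is running in the follower state, obtain running period information and a latest log index from the voting requests of the plurality of second node devices; and
    the running module is further configured to: if the running period information in each of the voting requests of the plurality of second node devices is greater than the running period information of the first node device, and the latest log index in each of the voting requests of the plurality of second node devices is not less than the latest log index of the first node device, switch the working state of the first node device from the follower state to the candidate state, or keep it in the follower state.
  9. The apparatus according to claim 6, wherein the apparatus further comprises:
    a determining module, configured to determine a target node device according to the voting requests of the plurality of second node devices; and
    a sending module, configured to send, in response to the voting request of the target node device, a vote confirmation message to the target node device.
  10. The apparatus according to claim 6, wherein
    the receiving module is further configured to receive heartbeat information broadcast by a node device running in the leader state; or
    the receiving module is further configured to receive a log replication instruction broadcast by a node device running in the leader state, and replicate the log based on the log replication instruction.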
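  11. A node device, wherein the node device comprises:
    one or more processors; and
    one or more memories, the one or more memories being configured to store instructions to be executed by the one or more processors;
    the one or more processors being configured to execute the instructions to perform the steps of the following node device running method:
    receiving voting requests of a plurality of node devices, the number of the plurality of node devices being greater than half of the number of node devices in a system;
    if the node device is running in a leader state, obtaining running period information and a latest log index from the voting requests of the plurality of node devices; and
    if the running period information in each of the voting requests of the plurality of node devices is greater than the running period information of the node device, and the latest log index in each of the voting requests of the plurality of node devices is not less than the latest log index of the node device, switching the working state of the node device from the leader state to a follower state or a candidate state.
  12. The node device according to claim 11, wherein the one or more processors are configured to execute the instructions to perform the following steps:
    starting a timer after the first voting request is received; and
    continuing to receive voting requests while the timer is running, and stopping receiving voting requests once the timer expires.
  13. The node device according to claim 11, wherein the one or more processors are configured to execute the instructions to perform the following steps:
    if the node device is running in the follower state, obtaining running period information and a latest log index from the voting requests of the plurality of node devices; and
    if the running period information in each of the voting requests of the plurality of node devices is greater than the running period information of the node device, and the latest log index in each of the voting requests of the plurality of node devices is not less than the latest log index of the node device, switching the working state of the node device from the follower state to the candidate state, or keeping it in the follower state.
  14. The node device according to claim 11, wherein the one or more processors are configured to execute the instructions to perform the following steps:
    determining a target node device according to the voting requests of the plurality of node devices; and
    sending, in response to the voting request of the target node device, a vote confirmation message to the target node device.
  15. The node device according to claim 11, wherein the one or more processors are configured to execute the instructions to perform the following steps:
    receiving heartbeat information broadcast by a node device running in the leader state; or
    receiving a log replication instruction broadcast by a node device running in the leader state, and replicating the log based on the log replication instruction.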
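  16. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the node device running method according to any one of claims 1 to 5.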
PCT/CN2018/083594 2017-04-20 2018-04-18 Node device running method, working state switching apparatus, node device, and medium WO2018192533A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP18787276.7A EP3562123B1 (en) 2017-04-20 2018-04-18 Node device running method, working state switching device, node device, and medium
US16/510,723 US10833919B2 (en) 2017-04-20 2019-07-12 Node device operation method, work status switching apparatus, node device, and medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710262463.9 2017-04-20
CN201710262463.9A CN107105032B (zh) 2017-04-20 2017-04-20 Node device running method and node device

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/510,723 Continuation US10833919B2 (en) 2017-04-20 2019-07-12 Node device operation method, work status switching apparatus, node device, and medium

Publications (1)

Publication Number Publication Date
WO2018192533A1 (zh) 2018-10-25

Family

ID=59656639

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/083594 WO2018192533A1 (zh) 2017-04-20 2018-04-18 Node device running method, working state switching apparatus, node device, and medium

Country Status (4)

Country Link
US (1) US10833919B2 (zh)
EP (1) EP3562123B1 (zh)
CN (2) CN110233905B (zh)
WO (1) WO2018192533A1 (zh)

Also Published As

Publication number Publication date
CN110233905B (zh) 2020-12-25
US10833919B2 (en) 2020-11-10
US20190342149A1 (en) 2019-11-07
EP3562123A1 (en) 2019-10-30
CN110233905A (zh) 2019-09-13
EP3562123A4 (en) 2020-08-05
EP3562123B1 (en) 2021-08-11
CN107105032B (zh) 2019-08-06
CN107105032A (zh) 2017-08-29

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18787276

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2018787276

Country of ref document: EP

Effective date: 20190725

NENP Non-entry into the national phase

Ref country code: DE