WO2017147517A1 - Lease-based heartbeat protocol method and apparatus - Google Patents

Publication number
WO2017147517A1
Authority
WO
WIPO (PCT)
Prior art keywords
heartbeat request
retry
request response
sending
heartbeat
Prior art date
Application number
PCT/US2017/019493
Other languages
French (fr)
Inventor
Zhiyang TANG
Yijun Lu
Yunfeng TAO
Yunfeng Zhu
Original Assignee
Alibaba Group Holding Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Limited
Priority to EP17757371.4A (patent EP3420463B1)
Publication of WO2017147517A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/10 Active monitoring, e.g. heartbeat, ping or trace-route
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/14 Session management
    • H04L67/143 Termination or inactivation of sessions, e.g. event-controlled end of session
    • H04L67/145 Termination or inactivation of sessions, e.g. event-controlled end of session avoiding end of session, e.g. keep-alive, heartbeats, resumption message or wake-up for inactive or interrupted session
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0681 Configuration of triggering conditions
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/10 Active monitoring, e.g. heartbeat, ping or trace-route
    • H04L43/103 Active monitoring, e.g. heartbeat, ping or trace-route with adaptive polling, i.e. dynamically adapting the polling rate
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/21 Monitoring or handling of messages
    • H04L51/23 Reliability checks, e.g. acknowledgments or fault reporting
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols

Definitions

  • the present disclosure relates to the field of computers, and in particular, to lease-based heartbeat protocol technologies.
  • a coarse-grained mutex mechanism can ensure that only one client terminal can occupy a lock at one time.
  • An implementation of a lock relies on a lease-based session maintaining mechanism, and this session maintaining mechanism ensures that the client terminal detects a timeout at a time earlier than a server terminal detects the timeout upon the timeout of a session.
  • a client terminal may notify an application layer that a distributed lock is lost, and a server terminal may release this original lock after detecting the timeout, so that other client terminals may contend for the lock.
  • the foregoing maintenance of a session mainly relies on heartbeat(s) between a client terminal and a server terminal.
  • When a heartbeat protocol is designed, a primary objective is to ensure that a session timeout of the client terminal occurs prior to that of the server terminal in situations where a quick and automatic recovery cannot be realized, e.g., network isolation or server shutdown. In this way, the correctness of a lock service can be ensured.
  • a client terminal may attempt to report lock loss events to an application program as rarely as possible, which can ensure the stability of the system.
  • a reliability coordination system (such as Zookeeper) in a distributed system adopts a design for heartbeats in which sending and receiving thereof are independent of each other.
  • a server terminal (i.e., Server or server device)
  • a client terminal (i.e., Client or client device)
  • the client terminal sends heartbeat requests to the server terminal at fixed sending intervals (1/3 of a session lease period by default). Sending of a current heartbeat request is only driven by time intervals, and does not depend upon whether a response to a previous heartbeat has arrived.
  • After receiving the heartbeat request, the server terminal updates the lease of the current session to a future moment one full session lease period ahead, as long as the current session has not expired, and immediately returns a heartbeat request response.
  • the current lease of the client terminal is extended forward to a future moment corresponding to 2/3 of the session lease period. If the lease of the client terminal expires, the client terminal (Client) of the reliability coordination system (ZooKeeper) in the distributed system will directly send an event, which is referred to as a session event, to an application layer, to inform an application program that the session has expired.
  • Under the heartbeat protocol of the reliability coordination system (ZooKeeper) in the distributed system, if a temporary network isolation occurs after the client terminal (Client) last successfully received a heartbeat request response, the client terminal (Client) has a buffer time of 2/3 of the session lease period to complete a retry of a heartbeat request.
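The timing of this fixed-interval scheme can be illustrated with a minimal sketch. The function name `fixed_interval_deadlines`, the millisecond units, and the choice to measure each lease from the moment the respective side receives a message are assumptions made for illustration; only the 1/3, 2/3, and full-lease fractions come from the description above.

```python
def fixed_interval_deadlines(lease_ms, server_recv_ms, client_recv_ms):
    """Return (send_interval, client_deadline, server_deadline) in milliseconds.

    lease_ms:       length of one session lease period
    server_recv_ms: time at which the server received the heartbeat request
    client_recv_ms: time at which the client received the response
    """
    send_interval = lease_ms / 3                          # default fixed sending interval
    server_deadline = server_recv_ms + lease_ms           # server extends by one full lease
    client_deadline = client_recv_ms + 2 * lease_ms / 3   # client extends by 2/3 of a lease
    return send_interval, client_deadline, server_deadline


# Hypothetical 30-second lease with a 50 ms delay between server and client receipt.
interval, client_deadline, server_deadline = fixed_interval_deadlines(30_000, 100, 150)
print(interval, client_deadline, server_deadline)
```

Under these assumed numbers the client deadline falls before the server deadline, which is the ordering the session mechanism relies on.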
  • the client terminal (Client) of the reliability coordination system (ZooKeeper) in the distributed system simply repeats heartbeat requests at a fixed sending interval. Since the client terminal (Client) lacks a reasonable retry logic to cope with various unexpected communication exceptions, the unreasonable retry logic for sending heartbeat requests places a heavy impact and pressure on network nodes and the server terminal (Server).
  • the retry logic is overly simple, and as a result, an application program may lose a lock due to a temporary network exception, increasing the sensitivity of the client terminal (Client) with respect to network failures.
  • An objective of the present disclosure is to provide a lease-based heartbeat protocol method and an apparatus thereof, to decrease the sensitivity of a client device with respect to network failures by solving the problems in the existing technologies.
  • the use of a heartbeat protocol of a reliability coordination system in a distributed system to maintain a session between a server terminal and a client terminal has caused an unreasonable retry logic for sending heartbeat requests, which leads to a huge impact and pressure on network nodes and the server terminal (Server), and the loss of a lock by an application program due to a temporary network exception.
  • a lease-based heartbeat protocol method may include sending a heartbeat request to a server device in a lease period, and receiving a heartbeat request response from the server device; and determining a retry sending interval based on a reverse exponential backoff algorithm in response to the heartbeat request response being abnormal, and sending a retry heartbeat request to the server device again after the retry sending interval, until the lease period expires or the corresponding heartbeat request response is normal.
  • a lease-based heartbeat protocol method may include determining a backoff sending interval based on an anomaly determining time point of a heartbeat request response and a lease expiration time point using a reverse exponential backoff algorithm in response to the heartbeat request response being abnormal; obtaining a random time interval correction based on a random-time function and the backoff sending interval; and determining a retry sending interval based on the random time interval correction and the backoff sending interval.
  • a lease-based heartbeat protocol device may include a sending and receiving unit to send a heartbeat request to a server device in a lease period, and receive a heartbeat request response from the server device; and a retry unit to determine a retry sending interval based on a reverse exponential backoff algorithm in response to the heartbeat request response being abnormal, and send a retry heartbeat request to the server device again after the retry sending interval, until the lease period expires or the corresponding heartbeat request response is normal.
  • the retry unit may further determine a backoff sending interval based on an anomaly determining time point of the heartbeat request response and a lease expiration time point using a reverse exponential backoff algorithm; obtain a random time interval correction based on a random-time function and the backoff sending interval; and determine the retry sending interval based on the random time interval correction and the backoff sending interval.
  • the disclosed lease-based heartbeat protocol method and apparatus send a heartbeat request to a server device in a lease period, and receive a heartbeat request response from the server device, determine a retry sending interval based on a reverse exponential backoff algorithm when the heartbeat request response is abnormal, and send a retry heartbeat request to the server device again after the retry sending interval, till the lease period expires or the corresponding heartbeat request response is normal.
  • the retry sending interval is determined based on the reverse exponential backoff algorithm, and the retry heartbeat request is sent to the server device again after the retry sending interval.
  • in the early heartbeat request retries, two successive retry heartbeat requests can be sent at a relatively large sending interval, thereby reducing the impact and pressure of the heartbeat requests on network nodes and the server device.
  • in later retries, the sending interval of the heartbeat request retries is reduced, such that re-sent heartbeat requests can be sent at a higher frequency, thereby effectively improving the success rate of recovering from a network failure while ensuring network stability and reducing network pressure.
  • the disclosed lease-based heartbeat protocol method and apparatus may determine a backoff sending interval based on an anomaly determining time point of the heartbeat request response and a lease expiration time point using a reverse exponential backoff algorithm, obtain a random time interval correction based on a random-time function and the backoff sending interval, and determine the retry sending interval based on the random time interval correction and the backoff sending interval.
  • the random time interval correction for the backoff sending interval for sending heartbeat requests is obtained based on the random-time function, and the retry sending interval is determined based on the random correction of the interval and the backoff sending interval. Therefore, a resonance effect caused by heartbeat requests simultaneously sent by multiple client devices to the server device is avoided to a certain extent, thus effectively protecting the network nodes and the server device.
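The effect of the random correction on many clients can be illustrated with a small simulation. Everything in it is an assumption for illustration: the function name `distinct_send_times`, the uniform distribution, and the client count. The text specifies only that a random-time function is applied to the backoff sending interval.

```python
import random


def distinct_send_times(num_clients, backoff_ms, max_correction_ms, seed=0):
    """Count distinct retry send times among clients that share one backoff interval."""
    rng = random.Random(seed)  # seeded so the simulation is repeatable
    send_times = set()
    for _ in range(num_clients):
        # Each client shifts its retry by its own random correction.
        correction = rng.uniform(-max_correction_ms, max_correction_ms)
        send_times.add(backoff_ms + correction)
    return len(send_times)


# Without a correction, every client retries at the same instant.
print(distinct_send_times(100, 320, 0))
# With a correction, the retries spread over a window around the backoff interval.
print(distinct_send_times(100, 320, 40))
```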
  • FIG. 1 is a flowchart of a lease-based heartbeat protocol method according to an aspect of the present disclosure.
  • FIG. 2 is a flowchart illustrating distribution of retry sending intervals in a lease-based heartbeat protocol method according to an aspect of the present disclosure.
  • FIG. 3 is a flowchart of determining an anomaly determining time point according to an aspect of the present disclosure.
  • FIG. 4 is a flowchart of a lease-based heartbeat protocol method when a heartbeat request response is normal according to an aspect of the present disclosure.
  • FIG. 5 is a structural diagram of a client device for a lease-based heartbeat protocol according to an aspect of the present disclosure.
  • FIG. 1 is a flowchart of a lease-based heartbeat protocol method 100 according to an aspect of the present disclosure.
  • the method 100 may include S102 and S104.
  • S102 sends a heartbeat request to a server device in a lease period, and receives a heartbeat request response from the server device.
  • S104 determines a retry sending interval based on a reverse exponential backoff algorithm in response to the heartbeat request response being abnormal, and sends a retry heartbeat request to the server device again after the retry sending interval has passed, until the lease period expires or a corresponding heartbeat request response is normal.
  • heart-beating between a client device and a server device in the embodiments of the present disclosure is a periodic process. In a normal situation, a heartbeat period may be divided into three stages.
  • the first stage corresponds to a period from sending of a heartbeat request by the client device to receiving of the heartbeat request by the server device.
  • the second stage corresponds to a period from sending of a heartbeat request response by the server device to receiving of the heartbeat request response by the client device.
  • the third stage corresponds to a period in which the client device waits for a protocol sending interval (Send Interval). These three stages form one heartbeat period that cycles continuously between the client device and server device. In an abnormal situation, the first stage and the second stage may not be successfully completed in one try, and therefore, heartbeat request(s) need(s) to be retried in the lease period.
  • the heartbeat request(s) may be retried multiple times, and the client device may wait for a retry sending interval before each retry.
  • the retries of the heartbeat request(s) are not performed for an infinite number of times, and are dependent on the lease period of the client device.
  • FIG. 2 is a flowchart 200 illustrating distribution of retry sending intervals in a lease-based heartbeat protocol method according to an aspect of the present disclosure.
  • a client device sends a heartbeat request to a server device (server), and receives a heartbeat request response from the server device at S102.
  • a first heartbeat request response is abnormal and a first retry heartbeat request needs to be sent to the server device.
  • a first retry sending interval is $T_1$ as shown in FIG. 2
  • a reverse exponential backoff algorithm is performed on the first retry sending interval $T_1$ if the retry heartbeat request needs to be sent again subsequently.
  • a second retry heartbeat request needs to be sent again after the second retry sending interval $T_2$.
  • corresponding retry sending intervals obtained based on the reverse exponential backoff algorithm are $T_3, T_4, T_5, \ldots, T_m$ respectively, wherein m is the number of times that the retry heartbeat request needs to be sent after a current retry sending interval.
  • the retry heartbeat request is sent to the server device again after the retry sending interval, until the lease period expires or a corresponding heartbeat request response is normal.
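The retry loop described above can be sketched on a virtual clock. This is a minimal illustration under stated assumptions: the names `retry_until_lease_expires`, `send_retry`, and `next_interval` are hypothetical, responses are treated as arriving instantly, and the `max_retries` safety bound is an addition that the text does not mention.

```python
from typing import Callable, Tuple


def retry_until_lease_expires(
    send_retry: Callable[[float], bool],
    next_interval: Callable[[int], float],
    anomaly_time: float,
    lease_expiry: float,
    max_retries: int = 1000,
) -> Tuple[bool, int]:
    """Retry until a normal response arrives or the lease period expires.

    send_retry(t) returns True when the retry sent at time t gets a normal response.
    next_interval(m) returns the m-th retry sending interval.
    Returns (recovered, retries_sent).
    """
    now = anomaly_time
    retries_sent = 0
    while retries_sent < max_retries:
        now += next_interval(retries_sent + 1)  # wait for the next retry interval
        if now >= lease_expiry:
            return False, retries_sent          # lease expired before this retry
        retries_sent += 1
        if send_retry(now):
            return True, retries_sent           # normal response: heartbeat restored
    return False, retries_sent                  # safety bound reached (assumption)
```

Passing the interval rule in as `next_interval` keeps the loop independent of how the intervals are computed.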
  • S104 determines a retry sending interval based on a reverse exponential backoff algorithm in response to the heartbeat request response being abnormal, and sends a retry heartbeat request to the server device again after the retry sending interval, until the lease period expires or a corresponding heartbeat request response is normal. Specifically, S104 determines the retry sending interval based on an anomaly determining time point of the heartbeat request response and a lease expiration time point using the reverse exponential backoff algorithm when the heartbeat request response is abnormal.
  • the lease expiration time point T1 at S104 may be either a lease of the server device or a lease of the client device, because the lease of the client device is determined by subtracting the anomaly determining time point of the heartbeat request response from the lease of the server device.
  • a value of K which is used for indicating a reverse backoff degree of the reverse exponential backoff algorithm, is greater than 1. In implementations, K may be equal to 2.
  • the heartbeat request is sent to the server device at a time moment 0, and the lease expiration time point is T1.
  • the heartbeat request response from the server device is abnormal (with the anomaly determining time point of the heartbeat request response being T2 as shown in FIG. 2)
  • the first retry sending interval is $T_1 = (T1 - T2)/2^{1}$, counted from the anomaly determining time point T2 of the abnormal heartbeat request response (the subscripted $T_1$ denotes the retry sending interval, while T1 denotes the lease expiration time point).
  • the first retry heartbeat request response is also abnormal (with an anomaly determining time point of the first retry heartbeat request response being t2 as shown in FIG.
  • the third retry sending interval is $T_3 = (T1 - T2)/2^{3}$, counted from the anomaly determining time point t3 of the abnormal second retry heartbeat request response.
  • an N-th retry heartbeat request response is abnormal at an anomaly determining time point t(N-1) of the (N-1)-th retry heartbeat request response
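The interval rule can be made concrete with a short sketch. It assumes the general form (T1 - T2) / K**n for the n-th retry sending interval, which is extrapolated from the first and third intervals given above and is not stated in the text in this form. The function names and the sample numbers are hypothetical, and the schedule ignores the time spent waiting for each response.

```python
def retry_interval(lease_expiry, first_anomaly, n, k=2.0):
    """n-th retry sending interval, assumed to be (T1 - T2) / K**n with K > 1."""
    if k <= 1:
        raise ValueError("K must be greater than 1")
    if n < 1:
        raise ValueError("retry index n starts at 1")
    return (lease_expiry - first_anomaly) / k ** n


def retry_send_times(lease_expiry, first_anomaly, retries, k=2.0):
    """Idealized send times, with each interval counted from the previous retry."""
    t = first_anomaly
    times = []
    for n in range(1, retries + 1):
        t += retry_interval(lease_expiry, first_anomaly, n, k)
        times.append(t)
    return times


# Hypothetical numbers in milliseconds: lease expires at 52 000, first anomaly at 20 000.
print([retry_interval(52_000, 20_000, n) for n in range(1, 5)])
print(retry_send_times(52_000, 20_000, 4))
```

With K = 2 the intervals halve on each retry, so their running total stays below T1 - T2 and every retry in this idealized schedule is sent before the lease expires.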
  • the heartbeat request is sent to the server device at a time moment 0 and the lease expiration time point T1 is 00:00:52.
  • if a retry heartbeat request response received from the server device is normal at a certain time point 00:00:32 before the lease expiration time point T1 (which is 00:00:52), this indicates that a heartbeat between the server device and the client device is successfully established in the current lease period.
  • if an N-th retry heartbeat request response is still abnormal, and the remaining time between the anomaly determining time point t(N) of the N-th retry heartbeat request response and the lease expiration time point T1 (which is 00:00:52) is 150 ms (the remaining time between t(N) and T1 is merely an example here), heart-beating between the server device and the client device is considered to be disrupted at the anomaly determining time point t(N) of the N-th retry heartbeat request response because no heartbeat request response from the server device can be received within the 150 ms.
  • determining the anomaly determining time point at S104 may include determining that the heartbeat request response is abnormal when the heartbeat request response is received and the heartbeat request response includes content indicating that the heartbeat request is an illegitimate request or the heartbeat request response is an error response to the heartbeat request, and determining a receiving time of the heartbeat request response as the anomaly determining time point.
  • determining the anomaly determining time point at S104 may include determining that the heartbeat request response is abnormal when the heartbeat request response is not received before a timeout, and determining a time point of the timeout as the anomaly determining time point.
  • FIG. 3 shows a flowchart 300 of determining an anomaly determining time point according to an aspect of the present disclosure.
  • a client device (client) receives a heartbeat request response I from a server device (server) after sending a heartbeat request to the server device
  • the heartbeat request response I is determined as abnormal when it indicates that the content of the corresponding heartbeat request is illegitimate request content, or when the heartbeat request response I is an error response to the heartbeat request
  • a receiving time t of the heartbeat request response I is determined as the anomaly determining time point T2.
  • when a heartbeat request response II is not received before a timeout, the heartbeat request response II is determined as abnormal, and the time point of the timeout, t(RT), is determined as the anomaly determining time point T2.
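The two ways of determining an anomaly can be sketched as a single function. The `HeartbeatResponse` fields and the function name `anomaly_time_point` are hypothetical and chosen for illustration; the text does not define a response format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeartbeatResponse:
    received_at: float                  # time the response arrived at the client
    illegitimate_request: bool = False  # response says the request was illegitimate
    error: bool = False                 # response is an error response


def anomaly_time_point(response: Optional[HeartbeatResponse],
                       timeout_at: float) -> Optional[float]:
    """Return the anomaly determining time point, or None if the response is normal."""
    if response is None or response.received_at > timeout_at:
        return timeout_at               # nothing arrived before the timeout
    if response.illegitimate_request or response.error:
        return response.received_at     # abnormal content: use the receiving time
    return None                         # normal response, no anomaly
```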
  • S104 determines a retry sending interval based on a reverse exponential backoff algorithm when a heartbeat request response is abnormal, and sends a retry heartbeat request to the server device again after the retry sending interval, until the lease period expires or a corresponding heartbeat request response is normal.
  • S104 may determine a backoff sending interval based on an anomaly determining time point of the heartbeat request response and a lease expiration time point using a reverse exponential backoff algorithm, obtain a random time interval correction based on a random-time function and the backoff sending interval, and determine the retry sending interval based on the random correction of the interval and the backoff sending interval.
  • the time midpoint of the random time interval correction at S104 is the same as an expiration time point of the backoff sending interval.
  • the expiration time point of the backoff sending interval is 00:00:32
  • the time midpoint of the random time interval correction will be 00:00:32, wherein a random correction for time intervals may be several milliseconds, tens of milliseconds, or even longer.
  • the expiration time point of the backoff sending interval being 00:00:32 at S104 is merely an example, and other existing or future possible specific values of the expiration time point of the backoff sending interval, if applicable to the present disclosure, should also be included in the scope of protection of the present disclosure, and are incorporated herein by reference.
  • a backoff sending interval is determined to be 320 ms based on an anomaly determining time point of the heartbeat request response and a lease expiration time point.
  • a retry sending interval that is determined based on the random time interval correction and the backoff sending interval will be 320 ms ± 40 ms, that is, a retry heartbeat request is randomly sent to a server device within the retry sending interval 320 ms ± 40 ms.
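A minimal sketch of applying the random correction to one backoff sending interval follows. The uniform distribution and the name `retry_sending_interval` are assumptions; the text says only that a random-time function produces the correction.

```python
import random
from typing import Optional


def retry_sending_interval(backoff_ms: float, max_correction_ms: float,
                           rng: Optional[random.Random] = None) -> float:
    """Backoff interval plus a random correction centred on the backoff interval."""
    if rng is None:
        rng = random.Random()
    # Assumed uniform correction in [-max_correction_ms, +max_correction_ms].
    correction = rng.uniform(-max_correction_ms, max_correction_ms)
    return backoff_ms + correction


# The 320 ms backoff and the 40 ms correction bound are the figures used in the text.
interval = retry_sending_interval(320, 40, random.Random(1))
print(f"retry after {interval:.1f} ms")
```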
  • the lease-based heartbeat protocol method 100 may further include, at S106, sending a heartbeat request to the server device again after a protocol sending interval (Send Interval) when the heartbeat request response is normal.
  • a heartbeat request is sent to the server device again after the protocol Send Interval has elapsed since a receiving time of the heartbeat request response.
  • S106 may further send a heartbeat request to the server device again after a protocol Send Interval when the heartbeat request response is normal. Specifically, S106 may determine a re-initiation time of the heartbeat request according to a receiving time of the heartbeat request response and the protocol Send Interval if the heartbeat request response is normal, and send the heartbeat request to the server device at the re-initiation time.
  • FIG. 4 shows a flowchart 400 of a lease-based heartbeat protocol method 400 according to an aspect of the present disclosure when a heartbeat request response is normal.
  • a client device sends a heartbeat request to a server device (server) and receives a heartbeat request response from the server device.
  • the client device determines that a re-initiation time for a heartbeat request is t(normal) + ΔT according to a receiving time t(normal) of the heartbeat request response and the protocol Send Interval ΔT, and sends the heartbeat request to the server device at the re-initiation time t(normal) + ΔT.
  • a random time may also be obtained from the protocol Send Interval based on the receiving time of the heartbeat request response using the random-time function as described in the foregoing embodiment of the present disclosure, thereby determining the re-initiation time of the heartbeat request.
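The re-initiation time after a normal response can be sketched as follows. The function name `reinitiation_time`, the optional symmetric correction, and its uniform distribution are assumptions for illustration; the text states only that the time is derived from the receiving time and the protocol Send Interval, optionally with a random-time function.

```python
import random
from typing import Optional


def reinitiation_time(t_normal: float, send_interval: float,
                      max_correction: float = 0.0,
                      rng: Optional[random.Random] = None) -> float:
    """Time at which the next heartbeat request is sent after a normal response."""
    if rng is None:
        rng = random.Random()
    correction = 0.0
    if max_correction > 0:
        # Optional random correction around the nominal re-initiation time (assumption).
        correction = rng.uniform(-max_correction, max_correction)
    return t_normal + send_interval + correction


# Hypothetical values in milliseconds: response received at 1 000, Send Interval 5 000.
print(reinitiation_time(1_000.0, 5_000.0))
```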
  • FIG. 5 is a structural diagram of a client device 500 for a lease-based heartbeat protocol according to an aspect of the present disclosure.
  • the client device 500 may include one or more computing devices.
  • the client device 500 may include one or more processors 502, an input/output (I/O) interface 504, a network interface 506, and memory 508
  • the memory 508 may include a form of computer-readable media, e.g., a non-permanent storage device, random-access memory (RAM) and/or a nonvolatile internal storage, such as read-only memory (ROM) or flash RAM.
  • the computer-readable media may include permanent or non-permanent, removable or non-removable media, which may achieve storage of information using any method or technology.
  • the information may include a computer-readable instruction, a data structure, a program module or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electronically erasable programmable read-only memory (EEPROM), quick flash memory or other internal storage technology, compact disk read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassette tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission media, which may be used to store information that may be accessed by a computing device.
  • the computer-readable media does not include transitory media, such as modulated data signals and carrier waves.
  • the memory 508 may include program units 510 and program data 512.
  • the program units 510 may include a sending and receiving unit 514 and a retry unit 516.
  • the sending and receiving unit 514 may send a heartbeat request to a server device within a lease period, and receive a heartbeat request response from the server device.
  • the retry unit 516 may determine a retry sending interval based on a reverse exponential backoff algorithm when the heartbeat request response is abnormal, and send a retry heartbeat request to the server device again after the retry sending interval, until the lease period expires or a corresponding heartbeat request response is normal.
  • the device 500 may include, but is not limited to, a user device, or a device formed from an integration of user device(s) and network device(s) via a network.
  • a user device may include, but is not limited to, any type of mobile electronic product.
  • a mobile electronic product may use any operating system, such as an Android operating system, an iOS operating system, etc.
  • a network device may include an electronic device that is able to automatically perform numerical computation and information processing according to preset or pre-stored instruction(s), and hardware thereof may include, but is not limited to, a microprocessor, an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, etc.
  • a network may include, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN network, a wireless ad-hoc network (Ad Hoc network), etc.
  • the device 500 may also be a script program running on a device that is formed from an integration of user device(s) and network device(s) via a network.
  • the foregoing device 500 is merely an example, and other existing or future possible devices 500, if applicable to the present disclosure, should also be included in the scope of protection of the present disclosure, and are incorporated herein by reference.
  • the sending and receiving unit 514 may continuously send a heartbeat request to a server device in a lease period, and receive a heartbeat request response from the server device.
  • the retry unit 516 may continuously determine a retry sending interval based on a reverse exponential backoff algorithm in an event that the heartbeat request response is abnormal, and send a retry heartbeat request to the server device again after the retry sending interval, until the lease period expires or a corresponding heartbeat request response is normal.
  • FIG. 2 is a flowchart illustrating distribution of a retry sending interval in a lease-based heartbeat protocol method according to an aspect of the present disclosure.
  • the sending and receiving unit 514 of the client device sends a heartbeat request to the server device (server), and receives a heartbeat request response from the server device.
  • the retry unit 516 when a first heartbeat request response is abnormal and a first retry heartbeat request needs to be sent to the server device, if a first retry sending interval is T 1 in FIG.
  • the reverse exponential backoff algorithm is performed on the first retry sending interval T 1 when the retry heartbeat request needs to be sent again subsequently, and after it is obtained that the first retry heartbeat request response is abnormal as shown in FIG. 2, a second retry heartbeat request needs to be sent again after a second retry sending interval T 2 ; if subsequent retry heartbeat request responses are abnormal, retry sending intervals obtained based on the reverse exponential backoff algorithm are sequentially T 3 , T 4 , T 5 ...T m , wherein m is the number of times that the retry heartbeat request needs to be sent after a current retry sending interval; the retry heartbeat request is sent to the server device again after the retry sending interval, till the lease expires or the corresponding heartbeat request response is normal.
  • the sending and receiving unit 514 of the client device sends a heartbeat request to a server device (server), and receives a heartbeat request response from the server device. If a first heartbeat request response is abnormal, a first retry heartbeat request needs to be sent to the server device. If a first retry sending interval is T1 as shown in FIG. 2, the retry unit 516 performs a reverse exponential backoff algorithm on the first retry sending interval T1 if the retry heartbeat request needs to be sent again subsequently.
  • a second retry heartbeat request needs to be sent again after the second retry sending interval T2.
  • corresponding retry sending intervals obtained based on the reverse exponential backoff algorithm are T3, T4, T5...Tm respectively, wherein m is the number of times that the retry heartbeat request needs to be sent after a current retry sending interval.
  • the retry heartbeat request is sent to the server device again after the retry sending interval, until the lease period expires or a corresponding heartbeat request response is normal.
  • the retry unit 516 determines a retry sending interval based on an anomaly determining time point of the heartbeat request response and a lease expiration time point using a reverse exponential backoff algorithm.
  • the lease expiration time point Tl of the retry unit 516 may be either a lease of the server device or a lease of the client device, because the lease of the client device is determined by subtracting the anomaly determining time point of the heartbeat request response from the lease of the server device.
  • a value of K which is used for indicating a reverse backoff degree of the reverse exponential backoff algorithm, is greater than 1. In implementations, K may be equal to 2.
  • the heartbeat request is sent to the server device at a time moment 0, and the lease expiration time point is Tl.
  • the heartbeat request response from the server device is abnormal (with the anomaly determining time point of the heartbeat request response being T2 as shown in FIG. 2)
  • $T_1 = (\mathrm{Tl} - \mathrm{T2})/2^{1}$ since the anomaly determining time point T2 after the anomaly of the heartbeat request response.
  • the first retry heartbeat request response is also abnormal (with an anomaly determining time point of the first retry heartbeat request response being t2 as shown in FIG.
  • $T_3 = (\mathrm{Tl} - \mathrm{T2})/2^{3}$ since the anomaly determining time point t3 after the anomaly of the second retry heartbeat request response.
  • an Nth retry heartbeat request response is abnormal at an anomaly determining time point t(N-1) of the (N-1)th retry heartbeat request response
  • the heartbeat request is sent to the server device at a time moment 0 and the lease expiration time point Tl is 00:00:52.
  • a retry heartbeat request response received from the server device is normal at a certain time point 00:00:32 before the lease expiration time point Tl (which is 00:00:52), this indicates that a heartbeat between the server device and the client device is successfully established in the current lease period.
  • an Nth retry heartbeat request response is still abnormal, and a remaining time between the anomaly determining time point t(N) of the Nth retry heartbeat request response and the lease expiration time point Tl (which is 00:00:52) is 150 ms (the remaining time between t(N) and Tl is merely an example here), heart-beating between the server device and the client device is considered to be disrupted at the anomaly determining time point t(N) of the Nth retry heartbeat request response because no heartbeat request response from the server device can be received within the 150 ms.
  • the retry unit 516 may determine the anomaly determining time point by determining that the heartbeat request response is abnormal when the heartbeat request response is received and the heartbeat request response includes content indicating that the heartbeat request is an illegitimate request or the heartbeat request response is an error response to the heartbeat request, and determining a receiving time of the heartbeat request response as the anomaly determining time point.
  • the retry unit 516 may determine the anomaly determining time point by determining that the heartbeat request response is abnormal when the heartbeat request response is not received before a timeout, and determining a time point of the timeout as the anomaly determining time point.
  • FIG. 3 shows a flowchart of determining an anomaly determining time point according to an aspect of the present disclosure.
  • client receives a heartbeat request response I from a server device (server) after sending a heartbeat request to the server device
  • the heartbeat request response I is determined as abnormal when content of the heartbeat request corresponding to the heartbeat request response I is illegitimate request content or the heartbeat request response I is an error response to the heartbeat request
  • a receiving time t of the heartbeat request response I is determined as the anomaly determining time point T2.
  • the heartbeat request response II is determined as abnormal, and a time point of the timeout, t(RT), is determined as the anomaly determining time point T2.
  • the retry unit 516 determines a backoff sending interval based on an anomaly determining time point of the heartbeat request response and a lease expiration time point by using a reverse exponential backoff algorithm, obtains a random time interval correction based on a random-time function and the backoff sending interval, and determines a retry sending interval based on the random time interval correction and the backoff sending interval.
  • a time midpoint of the random correction of the interval in the retry unit 516 is the same as an expiration time point of the backoff sending interval, for example, if the expiration time point of the backoff sending interval is 00:00:32, the time midpoint of the random correction of the interval is 00:00:32, wherein the random correction of the interval may be several milliseconds, dozens of milliseconds, or even longer.
  • the expiration time point of the backoff sending interval being 00:00:32 in the retry unit 516 is merely an example of an embodiment of the present disclosure, and other existing or future possible specific values of the expiration time point of the backoff sending interval, if applicable to the present disclosure, should also be included in the protection scope of the present disclosure, and are incorporated herein by reference.
  • a backoff sending interval is determined to be 320 ms based on an anomaly determining time point of the heartbeat request response and a lease expiration time point.
  • a retry sending interval that is determined based on the random time interval correction and the backoff sending interval will be 320 ms ± 40 ms, that is, a retry heartbeat request is randomly sent to a server device within the retry sending interval 320 ms ± 40 ms.
  • the client device 500 may further include a normal request unit 518.
  • the normal request unit 518 sends a heartbeat request to the server device again after a protocol Send Interval when the heartbeat request response is normal.
  • the normal request unit 518 may further determine a re-initiation time of the heartbeat request according to a receiving time of the heartbeat request response and the protocol Send Interval if the heartbeat request response is normal, and send the heartbeat request to the server device at the re-initiation time.
  • FIG. 4 shows a flowchart of a lease-based heartbeat protocol method
  • a client device sends a heartbeat request to a server device (server) and receives a heartbeat request response from the server device.
  • the client device determines that a re-initiation time for a heartbeat request is t(normal)+ΔT according to a receiving time t(normal) of the heartbeat request response and the protocol Send Interval ΔT, and sends the heartbeat request to the server device at the re-initiation time t(normal)+ΔT.
  • random time may also be obtained from the protocol Send Interval based on the receiving time of the heartbeat request response using the random-time function as described in the foregoing embodiment of the present disclosure, thereby determining the re-initiation time of the heartbeat request.
  • the disclosed lease-based heartbeat protocol method and apparatus send a heartbeat request to a server device in a lease period, and receive a heartbeat request response from the server device, determine a retry sending interval based on a reverse exponential backoff algorithm when the heartbeat request response is abnormal, and send a retry heartbeat request to the server device again after the retry sending interval, till the lease period expires or the corresponding heartbeat request response is normal.
  • the retry sending interval is determined based on the reverse exponential backoff algorithm, and the retry heartbeat request is sent to the server device again after the retry sending interval.
  • heartbeat request retries two successive retry heartbeat requests can be sent at a relatively large sending interval, thereby reducing impact and pressure of the heartbeat requests on network nodes and the server device.
  • the sending interval of the heartbeat request retries is reduced, such that re-sent heartbeat requests can be sent at a higher frequency, thereby effectively improving the success rate of recovering from a network failure while ensuring network stability and reducing network pressure.
  • the disclosed lease-based heartbeat protocol method and apparatus may determine a backoff sending interval based on an anomaly determining time point of the heartbeat request response and a lease expiration time point using a reverse exponential backoff algorithm, obtain a random time interval correction based on a random-time function and the backoff sending interval, and determine the retry sending interval based on the random time interval correction and the backoff sending interval.
  • the random time interval correction for the backoff sending interval for sending heartbeat requests is obtained based on the random-time function, and the retry sending interval is determined based on the random correction of the interval and the backoff sending interval. Therefore, a resonance effect caused by heartbeat requests simultaneously sent by multiple client devices to the server device is avoided to a certain extent, thus effectively protecting the network nodes and the server device.
  • the present disclosure may be implemented in software and/or a combination of software and hardware.
  • an application specific integrated circuit ASIC
  • a general-purpose computer or any other similar hardware devices may be used for implementing the present disclosure.
  • a software program of the present disclosure may be executed by processor(s) to achieve the operations or functions as described in the foregoing description.
  • a software program (including a related data structure) of the present disclosure can be stored in a computer readable recording medium, for example, a RAM memory, a magnetic or optical drive, a floppy disk, or similar devices.
  • some operations or functions of the present disclosure may be implemented with hardware, for example, a circuit that performs various operations or functions in cooperation with processor(s).
  • a part of the present disclosure may be applied as a computer program product, for example, computer program instruction(s) that, when executed by computing device(s), invoke or provide the method and/or the technical solution according to the present disclosure through operations of the computing device(s).
  • the program instruction(s) that invoke(s) the method of the present disclosure may be stored in a fixed or removable recording medium, and/or transmitted via broadcast or data streams in other signal carrier media, and/or stored in a working memory of a computer device that runs according to the program instruction(s).
  • Implementations of the present disclosure may include herein an apparatus, which includes memory configured to store computer program instruction(s) and processor(s) configured to execute the program instruction(s). When the computer program instruction(s) is/are executed by the processor(s), the apparatus is triggered to run the method and/or the technical solution of the foregoing embodiments of the present disclosure.

Abstract

A lease-based heartbeat protocol method is provided. The method may include sending a heartbeat request to a server device in a lease period, and receiving a heartbeat request response from the server device; and determining an adaptive retry sending interval in response to the heartbeat request response being abnormal, and sending a retry heartbeat request to the server device again after the retry sending interval has elapsed, until the lease period expires or a corresponding heartbeat request response is normal. As such, two successive retry heartbeat requests can be sent at a relatively large time interval at an initial stage of heartbeat request retry. At a later stage of the heartbeat request retry, the time interval associated with the retry heartbeat requests is reduced, such that re-sent heartbeat requests can be sent at a higher frequency.

Description

LEASE-BASED HEARTBEAT PROTOCOL METHOD AND APPARATUS
Cross Reference to Related Patent Application
This application claims foreign priority to Chinese Patent Application No. 201610105054.3 filed on February 25, 2016, entitled "Lease-Based Heartbeat Protocol Method and Apparatus", which is hereby incorporated by reference in its entirety.
Technical Field
The present disclosure relates to the field of computers, and in particular, to lease-based heartbeat protocol technologies.
Background
In a distributed lock service system, a coarse-grained mutex mechanism can ensure that only one client terminal can occupy a lock at one time. An implementation of a lock relies on a lease-based session maintaining mechanism, and this session maintaining mechanism ensures that, when a session times out, the client terminal detects the timeout earlier than the server terminal does. Generally, after detecting a timeout, a client terminal may notify an application layer that a distributed lock is lost, and a server terminal may release this original lock after detecting the timeout, so that other client terminals may contend for the lock.
The foregoing maintenance of a session mainly relies on heartbeat(s) between a client terminal and a server terminal. When a heartbeat protocol is designed, a primary objective is to ensure that a session timeout of the client terminal occurs prior to that of the server terminal in situations where a quick and automatic recovery cannot be realized, e.g., network isolation or server shutdown. In this way, the correctness of a lock service can be ensured. A second objective is that, in situations where a system can be automatically recovered quickly, such as network jitter or failover, a client terminal reports lock loss events to an application program as rarely as possible, which can ensure the stability of the system.
In existing technologies, a reliability coordination system (such as ZooKeeper) in a distributed system adopts a design for heartbeats in which sending and receiving thereof are independent of each other. After a session between a server terminal (i.e., Server or server device) and a client terminal (i.e., Client or client device) is established, the client terminal sends heartbeat requests to the server terminal at fixed sending intervals (1/3 of a session lease period by default). Sending of a current heartbeat request is only driven by time intervals, and does not depend upon whether a response to a previous heartbeat has arrived. After receiving the heartbeat request, the server terminal extends the lease of the current session to a future moment one full session lease period away, as long as the current session has not expired, and immediately returns a heartbeat request response. Each time the client terminal receives a heartbeat request response from the server terminal, the current lease of the client terminal is extended forward to a future moment corresponding to 2/3 of the session lease period. If the lease of the client terminal expires, the client terminal (Client) of the reliability coordination system (ZooKeeper) in the distributed system will directly send an event, which is referred to as a session event, to an application layer, to inform an application program that the session has expired. In a heartbeat protocol of the reliability coordination system (ZooKeeper) in the distributed system, if a temporary network isolation occurs after the last time the client terminal (Client) successfully received a heartbeat request response, the client terminal (Client) has a buffer time of 2/3 of the session lease period to complete a retry for a heartbeat request.
However, the client terminal (Client) of the reliability coordination system (ZooKeeper) in the distributed system simply performs repeating heartbeat requests with a fixed sending interval. Since the client terminal (Client) lacks a reasonable retry logic to cope with various unexpected communication exceptions, the unreasonable retry logic for sending heartbeat requests causes a huge impact and pressure on network nodes and the server terminal (Server). From the perspective of an average number of retry heartbeat requests that are sent, at most two retry heartbeat requests can be initiated within the buffer time of 2/3 of the session lease period, causing the retry logic of the client terminal (Client) to be over-simplified within the buffer time for sending the retry heartbeat requests. As a result, the application program may lose the lock due to a temporary network exception, thus increasing the sensitivity of the client terminal (Client) with respect to network failures. In existing technologies, the use of a heartbeat protocol of a reliability coordination system (such as ZooKeeper) in a distributed system to maintain a session between a server terminal and a client terminal causes an unreasonable retry logic for sending heartbeat requests and a huge impact and pressure on network nodes and the server terminal (Server). Moreover, the retry logic is overly simple, and as a result, an application program may lose a lock due to a temporary network exception, increasing the sensitivity of the client terminal (Client) with respect to network failures.
Summary
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify all key features or essential features of the claimed subject matter, nor is it intended to be used alone as an aid in determining the scope of the claimed subject matter. The term "techniques," for instance, may refer to device(s), system(s), method(s) and/or computer-readable instructions as permitted by the context above and throughout the present disclosure.
An objective of the present disclosure is to provide a lease-based heartbeat protocol method and an apparatus thereof, to decrease the sensitivity of a client device with respect to network failures by solving the problems in the existing technologies. In the existing technologies, the use of a heartbeat protocol of a reliability coordination system in a distributed system to maintain a session between a server terminal and a client terminal has caused an unreasonable retry logic for sending heartbeat requests, which leads to a huge impact and pressure on network nodes and the server terminal (Server), and the loss of a lock by an application program due to a temporary network exception.
According to an aspect of the present disclosure, a lease-based heartbeat protocol method is provided, which may include sending a heartbeat request to a server device in a lease period, and receiving a heartbeat request response from the server device; and determining a retry sending interval based on a reverse exponential backoff algorithm in response to the heartbeat request response being abnormal, and sending a retry heartbeat request to the server device again after the retry sending interval, until the lease period expires or the corresponding heartbeat request response is normal.
Furthermore, according to another aspect of the present disclosure, a lease-based heartbeat protocol method is provided, which may include determining a backoff sending interval based on an anomaly determining time point of a heartbeat request response and a lease expiration time point using a reverse exponential backoff algorithm in response to the heartbeat request response being abnormal; obtaining a random time interval correction based on a random-time function and the backoff sending interval; and determining a retry sending interval based on the random time interval correction and the backoff sending interval.
According to another aspect of the present disclosure, a lease-based heartbeat protocol device is further provided, which may include a sending and receiving unit to send a heartbeat request to a server device in a lease period, and receive a heartbeat request response from the server device; and a retry unit to determine a retry sending interval based on a reverse exponential backoff algorithm in response to the heartbeat request response being abnormal, and send a retry heartbeat request to the server device again after the retry sending interval, until the lease period expires or the corresponding heartbeat request response is normal.
In implementations, in response to the heartbeat request response being abnormal, the retry unit may further determine a backoff sending interval based on an anomaly determining time point of the heartbeat request response and a lease expiration time point using a reverse exponential backoff algorithm; obtain a random time interval correction based on a random-time function and the backoff sending interval; and determine the retry sending interval based on the random time interval correction and the backoff sending interval.
Compared with the existing technologies, the disclosed lease-based heartbeat protocol method and apparatus send a heartbeat request to a server device in a lease period, and receive a heartbeat request response from the server device, determine a retry sending interval based on a reverse exponential backoff algorithm when the heartbeat request response is abnormal, and send a retry heartbeat request to the server device again after the retry sending interval, till the lease period expires or the corresponding heartbeat request response is normal. When the heartbeat request response is abnormal, the retry sending interval is determined based on the reverse exponential backoff algorithm, and the retry heartbeat request is sent to the server device again after the retry sending interval. As such, at an initial stage of heartbeat request retries, two successive retry heartbeat requests can be sent at a relatively large sending interval, thereby reducing impact and pressure of the heartbeat requests on network nodes and the server device. At a later stage of the heartbeat request retries, the sending interval of the heartbeat request retries is reduced, such that re-sent heartbeat requests can be sent at a higher frequency, thereby effectively improving the success rate of recovering from a network failure while ensuring network stability and reducing network pressure.
Furthermore, when a heartbeat request response is abnormal, the disclosed lease-based heartbeat protocol method and apparatus may determine a backoff sending interval based on an anomaly determining time point of the heartbeat request response and a lease expiration time point using a reverse exponential backoff algorithm, obtain a random time interval correction based on a random-time function and the backoff sending interval, and determine the retry sending interval based on the random time interval correction and the backoff sending interval. When a network failure occurs at the server device, the random time interval correction for the backoff sending interval for sending heartbeat requests is obtained based on the random-time function, and the retry sending interval is determined based on the random correction of the interval and the backoff sending interval. Therefore, a resonance effect caused by heartbeat requests simultaneously sent by multiple client devices to the server device is avoided to a certain extent, thus effectively protecting the network nodes and the server device.
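The random correction described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name `jittered_interval`, the uniform distribution, and the symmetric correction range centered on the backoff sending interval are assumptions. The 320 ms and 40 ms figures are the example values this disclosure uses for a backoff sending interval and its random correction.

```python
import random


def jittered_interval(backoff_ms, max_correction_ms, rng=random):
    """Return the backoff interval shifted by a random correction.

    Assumption: the correction is drawn uniformly from
    [-max_correction_ms, +max_correction_ms], so its midpoint coincides
    with the end of the backoff sending interval.
    """
    correction = rng.uniform(-max_correction_ms, max_correction_ms)
    # Never return a negative wait, even for a very small backoff interval.
    return max(0.0, backoff_ms + correction)


# Two clients with the same 320 ms backoff are unlikely to retry at the same instant.
rng = random.Random(42)  # seeded only to make the demonstration repeatable
waits = [jittered_interval(320, 40, rng) for _ in range(2)]
print(all(280 <= w <= 360 for w in waits))
```

Because each client draws its own correction, clients that observed the same failure at the same moment spread their retries over the correction window instead of hitting the server device together.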
Brief Description of the Drawings
Other features, objectives and advantages of the present disclosure will become more apparent upon reading the detailed description of non-limiting embodiments with reference to the following accompanying drawings.
FIG. 1 is a flowchart of a lease-based heartbeat protocol method according to an aspect of the present disclosure.
FIG. 2 is a flowchart illustrating distribution of retry sending intervals in a lease-based heartbeat protocol method according to an aspect of the present disclosure.
FIG. 3 is a flowchart of determining an anomaly determining time point according to an aspect of the present disclosure.
FIG. 4 is a flowchart of a lease-based heartbeat protocol method when a heartbeat request response is normal according to an aspect of the present disclosure.
FIG. 5 is a structural diagram of a client device for a lease-based heartbeat protocol according to an aspect of the present disclosure.
Identical or similar reference labels in the accompanying drawings represent identical or similar components.
Detailed Description
The present disclosure is further described in detail herein with reference to the accompanying drawings.
FIG. 1 is a flowchart of a lease-based heartbeat protocol method 100 according to an aspect of the present disclosure. In embodiments, the method 100 may include S102 and S104.
S102 sends a heartbeat request to a server device in a lease period, and receives a heartbeat request response from the server device. S104 determines a retry sending interval based on a reverse exponential backoff algorithm in response to the heartbeat request response being abnormal, and sends a retry heartbeat request to the server device again after the retry sending interval is past, until the lease period expires or a corresponding heartbeat request response is normal. It should be noted that heart-beating between a client device and a server device in the embodiments of the present disclosure is a periodic process. In a normal situation, a heartbeat period may be divided into three stages. The first stage corresponds to a period from sending of a heartbeat request by the client device to receiving of the heartbeat request by the server device. The second stage corresponds to a period from sending of a heartbeat request response by the server device to receiving of the heartbeat request response by the client device. The third stage corresponds to a period in which the client device waits for a protocol sending interval (Send Interval). These three stages form one heartbeat period that cycles continuously between the client device and server device. In an abnormal situation, the first stage and the second stage may not be successfully completed in one try, and therefore, heartbeat request(s) need(s) to be retried in the lease period. The heartbeat request(s) may be retried multiple times, and the client device may wait for a retry sending interval before each retry. However, the retries of the heartbeat request(s) are not performed for an infinite number of times, and are dependent on the lease period of the client device.
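The retry loop within one heartbeat period might be sketched as below. The names `heartbeat_cycle`, `send`, `clock`, `sleep` and `next_interval` are hypothetical and injected by the caller; the rule of giving up when the next retry would fall at or after the lease expiration is an assumption made to keep the sketch bounded. The interval policy is passed in as a function so the loop stays independent of any particular formula.

```python
def heartbeat_cycle(send, clock, sleep, lease_expiry, next_interval):
    """Run stages one and two of a heartbeat period, retrying within the lease.

    send()            -> True if the heartbeat request response is normal
    clock()           -> current time
    sleep(duration)   -> wait for the given duration
    next_interval(n)  -> retry sending interval to wait before the n-th retry
    Returns True on a normal response, False if the lease would expire first.
    """
    attempt = 0
    while True:
        if send():
            return True  # caller then waits the protocol Send Interval (stage three)
        attempt += 1
        wait = next_interval(attempt)
        if clock() + wait >= lease_expiry:
            return False  # retries are bounded by the lease period
        sleep(wait)


# Demonstration with a fake clock: two abnormal responses, then a normal one.
clock = {"t": 0.0}
responses = iter([False, False, True])
ok = heartbeat_cycle(
    send=lambda: next(responses),
    clock=lambda: clock["t"],
    sleep=lambda s: clock.__setitem__("t", clock["t"] + s),
    lease_expiry=10.0,
    next_interval=lambda attempt: 1.0,  # fixed interval, only for the demonstration
)
print(ok, clock["t"])  # True 2.0
```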
FIG. 2 is a flowchart 200 illustrating distribution of retry sending intervals in a lease-based heartbeat protocol method according to an aspect of the present disclosure. As shown in FIG. 2, a client device (client) sends a heartbeat request to a server device (server), and receives a heartbeat request response from the server device at S102. At S104, a first heartbeat request response is abnormal and a first retry heartbeat request needs to be sent to the server device. If a first retry sending interval is T1 as shown in FIG. 2, a reverse exponential backoff algorithm is performed on the first retry sending interval T1 if the retry heartbeat request needs to be sent again subsequently. Upon determining that the first retry heartbeat request response is abnormal as shown in FIG. 2, a second retry heartbeat request needs to be sent again after the second retry sending interval T2. If subsequent retry heartbeat request responses are abnormal, corresponding retry sending intervals obtained based on the reverse exponential backoff algorithm are T3, T4, T5...Tm respectively, wherein m is the number of times that the retry heartbeat request needs to be sent after a current retry sending interval. The retry heartbeat request is sent to the server device again after the retry sending interval, until the lease period expires or a corresponding heartbeat request response is normal.
Furthermore, S104 determines a retry sending interval based on a reverse exponential backoff algorithm in response to the heartbeat request response being abnormal, and sends a retry heartbeat request to the server device again after the retry sending interval, until the lease period expires or a corresponding heartbeat request response is normal. Specifically, S104 determines the retry sending interval based on an anomaly determining time point of the heartbeat request response and a lease expiration time point using the reverse exponential backoff algorithm when the heartbeat request response is abnormal.
In implementations, determining the retry sending interval T based on the anomaly determining time point of the heartbeat request response and the lease expiration time point at S104 may include: $T = (\mathrm{Tl} - \mathrm{T2})/K^{N}$, wherein Tl is the lease expiration time point, T2 is the anomaly determining time point, K is greater than 1, and N is a current number of retries of sending the heartbeat request.
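As a minimal sketch of this formula, the function below computes the interval for the N-th retry. The function and parameter names are hypothetical, and the default K = 2 is simply the example value this disclosure mentions; the sample inputs (lease expiration at 52 seconds, anomaly determined at 22 seconds) are the figures used in the worked example of this description.

```python
def retry_interval(lease_expiry, anomaly_time, n, k=2):
    """Return T = (Tl - T2) / K**N, in the same time unit as the inputs.

    lease_expiry  -- Tl, the lease expiration time point
    anomaly_time  -- T2, the anomaly determining time point
    n             -- N, the current number of retries (1 for the first retry)
    k             -- K, the reverse backoff degree, which must exceed 1
    """
    if k <= 1:
        raise ValueError("K must be greater than 1")
    if n < 1:
        raise ValueError("N starts at 1 for the first retry")
    return (lease_expiry - anomaly_time) / k**n


print([retry_interval(52.0, 22.0, n) for n in (1, 2, 3)])  # [15.0, 7.5, 3.75]
```

Each further retry halves the wait when K = 2, which is why the interval is large at first and shrinks as the lease expiration approaches.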
In implementations, the lease expiration time point Tl at S104 may be either a lease of the server device or a lease of the client device, because the lease of the client device is determined by subtracting the anomaly determining time point of the heartbeat request response from the lease of the server device. A value of K, which is used for indicating a reverse backoff degree of the reverse exponential backoff algorithm, is greater than 1. In implementations, K may be equal to 2. One skilled in the art can understand that the value of K at S104 is merely an example, and other existing or future possible specific values of K, if applicable to the present disclosure, should also be included in the scope of protection of the present disclosure, and are incorporated herein by reference.
In implementations, the heartbeat request is sent to the server device at a time moment 0, and the lease expiration time point is T1. When the heartbeat request response from the server device is abnormal (with the anomaly determining time point of the heartbeat request response being T2 as shown in FIG. 2), a first retry heartbeat request is sent to the server device after waiting for a first retry sending interval: T1=(T1-T2)/(2^1), counted from the anomaly determining time point T2 of the heartbeat request response. If the first retry heartbeat request response is also abnormal (with an anomaly determining time point of the first retry heartbeat request response being t2 as shown in FIG. 2), a second retry heartbeat request is sent to the server device after waiting for a second retry sending interval: T2=(T1-T2)/(2^2), counted from the anomaly determining time point t2 of the first retry heartbeat request response. If the second retry heartbeat request response is still abnormal (with an anomaly determining time point of the second retry heartbeat request response being t3 as shown in FIG. 2), a third retry heartbeat request is sent to the server device after waiting for a third retry sending interval: T3=(T1-T2)/(2^3), counted from the anomaly determining time point t3 of the second retry heartbeat request response. This pattern repeats accordingly. If an (N-1)th retry heartbeat request response is abnormal at an anomaly determining time point t(N-1) of the (N-1)th retry heartbeat request response, an Nth retry heartbeat request is sent to the server device after waiting for an Nth retry sending interval: TN=(T1-T2)/(2^N) obtained through the reverse exponential backoff algorithm, until the lease period expires or a corresponding heartbeat request response is normal. In each of these formulas, T1 and T2 on the right-hand side denote the lease expiration time point and the anomaly determining time point of the initial heartbeat request response, whereas T1, T2, T3...TN on the left-hand side denote the retry sending intervals shown in FIG. 2.
For example, the heartbeat request is sent to the server device at a time moment 0 and the lease expiration time point T1 is 00:00:52. When the heartbeat request response from the server device is abnormal, with the anomaly determining time point T2 of the heartbeat request response being 00:00:22, a first retry heartbeat request is sent to the server device at a time point 00:00:37 after waiting for a first retry sending interval, which is T1=(T1-T2)/(2^1)=15 seconds, since the anomaly determining time point T2 (which is 00:00:22). If the first retry heartbeat request response from the server device is also abnormal and an anomaly determining time point t2 of the first retry heartbeat request response is 00:00:37.400, a second retry heartbeat request is sent to the server device at the time point 00:00:44.900 after waiting for the second retry sending interval, which is T2=(T1-T2)/(2^2)=7.5 seconds (i.e., 7 seconds and 500 milliseconds), since the anomaly determining time point t2 (which is 00:00:37.400). If the second retry heartbeat request response from the server device is also abnormal and an anomaly determining time point t3 of the second retry heartbeat request response is 00:00:45.100, a third retry heartbeat request is sent to the server device at the time point 00:00:48.850 after waiting for the third retry sending interval, which is T3=(T1-T2)/(2^3)=3.75 seconds (i.e., 3 seconds and 750 milliseconds), since the anomaly determining time point t3 (which is 00:00:45.100), and so forth. If a retry heartbeat request response received from the server device is normal at a certain time point 00:00:32 before the lease expiration time point T1 (which is 00:00:52), this indicates that a heartbeat between the server device and the client device is successfully established in the current lease period.
If an Nth retry heartbeat request response is still abnormal, and a remaining time between the anomaly determining time point t(N) of the Nth retry heartbeat request response and the lease expiration time point T1 (which is 00:00:52) is 150 ms (the remaining time between t(N) and T1 is merely an example here), heart-beating between the server device and the client device is considered to be disrupted at the anomaly determining time point t(N) of the Nth retry heartbeat request response because no heartbeat request response from the server device can be received within the 150 ms.
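The worked example above can be reproduced with a short sketch. It assumes time points in seconds, K = 2, and an anomaly determining time point t2 of 00:00:37.400 (the value that agrees with the stated send time 00:00:44.900). The function name retry_schedule and the rule of stopping once a send time would reach the lease expiration time point are illustrative assumptions, not part of the disclosure:

```python
def retry_schedule(lease_expiration, anomaly_times, k=2.0):
    """Return a list of (retry_number, interval, send_time), in seconds.

    anomaly_times[0] is T2, the anomaly determining time point of the
    initial response; anomaly_times[n] is that of the n-th retry response.
    Each interval is (T1 - T2) / K**N and is counted from the most recent
    anomaly determining time point, as in the worked example.
    """
    span = lease_expiration - anomaly_times[0]
    schedule = []
    for n, wait_from in enumerate(anomaly_times, start=1):
        interval = span / k ** n
        send_time = wait_from + interval
        if send_time >= lease_expiration:
            break  # assumed rule: no retry at or after lease expiration
        schedule.append((n, interval, round(send_time, 3)))
    return schedule


for n, interval, send_time in retry_schedule(52.0, [22.0, 37.4, 45.1]):
    print(f"retry {n}: wait {interval} s, send at t = {send_time} s")
```

Under these assumptions the three retries are sent at 37 s, 44.9 s and 48.85 s, matching the time points 00:00:37, 00:00:44.900 and 00:00:48.850 given in the example.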
Furthermore, in implementations, determining the anomaly determining time point at S104 may include determining that the heartbeat request response is abnormal when the heartbeat request response is received and the heartbeat request response includes content indicating that the heartbeat request is an illegitimate request or the heartbeat request response is an error response to the heartbeat request, and determining a receiving time of the heartbeat request response as the anomaly determining time point. In implementations, determining the anomaly determining time point at S104 may include determining that the heartbeat request response is abnormal when the heartbeat request response is not received before a timeout, and determining a time point of the timeout as the anomaly determining time point.
For example, FIG. 3 shows a flowchart 300 of determining an anomaly determining time point according to an aspect of the present disclosure. As shown in FIG. 3, if a client device (client) receives a heartbeat request response I from a server device (server) after sending a heartbeat request to the server device, the heartbeat request response I is determined as abnormal when content of the heartbeat request corresponding to the heartbeat request response I is illegitimate request content or the heartbeat request response I is an error response to the heartbeat request, and a receiving time t of the heartbeat request response I is determined as the anomaly determining time point T2. If the heartbeat request response from the server device is not received before a timeout (Request Timeout) (e.g., a receiving time of the heartbeat request response II exceeds the timeout), the heartbeat request response II is determined as abnormal, and a time point of the timeout, t(RT), is determined as the anomaly determining time point T2.
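The two cases of FIG. 3 may be sketched as follows. The HeartbeatResponse type, its field names, and the assumption that the Request Timeout is measured from the sending time of the heartbeat request are hypothetical and serve only to illustrate the decision:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeartbeatResponse:
    received_at: float   # receiving time t of the response, in seconds
    illegitimate: bool   # response indicates the request was illegitimate
    error: bool          # response is an error response to the request


def anomaly_time_point(sent_at: float, request_timeout: float,
                       response: Optional[HeartbeatResponse]) -> Optional[float]:
    """Return the anomaly determining time point T2, or None if normal."""
    deadline = sent_at + request_timeout  # t(RT), assumed relative to sending
    if response is None or response.received_at > deadline:
        # Response II: nothing received before the timeout.
        return deadline
    if response.illegitimate or response.error:
        # Response I: received in time, but its content is abnormal.
        return response.received_at
    return None
```

A return value of None corresponds to a normal heartbeat request response, in which case no retry sending interval needs to be determined.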
In implementations, S104 determines a retry sending interval based on a reverse exponential backoff algorithm when a heartbeat request response is abnormal, and sends a retry heartbeat request to the server device again after the retry sending interval, until the lease period expires or a corresponding heartbeat request response is normal. Specifically, S104 may determine a backoff sending interval based on an anomaly determining time point of the heartbeat request response and a lease expiration time point using a reverse exponential backoff algorithm, obtain a random time interval correction based on a random-time function and the backoff sending interval, and determine the retry sending interval based on the random time interval correction and the backoff sending interval.
It should be noted that a time midpoint of the random time interval correction at S104 is the same as an expiration time point of the backoff sending interval. For example, if the expiration time point of the backoff sending interval is 00:00:32, the time midpoint of the random time interval correction will be 00:00:32, wherein a random correction for time intervals may be several milliseconds, tens of milliseconds, or even longer. Apparently, one skilled in the art should understand that the expiration time point of the backoff sending interval being 00:00:32 at S104 is merely an example, and other existing or future possible specific values of the expiration time point of the backoff sending interval, if applicable to the present disclosure, should also be included in the scope of protection of the present disclosure, and are incorporated herein by reference.
For example, if a heartbeat request response is abnormal and a reverse exponential backoff algorithm is used, a backoff sending interval is determined to be 320 ms based on an anomaly determining time point of the heartbeat request response and a lease expiration time point. If a random time interval correction that is obtained based on a random-time function within the backoff sending interval of 320 ms is 80 ms, then, because a time midpoint of the random time interval correction is the same as the expiration time point of the backoff sending interval, a retry sending interval that is determined based on the random time interval correction and the backoff sending interval will be 320 ms±40 ms. That is, a retry heartbeat request is randomly sent to a server device within the retry sending interval 320 ms±40 ms.
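A minimal sketch of this correction is given below. The disclosure does not specify the random-time function, so a uniform distribution over the correction window is assumed here, and the function and parameter names are illustrative:

```python
import random


def jittered_retry_interval(backoff_interval_ms: float, correction_ms: float,
                            rng: random.Random) -> float:
    """Return a retry sending interval in milliseconds.

    The correction window of width correction_ms is centered on the
    expiration of the backoff sending interval, so the result lies in
    [backoff - correction/2, backoff + correction/2].
    A uniform draw is an assumption, not taken from the disclosure.
    """
    half = correction_ms / 2.0
    return backoff_interval_ms + rng.uniform(-half, half)


rng = random.Random(0)  # seeded only to make the sketch repeatable
samples = [jittered_retry_interval(320.0, 80.0, rng) for _ in range(5)]
```

With a backoff sending interval of 320 ms and a correction of 80 ms, every sample falls between 280 ms and 360 ms, which corresponds to the 320 ms±40 ms of the example.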
In implementations, the lease-based heartbeat protocol method 100 may further include sending a heartbeat request to the server device again after a protocol sending interval (Send Interval) when the heartbeat request response is normal at S106.
For example, if the heartbeat request response is normal, a heartbeat request is sent to the server device again after the protocol Send Interval has elapsed since a receiving time of the heartbeat request response.
In implementations, S106 may further send a heartbeat request to the server device again after a protocol Send Interval when the heartbeat request response is normal. Specifically, S106 may determine a re-initiation time of the heartbeat request according to a receiving time of the heartbeat request response and the protocol Send Interval if the heartbeat request response is normal, and send the heartbeat request to the server device at the re-initiation time.
For example, FIG. 4 shows a flowchart 400 of a lease-based heartbeat protocol method according to an aspect of the present disclosure when a heartbeat request response is normal. In FIG. 4, a client device (client) sends a heartbeat request to a server device (server) and receives a heartbeat request response from the server device. When the heartbeat request response is normal, the client device determines that a re-initiation time for a heartbeat request is t(normal)+ΔT according to a receiving time t(normal) of the heartbeat request response and the protocol Send Interval ΔT, and sends the heartbeat request to the server device at the re-initiation time t(normal)+ΔT. It should be noted that, when the heartbeat request response is normal, random time may also be obtained from the protocol Send Interval based on the receiving time of the heartbeat request response using the random-time function as described in the foregoing embodiment of the present disclosure, thereby determining the re-initiation time of the heartbeat request.
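The re-initiation time may be sketched as below, with times in seconds. The optional random offset is modeled as a uniform draw centered on t(normal)+ΔT, which is an assumption carried over from the retry case rather than something the disclosure specifies; all names are illustrative:

```python
import random
from typing import Optional


def next_heartbeat_time(t_normal: float, send_interval: float,
                        correction: float = 0.0,
                        rng: Optional[random.Random] = None) -> float:
    """Return the re-initiation time t(normal) + ΔT, in seconds.

    If correction is positive, a random offset drawn uniformly from
    [-correction/2, +correction/2] is added (an assumed jitter model).
    """
    base = t_normal + send_interval
    if correction <= 0:
        return base
    rng = rng or random.Random()
    half = correction / 2.0
    return base + rng.uniform(-half, half)
```

Calling the function without a correction returns exactly t(normal)+ΔT; supplying a correction spreads the re-initiation times of different client devices around that point.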
FIG. 5 is a structural diagram of a client device 500 for a lease-based heartbeat protocol according to an aspect of the present disclosure. In implementations, the client device 500 may include one or more computing devices. By way of example and not limitation, the client device 500 may include one or more processors 502, an input/output (I/O) interface 504, a network interface 506, and memory 508.
The memory 508 may include a form of computer-readable media, e.g., a non-permanent storage device, random-access memory (RAM) and/or a nonvolatile internal storage, such as read-only memory (ROM) or flash RAM. The memory 508 is an example of computer-readable media.
The computer-readable media may include a permanent or non-permanent type, a removable or non-removable media, which may achieve storage of information using any method or technology. The information may include a computer-readable instruction, a data structure, a program module or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), quick flash memory or other internal storage technology, compact disk read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassette tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission media, which may be used to store information that may be accessed by a computing device. As defined herein, the computer-readable media does not include transitory media, such as modulated data signals and carrier waves.
In implementations, the memory 508 may include program units 510 and program data 512. The program units 510 may include a sending and receiving unit 514 and a retry unit 516.
In implementations, the sending and receiving unit 514 may send a heartbeat request to a server device within a lease period, and receive a heartbeat request response from the server device. The retry unit 516 may determine a retry sending interval based on a reverse exponential backoff algorithm when the heartbeat request response is abnormal, and send a retry heartbeat request to the server device again after the retry sending interval, until the lease period expires or a corresponding heartbeat request response is normal. In implementations, the device 500 may include, but is not limited to, a user device, or a device formed from an integration of user device(s) and network device(s) via a network. In implementations, a user device may include, but is not limited to, any type of mobile electronic product. A mobile electronic product may use any operating system, such as an Android operating system, an iOS operating system, etc. In implementations, a network device may include an electronic device that is able to automatically perform numerical computation and information processing according to preset or pre-stored instruction(s), and hardware thereof may include, but is not limited to, a microprocessor, an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, etc. In implementations, a network may include, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN network, a wireless ad-hoc network (Ad Hoc network), etc. In implementations, the device 500 may also be a script program running on a device that is formed from an integration of user device(s) and network device(s) via a network.
Apparently, one skilled in the art should understand that the foregoing device 500 is merely an example, and other existing or future possible devices 500, if applicable to the present disclosure, should also be included in the scope of protection of the present disclosure, and are incorporated herein by reference.
The foregoing units operate continuously among each other. One skilled in the art should understand that the term "continuous" herein means that the foregoing units operate separately in real time or according to requirement(s) of respective preset or real-time adjusted operation modes. For example, the sending and receiving unit 514 may continuously send a heartbeat request to a server device in a lease period, and receive a heartbeat request response from the server device. The retry unit 516 may continuously determine a retry sending interval based on a reverse exponential backoff algorithm in an event that the heartbeat request response is abnormal, and send a retry heartbeat request to the server device again after the retry sending interval, until the lease period expires or a corresponding heartbeat request response is normal.
It should be noted that heart-beating between a client device and a server device in the embodiments of the present disclosure is a periodic process. In a normal situation, a heartbeat period may be divided into three stages. The first stage corresponds to a period from sending of a heartbeat request by the client device to receiving of the heartbeat request by the server device. The second stage corresponds to a period from sending of a heartbeat request response by the server device to receiving of the heartbeat request response by the client device. The third stage corresponds to a period in which the client device waits for a protocol sending interval (Send Interval). These three stages form one heartbeat period that cycles continuously between the client device and server device. In an abnormal situation, the first stage and the second stage may not be successfully completed in one try, and therefore, heartbeat request(s) need(s) to be retried in the lease period. The heartbeat request(s) may be retried multiple times, and the client device may wait for a retry sending interval before each retry. However, the retries of the heartbeat request(s) are not performed for an infinite number of times, and are dependent on the lease period of the client device.
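The bounded retry behavior described above may be sketched on a virtual clock as follows. The send callable is a hypothetical transport hook, the max_retries guard and the rule of stopping once a retry would be sent at or after the lease expiration time point are assumptions added so that the sketch always terminates, and K = 2 is used by default:

```python
from typing import Callable, List, Tuple


def run_lease_period(send: Callable[[float], Tuple[bool, float]],
                     lease_expiration: float, k: float = 2.0,
                     max_retries: int = 64) -> Tuple[bool, List[float]]:
    """Simulate one lease period starting at time 0 (times in seconds).

    send(t) sends a request at time t and returns (is_normal, t_done),
    where t_done is the receiving time of a normal response or the
    anomaly determining time point of an abnormal one.
    Returns (heartbeat_ok, send_times).
    """
    send_times = [0.0]
    ok, t_done = send(0.0)           # stages one and two, first attempt
    if ok:
        return True, send_times      # stage three (waiting) is left to the caller
    span = lease_expiration - t_done  # T1 - T2
    for n in range(1, max_retries + 1):
        t_send = t_done + span / k ** n
        if t_send >= lease_expiration:
            break                    # lease expires before the next retry
        send_times.append(t_send)
        ok, t_done = send(t_send)
        if ok:
            return True, send_times
    return False, send_times


def flaky(t):
    # Hypothetical server that recovers at t = 40 s; each exchange takes 0.2 s.
    return (t >= 40.0, t + 0.2)


def dead(t):
    # Hypothetical server that never answers normally.
    return (False, t + 0.2)
```

With a lease expiration time point of 52 s, the flaky transport succeeds on a retry sent after the server recovers, while the dead transport exhausts the lease period after a finite number of retries.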
Returning to FIG. 2, FIG. 2 is a flowchart illustrating distribution of retry sending intervals in a lease-based heartbeat protocol method according to an aspect of the present disclosure. As shown in FIG. 2, the sending and receiving unit 514 of the client device (client) sends a heartbeat request to a server device (server), and receives a heartbeat request response from the server device. If a first heartbeat request response is abnormal, a first retry heartbeat request needs to be sent to the server device. If a first retry sending interval is T1 as shown in FIG. 2, the retry unit 516 performs a reverse exponential backoff algorithm on the first retry sending interval T1 if the retry heartbeat request needs to be sent again subsequently. Upon determining that the first retry heartbeat request response is abnormal as shown in FIG. 2, a second retry heartbeat request needs to be sent again after the second retry sending interval T2. If subsequent retry heartbeat request responses are abnormal, corresponding retry sending intervals obtained based on the reverse exponential backoff algorithm are T3, T4, T5...Tm respectively, wherein m is the number of times that the retry heartbeat request needs to be sent after a current retry sending interval. The retry heartbeat request is sent to the server device again after the retry sending interval, until the lease period expires or a corresponding heartbeat request response is normal.
In implementations, when a heartbeat request response is abnormal, the retry unit 516 determines a retry sending interval based on an anomaly determining time point of the heartbeat request response and a lease expiration time point using a reverse exponential backoff algorithm.
In implementations, the retry unit 516 may determine the retry sending interval T based on the anomaly determining time point of the heartbeat request response and the lease expiration time point using T=(T1-T2)/(K^N), wherein T1 is the lease expiration time point, T2 is the anomaly determining time point, K is greater than 1, and N is a current number of retries of sending the heartbeat request.
In implementations, the lease expiration time point T1 of the retry unit 516 may be either a lease of the server device or a lease of the client device, because the lease of the client device is determined by subtracting the anomaly determining time point of the heartbeat request response from the lease of the server device. A value of K, which is used for indicating a reverse backoff degree of the reverse exponential backoff algorithm, is greater than 1. In implementations, K may be equal to 2. Apparently, one skilled in the art can understand that the value of K in the retry unit 516 is merely an example, and other existing or future possible specific values of K, if applicable to the present disclosure, should also be included in the scope of protection of the present disclosure, and are incorporated herein by reference.
In implementations, the heartbeat request is sent to the server device at a time moment 0, and the lease expiration time point is T1. When the heartbeat request response from the server device is abnormal (with the anomaly determining time point of the heartbeat request response being T2 as shown in FIG. 2), a first retry heartbeat request is sent to the server device after waiting for a first retry sending interval: T1=(T1-T2)/(2^1), counted from the anomaly determining time point T2 of the heartbeat request response. If the first retry heartbeat request response is also abnormal (with an anomaly determining time point of the first retry heartbeat request response being t2 as shown in FIG. 2), a second retry heartbeat request is sent to the server device after waiting for a second retry sending interval: T2=(T1-T2)/(2^2), counted from the anomaly determining time point t2 of the first retry heartbeat request response. If the second retry heartbeat request response is still abnormal (with an anomaly determining time point of the second retry heartbeat request response being t3 as shown in FIG. 2), a third retry heartbeat request is sent to the server device after waiting for a third retry sending interval: T3=(T1-T2)/(2^3), counted from the anomaly determining time point t3 of the second retry heartbeat request response. This pattern repeats accordingly. If an (N-1)th retry heartbeat request response is abnormal at an anomaly determining time point t(N-1) of the (N-1)th retry heartbeat request response, an Nth retry heartbeat request is sent to the server device after waiting for an Nth retry sending interval: TN=(T1-T2)/(2^N) obtained through the reverse exponential backoff algorithm, until the lease period expires or a corresponding heartbeat request response is normal. In each of these formulas, T1 and T2 on the right-hand side denote the lease expiration time point and the anomaly determining time point of the initial heartbeat request response, whereas T1, T2, T3...TN on the left-hand side denote the retry sending intervals shown in FIG. 2.
For example, the heartbeat request is sent to the server device at a time moment 0 and the lease expiration time point T1 is 00:00:52. When the heartbeat request response from the server device is abnormal, with the anomaly determining time point T2 of the heartbeat request response being 00:00:22, a first retry heartbeat request is sent to the server device at a time point 00:00:37 after waiting for a first retry sending interval, which is T1=(T1-T2)/(2^1)=15 seconds, since the anomaly determining time point T2 (which is 00:00:22). If the first retry heartbeat request response from the server device is also abnormal and an anomaly determining time point t2 of the first retry heartbeat request response is 00:00:37.400, a second retry heartbeat request is sent to the server device at the time point 00:00:44.900 after waiting for the second retry sending interval, which is T2=(T1-T2)/(2^2)=7.5 seconds (i.e., 7 seconds and 500 milliseconds), since the anomaly determining time point t2 (which is 00:00:37.400). If the second retry heartbeat request response from the server device is also abnormal and an anomaly determining time point t3 of the second retry heartbeat request response is 00:00:45.100, a third retry heartbeat request is sent to the server device at the time point 00:00:48.850 after waiting for the third retry sending interval, which is T3=(T1-T2)/(2^3)=3.75 seconds (i.e., 3 seconds and 750 milliseconds), since the anomaly determining time point t3 (which is 00:00:45.100), and so forth. If a retry heartbeat request response received from the server device is normal at a certain time point 00:00:32 before the lease expiration time point T1 (which is 00:00:52), this indicates that a heartbeat between the server device and the client device is successfully established in the current lease period.
If an Nth retry heartbeat request response is still abnormal, and a remaining time between the anomaly determining time point t(N) of the Nth retry heartbeat request response and the lease expiration time point T1 (which is 00:00:52) is 150 ms (the remaining time between t(N) and T1 is merely an example here), heart-beating between the server device and the client device is considered to be disrupted at the anomaly determining time point t(N) of the Nth retry heartbeat request response because no heartbeat request response from the server device can be received within the 150 ms.
In implementations, the retry unit 516 may determine the anomaly determining time point by determining that the heartbeat request response is abnormal when the heartbeat request response is received and the heartbeat request response includes content indicating that the heartbeat request is an illegitimate request or the heartbeat request response is an error response to the heartbeat request, and determining a receiving time of the heartbeat request response as the anomaly determining time point. In implementations, the retry unit 516 may determine the anomaly determining time point by determining that the heartbeat request response is abnormal when the heartbeat request response is not received before a timeout, and determining a time point of the timeout as the anomaly determining time point.
For example, FIG. 3 shows a flowchart of determining an anomaly determining time point according to an aspect of the present disclosure. As shown in FIG. 3, if a client device (client) receives a heartbeat request response I from a server device (server) after sending a heartbeat request to the server device, the heartbeat request response I is determined as abnormal when content of the heartbeat request corresponding to the heartbeat request response I is illegitimate request content or the heartbeat request response I is an error response to the heartbeat request, and a receiving time t of the heartbeat request response I is determined as the anomaly determining time point T2. If the heartbeat request response from the server device is not received before a timeout (Request Timeout) (e.g., a receiving time of the heartbeat request response II exceeds the timeout), the heartbeat request response II is determined as abnormal, and a time point of the timeout, t(RT), is determined as the anomaly determining time point T2.
In implementations, when the heartbeat request response is abnormal, the retry unit 516 determines a backoff sending interval based on an anomaly determining time point of the heartbeat request response and a lease expiration time point by using a reverse exponential backoff algorithm, obtains a random time interval correction based on a random-time function and the backoff sending interval, and determines a retry sending interval based on the random time interval correction and the backoff sending interval.
It should be noted that, a time midpoint of the random time interval correction of the retry unit 516 is the same as an expiration time point of the backoff sending interval. For example, if the expiration time point of the backoff sending interval is 00:00:32, the time midpoint of the random time interval correction will be 00:00:32, wherein a random correction for time intervals may be several milliseconds, tens of milliseconds, or even longer. Apparently, one skilled in the art should understand that the expiration time point of the backoff sending interval of the retry unit 516 being 00:00:32 is merely an example, and other existing or future possible specific values of the expiration time point of the backoff sending interval, if applicable to the present disclosure, should also be included in the scope of protection of the present disclosure, and are incorporated herein by reference.
For example, if a heartbeat request response is abnormal and a reverse exponential backoff algorithm is used, a backoff sending interval is determined to be 320 ms based on an anomaly determining time point of the heartbeat request response and a lease expiration time point. If a random time interval correction that is obtained based on a random-time function within the backoff sending interval of 320 ms is 80 ms, then, because a time midpoint of the random time interval correction is the same as the expiration time point of the backoff sending interval, a retry sending interval that is determined based on the random time interval correction and the backoff sending interval will be 320 ms±40 ms. That is, a retry heartbeat request is randomly sent to a server device within the retry sending interval 320 ms±40 ms.
Furthermore, the client device 500 may further include a normal request unit 518. The normal request unit 518 sends a heartbeat request to the server device again after a protocol Send Interval when the heartbeat request response is normal.
For example, if the heartbeat request response is normal, a heartbeat request is sent to the server device again after the protocol Send Interval has elapsed since a receiving time of the heartbeat request response. In implementations, the normal request unit 518 may further determine a re-initiation time of the heartbeat request according to a receiving time of the heartbeat request response and the protocol Send Interval if the heartbeat request response is normal, and send the heartbeat request to the server device at the re-initiation time.
For example, FIG. 4 shows a flowchart of a lease-based heartbeat protocol method 400 according to an aspect of the present disclosure when a heartbeat request response is normal. In FIG. 4, a client device (client) sends a heartbeat request to a server device (server) and receives a heartbeat request response from the server device. When the heartbeat request response is normal, the client device determines that a re-initiation time for a heartbeat request is t(normal)+ΔT according to a receiving time t(normal) of the heartbeat request response and the protocol Send Interval ΔT, and sends the heartbeat request to the server device at the re-initiation time t(normal)+ΔT. It should be noted that, when the heartbeat request response is normal, random time may also be obtained from the protocol Send Interval based on the receiving time of the heartbeat request response using the random-time function as described in the foregoing embodiment of the present disclosure, thereby determining the re-initiation time of the heartbeat request.
Compared with the existing technologies, the disclosed lease-based heartbeat protocol method and apparatus send a heartbeat request to a server device in a lease period, and receive a heartbeat request response from the server device, determine a retry sending interval based on a reverse exponential backoff algorithm when the heartbeat request response is abnormal, and send a retry heartbeat request to the server device again after the retry sending interval, till the lease period expires or the corresponding heartbeat request response is normal. When the heartbeat request response is abnormal, the retry sending interval is determined based on the reverse exponential backoff algorithm, and the retry heartbeat request is sent to the server device again after the retry sending interval. As such, at an initial stage of heartbeat request retries, two successive retry heartbeat requests can be sent at a relatively large sending interval, thereby reducing impact and pressure of the heartbeat requests on network nodes and the server device. At a later stage of the heartbeat request retries, the sending interval of the heartbeat request retries is reduced, such that re-sent heartbeat requests can be sent at a higher frequency, thereby effectively improving the success rate of recovering from a network failure while ensuring network stability and reducing network pressure.
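The shape of the retry schedule described above (long gaps first, short gaps near lease expiration) can be illustrated with a small Python sketch. It rests on three assumptions that are not stated in this passage: each backoff sending interval is taken to be half of the time remaining until lease expiration, every retry is judged abnormal at the moment it is sent, and retries stop once the interval drops below a hypothetical `min_interval_ms` floor. The function name `retry_schedule` is likewise hypothetical.

```python
def retry_schedule(t_anomaly_ms, t_expire_ms, min_interval_ms=10.0):
    """Return retry send times (ms) under a halving reverse backoff.

    Assumption: interval = (lease expiration - current anomaly time) / 2.
    The loop ends when the interval falls below min_interval_ms, which
    keeps the schedule finite as it approaches the lease expiration.
    """
    times = []
    now = float(t_anomaly_ms)
    while True:
        interval = (t_expire_ms - now) / 2.0
        if interval < min_interval_ms:
            break
        now += interval          # wait one interval, then retry
        times.append(now)        # the retry is assumed to fail immediately
    return times


# Anomaly determined at t = 0 ms, lease expires at t = 640 ms.
schedule = retry_schedule(0, 640)
# schedule == [320.0, 480.0, 560.0, 600.0, 620.0, 630.0]
```

The gaps between successive retries in this run are 320, 160, 80, 40, 20 and 10 ms, so the retry frequency rises as the lease expiration approaches, and no retry is scheduled after expiration.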
Furthermore, when a heartbeat request response is abnormal, the disclosed lease-based heartbeat protocol method and apparatus may determine a backoff sending interval based on an anomaly determining time point of the heartbeat request response and a lease expiration time point using a reverse exponential backoff algorithm, obtain a random time interval correction based on a random-time function and the backoff sending interval, and determine the retry sending interval based on the random time interval correction and the backoff sending interval. When a network failure occurs at the server device, the random time interval correction for the backoff sending interval for sending heartbeat requests is obtained based on the random-time function, and the retry sending interval is determined based on the random correction of the interval and the backoff sending interval. Therefore, a resonance effect caused by heartbeat requests simultaneously sent by multiple client devices to the server device is avoided to a certain extent, thus effectively protecting the network nodes and the server device.
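To illustrate how the random correction spreads out retries from many client devices, the following Python sketch draws one retry delay per client for the same failure. It follows one possible reading of the text: the width of the correction is itself drawn by the random-time function from within the backoff sending interval, and the send time is then drawn from a window of that width centered on the end of the backoff sending interval. The halving rule for the backoff sending interval and the names `jittered_retry_delay_ms` and `delays_for_clients` are assumptions for illustration.

```python
import random


def jittered_retry_delay_ms(t_anomaly_ms, t_expire_ms, rng):
    """Return one randomized retry delay (ms) measured from the anomaly time."""
    remaining = t_expire_ms - t_anomaly_ms
    if remaining <= 0:
        raise ValueError("the lease has already expired")
    backoff = remaining / 2.0                 # assumed halving rule
    correction = rng.uniform(0.0, backoff)    # assumed: width drawn within the backoff
    half = correction / 2.0
    return rng.uniform(backoff - half, backoff + half)


def delays_for_clients(n_clients, t_anomaly_ms, t_expire_ms, seed=0):
    """Simulate n_clients that all detect the same failure at the same time."""
    rng = random.Random(seed)
    return [jittered_retry_delay_ms(t_anomaly_ms, t_expire_ms, rng)
            for _ in range(n_clients)]


# 50 clients, anomaly at t = 0 ms, lease expiration at t = 640 ms.
delays = delays_for_clients(50, 0, 640)
```

Under these assumptions every delay lies between 0.5 and 1.5 times the backoff sending interval (160 ms to 480 ms here), so retries stay inside the lease period while no longer arriving at the server device at the same instant.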
It should be noted that the present disclosure may be implemented in software and/or a combination of software and hardware. For example, an application specific integrated circuit (ASIC), a general-purpose computer or any other similar hardware devices may be used for implementing the present disclosure. In implementations, a software program of the present disclosure may be executed by processor(s) to achieve the operations or functions as described in the foregoing description. Similarly, a software program (including a related data structure) of the present disclosure can be stored in computer readable recording media, for example, a RAM memory, a magnetic or optical drive, a floppy disk, or similar devices. In addition, some operations or functions of the present disclosure may be implemented with hardware, for example, a circuit that performs various operations or functions in cooperation with processor(s).
In addition, a part of the present disclosure may be applied as a computer program product, for example, computer program instruction(s) that, when executed by computing device(s), invoke or provide the method and/or the technical solution according to the present disclosure through operations of the computing device(s). The program instruction(s) that invoke(s) the method of the present disclosure may be stored in fixed or removable recording media, and/or transmitted via broadcast or data streams in other signal carrier media, and/or stored in a working memory of a computer device that runs according to the program instruction(s). Implementations of the present disclosure may include an apparatus, which includes memory configured to store computer program instruction(s) and processor(s) configured to execute the program instruction(s). When the computer program instruction(s) is/are executed by the processor(s), the apparatus is triggered to run the method and/or the technical solution of the foregoing embodiments of the present disclosure.
For one skilled in the art, it is apparent that the present disclosure is not limited to the details of the foregoing exemplary embodiments, and the present disclosure can be implemented in other specific forms without departing from the spirit or basic features of the present disclosure. Therefore, from whichever point of view, the embodiments should be regarded as exemplary and non-limiting. The scope of the present disclosure is defined by the appended claims instead of the above description. Thus, the present disclosure is intended to cover all changes that are included in the meaning and scope of the equivalent elements of the claims. None of the reference labels in the claims should be regarded as limitations of the claims. In addition, it is apparent that the term "include" does not exclude other units or operations, and a singular form does not exclude a plural form. Multiple units or devices stated in an apparatus claim may also be implemented by a single unit or device through software or hardware. Terms such as "first" and "second" are used to represent names, and do not indicate any specific order.

Claims

What is claimed is:
1. A method implemented by a client device, the method comprising:
sending a heartbeat request to a server device in a lease period, and receiving a heartbeat request response from the server device; and
determining an adaptive retry sending interval in response to the heartbeat request response being abnormal, and sending a retry heartbeat request to the server device again after the retry sending interval is past, until the lease period expires or a corresponding heartbeat request response is normal.
2. The method of claim 1, wherein determining the retry sending interval comprises determining the retry sending interval based at least in part on an anomaly determining time point of the heartbeat request response and a lease expiration time point using the reverse exponential backoff algorithm.
3. The method of claim 2, wherein the anomaly determining time point is one of:
a receiving time of the heartbeat request response in an event that the heartbeat request response comprises content indicating that the heartbeat request is an illegitimate request or the heartbeat request response is an error response to the heartbeat request; or
a time point of a timeout in an event that no heartbeat request response is received before the timeout.
4. The method of claim 1, wherein determining the retry sending interval comprises determining the retry sending interval based at least in part on a difference between an anomaly determining time point of the heartbeat request response and a lease expiration time point, and wherein the retry sending interval decreases as a number of retries of sending the heartbeat request increases.
5. The method of claim 1, wherein determining the retry sending interval comprises:
determining a backoff sending interval based at least in part on an anomaly determining time point of the heartbeat request response and a lease expiration time point using the reverse exponential backoff algorithm;
obtaining a random time interval correction based at least in part on a random-time function and the backoff sending interval; and
determining the retry sending interval based at least in part on the random time interval correction and the backoff sending interval.
6. The method of claim 1, further comprising sending the heartbeat request to the server device again after a protocol sending interval is past if the heartbeat request response is normal.
7. The method of claim 6, wherein sending the heartbeat request to the server device again after a protocol sending interval is past comprises:
determining a re-initiation time of the heartbeat request according to a receiving time of the heartbeat request response and the protocol sending interval if the heartbeat request response is normal; and
sending the heartbeat request to the server device at the re-initiation time.
8. A client device comprising:
one or more processors;
memory;
a sending and receiving unit stored in the memory and executable by the one or more processors to send a heartbeat request to a server device in a lease period, and receive a heartbeat request response from the server device; and
a retry unit stored in the memory and executable by the one or more processors to determine an adaptive retry sending interval in response to the heartbeat request response being abnormal, and send a retry heartbeat request to the server device again after the retry sending interval is past, wherein the retry sending interval decreases as a number of retries of sending the heartbeat request increases.
9. The client device of claim 8, wherein the retry unit determines the retry sending interval based at least in part on an anomaly determining time point of the heartbeat request response and a lease expiration time point using the reverse exponential backoff algorithm.
10. The client device of claim 9, wherein the anomaly determining time point is one of:
a receiving time of the heartbeat request response in an event that the heartbeat request response comprises content indicating that the heartbeat request is an illegitimate request or the heartbeat request response is an error response to the heartbeat request; or
a time point of a timeout in an event that no heartbeat request response is received before the timeout.
11. The client device of claim 10, wherein the retry unit resends the heartbeat request to the server device until the lease period expires or a normal heartbeat request response is received.
12. The client device of claim 8, wherein the retry unit is further configured to:
determine a backoff sending interval based at least in part on an anomaly determining time point of the heartbeat request response and a lease expiration time point using the reverse exponential backoff algorithm;
obtain a random time interval correction based at least in part on a random-time function and the backoff sending interval; and
determine the retry sending interval based at least in part on the random time interval correction and the backoff sending interval.
13. The client device of claim 8, further comprising a normal request unit to send the heartbeat request to the server device again after a protocol sending interval is past in an event that the heartbeat request response is normal.
14. The client device of claim 13, wherein the normal request unit is further configured to:
determine a re-initiation time of the heartbeat request according to a receiving time of the heartbeat request response and the protocol sending interval if the heartbeat request response is normal; and
send the heartbeat request to the server device at the re-initiation time.
15. One or more computer-readable media storing executable instructions that, when executed by one or more processors of a client device, cause the one or more processors to perform acts comprising:
sending a heartbeat request to a server device in a lease period, and receiving a heartbeat request response from the server device; and
determining an adaptive retry sending interval in response to the heartbeat request response being abnormal, and sending a retry heartbeat request to the server device again after the retry sending interval is past, until the lease period expires or a corresponding heartbeat request response is normal.
16. The one or more computer-readable media of claim 15, wherein determining the retry sending interval comprises determining the retry sending interval based at least in part on an anomaly determining time point of the heartbeat request response and a lease expiration time point using the reverse exponential backoff algorithm.
17. The one or more computer-readable media of claim 16, wherein the anomaly determining time point is one of:
a receiving time of the heartbeat request response in an event that the heartbeat request response comprises content indicating that the heartbeat request is an illegitimate request or the heartbeat request response is an error response to the heartbeat request; or
a time point of a timeout in an event that no heartbeat request response is received before the timeout.
18. The one or more computer-readable media of claim 15, wherein determining the retry sending interval comprises determining the retry sending interval based at least in part on a difference between an anomaly determining time point of the heartbeat request response and a lease expiration time point, and wherein the retry sending interval decreases as a number of retries of sending the heartbeat request increases.
19. The one or more computer-readable media of claim 15, wherein determining the retry sending interval comprises:
determining an adaptive backoff sending interval based at least in part on an anomaly determining time point of the heartbeat request response and a lease expiration time point;
obtaining a random time interval correction based at least in part on a random-time function and the backoff sending interval; and
determining the retry sending interval based at least in part on the random time interval correction and the backoff sending interval.
20. The one or more computer-readable media of claim 15, the acts further comprising sending the heartbeat request to the server device again after a protocol sending interval is past if the heartbeat request response is normal.
PCT/US2017/019493 2016-02-25 2017-02-24 Lease-based heartbeat protocol method and apparatus WO2017147517A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP17757371.4A EP3420463B1 (en) 2016-02-25 2017-02-24 Lease-based heartbeat protocol method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610105054.3 2016-02-25
CN201610105054.3A CN107124324B (en) 2016-02-25 2016-02-25 Heartbeat protocol method and equipment based on lease

Publications (1)

Publication Number Publication Date
WO2017147517A1 (en) 2017-08-31

Family

ID=59679045

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2017/019493 WO2017147517A1 (en) 2016-02-25 2017-02-24 Lease-based heartbeat protocol method and apparatus

Country Status (4)

Country Link
US (1) US10601930B2 (en)
EP (1) EP3420463B1 (en)
CN (1) CN107124324B (en)
WO (1) WO2017147517A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10601930B2 (en) 2016-02-25 2020-03-24 Alibaba Group Holding Limited Lease-based heartbeat protocol method and apparatus

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111328445A (en) * 2017-09-26 2020-06-23 诺基亚技术有限公司 Method, apparatus, computer program product and computer program
CN108040266A (en) * 2017-12-06 2018-05-15 深圳市雷鸟信息科技有限公司 Abnormality eliminating method, device and the storage medium of data synchronization
WO2020076557A1 (en) * 2018-10-09 2020-04-16 Google Llc Method and apparatus for ensuring continued device operational reliability in cloud-degraded mode
CN109324764B (en) * 2018-11-01 2021-11-26 郑州云海信息技术有限公司 Method and device for realizing distributed exclusive lock
CN111124812A (en) * 2019-12-02 2020-05-08 深圳市智微智能软件开发有限公司 Server monitoring method and system
CN111200538B (en) * 2019-12-25 2022-03-11 苏宁云计算有限公司 Monitoring method and device for intelligent equipment
CN111490903B (en) * 2020-04-14 2022-08-09 广州汇智通信技术有限公司 Network data acquisition and processing method and device
WO2022031258A1 (en) * 2020-08-03 2022-02-10 Hitachi Vantara Llc Randomization of heartbeat communications among multiple partition groups
DE112021005629T5 (en) * 2020-10-22 2023-08-24 Panasonic Intellectual Property Management Co., Ltd. ABNORMAL DETECTION DEVICE, ABNORMAL DETECTION METHOD AND PROGRAM
CN112395134A (en) * 2020-11-18 2021-02-23 平安普惠企业管理有限公司 Retry method, device, equipment and medium for application execution exception
CN117033092A (en) * 2023-10-10 2023-11-10 北京大道云行科技有限公司 Single-instance service failover method and system, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140128051A1 (en) * 2011-06-30 2014-05-08 Lg Electronics Inc. Method and apparatus for ranging transmission by mobile station in wireless communication system
US20140344379A1 (en) * 2013-05-17 2014-11-20 Futurewei Technologies, Inc. Multi-Tier Push Hybrid Service Control Architecture for Large Scale Conferencing over ICN
US20150363124A1 (en) * 2012-01-17 2015-12-17 Amazon Technologies, Inc. System and method for data replication using a single master failover protocol

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7289992B2 (en) 2003-05-01 2007-10-30 International Business Machines Corporation Method, system, and program for lock and transaction management
US8543781B2 (en) 2004-02-06 2013-09-24 Vmware, Inc. Hybrid locking using network and on-disk based schemes
US7350117B2 (en) 2004-10-05 2008-03-25 International Business Machines Corporation Management of microcode lock in a shared computing resource
US7496701B2 (en) 2004-11-18 2009-02-24 International Business Machines Corporation Managing virtual server control of computer support systems with heartbeat message
US7594020B2 (en) * 2005-05-31 2009-09-22 Microsoft Corporation Re-establishing a connection for an application layer via a service layer
US7523197B2 (en) * 2006-03-09 2009-04-21 International Business Machines Corporation Method for IP address discovery in rapidly changing network environment
TWI363545B (en) * 2008-06-13 2012-05-01 Coretronic Corp Management method for remote digital signages
CN102843250B (en) * 2011-06-21 2018-01-19 中兴通讯股份有限公司 The adaptive approach and device of a kind of heart beat cycle
US9756089B2 (en) * 2012-08-28 2017-09-05 Facebook, Inc. Maintain persistent connections between servers and mobile clients
CN103442353B (en) * 2013-08-22 2017-05-31 江苏赛联信息产业研究院股份有限公司 A kind of safely controllable internet of things data transmission method
CN107124324B (en) 2016-02-25 2020-09-01 阿里巴巴集团控股有限公司 Heartbeat protocol method and equipment based on lease

Also Published As

Publication number Publication date
EP3420463A1 (en) 2019-01-02
CN107124324A (en) 2017-09-01
EP3420463A4 (en) 2019-09-18
US10601930B2 (en) 2020-03-24
US20170251063A1 (en) 2017-08-31
EP3420463B1 (en) 2021-06-23
CN107124324B (en) 2020-09-01

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2017757371

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2017757371

Country of ref document: EP

Effective date: 20180925

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17757371

Country of ref document: EP

Kind code of ref document: A1