US11831711B2 - System and method for sending and receiving remote procedure calls - Google Patents

System and method for sending and receiving remote procedure calls

Info

Publication number
US11831711B2
Authority
US
United States
Prior art keywords
rpcs
rpc
received
computing devices
message
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
US17/980,268
Other versions
US20230052324A1 (en
Inventor
Iulian Moraru
Brian Frank Cooper
Sebastian Kanthak
Alexander Lloyd
Mert Akdere
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Google LLC
Original Assignee
Google LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Google LLC
Priority to US17/980,268
Assigned to GOOGLE INC. Assignment of assignors interest (see document for details). Assignors: LLOYD, ALEXANDER; MORARU, IULIAN; AKDERE, MERT; COOPER, BRIAN FRANK; KANTHAK, SEBASTIAN
Assigned to GOOGLE LLC. Change of name (see document for details). Assignors: GOOGLE INC.
Publication of US20230052324A1
Application granted
Publication of US11831711B2
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/104 Peer-to-peer [P2P] networks
    • H04L67/1074 Peer-to-peer [P2P] networks for supporting data block transmission mechanisms
    • H04L67/1078 Resource delivery mechanisms
    • H04L67/108 Resource delivery mechanisms characterised by resources being split in blocks or fragments
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 Arrangements for detecting or preventing errors in the information received
    • H04L1/12 Arrangements for detecting or preventing errors in the information received by using return channel
    • H04L1/16 Arrangements for detecting or preventing errors in the information received by using return channel in which the return channel carries supervisory signals, e.g. repetition request signals
    • H04L1/1607 Details of the supervisory signal
    • H04L1/1614 Details of the supervisory signal using bitmaps
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 Arrangements for detecting or preventing errors in the information received
    • H04L1/12 Arrangements for detecting or preventing errors in the information received by using return channel
    • H04L1/16 Arrangements for detecting or preventing errors in the information received by using return channel in which the return channel carries supervisory signals, e.g. repetition request signals
    • H04L1/18 Automatic repetition systems, e.g. Van Duuren systems
    • H04L1/1867 Arrangements specially adapted for the transmitter end
    • H04L1/188 Time-out mechanisms
    • H04L1/1883 Time-out mechanisms using multiple timers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/133 Protocols for remote procedure calls [RPC]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/60 Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/30 Definitions, standards or architectural aspects of layered protocol stacks
    • H04L69/32 Architecture of open systems interconnection [OSI] 7-layer type protocol stacks, e.g. the interfaces between the data link level and the physical level
    • H04L69/322 Intralayer communication protocols among peer entities or protocol data unit [PDU] definitions
    • H04L69/324 Intralayer communication protocols among peer entities or protocol data unit [PDU] definitions in the data link layer [OSI layer 2], e.g. HDLC
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W28/00 Network traffic management; Network resource management
    • H04W28/02 Traffic management, e.g. flow control or congestion control
    • H04W28/06 Optimizing the usage of the radio link, e.g. header compression, information sizing, discarding information
    • H04W28/065 Optimizing the usage of the radio link, e.g. header compression, information sizing, discarding information using assembly or disassembly of packets
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/28 Timers or timing mechanisms used in protocols

Definitions

  • The proposals can include, for example, data to be written. This can make it difficult to implement the replication protocol in a wide area, since larger proposals may take a long time to transmit over low-bandwidth links, especially in the case of congestion or failures.
  • In order to ensure liveness of the replication protocol, it is necessary to time out a proposal transfer when it is not making progress.
  • However, large proposals making slow progress can also time out, preventing the proposal from being committed.
  • In some instances, a leader elected to send proposals may lose leadership if it is having trouble transmitting proposals. Accordingly, large proposals can cause the replication system to be leaderless for indefinite periods, preventing any progress.
  • Large proposals may be split into multiple proposal fragments that can be transferred individually. These fragments may be treated specially to ensure both correctness and liveness. For example, Paxos ordering guarantees may be preserved, despite out-of-order and out-of-band transmission of fragments. As another example, an indication that a sender device is making progress is provided even before the whole proposal is received and acknowledged. Additionally, quality of service (QoS) of a path and network used to transmit the data contained in the large proposal may be varied, while still verifying correctness.
  • aspects of the disclosure provide a method of sending a proposal, the proposal including a unit of data to be written.
  • the method includes splitting, with one or more processors, the proposal into a plurality of fragments, sending each of the plurality of fragments to a destination device, receiving an acknowledgement from the destination device for each fragment received by the destination device, and sending, with the one or more processors, a message to the destination device, wherein a payload of the message includes one or more references corresponding to one of the plurality of fragments sent.
  • The method further includes determining whether the acknowledgement for any fragment is received within a predetermined time period, and performing an action, such as timing out or relinquishing leadership responsibilities, if the acknowledgement is not received within the predetermined time period.
  • Another aspect of the disclosure provides a method of receiving a proposal.
  • This method includes receiving, with one or more processors, one or more fragments of the proposal, storing the received one or more fragments, sending, with the one or more processors, an acknowledgement for each received fragment, receiving, with the one or more processors, a message, wherein a payload of the message includes one or more references to each fragment of the proposal, and reconstructing, with the one or more processors, the proposal using the received one or more fragments.
  • the system includes a sender computing device, including a memory storing a proposal, and one or more processors in communication with the memory.
  • the sender computing device may be configured to split the proposal into a plurality of fragments, send each of the plurality of fragments to a receiver computing device, receive an acknowledgement from the receiver computing device for each fragment received by the receiver computing device, and send a message to the receiver computing device, wherein a payload of the message includes one or more references corresponding to one of the plurality of fragments sent.
  • the receiver computing device which also includes a memory and one or more processors in communication with the memory, may be configured to receive one or more of the plurality of fragments of the proposal, store the received one or more fragments, send the acknowledgement to the sender computing device for each received fragment, receive the message, including the one or more references to each fragment of the proposal, from the sender computing device, and reconstruct the proposal using the received one or more fragments.
  • FIG. 1 is an example system diagram according to aspects of the disclosure.
  • FIG. 2 is a block diagram illustrating an example data transmission according to aspects of the disclosure.
  • FIG. 3 illustrates an example message format according to aspects of the disclosure.
  • FIG. 4 is an example table according to aspects of the disclosure.
  • FIG. 5 is an example flow diagram illustrating a method according to aspects of the disclosure.
  • FIG. 6 is another example flow diagram illustrating a method according to aspects of the disclosure.
  • Large proposals may be split into fragments, and each fragment may be sent separately from a sender device to a receiver device.
  • Rather than sending a large proposal as a single remote procedure call (RPC), the replica instead splits the proposal into fragments, and sends each fragment as a separate RPC.
  • the receiver device stores these fragments, for example, in a map in internal or external memory.
  • the sender device also sends a message including pointers to each of the transmitted fragments. For example, when an original RPC is issued, the sender revises the RPC by replacing the payload with a reference to the previously sent fragments. The receiver will then recompose the original payload and execute the original procedure call.
  • a sender device may check a size of a payload prior to sending to determine whether or not to split the proposal. For example, if the payload is larger than 10 MB, the sender will launch a sequence of smaller RPCs, each transferring a fragment of the payload. The sender will then issue another RPC, with a reference to this sequence of fragments. The receiver stores the sequence of fragments in a map in memory, indexed by one or more identifiers. Upon receiving the original RPC (which now references the sequence of fragments instead of encapsulating the payload), the receiver will recompose the original payload and execute the intended procedure.
  • the receiver device sends an acknowledgement to the sender device for each fragment received.
  • the acknowledgement may include, for example, a fragment identifier and a sequence number.
  • the sender device may keep track of these acknowledgements. For example, the sender device may periodically check if an acknowledgement for any fragment has been received. If no acknowledgements have been received within a predetermined time period, such as 10 seconds, the sender device may take an action such as canceling the sequence of fragments or relinquishing leadership.
  • When a message handler of the receiver first receives an RPC that is part of a sequence, such as a fragment or the revised RPC with references to fragments instead of the payload, it creates an object to keep track of all the fragments it is about to receive. It places this object into a map indexed by an identifier of the sequence. If the message handler receives all the fragments before the revised RPC, it will reconstruct the payload from the fragments and process the revised RPC as it would usually process the original RPC. If the message handler receives the revised RPC before receiving all the fragments, it may reply with a request that the sender retry later, when perhaps all fragments have arrived.
  • the message handler may also periodically check whether more than a predetermined amount of time has passed since the last time a message was received for any active sequence. If so, it may discard the object created for the sequence. If fragments are redelivered, duplicates may be identified, for example, by fragment identifier or sequence identifier.
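  • As a minimal sketch of the message handler behavior described above, the following Python keeps one tracking object per sequence in a map, asks the sender to retry when the revised RPC arrives early, and discards idle sequences. The names `SequenceTracker` and `MessageHandler`, the 30-second idle timeout, the tuple-shaped replies, and the assumption that fragments are numbered from 0 are all illustrative and not taken from the disclosure.

```python
import time
from dataclasses import dataclass, field


@dataclass
class SequenceTracker:
    """Per-sequence state (hypothetical name and fields)."""
    fragments: dict = field(default_factory=dict)  # sequence number -> bytes
    last_seen: float = 0.0                         # time of the last message for this sequence


class MessageHandler:
    def __init__(self, idle_timeout=30.0, clock=time.monotonic):
        self.idle_timeout = idle_timeout  # assumed value; the text only says "predetermined"
        self.clock = clock
        self.sequences = {}               # sequence identifier -> SequenceTracker

    def _tracker(self, seq_id):
        # Create the tracking object on the first RPC seen for a sequence,
        # whether that RPC is a fragment or the revised RPC.
        tracker = self.sequences.setdefault(seq_id, SequenceTracker())
        tracker.last_seen = self.clock()
        return tracker

    def on_fragment(self, seq_id, seq_no, data):
        tracker = self._tracker(seq_id)
        tracker.fragments.setdefault(seq_no, data)  # a redelivered fragment is kept only once
        return ("ACK", seq_id, seq_no)

    def on_revised_rpc(self, seq_id, total_fragments):
        tracker = self._tracker(seq_id)
        if any(n not in tracker.fragments for n in range(total_fragments)):
            return ("RETRY_LATER", None)  # some fragments have not arrived yet
        payload = b"".join(tracker.fragments[n] for n in range(total_fragments))
        del self.sequences[seq_id]
        return ("OK", payload)

    def expire_idle(self):
        # Discard sequences that have received no message for too long.
        now = self.clock()
        stale = [s for s, t in self.sequences.items()
                 if now - t.last_seen > self.idle_timeout]
        for seq_id in stale:
            del self.sequences[seq_id]
        return stale
```

  • In this sketch the clock is injectable so that the idle check can be exercised without waiting; a real handler would call `expire_idle` from a periodic task.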
  • FIG. 1 illustrates an example system 100 , in which a plurality of computing devices are communicatively coupled through network 150 .
  • the plurality of network devices may include servers or other computing devices, which may be capable of operating in one or more states.
  • server 110 is elected to operate in a leader state, while servers 170 - 190 operate in a slave state.
  • sender server 110 provides updates and other information to receiver devices 170 - 190 , for example, by sending proposals.
  • the network 150 may be a datacenter, a load-balanced server farm, or any other type of computing environment, including a backplane of interconnected peripherals or a system of components on a motherboard.
  • the network 150 , and intervening nodes may comprise various configurations and protocols including the Internet, World Wide Web, intranets, virtual private networks, wide area networks, local networks, private networks using communication protocols proprietary to one or more companies, Ethernet, WiFi (such as 802.11, 802.11b, g, n, or other such standards), and HTTP, and various combinations of the foregoing.
  • the sender server 110 may be any type of virtualized or non-virtualized computing device or system of computing devices capable of communicating over a network.
  • Server 110 can contain one or more processors 140 , memory 130 and other components typically present in general purpose computing devices.
  • the memory 130 can store information accessible by the one or more processors 140 , including instructions 138 that can be executed by the one or more processors 140 .
  • Memory 130 can also include data 134 that can be retrieved, manipulated or stored by the processor 140 .
  • The memory can be of any non-transitory type capable of storing information accessible by the processor, such as a hard-drive, memory card, RAM, DVD, write-capable memory, etc.
  • the instructions 138 can be any set of instructions to be executed directly, such as machine code, or indirectly, such as scripts, by the one or more processors.
  • the terms “instructions,” “applications,” “steps” and “programs” can be used interchangeably herein.
  • the instructions can be stored in object code format for direct processing by a processor, or in any other computing device language including scripts or collections of independent source code modules that are interpreted on demand or compiled in advance. Functions, methods and routines of the instructions are explained in more detail below.
  • Data 134 can be retrieved, stored or modified by the one or more processors 140 in accordance with the instructions 138 .
  • the data 134 may include one or more proposals to be provided to receiver devices 170 - 190 .
  • the proposals may have been received from another network device (not shown) and temporarily stored.
  • the data can be stored in internal or external memory, computer registers, in a relational database as a table having many different fields and records, or XML documents.
  • the data can also be formatted in any computing device-readable format such as, but not limited to, binary values, ASCII or Unicode.
  • the data can comprise any information sufficient to identify the relevant information, such as numbers, descriptive text, proprietary codes, pointers, references to data stored in other memories such as at other network locations, or information that is used by a function to calculate the relevant data.
  • The one or more processors 140 can be any conventional processors, such as commercially available CPUs. Alternatively, the processors can be dedicated components such as an application specific integrated circuit (“ASIC”) or other hardware-based processor. Although not necessary, the server 110 may include specialized hardware components to perform specific computing processes.
  • Although FIG. 1 functionally illustrates the processor, memory, and other elements of computing device 110 as being within the same block, the processor, computer, computing device, or memory can actually comprise multiple processors, computers, computing devices, or memories that may or may not be stored within the same physical housing.
  • the memory can be a hard drive or other storage media located in housings different from that of the computing devices 110 .
  • references to a processor, computer, computing device, or memory will be understood to include references to a collection of processors, computers, computing devices, or memories that may or may not operate in parallel.
  • the computing devices 110 may include server computing devices operating as a load-balanced server farm, distributed system, etc.
  • Although some functions described below are indicated as taking place on a single computing device having a single processor, various aspects of the subject matter described herein can be implemented by a plurality of computing devices, for example, communicating information over network 150 .
  • the sender server 110 may be connected to a plurality of client computing devices through the network 150 or through another network (not shown). In serving requests from the client devices, the server 110 may send requests, such as RPCs, to the network device 180 .
  • the network devices 170 - 190 may be configured similarly to the server computing device 110 .
  • the receiver device 180 is described in further detail in connection with FIG. 2 .
  • the sender server 110 may send proposals to the network devices 170 - 190 through the network 150 .
  • the sender server 110 may determine, prior to sending a proposal, whether the proposal exceeds a predetermined size. For example, the sender server 110 may compare the proposal to a threshold, such as 10 MB. If the proposal exceeds the threshold, the sender server 110 may split the proposal into a number of fragments, and send the fragments individually to the network devices 170 - 190 . For example, each fragment may be sent as a separate RPC. In some examples, each fragment RPC may include a proposal identifier (ID) and a sequence number.
  • the sender server 110 may receive an acknowledgement from the receiver servers 170 - 190 for each fragment sent, and the sender may use the received acknowledgements to determine an action to take. For example, if the sender server does not receive an acknowledgement within a given time period, the sender server 110 may time out. The sender server 110 may also replace a payload of the original RPC with one or more pointers to each of the fragments, thereby creating a revised RPC, and send the revised RPC.
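  • The sender-side steps above (size check, splitting, and replacing the payload of the original RPC with references) can be sketched as follows. This is an illustration under stated assumptions: the class names, the 1 MB fragment size, and the use of a fragment count as the "reference" are hypothetical, while the 10 MB threshold is the example value given in the text.

```python
from dataclasses import dataclass

SPLIT_THRESHOLD = 10 * 1024 * 1024  # the 10 MB example threshold from the text
FRAGMENT_SIZE = 1024 * 1024         # assumed; the text does not give a fragment size


@dataclass(frozen=True)
class OriginalRpc:
    proposal_id: str
    payload: bytes


@dataclass(frozen=True)
class FragmentRpc:
    proposal_id: str
    sequence_number: int
    data: bytes


@dataclass(frozen=True)
class RevisedRpc:
    proposal_id: str
    total_fragments: int  # stands in for the references to the fragments


def prepare_rpcs(proposal_id, payload,
                 threshold=SPLIT_THRESHOLD, fragment_size=FRAGMENT_SIZE):
    """Return the RPCs to send for one proposal, in sending order."""
    if len(payload) <= threshold:
        # Small proposals travel unchanged as a single RPC.
        return [OriginalRpc(proposal_id, payload)]
    fragments = [
        FragmentRpc(proposal_id, n, payload[start:start + fragment_size])
        for n, start in enumerate(range(0, len(payload), fragment_size))
    ]
    # The revised RPC carries references instead of the payload itself.
    return fragments + [RevisedRpc(proposal_id, len(fragments))]
```

  • Here the revised RPC is placed last, matching the example in which references are sent once all fragments are sent; the text notes that this order may be modified.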
  • FIG. 2 provides further detail on a receiver side of the proposal.
  • the network device 180 may be a server or client computing device within a different computing infrastructure.
  • the network device 180 may be a network server connected to and serving one or more client computing devices, and the server 110 may be an application server serving a plurality of network servers.
  • the network device 180 may be a personal computing device having all of the components normally used in connection with a personal computing device, such as a central processing unit (CPU), memory (e.g., RAM and internal hard drives) storing data and instructions, a display (e.g., a monitor having a screen, a touch-screen, a projector, a television, or other device that is operable to display information), user input device (e.g., a mouse, keyboard, touch-screen or microphone), and all of the components used for connecting these elements to one another.
  • the receiver server 180 receives fragments 220 of a proposal and a message 225 including pointers to the fragments 220 in place of its payload.
  • the receiver device 180 stores the received fragments 220 , for example, in data 184 .
  • the receiver device 180 sends an acknowledgement to the sender device 110 .
  • the acknowledgement may include identifiers specific to the proposal, the fragment, and/or the acknowledgement.
  • the acknowledgment may also include timing information, such as a time the corresponding fragment was received and a time the acknowledgement was sent.
  • The receiver device 180 may also receive the message 225 , including the pointers, and may use the message 225 to reconstruct the proposal. For example, the receiver device 180 may determine, based on the message 225 , whether all fragments were received. In some examples, the message 225 may also provide an indication of the order in which the fragments 220 should be concatenated to reconstruct the proposal.
  • FIG. 3 illustrates an example of the message 225 .
  • the message 225 may be, for example, an RPC. Accordingly, the message may include header information 310 , such as source identifier, destination identifier, and priority.
  • the message 225 may further include a payload 350 .
  • the message may include the entire proposal to be sent in the payload 350 .
  • the proposal may be replaced, for example by the sender server 110 , with one or more references to each of the fragments 220 into which the original proposal was divided.
  • the reference may include the proposal identifier, and thus one reference could identify every fragment.
  • the receiver device 180 could identify the fragments using only the proposal ID and reconstruct the proposal.
  • the one or more references in the payload may also include sequence numbers of the fragments, a total number of fragments, or other information related to the fragments.
  • multiple references may be included in the payload, for example, such that each reference corresponds to a particular fragment.
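  • One way to picture the message format of FIG. 3 is as a header plus a payload that holds either the whole proposal or a tuple of references. The sketch below is an assumption-laden illustration: the type names, field names, and the helper `referenced_fragments` are hypothetical, and only the header fields (source, destination, priority) and the reference contents (proposal ID, sequence number, total count) come from the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Header:
    source_id: str
    destination_id: str
    priority: int


@dataclass(frozen=True)
class FragmentReference:
    proposal_id: str
    sequence_number: Optional[int] = None  # None: one reference covers every fragment
    total_fragments: Optional[int] = None


@dataclass(frozen=True)
class Message:
    header: Header
    # Either the entire proposal, or references to its fragments.
    payload: Union[bytes, Tuple[FragmentReference, ...]]

    def is_fragmented(self) -> bool:
        return not isinstance(self.payload, bytes)


def referenced_fragments(message):
    """List (proposal_id, sequence_number) pairs named by per-fragment references."""
    if not message.is_fragmented():
        return []
    return [(r.proposal_id, r.sequence_number)
            for r in message.payload if r.sequence_number is not None]
```

  • A single `FragmentReference` carrying only the proposal ID models the case where one reference identifies every fragment; a tuple with one reference per sequence number models the per-fragment case.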
  • the receiver device 180 may send an acknowledgement to the sender device for each fragment received.
  • the sender may track the acknowledgments.
  • FIG. 4 provides an example table which may be maintained by the sender device 110 tracking the acknowledgements. It should be understood that any other data structure, including various types of information, may be maintained by the sender device or other devices.
  • table 400 includes a number of columns 410 - 440 , listing identification information (e.g., proposal ID and sequence number) for the fragments sent, time the fragments were sent, information identifying the acknowledgement received, and time the acknowledgement was received.
  • the table also includes rows 450 - 480 . While only a few rows are shown in this example, it should be understood that the number of rows may correspond, for example, to the number of fragments sent, and thus any number of rows may be included.
  • fragments 1 - 4 of proposal “xx” were sent by the sender device. While in this example the fragments were sent at various times within seconds of one another, the timing of sending the fragments may vary. For example, the fragments may be sent all at a same time, or the timing may be further spread out. Moreover, the timing between fragments may be consistent or may vary, for example, in relation to network capacity or size of the fragment.
  • the sender device may take an action, such as timing out, resending the fragment 2 , or relinquishing its responsibilities to another device.
  • a leader lease for the sender device may be extended as long as an acknowledgment for any fragment is received within a given time period. For example, rather than comparing a current time to a time a particular fragment was sent, the sender device may periodically check if any acknowledgment was received. If a subsequent acknowledgement is not received within x seconds, for example, of a previous acknowledgement, then the sender may time out or take some other action.
  • the sending device may continue sending fragments without waiting for an acknowledgement from a previous fragment. For example, after sending the fragment 1 , the sender device sent the fragment 2 before the acknowledgment for fragment 1 was received.
  • the acknowledgments may also be received in a different order than the fragments were sent. For example, although the fragment 3 was sent before the fragment 4 , the acknowledgment for fragment 4 was received before the acknowledgement for fragment 3 .
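  • The acknowledgement tracking described for table 400 can be sketched as a small in-memory structure with two checks: a per-fragment deadline and the "any acknowledgement within a window" check used to extend a leader lease. The class and method names and the 10-second window are assumptions for illustration, and the timestamps in the usage are invented rather than taken from FIG. 4.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Row:
    sent_at: float
    ack_id: Optional[str] = None
    acked_at: Optional[float] = None


class AckTable:
    def __init__(self, ack_window=10.0):
        self.ack_window = ack_window  # assumed window, in seconds
        self.rows = {}                # (proposal_id, sequence_number) -> Row

    def record_sent(self, proposal_id, seq_no, now):
        self.rows[(proposal_id, seq_no)] = Row(sent_at=now)

    def record_ack(self, proposal_id, seq_no, ack_id, now):
        row = self.rows[(proposal_id, seq_no)]
        row.ack_id, row.acked_at = ack_id, now

    def overdue(self, now):
        """Per-fragment check: sent more than ack_window ago, still unacknowledged."""
        return sorted(key for key, row in self.rows.items()
                      if row.acked_at is None and now - row.sent_at > self.ack_window)

    def making_progress(self, now):
        """Any-acknowledgement check: some ack arrived within the window."""
        if not self.rows:
            return True  # nothing in flight
        acks = [row.acked_at for row in self.rows.values() if row.acked_at is not None]
        # Before the first ack, measure from the earliest send instead.
        last = max(acks) if acks else min(row.sent_at for row in self.rows.values())
        return now - last <= self.ack_window


table = AckTable()
table.record_sent("xx", 1, now=0.0)
table.record_sent("xx", 2, now=1.0)          # sent before the ack for fragment 1 arrives
table.record_ack("xx", 1, "a1", now=2.0)
table.record_sent("xx", 3, now=3.0)
table.record_sent("xx", 4, now=4.0)
table.record_ack("xx", 4, "a4", now=5.0)     # acks may arrive out of order
table.record_ack("xx", 3, "a3", now=6.0)
```

  • With these invented times, fragment 2 becomes overdue after its window elapses while `making_progress` stays true as long as some acknowledgement is recent, which is the distinction the lease example draws.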
  • FIG. 5 illustrates an example method 500 of sending large proposals.
  • the method 500 may be performed, for example, by a sender device, such as a leader server in a network.
  • the sender device may determine whether the proposal meets or exceeds a predetermined size.
  • the proposal is split into a number of fragments.
  • the sender may invoke an RPC for the original proposal and also invoke a number of fragment RPCs.
  • the proposal may be split into a number of fragments of approximately equal size, wherein the number is based on the overall size of the proposal.
  • the splitting may be a logical division based on a content of the proposal. Further, it is possible that the sender device could dynamically adapt the fragment size based on how quickly fragments are received and acknowledged.
  • the sender device sends each of the fragments to a receiver device, such as a slave server.
  • Each fragment may be sent with one or more identifiers, such as a unique identifier of the fragment sequence and a sequence number.
  • each fragment is sent as a separate RPC.
  • other message formats are also possible.
  • acknowledgements are received for the fragments.
  • the acknowledgments may also include identifiers and other information, for example, identifying the fragment to which they correspond.
  • the sender device may track the fragments sent and corresponding acknowledgements received. If the sender device determines that an acknowledgement was not received within a given time period (block 540 ), the sender may time out (block 545 ) or take some other action. Otherwise, the sender may continue sending fragments until it determines (block 550 ) that all fragments have been sent.
  • the sender replaces a payload of the original proposal with references to each of the fragments, and sends the original proposal. While in this example the sender only sends the original proposal with references once all fragments are sent, the order of sending fragments and the original proposal may be modified.
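  • The splitting step of method 500, in which the number of fragments is based on the overall size and the fragments are approximately equal, might look like the sketch below. The function name `split_evenly` and the `max_fragment_size` parameter are hypothetical; the disclosure does not specify how the count is derived.

```python
import math


def split_evenly(payload: bytes, max_fragment_size: int):
    """Split payload into the fewest near-equal fragments of at most max_fragment_size bytes."""
    if max_fragment_size <= 0:
        raise ValueError("max_fragment_size must be positive")
    # The fragment count follows from the overall size of the proposal.
    count = max(1, math.ceil(len(payload) / max_fragment_size))
    base, extra = divmod(len(payload), count)
    fragments, start = [], 0
    for n in range(count):
        size = base + (1 if n < extra else 0)  # spread the remainder over the first fragments
        fragments.append(payload[start:start + size])
        start += size
    return fragments
```

  • Fragment sizes differ by at most one byte here. A sender that adapts the fragment size dynamically, as the text suggests is possible, could call this with a different `max_fragment_size` for each proposal.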
  • FIG. 6 illustrates a method 600 of receiving fragments and reconstructing a proposal therefrom.
  • a receiver device receives individual fragments, for example, as individual RPCs.
  • the receiver device stores the received fragments (block 620 ), for example in internal or external memory.
  • the receiver device sends an acknowledgment (block 630 ).
  • the acknowledgement may include information identifying the particular fragment received, a time of receipt, a time the acknowledgement was sent, etc.
  • the receiver device receives the original proposal having its payload replaced with references to the fragments.
  • The receiver device may compare the references to the fragments it received and stored (block 650 ). If it determines (block 660 ) that one or more fragments referenced in the message were not received, the receiver may take an action, such as requesting the missing fragments (block 665 ) or requesting to resend the entire proposal. However, if all fragments were received, the receiver device may reconstruct the original payload in block 670 . For example, the receiver device may concatenate the fragments based on an ordering indicated in each fragment, in the reference message, or elsewhere.
  • Fragments may be redelivered, thus causing duplicate fragments to be received by the receiver device. Accordingly, the receiver device may identify and discount duplicates based on, for example, the unique identifier and sequence numbers included with each fragment.
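  • Blocks 650 through 670, together with duplicate handling, can be sketched as a store keyed by sequence identifier and sequence number. The `FragmentStore` class, its method names, and the choice to raise `LookupError` when fragments are missing are assumptions made for illustration; a real receiver would instead request the missing fragments or a full resend.

```python
class FragmentStore:
    def __init__(self):
        self._fragments = {}  # (sequence_id, sequence_number) -> bytes
        self.duplicates = 0   # count of redelivered fragments that were discounted

    def add(self, sequence_id, seq_no, data):
        """Store a fragment; return False if it is a duplicate."""
        key = (sequence_id, seq_no)
        if key in self._fragments:
            self.duplicates += 1
            return False
        self._fragments[key] = data
        return True

    def missing(self, sequence_id, referenced):
        """Compare the references against stored fragments (blocks 650 and 660)."""
        return [n for n in referenced if (sequence_id, n) not in self._fragments]

    def reconstruct(self, sequence_id, referenced):
        """Concatenate fragments in sequence-number order (block 670)."""
        absent = self.missing(sequence_id, referenced)
        if absent:
            raise LookupError(f"missing fragments: {absent}")
        return b"".join(self._fragments[(sequence_id, n)] for n in sorted(referenced))
```

  • The ordering here comes from the sequence numbers carried by each fragment; the text also allows the ordering to be indicated in the reference message or elsewhere.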
  • While some of the examples above are described with respect to RPCs, it should be understood that other inter-process communication techniques may also be used. For example, user datagram protocol, transmission control protocol, or other types of messages may be used. Further, the examples above are not limited to Paxos, but rather may be implemented in other state machine replication protocols in any of a variety of distributed networking environments.
  • The above described systems and methods may be advantageous in that they enable large proposals to be sent with increased efficiency. Also, liveness is confirmed by receiving periodic acknowledgements for fragments of proposals. Moreover, using RPCs may be advantageous for a number of reasons. For example, using RPCs allows for attachment of specific information, such as information necessary to maintain leader time leases, to individual fragments. As a further example, by sending each fragment as a separate RPC, it is easier to integrate support for large proposal fragments into code for existing replication protocols such as Paxos, because the RPCs can be treated as any other message. Even further, the systems and methods described above are agnostic to underlying communication primitives layered underneath the RPC abstraction.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Computer And Data Communications (AREA)

Abstract

Systems and methods are provided for sending and receiving remote procedure calls (RPCs). Based on a message in a first RPC, a second set of RPCs are created by one or more computing devices, where each of the second set of RPCs includes a portion of the message in the first RPC. The message in the first RPC is replaced with a reference to each RPC of the second set of RPCs. The one or more computing devices send the first RPC with the references, and also send the second set of RPCs.

Description

CROSS REFERENCE TO RELATED APPLICATIONS
The present application is a continuation of U.S. patent application Ser. No. 16/538,178, filed on Aug. 12, 2019, which is a continuation of U.S. patent application Ser. No. 14/799,891, filed on Jul. 15, 2015 (now U.S. Pat. No. 10,462,218), all of which are incorporated herein by reference.
BACKGROUND
Standard implementations of the Paxos state machine replication protocol, or of other state machine replication protocols, require “proposals” for how the replicated state should be changed to be sent and accepted as an indivisible unit. The proposals can include, for example, data to be written. This can make it difficult to implement the replication protocol in a wide area, since larger proposals may take a long time to transmit over low-bandwidth links, especially in the case of congestion or failures. In order to ensure liveness of the replication protocol, it is necessary to time out a proposal transfer when it is not making progress. However, large proposals making slow progress can also time out, preventing the proposal from being committed. In some instances, a leader elected to send proposals may lose leadership if it is having trouble transmitting proposals. Accordingly, large proposals can cause the replication system to be leaderless for indefinite periods, preventing any progress.
SUMMARY
Large proposals may be split into multiple proposal fragments that can be transferred individually. These fragments may be treated specially to ensure both correctness and liveness. For example, Paxos ordering guarantees may be preserved, despite out-of-order and out-of-band transmission of fragments. As another example, an indication that a sender device is making progress is provided even before the whole proposal is received and acknowledged. Additionally, quality of service (QoS) of a path and network used to transmit the data contained in the large proposal may be varied, while still verifying correctness.
Aspects of the disclosure provide a method of sending a proposal, the proposal including a unit of data to be written. The method includes splitting, with one or more processors, the proposal into a plurality of fragments, sending each of the plurality of fragments to a destination device, receiving an acknowledgement from the destination device for each fragment received by the destination device, and sending, with the one or more processors, a message to the destination device, wherein a payload of the message includes one or more references corresponding to one of the plurality of fragments sent. In some examples, the method further includes determining whether the acknowledgement for any fragment is received within a predetermined time period, and performing an action, such as timing out or relinquishing leadership responsibilities, if the acknowledgement is not received within the predetermined time period.
Another aspect of the disclosure provides a method of receiving a proposal. This method includes receiving, with one or more processors, one or more fragments of the proposal, storing the received one or more fragments, sending, with the one or more processors, an acknowledgement for each received fragment, receiving, with the one or more processors, a message, wherein a payload of the message includes one or more references to each fragment of the proposal, and reconstructing, with the one or more processors, the proposal using the received one or more fragments.
Yet another aspect provides a system. The system includes a sender computing device, including a memory storing a proposal, and one or more processors in communication with the memory. The sender computing device may be configured to split the proposal into a plurality of fragments, send each of the plurality of fragments to a receiver computing device, receive an acknowledgement from the receiver computing device for each fragment received by the receiver computing device, and send a message to the receiver computing device, wherein a payload of the message includes one or more references corresponding to one of the plurality of fragments sent. The receiver computing device, which also includes a memory and one or more processors in communication with the memory, may be configured to receive one or more of the plurality of fragments of the proposal, store the received one or more fragments, send the acknowledgement to the sender computing device for each received fragment, receive the message, including the one or more references to each fragment of the proposal, from the sender computing device, and reconstruct the proposal using the received one or more fragments.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is an example system diagram according to aspects of the disclosure.
FIG. 2 is a block diagram illustrating an example data transmission according to aspects of the disclosure.
FIG. 3 illustrates an example message format according to aspects of the disclosure.
FIG. 4 is an example table according to aspects of the disclosure.
FIG. 5 is an example flow diagram illustrating a method according to aspects of the disclosure.
FIG. 6 is another example flow diagram illustrating a method according to aspects of the disclosure.
DETAILED DESCRIPTION
Large proposals may be split into fragments, and each fragment may be sent separately from a sender device to a receiver device. For example, when a Paxos replica needs to make a remote procedure call (RPC) with a payload that contains a large Paxos proposal, the replica instead splits the proposal into fragments, and sends each fragment as a separate RPC. The receiver device stores these fragments, for example, in a map in internal or external memory. The sender device also sends a message including pointers to each of the transmitted fragments. For example, when an original RPC is issued, the sender revises the RPC by replacing the payload with a reference to the previously sent fragments. The receiver will then recompose the original payload and execute the original procedure call.
A sender device may check a size of a payload prior to sending to determine whether or not to split the proposal. For example, if the payload is larger than 10 MB, the sender will launch a sequence of smaller RPCs, each transferring a fragment of the payload. The sender will then issue another RPC, with a reference to this sequence of fragments. The receiver stores the sequence of fragments in a map in memory, indexed by one or more identifiers. Upon receiving the original RPC (which now references the sequence of fragments instead of encapsulating the payload), the receiver will recompose the original payload and execute the intended procedure.
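By way of illustration only, the size check and splitting described above might be sketched in Python as follows. The names Fragment, RevisedRpc, and prepare_rpcs, the fixed 1 MB fragment size, and the use of a random UUID as the sequence identifier are assumptions made for this sketch and are not taken from the disclosure.

```python
import uuid
from dataclasses import dataclass
from typing import List, Optional, Tuple

SPLIT_THRESHOLD = 10 * 1024 * 1024  # example threshold of 10 MB from the text
FRAGMENT_SIZE = 1024 * 1024         # hypothetical fragment size (assumption)


@dataclass(frozen=True)
class Fragment:
    proposal_id: str  # identifies the whole sequence of fragments
    seq: int          # position of this fragment in the sequence
    data: bytes


@dataclass(frozen=True)
class RevisedRpc:
    proposal_id: str    # reference to the previously sent fragments
    num_fragments: int  # lets the receiver know how many to expect


def prepare_rpcs(payload: bytes,
                 threshold: int = SPLIT_THRESHOLD,
                 fragment_size: int = FRAGMENT_SIZE
                 ) -> Tuple[List[Fragment], Optional[RevisedRpc]]:
    """Return (fragments, revised_rpc); ([], None) when no split is needed."""
    if len(payload) <= threshold:
        return [], None  # small enough to send as an ordinary RPC
    proposal_id = uuid.uuid4().hex
    fragments = [
        Fragment(proposal_id, seq, payload[start:start + fragment_size])
        for seq, start in enumerate(range(0, len(payload), fragment_size))
    ]
    return fragments, RevisedRpc(proposal_id, len(fragments))
```

In this sketch the revised RPC carries a single reference (the sequence identifier plus a count), which is one of several reference layouts the description permits.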
The receiver device sends an acknowledgement to the sender device for each fragment received. The acknowledgement may include, for example, a fragment identifier and a sequence number. The sender device may keep track of these acknowledgements. For example, the sender device may periodically check if an acknowledgement for any fragment has been received. If no acknowledgements have been received within a predetermined time period, such as 10 seconds, the sender device may take an action such as canceling the sequence of fragments or relinquishing leadership.
When a message handler of the receiver first receives an RPC that is part of a sequence, such as a fragment or the revised RPC with references to fragments instead of the payload, it creates an object to keep track of all the fragments it is about to receive. It places this object into a map indexed by an identifier of the sequence. If the message handler receives all the fragments before the revised RPC, it will reconstruct the payload from the fragments and process the revised RPC as it would usually process the original RPC. If the message handler receives the revised RPC before receiving all the fragments, it may reply with a request that the sender retry later, when perhaps all fragments have arrived.
The message handler may also periodically check whether more than a predetermined amount of time has passed since the last time a message was received for any active sequence. If so, it may discard the object created for the sequence. If fragments are redelivered, duplicates may be identified, for example, by fragment identifier or sequence identifier.
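As one possible sketch of the receiver-side message handler described above, the following Python code keeps a per-sequence tracking object in a map, answers a premature revised RPC with a "retry later" signal (represented here by returning None), and discards sequences that have been idle too long. The class and method names, the zero-based sequence numbers, and the explicit `now` arguments are assumptions chosen for this example.

```python
from typing import Dict, Optional


class SequenceState:
    """Per-sequence bookkeeping object created on first contact."""

    def __init__(self, now: float):
        self.fragments: Dict[int, bytes] = {}
        self.last_seen = now


class MessageHandler:
    def __init__(self, idle_limit: float):
        self.idle_limit = idle_limit
        self.sequences: Dict[str, SequenceState] = {}  # indexed by sequence ID

    def _touch(self, seq_id: str, now: float) -> SequenceState:
        # Create the tracking object on the first RPC of a sequence.
        state = self.sequences.setdefault(seq_id, SequenceState(now))
        state.last_seen = now
        return state

    def on_fragment(self, seq_id: str, seq: int, data: bytes, now: float) -> None:
        state = self._touch(seq_id, now)
        state.fragments.setdefault(seq, data)  # a redelivered duplicate is ignored

    def on_revised_rpc(self, seq_id: str, total: int, now: float) -> Optional[bytes]:
        """Return the reconstructed payload, or None to signal 'retry later'."""
        state = self._touch(seq_id, now)
        if not all(i in state.fragments for i in range(total)):
            return None
        del self.sequences[seq_id]
        return b"".join(state.fragments[i] for i in range(total))

    def expire_idle(self, now: float) -> None:
        """Discard sequences with no message for longer than idle_limit."""
        stale = [k for k, s in self.sequences.items()
                 if now - s.last_seen > self.idle_limit]
        for k in stale:
            del self.sequences[k]
```

Passing the current time explicitly, rather than reading a clock inside the handler, is a design choice made here only to keep the sketch deterministic.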
FIG. 1 illustrates an example system 100, in which a plurality of computing devices are communicatively coupled through network 150. The plurality of network devices may include servers or other computing devices, which may be capable of operating in one or more states. In this example, server 110 is elected to operate in a leader state, while servers 170-190 operate in a slave state. In this regard, sender server 110 provides updates and other information to receiver devices 170-190, for example, by sending proposals.
The network 150 may be a datacenter, a load-balanced server farm, or any other type of computing environment, including a backplane of interconnected peripherals or a system of components on a motherboard. The network 150, and intervening nodes, may comprise various configurations and protocols including the Internet, World Wide Web, intranets, virtual private networks, wide area networks, local networks, private networks using communication protocols proprietary to one or more companies, Ethernet, WiFi (such as 802.11, 802.11b, g, n, or other such standards), and HTTP, and various combinations of the foregoing.
The sender server 110 may be any type of virtualized or non-virtualized computing device or system of computing devices capable of communicating over a network. Server 110 can contain one or more processors 140, memory 130 and other components typically present in general purpose computing devices. The memory 130 can store information accessible by the one or more processors 140, including instructions 138 that can be executed by the one or more processors 140.
Memory 130 can also include data 134 that can be retrieved, manipulated or stored by the processor 140. The memory can be of any non-transitory type capable of storing information accessible by the processor, such as a hard-drive, memory card, RAM, DVD, write-capable memory, etc.
The instructions 138 can be any set of instructions to be executed directly, such as machine code, or indirectly, such as scripts, by the one or more processors. In that regard, the terms “instructions,” “applications,” “steps” and “programs” can be used interchangeably herein. The instructions can be stored in object code format for direct processing by a processor, or in any other computing device language including scripts or collections of independent source code modules that are interpreted on demand or compiled in advance. Functions, methods and routines of the instructions are explained in more detail below.
Data 134 can be retrieved, stored or modified by the one or more processors 140 in accordance with the instructions 138. In one example, the data 134 may include one or more proposals to be provided to receiver devices 170-190. The proposals may have been received from another network device (not shown) and temporarily stored. Although the subject matter described herein is not limited by any particular data structure, the data can be stored in internal or external memory, computer registers, in a relational database as a table having many different fields and records, or XML documents. The data can also be formatted in any computing device-readable format such as, but not limited to, binary values, ASCII or Unicode. Moreover, the data can comprise any information sufficient to identify the relevant information, such as numbers, descriptive text, proprietary codes, pointers, references to data stored in other memories such as at other network locations, or information that is used by a function to calculate the relevant data.
The one or more processors 140 can be any conventional processors, such as commercially available CPUs. Alternatively, the processors can be dedicated components such as an application specific integrated circuit (“ASIC”) or other hardware-based processor. Although not necessary, the server 110 may include specialized hardware components to perform specific computing processes.
Although FIG. 1 functionally illustrates the processor, memory, and other elements of computing device 110 as being within the same block, the processor, computer, computing device, or memory can actually comprise multiple processors, computers, computing devices, or memories that may or may not be stored within the same physical housing. For example, the memory can be a hard drive or other storage media located in housings different from that of the computing devices 110. Accordingly, references to a processor, computer, computing device, or memory will be understood to include references to a collection of processors, computers, computing devices, or memories that may or may not operate in parallel. For example, the computing devices 110 may include server computing devices operating as a load-balanced server farm, distributed system, etc. Yet further, although some functions described below are indicated as taking place on a single computing device having a single processor, various aspects of the subject matter described herein can be implemented by a plurality of computing devices, for example, communicating information over network 150.
Although only a few computing devices are depicted in FIG. 1 , it should be appreciated that a typical system can include a large number of connected computing devices, with each different computing device being at a different node of the network 150. For example, the sender server 110 may be connected to a plurality of client computing devices through the network 150 or through another network (not shown). In serving requests from the client devices, the server 110 may send requests, such as RPCs, to the network device 180. Although certain advantages are obtained when information is transmitted or received as noted above, other aspects of the subject matter described herein are not limited to any particular manner of transmission of information.
The network devices 170-190 may be configured similarly to the server computing device 110. As an example, the receiver device 180 is described in further detail in connection with FIG. 2 .
As mentioned above, the sender server 110 may send proposals to the network devices 170-190 through the network 150. The sender server 110 may determine, prior to sending a proposal, whether the proposal exceeds a predetermined size. For example, the sender server 110 may compare the proposal to a threshold, such as 10 MB. If the proposal exceeds the threshold, the sender server 110 may split the proposal into a number of fragments, and send the fragments individually to the network devices 170-190. For example, each fragment may be sent as a separate RPC. In some examples, each fragment RPC may include a proposal identifier (ID) and a sequence number. The sender server 110 may receive an acknowledgement from the receiver servers 170-190 for each fragment sent, and the sender may use the received acknowledgements to determine an action to take. For example, if the sender server does not receive an acknowledgement within a given time period, the sender server 110 may time out. The sender server 110 may also replace a payload of the original RPC with one or more pointers to each of the fragments, thereby creating a revised RPC, and send the revised RPC.
FIG. 2 provides further detail on a receiver side of the proposal. A receiver, such as receiver server 180, may be structured similarly to the sender server 110, with one or more processors 189 and memory 182, including data 184 and instructions 188 as described above. In some examples, the network device 180 may be a server or client computing device within a different computing infrastructure. For example, the network device 180 may be a network server connected to and serving one or more client computing devices, and the server 110 may be an application server serving a plurality of network servers. In some examples, the network device 180 may be a personal computing device having all of the components normally used in connection with a personal computing device, such as a central processing unit (CPU), memory (e.g., RAM and internal hard drives) storing data and instructions, a display (e.g., a monitor having a screen, a touch-screen, a projector, a television, or other device that is operable to display information), user input device (e.g., a mouse, keyboard, touch-screen or microphone), and all of the components used for connecting these elements to one another.
As shown in FIG. 2 , the receiver server 180 receives fragments 220 of a proposal and a message 225 including pointers to the fragments 220 in place of its payload. The receiver device 180 stores the received fragments 220, for example, in data 184. For each fragment received, the receiver device 180 sends an acknowledgement to the sender device 110. The acknowledgement may include identifiers specific to the proposal, the fragment, and/or the acknowledgement. The acknowledgment may also include timing information, such as a time the corresponding fragment was received and a time the acknowledgement was sent. The receiver device 180 may also receive the message 225, including the pointers, and may use the message 225 to reconstruct the proposal. For example, the receiver device 180 may determine, based on the message 225, whether all fragments were received. In some examples, the message 225 may also provide an indication of an order in which the fragments 220 should be concatenated to reconstruct the proposal.
FIG. 3 illustrates an example of the message 225. The message 225 may be, for example, an RPC. Accordingly, the message may include header information 310, such as source identifier, destination identifier, and priority. The message 225 may further include a payload 350. When the message 225 is originally received by the sender server 110 for sending to the receiver devices 170-190, the message may include the entire proposal to be sent in the payload 350. However, the proposal may be replaced, for example by the sender server 110, with one or more references to each of the fragments 220 into which the original proposal was divided. In some examples, the reference may include the proposal identifier, and thus one reference could identify every fragment. For example, if each fragment is uniquely identified by the proposal ID, which effectively identifies the entire sequence of fragments, and the sequence number that represents its position in the sequence, the receiver device 180 could identify the fragments using only the proposal ID and reconstruct the proposal. In other examples, the one or more references in the payload may also include sequence numbers of the fragments, a total number of fragments, or other information related to the fragments. Moreover, multiple references may be included in the payload, for example, such that each reference corresponds to a particular fragment.
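The two reference layouts described above (a single reference covering the whole sequence, or one reference per fragment) might be modeled as in the following Python sketch. All type and function names are hypothetical, and the one-based sequence numbering is an assumption made to match the "fragments 1-4" wording used with FIG. 4.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple, Union


@dataclass(frozen=True)
class Header:
    source: str
    destination: str
    priority: int


@dataclass(frozen=True)
class SequenceRef:
    """One reference covering every fragment of a proposal."""
    proposal_id: str
    total: int


@dataclass(frozen=True)
class FragmentRef:
    """A reference to a single fragment."""
    proposal_id: str
    seq: int


@dataclass(frozen=True)
class Message:
    header: Header
    # Either the full proposal, or references that replaced it.
    payload: Union[bytes, List[Union[SequenceRef, FragmentRef]]]


def referenced_fragments(msg: Message) -> Set[Tuple[str, int]]:
    """Expand the references into the (proposal ID, sequence number) pairs expected."""
    if isinstance(msg.payload, bytes):
        return set()  # original message: the payload is the proposal itself
    keys: Set[Tuple[str, int]] = set()
    for ref in msg.payload:
        if isinstance(ref, SequenceRef):
            keys.update((ref.proposal_id, s) for s in range(1, ref.total + 1))
        else:
            keys.add((ref.proposal_id, ref.seq))
    return keys
```

Under these assumptions both layouts expand to the same set of expected fragments, so a receiver could treat them uniformly.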
As mentioned above, the receiver device 180 may send an acknowledgement to the sender device for each fragment received. The sender may track the acknowledgments. FIG. 4 provides an example table which may be maintained by the sender device 110 tracking the acknowledgements. It should be understood that any other data structure, including various types of information, may be maintained by the sender device or other devices.
As seen in FIG. 4 , table 400 includes a number of columns 410-440, listing identification information (e.g., proposal ID and sequence number) for the fragments sent, time the fragments were sent, information identifying the acknowledgement received, and time the acknowledgement was received. The table also includes rows 450-480. While only a few rows are shown in this example, it should be understood that the number of rows may correspond, for example, to the number of fragments sent, and thus any number of rows may be included.
In the example of FIG. 4 , fragments 1-4 of proposal “xx” were sent by the sender device. While in this example the fragments were sent at various times within seconds of one another, the timing of sending the fragments may vary. For example, the fragments may be sent all at a same time, or the timing may be further spread out. Moreover, the timing between fragments may be consistent or may vary, for example, in relation to network capacity or size of the fragment.
Acknowledgements were only received for fragments 1, 3, and 4. If the acknowledgement for fragment 2 is not received within a predetermined time, the sender device may take an action, such as timing out, resending the fragment 2, or relinquishing its responsibilities to another device. In some examples, a leader lease for the sender device may be extended as long as an acknowledgment for any fragment is received within a given time period. For example, rather than comparing a current time to a time a particular fragment was sent, the sender device may periodically check if any acknowledgment was received. If a subsequent acknowledgement is not received within x seconds, for example, of a previous acknowledgement, then the sender may time out or take some other action.
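The lease-extension behavior described above, in which the leader lease is extended as long as some acknowledgement keeps arriving within a given period, could be sketched as follows. The LeaderLease class, its fields, and the numeric values used with it are illustrative assumptions; the disclosure does not prescribe a particular lease representation.

```python
class LeaderLease:
    """Lease that is renewed whenever an acknowledgement arrives in time."""

    def __init__(self, now: float, duration: float, ack_window: float):
        self.duration = duration      # how far each extension pushes the expiry
        self.ack_window = ack_window  # the "x seconds" allowed between acknowledgements
        self.expires_at = now + duration
        self.last_ack = now

    def on_ack(self, now: float) -> bool:
        """Extend the lease if this acknowledgement arrived in time.

        Returns False when the gap since the previous acknowledgement was too
        long or the lease already expired; the caller may then time out or
        take some other action.
        """
        in_time = now <= self.expires_at and now - self.last_ack <= self.ack_window
        if in_time:
            self.expires_at = now + self.duration
            self.last_ack = now
        return in_time

    def is_leader(self, now: float) -> bool:
        return now <= self.expires_at
```

Note that the check is made against the most recent acknowledgement for any fragment, not against the send time of a particular fragment.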
The sending device may continue sending fragments without waiting for an acknowledgement from a previous fragment. For example, after sending the fragment 1, the sender device sent the fragment 2 before the acknowledgment for fragment 1 was received. The acknowledgments may also be received in a different order than the fragments were sent. For example, although the fragment 3 was sent before the fragment 4, the acknowledgment for fragment 4 was received before the acknowledgement for fragment 3.
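A tracking structure along the lines of table 400 might be represented as in the Python sketch below. The Row fields mirror the columns described for FIG. 4, but the acknowledgement identifiers and all time values are invented for illustration and are not the values shown in the figure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Row:
    proposal_id: str
    seq: int
    sent_at: float
    ack_id: Optional[str] = None      # None until an acknowledgement arrives
    acked_at: Optional[float] = None


def overdue(rows: List[Row], now: float, limit: float) -> List[int]:
    """Sequence numbers still unacknowledged more than `limit` seconds after sending."""
    return [r.seq for r in rows if r.acked_at is None and now - r.sent_at > limit]


def ack_arrival_order(rows: List[Row]) -> List[int]:
    """Sequence numbers in the order their acknowledgements were received."""
    acked = [r for r in rows if r.acked_at is not None]
    return [r.seq for r in sorted(acked, key=lambda r: r.acked_at)]


# Hypothetical times: fragment 2 is sent before the acknowledgement for
# fragment 1 arrives, and the acknowledgement for 4 arrives before that for 3.
table = [
    Row("xx", 1, sent_at=0.0, ack_id="a1", acked_at=2.5),
    Row("xx", 2, sent_at=1.0),
    Row("xx", 3, sent_at=2.0, ack_id="a3", acked_at=6.0),
    Row("xx", 4, sent_at=3.0, ack_id="a4", acked_at=5.0),
]
```

With these made-up values, only fragment 2 would be reported as overdue once the chosen limit has elapsed, at which point the sender could resend it, time out, or relinquish its responsibilities.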
In addition to the operations described above and illustrated in the figures, various operations will now be described. It should be understood that the following operations do not have to be performed in the precise order described below. Rather, various steps can be handled in a different order or simultaneously, and steps may also be added or omitted.
FIG. 5 illustrates an example method 500 of sending large proposals. The method 500 may be performed, for example, by a sender device, such as a leader server in a network. Upon receipt of a proposal for sending, the sender device may determine whether the proposal meets or exceeds a predetermined size.
If the proposal meets or exceeds the predetermined size, in block 510 the proposal is split into a number of fragments. For example, the sender may invoke an RPC for the original proposal and also invoke a number of fragment RPCs. According to some examples, the proposal may be split into a number of fragments of approximately equal size, wherein the number is based on the overall size of the proposal. In other examples, the splitting may be a logical division based on a content of the proposal. Further, it is possible that the sender device could dynamically adapt the fragment size based on how quickly fragments are received and acknowledged.
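For illustration, splitting into approximately equal fragments and adapting the fragment size to acknowledgement timing might look like the following Python sketch. The halve-or-double policy, the floor and ceiling values, and both function names are assumptions; the description says only that the size could be adapted based on how quickly fragments are received and acknowledged.

```python
import math
from typing import List


def split_evenly(payload: bytes, max_fragment: int) -> List[bytes]:
    """Split into the fewest fragments of at most max_fragment bytes,
    with sizes differing by at most one byte."""
    if not payload:
        return []
    count = math.ceil(len(payload) / max_fragment)
    base, extra = divmod(len(payload), count)
    out, start = [], 0
    for i in range(count):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first fragments
        out.append(payload[start:start + size])
        start += size
    return out


def next_fragment_size(current: int, ack_delay: float, target_delay: float,
                       floor: int = 64 * 1024,
                       ceiling: int = 8 * 1024 * 1024) -> int:
    """Halve the size when acknowledgements are slow, double it when they are fast."""
    proposed = current // 2 if ack_delay > target_delay else current * 2
    return max(floor, min(ceiling, proposed))
```

A logical division based on the content of the proposal, also mentioned above, would require knowledge of the proposal format and is not shown.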
In block 520, the sender device sends each of the fragments to a receiver device, such as a slave server. Each fragment may be sent with one or more identifiers, such as a unique identifier of the fragment sequence and a sequence number. In some examples, each fragment is sent as a separate RPC. However, other message formats are also possible.
In block 530, acknowledgements are received for the fragments. The acknowledgments may also include identifiers and other information, for example, identifying the fragment to which they correspond. In this regard, the sender device may track the fragments sent and corresponding acknowledgements received. If the sender device determines that an acknowledgement was not received within a given time period (block 540), the sender may time out (block 545) or take some other action. Otherwise, the sender may continue sending fragments until it determines (block 550) that all fragments have been sent.
In block 560, the sender replaces a payload of the original proposal with references to each of the fragments, and sends the original proposal. While in this example the sender only sends the original proposal with references once all fragments are sent, the order of sending fragments and the original proposal may be modified.
FIG. 6 illustrates a method 600 of receiving fragments and reconstructing a proposal therefrom. In block 610, a receiver device receives individual fragments, for example, as individual RPCs. The receiver device stores the received fragments (block 620), for example in internal or external memory. For each received fragment, the receiver device sends an acknowledgment (block 630). The acknowledgement may include information identifying the particular fragment received, a time of receipt, a time the acknowledgement was sent, etc.
In block 640, the receiver device receives the original proposal having its payload replaced with references to the fragments. The receiver device may compare the references to the fragments it received and stored (block 650). If it determines (block 660) that one or more fragments referenced in the message were not received, the receiver may take an action, such as requesting the missing fragments (block 665) or requesting to resend the entire proposal. However, if all fragments were received, the receiver device may reconstruct the original payload in block 670. For example, the receiver device may concatenate the fragments based on an ordering indicated in each fragment, in the reference message, or elsewhere.
In some examples, fragments may be redelivered, thus causing duplicate fragments to be received and stored by the receiver device. Accordingly, the receiver device may identify and discount duplicates based on, for example, the unique identifier and sequence numbers included with each fragment.
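The comparison of blocks 650-670, together with the duplicate handling just described, might be sketched in Python as follows. The function names, the (sequence identifier, sequence number) key, and the choice to return the missing keys alongside the payload are assumptions made for this example, which also assumes all references belong to a single sequence.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Key = Tuple[str, int]  # (sequence identifier, sequence number)


def store_fragment(store: Dict[Key, bytes], key: Key, data: bytes) -> bool:
    """Store a fragment once; return False when it is a redelivered duplicate."""
    if key in store:
        return False
    store[key] = data
    return True


def reconstruct(store: Dict[Key, bytes], references: Iterable[Key]
                ) -> Tuple[Optional[bytes], List[Key]]:
    """Return (payload, []) when complete, else (None, missing keys)."""
    refs = sorted(set(references), key=lambda k: k[1])  # order by sequence number
    missing = [k for k in refs if k not in store]
    if missing:
        # The caller may request these fragments or a resend of the whole proposal.
        return None, missing
    return b"".join(store[k] for k in refs), []
```

Here the ordering is taken from the sequence number carried with each fragment; an ordering supplied in the reference message would work equally well.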
While some of the examples above are described with respect to RPCs, it should be understood that other inter-process communication techniques may also be used. For example, user datagram protocol, transmission control protocol, or other types of messages may be used. Further, the examples above are not limited to Paxos, but rather may be implemented in other state machine replication protocols in any of a variety of distributed networking environments.
The above described systems and methods may be advantageous in that they enable large proposals to be sent with increased efficiency. Also, liveness is confirmed by receiving periodic acknowledgements for fragments of proposals. Moreover, using RPCs may be advantageous for a number of reasons. For example, using RPCs allows for attachment of specific information, such as information necessary to maintain leader time leases, to individual fragments. As a further example, by sending each fragment as a separate RPC, it is easier to integrate support for large proposal fragments into code for existing replication protocols such as Paxos, because the RPCs can be treated as any other message. Even further, the systems and methods described above are agnostic to underlying communication primitives layered underneath the RPC abstraction.
As these and other variations and combinations of the features discussed above can be utilized without departing from the subject matter defined by the claims, the foregoing description of the embodiments should be taken by way of illustration rather than by way of limitation of the subject matter defined by the claims. As an example, the preceding operations do not have to be performed in the precise order described above. Rather, various steps can be handled in a different order or simultaneously. Steps can also be omitted unless otherwise stated. In addition, the provision of the examples described herein, as well as clauses phrased as “such as,” “including” and the like, should not be interpreted as limiting the subject matter of the claims to the specific examples; rather, the examples are intended to illustrate only one of many possible embodiments. Further, the same reference numbers in different drawings can identify the same or similar elements.

Claims (20)

The invention claimed is:
1. A method, comprising:
receiving, by one or more computing devices elected to operate as a lease leader, a first Remote Procedure Call (RPC), the first RPC comprising a message and a leader lease duration;
creating, by the one or more computing devices, a second set of RPCs each including a respective portion of the message in the first RPC;
revising, by the one or more computing devices, the message in the first RPC to replace the message with a reference to each RPC of the second set of RPCs;
sending, by the one or more computing devices, the first RPC with the references;
sending, by the one or more computing devices, the second set of RPCs;
receiving, by the one or more computing devices, one or more acknowledgements indicating that one or more RPCs of the second set of RPCs were received; and
extending, by the one or more computing devices in response to receiving one or more acknowledgements within a predetermined period of time, the duration of the leader lease for the one or more computing devices.
2. The method of claim 1, further comprising:
prior to sending the entire second set of RPCs, receiving, by the one or more computing devices, timing information indicating when one or more RPCs of the second set of RPCs were received; and
changing, by the one or more computing devices, a size of one or more remaining RPCs in the second set of RPCs based on the timing information.
3. The method of claim 1, further comprising:
determining, by the one or more computing devices, a size of the message in the first RPC, wherein creating the second set of RPCs is based on the size of the message in the first RPC.
4. The method of claim 1, wherein each second RPC of the second set of RPCs includes a respective identifier referencing the first RPC.
5. The method of claim 1, wherein the second set of RPCs is created as an ordered sequence, and each of the second set of RPCs includes a number indicating a position in the ordered sequence.
6. The method of claim 1, further comprising:
determining, by the one or more computing devices, whether any of the one or more acknowledgements is not received within a predetermined time period;
performing, by the one or more computing devices, an action based on determining that any of the one or more acknowledgements is not received within the predetermined time period.
7. The method of claim 6, wherein the action comprises canceling the sending of the second set of RPCs.
8. The method of claim 6, wherein the action comprises relinquishing responsibility for sending the second set of RPCs to one or more other computing devices.
9. The method of claim 6, wherein the action comprises extending the duration of the leader lease.
10. The method of claim 1, further comprising:
receiving one or more acknowledgements indicating that one or more RPCs of the second set of RPCs was received;
determining whether any of the one or more acknowledgements is not received within a predetermined time period; and
performing an action based on determining that any of the one or more acknowledgements is not received within the predetermined time period.
11. A system, comprising:
two or more computing devices, including a first computing device elected as a lease leader and one or more additional computing devices;
wherein the first computing device is configured to:
receive a first Remote Procedure Call (RPC), the first RPC comprising a message and a leader lease duration;
create a second set of RPCs each including a respective portion of the message in the first RPC;
revise the message in the first RPC to replace the message with a reference to each RPC of the second set of RPCs;
send the first RPC with the references;
send the second set of RPCs;
receive, from the one or more additional computing devices, one or more acknowledgments indicating that one or more RPCs of the second set of RPCs were received; and
extend, in response to receiving one or more acknowledgements within a predetermined period of time, the duration of the leader lease for the first computing device.
12. The system of claim 11, wherein the first computing device is further configured to:
prior to sending the entire second set of RPCs, receive timing information indicating when one or more RPCs of the second set of RPCs were received; and
change a size of one or more remaining RPCs in the second set of RPCs based on the timing information.
13. The system of claim 11, further comprising:
determining, by the one or more computing devices, a size of the message in the first RPC, wherein creating the second set of RPCs is based on the size of the message in the first RPC.
14. The system of claim 11, wherein each second RPC of the second set of RPCs includes a respective identifier referencing the first RPC.
15. The system of claim 11, wherein the second set of RPCs is created as an ordered sequence, and each of the second set of RPCs includes a number indicating a position in the ordered sequence.
16. The system of claim 11, wherein the first computing device is further configured to:
determine whether any of the one or more acknowledgements is not received within a predetermined time period; and
perform an action based on determining that any of the one or more acknowledgements is not received within the predetermined time period.
17. The system of claim 16, wherein the action comprises canceling the sending of the second set of RPCs.
18. The system of claim 16, wherein the action comprises relinquishing responsibility for sending the second set of RPCs to one or more other computing devices.
19. The system of claim 16, wherein the action comprises extending the duration of the leader lease.
20. The system of claim 11, wherein the first computing device is further configured to:
receive one or more acknowledgements indicating that one or more RPCs of the second set of RPCs were received;
determine whether any of the one or more acknowledgements is not received within a predetermined time period; and
perform an action based on determining that any of the one or more acknowledgements is not received within the predetermined time period.
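
The steps recited in the claims above (splitting the message of a first RPC into an ordered second set of RPCs, replacing the message with references, resizing remaining RPCs from timing information, and extending the leader lease on timely acknowledgements) can be illustrated with a minimal Python sketch. This is not the patented implementation. All names (`FirstRpc`, `ChunkRpc`, `split_rpc`, `next_chunk_size`, `lease_expiry_after_ack`), the reference format `"<rpc id>/<sequence number>"`, the halve-or-double resizing policy, and the choice to measure the extended lease from the send time are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ChunkRpc:
    """One RPC of the "second set": carries a portion of the original message."""
    parent_id: str  # identifier referencing the first RPC (claims 4 and 14)
    seq: int        # position in the ordered sequence (claims 5 and 15)
    payload: bytes


@dataclass
class FirstRpc:
    """The first RPC: a message plus a leader lease duration."""
    rpc_id: str
    lease_duration_s: float
    message: Optional[bytes] = None
    chunk_refs: List[str] = field(default_factory=list)


def split_rpc(first: FirstRpc, chunk_size: int) -> List[ChunkRpc]:
    """Create the chunk RPCs and replace the message with references to them."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    data = first.message or b""
    chunks = [
        ChunkRpc(first.rpc_id, seq, data[off:off + chunk_size])
        for seq, off in enumerate(range(0, len(data), chunk_size))
    ]
    # Hypothetical reference format: "<first RPC id>/<sequence number>".
    first.chunk_refs = [f"{c.parent_id}/{c.seq}" for c in chunks]
    first.message = None  # the first RPC now carries only the references
    return chunks


def next_chunk_size(current: int, transit_s: float, target_s: float,
                    lo: int = 1024, hi: int = 1 << 20) -> int:
    """Resize remaining chunks from timing feedback (assumed policy):
    halve after a slow delivery, double otherwise, clamped to [lo, hi]."""
    proposed = current // 2 if transit_s > target_s else current * 2
    return max(lo, min(hi, proposed))


def lease_expiry_after_ack(sent_at_s: float, ack_at_s: float,
                           ack_timeout_s: float, lease_expiry_s: float,
                           lease_duration_s: float) -> float:
    """Extend the lease only when the acknowledgement arrived in time.
    Measuring the new expiry from the send time is an assumption."""
    if ack_at_s - sent_at_s <= ack_timeout_s:
        return max(lease_expiry_s, sent_at_s + lease_duration_s)
    return lease_expiry_s  # late acknowledgement: lease is left unchanged
```

A receiver in this sketch would collect the chunk RPCs named in `chunk_refs`, sort them by `seq`, and concatenate their payloads to recover the original message.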
Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/980,268 US11831711B2 (en) 2015-07-15 2022-11-03 System and method for sending and receiving remote procedure calls

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US14/799,891 US10462218B1 (en) 2015-07-15 2015-07-15 System and method for sending proposals within a distributed state machine replication system
US16/538,178 US11496572B1 (en) 2015-07-15 2019-08-12 System and method for sending and receiving remote procedure calls
US17/980,268 US11831711B2 (en) 2015-07-15 2022-11-03 System and method for sending and receiving remote procedure calls

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US16/538,178 Continuation US11496572B1 (en) 2015-07-15 2019-08-12 System and method for sending and receiving remote procedure calls

Publications (2)

Publication Number Publication Date
US20230052324A1 US20230052324A1 (en) 2023-02-16
US11831711B2 true US11831711B2 (en) 2023-11-28

Family

ID=68315063

Family Applications (3)

Application Number Title Priority Date Filing Date
US14/799,891 Active 2035-09-01 US10462218B1 (en) 2015-07-15 2015-07-15 System and method for sending proposals within a distributed state machine replication system
US16/538,178 Active 2035-08-14 US11496572B1 (en) 2015-07-15 2019-08-12 System and method for sending and receiving remote procedure calls
US17/980,268 Active US11831711B2 (en) 2015-07-15 2022-11-03 System and method for sending and receiving remote procedure calls

Country Status (1)

Country Link
US (3) US10462218B1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3664398A1 (en) * 2018-12-06 2020-06-10 InterDigital CE Patent Holdings Network equipment and method for delivering data packets

Citations (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5790676A (en) 1994-11-23 1998-08-04 Hughes Electronics Corporation Radio port controller in a wireless personal communications system
US7023848B2 (en) 2000-05-16 2006-04-04 Eci Telecom Ltd. Rearrangement of data streams
US7298746B1 (en) 2002-02-11 2007-11-20 Extreme Networks Method and system for reassembling and parsing packets in a network environment
US7447234B2 (en) 2002-05-28 2008-11-04 Thales Method for the reconstitution of messages conveyed by one or more packet-switched transmission networks
US7620680B1 (en) 2002-08-15 2009-11-17 Microsoft Corporation Fast byzantine paxos
US20100180116A1 (en) 2008-11-03 2010-07-15 Telcordia Technologies, Inc. Intrusion-tolerant group management for mobile ad-hoc networks
US20120042196A1 (en) 2010-08-14 2012-02-16 Mohit Aron Management of a distributed computing system through replication of write ahead logs
US20120130950A1 (en) 2010-11-23 2012-05-24 Canon Kabushiki Kaisha Data replication to multiple data nodes
US20120233228A1 (en) 2011-03-08 2012-09-13 Rackspace Us, Inc. Appending to files via server-side chunking and manifest manipulation
US8276035B1 (en) 2008-07-28 2012-09-25 Netmotion Wireless, Inc. High performance digital communications resiliency in a roamable virtual private network
US20120254319A1 (en) 2011-04-02 2012-10-04 Recursion Software, Inc. System and method for communication between networked applications
US20120254412A1 (en) 2011-03-28 2012-10-04 Siemens Corporation Replicated state machine utilizing view change protocol resilient to performance attacks
US20130039166A1 (en) 2011-08-12 2013-02-14 International Business Machines Corporation Hierarchical network failure handling in a clustered node environment
US8412856B2 (en) 2009-10-26 2013-04-02 Sony Computer Entertainment America Llc. File input/output scheduler using immediate data chunking
US8751598B1 (en) 2010-11-03 2014-06-10 Netapp, Inc. Method and system for implementing an unordered delivery of data between nodes in a clustered storage system
US20140304357A1 (en) 2013-01-23 2014-10-09 Nexenta Systems, Inc. Scalable object storage using multicast transport
US20150242481A1 (en) 2013-04-16 2015-08-27 Hitachi, Ltd. Computer system, computer system management method, and program
US20150278244A1 (en) 2005-01-12 2015-10-01 Wandisco, Inc. Geographically-distributed file system using coordinated namespace replication over a wide area network
US20150341151A1 (en) 2012-04-04 2015-11-26 Qualcomm Incorporated Method and apparatus for selective acknowledgement of packets from an access probe
US9230000B1 (en) 2012-06-04 2016-01-05 Google Inc. Pipelining Paxos state machines
US20170085648A1 (en) 2015-09-18 2017-03-23 Avigilon Corporation Physical security system having multiple server nodes configured to implement a conditionally triggered rule
US10565227B1 (en) 2016-08-31 2020-02-18 Amazon Technologies, Inc. Leadership lease protocol for data replication groups

Non-Patent Citations (14)

* Cited by examiner, † Cited by third party
Title
Borman et al., "IPv6 Jumbograms", Standards Track, Network Working Group, Aug. 1999.
Corbett, et al., "Spanner: Google's Globally-Distributed Database", ACM Transactions on Computer Systems (TOCS), vol. 31, No. 3, Aug. 2013, 14 pages.
Druschel et al., "Fbufs: A High-Bandwidth Cross-Domain Transfer Facility", downloaded Jun. 17, 2015.
Konczak et al. JPaxos: State machine replication based on the Paxos Protocol. EPFL-Report-167765. Jul. 31, 2011.
Kozierok, Charles TCP/IP Guide Oct. 4, 2005 Publisher: No Starch Press Chapter 49 TCP Reliability and Flow Control Features.
Lamport, Leslie, "Paxos Made Simple", Nov. 1, 2001, 14 pages.
Liebig et al., "A publish/subscribe CORBA Persistent State Service Prototype", downloaded Jun. 17, 2015.
Low et al., "Optimization Flow Control—I: Basic Algorithm and Convergence", IEEE/ACM Transactions on Networking, vol. 7, No. 6, Dec. 1999.
Ousterhout and Ongaro. Implementing Replicated Logs with Paxos. Mar. 1, 2013. From http://ramcloud.stanford.edu/-ongaro/userstudy.
Rao et al. Using Paxos to Build a Scalable, Consistent and Highly Available Datastore. Proceedings of the VLDB Aug. 29-Sep. 3, 2011.
Rhee et al., "TEAR: TCP emulation at receivers—flow control for multimedia streaming*", Apr. 28, 2000.
Stevens, Richard "TCP/IP Illustrated vol. 1 The Protocols" 1994 Figure 20.1 Figure 21.2 (Year: 1994).
Stevens, Richard et al TCP/IP Illustrated, vol. 1 Addison Wesley 1994. Appropriate sections uploaded.
Wechta et al., "The interaction of the TCP flow control procedure in end nodes on the proposed flow control mechanism for use in IEEE 802.3 switches", ResearchGate, Jan. 1998.

Also Published As

Publication number Publication date
US11496572B1 (en) 2022-11-08
US10462218B1 (en) 2019-10-29
US20230052324A1 (en) 2023-02-16

Legal Events

Date Code Title Description
FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

AS Assignment

Owner name: GOOGLE INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:MORARU, IULIAN;COOPER, BRIAN FRANK;KANTHAK, SEBASTIAN;AND OTHERS;SIGNING DATES FROM 20150729 TO 20150805;REEL/FRAME:061654/0258

AS Assignment

Owner name: GOOGLE LLC, CALIFORNIA

Free format text: CHANGE OF NAME;ASSIGNOR:GOOGLE INC.;REEL/FRAME:061885/0530

Effective date: 20170929

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED

STCF Information on status: patent grant

Free format text: PATENTED CASE