CN116489157B - Management of distributed endpoints - Google Patents


Publication number
CN116489157B
Authority
CN
China
Prior art keywords
network
access point
endpoint
data center
packet
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310454209.4A
Other languages
Chinese (zh)
Other versions
CN116489157A (en)
Inventor
娜琳·戈埃尔
哈尔沃·雷伊泽尔·琼斯
Current Assignee
Amazon Technologies Inc
Original Assignee
Amazon Technologies Inc
Priority date
Filing date
Publication date
Priority claimed from US16/586,510 (US11552898B2)
Priority claimed from US16/586,641 (US10972554B1)
Priority claimed from US16/586,446 (US11451477B2)
Priority claimed from US16/586,485 (US11425042B2)
Application filed by Amazon Technologies Inc
Priority to CN202310454209.4A
Publication of CN116489157A
Application granted
Publication of CN116489157B
Legal status: Active

Classifications

    • H04L67/141 Setup of application sessions
    • H04L67/1008 Server selection for load balancing based on parameters of servers, e.g. available memory or workload
    • H04L12/4633 Interconnection of networks using encapsulation techniques, e.g. tunneling
    • H04L45/22 Alternate routing
    • H04L45/72 Routing based on the source address
    • H04L45/7453 Address table lookup; address filtering using hashing
    • H04L47/125 Avoiding congestion; recovering from congestion by balancing the load, e.g. traffic engineering
    • H04L61/4511 Network directories; name-to-address mapping using the domain name system [DNS]
    • H04L61/5069 Address allocation for group communication, multicast communication or broadcast communication
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/101 Server selection for load balancing based on network conditions
    • H04L67/1021 Server selection for load balancing based on client or server locations
    • H04L67/1029 Server selection using data related to the state of servers by a load balancer
    • H04L67/1038 Load balancing arrangements to avoid a single path through a load balancer

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

A system and method for managing communications of components in a global accelerator system are provided. The global accelerator system includes client devices that communicate with a global access point via a public network to access various endpoints capable of hosting services. In turn, the global access point communicates via a private network with the various endpoints, which are organized into different data centers. To facilitate exchanges on behalf of different client devices, the global access point may characterize groupings of endpoints as subgroups, or fabric layers. Individual fabric-layer communications may be encapsulated and routed to the data center by networking equipment using the 5-tuple information in the packet. The components within individual fabric layers may broadcast or multicast status information via management messages to support failover or reduce duplicate processing.

Description

Management of distributed endpoints
The present application is a divisional application of the patent application with application number 2020800662886, filed September 25, 2020, and entitled "Management of distributed endpoints".
Background
In general, computing devices utilize a communication network or a series of communication networks to exchange data. Companies and organizations operate computer networks that interconnect multiple computing devices to support operations or provide services to third parties. The computing systems may be located in a single geographic location or in multiple different geographic locations (e.g., interconnected via a private or public communication network). In particular, a data center or data processing center (referred to herein generally as a "data center") may include a plurality of interconnected computing systems to provide computing resources to users of the data center. The data center may be a dedicated data center operating on behalf of an organization or a public data center operating on behalf of the public or for the benefit of the public.
Since the resources of any individual computing device are limited, a variety of techniques are known to attempt to balance resource utilization between devices. For example, a "load balancer" device may be placed between client devices requesting use of computing resources and the servers providing such resources. The load balancer may attempt to distribute requests among the servers, thus allowing the servers to work in concert to provide such resources. One significant disadvantage of such a load balancer is that it may create a single point of failure: if the load balancer fails, requests to the servers fail and the resources become unavailable.
Another type of load balancing known in the art is Domain Name System (DNS) load balancing. DNS generally refers to a network of devices that operate to translate human-readable domain names into network addresses on a network, such as the internet. To load balance using DNS, a DNS server is populated with the network addresses of multiple computing devices that provide a given resource. When responding to requests to resolve a domain name associated with the resource, the DNS server alternates the addresses provided, thus dividing requests for the resource among those addresses. DNS load balancing typically avoids a single point of failure, because DNS servers do not act as a conduit for resource traffic and because DNS typically provides highly redundant operation. However, one significant drawback of DNS load balancing is the delay required to alter the load balancing scheme. DNS requests typically flow through a series of DNS resolvers, each of which may cache previous DNS results for a period of time. Thus, changes made at a DNS server in an attempt to change how load is balanced among servers may require a significant amount of time to propagate. These delays may lead to significant errors in network communications, particularly where a server has failed. Furthermore, maintaining up-to-date DNS records can be difficult, because they typically must be modified as devices are added to or removed from the load balancing scheme.
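The DNS rotation described above can be sketched in a few lines. This is a toy model only, assuming a resolver that simply cycles through the registered addresses on each query; the domain and addresses below are invented for illustration.

```python
from itertools import cycle

class RoundRobinDns:
    """Toy DNS-style load balancer: rotates through the addresses
    registered for a domain on each resolution."""

    def __init__(self, records):
        # records: domain -> list of server addresses
        self._cycles = {domain: cycle(addrs) for domain, addrs in records.items()}

    def resolve(self, domain):
        # Return the next address in rotation for this domain.
        return next(self._cycles[domain])

dns = RoundRobinDns({"service.example": ["10.0.0.1", "10.0.0.2", "10.0.0.3"]})
answers = [dns.resolve("service.example") for _ in range(4)]
print(answers)  # the fourth answer wraps back to the first address
```

Note that this sketch omits exactly the drawback the text identifies: once an answer is cached by an intermediate resolver, the rotation is invisible to that resolver's clients until the cache expires.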
Yet another type of load balancing known in the art is the use of "anycast" network addresses. In a network such as the internet, different autonomous systems ("ASes") provide different network addresses to devices. Each AS informs its neighboring ASes of the addresses available within its network by "advertising" those addresses. Most typically, each address is associated with a single location (e.g., a single device or group of devices). In an anycast configuration, multiple devices in multiple ASes advertise the same network address. Depending on the configuration of the neighboring ASes, client requests to access the address may then be routed to any of those devices, thereby distributing the load among them. One significant disadvantage of using anycast for load balancing is that routing to the anycast address is typically controlled by the neighboring networks, which are usually under the control of other entities. It is therefore difficult or impossible to fully control how requests are distributed among devices sharing an anycast network address. Furthermore, when the configuration of a neighboring network changes, the distribution of requests may also change, resulting in volatility of the load balancing scheme.
Drawings
FIG. 1 is a block diagram depicting an illustrative logical network including a plurality of client devices and data centers, as well as a set of global access points providing load-balanced access to the data centers from a set of global network addresses.
Fig. 2 is a block diagram depicting an illustrative configuration of the data center of fig. 1.
Fig. 3 is a block diagram depicting an illustrative configuration of the global access point of fig. 1.
Fig. 4 is a block diagram depicting an illustrative configuration of a flow manager server implementing one or more flow managers within the global access point of fig. 1.
Fig. 5 depicts illustrative interactions for routing a request from a client device addressed to a global network address to the global access point of fig. 1.
Fig. 6 depicts an illustrative interaction for routing requests from a global access point to the data center of fig. 1 based at least in part on load balancing the requests between the data centers.
Fig. 7 depicts illustrative interactions for propagating endpoint information from a data center to the global access point of fig. 1 so that the access point can properly route traffic from client devices addressed to global network addresses.
Fig. 8 depicts an illustrative routine for increasing the resilience of global network addresses by selecting different neighboring devices to which the access points of fig. 1 advertise.
Fig. 9 depicts an illustrative routine for routing traffic addressed to a global network address associated with a service provided by an endpoint within a data center using the access point of fig. 1.
Fig. 10 depicts an illustrative routine for updating information about endpoints of a data center providing network-accessible services at the global access point of fig. 1.
Fig. 11 depicts illustrative interactions for establishing a Transmission Control Protocol (TCP) session at the global access point of fig. 1 and switching the TCP session to an endpoint to enable a client device and endpoint to communicate via the TCP session.
Fig. 12 depicts an illustrative routine for establishing a Transmission Control Protocol (TCP) session at the global access point of fig. 1 and switching the TCP session to an endpoint to enable a client device and endpoint to communicate via the TCP session.
Detailed Description
Generally described, aspects of the present disclosure relate to providing load-balanced access to a pool of computing devices dispersed across multiple geographic locations using one or more global network addresses. More particularly, aspects of the present disclosure relate to providing a set of distributed access points reachable via global network addresses, the access points selecting and routing requests to endpoint devices within the pool based at least in part on load balancing the requests. In one embodiment, the access points utilize anycast routing techniques to advertise the availability of global network addresses associated with the pool of computing devices, thereby attracting traffic addressed to those addresses. Upon receiving a request to access the pool, an access point may select an appropriate endpoint within the pool based on a distribution algorithm that facilitates distribution of packets to different groups of endpoints (referred to herein as endpoint groups). The access point may then act as a proxy, routing the request to the endpoint and facilitating further communication between the endpoint and the requesting device. An access point may implement a variety of techniques (as disclosed herein) to provide resilient and efficient access to the pool of endpoint devices. As disclosed herein, access points may be distributed across a wide geographic area, thus eliminating a single point of failure within the system. Furthermore, by utilizing anycast advertising, the access points can distribute requests among the pool even when the requests are addressed to a single network address, avoiding the complexities and delays associated with other technologies (such as DNS-based load balancing).
By acting as a proxy between clients and the device pool, rather than providing resources directly, the access points can control the distribution of requests to the pool independently of how external devices choose to route requests to the anycast address, thereby avoiding the drawbacks associated with traditional anycast networking. Thus, embodiments disclosed herein significantly improve on existing load balancing techniques.
Embodiments of the present disclosure may illustratively be implemented in a wide geographic area. In one embodiment, the present disclosure is implemented on the global internet and provides a global Internet Protocol (IP) address, such as an IP version 4 (IPv4) or IP version 6 (IPv6) address. Different data centers may exist in different geographic locations, and each data center may include one or more endpoint devices that provide access to network-based services. Examples of such services include, but are not limited to, web page hosting, data storage, on-demand computing services, and the like. The resources at each data center may be limited, and thus operators of network-based services may wish to distribute the load among such services. To simplify the operation of network-based services (e.g., to avoid the complexities associated with DNS-based load balancing), it may be desirable to provide operators with a single set of relatively static service network addresses that may be managed independently of the individual endpoints that provide access to the services. Such network addresses are generally referred to herein as "global" network addresses. As used herein, the term "global" is intended to refer to a range of network addresses associated with a service (e.g., network addresses apply to the entire service, rather than individual devices), and does not necessarily imply that such network addresses are accessible worldwide. However, embodiments of the present disclosure may be implemented to provide a global network address that is generally accessible from a global network (such as the internet).
To provide a global network address to a service, a system is disclosed that provides a set of geographically distributed global access points. As noted above, in this context, the term "global" access point is generally intended to refer to an access point that generally provides access to a service, rather than an individual endpoint, and does not necessarily imply that such an access point exists worldwide (although such a configuration is possible). However, access points are typically geographically distributed. In one embodiment, the global access points are located in a different and greater number of geographic locations than the geographic locations of the data centers providing the endpoints to the service, thereby reducing the average network distance between the global access points and the client devices attempting to access the service.
Each global access point may utilize anycast techniques to advertise the availability of a service via one or more global network addresses associated with the service. Illustratively, each access point may include a border gateway protocol ("BGP") "speaker" that advertises the availability of global network addresses to neighboring networks. The global access points may thus attract traffic addressed to those global network addresses. As disclosed in more detail below, global access points may in some cases "shape" BGP advertisements to increase the resilience of network-based services to network outages. For example, global access points may divide their advertised global network addresses into two groups and assign each service at least one network address from each group. A global access point may then advertise each group's addresses to different neighboring networks. In this way, each access point effectively creates two paths to itself: via a first neighbor using an address of the first group, or via a second neighbor using an address of the second group. Thus, if one neighboring network fails in some way, there is an alternate path to the access point. Furthermore, since each access point can operate in this manner, if one access point fails entirely, traffic can be automatically routed to another access point via conventional anycast routing mechanisms. Although examples relating to two groups are provided herein, any number of groups may be provided.
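The two-group assignment described above can be sketched as a simple mapping. This is an illustrative model only: the addresses, neighbor names, and round-robin assignment rule below are invented, not taken from the disclosure.

```python
# Each advertisement group holds a pool of global addresses and is
# advertised to a different neighboring network; every service receives
# one address from each group, so it remains reachable via a second
# neighbor if the first fails.
GROUPS = {
    "group-1": {"addresses": ["198.51.100.1", "198.51.100.2"],
                "advertise_to": "neighbor-A"},
    "group-2": {"addresses": ["203.0.113.1", "203.0.113.2"],
                "advertise_to": "neighbor-B"},
}

def assign_service(service_index):
    """Assign a service one global address from each group
    (round-robin within each group's address pool)."""
    return {name: cfg["addresses"][service_index % len(cfg["addresses"])]
            for name, cfg in GROUPS.items()}

svc0 = assign_service(0)
print(svc0)  # one address drawn from each group
```

A real implementation would coordinate these assignments across all access points, since every access point advertises the same anycast addresses.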
To provide even further flexibility, in some cases each access point may isolate traffic according to specific subsets of the advertised global network addresses, utilizing different devices or program threads to process the traffic addressed to each subset. In this way, if traffic of one subset is problematic for the operation of the access point (e.g., due to a misconfiguration associated with a global network address in the subset), it is unlikely to affect the operation of a device or thread handling traffic of another subset. In one embodiment, the access point uses a combination of different network address groups and isolated processing for the subsets of each group. Further, services may be "shuffled" between the subsets of each group, such that if two services are assigned network addresses within a common subset of one group, those two services may be assigned network addresses within different subsets of a second group. Under this configuration, if the configuration of a particular service causes problems with the operation of the access point with respect to a subset of one group, other services will likely still be reachable via the alternative group (where their addresses are unlikely to be in the same subset as the problematic service). By increasing the number of address groups and subsets, the total number of services affected by a problematic service can be progressively reduced.
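One way to realize the "shuffling" above is to pick each service's subset independently per group with a hash, so that co-location in one group is uncorrelated with co-location in another. This is a hedged sketch: the hash-based rule, subset count, and names are assumptions, not the disclosure's algorithm.

```python
import hashlib

NUM_SUBSETS = 8  # illustrative; a real deployment chooses this per group

def subset_for(service, group):
    """Deterministically pick a subset index for `service` within
    `group` by hashing the (group, service) pair.  Different groups
    hash differently, so two services sharing a subset in one group
    are unlikely to share a subset in another."""
    digest = hashlib.sha256(f"{group}:{service}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SUBSETS

for svc in ("service-a", "service-b"):
    print(svc, [subset_for(svc, g) for g in ("group-1", "group-2")])
```

With `g` groups of `s` subsets each, only services that collide with a problematic service in every group lose all paths, which shrinks rapidly as `g` and `s` grow.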
Upon receiving a request to access a service, a global access point may be configured to route the traffic to an appropriate endpoint within a data center that provides the service. To this end, the global access point may need to know the network addresses of the endpoints. In one embodiment, each data center is configured with a resource manager device that maintains a list of the endpoints providing access to a service. The resource manager may provide endpoint information, including the endpoints' network addresses, to the access points. Thus, upon receiving a request to access a service, each access point may select an available endpoint of the service and route the request to that endpoint. As discussed below, endpoints may be selected based on a distribution algorithm that selects a set of endpoints based on a combination of geographic criteria and distribution criteria. For example, the global access point may apply a percentage distribution with a coin-flipping algorithm across the various geographic subgroups of data centers. The global access point may then apply a selection algorithm that selects an individual endpoint based on attributes of the request (e.g., 5-tuple information) to ensure that the same endpoint is selected consistently. For example, the global access point may implement a scoring algorithm that facilitates consistent selection of the endpoint with the highest score.
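The consistent, score-based selection step can be sketched with rendezvous (highest-random-weight) hashing over the packet's 5-tuple. The disclosure does not specify its scoring function, so the SHA-256 scoring below is one plausible choice, and all endpoint names and addresses are invented.

```python
import hashlib

def flow_key(src_ip, src_port, dst_ip, dst_port, protocol):
    """The 5-tuple identifying a traffic flow."""
    return f"{src_ip}:{src_port}->{dst_ip}:{dst_port}/{protocol}"

def select_endpoint(endpoints, key):
    """Score every endpoint against the flow key and keep the highest
    score: the same flow always lands on the same endpoint, and
    removing one endpoint only remaps the flows it was serving."""
    return max(endpoints,
               key=lambda ep: hashlib.sha256(f"{ep}|{key}".encode()).digest())

endpoints = ["endpoint-1", "endpoint-2", "endpoint-3"]
key = flow_key("203.0.113.9", 55123, "198.51.100.1", 443, "tcp")
print(select_endpoint(endpoints, key))
```

Because each flow's winner is the endpoint with the globally highest score for that key, removing a non-winning endpoint never changes the selection for that flow, which is exactly the stability property consistent selection needs.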
In one embodiment, to route traffic to endpoints, each access point is configured to utilize network address translation ("NAT"). NAT techniques are known in the art and are therefore not described in detail herein. Generally, however, NAT enables a device to act as an intermediary between two devices while rewriting aspects of each data packet, such as the source and destination network addresses, to facilitate communication between the devices. In accordance with embodiments of the present disclosure, each access point is operable to replace the source network address of the requesting device with its own network address (e.g., a unicast network address that uniquely identifies the global access point) and to replace the destination network address (e.g., the global network address of the service) with the network address of an endpoint providing the service. The access point may then route the packet to the endpoint, receive a response from the endpoint (if any), perform the reverse translation of source and destination (e.g., replacing the source of the response with the global network address and the destination with the network address of the requesting device), and return the packet to the requesting device. In one implementation, an access point may utilize port translation (known in the art) to facilitate differentiation of traffic flows (series of interrelated packets) when utilizing NAT, ensuring proper translation of addresses when handling traffic from multiple requesting devices.
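The forward and reverse translations described above can be modeled on plain dictionaries. This is a minimal sketch, not a packet-level implementation: all addresses, ports, and field names are invented, and the port-translation rule (a fresh source port per flow) is the simplified version of the technique the text mentions.

```python
ACCESS_POINT = "192.0.2.1"    # unicast address of this access point (invented)
GLOBAL_ADDR = "198.51.100.1"  # global anycast address of the service (invented)
ENDPOINT = "10.0.0.5"         # selected endpoint inside a data center (invented)

class NatTable:
    """Maps each translated source port back to the original client
    flow, so replies from the endpoint can be reverse-translated."""

    def __init__(self):
        self._flows = {}
        self._next_port = 40000

    def outbound(self, pkt):
        """Client -> service packet: rewrite the source to the access
        point (with a fresh port) and the destination to the endpoint."""
        port = self._next_port
        self._next_port += 1
        self._flows[port] = (pkt["src"], pkt["src_port"])
        return dict(pkt, src=ACCESS_POINT, src_port=port, dst=ENDPOINT)

    def inbound(self, pkt):
        """Endpoint -> client reply: restore the global address as the
        source and the original client as the destination."""
        client_ip, client_port = self._flows[pkt["dst_port"]]
        return dict(pkt, src=GLOBAL_ADDR, dst=client_ip, dst_port=client_port)

nat = NatTable()
request = {"src": "203.0.113.9", "src_port": 55123,
           "dst": GLOBAL_ADDR, "dst_port": 443, "data": b"GET /"}
forwarded = nat.outbound(request)
reply = {"src": ENDPOINT, "src_port": 443,
         "dst": forwarded["src"], "dst_port": forwarded["src_port"],
         "data": b"200 OK"}
restored = nat.inbound(reply)
```

The client only ever sees the global address in `restored["src"]`, which is the property the following paragraphs rely on: the endpoint's real address is never disclosed.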
Advantageously, using NAT enables an access point to forward traffic received from a client device to an endpoint while minimizing interference with a connection between the client device and the endpoint. For example, by utilizing NAT, an access point need not act as a "termination" point for certain network protocols, such as Transmission Control Protocol (TCP). Alternative routing techniques may, for example, cause the access point to accept a TCP connection from the client device and initialize a second TCP connection to the endpoint. Maintaining a plurality of such connections (and their correspondence) may significantly increase the resource utilization of the access point, thereby reducing its ability to route traffic addressed to the global network address. Thus, the use of NAT may provide benefits over such techniques.
Furthermore, the use of NAT at the access point enables traffic to be directed to a global network address that is different from any address utilized by the endpoint, thereby preventing disclosure of the address actually utilized by the endpoint. The endpoint is thus shielded from receiving traffic directly, which in turn may reduce or eliminate its vulnerability to network attacks, such as denial of service (DoS) attacks. In this way, global access points may provide benefits similar to those provided by traditional content delivery networks (CDNs). Unlike CDNs, however, access points need not replicate the functionality of the service (e.g., by implementing a web server at every point within the CDN). Instead, the access points of the present disclosure enable traffic to flow through them, allowing the requesting device's packets to reach the endpoint providing the service. For this reason, among others, the global access points disclosed herein may support a greater range of functionality than CDNs, enabling load balancing and distribution among any number of different network-based services.
In another embodiment, each access point is configured to utilize encapsulation in order to route traffic to an endpoint. Encapsulation is a known networking technique and is therefore not described in detail herein. Generally, however, encapsulation adds additional information (typically in the form of a header, and sometimes also a trailer) to a data packet, thus "wrapping" the data packet to produce an encapsulated data packet. In the context of the present disclosure, encapsulation may be utilized to provide a "network tunnel" between each access point and an endpoint. In one particular example, a global access point may establish a set of virtual LAN ("VLAN") channels to a subset of endpoints to facilitate more secure communications via an encapsulation tunnel. Generally described, a VLAN corresponds to a broadcast domain that is partitioned and isolated within a computer network at the data link layer. The global access point may implement various VLANs by applying tags to network frames and handling those tags in the networking system (e.g., during packet processing by network components) to communicate with individual endpoints or subsets of endpoints. The global access point may thus use different VLANs to keep communications separate for client devices and endpoints that, while connected to the same physical network, serve separate applications. Accordingly, when a packet addressed to the network address of a service provided by an endpoint is received from a client device at an access point, the access point may encapsulate the packet with additional information, such as the network address of a selected endpoint, enabling the packet to be routed to that endpoint.
The encapsulated packets may then be routed through the physical network to endpoints via network elements, such as a top-of-rack ("TOR") switch, which may distribute incoming packets to their assigned VLANs using 5-tuple information. The endpoint may "decapsulate" the encapsulated packet to remove the additional information and process the packet as if it had been received directly from the client device. In still other aspects of the present application, endpoint devices may be assigned different subsets of packet-processing responsibilities (referred to herein as fabric layers, implemented on different VLANs) that promote resiliency and recovery of state. More specifically, individual endpoints associated with a particular VLAN may exchange messaging information, such as path MTU discovery ("PMTUD") messages, to ensure that communications with the endpoints are not redundantly processed, or to exchange state information for failover. The exchanged messaging information may be multicast within a particular fabric layer or broadcast across multiple fabric layers; endpoints may be configured to ignore or filter broadcast communications that are not associated with their assigned fabric layer.
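The encapsulation, VLAN-tag filtering, and decapsulation steps above can be sketched as follows. This is an illustrative model with invented field names and addresses, not a wire format; real tunnels carry proper outer IP headers and 802.1Q tags.

```python
def encapsulate(inner, endpoint_addr, vlan_id):
    """Wrap the client's packet in an outer header carrying the
    selected endpoint's address and the fabric layer's VLAN tag."""
    return {"outer_dst": endpoint_addr, "vlan": vlan_id, "payload": inner}

def decapsulate(encapsulated, my_vlan):
    """Strip the outer header.  Endpoints ignore frames tagged for a
    fabric layer (VLAN) other than their own."""
    if encapsulated["vlan"] != my_vlan:
        return None
    return encapsulated["payload"]

inner = {"src": "203.0.113.50", "dst": "198.51.100.1", "data": b"GET /"}
wrapped = encapsulate(inner, endpoint_addr="10.1.2.3", vlan_id=42)
unwrapped = decapsulate(wrapped, my_vlan=42)
ignored = decapsulate(wrapped, my_vlan=7)
```

Note that `unwrapped` is the inner packet byte-for-byte, which is why the endpoint can process it as if it had arrived directly from the client.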
In one embodiment, an endpoint may respond directly to a client device via a physical network that connects the endpoint to the client device. In another embodiment, the endpoint may respond to the client device by encapsulating the response packet and transmitting the encapsulated response packet back to the access point. The access point may in turn decapsulate the response packet and return it to the client. Returning responses directly from the endpoint to the client device may advantageously reduce the workload of the access point and may also reduce traffic on the physical network connecting the access point to the endpoint. However, returning a response directly may require utilizing the physical network connecting the endpoint to the client device, which may not be as robust as the physical network connecting the access point to the endpoint. For example, where the endpoint and access point are connected via a private network and the access point, endpoint, and client device are connected via a public network (such as the Internet), returning a response through the access point (rather than directly) may be preferable, as it may minimize the total distance traveled over the public network (e.g., where the access point is located closer to the client device than the endpoint).
In general, encapsulation of packets between an access point and an endpoint may increase the computational resource utilization of the access point and the endpoint as compared to routing packets from the access point to the endpoint using NAT. However, encapsulation may also provide benefits over NAT implementations. For example, encapsulation may provide resiliency to a TCP session where packets of a client device are routed to different global access points during the session. Such rerouting may occur, for example, due to the operation of the network between the client device and the different access points, such as variations in the manner in which the network handles anycast advertisements of the different access points. Where access points utilize NAT to route packets to endpoints, each access point may independently maintain NAT information. Thus, if the client device's packets are routed to a different access point, the second access point may not have sufficient information to successfully take over the client's communication (e.g., the packets as translated by the second access point may differ from those translated by the initial access point). This may cause the TCP session between the client device and the endpoint to be disconnected, requiring the client device to reestablish the TCP connection.
In contrast, encapsulation may enable smooth handling of rerouted client requests between access points. Under either NAT or encapsulation implementations, each access point may be configured to select endpoints in a consistent manner. As described above, the access point may apply a selection algorithm that generates a score indicating the desired data center. Such selection algorithms may include, but are not limited to, weighted rendezvous hashing or weighted consistent hashing algorithms. Thus, client device packets can be expected to be routed to the same endpoint regardless of the access point to which they were originally sent. Further, by utilizing encapsulation, none of the data of the original client device's packets need be changed when routing the packets to the endpoint. Instead, the data may be encapsulated (e.g., with additional information that enables routing to the endpoint) and recovered at the endpoint after decapsulation. For this reason, even if the client's packets are routed to different access points, the final data available to the endpoint after decapsulation may be identical, enabling the endpoint to maintain a TCP session with the client device even as the data flows through different access points.
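A minimal sketch of weighted rendezvous (highest-random-weight) hashing, one of the selection algorithms named above: because the score depends only on the flow key, the candidate, and its weight, every access point running the same algorithm independently picks the same winner. The flow-key format, data-center names, and weights are illustrative assumptions.

```python
import hashlib
import math

def weighted_rendezvous_pick(flow_key: str, candidates: dict[str, float]) -> str:
    """Return the candidate with the highest weighted hash score; any node
    with the same candidate set and weights computes the same answer."""
    best, best_score = None, float("-inf")
    for name, weight in candidates.items():
        digest = hashlib.sha256(f"{flow_key}|{name}".encode()).digest()
        # Map the hash into (0, 1), then score as -weight / ln(h) so that
        # higher-weight candidates win proportionally more keys.
        h = (int.from_bytes(digest[:8], "big") + 0.5) / 2**64
        score = -weight / math.log(h)
        if score > best_score:
            best, best_score = name, score
    return best

candidates = {"dc-east": 2.0, "dc-west": 1.0}  # hypothetical data centers
pick1 = weighted_rendezvous_pick("198.51.100.7:51234->svc:443", candidates)
pick2 = weighted_rendezvous_pick("198.51.100.7:51234->svc:443", candidates)
# Deterministic: two independent access points agree on the destination.
```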
In some embodiments, to further improve performance in implementations that utilize encapsulation, each access point may be configured to assist an endpoint in establishing a connection-oriented communication session (such as a TCP session) with a client device. Connection-oriented communication sessions typically require a session initiation phase in which both parties to the session communicate to establish a mutual understanding of the communication state. TCP sessions utilize, for example, a "three-way handshake." The TCP three-way handshake is known in the art and will therefore not be described in detail herein. Briefly, however, a TCP session is established when a first party sends a synchronization ("SYN") packet to a second party (including a first-party sequence number to be used during the session), the second party responds with a synchronization-acknowledgement ("SYN-ACK") packet (acknowledging the first-party sequence number and including a second-party sequence number), and the first party responds to the SYN-ACK packet with an acknowledgement ("ACK") packet (acknowledging the second-party sequence number). Because a three-way handshake requires three separate communications between the parties, any delay between the parties is incurred three times during the handshake, contributing to what is often described as "first byte delay." In the context of the present disclosure, for example, if communication between a client device and an endpoint incurs a 100 millisecond (ms) delay, a three-way handshake can be expected to take at least 300 ms. Since such a handshake is required before data is transferred over a TCP session, it is beneficial to reduce the time required to initialize the TCP session, e.g., to reduce the first byte delay.
As will be described below, in other embodiments, the TCP session handed off by a global access point may also comprise a TCP session between the global access point and an intermediary device localized to the selected endpoint, and a third (new) TCP session between the localized intermediary device and the selected endpoint. Thus, the first byte delay may be further reduced by limiting the delay between the endpoint and the localized intermediary.
Thus, in embodiments of the present disclosure, each global access point may be configured to conduct a TCP three-way handshake with the client device and then "hand off" the TCP connection to the endpoint. Specifically, each access point may be configured to "listen" for incoming SYN packets from client devices and respond to such packets by conducting a TCP three-way handshake. After the handshake is complete, the access point may transmit TCP session information (e.g., the first-party and second-party sequence numbers of the TCP session) to the endpoint selected to provide the requested service to the client device. Upon receiving the TCP session information, the endpoint may "adopt" the TCP session and process the client device's packets as if the TCP session had been established between the client device and the endpoint. Because the access point may be assumed to be "closer" to the client device in terms of latency, the time required to establish the TCP session is reduced, without interfering with the ability of the client device and endpoint to communicate via a common TCP session. Although examples are provided herein with respect to TCP handshakes, other stateful communication sessions typically also require an initialization phase. Embodiments described herein may be utilized to conduct such an initialization phase at an access point while handing off the context of the connection-oriented communication session to an endpoint, enabling the client device and endpoint to communicate via the connection-oriented communication session.
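The handoff can be sketched as passing the handshake's two sequence numbers to the endpoint, which then continues numbering as if it had performed the handshake itself. The field names below are hypothetical; an actual implementation would live in the endpoint's kernel or in a session handoff manager, as described later in this disclosure.

```python
from dataclasses import dataclass

@dataclass
class TcpHandoffContext:
    """State an access point might forward after completing the three-way
    handshake on the endpoint's behalf (field names are hypothetical)."""
    client_ip: str
    client_port: int
    client_isn: int  # client's initial sequence number, taken from its SYN
    server_isn: int  # sequence number the access point chose for its SYN-ACK

def adopt_session(ctx: TcpHandoffContext) -> dict:
    # The SYN and SYN-ACK each consume one sequence number, so an adopting
    # endpoint expects the next client byte at client_isn + 1 and sends its
    # own next byte as server_isn + 1 (modulo the 32-bit sequence space).
    return {"rcv_next": (ctx.client_isn + 1) % 2**32,
            "snd_next": (ctx.server_isn + 1) % 2**32}

ctx = TcpHandoffContext("198.51.100.7", 51234, client_isn=1000, server_isn=5000)
state = adopt_session(ctx)
```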
Still further, the global access point may be configured to modify or select different TCP-based configurations to take advantage of the higher-bandwidth connection between the access point and the data center. More specifically, in accordance with aspects of the present application, TCP-based communications generally include various settings or parameters that control how much data may be in transit, commonly referred to as window parameters. Parameters may also include congestion control parameters, such as the slow start policy used by TCP in conjunction with other algorithms to avoid sending more data than the network can forward, that is, to avoid causing network congestion. For traditional TCP communications over a common network connection, the data window and congestion control parameters are typically set at lower initial values, since multiple TCP connections share the common network connection, and may be increased to support greater data throughput as the connection is measured and characterized. This has the effect of increasing total communication time, commonly referred to as last byte delay. According to aspects of the present application, the client computing device and the global access point have a first TCP connection whose congestion control and window parameters are configured using conventional methods (e.g., initial lower values that may be increased based on measured attributes of the connection). The global access point and the endpoint have a second TCP connection that need not be configured using conventional methods, because the global access point and the endpoint communicate via a second network connection (such as a private network) that may be configured to support higher data throughput with less likelihood of congestion or reduced effective bandwidth.
Thus, the global access point may set the congestion control parameter and the window parameter to their highest, or higher, initial values, which directly results in greater data throughput, improving total data throughput (e.g., the last byte delay may decrease).
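As a hedged sketch of this kind of tuning, the snippet below requests large socket buffers (which bound the window TCP can advertise) on the access-point-to-endpoint leg. The buffer size is an assumed value, not one taken from the patent; real deployments would more likely tune kernel settings such as the initial congestion window, and the operating system may round or cap the requested buffer size.

```python
import socket

# Assumed tuning value for the private access-point-to-endpoint leg; the
# public client-to-access-point leg would keep conventional defaults.
PRIVATE_LEG_BUF = 128 * 1024

def tuned_backend_socket() -> socket.socket:
    """Create a TCP socket with enlarged send/receive buffers."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, PRIVATE_LEG_BUF)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, PRIVATE_LEG_BUF)
    return s

s = tuned_backend_socket()
sndbuf = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)  # kernel may adjust
s.close()
```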
As will be appreciated by those skilled in the art in light of this disclosure, the embodiments disclosed herein improve the ability of computing systems to provide network-accessible services. In particular, embodiments of the present disclosure improve upon existing load balancing techniques by providing scalable, resilient, and responsive load balancing across public network addresses. Further, the presently disclosed embodiments address technical problems inherent in computing systems; in particular, the limited nature of the computing resources that provide network-accessible services and the difficulty of load balancing requests for such services in a scalable, resilient, and responsive manner. These technical problems are solved by the various technical solutions described herein, including the use of a set of distributed access points associated with a public network address, each access point configured to receive requests for a service and to route those requests to endpoints of the service based at least in part on load balancing the requests among the endpoints. Accordingly, the present disclosure generally represents an improvement over existing network load balancing systems and computing systems.
The foregoing aspects and many of the attendant advantages of this disclosure will become better understood by reference to the following description, when taken in conjunction with the accompanying drawings, wherein:
FIG. 1 is a block diagram depicting an illustrative logical environment 100 including a plurality of client devices 102 in communication with a set of global access points 106A-106N via a first network 104, the global access points 106A-106N in communication with a set of data centers 110A-110N via a second network 108. Although the client device 102, global access point 106, and data center 110 are grouped together in fig. 1, the client device 102, global access point 106, and data center 110 may be geographically remote and owned or operated independently. For example, client devices 102 may represent a large number of users accessing various global, continental, or regional locations of network-accessible services provided by data center 110, which data center 110 may be further distributed among various global, continental, or regional locations. Global access points 106 may be similarly distributed. In one embodiment, data center 110 represents devices in a location under the control of a single entity, such as a "cloud computing" provider, while global access point 106 represents devices in a common tenant location, such as a network "point of presence" or an internet exchange point (IXP). The global access point 106 may generally be more numerous than the data center 110 and located in a different physical location. However, in other embodiments, one or more of the access points 106 may be located within one or more data centers 110. Accordingly, the grouping of client devices 102, access points 106, and data centers 110 within fig. 1 is intended to represent logical groupings, rather than physical groupings.
Networks 104 and 108 may be any wired network, wireless network, or combination thereof. In addition, networks 104 and 108 may be personal area networks, local area networks, wide area networks, cable networks, satellite networks, cellular telephone networks, or combinations thereof. In the exemplary environment of fig. 1, network 104 is a global area network (GAN), such as the Internet, and network 108 is a private network dedicated to traffic associated with the entity providing the data centers 110 and access points 106. Protocols and components for communicating via the aforementioned types of communication networks are well known to those skilled in the art of computer communications and therefore need not be described in greater detail herein.
Although each of the client device 102 and the access point 106 are described as having a single connection to the network 104, separate components of the client device 102 and the access point 106 may be connected to the network 104 at different points (e.g., through different adjacent networks within the network 104). In some embodiments, the data center 110 may additionally or alternatively be connected to the network 104. Similarly, although each of the access point 106 and the data center 110 is depicted as having a single connection to the network 108, separate components of the access point 106 and the data center 110 may be connected to the network 108 at different points. Thus, communication time and capabilities may vary between the components of fig. 1. The network configuration of fig. 1 is intended to illustrate communication paths in embodiments of the present disclosure, and not necessarily to depict all possible communication paths.
Client device 102 may include any number of different computing devices capable of communicating with the global access points 106. For example, individual client devices 102 may correspond to laptop or tablet computers, personal computers, wearable computers, servers, personal digital assistants (PDAs), hybrid PDAs/mobile phones, e-book readers, set-top boxes, cameras, digital media players, and the like. In some cases, a client device 102 is operated by an end user. In other cases, a client device 102 itself provides network-accessible services that interact with the global access points 106 to access other network-accessible services.
The data center 110 of fig. 1 illustratively includes endpoint computing devices that provide one or more network-accessible services on behalf of one or more service providers. Illustratively, the data center 110 may be operated by a "cloud computing" provider, which makes host computing devices within the data center available to service providers for use in providing their services. The cloud computing provider may generally manage the operation of the data center while providing various mechanisms for service providers to configure their respective endpoints. One illustrative configuration of the data center 110 is provided below with respect to fig. 2.
In accordance with embodiments of the present disclosure, cloud computing providers may enable service providers to associate their endpoints with one or more global network addresses that are addressable over network 104 to interact with data center 110 in a load-balanced manner. The cloud computing provider may also enable the service provider to specify how such load balancing should occur, such as by specifying the percentage of requests to be routed to each data center 110. The cloud computing provider may also enable the service provider to alter the configuration of the endpoint independently of the global network address such that altering the particular endpoint providing the service does not require reconfiguring the network address. The use of a global network address may significantly simplify the operation of the network service, as any client device 102 desiring to connect to the service may simply transmit a request to the global network address of the service. For example, changes may then be made to the endpoint providing the service without changing the DNS record for the service. As will be described below, in some cases, these changes may be made automatically so that no user action is required even when the endpoint of the service changes within the data center 110.
To facilitate global network addresses, a set of global access points 106A-106N is provided. Each access point may generally include one or more computing devices configured to obtain requests from client devices 102 to interact with services and to route such requests to endpoints within the data centers 110, selected based at least in part on load balancing the requests across the data centers 110. The access points 106 may also act as a type of proxy for the endpoints, enabling traffic between the client devices 102 and the data centers 110 to flow across the access points 106. The operation of the access points 106 is discussed in more detail below. Briefly, however, the availability of global network addresses may be advertised to neighboring network devices within the network 104 using anycast techniques, the network 104 in one implementation comprising devices that are not controlled by a common entity providing the access points 106. The access points 106 may thus attract traffic addressed to the global network addresses. Thereafter, an access point 106 may select an endpoint to which to direct the traffic based on factors such as availability of the endpoint, load balancing across the data centers 110, and performance criteria between the access point 106 and the various data centers 110.
After selecting a data center 110, the access point 106 may route the request to the endpoint. In one embodiment, the access point 106 uses NAT or encapsulation (e.g., a virtual private network) to redirect requests to endpoints on the network 108, thereby preventing disclosure of the endpoint's network address to the client device 102. Where a connection-oriented communication session is utilized between the client device 102 and an endpoint, the access point 106 is operable, in accordance with embodiments of the present disclosure, to conduct an initialization phase of the communication session on behalf of the endpoint. Where network 108 is a private network, the global access points 106 may further act as an "offload" point for traffic to an endpoint, moving the traffic from a public network (e.g., network 104) to the private network 108. In general, such private networks may be expected to have greater performance than public networks, and thus such offloading may further increase the speed of communication between the client device 102 and the endpoint. As described above, in some embodiments, the global access point 106 may select congestion control parameters and window parameters for TCP communication between the access point 106 and the endpoint that increase data throughput by utilizing the greater performance of the private network 108, as compared to the conventional or smaller values of those parameters used for TCP communication between the client computing device 102 and the access point.
As noted above, access point 106 may implement a variety of techniques to ensure the resilience of network services using global network addresses. Illustratively, advertising the access points 106 using anycast may provide resiliency between the access points 106, as failure of an individual access point 106 may generally be expected to cause a device of the network 104 to route a request to another access point 106. Further, to address potential failures of the network 104, each access point 106 may be configured to control its global network address advertisement on the network 104, thereby providing multiple routing paths for each service to the access point 106. Additional details regarding such control over providing advertisements for routing paths are provided below. Still further, to address potential failures within access points 106, each access point 106 may be configured to include multiple flow managers to handle different traffic flows addressed to the global network address. The stream manager may be distributed logically (such as across program threads) and/or physically (such as across processors or computing devices). Thus, failure of one stream manager may have little effect on other stream managers within access point 106, thereby limiting the effects of partial failure within access point 106. One illustrative configuration of access point 106 is discussed below with reference to fig. 4.
Fig. 2 is a block diagram depicting an illustrative configuration of the data center 110 of fig. 1. As shown in fig. 2, data center 110 includes an endpoint pool 201, where endpoint pool 201 includes a set of endpoints 202A-202N. Each endpoint 202 illustratively represents a computing device configured to provide access to a network-accessible service. In one embodiment, an endpoint 202 is a separate physical computing device. In another embodiment, an endpoint 202 is a virtual computing device executing on a physical computing device. In yet another embodiment, an endpoint 202 is a collection of computing devices (physical or virtual) that are collectively configured to provide access to a network-accessible service. For example, an endpoint 202 may be a set of devices fronted by a load balancer device configured to load balance requests to the endpoint 202 among the set of devices. Each endpoint 202 communicates with network 108 and is therefore addressable over network 108. The number of endpoints 202 may vary, for example, depending on the capacity requirements of the network-accessible service. Illustratively, a service provider of such a service may contract with an operator of the data center 110 (e.g., a cloud computing provider) to generate and provision the endpoints 202.
In one embodiment, the number of endpoints 202 may vary depending on current or historical demand for network-accessible services. To facilitate different numbers of endpoints 202, a data center may include a resource manager 206 (e.g., implemented directly on a physical computing device or as a virtual device on a host physical computing device), the resource manager 206 monitoring the load of the endpoints 202 (e.g., with respect to requests per second, computing resource utilization, response time, etc.), and dynamically adjusting the number of endpoints 202 to maintain the load within a threshold. For example, where endpoints 202 are implemented as virtual devices, resource manager 206 may generate and provision new virtual devices when a current set of endpoints 202 have a usage metric that exceeds a desired upper usage parameter, and "spin down" and remove virtual devices when the metric falls below a lower usage parameter. The resource manager 206 may be further configured to notify the global access point 106 of network address information of the endpoints 202 within the pool 201 when modifying the plurality of endpoints 202 within the pool 201 so that the access point 106 may address traffic to the endpoints 202.
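The resource manager's scale-up/spin-down behavior described above can be sketched as a simple threshold rule. The thresholds, step size, and bounds below are illustrative assumptions, not values from the patent.

```python
def scale_decision(current_count: int, usage: float,
                   upper: float = 0.75, lower: float = 0.25,
                   min_count: int = 1, max_count: int = 100) -> int:
    """Resource-manager-style decision: provision an endpoint when usage
    exceeds the upper parameter, spin one down when it falls below the
    lower parameter, and otherwise hold steady."""
    if usage > upper and current_count < max_count:
        return current_count + 1
    if usage < lower and current_count > min_count:
        return current_count - 1
    return current_count

# Example sweep: usage above, below, and within the illustrative thresholds.
grown = scale_decision(4, 0.90)
shrunk = scale_decision(4, 0.10)
steady = scale_decision(4, 0.50)
```

A real resource manager would also notify the global access points 106 of the pool's updated network address information after each change.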
In addition, the data center 110 of fig. 2 includes a health check device 204, the health check device 204 being configured to monitor the health of the endpoints 202 within the pool 201. Illustratively, the health check device 204 may periodically (e.g., every n seconds) transmit a request to the endpoint 202 and monitor whether an appropriate response is received. If an appropriate response is received, the health check device 204 may treat the endpoint 202 as healthy. If an appropriate response is not received, the health check device 204 may consider the endpoint 202 unhealthy. The health check device 204 may illustratively notify the global access point 106 of the unhealthy endpoint 202 to cause the access point 106 to reduce or eliminate traffic routing to the endpoint 202 when the endpoint 202 is unhealthy. In some cases, the health check device 204 may be further configured to check the health of the global access point 106, the health of the endpoints 202 in other data centers 110, or the health of the network path between the health check device 204 and the access point 106. For example, the health check device 204 may periodically transmit information to the access point 106 and monitor for responses from the access point 106, network metrics (e.g., delays) related to the responses, and the like. Health check device 204 may report this information to access point 106 to facilitate traffic routing to endpoint 202A within data center 110. Illustratively, the health check device 204 may report the health information of the endpoint 202 to the configuration data store 210 (e.g., a database on the data store 210), which may be propagated to the access point 106 through operation of the configuration manager 208 in the manner described below (e.g., as part of a configuration package or parcel).
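The periodic probe-and-report cycle above can be sketched as follows. The probe interface and the consecutive-failure threshold are assumptions for illustration; a production health checker would issue real service requests and propagate results through the configuration data store as described.

```python
def check_endpoints(endpoints, probe, failures_needed=3, state=None):
    """One health-check sweep: probe each endpoint and mark it unhealthy
    only after several consecutive failures (threshold is illustrative)."""
    state = state if state is not None else {}
    report = {}
    for ep in endpoints:
        try:
            ok = probe(ep)
        except Exception:
            ok = False
        state[ep] = 0 if ok else state.get(ep, 0) + 1
        report[ep] = "healthy" if state[ep] < failures_needed else "unhealthy"
    return report, state

# Stand-in probe: a real checker would send a request and await a response.
probe = lambda ep: ep != "10.0.0.9"
eps = ["10.0.0.8", "10.0.0.9"]
state = {}
for _ in range(3):  # three sweeps, e.g. one every n seconds
    report, state = check_endpoints(eps, probe, state=state)
```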
The data center 110 of fig. 2 also includes a configuration manager 208, the configuration manager 208 being configured to enable a service provider to configure the operation of the data center 110 and the global access point 106. Illustratively, the configuration manager 208 may provide an interface through which a user may specify endpoints 202 that provide network-accessible services, configure those endpoints 202, and configure the resource manager 206 to scale up or down the endpoints. Configuration manager 208 may further enable a service provider to assign global network addresses to those endpoints and specify load balancing parameters to route traffic addressed to the global network addresses to the various data centers 110. The configuration created by the service provider may be stored within a configuration data store 210, which configuration data store 210 may correspond to any persistent or substantially persistent storage (e.g., hard disk drive, solid state drive, network attached storage, etc.). In some cases, configuration data store 210 may include multiple representations of a service provider's configuration. For example, to facilitate rapid reconfiguration of global access points 106, configuration data store 210 may include a database (such as a relational database) that is modified each time a service provider submits changes to their configuration. The configuration manager 208 may periodically (e.g., every 100 milliseconds, every 1 second, every 2 seconds, every 5 seconds, every 30 seconds, etc.) determine whether a change has been made to the database and, if so, generate a new configuration package for the global access point 106 that covers changes to the database (and thus, the service provider's configuration) relative to the previous configuration package. Configuration manager 208 may then store the configuration package in configuration data store 210 for retrieval by global access point 106. 
In one embodiment, each global access point 106 is configured to periodically poll the configuration data store 210 (e.g., every 100 milliseconds, every 1 second, every 2 seconds, every 5 seconds, every 30 seconds, etc.) to determine whether a new configuration package is present, and if so, retrieve and implement the package. In some cases, the configuration package may be divided into package "parcels," each representing a portion of the configuration. The global access point 106 may be configured to retrieve only those parcels that were modified relative to its existing parcels. For example, modifications may be tracked based on versioning of packages or parcels. Still further, in some implementations, a package or parcel may be stored in the data store 210 as a difference or "delta" from a previous version, such that the access point 106 may retrieve only the changes since the previous version of the parcel, thereby reducing the data transfer required to update the package or parcel. In one embodiment, the configuration manager 208 may periodically (e.g., every 100 milliseconds, every 500 milliseconds, etc.) checkpoint packages or parcels by collecting all changes since a previous checkpoint and storing the package or parcel as an independent version. Such checkpoints may facilitate rapid reconfiguration when a global access point 106 lacks a prior version of a package or parcel to reference.
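The version-based polling above can be sketched as comparing the access point's local version map against the data store's. The parcel names and version numbers are hypothetical; the point is that only parcels with newer remote versions need be fetched.

```python
def changed_parcels(local_versions: dict[str, int],
                    remote_versions: dict[str, int]) -> list[str]:
    """Return the parcel names an access point must fetch: those whose
    version in the configuration data store is newer than its local copy
    (a parcel it has never seen counts as version -1)."""
    return sorted(name for name, ver in remote_versions.items()
                  if local_versions.get(name, -1) < ver)

local = {"service-a": 3, "service-b": 7}                   # on the access point
remote = {"service-a": 4, "service-b": 7, "service-c": 1}  # in the data store
changed = changed_parcels(local, remote)
# Only the updated parcel and the brand-new parcel are retrieved.
```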
In one embodiment, only one data center 110 of the data centers 110A-110N of FIG. 1 includes a configuration manager 208, which propagates the configuration of each service provider to the other data centers 110 and to the access points 106. In another embodiment, each data center 110 includes a configuration manager 208, and the managers 208 of each data center 110 may communicate to synchronize configuration between the data centers 110 and with the access points 106. Still further, the data centers 110A-110N may also include various network routing components 214, such as a top-of-rack ("TOR") switch that may be configured to route data packets to different endpoints 202. More specifically, in one embodiment, the TOR switch may utilize information included in the data packet to route the data packet to an endpoint 202. For example, the TOR may utilize 5-tuple information (i.e., source IP address, source port, destination IP address, destination port, and protocol) to automatically route a packet to an assigned endpoint 202. This facilitates the utilization of a set of multiple VLANs or fabric layers that secure transmissions to different endpoints 202. Still further, in other aspects of the present application, endpoints 202 may utilize path MTU discovery messages to exchange information. In accordance with aspects of the present application, to increase resilience, state information may be exchanged to facilitate continued processing of services/requests by the data center 110 in the event of a change. More specifically, endpoints may utilize multicast message transmissions to a set of endpoints 202 associated with a particular VLAN or fabric layer, which allows only those endpoints to receive the multicast messages (such as to pass state information if an endpoint is to be shut down, attempting to maintain continuity of service by providing the state information to another endpoint).
In other embodiments, endpoints may utilize broadcast messages transmitted to all endpoints in the data center 110. Because broadcast messages are received by all (or most) endpoints, individual endpoints 202 in the data center may filter out or exclude inapplicable messages.
While only some components of data center 110 are shown in communication with network 108, other components may additionally be in communication with network 108 and/or network 104. The lines of fig. 2 are not intended to represent all actual or potential network connections, but rather illustrate possible flows of service-related traffic to endpoint 202.
Further, although shown within the data center 110, in one embodiment the global access point 106 may also include a configuration manager 208, thereby enabling direct configuration of the access point 106. In another embodiment, the global access point 106 does not include any configuration manager 208 or data store 210. For example, where access point 106 is implemented in a shared tenancy environment (e.g., one operated by or accessible to parties other than the operator of access point 106), access point 106 may be configured to not include any persistent storage and instead to retrieve configuration information from data center 110 upon initialization of the access point 106. In this way, security of access point 106 may be increased, since turning off access point 106 would be expected to result in the loss of any sensitive data that may reside on access point 106.
Although data center 110 is shown as including one endpoint pool 201 corresponding to one network-accessible service, data center 110 may host numerous pools 201, each pool 201 corresponding to a different service. Thus, multiple service providers may utilize the data center 110. Further, as noted above, each network-accessible service may be provided by endpoints 202 across multiple data centers 110. Thus, the global access points 106 of fig. 1 may distribute traffic for such services across the data centers 110.
According to an embodiment of the present disclosure, the data center 110 further includes a session handoff manager 212, the session handoff manager 212 being configured to facilitate employing a connection-oriented communication session with the client 102 initialized at the global access point 106. As discussed above, in some cases, global access point 106 may be configured to complete an initialization phase of a connection-oriented communication session, such as a TCP three-way handshake, after which the communication session is handed off to endpoint 202. In some cases, endpoint 202 may be configured to accept such a handoff by receiving a context of a connection-oriented communication session (e.g., a sequence number of both ends of a connection) and generating local state information that incorporates the context. For example, endpoint 202 may be so configured by modifying networking protocol handlers within the operating system of endpoint 202 (e.g., by modifying the kernel to accept and employ TCP context information from global access point 106). However, in order to enable a wide variety of endpoints 202 to utilize the global access point 106, it may be preferable that the endpoints 202 not need to be so configured. To achieve this, the data center 110 may also include a session handoff manager 212, the session handoff manager 212 configured to accept and employ connection-oriented communication sessions from the global access point 106. Session handoff manager 212 may then establish a separate connection-oriented communication session with endpoint 202 selected by access point 106 to provide services to client device 102 and operate as a "man-in-the-middle" between the two sessions. Because session handoff manager 212 and endpoint 202 may be commonly housed within data center 110, creation of a second connection-oriented communication session may be expected to inject minimal delay into communications between client device 102 and endpoint 202.
Fig. 3 is a block diagram depicting an illustrative configuration of the global access point of fig. 1. As shown in fig. 3, each global access point 106 communicates with network 104 via a router 302. Although only a single router 302 is shown in fig. 3, the access point 106 may include multiple routers 302. Further, while a single connection to network 104 is shown, each router 302 may include multiple connections to network 104, potentially to multiple different neighboring devices within network 104, where each such device may correspond to a different sub-network (e.g., an Autonomous System (AS) within network 104).
As noted above, the global access point 106 may be configured to utilize anycast technology to attract traffic to a global network address associated with a network-accessible service. Thus, router 302 is illustratively configured to advertise global network addresses to neighboring devices on network 104. In one embodiment, such an advertisement is a BGP advertisement. Such advertising may cause router 302 to attract traffic addressed to the global network address, as the advertising may cause devices on network 104 to route traffic addressed to router 302 according to the operation of the anycast technique.
As discussed above, the global access point 106 may implement a variety of techniques to increase the resilience of the access point 106. In one embodiment, the global network addresses advertised by the access point 106 are divided into multiple address groups. To reduce the potential impact of failures on the network 104, the router 302 (or routers 302) may be configured to transmit BGP advertisements for each address group to different neighboring devices (e.g., different ASes) on the network 104. A network-accessible service may be associated with addresses from multiple address groups, where an address from each group may be provided to client devices 102 as an address for accessing the service. Since addresses from different groups are advertised differently on the network 104, different routing paths may be expected on the network 104 for each group's addresses. For example, packets addressed to an address within a first group may reach router 302 through a first AS of network 104, while packets addressed to an address within a second group may reach router 302 through a second AS. Thus, if a failure occurs within the first AS (or a downstream AS connected to the first AS), packets addressed to an address within the second group may be expected to still reach router 302, and vice versa. Thus, dividing the global network addresses into multiple groups may increase the resilience of the access point 106 to failures within the network 104.
Upon receiving a packet addressed to a global network address, router 302 may route the packet to a flow manager 304 from among a set of flow managers 304A-304N (or another similar functional component). While access point 106 may implement a single flow manager 304, it may be beneficial for the access point to implement multiple flow managers 304 to provide redundant operation of such flow managers 304. Router 302 may use any number of known load balancing techniques to distribute packets to the flow managers 304, such as round-robin selection. In one embodiment, router 302 utilizes consistent hashing to distribute packets. Consistent hashing is known in the art and, therefore, will not be described in detail herein. Consistent hashing may be beneficial, for example, to increase the chance that multiple packets having the same characteristics (e.g., source network address, source network port, destination network address, destination network port, protocol) are routed to the same flow manager 304. This may advantageously enable a flow manager 304 to maintain state information regarding a flow of packets between the client device 102 and a destination endpoint 202. In some cases, such state information may be required, for example, to implement NAT techniques or to conduct the initialization phase of a connection-oriented communication session. In another embodiment, equal-cost multi-path (ECMP) load balancing is used to route traffic to the flow managers 304A-304N. ECMP load balancing is known in the art and will therefore not be described in detail herein.
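A minimal sketch of consistent hashing as described above, mapping a packet's 5-tuple onto one of several flow managers via a ring of virtual nodes; the manager names and virtual-node count are illustrative assumptions, not part of the disclosure:

```python
import hashlib
from bisect import bisect

def _h(key: str) -> int:
    """Stable 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

class ConsistentHashRing:
    """Maps packet flows to flow managers; adding or removing a manager
    remaps only a small fraction of flows."""

    def __init__(self, managers, vnodes=100):
        # Each manager contributes many virtual nodes to smooth the split.
        self._ring = sorted(
            (_h(f"{m}#{i}"), m) for m in managers for i in range(vnodes)
        )
        self._keys = [k for k, _ in self._ring]

    def pick(self, five_tuple) -> str:
        # Walk clockwise from the flow's hash to the next virtual node.
        idx = bisect(self._keys, _h("|".join(map(str, five_tuple))))
        return self._ring[idx % len(self._ring)][1]

ring = ConsistentHashRing(["flow-mgr-304A", "flow-mgr-304B", "flow-mgr-304C"])
flow = ("198.51.100.7", 49152, "203.0.113.10", 443, "tcp")
assert ring.pick(flow) == ring.pick(flow)  # same 5-tuple -> same manager
```

Because the choice depends only on the packet's 5-tuple and the ring contents, every packet of a flow lands on the same flow manager, which is what allows that manager to keep per-flow state.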
In one embodiment, ECMP load balancing is applied to route packets to a flow manager 304 based on the global network address to which the packet is addressed. Illustratively, each flow manager 304 may handle packets addressed to a subset of addresses within a global network address group. For example, a first flow manager 304 may preferentially handle a first quartile of addresses within a group, a second flow manager 304 may preferentially handle a second quartile, and so on. By dividing the network addresses within a group between different flow managers 304, access point 106 can, for example, reduce the portion of services adversely affected by improper operation of a flow manager 304, such as due to a misconfiguration of a service associated with the addresses handled by that flow manager 304. In the case where multiple groups of global network addresses are used, services may be "shuffled" between the different groups, such that two services having addresses that share a subset under one address group are unlikely to have addresses that share a subset under another address group. Such shuffling may reduce the total percentage of services rendered completely unavailable from access point 106 due to the failure of an individual flow manager 304.
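The assignment of a service's addresses to a different subset in each address group can be sketched as follows; the four-way split and the hash-based derivation are hypothetical illustrations of the idea, not the disclosed mechanism:

```python
import hashlib

NUM_SUBSETS = 4  # e.g., quartiles of addresses within each group

def subset_for(service: str, group: str) -> int:
    """Pick the address subset (and hence flow manager) a service uses
    within a given global-address group, independently per group."""
    digest = hashlib.sha256(f"{service}|{group}".encode()).digest()
    return digest[0] % NUM_SUBSETS

# Two services sharing a subset (flow manager) under one group are
# unlikely to also share a subset under the other groups.
groups = ["group-1", "group-2", "group-3"]
assignments_a = [subset_for("svc-a", g) for g in groups]
assignments_b = [subset_for("svc-b", g) for g in groups]
```

Because each group's assignment is derived independently, the probability that two services collide in every group falls geometrically with the number of groups, which is the resilience property described above.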
To further facilitate the use of state information at the flow managers 304, in one embodiment router 302 implements a flow-hash routing algorithm or similar algorithm to identify related packets constituting a "flow" of traffic between a client 102 and an endpoint 202. Various flow identification algorithms are known in the art and will therefore not be described in detail herein. Illustratively, router 302 may implement a flow identification algorithm to identify flows of data packets and consistently route packets of the same flow to the same flow manager 304. In one embodiment, router 302 applies flow-based routing to data packets before otherwise distributing the packets to the flow managers 304, such that if a packet is identified as part of a flow, it is distributed to the same flow manager 304 that previously handled packets of that flow. In such an embodiment, if the packet is not identified as part of a flow, an additional load balancing algorithm (e.g., a consistent hash) is applied to the packet to determine the flow manager 304 to which to route the packet.
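The flow-based routing step, with a load-balancing fallback for new flows, might be sketched as follows; the manager names and the hash-based fallback are illustrative assumptions standing in for the router's actual algorithms:

```python
import hashlib

MANAGERS = ["flow-mgr-304A", "flow-mgr-304B", "flow-mgr-304C"]  # hypothetical

def _fallback(five_tuple):
    """Stand-in for the load-balancing step (e.g., a consistent hash);
    any stable choice over MANAGERS works for this sketch."""
    h = int(hashlib.sha256("|".join(map(str, five_tuple)).encode()).hexdigest(), 16)
    return MANAGERS[h % len(MANAGERS)]

def route(five_tuple, flow_table):
    """Pin packets of a known flow to the manager that previously handled
    the flow; only the first packet of a flow goes through load balancing."""
    if five_tuple in flow_table:
        return flow_table[five_tuple]
    manager = _fallback(five_tuple)
    flow_table[five_tuple] = manager
    return manager

table = {}
flow = ("198.51.100.7", 49152, "203.0.113.10", 443, "tcp")
first = route(flow, table)
assert route(flow, table) == first  # subsequent packets follow the flow
```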
Upon receipt of a data packet, the flow manager 304 may determine the data center 110 to which to route the packet, as well as the endpoint 202 within that data center 110. In one implementation, the flow manager 304 may apply a combination of criteria to select the data center 110 to which to route the packet, including network performance criteria and load balancing criteria. In a first aspect, for a given packet, the flow manager 304 may initially select a data center 110 based on network or geographic criteria between the global access point 106 and the group of various available data centers 110 or endpoints 202. The network or geographic criteria may correspond to a measure of network distance (e.g., across network 108) from the identified access point 106. The network or geographic criteria may also include, or be at least partially combined with, performance criteria such as latency, hop count, bandwidth, or a combination thereof. In general, routing packets to the data center 110 with the best network performance criteria may advantageously increase the speed of communication between the client device 102 and the data center 110. However, since network performance criteria are unlikely to shift quickly between access point 106 and data center 110, simply routing each packet to the data center 110 with the best expected performance may not achieve the load balancing requested by the service provider.
In addition to the measured or determined network or geographic criteria, each flow manager 304 may further modify the combined distribution criteria as needed to achieve the load balancing desired by the service provider. More specifically, in some embodiments, each pool of endpoints 202 (e.g., each data center 110) may be subdivided into different regions. Individual data centers 110 may associate subsets of endpoints 202 with different sub-regions or other groupings within a region. Thus, the customer may specify distribution criteria that identify a measured distribution, or a method of calculating the distribution, of the data packets provided to the data center 110. The distribution may illustratively be specified as a percentage of traffic, a total number of data packets, a total amount of data, a cost allocated or accounted to individual endpoints, etc. A system administrator may illustratively utilize a software tool or interface (e.g., an API) to provide the allocation, as will be described in various examples herein. In turn, the flow manager 304 may implement an algorithm (such as a coin-flipping algorithm) to implement the percentage selection.
Assume, in one set of illustrative examples, that the data centers 110 associated with a region are further allocated into three different sub-regions ("Sub-1", "Sub-2", and "Sub-3"). Using the tool, a system administrator may specify allocations as percentages. Various iterations will now be described with respect to an illustrative distribution of 100 packets:
Sub-region   Allocation   Number of packets
Sub-1        10%          10 packets
Sub-2        20%          18 (20% of the remaining 90)
Sub-3        100%         72 (100% of the remaining 72)
In the above example, the flow manager 304 may apply the distribution using a coin-flipping algorithm in a manner that assigns each data packet to one of the sub-regions. In the following example, the system administrator has not specified a complete allocation:
Sub-region   Allocation   Number of packets
Sub-1        10%          10 packets
Sub-2        20%          18 (20% of the remaining 90)
Sub-3        10%          7 (10% of the remaining 72)
In the above example, the illustrative allocation does not account for the complete set of packets. Thus, in one embodiment, the flow manager 304 may distribute the remaining packets randomly across the set of sub-regions. In other implementations, the flow manager 304 may assign them based on a default distribution algorithm (such as assigning to the closest geographic sub-region) or a manually selected distribution. In yet another embodiment, the flow manager 304 may further incorporate automatic linear ramping to allocate any additional traffic as follows:
Sub-region   Allocation   Number of packets
Sub-1        10%          10 packets
Sub-2        20%          18 (20% of the remaining 90)
Sub-3        10%          7 (10% of the remaining 72)
Automatic linear ramping of the 65 unassigned packets:
Sub-region   Allocation   Number of packets
Sub-1        90%          58 packets
Sub-2        100%         17 (20% of the remaining 17)
Sub-3        90%          0 (10% of the remaining 0)
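The sequential percentage allocation reflected in the tables above can be sketched as follows, with the per-packet coin flip replaced by deterministic counts for clarity; this is an illustration of the arithmetic, not the disclosed implementation:

```python
def allocate(total, percentages):
    """Apply each sub-region's percentage to the packets remaining after
    the earlier sub-regions have taken their share, as in the tables above."""
    remaining, counts = total, []
    for pct in percentages:
        take = remaining * pct // 100
        counts.append(take)
        remaining -= take
    return counts, remaining

# Complete allocation: 10% of 100, 20% of the remaining 90, 100% of the rest.
counts, leftover = allocate(100, [10, 20, 100])
assert counts == [10, 18, 72] and leftover == 0

# Incomplete allocation: 65 packets remain for random/default/ramped handling.
counts, leftover = allocate(100, [10, 20, 10])
assert counts == [10, 18, 7] and leftover == 65
```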
In another embodiment, rather than initially selecting the data center 110 based on geographic or network criteria between the access point 106 and the data center 110, the global access point 106 may be configured to initially select the data center 110 based on performance criteria between the client device 102 from which traffic is received and the initially selected data center 110. Advantageously, the use of network performance criteria between the client device 102 and the data center 110 may result in the data center 110 being consistently selected for a given client device 102 regardless of the access point 106 to which the client device's traffic is routed. After selecting the initial data center 110 (e.g., closest to the client device 102 in terms of distance or minimized delay), the access point 106 may modify the selection as needed to achieve load balancing specified by the service provider. For example, if the service provider wishes not to route more than 70% of the traffic to the initially selected data center 110 and the proportion of the traffic over a period of time exceeds the percentage, the access point 106 may select a different data center (e.g., the next best performing data center 110 relative to the client device 102). The access point 106 may then determine whether the request is routable to the different data center 110 based on the proportion of historical traffic routed to the different data center 110. This process may continue until access point 106 locates a data center 110 that is acceptable based on the service provider's load balancing specifications.
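The fallback selection just described (choose the best-performing data center relative to the client, then walk to the next-best while the provider's load-balancing cap is exceeded) might be sketched as follows; the data-center names and cap values are illustrative assumptions:

```python
def select_data_center(candidates, load, caps):
    """Walk data centers from best- to worst-performing (relative to the
    client device) and take the first whose historical traffic share is
    under the service provider's cap.

    candidates: data-center names ordered by performance criteria
    load: historical share of traffic per data center (0.0-1.0)
    caps: provider-specified maximum share per data center
    """
    for dc in candidates:
        if load.get(dc, 0.0) < caps.get(dc, 1.0):
            return dc
    return candidates[0]  # all over cap: fall back to the best performer

# The closest data center is over its 70% cap, so the next best is chosen.
ordered = ["us-east", "us-west", "eu-west"]
assert select_data_center(ordered, {"us-east": 0.72}, {"us-east": 0.70}) == "us-west"
```

Because the candidate ordering depends on the client rather than the access point, the same client tends to reach the same data center regardless of which access point attracted its traffic.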
In some embodiments, load balancing is implemented locally at each flow manager 304. In other embodiments, load balancing is implemented across all of the flow managers 304 of an access point 106. In still other embodiments, load balancing is implemented across the flow managers 304 of multiple access points 106. In general, localized load balancing is expected to be less resource intensive because it requires less communication between distributed components. However, less localized load balancing may result in traffic distributions that more closely match the criteria specified by the service provider.
In some cases, the flow managers 304 may implement a mix of localized and non-localized load balancing. For example, each flow manager 304 may implement localized load balancing (e.g., localized to each manager 304 or each access point 106) and periodically negotiate with other access points 106 to adjust the weights applied in selecting a data center 110. For example, in the event that a service provider requests that traffic be divided evenly between two data centers 110, purely localized load balancing may cause each of two access points 106 to divide its traffic evenly between the data centers 110. This may result in less than ideal routing, because half of the traffic at each access point 106 may be routed to a data center 110 that is not the closest. Thus, in this scenario, the access points 106 may communicate about their traffic routes and, assuming for the sake of illustration equal traffic volumes at each access point 106 (and considering only two access points 106), each access point 106 may begin routing all of its packets to the nearest data center 110. Such a division would still result in an even division of traffic between the data centers 110 and would additionally, and advantageously, increase the average network performance metrics of each packet flow.
In one embodiment, the desired proportion or volume of traffic routed to a given data center 110 may be statically selected by the service provider. For example, the service provider may request that no more than 70% of the requests at the access point 106 be routed to a given data center 110. In another embodiment, the desired ratio or volume may be dynamic. For example, the service provider may specify that a desired proportion or volume of traffic routed to a given data center 110 increases or decreases from a first point in time to a second point in time according to a given rate of change (e.g., linearly, exponentially, logarithmically, etc.).
After selecting a data center 110 to which to route traffic, flow manager 304 may select an endpoint 202 within the data center 110 to which to route the traffic. Endpoint 202 may be selected according to any load balancing algorithm. In one embodiment, flow manager 304 may utilize a consistent hash to select endpoint 202.
As discussed above with respect to flow manager 304, it may be desirable for traffic of a given flow to be consistently routed to the same endpoint 202, so that endpoint 202 may maintain state information regarding the flow. Thus, each flow manager 304 may utilize a flow selection algorithm to detect subsequent packets within a flow previously routed by the flow manager 304 to an endpoint 202 and route such packets to the same endpoint 202. In one embodiment, when flow manager 304 identifies a packet as part of a previously routed flow, flow manager 304 may omit the selection of a data center 110 and endpoint 202 for the packet, instead routing the packet to the endpoint 202 previously used for the flow. More specifically, flow manager 304 may implement a selection algorithm that attempts to consistently identify an endpoint 202. Illustratively, the selection algorithm may generate a score for each of the various endpoints 202 by applying attributes of the individual endpoint or of the communication. For example, the selection algorithm may be calculated over 5-tuple information (e.g., the source's IP address, the source's port, the destination's IP address, the destination's port, and the routing protocol). The selection algorithm may then process the scores generated for a given communication request (e.g., a connection to a service) by selecting the endpoint with the "highest" score.
In still other embodiments, the flow manager 304 may further optimize score generation by organizing the score calculations into a hierarchy. In one embodiment, endpoints 202 are grouped into pairs, allowing a score to be calculated for each pair based on its cumulative attributes. The pairs are further combined into successive combinations until only two remain, producing a pyramid-like hierarchy. Instead of calculating and comparing scores for every endpoint 202 in the group, the resulting pyramid enables binary partitioning and comparison of scores, making the selection of an endpoint 202 more efficient.
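The score-based selection described above resembles highest-random-weight ("rendezvous") hashing. A minimal sketch, omitting the pairwise hierarchy optimization and using hypothetical endpoint names:

```python
import hashlib

def _score(endpoint: str, five_tuple) -> int:
    """Deterministic per-(endpoint, flow) score."""
    key = endpoint + "|" + "|".join(map(str, five_tuple))
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

def select_endpoint(endpoints, five_tuple) -> str:
    """Score every endpoint against the connection's 5-tuple and take the
    highest; the same 5-tuple always yields the same endpoint."""
    return max(endpoints, key=lambda e: _score(e, five_tuple))

eps = ["endpoint-1", "endpoint-2", "endpoint-3"]
flow = ("198.51.100.7", 49152, "203.0.113.10", 443, "tcp")
assert select_endpoint(eps, flow) == select_endpoint(eps, flow)
```

A useful property of this scheme is that removing a non-selected endpoint does not change the winner for a given flow, so only the flows pinned to a failed endpoint need to be re-routed.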
After selecting an endpoint 202 to which to route a packet, flow manager 304 may modify the packet to facilitate routing to that endpoint 202. For example, when a packet is received at router 302, the destination network address of the packet may be a global network address. Flow manager 304 may thus modify the packet to replace the destination network address with the network address of endpoint 202. In one embodiment, each flow manager 304 implements NAT techniques to modify packets addressed to global network addresses. For example, for packets destined for endpoint 202, each flow manager 304 may replace the global network address with the network address of endpoint 202 as the destination network address of the packet, and replace the network address of client device 102 with the address of access point 106 as the source address. A similar translation may occur, in accordance with NAT techniques, for packets from endpoint 202 to be routed to client device 102. The flow manager 304 may illustratively use port translation (a known NAT technique) to distinguish between translated flows. After the translation, flow manager 304 may return the packet to router 302 for transmission over network 108 to the selected endpoint 202.
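A minimal sketch of NAT with port translation as described above; the addresses and the translation port range are illustrative assumptions:

```python
import itertools

class NatTable:
    """Minimal NAT with port translation: each client flow addressed to a
    global network address is rewritten to (access point address, unique
    port) before forwarding to the selected endpoint."""

    def __init__(self, access_point_addr):
        self.ap = access_point_addr
        self._ports = itertools.count(20000)  # assumed translation port range
        self._flows = {}     # (client addr, client port) -> NAT port
        self._reverse = {}   # NAT port -> (client addr, client port)

    def outbound(self, client_addr, client_port, endpoint_addr):
        """Rewrite a client-to-service packet for delivery to the endpoint."""
        key = (client_addr, client_port)
        if key not in self._flows:
            port = next(self._ports)
            self._flows[key] = port
            self._reverse[port] = key
        return {"src": self.ap, "sport": self._flows[key], "dst": endpoint_addr}

    def inbound(self, nat_port):
        """Map an endpoint's reply back to the originating client flow."""
        return self._reverse[nat_port]

nat = NatTable("192.0.2.1")
pkt = nat.outbound("198.51.100.7", 49152, "10.0.0.5")
assert nat.inbound(pkt["sport"]) == ("198.51.100.7", 49152)
```

The per-flow NAT port is what lets the flow manager disambiguate return traffic, since every translated packet carries the access point's address as its source.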
In another embodiment, flow manager 304 may utilize encapsulation to route packets to endpoint 202. Illustratively, each flow manager 304 may generate an IP "tunnel" to a device within the data center 110 (such as the session handoff manager 212 or a router within the data center 110). To route a packet to endpoint 202, flow manager 304 may encapsulate the packet and transmit it to the receiving device via the tunnel. The receiving device may then decapsulate the packet and transmit it to endpoint 202. In one embodiment, flow manager 304 replaces the destination address of the packet (e.g., the global network address of the service) with the destination address of the selected endpoint 202 to facilitate transmission of the packet to that endpoint. Tunneling the packet may provide benefits, such as preserving the network address of the client device 102 that transmitted the packet. As previously described, in a particular example, the flow manager 304 may establish a set of VLANs that utilize encapsulated communications to different subgroups of endpoints. Thus, encapsulation may enable, for example, client device 102 and endpoint 202 to maintain a connection-oriented communication session even when packets of client device 102 are routed through different access points 106. In some cases, flow manager 304 and endpoint 202 (or session handoff manager 212) may be configured to verify communications between each other to ensure the authenticity of packets transmitted between the devices. For example, each of flow manager 304 and endpoint 202 (or session handoff manager 212) may be configured to generate a digital signature that authenticates the respective device and to include such digital signature in the headers of encapsulated packets flowing between the devices, such that each device may validate the digital signature to verify that the packets were generated at a known device.
As another example, the flow manager 304 and endpoint 202 (or session handoff manager 212) may be configured to communicate via a known security protocol, such as the Transport Layer Security (TLS) protocol.
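A minimal sketch of the encapsulation flow described above, using dictionaries in place of real IP headers; all addresses are illustrative, and the digital-signature and TLS protections are omitted:

```python
def encapsulate(inner_packet, tunnel_src, tunnel_dst):
    """Wrap the client's packet in an outer header addressed to the tunnel
    endpoint, preserving the client's source address inside."""
    return {"outer": {"src": tunnel_src, "dst": tunnel_dst},
            "payload": inner_packet}

def decapsulate(outer_packet, endpoint_addr):
    """Receiving side: unwrap and re-address the inner packet to the
    selected endpoint, leaving the client's source address intact."""
    inner = dict(outer_packet["payload"])
    inner["dst"] = endpoint_addr
    return inner

client_pkt = {"src": "198.51.100.7", "dst": "203.0.113.10", "data": b"GET /"}
tunneled = encapsulate(client_pkt, "192.0.2.1", "10.0.0.2")
delivered = decapsulate(tunneled, "10.0.0.5")
assert delivered["src"] == "198.51.100.7"  # client address preserved
```

Preserving the client's source address inside the tunnel is what distinguishes this approach from NAT, where the access point's address replaces the client's.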
As noted above, the access point 106 may in some cases be configured to conduct an initialization phase of a connection-oriented communication session in order to reduce the time required for that phase (e.g., due to the relative proximity of the access point 106 to the client device 102 as compared to the data center 110). To facilitate such functionality, each flow manager 304 may conduct the initialization phase with a client and provide information related to the initialized session to a device within data center 110, such as endpoint 202 or session handoff manager 212. For example, the information may be provided via a tunnel. Illustratively, the flow manager 304 may generate a tunnel to the session handoff manager 212 and communicate within the tunnel a "TCP handoff" command that includes TCP session state information. The state information may include, for example, a TCP "five-tuple" (five values that typically define a TCP connection: source address, source port, destination address, destination port, and protocol in use), or a portion thereof, and/or the sequence numbers of the TCP connection. Upon receipt of a TCP handoff command, session handoff manager 212 may generate a corresponding entry in its own stateful session table (e.g., a TCP table), thus "employing" the connection-oriented communication session. The flow manager 304 may then transmit packets from the client device 102 via the tunnel, which may be decapsulated and processed at the session handoff manager 212 as part of the employed session. In embodiments where session handoff manager 212 is used to complete the handoff of a stateful session, flow manager 304 may not need to select an endpoint 202 to which to transmit packets. Rather, the flow manager 304 may be configured to consistently select the appropriate session handoff manager 212 within the selected data center 110 as the destination for the client device's 102 packets.
Session handoff manager 212 may in turn select endpoint 202 within data center 110. Thus, the use of session handoff manager 212 may transfer responsibility for selecting endpoint 202 from stream manager 304 to session handoff manager 212. Selecting endpoint 202 at session handoff manager 212 may occur similarly to such selections made by stream manager 304.
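The "TCP handoff" exchange described above might be sketched as follows, with the session state serialized as JSON for illustration; the message format, field names, and sequence numbers are assumptions, not the actual protocol:

```python
import json

def make_tcp_handoff(five_tuple, client_seq, ap_seq):
    """Build the state a flow manager would send through the tunnel so a
    session handoff manager can adopt the initialized TCP session."""
    src, sport, dst, dport, proto = five_tuple
    return json.dumps({
        "cmd": "tcp_handoff",
        "five_tuple": {"src": src, "sport": sport,
                       "dst": dst, "dport": dport, "proto": proto},
        "seq": {"client": client_seq, "access_point": ap_seq},
    })

def adopt(session_table, handoff_msg):
    """Session handoff manager side: install the connection's sequence
    state into the local stateful session table."""
    state = json.loads(handoff_msg)
    key = tuple(state["five_tuple"].values())
    session_table[key] = state["seq"]
    return key

table = {}
msg = make_tcp_handoff(("198.51.100.7", 49152, "203.0.113.10", 443, "tcp"),
                       client_seq=1001, ap_seq=5001)
key = adopt(table, msg)
assert table[key] == {"client": 1001, "access_point": 5001}
```

With the five-tuple and both sequence numbers installed, the receiving side can continue the session exactly where the access point's three-way handshake left off.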
In addition to one or more routers 302 and the flow managers 304, the global access point 106 of fig. 3 includes a health check device 306. The health check device 306 may illustratively transmit health checks to the flow managers 304 in order to determine whether a given flow manager 304 is malfunctioning. As discussed above, the transmission of health check data may occur similarly to the transmission of health check data by the health check devices 204 within the data center 110. Additionally or alternatively, the health check device 306 may transmit health check data to endpoints 202 and/or client devices 102. Health check data received with respect to an endpoint 202 may facilitate routing decisions for that endpoint 202. For example, if the health check device 306 is unable to communicate with an endpoint 202, a flow manager 304 may stop selecting that endpoint 202 as a destination for traffic, regardless of whether the health check device 204 local to the endpoint 202 reports that the endpoint 202 is healthy. The health check data collected with respect to client devices 102 may illustratively be used to modify availability advertisements (e.g., BGP advertisements) to devices of network 104. For example, an operator of the global access point 106 may review health check data associated with client devices 102 to adjust advertisements in an attempt to redirect traffic from some of the devices 102 to an access point 106 different from (e.g., closer than) the one to which they are currently routed on the network 104.
The global access point 106 of fig. 3 also includes a configuration manager 308, the configuration manager 308 being configured to receive configuration information related to services associated with the global network address and to configure the operation of the global access point 106 to implement such configuration. For example, configuration manager 308 may receive information about how router 302 should advertise global network addresses to network 104, information mapping services to global network addresses routed by access points 106 that become available at data centers 110, information identifying data centers 110 and endpoints 202 for such services, information specifying desired load balancing between such data centers 110, and the like. In one embodiment, the configuration manager 308 retrieves configuration information from the configuration data store 210 within the data center 110. For example, the configuration manager 308 may periodically poll the data store 210 for new configuration information. As another example, the data center 110 may "push" configuration changes to the configuration manager 308 using a variety of known push notification techniques. For example, the data center 110 may implement a publish-subscribe ("pub/sub") messaging system, and the configuration manager 208 may publish changes to the configuration of the service provider to the system. The system may then notify the configuration manager 308 of the access point 106 of such changes. Thus, as service providers modify the configuration of their services and global network addresses at data center 110, such modifications may propagate to access point 106. In one embodiment, the access point 106 does not store configuration information in the persistent storage area (and may lack any such persistent storage area) in order to reduce the likelihood that such information may be obtained from the access point 106.
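The publish-subscribe propagation of configuration changes described above might be sketched as follows; the message shape and service names are illustrative assumptions:

```python
class ConfigBus:
    """Toy pub/sub channel: the data center's configuration manager 208
    publishes changes; access-point configuration managers 308 subscribe."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, change):
        for callback in self._subscribers:
            callback(change)

# An access point's local view of service configuration, updated on push.
access_point_config = {}
bus = ConfigBus()
bus.subscribe(access_point_config.update)

# Data-center side: a service provider modifies its load-balancing split.
bus.publish({"svc-a": {"load_split": {"us-east": 0.7, "us-west": 0.3}}})
assert "svc-a" in access_point_config
```

In the push model, the access point holds configuration only in memory, consistent with the embodiment above in which access points lack persistent storage.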
Although examples are discussed above with respect to a single network-accessible service, each access point server 402 (described below with respect to fig. 4) may be associated with multiple services. For example, where each flow manager 304 is tasked with handling packets addressed to a subset of global network addresses within a group of such addresses, each flow manager 304 may thus handle traffic addressed to any service associated with the global network addresses within that subset.
Although only some components of access point 106 are shown in communication with networks 104 and 108, other components may additionally be in communication with network 108 and/or network 104. The lines of fig. 3 are not intended to represent all actual or potential network connections, but rather illustrate possible flows of service-related traffic through access point 106.
The data center 110 of fig. 2 and the global access point 106 of fig. 3 may each operate in a distributed computing environment including one or more computer systems that are interconnected using one or more computer networks (not shown in the respective figures). The data center 110 of fig. 2 and the global access point 106 of fig. 3 may also operate within a computing environment having a fewer or greater number of devices than illustrated in the respective figures. Thus, the depictions of the data center 110 in fig. 2 and the global access point 106 in fig. 3 should be considered illustrative and not limiting of the present disclosure. For example, the data center 110 of fig. 2, the global access point 106 of fig. 3, or various components thereof, may implement various web service components, hosted or "cloud" computing environments, and/or peer-to-peer network configurations to implement at least a portion of the processes described herein.
Fig. 4 depicts a general architecture of an exemplary computing system (referred to as an access point server 402) that operates to implement the stream manager 304 of the access point 106. The general architecture of the access point server 402 depicted in fig. 4 includes an arrangement of computer hardware and software modules that may be used to implement aspects of the present disclosure. The hardware modules may be implemented with physical electronics, as discussed in more detail below. The access point server 402 may include many more (or fewer) elements than those shown in fig. 4. However, it is not necessary to illustrate all of these generally conventional elements in order to provide a practical disclosure. Additionally, the general architecture illustrated in fig. 4 may be used to implement one or more of the other components illustrated in fig. 2 and 3. As shown, the access point server 402 includes one or more processing units 490, one or more network interfaces 492, and one or more computer-readable medium drives 494, all of which may communicate with each other via a communication bus. The network interface 492 may provide connectivity to one or more networks or computing systems, such as the router 302 (which may correspond to, for example, a commercially available router device). The processing unit 490 may thus receive information and instructions from other computing systems or services via a network, such as network 104 or 108. The processing unit 490 may also communicate to and from memory 480.
Memory 480 may contain computer program instructions (grouped into modules in some implementations) that are executed by processing unit 490 in order to implement one or more aspects of the present disclosure. Memory 480 typically includes random access memory (RAM), read-only memory (ROM), and/or other persistent, auxiliary, or non-transitory computer-readable media. Memory 480 may store an operating system 482 that provides computer program instructions for use by the processing unit 490 in the general management and operation of the access point server 402. Memory 480 may also include computer program instructions and other information for implementing aspects of the present disclosure. For example, in one embodiment, memory 480 includes one or more stream manager units 483, where each stream manager unit represents code executable to implement the stream manager 304 of FIG. 3. Each stream manager unit 483 may illustratively be isolated from other units 483 on the server 402. For example, each unit may represent a separate virtual machine or an isolated software container. In some cases, each unit 483 may be associated with a separate processing unit 490, interface 492, or drive 494, thereby minimizing the likelihood that operation of one unit 483 will affect operation of another unit 483.
Each unit 483 illustratively includes: an endpoint selector unit 484, representing code executable to select an endpoint 202 to which to route packets addressed to a global network address; a flow table 486, representing an information table mapping flows of packets to endpoints 202; a NAT unit 488, representing code executable to perform NAT on packets addressed to a global network address, or on responses to such packets from the endpoints 202; and a session handoff unit 489, representing code executable to conduct an initialization phase of a connection-oriented communication session and to hand off the session to a receiving device. Although not shown in fig. 4, memory 480 also illustratively includes an encapsulation unit representing code executable to generate a tunnel connection to another device, enabling transmission of encapsulated packets, and to encapsulate/decapsulate packets to facilitate such transmission.
The memory 480 may also include: a health check unit 496, the health check unit 496 corresponding to instructions executable to implement the health check device 306; and a configuration manager unit 498, said configuration manager unit 498 corresponding to instructions executable to implement the configuration manager 308. In some embodiments, the health check device 306 and the configuration manager 308 may be implemented as separate devices rather than as part of the access point server 402. Further, while shown as distinct from access point server 402, router 302 may be incorporated into server 402 in some cases (e.g., by including software in memory 480 that is executable to implement routing functions).
In one embodiment, the access point server 402 is devoid of any non-volatile memory and is configured to operate only with respect to volatile memory, such as RAM. Illustratively, the access point server 402 may be configured to use a pre-boot execution environment (PXE) such that upon initialization, the access point server 402 retrieves at least a portion of the contents of the memory 480 (such as the operating system 482, the health check unit 496, and the configuration manager unit 498) from a network-accessible storage location (e.g., the configuration data store 210). The configuration manager unit 498 may thereafter retrieve additional configuration information from the network-accessible storage location, such as the configuration of individual services and associated global network addresses, and utilize such additional information to generate stream manager units 483. To prevent unauthorized disclosure of the contents of the memory 480, authentication of the server 402 at the storage location may be at least partially linked to the network location of the server 402 (e.g., at the access point 106), such that attempting to physically relocate the server 402 may result in an inability to retrieve the contents of the memory 480.
Although fig. 4 depicts a single server 402 and router 302, in some cases, global access point 106 may be implemented by multiple servers 402 and/or routers 302. In some cases, such servers 402 or routers 302 may be physically or logically isolated to avoid propagation of errors between such servers 402 or routers 302. Illustratively, where the access point 106 handles multiple pools of network addresses, each pool may be handled by a different server 402 and router 302. Thus, if one router 302 and/or server 402 fails, only the services associated with the pool handled by that router 302 and/or server 402 are expected to be affected.
With reference to fig. 5, illustrative interactions will be described depicting how an individual global access point 106A may operate resiliently by providing multiple routes to reach network-accessible services utilizing global network addresses. Fig. 5 depicts an environment 500, which may in turn represent an embodiment of a portion of the environment 100 of fig. 1. Specifically, in environment 500, the network 104 is divided into networks 104A-104E, each of which may represent, for example, an autonomous system. Networks 104A-104C are illustrated as communicating with client device 102. These networks 104A-104C may represent, for example, Internet Service Providers (ISPs) of the client device 102. Networks 104D and 104E represent other autonomous systems (ASes) with which global access point 106A has a network connection. Although networks 104D and 104E are not shown in fig. 5 as being connected to client device 102, such networks may also act as ISPs to client device 102. Each of the networks 104A-104E is shown in fig. 5 as being interconnected with the other networks 104A-104E. This configuration is illustrative and may vary from implementation to implementation.
In the illustration of fig. 5, client device 102 generally has two routes by which it may reach global access point 106A: via network 104D and via network 104E. To increase the resilience of access to the global access point 106A, the access point 106A may selectively transmit availability advertisements to the networks 104D and 104E. Specifically, as shown in fig. 5, at (1), the global access point 106A may identify at least two groups of global network addresses, such as two contiguous network address ranges. Thereafter, rather than advertising all global network addresses to both networks 104D and 104E, the global access point 106A transmits an advertisement (e.g., a BGP advertisement) for the first group to the network 104D at (2). Similarly, at (3), the global access point 106A transmits an advertisement (e.g., a BGP advertisement) for the second group to the network 104E. In this way, traffic addressed to an address within the first group is likely to reach access point 106A through network 104D, and traffic addressed to an address within the second group is likely to reach access point 106A through network 104E. Each service may be associated with global network addresses from at least two groups. Thus, if an error or problem occurs on either of the networks 104D or 104E, the client device 102 may utilize an alternate global network address of the service to access the global access point 106A through the remaining network.
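The pool-splitting interaction above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: the `split_pools` helper and the `network-N` labels are hypothetical, and a real access point would emit BGP advertisements rather than build a dictionary.

```python
import ipaddress

def split_pools(supernet: str, num_neighbors: int):
    """Split a contiguous global address range into one pool per
    neighboring network (hypothetical helper for illustration)."""
    net = ipaddress.ip_network(supernet)
    # Divide the supernet into equal-sized contiguous sub-ranges.
    new_prefix = net.prefixlen + (num_neighbors - 1).bit_length()
    return list(net.subnets(new_prefix=new_prefix))[:num_neighbors]

# Each neighbor (e.g., an adjacent autonomous system) receives an
# advertisement for a different pool, so a service reachable via an
# address in each pool has two independent routes to the access point.
pools = split_pools("192.0.2.0/24", 2)
advertisements = {f"network-{i}": str(pool) for i, pool in enumerate(pools)}
print(advertisements)  # {'network-0': '192.0.2.0/25', 'network-1': '192.0.2.128/25'}
```

A client holding one address from each pool can fall back to the second route if the first neighboring network fails.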
With reference to fig. 6, illustrative interactions will be described for operation of a global access point 106 in facilitating communication between the client device 102 and a data center 110 providing an endpoint 202 for a network-accessible service. The interactions of fig. 6 begin at (1), where the client device 102 transmits a data packet addressed to a global network address associated with the global access point 106A. For example, the data packet may be formatted according to the Transmission Control Protocol (TCP), the User Datagram Protocol (UDP), or the Internet Control Message Protocol (ICMP). Because the global access point 106A has advertised that the global network address to which the data packet is addressed is available via the global access point 106A, the packet may be transmitted over network 104 and routed to the global access point 106A via operation of the network 104. For example, the global access point 106A may be the closest (e.g., in terms of network distance) global access point 106 to the client device 102.
At (2), upon receipt of the data packet, router 302 within global access point 106A assigns the packet to a flow manager 304 within the access point 106A. In the event that the data packet is not associated with an existing data flow, router 302 may utilize ECMP load balancing to assign the packet to a flow manager 304 based on the network address to which the packet is addressed. In the case where the data packet is associated with an existing data flow, router 302 may route the packet to the same flow manager 304 to which previous packets within the flow were routed.
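As a rough sketch of this ECMP-style assignment (the hashing scheme and function name below are illustrative assumptions, not the patent's implementation), the router can hash the destination global network address so that all packets addressed to one global address consistently land on one flow manager:

```python
import hashlib

def assign_flow_manager(dst_addr: str, num_managers: int) -> int:
    """Hash the destination global network address to pick a flow
    manager index, mimicking ECMP-style load balancing (sketch only)."""
    digest = hashlib.sha256(dst_addr.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_managers

# Packets addressed to the same global address reach the same manager,
# so flow state for that address stays in one place.
manager = assign_flow_manager("198.51.100.7", 4)
```

A real router would hash in hardware over header fields, but the property being relied on is the same: determinism of the hash for a given destination.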
At (3), the flow manager 304 within the access point 106A assigns the data packet to a data center 110. In the event that the data packet is not associated with an existing data flow, the flow manager 304 may apply a load balancing algorithm to select a data center 110 from a set of available data centers 110. As described above, the selection algorithm may, for example, initially select a data center 110 based on a combination of network and geographic criteria, together with a selected distribution algorithm. As explained above, the distribution algorithm may correspond to a distribution based on a distribution criterion (such as a percentage distribution). In this way, data packets of the client device 102 may be routed to the same data center 110 even when they are rerouted to different access points 106. Where the data packet is associated with an existing data flow, the flow manager 304 may route the packet to the same data center 110 to which previous packets within the flow were routed.
At (4), after selecting the data center 110, the flow manager 304 may select an endpoint 202 within the selected data center 110. For a given data center 110, the flow manager 304 may illustratively maintain a set of endpoints 202 that provide the network-accessible service associated with the global network address to which the data packet is addressed. In one embodiment, this set of endpoints 202 may be identified based on information received from the resource manager 206 within the data center 110. The flow manager 304 may further maintain information regarding the apparent health of the endpoints 202, as obtained from the health check devices 204 within the data center 110, the health check devices 306 within the access point 106A, or both. The flow manager 304 may thus select a healthy endpoint from the set of endpoints 202 for the service associated with the network address to which the packet is addressed. The flow manager 304 may utilize any of a number of known load balancing algorithms, such as consistent hashing, to distribute packets among the endpoints 202. More specifically, as described above, in one embodiment, the flow manager 304 may implement a weighted rendezvous hashing or weighted consistent hashing algorithm based on the 5-tuple information of the data packet. The flow manager 304 may then select the "highest"-scoring endpoint 202 to achieve a consistent selection across the complete data flow. In the case where the packet is associated with an existing packet flow, the flow manager 304 may select the same endpoint 202 to which previous packets in the flow were routed.
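A minimal sketch of weighted rendezvous ("highest random weight") hashing as described above. The scoring formula shown (`-weight / ln(u)`) is one common formulation of weighted HRW; the endpoint names, weights, and flow-key encoding are illustrative assumptions:

```python
import hashlib
import math
from typing import Iterable, Tuple

def weighted_rendezvous(flow_key: str, endpoints: Iterable[Tuple[str, float]]) -> str:
    """Score every endpoint against the flow key and pick the highest
    score; every packet of one flow then consistently maps to one endpoint."""
    best_ep, best_score = None, float("-inf")
    for ep, weight in endpoints:
        digest = hashlib.sha256(f"{flow_key}|{ep}".encode()).digest()
        h = int.from_bytes(digest[:8], "big")
        u = (h + 1) / (2**64 + 1)       # map hash to uniform value in (0, 1)
        score = -weight / math.log(u)   # weighted highest-random-weight score
        if score > best_score:
            best_ep, best_score = ep, score
    return best_ep

# The 5-tuple is serialized into a single key; higher weight means a
# proportionally larger share of flows lands on that endpoint.
flow = "10.0.0.1|49152|203.0.113.5|443|tcp"
eps = [("endpoint-a", 1.0), ("endpoint-b", 2.0)]
chosen = weighted_rendezvous(flow, eps)
```

A useful property of rendezvous hashing here: removing an unhealthy endpoint only remaps the flows that were scored highest on that endpoint, leaving all other flows untouched.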
After selecting an endpoint, the flow manager 304 within the access point 106A modifies the packet as needed to route the packet to the selected endpoint 202. Specifically, in the embodiment of fig. 6, the flow manager 304 applies NAT techniques to the packets at (5), such as by replacing the source and destination network addresses of the packets. Thereafter, at (6), the stream manager 304 transmits the packet to the selected endpoint 202 within the selected data center 110.
At (7), endpoint 202 processes the packet according to the network-accessible service. As described, the data center 110 may implement a network device, such as a top-of-rack ("TOR") switch, that may distribute packet traffic to endpoints based on assigned sub-portions or fabric layers. More specifically, in one aspect, the TOR switch may utilize 5-tuple information included in the communication to route the data packet based on the assigned sub-portion. Still further, the communication may be encapsulated, such as in a VLAN, to provide further security. In the event of a failure, the network endpoint 202 associated with the packet may broadcast a path MTU discovery ("PMTUD") message to other endpoints within the assigned subgroup or sub-portion to maintain state information or to determine which sub-portions are to process the service request, so as to avoid duplicate processing/assignment. This allows the endpoints to receive information quickly and avoids duplicate processing. Endpoint 202 then returns a response packet to global access point 106A. Since the response packet is associated with an existing data flow, router 302 within access point 106A directs the response packet to the same flow manager 304 discussed above. The flow manager 304 inverts the previous NAT transformation to address the packet to the client device 102. The flow manager 304 then returns the data packet to the client device 102. Thus, via the interactions of fig. 6, the client device 102 may address a data packet to a global network address and have it routed (via a global access point 106) to an endpoint providing the network-accessible service associated with that network address. The access point 106 may apply load balancing to the packets according to a desired configuration of the service provider, such that the load on the service is distributed among the data centers 110.
With reference to fig. 7, an illustrative routine 700 for increasing the resilience of global network addresses by selecting advertising addresses to different adjacent devices will be described. The routine 700 may be illustratively implemented by the global access point 106 (e.g., during initialization of the access point 106). In one embodiment, an instance of the routine 700 is implemented by each access point 106.
The routine 700 begins at block 704, where global network addresses to be serviced via the access point 106 are obtained. This information may illustratively be obtained as part of the configuration of the access point 106 (such as within a configuration file stored in the configuration data store 210) during initialization of the access point 106. The global network addresses may illustratively be partitioned into different addressing pools in the configuration. For example, each pool may include a different "block" of network addresses, such as a contiguous address range. A range may, for example, represent a "subnet". In one embodiment, a range is represented by a "prefix" indicating the first n bits of the network addresses in the range. For example, the prefix 192.0.2.0/24 (expressed in terms of Classless Inter-Domain Routing, or CIDR) may represent the first 24 bits of an IPv4 address, corresponding to addresses in the range of 192.0.2.0 to 192.0.2.255.
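The CIDR example in the text can be demonstrated directly with Python's standard `ipaddress` module; this merely illustrates what the /24 prefix denotes:

```python
import ipaddress

# The prefix 192.0.2.0/24 fixes the first 24 bits, covering the 256
# addresses from 192.0.2.0 through 192.0.2.255.
pool = ipaddress.ip_network("192.0.2.0/24")
print(pool.num_addresses)       # 256
print(pool.network_address)     # 192.0.2.0
print(pool.broadcast_address)   # 192.0.2.255
print(ipaddress.ip_address("192.0.2.17") in pool)  # True
```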
At block 708, the access point 106 selects a neighboring device to which to advertise each pool of network addresses. In general, the available pools may be divided among the available neighboring devices in order to provide different network paths to the access point. In one embodiment, this selection is based on specifications within the configuration of the access point 106. In another embodiment, this selection is determined by the access point 106. Illustratively, the access point 106 may divide the pools evenly among the available neighboring devices (which may be discovered, for example, through typical routing protocols). In some cases, neighboring devices may additionally be determined based on preferred routes for traffic. For example, the access point 106 may be configured not to advertise a given pool of network addresses to a given neighboring device, in order to cause that neighboring device to route requests for addresses within the pool to an alternative access point 106.
Thereafter, at block 710, the global access point 106 transmits an availability advertisement for the corresponding network address pool to the associated neighboring device. For example, the advertisement may be a BGP protocol advertisement. The transmission of BGP protocol advertisements is generally known in the art and will therefore not be described in detail herein.
As described above, partitioning global network addresses into pools and advertising those pools to different neighboring devices may advantageously increase the resilience of the access point 106, particularly against outages or errors on neighboring networks. In particular, since each network-accessible service may be associated with global network addresses from different pools (e.g., one address per pool), the client devices 102 of the service may be made aware of multiple routes for reaching the service (e.g., via DNS). This technique may further increase the resilience of the access point 106 against limited outages at the access point 106 itself, such as an outage of a network interface or router connected to a neighboring network. As noted above, in one embodiment, the resilience of each service is further increased by dividing each pool into subsets of network addresses, which in turn may be distributed among the flow managers 304 of the access points 106. The association between services and subsets may be "reorganized" between pools. For example, the global network address of a service in a given pool may be selected randomly, or may be selected via a different selection mechanism for each pool. This may result in the grouping of services within subsets being "reorganized" between pools. In this way, if the configuration of an individual service causes problems for other services within its subset, the other services affected by the misconfiguration are likely to vary from pool to pool. Since each service may be expected to be accessible via addresses in multiple pools, a client device 102 may connect to a given service via an alternate address from another pool, thereby bypassing the problematic subset and pool. Thus, this reorganization mechanism can greatly improve the resilience of services using global network addresses.
Referring to fig. 8, an illustrative routine 800 for routing traffic addressed to a global network address associated with a service provided by an endpoint 202 within the data center 110 will be described. The routine 800 may be illustratively implemented by the flow manager 304 within the access point 106. In one embodiment, an instance of routine 800 is implemented by each stream manager 304 of each access point 106.
The routine 800 begins at block 802, where the flow manager 304 receives a data packet addressed to a global network address advertised as accessible at the access point 106. The data packet may, for example, represent a request to access a network-accessible service made available via the network address. In one example, the data packet is formatted as a TCP or UDP data packet.
At block 803, the routine 800 varies depending on whether the data packet is associated with an existing packet flow. Illustratively, the access point 106 may compare attributes of the data packet with attributes of previous packets to determine whether the new data packet is within the same communication flow as a previous packet. Any number of flow identification techniques (a variety of which are known in the art) may be used to determine whether a data packet is within an existing flow.
If the packet is part of an existing flow, the routine 800 proceeds to block 813 where the flow manager 304 selects the same endpoint 202 selected for a previous packet within the flow as the endpoint 202 of the packet. In one implementation, each stream manager 304 may maintain a cache in memory that associates streams with endpoints 202 to facilitate this selection.
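Such an in-memory cache might look like the following sketch, keying flows by their 5-tuple; the class and method names are assumptions introduced for illustration:

```python
from typing import Dict, Optional, Tuple

# (src ip, src port, dst ip, dst port, protocol)
FiveTuple = Tuple[str, int, str, int, str]

class FlowTable:
    """Minimal in-memory flow table mapping a packet's 5-tuple to the
    endpoint previously chosen for that flow (illustrative sketch)."""
    def __init__(self) -> None:
        self._flows: Dict[FiveTuple, str] = {}

    def lookup(self, flow: FiveTuple) -> Optional[str]:
        # None means a new flow: run endpoint selection (e.g., block 804+).
        return self._flows.get(flow)

    def record(self, flow: FiveTuple, endpoint: str) -> None:
        self._flows[flow] = endpoint

table = FlowTable()
flow = ("10.0.0.1", 49152, "203.0.113.5", 443, "tcp")
assert table.lookup(flow) is None          # first packet: new flow
table.record(flow, "endpoint-a")
assert table.lookup(flow) == "endpoint-a"  # later packets reuse the choice
```

A production flow table would also expire idle entries; that bookkeeping is omitted here.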
If the packet is not part of an existing flow, the routine 800 proceeds to block 804, where the flow manager 304 identifies a network accessible service associated with the network address to which the packet is addressed. The flow manager 304 may illustratively access information mapping network addresses to associated services (e.g., as maintained in memory of the access point 106) to identify the services associated with the addresses.
Thereafter, at block 806, the access point 106 selects a data center 110 providing the network-accessible service. In one embodiment, the data center 110 is selected based at least in part on geographic and network performance criteria between the access point 106 and the data center 110, and on a load distribution algorithm for the network-accessible service. Still further, in one embodiment, a threshold desired maximum may be specified by the service provider, such as via a desired percentage of traffic to be routed to each data center 110. In another embodiment, the threshold desired maximum may be determined jointly by the plurality of access points 106, such that the combined thresholds of the access points 106 achieve the service provider's desired division of traffic between the data centers 110. For example, the access points 106 may implement a selection algorithm that aggregates traffic volumes to the service across the access points 106 and determines an initially optimal route for each data packet based on the access point 106 at which the packet is received (and the network performance criteria corresponding to each data center 110). The algorithm may then modify the optimal route for each access point 106 as needed to achieve a globally optimal routing, resulting in an individualized desired proportion between the data centers 110 for each access point 106. If the initially selected data center 110 has already reached its threshold desired maximum, the access point may modify the initial selection to an alternative data center, such as the next closest data center. In one embodiment, the implementation of block 806 is repeated for each data center 110 selected, to ensure that no data center receives more than its desired maximum proportion or volume of data packets.
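One way to sketch the threshold-bounded selection described above. The latencies, shares, and data center names are invented for illustration, and the real selection jointly considers geographic and network criteria rather than latency alone:

```python
def select_data_center(candidates, max_share, observed_share):
    """Pick the lowest-latency data center, but spill over to the next
    closest when a center already receives its desired maximum share of
    traffic (hypothetical sketch of the selection described above)."""
    for dc in sorted(candidates, key=lambda d: d["latency_ms"]):
        if observed_share.get(dc["name"], 0.0) < max_share[dc["name"]]:
            return dc["name"]
    # All centers at capacity: fall back to the closest one.
    return min(candidates, key=lambda d: d["latency_ms"])["name"]

dcs = [{"name": "dc-east", "latency_ms": 12}, {"name": "dc-west", "latency_ms": 48}]
shares = {"dc-east": 0.72, "dc-west": 0.10}   # observed traffic shares
caps = {"dc-east": 0.70, "dc-west": 0.30}     # provider-configured maxima
chosen_dc = select_data_center(dcs, caps, shares)  # dc-east is over its cap
```

Here dc-east is closer but already above its 70% cap, so the selection spills to dc-west.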
After selecting data center 110, flow manager 304 selects an endpoint 202 within data center 110 to which to route the data packet. In one embodiment, flow manager 304 may utilize a consistent hash to select endpoint 202 based on a property of the packet (e.g., source IP address). As described above, in one embodiment, the stream manager 304 may implement a weighted rendezvous hash or weighted consistent hash algorithm based on the 5-tuple information of the data packet. Flow manager 304 may then select endpoint 202 based on the "highest" scoring endpoint to achieve consistent selection of the complete data flow.
Thereafter, at block 810, the flow manager 304 applies NAT to the data packet. Illustratively, the stream manager 304 may replace the source address of the packet with the address of the global access point 106 (e.g., a unicast address, rather than a global network address), and replace the destination address with the address of the endpoint 202 (which may also be a unicast address of the endpoint 202). For example, the flow manager 304 may also modify the port number of the packet to facilitate later application of NAT technology to packets within the flow. Flow manager 304 may then transmit the packet to the selected endpoint 202. Routine 800 may then end at block 814.
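The NAT rewrite described above can be sketched as pure functions over a small packet record. The field names, addresses, and the NAT-assigned port are illustrative assumptions, not the patent's implementation:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Packet:
    src_addr: str
    src_port: int
    dst_addr: str
    dst_port: int

def apply_nat(pkt: Packet, ap_addr: str, nat_port: int, endpoint_addr: str) -> Packet:
    """Forward direction: the access point's unicast address (with a
    NAT-assigned port) becomes the source; the endpoint the destination."""
    return replace(pkt, src_addr=ap_addr, src_port=nat_port, dst_addr=endpoint_addr)

def reverse_nat(resp: Packet, global_addr: str, client_addr: str, client_port: int) -> Packet:
    """Reverse direction: responses appear to come from the global
    address and are re-addressed to the original client."""
    return replace(resp, src_addr=global_addr, dst_addr=client_addr, dst_port=client_port)

client_pkt = Packet("10.0.0.1", 49152, "198.51.100.7", 443)  # to global address
fwd = apply_nat(client_pkt, ap_addr="192.0.2.10", nat_port=61000,
                endpoint_addr="172.16.0.5")
resp = Packet("172.16.0.5", 443, "192.0.2.10", 61000)        # endpoint's reply
back = reverse_nat(resp, "198.51.100.7", "10.0.0.1", 49152)
```

The NAT-assigned port (61000 here) is what lets the flow manager match the endpoint's reply back to the original client flow.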
Various modifications to the routine 800 are contemplated herein. For example, while the routine 800 includes three blocks (blocks 804, 806, and 808) associated with selecting an endpoint 202 for packets not associated with an existing flow, some implementations of the routine 800 may exclude one or more of these blocks. Illustratively, rather than individually identifying a service and a data center associated with the service, the flow manager 304 may maintain data mapping network addresses to the data centers 110 and endpoints 202 associated with those addresses. The flow manager 304 may then select a data center 110 based on the criteria described above without directly identifying the service associated with the address. In other embodiments, the flow manager 304 may maintain only data associating network addresses with corresponding endpoints 202, and may select an endpoint 202 for a packet directly, without first identifying the service associated with the address or the data center 110 providing the service. For example, each endpoint 202 may be selected in a manner similar to the selection of a data center 110 described above: the endpoint 202 with the best performance criteria value is selected, and the selection is modified when that endpoint 202 is already receiving its desired proportion of data volume. Further, although the routine 800 is shown in fig. 8 as ending at block 814, the flow manager 304 may continue to perform other actions after the routine 800, such as acting as a NAT device for communications between the client device 102 and the endpoint 202. Thus, the depiction of the routine 800 in FIG. 8 is one embodiment of the concepts disclosed herein.
Referring to fig. 9, an illustrative routine 900 for selecting, at a global access point 106, a data center 110 that provides a network-accessible service will be described. The routine 900 may illustratively be implemented by the flow manager 304 in a global access point 106. In one embodiment, an instance of the routine 900 is implemented by each flow manager 304 of each access point 106.
Routine 900 begins at block 902, where the flow manager 304 obtains routing configuration information. Illustratively, the routing configuration information may include an allocation or distribution of zones/sub-zones. At block 904, the flow manager determines geographic proximity or other performance criteria. Illustratively, the distribution may be specified as a percentage of traffic, a total number of data packets or total amount of data, a cost of allocation or accounting for individual endpoints, and the like. A system administrator may illustratively utilize a software tool or interface (e.g., an API) to provide the allocation, as will be described in the various examples herein.
At block 906, the flow manager forms a consideration set based on the product of the geographic proximity and the configured distribution. As described above, the flow manager 304 may implement an algorithm (such as a coin-flipping algorithm) to implement the percentage-based selection. The algorithm may also consider additional factors, such as geographic or network criteria. At decision block 908, a test is conducted to determine whether a full allocation exists. If so, the routine sets the configuration at block 912, and the routine terminates at block 914. If not, the flow manager 304 may proceed by using a default procedure or an automatic linear process.
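A percentage-based "coin flip" selection, as mentioned above, might be sketched as follows. The distribution values and data center names are invented, and the additional geographic/network factors are omitted:

```python
import random

def pick_by_percentage(distribution, rng=random):
    """'Coin flip' selection: choose a data center according to the
    configured percentage distribution (illustrative helper)."""
    roll = rng.random()  # uniform in [0, 1)
    cumulative = 0.0
    for name, share in distribution:
        cumulative += share
        if roll < cumulative:
            return name
    return distribution[-1][0]  # guard against floating-point rounding

dist = [("dc-east", 0.7), ("dc-west", 0.3)]
counts = {"dc-east": 0, "dc-west": 0}
rng = random.Random(0)  # seeded for a reproducible demonstration
for _ in range(10_000):
    counts[pick_by_percentage(dist, rng)] += 1
# Over many selections, traffic approaches the configured 70/30 split.
```

Note that this randomized approach balances traffic only in aggregate; per-flow stickiness still comes from the flow table, which routes packets of an existing flow to the previously chosen data center.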
With reference to fig. 10, illustrative interactions will be described for conducting the initialization phase of a connection-oriented communication session at a global access point 106A and handing the session off into a data center 110, thus reducing the time required to establish such a session. The interactions of fig. 10 may be implemented by, for example, the flow manager 304 of the access point 106. The interactions of fig. 10 will be described with respect to a single access point 106A and data center 110A. However, similar interactions may potentially occur simultaneously with respect to other access points 106 and data centers 110. Furthermore, the interactions of fig. 10 will be described with respect to a particular type of connection-oriented communication session: a TCP session. Similar interactions may occur with respect to other connection-oriented communication sessions. Fig. 11 accompanies fig. 10 to illustrate the handshake aspect of these interactions.
In addition, TCP transport throughput may be limited by two considerations: congestion-based limitations and data-processing-based limitations. Congestion limits in TCP attempt to manage transmissions so as not to exceed the capacity of the network (congestion control). Data processing limits in TCP attempt to manage transmissions so as not to exceed the recipient's ability to process the data (flow control). Illustratively, individual TCP segments contain values for various TCP settings that allow congestion limits or data processing limits to be configured and implemented.
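The two limits can be made concrete with the standard bound that TCP throughput cannot exceed the amount of data in flight, min(congestion window, receive window), divided by the round-trip time. The window and RTT values below are illustrative:

```python
def tcp_throughput_bound(cwnd_bytes: int, rwnd_bytes: int, rtt_s: float) -> float:
    """A TCP sender may keep at most min(cwnd, rwnd) bytes in flight per
    round trip, so throughput is bounded by min(cwnd, rwnd) / RTT."""
    return min(cwnd_bytes, rwnd_bytes) / rtt_s

# With a 64 KiB receive window and a 100 ms RTT, throughput caps near
# 5.24 Mbit/s regardless of the link's raw capacity.
bound = tcp_throughput_bound(cwnd_bytes=10 * 65536, rwnd_bytes=65536, rtt_s=0.100)
print(round(bound * 8 / 1e6, 2), "Mbit/s")  # 5.24 Mbit/s
```

This is why the access point's short RTT to the client, and larger windows toward the endpoint, both raise the achievable throughput.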
The interactions of fig. 10 begin at (1), where the client device 102 attempts to initiate a connection-oriented communication session with a network service by transmitting a TCP SYN packet addressed to the global network address of the service. In accordance with the functionality described above, the TCP SYN packet is transmitted to the global access point 106A, as shown in fig. 11. The TCP SYN packet illustratively includes a sequence number of the client device 102. In addition, in some embodiments, the TCP protocol implements a congestion avoidance algorithm that begins with a conservative configuration of the number of packets in flight and increases that number linearly based on data throughput. In TCP, such a value may be expressed in terms of the maximum segment size ("MSS"). In this embodiment, the global access point may take advantage of the additional throughput available between the global access point 106 and the endpoint 202 by setting this initial value higher. This allows immediate benefit from the larger available bandwidth.
In a similar manner, the global access point 106 may further set the data processing limit (i.e., the maximum window size) to a larger or maximum value, because the global access point 106 will have a greater capability to process data.
At (2), the flow manager 304 of the global access point 106A continues the TCP three-way handshake by returning a TCP SYN-ACK packet to the client device 102, illustratively including the sequence number of the access point 106A (and acknowledging the sequence number of the client device 102). At (3), client device 102 continues the TCP three-way handshake by returning a TCP ACK packet to access point 106A, acknowledging the sequence number of access point 106A. Upon receipt of the TCP ACK packet, a TCP session is initiated between client device 102 and global access point 106A. Since access point 106A is expected to be in the vicinity of client device 102 (e.g., in terms of delay), interactions (1) - (3) are expected to be completed quickly relative to the initialization phase between client device 102 and endpoint 202.
At (4), client device 102 (understanding the TCP session to have been initialized with the service) transmits a data packet within the TCP session addressed to the global network address of the service. The data packet is routed to the global access point 106A, which selects, at (5), the data center 110 to which to route the packet. The selection of the data center may occur in the same manner as described above (e.g., as in interaction (3) of fig. 6). The access point 106A further encapsulates the data packet at (6) for tunneling to the selected data center 110 (in this case, data center 110A). In the interactions of fig. 10, it is assumed that access point 106A has previously established a tunnel to data center 110A (e.g., to the session handoff manager 212 of data center 110A). For example, access point 106A may maintain one or more idle TCP tunnels to data center 110A for transmitting packets to the data center 110A. However, additional interactions may be included in which the global access point 106A establishes a tunnel (e.g., a UDP or TCP tunnel) to the data center 110A. To facilitate handoff of the established TCP session, the encapsulated data packet also illustratively includes TCP session state information, such as the 5-tuple information of the TCP session and the sequence numbers of the session. In the embodiment illustratively depicted in fig. 10, the session state information is included as header information of the encapsulated data packet. At (7), the global access point 106A transmits the encapsulated packet to the data center 110A (e.g., to a session handoff manager 212 within the data center 110A).
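The session-state header might be sketched as a fixed binary layout carrying the 5-tuple and sequence numbers. This wire format is purely a hypothetical illustration, not the patent's actual encoding:

```python
import socket
import struct

# Hypothetical layout: src ip (4 bytes), src port, dst ip (4 bytes),
# dst port, protocol, then the TCP sequence and acknowledgment numbers
# the access point negotiated during the handshake. "!" = network order.
HEADER = struct.Struct("!4sH4sHBII")

def pack_session_state(src_ip, src_port, dst_ip, dst_port, proto, seq, ack):
    return HEADER.pack(socket.inet_aton(src_ip), src_port,
                       socket.inet_aton(dst_ip), dst_port, proto, seq, ack)

def unpack_session_state(blob):
    s_ip, s_port, d_ip, d_port, proto, seq, ack = HEADER.unpack(blob)
    return (socket.inet_ntoa(s_ip), s_port, socket.inet_ntoa(d_ip),
            d_port, proto, seq, ack)

# The receiving device adds these values to its own TCP state table,
# adopting the session without repeating the handshake.
state = pack_session_state("10.0.0.1", 49152, "198.51.100.7", 443, 6, 1000, 2001)
```

Because the header carries everything needed to reconstruct the connection's state, the session handoff manager can "adopt" the connection mid-stream.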
At (8), upon receiving the encapsulated packet, a device within data center 110A (e.g., session handoff manager 212 or, in some cases, endpoint 202) adopts the TCP session by adding the information indicated within the encapsulated packet (e.g., within its header) to its own TCP session state table. The device then decapsulates the packet of the client device 102 at (9) and processes it at (10) as if the packet had been received directly at the device. For example, where the device is an endpoint 202, the endpoint 202 may process the data of the packet according to the service requested by the client device 102. Where the device is a session handoff manager 212, the manager 212 may process the packet by identifying the endpoint 202 to service the packet (e.g., in a manner similar to interaction (4) of fig. 6) and utilizing that endpoint 202 to service the packet. Illustratively, because session handoff manager 212 (and not, in this case, endpoint 202) is a party to the TCP connection with client device 102, manager 212 may initiate a second TCP connection with endpoint 202 and pass the data within the packets of client device 102 to endpoint 202 via that second TCP session. Manager 212 may continue to operate as a proxy between client device 102 and endpoint 202 for future communications. For example, manager 212 may obtain a response from endpoint 202 and facilitate transmission of the response to client device 102 via the TCP session with client device 102.
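Interaction (8) can be sketched as follows. The packet structure and table layout are hypothetical; the point is that the adopting device seeds its TCP session state table from the tunnel header rather than from a handshake of its own:

```python
# Hypothetical in-memory TCP session state table, keyed by five-tuple.
session_table = {}

def adopt_session(encapsulated: dict) -> bytes:
    """Adopt a TCP session handed off by a global access point: copy the
    session state from the tunnel header into the local session table,
    then return the inner client packet for normal processing."""
    state = encapsulated["header"]          # five-tuple + sequence numbers
    session_table[state["five_tuple"]] = {
        "seq": state["seq"],
        "ack": state["ack"],
    }
    return encapsulated["payload"]          # decapsulation: the inner packet

handoff = {
    "header": {
        "five_tuple": ("192.0.2.1", 40000, "198.51.100.7", 443, "tcp"),
        "seq": 3_000_000,
        "ack": 1_001,
    },
    "payload": b"GET / HTTP/1.1\r\n\r\n",
}
inner = adopt_session(handoff)
```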
At (11), the device within the data center 110A transmits a response packet to the global access point 106A, such as via the tunnel. Interaction (11) may include, for example, encapsulating the response packet for tunneling. The global access point 106A then forwards the response packet to the client device 102 at (12). Interaction (12) may include, for example, decapsulating the response packet for transmission to the client device 102. Further communications between client device 102 and endpoint 202 within data center 110A may occur in a manner similar to interactions (4) through (7) and (9) through (12) described above. Thus, client device 102 may communicate with data center 110A via a TCP session without actually having completed the initialization phase of that session with the devices of data center 110A. The interactions of fig. 11 thus illustrate the division of a TCP handshake across two different devices.
Although fig. 11 depicts illustrative interactions, these interactions may vary across embodiments. For example, although the response packet is depicted in fig. 11 as traversing the global access point 106A, in some instances the endpoint 202 may be configured to respond directly to the client device 102, without the response traversing the access point 106A. For example, rather than encapsulating response packets and transmitting them to access point 106A, endpoint 202 (or manager 212) may transmit response packets directly (e.g., over network 104) to client device 102, thereby avoiding the need for encapsulation. As another variation, while session state information is depicted in fig. 11 as included within an encapsulated data packet, in other embodiments session state information may be included in a separate packet. For example, global access point 106A may be configured to transmit session state information to data center 110A separately, after the initialization phase of a communication session has been conducted, such as within a "session adoption" command to data center 110A. As yet another variation, while manager 212 is discussed above as selecting an endpoint 202 within data center 110A to serve client device 102, in some instances access point 106 may select such an endpoint 202 even where manager 212 is used. For example, access point 106 may select endpoint 202 and inform manager 212 of which endpoint has been selected (e.g., in a header of the encapsulated packet). Where the global access point 106 selects an endpoint 202 to which to route a packet, the access point 106 may modify the packet to facilitate routing to the endpoint 202. For example, access point 106 may replace the global network address within the destination address field of the packet with the network address (e.g., a "unicast" address) of endpoint 202. Thus, the interactions of fig. 11 are intended to be illustrative in nature.
Although the interactions of fig. 11 depict interactions of the client device 102 with a single global access point 106A, in some instances the client device 102 may interact with multiple global access points 106. As noted above, each global access point 106 may be configured to advertise a global network address to public networks (e.g., network 104) that are not controlled by the global access points 106. Thus, devices on such networks, rather than the access points 106 themselves, generally determine the access point 106 to which packets of the client device 102 addressed to the global network address are routed. In some configurations, rerouting the packets of a given packet flow to a different global access point 106 may adversely affect communications of the client device 102. For example, where the client device 102 establishes a connection that requires state information to be maintained at an individual access point 106 (e.g., a TCP connection with the access point 106, or NAT performed by the access point 106), rerouting of the client device 102 communications to a different access point 106 may undesirably break that connection.
The interactions of fig. 11 address this problem by enabling the TCP connection between client device 102 and endpoint 202 (or session handoff manager 212) to be maintained even when the packets of client device 102 are rerouted to a different access point 106. Specifically, each access point 106 may be configured to apply the same load balancing criteria when selecting the data center 110 or endpoint 202 to which to route the packets of the client device 102. Such load balancing criteria may be agnostic to the access point 106 (e.g., unchanged regardless of the access point 106 at which they are applied). For example, the load balancing criteria may reflect the delay between the client device 102 and the data center 110 and the health of the data center 110 (or the endpoints 202 therein), irrespective of delays to and from the access point 106. Thus, each access point 106 can be expected to route the packets of a given client device 102 to the same data center 110. Accordingly, if the client device 102 transmits an additional packet that is routed to a second access point 106, the second access point 106 will apply the same load balancing criteria and thus select data center 110A as the destination of the packet. The second access point 106 will then route the packet to the endpoint 202 (or session handoff manager 212), which will process the packet in the same manner as if the packet had been routed through access point 106A. Because the interactions of fig. 11 do not require maintenance of state information at access point 106A, and because the encapsulation mechanism of fig. 11 maintains the source network address of client device 102 within the encapsulated packets, no interruption of a connection-oriented communication session (e.g., a TCP session) will occur. The interactions of fig. 11 thus address the rerouting problems that might otherwise occur when routing connection-oriented communication sessions using anycast techniques.
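A minimal sketch of such access-point-agnostic selection is shown below, assuming (for illustration only) that the criteria are client-to-data-center latency and data-center health. Because nothing about the access point enters the computation, any access point evaluating it for the same client reaches the same data center:

```python
def select_data_center(client_id: str, data_centers: list) -> str:
    """Pick a data center using only client-to-data-center latency and
    data-center health -- nothing about the access point itself -- so
    every access point makes the same choice for a given client.
    (Illustrative criteria, not the disclosure's exact algorithm.)"""
    healthy = [dc for dc in data_centers if dc["healthy"]]
    # latency[client_id] is the measured client -> data-center delay (ms)
    return min(healthy, key=lambda dc: dc["latency"][client_id])["name"]

fleet = [
    {"name": "dc-east", "healthy": True,  "latency": {"client-a": 12, "client-b": 80}},
    {"name": "dc-west", "healthy": True,  "latency": {"client-a": 95, "client-b": 20}},
    {"name": "dc-eu",   "healthy": False, "latency": {"client-a": 5,  "client-b": 5}},
]
```

Note that an unhealthy data center is excluded even if it is the lowest-latency option, and that the result does not depend on the order in which data centers are considered.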
With reference to fig. 12, an illustrative routine 1200 for initializing a connection-oriented communication session at the global access point 106A and handing off the session to the session handoff manager 212 within a data center 110 will be described. Routine 1200 is illustratively implemented cooperatively by the global access point 106A and the session handoff manager 212, and thus the blocks of routine 1200 are divided between those devices. Although portions of routine 1200 are depicted as implemented by manager 212, in some instances those portions may instead be implemented directly within an endpoint 202.
Routine 1200 begins at block 1202, where the access point 106 (e.g., flow manager 304) obtains a request, addressed to a global network address of a service, to initiate a connection-oriented communication session with the service. The request may be, for example, a TCP SYN packet.
At block 1204, the access point 106 completes the initialization phase of the session according to the particular protocol used for the session. For example, where the protocol is TCP, the initialization phase may include a three-way handshake with the client device 102.
At block 1206, the access point 106 receives data packets from the client device 102 within the session. For example, the packet may be a payload packet within a TCP session.
At block 1208, the access point 106 encapsulates the data packet for tunneling to the manager 212 via the network. The access point 106 also includes a session context for the communication session, such as a TCP quintuple and sequence number, in the encapsulated packet (e.g., as header information for the encapsulated packet). The access point 106 then sends the packet to the handoff manager 212 as a handoff request for the communication session at block 1210.
At block 1212, handoff manager 212 receives the encapsulated packet and, at block 1214, constructs the communication session within its local data based on the context information from access point 106. The manager 212 thereby adopts the session, enabling subsequent communications within the session between the client device 102 and the manager 212. At block 1216, the handoff manager 212 decapsulates the data packet of the client device 102 and processes the packet within the session. For example, manager 212 may select an endpoint 202 to handle the request and transmit the content of the data packet to endpoint 202 via another communication session. Routine 1200 then ends at block 1218. Thus, client device 102 and manager 212 may communicate via a stateful session, without requiring client device 102 to communicate with manager 212 in order to establish that session.
Routine 1200 may include additional or alternative blocks beyond those described above. For example, prior to sending the encapsulated packet as a handoff request at block 1210, the access point 106 may select a data center to receive the handoff request, in a manner similar to the selection of a data center discussed in fig. 9. Further, although routine 1200 is depicted as ending after block 1216, the access point 106 and manager 212 may continue to operate to facilitate communications with the client device 102 within the session, as discussed above. Thus, the number and arrangement of the blocks in fig. 12 are illustrative in nature.
All of the methods and processes described above may be embodied in and fully automated via software code modules executed by one or more general-purpose computers or processors. The code modules may be stored in any type of non-transitory computer-readable medium or other computer storage device. Some or all of the methods may alternatively be embodied in dedicated computer hardware.
Conditional language such as "can," "could," "might," or "may," unless specifically stated otherwise, is generally to be understood within the context of use to convey that certain embodiments include, while other embodiments do not include, certain features, elements, and/or steps. Thus, such conditional language is not generally intended to imply that features, elements, and/or steps are in any way required for one or more embodiments, or that one or more embodiments necessarily include logic for deciding, with or without user input or prompting, whether such features, elements, and/or steps are included or are to be performed in any particular embodiment.
Unless specifically stated otherwise, disjunctive language such as the phrase "at least one of X, Y, or Z" is generally to be understood within the context of use to mean that an item, term, etc., may be X, Y, or Z, or any combination thereof (e.g., X, Y, and/or Z). Thus, such disjunctive language is generally not intended to, and should not, imply that certain embodiments require at least one of X, at least one of Y, and at least one of Z to each be present.
Articles such as "a" or "an" should generally be construed to include one or more of the described items unless expressly stated otherwise. Accordingly, a phrase such as "a device configured to" is intended to include one or more of the recited devices. Such one or more recited devices may also be collectively configured to carry out the stated recitations. For example, "a processor configured to carry out recitations A, B, and C" may include a first processor configured to carry out recitation A working in conjunction with a second processor configured to carry out recitations B and C.
Any routine descriptions, elements, or blocks in the flowcharts described herein and/or depicted in the figures should be understood as potentially representing code modules, code segments, or portions of code which include one or more executable instructions for implementing specific logical functions or elements in the routine. Alternate implementations are included within the scope of the embodiments described herein in which elements or functions may be deleted, or executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those skilled in the art.
It should be emphasized that many variations and modifications may be made to the above-described embodiments, the elements of which are to be understood as being among other acceptable examples. All such modifications and variations are intended to be included herein within the scope of this disclosure and protected by the following claims.
Examples of embodiments of the present disclosure may be described in view of the following clauses.
Clause 1. A system, comprising: a plurality of endpoints, each endpoint being located in a different geographic location and comprising at least one server computing device configured to provide a network accessible service associated with a network address; and a plurality of global access points associated with the network accessible service, wherein individual global access points of the plurality of global access points include a processor, are associated with different geographic locations, and are configured to: advertising, using an anycast methodology, a network prefix of the network address as reachable via the access point; receiving a network packet addressed to the network address from a client device; selecting a data center comprising a plurality of endpoints based on an implementation of a distribution algorithm corresponding to a combination of a geographic criterion and a distribution criterion for network traffic; selecting an endpoint to which to route the network packet from the plurality of endpoints included in the selected data center based on an implementation of a selection algorithm corresponding to a consistent hashing algorithm; and routing the transformed packet to the selected endpoint.
Clause 2. The system of clause 1, wherein the data center corresponds to a zone and wherein the plurality of endpoints correspond to two or more sub-zones.
Clause 3. The system of clause 2, wherein the distribution criteria of the distribution algorithm correspond to specifications for the two or more sub-zones.
Clause 4. The system of clause 1, wherein the distribution criterion corresponds to a percentage of the network traffic being fully allocated to the zone.
Clause 5. The system of clause 1, wherein the consistent hashing algorithm corresponds to one of a weighted rendezvous hashing algorithm or a weighted consistent hashing algorithm.
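Weighted rendezvous (highest-random-weight) hashing, one of the options named in clause 5, can be sketched as follows; the endpoint names, weights, and flow-key format are illustrative assumptions:

```python
import hashlib
import math

def weighted_rendezvous_pick(flow_key: str, endpoints: dict) -> str:
    """Weighted rendezvous hashing: each endpoint is scored from
    hash(flow, endpoint) scaled by its weight, and the flow is routed
    to the highest-scoring endpoint."""
    def score(name: str, weight: float) -> float:
        digest = hashlib.sha256(f"{flow_key}:{name}".encode()).digest()
        h = int.from_bytes(digest[:8], "big")
        u = (h + 0.5) / 2**64          # uniform value in (0, 1)
        return -weight / math.log(u)   # classic weighted-rendezvous score
    return max(endpoints, key=lambda name: score(name, endpoints[name]))

endpoints = {"endpoint-1": 1.0, "endpoint-2": 1.0, "endpoint-3": 2.0}
flow = "203.0.113.9:51515->198.51.100.7:443:tcp"
choice = weighted_rendezvous_pick(flow, endpoints)
```

Because each endpoint's score depends only on the flow key and the endpoint itself, removing one endpoint remaps only the flows that scored it highest, leaving all other flows on their existing endpoints.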
Clause 6. A method implemented at a plurality of access points to a network accessible service distributed across a geographic area, the method comprising: receiving, at a global access point, a network packet from a client device, the network packet addressed to a network address of the network accessible service; selecting, by the global access point, a data center comprising a plurality of endpoints based on an implementation of a distribution algorithm corresponding to a combination of a geographic criterion and a distribution criterion for network traffic; selecting, by the global access point, an endpoint to which to route the network packet from the plurality of endpoints included in the selected data center based on an implementation of a selection algorithm corresponding to determining a score for the network packet; and routing the transformed packet to the selected endpoint.
Clause 7. The method of clause 6, further comprising: advertising a network address of the network-accessible service as reachable via each access point.
Clause 8. The method of clause 6, wherein the data center corresponds to a zone and wherein the plurality of endpoints correspond to two or more sub-zones.
Clause 9. The method of clause 8, wherein the distribution criteria of the distribution algorithm correspond to a specification for the two or more sub-zones.
Clause 10. The method of clause 8, wherein the distribution criterion corresponds to a percentage of the network traffic being fully allocated to the zone.
Clause 11. The method of clause 8, wherein the distribution criterion corresponds to a percentage of the partial allocation of network traffic to the zone.
Clause 12. The method of clause 11, wherein the distribution criterion comprises a default allocation of the remaining portion among the sub-zones.
Clause 13. The method of clause 12, wherein the default allocation of the remaining portion among the sub-zones is based on geographic proximity.
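A minimal sketch of distribution criteria with a partial percentage allocation and a geographic-proximity default for the remainder (clauses 11 through 13) might look like the following; the function and parameter names are assumptions for illustration:

```python
def route_by_distribution(hash_pct: float, allocations: dict, nearest: str) -> str:
    """Distribution-criteria sketch (illustrative): explicit percentage
    allocations send that share of traffic to the named sub-regions;
    any unallocated remainder defaults to the geographically nearest one.
    hash_pct is a stable per-flow value in [0, 100)."""
    threshold = 0.0
    for region, pct in allocations.items():
        threshold += pct
        if hash_pct < threshold:
            return region
    return nearest  # default allocation of the remainder: geographic proximity

# 60% of traffic to sub-region A, 25% to B; the remaining 15% defaults
# to whichever sub-region is nearest to the client.
alloc = {"sub-a": 60.0, "sub-b": 25.0}
```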
Clause 14. The method of clause 6, wherein the score of the network packet corresponds to a score based on at least a source IP address, a source port address, a destination IP address, a destination port address, and a protocol.
Clause 15. The method of clause 14, wherein selecting an endpoint from the plurality of endpoints corresponds to selecting the endpoint with the highest score.
Clause 16. A non-transitory computer-readable medium comprising computer-executable instructions that, when executed by each of a plurality of access points to a network-accessible service distributed across a geographic area, configure each access point to: receive a network packet from a client device addressed to a network address of the network-accessible service; select a data center from a plurality of available data centers, and select from a plurality of endpoints associated with the selected data center a relevant endpoint to which to route the network packet, based on a combined implementation of a distribution algorithm, corresponding to a combination of a geographic criterion and a distribution criterion for network traffic, and a selection algorithm corresponding to a consistent hashing algorithm that determines a score for the network packet; and route the transformed packet to the selected endpoint.
Clause 17. The non-transitory computer-readable medium of clause 16, wherein the data center corresponds to a zone and wherein the plurality of endpoints correspond to two or more sub-zones.
Clause 18. The non-transitory computer-readable medium of clause 16, wherein the distribution criterion corresponds to a scalable allocation.
Clause 19. The non-transitory computer-readable medium of clause 16, wherein the score of the network packet corresponds to a score based on at least a source IP address, a source port address, a destination IP address, a destination port address, and a protocol.
Clause 20. The non-transitory computer-readable medium of clause 16, wherein the distribution criteria correspond to performance criteria, the endpoint being selected based at least in part on an apparent health of the endpoint to the access point.
Clause 21. A system, comprising: a plurality of endpoints, each endpoint being located in a different geographic location and comprising at least one server computing device configured to provide a network accessible service associated with a network address; and at least one global access point associated with the network accessible service, wherein the at least one global access point comprises a processor and is configured to: receive a network packet addressed to the network address from a client device; select a data center comprising a plurality of endpoints based on an implementation of a distribution algorithm; select an endpoint from the plurality of endpoints to which to route the network packet based on an implementation of a selection algorithm; and route the received packet to the selected endpoint via encapsulated communication; wherein the plurality of endpoints are assigned into subgroups sharing a common encapsulation tunnel, and wherein each of the endpoints in a subgroup is configured to transmit communication messages to other endpoints in the subgroup for at least one of forwarding client packets or providing segmentation information.
Clause 22. The system of clause 21, wherein the encapsulation tunnel corresponds to a virtual local area network.
Clause 23. The system of clause 21, wherein the plurality of endpoints includes a network component for processing the subgroup message using five-tuple communication.
Clause 24. The system of clause 21, wherein the communication message comprises a path maximum transmission unit (MTU) discovery packet.
Clause 25. The system of clause 21, wherein each of the endpoints transmits the communication message via a multicast message.
Clause 26. The system of clause 21, wherein each of the endpoints transmits the communication message via a broadcast message.
Clause 27. A method implemented at a plurality of access points to a network accessible service distributed across a geographic area, the method comprising: receiving, at an access point, a network packet from a client device, the network packet addressed to a network address associated with a service; selecting, by the access point, a data center comprising a plurality of endpoints based on an implementation of a distribution algorithm; selecting, by the access point, an endpoint from the plurality of endpoints to which to route the network packet based on an implementation of a selection algorithm; and routing, by the access point, the received packet to the selected endpoint via an encapsulation tunnel, wherein the plurality of endpoints are assigned into subgroups sharing a common encapsulation tunnel, and wherein each of the endpoints in a subgroup is configured to transmit management messages to other endpoints in the subgroup for at least one of forwarding client packets or providing segmentation information.
Clause 28. The method of clause 27, wherein the encapsulation tunnel corresponds to a virtual local area network.
Clause 29. The method of clause 27, wherein the plurality of endpoints includes a network element for routing the received packet using five-tuple communication.
Clause 30. The method of clause 27, wherein the five-tuple communication comprises a source IP address, a source port address, a destination IP address, a destination port address, and a protocol.
Clause 31. The method of clause 27, wherein the management message comprises a path maximum transmission unit (MTU) discovery packet.
Clause 32. The method of clause 27, wherein each of the endpoints transmits the management message via multicast communication.
Clause 33. The method of clause 32, wherein the individual endpoints register with the single multicast communication channel based on the associated subgroup.
Clause 34. The method of clause 27, wherein each of the endpoints transmits the management message via broadcast communication.
Clause 35. The method of clause 34, wherein all of the individual endpoints are registered with the broadcast communication channel and filter packets not associated with the assigned subgroup.
Clause 36. A non-transitory computer-readable medium comprising computer-executable instructions that, when executed by each of a plurality of access points to a network-accessible service distributed across a geographic area, configure each access point to: receive a network packet from a client device addressed to a network address of the network-accessible service; select a data center, and a relevant endpoint from the plurality of endpoints associated with the selected data center, based on a combination of a distribution algorithm, corresponding to a combination of a geographic criterion and a distribution criterion for network traffic, and a selection algorithm based on message attributes; and route the transformed packet to the selected endpoint via the encapsulation tunnel associated with the assigned subgroup, wherein the plurality of endpoints are assigned into subgroups sharing a common encapsulation tunnel, and wherein each of the endpoints in a subgroup is configured to transmit messages to other endpoints in the subgroup for at least one of forwarding client packets or providing segmentation information.
Clause 37. The non-transitory computer-readable medium of clause 36, wherein the plurality of endpoints comprises a network element for routing network packets using five-tuple communications.
Clause 38. The non-transitory computer-readable medium of clause 36, wherein the messages exchanged by the endpoints in the subgroup comprise a path maximum transmission unit (MTU) discovery packet.
Clause 39. The non-transitory computer-readable medium of clause 36, wherein the individual endpoints transmit the message via multicast transmission.
Clause 40. The non-transitory computer-readable medium of clause 39, wherein the individual endpoints register with the multicast communication channel based on the associated subgroup.
Clause 41. The non-transitory computer-readable medium of clause 36, wherein the individual endpoints transmit the message via broadcast transmission.
Clause 42. The non-transitory computer-readable medium of clause 41, wherein all of the individual endpoints are registered to a broadcast communication channel and filter messages not associated with the assigned subgroup.
Clause 43. A system comprising: a plurality of endpoint systems, each endpoint system located in a different geographic location and comprising at least one endpoint computing device configured to provide a network-accessible service associated with a network address; and at least one access point associated with the network-accessible service, the at least one access point comprising a processor and configured to: receive a request from a client device to initiate a first Transmission Control Protocol (TCP) session with the network-accessible service; perform a first initialization phase of the first TCP session to establish context information for the first TCP session, the context information including at least one sequence number; receive data packets from the client device as part of the first TCP session; select an endpoint system from the plurality of endpoint systems to which to route the network packet; perform a second initialization phase to establish a second TCP session with the selected endpoint system, wherein the second TCP session is independent of the first TCP session; and transmit the received data packets as part of the second TCP session.
Clause 44. The system of clause 43, wherein the first TCP session is associated with a first sequence number and a second sequence number assigned by the access point.
Clause 45. The system of clause 44, wherein the second TCP session is associated with the first sequence number and a third sequence number assigned by the selected endpoint.
Clause 46. The system of clause 45, wherein the access point is configured to convert between the third sequence number and the second sequence number.
Clause 47. The system of clause 43, wherein the access point is configured to conduct the second initialization phase of the second TCP session with a regional access point.
Clause 48. A method implemented at each of a plurality of access points to a network accessible service distributed across a geographic area, the method comprising: at a first access point of the plurality of access points: receiving a request from a client device to initiate a connection-oriented communication session with the network-accessible service; performing a first initialization phase of the connection-oriented communication session to establish context information of the connection-oriented communication session; applying selection criteria to select an endpoint system from a plurality of endpoint systems for the network-accessible service to which to route the network packet; performing a second initialization phase of the connection-oriented communication session to establish context information for the connection-oriented communication session with the selected endpoint system; and transmitting the data packets and context information of the connection-oriented communication session to the endpoint system.
Clause 49. The method of clause 48, wherein the connection-oriented communication sessions each correspond to a Transmission Control Protocol (TCP) session.
Clause 50. The method of clause 49, wherein the first TCP session is associated with a first sequence number and a second sequence number assigned by the access point.
Clause 51. The method of clause 50, wherein the second TCP session is associated with the first sequence number and a third sequence number assigned by the selected endpoint.
Clause 52. The method of clause 51, further comprising converting between the third sequence number and the second sequence number.
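The sequence-number conversion of clauses 46 and 52 can be sketched as a fixed offset applied modulo 2^32, since TCP sequence numbers wrap at 2^32; the helper names are illustrative:

```python
MOD = 2**32  # TCP sequence numbers wrap at 2^32

def seq_offset(second_seq: int, third_seq: int) -> int:
    """Offset between the access point's sequence number on the
    client-facing session (the second number) and the endpoint's on
    the endpoint-facing session (the third number)."""
    return (third_seq - second_seq) % MOD

def to_endpoint(seq: int, offset: int) -> int:
    """Rewrite a client-facing sequence number for the endpoint session."""
    return (seq + offset) % MOD

def to_client(seq: int, offset: int) -> int:
    """Rewrite an endpoint-facing sequence number for the client session."""
    return (seq - offset) % MOD
```

Applying the offset in both directions lets the access point splice the two independent sessions together without either side renegotiating its initial sequence number.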
Clause 53. The method of clause 48, wherein performing the second initialization phase of the connection-oriented communication session comprises performing the second initialization phase directly with the selected endpoint.
Clause 54. The method of clause 48, wherein performing the second initialization phase of the connection-oriented communication session comprises performing the second initialization phase directly with a zone access point.
Clause 55 the method of clause 48, wherein the zone access point performs a third initialization phase of the connection-oriented communication session to establish context information of the connection-oriented communication session with the selected endpoint.
Clause 56. A non-transitory computer-readable medium comprising computer-executable instructions that, when executed by each of a plurality of access points to a network-accessible service distributed across a geographic area, configure each access point to: receive a request from a client device to initiate a first connection-oriented communication session with the network-accessible service; conduct a first initialization phase of a TCP communication session to establish context information for the first connection-oriented communication session; apply selection criteria to select an endpoint system from a plurality of endpoint systems of the network-accessible service to which to route the network packet; and conduct a second initialization phase of an independent TCP communication session to establish context information for a second connection-oriented communication session with the selected endpoint system.
Clause 57. The non-transitory computer-readable medium of clause 56, wherein the first TCP session is associated with a first sequence number and a second sequence number assigned by the access point.
Clause 58. The non-transitory computer-readable medium of clause 56, wherein the second TCP session is associated with the first sequence number and a third sequence number assigned by the selected endpoint.
Clause 59. The non-transitory computer-readable medium of clause 56, wherein the instructions are further operable to cause a conversion of a sequence number between the first communication session and the second communication session.
Clause 60. The non-transitory computer-readable medium of clause 56, wherein performing the second initialization phase of the connection-oriented communication session comprises performing the second initialization phase directly with the selected endpoint.
Clause 61. The non-transitory computer-readable medium of clause 56, wherein conducting the second initialization phase of the independent TCP communication session comprises conducting the second initialization phase directly with a zone access point.
Clause 62. The non-transitory computer-readable medium of clause 56, wherein the instructions are further operable to cause transmission of the received packet according to the second connection-oriented communication session.
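Clauses 56-62 describe an access point that terminates one TCP session with the client, opens an independent session with the selected endpoint, and translates sequence numbers between the two (clause 59). A minimal sketch of that translation follows; all class, method, and variable names are hypothetical, since the clauses recite the behavior, not an implementation. Each leg of the splice has its own initial sequence numbers (ISNs), so the same stream byte appears at different absolute sequence numbers on each leg, and the access point rewrites by a constant offset:

```python
class TcpSplice:
    """Sequence-number translation between two independent TCP sessions
    spliced at an access point. Names are illustrative only."""

    def __init__(self, client_isn, ap_client_isn, ap_endpoint_isn, endpoint_isn):
        # Leg 1 (client <-> access point): client_isn, ap_client_isn.
        # Leg 2 (access point <-> endpoint): ap_endpoint_isn, endpoint_isn.
        self.c2e_offset = ap_endpoint_isn - client_isn   # client->endpoint bytes
        self.e2c_offset = ap_client_isn - endpoint_isn   # endpoint->client bytes

    def client_to_endpoint(self, seq):
        """Rewrite a client-leg sequence number for the endpoint leg."""
        return (seq + self.c2e_offset) & 0xFFFFFFFF      # 32-bit wraparound

    def endpoint_to_client(self, seq):
        """Rewrite an endpoint-leg sequence number for the client leg."""
        return (seq + self.e2c_offset) & 0xFFFFFFFF

splice = TcpSplice(client_isn=1000, ap_client_isn=5000,
                   ap_endpoint_isn=90000, endpoint_isn=70000)
# The byte carried at client seq 1001 appears at seq 90001 on the endpoint leg.
print(splice.client_to_endpoint(1001))   # 90001
print(splice.endpoint_to_client(70042))  # 5042
```

Because each direction is a fixed offset between ISNs, the access point needs only the per-session context established during the two initialization phases, not per-packet state.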
Clause 63. A system comprising: a plurality of endpoint systems, each endpoint system located in a different geographic location and comprising at least one endpoint computing device configured to provide a network-accessible service associated with a network address; and at least one access point associated with the network-accessible service, each of the at least one access point comprising a processor and configured to: establish a first TCP session with a client device to establish context information for the first TCP session between the at least one access point and the client device, the first TCP session including specifications for congestion control parameters and data inspection parameters; select an endpoint system, from the plurality of endpoint systems, to which to route network packets; and establish a second TCP session with the selected endpoint system, wherein the second TCP session is independent of the first TCP session, and wherein the second TCP session includes specifications for congestion control parameters and data inspection parameters.
Clause 64. The system of clause 63, wherein the first TCP session and the second TCP session are associated with different congestion parameters.
Clause 65. The system of clause 64, wherein the congestion parameter corresponds to an option from a set of values, and wherein the first TCP session corresponds to a lower value in the set of values.
Clause 66. The system of clause 63, wherein the first TCP session and the second TCP session are associated with different data inspection parameters.
Clause 67. The system of clause 66, wherein the data inspection parameter corresponds to an option from a set of values, and wherein the second TCP session corresponds to a maximum value in the set of values.
Clause 68. A method implemented at each of a plurality of access points to a network-accessible service distributed across a geographic area, the method comprising: receiving a request from a client device to initiate a first Transmission Control Protocol (TCP) session with the network-accessible service; establishing the first TCP session to establish context information for the first TCP session, the first TCP session including specifications for congestion control parameters and data inspection parameters; selecting an endpoint system, from a plurality of endpoint systems, to which to route network packets; and establishing a second TCP session with the selected endpoint system, wherein the second TCP session is independent of the first TCP session, and wherein the second TCP session includes specifications for congestion control parameters and data inspection parameters.
Clause 69. The method of clause 68, wherein the first TCP session and the second TCP session are associated with different congestion parameters.
Clause 70. The method of clause 69, wherein the congestion parameter corresponds to an option from a set of values, and wherein the first TCP session corresponds to a lower value in the set of values.
Clause 71. The method of clause 69, wherein the congestion parameter corresponds to an option from a set of values, and wherein the second TCP session corresponds to the highest value in the set of values.
Clause 72. The method of clause 68, wherein the first TCP session and the second TCP session are associated with different data inspection parameters.
Clause 73. The method of clause 72, wherein the data inspection parameter corresponds to an option from a set of values, and wherein the second TCP session corresponds to a maximum value in the set of values.
Clause 74. The method of clause 72, wherein the data inspection parameter corresponds to an option from a set of values, and wherein the first TCP session corresponds to a minimum value in the set of values.
Clause 75. The method of clause 68, wherein the congestion parameter corresponds to a specification of a maximum segment size.
Clause 76. The method of clause 68, wherein the data inspection parameter corresponds to a specification of a window scaling value.
Clause 77. A non-transitory computer-readable medium comprising computer-executable instructions that, when executed by each of a plurality of access points to a network-accessible service distributed across a geographic area, configure each access point to: receive a request from a client device to initiate a first Transmission Control Protocol (TCP) session with the network-accessible service; establish the first TCP session to establish context information for the first TCP session, the first TCP session including specifications for congestion control parameters and data inspection parameters; select an endpoint system, from a plurality of endpoint systems, to which to route network packets; and establish a second TCP session with the selected endpoint system, wherein the second TCP session is independent of the first TCP session, and wherein the second TCP session includes specifications for congestion control parameters and data inspection parameters.
Clause 78. The non-transitory computer-readable medium of clause 77, wherein the first TCP session and the second TCP session are associated with different congestion parameters.
Clause 79. The non-transitory computer-readable medium of clause 77, wherein the congestion parameter corresponds to an option from a set of values, and wherein the first TCP session corresponds to the lowest value in the set of values.
Clause 80. The non-transitory computer-readable medium of clause 77, wherein the first TCP session and the second TCP session are associated with different data inspection parameters.
Clause 81. The non-transitory computer-readable medium of clause 77, wherein the data inspection parameter corresponds to an option from a set of values, and wherein the second TCP session corresponds to a maximum value in the set of values.
Clause 82. The non-transitory computer-readable medium of clause 77, wherein the data inspection parameter corresponds to an option from a set of values, and wherein the first TCP session corresponds to a minimum value in the set of values.
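Clauses 63-82 recite that the client-facing and endpoint-facing TCP sessions carry independently negotiated congestion control and data inspection parameters, with maximum segment size (clause 75) and window scaling (clause 76) as concrete examples. The sketch below illustrates the idea under assumed names; it is not the patent's implementation, only a model of two legs with different option values:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TcpLegOptions:
    """Hypothetical per-leg TCP options: the access point may use a
    conservative MSS toward the client over the public internet and a
    larger window scale toward the endpoint over a private backbone."""
    mss: int           # maximum segment size (congestion parameter, clause 75)
    window_scale: int  # RFC 7323 window-scale shift (inspection parameter, clause 76)

    def max_window(self):
        # The advertised 16-bit window left-shifted by the scale factor.
        return 65535 << self.window_scale

client_leg = TcpLegOptions(mss=1460, window_scale=7)     # public internet leg
endpoint_leg = TcpLegOptions(mss=8940, window_scale=14)  # backbone leg, jumbo frames

assert client_leg != endpoint_leg   # the two sessions negotiate independently
print(endpoint_leg.max_window())    # 1073725440 bytes (~1 GiB receive window)
```

Because the sessions are independent (clause 68), tuning one leg never requires renegotiating the other: the access point buffers between them and each leg runs at the parameters its own handshake established.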

Claims (20)

1. A system, comprising:
a plurality of endpoints, each endpoint located in a different geographic location and comprising at least one server computing device configured to provide a network-accessible service associated with a network address;
a plurality of global access points associated with the network-accessible service, wherein individual ones of the plurality of global access points comprise a processor, are associated with different geographic locations, and are configured to:
advertising a network prefix of the network address as reachable via the individual global access point;
receiving a network packet addressed to the network address from a client device;
selecting a stream manager from a plurality of stream managers;
routing the network packet to the selected stream manager;
selecting, using the selected stream manager, a data center including one or more of the plurality of endpoints based on an implementation of a distribution algorithm corresponding to a combination of network criteria, geographic criteria, and distribution criteria, such that the data center is selected without use of an anycast method;
selecting, from the plurality of endpoints included in the selected data center, an endpoint to which the network packet is to be routed based on an implementation of a selection algorithm corresponding to a consistent hashing algorithm that determines a score for the network packet, such that the endpoint is selected without use of an anycast method and based on an implementation of a distribution algorithm corresponding to a combination of network criteria, geographic criteria, and distribution criteria; and
routing the network packet to the selected endpoint.
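Claim 1 lays out a pipeline at each global access point: advertise the prefix, receive a packet addressed to the service, pick a stream manager, pick a data center, pick an endpoint within it, and route onward. The following sketch models those steps; every function and identifier is hypothetical, since the claim recites steps rather than code, and a simple highest-hash rule stands in for the claimed distribution and selection algorithms:

```python
import hashlib

def pick(items, key):
    # Deterministic stand-in for the claim's distribution/selection
    # algorithms: highest hash wins, so the same key always maps to
    # the same item without any shared state between access points.
    return max(items, key=lambda item: hashlib.sha256(f"{key}|{item}".encode()).hexdigest())

def handle_packet(packet, stream_managers, data_centers):
    # packet: dict carrying the flow 5-tuple; data_centers: {name: [endpoints]}.
    flow_key = (packet["src_ip"], packet["src_port"],
                packet["dst_ip"], packet["dst_port"], packet["proto"])
    manager = pick(stream_managers, flow_key)    # select a stream manager
    dc = pick(sorted(data_centers), flow_key)    # select a data center
    endpoint = pick(data_centers[dc], flow_key)  # select an endpoint within it
    return manager, dc, endpoint                 # routing would happen here

mgr, dc, ep = handle_packet(
    {"src_ip": "203.0.113.5", "src_port": 41000,
     "dst_ip": "198.51.100.1", "dst_port": 443, "proto": "tcp"},
    stream_managers=["manager-a", "manager-b"],
    data_centers={"us-east": ["ep-1", "ep-2"], "eu-west": ["ep-3", "ep-4"]})
```

Determinism per flow key is the point of the design: any access point holding the same configuration routes a given flow to the same endpoint, which is what lets the selection happen without anycast.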
2. The system of claim 1, wherein the data center corresponds to a zone, and wherein the plurality of endpoints correspond to two or more sub-zones.
3. The system of claim 2, wherein the distribution criteria corresponds to a specification of the two or more sub-regions.
4. The system of claim 1, wherein the distribution criteria corresponds to a full allocation of a percentage of network traffic to a zone.
5. The system of claim 1, wherein the consistent hashing algorithm corresponds to one of a weighted rendezvous hashing algorithm or a weighted consistent hashing algorithm.
6. A method implemented at a plurality of global access points associated with a network-accessible service distributed across a geographic area, the method comprising:
receiving, at one of the plurality of global access points, a network packet from a client device, the network packet addressed to a network address associated with the network-accessible service;
selecting a stream manager from a plurality of stream managers;
routing the network packet to the selected stream manager;
selecting, by the global access point using the selected stream manager, a data center including a plurality of endpoints based on an implementation of a distribution algorithm corresponding to a combination of network criteria, geographic criteria, and distribution criteria, such that the data center is selected without use of an anycast method;
selecting, by the global access point, an endpoint from the plurality of endpoints included in the selected data center based on an implementation of a selection algorithm corresponding to a consistent hashing algorithm that determines a score for the network packet, such that the endpoint is selected without use of an anycast method and based on an implementation of a distribution algorithm corresponding to a combination of network criteria, geographic criteria, and distribution criteria; and
routing the network packet to the selected endpoint.
7. The method of claim 6, further comprising advertising the network address associated with the network-accessible service as reachable via each of the plurality of global access points.
8. The method of claim 6, wherein the data center corresponds to a zone, and wherein the plurality of endpoints correspond to two or more sub-zones.
9. The method of claim 8, wherein the distribution criteria corresponds to a specification of the two or more sub-regions.
10. The method of claim 8, wherein the distribution criteria corresponds to a full allocation of a percentage of network traffic to a zone.
11. The method of claim 8, wherein the distribution criteria corresponds to a partial allocation of a percentage of network traffic to a zone.
12. The method of claim 11, wherein the distribution criteria comprises a default allocation of the remainder of the network traffic to the sub-regions.
13. The method of claim 12, wherein the default allocation of the remainder to the sub-regions is based on geographic proximity.
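Claims 10-13 describe distribution criteria in which an explicit percentage of traffic is allocated to a zone and the unallocated remainder falls back to a default rule such as geographic proximity. The sketch below illustrates that split under assumed names (none of them from the patent): a flow key is hashed to a stable point in [0, 1), configured fractions claim their share, and everything else goes to the nearest zone:

```python
import hashlib

def allocate_zone(flow_key, allocations, nearest_zone):
    """allocations: {zone: fraction of traffic}, fractions summing to <= 1.0.
    Returns the zone for this flow; unallocated flows go to nearest_zone."""
    digest = hashlib.sha256(flow_key.encode()).hexdigest()
    point = (int(digest, 16) % 10_000) / 10_000   # stable point in [0, 1)
    cumulative = 0.0
    for zone, fraction in sorted(allocations.items()):
        cumulative += fraction
        if point < cumulative:
            return zone                           # explicit partial allocation
    return nearest_zone                           # default remainder by proximity

# 30% of flows pinned to us-east; the remaining 70% default to the nearest zone.
zone = allocate_zone("203.0.113.5:41000->198.51.100.1:443",
                     {"us-east": 0.3}, nearest_zone="eu-west")
```

Hashing rather than counting keeps the split stateless: each access point computes the same answer for the same flow, and the configured fractions hold in aggregate across many flows.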
14. The method of claim 6, wherein the score of the network packet is based on at least a source IP address, a source port, a destination IP address, a destination port, and a protocol.
15. The method of claim 14, wherein selecting the endpoint from the plurality of endpoints corresponds to selecting the endpoint with the highest network packet score.
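Claims 5, 14, and 15 together describe the selection algorithm: a weighted rendezvous (highest-random-weight) hash scores each endpoint against the packet's 5-tuple, and the endpoint with the highest score wins. A minimal illustration follows; the function names and weight scheme are assumptions, with the standard `-weight / log(hash)` scoring used so load stays roughly proportional to weight:

```python
import hashlib
import math

def _hash01(key):
    """Map a string to a float strictly inside (0, 1)."""
    digest = hashlib.sha256(key.encode()).digest()
    return (int.from_bytes(digest[:8], "big") + 0.5) / 2**64

def select_endpoint(five_tuple, endpoints):
    """endpoints: {name: weight}. Returns the highest-scoring endpoint.
    Weighted rendezvous hashing only remaps flows belonging to an
    endpoint that is added or removed; all other flows stay put."""
    key = "|".join(map(str, five_tuple))
    def score(item):
        name, weight = item
        # log of a value in (0, 1) is negative, so the score is >= 0
        # and grows with the endpoint's weight.
        return -weight / math.log(_hash01(f"{name}|{key}"))
    return max(endpoints.items(), key=score)[0]

flow = ("203.0.113.5", 41000, "198.51.100.1", 443, "tcp")
chosen = select_endpoint(flow, {"ep-1": 1.0, "ep-2": 2.0})
# The same 5-tuple always maps to the same endpoint:
assert chosen == select_endpoint(flow, {"ep-1": 1.0, "ep-2": 2.0})
```

Scoring per endpoint rather than hashing into a ring is what makes the "highest score" language of claim 15 natural: selection is a single `max` over deterministic per-endpoint scores.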
16. A non-transitory computer-readable medium comprising computer-executable instructions that, when executed by each of a plurality of access points associated with a network-accessible service distributed across a geographic area, cause each of the plurality of access points to:
receiving, from a client device, a network packet addressed to a network address of the network-accessible service;
selecting a stream manager from a plurality of stream managers;
routing the network packet to the selected stream manager;
selecting, using the selected stream manager, a data center from among a plurality of available data centers based on an implementation of a distribution algorithm corresponding to a combination of network criteria, geographic criteria, and distribution criteria, such that the data center is selected without use of an anycast method, and selecting, from a plurality of endpoints associated with the selected data center, an endpoint to which the network packet is to be routed based on an implementation of a selection algorithm corresponding to a consistent hashing algorithm that determines a score for the network packet, such that the endpoint is selected without use of an anycast method; and
routing the network packet to the selected endpoint.
17. The non-transitory computer-readable medium of claim 16, wherein the data center corresponds to a zone, and wherein the plurality of endpoints correspond to two or more sub-zones.
18. The non-transitory computer readable medium of claim 16, wherein the distribution criteria corresponds to a scalable allocation.
19. The non-transitory computer-readable medium of claim 16, wherein the score of the network packet is based on at least a source IP address, a source port, a destination IP address, a destination port, and a protocol.
20. The non-transitory computer-readable medium of claim 16, wherein the distribution criteria corresponds to performance criteria.
CN202310454209.4A 2019-09-27 2020-09-25 Management of distributed endpoints Active CN116489157B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310454209.4A CN116489157B (en) 2019-09-27 2020-09-25 Management of distributed endpoints

Applications Claiming Priority (11)

Application Number Priority Date Filing Date Title
US16/586,641 2019-09-27
US16/586,510 US11552898B2 (en) 2019-09-27 2019-09-27 Managing data throughput in a distributed endpoint network
US16/586,446 2019-09-27
US16/586,641 US10972554B1 (en) 2019-09-27 2019-09-27 Management of distributed endpoints
US16/586,485 2019-09-27
US16/586,510 2019-09-27
US16/586,446 US11451477B2 (en) 2019-09-27 2019-09-27 Load balanced access to distributed endpoints
US16/586,485 US11425042B2 (en) 2019-09-27 2019-09-27 Managing data throughput in a distributed endpoint network
CN202080066288.6A CN114503531B (en) 2019-09-27 2020-09-25 Management of distributed endpoints
PCT/US2020/052667 WO2021062116A1 (en) 2019-09-27 2020-09-25 Management of distributed endpoints
CN202310454209.4A CN116489157B (en) 2019-09-27 2020-09-25 Management of distributed endpoints

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN202080066288.6A Division CN114503531B (en) 2019-09-27 2020-09-25 Management of distributed endpoints

Publications (2)

Publication Number Publication Date
CN116489157A CN116489157A (en) 2023-07-25
CN116489157B true CN116489157B (en) 2024-03-15

Family

ID=72802176

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202080066288.6A Active CN114503531B (en) 2019-09-27 2020-09-25 Management of distributed endpoints
CN202310454209.4A Active CN116489157B (en) 2019-09-27 2020-09-25 Management of distributed endpoints

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202080066288.6A Active CN114503531B (en) 2019-09-27 2020-09-25 Management of distributed endpoints

Country Status (3)

Country Link
CN (2) CN114503531B (en)
DE (1) DE112020004639T5 (en)
WO (1) WO2021062116A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11394636B1 (en) 2020-12-10 2022-07-19 Amazon Technologies, Inc. Network connection path obfuscation using global access points
CN116074227B (en) * 2022-11-09 2024-05-14 国网重庆市电力公司电力科学研究院 Multi-power system testing method based on virtualization platform
CN116437409B (en) * 2023-06-13 2023-08-22 微网优联科技(成都)有限公司 Channel switching method and device for wireless router

Citations (3)

Publication number Priority date Publication date Assignee Title
US7552338B1 (en) * 2004-10-29 2009-06-23 Akamai Technologies, Inc. Dynamic multimedia fingerprinting system
US8606922B1 (en) * 2010-09-27 2013-12-10 Amazon Technologies, Inc. Dynamic resource zone mapping
CN105612735A (en) * 2013-09-11 2016-05-25 微软技术许可有限责任公司 Reliable address discovery cache

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
ATE491290T1 (en) * 2003-06-12 2010-12-15 Camiant Inc DYNAMIC SERVICE DELIVERY WITH TOPOLOGY DISCOVERY FOR COMMUNICATION NETWORKS
US20060064478A1 (en) * 2004-05-03 2006-03-23 Level 3 Communications, Inc. Geo-locating load balancing
WO2016053372A1 (en) * 2014-09-30 2016-04-07 Nicira, Inc. Virtual distributed bridging module
CN113098752B (en) * 2015-12-02 2022-07-22 Nicira股份有限公司 Method, apparatus, system, and medium for implementing load balancing across multiple tunnel endpoints
US10320681B2 (en) * 2016-04-12 2019-06-11 Nicira, Inc. Virtual tunnel endpoints for congestion-aware load balancing


Also Published As

Publication number Publication date
CN116489157A (en) 2023-07-25
DE112020004639T5 (en) 2022-06-30
CN114503531A (en) 2022-05-13
CN114503531B (en) 2023-05-09
WO2021062116A1 (en) 2021-04-01

Similar Documents

Publication Publication Date Title
CN113826363B (en) Consistent route advertisement between redundant controllers in a global network access point
US11063819B2 (en) Managing use of alternative intermediate destination computing nodes for provided computer networks
CN113196725B (en) System and method for load-balanced access to distributed endpoints using global network addresses
US10972554B1 (en) Management of distributed endpoints
US9762494B1 (en) Flow distribution table for packet flow load balancing
US9736278B1 (en) Method and apparatus for connecting a gateway router to a set of scalable virtual IP network appliances in overlay networks
US11451477B2 (en) Load balanced access to distributed endpoints
CN116489157B (en) Management of distributed endpoints
US9900224B2 (en) System and method for implementing and managing virtual networks
US11425042B2 (en) Managing data throughput in a distributed endpoint network
US8488446B1 (en) Managing failure behavior for computing nodes of provided computer networks
WO2016098031A1 (en) Method and system for load balancing in a software-defined networking (sdn) system upon server reconfiguration
US9712649B2 (en) CCN fragmentation gateway
US11552898B2 (en) Managing data throughput in a distributed endpoint network
EP3026851B1 (en) Apparatus, network gateway, method and computer program for providing information related to a specific route to a service in a network
US11431577B1 (en) Systems and methods for avoiding duplicate endpoint distribution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant