CN114008593A - Virtualized block storage server in cloud provider underlying extension - Google Patents


Info

Publication number
CN114008593A
CN114008593A (application CN202080047292.8A)
Authority
CN
China
Prior art keywords
block storage
pse
volume
instance
block
Prior art date
Legal status
Granted
Application number
CN202080047292.8A
Other languages
Chinese (zh)
Other versions
CN114008593B (en)
Inventor
A·N·利古里
M·S·奥尔森
C·M·格林伍德
P·拉波维奇
M·维尔马
Current Assignee
Amazon Technologies Inc
Original Assignee
Amazon Technologies Inc
Priority date
Filing date
Publication date
Priority claimed from US16/457,853 (US10949131B2)
Priority claimed from US16/457,850 (US10949124B2)
Priority claimed from US16/457,856 (US10949125B2)
Application filed by Amazon Technologies Inc
Priority to CN202310084506.4A (CN116010035B)
Publication of CN114008593A
Application granted
Publication of CN114008593B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0662Virtualisation aspects
    • G06F3/0664Virtualisation aspects at device level, e.g. emulation of a storage device or system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604Improving or facilitating administration, e.g. storage management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0661Format or protocol conversion arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0662Virtualisation aspects
    • G06F3/0665Virtualisation aspects at area level, e.g. provisioning of virtual or logical volumes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0683Plurality of storage devices
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5077Logical partitioning of resources; Management or configuration of virtualized resources

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Stored Programmes (AREA)

Abstract

A first block storage server virtual machine is executed by a computer system to host a first volume using one or more storage devices of the computer system. The computer system executes a second virtual machine having access to a virtual block storage device. The computer system also executes a block storage client. The block storage client receives, from the second virtual machine, a first block storage operation to be performed on the virtual block storage device. The block storage client sends a message to the first block storage server virtual machine to cause the first block storage server virtual machine to perform the block storage operation on the first volume.
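The following minimal sketch (in Python, with hypothetical names such as BlockStorageClient and BlockStorageServerVM that do not come from the patent or any real API) illustrates the data path summarized above: a guest virtual machine's block storage operation is received by a block storage client and forwarded to the block storage server virtual machine that hosts the backing volume.
```python
from dataclasses import dataclass

@dataclass
class BlockStorageOp:
    kind: str          # "read" or "write"
    offset: int        # byte offset into the virtual block device
    length: int = 0    # bytes to read (reads only)
    data: bytes = b""  # bytes to write (writes only)

class BlockStorageServerVM:
    """Stands in for the block storage server virtual machine hosting a volume
    on the computer system's local storage devices."""
    def __init__(self, volume_path: str):
        self.volume_path = volume_path

    def perform(self, op: BlockStorageOp) -> bytes:
        mode = "r+b" if op.kind == "write" else "rb"
        with open(self.volume_path, mode) as vol:
            vol.seek(op.offset)
            if op.kind == "write":
                vol.write(op.data)
                return b""
            return vol.read(op.length)

class BlockStorageClient:
    """Receives block storage operations from a guest virtual machine and
    forwards them to the block storage server VM backing the volume."""
    def __init__(self, server: BlockStorageServerVM):
        self.server = server

    def submit(self, op: BlockStorageOp) -> bytes:
        # In the described system this message crosses a network; a direct
        # call is used here purely for illustration.
        return self.server.perform(op)
```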

Description

Virtualized block storage server in cloud provider underlying extension
Background
Many companies and other organizations operate computer networks that interconnect numerous computing systems to support their operations, such as using computing systems that are co-located (e.g., as part of a local network) or alternatively located in multiple different geographic locations (e.g., connected via one or more private or public intermediary networks). For example, data centers housing a large number of interconnected computing systems have become commonplace, such as private data centers operated by and on behalf of a single organization, and public data centers operated by entities as merchants to provide computing resources to customers. Some public data center operators provide network access, power, and secure installation facilities for hardware owned by various customers, while other public data center operators provide "full service" facilities that also include hardware resources available for use by their customers. However, as the size and scope of typical data centers have increased, the task of provisioning, implementing, and managing physical computing resources has become more complex.
The advent of virtualization technologies for commodity hardware has provided benefits in managing large computing resources for many customers with diverse needs, allowing various computing resources to be efficiently and securely shared by multiple customers. For example, virtualization techniques may allow a single physical computing machine to be shared among multiple users by providing each user with one or more virtual machines hosted by the single physical computing machine. Each such virtual machine is a software simulation that acts as a distinct logical computing system that provides users with the illusion that they are the sole operator and administrator of a given hardware computing resource, while also providing application isolation and security between the various virtual machines. Furthermore, some virtualization technologies are capable of providing virtual resources that span two or more physical resources, such as a single virtual machine with multiple virtual processors that spans multiple different physical computing systems. As another example, virtualization technology may allow sharing of data storage hardware between multiple users by providing each user with a virtualized data store that may be distributed across multiple data storage devices, where each such virtualized data store acts as a distinct logical data store that provides the user with the illusion that they are the sole operator and administrator of the data storage resource.
A wide variety of virtual machine types optimized for different types of applications, such as compute-intensive applications, memory-intensive applications, and the like, may be set up at a data center of some cloud computing provider networks in response to client requests. In addition, provider network clients may also use higher level services that rely on the virtual computing services of such provider networks, such as some database services whose database instances are instantiated with virtual machines that use the virtual computing services. However, for some types of applications, such as applications that handle very large amounts of data that must be stored at customer premises outside of the provider network, services that are limited to using hardware located at the data center of the provider network to provide virtualized resources may not be optimal, for example, for latency-related reasons and/or other reasons.
Drawings
Various embodiments according to the present disclosure will be described with reference to the following drawings.
Fig. 1 is a block diagram depicting an exemplary provider network extended by a provider underlay extension portion located within the network outside of the provider network, in accordance with at least some embodiments.
FIG. 2 is a block diagram depicting an exemplary provider underlying extension portion, according to at least some embodiments.
Fig. 3 is a block diagram depicting exemplary connectivity between a provider network and a provider underlying extension portion, in accordance with at least some embodiments.
Fig. 4 and 5 are block diagrams depicting an exemplary virtualized block storage system in accordance with at least some embodiments.
FIG. 6 is a block diagram depicting an exemplary system for booting a virtualized block storage server using a first technique, in accordance with at least some embodiments.
FIG. 7 is a block diagram depicting an exemplary system for booting a virtualized block storage server using a second technique, in accordance with at least some embodiments.
FIG. 8 is a block diagram depicting an exemplary system for booting additional compute instances in a provider underlay extension from a block storage server, according to at least some embodiments.
FIG. 9 is a block diagram depicting an exemplary system for managing a virtualized block storage server in accordance with at least some embodiments.
FIG. 10 is a block diagram depicting an exemplary system for providing volume-to-block-storage-client mappings, in accordance with at least some embodiments.
FIG. 11 is a block diagram depicting an exemplary system for tracking volume mappings in accordance with at least some embodiments.
FIG. 12 is a flowchart depicting the operation of a method for starting a virtualized block storage server in accordance with at least some embodiments.
FIG. 13 is a flowchart depicting the operation of a method for using a virtualized block storage server in accordance with at least some embodiments.
FIG. 14 is a flowchart depicting the operation of a method for managing a virtualized block storage server in a provider underlying extension in accordance with at least some embodiments.
FIG. 15 depicts an exemplary provider network environment, according to at least some embodiments.
FIG. 16 is a block diagram of an exemplary provider network that provides storage services and hardware virtualization services to customers in accordance with at least some embodiments.
FIG. 17 is a block diagram depicting an exemplary computer system that may be used in at least some embodiments.
Detailed Description
The present disclosure relates to methods, devices, systems, and non-transitory computer-readable storage media for configuring a provider underlying extension to communicate with a network external to the provider network and for providing resources on the underlying extension that are the same as or similar to resources available in the provider network. Provider network operators (or providers) offer their users (or customers) the ability to utilize one or more of a variety of types of computing-related resources, such as computing resources (e.g., executing Virtual Machines (VMs) and/or containers, executing batch jobs, executing code without provisioning servers), data/storage resources (e.g., object stores, block-level stores, data archive stores, databases and database tables, etc.), network-related resources (e.g., virtual networks including configurations of sets of computing resources, Content Delivery Networks (CDNs), Domain Name Services (DNS)), application resources (e.g., databases, application build/deployment services), access policies or roles, identity policies or roles, machine images, routers and other data processing resources, etc. These and other computing resources may be provisioned as services.
Provider network operators often provide these and other computing resources as services that rely on virtualization technologies. For example, virtualization techniques may be used to provide users with the ability to control or utilize computing instances (e.g., VMs using a guest Operating System (OS) that operate using a hypervisor that may or may not further operate above an underlying host OS; containers that may or may not operate in VMs; instances that may execute on "bare metal" hardware without an underlying hypervisor), where one or more computing instances may be implemented using a single electronic device. Thus, users can directly utilize computing instances provided by instance management services (sometimes referred to as hardware virtualization services) hosted by a provider network to perform a variety of computing tasks. Additionally or alternatively, a user may indirectly utilize a computing instance by: code to be executed by a provider network is submitted (e.g., via an on-demand code execution service), which in turn utilizes computing instances to execute the code, typically without requiring the user to have any control or knowledge of the underlying computing instances involved.
The resources that support the services through which computing-related resources are provisioned to users, together with the resources provisioned to those users, may be referred to generally as the provider network underlay. Such resources typically include hardware and software in the form of many networked computer systems. The traffic and operations of the provider network underlay may be broadly subdivided into two categories in various embodiments: control plane traffic carried on a logical control plane and data plane operations carried on a logical data plane. The data plane represents movement of user data through the distributed computing system, while the control plane represents movement of control signals through the distributed computing system. The control plane generally includes one or more control plane components distributed across and implemented by one or more control servers. Control plane traffic generally includes management operations such as establishing isolated virtual networks for various customers, monitoring resource usage and health, identifying a particular host or server at which a requested compute instance is to be launched, provisioning additional hardware if needed, and so forth. The data plane includes customer resources (e.g., compute instances, containers, block storage volumes, databases, file stores) implemented on the provider network. Data plane traffic generally includes non-administrative operations, such as data communicated to and from customer resources. The control plane components are typically implemented on a set of servers separate from the data plane servers, and the control plane traffic and data plane traffic may be sent over separate/distinct networks. In some embodiments, control plane traffic and data plane traffic may be supported by different protocols. In some implementations, a message (e.g., a packet) sent on the provider network includes a flag to indicate whether the traffic is control plane traffic or data plane traffic. In some embodiments, the payload of the traffic may be examined to determine its type (e.g., whether it is control plane or data plane). Other techniques for differentiating traffic types are possible.
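As one illustration of the traffic-type distinction described above, the sketch below shows how a forwarding component might classify a packet as control plane or data plane traffic using an explicit flag, falling back to payload inspection; the field names and payload convention are assumptions, not an actual wire format.
```python
from enum import Enum

class TrafficType(Enum):
    CONTROL_PLANE = 0
    DATA_PLANE = 1

def classify_packet(packet: dict) -> TrafficType:
    # One approach described above: an explicit flag carried in the packet.
    if "plane_flag" in packet:
        return TrafficType(packet["plane_flag"])
    # Alternative described above: examine the payload to determine the type.
    payload = packet.get("payload", b"")
    return TrafficType.CONTROL_PLANE if payload.startswith(b"API:") else TrafficType.DATA_PLANE
```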
Some customer applications are easily migrated to the provider network environment, while other customer workloads need to remain on premises ("on-prem") due to low latency, high data volume, data security, or other customer data processing requirements. Exemplary on-premise environments include customer data centers, robotics integrations, field locations, co-location facilities, telecommunication facilities (e.g., near cell towers), and the like. To meet such customer requirements, the present disclosure relates to deploying provider network-like resources on premises. The term "provider underlying extension" (PSE) refers to a set of resources (e.g., hardware, software, firmware, configuration metadata, etc.) that a customer may deploy on premises (such as in a location geographically separate from the provider network) but that provides functionality (e.g., virtualized computing resources) that is the same as or similar to that provided in the provider network. Such resources may be physically delivered as one or more computer systems or servers shipped in a rack or cabinet such as those typically found in an on-premise location. The PSE may provide the customer with a set of features and capabilities similar to those of the provider network described above, deployable on premises. Indeed, from the perspective of a customer of the provider network, the PSE represents a local extension of the capability of the provider network that may be set up at any desired physical location that can accommodate the PSE (e.g., in terms of physical space, power, internet access, etc.). From the perspective of the provider network itself, the PSE may be treated as though it were logically located in the same provider network data center as the core provider network infrastructure, while being physically located at the customer-selected deployment site. In at least some embodiments, the customer that physically hosts the PSE may grant permissions to its own customers (e.g., other users of the provider network) to allow those users to launch instances hosting their respective workloads within the PSE at the customer's on-premise location, and in some cases to allow those workloads to access the customer's network.
In at least some embodiments, the PSE may be pre-configured, for example by a provider network operator, with an appropriate combination of hardware, software, and/or firmware elements to support various types of computing-related resources, and to do so in a manner that meets various local data processing requirements without compromising the security of the provider network itself or of any other customers of the provider network. In at least some embodiments, the PSE is generally managed through the same or a similar set of interfaces that the customer would use to access computing-related resources within the provider network. For example, customers may provision, manage, and operate computing-related resources within their on-premise PSEs, or PSEs at various deployment sites, through the provider network using the same Application Programming Interfaces (APIs) or console-based interfaces that they otherwise use to provision, manage, and operate computing-related resources within the provider network.
In at least some embodiments, the resources of the provider network instantiate various network components to ensure secure and reliable communications between the provider network and the PSE. Such components may establish one or more secure tunnels (e.g., VPNs) with the PSE. Such components may further partition control plane traffic and data plane traffic and treat each type of traffic differently based on several factors, including the direction of traffic (e.g., to or from the PSE). In at least some embodiments, the control plane service dynamically provisions and configures these network components for the deployed PSE. Such a control plane service may monitor networking components for each PSE and invoke a self-healing or repair mechanism designed to prevent loss of communication with the PSE due to a failure within the provider network.
One service typically provided to customers of the provider network is a block storage service that can act as virtualized persistent disks, in that instances are presented with a logical view of physical storage resources while the mapping of this logical storage space to the actual physical locations of the storage is handled by the virtualization system. One or more volumes may be replicated to provide high availability and durability to customers, where the replicas are typically stored on different servers. A customer may attach one or more block storage volumes to an instance, and a block storage client supporting the instance may use the virtualized block storage volumes to perform the block-based operations of the instance. For example, a customer may specify that a particular instance is to be launched from a given boot volume (e.g., a volume containing an operating system) and have another attached volume to support data stored by a customer application executing via the instance. To provide a high degree of flexibility in available storage, the provider network decouples the physical storage devices supporting a given attached volume from the computing resources supporting a given compute instance. Traditionally, a fleet of block storage servers supporting the block storage service partitions their attached physical storage devices (e.g., solid state drives, magnetic drives, etc.) into a number of logical volumes. A block storage server may support storage resources for hundreds or even thousands of instances. These block storage servers typically execute in a "bare metal" configuration in which, for example, the server software executes within an operating system environment running directly on dedicated server hardware rather than on a virtual machine or within a container.
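The sketch below illustrates, under assumed names and a simplified allocation scheme, how a block storage server might partition its attached physical storage devices into logical volumes and record volume attachments to instances; it is not the actual block storage service implementation.
```python
class BlockStorageServer:
    def __init__(self, devices: dict[str, int]):
        # device name -> capacity in GiB
        self.devices = devices
        self.volumes: dict[str, tuple[str, int, int]] = {}  # volume_id -> (device, offset_gib, size_gib)
        self.attachments: dict[str, str] = {}               # volume_id -> instance_id
        self._next_offset = {name: 0 for name in devices}

    def create_volume(self, volume_id: str, size_gib: int) -> None:
        # Carve a logical volume out of the first device with enough free space.
        for name, capacity in self.devices.items():
            if capacity - self._next_offset[name] >= size_gib:
                self.volumes[volume_id] = (name, self._next_offset[name], size_gib)
                self._next_offset[name] += size_gib
                return
        raise RuntimeError("no device has enough free capacity")

    def attach(self, volume_id: str, instance_id: str) -> None:
        # The instance's block storage client will direct block operations here.
        self.attachments[volume_id] = instance_id
```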
Because capacity in a provider underlying extension can be significantly limited compared to the capacity in an availability zone, this bare metal block storage server design may not be well suited for underlying extensions. For example, some underlying extensions may have only a single server (or another small number of servers), and thus it may not be feasible to dedicate an entire server to block storage resources, thereby preventing that server from being used for compute instances. To address this challenge, embodiments of the present disclosure virtualize the block storage server so that it can run within a compute instance, thereby enabling more flexible and efficient use of limited capacity. For example, a single server may be configured to host both compute instances and the block storage server instances that virtualize their attached volumes, thereby providing a high degree of flexibility in the use of the PSE's limited set of resources. A single server may also be partitioned into isolated failure domains to host multiple block storage servers (and even multiple replicas of a volume). A failure domain generally refers to a logical portion of a system that may fail without affecting other portions of the system. When executed on a bare metal system, the failure domain of the block storage system typically corresponds to the entire bare metal computer system. By decoupling failure domains from a per-server basis to a sub-server basis using virtualization, redundancy in the underlying hardware used to support the block storage server instances may be exploited to increase the number of failure domains within the PSE. Thus, each block storage server may individually manage a smaller amount of data, so that when a failure occurs, the workload required to recover the data associated with the failure is reduced. These block storage server instances may be created within a Virtual Private Cloud (VPC), and the block storage clients of instances communicating with the block storage volumes may also communicate within this same VPC. Advantageously, this enables VPC encryption to be exploited to achieve more secure communication within the underlying extension.
However, virtualizing the block storage servers in a provider underlying extension creates certain technical challenges, including initializing a block storage server from a boot volume (also referred to as a block storage server machine image) before any block storage server capable of storing that boot volume is running on the underlying extension. While a PSE provides low-latency computing resources to the on-premise facility, those resources may experience increased latency when reaching back to the provider network. While a PSE could rely on block storage servers in a region of the provider network, such a dependency would suffer from increased delays and is therefore undesirable. As described in further detail below, embodiments of the present disclosure address this challenge via a local boot technique that can load a boot volume for a block storage server onto the PSE's local storage from data stored in a region of the provider network, and then boot the block storage server so that it can serve volumes for other instances launched within the PSE. Thus, the disclosed local boot technique allows a customer to use a block storage server machine image to launch an instance in the underlying extension even when the block storage service itself is not yet present in the underlying extension.
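The outline below sketches the local boot technique at a high level; pse_host, region_object_store, and their methods are hypothetical interfaces used only to make the sequence concrete, not APIs from the patent.
```python
def local_boot_block_storage_server(pse_host, region_object_store, image_id: str):
    # 1. Copy the block storage server machine image (boot volume data) from
    #    the provider network region onto storage local to the PSE host.
    image_bytes = region_object_store.get(image_id)
    local_volume = pse_host.write_local_volume(name="bss-boot", data=image_bytes)

    # 2. Launch the block storage server instance directly from that local
    #    volume, since no block storage service exists in the PSE yet.
    bss_instance = pse_host.launch_instance(boot_volume=local_volume)

    # 3. Subsequent instances in the PSE can now boot from and attach volumes
    #    hosted by the virtualized block storage server instead of the region.
    return bss_instance
```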
The disclosed systems and techniques also protect the provider network from potential security issues that may arise from connecting a PSE to the provider network. In some embodiments, a PSE may require secure networking tunnels from the customer site where it is installed to the provider network underlay (e.g., the physical machine network) in order to operate. These tunnels may include virtual infrastructure components hosted both in virtualized compute instances (e.g., VMs) and on the underlay. Examples of tunnel components include VPCs as well as proxy compute instances and/or containers running on compute instances. Each server in the PSE may use at least two tunnels, one for control plane traffic and one for data plane traffic. As described in further detail below, intermediate resources located along a network path between the provider network underlay and the PSE may securely manage the traffic flowing between the underlay and the PSE.
In at least some embodiments, the provider network is a cloud provider network. A cloud provider network or "cloud" refers to a large pool of accessible virtualized computing resources, such as computing, storage, and networking resources, applications, and services. The cloud may provide convenient on-demand network access to a shared pool of configurable computing resources, which may be programmatically provisioned and released in response to customer commands. These resources may be dynamically provisioned and reconfigured to adjust to variable loads. Cloud computing can therefore be viewed as applications delivered as services over publicly accessible networks (e.g., the internet, cellular communication networks) and as hardware and software in cloud provider data centers that provide those services.
The cloud provider network may be formed as a number of regions, where a region is a geographic area in which the cloud provider clusters data centers. Each region may include two or more availability zones connected to one another via a private high-speed network (e.g., a fiber optic communication connection). An availability zone refers to an isolated fault domain including one or more data center facilities having power, networking, and cooling separate from those of another availability zone. Preferably, the availability zones within a region are located far enough from one another that a single natural disaster does not take more than one availability zone offline at the same time. Customers may connect to the availability zones of the cloud provider network via a publicly accessible network (e.g., the internet, a cellular communication network). A PSE as described herein may likewise be connected to one or more availability zones via a publicly accessible network.
The cloud provider network may include a physical network (e.g., sheet metal boxes, cables) referred to as the underlay. The cloud provider network may also include an overlay network of virtualized computing resources running on the underlay. Thus, network packets may be routed along the underlay network according to constructs (e.g., VPCs, security groups) in the overlay network. A mapping service may coordinate the routing of these network packets. The mapping service may be a regional distributed lookup service that maps a combination of overlay IP address and network identifier to an underlying IP address so that distributed underlay computing devices can look up where to send packets.
To illustrate, each physical host may have an IP address in the underlying network. Hardware virtualization techniques may enable multiple operating systems to run simultaneously on a host computer, for example, as virtual machines on the host. A hypervisor or virtual machine monitor on a host machine allocates hardware resources of the host machine among various virtual machines on the host machine and monitors the execution of the virtual machines. Each virtual machine may be provided with one or more IP addresses in the overlay network, and the virtual machine monitor on the host may know the IP address of the virtual machine on that host. The virtual machine monitor (and/or other devices or processes on the network underlay) may use encapsulation protocol techniques to encapsulate and route network packets (e.g., client IP packets) between virtualized resources on different hosts within the cloud provider network via the network underlay. The encapsulated packets may be routed between endpoints on the network bottom layer via overlay network paths or routes using the encapsulation protocol techniques on the network bottom layer. Encapsulation protocol technology can be viewed as providing a virtual network topology overlaid on the network floor. Encapsulation protocol techniques may include a mapping service that maintains a mapping directory that maps IP overlay addresses (public IP addresses) to underlying IP addresses (private IP addresses) that various processes on the cloud provider network may access in order to route packets between endpoints.
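A simplified rendering of the mapping lookup and encapsulation described above is shown below; the dictionary-based mapping service and the example addresses are assumptions for illustration only.
```python
MAPPING = {
    ("10.0.0.5", "ivn-102"): "192.0.2.11",   # (overlay IP, network identifier) -> underlying host IP
    ("10.0.0.9", "ivn-102"): "192.0.2.27",
}

def encapsulate(overlay_packet: dict, ivn_id: str) -> dict:
    # Look up the underlying addresses of the hosts for both overlay endpoints.
    src_underlying = MAPPING[(overlay_packet["src"], ivn_id)]
    dst_underlying = MAPPING[(overlay_packet["dst"], ivn_id)]
    return {
        "src": src_underlying,
        "dst": dst_underlying,
        "ivn_id": ivn_id,            # identifies which overlay network the payload belongs to
        "payload": overlay_packet,   # the original client IP packet, carried unchanged
    }
```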
Those skilled in the art will appreciate in view of this disclosure that certain embodiments may be capable of achieving various advantages, including some or all of the following: (a) enabling customers of a provider network operator to deploy a wide variety of applications in a location-independent manner using provider-managed infrastructure (e.g., PSEs) at a location selected by the customer, while still maintaining scalability, security, availability, and other operational advantages that the provider network may have; (b) reducing the amount of application data and results that must be transferred over long distances (such as on a link between a customer data center and a provider network data center); (c) improving the overall latency and responsiveness of an application that potentially may consume large amounts of data as input or produce as output by moving the application close to the data source/destination; and/or (d) improve the security of sensitive application data.
Fig. 1 is a block diagram depicting an exemplary provider network extended by a provider underlay extension portion located within a network outside of the provider network, in accordance with at least some embodiments. Within provider network 100, a customer may create one or more isolated virtual networks 102. The customer may launch compute instances 101 within an IVN to execute their applications. These compute instances 101 are hosted by underlay addressable devices (SADs) that are part of the provider network underlay (not shown). Similarly, a SAD that is part of the provider network underlay may host the control plane services 104. Exemplary control plane services 104 include: an instance management service (sometimes referred to as a hardware virtualization service) that allows a customer or other control plane service to launch and configure instances and/or IVNs; an object storage service that provides object storage; a block storage service that provides the ability to attach block storage to instances; database services that provide various database types, and the like.
It should be noted that the components depicted within provider network 100 may be considered logical components. As mentioned, these components are hosted by SADs of the provider network underlay (not shown). For example, the provider network underlay may host instances 101 using containers or virtual machines operating within an Isolated Virtual Network (IVN). Such containers or virtual machines are executed by SADs. As another example, the provider network underlay may use SADs in a bare metal configuration (e.g., without virtualization) to host one or more of the control plane services 104. In at least some implementations, a SAD refers to software (e.g., a server) executed by hardware that is addressable via a network address of the provider network rather than another network (e.g., a customer network, an IVN, etc.). In at least some implementations, a SAD may additionally refer to the underlying hardware (e.g., a computer system) executing that software.
As depicted, the provider network 100 communicates with a provider underlying extension (PSE) 188 deployed within a customer network 185 and a PSE 198 deployed within a customer network 195. Each PSE includes one or more underlay addressable devices (SADs), such as the SADs 189A-189N shown within the PSE 188. Such SADs 189 facilitate the provision of computing-related resources within the PSE. It should be noted that the depiction of a solid-lined box together with a dashed-lined box for a set of components, such as SADs 189A-189N, is generally used to indicate that one or more of those components may be present, in this and subsequent figures (although reference numbers in the corresponding text may refer to components in singular or plural form, with or without an alphabetic suffix). The customer gateway/router 186 provides connectivity between the provider network 100 and the PSE 188 and between the PSE 188 and other customer resources 187 (e.g., other on-premise servers or services connected to the customer network 185). Similarly, the customer gateway/router 196 provides connectivity between the provider network 100 and the PSE 198 and between the PSE 198 and other customer resources 197. Various connectivity options exist between the provider network 100 and a PSE, such as a public network like the internet, as shown for the PSE 188, or a direct connection, as shown for the PSE 198.
Within the provider network 100, control plane traffic 106 is generally (but not always) directed to SADs, while data plane traffic 104 is generally (but not always) directed to instances. For example, some SADs may vend APIs that allow instances to be launched and terminated. The control plane service 104 may send a command to the API of such a SAD via the control plane to launch a new instance in the IVN 102.
An IVN, as the name implies, may comprise a hosted (e.g., virtualized) set of resources that are logically isolated or separated from other resources of the provider network (e.g., other IVNs). The control plane service may set up and configure IVNs, including assigning an identifier to each IVN to distinguish it from other IVNs. The provider network may provide various ways to permit communication between IVNs, such as by setting a peer relationship between IVNs (e.g., a gateway in one IVN is configured to communicate with a gateway in another IVN).
IVNs can be established for a variety of purposes. For example, an IVN may be set for a particular client by setting aside a set of resources for exclusive use by the client, which allows great flexibility in the networking configuration of the set of resources provided to the client. Within their IVN, customers may set up subnets, assign desired private IP addresses to various resources, set security rules governing incoming and outgoing traffic, and so on. In at least some embodiments, a set of private network addresses set within one IVN is by default inaccessible from another IVN (or more generally from outside the IVN).
Tunneling techniques facilitate the traversal of IVN traffic between instances hosted by different SADs on the provider network 100. For example, a newly launched instance within the IVN 102 may have IVN address A and be hosted by a SAD with underlying address X, while instance 101 may have IVN address B and be hosted by a SAD with underlying address Y. To facilitate communication between these compute instances, SAD X encapsulates packets sent from the newly launched instance to instance 101 (from IVN address A to IVN address B) within the payload of packets addressed using the underlying addresses of the SADs hosting the respective instances (from underlying address X to underlying address Y). Packets sent between the SADs may also include an identifier of the IVN 102 to indicate that the data is destined for the IVN 102 rather than another IVN hosted by the SAD with underlying address Y. In some implementations, the SADs further encrypt the packets sent between the instances within the payload of the packets sent between the SADs, using an encryption key associated with the IVN. In at least some implementations, the encapsulation and the encryption are performed by a software component of the SAD that hosts the instance.
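The toy function below renders the encapsulation just described using the example addresses from the text (IVN addresses A and B, underlying addresses X and Y); the optional encrypt callable stands in for per-IVN authenticated encryption and is not a real API.
```python
def encapsulate_ivn_packet(inner: dict, ivn_id: str, encrypt=None) -> dict:
    # inner uses IVN (overlay) addressing, e.g. {"src": "A", "dst": "B", "payload": b"..."}
    body = repr(inner).encode()
    if encrypt is not None:
        # In practice this would be authenticated encryption keyed per IVN.
        body = encrypt(body)
    return {
        "src": "X",          # underlying address of the SAD hosting the sending instance
        "dst": "Y",          # underlying address of the SAD hosting instance 101
        "ivn_id": ivn_id,    # e.g., the identifier of IVN 102
        "payload": body,
    }
```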
For PSEs, the provider network 100 includes one or more network components to effectively extend the provider network infrastructure outside of the provider network 100 to PSEs connected to customers' on-premise networks. Such components may ensure that data plane and control plane operations directed to a PSE are securely, reliably, and transparently conveyed to the PSE. In the illustrated implementation, the PSE interface 108, PSE SAD proxy 110, and PSE SAD anchor 112 facilitate data plane and control plane communications between the provider network 100 and the PSE 188. Similarly, the PSE interface 118, PSE SAD proxy 120, and PSE SAD anchor 122 facilitate data plane and control plane communications between the provider network 100 and the PSE 198. As described herein, a PSE interface receives control and data plane traffic from the provider network, sends such control plane traffic to the PSE SAD proxy, and sends such data plane traffic to the PSE. The PSE interface also receives data plane traffic from the PSE and sends such data plane traffic to the appropriate destination within the provider network. The PSE SAD proxy receives control plane traffic from the PSE interface and sends such control plane traffic to the PSE SAD anchor. The PSE SAD anchor receives control plane traffic from the PSE SAD proxy and sends such control plane traffic to the PSE. The PSE SAD anchor also receives control plane traffic from the PSE and sends such control plane traffic to the PSE SAD proxy. The PSE SAD proxy also receives control plane traffic from the PSE SAD anchor and sends such control plane traffic to the appropriate destination within the provider network. Other implementations may employ different combinations or configurations of network components to facilitate communication between the provider network 100 and the PSEs (e.g., the functionality of the PSE interface, PSE SAD proxy, and/or PSE SAD anchor may be variously combined, such as by applications that perform the operations of both the PSE interface and the PSE SAD proxy, the operations of both the PSE SAD proxy and the PSE SAD anchor, or the operations of all three components).
As indicated above, each PSE has one or more underlying network addresses for its SADs (e.g., SADs 189A-189N). Since those underlying addresses cannot be reached directly via the provider network 100, the PSE interfaces 108, 118 masquerade for them using attached virtual network addresses (VNAs) that match the underlying addresses of the respective PSEs. As depicted, the PSE interface 108 has an attached VNA 150 matching the PSE 188 SAD addresses, and the PSE interface 118 has an attached VNA 152 matching the PSE 198 SAD addresses. For example, traffic destined for the SAD with Internet Protocol (IP) address 192.168.0.10 within the PSE 188 is sent to the PSE interface 108 having the attached virtual address 192.168.0.10, and traffic destined for the SAD with IP address 192.168.1.10 within the PSE 198 is sent to the PSE interface 118 having the attached virtual address 192.168.1.10. It should be noted that IPv4 or IPv6 addressing may be used. In at least some embodiments, a VNA is a logical construct that implements various networking-related attributes, such as the ability to programmatically transfer an IP address between instances. Such transfers may be referred to as "attaching" and "detaching" a VNA to and from an instance.
At a high level, the PSE interface is effectively a packet forwarding component that routes traffic based on whether the traffic is control plane traffic or data plane traffic. It should be noted that, given the underlying addressing and encapsulation techniques described above, both control and data plane traffic are routed to the PSE interface, as both are destined for a SAD. In the case of control plane traffic, the PSE interface routes the traffic to the PSE SAD proxy based on the SAD address. In the case of data plane traffic, the PSE interface establishes and serves as an endpoint for one or more encrypted data plane traffic tunnels between the provider network 100 and the PSE (e.g., tunnel 191 between the PSE interface 108 and the PSE 188, tunnel 193 between the PSE interface 118 and the PSE 198). For data plane traffic received from the provider network 100, the PSE interface encrypts the traffic for transmission to the PSE via the tunnel. For data plane traffic received from the PSE, the PSE interface decrypts the traffic, optionally verifies the SAD addressing of the packets, and sends the traffic to the identified SAD destination via the provider network 100. It should be noted that if the PSE interface receives traffic from the PSE that does not conform to the expected format (e.g., protocol) used to transport data plane traffic, the PSE interface may drop such traffic. It should also be noted that the PSE interface can verify the addressing of encapsulated packets to ensure that the originator of the traffic (e.g., an instance hosted by the PSE within a particular IVN) is permitted to send the traffic to the addressed destination (e.g., an instance hosted by the provider network within the same or a different IVN).
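The following sketch captures the forwarding decision just described for the PSE interface; the class, method, and tunnel-object names are assumptions, and error handling is reduced to the drop behavior mentioned above.
```python
class PSEInterface:
    def __init__(self, sad_proxy_addrs: dict, data_plane_tunnel):
        self.sad_proxy_addrs = sad_proxy_addrs      # SAD underlying address -> PSE SAD proxy address
        self.data_plane_tunnel = data_plane_tunnel  # encrypted tunnel endpoint toward the PSE

    def forward_from_provider_network(self, packet: dict, is_control_plane: bool):
        if is_control_plane:
            # Control plane traffic goes to the PSE SAD proxy for the addressed SAD.
            return ("to_proxy", self.sad_proxy_addrs[packet["dst"]], packet)
        # Data plane traffic is encrypted and sent over the tunnel to the PSE.
        return ("to_tunnel", self.data_plane_tunnel.encrypt_and_send(packet))

    def forward_from_pse(self, tunnel_frame: bytes):
        try:
            packet = self.data_plane_tunnel.decrypt(tunnel_frame)
        except ValueError:
            return None  # drop traffic that does not conform to the expected tunnel format
        # Optionally verify the SAD/IVN addressing before forwarding into the provider network.
        return ("to_provider_network", packet)
```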
Each SAD in the PSE has a corresponding set of one or more PSE interfaces, and each member of the set establishes one or more tunnels with the PSE for data plane traffic. For example, if there are four PSE interfaces for a PSE with four SADs, the PSE interfaces each establish a secure tunnel (e.g., sixteen tunnels) with the data plane traffic endpoints of each of the SADs. Alternatively, a group of PSE interfaces may be shared by multiple SADs by attaching an associated VNA to each member of the group.
Each PSE has one or more PSE SAD proxies and one or more PSE SAD anchors that handle control plane traffic between the provider network 100 and the PSE's SADs. Control plane traffic is typically in the form of command-response or request-response exchanges. For example, a control plane service of the provider network 100 may issue a command to a PSE SAD to launch an instance. Since management of PSE resources is facilitated from the provider network, control plane commands sent over the secure tunnel should not typically originate from the PSE. At a high level, the PSE SAD proxy acts as a stateful security boundary between the provider network 100 and the PSE (such a boundary is sometimes referred to as a data diode). To this end, the PSE SAD proxy may employ one or more techniques, such as applying various security policies or rules to received control plane traffic. It should be noted that other control plane services 104 may indirectly or directly provide a public-facing API to allow instances hosted by the PSE to issue commands to the provider network 100 via non-tunneled communications (e.g., over a public network such as the internet).
For traffic originating within the provider network 100 and destined for a PSE, the PSE SAD proxy may provide control plane endpoint APIs for its corresponding SAD within the PSE. For example, a PSE SAD proxy for a PSE SAD that can host instances may provide an API consistent with the API through which that SAD receives control plane operations to launch, configure, and terminate instances. Depending on the API calls and associated parameters received by the PSE SAD proxy, the PSE SAD proxy may perform various operations. For some operations, the PSE SAD proxy may pass the operation and associated parameters through to the destination SAD without modification. In some implementations, the PSE SAD proxy may verify that the parameters of API calls received from within the provider network 100 are appropriate for the API before passing those operations through.
For some API calls or associated parameters, the PSE SAD proxy may act as an intermediary to prevent sensitive information from being sent outside of the provider network 100. Exemplary sensitive information includes cryptographic information such as encryption keys, network certificates, and the like. For example, the PSE SAD proxy may decrypt data using a sensitive key and re-encrypt the data using a key that may be exposed to the PSE. As another example, the PSE SAD proxy may terminate a first secure session (e.g., a Transport Layer Security (TLS) session) originating within the provider network 100 and create a new secure session with the corresponding SAD using a different certificate, to prevent provider network certificates from being leaked to the PSE. Thus, the PSE SAD proxy may receive certain API calls from within the provider network 100 that include sensitive information and issue substitute or replacement API calls to the PSE SAD in which the sensitive information is replaced.
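As a hedged illustration of the substitution role just described, the function below swaps provider-network-only key material and certificates for PSE-scoped equivalents before an API call leaves the provider network; the field names and key-management objects are hypothetical.
```python
def proxy_api_call(call: dict, provider_keys, pse_keys) -> dict:
    outbound = dict(call)
    if "encrypted_payload" in call:
        # Decrypt with a key that must stay inside the provider network, then
        # re-encrypt with a key that is allowed to be exposed to the PSE.
        plaintext = provider_keys.decrypt(call["encrypted_payload"])
        outbound["encrypted_payload"] = pse_keys.encrypt(plaintext)
    if "tls_certificate" in call:
        # Never forward provider-network certificates; use a PSE-scoped one.
        outbound["tls_certificate"] = pse_keys.certificate_for_pse()
    return outbound
```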
For traffic originating from the PSE and destined for the provider network 100, the PSE SAD proxy may, for example, discard all control plane commands or requests originating from the PSE, or discard only those commands or requests that are not directed to public-facing control plane endpoints within the provider network.
In some implementations, the PSE SAD proxy may process responses to control plane operations depending on the nature of the expected response (if any). For example, for some responses, the PSE SAD proxy may simply drop the response without sending any message to the originator of the corresponding command or request. As another example, for some responses, the PSE SAD proxy may sanitize the response to ensure that it complies with the expected response format of the corresponding command or request before sending the sanitized response to the originator of the corresponding command or request via the control plane traffic 107. As another example, the PSE SAD proxy may generate a response (either immediately or after receiving the actual response from the SAD) and send the generated response to the originator of the corresponding command or request via the control plane traffic 107.
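The sketch below expresses the three response-handling behaviors just described (drop, sanitize, or generate) as a per-operation policy table; the operation names and schema helper are illustrative assumptions.
```python
RESPONSE_POLICY = {
    "TerminateInstance": "drop",
    "DescribeInstance": "sanitize",
    "LaunchInstance": "generate",
}

def handle_sad_response(op: str, response: dict, expected_schema):
    policy = RESPONSE_POLICY.get(op, "drop")
    if policy == "drop":
        # Nothing is sent back to the originator of the command or request.
        return None
    if policy == "sanitize":
        # Strip anything that falls outside the expected response format.
        return expected_schema.keep_known_fields(response)
    # Proxy-generated response, sent immediately or after the SAD replies.
    return {"status": "accepted", "op": op}
```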
As part of acting as a security boundary between the provider network 100 and the PSE, the PSE SAD proxy may track the communication state between components of the provider network (e.g., the control plane services 104) and each SAD of the PSE. The state data may include session keys for the duration of a session, pending outbound API calls with their associated sources and destinations to track outstanding responses, the relationship between API calls received from within the provider network 100 and those issued to the SAD with sensitive information replaced, and so on.
In some embodiments, the PSE SAD proxy may provide stateful communications for other PSE-to-provider-network communications in addition to control plane traffic. Such communications may include Domain Name System (DNS) traffic, Network Time Protocol (NTP) traffic, and operating system activation traffic (e.g., for Windows activation).
In some implementations, only certain components of the PSE are able to act as endpoints for encrypted control plane traffic tunnels with the provider network 100. To provide redundancy and reliability in the connection between the provider network 100 and the PSE, a PSE SAD anchor may serve as the provider-network-side endpoint for each of the PSE's available tunnel endpoints. As depicted, the PSE SAD anchor 112 is used to tunnel control plane traffic to the PSE 188 via tunnel 190, and the PSE SAD anchor 122 is used to tunnel control plane traffic to the PSE 198 via tunnel 192.
Various embodiments may limit the blast radius of any attempted attack originating outside the provider network (e.g., from a PSE, should the PSE become compromised) by using the techniques for processing traffic described above and by isolating the network components exposed to that traffic from other parts of the provider network 100. In particular, the network components may operate within one or more IVNs to limit how far an attacker can penetrate, thereby protecting the operation of the provider network and other customers. Thus, various embodiments may instantiate the PSE interfaces, PSE SAD proxies, and PSE SAD anchors as applications executed by virtual machines or containers running within one or more IVNs. In the illustrated embodiment, multiple groups of PSE interfaces for different PSEs run within a multi-tenant IVN (e.g., the PSE interface IVN 132 for the PSEs 188 and 198). In other embodiments, each group of PSE interfaces may operate in a single-tenant IVN. Further, each group of PSE SAD proxies and each group of PSE SAD anchors for a given PSE operates within a single-tenant IVN (e.g., the PSE SAD proxy IVN 134 for the PSE 188, the PSE SAD anchor IVN 136 for the PSE 188, the PSE SAD proxy IVN 138 for the PSE 198, and the PSE SAD anchor IVN 140 for the PSE 198).
It should be noted that the redundancy provided by operating multiple instances of each of the network components (e.g., the PSE interfaces, PSE SAD proxies, and PSE SAD anchors) allows the provider network to periodically recycle the instances hosting those components without interrupting PSE-provider network communications. Recycling may involve, for example, restarting an instance, or starting a new instance and reconfiguring the other instances with, for example, the address of the recycled instance. Periodic recycling limits the window of time during which an attacker can exploit a compromised network component (should a network component become compromised).
The PSE connectivity manager 180 manages the setup and configuration of the network components that provide connectivity between the provider network 100 and the PSEs. As mentioned above, the PSE interfaces 108, 118, the PSE SAD proxies 110, 120, and the PSE SAD anchors 112, 122 may be hosted as instances by the provider network underlay. The PSE connectivity manager 180 may request or initiate the launch of PSE interfaces, PSE SAD proxies, and PSE SAD anchors when a PSE is shipped to a customer and/or when that PSE comes online and exchanges configuration data with the provider network. Furthermore, the PSE connectivity manager 180 may further configure the PSE interfaces, the PSE SAD proxies, and the PSE SAD anchors. For example, the PSE connectivity manager 180 may attach the VNAs corresponding to the PSE SADs to the PSE interfaces, provide the addresses of the PSE SAD proxies for the PSE SADs to the PSE interfaces, and provide the addresses of the PSE SAD anchors for the PSE to the PSE SAD proxies. Further, the PSE connectivity manager 180 may configure the IVNs of the various components, for example to allow communication between the PSE interface IVN 132 for a PSE and the PSE SAD proxy IVN, and between the PSE SAD proxy IVN for a PSE and the PSE SAD anchor IVN.
It should be noted that, to facilitate establishment of the tunnels 190-193, the tunnel endpoints may have one or more attached VNAs or assigned physical network addresses that can receive traffic from outside their respective networks (e.g., from outside the provider network for the PSE interfaces and PSE SAD anchors, and from outside the customer network for the tunnel endpoints of the PSE). For example, the PSE 188 may have a single outward-facing network address and manage communications for multiple SADs using Port Address Translation (PAT), or it may have multiple outward-facing network addresses. Each PSE SAD anchor 112, 122 may have or share (e.g., via PAT) an outward-facing network address, and each PSE interface 108, 118 may have or share (e.g., via PAT) an outwardly accessible network address.
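A minimal illustration of Port Address Translation (PAT) as mentioned above appears below: several inner SAD endpoints share one outward-facing network address by mapping each connection to a distinct outward port. The addresses and port range are made up for the example.
```python
class PortAddressTranslator:
    def __init__(self, outward_ip: str):
        self.outward_ip = outward_ip
        self.next_port = 40000
        self.table: dict[int, tuple[str, int]] = {}   # outward port -> (inner SAD addr, inner port)

    def outbound(self, inner_addr: str, inner_port: int) -> tuple[str, int]:
        # Allocate an outward port for a connection initiated by an inner SAD endpoint.
        port = self.next_port
        self.next_port += 1
        self.table[port] = (inner_addr, inner_port)
        return (self.outward_ip, port)

    def inbound(self, outward_port: int) -> tuple[str, int]:
        # Reverse the mapping for return traffic arriving at the shared address.
        return self.table[outward_port]
```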
FIG. 2 is a block diagram depicting an exemplary provider underlying extension portion, according to at least some embodiments. In the illustrated embodiment, the PSE 188 includes one or more PSE frameworks 202 and one or more hosts 220. At a high level, each host 220 may be functionally (and possibly structurally) similar to at least some of the computer systems that form part of the provider network infrastructure (e.g., those hosting the underlying resources for instances within the provider network), while the PSE framework 202 provides supporting infrastructure to emulate the provider network infrastructure within the PSE and to provide connectivity to the provider network via the control and data plane traffic tunnels (e.g., tunnels 190-193 of FIG. 1).
In at least some embodiments, each PSE framework 202 may send or receive control or data plane traffic from each host 220 in a mesh architecture indicated by PSE control plane traffic 240 and PSE data plane traffic 242, and vice versa. Such redundancy takes into account the level of reliability that the customer may desire for the provider network.
The PSE framework 202 includes one or more control plane tunnel endpoints 204 that terminate encrypted tunnels (e.g., tunnels 190, 192) that carry control plane traffic. In some embodiments, the provider network 100 hosts a PSE SAD anchor for each control plane tunnel endpoint 204. Returning to the provider network, the PSE SAD agent (e.g., agent 110) may distribute control plane traffic to the PSE SAD anchor (e.g., anchor 112), effectively distributing the load of control plane traffic across the PSE framework 202 of the PSE 188. The PSE framework 202 also includes one or more data plane tunnel endpoints 206 that terminate encrypted tunnels (e.g., tunnels 191, 193) that carry data plane traffic from the PSE interface of the provider network, which may be connected in a mesh (e.g., a given PSE interface 108 tunnels with the data plane tunnel endpoint 206 of each PSE framework 202).
As indicated above, control plane traffic and data plane traffic may be encapsulated in packets with SAD-based addressing, having a SAD as the source and a SAD as the destination. As depicted, the PSE framework 202 is SAD 289 and the host 220 is SAD 290. It should be noted that the SADs (e.g., SADs 289, 290) within the PSE 188 may also provide secure session termination (e.g., TLS termination) for the secure sessions established with the corresponding PSE SAD agents (e.g., PSE SAD agent 110) within the provider network.
A SAD vends one or more control plane APIs to handle control plane operations directed to the resources managed by the SAD. For example, the PSE manager 210 of the PSE framework 202 may vend a control plane API for managing components of the PSE framework 202. One such component is the PSE gateway 208, which routes control and/or data plane traffic into and out of the PSE 188, such as control plane traffic addressed to SAD 289 to the PSE manager 210 and control or data plane traffic addressed to SAD 290 to the host manager 222. The PSE gateway 208 may further facilitate communication with a customer network, such as to or from other customer resources 187 that are accessible via the network of the PSE deployment site (e.g., customer network 185).
The API of the PSE manager 210 may include one or more commands to configure the PSE gateway 208 of the PSE framework 202. Other components 212 of the PSE framework 202 may include various applications or services that participate in the operation of the PSE underlay on behalf of the hosts 220, such as DNS, Dynamic Host Configuration Protocol (DHCP), and/or NTP services.
Host manager 222 may vend a control plane API for managing components of the host 220. In the illustrated embodiment, the host manager 222 includes an instance manager 224 and a network manager 226. The instance manager 224 may handle API calls related to the management of the host 220, including commands to start, configure, and/or terminate instances hosted by the host 220. For example, an instance management service in the provider network (not shown) may issue a control plane command to the instance manager 224 to start an instance on the host 220. As depicted, host 220 hosts a customer instance 232 running inside a customer IVN 233, a third-party (3P) instance 234 running inside a 3P IVN 235, and a service instance 236 running inside a service IVN 237. It should be noted that each of these IVNs 233, 235, 237 may extend an existing IVN established within the provider network. The customer instance 232 may execute some customer application or workload, the 3P instance 234 may execute an application or workload of another party that the customer has granted permission to launch instances within the PSE 188, and the service instance 236 may locally execute a service provided by the provider network to the PSE 188 (e.g., a block storage service, a database service, etc.).
The network manager 226 may handle SAD-addressed data plane traffic received by the host 220. For such traffic, the network manager performs any required decapsulation of the IVN packet before sending it to the addressed hosted instance. Further, the network manager 226 may handle the routing of traffic sent by hosted instances. When a hosted instance attempts to send traffic to another locally hosted instance (i.e., on the same host), the network manager 226 may forward that traffic to the addressed instance. When a hosted instance attempts to send traffic to a non-local instance (i.e., not on the same host), the network manager 226 may look up the underlying address of the device hosting the non-local instance, encapsulate and optionally encrypt the corresponding packet as a SAD-addressed packet, and send that packet on the data plane (e.g., to another host within the PSE via the PSE gateway 208, or back to the provider network). It should be noted that the network manager 226 may include or have access to various data that facilitates routing data plane traffic (e.g., to look up the address of the SAD hosting an instance having the IVN network address given in the destination of a packet received from a hosted instance).
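The routing decision described for the network manager 226 can be illustrated with a brief sketch. This is a hypothetical illustration only; the function and variable names are not taken from any embodiment described herein, and a real network manager would also handle encryption, IVN encapsulation details, and error cases.

```python
# Simplified sketch of a host network manager's data plane routing decision.
# All names and structures here are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Packet:
    src_ivn_addr: str   # IVN address of the sending instance
    dst_ivn_addr: str   # IVN address of the destination instance
    payload: bytes

def route_instance_traffic(packet, local_instances, ivn_to_sad, local_sad, send_on_substrate):
    """Deliver locally when the destination instance is on this host;
    otherwise encapsulate the IVN packet as a SAD-addressed packet and send
    it on the data plane (e.g., via the PSE gateway or back to the provider
    network)."""
    if packet.dst_ivn_addr in local_instances:
        local_instances[packet.dst_ivn_addr](packet)   # deliver to the local instance
        return
    remote_sad = ivn_to_sad[packet.dst_ivn_addr]       # look up the hosting SAD
    encapsulated = {
        "src_sad": local_sad,
        "dst_sad": remote_sad,
        "inner": packet,                               # original IVN packet as payload
    }
    send_on_substrate(encapsulated)
```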
Fig. 3 is a block diagram depicting exemplary connectivity between a provider network and a provider underlying extension portion, in accordance with at least some embodiments. In particular, fig. 3 depicts exemplary connectivity between a provider network and a PSE. It should be noted that for fig. 3, and as indicated at the top of the figure, the term "inbound" refers to traffic received by the provider network from the PSE, and the term "outbound" refers to traffic sent by the provider network to the PSE. Although not shown, for this example assume that the PSE includes two PSE frameworks 202 and two hosts 220, for a total of four SADs. The PSE frameworks provide tunnel endpoints 204A, 204B for control plane traffic and tunnel endpoints 206A, 206B for data plane traffic. Outbound traffic is decrypted and sent to destinations within the PSE underlay via the PSE gateways 208A, 208B.
For each of the four SADs, the provider network includes a VNA, one or more PSE interfaces, and one or more PSE SAD agents. In this example, for a given PSE SAD, the provider network includes a PSE SAD VNA 304, two PSE interfaces 108A, 108B, and two PSE SAD agents 110A, 110B. As indicated, the PSE interfaces and PSE SAD agents may together be referred to as a slice, with each slice corresponding to a particular SAD within the PSE. In other embodiments, the PSE interfaces may be shared across all of the VNAs of the PSE rather than being dedicated to the single VNA of one of the SADs.
The PSE SAD VNA 304 serves as the front end for a given SAD of the PSE, through which other components of the provider network may send traffic to and receive traffic from the corresponding SAD of the PSE. A load balancer (not shown) may route outbound traffic sent to the PSE SAD VNA 304 to one of the PSE interfaces 108A, 108B. The depicted PSE interfaces 108A, 108B for the given slice, and the PSE interfaces for other slices (not shown), operate within the PSE interface IVN 132. The PSE interfaces 108A, 108B send data plane traffic to the PSE via the data plane traffic tunnels and send control plane traffic to the PSE by forwarding it to the slice's PSE SAD agents 110A, 110B. The PSE interfaces 108A, 108B store (or access) the network addresses of the PSE SAD agents of the associated SAD, the network addresses of the data plane tunnel endpoints, and one or more keys of or associated with the data plane tunnel endpoints of the PSE to enable secure communications with those endpoints.
In at least some embodiments, the PSE interfaces 108A, 108B establish a secure tunnel with each data plane tunnel endpoint 206A, 206B for data plane traffic, resulting in N data plane tunnels, where N is the number of PSE interfaces per SAD (assuming each SAD has the same number of interfaces) multiplied by the number of data plane tunnel endpoints multiplied by the number of SADs. In this example, sixteen data plane tunnels are established between the PSE interfaces and the data plane tunnel endpoints (i.e., 2 PSE interfaces per SAD x 2 data plane tunnel endpoints x 4 SADs).
The PSE SAD agents 110A, 110B receive control plane traffic from the PSE interfaces 108A, 108B, perform various operations described elsewhere herein, and send the control plane traffic to the PSE via either of the two PSE SAD anchors 112A, 112B. Similarly, the PSE SAD agents 110A, 110B receive control plane traffic from either of the two PSE SAD anchors 112A, 112B, perform various operations described elsewhere herein, and send the control plane traffic 107 to destinations within the provider network. The depicted PSE SAD agents 110A, 110B for the given slice, and the PSE SAD agents for other slices (not shown), operate within the PSE SAD agent IVN 134. The PSE SAD agents 110A, 110B store (or access) the network addresses of the PSE SAD anchors.
In at least some embodiments, the PSE SAD agents access a shared data store 306 or are otherwise able to exchange information. Such information exchange may be useful for several reasons. For example, recall that a PSE SAD agent may vend an API interface to emulate the API interface of the associated SAD within the PSE. Because some communications may be stateful, and because various load balancing techniques may prevent the same PSE SAD agent from handling all of the communications for a given set of operations, one PSE SAD agent may need to access the state of communications previously handled by a different PSE SAD agent (e.g., PSE SAD agent 110A sends a control plane operation to the PSE and PSE SAD agent 110B receives the response to that control plane operation from the PSE). For inbound control plane traffic, a PSE SAD agent may check whether an inbound message is consistent with the expected state and, if so, send the message via control plane traffic 107, as described elsewhere herein. If not, the PSE SAD agent 110A, 110B may drop the traffic. As another example, recall that the PSE SAD agents may bridge separate secure sessions (e.g., TLS sessions) to prevent provider network credentials from being sent to the PSE. Likewise, since the PSE SAD agent handling an outbound message may be different from the PSE SAD agent handling the response to that message, the PSE SAD agent handling the response message may use the same key established between the originator of the outbound message and the PSE SAD agent that handled the outbound message in order to send a secure response message to the originator via control plane traffic 107.
In this example, each PSE framework provides a single control plane tunnel endpoint 204. For each of the available control plane tunnel endpoints 204, the provider network includes a PSE SAD anchor. In this example, the provider network includes two PSE SAD anchors 112A, 112B. The PSE SAD anchors 112A, 112B operate within the PSE SAD anchor IVN 136. A PSE SAD anchor 112 receives control plane traffic from each of the eight PSE SAD agents (two PSE SAD agents per slice for each of the four SADs) and sends that traffic to the PSE. A PSE SAD anchor also receives control plane traffic from the PSE and sends that traffic to one of the two PSE SAD agents associated with the SAD within the PSE that originated the traffic. The PSE SAD anchors 112A, 112B store (or access) the network addresses of the PSE SAD agents for each SAD, the network addresses of the PSE's control plane tunnel endpoints, and one or more keys of or associated with the PSE's control plane tunnel endpoints to enable secure communications with those endpoints.
In at least some embodiments, the network components or the provider network may employ load balancing techniques to distribute the workload of routing control and data plane traffic between the provider network and the PSE. For example, traffic sent to the PSE SAD VNA 304 may be distributed between the PSE interfaces 108A, 108B. As another example, each PSE interface 108 may distribute traffic between the data plane tunnel endpoints 206A, 206B. As another example, each PSE interface 108 may distribute traffic between the PSE SAD agents 110A, 110B. As another example, each PSE SAD agent 110 may distribute outbound traffic between the PSE SAD anchors 112A, 112B. As another example, each PSE SAD anchor 112 may distribute inbound traffic between the PSE SAD agents 110A, 110B. In any case, such load balancing may be performed by the sending entity or by a load balancer (not shown). Exemplary load balancing techniques include employing a load balancer with a single VNA that distributes traffic to multiple components "behind" that address, providing each sender with the addresses of multiple recipients and having the sender select among those recipients at the application level, and so forth.
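As a concrete illustration of the last technique mentioned above (giving a sender the addresses of multiple recipients and letting it choose at the application level), the following sketch shows a simple round-robin sender. The class name, addresses, and transport callable are assumptions made for the example, not elements of the embodiments above.

```python
# Illustrative application-level load balancing: a sender is given the
# addresses of multiple recipients (e.g., PSE SAD anchors) and rotates
# across them. Names and addresses are hypothetical placeholders.

import itertools

class RoundRobinSender:
    def __init__(self, recipient_addresses, transport):
        self._cycle = itertools.cycle(recipient_addresses)
        self._transport = transport  # callable: (address, message) -> None

    def send(self, message):
        # Pick the next recipient in rotation and hand the message to the
        # underlying transport (e.g., a secure session to that endpoint).
        address = next(self._cycle)
        self._transport(address, message)

# Example: distributing outbound control plane traffic between two anchors.
sender = RoundRobinSender(["10.0.1.10", "10.0.1.11"], lambda addr, msg: None)
sender.send(b"control-plane-message")
```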
It should be noted that although the embodiments depicted in fig. 1-3 show separate tunnels being established for control plane traffic and data plane traffic, other embodiments may use one or more tunnels for both control and data plane traffic. For example, the PSE interface may route data plane traffic to the PSE SAD anchor for transmission to the PSE over a shared tunnel, bypassing additional operations performed by the PSE SAD agent on control plane traffic.
Figs. 4 and 5 are block diagrams depicting an exemplary virtualized block storage system in accordance with at least some embodiments. As depicted in fig. 4, a provider network 400 or PSE 488 includes a host 420A. Host 420A is the host of a block storage server instance 450. In hosting the block storage server instance 450, the host 420A has provisioned an amount of host storage 421 as provisioned storage 447 (e.g., a virtual drive or disk) for the block storage server instance 450. The block storage server instance 450 may partition the provisioned storage 447 and create and host volumes on behalf of other instances. An instance management service (not shown) of provider network 400 may initiate the launch of such block storage server instances via the instance manager 426 on a host of provider network 400 or on a host of PSE 488. The instance manager 426 may be a virtual machine manager (VMM), hypervisor, or other virtualization program that can provision host resources (e.g., compute, memory, network, storage) for instances (e.g., virtual machines) and start and terminate those instances.
Host 420A is also the host of instance A 430 within instance IVN 432. Block storage server instance 450 provides block storage volume A 434 to instance A 430 and block storage volume B 440 to another instance (not shown). Host 420A includes a block storage client 460A through which hosted instances access block storage volumes hosted by a block storage server (e.g., by providing a virtual block storage interface to the instance). In particular, block storage client 460 may be a software program that intercepts or otherwise receives block storage operations issued to a volume "attached" to an instance (e.g., block storage volume A 434 attached to instance A 430).
As shown in fig. 5, provider network 400 or PSE 488 includes two hosts 420A and 420B. Likewise, host 420A is the host of block storage server instance 450. Host 420B is the host of instance B436 within instance IVN 438. Block storage server instance 450 provides block storage volume B440 to instance B436. Host 420B includes a block storage client 460B through which a hosted instance accesses a block storage volume hosted by a block storage server.
When a volume is attached to an instance, the block storage client obtains a mapping that relates the volume to the block storage server instances hosting its replicas. The block storage client may open a communication session (e.g., via a network-based data transfer protocol) with each of those block storage server instances. After receiving a block storage operation from an instance, the block storage client issues the operation to the appropriate block storage server to complete the operation for the instance. For example, and referring to fig. 4, instance A 430 may issue a read of a block of data at a given block address (e.g., logical block address, or LBA) from volume A 434. Block storage client 460A receives the read operation and, based on the mapping data that associates the instance-attached volume A 434 with block storage server instance 450, issues an operation including the requested block address to block storage server instance 450. After receiving the data from block storage server instance 450, block storage client 460A provides the data to instance A 430. A similar series of operations may be performed between instance B 436, block storage client 460B, block storage server instance 450, and volume B 440 of fig. 5.
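The read path just described can be summarized in a short sketch. The helper names and the mapping structure are hypothetical; an actual block storage client would also handle writes, multiple replicas, retries, and session management.

```python
# Sketch of the read path through a block storage client, under the
# assumptions described above. The mapping lookup, transport call, and
# names are illustrative only.

def read_block(volume_id, lba, length, attachment_map, issue_remote_read):
    """Handle a block read issued by an attached instance.

    attachment_map: volume_id -> network address of the block storage
                    server instance hosting the (primary) replica.
    issue_remote_read: callable that sends the request over the
                    network-based data transfer protocol and returns bytes.
    """
    server_address = attachment_map[volume_id]      # e.g., established at attach time
    data = issue_remote_read(server_address, volume_id, lba, length)
    return data                                     # returned to the instance

# Example: an instance reads 4 KiB at LBA 2048 from a volume (placeholders).
mapping = {"vol-a": "block-server-instance-450.example.internal"}
fake_read = lambda addr, vol, lba, n: b"\x00" * n   # stand-in transport
block = read_block("vol-a", 2048, 4096, mapping, fake_read)
```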
Referring to figs. 4 and 5, block storage messaging between a block storage client and a block storage server instance may be sent using a network-based data transfer protocol such as Global Network Block Device (GNBD). Such messages may be encapsulated as data plane traffic and routed between block storage client 460 and block storage server 450 by the network managers 424 of the respective hosts, as described elsewhere herein. In some embodiments where the block storage server instance is hosted by the same host as the instance that issued the block storage operation, the block storage client may issue the associated operation via the network-based data transfer protocol to the network manager, which may then internally route the operation to the hosted block storage server instance. In other embodiments where the block storage server instance is hosted by the same host as the instance that issued the block storage operation, the block storage client may issue the associated operation via a PCIe or other interconnect residing between the block storage client and the hosted instance, thereby avoiding the packetization overhead associated with network-based transfers.
Referring to figs. 4 and 5, the system may deploy various levels of encryption to protect block storage data. To provide end-to-end encryption of block storage traffic, block storage server instance 450 executes within a block storage service (BSS) IVN 452, which may include any number of block storage server instances and which spans from the provider network to the PSE. Block storage client 460 has a VNA 462 associated with the block storage service IVN 452. In this manner, IVN-level encryption using the block storage service IVN encryption key 484 may be used to encrypt and decrypt traffic sent between block storage server instance 450 and block storage client 460. It should be noted that encryption and decryption of traffic carried via the block storage service IVN may be performed by the network managers at the endpoints of the IVN. For example, network manager 424B encrypts traffic sent from block storage client 460B via VNA 462B. As another example, network manager 424A decrypts traffic sent to instances (including block storage server instance 450) hosted within BSS IVN 452. To provide encryption of data at rest, a block storage volume may further have volume-level encryption. For example, volume A 434 may be encrypted using volume A encryption key 480, and volume B 440 may be encrypted using volume B encryption key 482. Using encryption key 480, block storage client 460A encrypts data written to volume A 434 by instance A 430 and decrypts data read from volume A 434 by instance A 430. Using encryption key 482, block storage client 460B may perform similar encryption and decryption operations between instance B 436 and volume B 440. It should be noted that in some embodiments a policy may exist that prevents the block storage server instance hosting a volume from being hosted on the same host as the instance attached to that volume, so that a volume encryption key is not located on the same host as the volume it encrypts (e.g., so that volume A encryption key 480 and volume A 434 are not both on host 420A).
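The volume-level encryption performed by the block storage client can be sketched as follows. The embodiments above do not specify a particular cipher or key-handling scheme, so this example assumes an AEAD cipher (AES-GCM from the "cryptography" package) purely for illustration; the class and callables are hypothetical.

```python
# Sketch of client-side volume encryption: data is encrypted in the block
# storage client before it reaches the server hosting the volume, and
# decrypted on the way back. The cipher choice here is an assumption.

import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

class EncryptingBlockClient:
    def __init__(self, volume_key: bytes, remote_write, remote_read):
        self._aead = AESGCM(volume_key)   # e.g., a per-volume key such as key 480
        self._remote_write = remote_write
        self._remote_read = remote_read

    def write_block(self, lba: int, plaintext: bytes):
        nonce = os.urandom(12)
        ciphertext = self._aead.encrypt(nonce, plaintext, str(lba).encode())
        self._remote_write(lba, nonce + ciphertext)   # server only sees ciphertext

    def read_block(self, lba: int) -> bytes:
        stored = self._remote_read(lba)
        nonce, ciphertext = stored[:12], stored[12:]
        return self._aead.decrypt(nonce, ciphertext, str(lba).encode())
```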
In some embodiments, the instances and the host manager of figs. 4 and 5 are executed by different processors. For example, block storage server instance 450 may be executed by a first processor (e.g., processor 1710) and host manager 422A may be executed by a second processor (e.g., processor 1775). The processor executing the block storage server instance accesses the storage of the host via one or more interconnects. For example, the processor (e.g., processor 1710) executing block storage server instance 450 may access storage 421 (e.g., storage 1780) via a PCIe interconnect between the processor and the storage. As another example, the processor executing block storage server instance 450 may access storage device 421 via a first PCIe bus to the processor or system-on-chip (e.g., processor 1775) executing host manager 422A, which in turn bridges those communications to storage device 421 via a second PCIe bus.
Although each is depicted as a single volume in figs. 4 and 5, each of the instance volumes 434, 440 may be mirrored across multiple replicas stored on multiple different block storage servers to provide a high degree of reliability. For example, one replica may be designated the primary replica, which handles reads from and writes to the volume (input and output operations, or "I/O"), and the server hosting that primary replica may synchronously propagate writes to the other, secondary replicas. If the primary replica fails, one of the secondary replicas is selected to serve as the primary replica (a process referred to as failover). Replicas may also be subdivided (e.g., striped) into partitions, where each partition of a given replica is stored on a different server to facilitate concurrent reads and writes. A replica may further be encoded with redundancy (e.g., parity bits) such that when a block storage server becomes unavailable (e.g., due to a server or disk failure), the lost data may be recovered from the available data. Thus, an instance-attached volume may be hosted by many block storage servers.
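A minimal sketch of the primary-to-secondary write propagation described above follows. It assumes the secondaries are reachable through simple callables that return an acknowledgement; failover, striping, and parity encoding are omitted, and all names are illustrative.

```python
# Sketch of synchronous write propagation from a primary replica to
# secondary replicas. This is a simplified illustration, not an actual
# implementation of the embodiments above.

class PrimaryReplicaServer:
    def __init__(self, local_store, secondaries):
        self._store = local_store          # dict: lba -> bytes
        self._secondaries = secondaries    # callables: (lba, data) -> bool (ack)

    def write(self, lba, data):
        self._store[lba] = data
        # Synchronously propagate to every secondary and require acks before
        # acknowledging the write back to the client.
        acks = [propagate(lba, data) for propagate in self._secondaries]
        if not all(acks):
            raise RuntimeError("secondary replica failed to acknowledge write")
        return True

    def read(self, lba):
        return self._store[lba]
```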
One challenge associated with hosting block storage servers in a virtualized context is booting those servers: how does an instance boot from a boot volume hosted by a block storage server instance when no block storage server instance is yet operating? This is especially challenging in the context of PSEs, which may not include a dedicated (or bare metal) block storage server. Figs. 6 and 7 depict exemplary local boot techniques for booting a first or "seed" block storage server instance within a PSE. Once the seed block storage server instance has started, other instances (including other block storage server instances) may be booted from volumes that it hosts.
FIG. 6 is a block diagram depicting an exemplary system for booting a virtualized block storage server using a first technique, in accordance with at least some embodiments. At a high level, fig. 6 depicts an example of remotely booting a block storage server 650 within an instance 690 via a proxy server hosted within a provider network 600. The block storage service (BSS) 606 of the provider network 600 provides users with the ability to create block storage volumes and attach those volumes to instances. For example, a user (e.g., a customer of the provider network, another service of the provider network) may submit commands via an API or console that are relayed to the BSS 606 to create, resize, or delete volumes managed by the BSS, and to attach or detach those volumes to or from instances. BSS 606 maintains a BSS object storage 608 containing volume snapshots. A snapshot may be considered a type of object that, like a file, may include object data and metadata about the object (e.g., creation time, etc.). A volume snapshot is a copy (or backup) of a volume at a given point in time, allowing that volume to be recreated on a physical or virtual drive. Snapshots may generally be divided into snapshots of boot volumes and snapshots of non-boot volumes, where a boot volume facilitates booting the software executed within an instance. A machine image may be a set of one or more snapshots for a given instance, including a boot volume.
In some embodiments, BSS 606 applies object-level encryption to objects in BSS object storage 608. To prevent leakage of the encryption keys used within provider network 600, BSS 606 may further manage a PSE object storage 610, re-encrypting objects to be sent to a given PSE using an encryption key that is different from the encryption key used to encrypt objects in BSS object storage 608. In this example, block storage server instance machine image 612A will be used to boot the block storage server 650 on a host 620 of the PSE. As indicated at circle A, BSS 606 decrypts block storage server instance machine image 612A using a key maintained within provider network 600 and re-encrypts it as block storage server instance machine image 612B using key 613, which may be sent to the PSE.
The IVN may provide a measure of protection for traffic between the block storage server instance and the block storage client. In some implementations, the BSS 606 manages that IVN, depicted as BSS IVN 652. The host 620 on which the block storage server 650 is to be launched includes a host manager 622 having a block storage client 660 with an associated VNA 662 in the BSS IVN 652. To facilitate communication between block storage client 660 and provider network 600 via BSS IVN 652, BSS 606 launches a proxy server instance 642 within BSS IVN 652. Proxy server instance 642 may provide a block storage device interface to block storage client 660 to allow the client 660 to access the block storage server instance machine image 612B. The BSS 606 launches the proxy server instance 642 as follows.
As indicated at circle B, BSS 606 first sends a message to a bare metal block storage server 614 to create a new logical volume to serve as the boot volume for proxy server instance 642. The bare metal block storage server 614 creates a new logical volume 619 on one or more of its storage devices 618 and loads the volume 619 with the proxy machine image 616 stored in the BSS object storage 608.
As indicated at circle C, BSS 606 initiates the launch of the proxy server instance 642 using an instance management service 640. The instance management service 640 is another control plane service of the provider network 600 that facilitates the launching of instances within the provider network 600 or a PSE. In some embodiments, instance management service 640 may track or access host utilization metrics that indicate how "hot" a host is, i.e., metrics such as CPU, memory, and network utilization. For example, instance management service 640 or another service may periodically poll hosts to obtain utilization metrics. The instance management service 640 may identify the host to host an instance in response to a launch request by accessing the metrics associated with a pool of potential hosts (subject to any constraints on host location, such as within a given PSE) and selecting a host with low utilization (e.g., below one or more thresholds). In requesting the launch of proxy server instance 642, BSS 606 specifies that the instance should be launched within BSS IVN 652 and from volume 619. As indicated at circle D, the instance management service 640 identifies the host 646 to host the instance and sends a message to the host 646 to launch the instance. An instance manager (not shown) of host 646 provisions the host's resources for proxy server instance 642 and boots the proxy server instance from boot volume 619.
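The host selection behavior described for the instance management service might look roughly like the following sketch. The metric names, thresholds, and data shapes are assumptions made for illustration; the service's actual interfaces are not described at this level of detail above.

```python
# Hypothetical sketch of placement: filter candidate hosts by constraints
# (e.g., must be in a given PSE) and choose a host with utilization below
# the configured thresholds.

def select_host(hosts, required_pse=None, cpu_threshold=0.7, mem_threshold=0.7):
    """hosts: list of dicts like
       {"id": "...", "pse": "...", "cpu": 0.35, "memory": 0.5}"""
    candidates = [
        h for h in hosts
        if (required_pse is None or h["pse"] == required_pse)
        and h["cpu"] < cpu_threshold
        and h["memory"] < mem_threshold
    ]
    if not candidates:
        raise LookupError("no host satisfies the placement constraints")
    # Prefer the least loaded candidate (simple heuristic).
    return min(candidates, key=lambda h: (h["cpu"] + h["memory"]) / 2)

hosts = [
    {"id": "host-620", "pse": "pse-188", "cpu": 0.30, "memory": 0.40},
    {"id": "host-646", "pse": None,      "cpu": 0.20, "memory": 0.25},
]
print(select_host(hosts, required_pse="pse-188")["id"])   # -> host-620
```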
The BSS 606 also initiates the launch of the block storage server 650 via the instance management service 640. BSS 606 specifies that the instance should be launched within BSS IVN 652 on a particular PSE (e.g., identified by a PSE identifier) and from the boot volume available from proxy server instance 642. As indicated at circle E, the instance management service 640 identifies the host 620 to host the block storage server instance, and sends a message to the host 620 (e.g., via a control plane traffic tunnel) to launch the instance. As depicted, the request to launch the instance is received by the instance manager 626, which can provision host resources and start and terminate instances. Here, instance manager 626 creates the instance 690 that will host the block storage server.
As part of configuring instance 690, instance manager 626 may configure the basic input/output system (BIOS) 692 of instance 690 to load block storage server instance machine image 612B to a virtual boot volume 694 via proxy server instance 642. The BIOS 692 may include a block storage device driver (e.g., a non-volatile memory express (NVMe) driver) and have access to two attached block storage devices: the virtualized block storage presented by proxy server instance 642 via block storage client 660, and the boot volume 694. It should be noted that although depicted within instance 690, boot volume 694 corresponds to a volume on a virtual drive allocated to instance 690 by instance manager 626 from a storage device (not shown) of host 620, which may be accessed via block storage client 660 or another component of host manager 622. Other embodiments may use an interface other than a BIOS to connect instance 690 to the two block devices, e.g., a Unified Extensible Firmware Interface (UEFI).
During execution, BIOS 692 (or other firmware such as UEFI) may load boot volume 694 from block storage server instance machine image 612B. Because block storage server instance machine image 612B is encrypted using key 613, block storage client 660 decrypts it during the loading process. Once the load operation is complete, the BIOS continues the boot process to boot the instance 690 from boot volume 694, which contains the block storage server software. In some embodiments, BIOS 692 may unmount the block storage device corresponding to proxy server instance 642 prior to booting the block storage server software 650.
Although depicted and described using proxy server instance 642, other embodiments may provide block storage to instance 690 via block storage client 660 using a block storage server instance (not shown) launched from block storage server instance machine image 612A. The BSS 606 creates a volume on the virtual drive or disk of the launched block storage server instance and loads the block storage server instance machine image 612A to that volume. During the loading process, the block storage server instance decrypts the machine image using the provider network key and optionally encrypts the volume using a volume-specific key that BSS 606 can provide to block storage client 660 (e.g., in the same manner that volume A encryption key 480 encrypts volume A 434).
FIG. 7 is a block diagram depicting an exemplary system for booting a virtualized block storage server using a second technique, in accordance with at least some embodiments. At a high level, fig. 7 depicts an example of booting an instance 790 to execute block storage server software 750 via pre-boot software executed by the instance 790 prior to executing the block storage server software 750. Specifically, a storage device 721 of a host 720 of the PSE is preloaded (e.g., prior to shipping the PSE) with a pre-boot instance boot volume 723. Pre-boot instance boot volume 723 includes software that can boot on an instance to load the block storage server software to another boot volume during a block storage server pre-boot phase 798. The instance can then restart and boot from that other boot volume to start the block storage server instance during a block storage server boot phase 799.
As indicated at circle A, BSS 606 initiates the launch of a block storage server instance using instance management service 640, specifying that the instance should be launched within a BSS IVN on a particular PSE. Likewise, the instance management service 640 identifies the host of the PSE that will host the block storage server instance and sends a message (e.g., via a control plane traffic tunnel) to the host manager 722 of that host to launch the instance. The request to launch the instance is received by an instance manager (not shown) of host manager 722.
As indicated at circle B, host manager 722 (or an instance manager) provisions the resources of the host for the instance 790 that will execute block storage server 750. In addition, host manager 722 may configure BIOS 792A (or UEFI) with two attached block storage devices, a block storage device interface to the pre-boot instance boot volume 723 presented by a block storage client (not shown) and a virtual boot volume 794 (such as the boot volume 694 described above), and configure the instance to boot from the pre-boot instance boot volume 723.
As indicated at circle C, BIOS 792A proceeds to allow the pre-boot software 796 to boot from pre-boot instance boot volume 723. If the pre-boot instance boot volume 723 is based on a generic machine image, the host manager 722 may also have to update the configuration of the instance 790 to facilitate communication between the pre-boot software 796 and the provider network 700, as indicated at circle D. In particular, the host manager 722 may attach a VNA to the instance 790, provide the pre-boot software 796 with credentials to use when accessing the PSE object storage 610, and provide the key to decrypt the machine image to be loaded to the boot volume 794. The VNA configuration, credentials, and keys may have been passed to host manager 722 as part of the request from instance management service 640 to launch the instance.
As indicated at circle E, once the instance 790 is configured and executing the pre-boot software 796, the pre-boot software 796 copies the block storage server instance machine image 612B from the PSE object storage 610 and loads it to the boot volume 794. As part of loading the boot volume 794, the pre-boot software 796 may use the key 613 to decrypt the block storage server instance machine image 612B (assuming the machine image was encrypted). The host manager 722 may detect completion of the loading of the boot volume 794, or the pre-boot software may signal that completion. For example, the pre-boot software may initiate an instance restart when the boot volume 794 has been loaded. As another example, the network manager 724 may detect the termination of a session between the instance 790 and the PSE object storage 610 (e.g., when a TCP session is closed).
As indicated at circle F, during or upon completion of the data transfer, host manager 722 may update or otherwise reconfigure the BIOS 792A (shown as reconfigured BIOS 792B). Such reconfiguration includes changing the boot volume from the pre-boot instance boot volume 723 to the boot volume 794 and removing the block storage device that includes the pre-boot instance boot volume 723. Prior to restarting the instance 790, the host manager 722 may also scrub the memory provisioned to the instance 790, which may contain residual data written by the pre-boot software 796. After restarting instance 790, BIOS 792B boots the block storage server 750 from boot volume 794.
Although fig. 6 and 7 depict example techniques for booting a block storage server instance within a PSE, other techniques are possible. For example, the instance manager may create a volume on the PSE's storage as part of provisioning resources for the instance to be started. Prior to booting an instance, the instance manager may send a message to a control plane service (e.g., a block storage service or an instance management service) requesting a bootable snapshot to be loaded to a volume. The control plane service may send the bootable snapshot to the instance manager via a control plane traffic tunnel. Once loaded, the instance manager may allow instance software to be launched.
FIG. 8 is a block diagram depicting an exemplary system for booting additional compute instances in a provider underlying extension from a block storage server in accordance with at least some embodiments. Once a block storage server instance has been started, such as using the techniques described with reference to FIG. 6 or FIG. 7, that "seed" block storage server instance may provide boot volumes for other instances (including other block storage server instances). This may be particularly useful in the context of a PSE, where the seed block storage server instance is launched from a machine image delivered or hosted via a connection between the provider network and the PSE that is relatively slow compared to the PSE's underlying interconnect. In this example, instance 890 has been launched on host 820A of PSE 888. Instance 890 executes block storage server software 850 and operates within BSS IVN 844. In addition, the block storage clients 860A and 860B have attached VNAs 862A and 862B that permit communication over the BSS IVN 844.
As indicated at circle a, the instance management service 640 of the provider network 800 may request the BSS 606 to create a new volume from the specified snapshot. As indicated at circle B, BSS 606 may direct block storage server instance 890 to create a volume based on the specified snapshot. In some implementations, the instance management service 640 can provide a volume-specific encryption key (e.g., key 680) to encrypt the volume.
As indicated at circle C, block storage server instance 890 may create the volume by fetching the specified snapshot from object storage 810 (such as PSE object storage 610). Although not shown, in other embodiments object storage 810 may be an object store hosted by a BSS cache instance on PSE 888 to cache volume snapshots and/or machine images. The BSS 606 may manage BSS cache instances, and customers may specify to the BSS 606 that a particular snapshot is to be loaded into the cache prior to any request to launch an instance. In this manner, block storage server instance 890 may create volumes from cached boot volume snapshots, greatly reducing the boot time of instances on the PSE by avoiding the delays associated with data transfer from object storage in provider network 800 to block storage server instance 890.
As indicated at circle D, the instance management service 640 of the provider network 800 may issue commands to the hosts of the PSE 888 to attach or detach volumes hosted by the block storage server instance 890 to or from other instances hosted by the PSE 888. For example, volume A 832 may be loaded with a bootable snapshot. Instance management service 640 can direct host 820A to launch instance A 830 using a volume identifier associated with volume A 832. In response, block storage client 860A may attach volume A 832, hosted by block storage server instance 890, to instance A 830 via BSS IVN 844, thereby allowing instance A 830 to boot from volume A 832. As another example, volume B 836 may be loaded with a non-bootable snapshot containing other data. Instance management service 640 can direct host 820B to attach volume B 836 to the hosted instance 834. In response, block storage client 860B may attach volume B 836, hosted by block storage server instance 890, to instance B 834 via BSS IVN 844.
FIG. 9 is a block diagram depicting an exemplary system for managing virtualized block storage servers in accordance with at least some embodiments. Executing block storage servers within a virtualized environment has several benefits, including allowing an instance management service to automatically scale the number of executing block storage server instances as needed. Rather than the BSS operating a defined pool of bare metal servers that must be manually expanded, the instance management service can monitor the utilization of host resources (e.g., CPU, memory, network, etc.) and automatically adjust the number of running block storage server instances. For example, if two hosts of block storage server instances report high resource utilization, the instance management service may launch additional block storage server instances on hosts with lower resource utilization. The block storage service can then create volumes on the newly launched instances, thereby avoiding an increase in the workload, and possible performance degradation, of the block storage server instances already running.
In addition, executing block storage servers as instances allows the number of failure domains to exceed the number of host computer systems by decoupling failure domains from the computer systems as a whole. Increasing the number of failure domains allows an increased number of block storage servers to execute on a fixed set of hardware, and increasing the number of block storage servers reduces the total footprint of data managed by any particular server. For example, assume that a system includes nine host computer systems that are not subdivided into smaller failure domains (i.e., one computer system is one failure domain). To avoid data loss, each host or failure domain executes a single block storage server. If, for example, those nine block storage servers host 90 terabytes (TB) of data across volumes (including encoding that allows data recovery), each block storage server will store approximately 10 TB of data. If one of the hosts fails, 10 TB will need to be recovered (e.g., from the other 80 TB). Such data recovery imposes data computation, transfer, and storage costs. If those nine hosts were each subdivided into two failure domains and the number of block storage servers increased to eighteen, each block storage server would store approximately 5 TB of data, roughly halving the amount of data that would need to be recovered (and the corresponding cost) in the event of a component failure in one of the failure domains.
In this example, PSE 988 or provider network 900 includes three hosts 920A, 920B, and 920C. A host may be treated as a single failure domain or subdivided into two or more failure domains, depending on the hardware design of the host. Here, the hosts 920 each include two failure domains. Host 920A includes a failure domain 922 and a failure domain 924, such that the components of one failure domain may continue to operate even if there is a component failure in the other failure domain. For example, the failure domain 922 may correspond to a first processor of a multiprocessor system that is connected to a first memory bank (e.g., RAM) and uses a first set of one or more storage drives (e.g., SSDs), while the failure domain 924 may correspond to a second processor of the system that is connected to a second memory bank and uses a second set of one or more storage drives. It should be noted that some components, such as power supplies, may be shared across failure domains, again depending on the redundancy in the hardware design and how the failure domains are mapped onto the hardware.
Host 920A is executing block storage server instance 950A within failure domain 922 and block storage server instance 950B within failure domain 924. Host 920B is executing block storage server instance 950C within failure domain 926 and block storage server instance 950F within failure domain 928. Volume A includes a primary replica 660 provided by block storage server instance 950A and a secondary replica 662 provided by block storage server instance 950B. Volume B includes a primary replica 664 provided by the block storage server instance 950C and a secondary replica 666 provided by the block storage server instance 950A. Volume C includes a primary replica 668 provided by the block storage server instance 950B and a secondary replica 670 provided by the block storage server instance 950C. Although depicted as two replicas per volume, in practice each volume may have fewer or more replicas, and each replica may be split (e.g., via striping) between many different instances of block storage servers.
As indicated at circle A, the instance management service 940 monitors utilization of the hosts' physical resources, such as processor utilization, memory utilization, storage utilization, and network utilization, which may be aggregated per host, broken down by failure domain, or broken down by individual resource (e.g., the instance management service 940 may receive metrics from the host 920A indicating that the average CPU utilization is 50%, that the CPU utilization of the processors of failure domain 922 is 50%, or that the CPU utilization of a particular processor supporting failure domain 922 is 50%). The instance management service 940 can track the resource utilization of the hosts in a database (not shown). It should be noted that, in addition to block storage server instances, the depicted failure domains may host other instances (e.g., customer instances) that affect utilization of the hosts' physical resources. The instance management service 940 may periodically update the BSS 906 with the resource utilization metrics to allow the BSS 906 to select less-utilized block storage server instances when creating new volumes, as indicated at circle B.
As indicated at circle C, the instance management service 940 can launch a new block storage server instance when the resource utilization of the hosts supporting the block storage server instances exceeds one or more thresholds. The thresholds may be defined in various ways, such as per resource aggregated across hosts (e.g., the average processor utilization of all processors executing block storage servers is above 50%), based on some combination of resources across hosts (e.g., storage utilization is above 80% and processor utilization is above 50%), or based on individual resources and/or hosts. To avoid launching a block storage server instance in a failure domain that already hosts a block storage server instance, instance management service 940 may track (e.g., in the database containing the resource usage metrics) which failure domains are occupied and which failure domains are available. Here, instance management service 940 has determined that the resource utilization of hosts 920A, 920B, and/or 920C has exceeded a threshold, and launches new block storage server instances 950D and 950E in the previously unoccupied failure domains 930 and 932 of host 920C, respectively. While failure domain 928 includes a block storage server 950F that does not host any volumes, the resource utilization of other instances hosted within that failure domain, or the aggregate resource utilization of host 920B, may be too high to support another block storage server instance.
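The threshold-based scale-out decision described above can be sketched as follows. The thresholds, the per-failure-domain metric shape, and the function name are assumptions for this illustration only.

```python
# Illustrative scale-out check: if aggregate utilization of the failure
# domains already running block storage server instances exceeds a
# threshold, launch new instances in unoccupied failure domains.

def plan_scale_out(failure_domains, cpu_threshold=0.5, storage_threshold=0.8):
    """failure_domains: list of dicts like
       {"id": "fd-930", "occupied": False, "cpu": 0.1, "storage": 0.2}"""
    occupied = [fd for fd in failure_domains if fd["occupied"]]
    if not occupied:
        return []
    avg_cpu = sum(fd["cpu"] for fd in occupied) / len(occupied)
    avg_storage = sum(fd["storage"] for fd in occupied) / len(occupied)
    if avg_cpu <= cpu_threshold and avg_storage <= storage_threshold:
        return []   # no scale-out needed
    # Launch new block storage server instances in free failure domains.
    return [fd["id"] for fd in failure_domains if not fd["occupied"]]

domains = [
    {"id": "fd-922", "occupied": True,  "cpu": 0.7, "storage": 0.85},
    {"id": "fd-930", "occupied": False, "cpu": 0.1, "storage": 0.05},
]
print(plan_scale_out(domains))   # -> ['fd-930']
```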
As indicated at circle D, instance management service 940 may update BSS 906 to provide an updated identification of the operational block storage server instances 950 (now including block storage server instances 950D and 950E). Based on the identified block storage server instances 950 and the resource utilization metrics received from the instance management service 940, BSS 906 may create a new volume on host 920C, as indicated at circle E. Here, given the low resource utilization of host 920C reported by the instance management service 940, BSS 906 creates a new volume D including a primary replica 670 provided by block storage server instance 950D and a secondary replica 672 provided by block storage server instance 950E.
FIG. 10 is a block diagram depicting an exemplary system for providing volume mappings to block storage clients in accordance with at least some embodiments. As mentioned above, a single volume may be associated with multiple replicas, such as a primary replica and a number of secondary replicas. Each replica may be distributed across a number of block storage servers. The associations between a volume, its replicas, and the servers hosting those replicas (or portions of replicas) may change over time upon the occurrence of a hardware failure or when data is migrated between servers as part of background load balancing operations. When block storage servers are deployed to a PSE, those association changes are preferably allowed even when the PSE is disconnected from or unable to reach the provider network, in order to maintain a high degree of data availability (e.g., failing over from the primary replica to a secondary replica) and durability (e.g., immediately beginning the process of re-creating lost data in the event of a server failure).
As depicted, PSE 1088 includes components to track how volumes are distributed (or mapped) across block storage server instances, so that when a block storage client needs to access a volume, it can locate the block storage servers hosting that volume. The example volume A mapping data 1098 includes several items for each entry: a server identifier (e.g., a unique identifier associated with the instance or host hardware), a server address (e.g., an IP address), and a volume type (e.g., primary or secondary) for volume A. The mapping data may include different or additional items, such as chunk identifiers (e.g., where a replica is split or striped across multiple block storage server instances). The example volume A mapping data 1098 indicates that volume A includes a primary replica 1052 provided by block storage server instance 1050A and two secondary replicas 1054 and 1056 provided by block storage server instances 1050B and 1050C, respectively. It should be noted that PSE 1088 may host many other block storage server instances (not shown).
To reliably store the volume mapping data 1098, it may be stored in a distributed data store 1064. In some embodiments, each distributed data store corresponds to a cluster of nodes that individually maintain the state of the volume mapping. Each node in a cluster exchanges messages with the other nodes of the cluster to update its state based on the state viewed by those other nodes. One of the nodes of the cluster may be designated the leader or primary node, through which changes to the volume mapping are proposed. The nodes of the cluster may implement a consensus protocol, such as Paxos, to propose and agree on changes to the volume mapping data for a given volume. A cluster may track the volume mapping data for one or more volumes. As depicted, cluster 1066 tracks the volume mapping data 1098 for volume A, while other clusters 1067 track the volume mapping data for other volumes.
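To make the structure of the volume mapping data and the role of the cluster concrete, the following sketch shows one possible representation of the entries described above, together with a call that proposes a change through the cluster's leader node. The consensus machinery is abstracted behind a stub, and all identifiers and addresses are illustrative assumptions.

```python
# Sketch of volume mapping entries and a leader-node proposal. This is not
# an actual data format of the embodiments above; it only mirrors the items
# described for the example mapping data (server id, address, replica type).

from dataclasses import dataclass
from typing import List

@dataclass
class VolumeMappingEntry:
    server_id: str       # identifier of the block storage server instance
    server_address: str  # e.g., an IP address
    replica_type: str    # "primary" or "secondary"

volume_a_mapping: List[VolumeMappingEntry] = [
    VolumeMappingEntry("bss-instance-1050A", "10.1.0.10", "primary"),
    VolumeMappingEntry("bss-instance-1050B", "10.1.0.11", "secondary"),
    VolumeMappingEntry("bss-instance-1050C", "10.1.0.12", "secondary"),
]

def propose_mapping_change(leader_node, volume_id, new_mapping):
    # In a real system this would run a consensus round (e.g., Paxos) across
    # the cluster; here it is just a call to a hypothetical leader-node stub.
    return leader_node.propose({"volume": volume_id, "mapping": new_mapping})
```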
In some embodiments, the nodes of the cluster are instances executed by hosts of the provider network. Such instances persist their individual views of the volume mapping data to the host's non-volatile storage (e.g., via the block storage client to volumes hosted by a block storage server instance). In other embodiments, the nodes of the cluster are part of the block storage server software executed by a block storage server instance. As depicted in the exemplary node software environment 1090, a node may execute as a container 1082 hosted within a container engine process included in the block storage server software 1080. Such nodes may persist their view of the volume mapping data directly to a volume provided by the block storage server software. Preferably, the nodes of a cluster are hosted by separate instances or within separate failure domains.
Like other software executed by the hosts of the PSE, the nodes are subject to hardware failures. In such a case, the remaining nodes (or the block storage service of the provider network) may detect the loss of a node, create a new node to replace the lost node, and update the volume mapping data of the new node based on the consensus view of the volume mapping data held by the other nodes. Thus, not only may the volume mapping data change, the identities of the instances hosting the nodes of a cluster may also change. Clusters may be tracked using cluster mapping data. The example cluster mapping data 1099 includes several items: a node identifier (e.g., VOL_A_NODE1), a node address (e.g., an IP address), and a node type (e.g., whether or not it is the leader node). In this example, the cluster is formed of five nodes.
The cluster mapping data may be determined and maintained by the cluster discovery service 1062. As indicated at circle A, cluster discovery service 1062 may monitor the node locations of the clusters for the various volumes hosted by block storage server instances within the PSE. The cluster discovery service 1062 may monitor node locations in various ways. For example, in embodiments where nodes execute in an environment such as environment 1090, cluster discovery service 1062 may periodically poll all block storage server instances 1050 hosted by PSE 1088 to obtain the identities of any resident nodes. As another example, the network managers of the hosts of PSE 1088 may be configured to route special broadcast messages to any hosted cluster nodes (e.g., whether hosted directly or indirectly by a block storage server instance). The cluster discovery service 1062 may then periodically broadcast queries to obtain the identities of any hosted cluster nodes.
In some implementations, the cluster discovery service 1062 is an instance hosted by one of the hosts of the PSE 1088. Such instances may have VNAs with IP addresses reserved within BSS IVN 1052 so that they can be reached even if the host must be changed due to a hardware failure. In other embodiments, the cluster discovery service 1062 may be integrated into the PSE's DNS service. For example, a volume cluster may be associated with a domain, and a DNS service may resolve a name resolution request to that name to the IP address of one or more nodes of the cluster.
As indicated at circle B, instance management service 1040 may send a message to the block storage client 1060 of a particular host to attach a volume to a hosted instance (not shown). For example, instance management service 1040 can send a message to the host that includes the instance identifier and the volume identifier for volume A. As indicated at circle C, block storage client 1060 may query the cluster discovery service 1062 to obtain the identity of one or more nodes of the volume A cluster 1066. In some embodiments, block storage client 1060 can cache the cluster mapping data in a cluster mapping data cache 1066. It should be noted that in some embodiments the cluster discovery service 1062 may be omitted and the block storage client 1060 is configured to query (e.g., via the broadcast mechanism described above) the block storage server instances of PSE 1088 to identify the nodes of the volume A cluster 1066.
As indicated at circle D, the block storage client 1060 may obtain a current view of the volume mapping data for volume A from the cluster 1066 and, as indicated at circle E, connect to the block storage servers 1050 hosting volume A based on that volume mapping data. Although not depicted, in some embodiments, upon receiving a connection request from a client, a block storage server may send a message to the volume's cluster to indicate whether the block storage server is still hosting that volume (or at least a portion of that volume). A block storage server may receive a connection request from a block storage client for a volume that it does not host for a variety of reasons. For example, the most recent change to the set of servers hosting the volume may not have propagated to or across the volume's cluster, or the block storage client sending the connection request may have relied on stale cached volume mapping data. Regardless of whether the block storage server is hosting the volume, the block storage server receiving the connection request may obtain the identity of one or more nodes of the volume's cluster from the cluster discovery service 1062. If the block storage server is no longer hosting the volume, it may propose an update to the data store maintained by the cluster to remove itself from the volume mapping data. In addition, the block storage server may send a response to the block storage client that initiated the request to indicate that the connection attempt failed, optionally indicating that the volume is no longer hosted by that server. If the block storage server is still hosting the volume, it may send an acknowledgement to the cluster to indicate that at least that portion of the mapping data is still valid.
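The connection-handling behavior just described can be summarized in a short sketch. The discovery and cluster objects are hypothetical stand-ins for the cluster discovery service and the volume's cluster; an actual server would also authenticate the client and handle partial (striped) hosting.

```python
# Sketch of how a block storage server might handle a client connection
# request for a volume. Helper objects and method names are illustrative.

def handle_connection_request(volume_id, hosted_volumes, discovery, connect_client):
    cluster = discovery.find_cluster(volume_id)        # via cluster discovery service
    if volume_id in hosted_volumes:
        # Still hosting the volume: confirm the mapping is valid and accept.
        cluster.acknowledge(volume_id, server="this-server")
        return connect_client(accepted=True)
    # Not hosting the volume (e.g., the client used stale cached mapping data):
    # propose removing this server from the volume mapping and reject.
    cluster.propose_removal(volume_id, server="this-server")
    return connect_client(accepted=False, reason="volume no longer hosted here")
```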
FIG. 11 is a block diagram depicting an exemplary system for tracking volume mappings in accordance with at least some embodiments. As described above, connectivity between the PSE and the provider network cannot be guaranteed. To meet the desired levels of data availability and data durability, the PSE includes facilities that allow the mapping between a given volume and the block storage server instances hosting that volume to change. As described with reference to FIG. 10, a cluster implementing a distributed data store may be used to track the volume mapping for a given volume. FIG. 11 depicts a hierarchical approach to volume placement, the process whereby block storage server instances are selected to host a replica, or a portion of a replica, of a volume. Specifically, the BSS volume placement service 1107 of the BSS 1106 makes the initial placement determination and associated volume mapping when the volume is created, and the PSE volume placement service 1108 manages subsequent changes to the volume mapping over the lifetime of the volume. PSE volume placement service 1108 may be implemented in various ways, such as by an instance hosted by PSE 1188, by a component integrated into a PSE framework (e.g., PSE framework 202), and so on.
As indicated at circle A, PSE volume placement service 1108 monitors the status of the block storage server instances hosted by PSE 1188. For example, PSE volume placement service 1108 may periodically poll the hosted block storage server instances to check whether they respond, and/or collect metrics related to the resource usage of the hosted block storage server instances, their failure domains, and/or their hosts (e.g., such as described with reference to FIG. 9). As indicated at circle B, PSE volume placement service 1108 may send the collected server states to BSS volume placement service 1107. It should be noted that in other embodiments, BSS volume placement service 1107 may obtain metrics related to resource utilization from an instance management service (such as described with reference to FIG. 9).
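A minimal sketch of the monitoring loop at circles A and B follows, assuming hypothetical interfaces for the hosted instances and the region-side placement service (ping, collect_metrics, report_server_state are illustrative names only).

```python
import time


def monitor_block_storage_servers(instances, bss_placement_client, interval_s=60):
    """Poll hosted block storage server instances and report state upstream."""
    while True:
        server_states = []
        for instance in instances:
            server_states.append({
                "instance_id": instance.instance_id,
                "responsive": instance.ping(timeout_s=5),   # liveness check
                "failure_domain": instance.failure_domain,  # e.g., host or server group
                "metrics": instance.collect_metrics(),      # CPU, memory, storage, IOPS
            })
        # Circle B: forward the aggregated view to the BSS volume placement service.
        bss_placement_client.report_server_state(server_states)
        time.sleep(interval_s)
```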
After receiving a request to create a new block storage volume for an instance hosted by PSE 1188, BSS 1106 may request a volume placement recommendation from BSS volume placement service 1107. Depending on the profile of the new volume (e.g., how many replicas, whether the replicas are striped, etc.), BSS volume placement service 1107 may provide an identification of recommended block storage server instances. In this example, BSS volume placement service 1107 recommends block storage server instances 1150A and 1150B to host a new volume A with two replicas. As indicated at circle C, BSS 1106 uses the recommended block storage server instances to create a new cluster 1166 to track the volume mapping data for volume A. The mapping data initially identifies block storage server instance 1150A as hosting the primary replica and block storage server instance 1150B as hosting the secondary replica. In addition, as indicated at circle D, BSS 1106 sends one or more messages to the identified block storage servers to create storage volumes (e.g., a storage volume hosted by block storage server instance 1150A and a storage volume hosted by block storage server instance 1150B). The storage volumes may be backed by the underlying host storage device capacity provisioned to the respective block storage server instances. The storage volume created using block storage server instance 1150A may host a primary replica 1152 of volume A, and the storage volume created using block storage server instance 1150B may host a secondary replica 1154A of volume A. In some embodiments, a block storage server instance may load the newly created storage volume from a volume snapshot or machine image, such as described with reference to FIG. 8. In this example, an instance (not shown) attached to volume A performs block storage operations over time, during which block storage server instance 1150A communicates with block storage server instance 1150B (e.g., propagating writes made to primary replica 1152 to secondary replica 1154A, as indicated at circle E).
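An illustrative sketch of this volume creation flow (circles C and D) is shown below. The client objects for the placement service, the cluster factory, and the server instances, and their method names, are assumptions made for the sake of the example.

```python
def create_volume(bss_placement, cluster_factory, volume_id, profile):
    """Create a two-replica volume on recommended block storage server instances."""
    # Ask the placement service for servers matching the volume profile
    # (e.g., number of replicas, whether replicas are striped).
    recommended = bss_placement.recommend_placement(volume_id, profile)
    primary, secondary = recommended[0], recommended[1]

    # Circle C: create a cluster to track the volume mapping data.
    cluster = cluster_factory.create_cluster(
        volume_id,
        mapping={"primary": primary.instance_id, "secondary": secondary.instance_id},
    )

    # Circle D: instruct each recommended instance to create a backing storage
    # volume, optionally hydrated from a snapshot or machine image.
    primary.create_storage_volume(volume_id, role="primary", source_snapshot=None)
    secondary.create_storage_volume(volume_id, role="secondary", source_snapshot=None)
    return cluster
```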
At some point, block storage server instance 1150B may experience a problem, as indicated at circle F. For example, block storage server instance 1150B may become slow or unresponsive (e.g., due to a memory leak, a hardware failure, etc.). Detection of the problem may be performed in various ways. In some embodiments, block storage server instance 1150A detects the problem, such as due to an inability to confirm propagated writes. In this case, block storage server instance 1150A may include a policy specifying one or more actions to perform in response to a detected problem. For example, block storage server instance 1150A may wait until some number of consecutive propagated writes go unacknowledged or until some period of time has elapsed. At that point, block storage server instance 1150A may request a replacement block storage server instance for secondary replica 1154A from PSE volume placement service 1108. In other embodiments, PSE volume placement service 1108 detects the problem (e.g., based on the collected metrics or responsiveness) during the monitoring described above with reference to circle A. Likewise, PSE volume placement service 1108 may include a policy specifying one or more actions to perform in response to a detected problem, including initiating replacement of the block storage server instance for the secondary replica. Regardless of which entity detects the problem, PSE volume placement service 1108 provides the identity of a replacement block storage server instance to block storage server instance 1150A.
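The sketch below shows one possible form of the primary-side detection policy described above: tolerate a bounded number of unacknowledged propagated writes, or a bounded waiting period, before requesting a replacement. The thresholds and the request_replacement helper are illustrative assumptions, not values or interfaces from the source.

```python
import time


class ReplicationMonitor:
    """Tracks propagated-write acknowledgements and triggers replacement."""

    def __init__(self, pse_placement, max_unacked=32, max_wait_s=30.0):
        self.pse_placement = pse_placement
        self.max_unacked = max_unacked      # consecutive unacknowledged writes tolerated
        self.max_wait_s = max_wait_s        # time window tolerated
        self.unacked = 0
        self.first_unacked_at = None

    def on_write_propagated(self, acknowledged: bool):
        if acknowledged:
            # Healthy secondary: reset the counters.
            self.unacked = 0
            self.first_unacked_at = None
            return None
        self.unacked += 1
        if self.first_unacked_at is None:
            self.first_unacked_at = time.monotonic()
        waited = time.monotonic() - self.first_unacked_at
        if self.unacked >= self.max_unacked or waited >= self.max_wait_s:
            # Ask the PSE volume placement service for a replacement instance
            # to host the secondary replica.
            return self.pse_placement.request_replacement(role="secondary")
        return None
```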
In this example, PSE volume placement service 1108 identifies block storage server instance 1150C to block storage server instance 1150A, as indicated at circle G. A message is sent to block storage server instance 1150C to create a storage volume to support relocation of the replica data. For example, after identifying block storage server instance 1150C, PSE volume placement service 1108 may send a message to block storage server instance 1150C to create the storage volume. As another example, block storage server instance 1150A may send the message to create the storage volume after receiving the identification from PSE volume placement service 1108. Once the storage volume is created, block storage server instance 1150A may initiate re-mirroring of replica 1152 to block storage server instance 1150C as replica 1154B, as indicated at circle H. It should be noted that if block storage server instance 1150B is still responsive (e.g., but exhibiting poor performance), the mirroring operation may instead be performed by copying replica 1154A to replica 1154B. While re-mirroring from replica 1154A or replica 1152 is feasible in this scenario because those replicas are not distributed among multiple storage servers, in other embodiments, in which replicas are distributed, it may be necessary to access the various storage servers over which a replica is distributed to recreate or otherwise regenerate the lost data, such as by using redundancy (e.g., parity bits, error correction codes, etc.) encoded into the stored data. For example, if replica data is encoded and distributed across ten block storage server instances, one of which is lost, the missing data may be recreated by reading the data associated with the replica from the nine remaining block storage server instances. As another example, if another ten block storage servers host another replica of the volume using the same distribution pattern, the lost data may be copied from the block storage server instance hosting the corresponding portion of the other replica.
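As a toy illustration of recreating lost data from redundancy encoded into the stored data, the sketch below assumes a simple XOR parity scheme (nine data shards plus one parity shard spread across ten servers), so any one missing shard can be rebuilt from the nine that remain. Real systems may use erasure codes rather than plain parity; this is only meant to show the principle.

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def rebuild_missing_shard(remaining_shards: list[bytes]) -> bytes:
    """XOR of all nine surviving shards (data + parity) equals the missing shard."""
    return reduce(xor_blocks, remaining_shards)


# Example: if the server holding shard 3 is lost, read the other nine shards
# from their servers and XOR them together to recover shard 3.
```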
As indicated at circle I, block storage server instance 1150A may submit a request to cluster 1166 to update the volume mapping for volume A, replacing block storage server instance 1150B with block storage server instance 1150C as the host of the secondary replica. Block storage server instance 1150A may submit the request to cluster 1166 after initiating or after completing the re-mirroring to block storage server instance 1150C. In other embodiments, another entity may submit the request to cluster 1166 to update the volume mapping for volume A, such as PSE volume placement service 1108 or block storage server instance 1150C.
FIG. 12 is a flowchart depicting the operations of a method for launching a virtualized block storage server in accordance with at least some embodiments. Some or all of the operations (or any other processes described herein, or variations and/or combinations thereof) are performed under control of one or more computer systems configured with executable instructions and are implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) executing collectively on one or more processors, by hardware, or combinations thereof. The code is stored on a computer-readable storage medium, for example, in the form of a computer program comprising instructions executable by one or more processors. The computer-readable storage medium is non-transitory. In some embodiments, one or more (or all) of the operations are performed by a computer program or application executed by one or more components of an extension of a provider network. The extension of the provider network includes one or more physical computing devices or systems and is located remotely from the data centers of the provider network (e.g., outside of the data center network), such as on the premises of a customer of the provider network. A provider network, such as a cloud provider network, includes various services performed by computer systems located within the data centers of the provider network. One or more components of the extension communicate with the provider network, such as by receiving management operations from a service executed by a computer system of the provider network. In some embodiments, one or more (or all) of the operations are performed by components of a host of the other figures (e.g., host 420A).
The operations include, at block 1205, receiving, by a computer system of an extension of a provider network, a first request to launch a first virtual machine to host a block storage server application, wherein the extension of the provider network is in communication with the provider network via at least a third-party network. To provide block storage to instances hosted by the PSE, the virtualization provided by a host of the PSE may be used to host the block storage server itself. For example, a block storage service of the provider network may initiate the launch of a block storage server virtual machine via an instance management service, which may then issue a launch request to the host manager of a selected host of the PSE via a secure communication channel between the provider network and the PSE.
The operations further include provisioning the first virtual machine with at least a portion of the storage capacity of the one or more storage devices of the host computer system as provisioned storage devices at block 1210. As part of starting a virtual machine, a host manager of a host system may allocate or otherwise provision some portion of the computing resources of the host system to the virtual machine. Such resources may include, for example, storage capacity of a storage device of a host system (e.g., SSD), memory capacity (e.g., RAM), a processor or processor core, and so forth.
The operations also include, at block 1215, executing the block storage server application using the first virtual machine. As part of executing the block storage server application, the operations further include, at block 1220, creating a logical volume on the provisioned storage device in response to a second request from the block storage service of the provider network to create the logical volume. For example, the block storage service of the provider network may send one or more messages to create a volume (using the provisioned storage capacity) that may be attached to other instances so that those instances can access the volume via a block storage interface.
As part of executing the block storage server application, the operations further include receiving, at block 1225, a third request to perform an input/output operation on the logical volume and, at block 1230, performing the requested input/output operation on the logical volume. For example, and referring to FIG. 4, instance A 430 may issue a command to read a block of data from a block address of a virtual block device attached to the instance and backed by block storage volume A 434. Block storage server instance 450 may receive that command (e.g., via BSS IVN 452) and process the command against block storage volume A 434. As another example, and referring to FIG. 5, instance B 436 may issue a command to write a block of data to a block address of a virtual block device attached to the instance and backed by block storage volume B 440. Block storage server instance 450 may receive that command (e.g., via BSS IVN 452) and process the command against block storage volume B 440.
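A minimal sketch of the server-side behavior described in blocks 1220-1230 is shown below: create a logical volume on the provisioned storage and serve block-level reads and writes against it. The file-backed layout, block size, and method names are illustrative assumptions, not the provider's actual implementation.

```python
import os

BLOCK_SIZE = 4096  # assumed block size for the example


class BlockStorageServer:
    def __init__(self, provisioned_dir: str):
        # Directory standing in for the storage capacity provisioned to the VM.
        self.provisioned_dir = provisioned_dir

    def create_logical_volume(self, volume_id: str, size_bytes: int) -> str:
        """Block 1220: create a logical volume on the provisioned storage."""
        path = os.path.join(self.provisioned_dir, f"{volume_id}.vol")
        with open(path, "wb") as f:
            f.truncate(size_bytes)  # sparse backing file for the volume
        return path

    def write_block(self, volume_path: str, block_addr: int, data: bytes) -> None:
        """Blocks 1225-1230: perform a requested write at a block address."""
        with open(volume_path, "r+b") as f:
            f.seek(block_addr * BLOCK_SIZE)
            f.write(data[:BLOCK_SIZE])

    def read_block(self, volume_path: str, block_addr: int) -> bytes:
        """Blocks 1225-1230: perform a requested read at a block address."""
        with open(volume_path, "rb") as f:
            f.seek(block_addr * BLOCK_SIZE)
            return f.read(BLOCK_SIZE)
```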
FIG. 13 is a flowchart depicting the operations of a method for using a virtualized block storage server in accordance with at least some embodiments. Some or all of the operations (or any other processes described herein, or variations and/or combinations thereof) are performed under control of one or more computer systems configured with executable instructions and are implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) executing collectively on one or more processors, by hardware, or combinations thereof. The code is stored on a computer-readable storage medium, for example, in the form of a computer program comprising instructions executable by one or more processors. The computer-readable storage medium is non-transitory. In some embodiments, one or more (or all) of the operations are performed by a computer program or application executed by one or more components of an extension of a provider network. The extension of the provider network includes one or more physical computing devices or systems and is located remotely from the data centers of the provider network (e.g., outside of the data center network), such as on the premises of a customer of the provider network. A provider network, such as a cloud provider network, includes various services performed by computer systems located within the data centers of the provider network. One or more components of the extension communicate with the provider network, such as by receiving management operations from a service executed by a computer system of the provider network. In some embodiments, one or more (or all) of the operations are performed by a host of the other figures (e.g., host 420A).
The operations include, at block 1305, executing, by the computer system, a first block storage server virtual machine to host a first volume using one or more storage devices of the computer system. As depicted in FIG. 4, for example, host 420A hosts such a virtual machine (block storage server instance 450). The operations also include, at block 1310, executing, by the computer system, a second virtual machine having access to a virtual block storage device. Host 420A also hosts another virtual machine (instance 430). The operations also include, at block 1315, executing, by the computer system, a block storage client. Host 420A includes a block storage client 460A that may facilitate attaching block storage to hosted virtual machines. As part of executing the block storage client, the operations further include receiving, at block 1320, from the second virtual machine, a first block storage operation performed on the virtual block storage, and sending, at block 1325, a message to the first block storage server virtual machine to cause the first block storage server virtual machine to perform the first block storage operation on the first volume. The virtual machine may issue block storage operations (e.g., block reads, writes, burst reads/writes, etc.) to the attached block storage via the block storage client. The block storage client may then relay those block storage operations across the network to the block storage server hosting the volume that contains the addressed blocks.
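The following sketch illustrates the relay step of blocks 1320-1325: a block storage operation issued by a hosted virtual machine against its virtual block device is forwarded over the network to the block storage server hosting the backing volume. The transport, message format, and field names are assumptions; details such as the encrypted BSS IVN transport described elsewhere are deliberately omitted.

```python
import json
import socket


def relay_block_operation(server_addr: tuple[str, int], volume_id: str,
                          op: str, block_addr: int, data: bytes | None = None) -> bytes:
    """Forward one block storage operation to the hosting block storage server."""
    message = {
        "volume_id": volume_id,
        "op": op,                 # e.g., "read" or "write"
        "block_addr": block_addr,
        "data": data.hex() if data else None,
    }
    with socket.create_connection(server_addr) as conn:
        conn.sendall(json.dumps(message).encode() + b"\n")
        # Server's acknowledgement (for writes) or the read payload.
        return conn.recv(65536)
```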
FIG. 14 is a flowchart depicting the operations of a method for managing a virtualized block storage server in a provider underlying extension in accordance with at least some embodiments. Some or all of the operations (or any other processes described herein, or variations and/or combinations thereof) are performed under control of one or more computer systems configured with executable instructions and are implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) executing collectively on one or more processors, by hardware, or combinations thereof. The code is stored on a computer-readable storage medium, for example, in the form of a computer program comprising instructions executable by one or more processors. The computer-readable storage medium is non-transitory. In some embodiments, one or more (or all) of the operations are performed by a computer program or application executed by one or more components of an extension of a provider network. The extension of the provider network includes one or more physical computing devices or systems and is located remotely from the data centers of the provider network (e.g., outside of the data center network), such as on the premises of a customer of the provider network. A provider network, such as a cloud provider network, includes various services performed by computer systems located within the data centers of the provider network. One or more components of the extension communicate with the provider network, such as by receiving management operations from a service executed by a computer system of the provider network. In some embodiments, one or more (or all) of the operations are performed by components of the PSEs of the other figures (e.g., PSE 1088, PSE 1188).
The operations include, at block 1405, receiving, by a first block storage server instance, a first request to create a first storage volume to store a first portion of a first logical volume and, at block 1410, receiving, by a second block storage server instance, a second request to create a second storage volume to store a second portion of the first logical volume. As depicted in FIG. 11, for example, the initial placement of a volume (e.g., a replica, a striped replica, etc.) may originate from block storage service 1106 of the provider network and be received by components of PSE 1188. In the example depicted in FIG. 11, volume A is initially stored using block storage server instances 1150A and 1150B on hosts of PSE 1188.
The operations also include, at block 1415, sending a third request to a third block storage server instance to create a third storage volume to store the second portion of the first logical volume. At some point, the block storage server instance hosting a portion of the volume may need to be changed. As described herein, either PSE volume placement service 1108 or one of the other block storage server instances hosting the volume may send a message to the new block storage server instance to replace the instance being changed.
The operations also include, at block 1420, storing, by the third block storage server instance, the second portion of the first logical volume to the third storage volume. The operations further include, at block 1425, updating the data store containing the identification of each block storage server instance hosting a portion of the first logical volume to remove the identification of the second block storage server instance and to add the identification of the third block storage server instance. To track the migration of volumes across instances, PSE 1188 may host a data store that maps volumes (including their replicas, stripes across servers (if any), etc.) to the block storage server instances hosting them. Such a data store may be a cluster such as described with reference to FIG. 10.
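An illustrative sketch of the mapping update in block 1425 is shown below: the failed instance's identifier is replaced with the replacement's identifier in the data store that tracks which block storage server instances host portions of the logical volume. The data-store client and its compare-and-swap semantics are assumptions for the example; a cluster-backed store as described with reference to FIG. 10 would typically reach agreement on the update before applying it.

```python
def replace_volume_host(mapping_store, volume_id: str,
                        old_instance_id: str, new_instance_id: str) -> bool:
    """Swap one hosting instance for another in the volume mapping data."""
    current = mapping_store.get(volume_id)   # e.g., {"portion-2": "bss-1150B", ...}
    updated = {
        portion: (new_instance_id if host == old_instance_id else host)
        for portion, host in current.items()
    }
    # Conditional write so concurrent proposals do not clobber each other.
    return mapping_store.compare_and_swap(volume_id, expected=current, new=updated)
```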
FIG. 15 depicts an exemplary provider network (or "service provider system") environment, according to at least some embodiments. Provider network 1500 may provide resource virtualization to customers via one or more virtualization services 1510 that allow customers to purchase, lease, or otherwise obtain instances 1512 of virtualized resources, including (but not limited to) computing and storage resources implemented on devices within the provider network or networks in one or more data centers. A local Internet Protocol (IP) address 1516 can be associated with a resource instance 1512; the local IP address is the internal network address of the resource instance 1512 on the provider network 1500. In some embodiments, the provider network 1500 may also provide public IP addresses 1514 and/or public IP address ranges (e.g., Internet Protocol version 4 (IPv4) or Internet Protocol version 6 (IPv6) addresses) that customers may obtain from provider network 1500.
Conventionally, provider network 1500 may allow, via virtualization service 1510, customers of a service provider (e.g., customers operating one or more client networks 1550A-1550C including one or more customer devices 1552) to dynamically associate at least some of the public IP addresses 1514 assigned or allocated to the customers with particular resource instances 1512 assigned to the customers. Provider network 1500 may also allow a customer to remap public IP address 1514 previously mapped to one virtualized computing resource instance 1512 allocated to the customer to another virtualized computing resource instance 1512 also allocated to the customer. Customers of the service provider, such as operators of customer networks 1550A-1550C, can, for example, implement customer-specific applications using virtualized computing resource instances 1512 and public IP addresses 1514 provided by the service provider, and present the customer's applications over an intermediate network 1540, such as the internet. Other network entities 1520 on the intermediate network 1540 can then generate traffic to the destination public IP address 1514 published by the customer networks 1550A-1550C; the traffic is routed to the service provider data center and routed at the data center via the network underlay to the local IP address 1516 of the virtualized computing resource instance 1512 that is currently mapped to the destination public IP address 1514. Similarly, response traffic from the virtualized computing resource instance 1512 may be routed back through the network underlay onto the intermediate network 1540 to the source entity 1520.
A local IP address, as used herein, refers to, for example, an internal or "private" network address of a resource instance in a provider network. Local IP addresses may be within address blocks reserved by Internet Engineering Task Force (IETF) Request for Comments (RFC) 1918 and/or may have an address format specified by IETF RFC 4193, and may be mutable within the provider network. Network traffic originating outside the provider network is not routed directly to a local IP address; instead, the traffic uses a public IP address that is mapped to the local IP address of the resource instance. The provider network may include networking devices or appliances that provide Network Address Translation (NAT) or similar functionality to perform the mapping from public IP addresses to local IP addresses, and vice versa.
A public IP address is an Internet-mutable network address assigned to a resource instance, either by the service provider or by the customer. Traffic routed to a public IP address is translated, e.g., via a 1:1 NAT, and forwarded to the corresponding local IP address of the resource instance.
The provider network infrastructure may assign some public IP addresses to particular resource instances; these public IP addresses may be referred to as standard public IP addresses, or simply standard IP addresses. In some embodiments, the mapping of the standard IP address to the local IP address of the resource instance is a default startup configuration for all resource instance types.
At least some public IP addresses may be assigned to or obtained by customers of provider network 1500; a customer may then assign their assigned public IP addresses to particular resource instances assigned to the customer. These public IP addresses may be referred to as customer public IP addresses, or simply customer IP addresses. Rather than being assigned to resource instances by provider network 1500, as in the case of standard IP addresses, customer IP addresses may be assigned to resource instances by the customers, for example, via an API provided by the service provider. Unlike standard IP addresses, customer IP addresses are allocated to customer accounts and can be remapped to other resource instances by the respective customers as needed or desired. A customer IP address is associated with the customer's account, not with a particular resource instance, and the customer controls that IP address until the customer chooses to release it. Unlike conventional static IP addresses, customer IP addresses allow the customer to mask resource instance or availability zone failures by remapping the customer's public IP addresses to any resource instance associated with the customer's account. A customer IP address, for example, enables a customer to work around problems with the customer's resource instances or software by remapping the customer IP address to a replacement resource instance.
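A simplified sketch of the 1:1 public-to-local IP mapping and customer IP remapping described above follows. The in-memory table and method names are illustrative only; in practice this mapping is implemented by NAT devices or similar appliances in the network substrate.

```python
class PublicIpMapper:
    """Toy model of a 1:1 public-to-local IP mapping table."""

    def __init__(self):
        self.public_to_local: dict[str, str] = {}

    def associate(self, public_ip: str, local_ip: str) -> None:
        # Map a customer public IP address to a resource instance's local IP.
        self.public_to_local[public_ip] = local_ip

    def remap(self, public_ip: str, new_local_ip: str) -> None:
        # Remap the customer IP to a replacement instance, e.g., to mask a
        # resource instance or availability zone failure.
        self.public_to_local[public_ip] = new_local_ip

    def translate_inbound(self, public_ip: str) -> str:
        # 1:1 NAT: traffic addressed to the public IP is forwarded to the local IP.
        return self.public_to_local[public_ip]
```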
FIG. 16 is a block diagram of an exemplary provider network that provides a storage service and a hardware virtualization service to customers in accordance with at least some embodiments. Hardware virtualization service 1620 provides multiple computing resources 1624 (e.g., VMs) to customers. The computing resources 1624 may, for example, be rented or leased to customers of provider network 1600 (e.g., to a customer that implements customer network 1650). Each computing resource 1624 may be provided with one or more local IP addresses. Provider network 1600 may be configured to route packets from the local IP addresses of the computing resources 1624 to public Internet destinations, and from public Internet sources to the local IP addresses of the computing resources 1624.
Provider network 1600 may provide customer network 1650, for example coupled to intermediate network 1640 via local network 1656, the ability to implement virtual computing systems 1692 via hardware virtualization service 1620 coupled to intermediate network 1640 and to provider network 1600. In some embodiments, hardware virtualization service 1620 may provide one or more APIs 1602, for example a web services interface, via which customer network 1650 may access functionality provided by hardware virtualization service 1620, for example via a console 1694 (e.g., a web-based application, standalone application, mobile application, etc.). In some embodiments, at provider network 1600, each virtual computing system 1692 at customer network 1650 may correspond to a computing resource 1624 that is leased, rented, or otherwise provided to customer network 1650.
From an instance of a virtual computing system 1692 and/or another customer device 1690 (e.g., via console 1694), a customer may access the functionality of storage service 1610, for example via one or more APIs 1602, to access data from and store data to storage resources 1618A-1618N (e.g., folders or "buckets," virtualized volumes, databases, etc.) of virtual data storage 1616 provided by provider network 1600. In some embodiments, a virtualized data storage gateway (not shown) may be provided at customer network 1650 that may locally cache at least some data, for example frequently accessed or critical data, and may communicate with storage service 1610 via one or more communication channels to upload new or modified data from the local cache so that the primary store of the data (virtualized data storage 1616) is maintained. In some embodiments, a user, via a virtual computing system 1692 and/or on another customer device 1690, may mount and access volumes of virtual data storage 1616 via storage service 1610 acting as a storage virtualization service, and these volumes may appear to the user as local (virtualized) storage 1698.
Although not shown in FIG. 16, virtualization services can also be accessed from resource instances within provider network 1600 via API 1602. For example, a customer, appliance service provider, or other entity may access virtualization services from within a respective virtual network on provider network 1600 via API 1602 to request allocation of one or more resource instances within the virtual network or within another virtual network.
FIG. 17 is a block diagram depicting an exemplary computer system that may be used in at least some embodiments. In at least some implementations, such a computer system may function as a server that implements one or more of control plane components and/or data plane components, various virtualized components (e.g., virtual machines, containers, etc.), and/or SED for supporting the provider underlay and/or PSE described herein. Such a computer system may include a general-purpose or special-purpose computer system that includes or is configured to access one or more computer-accessible media. In at least some embodiments, such computer systems can also be used to implement components (e.g., customer gateways/routers 186, other customer resources 187, etc.) that are outside of the provider's underlying and/or provider's underlying extensions. In the illustrated embodiment of a computer system, the computer system 1700 includes one or more processors 1710 coupled to a system memory 1720 via input/output (I/O) interfaces 1730. Computer system 1700 also includes a network interface 1740 coupled to I/O interface 1730. Although fig. 17 illustrates computer system 1700 as a single computing device, in various embodiments, computer system 1700 may include one computing device or any number of computing devices configured to operate together as a single computer system 1700.
In various embodiments, the computer system 1700 may be a single-processor system including one processor 1710 or a multi-processor system including several processors 1710 (e.g., two, four, eight, or another suitable number). Processor 1710 may be any suitable processor capable of executing instructions. For example, in various embodiments, processors 1710 may be general-purpose processors or embedded processors implementing any of a variety of Instruction Set Architectures (ISAs), such as the x86, ARM, PowerPC, SPARC, or MIPS ISAs, or any other suitable ISA. In a multi-processor system, each of the processors 1710 may typically (but not necessarily) implement the same ISA.
The system memory 1720 may store instructions and data that can be accessed by the processor 1710. In various embodiments, the system memory 1720 may be implemented using any suitable memory technology, such as Random Access Memory (RAM), static RAM (SRAM), synchronous dynamic RAM (SDRAM), non-volatile/flash-type memory, or any other type of memory. In the illustrated embodiment, program instructions and data implementing one or more desired functions, such as those methods, techniques, and data described above, are shown stored as code 1725 and data 1726 within the system memory 1720.
In one embodiment, I/O interface 1730 may be configured to coordinate I/O traffic between processor 1710, system memory 1720, and any peripheral devices in the device, including network interface 1740 or other peripheral interfaces. In some embodiments, I/O interface 1730 may perform any necessary protocol, timing or other data transformations to convert data signals from one component (e.g., system memory 1720) into a format suitable for use by another component (e.g., processor 1710). In some embodiments, I/O interface 1730 may include support for devices attached through various types of peripheral buses, such as the Peripheral Component Interconnect (PCI) bus standard or variations of the Universal Serial Bus (USB) standard, for example. In some embodiments, for example, the functionality of I/O interface 1730 may be split into two or more separate components, such as a north bridge and a south bridge. Also, in some embodiments, some or all of the functionality of I/O interface 1730 (such as an interface to system memory 1720) may be incorporated directly into processor 1710.
For example, network interface 1740 may be configured to allow data to be exchanged between computer system 1700 and other devices 1760 attached to network 1750, such as the other computer systems or devices depicted in FIG. 1. In various embodiments, network interface 1740 may support communication via any suitable wired or wireless general data network, such as a type of Ethernet network, for example. In addition, network interface 1740 may support communication via telecommunications/telephony networks, such as analog voice networks or digital fiber optic communications networks, via Storage Area Networks (SANs), such as Fibre Channel SANs, or via any other suitable type of network and/or protocol.
In some embodiments, computer system 1700 includes one or more offload cards 1770 (including one or more processors 1775 and possibly one or more network interfaces 1740) connected using I/O interfaces 1730 (e.g., a bus implementing a version of the peripheral component interconnect express (PCI-E) standard, or another interconnect such as the QuickPath interconnect (QPI) or the UltraPath interconnect (UPI)). For example, in some embodiments, computer system 1700 can function as a host electronic device hosting a compute instance (e.g., operating as part of a hardware virtualization service), and one or more offload cards 1770 execute a virtualization manager that can manage the compute instance executing on the host electronic device. As an example, in some embodiments, offload card 1770 may perform compute instance management operations, such as pausing and/or un-pausing compute instances, starting and/or terminating compute instances, performing memory transfer/copy operations, and so forth. In some embodiments, these management operations may be performed by the offload card 1770 in coordination with the hypervisor being executed by the other processors 1710A-1710N of the computer system 1700 (e.g., following a request from the hypervisor). However, in some embodiments, the virtualization manager implemented by offload card 1770 may mediate requests from other entities (e.g., from the compute instance itself) and may not coordinate with (or service) any separate hypervisor. Referring to fig. 2, in at least some embodiments, at least a portion of the functionality of the PSE framework 202 and host manager 222 is executed on one or more processors 1775 of the offload card 1770, while the instances (e.g., 232, 234, 236) are executed on one or more processors 1710.
In some embodiments, computer system 1700 includes one or more storage devices (SD) 1780. Exemplary storage devices 1780 include solid state drives (e.g., with various types of flash or other memory) and magnetic drives. Processor 1710 can access SD 1780 via interface 1730 or, in some cases, via offload card 1770. For example, offload card 1770 may include a system-on-chip (SoC) that has multiple interconnect interfaces (e.g., PCIe-to-PCIe bridges) to bridge between interface 1730 and SD 1780.
In some embodiments, system memory 1720 may be one embodiment of a computer accessible medium configured to store program instructions and data as described above. However, in other embodiments, program instructions and/or data may be received, transmitted or stored on different types of computer-accessible media. Generally speaking, a computer-accessible medium may include non-transitory storage media or memory media such as magnetic media or optical media, e.g., disk or DVD/CD coupled to computer system 1700 via I/O interface 1730. Non-transitory computer-accessible storage media may also include any volatile or non-volatile media, such as RAM (e.g., SDRAM, Double Data Rate (DDR) SDRAM, SRAM, etc.), Read Only Memory (ROM), etc., which may be included in some embodiments of computer system 1700 as system memory 1720 or another type of memory. Further, a computer-accessible medium may include transmission media or signals such as electrical, electromagnetic, or digital signals, conveyed via a communication medium such as a network and/or a wireless link, such as may be implemented via network interface 1740.
The various implementations discussed or presented herein may be implemented in a wide variety of operating environments, which may in some cases include one or more user computers, computing devices, or processing devices, which may be used to operate any of a number of applications. The user or client device may include any of a number of general purpose personal computers, such as desktop or laptop computers running a standard operating system, as well as cellular, wireless, and handheld devices running mobile software and capable of supporting a number of networking and messaging protocols. Such a system may also include a number of workstations running any of a variety of commercially available operating systems and other known applications for purposes such as development and database management. These devices may also include other electronic devices such as virtual terminals, thin clients, gaming systems, and/or other devices capable of communicating via a network.
Most embodiments utilize at least one network that will be familiar to those skilled in the art for supporting communication using any of a variety of commercially available protocols, such as Transmission Control Protocol/Internet Protocol (TCP/IP), File Transfer Protocol (FTP), Universal Plug and Play (UPnP), Network File System (NFS), Common Internet File System (CIFS), Extensible Messaging and Presence Protocol (XMPP), AppleTalk, and the like. The network may include, for example, a Local Area Network (LAN), a Wide Area Network (WAN), a Virtual Private Network (VPN), the Internet, an intranet, an extranet, a Public Switched Telephone Network (PSTN), an infrared network, a wireless network, and any combination thereof.
In implementations utilizing a web server, the web server may run any of a variety of server or mid-tier applications, including HTTP (HyperText Transfer Protocol) servers, File Transfer Protocol (FTP) servers, Common Gateway Interface (CGI) servers, data servers, Java servers, business application servers, and the like. The server may also be capable of executing programs or scripts in response to requests from user devices, for example by executing one or more web applications that may be implemented as one or more scripts or programs written in any programming language, such as Java(R), C, C# or C++, or any scripting language, such as Perl, Python, PHP, or TCL, as well as combinations thereof. The servers may also include database servers, including, without limitation, those commercially available from Oracle(R), Microsoft(R), Sybase(R), and IBM(R), among others. The database servers may be relational or non-relational (e.g., "NoSQL"), distributed or non-distributed, and the like.
The environment may include a variety of data storage devices as well as other memory and storage media as discussed above. These may reside in various locations, such as on a storage medium local to one or more of the computers (and/or in one or more of the computers), or remotely from any or all of the computers across a network. In one particular set of embodiments, the information may reside in a Storage Area Network (SAN) familiar to those skilled in the art. Similarly, any required files for performing the functions attributed to a computer, server, or other network device may be stored locally and/or remotely, as appropriate. Where the system includes computerized devices, each such device may include hardware elements that may be electrically coupled via a bus, including, for example, at least one Central Processing Unit (CPU), at least one input device (e.g., a mouse, keyboard, controller, touch screen, or keypad), and/or at least one output device (e.g., a display device, printer, or speaker). Such systems may also include one or more storage devices, such as disk drives, optical storage devices, and solid state storage devices, such as Random Access Memory (RAM) or Read Only Memory (ROM), as well as removable media devices, memory cards, flash memory cards, and the like.
Such devices may also include a computer-readable storage medium reader, a communication device (e.g., a modem, a network card (wireless or wired), an infrared communication device, etc.), and a working memory as described above. The computer-readable storage media reader can be connected with or configured to receive a computer-readable storage medium, which represents remote, local, fixed, and/or removable storage and storage media for temporarily and/or more permanently containing, storing, transmitting, and retrieving computer-readable information. The system and various devices will also typically include a number of software applications, modules, services or other elements located within at least one working memory device, including an operating system and application programs, such as a client application or web browser. It will be appreciated that alternative implementations may have numerous variations from the implementations described above. For example, it may also be possible to use custom hardware, and/or particular elements may be implemented in hardware, software (including portable software, such as applets), or both. In addition, connections to other computing devices, such as network input/output devices, may be used.
Storage media and computer-readable media for containing the code or portions of code may include any suitable media known or used in the art, including storage media and communication media such as, but not limited to, volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage and/or transmission of information such as computer-readable instructions, data structures, program modules, or other data, including RAM, ROM, electrically-erasable programmable read-only memory ("EEPROM"), flash memory or other memory technology, compact disc read only memory ("CD-ROM"), Digital Versatile Discs (DVDs) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the system devices. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art will appreciate other ways and/or methods to implement the various embodiments.
Various embodiments are described in the foregoing description. For purposes of explanation, specific configurations and details are set forth in order to provide a thorough understanding of the implementations. However, it will also be apparent to one skilled in the art that the embodiments may be practiced without the specific details. Furthermore, well-known features may be omitted or simplified in order not to obscure the described implementations.
Bracketed text and boxes with dashed borders (e.g., large dashes, small dashes, dot-dash lines, and dots) are used herein to illustrate optional operations that add additional features to some implementations. However, such notation should not be taken to mean that these are the only options or optional operations, and/or that blocks with solid borders are not optional in certain implementations.
Reference numbers with suffix letters (e.g., 101A, 102A, etc.) can be used to indicate that one or more instances of a reference entity can exist in various embodiments, and when multiple instances exist, each need not be the same, but instead can share some common features or actions in a common manner. Moreover, the use of a particular suffix is not intended to imply the presence of a particular amount of that entity unless specifically indicated to the contrary. Thus, in various implementations, two entities using the same or different suffix letters may or may not have the same number of instances.
References to "one embodiment," "an example embodiment," etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same implementation. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the purview of one skilled in the art to effect such feature, structure, or characteristic in connection with other ones of the embodiments whether or not explicitly described.
Additionally, in the various embodiments described above, unless specifically stated otherwise, disjunctive language such as the phrase "at least one of A, B, or C" is intended to be understood to mean A, B, or C, or any combination thereof (e.g., A, B, and/or C). Thus, such disjunctive language is not intended to, and should not be construed to, imply that a given implementation requires at least one of A, at least one of B, or at least one of C to each be present.
At least some of the embodiments of the disclosed technology may be described in view of the following examples:
1. a computer-implemented method, the computer-implemented method comprising:
executing, by a first one or more processors of a first computer system of a provider network, a block storage server virtual machine to host a first storage volume using one or more storage devices of the first computer system;
executing, by the first one or more processors, a guest virtual machine having access to a virtual block storage;
executing, by the second one or more processors of the first computer system, a block storage client, wherein the executing the block storage client comprises:
receiving, from the guest virtual machine, a first block storage operation performed on the virtual block storage; and
sending a message to the block storage server virtual machine to cause the block storage server virtual machine to perform the first block storage operation on the first storage volume.
2. The computer-implemented method of clause 1, wherein the message is sent via a virtual network protected using a key to encrypt and decrypt traffic sent via the virtual network.
3. The computer-implemented method of clause 2, wherein a first virtual network address of the virtual network is associated with the block storage server virtual machine and a second virtual network address of the virtual network is associated with the block storage client.
4. A computer-implemented method, the computer-implemented method comprising:
executing, by a computer system, a first block storage server virtual machine to host a first volume using one or more storage devices of the computer system;
executing, by the computer system, a second virtual machine having access to a virtual block storage device;
executing, by the computer system, a block storage client, wherein executing the block storage client comprises:
receiving, from the second virtual machine, a first block storage operation performed on the virtual block storage; and
sending a message to the first block storage server virtual machine to cause the first block storage server virtual machine to perform the first block storage operation on the first volume.
5. The computer-implemented method of clause 4, wherein the message is sent via a virtual network protected using a key to encrypt and decrypt traffic sent via the virtual network.
6. The computer-implemented method of clause 5, wherein a first virtual network address of the virtual network is associated with the first chunk store server virtual machine and a second virtual network address of the virtual network is associated with the chunk store client.
7. The computer-implemented method of any of clauses 4-6, wherein the first block storage server virtual machine and the second virtual machine are executed by a first one or more processors of the computer system, and the block storage client is executed by a second one or more processors of the computer system.
8. The computer-implemented method of any of clauses 4-7, wherein the first block storage server virtual machine is a first virtual machine hosted by the computer system and the second virtual machine is a second virtual machine hosted by the computer system.
9. The computer-implemented method of clause 4, further comprising executing, by the computer system, a second block storage server virtual machine to host a second volume using the one or more storage devices of the computer system, wherein the second volume is a replica associated with the first volume.
10. The computer-implemented method of clause 9, wherein the first block storage server virtual machine executes using a first physical component of the computer system and the second block storage server virtual machine executes using a second physical component of the computer system that is different from the first physical component.
11. The computer-implemented method of any of clauses 4-10, wherein the first block storage operation is writing a data block, and wherein the block storage client is further to encrypt the data block using an encryption key associated with the virtual block storage to generate an encrypted data block, and wherein the message sent to the first block storage server virtual machine includes the encrypted data block and causes the first block storage server virtual machine to write the encrypted data block to the first volume.
12. The computer-implemented method of any of clauses 4-11:
wherein the computer system is included in an extension portion of a provider network, the extension portion of the provider network in communication with the provider network via at least a third party network; and
wherein an instance management service of the provider network initiates the execution of the first and second block storage server virtual machines by the computer system.
13. A system, the system comprising:
one or more storage devices of a host computer system; and
a first one or more processors of the host computer system to execute a first block storage server application and a second application having access to virtual block storage, the first block storage server application comprising instructions that, after execution, cause the first block storage server application to host a first volume using the one or more storage devices;
a second one or more processors of the host computer system to execute a block storage client application, the block storage client application comprising instructions that upon execution cause the block storage client application to:
receiving a first block storage operation performed on the virtual block storage from the second application; and
sending a message to the first block storage server application to cause the first block storage server application to perform the first block storage operation on the first volume.
14. The system of clause 13, wherein the message is sent via a virtual network protected using a key to encrypt and decrypt traffic sent via the virtual network.
15. The system of clause 14, wherein a first virtual network address of the virtual network is associated with the first block storage server application and a second virtual network address of the virtual network is associated with the block storage client application.
16. The system of any of clauses 13-15, wherein the first block storage server application executes within a first virtual machine hosted by the host computer system and the second application executes within a second virtual machine hosted by the host computer system.
17. The system of clause 13, wherein the first one or more processors of the host computer system further execute a second block storage server application comprising instructions that, after execution, cause the second block storage server application to host a second volume using the one or more storage devices of the host computer system, wherein the second volume is a replica associated with the first volume.
18. The system of clause 17, wherein the first block storage server application executes using a first physical component of the computer system and the second block storage server application executes using a second physical component of the computer system that is different from the first physical component.
19. The system of any of clauses 13-18, wherein the first block storage operation is writing a data block, wherein the block storage client application comprises further instructions that, after execution, cause the block storage client application to encrypt the data block using an encryption key associated with the virtual block storage to generate an encrypted data block, and wherein the message sent to the first block storage server application includes the encrypted data block and causes the first block storage server application to write the encrypted data block to the first volume.
20. The system of any of clauses 13-19:
wherein the host computer system is included in an extension portion of a provider network, the extension portion of the provider network in communication with the provider network via at least a third party network; and
wherein an instance management service of the provider network initiates the execution of the first and second block storage server applications by the host computer system.
21. A computer-implemented method, the computer-implemented method comprising:
receiving, at a host computer system of an extension portion of a provider network, a first request to launch a first virtual machine to host a block storage server application, wherein the extension portion of the provider network is in communication with the provider network via at least a third party network;
provisioning the first virtual machine with at least a portion of a storage capacity of one or more physical storage devices of the host computer system as provisioned storage devices; and
executing the block storage server application using the first virtual machine, wherein the executing the block storage server application comprises:
creating a logical volume on the provisioned storage device in response to a second request from a block storage service of the provider network to create the logical volume;
receiving, from a block storage client application via a virtual network, a third request to perform an input/output operation on the logical volume; and
performing the requested input/output operation on the logical volume.
22. The computer-implemented method of clause 21, wherein prior to executing the block storage server application using the first virtual machine, the method further comprises:
executing, using the first virtual machine, another application to load a boot volume from a machine image obtained from a data storage of the provider network, wherein the boot volume is another logical volume of the provisioned storage;
modifying the first virtual machine to boot from the boot volume; and
restarting the first virtual machine.
23. The computer-implemented method of clause 21, wherein prior to executing the block storage server application using the first virtual machine, the method further comprises loading a boot volume of the first virtual machine from a machine image stored by a first block storage, wherein the boot volume is another logical volume of the provisioned storage, and wherein the first block storage is a virtual block storage.
24. A computer-implemented method, the computer-implemented method comprising:
receiving, by a computer system of an extension portion of a provider network, a first request to launch a first virtual machine to host a block storage server application, wherein the extension portion of the provider network is in communication with the provider network via at least a third party network;
provisioning the first virtual machine with at least a portion of a storage capacity of one or more storage devices of a host computer system as a provisioned storage device; and
executing the block storage server application using the first virtual machine, wherein the executing the block storage server application comprises:
creating a logical volume on the provisioned storage device in response to a second request from a block storage service of a provider network to create the logical volume;
receiving a third request to perform an input/output operation on the logical volume; and
performing the requested input/output operation on the logical volume.
25. The computer-implemented method of clause 24, wherein prior to executing the block storage server application using the first virtual machine, the method further comprises:
executing, using the first virtual machine, another application to load a boot volume from a machine image obtained from a data storage of the provider network, wherein the boot volume is another logical volume of the provisioned storage;
modifying the first virtual machine to boot from the boot volume; and
restarting the first virtual machine.
26. The computer-implemented method of clause 25, further comprising purging memory of the first virtual machine prior to executing the block storage server application.
27. The computer-implemented method of clause 24, wherein prior to executing the block storage server application using the first virtual machine, the method further comprises loading a boot volume of the first virtual machine from a machine image stored by a first block storage, wherein the boot volume is another logical volume of the provisioned storage, and wherein the first block storage is a virtual block storage.
28. The computer-implemented method of clause 27, wherein the loading is performed by at least one of a basic input/output system of the first virtual machine and a unified extensible firmware interface of the first virtual machine.
29. The computer-implemented method of any of clauses 24-28:
wherein the third request is received from a block storage client application via a virtual network; and
wherein traffic sent over the virtual network is encrypted using a key associated with the virtual network.
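Clause 29 ties the client-to-server request path to a virtual network whose traffic is encrypted with a key associated with that network. One way to picture this, sketched with the Fernet primitive from the third-party cryptography package standing in for whatever cipher an implementation actually uses:

from cryptography.fernet import Fernet

network_key = Fernet.generate_key()   # the key associated with the virtual network
channel = Fernet(network_key)

# Block storage client side: encrypt the third request before it crosses the network.
request = b'{"op": "write", "volume": "vol-1", "block": 42}'
ciphertext = channel.encrypt(request)

# Block storage server side: decrypt with the same virtual-network key.
assert channel.decrypt(ciphertext) == request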
30. The computer-implemented method of any of clauses 24-29, wherein the first virtual machine is one of a plurality of virtual machines, each of the plurality of virtual machines executing a block storage server application, and the method further comprises:
determining that a resource utilization associated with one or more of the plurality of virtual machines has exceeded a threshold;
provisioning a second virtual machine with at least a portion of a storage capacity of one or more storage devices of another host computer system; and
executing another block storage server application using the second virtual machine.
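Clause 30 is a scale-out rule: when resource utilization on the existing block storage server virtual machines crosses a threshold, provision another virtual machine on another host and run one more block storage server application there. A compact sketch, with launch_block_storage_vm left as a placeholder for the provisioning flow shown earlier:

def maybe_scale_out(utilizations, threshold, launch_block_storage_vm):
    # utilizations: virtual machine id -> observed resource utilization (0.0 to 1.0)
    if any(u > threshold for u in utilizations.values()):
        # Provision a second virtual machine on another host computer system and
        # execute another block storage server application using it.
        return launch_block_storage_vm()
    return None

maybe_scale_out({"bss-vm-1": 0.93}, 0.85, lambda: "bss-vm-2")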
31. The computer-implemented method of any of clauses 24-30, further comprising:
provisioning a second virtual machine with at least another portion of the storage capacity of the one or more storage devices of the host computer system as another provisioned storage device; and
executing another block storage server application using the second virtual machine, wherein the first virtual machine executes using a first physical component of the computer system and the second virtual machine executes using a second physical component of the computer system that is different from the first physical component.
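Clause 31 adds an anti-affinity constraint: two block storage server virtual machines on the same host execute using different physical components. A sketch of such a placement check, with illustrative device names; the claims do not prescribe any particular selection algorithm.

def place_on_distinct_device(devices, assignments, vm_id):
    # devices: physical storage devices of the host computer system
    # assignments: virtual machine id -> device already assigned to it
    used = set(assignments.values())
    for device in devices:
        if device not in used:
            assignments[vm_id] = device
            return device
    raise RuntimeError("no unused physical storage device left on this host")

assignments = {"bss-vm-1": "/dev/nvme0n1"}
place_on_distinct_device(["/dev/nvme0n1", "/dev/nvme1n1"], assignments, "bss-vm-2")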
32. The computer-implemented method of clause 31, wherein the input/output operation is a write operation to write a data block to a block address of the logical volume, wherein performing the requested input/output operation on the logical volume comprises writing the data block to the block address of the logical volume, wherein the first physical component is a first memory device and the second physical component is a second memory device, and wherein the extension portion of the provider network comprises one or more physical computing devices located outside a data center of the provider network and on premises of a customer of the provider network.
33. A system, the system comprising:
one or more storage devices of a host computer system of an extension portion of a provider network, wherein the extension portion of the provider network is in communication with the provider network via at least a third party network; and
one or more processors of the host computer system that execute a host manager application, the host manager application including instructions that, upon execution, cause the host manager application to:
receive a first request to launch a first virtual machine to host a block storage server application;
provision the first virtual machine with at least a portion of the storage capacity of the one or more storage devices as a provisioned storage device; and
execute, using the first virtual machine, the block storage server application, the block storage server application comprising instructions that, upon execution, cause the block storage server application to:
create a logical volume on the provisioned storage device in response to a second request from a block storage service of the provider network to create the logical volume;
receive a third request to perform an input/output operation on the logical volume; and
perform the requested input/output operation on the logical volume.
34. The system of clause 33, wherein the host manager application includes further instructions that, upon execution, cause the host manager application to, prior to executing the block storage server application using the first virtual machine:
execute, using the first virtual machine, another application to load a boot volume from a machine image obtained from a data store of the provider network, wherein the boot volume is another logical volume of the provisioned storage device;
modify the first virtual machine to boot from the boot volume; and
restart the first virtual machine.
35. The system of clause 34, wherein the host manager application comprises further instructions that, upon execution, cause the host manager application to clear memory of the first virtual machine prior to execution of the block storage server application.
36. The system of clause 33, wherein a component of the first virtual machine includes instructions that, upon execution and prior to executing the block storage server application using the first virtual machine, cause the component to load a boot volume of the first virtual machine from a machine image stored by a first block storage device, wherein the boot volume is another logical volume of the provisioned storage device, and wherein the first block storage device is a virtual block storage device.
37. The system of clause 36, wherein the component is at least one of a basic input/output system of the first virtual machine and a unified extensible firmware interface of the first virtual machine.
38. The system of any of clauses 33-37:
wherein the third request is received from a block storage client application via a virtual network; and
wherein traffic sent over the virtual network is encrypted using a key associated with the virtual network.
39. The system of any of clauses 33-38, wherein the host manager application comprises further instructions that, upon execution, cause the host manager application to:
provision a second virtual machine with at least another portion of the storage capacity of the one or more storage devices of the host computer system as another provisioned storage device; and
execute another block storage server application using the second virtual machine, wherein the first virtual machine executes using a first physical component of the host computer system and the second virtual machine executes using a second physical component of the host computer system that is different from the first physical component.
40. The system of clause 39, wherein the input/output operation is a write operation to write a data block to a block address of the logical volume, wherein performing the requested input/output operation on the logical volume comprises writing the data block to the block address of the logical volume, wherein the first physical component is a first memory device and the second physical component is a second memory device, and wherein the extension portion of the provider network comprises one or more physical computing devices located outside a data center of the provider network and on premises of a customer of the provider network.
41. A computer-implemented method, the computer-implemented method comprising:
receiving, by a first block storage server instance, a first request from a block storage service of a provider network, the first request to create a first storage volume to store a first portion of a first logical volume;
receiving, by a second block storage server instance, a second request from the block storage service, the second request to create a second storage volume to store a second portion of the first logical volume;
determining that the second block storage server instance is unresponsive;
sending a third request to a third block storage server instance to create a third storage volume to store the second portion of the first logical volume;
storing, by the third block storage server instance, the second portion of the first logical volume to the third storage volume;
updating a data store containing an identification of each block storage server instance hosting a portion of the first logical volume to remove the identification of the second block storage server instance and to add the identification of the third block storage server instance; and
wherein the first, second, and third block storage server instances are hosted by one or more computer systems of an extension portion of the provider network, and wherein the extension portion of the provider network is in communication with the provider network via at least a third party network.
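Clause 41 describes volume repair: when the instance holding one portion of the logical volume stops responding, a replacement storage volume is created on a third instance and the data store of instance identifications is updated to point at it. A sketch, modeling that data store as a plain dict and taking ping and create_volume as assumed callables rather than real service APIs:

def replace_unresponsive_server(membership, volume_part, ping, create_volume):
    # membership: portion of the logical volume -> block storage server instance id
    old_instance = membership[volume_part]
    if ping(old_instance):
        return old_instance                  # still responsive, nothing to do
    # Third request: ask another instance to create a replacement storage volume
    # for this portion of the first logical volume.
    new_instance = create_volume(volume_part)
    # Update the data store: remove the unresponsive instance, add the replacement.
    membership[volume_part] = new_instance
    return new_instance

membership = {"part-0": "bss-1", "part-1": "bss-2"}
replace_unresponsive_server(membership, "part-1",
                            ping=lambda instance: instance != "bss-2",  # bss-2 is down
                            create_volume=lambda part: "bss-3")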
42. The computer-implemented method of clause 41, further comprising:
receiving, by a block storage client from the data store, the identification of each block storage server instance hosting a portion of the first logical volume;
establishing, by the block storage client, a connection to at least one block storage server instance included in the identification;
receiving, by the block storage client, a first block storage operation from another instance, the first block storage operation writing a data block to a block address of a virtual block storage device; and
sending, by the block storage client, a message to the at least one block storage server instance via the connection to cause the at least one block storage server instance to write the block of data to the block address of the first storage volume.
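The block storage client of clause 42 reads the instance identifications from the data store, connects to those instances, and forwards the block writes it receives from another instance. A sketch with stubbed connections (a print call standing in for the network send over the virtual network):

class BlockStorageClient:
    def __init__(self, identifications):
        # identifications: portion index -> block storage server instance id,
        # as received from the data store.
        self.servers = dict(identifications)
        self.connections = {sid: self._connect(sid) for sid in self.servers.values()}

    def _connect(self, server_id):
        # Establish a connection to the server instance; stubbed for the sketch.
        return lambda message: print(f"to {server_id}: {message}")

    def write(self, portion, block_address, data):
        # First block storage operation from another instance: write a data block
        # to a block address of the virtual block storage device. Forward it to
        # the instance holding that portion of the first logical volume.
        server_id = self.servers[portion]
        self.connections[server_id]({"op": "write", "block": block_address, "data": data})

BlockStorageClient({0: "bss-1", 1: "bss-3"}).write(1, 42, b"\x00" * 512)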
43. The computer-implemented method of any of clauses 41-42, further comprising polling the first and second block storage server instances for responsiveness.
44. A computer-implemented method, the computer-implemented method comprising:
receiving, by a first block storage server instance, a first request to create a first storage volume to store a first portion of a first logical volume;
receiving, by a second block storage server instance, a second request to create a second storage volume to store a second portion of the first logical volume;
sending a third request to a third block storage server instance to create a third storage volume to store the second portion of the first logical volume;
storing, by the third block storage server instance, the second portion of the first logical volume to the third storage volume; and
updating a data store containing an identification of each block storage server instance hosting a portion of the first logical volume to remove the identification of the second block storage server instance and to add the identification of the third block storage server instance.
45. The computer-implemented method of clause 44, further comprising:
receiving, by a block storage client from the data store, the identification of each block storage server instance hosting a portion of the first logical volume;
establishing, by the block storage client, a connection to at least one block storage server instance included in the identification;
receiving, by the block storage client, a first block storage operation from another instance, the first block storage operation writing a data block to a block address of a virtual block storage device; and
sending, by the block storage client, a message to the at least one block storage server instance via the connection to cause the at least one block storage server instance to write the block of data to the block address of the first storage volume.
46. The computer-implemented method of clause 45, wherein the data store is a distributed data store comprising a plurality of nodes that independently execute a consensus protocol to update the identification.
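Clause 46 places the identification in a distributed data store whose nodes independently run a consensus protocol. The sketch below is only a majority-acknowledgement toy, not an implementation of Paxos, Raft, or any other real consensus protocol, and the node-as-dict model is an assumption of the sketch.

def update_identification(nodes, key, new_value):
    # nodes: replicas of the identification mapping; None marks an unreachable node.
    acks = 0
    for node in nodes:
        if node is None:
            continue                     # an unreachable replica cannot acknowledge
        node[key] = new_value
        acks += 1
    if acks <= len(nodes) // 2:
        raise RuntimeError("update not acknowledged by a majority of nodes")
    return acks

nodes = [dict(), None, dict()]           # one of three replicas is down
update_identification(nodes, "part-1", "bss-instance-7")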
47. The computer-implemented method of clause 46, further comprising obtaining, by the block storage client, an identification of each node of the plurality of nodes from a service.
48. The computer-implemented method of clause 46, wherein a node of the plurality of nodes is executing within a container hosted by the first block storage server instance.
49. The computer-implemented method of clause 48, further comprising broadcasting, by the block storage client to a plurality of block storage server instances, a request for identification of any of the plurality of nodes.
50. The computer-implemented method of any of clauses 44-49, further comprising polling the first and second block storage server instances for responsiveness, wherein the third request is sent in response to determining that the second block storage server instance is unresponsive.
51. The computer-implemented method of any of clauses 44-50, wherein the first storage volume is striped across a plurality of block storage server instances, and each of the plurality of block storage server instances is included in the identification.
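The striping of clause 51 can be reduced to a mapping from block address to the block storage server instance holding that stripe. A round-robin sketch, with an assumed stripe size of 1024 blocks; real stripe sizes and instance identifiers would come from the identification in the data store.

BLOCKS_PER_STRIPE = 1024  # assumed stripe size

def instance_for_block(block_address, instances):
    stripe = block_address // BLOCKS_PER_STRIPE
    return instances[stripe % len(instances)]

instances = ["bss-1", "bss-2", "bss-3"]  # each appears in the identification
assert instance_for_block(0, instances) == "bss-1"
assert instance_for_block(2048, instances) == "bss-3"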
52. The computer-implemented method of any of clauses 44-51, wherein the first, second, and third block storage server instances are hosted by one or more computer systems of an extension portion of a provider network, wherein the first and second requests are sent from a block storage service of the provider network, and wherein the extension portion of the provider network communicates with the provider network via at least a third party network.
53. A system, the system comprising:
one or more computing devices of an extension portion of a provider network, wherein the extension portion of the provider network is in communication with the provider network via at least a third party network, and wherein the one or more computing devices include instructions that upon execution on one or more processors cause the one or more computing devices to:
receive, by a first block storage server instance, a first request to create a first storage volume to store a first portion of a first logical volume;
receive, by a second block storage server instance, a second request to create a second storage volume to store a second portion of the first logical volume;
send a third request to a third block storage server instance to create a third storage volume to store the second portion of the first logical volume;
store, by the third block storage server instance, the second portion of the first logical volume to the third storage volume; and
update a data store containing an identification of each block storage server instance hosting a portion of the first logical volume to remove the identification of the second block storage server instance and to add the identification of the third block storage server instance.
54. The system of clause 53, wherein the one or more computing devices include further instructions that, upon execution on the one or more processors, cause the one or more computing devices to:
receive, by a block storage client from the data store, the identification of each block storage server instance hosting a portion of the first logical volume;
establish, by the block storage client, a connection to at least one block storage server instance included in the identification;
receive, by the block storage client, a first block storage operation from another instance, the first block storage operation writing a data block to a block address of a virtual block storage device; and
send, by the block storage client, a message to the at least one block storage server instance via the connection to cause the at least one block storage server instance to write the block of data to the block address of the first storage volume.
55. The system of clause 54, wherein the data store is a distributed data store comprising a plurality of nodes that independently execute a consensus protocol to update the identification.
56. The system of clause 55, wherein the one or more computing devices comprise further instructions that, upon execution on the one or more processors, cause the one or more computing devices to: obtain, by the block storage client, an identification of each node of the plurality of nodes from a service.
57. The system of clause 55, wherein a node of the plurality of nodes is executing within a container hosted by the first block storage server instance.
58. The system of clause 57, wherein the one or more computing devices include further instructions that, upon execution on the one or more processors, cause the one or more computing devices to: broadcast, by the block storage client to a plurality of block storage server instances, a request for identification of any of the plurality of nodes.
59. The system of any of clauses 53-58, wherein the one or more computing devices comprise further instructions that, upon execution on the one or more processors, cause the one or more computing devices to poll the first block storage server instance and the second block storage server instance for responsiveness, wherein the third request is sent in response to determining that the second block storage server instance is unresponsive.
60. The system of any of clauses 53-59, wherein the first storage volume is striped across a plurality of block storage server instances, and each of the plurality of block storage server instances is included in the identification.
It will, however, be evident that various modifications and changes may be made therein without departing from the broader spirit and scope of the disclosure as set forth in the claims. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

Claims (15)

1. A computer-implemented method, the computer-implemented method comprising:
executing, by a computer system, a first block storage server virtual machine to host a first volume using one or more storage devices of the computer system;
executing, by the computer system, a second virtual machine having access to a virtual block storage device;
executing, by the computer system, a block storage client, wherein executing the block storage client comprises:
receiving, from the second virtual machine, a first block storage operation to be performed on the virtual block storage device; and
sending a message to the first block storage server virtual machine to cause the first block storage server virtual machine to perform the first block storage operation on the first volume.
2. The computer-implemented method of claim 1, wherein the message is sent via a virtual network protected using keys to encrypt and decrypt traffic sent via the virtual network.
3. The computer-implemented method of claim 2, wherein a first virtual network address of the virtual network is associated with the first block storage server virtual machine and a second virtual network address of the virtual network is associated with the block storage client.
4. The computer-implemented method of any of claims 1-3, wherein the first block storage server virtual machine and the second virtual machine are executed by a first one or more processors of the computer system, and the block storage client is executed by a second one or more processors of the computer system.
5. The computer-implemented method of any of claims 1-4, wherein the first block storage server virtual machine is a first virtual machine hosted by the computer system and the second virtual machine is a second virtual machine hosted by the computer system.
6. The computer-implemented method of claim 1, further comprising executing, by the computer system, a second block storage server virtual machine to host a second volume using the one or more storage devices of the computer system, wherein the second volume is a replica associated with the first volume.
7. The computer-implemented method of claim 6, wherein the first block storage server virtual machine executes using a first physical component of the computer system and the second block storage server virtual machine executes using a second physical component of the computer system different from the first physical component.
8. The computer-implemented method of any of claims 1-7, wherein the first block storage operation is a write of a block of data, and wherein the block storage client is further to encrypt the block of data using an encryption key associated with the virtual block storage device to generate an encrypted block of data, and wherein the message sent to the first block storage server virtual machine includes the encrypted block of data and causes the first block storage server virtual machine to write the encrypted block of data to the first volume.
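Claim 8 has the block storage client encrypt the data block with an encryption key associated with the virtual block storage device before it leaves the client, so the server only ever writes ciphertext to the first volume. A sketch, again using the third-party cryptography package as a stand-in cipher; write_encrypted and the message shape are assumptions of the sketch.

from cryptography.fernet import Fernet

volume_key = Fernet.generate_key()  # key associated with the virtual block storage device
cipher = Fernet(volume_key)

def write_encrypted(send_to_server, block_address, plaintext_block):
    encrypted_block = cipher.encrypt(plaintext_block)
    # The message carries the encrypted block; the server writes it unchanged.
    send_to_server({"op": "write", "block": block_address, "data": encrypted_block})

write_encrypted(lambda message: None, 7, b"\x00" * 512)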
9. The computer-implemented method of any one of claims 1-8:
wherein the computer system is included in an extension portion of a provider network, the extension portion of the provider network in communication with the provider network via at least a third party network; and
wherein an instance management service of the provider network initiates the execution of the first and second block storage server virtual machines by the computer system.
10. A system, the system comprising:
one or more storage devices of a host computer system; and
a first one or more processors of the host computer system to execute a first block storage server application and a second application having access to a virtual block storage device, the first block storage server application comprising instructions that, upon execution, cause the first block storage server application to host a first volume using the one or more storage devices;
a second one or more processors of the host computer system to execute a block storage client application, the block storage client application comprising instructions that upon execution cause the block storage client application to:
receive, from the second application, a first block storage operation to be performed on the virtual block storage device; and
send a message to the first block storage server application to cause the first block storage server application to perform the first block storage operation on the first volume.
11. The system of claim 10, wherein the message is sent via a virtual network protected using a key to encrypt and decrypt traffic sent via the virtual network.
12. The system of claim 11, wherein a first virtual network address of the virtual network is associated with the first block store server application and a second virtual network address of the virtual network is associated with the block store client application.
13. The system of any of claims 10-12, wherein the first block storage server application executes within a first virtual machine hosted by the host computer system and the second application executes within a second virtual machine hosted by the host computer system.
14. The system of claim 10, wherein the first one or more processors of the host computer system are to further execute a second block storage server application, the second block storage server application comprising instructions that, upon execution, cause the second block storage server application to host a second volume using the one or more storage devices of the host computer system, wherein the second volume is a replica associated with the first volume.
15. The system of claim 14, wherein the first block storage server application executes using a first physical component of the host computer system and the second block storage server application executes using a second physical component of the host computer system different from the first physical component.
CN202080047292.8A 2019-06-28 2020-06-11 Virtualized block storage server in cloud provider underlying extension Active CN114008593B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310084506.4A CN116010035B (en) 2019-06-28 2020-06-11 Virtualized block storage server in cloud provider underlying extensions

Applications Claiming Priority (7)

Application Number Priority Date Filing Date Title
US16/457,853 US10949131B2 (en) 2019-06-28 2019-06-28 Control plane for block storage service distributed across a cloud provider substrate and a substrate extension
US16/457,850 2019-06-28
US16/457,850 US10949124B2 (en) 2019-06-28 2019-06-28 Virtualized block storage servers in cloud provider substrate extension
US16/457,856 US10949125B2 (en) 2019-06-28 2019-06-28 Virtualized block storage servers in cloud provider substrate extension
US16/457,853 2019-06-28
US16/457,856 2019-06-28
PCT/US2020/037195 WO2020263578A1 (en) 2019-06-28 2020-06-11 Virtualized block storage servers in cloud provider substrate extension

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202310084506.4A Division CN116010035B (en) 2019-06-28 2020-06-11 Virtualized block storage server in cloud provider underlying extensions

Publications (2)

Publication Number Publication Date
CN114008593A true CN114008593A (en) 2022-02-01
CN114008593B CN114008593B (en) 2023-03-24

Family

ID=71950729

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202310084506.4A Active CN116010035B (en) 2019-06-28 2020-06-11 Virtualized block storage server in cloud provider underlying extensions
CN202080047292.8A Active CN114008593B (en) 2019-06-28 2020-06-11 Virtualized block storage server in cloud provider underlying extension

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202310084506.4A Active CN116010035B (en) 2019-06-28 2020-06-11 Virtualized block storage server in cloud provider underlying extensions

Country Status (5)

Country Link
EP (1) EP3987387A1 (en)
JP (2) JP7440195B2 (en)
KR (1) KR102695802B1 (en)
CN (2) CN116010035B (en)
WO (1) WO2020263578A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103503376A (en) * 2011-12-29 2014-01-08 华为技术有限公司 Cloud computing system and method for managing storage resources therein
CN103797770A (en) * 2012-12-31 2014-05-14 华为技术有限公司 Method and system for sharing storage resources
CN105283838A (en) * 2013-06-10 2016-01-27 亚马逊科技公司 Distributed lock management in a cloud computing environment
CN105549904A (en) * 2015-12-08 2016-05-04 华为技术有限公司 Data migration method applied in storage system and storage devices
CN105892943A (en) * 2016-03-30 2016-08-24 上海爱数信息技术股份有限公司 Access method and system for block storage data in distributed storage system
CN106462408A (en) * 2014-05-20 2017-02-22 亚马逊科技公司 Low latency connections to workspaces in a cloud computing environment
CN106656631A (en) * 2017-01-19 2017-05-10 武汉噢易云计算股份有限公司 Method and system of logical volume dynamic allocation on shared storage

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9294564B2 (en) * 2011-06-30 2016-03-22 Amazon Technologies, Inc. Shadowing storage gateway
US9235589B2 (en) * 2011-12-13 2016-01-12 International Business Machines Corporation Optimizing storage allocation in a virtual desktop environment
AU2014209611B2 (en) * 2013-01-22 2017-03-16 Amazon Technologies, Inc. Instance host configuration
US20140366155A1 (en) * 2013-06-11 2014-12-11 Cisco Technology, Inc. Method and system of providing storage services in multiple public clouds
US9600203B2 (en) * 2014-03-11 2017-03-21 Amazon Technologies, Inc. Reducing data volume durability state for block-based storage
US20180150234A1 (en) * 2016-11-28 2018-05-31 Hewlett Packard Enterprise Development Lp Cloud volume storage
US10684894B2 (en) * 2017-11-10 2020-06-16 Amazon Technologies, Inc. Capacity management in provider networks using dynamic host device instance model reconfigurations


Also Published As

Publication number Publication date
WO2020263578A1 (en) 2020-12-30
KR102695802B1 (en) 2024-08-19
EP3987387A1 (en) 2022-04-27
JP2022538826A (en) 2022-09-06
CN116010035B (en) 2024-06-25
JP2024073416A (en) 2024-05-29
KR20220011186A (en) 2022-01-27
CN114008593B (en) 2023-03-24
CN116010035A (en) 2023-04-25
JP7440195B2 (en) 2024-02-28

Similar Documents

Publication Publication Date Title
US10949125B2 (en) Virtualized block storage servers in cloud provider substrate extension
US11620081B1 (en) Virtualized block storage servers in cloud provider substrate extension
US11539552B1 (en) Data caching in provider network substrate extensions
US10491539B1 (en) System and method for initializing and maintaining a series of virtual local area networks contained in a clustered computer system
US10949131B2 (en) Control plane for block storage service distributed across a cloud provider substrate and a substrate extension
US11394662B2 (en) Availability groups of cloud provider edge locations
JP7135260B2 (en) Computer-implemented method and system
US11757792B2 (en) Using edge-optimized compute instances to execute user workloads at provider substrate extensions
US11431497B1 (en) Storage expansion devices for provider network substrate extensions
US11662928B1 (en) Snapshot management across cloud provider network extension security boundaries
US11159344B1 (en) Connectivity of cloud edge locations to communications service provider networks
US11659058B2 (en) Provider network connectivity management for provider network substrate extensions
US10133593B1 (en) Virtual machine migration
US11411771B1 (en) Networking in provider network substrate extensions
CN114026826B (en) Provider network connection management for provider network underlying extensions
US11809735B1 (en) Snapshot management for cloud provider network extensions
US11374789B2 (en) Provider network connectivity to provider network substrate extensions
CN114008593B (en) Virtualized block storage server in cloud provider underlying extension
US11595347B1 (en) Dual-stack network addressing in cloud provider network edge locations
US11363113B1 (en) Dynamic micro-region formation for service provider network independent edge locations
US20240036988A1 (en) Disaster recovery pipeline for block storage and dependent applications
Rivero de la Cruz High available GNU/Linux systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant